How LLMs Work

Interactive visualizations for understanding machine learning
Module 1: Linear Regression
Interactive visualizations for understanding regression mechanics
Basics
SSE
SSE Landscape Explorer
Explore the Sum of Squared Errors (SSE) loss landscape. Manipulate slope and intercept manually to see how the loss function changes. Visualize in 2D scatter plot and interactive 3D surfaces (normalized and raw geometry).
MSE
MSE Landscape Explorer
Same interactive tool using Mean Squared Error (MSE = SSE / n). Dividing by n means the loss no longer grows with the number of data points, making it easier to compare models across datasets of different sizes.
RMSE
RMSE Landscape Explorer
Root Mean Squared Error visualization. RMSE returns units to original scale (dollars, meters, etc.), making it the de facto standard for reporting model accuracy in real-world applications.
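The three metrics above differ only in normalization. A minimal sketch, with toy data and an arbitrary candidate line chosen purely for illustration:

```python
import math

# Toy data and a candidate line y = m*x + b (values are illustrative).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]
m, b = 2.0, 0.0

residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
sse = sum(r * r for r in residuals)   # Sum of Squared Errors
mse = sse / len(xs)                   # Mean Squared Error = SSE / n
rmse = math.sqrt(mse)                 # RMSE, back in the data's original units

print(sse, mse, rmse)
```

Note that RMSE is the only one of the three expressed in the same units as y, which is why it is the usual choice for reporting.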
Polynomial Regression
Polynomial
Polynomial Regression (y = ax² + bx + c)
Move beyond straight lines. Fit quadratic curves to data by adjusting three parameters (a, b, c). The 3D loss surface shows how error changes for different values of quadratic and linear coefficients.
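The 3D surface in this card is just SSE evaluated over a grid of coefficients. A sketch of one 1D slice of it, using toy data generated from y = x² so the minimum is known:

```python
# SSE of a quadratic model y = a*x^2 + b*x + c on toy data, sweeping the
# quadratic coefficient `a` (one slice of the 3D loss surface).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [4.0, 1.0, 0.0, 1.0, 4.0]   # generated from y = x^2

def sse(a, b, c):
    return sum((y - (a * x * x + b * x + c)) ** 2 for x, y in zip(xs, ys))

for a in [0.5, 1.0, 1.5]:
    print(f"a={a}: SSE={sse(a, b=0.0, c=0.0):.2f}")   # minimum at a=1
```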
Optimization
Gradient Descent
Gradient Descent (Normalized Surface)
Watch the optimization algorithm in action. See the "ball" roll down the normalized convex loss surface to find the optimal regression line. Adjust learning rate and observe convergence behavior.
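The "ball rolling downhill" can be sketched in a few lines. This is a minimal gradient-descent loop on MSE for a line fit; the data and learning rate are illustrative, not tied to the visualization's defaults:

```python
# Gradient descent for y = m*x + b on MSE. Toy data lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, b = 0.0, 0.0
lr = 0.05        # learning rate (alpha); illustrative value
n = len(xs)

for step in range(2000):
    # Gradients of MSE = (1/n) * sum((y - (m*x + b))^2)
    grad_m = (-2 / n) * sum(x * (y - (m * x + b)) for x, y in zip(xs, ys))
    grad_b = (-2 / n) * sum((y - (m * x + b)) for x, y in zip(xs, ys))
    m -= lr * grad_m
    b -= lr * grad_b

print(round(m, 3), round(b, 3))   # approaches m=2, b=1
```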
Raw Geometry
Gradient Descent (Raw Surface)
Same gradient descent visualization but on the true (asymmetric) SSE surface. Observe how the real geometry differs from the idealized "bowl" often shown in textbooks.
Reference
Formulas
SSE vs MSE vs RMSE Formulas
A quick reference card with the mathematical formulas for all three error metrics, plus guidance on when and why to use each one, with pros and cons.
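For quick orientation before opening the card, the three metrics relate as follows (standard definitions):

```latex
\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
\mathrm{MSE} = \frac{\mathrm{SSE}}{n}, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
```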
Links
Useful Resources
Curated external links to deepen your understanding: intuitive explanations, Python implementations from scratch, and in-depth articles on linear regression fundamentals.
Module 2: Learning Rate
Understanding the most important hyperparameter in optimization
Learning Rate
Learning Rate Explorer
Interactive visualization of the learning rate hyperparameter. Experiment with different α values to see how step size affects gradient descent: too small leads to slow convergence, too large causes divergence.
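The three regimes (too small, just right, too large) show up even in one dimension. A sketch on f(w) = w², whose gradient is 2w; the α values are illustrative:

```python
# How alpha controls gradient descent on f(w) = w^2: the update is
# w <- w - alpha * 2w, so each step multiplies w by (1 - 2*alpha).
def descend(alpha, steps=20, w0=1.0):
    w = w0
    for _ in range(steps):
        w -= alpha * 2 * w   # gradient of w^2 is 2w
    return w

print(descend(0.01))   # too small: still far from the minimum at 0
print(descend(0.4))    # stable, fast convergence
print(descend(1.1))    # too large: |1 - 2*alpha| > 1, so w diverges
```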
Module 3: Logistic Regression
From regression to classification with the Sigmoid function
Sigmoid
Sigmoid Function Explorer
Interactive visualization of the Sigmoid function — the heart of logistic regression. See how β₀ and β₁ parameters shape the probability curve, and explore equivalent notations used in statistics, ML, and linear algebra.
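The curve the explorer draws is σ(β₀ + β₁x). A minimal sketch, with β values chosen only to show how they shift and steepen the curve:

```python
import math

# Sigmoid: sigma(z) = 1 / (1 + e^{-z}), applied to z = beta0 + beta1 * x.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def p(x, beta0, beta1):
    return sigmoid(beta0 + beta1 * x)

print(p(0.0, 0.0, 1.0))   # 0.5 at the curve's midpoint
print(p(2.0, 0.0, 1.0))   # further right, probability approaches 1
```

β₀ shifts the midpoint (where p = 0.5) left or right; β₁ controls how steeply the curve transitions between 0 and 1.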
Loss Functions
Why Not MSE for Classification?
Compare SSE/MSE vs Log-Loss for logistic regression. See why MSE creates a non-convex surface with local minima, while Log-Loss (Binary Cross-Entropy) is convex and reliable.
Log-Loss
Deriving the Log-Loss Formula
Interactive derivation of the Log-Loss (Binary Cross-Entropy) formula. Understand why we use -log(ŷ) and -log(1-ŷ) to measure classification error and how it penalizes confident wrong predictions.
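The two branches combine into one formula via the usual y / (1 − y) trick. A single-example sketch showing the penalty for confident wrong predictions:

```python
import math

# Binary cross-entropy for one example: -log(yhat) when y = 1,
# -log(1 - yhat) when y = 0, written as a single expression.
def log_loss(y, yhat):
    return -(y * math.log(yhat) + (1 - y) * math.log(1 - yhat))

print(log_loss(1, 0.9))   # confident and right: small loss
print(log_loss(1, 0.1))   # confident and wrong: large loss
```

As ŷ → 0 with y = 1, the loss grows without bound, which is exactly the "penalizes confident wrong predictions" behavior described above.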
Optimization
Gradient Descent on Log-Loss
Watch gradient descent optimize logistic regression in real-time. See the ball roll down the convex Log-Loss surface while the sigmoid curve fits the data in the 2D view.
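A sketch of that descent in code, on illustrative 1D data. The sigmoid/cross-entropy pairing makes the per-example gradient simply (ŷ − y) for β₀ and (ŷ − y)·x for β₁:

```python
import math

# Gradient descent for 1D logistic regression on log-loss (toy data).
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
b0, b1 = 0.0, 0.0
lr = 0.5   # illustrative learning rate

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for _ in range(500):
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        err = sigmoid(b0 + b1 * x) - y   # yhat - y
        g0 += err / len(xs)
        g1 += err * x / len(xs)
    b0 -= lr * g0
    b1 -= lr * g1

print(round(b1, 2))   # positive slope: the fitted sigmoid separates the classes
```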
Interactive
Interactive Data Points Playground
Add, move, and delete data points to see how Logistic Regression adapts. Watch the Forward Pass (prediction) and understand how the Backward Pass (learning) updates parameters.
Formulas
Logistic Regression Formulas
Complete reference card for Logistic Regression: Sigmoid function, Log-Loss (Binary Cross-Entropy), gradient formulas, and classification decision rules. All formulas explained with use cases.
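For quick reference before opening the card, the core formulas (standard forms, with ŷᵢ = σ(β₀ + β₁xᵢ)):

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = \beta_0 + \beta_1 x
```

```latex
\mathcal{L} = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]
```

```latex
\frac{\partial \mathcal{L}}{\partial \beta_0} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i), \qquad
\frac{\partial \mathcal{L}}{\partial \beta_1} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)\, x_i
```

The usual decision rule classifies an example as positive when ŷ ≥ 0.5, i.e. when β₀ + β₁x ≥ 0.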