How LLMs Work

Interactive visualizations for understanding machine learning
Module 1: Linear Regression
Interactive visualizations for understanding regression mechanics
Basics
SSE
SSE Landscape Explorer
Explore the Sum of Squared Errors (SSE) loss landscape. Manipulate slope and intercept manually to see how the loss function changes. Visualize in 2D scatter plot and interactive 3D surfaces (normalized and raw geometry).
MSE
MSE Landscape Explorer
Same interactive tool using Mean Squared Error (MSE = SSE / n). Dividing by n means the loss no longer grows with the number of data points, making it easier to compare models across datasets of different sizes.
RMSE
RMSE Landscape Explorer
Root Mean Squared Error visualization. RMSE returns units to original scale (dollars, meters, etc.), making it the de facto standard for reporting model accuracy in real-world applications.
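The three metrics above differ only in normalization. A minimal sketch, with toy data and an arbitrary candidate line chosen purely for illustration:

```python
import math

# Toy data and a candidate line y = m*x + b (values are illustrative).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]
m, b = 2.0, 0.0

residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
sse = sum(r * r for r in residuals)   # Sum of Squared Errors
mse = sse / len(xs)                   # Mean Squared Error = SSE / n
rmse = math.sqrt(mse)                 # RMSE, back in the data's original units

print(sse, mse, rmse)
```

Note that RMSE is the only one of the three expressed in the same units as y, which is why it is the usual choice for reporting.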
Polynomial Regression
Polynomial
Polynomial Regression (y = ax² + bx + c)
Move beyond straight lines. Fit quadratic curves to data by adjusting three parameters (a, b, c). The 3D loss surface shows how error changes for different values of quadratic and linear coefficients.
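The 3D surface in this card is just SSE evaluated over a grid of coefficients. A sketch of one 1D slice of it, using toy data generated from y = x² so the minimum is known:

```python
# SSE of a quadratic model y = a*x^2 + b*x + c on toy data, sweeping the
# quadratic coefficient `a` (one slice of the 3D loss surface).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [4.0, 1.0, 0.0, 1.0, 4.0]   # generated from y = x^2

def sse(a, b, c):
    return sum((y - (a * x * x + b * x + c)) ** 2 for x, y in zip(xs, ys))

for a in [0.5, 1.0, 1.5]:
    print(f"a={a}: SSE={sse(a, b=0.0, c=0.0):.2f}")   # minimum at a=1
```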
Optimization
Gradient Descent
Gradient Descent (Normalized Surface)
Watch the optimization algorithm in action. See the "ball" roll down the normalized convex loss surface to find the optimal regression line. Adjust learning rate and observe convergence behavior.
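The "ball rolling downhill" can be sketched in a few lines. This is a minimal gradient-descent loop on MSE for a line fit; the data and learning rate are illustrative, not tied to the visualization's defaults:

```python
# Gradient descent for y = m*x + b on MSE. Toy data lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, b = 0.0, 0.0
lr = 0.05        # learning rate (alpha); illustrative value
n = len(xs)

for step in range(2000):
    # Gradients of MSE = (1/n) * sum((y - (m*x + b))^2)
    grad_m = (-2 / n) * sum(x * (y - (m * x + b)) for x, y in zip(xs, ys))
    grad_b = (-2 / n) * sum((y - (m * x + b)) for x, y in zip(xs, ys))
    m -= lr * grad_m
    b -= lr * grad_b

print(round(m, 3), round(b, 3))   # approaches m=2, b=1
```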
Raw Geometry
Gradient Descent (Raw Surface)
Same gradient descent visualization but on the true (asymmetric) SSE surface. Observe how the real geometry differs from the idealized "bowl" often shown in textbooks.
Reference
Formulas
SSE vs MSE vs RMSE Formulas
A quick reference card with the mathematical formulas for all three error metrics, plus guidance on when and why to use each one, with pros and cons.
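For quick orientation before opening the card, the three metrics relate as follows (standard definitions):

```latex
\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
\mathrm{MSE} = \frac{\mathrm{SSE}}{n}, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
```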
Links
Useful Resources
Curated external links to deepen your understanding: intuitive explanations, Python implementations from scratch, and in-depth articles on linear regression fundamentals.
Module 2: Learning Rate
Understanding the most important hyperparameter in optimization
Learning Rate
Learning Rate Explorer
Interactive visualization of the learning rate hyperparameter. Experiment with different α values to see how step size affects gradient descent: too small leads to slow convergence, too large causes divergence.
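The three regimes (too small, just right, too large) show up even in one dimension. A sketch on f(w) = w², whose gradient is 2w; the α values are illustrative:

```python
# How alpha controls gradient descent on f(w) = w^2: the update is
# w <- w - alpha * 2w, so each step multiplies w by (1 - 2*alpha).
def descend(alpha, steps=20, w0=1.0):
    w = w0
    for _ in range(steps):
        w -= alpha * 2 * w   # gradient of w^2 is 2w
    return w

print(descend(0.01))   # too small: still far from the minimum at 0
print(descend(0.4))    # stable, fast convergence
print(descend(1.1))    # too large: |1 - 2*alpha| > 1, so w diverges
```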
Module 3: Logistic Regression
From regression to classification with the Sigmoid function
Sigmoid
Sigmoid Function Explorer
Interactive visualization of the Sigmoid function — the heart of logistic regression. See how β₀ and β₁ parameters shape the probability curve, and explore equivalent notations used in statistics, ML, and linear algebra.
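The curve the explorer draws is σ(β₀ + β₁x). A minimal sketch, with β values chosen only to show how they shift and steepen the curve:

```python
import math

# Sigmoid: sigma(z) = 1 / (1 + e^{-z}), applied to z = beta0 + beta1 * x.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def p(x, beta0, beta1):
    return sigmoid(beta0 + beta1 * x)

print(p(0.0, 0.0, 1.0))   # 0.5 at the curve's midpoint
print(p(2.0, 0.0, 1.0))   # further right, probability approaches 1
```

β₀ shifts the midpoint (where p = 0.5) left or right; β₁ controls how steeply the curve transitions between 0 and 1.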
Loss Functions
Why Not MSE for Classification?
Compare SSE/MSE vs Log-Loss for logistic regression. See why MSE creates a non-convex surface with local minima, while Log-Loss (Binary Cross-Entropy) is convex and reliable.
Log-Loss
Deriving the Log-Loss Formula
Interactive derivation of the Log-Loss (Binary Cross-Entropy) formula. Understand why we use -log(ŷ) and -log(1-ŷ) to measure classification error and how it penalizes confident wrong predictions.
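The two branches combine into one formula via the usual y / (1 − y) trick. A single-example sketch showing the penalty for confident wrong predictions:

```python
import math

# Binary cross-entropy for one example: -log(yhat) when y = 1,
# -log(1 - yhat) when y = 0, written as a single expression.
def log_loss(y, yhat):
    return -(y * math.log(yhat) + (1 - y) * math.log(1 - yhat))

print(log_loss(1, 0.9))   # confident and right: small loss
print(log_loss(1, 0.1))   # confident and wrong: large loss
```

As ŷ → 0 with y = 1, the loss grows without bound, which is exactly the "penalizes confident wrong predictions" behavior described above.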
Optimization
Gradient Descent on Log-Loss
Watch gradient descent optimize logistic regression in real-time. See the ball roll down the convex Log-Loss surface while the sigmoid curve fits the data in the 2D view.
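A sketch of that descent in code, on illustrative 1D data. The sigmoid/cross-entropy pairing makes the per-example gradient simply (ŷ − y) for β₀ and (ŷ − y)·x for β₁:

```python
import math

# Gradient descent for 1D logistic regression on log-loss (toy data).
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
b0, b1 = 0.0, 0.0
lr = 0.5   # illustrative learning rate

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for _ in range(500):
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        err = sigmoid(b0 + b1 * x) - y   # yhat - y
        g0 += err / len(xs)
        g1 += err * x / len(xs)
    b0 -= lr * g0
    b1 -= lr * g1

print(round(b1, 2))   # positive slope: the fitted sigmoid separates the classes
```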
Interactive
Interactive Data Points Playground
Add, move, and delete data points to see how Logistic Regression adapts. Watch the Forward Pass (prediction) and understand how the Backward Pass (learning) updates parameters.
Formulas
Logistic Regression Formulas
Complete reference card for Logistic Regression: Sigmoid function, Log-Loss (Binary Cross-Entropy), gradient formulas, and classification decision rules. All formulas explained with use cases.
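For quick reference before opening the card, the core formulas (standard forms, with ŷᵢ = σ(β₀ + β₁xᵢ)):

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = \beta_0 + \beta_1 x
```

```latex
\mathcal{L} = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]
```

```latex
\frac{\partial \mathcal{L}}{\partial \beta_0} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i), \qquad
\frac{\partial \mathcal{L}}{\partial \beta_1} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)\, x_i
```

The usual decision rule classifies an example as positive when ŷ ≥ 0.5, i.e. when β₀ + β₁x ≥ 0.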