Gradient Descent Explorer: Learning Rate
[Controls: Scenario presets — Custom, Too Small, Just Right, Too Large; Learning Rate (α) slider, default 0.10]
[Controls: ⟲ Reset, ▶ Run, Step (+1); live readouts for Iteration, Parameter w, Loss L(w), and Gradient]
How it works
Curve: the loss landscape, L(w). We want to find its bottom (the minimum loss).
Gradient: the slope at the current w. It tells us which way is "downhill".
Alpha (α): the step size.
- Small: safe but slow convergence.
- Good: fast, stable convergence.
- Large: overshoots the minimum and becomes unstable.
$$ w_{new} = w - \alpha \frac{\partial L}{\partial w} $$
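The three regimes above can be seen by iterating the update rule directly. A minimal sketch: the widget's actual loss curve isn't specified here, so this assumes the simple parabola L(w) = w², whose gradient is 2w.

```python
# Gradient descent on an assumed loss L(w) = w^2 (gradient dL/dw = 2w).
# The widget's real curve may differ; the qualitative behavior is the same.

def gradient_descent(alpha, w=1.0, steps=25):
    """Repeatedly apply w <- w - alpha * dL/dw and return the final w."""
    for _ in range(steps):
        grad = 2.0 * w          # dL/dw for L(w) = w^2
        w = w - alpha * grad    # the update rule from the formula above
    return w

if __name__ == "__main__":
    for alpha, label in [(0.01, "too small"), (0.4, "just right"), (1.1, "too large")]:
        print(f"alpha={alpha:<4} ({label}): final w = {gradient_descent(alpha):.4f}")
```

With a tiny α the iterate creeps toward 0; with a well-chosen α it converges quickly; with α too large each step flips the sign of w and grows it, so the loss diverges.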