A ball rolls down a quadratic surface as the learning rate changes.
From the chapter: Chapter 3: Calculus
Glossary: gradient descent, learning rate, momentum
People: Boris Polyak
References: Polyak, 1964
Transcript
Picture a ball balanced on the rim of a smooth bowl. Gravity pulls it toward the lowest point.
In machine learning, the bowl is the loss function. The ball's path is the trajectory of the model's parameters as they learn.
A small step size means a slow, safe descent: many tiny steps to reach the minimum.
A larger step size gets there faster, but can overshoot the minimum and bounce around the walls.
Push the step size too far and the ball flies out altogether. Each step lands farther from the minimum than the last.
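The three regimes just described can be reproduced in a few lines. This is a minimal sketch, not the animation's own code: it assumes the bowl is the one-dimensional quadratic f(x) = x**2, and the function name `descend` and the three learning rates are illustrative choices.

```python
def descend(lr, x0=1.0, steps=20):
    """Run plain gradient descent on f(x) = x**2 and return every iterate."""
    xs = [x0]
    for _ in range(steps):
        grad = 2.0 * xs[-1]            # derivative of x**2 at the current point
        xs.append(xs[-1] - lr * grad)  # step against the gradient
    return xs

# On this bowl each step multiplies x by (1 - 2 * lr).
small = descend(lr=0.1)    # factor 0.8: slow, steady approach from one side
large = descend(lr=0.9)    # factor -0.8: jumps across the minimum, still shrinks
too_far = descend(lr=1.1)  # factor -1.2: every step ends farther away

for name, xs in [("small", small), ("large", large), ("too far", too_far)]:
    print(f"{name:8s} first steps: {[round(x, 3) for x in xs[:4]]}")
```

For this particular bowl the boundary sits at a step size of 1.0: below it the iterates shrink, above it they grow. Other loss surfaces have a different threshold, set by their curvature.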
With momentum added, the ball accumulates velocity. It glides through long shallow valleys that vanilla gradient descent would crawl across.
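A rough way to see the effect of momentum is to compare both methods on a valley that is steep in one direction and shallow in the other. The sketch below uses one common form of the heavy-ball update; the surface f(x, y) = 0.5 * (x**2 + 0.01 * y**2), the names `CURVATURE`, `grad`, and `run`, and the values of `lr` and `beta` are all assumptions made for illustration.

```python
CURVATURE = (1.0, 0.01)  # steep direction, shallow direction (illustrative values)

def grad(p):
    """Gradient of f(x, y) = 0.5 * (1.0 * x**2 + 0.01 * y**2)."""
    return tuple(c * pi for c, pi in zip(CURVATURE, p))

def run(lr, beta, steps=200, start=(1.0, 1.0)):
    """Heavy-ball update; beta = 0 reduces to plain gradient descent."""
    p = start
    v = (0.0, 0.0)  # velocity starts at rest
    for _ in range(steps):
        g = grad(p)
        # Keep a fraction beta of the old velocity, then add the new gradient step.
        v = tuple(beta * vi - lr * gi for vi, gi in zip(v, g))
        p = tuple(pi + vi for pi, vi in zip(p, v))
    return p

plain = run(lr=1.0, beta=0.0)
heavy = run(lr=1.0, beta=0.9)
print("plain gradient descent ends at:", plain)
print("with momentum ends at:       ", heavy)
```

With these settings, plain gradient descent is still well short of the minimum along the shallow direction after 200 steps, while the momentum run ends much closer. The gap depends on the chosen curvatures and on `beta`, so treat the numbers as a demonstration rather than a benchmark.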
That is gradient descent: a derivative, a step size, and the engine that trains nearly every modern neural network.