Gradient descent on a quadratic bowl, Textbook of AI

A ball rolls down a quadratic surface as the learning rate changes.

From the chapter: Chapter 3: Calculus

Glossary: gradient descent, learning rate, momentum

People: boris polyak

References: Polyak, 1964

Picture a ball balanced on the rim of a smooth bowl. Gravity pulls it toward the lowest point.

In machine learning, the bowl is the loss function. The ball's path is the trajectory of the model's parameters as they learn.

A small step size means a slow, safe descent. Many tiny jumps to reach the minimum.

A larger step size gets there faster, but can overshoot the minimum and bounce around the walls.

Push the step size too far and the ball flies out altogether. Every step makes things worse.

With momentum added, the ball accumulates velocity. It glides through long shallow valleys that vanilla gradient descent would crawl across.

That is gradient descent: a derivative, a step size, and the engine that trains every modern neural network.

AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).