Visualisation

Newton's method finds roots through tangent lines

Last reviewed 5 May 2026

Approximate a curve by its tangent at a guess and use the tangent's root as the next guess.

From Chapter 3: Calculus

Glossary: Newton's method

Transcript

We have a function f and we want to find an x where f equals zero.

Pick a starting guess. Draw the tangent to the curve at that point. The tangent is a straight line. Find where the tangent crosses the x-axis.

That crossing becomes the next guess. Draw the tangent there. Find its crossing. Repeat.

For well-behaved functions, the iterates converge to a true root extremely fast. The error roughly squares with each step, so the number of correct digits roughly doubles. Five iterations from a decent start often give twelve significant digits.

The update rule. New x equals old x minus f at x divided by f-prime at x.
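The update rule above can be sketched in a few lines. This is a minimal illustration, not production code: the function names, tolerance, and the square-root example are my own choices, and a robust implementation would also guard against a zero derivative.

```python
def newton(f, f_prime, x0, tol=1e-12, max_iter=50):
    """Newton's method: repeatedly replace x with the root of the tangent at x."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / f_prime(x)  # the tangent at x crosses the axis at x - f(x)/f'(x)
        x = x - step
        if abs(step) < tol:
            break
    return x

# Find sqrt(2) as the positive root of f(x) = x^2 - 2, starting from x = 1.
root = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)  # ≈ 1.41421356...
```

Printing the error after each iteration shows the quadratic convergence described above: the number of correct digits roughly doubles per step.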

For optimisation we want zeros of the gradient. Newton's update step then becomes minus the inverse Hessian times the gradient. Each step uses curvature information, not just slope.

Newton's method converges quadratically near a minimum, vastly faster than gradient descent's linear convergence.
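A sketch of the optimisation form of the step, assuming NumPy and a hand-written gradient and Hessian for a small quadratic (my own example function). On an exactly quadratic bowl, a single Newton step lands on the minimum, which is the extreme case of the fast convergence described above.

```python
import numpy as np

def newton_step(grad, hess, x):
    """One Newton step for optimisation: x_new = x - H(x)^{-1} grad(x)."""
    # Solve H p = grad rather than forming the explicit inverse.
    return x - np.linalg.solve(hess(x), grad(x))

# f(x, y) = x^2 + 3y^2 + xy: a quadratic bowl with its minimum at the origin.
grad = lambda v: np.array([2 * v[0] + v[1], v[0] + 6 * v[1]])
hess = lambda v: np.array([[2.0, 1.0], [1.0, 6.0]])

x = np.array([5.0, -3.0])
x = newton_step(grad, hess, x)  # one step reaches the minimum of a quadratic
```

Gradient descent from the same start would need many steps with a tuned learning rate; the Hessian solve is what buys the speed, and also what becomes the bottleneck at scale.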

The catch. The Hessian is an n-by-n matrix, where n is the number of parameters. For modern deep learning with billions of parameters, storing the Hessian is infeasible, let alone inverting it.

Quasi-Newton methods like BFGS approximate the inverse Hessian cheaply. K-FAC and Shampoo extend the idea to neural networks.
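The core trick in BFGS can be shown directly: it maintains an approximation H to the inverse Hessian and updates it from gradient differences alone, never touching the true Hessian. The sketch below (my own minimal version, assuming NumPy) applies one such update and checks the defining secant condition, H @ y = s; a full optimiser would wrap this in a line-search loop.

```python
import numpy as np

def bfgs_update(H, s, y):
    """BFGS update of the inverse-Hessian approximation H.

    s = x_new - x_old, y = grad_new - grad_old. The updated H satisfies
    the secant condition H @ y = s, mimicking the true inverse Hessian
    along the most recent step, at O(n^2) cost instead of O(n^3).
    """
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    return (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) + rho * np.outer(s, s)

# Two gradient observations of f(x) = 0.5 x^T A x with A = [[2, 1], [1, 6]].
A = np.array([[2.0, 1.0], [1.0, 6.0]])
x0, x1 = np.array([1.0, 1.0]), np.array([0.5, 0.2])
s, y = x1 - x0, A @ x1 - A @ x0
H = bfgs_update(np.eye(2), s, y)
# H now maps y back to s, with no Hessian ever formed or inverted.
```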

This site is currently in Beta. Contact: Chris Paton



AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).