Hold one variable fixed, take the slope along the other. Two partials make the gradient.
From Chapter 3: Calculus
Glossary: partial derivative, gradient
Transcript
A function of two variables defines a surface. To take its derivative, we have to ask: with respect to which variable?
Hold y fixed. Slice the surface along x. The slope of that slice at a point is the partial derivative with respect to x.
Now hold x fixed. Slice along y. The slope of that slice is the partial derivative with respect to y.
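To make that concrete, here is a worked example not in the transcript itself: take $f(x, y) = x^2 y$. Holding $y$ fixed and differentiating along $x$ gives $\partial f / \partial x = 2xy$; holding $x$ fixed and differentiating along $y$ gives $\partial f / \partial y = x^2$.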
Two partial derivatives, two slopes. Together they form a vector, the gradient, pointing in the direction of steepest ascent.
For a function of n variables, n partials, one gradient with n entries.
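Here is a small numerical sketch of that slicing picture, not from the chapter itself; the test function, the point, and the step size h are illustrative choices, and the centered difference only approximates the true partials.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Estimate the gradient of f at x by perturbing one coordinate at
    a time (hold every other variable fixed, measure the slope)."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)  # centered difference
    return grad

# f(x, y) = x^2 * y, whose partials are 2xy and x^2.
f = lambda v: v[0] ** 2 * v[1]
print(numerical_gradient(f, np.array([3.0, 2.0])))  # ~[12.  9.]
```

The same loop works unchanged for n variables: n perturbations, n slopes, one gradient with n entries.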
Partial derivatives are calculated like ordinary one-variable derivatives. Treat the other variables as constants. The chain rule still applies; so does the product rule.
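For instance, with $f(x, y) = x \sin(xy)$ (an illustrative function, not one from the transcript), differentiating along $x$ uses both rules at once: the product rule and then the chain rule give $\partial f / \partial x = \sin(xy) + xy \cos(xy)$, while along $y$, with $x$ held constant, $\partial f / \partial y = x^2 \cos(xy)$.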
In neural networks, the loss is a function of millions of parameters. Each parameter has a partial derivative. Together, they form the gradient that drives every learning step.
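A minimal sketch of that learning step; the function name, the toy quadratic loss, and the learning rate are all hypothetical choices for illustration, not anything fixed by the chapter.

```python
import numpy as np

def gradient_descent_step(params, grad, lr=0.1):
    """One learning step: nudge every parameter against its own partial."""
    return params - lr * grad

# Toy loss L(w) = ||w||^2, whose gradient is 2w; both are stand-ins.
w = np.array([1.0, -2.0, 0.5])
for _ in range(50):
    w = gradient_descent_step(w, grad=2 * w)
print(w)  # every entry shrinks toward the minimum at the origin
```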
Reverse-mode automatic differentiation, also known as backpropagation, computes all those partials in one efficient backward sweep.
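To show the idea rather than any particular framework's implementation, here is a minimal reverse-mode sketch in the micrograd style; the Value class and everything in it are hypothetical scaffolding for illustration.

```python
class Value:
    """A scalar that remembers how it was computed, so one backward
    sweep can accumulate the partial of the output with respect to it."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents  # pairs of (input Value, local partial)

    def __add__(self, other):
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Value(self.data * other.data,
                     ((self, other.data), (other, self.data)))

    def backward(self):
        # Order the graph topologically, then apply the chain rule in reverse.
        order, visited = [], set()
        def build(node):
            if node not in visited:
                visited.add(node)
                for parent, _ in node._parents:
                    build(parent)
                order.append(node)
        build(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node._parents:
                parent.grad += local * node.grad

x, y = Value(3.0), Value(2.0)
f = x * x * y            # f(x, y) = x^2 * y
f.backward()
print(x.grad, y.grad)    # prints 12.0 9.0, matching the analytic partials
```

Each add and multiply records its local partials on the way forward; the backward sweep then visits every node once, which is why all the partials together cost roughly one extra pass.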