Hold one variable fixed, take the slope along the other. Two partials make the gradient.
From Chapter 3: Calculus
Glossary: partial derivative, gradient
Transcript
A function of two variables defines a surface. To take its derivative, we have to ask: with respect to which variable?
Hold y fixed. Slice the surface along x. The slope of that slice at a point is the partial derivative with respect to x.
Now hold x fixed. Slice along y. The slope of that slice is the partial derivative with respect to y.
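To make that concrete, here is a worked example not in the transcript itself: take $f(x, y) = x^2 y$. Holding $y$ fixed and differentiating along $x$ gives $\partial f / \partial x = 2xy$; holding $x$ fixed and differentiating along $y$ gives $\partial f / \partial y = x^2$.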
Two partial derivatives, two slopes. Together they form a vector, the gradient, pointing in the direction of steepest ascent.
For a function of n variables, n partials, one gradient with n entries.
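Here is a small numerical sketch of that slicing picture, not from the chapter itself; the test function, the point, and the step size h are illustrative choices, and the centered difference only approximates the true partials.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Estimate the gradient of f at x by perturbing one coordinate at
    a time (hold every other variable fixed, measure the slope)."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)  # centered difference
    return grad

# f(x, y) = x^2 * y, whose partials are 2xy and x^2.
f = lambda v: v[0] ** 2 * v[1]
print(numerical_gradient(f, np.array([3.0, 2.0])))  # ~[12.  9.]
```

The same loop works unchanged for n variables: n perturbations, n slopes, one gradient with n entries.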
Partial derivatives are calculated like ordinary one-variable derivatives. Treat the other variables as constants. The chain rule still applies; so does the product rule.
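For instance, with $f(x, y) = x \sin(xy)$ (an illustrative function, not one from the transcript), differentiating along $x$ uses both rules at once: the product rule and then the chain rule give $\partial f / \partial x = \sin(xy) + xy \cos(xy)$, while along $y$, with $x$ held constant, $\partial f / \partial y = x^2 \cos(xy)$.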
In neural networks, the loss is a function of millions of parameters. Each parameter has a partial derivative. Together, they form the gradient that drives every learning step.
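A minimal sketch of that learning step; the function name, the toy quadratic loss, and the learning rate are all hypothetical choices for illustration, not anything fixed by the chapter.

```python
import numpy as np

def gradient_descent_step(params, grad, lr=0.1):
    """One learning step: nudge every parameter against its own partial."""
    return params - lr * grad

# Toy loss L(w) = ||w||^2, whose gradient is 2w; both are stand-ins.
w = np.array([1.0, -2.0, 0.5])
for _ in range(50):
    w = gradient_descent_step(w, grad=2 * w)
print(w)  # every entry shrinks toward the minimum at the origin
```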
Reverse-mode automatic differentiation, also known as backpropagation, computes all those partials in one efficient backward sweep.
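To show the idea rather than any particular framework's implementation, here is a minimal reverse-mode sketch in the micrograd style; the Value class and everything in it are hypothetical scaffolding for illustration.

```python
class Value:
    """A scalar that remembers how it was computed, so one backward
    sweep can accumulate the partial of the output with respect to it."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents  # pairs of (input Value, local partial)

    def __add__(self, other):
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Value(self.data * other.data,
                     ((self, other.data), (other, self.data)))

    def backward(self):
        # Order the graph topologically, then apply the chain rule in reverse.
        order, visited = [], set()
        def build(node):
            if node not in visited:
                visited.add(node)
                for parent, _ in node._parents:
                    build(parent)
                order.append(node)
        build(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node._parents:
                parent.grad += local * node.grad

x, y = Value(3.0), Value(2.0)
f = x * x * y            # f(x, y) = x^2 * y
f.backward()
print(x.grad, y.grad)    # prints 12.0 9.0, matching the analytic partials
```

Each add and multiply records its local partials on the way forward; the backward sweep then visits every node once, which is why all the partials together cost roughly one extra pass.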