Glossary

Gradient

The gradient of a scalar-valued function $f(x_1, \ldots, x_n)$ is the vector of all its partial derivatives: $\nabla f = (\partial f / \partial x_1, \ldots, \partial f / \partial x_n)$. It lives in the same space as the input and has a powerful geometric interpretation: it points in the direction of steepest increase of $f$, and its magnitude equals the rate of increase in that direction. Conversely, $-\nabla f$ points in the direction of steepest decrease, which is exactly the direction we wish to move when minimising a loss function.
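As a small illustrative sketch (the function and step size here are arbitrary choices, not from the text), consider $f(x, y) = x^2 + y^2$, whose gradient is $(2x, 2y)$; stepping a little way along $-\nabla f$ should decrease $f$:

```python
import numpy as np

# Example function f(x, y) = x**2 + y**2 with gradient (2x, 2y).
def f(v):
    return v[0] ** 2 + v[1] ** 2

def grad_f(v):
    return np.array([2 * v[0], 2 * v[1]])

v = np.array([3.0, 4.0])
g = grad_f(v)            # the gradient at (3, 4) is (6, 8)

# A small step along -grad_f (steepest descent) lowers the function value.
step = v - 0.1 * g
assert f(step) < f(v)
```

The same one-line descent step, repeated, is the core of the gradient descent algorithm.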

For a neural network with millions of parameters $\boldsymbol{\theta}$, the gradient $\nabla_{\boldsymbol{\theta}} L$ is a vector with one component per parameter. Each component tells us how much the loss would change if we nudged the corresponding parameter slightly. Computing this enormous gradient efficiently is the job of the backpropagation algorithm, which applies the chain rule systematically through the network's computational graph.
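A minimal sketch of the idea, on a deliberately tiny "network" (one weight, invented for illustration): backpropagation through $\hat{y} = wx$ with squared-error loss $L = (wx - y)^2$ is just the chain rule, and the result can be checked against a finite-difference approximation:

```python
# Tiny one-parameter model y_hat = w * x with loss L = (w*x - y)**2.
# Backpropagation here is a single application of the chain rule:
#   dL/dw = 2 * (w*x - y) * x
x, y, w = 2.0, 3.0, 0.5

def loss(w):
    return (w * x - y) ** 2

analytic = 2 * (w * x - y) * x   # chain rule: dL/dy_hat * dy_hat/dw

# Sanity check against a central finite difference of the same derivative.
eps = 1e-6
numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)
assert abs(analytic - numeric) < 1e-6
```

Real frameworks apply exactly this chain-rule bookkeeping, node by node, across the whole computational graph.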

The gradient also connects to the directional derivative: the rate of change of $f$ in any unit direction $\mathbf{u}$ is given by $\nabla f \cdot \mathbf{u}$. The gradient thus encodes, in a single vector, all the directional information about how $f$ responds to small perturbations. For vector-valued functions, the analogue is the Jacobian matrix, whose rows are the gradients of each output component. Understanding gradients geometrically—as a vector field over parameter space, pointing "uphill" on the loss surface—is the key to building intuition for how neural networks learn.
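The directional-derivative identity can be verified numerically. In this sketch (the function and direction are arbitrary choices for illustration), $f(x, y) = x^2 + xy$ has gradient $(2x + y,\; x)$, and $\nabla f \cdot \mathbf{u}$ matches a finite-difference estimate of the rate of change along $\mathbf{u}$:

```python
import numpy as np

# Example: f(x, y) = x**2 + x*y, whose gradient is (2x + y, x).
def f(v):
    return v[0] ** 2 + v[0] * v[1]

def grad_f(v):
    return np.array([2 * v[0] + v[1], v[0]])

v = np.array([1.0, 2.0])
u = np.array([3.0, 4.0]) / 5.0       # a unit direction vector

# Directional derivative two ways: finite difference vs. grad_f . u.
h = 1e-6
numeric = (f(v + h * u) - f(v)) / h
analytic = grad_f(v) @ u
assert abs(numeric - analytic) < 1e-4
```

The dot product recovers the slope in any direction, which is why the single gradient vector suffices.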

Related terms: Derivative, Gradient Descent, Backpropagation, Jacobian

Discussed in:

Also defined in: Textbook of AI