A Partial Derivative of a multivariable function $f(x_1, \ldots, x_n)$ with respect to $x_i$, written $\frac{\partial f}{\partial x_i}$, is the derivative taken while treating all other variables as constants. It measures the rate of change of $f$ along the $i$-th coordinate axis alone. Partial derivatives are computed using exactly the same rules as ordinary derivatives, and they are the building blocks of the gradient vector.
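Since a partial derivative is just an ordinary derivative along one axis with the other variables frozen, it can be estimated numerically by bumping a single coordinate. A minimal sketch (the function `f` and helper `partial` are illustrative, not from the text), using a central difference:

```python
def partial(f, i, point, h=1e-6):
    """Central-difference estimate of the i-th partial derivative of f at point."""
    up, down = list(point), list(point)
    up[i] += h      # bump only the i-th coordinate upward
    down[i] -= h    # and downward; all other variables stay fixed
    return (f(*up) - f(*down)) / (2 * h)

# f(x, y) = x^2 * y, so analytically df/dx = 2xy and df/dy = x^2
f = lambda x, y: x ** 2 * y
print(partial(f, 0, (3.0, 2.0)))  # ~ 12.0  (2 * 3 * 2)
print(partial(f, 1, (3.0, 2.0)))  # ~ 9.0   (3^2)
```

Each estimate matches the rule stated above: differentiate in one variable while treating the others as constants.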
In machine learning, every parameter in a model has its own partial derivative of the loss function, and collectively these form the gradient. For a neural network with a million weights, computing the loss's partial derivative with respect to each weight independently would be hopelessly inefficient; backpropagation computes them all at once, in a single backward pass, by propagating partial derivatives through the computational graph via the chain rule.
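The backward pass can be illustrated on a toy computational graph. In this hedged sketch (the graph `loss = (w*x + b)^2` and all names are invented for illustration), one forward sweep computes the values and one backward sweep reuses each intermediate partial to obtain every parameter's derivative:

```python
def forward_backward(w, b, x):
    """Manual backprop on the tiny graph: u = w*x, v = u + b, loss = v**2."""
    # Forward pass: compute and cache every intermediate value.
    u = w * x
    v = u + b
    loss = v ** 2
    # Backward pass: chain rule, propagated from loss toward the parameters.
    dv = 2 * v    # d(loss)/dv
    du = dv       # d(loss)/du, since v = u + b means dv/du = 1
    db = dv       # d(loss)/db, since dv/db = 1
    dw = du * x   # d(loss)/dw, since u = w * x means du/dw = x
    return loss, dw, db

print(forward_backward(2.0, 1.0, 3.0))  # loss = 49.0, dw = 42.0, db = 14.0
```

Note that `dv` is computed once and shared by both parameter derivatives; this reuse of intermediate partials is exactly why one backward pass is cheaper than differentiating with respect to each weight separately.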
Partial derivatives also arise in probability theory, where joint densities are integrated or differentiated with respect to individual variables, and in optimisation theory, where Karush–Kuhn–Tucker (KKT) conditions use partial derivatives to characterise constrained optima. Mastery of partial derivatives is a prerequisite to understanding every deep learning framework's autograd system, every variational inference derivation, and every theoretical analysis of gradient-based learning.
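As a small illustration of the KKT stationarity condition mentioned above (the problem and all names are hypothetical, chosen only for this sketch): for minimising $f(x, y) = x^2 + y^2$ subject to $x + y = 1$, the optimum is $(0.5, 0.5)$ with multiplier $\lambda = -1$ for the Lagrangian $L = f + \lambda g$, and stationarity requires each partial derivative of $L$ to vanish there:

```python
# Constraint written as g(x, y) = x + y - 1 = 0.
def df_dx(x, y): return 2 * x   # partial of f with respect to x
def df_dy(x, y): return 2 * y   # partial of f with respect to y
def dg_dx(x, y): return 1.0     # partial of g with respect to x
def dg_dy(x, y): return 1.0     # partial of g with respect to y

x_opt, y_opt, lam = 0.5, 0.5, -1.0
stationarity_x = df_dx(x_opt, y_opt) + lam * dg_dx(x_opt, y_opt)
stationarity_y = df_dy(x_opt, y_opt) + lam * dg_dy(x_opt, y_opt)
print(stationarity_x, stationarity_y)  # both 0.0: the KKT stationarity holds
```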
Related terms: Gradient, Derivative, Backpropagation
Discussed in:
- Chapter 3: Calculus — Gradients
Also defined in: Textbook of AI