The derivative of a function $f$ at a point $x$ measures the instantaneous rate at which $f(x)$ changes as $x$ varies. Formally, it is the limit $f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$. Geometrically, the derivative is the slope of the tangent line to the graph of $f$ at the point $(x, f(x))$. A positive derivative indicates the function is increasing; a negative derivative, decreasing; a derivative of zero marks a stationary point, which may be a local extremum or a horizontal inflection point (as at $x = 0$ for $f(x) = x^3$).
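The limit definition above can be sketched numerically: a minimal example, using a small finite step $h$ (the step size and test function here are illustrative choices, not from the text).

```python
# Approximate f'(x) from the limit definition using a central
# difference, which is more accurate than the one-sided quotient.
def numerical_derivative(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

# Example: f(x) = x**2 has exact derivative f'(x) = 2x, so f'(3) = 6.
approx = numerical_derivative(lambda x: x ** 2, 3.0)
print(round(approx, 4))  # close to the exact value 6
```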
The standard rules of differentiation—power, product, quotient, and chain rules—allow complex expressions to be differentiated by combining simpler building blocks. Two derivatives of particular importance in machine learning are $\frac{d}{dx}e^x = e^x$ and $\frac{d}{dx}\ln x = \frac{1}{x}$, which appear in the softmax function, cross-entropy losses, and log-likelihoods. The sigmoid function has the elegant property $\sigma'(x) = \sigma(x)(1 - \sigma(x))$, making backpropagation through it especially efficient.
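The sigmoid identity $\sigma'(x) = \sigma(x)(1 - \sigma(x))$ can be checked directly against a finite-difference approximation; a small sketch (the test point and tolerance are arbitrary choices):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    # The identity from the text: sigma'(x) = sigma(x) * (1 - sigma(x)).
    s = sigmoid(x)
    return s * (1.0 - s)

# Compare the closed form with a central-difference estimate at x = 0.5.
h = 1e-6
numeric = (sigmoid(0.5 + h) - sigmoid(0.5 - h)) / (2 * h)
print(abs(sigmoid_prime(0.5) - numeric) < 1e-8)  # the two agree closely
```

This is why backpropagation through a sigmoid is cheap: the derivative is computed from the already-stored forward activation, with no extra evaluation of $e^x$.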
Derivatives are the foundation of optimisation-based machine learning. The gradient of a loss function tells us how the loss would change if we perturbed a parameter slightly; by stepping in the direction of the negative gradient, we reduce the loss. Second derivatives (captured by the Hessian matrix in higher dimensions) encode curvature and are used by Newton's method and other second-order optimisers. The concept of the derivative, in short, is the mathematical basis on which every neural network in the world is trained.
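The negative-gradient step described above can be illustrated on a toy one-dimensional loss; the learning rate, iteration count, and loss function below are illustrative assumptions, not part of the text.

```python
# One-parameter gradient descent on the toy loss L(w) = (w - 4)**2,
# whose derivative is L'(w) = 2 * (w - 4). The minimiser is w = 4.
def grad(w):
    return 2.0 * (w - 4.0)

w = 0.0       # initial parameter
lr = 0.1      # learning rate (step size)
for _ in range(100):
    w -= lr * grad(w)  # step in the direction of the negative gradient

print(round(w, 3))  # converges toward the minimiser, 4.0
```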
Related terms: Gradient, Chain Rule, Gradient Descent, Partial Derivative
Discussed in:
- Chapter 3: Calculus — Derivatives
Also defined in: Textbook of AI