The derivative of a function $f$ at a point $x$ measures the instantaneous rate at which $f(x)$ changes as $x$ varies. Formally, it is the limit $f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$. Geometrically, the derivative is the slope of the tangent line to the graph of $f$ at the point $(x, f(x))$. A positive derivative indicates the function is increasing; a negative derivative, decreasing; a derivative of zero marks a stationary point, which may be a local extremum or a horizontal inflection point (as at $x = 0$ for $f(x) = x^3$).
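The limit definition above can be sketched numerically: a minimal example, using a small finite step $h$ (the step size and test function here are illustrative choices, not from the text).

```python
# Approximate f'(x) from the limit definition using a central
# difference, which is more accurate than the one-sided quotient.
def numerical_derivative(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

# Example: f(x) = x**2 has exact derivative f'(x) = 2x, so f'(3) = 6.
approx = numerical_derivative(lambda x: x ** 2, 3.0)
print(round(approx, 4))  # close to the exact value 6
```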
The standard rules of differentiation—power, product, quotient, and chain rules—allow complex expressions to be differentiated by combining simpler building blocks. Two derivatives of particular importance in machine learning are $\frac{d}{dx}e^x = e^x$ and $\frac{d}{dx}\ln x = \frac{1}{x}$, which appear in the softmax function, cross-entropy losses, and log-likelihoods. The sigmoid function has the elegant property $\sigma'(x) = \sigma(x)(1 - \sigma(x))$, making backpropagation through it especially efficient.
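The sigmoid identity $\sigma'(x) = \sigma(x)(1 - \sigma(x))$ can be checked directly against a finite-difference approximation; a small sketch (the test point and tolerance are arbitrary choices):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    # The identity from the text: sigma'(x) = sigma(x) * (1 - sigma(x)).
    s = sigmoid(x)
    return s * (1.0 - s)

# Compare the closed form with a central-difference estimate at x = 0.5.
h = 1e-6
numeric = (sigmoid(0.5 + h) - sigmoid(0.5 - h)) / (2 * h)
print(abs(sigmoid_prime(0.5) - numeric) < 1e-8)  # the two agree closely
```

This is why backpropagation through a sigmoid is cheap: the derivative is computed from the already-stored forward activation, with no extra evaluation of $e^x$.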
Derivatives are the foundation of optimisation-based machine learning. The gradient of a loss function tells us how the loss would change if we perturbed a parameter slightly; by stepping in the direction of the negative gradient, we reduce the loss. Second derivatives (captured by the Hessian matrix in higher dimensions) encode curvature and are used by Newton's method and other second-order optimisers. The concept of the derivative, in short, is the mathematical basis on which every neural network in the world is trained.
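The negative-gradient step described above can be illustrated on a toy one-dimensional loss; the learning rate, iteration count, and loss function below are illustrative assumptions, not part of the text.

```python
# One-parameter gradient descent on the toy loss L(w) = (w - 4)**2,
# whose derivative is L'(w) = 2 * (w - 4). The minimiser is w = 4.
def grad(w):
    return 2.0 * (w - 4.0)

w = 0.0       # initial parameter
lr = 0.1      # learning rate (step size)
for _ in range(100):
    w -= lr * grad(w)  # step in the direction of the negative gradient

print(round(w, 3))  # converges toward the minimiser, 4.0
```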
Related terms: Gradient, Chain Rule, Gradient Descent, Partial Derivative
Discussed in:
- Chapter 3: Calculus — Derivatives
Also defined in: Textbook of AI