Integral, Glossary, Textbook of AI

The integral of a function $f$ over an interval $[a, b]$, written

$$\int_a^b f(x)\,dx,$$

is the signed area between the graph of $f$ and the $x$-axis from $a$ to $b$, with regions below the axis counted as negative. Formally, the Riemann integral is defined as the limit of sums of rectangle areas $\sum_i f(x_i)\,\Delta x$ as the partition becomes arbitrarily fine; the Lebesgue integral extends this definition to a much wider class of functions and underlies modern probability theory. The Fundamental Theorem of Calculus, due to Newton and Leibniz in the late seventeenth century, connects integration and differentiation: if $f$ is continuous on $[a, b]$ and $F'(x) = f(x)$, then

$$\int_a^b f(x)\,dx = F(b) - F(a).$$

Integrals and derivatives are, in a precise sense, inverse operations.
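As a small numerical illustration of the two statements above, the sketch below approximates $\int_0^3 x^2\,dx$ with a midpoint Riemann sum and compares it with $F(3) - F(0)$ for the antiderivative $F(x) = x^3/3$. The function names (`riemann_sum`, `f`, `F`) and the choice of integrand are assumptions made for this example only.

```python
def riemann_sum(f, a, b, n):
    """Midpoint Riemann sum of f over [a, b] using n equal-width rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


def f(x):
    # Example integrand, chosen only for illustration.
    return x ** 2


def F(x):
    # An antiderivative of f, i.e. F'(x) = x**2.
    return x ** 3 / 3


a, b = 0.0, 3.0
exact = F(b) - F(a)  # value given by the Fundamental Theorem of Calculus

# The approximation error shrinks as the partition becomes finer.
for n in (10, 100, 1000):
    approx = riemann_sum(f, a, b, n)
    print(f"n={n:5d}  sum={approx:.8f}  |error|={abs(approx - exact):.2e}")
```

Because the area is signed, the same routine returns a value close to zero for an odd function over a symmetric interval, for example `riemann_sum(lambda x: x, -1.0, 1.0, 100)`.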

In artificial intelligence, integrals arise primarily through probability theory. A continuous probability distribution is defined by a probability density function $p(x)$ satisfying $\int_{-\infty}^{\infty} p(x)\,dx = 1$. The expected value of a random variable is $\mathbb{E}[X] = \int x\, p(x)\,dx$, and the variance is $\mathrm{Var}(X) = \int (x - \mathbb{E}[X])^2 p(x)\,dx$. Marginalising out a latent variable in a probabilistic model requires integration: $p(x) = \int p(x, z)\,dz$. Information-theoretic quantities such as the differential entropy $H(X) = -\int p(x)\ln p(x)\,dx$ and the Kullback–Leibler divergence $D_\mathrm{KL}(p \,\|\, q) = \int p(x) \ln\!\frac{p(x)}{q(x)}\,dx$ are likewise integrals. The expected loss of a model over a data distribution, the risk that supervised learning seeks to minimise, is itself an integral, $R(\theta) = \int \ell(\theta; x, y)\, p(x, y)\,dx\,dy$, approximated in practice by an empirical average over the training data.
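The quantities above can be checked numerically in one dimension. The following sketch evaluates the normalisation, mean, variance, differential entropy and KL divergence integrals for a Gaussian density by simple midpoint quadrature and compares the last two with their known closed forms. The specific densities ($p = \mathcal{N}(1, 2^2)$, $q = \mathcal{N}(0, 1)$), the truncation of the infinite range, and all function names are assumptions chosen for illustration.

```python
import math


def integrate(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def p(x):
    return normal_pdf(x, 1.0, 2.0)  # density under study: N(1, 2^2)


def q(x):
    return normal_pdf(x, 0.0, 1.0)  # reference density for the KL divergence: N(0, 1)


# The integrals run over the whole real line; truncating to [-19, 21]
# (ten standard deviations of p either side of its mean) drops a negligible tail.
LO, HI = -19.0, 21.0

total = integrate(p, LO, HI)
mean = integrate(lambda x: x * p(x), LO, HI)
variance = integrate(lambda x: (x - mean) ** 2 * p(x), LO, HI)
entropy = -integrate(lambda x: p(x) * math.log(p(x)), LO, HI)
kl = integrate(lambda x: p(x) * math.log(p(x) / q(x)), LO, HI)

# Closed forms for Gaussians, used here only as a cross-check.
entropy_exact = 0.5 * math.log(2.0 * math.pi * math.e * 2.0 ** 2)
kl_exact = math.log(1.0 / 2.0) + (2.0 ** 2 + (1.0 - 0.0) ** 2) / (2.0 * 1.0 ** 2) - 0.5

print(f"normalisation  {total:.6f}")
print(f"mean           {mean:.6f}")
print(f"variance       {variance:.6f}")
print(f"entropy        {entropy:.6f}  (closed form {entropy_exact:.6f})")
print(f"KL(p || q)     {kl:.6f}  (closed form {kl_exact:.6f})")
```

Grid-based quadrature of this kind is practical only in very low dimension, since the number of grid points grows exponentially with the number of variables.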

Many integrals important in machine learning cannot be computed in closed form. The posterior of a Bayesian neural network, the marginal likelihood that the evidence lower bound (ELBO) of a variational autoencoder stands in for, the partition function $Z = \int e^{-E(x)}\,dx$ of an energy-based model, and the marginal likelihood $\int p(D \mid \theta) p(\theta)\,d\theta$ used in Bayesian model comparison are all intractable in general. Two broad families of techniques approximate intractable integrals. Monte Carlo methods draw random samples $x^{(1)}, \ldots, x^{(N)}$ from a tractable distribution and approximate the integral by an empirical average; their error scales as $1/\sqrt{N}$ regardless of the dimension of $x$ (although the constant depends on the variance of the integrand), which sidesteps the curse of dimensionality that cripples deterministic numerical quadrature in high dimensions. Variational methods replace the intractable target distribution with a tractable approximation and turn the integration problem into an optimisation problem. Modern probabilistic deep learning (diffusion models, normalising flows, neural ODEs) depends on a sophisticated toolkit for handling integrals that would have astonished the inventors of the calculus.
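A minimal sketch of the Monte Carlo idea, under assumptions chosen for this example: it estimates $\mathbb{E}[\cos X] = \int \cos(x)\,p(x)\,dx$ for $X \sim \mathcal{N}(0, 1)$, whose exact value is $e^{-1/2}$, by averaging over random draws. The helper `monte_carlo_mean` and the choice of integrand are hypothetical, and the reported standard error is itself an estimate computed from the same samples.

```python
import math
import random


def monte_carlo_mean(g, sampler, n, rng):
    """Estimate E[g(X)] by averaging g over n draws from sampler.

    Returns the estimate together with its estimated standard error.
    """
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        value = g(sampler(rng))
        total += value
        total_sq += value * value
    mean = total / n
    variance = max(total_sq / n - mean * mean, 0.0)  # sample variance of g(X)
    return mean, math.sqrt(variance / n)


def draw_standard_normal(rng):
    return rng.gauss(0.0, 1.0)


rng = random.Random(0)  # fixed seed so that runs are repeatable
exact = math.exp(-0.5)  # E[cos X] for X ~ N(0, 1)

# The standard error should fall roughly tenfold for each hundredfold increase in N.
for n in (100, 10_000, 100_000):
    estimate, std_err = monte_carlo_mean(math.cos, draw_standard_normal, n, rng)
    print(f"N={n:7d}  estimate={estimate:.5f}  |error|={abs(estimate - exact):.5f}  std_err={std_err:.5f}")
```

The same averaging loop applies unchanged when each draw is a high-dimensional vector, which is why the $1/\sqrt{N}$ rate does not depend on the dimension of $x$; only the variance of the integrand, and hence the constant in front of the rate, changes.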

Discussed in:

Chapter 6: ML Fundamentals, Mathematical Foundations
