Random Variable, Glossary, Textbook of AI

A random variable is a numerical function that assigns a value to each outcome of a random experiment. Formally, given a probability space $(\Omega, \mathcal{F}, P)$, consisting of a sample space $\Omega$, a $\sigma$-algebra of events $\mathcal{F}$ and a probability measure $P$, a random variable $X$ is a measurable function $X: \Omega \to \mathbb{R}$ (or more generally $\mathbb{R}^n$). Measurability guarantees that sets such as $\{\omega \in \Omega : X(\omega) \leq x\}$ are events in $\mathcal{F}$, so that probabilities like $P(X \leq x)$ are defined. Despite the name, a random variable is neither random nor a variable in the algebraic sense; it is a deterministic function whose argument, the outcome $\omega$, is not known in advance.
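To make the "deterministic function on $\Omega$" view concrete, here is a minimal sketch for a finite sample space (two fair coin flips). It assumes the power set of $\Omega$ as the $\sigma$-algebra, so every function is automatically measurable; the helper names `X` and `distribution` are hypothetical, chosen for illustration. The distribution of $X$ is obtained by pushing $P$ forward through $X$, i.e. $P(X = x) = P(\{\omega : X(\omega) = x\})$.

```python
from fractions import Fraction
from itertools import product

# Finite sample space: every outcome of two fair coin flips.
omega = list(product("HT", repeat=2))       # ('H','H'), ('H','T'), ('T','H'), ('T','T')
P = {w: Fraction(1, 4) for w in omega}       # uniform probability measure on omega

def X(w):
    """Random variable: a deterministic function counting the heads in outcome w."""
    return sum(1 for flip in w if flip == "H")

def distribution(rv, omega, P):
    """Push P forward through rv: P(rv = x) = P({w in omega : rv(w) = x})."""
    dist = {}
    for w in omega:
        x = rv(w)
        dist[x] = dist.get(x, Fraction(0)) + P[w]
    return dist

print(distribution(X, omega, P))   # {2: Fraction(1, 4), 1: Fraction(1, 2), 0: Fraction(1, 4)}
```

Nothing in `X` is random; the randomness lives entirely in which $\omega$ is drawn according to $P$.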

Discrete and continuous

A random variable is discrete if it takes countably many values (e.g. the number of heads in ten coin flips, the number of cars passing a junction in an hour) and continuous if it takes values in an uncountable set, typically an interval, while assigning zero probability to every individual value (e.g. tomorrow's temperature, a patient's blood pressure).

A discrete random variable is described by its probability mass function (PMF)

$$p(x) = P(X = x), \quad \sum_x p(x) = 1.$$
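As an illustration of a PMF, the number of heads in ten fair coin flips is binomial with $n = 10$ and $p = 1/2$. The sketch below assumes independent flips; `binomial_pmf` is a hypothetical helper name, and `math.comb` is the standard-library binomial coefficient.

```python
from math import comb

def binomial_pmf(k, n=10, p=0.5):
    """P(X = k) for X = number of heads in n independent flips with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

pmf = {k: binomial_pmf(k) for k in range(11)}   # support is {0, 1, ..., 10}
total = sum(pmf.values())                        # should equal 1 for a valid PMF

print(pmf[5])   # 0.24609375  (= 252 / 1024)
print(total)    # 1.0
```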

A continuous random variable is described by its probability density function (PDF) $f(x)$ satisfying

$$P(a \leq X \leq b) = \int_a^b f(x)\, dx, \quad \int_{-\infty}^{\infty} f(x)\, dx = 1.$$
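For a continuous variable, probabilities are areas under the density. The sketch below assumes a standard normal $X$ and approximates the integral with the composite trapezoidal rule (a numerical stand-in chosen for illustration, not the only option). It checks $P(-1 \leq X \leq 1)$ against the closed form $\operatorname{erf}(1/\sqrt{2})$ and checks that the density integrates to approximately 1; `normal_pdf` and `integrate` are hypothetical helper names.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density f(x) of a Normal(mu, sigma^2) random variable."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def integrate(f, a, b, n=10_000):
    """Composite trapezoidal rule approximating the integral of f over [a, b]."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return s * h

p_within_1 = integrate(normal_pdf, -1.0, 1.0)   # P(-1 <= X <= 1)
exact = math.erf(1 / math.sqrt(2))              # closed form via the error function
total = integrate(normal_pdf, -10.0, 10.0)      # mass beyond +/-10 is negligible

print(round(p_within_1, 4), round(exact, 4), round(total, 6))   # 0.6827 0.6827 1.0
```

Because $P(X = a) = 0$ for a continuous variable, it makes no difference whether the interval endpoints are included.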

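The cumulative distribution function $F(x) = P(X \leq x)$ describes both cases in a single framework and is the canonical object for theoretical work: for discrete $X$ it is a step function, and for continuous $X$ with density $f$ it is $F(x) = \int_{-\infty}^{x} f(t)\, dt$. It is non-decreasing and right-continuous, with $F(x) \to 0$ as $x \to -\infty$ and $F(x) \to 1$ as $x \to +\infty$.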
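The following sketch shows one CDF of each kind, assuming a fair six-sided die for the discrete case and an exponential distribution with rate 1 for the continuous case (both chosen for illustration; the function names are hypothetical). The same rule $P(a < X \leq b) = F(b) - F(a)$ works for either.

```python
import math
from fractions import Fraction

def die_cdf(x):
    """F(x) = P(X <= x) for a fair six-sided die with values 1, ..., 6."""
    if x < 1:
        return Fraction(0)
    if x >= 6:
        return Fraction(1)
    return Fraction(math.floor(x), 6)   # step function: jumps of 1/6 at each integer

def exponential_cdf(x, rate=1.0):
    """F(x) = 1 - exp(-rate * x) for x >= 0, and 0 otherwise (a continuous example)."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def prob_interval(F, a, b):
    """P(a < X <= b) = F(b) - F(a), valid for any random variable with CDF F."""
    return F(b) - F(a)

print(die_cdf(2.999), die_cdf(3), die_cdf(3.5))        # 1/3 1/2 1/2
print(prob_interval(die_cdf, 2, 5))                     # 1/2
print(round(prob_interval(exponential_cdf, 0, 1), 4))   # 0.6321
```

The jump at $x = 3$ illustrates right-continuity: $F(3)$ equals the value just to the right of 3, not the value just to the left.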

Summary statistics

Expectation (mean): $\mathbb{E}[X] = \sum_x x\, p(x)$ (discrete) or $\int x\, f(x)\, dx$ (continuous), the centre of mass of the distribution.
Variance: $\mathrm{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2] = \mathbb{E}[X^2] - (\mathbb{E}[X])^2$, a measure of spread.
Standard deviation: $\sigma_X = \sqrt{\mathrm{Var}(X)}$, which has the same units as $X$.
Higher moments capture skewness (asymmetry, third standardised moment) and kurtosis (tail heaviness, fourth standardised moment).
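The summary statistics above can be computed directly from a PMF. This sketch assumes a fair die as the example distribution; `summary` is a hypothetical helper. It evaluates the variance both ways to show that the two formulas agree, and reports kurtosis as the fourth standardised moment (subtract 3 to get the "excess kurtosis" some libraries report instead).

```python
def summary(pmf):
    """Mean, variance (two ways), std dev, skewness and kurtosis of a discrete PMF."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())             # E[(X - E[X])^2]
    var_alt = sum(x * x * p for x, p in pmf.items()) - mean ** 2        # E[X^2] - (E[X])^2
    sd = var ** 0.5
    skew = sum(((x - mean) / sd) ** 3 * p for x, p in pmf.items())     # third standardised moment
    kurt = sum(((x - mean) / sd) ** 4 * p for x, p in pmf.items())     # fourth standardised moment
    return mean, var, var_alt, sd, skew, kurt

die = {x: 1 / 6 for x in range(1, 7)}
mean, var, var_alt, sd, skew, kurt = summary(die)
print(mean, var, var_alt, sd, skew, kurt)
```

For the fair die the exact values are $\mathbb{E}[X] = 7/2$, $\mathrm{Var}(X) = 35/12$, skewness $0$ (the distribution is symmetric) and kurtosis $2121/1225 \approx 1.73$; the printed floats match these up to rounding.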

Linearity of expectation, $\mathbb{E}[aX + bY] = a\mathbb{E}[X] + b\mathbb{E}[Y]$, holds whether or not $X$ and $Y$ are independent, and is a workhorse of probabilistic analysis.
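To see that independence is not required, the sketch below takes $X$ to be a fair die roll and $Y = X^2$, which is completely determined by $X$ (an example chosen for illustration). Exact arithmetic with `fractions.Fraction` confirms linearity, while the product rule $\mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y]$, which does need independence (or at least zero correlation), visibly fails.

```python
from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}

def expect(g, pmf):
    """E[g(X)] for a discrete X with the given PMF."""
    return sum(g(x) * p for x, p in pmf.items())

a, b = 2, 3
# Y = X**2 is a function of X, so X and Y are strongly dependent.
lhs = expect(lambda x: a * x + b * x**2, die)                           # E[aX + bY]
rhs = a * expect(lambda x: x, die) + b * expect(lambda x: x**2, die)    # a E[X] + b E[Y]
print(lhs, rhs, lhs == rhs)   # 105/2 105/2 True

# The product rule does NOT hold here, because X and Y are dependent.
e_xy = expect(lambda x: x * x**2, die)                         # E[XY] = E[X^3]
e_x_e_y = expect(lambda x: x, die) * expect(lambda x: x**2, die)
print(e_xy, e_x_e_y)          # 147/2 637/12
```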

History

The concept matured from informal use by Pascal, Fermat and the Bernoullis in the seventeenth and eighteenth centuries, was made rigorous by Andrey Kolmogorov's measure-theoretic axiomatisation of probability in 1933, and forms the bedrock of modern statistics.

Modern relevance

Random variables bridge abstract probability theory and the numerical quantities algorithms manipulate. In machine learning, feature vectors, labels, model parameters, loss values and predictions are all modelled as random variables or functions of them. The assumption that training examples are drawn independently from the same underlying distribution (independent and identically distributed, i.i.d.) is foundational to most of statistical learning theory: PAC bounds, generalisation error, the bias-variance decomposition and the law of large numbers all rest on this scaffolding. Probabilistic programming languages (Stan, Pyro, NumPyro, Turing) make random variables first-class objects of programming.
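The i.i.d. assumption and the law of large numbers can be illustrated with a short simulation: the mean of many independent draws from the same distribution settles near the expectation. This sketch assumes fair die rolls (true mean 3.5) and uses Python's standard `random.Random` with a fixed seed purely for reproducibility; `sample_mean` is a hypothetical helper, and the exact printed values depend on the seed.

```python
import random

def sample_mean(n, seed=0):
    """Mean of n i.i.d. rolls of a fair die; the true expectation is 3.5."""
    rng = random.Random(seed)   # seeded generator so runs are reproducible
    return sum(rng.randint(1, 6) for _ in range(n)) / n

for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))   # estimates typically move closer to 3.5 as n grows
```

The standard deviation of the sample mean shrinks like $\sigma_X / \sqrt{n}$, so with $n = 100{,}000$ rolls the estimate is typically within about $0.005$ of $3.5$.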

Discussed in:

Chapter 5: Statistics, Probability and Statistics

This site is currently in Beta. Please get in touch via chrispaton.org with any suggestions, questions or comments.