The variance of a random variable measures how spread out its distribution is around the mean. Formally, $\text{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2] = \mathbb{E}[X^2] - (\mathbb{E}[X])^2$. Unlike expectation, variance is not linear: $\text{Var}(aX + b) = a^2 \text{Var}(X)$, so shifts have no effect and scale factors enter squared. For independent (more generally, uncorrelated) random variables, variances add: $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$. The standard deviation $\sigma = \sqrt{\text{Var}(X)}$ shares the units of $X$ and is often preferred for interpretability.
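These identities can be checked numerically. The following sketch (an illustrative simulation; the specific distributions and constants are arbitrary choices, not from the text) verifies the two equivalent variance formulas, the $\text{Var}(aX + b) = a^2\text{Var}(X)$ rule, and additivity for independent variables:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=100_000)

# Two equivalent ways to compute the variance.
var_direct = np.mean((x - x.mean()) ** 2)      # E[(X - E[X])^2]
var_moments = np.mean(x ** 2) - x.mean() ** 2  # E[X^2] - (E[X])^2
assert np.isclose(var_direct, var_moments)

# Var(aX + b) = a^2 Var(X): the shift b drops out, the scale a enters squared.
a, b = 3.0, 10.0
var_affine = np.var(a * x + b)
assert np.isclose(var_affine, a ** 2 * var_direct)

# Independent variables: variances add (up to sampling noise).
y = rng.normal(loc=-1.0, scale=3.0, size=100_000)
assert np.isclose(np.var(x + y), np.var(x) + np.var(y), rtol=0.05)
```

The independence check uses a loose tolerance because the sample covariance of two finite independent samples is close to, but not exactly, zero.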
In machine learning, variance characterises uncertainty. A model whose predictions have high variance across different training sets is unstable and likely overfitting. The bias–variance decomposition splits a model's expected prediction error into three terms—irreducible noise, squared bias, and variance—providing a principled framework for understanding when increasing model complexity helps and when it hurts.
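The instability described above can be made concrete with a toy experiment (an illustrative setup invented here, not from the text): draw many training sets from a noisy sine curve, fit a low-degree and a high-degree polynomial to each, and compare how much each model's prediction at a fixed test point varies across training sets.

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(0.0, 1.0, 20)
x0 = 0.5  # fixed test input at which we measure prediction variance

def predictions(degree, n_trials=300):
    """Prediction at x0 from a degree-`degree` polynomial fit,
    repeated over n_trials freshly sampled training sets."""
    preds = []
    for _ in range(n_trials):
        y = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.3, x_train.size)
        coeffs = np.polyfit(x_train, y, degree)
        preds.append(np.polyval(coeffs, x0))
    return np.array(preds)

simple = predictions(1)     # underfits: high bias, low variance
complex_ = predictions(10)  # near-interpolates the noise: high variance

# The flexible model's predictions swing far more across training sets.
assert complex_.var() > simple.var()
```

This is the "variance" term of the decomposition in isolation: the spread of a model's prediction at one point across resampled training sets, before any bias or noise is accounted for.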
The covariance between two random variables, $\text{Cov}(X, Y) = \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])]$, generalises variance to the bivariate setting. Its normalised form, the Pearson correlation $\rho = \text{Cov}(X, Y) / (\sigma_X \sigma_Y)$, lies in $[-1, 1]$ and is scale-invariant. For a vector of random variables, the covariance matrix with entries $\Sigma_{ij} = \text{Cov}(X_i, X_j)$ is a symmetric positive semi-definite matrix that encodes all pairwise linear dependencies, and is central to PCA and Gaussian modelling.
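The covariance, correlation, and covariance-matrix properties above can likewise be checked by simulation (a minimal sketch with an arbitrary correlated pair; the coefficients are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
# A correlated pair: Y = 0.8 X + noise.
x = rng.normal(size=50_000)
y = 0.8 * x + rng.normal(scale=0.5, size=50_000)

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))  # Cov(X, Y)
rho = cov_xy / (x.std() * y.std())                 # Pearson correlation
assert -1.0 <= rho <= 1.0
assert np.isclose(rho, np.corrcoef(x, y)[0, 1])

# Scale invariance: rescaling X and Y leaves |rho| unchanged
# (a negative factor flips only the sign).
rho_scaled = np.corrcoef(3.0 * x, -2.0 * y)[0, 1]
assert np.isclose(abs(rho_scaled), abs(rho))

# The sample covariance matrix is symmetric positive semi-definite.
data = np.stack([x, y, rng.normal(size=50_000)])
sigma = np.cov(data)  # Sigma[i, j] = Cov(X_i, X_j)
assert np.allclose(sigma, sigma.T)
assert np.all(np.linalg.eigvalsh(sigma) >= -1e-10)
```

The eigenvalue check uses a small negative tolerance because floating-point round-off can push a zero eigenvalue slightly below zero.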
Related terms: Expectation, Bias-Variance Tradeoff
Discussed in:
- Chapter 4: Probability — Expectation & Variance
Also defined in: Textbook of AI, Textbook of Medical Statistics