Also known as: mean, expected value
The Expectation (also called the mean or expected value) of a random variable $X$ is its probability-weighted average. For a discrete variable, $\mathbb{E}[X] = \sum_x x\,p(x)$; for a continuous variable, $\mathbb{E}[X] = \int x\,f(x)\,dx$. Intuitively, the expectation is the "centre of mass" of the distribution: if one were to repeat an experiment infinitely many times and average the outcomes, the result would converge to $\mathbb{E}[X]$ by the law of large numbers.
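As a concrete sketch (a hypothetical Monte Carlo example, not from the text): for a fair six-sided die, the exact expectation is $\sum_x x\,p(x) = 3.5$, and the average of many simulated rolls converges to it.

```python
import random

random.seed(0)

# Exact expectation of a fair six-sided die: sum of x * p(x).
exact = sum(x * (1 / 6) for x in range(1, 7))  # 3.5

# Monte Carlo estimate: average many simulated rolls.
n = 100_000
samples = [random.randint(1, 6) for _ in range(n)]
estimate = sum(samples) / n
# By the law of large numbers, `estimate` converges to `exact` as n grows.
```

The die and the sample size here are arbitrary choices for illustration; any distribution with a finite mean behaves the same way.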
Expectation is a linear operator: $\mathbb{E}[aX + bY] = a\mathbb{E}[X] + b\mathbb{E}[Y]$, regardless of whether $X$ and $Y$ are independent. This linearity underpins countless results in statistics and machine learning, including the unbiasedness of the sample mean as an estimator of the population mean.
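Linearity can be checked numerically even when the variables are dependent. A minimal sketch, with hypothetical variables $Y = X + \text{noise}$ and arbitrarily chosen constants $a, b$:

```python
import random

random.seed(1)

n = 50_000
# X and Y are deliberately dependent: Y = X + independent noise.
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [x + random.gauss(0, 1) for x in xs]

a, b = 2.0, 3.0
lhs = sum(a * x + b * y for x, y in zip(xs, ys)) / n   # empirical E[aX + bY]
rhs = a * (sum(xs) / n) + b * (sum(ys) / n)            # a*E[X] + b*E[Y]
# The two empirical means agree (up to floating-point rounding),
# even though X and Y are not independent.
```

The agreement is exact rather than approximate because the sample average is itself a linear operator, mirroring the property of the true expectation.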
In machine learning, loss functions are almost always defined as expectations. The expected risk is $\mathbb{E}[L(Y, f(X))]$, and training a model amounts to minimising an empirical estimate of this quantity, the empirical risk $\frac{1}{n}\sum_{i=1}^{n} L(y_i, f(x_i))$. Stochastic gradient descent approximates the expected gradient by averaging over mini-batches; the law of large numbers ensures that with enough samples the estimate becomes accurate. The REINFORCE gradient estimator in reinforcement learning, the evidence lower bound in variational inference, and the entropy of a distribution are all expectations of various quantities.
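The mini-batch approximation can be sketched as follows. This is a hypothetical one-parameter example (model $f(x) = wx$, squared loss, synthetic data), not an implementation from the text:

```python
import random

random.seed(2)

# Hypothetical model f(x) = w * x with squared loss L(y, f(x)) = (y - w*x)^2,
# whose gradient in w is -2*x*(y - w*x). Data is synthetic: y = 3x + noise.
xs = [random.gauss(0, 1) for _ in range(10_000)]
data = [(x, 3.0 * x + random.gauss(0, 0.1)) for x in xs]

def avg_grad(w, batch):
    # Sample average of the per-example gradient: an empirical
    # estimate of the expected gradient E[-2 X (Y - w X)].
    return sum(-2 * x * (y - w * x) for x, y in batch) / len(batch)

w = 0.0
full_grad = avg_grad(w, data)                       # average over the full dataset
mini_grad = avg_grad(w, random.sample(data, 1024))  # mini-batch estimate, as in SGD
# The law of large numbers makes `mini_grad` concentrate
# around `full_grad` as the batch size grows.
```

SGD trades the cost of the full average for the variance of the mini-batch estimate; both are estimates of the same expected gradient.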
Related terms: Variance, Random Variable
Discussed in:
- Chapter 4: Probability — Expectation & Variance
Also defined in: Textbook of AI