The Bias–Variance Tradeoff is one of the most important conceptual frameworks in statistical learning theory. The expected prediction error of a model decomposes into three terms: irreducible noise, squared bias, and variance. Bias measures how far the average prediction (over many training sets) is from the truth—it reflects the model's systematic error. Variance measures how much predictions vary across different training sets—it reflects the model's sensitivity to the particular data it was trained on.
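The decomposition can be checked empirically by Monte Carlo: refit a model on many fresh training sets and compare the error on new noisy targets against the sum of the three terms. A minimal sketch, assuming a true function f(x) = sin(3x), Gaussian noise, and cubic polynomial fits (all choices illustrative, not from the source):

```python
import numpy as np

# Monte Carlo check of the decomposition
#   E[(y - f_hat(x))^2] = sigma^2 + Bias^2[f_hat(x)] + Var[f_hat(x)]
# Assumed setup: truth f(x) = sin(3x), noise sd 0.3, cubic polynomial fits.
rng = np.random.default_rng(0)
f, sigma = lambda x: np.sin(3 * x), 0.3
x_test = np.linspace(-1.0, 1.0, 50)
n_train, n_sets = 30, 2000

preds = np.empty((n_sets, x_test.size))
for i in range(n_sets):
    x = rng.uniform(-1.0, 1.0, n_train)           # fresh training set
    y = f(x) + rng.normal(0.0, sigma, n_train)    # noisy targets
    preds[i] = np.polyval(np.polyfit(x, y, 3), x_test)

bias_sq = (preds.mean(axis=0) - f(x_test)) ** 2   # squared bias per test point
variance = preds.var(axis=0)                      # variance per test point

# Expected error on fresh noisy targets, estimated empirically
y_new = f(x_test) + rng.normal(0.0, sigma, preds.shape)
mse = ((y_new - preds) ** 2).mean(axis=0)

decomposed = sigma ** 2 + bias_sq + variance
print(mse.mean(), decomposed.mean())  # the two estimates agree closely
```

With enough resampled training sets, the measured error and the three-term sum match to within Monte Carlo noise at every test point.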
Simple models (e.g., linear regression with few features) tend to have high bias and low variance: they systematically miss patterns but produce stable predictions. Complex models (e.g., deep networks with millions of parameters) tend to have low bias and high variance: they can capture intricate patterns but overfit to noise. The optimal complexity balances these two sources of error, minimising their sum.
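The contrast between simple and complex models can be made concrete with the same resampling idea: estimate squared bias and variance for a low-degree and a high-degree polynomial fit. The truth f(x) = sin(3x) and the degrees 1 and 7 are illustrative assumptions, not from the source:

```python
import numpy as np

# Sketch: estimate average squared bias and variance for a simple (degree-1)
# and a complex (degree-7) polynomial fit to an assumed truth f(x) = sin(3x).
rng = np.random.default_rng(1)
f, sigma = lambda x: np.sin(3 * x), 0.3
x_test = np.linspace(-1.0, 1.0, 50)
n_train, n_sets = 30, 1000

def bias_var(degree):
    preds = np.empty((n_sets, x_test.size))
    for i in range(n_sets):
        x = rng.uniform(-1.0, 1.0, n_train)
        y = f(x) + rng.normal(0.0, sigma, n_train)
        preds[i] = np.polyval(np.polyfit(x, y, degree), x_test)
    bias_sq = ((preds.mean(axis=0) - f(x_test)) ** 2).mean()
    return bias_sq, preds.var(axis=0).mean()

b_simple, v_simple = bias_var(1)    # high bias, low variance
b_complex, v_complex = bias_var(7)  # low bias, high variance
```

A straight line cannot follow the curve of sin(3x) (large bias) but barely moves between training sets (small variance); the degree-7 fit tracks the curve closely but its predictions swing with each resampled dataset.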
Regularisation techniques—L2 penalties, L1 sparsity, dropout, data augmentation, early stopping—all introduce a small amount of bias to achieve a large reduction in variance. Ensemble methods such as bagging and boosting navigate the tradeoff differently: bagging primarily reduces variance by averaging, while boosting primarily reduces bias by sequentially fitting residuals. The classical U-shaped curve of test error versus complexity is complicated by the "double descent" phenomenon in heavily overparameterised deep networks, but the bias–variance decomposition remains the essential starting point for reasoning about generalisation.
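The bias-for-variance exchange made by an L2 penalty can be sketched directly: fit ridge regression and ordinary least squares on many training sets and compare the two terms. The truth f(x) = sin(3x), the degree-6 polynomial features, and the penalty strength are all assumptions chosen for illustration:

```python
import numpy as np

# Sketch: ridge regression (L2 penalty) versus ordinary least squares on
# degree-6 polynomial features. Assumed truth f(x) = sin(3x), penalty 1.0.
rng = np.random.default_rng(2)
f, sigma, degree = lambda x: np.sin(3 * x), 0.3, 6
x_test = np.linspace(-1.0, 1.0, 50)
X_test = np.vander(x_test, degree + 1)
n_train, n_sets = 30, 1000

def bias_var(penalty):
    preds = np.empty((n_sets, x_test.size))
    for i in range(n_sets):
        x = rng.uniform(-1.0, 1.0, n_train)
        y = f(x) + rng.normal(0.0, sigma, n_train)
        X = np.vander(x, degree + 1)
        # Ridge normal equations: (X'X + penalty * I) beta = X'y
        # penalty = 0 recovers ordinary least squares.
        beta = np.linalg.solve(X.T @ X + penalty * np.eye(degree + 1), X.T @ y)
        preds[i] = X_test @ beta
    bias_sq = ((preds.mean(axis=0) - f(x_test)) ** 2).mean()
    return bias_sq, preds.var(axis=0).mean()

b_ols, v_ols = bias_var(0.0)
b_ridge, v_ridge = bias_var(1.0)
```

Shrinking the coefficients pulls the average fit away from the truth (bias goes up) but stabilises it across training sets (variance goes down), which is exactly the exchange the penalty is designed to make.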
Related terms: Overfitting, Regularisation, Ensemble Methods
Discussed in:
- Chapter 5: Statistics — Bias–Variance Tradeoff
Also defined in: Textbook of AI, Textbook of Medical AI