Overfitting occurs when a model fits the training data so closely that it captures not only the genuine patterns but also the noise and idiosyncrasies. An overfit model achieves low training error but high test error—it fails to generalise to new data drawn from the same distribution. Overfitting is the central concern of statistical learning theory and the reason the field has developed such an elaborate toolkit of regularisation techniques.
Overfitting arises when a model's capacity exceeds what the data can support. A linear regression with many more features than examples will fit the training set exactly but make essentially arbitrary predictions on new points. A deep neural network with millions of parameters can, if trained without regularisation, achieve zero training error even on randomly labelled data—yet such a model is useless in practice. Statistical learning theory quantifies this via concepts like VC dimension and PAC bounds, which relate generalisation error to training error, model capacity, and sample size.
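The capacity argument can be made concrete with a minimal numpy sketch. Here the true signal is a quadratic observed with noise, and we compare a degree-2 fit against a degree-9 fit on ten training points; the dataset sizes, noise level, and random seed are illustrative assumptions, not taken from the text. A degree-9 polynomial through ten points can interpolate the noise exactly, so its training error collapses while its test error does not:

```python
import warnings
import numpy as np

warnings.simplefilter("ignore")  # high-degree polyfit may warn about ill-conditioning
rng = np.random.default_rng(0)

# Hypothetical setup: the true signal is quadratic, observed with Gaussian noise.
def true_fn(x):
    return x ** 2

x_tr = rng.uniform(-1, 1, 10)
y_tr = true_fn(x_tr) + rng.normal(0.0, 0.1, 10)
x_te = rng.uniform(-1, 1, 200)
y_te = true_fn(x_te) + rng.normal(0.0, 0.1, 200)

def poly_mse(degree):
    # Least-squares polynomial fit; degree 9 on 10 points has enough
    # capacity to interpolate the noise, driving training error to ~0.
    coeffs = np.polyfit(x_tr, y_tr, degree)
    return {"train": float(np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)),
            "test": float(np.mean((np.polyval(coeffs, x_te) - y_te) ** 2))}

results = {deg: poly_mse(deg) for deg in (2, 9)}
print(results)
```

The degree-2 model's training and test errors both sit near the noise floor, while the degree-9 model shows the signature train/test gap described above.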
Diagnosing overfitting requires a held-out validation set: if training error is low but validation error is high, the model is overfitting. Remedies include collecting more data, simplifying the model, adding regularisation (L1/L2, dropout, weight decay), data augmentation, early stopping, and ensembling. The opposite failure, underfitting, occurs when the model is too simple to capture the underlying patterns; both training and validation errors are high. Navigating between underfitting and overfitting is the central craft of applied machine learning.
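The diagnose-then-regularise workflow can be sketched with L2 regularisation (ridge regression), chosen here as one representative remedy from the list above. The sizes, seed, and λ grid are illustrative assumptions: 25 training examples against 20 features gives an unregularised linear model nearly enough capacity to memorise noise, and a held-out validation set is used both to expose the train/validation gap and to pick the regularisation strength:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: 20 features but only 25 training examples, so an
# unregularised least-squares fit largely memorises the training noise.
n_train, n_val, d = 25, 500, 20
w_true = rng.normal(size=d)

def make_data(n):
    X = rng.normal(size=(n, d))
    return X, X @ w_true + rng.normal(0.0, 1.0, n)

X_tr, y_tr = make_data(n_train)
X_va, y_va = make_data(n_val)

def ridge(X, y, lam):
    # Closed-form L2-regularised least squares:
    #   w = (X^T X + lam * I)^{-1} X^T y   (lam = 0 is ordinary least squares)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

# Sweep the regularisation strength; diagnose overfitting by the gap
# between training and validation error, and pick lam on the validation set.
errs = {lam: {"train": mse(ridge(X_tr, y_tr, lam), X_tr, y_tr),
              "val": mse(ridge(X_tr, y_tr, lam), X_va, y_va)}
        for lam in (0.0, 0.1, 1.0, 10.0)}
best_lam = min(errs, key=lambda lam: errs[lam]["val"])
print(errs, best_lam)
```

At λ = 0 the validation error far exceeds the training error, the diagnostic gap described above; a nonzero λ trades a little training error for substantially lower validation error.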
Related terms: Underfitting, Regularisation, Bias-Variance Tradeoff, Cross-Validation
Discussed in:
- Chapter 6: ML Fundamentals — The ML Framework
Also defined in: Textbook of AI, Textbook of Medical AI