Mikhail Belkin, Daniel Hsu, Siyuan Ma, & Soumik Mandal (2019). Reconciling modern machine-learning practice and the classical bias–variance trade-off.
Proceedings of the National Academy of Sciences.
DOI: https://doi.org/10.1073/pnas.1903070116
Abstract. Introduces and empirically demonstrates the double-descent phenomenon. As a model's capacity grows towards the interpolation threshold (the point at which it can first fit the training data exactly), test error rises, as the classical U-shaped bias–variance curve predicts; past the threshold, test error falls again, often to below the best level attainable in the classical regime. The authors show double descent across decision trees, random-feature models, and small neural networks, challenging the textbook bias–variance trade-off. Subsequent work generalised the picture to dataset size, training time, and many other axes of "complexity".
Tags: generalisation theory deep-learning