Curse of Dimensionality, Glossary, Textbook of AI

The Curse of Dimensionality refers to the exponential growth of difficulty as the number of dimensions increases, a phenomenon that affects machine learning in several distinct and counterintuitive ways. The term was coined by Richard Bellman and has become shorthand for the many ways high-dimensional spaces behave unlike our low-dimensional intuitions.

One manifestation is sparsity: the volume of a high-dimensional space grows exponentially, so a fixed amount of data becomes ever more sparse relative to the volume. With 10 samples per dimension in 1 dimension you have 10 samples; in 10 dimensions you would need 10 billion samples for the same density. This makes density estimation, nearest-neighbour search, and kernel methods increasingly difficult. Another manifestation is distance concentration: in high dimensions, pairwise Euclidean distances between points tend to converge, making all points approximately equidistant and degrading nearest-neighbour queries. A third is volume concentration: in a high-dimensional ball, almost all the volume is concentrated in a thin shell near the surface.

In machine learning, the curse affects classical non-parametric methods (KNN, kernel density estimation, decision trees) far more than parametric ones. Deep learning partially escapes the curse by learning low-dimensional representations of high-dimensional data, the manifold hypothesis holds that natural data lies on or near low-dimensional manifolds within high-dimensional spaces. Dimensionality reduction techniques (PCA, t-SNE, UMAP, autoencoders) explicitly exploit this. Monte Carlo integration famously escapes the curse too: its error scales as $1/\sqrt{N}$ regardless of dimension, while deterministic quadrature degrades exponentially.

Discussed in:

Chapter 7: Supervised Learning, K-Nearest Neighbours
Chapter 3: Calculus, Integrals

This site is currently in Beta. Please get in touch via chrispaton.org with any suggestions, questions or comments.