Also known as: VC-dimension, Vapnik–Chervonenkis dimension
The VC dimension (Vapnik–Chervonenkis dimension), introduced by Vapnik and Chervonenkis in the 1960s and 70s, measures the capacity of a hypothesis class: its richness or flexibility. The VC dimension of a class H is the largest cardinality d for which there exists a set S of d points such that, for every possible labelling of S with ±1, some hypothesis in H classifies S perfectly according to that labelling; H is then said to shatter S.
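The definition can be made concrete with a brute-force check. The sketch below is illustrative only (it is not from the source): it assumes the hypothesis class is approximated by a finite list of callables, tests whether every ±1 labelling of a point set is achievable, and uses 1-D threshold classifiers as a running example.

```python
from itertools import product

def shatters(hypotheses, points):
    """Brute-force check: does this (finite) set of hypotheses realise every
    possible ±1 labelling of `points`?  Each hypothesis is a callable x -> ±1."""
    achievable = {tuple(h(x) for x in points) for h in hypotheses}
    return all(labelling in achievable
               for labelling in product((-1, 1), repeat=len(points)))

# 1-D threshold classifiers sign(x - t): any single point can be shattered,
# but no pair of points can (the labelling (+1, -1) with x1 < x2 is
# unachievable), so this class has VC dimension 1.
thresholds = [lambda x, t=t: 1 if x >= t else -1 for t in (-2.0, -0.5, 0.5, 2.0)]
print(shatters(thresholds, [0.0]))       # True
print(shatters(thresholds, [0.0, 1.0]))  # False
```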
Examples: the class of half-planes in ℝ² has VC dimension 3 (any three points in general position can be shattered, but no set of four points can). The class of axis-aligned rectangles in ℝ² has VC dimension 4. The class of classifiers of the form sign(p(x)), with p a polynomial of degree d in one variable, has VC dimension d + 1. Linear classifiers in ℝⁿ have VC dimension n + 1.
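The half-plane example can be checked numerically. The sketch below is an illustrative verification, not part of the source: it decides linear separability of a labelled point set via a feasibility LP solved with scipy.optimize.linprog, and confirms that three points in general position are shattered while four points at the corners of a square are not.

```python
import numpy as np
from itertools import product
from scipy.optimize import linprog

def linearly_separable(X, y):
    """Feasibility LP: is there (w, b) with y_i (w . x_i + b) >= 1 for all i?
    Feasible exactly when the labelled points are strictly linearly separable."""
    n, d = X.shape
    # Variables: [w_1, ..., w_d, b].  Constraint: -y_i (x_i . w + b) <= -1.
    A_ub = -y[:, None] * np.hstack([X, np.ones((n, 1))])
    b_ub = -np.ones(n)
    res = linprog(c=np.zeros(d + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (d + 1))
    return res.success

def shattered_by_halfplanes(X):
    return all(linearly_separable(X, np.array(y))
               for y in product((-1, 1), repeat=len(X)))

triangle = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])              # general position
square   = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # XOR labelling fails
print(shattered_by_halfplanes(triangle))  # True  -> consistent with VC dimension 3
print(shattered_by_halfplanes(square))    # False -> no set of four points is shattered
```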
The VC bound uses the VC dimension to bound generalisation error: for a hypothesis drawn from a class of VC dimension d and trained on m i.i.d. examples with empirical risk R̂, the true risk R satisfies, with high probability, R ≤ R̂ + O(√((d log m) / m)). The bound formalises the trade-off between model capacity (a high d allows fitting more functions) and generalisation (a high d means more samples are needed before the empirical risk becomes a reliable estimate of the true risk).
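As a rough illustration of that trade-off, the snippet below evaluates one common textbook form of the complexity term; the exact constants and logarithmic factors vary between statements of the bound, so the numbers are indicative only.

```python
import math

def vc_penalty(d, m, delta=0.05):
    """Illustrative VC-style complexity penalty:
    sqrt((d * log(2m/d) + log(1/delta)) / m).
    Constants and log factors differ between textbook statements of the bound."""
    return math.sqrt((d * math.log(2 * m / d) + math.log(1 / delta)) / m)

# The penalty grows with capacity d and shrinks as the sample size m grows.
for d in (3, 10, 100):
    for m in (100, 10_000, 1_000_000):
        print(f"d={d:>3}  m={m:>9}  penalty ≈ {vc_penalty(d, m):.3f}")
```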
VC theory is one of the foundations of statistical learning theory and provides the theoretical justification for structural risk minimisation: choosing, from a nested sequence of hypothesis classes, the one with the smallest capacity that still fits the data well. The bounds are typically loose for modern overparameterised neural networks, which has led to active research on alternative complexity measures (Rademacher complexity, PAC-Bayes, neural tangent kernel norms, etc.).
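A minimal sketch of that selection rule, assuming a nested sequence of classes with known VC dimensions and a hypothetical `fit_and_score` callable that trains in a given class and returns its empirical risk:

```python
import math

def vc_penalty(d, m, delta=0.05):
    # Same illustrative penalty as above; not the tightest known constants.
    return math.sqrt((d * math.log(2 * m / d) + math.log(1 / delta)) / m)

def structural_risk_minimisation(nested_classes, fit_and_score, m):
    """Pick the class minimising empirical risk + capacity penalty.

    nested_classes: list of (vc_dim, class_spec) pairs, ordered by increasing capacity.
    fit_and_score:  hypothetical callable that fits a hypothesis from class_spec
                    on the m training examples and returns its empirical risk.
    """
    return min(nested_classes,
               key=lambda dc: fit_and_score(dc[1]) + vc_penalty(dc[0], m))[1]
```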
Related terms: Vladimir Vapnik, Statistical Learning Theory, Structural Risk Minimisation, Generalisation
Discussed in:
- Chapter 6: ML Fundamentals, Machine Learning Fundamentals