Glossary

Support Vector Machine

Also known as: SVM

A Support Vector Machine (SVM) seeks the hyperplane that separates two classes with the maximum margin—the largest possible distance between the hyperplane and the nearest data points from either class. These nearest points are the support vectors, and they alone determine the decision boundary. The max-margin principle is motivated by statistical learning theory: classifiers with larger margins enjoy tighter generalisation bounds.

Formally, the hard-margin SVM minimises $\frac{1}{2}\|\mathbf{w}\|^2$ subject to $y_i(\mathbf{w}^T \mathbf{x}_i + b) \geq 1$ for all $i$. When data are not perfectly separable, the soft-margin formulation introduces slack variables and a regularisation parameter $C$ controlling the tradeoff between margin width and training errors. The SVM objective can be solved efficiently as a quadratic program, and the Sequential Minimal Optimisation (SMO) algorithm dramatically accelerated SVM training in the late 1990s.
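The soft-margin objective can also be written as an unconstrained primal problem, $\frac{1}{2}\|\mathbf{w}\|^2 + C \sum_i \max(0, 1 - y_i(\mathbf{w}^T \mathbf{x}_i + b))$, which a simple subgradient method can minimise. The sketch below uses that route rather than the dual QP/SMO approach described above; the data, learning rate, and epoch count are illustrative choices, not part of any standard recipe.

```python
import numpy as np

def svm_subgradient(X, y, C=1.0, lr=0.001, epochs=500):
    """Soft-margin linear SVM via subgradient descent on the primal
    objective 0.5*||w||^2 + C * sum(max(0, 1 - y*(w.x + b)))."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1  # points inside the margin or misclassified
        # Subgradient: regularisation term plus hinge-loss term over violators
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Two well-separated Gaussian clusters (synthetic data for illustration)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2, 2], 0.5, (20, 2)),
               rng.normal([-2, -2], 0.5, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)

w, b = svm_subgradient(X, y, C=1.0)
acc = np.mean(np.sign(X @ w + b) == y)
```

After training, the points with margin $y_i(\mathbf{w}^T \mathbf{x}_i + b) \approx 1$ are the support vectors; increasing $C$ penalises margin violations more heavily, narrowing the margin to fit the training data.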

The power of SVMs is vastly amplified by the kernel trick. The dual formulation depends on data only through pairwise inner products $\mathbf{x}_i^T \mathbf{x}_j$. Replacing these with a kernel function $K(\mathbf{x}_i, \mathbf{x}_j)$ implicitly maps the data into a high-dimensional (possibly infinite-dimensional) feature space where linear separation may be possible. Popular kernels include the polynomial kernel and the RBF (Gaussian) kernel. SVMs dominated classification before deep learning and remain competitive for small-to-medium datasets, especially in high dimensions.
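The kernel substitution is not specific to SVMs: any dual algorithm that touches data only through inner products admits it. A minimal sketch, using a dual (kernel) perceptron rather than the SVM dual for brevity, shows an RBF kernel separating XOR data that no linear boundary in the input space can; the `gamma` value and epoch count are arbitrary choices for this toy problem.

```python
import numpy as np

def rbf(X1, X2, gamma=1.0):
    """RBF (Gaussian) kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def kernel_perceptron(K, y, epochs=100):
    """Dual perceptron: the data enter only through the kernel matrix K,
    the same substitution the SVM dual permits."""
    alpha = np.zeros(len(y))
    for _ in range(epochs):
        for i in range(len(y)):
            # Prediction for point i uses only kernel evaluations
            if np.sign((alpha * y) @ K[:, i]) != y[i]:
                alpha[i] += 1
    return alpha

# XOR: not linearly separable in the original 2-D input space
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1, 1, 1, -1])

K = rbf(X, X, gamma=2.0)
alpha = kernel_perceptron(K, y)
pred = np.sign((alpha * y) @ K)
```

The classifier never computes coordinates in the implicit feature space; every prediction is a weighted sum of kernel evaluations, which is exactly how a trained kernel SVM makes predictions from its support vectors.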

Related terms: Kernel Trick

Also defined in: Textbook of AI