Glossary

Kernel Trick

The Kernel Trick is a mathematical technique that enables algorithms to operate in high-dimensional (even infinite-dimensional) feature spaces without ever explicitly computing the coordinates of data in that space. Instead, one only needs to compute pairwise inner products, which a kernel function $K(\mathbf{x}, \mathbf{z}) = \langle \phi(\mathbf{x}), \phi(\mathbf{z}) \rangle$ can provide directly.
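As a minimal sketch of this identity (the vectors and the degree-2 homogeneous polynomial kernel are illustrative choices, not taken from the text), the kernel value $K(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^T\mathbf{z})^2$ matches the inner product of the explicitly mapped features $\phi(x_1, x_2) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$:

```python
import numpy as np

# Explicit feature map for the degree-2 homogeneous polynomial kernel
# on 2-D inputs: phi(x1, x2) = (x1^2, sqrt(2)*x1*x2, x2^2).
def phi(v):
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

# The kernel computes the same inner product without building phi.
def quadratic_kernel(x, z):
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])  # toy vectors for illustration
z = np.array([3.0, 0.5])

lhs = quadratic_kernel(x, z)         # inner product via the kernel
rhs = float(np.dot(phi(x), phi(z)))  # inner product after explicit mapping
print(lhs, rhs)  # the two values agree up to float rounding
```

Here the explicit map only triples the dimension, but for higher degrees or the RBF kernel the mapped space grows rapidly (or is infinite-dimensional), while the kernel evaluation stays cheap.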

The trick applies to any algorithm whose operations on data can be expressed entirely in terms of inner products. The classic example is kernel SVM, where the dual formulation depends on training data only through pairwise dot products. Replacing these dot products with a kernel function implicitly maps the data into a richer feature space where linear separation may be possible, without paying the computational cost of that mapping. Popular kernels include the polynomial kernel $K(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^T\mathbf{z} + c)^d$ and the radial basis function (RBF) kernel $K(\mathbf{x}, \mathbf{z}) = \exp(-\gamma \|\mathbf{x} - \mathbf{z}\|^2)$, which corresponds to an infinite-dimensional feature space.
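The same pattern can be shown with a simpler dual algorithm than the SVM: a kernelized perceptron, whose updates touch the data only through kernel evaluations. The sketch below (the XOR dataset, `gamma`, and the epoch count are illustrative assumptions, not from the text) uses an RBF kernel to separate points that no linear classifier in the input space can:

```python
import numpy as np

# RBF kernel: K(x, z) = exp(-gamma * ||x - z||^2).
def rbf_kernel(x, z, gamma=1.0):
    return np.exp(-gamma * np.sum((x - z) ** 2))

# Dual (kernelized) perceptron: one coefficient alpha_i per training
# point; both training and prediction use only kernel evaluations.
def kernel_perceptron(X, y, kernel, epochs=20):
    n = len(X)
    alpha = np.zeros(n)
    K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    for _ in range(epochs):
        for i in range(n):
            pred = 1 if np.dot(alpha * y, K[:, i]) > 0 else -1
            if pred != y[i]:
                alpha[i] += 1.0  # mistake-driven dual update
    return alpha

def predict(x, X, y, alpha, kernel):
    s = sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X))
    return 1 if s >= 0 else -1

# XOR: not linearly separable in the original 2-D space.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1, 1, 1, -1])
alpha = kernel_perceptron(X, y, rbf_kernel)
print([predict(x, X, y, alpha, rbf_kernel) for x in X])  # matches y
```

The kernel SVM follows the same template, replacing the mistake-driven update with a constrained optimization over the dual coefficients.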

The kernel trick also powers Kernel PCA (nonlinear dimensionality reduction), Gaussian Processes (Bayesian non-parametric models), and many other methods. It was central to the kernel methods era of the 1990s and 2000s, when support vector machines dominated classification. The trick illustrates a beautiful mathematical principle: by abstracting over the representation of data and working only with similarity, one can achieve nonlinear power while retaining the tractability of linear algebra. Deep learning has largely supplanted kernel methods in large-scale applications, but the kernel trick remains conceptually important and is finding renewed relevance in theoretical analyses of neural networks via the neural tangent kernel framework.

Related terms: Support Vector Machine, Principal Component Analysis

Also defined in: Textbook of AI