Also known as: k-NN, KNN
K-Nearest Neighbours (KNN) is perhaps the simplest conceivable supervised learning method. To classify a new point, find the $k$ training examples closest to it and let them vote; for regression, average their target values. KNN is a lazy learner or instance-based method—it performs no explicit training phase, instead storing the entire training set and deferring all computation to prediction time.
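The classification rule can be sketched in a few lines of plain Python. The function and variable names below (`knn_predict`, `train`) are illustrative, not from the source:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest
    training points under Euclidean distance. `train` is a list
    of (features, label) pairs."""
    # Lazy learning: no model is fit; all work happens here,
    # at prediction time, against the stored training set.
    neighbours = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Tiny 2-D example: two well-separated clusters, labels "a" and "b".
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_predict(train, (0.5, 0.5), k=3))  # prints "a"
```

For regression, the final vote would be replaced by a mean of the neighbours' target values.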
The choice of $k$ governs the bias-variance tradeoff. With $k = 1$, the algorithm assigns each query the label of its single nearest neighbour, producing an irregular decision boundary prone to overfitting. As $k$ increases, the boundary smooths and variance decreases, at the cost of increased bias. The optimal $k$ is typically chosen by cross-validation.
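Cross-validated selection of $k$ can be illustrated with leave-one-out error, a natural fit for KNN since no retraining is needed. This is a self-contained sketch with hypothetical names (`loo_error`, `data`), not a prescribed procedure:

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    neighbours = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    return Counter(lbl for _, lbl in neighbours).most_common(1)[0][0]

def loo_error(train, k):
    """Leave-one-out error: hold out each point in turn and
    predict it from the remaining n-1 points."""
    errors = 0
    for i, (x, y) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        if knn_predict(rest, x, k) != y:
            errors += 1
    return errors / len(train)

data = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((1, 1), "a"),
        ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"), ((6, 6), "b")]
# Pick the odd k (avoiding ties) with the lowest held-out error.
best_k = min(range(1, 6, 2), key=lambda k: loo_error(data, k))
```

Restricting $k$ to odd values is a common convenience in binary classification, since it rules out tied votes.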
KNN depends critically on the distance metric. Euclidean distance is the default, but features must be carefully normalised: a feature with a large numeric range (say, income in the thousands) will otherwise dominate the distance and drown out features on smaller scales. In high-dimensional spaces, all pairwise distances tend to concentrate around the same value—a manifestation of the curse of dimensionality—rendering nearest-neighbour queries meaningless. Spatial data structures (KD-trees, ball trees) accelerate search in low dimensions; approximate nearest-neighbour methods (LSH, HNSW, FAISS) sacrifice exactness for dramatic speed gains in high dimensions. KNN enjoys a beautiful asymptotic property: as $n \to \infty$, the 1-NN error rate is at most twice the Bayes error, the irreducible minimum for any classifier.
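The effect of feature scaling can be demonstrated concretely: below, the identity of the nearest neighbour flips once each feature is min-max scaled to $[0, 1]$. The helper names (`nearest`, `minmax_scale`) and the toy values are assumptions for illustration:

```python
import math

def nearest(points, query):
    """Index of the point closest to `query` in Euclidean distance."""
    return min(range(len(points)), key=lambda i: math.dist(points[i], query))

# Feature 0 ranges over thousands; feature 1 over single digits.
points = [(1000.0, 1.0), (2000.0, 9.0)]
query = (1400.0, 9.0)

raw_nn = nearest(points, query)  # the large-scale feature dominates

def minmax_scale(rows):
    """Rescale each feature (column) to [0, 1]."""
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [tuple((v - l) / (h - l) for v, l, h in zip(r, lo, hi))
            for r in rows]

scaled = minmax_scale(points + [query])
scaled_nn = nearest(scaled[:-1], scaled[-1])

print(raw_nn, scaled_nn)  # prints "0 1": scaling changes the answer
```

On the raw data the query is closer to point 0 purely because feature 0 swamps the distance; after scaling, the match on feature 1 wins and point 1 becomes the nearest neighbour.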
Related terms: Curse of Dimensionality
Discussed in:
- Chapter 7: Supervised Learning — K-Nearest Neighbours
Also defined in: Textbook of AI