K-Nearest Neighbours (KNN) is perhaps the simplest conceivable supervised-learning algorithm, and one of the oldest. To classify a new query point $\mathbf{x}^*$, find the $k$ training examples closest to it under a chosen distance metric and let them vote, the majority class wins; for regression, average their target values:
$$\hat{y}(\mathbf{x}^*) = \frac{1}{k} \sum_{i \in \mathcal{N}_k(\mathbf{x}^*)} y_i.$$
KNN is a lazy learner or instance-based method: it performs no explicit training phase, instead storing the entire training set and deferring all computation to prediction time. It is also a non-parametric method in the strict sense, the effective number of parameters grows with the data, and is a textbook example of memory-based learning. The technique was first analysed by Evelyn Fix and Joseph Hodges at the US Air Force School of Aviation Medicine in a 1951 technical report, and given a celebrated theoretical treatment by Thomas Cover and Peter Hart in 1967.
The choice of $k$ governs the classic bias–variance tradeoff. With $k = 1$ the algorithm assigns each query the label of its single nearest neighbour, producing an irregular, jagged decision boundary that perfectly fits the training data and is prone to overfitting. As $k$ increases the boundary smooths and variance decreases, at the cost of increased bias; in the limit $k = n$ every prediction is the global mean, the maximum-bias minimum-variance extreme. The optimal $k$ is typically chosen by cross-validation, often falling in the range 3–25 for tabular data.
KNN depends critically on the choice of distance metric. Euclidean distance $d(\mathbf{x}, \mathbf{x}') = \sqrt{\sum_j (x_j - x'_j)^2}$ is the default, but features must be carefully normalised because unscaled features with large numerical ranges dominate the calculation. Mahalanobis distance accounts for covariance structure; cosine similarity is preferred for high-dimensional sparse data such as text. In high-dimensional spaces all pairwise distances tend to converge to a similar value, a manifestation of the curse of dimensionality , rendering nearest-neighbour queries increasingly meaningless and motivating dimensionality reduction or learned embeddings as a preprocessing step.
Naive KNN search is $\mathcal{O}(nd)$ per query, prohibitive for large $n$. Spatial data structures such as KD-trees and ball trees accelerate exact search in low-to-moderate dimensions; approximate nearest-neighbour (ANN) methods including locality-sensitive hashing (LSH), HNSW graphs, and the FAISS and ScaNN libraries sacrifice exactness for dramatic speed gains in high dimensions and now underpin vector databases and the retrieval step of modern retrieval-augmented generation systems. KNN enjoys a useful asymptotic property proved by Cover and Hart: as $n \to \infty$ the 1-NN error rate is at most twice the Bayes error rate, the irreducible minimum for any classifier, a strong guarantee for so simple an algorithm.
Interactive
Video
Related terms: Supervised Learning, Curse of Dimensionality, Bias-Variance Tradeoff, Retrieval-Augmented Generation
Discussed in:
- Chapter 7: Supervised Learning, Classical Machine Learning