DBSCAN, Glossary, Textbook of AI

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) defines clusters as dense regions of feature space separated by regions of lower density. Unlike k-means and hierarchical clustering, it does not require specifying the number of clusters in advance, discovers clusters of arbitrary shape, and automatically identifies outliers as "noise" points that do not belong to any dense region.

The algorithm is governed by two parameters: $\varepsilon$, the radius of a point's neighbourhood, and minPts, the minimum number of points required for a region to be considered dense. A core point has at least minPts points within its $\varepsilon$-neighbourhood. A border point falls within the $\varepsilon$-neighbourhood of a core point but is not itself a core point. A noise point is neither. Clusters are formed by connecting core points whose $\varepsilon$-neighbourhoods overlap, transitively: two core points are in the same cluster if they are density-reachable through a chain of core points.

DBSCAN's principal strengths are its handling of non-convex clusters and its built-in noise detection. Its principal weakness is sensitivity to parameter choice, especially when clusters have varying densities, a single global $\varepsilon$ cannot capture both tight and loose clusters. OPTICS and HDBSCAN extend DBSCAN to handle varying densities and automate parameter selection, constructing a cluster hierarchy from which stable clusters can be extracted. These density-based methods are increasingly popular in modern data analysis pipelines.

Related terms: K-Means, Anomaly Detection

Discussed in:

Chapter 8: Unsupervised Learning, DBSCAN

This site is currently in Beta. Please get in touch via chrispaton.org with any suggestions, questions or comments.