Glossary

Hierarchical Clustering

Hierarchical Clustering produces a nested sequence of partitions rather than a single flat grouping, organising the data into a tree-like structure called a dendrogram. This dendrogram encodes clustering solutions at every number of clusters simultaneously: cutting the tree at different heights yields different partitions, allowing multi-scale exploration without committing to $K$ in advance.
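The "every number of clusters from one run" property can be made concrete with a minimal pure-Python sketch. It performs agglomerative single-linkage merging (defined in the next paragraph) and records the partition after each merge; the function name `single_linkage_partitions` and the sample points are illustrative, not from any particular library.

```python
from itertools import combinations

def euclid(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def single_linkage_partitions(points):
    """Agglomerative single-linkage clustering; returns a dict mapping
    each cluster count K to the partition (list of index lists) at that K."""
    clusters = [[i] for i in range(len(points))]
    partitions = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        # merge the pair of clusters with the smallest minimum pairwise distance
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(
                euclid(points[a], points[b])
                for a in clusters[ij[0]] for b in clusters[ij[1]]
            ),
        )
        clusters[i] += clusters[j]
        del clusters[j]
        partitions[len(clusters)] = [c[:] for c in clusters]
    return partitions

pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
parts = single_linkage_partitions(pts)
# one pass builds the whole hierarchy: parts[5], parts[4], ..., parts[1]
# are exactly the partitions obtained by cutting the dendrogram at
# successively greater heights
```

Choosing $K$ afterwards is just a matter of indexing into `partitions`, which is the computational payoff of not committing to $K$ in advance.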

Agglomerative (bottom-up) clustering starts with each point as its own cluster and iteratively merges the two closest clusters until a single cluster remains. The behaviour is governed by the linkage criterion: single linkage (minimum pairwise distance; can discover elongated clusters but is susceptible to chaining), complete linkage (maximum pairwise distance; produces compact clusters but is sensitive to outliers), average linkage (mean pairwise distance, a compromise between the two), and Ward's method (chooses the merge that minimises the increase in total within-cluster variance—often the best default). Divisive (top-down) clustering starts with one cluster and recursively splits; it is less common due to its higher computational cost.
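The four criteria differ only in how they score the distance between two candidate clusters. A small illustrative sketch (function names are my own, `A` and `B` are 1-D clusters, and `d` is any pairwise distance):

```python
def single(d, A, B):    # minimum pairwise distance
    return min(d(a, b) for a in A for b in B)

def complete(d, A, B):  # maximum pairwise distance
    return max(d(a, b) for a in A for b in B)

def average(d, A, B):   # mean pairwise distance
    return sum(d(a, b) for a in A for b in B) / (len(A) * len(B))

def ward_increase(A, B):
    # increase in total within-cluster variance (sum of squared errors)
    # caused by merging A and B: |A||B| / (|A|+|B|) * ||centroid_A - centroid_B||^2
    nA, nB = len(A), len(B)
    cA, cB = sum(A) / nA, sum(B) / nB
    return nA * nB / (nA + nB) * (cA - cB) ** 2

d = lambda a, b: abs(a - b)
A, B = [0.0, 1.0], [4.0, 9.0]
# single(d, A, B)  -> 3.0   (closest pair: 1.0 and 4.0)
# complete(d, A, B) -> 9.0  (farthest pair: 0.0 and 9.0)
# average(d, A, B)  -> 6.0  (mean of 4, 9, 3, 8)
```

Note the asymmetry: single, complete, and average linkage are defined purely on pairwise distances, while Ward's criterion needs the points themselves (or at least cluster centroids and sizes) to measure the variance increase.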

The computational cost is typically $O(n^2 \log n)$ with priority-queue implementations, and $O(n^2)$ space for the distance matrix. This limits hierarchical clustering to datasets of at most tens of thousands of points. It is particularly valuable in domains with natural hierarchy—phylogenetics, document organisation, social network community detection—and when the dendrogram itself provides useful insight beyond a flat partition.
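In practice the $O(n^2)$ space term bites before the time term. A quick back-of-the-envelope check (the helper name is illustrative), assuming the usual condensed distance matrix of $n(n-1)/2$ eight-byte floats:

```python
def condensed_distance_bytes(n, itemsize=8):
    """Memory for the condensed pairwise-distance matrix:
    n*(n-1)/2 distances at `itemsize` bytes each (8 for float64)."""
    return n * (n - 1) // 2 * itemsize

gib = condensed_distance_bytes(50_000) / 2**30
# roughly 9.3 GiB for 50,000 points -- already at the edge of commodity
# RAM, which is why "tens of thousands of points" is the practical ceiling
```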

Related terms: K-Means Clustering, DBSCAN

Also defined in: Textbook of AI