Glossary

Hierarchical Clustering

Hierarchical Clustering produces a nested sequence of partitions rather than a single flat grouping, organising the data into a tree-like structure called a dendrogram. This dendrogram encodes clustering solutions at every number of clusters simultaneously: cutting the tree at different heights yields different partitions, allowing multi-scale exploration without committing to $K$ in advance.
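The "every number of clusters from one run" property can be made concrete with a minimal pure-Python sketch. It performs agglomerative single-linkage merging (defined in the next paragraph) and records the partition after each merge; the function name `single_linkage_partitions` and the sample points are illustrative, not from any particular library.

```python
from itertools import combinations

def euclid(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def single_linkage_partitions(points):
    """Agglomerative single-linkage clustering; returns a dict mapping
    each cluster count K to the partition (list of index lists) at that K."""
    clusters = [[i] for i in range(len(points))]
    partitions = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        # merge the pair of clusters with the smallest minimum pairwise distance
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(
                euclid(points[a], points[b])
                for a in clusters[ij[0]] for b in clusters[ij[1]]
            ),
        )
        clusters[i] += clusters[j]
        del clusters[j]
        partitions[len(clusters)] = [c[:] for c in clusters]
    return partitions

pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
parts = single_linkage_partitions(pts)
# one pass builds the whole hierarchy: parts[5], parts[4], ..., parts[1]
# are exactly the partitions obtained by cutting the dendrogram at
# successively greater heights
```

Choosing $K$ afterwards is just a matter of indexing into `partitions`, which is the computational payoff of not committing to $K$ in advance.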

Agglomerative (bottom-up) clustering starts with each point as its own cluster and iteratively merges the two closest clusters until a single cluster remains. The behaviour is governed by the linkage criterion: single linkage (minimum pairwise distance; can discover elongated clusters but is susceptible to chaining), complete linkage (maximum pairwise distance; produces compact clusters but is sensitive to outliers), average linkage (mean pairwise distance, a compromise between the two), and Ward's method (chooses the merge that minimises the increase in total within-cluster variance—often the best default). Divisive (top-down) clustering starts with one cluster and recursively splits; it is less common due to its higher computational cost.
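The four criteria differ only in how they score the distance between two candidate clusters. A small illustrative sketch (function names are my own, `A` and `B` are 1-D clusters, and `d` is any pairwise distance):

```python
def single(d, A, B):    # minimum pairwise distance
    return min(d(a, b) for a in A for b in B)

def complete(d, A, B):  # maximum pairwise distance
    return max(d(a, b) for a in A for b in B)

def average(d, A, B):   # mean pairwise distance
    return sum(d(a, b) for a in A for b in B) / (len(A) * len(B))

def ward_increase(A, B):
    # increase in total within-cluster variance (sum of squared errors)
    # caused by merging A and B: |A||B| / (|A|+|B|) * ||centroid_A - centroid_B||^2
    nA, nB = len(A), len(B)
    cA, cB = sum(A) / nA, sum(B) / nB
    return nA * nB / (nA + nB) * (cA - cB) ** 2

d = lambda a, b: abs(a - b)
A, B = [0.0, 1.0], [4.0, 9.0]
# single(d, A, B)  -> 3.0   (closest pair: 1.0 and 4.0)
# complete(d, A, B) -> 9.0  (farthest pair: 0.0 and 9.0)
# average(d, A, B)  -> 6.0  (mean of 4, 9, 3, 8)
```

Note the asymmetry: single, complete, and average linkage are defined purely on pairwise distances, while Ward's criterion needs the points themselves (or at least cluster centroids and sizes) to measure the variance increase.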

The computational cost is typically $O(n^2 \log n)$ with priority-queue implementations, and $O(n^2)$ space for the distance matrix. This limits hierarchical clustering to datasets of at most tens of thousands of points. It is particularly valuable in domains with natural hierarchy—phylogenetics, document organisation, social network community detection—and when the dendrogram itself provides useful insight beyond a flat partition.
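In practice the $O(n^2)$ space term bites before the time term. A quick back-of-the-envelope check (the helper name is illustrative), assuming the usual condensed distance matrix of $n(n-1)/2$ eight-byte floats:

```python
def condensed_distance_bytes(n, itemsize=8):
    """Memory for the condensed pairwise-distance matrix:
    n*(n-1)/2 distances at `itemsize` bytes each (8 for float64)."""
    return n * (n - 1) // 2 * itemsize

gib = condensed_distance_bytes(50_000) / 2**30
# roughly 9.3 GiB for 50,000 points -- already at the edge of commodity
# RAM, which is why "tens of thousands of points" is the practical ceiling
```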

Related terms: K-Means Clustering, DBSCAN

Also defined in: Textbook of AI