Learning Objectives
- Formulate the unsupervised learning problem as density estimation, clustering, dimensionality reduction, or generative modelling, and identify which family a given task belongs to
- Derive Lloyd's algorithm for k-means as coordinate descent on the within-cluster sum of squares, prove its monotone convergence, and implement k-means++ initialisation
- Derive the EM algorithm for Gaussian mixture models, recognise it as coordinate ascent on the evidence lower bound, and implement it from scratch
- Choose between agglomerative linkage criteria (single, complete, average, Ward) on the basis of cluster shape and noise sensitivity
- Apply DBSCAN, HDBSCAN, and spectral clustering, choosing parameters from k-distance plots and the eigengap heuristic
- Derive PCA from the maximum-variance, minimum-reconstruction-error, and SVD viewpoints, and implement probabilistic PCA
- Use kernel PCA, t-SNE, UMAP, autoencoders, Isomap, locally linear embedding, and Laplacian eigenmaps for non-linear dimensionality reduction
- Derive latent Dirichlet allocation and its collapsed Gibbs sampler, and apply them to a document corpus
- Evaluate unsupervised methods using internal indices (silhouette, Davies-Bouldin, Calinski-Harabasz), external indices (mutual information, adjusted Rand index), and downstream-task performance
- Situate unsupervised learning relative to modern self-supervised learning: pretext tasks, contrastive losses, masked modelling
In this chapter
- 8.1 The unsupervised problem
- 8.2 Density estimation
- 8.3 K-means clustering
- 8.4 Hierarchical clustering
- 8.5 Gaussian mixture models and EM
- 8.6 Density-based clustering: DBSCAN and HDBSCAN
- 8.7 Spectral clustering
- 8.8 Principal component analysis
- 8.9 Kernel PCA
- 8.10 t-SNE
- 8.11 UMAP
- 8.12 Manifold learning: Isomap, LLE, Laplacian eigenmaps
- 8.13 Autoencoders
- 8.14 Topic modelling: latent Dirichlet allocation
- 8.15 Anomaly detection
- 8.16 Evaluation of unsupervised methods
- 8.17 The shift to self-supervised learning
- 8.18 Exercises
- 8.19 Further reading