Find the axis along which the data spreads most, then the next perpendicular axis, then the next.
From the chapter: Chapter 8: Unsupervised Learning
Glossary: principal component analysis, PCA math
Transcript
A cloud of points in two dimensions, elongated and tilted.
PCA asks: along which direction does the cloud spread most?
Center the data, then compute the covariance matrix. Find its eigenvectors. The first eigenvector points along the long axis of the cloud: the first principal component.
The second eigenvector is perpendicular to it: the second principal component.
The eigenvalues say how much variance lives along each axis. Big first eigenvalue, small second eigenvalue: most of the spread is along the first axis.
Project the data onto the first principal component. Two dimensions become one. The variance preserved is the first eigenvalue. The variance lost is the second.
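The 2-D procedure above can be sketched in a few lines of NumPy. Everything here (the synthetic tilted cloud, the sample count, the seed) is an illustrative assumption, not part of the transcript:

```python
import numpy as np

rng = np.random.default_rng(0)
# Elongated, tilted cloud: stretch along x, then rotate by 30 degrees.
theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = rng.normal(size=(500, 2)) * np.array([3.0, 0.5]) @ R.T

Xc = X - X.mean(axis=0)            # center the cloud first
C = np.cov(Xc, rowvar=False)       # 2x2 covariance matrix
evals, evecs = np.linalg.eigh(C)   # eigh returns eigenvalues ascending
order = np.argsort(evals)[::-1]    # sort descending: biggest spread first
evals, evecs = evals[order], evecs[:, order]

z = Xc @ evecs[:, 0]               # project onto the first principal component
# The variance of the projection equals the first eigenvalue;
# what was thrown away equals the second.
print(np.var(z, ddof=1), evals[0])
```

The `ddof=1` matches the default normalization of `np.cov`, so the two printed numbers agree to floating-point precision.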
In high dimensions the same procedure: find the top-k eigenvectors of the covariance matrix, project onto them, keep that much of the spread.
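The same procedure in higher dimensions, sketched with arbitrary sizes (200 samples, 50 features, k = 5 — all assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic correlated data: random mixing gives unequal spread per axis.
X = rng.normal(size=(200, 50)) @ rng.normal(size=(50, 50))

Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)
evals, evecs = np.linalg.eigh(C)
evals, evecs = evals[::-1], evecs[:, ::-1]   # descending order

k = 5
W = evecs[:, :k]          # top-k eigenvectors, shape (50, k)
Z = Xc @ W                # projected data, shape (200, k)

# Fraction of the total spread kept by the projection:
kept = evals[:k].sum() / evals.sum()
print(f"kept {kept:.1%} of the variance")
```

The ratio of the top-k eigenvalue sum to the total eigenvalue sum is exactly "that much of the spread" from the transcript.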
PCA is an orthogonal change of basis. It rotates the cloud so the new axes line up with its widest dimensions.
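Both claims in that sentence can be checked numerically. A small sketch on a synthetic 3-D cloud (sizes and seed are assumptions): the eigenvector matrix is orthogonal, and in the rotated basis the covariance is diagonal, i.e. the new axes line up with the spread.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 3))
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)
_, V = np.linalg.eigh(C)

# V^T V = I: the new axes are perpendicular unit vectors — a pure rotation.
print(np.allclose(V.T @ V, np.eye(3)))

# Covariance in the new basis is diagonal: no spread is shared between axes.
C_rot = np.cov(Xc @ V, rowvar=False)
print(np.allclose(C_rot, np.diag(np.diag(C_rot))))
```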
Used everywhere data are too high-dimensional to visualise: gene expression, images, embeddings. Used inside whitening, denoising, and as a sanity check for any other unsupervised method.
The core idea: principal components are eigenvectors of the data's covariance.