The SVD writes any matrix as a sum of rank-one terms; keep only the top singular values and you get the best low-rank approximation.
From the chapter: Chapter 2: Linear Algebra
Glossary: singular value decomposition, low rank approximation
Transcript
The singular value decomposition writes any matrix M as U times Sigma times V transposed.
U and V are orthogonal. Sigma is diagonal, with non-negative singular values, ordered largest first.
Each singular value scales a rank-one piece: a column of U times a row of V transposed. The matrix is a sum of these pieces, weighted by the singular values.
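The rank-one decomposition above can be checked numerically. A minimal sketch with NumPy, using a small random matrix as a stand-in example: each piece is a singular value times the outer product of the matching column of U and row of V transposed, and their sum recovers M exactly.

```python
import numpy as np

# A small example matrix (random, for illustration only).
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 3))

U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Rebuild M as a sum of rank-one pieces: sigma_i * u_i * v_i^T.
pieces = [s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s))]
reconstruction = sum(pieces)

assert np.allclose(reconstruction, M)  # the sum of rank-one pieces recovers M
```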
Big singular values capture most of the matrix's structure. Small ones capture noise or fine detail.
Truncate: keep only the top k singular values and set the rest to zero. By the Eckart–Young theorem, the result is the best rank-k approximation of M in the least-squares (Frobenius norm) sense.
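Truncation is a few lines of NumPy. This sketch (again on a random stand-in matrix) keeps the top k singular values and checks a known identity: the squared Frobenius error of the rank-k approximation equals the sum of the squared discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((8, 6))  # stand-in matrix for illustration
k = 2

U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Zero out all but the top-k singular values and recompose.
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

assert np.linalg.matrix_rank(M_k) == k

# Squared Frobenius error equals the sum of squared discarded singular values.
err = np.linalg.norm(M - M_k, "fro")
assert np.isclose(err**2, np.sum(s[k:] ** 2))
```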
For a typical image, keeping around fifty singular values captures most of what your eye sees. The same holds for a word embedding matrix, a recommendation matrix, or the activations inside a transformer.
Low-rank approximation is everywhere. PCA is SVD on a centred data matrix. LoRA fine-tunes large language models with rank-eight or rank-sixteen updates. Compression, denoising, and feature extraction all rely on the same principle: most of the signal lives in the top few singular vectors.
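The PCA connection is easy to verify. A sketch, assuming rows are samples and columns are features: center the data, take the SVD, and the squared singular values (scaled by n minus one) match the eigenvalues of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 5))  # rows are samples, columns are features

# PCA via SVD: center the data, then decompose.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Rows of Vt are the principal directions; singular values give the variances.
explained_variance = s**2 / (len(X) - 1)

# Cross-check against the eigenvalues of the sample covariance matrix.
cov_eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]
assert np.allclose(explained_variance, cov_eigvals)
```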