- Represent vectors algebraically and geometrically and compute vector norms, sums, and scalar multiples
- Use the dot product to measure similarity, projection, and angle between vectors
- Perform matrix operations (addition, multiplication, transpose, inverse) and interpret them as linear transformations
- Explain the meaning of eigenvalues and eigenvectors and their role in PCA, PageRank, and spectral methods
- Describe how embeddings map discrete objects (words, users, items) into continuous vector spaces where similarity becomes geometric
- Derive the singular value decomposition and use it for low-rank approximation, PCA, and least-squares solutions
- Compute matrix derivatives in numerator and denominator layout and apply them to standard machine-learning losses
- Recognise numerically unstable operations using condition numbers, and avoid them with QR- and SVD-based methods
A modern neural network is, at its core, a long pipeline of matrix multiplications interleaved with element-wise non-linearities. A search engine can rank pages by decomposing a term–document matrix into latent factors. A recommender embeds users and films as vectors and computes dot products to predict ratings. A diffusion model generates images by repeatedly applying a learned denoising network, itself a stack of matrix multiplications and non-linearities, under a noise schedule. Strip the marketing away from any deployed AI system and you find linear algebra at its base.
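To make two of these claims concrete, here is a minimal NumPy sketch. The layer sizes, the choice of ReLU as the non-linearity, and the three-dimensional user and film vectors are all illustrative assumptions, not values taken from any real system.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy two-layer network: matrix multiply, element-wise non-linearity,
# matrix multiply. Shapes are arbitrary and chosen only for illustration.
x = rng.standard_normal(4)         # input vector in R^4
W1 = rng.standard_normal((3, 4))   # first layer maps R^4 -> R^3
W2 = rng.standard_normal((2, 3))   # second layer maps R^3 -> R^2


def relu(z):
    """Element-wise non-linearity: max(z, 0) applied to each entry."""
    return np.maximum(z, 0.0)


h = relu(W1 @ x)   # hidden activations, shape (3,)
y = W2 @ h         # network output, shape (2,)

# A toy recommender: the predicted rating is the dot product of a user
# embedding and a film embedding (hypothetical hand-picked numbers).
user = np.array([0.9, 0.1, 0.4])
film = np.array([0.8, 0.3, 0.5])
predicted_rating = float(user @ film)  # 0.72 + 0.03 + 0.20
```

Everything here except `relu` is a matrix-vector or vector-vector product, which is the sense in which linear algebra sits at the base of such systems.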
This chapter is a working linear-algebra reference for the rest of the book. It assumes you have seen vectors, matrices, and determinants once before, perhaps in a first-year calculus or engineering course, and brings you up to the level required to read papers on attention, optimisation, and dimensionality reduction without losing the thread. We pay particular attention to the geometric picture of what a matrix does to a vector, because that intuition is what survives when notation gets dense in later chapters. We also place serious weight on numerical and computational realities. A theoretically correct algorithm that is unstable in float32 arithmetic is no use; a cleanly derived gradient that costs $O(n^4)$ is no use either. AI is the field where linear algebra meets very large numbers and very tight floating-point budgets.
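As a small illustration of the gap between exact algebra and float32 arithmetic, consider the identity $(a + b) - a = b$. The sketch below uses NumPy; the values $a = 10^8$ and $b = 1$ are chosen here purely to expose the effect.

```python
import numpy as np

# In exact arithmetic, (a + b) - a equals b for any a and b.
a32 = np.float32(1e8)
b32 = np.float32(1.0)

# Near 1e8, adjacent float32 values are 8 apart, so adding 1 is lost
# to rounding before the subtraction ever happens.
gap32 = np.spacing(a32)          # distance to the next float32 after 1e8
result32 = (a32 + b32) - a32     # evaluates to 0.0 in float32

# The same expression in float64 has enough precision to recover b.
a64 = np.float64(1e8)
b64 = np.float64(1.0)
result64 = (a64 + b64) - a64     # evaluates to 1.0 in float64

print(gap32, result32, result64)
```

The float32 result is wrong by 100% of the true answer, even though every individual operation was rounded correctly. Mixed-magnitude sums of this kind are one source of the instability mentioned above.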
A note on style. We use bold lower-case for vectors ($\mathbf{x}$, $\mathbf{w}$) and bold upper-case for matrices ($\mathbf{A}$, $\mathbf{W}$). All vectors are column vectors unless stated otherwise. Transposes are written $\mathbf{A}^\top$; inverses $\mathbf{A}^{-1}$; the identity matrix $\mathbf{I}_n$ when its size matters and just $\mathbf{I}$ otherwise. We use $\mathbb{R}^n$ for the real $n$-dimensional vector space and treat its elements interchangeably as ordered tuples, column matrices, and points in space.
In this chapter
- 2.1 Why linear algebra is the language of AI
- 2.2 Vectors, vector spaces, and norms
- 2.3 Matrices and matrix multiplication
- 2.4 Linear maps, rank, and the four subspaces
- 2.4a Matrix factorisations: LU, Cholesky, QR
- 2.5 Determinants and trace
- 2.6 Eigenvalues and eigenvectors
- 2.7 The singular value decomposition
- 2.8 Principal component analysis via SVD
- 2.9 Matrix calculus and vector derivatives
- 2.10 Tensors, broadcasting, and einsum
- 2.11 Numerical stability
- 2.12 Hardware reality
- 2.13 How this chapter connects forward
- 2.14 Summary
- 2.15 Exercises
- 2.16 Further reading