Chapter Two

Linear Algebra

Learning Objectives
  1. Represent vectors algebraically and geometrically and compute vector norms, sums, and scalar multiples
  2. Use the dot product to measure similarity, projection, and angle between vectors
  3. Perform matrix operations (addition, multiplication, transpose, inverse) and interpret them as linear transformations
  4. Explain the meaning of eigenvalues and eigenvectors and their role in PCA, PageRank, and spectral methods
  5. Describe how embeddings map discrete objects (words, users, items) into continuous vector spaces where similarity becomes geometric
  6. Derive the singular value decomposition and use it for low-rank approximation, PCA, and least-squares solutions
  7. Compute matrix derivatives in numerator and denominator layout and apply them to standard machine-learning losses
  8. Recognise and avoid numerically unstable operations using condition numbers, QR, and SVD

A modern neural network is, at its core, a long pipeline of matrix multiplications interleaved with element-wise non-linearities. A search engine can rank documents by decomposing a term–document matrix into latent factors. A recommender embeds users and films as vectors and computes dot products to predict ratings. A diffusion model generates images by repeatedly applying a learned denoising network, itself built largely from matrix multiplications, under a noise schedule. Strip the marketing away from any deployed AI system and you find linear algebra at its base.
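To make the first and third of these pictures concrete, here is a minimal sketch in NumPy. The layer sizes, the random weights, and the user and film vectors are all illustrative assumptions, not taken from any real system.

```python
import numpy as np

def relu(z):
    """Element-wise non-linearity: negative entries become zero."""
    return np.maximum(z, 0.0)

def forward(x, W1, b1, W2, b2):
    """Two-layer network: matrix multiply, non-linearity, matrix multiply."""
    h = relu(W1 @ x + b1)   # hidden activations
    return W2 @ h + b2      # output vector

# Hypothetical sizes: 4 inputs, 8 hidden units, 3 outputs.
rng = np.random.default_rng(0)
x = rng.normal(size=4)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)
y = forward(x, W1, b1, W2, b2)

# Recommender: a predicted rating is the dot product of two embeddings.
user = np.array([0.9, 0.1, 0.4])   # made-up user embedding
film = np.array([0.8, 0.0, 0.5])   # made-up film embedding
score = user @ film                # 0.9*0.8 + 0.1*0.0 + 0.4*0.5 = 0.92
```

Real networks stack many more layers and learn the weights from data, but each layer still has this shape: a matrix product followed by an element-wise function.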

This chapter is a working linear-algebra reference for the rest of the book. It assumes you have seen vectors, matrices, and determinants once before, perhaps in a first-year calculus or engineering course, and brings you up to the level required to read papers on attention, optimisation, and dimensionality reduction without losing the thread. We pay particular attention to the geometric picture, that is, what a matrix does to a vector, because that intuition is what survives when notation gets dense in later chapters. We also place serious weight on numerical and computational realities. A theoretically correct algorithm that is unstable on float32 hardware is no use; a cleanly derived gradient that costs $O(n^4)$ to evaluate is no use either. AI is the field where linear algebra meets very large numbers and very tight floating-point budgets.
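As a small illustration of what a tight floating-point budget means, the sketch below evaluates an expression that equals 1 in exact arithmetic, once in float32 and once in float64. The particular numbers are chosen for the example; the effect they show (a small term being absorbed by a large one) is a general property of finite-precision arithmetic.

```python
import numpy as np

# float32 carries a 24-bit significand, so near 1e8 adjacent
# representable values are 8 apart and adding 1.0 changes nothing.
big32, one32 = np.float32(1e8), np.float32(1.0)
lost = (big32 + one32) - big32   # exact answer is 1, float32 gives 0.0

# float64 carries a 53-bit significand and keeps the small term.
big64 = np.float64(1e8)
kept = (big64 + 1.0) - big64     # 1.0
```

An algorithm that relies on differences like this one can be correct on paper and still return nonsense in float32, which is why later sections care about how a quantity is computed and not only what it equals.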

A note on style. We use bold lower-case for vectors ($\mathbf{x}$, $\mathbf{w}$) and bold upper-case for matrices ($\mathbf{A}$, $\mathbf{W}$). All vectors are column vectors unless stated otherwise. Transposes are written $\mathbf{A}^\top$; inverses $\mathbf{A}^{-1}$; the identity matrix $\mathbf{I}_n$ when its size matters and just $\mathbf{I}$ otherwise. We use $\mathbb{R}^n$ for the real $n$-dimensional vector space and treat its elements interchangeably as ordered tuples, column matrices, and points in space.
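For readers who prefer to see notation as code, the following sketch shows one way these conventions might map onto NumPy. The specific matrix and vector are arbitrary examples, and storing a column vector as an array of shape `(n, 1)` is an assumption made here for clarity; much NumPy code uses one-dimensional arrays instead.

```python
import numpy as np

x = np.array([[1.0],
              [2.0]])            # column vector in R^2, shape (2, 1)
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])       # 2x2 matrix
I = np.eye(2)                    # identity matrix I_2

Ax = A @ x                       # matrix-vector product, still a column
A_T = A.T                        # transpose
A_inv = np.linalg.inv(A)         # inverse (exists because det A = 5)
sq_norm = (x.T @ x).item()       # x^T x as a scalar: 1 + 4 = 5.0
```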
