The score points uphill on the data density; matching it lets you sample by stochastic ascent.
From Chapter 14: Generative Models
Glossary: score matching, score-based model
Transcript
We want to model a distribution over images. Direct probability density estimation in millions of dimensions is hopeless.
Score matching changes the goal. Instead of the density itself, learn the gradient of its logarithm. The score function. A vector at every point telling us which way the density is increasing.
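In symbols (notation added here for reference, with p the data density and s the score):

s(x) = \nabla_x \log p(x)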
Train a neural network to output this score. Given a noisy image, predict the gradient that would push it back toward higher data density.
The training objective. Add Gaussian noise to a real sample. Have the network predict the negative noise direction, scaled by the noise level. This is denoising score matching: learning the score implicitly learns to denoise.
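A minimal sketch of that objective, assuming a PyTorch-style model. The names score_net and dsm_loss, and the single fixed noise level sigma, are illustrative, not from the transcript:

```python
import torch

def dsm_loss(score_net, x, sigma):
    """Denoising score matching loss at one noise level sigma (illustrative)."""
    eps = torch.randn_like(x)            # draw Gaussian noise
    x_noisy = x + sigma * eps            # corrupt the clean sample
    target = -eps / sigma                # score of the corruption kernel:
                                         # grad log N(x_noisy; x, sigma^2 I) = -(x_noisy - x) / sigma^2
    pred = score_net(x_noisy)            # network's score estimate
    return ((pred - target) ** 2).mean() # mean squared error
```

Real systems train over a whole schedule of noise levels and condition the network on the level; this sketch fixes one level for clarity.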
At inference, walk uphill on the score. Start from random noise. Take a small step in the direction the network predicts. Add a touch of fresh noise for stochasticity. Repeat.
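The walk described here is Langevin dynamics. A sketch under the same assumptions as above; the step count and step size are illustrative:

```python
import torch

@torch.no_grad()
def langevin_sample(score_net, shape, n_steps=1000, step_size=1e-4):
    """Stochastic ascent on the learned score (unadjusted Langevin dynamics)."""
    x = torch.randn(shape)                    # start from pure noise
    for _ in range(n_steps):
        z = torch.randn_like(x)               # a touch of fresh noise each step
        # Euler-Maruyama step of dx = s(x) dt + sqrt(2) dW
        x = x + step_size * score_net(x) + (2.0 * step_size) ** 0.5 * z
    return x
```

In practice the step size is annealed as the noise level decreases during sampling; that schedule is omitted here for brevity.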
Over many steps, the trajectory drifts toward regions of high data density. Pure noise becomes a recognisable image.
Score-based models and diffusion models turn out to be the same thing, viewed from two angles. The diffusion view emphasises noise schedules; the score view emphasises density gradients. The mathematics is unified.
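One standard way to state the unification (added context, assuming the variance-exploding corruption x_t = x + \sigma_t \varepsilon): a noise-prediction network \varepsilon_\theta and a score network s_\theta are related by

s_\theta(x_t, t) = -\varepsilon_\theta(x_t, t) / \sigma_t

so the denoising loss and the score-matching loss agree up to a per-noise-level scaling.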
Both views give the same training loss and the same sampling procedure. Both underlie the photorealistic image generation we see today.
The score is the internal compass; the noise schedule is the path.