Chapter Fourteen

Generative Models

Learning Objectives
  1. Distinguish explicit-density from implicit-density generative models and place every major architecture in a single taxonomy
  2. Derive the evidence lower bound, the reparameterisation trick and the closed-form Gaussian KL underpinning the variational autoencoder
  3. Derive the GAN minimax objective, prove the form of the optimal discriminator, and connect the equilibrium to the Jensen–Shannon divergence
  4. Derive both the forward closed-form marginal and the simplified noise-prediction loss of a denoising diffusion probabilistic model
  5. Implement a working VAE on MNIST and a from-scratch DDPM with a U-Net noise predictor in PyTorch
  6. Explain classifier-free guidance, DDIM, latent diffusion and score-based SDEs as variations on a single theme
  7. Compare generative families on tractability of likelihood, sample quality, mode coverage, sampling cost and training stability

A discriminative model takes a picture and answers the question "what is this?" A generative model takes the answer "cat" and constructs a new picture that has never existed. The first compresses information; the second expands it. The first asks for a label; the second asks for a plausible world.

Generative modelling is among the most ambitious tasks in machine learning. To generate convincingly we must learn, explicitly or implicitly, the entire distribution from which the data was drawn: every correlation, every constraint, every mode. A discriminative model can ignore everything irrelevant to the label. A generative model is allowed to ignore nothing. As Richard Feynman wrote on his blackboard: "What I cannot create, I do not understand."

This chapter develops the modern theory and practice of generative models. We move from a taxonomy of approaches through the variational autoencoder, the generative adversarial network, normalising flows, energy-based models, and finally the diffusion family that has reshaped the field since 2020. The treatment is mathematical: we derive the evidence lower bound from first principles, prove the form of the optimal GAN discriminator and show that, against it, the generator minimises the Jensen–Shannon divergence, write down the closed-form forward marginal of a DDPM, and explain how classifier-free guidance steers samples towards the conditioning signal. The treatment is also practical: we implement a VAE on MNIST and train a denoising diffusion model from scratch in PyTorch.
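
For orientation, the headline results named above can be previewed in their standard textbook forms. This is only a sketch: the notation is an assumption and may differ from the notation used when each result is derived later in the chapter. Here q_\phi is the encoder, p_\theta the decoder, p_g the generator's distribution, \beta_s the diffusion noise schedule, and \epsilon_\theta the noise predictor.

```latex
% Evidence lower bound (VAE)
\log p_\theta(x) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
  - D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)

% Closed-form KL between a diagonal Gaussian and the standard normal prior
D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu, \operatorname{diag}\sigma^2)\,\|\,\mathcal{N}(0, I)\right)
  = \tfrac{1}{2}\sum_{j}\left(\mu_j^2 + \sigma_j^2 - 1 - \log \sigma_j^2\right)

% Optimal GAN discriminator for a fixed generator, and the resulting generator objective
D^*(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_g(x)},
\qquad
C(G) = -\log 4 + 2\,\mathrm{JSD}\!\left(p_{\mathrm{data}}\,\|\,p_g\right)

% DDPM forward marginal, with \bar{\alpha}_t = \prod_{s=1}^{t}(1 - \beta_s)
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1 - \bar{\alpha}_t)\,I\right)

% Simplified noise-prediction loss
L_{\mathrm{simple}} = \mathbb{E}_{t,\,x_0,\,\epsilon}
  \left\| \epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1 - \bar{\alpha}_t}\,\epsilon,\ t\right) \right\|^2
```

Each line is a destination rather than a derivation. The value -\log 4 in the GAN expression is attained exactly when p_g = p_{\mathrm{data}}, which is the sense in which the equilibrium is tied to the Jensen–Shannon divergence.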
