Visualisation

Latent space interpolation in a VAE

Last reviewed 5 May 2026

Walk a straight line between two latent codes and the decoded image morphs smoothly.

From Chapter 14: Generative Models

Glossary: variational autoencoder (VAE)

Transcript

A variational autoencoder learns a smooth latent space.

The encoder takes an image and outputs a Gaussian distribution over latent codes: a mean vector and a variance for each latent dimension (a diagonal Gaussian).

We sample a latent vector. The decoder turns it back into an image.
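The sampling step is usually done with the reparameterisation trick, so the draw stays differentiable with respect to the encoder's outputs. A minimal NumPy sketch, assuming the encoder produces a mean and a log-variance vector (the concrete values below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(mu, log_var, rng):
    """Reparameterisation trick: z = mu + sigma * eps, with eps ~ N(0, I).

    Writing the sample this way keeps z differentiable with respect to
    mu and log_var, which is what lets gradients flow into the encoder.
    """
    sigma = np.exp(0.5 * log_var)          # log-variance -> standard deviation
    eps = rng.standard_normal(mu.shape)    # noise independent of the parameters
    return mu + sigma * eps

# Hypothetical 2-D latent; in a real VAE the encoder network outputs these.
mu = np.array([0.5, -1.0])
log_var = np.array([-2.0, -2.0])           # small variance: samples hug the mean
z = sample_latent(mu, log_var, rng)        # one latent vector for the decoder
```

The decoder itself is not shown; it would map `z` back to an image.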

Train so that reconstructions are good and the latent distributions stay close to a standard Gaussian: a reconstruction term plus a KL-divergence term.
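For a diagonal Gaussian encoder the KL term has a closed form, so the training objective can be sketched directly. A minimal version, assuming squared-error reconstruction (one common choice; the transcript does not fix a particular reconstruction loss):

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ) for a diagonal Gaussian."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def vae_loss(x, x_recon, mu, log_var):
    """Negative ELBO, up to constants: reconstruction error plus KL penalty."""
    recon = np.sum((x - x_recon) ** 2)   # squared-error reconstruction term
    return recon + kl_to_standard_normal(mu, log_var)
```

When `mu = 0` and `log_var = 0` the encoder's distribution equals the prior, and the KL term is exactly zero.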

Now take two real images. Encode each to its mean latent. Walk a straight line through the latent space, from the first mean to the second.

Decode each point along the line. The first reconstruction is image one. As we step along, the decoded image morphs gently. The middle is a smooth blend that looks like a plausible third image, never seen during training. The last reconstruction is image two.
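The walk itself is just a convex combination of the two mean latents. A small sketch of building the path; `decode` would be the trained decoder, which is assumed and not implemented here:

```python
import numpy as np

def interpolate(z1, z2, n_steps=8):
    """Straight line in latent space: z(t) = (1 - t) * z1 + t * z2 for t in [0, 1]."""
    ts = np.linspace(0.0, 1.0, n_steps)
    return np.stack([(1.0 - t) * z1 + t * z2 for t in ts])

# path[0] is the first image's mean latent, path[-1] the second's;
# decoding each row of `path` produces the morphing sequence.
```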

This smoothness is what the variational training enforces. The KL term pulls every image's latent distribution towards the same standard Gaussian, so the codes overlap in a shared region, and the sampling noise forces nearby latents to decode to similar images. The reconstruction loss says: the encoder's mean for an image must decode back to that image.

Together they organise the latent space into something continuous. Images are not isolated points in pixel space; they are inhabitants of a smooth manifold.

GANs and diffusion models make sharper images, but VAEs were the first to make latent-space interpolation a routine demonstration.

This site is currently in Beta. Contact: Chris Paton

AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).