Glossary

Generative Adversarial Network

Also known as: GAN

A Generative Adversarial Network (GAN), introduced by Goodfellow et al. in 2014, recasts generative modelling as a two-player minimax game. A generator network $G$ takes a random noise vector $\mathbf{z}$ and produces a synthetic sample $G(\mathbf{z})$. A discriminator network $D$ receives either a real sample or a fake one from $G$ and must classify it as real or fake. The generator is trained to fool the discriminator; the discriminator is trained to distinguish real samples from fakes. At the Nash equilibrium of this game, $G$ produces samples indistinguishable from real data and $D$ outputs 0.5 everywhere.
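The equilibrium claim can be checked numerically. For a fixed generator, the optimal discriminator is $D^*(x) = p_{\text{data}}(x) / (p_{\text{data}}(x) + p_g(x))$, so once the generator's density matches the data density, $D^*$ is 0.5 everywhere. A minimal sketch, using Gaussian densities as illustrative stand-ins for $p_{\text{data}}$ and $p_g$ (the specific means and grid are assumptions, not part of any GAN implementation):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-5, 5, 1001)          # evaluation grid
p_data = gaussian_pdf(x, 0.0, 1.0)    # stand-in for the real data density
p_g    = gaussian_pdf(x, 2.0, 1.0)    # generator density before convergence

# Optimal discriminator for a fixed generator:
# D*(x) = p_data(x) / (p_data(x) + p_g(x))
d_star = p_data / (p_data + p_g)
print(d_star.min(), d_star.max())     # ranges over nearly (0, 1): D can still tell them apart

# At equilibrium the generator density matches the data density ...
p_g_eq = gaussian_pdf(x, 0.0, 1.0)
d_eq = p_data / (p_data + p_g_eq)
print(d_eq.min(), d_eq.max())         # ... and D* is 0.5 everywhere
```

Before convergence the optimal discriminator swings close to 0 and 1 across the grid; at equilibrium it is flat at 0.5, which is exactly the state described above.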

The original GAN objective is $\min_G \max_D \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]$. In practice, GAN training is notoriously unstable. If the discriminator becomes too strong, its gradients to the generator vanish. If the generator outpaces it, mode collapse can occur—the generator produces only a small subset of the data distribution. Numerous innovations tamed these issues: DCGAN (architectural guidelines), Wasserstein GAN (smoother loss), Spectral Normalisation (Lipschitz constraint), Progressive Growing (training at increasing resolutions), StyleGAN (disentangled latent space).
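The vanishing-gradient failure mode is easy to see on the discriminator's logit. Writing $D = \sigma(s)$, the generator's term in the minimax objective, $\log(1 - \sigma(s))$, has gradient $-\sigma(s)$ with respect to $s$, which collapses to zero precisely when the discriminator confidently rejects fakes ($s$ very negative). The non-saturating alternative $-\log \sigma(s)$, suggested in the original 2014 paper, has gradient $\sigma(s) - 1 \approx -1$ in that regime. A small numeric check in pure Python (the logit value $s = -10$ is an illustrative assumption):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# Discriminator logit on a fake sample; very negative means D is
# confident the sample is fake — exactly the "too strong D" regime.
s = -10.0
d = sigmoid(s)                      # D(G(z)), roughly 4.5e-5 here

# Saturating loss log(1 - sigma(s)): gradient w.r.t. the logit is -sigma(s).
grad_saturating = -d
# Non-saturating loss -log sigma(s): gradient w.r.t. the logit is sigma(s) - 1.
grad_non_saturating = d - 1.0

print(grad_saturating)              # nearly zero: almost no learning signal for G
print(grad_non_saturating)          # close to -1: a strong learning signal for G
```

The saturating gradient is vanishingly small while the non-saturating one stays near $-1$, which is why the non-saturating loss is the one typically used in practice.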

StyleGAN and its successors represented the high-water mark of GAN image synthesis, producing photorealistic face images that fooled human observers. Conditional GANs extend the framework with additional inputs: Pix2pix does paired image-to-image translation, CycleGAN does unpaired translation. GANs have been largely supplanted by diffusion models for image generation due to the latter's training stability and mode coverage, but GANs retain advantages in generation speed (single forward pass versus many denoising steps) and continue to influence generative modelling research.

Related terms: Diffusion Model, Variational Autoencoder

Also defined in: Textbook of AI