Stable Diffusion, Glossary, Textbook of AI

Stable Diffusion is an open-source text-to-image diffusion model released in August 2022 by Stability AI in collaboration with the CompVis group at LMU Munich (where Robin Rombach was a lead author of the underlying latent diffusion work) and Runway. It is a latent diffusion model: rather than running diffusion in pixel space, which is computationally expensive at high resolution, Stable Diffusion runs diffusion in a compressed latent space learned by a VAE, then decodes the final latent back to pixels with the VAE decoder. The resulting compute and memory savings made high-resolution image generation feasible on consumer GPUs.
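The scale of the saving can be illustrated with simple arithmetic. The sketch below assumes the autoencoder configuration commonly documented for Stable Diffusion 1.x (8x spatial downsampling, 4 latent channels); the function names are hypothetical and exist only for this example. It counts the values the denoiser has to process, which is a rough proxy for cost and not a measured speed-up.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Return the (channels, height, width) shape of the VAE latent.

    Assumption: 8x spatial downsampling and 4 latent channels, as commonly
    documented for the Stable Diffusion 1.x autoencoder.
    """
    if height % downsample or width % downsample:
        raise ValueError("image sides must be divisible by the downsampling factor")
    return (latent_channels, height // downsample, width // downsample)


def element_count(shape):
    """Number of scalar values in a tensor of the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n


pixel_shape = (3, 512, 512)          # RGB image at 512x512
lat = latent_shape(512, 512)         # latent the diffusion process operates on
ratio = element_count(pixel_shape) / element_count(lat)

print(lat, ratio)  # (4, 64, 64) 48.0
```

Under these assumptions, each denoising step works on 48 times fewer values than it would in pixel space; the VAE encoder and decoder are run only once per image, outside the iterative denoising loop.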

The August 2022 release of Stable Diffusion, with full weights, code and architecture documentation under the CreativeML OpenRAIL-M licence (which allows commercial use subject to use-based restrictions), was transformative. Within months an enormous ecosystem of open-source tools had emerged: AUTOMATIC1111's web UI, ComfyUI's node-based workflow editor, ControlNet for spatial conditioning, LoRA-based fine-tuning for custom characters and styles, the Civitai model-sharing community, and many others. Because anyone could run, inspect and fine-tune the model locally, the release democratised generative AI in a way that closed-API systems (DALL-E 2, Midjourney) did not.

Successive versions: Stable Diffusion 1.x (August 2022) was the original release, with roughly 1 billion parameters across the denoiser, text encoder and VAE. Stable Diffusion 2.x (November 2022) improved the architecture and training, but more aggressive filtering of the training data hurt adoption. Stable Diffusion XL (SDXL) (July 2023) was substantially larger and higher-quality. Stable Diffusion 3 (announced February 2024, with weights first released in June 2024) moved to a Multimodal Diffusion Transformer (MMDiT) architecture and met a mixed reception over licensing changes.

The lineage from Stable Diffusion has produced a continuing stream of high-quality open image and video generation models, including Flux (from Black Forest Labs, founded by members of the original Stable Diffusion team after they left Stability AI), AuraFlow, HunyuanDiT and many others.

Related terms: Diffusion Model, CLIP

Discussed in:

Chapter 14: Generative Models

AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).