Also known as: batch norm, BN
Batch Normalisation (BN), introduced by Ioffe and Szegedy in 2015, normalises the activations of a layer to have zero mean and unit variance within each mini-batch. For each neuron, BN computes the batch mean $\mu_B$ and variance $\sigma_B^2$ over the mini-batch, normalises as $\hat{x} = (x - \mu_B) / \sqrt{\sigma_B^2 + \epsilon}$, and then applies a learnable affine transformation $y = \gamma \hat{x} + \beta$. The learnable $\gamma$ and $\beta$ allow the network to recover the identity if that is optimal.
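The forward computation can be sketched in a few lines of NumPy (the function name and shapes here are illustrative, not from any particular library):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalise each feature over the mini-batch, then apply the affine transform.

    x: (batch, features); gamma, beta: (features,)
    """
    mu = x.mean(axis=0)                     # batch mean per feature
    var = x.var(axis=0)                     # batch variance per feature
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta             # learnable scale and shift

# With gamma = 1 and beta = 0, the output is simply the normalised activations.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 8))
y = batch_norm_forward(x, gamma=np.ones(8), beta=np.zeros(8))
```

Note that the statistics are computed per feature across the batch dimension (axis 0), so every feature of the output has mean zero and unit variance over the mini-batch.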
The practical benefits are substantial. Networks trained with BN are far less sensitive to learning rate and weight initialisation, because normalisation constrains activation magnitudes regardless of how upstream parameters shift during training. This permits higher learning rates and faster convergence. BN also has a mild regularising effect: each example's normalised value depends on which other examples appear in its mini-batch, injecting stochastic noise that discourages overfitting.
At inference, BN switches to running averages of the training-time statistics, because single examples have no batch to normalise over. Failing to switch to eval mode is a common implementation bug. BN's dependence on batch statistics makes it ill-suited to very small batches or non-i.i.d. mini-batches, motivating alternatives: Layer Normalisation (normalises across features within each example, standard in transformers), Instance Normalisation (per-channel, popular in style transfer), and Group Normalisation (compromise between layer and instance norm).
Related terms: Regularisation, Dropout
Discussed in:
- Chapter 10: Training & Optimisation — Batch Normalisation
Also defined in: Textbook of AI