Glossary

Dropout

Dropout, introduced by Srivastava et al. in 2014, is a powerful regularisation technique for neural networks. During each training forward pass, each neuron in a designated layer is independently "dropped"—set to zero—with probability $p$ (commonly 0.5 for hidden layers, 0.1–0.3 for input layers). Surviving activations are scaled by $1/(1-p)$ so the expected value remains unchanged (inverted dropout). At test time, no neurons are dropped; the full network is used.
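The forward pass described above can be sketched in a few lines of NumPy. This is an illustrative implementation, not library code; the function name and signature are our own.

```python
import numpy as np

def dropout(x, p, training=True, rng=np.random.default_rng()):
    """Inverted dropout: zero each element independently with
    probability p and scale survivors by 1/(1-p), so the expected
    value of each activation is unchanged."""
    if not training or p == 0.0:
        return x  # test time: no neurons are dropped
    mask = rng.random(x.shape) >= p          # keep with probability 1 - p
    return np.where(mask, x / (1.0 - p), 0.0)
```

With $p = 0.5$, surviving activations are doubled, so averaging over many draws recovers the original values; at test time the input passes through untouched.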

Dropout forces the network to develop redundant, distributed representations that are robust to the loss of individual units. No single neuron can be relied upon, so the network must learn features that are useful in many combinations. This can be understood as training an exponentially large ensemble of sub-networks that share weights: each mini-batch sees a different random sub-network, and the final model is an implicit average.

Dropout is particularly effective in fully connected layers with many parameters. It is less common in modern convolutional architectures, where batch normalisation provides much of the regularisation effect; in transformers it is typically applied at a moderate rate (around 0.1) inside the attention and feed-forward sub-layers. Variants include DropConnect (dropping weights rather than activations), Spatial Dropout (dropping entire feature maps in CNNs), and Variational Dropout (using the same mask across time steps in RNNs).
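As a concrete illustration of the Spatial Dropout variant, the sketch below draws one keep/drop decision per feature map rather than per activation, assuming the common `(batch, channels, height, width)` layout. The function name and layout convention are assumptions for this example.

```python
import numpy as np

def spatial_dropout(x, p, rng=np.random.default_rng()):
    """Spatial Dropout sketch: x has shape (batch, channels, H, W).
    Each feature map (channel) is kept or dropped as a whole, then
    survivors are rescaled by 1/(1-p) as in inverted dropout."""
    # One random draw per (example, channel); broadcast over H and W.
    keep = rng.random((x.shape[0], x.shape[1], 1, 1)) >= p
    return np.where(keep, x / (1.0 - p), 0.0)
```

Dropping whole maps matters in CNNs because neighbouring pixels within a feature map are strongly correlated, so zeroing individual activations regularises far less than the nominal rate suggests.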

Related terms: Regularisation, Overfitting, Ensemble Methods

Also defined in: Textbook of AI, Textbook of Medical AI