A Residual Connection (or skip connection) is an architectural shortcut that adds a layer's input directly to its output, so the layer learns a residual $F(\mathbf{x}) = H(\mathbf{x}) - \mathbf{x}$ rather than the full transformation $H(\mathbf{x})$. Introduced by He et al. in ResNet (2015), residual connections enabled training of networks with hundreds or even thousands of layers—far deeper than previously feasible—by providing a direct path for gradients to flow during backpropagation.
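The formulation above can be sketched in a few lines of numpy. This is a minimal illustration, not any particular library's implementation: `W1`, `W2`, and the two-layer form of $F$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative weights for a hypothetical two-layer transformation F(x).
W1 = rng.standard_normal((4, 4)) * 0.1
W2 = rng.standard_normal((4, 4)) * 0.1

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x):
    # The block computes the residual F(x) = W2 relu(W1 x),
    # then the skip connection adds the input back:
    # output H(x) = F(x) + x.
    f = W2 @ relu(W1 @ x)
    return f + x

x = rng.standard_normal(4)
y = residual_block(x)
# If training drives F(x) toward zero, the block collapses
# to the identity mapping H(x) = x.
```

Note that if the weights produce $F(\mathbf{x}) = \mathbf{0}$, the block passes its input through unchanged, which is exactly why learning "do nothing" is cheap for a residual layer.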
Before ResNet, practitioners encountered the degradation problem: beyond a certain depth, adding more layers actually increased training error, suggesting deeper networks were harder to optimise, not just more prone to overfitting. Residual connections addressed this elegantly. If the identity mapping is close to optimal, it is far easier for the network to push the residual $F(\mathbf{x})$ toward zero than to learn the identity through a stack of nonlinear layers. The identity path also provides a gradient highway: gradients can flow unchanged through the skip connection during backpropagation, dramatically mitigating the vanishing gradient problem.
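The gradient-highway effect can be demonstrated numerically. For a linear layer $\mathbf{y} = W\mathbf{x} + \mathbf{x}$, the Jacobian is $W + I$; the identity term keeps the backpropagated product from collapsing even when each $W$ is small. The sketch below uses small random linear layers as stand-ins for $F$ at each depth (the depth, width, and weight scale are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
depth, dim = 50, 8
# Small-weight linear layers stand in for F at each depth.
Ws = [rng.standard_normal((dim, dim)) * 0.05 for _ in range(depth)]

# Plain stack: the backpropagated Jacobian is the product of the W_i,
# which shrinks geometrically -- the vanishing gradient problem.
J_plain = np.eye(dim)
for W in Ws:
    J_plain = W @ J_plain

# With skip connections each layer's Jacobian is (W_i + I); the
# identity term provides a direct path and the product stays of
# order one instead of collapsing toward zero.
J_skip = np.eye(dim)
for W in Ws:
    J_skip = (W + np.eye(dim)) @ J_skip

print(np.linalg.norm(J_plain))  # effectively zero after 50 layers
print(np.linalg.norm(J_skip))   # remains a usable magnitude
```

The contrast in the two norms is the whole story: without the skip, gradient signal reaching the early layers has vanished; with it, useful gradient still arrives.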
Residual connections have become a ubiquitous design pattern. Every sub-layer in a Transformer is wrapped in a residual connection (followed by layer normalisation). The U-Net segmentation architecture uses long skip connections from encoder to decoder (by feature concatenation rather than addition). DenseNet connects every layer to every subsequent layer within a block, again via concatenation. Diffusion models' U-Net backbones rely on residual blocks. The key insight—letting information flow around transformations rather than through them—has proven one of the most important architectural innovations in deep learning.
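The Transformer pattern mentioned above (sub-layer plus skip, then layer normalisation, i.e. the post-LN variant) can be sketched as follows. This is a simplified illustration: the learnable gain and bias of layer normalisation are omitted, and the feed-forward sub-layer is a made-up single matrix multiply.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalise over the feature dimension (gain/bias omitted for brevity).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def residual_sublayer(x, sublayer):
    # Post-LN Transformer pattern: wrap any sub-layer (attention or
    # feed-forward) in a skip connection, then normalise the sum.
    return layer_norm(x + sublayer(x))

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 16))          # (tokens, d_model)
W = rng.standard_normal((16, 16)) * 0.1   # illustrative feed-forward weight

out = residual_sublayer(x, lambda h: np.maximum(h @ W, 0.0))
```

Because `residual_sublayer` takes the sub-layer as an argument, the same wrapper serves for both the attention and feed-forward sub-layers, which is exactly how the pattern repeats through a Transformer stack.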
Related terms: Vanishing Gradient, Backpropagation
Discussed in:
- Chapter 9: Neural Networks — Network Architectures
- Chapter 3: Calculus — The Chain Rule
Also defined in: Textbook of AI