Glossary

Pooling

Pooling layers reduce the spatial dimensions of feature maps by summarising small neighbourhoods with a single statistic. The two most common variants are max pooling, which selects the maximum value within each region, and average pooling, which computes the mean. A typical configuration uses a 2×2 window with stride 2, which halves the spatial resolution and reduces the number of spatial positions by a factor of four.
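The windowed reduction described above can be sketched directly in NumPy. The function below is an illustrative implementation, not a library API: `pool2d`, its `mode` strings, and the example input are all chosen here for demonstration.

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Pool a 2-D feature map with a square window (illustrative sketch)."""
    h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    reduce = np.max if mode == "max" else np.mean
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Each output value summarises one size x size neighbourhood.
            window = x[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = reduce(window)
    return out

x = np.array([[1., 3., 2., 0.],
              [4., 6., 1., 2.],
              [7., 2., 8., 3.],
              [0., 1., 4., 5.]])

print(pool2d(x, mode="max"))   # [[6. 2.]
                               #  [7. 8.]]
print(pool2d(x, mode="avg"))   # [[3.5  1.25]
                               #  [2.5  5.  ]]
```

Note that the 4×4 input becomes a 2×2 output: resolution is halved in each dimension, so the number of spatial positions drops by a factor of four, as stated above.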

Max pooling is the more widely used variant in classification networks. It retains the strongest activation in each local region, selecting the most prominent feature regardless of its exact position. This provides a degree of local translation invariance: small shifts that move a feature within the same pooling window do not change the output. This property is desirable for classification, where what matters is which features are present, not precisely where they appear.
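The local translation invariance described above can be demonstrated concretely. In this small sketch (shapes and values are illustrative assumptions), shifting a feature within its 2×2 pooling window leaves the pooled output unchanged, while shifting it into a neighbouring window changes it:

```python
import numpy as np

def max_pool_2x2(x):
    # 2x2 max pooling, stride 2, via a reshape trick (assumes even dimensions).
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.zeros((4, 4)); a[0, 0] = 9.0   # a strong activation at (0, 0)
b = np.zeros((4, 4)); b[1, 1] = 9.0   # shifted, but still in the same window
c = np.zeros((4, 4)); c[1, 2] = 9.0   # shifted into a different window

print(np.array_equal(max_pool_2x2(a), max_pool_2x2(b)))  # True: invariant
print(np.array_equal(max_pool_2x2(a), max_pool_2x2(c)))  # False: window changed
```

This is why the invariance is only *local*: shifts larger than the pooling window are still visible downstream.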

In modern architectures, global average pooling (GAP) has largely replaced flattening plus dense layers before the final classifier. GAP averages each channel across its entire spatial extent, producing a vector whose length equals the number of channels. It eliminates a large block of parameters and acts as a structural regulariser, since no weights are learned over spatial positions.

Pooling is not without controversy: some researchers argue it discards valuable spatial information and advocate replacing it with strided convolutions, which downsample while retaining learnable parameters. All-convolutional networks demonstrate that pooling layers can be eliminated without loss of accuracy.
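The parameter saving from GAP can be made concrete. The shapes below are illustrative assumptions (512 channels over a 7×7 final feature map feeding a 1000-class classifier, typical of ResNet-style networks), not figures from the text:

```python
import numpy as np

# Assumed final conv output: (channels, height, width) = (512, 7, 7).
features = np.random.rand(512, 7, 7)

# GAP: average each channel over its full spatial extent.
gap = features.mean(axis=(1, 2))
print(gap.shape)  # (512,) -- one value per channel

# Weight counts for the final classifier layer (biases omitted):
flatten_params = 512 * 7 * 7 * 1000   # flatten + dense: 25,088,000 weights
gap_params = 512 * 1000               # GAP + dense:        512,000 weights
print(flatten_params // gap_params)   # 49x fewer weights
```

The ratio is exactly the number of spatial positions averaged away (7 × 7 = 49), which is where the parameter saving comes from.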

Related terms: Convolutional Neural Network, Convolution

Discussed in:

Also defined in: Textbook of AI