Glossary

Convolutional Neural Network

Also known as: CNN, ConvNet

A Convolutional Neural Network (CNN) is a neural network architecture specialised for data with spatial or grid-like structure, most notably images. Inspired by the hierarchical organisation of the mammalian visual cortex, CNNs introduce two powerful inductive biases: local connectivity (each output depends only on a small region of the input) and weight sharing (the same filter is applied at every spatial position). These biases encode translation equivariance and dramatically reduce the number of parameters compared to fully connected networks.
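The two inductive biases can be made concrete with a minimal NumPy sketch: one small kernel (weight sharing) is applied at every spatial position, and each output value depends only on a local window of the input (local connectivity). The filter values and image size here are illustrative.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation: the SAME kernel slides over every
    spatial position (weight sharing); each output depends only on a
    small kh x kw window of the input (local connectivity)."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# An illustrative vertical-edge filter: 9 shared parameters in total,
# regardless of how large the input image is.
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])

image = np.zeros((8, 8))
image[:, 4:] = 1.0            # right half bright: a vertical edge at column 4
response = conv2d(image, kernel)
print(response.shape)          # (6, 6)
```

The response is strongly negative only where the window straddles the edge and zero in flat regions, and shifting the edge shifts the response by the same amount: translation equivariance. A fully connected layer mapping the 8×8 input to the 6×6 output would need 64×36 = 2304 weights; the convolution uses 9.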

A typical CNN alternates convolutional layers (which detect local patterns), pooling layers (which downsample and provide local translation invariance), and nonlinear activations (typically ReLU). Deeper layers compose simple features into progressively more abstract ones: the first layer might detect edges, the next layer textures, the next layer object parts, and the final layers whole object categories. A classification head (commonly global average pooling followed by a dense layer) produces the final output.
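The conv → activation → pool → head pipeline above can be traced end to end in a minimal NumPy forward pass. All sizes (16×16 grayscale input, 4 filters, 10 classes) and the random weights are illustrative assumptions, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # nonlinear activation
    return np.maximum(x, 0.0)

def conv2d(x, kernels):
    # x: (H, W) image; kernels: (C, kh, kw) shared filters -> (C, H-kh+1, W-kw+1)
    C, kh, kw = kernels.shape
    H, W = x.shape
    out = np.zeros((C, H - kh + 1, W - kw + 1))
    for c in range(C):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[c, i, j] = np.sum(x[i:i + kh, j:j + kw] * kernels[c])
    return out

def maxpool2x2(x):
    # non-overlapping 2x2 max pooling per channel: downsamples by 2
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).max(axis=(2, 4))

# Hypothetical sizes: 16x16 input, four 3x3 filters, 10 output classes.
image = rng.standard_normal((16, 16))
kernels = rng.standard_normal((4, 3, 3))
W_head = rng.standard_normal((4, 10))

feat = relu(conv2d(image, kernels))   # (4, 14, 14): local pattern maps
pooled = maxpool2x2(feat)             # (4, 7, 7): downsampled
gap = pooled.mean(axis=(1, 2))        # (4,): global average pooling
logits = gap @ W_head                 # (10,): classification head output
print(logits.shape)
```

A real network stacks several such conv/pool stages so that later filters see (and compose) the outputs of earlier ones, which is what produces the edges → textures → parts → objects hierarchy.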

Landmark CNN architectures trace the field's evolution: LeNet-5 (1998) demonstrated practical CNNs for handwritten digit recognition. AlexNet (2012) won the ImageNet challenge by a large margin with a deep GPU-trained CNN using ReLU and dropout. VGG (2014) showed that depth with small 3×3 kernels works well. GoogLeNet (2014) introduced the Inception module. ResNet (2015) added residual connections that enabled training of networks with hundreds of layers. EfficientNet (2019) used neural architecture search and compound scaling. Modern CNNs remain foundational for computer vision, often combined with attention in hybrid architectures.

Related terms: Convolution, Pooling, Deep Learning

Also defined in: Textbook of AI, Textbook of Medical AI