LeNet-5, AlexNet, VGG, GoogLeNet, ResNet. Each year deeper, with new tricks.
From Chapter 11: CNNs
Glossary: lenet, alexnet, vgg, googlenet, resnet
Transcript
LeNet-5. 1998. Yann LeCun. Two convolutional layers, two pooling layers, three fully-connected layers. Trained to read handwritten digits on cheques. Around sixty thousand parameters.
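To make that stack concrete, here is a minimal PyTorch sketch; the tanh activations and average pooling follow the 1998 paper, and the class name LeNet5 is my label, not LeCun's code.

```python
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(),    # C1: 32x32 -> 28x28
            nn.AvgPool2d(2),                              # S2: -> 14x14
            nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),   # C3: -> 10x10
            nn.AvgPool2d(2),                              # S4: -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# sum(p.numel() for p in LeNet5().parameters()) gives 61,706:
# the "around sixty thousand parameters" above.
```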
AlexNet. 2012. Krizhevsky, Sutskever, Hinton. Eight layers. Sixty million parameters. ReLU activations. Dropout for regularisation. Trained on two GPUs. Won ImageNet by ten percentage points and started the deep learning revolution.
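For comparison, a sketch of the AlexNet stack in the same style. This is the single-GPU variant with the channel counts merged, as commonly reimplemented; the 2012 original split the filters across the two GPUs.

```python
import torch.nn as nn

# Eight learned layers: five convolutional, three fully connected.
alexnet = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),  # dropout for regularisation
    nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),  # ImageNet's thousand classes
)
```

Nearly all of the sixty million parameters sit in those three fully-connected layers at the end.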
VGG. 2014. Simonyan and Zisserman. Sixteen and nineteen layers, built entirely from stacked three-by-three convolutions. The simplest design so far and the deepest yet, at around a hundred and forty million parameters.
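The stacking is the whole idea, so a sketch of one VGG stage may help; vgg_block is my naming, and the stage configuration below is VGG-16's (configuration D in the paper).

```python
import torch.nn as nn

def vgg_block(in_ch: int, out_ch: int, n_convs: int) -> nn.Sequential:
    # Two stacked 3x3 convs cover a 5x5 receptive field, three cover 7x7,
    # with fewer parameters and more nonlinearities than one big filter.
    layers = []
    for _ in range(n_convs):
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU()]
        in_ch = out_ch
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# VGG-16's five stages: conv counts 2, 2, 3, 3, 3, followed by
# three fully-connected layers (not shown).
features = nn.Sequential(
    vgg_block(3, 64, 2),
    vgg_block(64, 128, 2),
    vgg_block(128, 256, 3),
    vgg_block(256, 512, 3),
    vgg_block(512, 512, 3),
)
```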
GoogLeNet, also called Inception. 2014. Twenty-two layers, careful use of one-by-one convolutions to control parameter count. Multiple parallel filter sizes inside an inception module.
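A sketch of one inception module, with branch widths taken from the first module (3a) in the paper; the class and argument names are mine.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Parallel 1x1, 3x3, 5x5 and pooling branches, concatenated on the
    channel axis. The 1x1 convs in front of the 3x3 and 5x5 branches
    shrink the channel count first; that is what keeps parameters down."""
    def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        self.b1 = nn.Sequential(nn.Conv2d(in_ch, c1, 1), nn.ReLU())
        self.b2 = nn.Sequential(
            nn.Conv2d(in_ch, c3_red, 1), nn.ReLU(),
            nn.Conv2d(c3_red, c3, 3, padding=1), nn.ReLU())
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, c5_red, 1), nn.ReLU(),
            nn.Conv2d(c5_red, c5, 5, padding=2), nn.ReLU())
        self.b4 = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, 1), nn.ReLU())

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

inception_3a = InceptionModule(192, 64, 96, 128, 16, 32, 32)  # 256 output channels
```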
ResNet. 2015. Kaiming He and colleagues. Up to a hundred and fifty-two layers. The breakthrough was the residual connection: each block computes a small adjustment that is added back to its input. Gradients flow directly through the shortcut. Deep networks suddenly trainable.
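The residual block itself is tiny. A sketch of the basic, equal-channel, stride-one case; blocks that change resolution additionally need a projection on the shortcut.

```python
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # The layers compute an adjustment F(x); the "+ x" is the shortcut
        # that gradients flow straight through.
        return F.relu(out + x)
```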
Top-five error on ImageNet, the same benchmark throughout: LeNet could never run at this scale, AlexNet eighteen percent, VGG seven percent, GoogLeNet six percent, ResNet three point six percent.
After ResNet, the trick spread everywhere. Transformers use residuals around every attention and MLP block. Diffusion U-Nets use them in every stage. The residual is now the default.
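As an illustration of that default, here is the same pattern inside a pre-norm transformer block, sketched with PyTorch's built-in attention; details like the four-times MLP width are conventional, not from this transcript.

```python
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual around attention
        x = x + self.mlp(self.norm2(x))                    # residual around the MLP
        return x
```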