MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 are four small-scale image datasets that have shaped pedagogy and methodology in deep learning for more than two decades. They are now too easy to serve as meaningful state-of-the-art benchmarks but remain ubiquitous as didactic tools, debugging baselines, and ablation testbeds.
MNIST
The Modified National Institute of Standards and Technology dataset (LeCun, Cortes & Burges, http://yann.lecun.com/exdb/mnist/, 1998) is the canonical handwritten-digit dataset: 70,000 28×28 grayscale images of the digits 0-9 (60k train + 10k test), assembled from NIST Special Database 1 (high-school students) and Special Database 3 (Census Bureau employees). Each digit is size-normalised to fit a 20×20 box and centred by centre of mass in a 28×28 frame.
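The raw MNIST files use the simple big-endian IDX format described on the dataset page: the image files begin with four 32-bit integers (magic number 0x00000803, image count, rows, columns), followed by the raw pixel bytes. A minimal sketch of parsing that header, using synthetic bytes in place of the real file, which must be downloaded separately:

```python
import struct

def parse_idx_image_header(buf: bytes):
    """Parse the 16-byte header of an MNIST IDX image file.

    Layout (big-endian int32s): magic number (0x00000803 for
    3-D unsigned-byte data), image count, rows, cols.
    """
    magic, count, rows, cols = struct.unpack(">IIII", buf[:16])
    assert magic == 0x00000803, "not an IDX image file"
    return count, rows, cols

# Synthetic header standing in for train-images-idx3-ubyte:
header = struct.pack(">IIII", 0x00000803, 60000, 28, 28)
print(parse_idx_image_header(header))  # (60000, 28, 28)
```

In practice most users load MNIST through a library helper rather than parsing IDX by hand, but the format is simple enough that hand-parsing is a common first exercise.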
MNIST grew out of the digit-recognition work behind LeCun et al.'s 1989 convolutional network (trained on earlier handwritten zip-code data) and was the training set for LeNet-5 (1998), among the first convolutional networks deployed in production: by the late 1990s such systems were reading roughly 10% of US bank cheques. The MNIST learning curve was the default deep-learning didactic target through the early 2010s. State-of-the-art test error is now around 0.13% (committee of CNNs, 2018), well past the noise floor set by genuinely ambiguous digits.
Fashion-MNIST
Released by Zalando Research (Xiao, Rasul & Vollgraf, arXiv:1708.07747, 2017) as a drop-in replacement for MNIST: 70,000 28×28 grayscale images of clothing items in 10 classes (T-shirt/top, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag, ankle boot). It matches MNIST's scale and file format but is substantially harder; state-of-the-art accuracy is approximately 96.9%, leaving genuine room for method comparison. It has been widely adopted as the modern entry-level deep-learning benchmark.
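Because the files are byte-for-byte compatible with MNIST, the only new thing to track is the label-to-name mapping. A trivial lookup, using the class order from the Fashion-MNIST repository, is handy when inspecting predictions:

```python
# Fashion-MNIST class names, indexed by integer label (0-9),
# in the order used by the official dataset.
FASHION_MNIST_CLASSES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def label_name(label: int) -> str:
    """Return the human-readable class name for an integer label."""
    return FASHION_MNIST_CLASSES[label]

print(label_name(9))  # Ankle boot
```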
CIFAR-10 and CIFAR-100
The Canadian Institute for Advanced Research datasets (Krizhevsky, 2009 technical report) are labelled subsets of the 80-million-tiny-images corpus assembled by Antonio Torralba and colleagues at MIT.
- CIFAR-10, 60,000 32×32 colour images in 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck). 50k train + 10k test.
- CIFAR-100, 60,000 32×32 colour images in 100 classes (600 images per class: 500 train + 100 test), grouped into 20 superclasses of 5 classes each.
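The CIFAR batches ship as Python pickles in which each image is a flat 3,072-value row: 1,024 red values, then 1,024 green, then 1,024 blue, each channel stored row-major. A minimal sketch of reshaping one such row into a 32×32×3 image, using a synthetic row in place of a downloaded batch:

```python
import numpy as np

def row_to_image(row: np.ndarray) -> np.ndarray:
    """Reshape a flat CIFAR row (3072,) into a 32x32x3 HWC uint8 image.

    The row stores all 1024 red values, then green, then blue,
    each channel in row-major (rows-of-pixels) order.
    """
    assert row.shape == (3072,)
    return row.reshape(3, 32, 32).transpose(1, 2, 0)

# Synthetic row: red channel all 255, green and blue all 0.
row = np.concatenate([np.full(1024, 255, np.uint8),
                      np.zeros(2048, np.uint8)])
img = row_to_image(row)
print(img.shape, img[0, 0].tolist())  # (32, 32, 3) [255, 0, 0]
```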
CIFAR-10 SOTA is approximately 99.6% (ViT-large + heavy augmentation, 2024); CIFAR-100 SOTA is around 96.1%. The 32×32 resolution suits academic compute budgets: a ResNet-18 trains in a few minutes on a consumer GPU. CIFAR remains the standard benchmark for new training tricks (mixup, CutMix, RandAugment), regularisers, and loss functions.
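Of the training tricks above, mixup is the simplest to state: each training example becomes a convex combination of two images and of their one-hot labels. A numpy sketch; the mixing coefficient is normally drawn per-batch from a Beta distribution, but is fixed here for reproducibility:

```python
import numpy as np

def mixup(x1, y1, x2, y2, lam):
    """Mix two (image, one-hot label) pairs with coefficient lam in [0, 1]."""
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y

# Two dummy 32x32x3 "images" and one-hot labels for a 10-class problem.
x1, x2 = np.zeros((32, 32, 3)), np.ones((32, 32, 3))
y1 = np.eye(10)[3]   # class 3
y2 = np.eye(10)[7]   # class 7
x, y = mixup(x1, y1, x2, y2, lam=0.7)
print(round(float(x[0, 0, 0]), 2),
      round(float(y[3]), 2),
      round(float(y[7]), 2))  # 0.3 0.7 0.3
```

The soft label keeps the loss well-defined under standard cross-entropy, which is why mixup drops into existing training loops with almost no code changes.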
80-million-tiny-images withdrawal
The 80 Million Tiny Images parent corpus from which CIFAR was carved was withdrawn in 2020 after Birhane and Prabhu ("Large image datasets: A pyrrhic win for computer vision?") showed it contained racist and misogynistic labels propagated from the WordNet noun vocabulary used to build the corpus in 2008. CIFAR-10 and CIFAR-100, with their 110 hand-curated classes, do not inherit those labels and remain in active use.
Modern relevance
These datasets are not relevant to frontier-scale training but are essential for course teaching, fast iteration on optimiser and architecture proposals, gradient-flow debugging, and reproducible ablations that would be prohibitive on ImageNet. Almost every deep-learning textbook still teaches CNNs on MNIST and ResNets on CIFAR-10.
Discussed in:
- Chapter 9: Neural Networks, Computer Vision
- Chapter 7: Supervised Learning, Deep Learning