Glossary

Transfer Learning

Transfer Learning exploits the observation that features learned by a model trained on one task are often useful for related tasks. A network trained on a large, general-purpose dataset like ImageNet learns a hierarchy of features—edges and textures in early layers, object parts and semantic concepts in deeper layers—that transfer across a wide range of visual tasks. In practice, virtually all modern computer vision systems begin with a pretrained backbone rather than training from scratch.

The simplest form is feature extraction: remove the pretrained model's final classifier, freeze the remaining weights, and train a new classifier on top of the extracted features. This works remarkably well even with very small target datasets (dozens of examples per class). Fine-tuning goes further by unfreezing some or all of the pretrained layers and continuing to train them with a small learning rate on the target data. A common strategy freezes early layers (which encode generic low-level features) and fine-tunes later layers (which encode task-specific high-level features).
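The two regimes can be sketched in PyTorch. The tiny backbone below is a hypothetical stand-in for a real pretrained network (in practice you would load, say, a torchvision ResNet with ImageNet weights); the point is which parameters are frozen and how the optimizer is set up:

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained backbone (hypothetical; a real one would
# be e.g. a torchvision ResNet loaded with ImageNet weights).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),    # "early" generic layers
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),   # "later" task-specific layers
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
num_classes = 10
head = nn.Linear(16, num_classes)  # fresh classifier for the target task
model = nn.Sequential(backbone, head)

# Feature extraction: freeze the whole backbone, train only the new head.
for p in backbone.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

# Fine-tuning variant: unfreeze only the later backbone layers and give
# them a smaller learning rate than the freshly initialised head.
for p in backbone[2:].parameters():  # second conv block onwards
    p.requires_grad = True
optimizer = torch.optim.Adam([
    {"params": backbone[2:].parameters(), "lr": 1e-4},  # small LR: pretrained
    {"params": head.parameters(), "lr": 1e-3},          # larger LR: new head
])

x = torch.randn(4, 3, 32, 32)  # dummy batch of target-task images
logits = model(x)              # shape: (4, num_classes)
```

Passing parameter groups to the optimizer is the standard way to apply a smaller learning rate to pretrained layers than to the new head, which protects the transferred features from being destroyed early in training.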

Transfer learning has expanded beyond supervised ImageNet pretraining. Self-supervised pretraining methods like SimCLR, BYOL, DINO, and MoCo learn features from unlabelled images by solving pretext tasks. CLIP learns visual features aligned with natural language, enabling zero-shot transfer. In NLP, transfer learning is the dominant paradigm: BERT, GPT, and their successors are pretrained on huge text corpora and fine-tuned or prompted for specific tasks. Transfer learning has democratised state-of-the-art AI by making powerful pretrained models available to practitioners who could never train them from scratch.

Related terms: Fine-Tuning, Self-Supervised Learning, CLIP

Discussed in:

Also defined in: Textbook of AI, Textbook of Medical AI