Also known as: SSL
Self-Supervised Learning (SSL) is a paradigm in which a model generates its own supervisory signal from the structure of unlabelled data, sidestepping the need for human-annotated labels. The canonical example is next-token prediction in language modelling: given a sentence, the "label" is simply the next word, which is freely available in any text corpus. The model is trained as if it were supervised, but the supervision comes for free.
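The idea that the label "comes for free" can be sketched in a few lines of Python: from a raw token sequence alone, every prefix becomes an input and the token that follows it becomes the target. This is an illustrative toy, not a real training pipeline.

```python
# Minimal sketch: next-token prediction turns unlabelled text into
# (input, target) training pairs with no human annotation.
tokens = "the cat sat on the mat".split()

# Each prefix of the sequence is an input; the token after it is its "label".
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

for context, target in pairs:
    print(context, "->", target)
```

A language model is then trained, fully supervised in form, to predict `target` from `context`; the only ingredient is raw text.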
Self-supervised pretraining followed by task-specific fine-tuning has become the dominant recipe across modern AI. In NLP, BERT uses masked language modelling (predicting randomly masked tokens) and GPT uses next-token prediction; both produce representations that transfer to a wide range of downstream tasks. In computer vision, methods like SimCLR, BYOL, MoCo, and DINO learn visual features by solving pretext tasks such as predicting whether two augmented views come from the same image. CLIP takes a multimodal approach, training image and text encoders jointly to align image-caption pairs.
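The masked-language-modelling pretext task can likewise be sketched without any model: hide a random subset of tokens, and let the hidden originals serve as the targets. The masking rate and sentence below are illustrative choices (BERT masks roughly 15% of tokens; a higher rate is used here so this tiny example reliably masks something).

```python
import random

# Minimal sketch of the masked-language-modelling pretext task:
# corrupt the input by masking tokens; the targets are the originals.
random.seed(0)  # fixed seed so the example is reproducible

tokens = "self supervised learning needs no labels".split()
MASK = "[MASK]"
mask_rate = 0.3  # illustrative; BERT uses ~15%

inputs, targets = [], []
for tok in tokens:
    if random.random() < mask_rate:
        inputs.append(MASK)
        targets.append(tok)   # supervision extracted from the data itself
    else:
        inputs.append(tok)
        targets.append(None)  # no loss computed at unmasked positions

print(inputs)
print(targets)
```

A model such as BERT is trained to reconstruct the `None`-free entries of `targets` from the corrupted `inputs`, again using nothing but raw text.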
The power of self-supervised learning is that it unlocks training on effectively unlimited data. The internet contains trillions of words and billions of images, all unlabelled. By extracting supervision from the data itself, SSL has enabled the scaling laws that produced modern foundation models, and it has blurred the practical distinction between "unsupervised" and "supervised" learning: the data is unlabelled, but the training objective is fully supervised in form.
Related terms: Large Language Model, BERT, CLIP, Unsupervised Learning
Discussed in:
- Chapter 1: What Is AI? — Machine Learning Overview
Also defined in: Textbook of AI