Glossary

Transfer Learning (NLP)

In NLP, Transfer Learning refers to the dominant modern paradigm of pretraining a large model on generic text (via self-supervised objectives like masked or next-token prediction) and then fine-tuning or prompting it for specific downstream tasks. This approach, which became standard after BERT's 2018 debut, has transformed NLP by dramatically reducing the amount of task-specific labelled data required and by establishing a small set of general-purpose foundation models that are reused across hundreds of applications.
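The division of labour described above can be sketched in a few lines. This is a toy illustration, not a real model: the "pretrained encoder" is a single frozen random matrix standing in for a large language model's representation function, and all dimensions and names (`W_enc`, `W_head`, `encode`, `classify`) are hypothetical. The point is only that transfer learning reuses the large pretrained component unchanged and trains a much smaller task-specific head.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy dimensions for illustration (hypothetical, far smaller than any real LM).
d_model, vocab, n_classes = 16, 50, 3

# "Pretrained" encoder: a frozen matrix standing in for the representation
# function learned during large-scale self-supervised pretraining.
W_enc = rng.standard_normal((d_model, vocab)) * 0.1

def encode(one_hot_tokens):
    # Mean-pool per-token representations, as a stand-in for a real encoder.
    # one_hot_tokens has shape (n_tokens, vocab).
    return (W_enc @ one_hot_tokens.T).mean(axis=1)

# Downstream transfer: only this small task head is newly initialised and
# trained on labelled data; the encoder above is reused as-is.
W_head = np.zeros((n_classes, d_model))

def classify(one_hot_tokens):
    logits = W_head @ encode(one_hot_tokens)
    return int(np.argmax(logits))
```

Note the asymmetry in parameter counts: the head (`n_classes * d_model` weights) is tiny relative to the encoder, which is why fine-tuning needs so much less labelled data than training from scratch.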

The paradigm has several flavours. Full fine-tuning updates all model parameters on task-specific data. Parameter-efficient fine-tuning (PEFT) methods like LoRA, adapters, and prefix tuning update only a small fraction of parameters, enabling cheap specialisation of very large models. In-context learning bypasses fine-tuning entirely by providing task demonstrations in the prompt. Instruction tuning is a form of fine-tuning that teaches a base model to follow natural-language instructions, enabling zero-shot generalisation to unseen tasks.
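Of the PEFT methods named above, LoRA is the easiest to sketch: it freezes a pretrained weight matrix W and learns only a low-rank update (alpha/r) * B @ A, where A and B together have far fewer entries than W. The sketch below uses NumPy with made-up dimensions; the variable names and scaling convention follow the LoRA paper's formulation, but this is a minimal illustration, not a production implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration; r << d is the low-rank bottleneck.
d_in, d_out, r = 64, 64, 4
alpha = 8.0  # LoRA scaling factor

# Frozen pretrained weight matrix (never updated during fine-tuning).
W = rng.standard_normal((d_out, d_in)) * 0.02

# Trainable low-rank factors. B starts at zero, so the adapted model's
# output initially matches the pretrained model's output exactly.
A = rng.standard_normal((r, d_in)) * 0.02
B = np.zeros((d_out, r))

def lora_forward(x):
    """Forward pass: frozen base plus low-rank update (alpha/r) * B @ A @ x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B = 0, the LoRA path contributes nothing: output equals W @ x.
assert np.allclose(lora_forward(x), W @ x)
```

Only A and B receive gradients during fine-tuning, so the trainable parameter count is r * (d_in + d_out) rather than d_in * d_out, a large saving whenever r is small relative to the matrix dimensions.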

The transfer paradigm's success rests on the observation that language has deep structural regularities that a large pretrained model captures effectively. Syntax, semantics, world knowledge, and reasoning patterns all reside, in compressed form, in the parameters of a sufficiently large language model trained on a sufficiently diverse corpus. Fine-tuning on thousands of examples can efficiently specialise such a model for classification, question answering, summarisation, or translation—tasks that once required dedicated architectures and millions of labelled examples. Transfer learning in NLP has been perhaps the single most important force behind the AI boom of the early 2020s.

Related terms: Fine-Tuning, BERT, Large Language Model, Self-Supervised Learning

Also defined in: Textbook of AI