In-context learning (ICL) is the ability of large language models to perform new tasks from instructions or a few examples in their prompt, without any gradient updates to the model's parameters. The capability became prominent at scale with GPT-3 (Brown et al., 2020) and has since become a foundation of how LLMs are deployed in practice.
Standard ICL prompting formats include:
- Zero-shot: only the task description and the input, e.g. "Translate to French: Hello." → "Bonjour."
- One-shot: one demonstration (an input-output pair) followed by the input.
- Few-shot: several demonstration pairs followed by the input.
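The three formats differ only in how many demonstrations precede the query. The following is a minimal Python sketch. The `build_prompt` helper and the `Input:`/`Output:` template are hypothetical choices for illustration and do not come from any particular library or paper.

```python
from typing import Sequence, Tuple

Example = Tuple[str, str]  # (input, output) demonstration pair


def build_prompt(instruction: str, demos: Sequence[Example], query: str) -> str:
    """Assemble a plain-text ICL prompt (hypothetical template).

    len(demos) == 0 gives zero-shot, 1 gives one-shot, >1 gives few-shot.
    """
    parts = [instruction]
    for x, y in demos:
        parts.append(f"Input: {x}\nOutput: {y}")
    # The query reuses the same template but leaves the output blank
    # for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


demos = [("Hello.", "Bonjour."), ("Thank you.", "Merci."), ("Good night.", "Bonne nuit.")]
task = "Translate English to French."
zero_shot = build_prompt(task, [], "Good morning.")
one_shot = build_prompt(task, demos[:1], "Good morning.")
few_shot = build_prompt(task, demos, "Good morning.")
print(few_shot)
```

No parameters change between the three variants. Only the prompt text differs, which is what distinguishes ICL from fine-tuning.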
The mechanism by which ICL works is poorly understood and remains an active research area. Empirically, ICL tends to work better when demonstrations are:
- drawn from a coherent task distribution;
- formatted consistently;
- ordered deliberately, since performance can be sensitive to demonstration order and the best ordering (sometimes simplest first, sometimes hardest first) varies by task;
- written by capable humans rather than algorithmic generators.
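Consistent formatting and ordering by complexity can both be handled mechanically. The sketch below makes several assumptions. It uses input length in whitespace-separated words as a stand-in for "complexity", which is a hypothetical proxy. It also enumerates a few candidate orderings, because the best order varies by task and must be chosen empirically. Scoring those orderings on held-out data is left to the caller.

```python
import itertools
from typing import List, Sequence, Tuple

Example = Tuple[str, str]  # (input, output) demonstration pair

# One template for every demonstration keeps formatting consistent.
TEMPLATE = "Input: {x}\nOutput: {y}"


def order_by_complexity(demos: Sequence[Example], simplest_first: bool = True) -> List[Example]:
    # Hypothetical complexity proxy: number of whitespace-separated words in the input.
    return sorted(demos, key=lambda d: len(d[0].split()), reverse=not simplest_first)


def format_demos(demos: Sequence[Example]) -> str:
    return "\n\n".join(TEMPLATE.format(x=x, y=y) for x, y in demos)


def candidate_orderings(demos: Sequence[Example], limit: int = 6) -> List[Tuple[Example, ...]]:
    # The best order is task-dependent, so produce a few permutations
    # to compare on a held-out set.
    return list(itertools.islice(itertools.permutations(demos), limit))


demos = [
    ("I loved every minute of it.", "positive"),
    ("Awful.", "negative"),
    ("Not bad.", "positive"),
]
simple = order_by_complexity(demos)
hard = order_by_complexity(demos, simplest_first=False)
print(format_demos(simple))
```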
Several theoretical frameworks have been proposed. One treats ICL as implicit gradient descent (Akyürek et al., 2022; von Oswald et al., 2022). Another treats it as Bayesian inference over latent task variables (Xie et al., 2021). A third attributes it to specific attention circuits such as induction heads, which complete patterns seen earlier in the context (Olsson et al., 2022). The empirical reality probably involves all three, plus additional mechanisms not yet characterised.
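The implicit-gradient-descent view rests on a simple identity for linear regression. Take one gradient step from zero weights on the in-context examples, then predict for a query. The result equals an unnormalized linear-attention readout, with the query as the attention query, the demonstration inputs as keys, and their targets as values. The NumPy toy below checks that identity numerically. It is a simplified illustration of the idea, not a reproduction of the constructions in the cited papers. The dimensions, random data and learning rate `eta` are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 16
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))   # in-context inputs x_i (one per row)
y = X @ w_true                # in-context targets y_i
x_q = rng.normal(size=d)      # query input
eta = 0.1                     # learning rate (arbitrary)

# (a) One explicit gradient step on L(w) = 0.5 * sum_i (w . x_i - y_i)^2, starting from w0 = 0.
w0 = np.zeros(d)
grad = X.T @ (X @ w0 - y)     # equals -X^T y when w0 = 0
w1 = w0 - eta * grad
pred_gd = w1 @ x_q

# (b) Unnormalized linear attention: query x_q, keys x_i, values y_i.
scores = X @ x_q              # x_i . x_q for each demonstration
pred_attn = eta * scores @ y  # eta * sum_i y_i * (x_i . x_q)

print(pred_gd, pred_attn)
```

Both paths compute `eta * y^T X x_q`, so they agree up to floating-point rounding.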
ICL contrasts with fine-tuning, in which parameters are updated on task-specific data. Many tasks that previously required fine-tuning can now be handled with ICL. Fine-tuning still tends to perform better on tasks with substantial training data available, however, and parameter-efficient fine-tuning methods such as LoRA substantially reduce its cost.
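LoRA's savings come from training a low-rank update `B A` instead of the full weight matrix `W`. For a `d_out x d_in` layer with rank `r`, the adapter has `r * (d_out + d_in)` trainable parameters instead of `d_out * d_in`. The sketch below works through that arithmetic for one layer and shows a LoRA-style forward pass. It follows the common convention of a random `A`, a zero-initialized `B` and scaling by `alpha / r`. The layer sizes, `r = 8` and `alpha = 16` are illustrative assumptions, not values from this entry.

```python
import numpy as np


def full_finetune_params(d_out: int, d_in: int) -> int:
    # Every entry of the weight matrix is trainable.
    return d_out * d_in


def lora_params(d_out: int, d_in: int, r: int) -> int:
    # B is d_out x r and A is r x d_in; W itself stays frozen.
    return r * (d_out + d_in)


r = 8
print(full_finetune_params(4096, 4096))  # 16777216
print(lora_params(4096, 4096, r))        # 65536

# LoRA-style forward pass on a small layer: h = W x + (alpha / r) * B A x
rng = np.random.default_rng(0)
d_small = 64
W = rng.normal(size=(d_small, d_small))        # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d_small))  # trainable, random init
B = np.zeros((d_small, r))                     # trainable, zero init
alpha = 16.0


def lora_forward(x: np.ndarray) -> np.ndarray:
    return W @ x + (alpha / r) * (B @ (A @ x))


x = rng.normal(size=d_small)
# With B = 0, the adapted layer initially matches the frozen layer exactly.
print(np.allclose(lora_forward(x), W @ x))  # True
```

For a 4096 x 4096 layer at rank 8, this is a 256-fold reduction in trainable parameters. Initializing `B` at zero means training starts from the unmodified pretrained model.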
Related terms: GPT-3, Prompt Engineering
Discussed in:
- Chapter 15: Modern AI