A handful of input-output pairs in the prompt steers a frozen model toward a new task.
From Chapter 15: Modern AI
Glossary: in-context learning, few-shot learning
Transcript
A pre-trained language model. Frozen weights. We never gradient-update again.
How does it learn a new task? By being shown examples in the prompt.
Task: classify movie reviews as positive or negative.
Zero-shot prompt: "Review: this film is a masterpiece. Sentiment:" The model guesses, sometimes right, sometimes not.
One-shot prompt: prepend one labelled example. "Review: I hated every minute. Sentiment: negative. Review: this film is a masterpiece. Sentiment:" Now the model is more reliable.
Few-shot prompt: prepend three or five labelled examples. The accuracy keeps climbing.
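A minimal sketch of how these prompts are assembled. The `complete` call is a hypothetical stand-in for any frozen model's text-completion interface, and the labelled reviews beyond the one in the transcript are invented for illustration:

```python
# Labelled examples for the in-context demonstrations.
# Only the first is from the transcript; the rest are hypothetical.
labelled = [
    ("I hated every minute.", "negative"),
    ("An absolute joy from start to finish.", "positive"),
    ("Dull, predictable, and far too long.", "negative"),
]

def build_prompt(query: str, shots: int = 0) -> str:
    """Prepend `shots` labelled examples, then the unlabelled query."""
    lines = [f"Review: {text} Sentiment: {label}"
             for text, label in labelled[:shots]]
    lines.append(f"Review: {query} Sentiment:")
    return "\n".join(lines)

query = "this film is a masterpiece."
print(build_prompt(query, shots=0))   # zero-shot
print(build_prompt(query, shots=1))   # one-shot
print(build_prompt(query, shots=3))   # few-shot
# label = complete(build_prompt(query, shots=3))  # one forward pass, no training
```

The only thing that changes between the three settings is the prompt string; the model and its weights are identical in every call.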
The model has not been retrained. Its weights are unchanged. The labelled examples sit in its context window, and somehow the forward pass uses them to produce the right output for the test query.
What is happening inside? Researchers have shown that attention can implement a form of gradient descent on a small linear regression problem hidden in the prompt. On such tasks the transformer acts as a learning algorithm, in effect taking gradient steps inside its activations.
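A small demonstration of that equivalence, a toy construction rather than the transcript's own experiment: for linear regression starting from zero weights, the prediction after one gradient descent step is identical to an unnormalized linear-attention readout in which the in-context inputs are keys, their labels are values, and the test input is the query.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden linear task: y = w_true . x, revealed only through examples.
d, n = 4, 32
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))   # in-context inputs
y = X @ w_true                # in-context labels
x_q = rng.normal(size=d)      # test query

# One gradient step on L(w) = (1/2n) * sum_i (w.x_i - y_i)^2, from w = 0.
# The gradient at w = 0 is -(1/n) * sum_i y_i * x_i,
# so w_1 = (eta/n) * sum_i y_i * x_i.
eta = 1.0
w_1 = (eta / n) * (y @ X)
pred_gd = w_1 @ x_q

# The same prediction, written as unnormalized linear attention:
# query = x_q, keys = x_i, values = y_i.
attn_scores = X @ x_q                   # key-query dot products
pred_attn = (eta / n) * (y @ attn_scores)

print(pred_gd, pred_attn)               # identical up to floating point
assert np.isclose(pred_gd, pred_attn)
```

The point of the toy: an attention head already has the machinery (key-query dot products, value-weighted sums) to compute a gradient step's prediction, so "learning" can happen in the forward pass without touching the weights.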
In-context learning was one of the most surprising properties of GPT-3. It made foundation models suddenly useful for tasks no one had imagined at training time. Prompt engineering, retrieval-augmented generation, and tool use are all built on the same trick.