An induction head is a specific pattern of two interacting attention heads in a Transformer, identified by Anthropic's interpretability team (Olsson et al., 2022, "In-Context Learning and Induction Heads") as the simplest known mechanism by which Transformers perform in-context learning.
The pattern: two attention heads in adjacent layers cooperate to implement copy-from-earlier-context behaviour:
Layer $L$, head 1 (the previous-token head): attends one token back. Each position $t$ writes information about the token at position $t-1$ into the residual stream.
Layer $L+1$, head 2 (the induction head): given the current token $A$, attends to past positions whose previous token was also $A$ (using the information written by the previous-token head), and copies whatever followed that earlier occurrence. So after seeing the pattern "AB ... A", it predicts the next token will be "B".
The mechanism implements pattern completion: when the model sees a repeated context, it predicts the continuation that followed the same context earlier in the input.
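This behaviour suggests a simple head-level diagnostic, used in the original paper and in follow-up work: feed the model a block of random tokens repeated twice and measure how much attention mass each head places on the "token after the previous occurrence" position. A minimal NumPy sketch (the function name and interface are illustrative, not taken from any library):

```python
import numpy as np

def induction_score(attn: np.ndarray, rep_len: int) -> float:
    """Average attention mass on the induction offset.

    attn: (seq_len, seq_len) attention pattern for a single head, computed
    on a sequence of rep_len random tokens repeated twice
    (seq_len == 2 * rep_len). A perfect induction head at position t in the
    second repeat attends to position t - rep_len + 1: the token that
    FOLLOWED the previous occurrence of the current token.
    """
    positions = range(rep_len, 2 * rep_len)  # positions in the second repeat
    return float(np.mean([attn[t, t - rep_len + 1] for t in positions]))
```

Averaged over a batch of random repeated sequences, heads scoring near 1 are strong induction-head candidates; most heads score near 0.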
Why this matters:
Mechanistic clarity: induction heads are one of the cleanest examples of a complex behaviour (in-context learning) reducing to a specific, identifiable circuit. The two heads can be inspected, their attention patterns visualised, and their causal role verified by ablation.
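For instance, the ablation step can be sketched with the TransformerLens library; the hook name and activation shape below follow that library's conventions, and the layer/head indices are hypothetical placeholders, to be replaced with a head found by a diagnostic like the one above:

```python
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2-small")
LAYER, HEAD = 5, 5  # hypothetical candidate induction head

def ablate_head(z, hook):
    # hook_z has shape [batch, pos, head_index, d_head];
    # zero out this one head's output at every position.
    z[:, :, HEAD, :] = 0.0
    return z

tokens = model.to_tokens("A B C D A B C D A B C")  # repeated context
clean_loss = model(tokens, return_type="loss")
ablated_loss = model.run_with_hooks(
    tokens,
    return_type="loss",
    fwd_hooks=[(utils.get_act_name("z", LAYER), ablate_head)],
)
# If the head is causally responsible for copying, ablating it should
# raise loss on the repeated portion of the sequence.
print(clean_loss.item(), ablated_loss.item())
```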
Phase transition: induction heads form at a specific point during training, visible as an abrupt improvement in loss on tokens late in the context, coinciding with the emergence of in-context learning capability. Olsson et al. demonstrated this in small attention-only Transformers; two layers is the minimum depth, since the circuit needs the previous-token head's output from an earlier layer.
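Olsson et al. quantify this emergence with an in-context learning score: the loss at the 500th token of the context minus the loss at the 50th, so that more negative values mean the model benefits more from extra context. A minimal sketch (the array shape convention is an assumption):

```python
import numpy as np

def icl_score(per_token_losses: np.ndarray) -> float:
    """Olsson et al.'s in-context learning score: loss at the 500th token
    minus loss at the 50th (0-indexed as 499 and 49), averaged over a
    batch of per-token losses with shape (batch, seq_len), seq_len >= 500.
    A sharp move toward negative values during training marks the
    phase transition."""
    return float(per_token_losses[:, 499].mean() - per_token_losses[:, 49].mean())
```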
Generalisation to large models: induction-head-like patterns appear in production LLMs (analyses by Olsson, Conmy, Wang, Conerly and others). Larger models develop more sophisticated induction-like heads that handle abstraction, fuzzy matching, and multi-step copying.
Implications:
- In-context learning is at least partly explained by mechanistic copying.
- Capabilities like few-shot pattern completion may be largely induction-driven.
- More complex in-context learning (analogical reasoning, abstract patterns) likely involves additional circuits; induction heads appear necessary but not sufficient.
Connections:
- Modern Hopfield networks (Ramsauer et al., 2020) showed that attention is mathematically equivalent to associative-memory retrieval, suggesting induction heads are a form of content-addressable copying (the update rule is sketched after this list).
- Mesa-optimisation debate: some researchers argue induction heads are a primitive form of mesa-optimisation; others argue they're better described as pattern matching without explicit objectives.
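To make the Hopfield correspondence concrete, here is the retrieval update from Ramsauer et al. (notation follows their paper; identifying the stored patterns with keys and values is the standard reading, not something specific to induction heads):

$$\xi^{\text{new}} = X \,\mathrm{softmax}\!\left(\beta\, X^{\top} \xi\right)$$

where $X$ is the matrix of stored patterns and $\xi$ is the query state. Setting $Q = \xi$, $K = V = X$, and $\beta = 1/\sqrt{d_k}$ recovers one step of $\mathrm{softmax}(QK^{\top}/\sqrt{d_k})\,V$: attention retrieves the stored pattern most similar to the query, which mirrors the induction head's "find the earlier occurrence" step.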
Limitations of the picture: not every form of in-context learning is induction-head-driven. Tasks requiring abstraction, multi-step reasoning, or generation of novel responses involve different circuits that are only partially understood.
Related terms: Mechanistic Interpretability, In-Context Learning, christopher-olah, Attention Mechanism
Discussed in:
- Chapter 16: Ethics & Safety, AI Safety