An induction head is a specific pattern of two interacting attention heads in a Transformer, identified by Anthropic's interpretability team (Olsson et al., 2022, "In-Context Learning and Induction Heads") as the simplest known mechanism by which Transformers perform in-context learning.
The pattern: two attention heads in adjacent layers cooperate to implement copy-from-earlier-context behaviour:
Layer $L$, head 1 (the previous-token head): attends one token back. Each position $t$ writes information about the token at position $t-1$ into the residual stream.
Layer $L+1$, head 2 (the induction head): given the current token $A$, attends to past positions whose previous token was also $A$ (using the information written by the previous-token head), and copies whatever followed that earlier occurrence. So after seeing the pattern "AB ... A", it predicts the next token will be "B".
The mechanism implements pattern completion: when the model sees a repeated context, it predicts the continuation that followed the same context earlier in the input.
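This behaviour suggests a simple head-level diagnostic, used in the original paper and in follow-up work: feed the model a block of random tokens repeated twice and measure how much attention mass each head places on the "token after the previous occurrence" position. A minimal NumPy sketch (the function name and interface are illustrative, not taken from any library):

```python
import numpy as np

def induction_score(attn: np.ndarray, rep_len: int) -> float:
    """Average attention mass on the induction offset.

    attn: (seq_len, seq_len) attention pattern for a single head, computed
    on a sequence of rep_len random tokens repeated twice
    (seq_len == 2 * rep_len). A perfect induction head at position t in the
    second repeat attends to position t - rep_len + 1: the token that
    FOLLOWED the previous occurrence of the current token.
    """
    positions = range(rep_len, 2 * rep_len)  # positions in the second repeat
    return float(np.mean([attn[t, t - rep_len + 1] for t in positions]))
```

Averaged over a batch of random repeated sequences, heads scoring near 1 are strong induction-head candidates; most heads score near 0.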
Why this matters:
Mechanistic clarity: induction heads are one of the cleanest examples of a complex behaviour (in-context learning) reducing to a specific, identifiable circuit. The two heads can be inspected, their attention patterns visualised, and their causal role verified by ablation.
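For instance, the ablation step can be sketched with the TransformerLens library; the hook name and activation shape below follow that library's conventions, and the layer/head indices are hypothetical placeholders, to be replaced with a head found by a diagnostic like the one above:

```python
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2-small")
LAYER, HEAD = 5, 5  # hypothetical candidate induction head

def ablate_head(z, hook):
    # hook_z has shape [batch, pos, head_index, d_head];
    # zero out this one head's output at every position.
    z[:, :, HEAD, :] = 0.0
    return z

tokens = model.to_tokens("A B C D A B C D A B C")  # repeated context
clean_loss = model(tokens, return_type="loss")
ablated_loss = model.run_with_hooks(
    tokens,
    return_type="loss",
    fwd_hooks=[(utils.get_act_name("z", LAYER), ablate_head)],
)
# If the head is causally responsible for copying, ablating it should
# raise loss on the repeated portion of the sequence.
print(clean_loss.item(), ablated_loss.item())
```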
Phase transition: induction heads form at a specific point during training, visible as an abrupt improvement in loss on tokens late in the context, coinciding with the emergence of in-context learning capability. Olsson et al. demonstrated this in small attention-only Transformers; two layers is the minimum depth, since the circuit needs the previous-token head's output from an earlier layer.
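Olsson et al. quantify this emergence with an in-context learning score: the loss at the 500th token of the context minus the loss at the 50th, so that more negative values mean the model benefits more from extra context. A minimal sketch (the array shape convention is an assumption):

```python
import numpy as np

def icl_score(per_token_losses: np.ndarray) -> float:
    """Olsson et al.'s in-context learning score: loss at the 500th token
    minus loss at the 50th (0-indexed as 499 and 49), averaged over a
    batch of per-token losses with shape (batch, seq_len), seq_len >= 500.
    A sharp move toward negative values during training marks the
    phase transition."""
    return float(per_token_losses[:, 499].mean() - per_token_losses[:, 49].mean())
```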
Generalisation to large models: induction-head-like patterns appear in production LLMs (analyses by Olsson, Conmy, Wang, Conerly and others). Larger models develop more sophisticated induction-like heads that handle abstraction, fuzzy matching, and multi-step copying.
Implications:
- In-context learning is at least partly explained by mechanistic copying.
- Capabilities like few-shot pattern completion may be largely induction-driven.
- More complex in-context learning (analogical reasoning, abstract patterns) likely involves additional circuits; induction heads appear necessary but not sufficient.
Connections:
- Modern Hopfield networks (Ramsauer et al., 2020) showed that attention is mathematically equivalent to associative-memory retrieval, suggesting induction heads are a form of content-addressable copying (the update rule is sketched after this list).
- Mesa-optimisation debate: some researchers argue induction heads are a primitive form of mesa-optimisation; others argue they're better described as pattern matching without explicit objectives.
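To make the Hopfield correspondence concrete, here is the retrieval update from Ramsauer et al. (notation follows their paper; identifying the stored patterns with keys and values is the standard reading, not something specific to induction heads):

$$\xi^{\text{new}} = X \,\mathrm{softmax}\!\left(\beta\, X^{\top} \xi\right)$$

where $X$ is the matrix of stored patterns and $\xi$ is the query state. Setting $Q = \xi$, $K = V = X$, and $\beta = 1/\sqrt{d_k}$ recovers one step of $\mathrm{softmax}(QK^{\top}/\sqrt{d_k})\,V$: attention retrieves the stored pattern most similar to the query, which mirrors the induction head's "find the earlier occurrence" step.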
Limitations of the picture: not every form of in-context learning is induction-head-driven. Tasks requiring abstraction, multi-step reasoning, or generation of novel responses involve different circuits that are only partially understood.
Related terms: Mechanistic Interpretability, In-Context Learning, christopher-olah, Attention Mechanism
Discussed in:
- Chapter 16: Ethics & Safety, AI Safety