15.11 Chain-of-thought

Chain-of-thought (CoT) prompting (Wei et al. 2022) is one of those discoveries that, in retrospect, seems trivial. If you prompt the model with a few examples that show step-by-step reasoning before the final answer, the model produces step-by-step reasoning before its own final answer, and accuracy goes up dramatically on reasoning-heavy tasks. Kojima et al. (2022) later showed that zero-shot CoT, simply appending "Let's think step by step", is almost as good.
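
The gap between the two variants is a few lines of string assembly. Below is a minimal sketch: the worked example is the canonical tennis-ball problem from Wei et al., while the helper name and the test question are illustrative.

```python
# Few-shot CoT vs zero-shot CoT as plain string assembly. The worked example
# is the canonical one from Wei et al. (2022); everything else is illustrative.

FEW_SHOT_COT = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.

Q: {question}
A:"""

ZERO_SHOT_COT = "Q: {question}\nA: Let's think step by step."

def cot_prompt(question: str, few_shot: bool = True) -> str:
    """Build a CoT prompt: worked examples (few-shot) or the bare trigger phrase."""
    template = FEW_SHOT_COT if few_shot else ZERO_SHOT_COT
    return template.format(question=question)

print(cot_prompt("If there are 3 cars and 2 more arrive, how many cars are there?",
                 few_shot=False))
```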

The mechanism is that producing more tokens lets the model spend more compute: the intermediate tokens act as a scratchpad in which it can do steps it cannot do in a single forward pass. There is a literature on whether the CoT is "real" reasoning or post-hoc rationalisation, and the answer is some of both: small perturbations of the CoT can change the final answer, but ablations also show that an unfaithful CoT can still yield a correct answer, so the CoT is not always load-bearing.
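
Evaluation harnesses treat the CoT as exactly that: free-form scratchpad text from which only the final answer is scored. Below is a minimal sketch of the usual extraction step, assuming completions end with a "The answer is X" conclusion; the regex is illustrative, and real harnesses use task-specific extractors.

```python
import re

# Score only the conclusion; the intermediate tokens are free-form working
# space. The pattern below is an assumption about the answer format.

def extract_final_answer(cot: str) -> str | None:
    """Pull the answer out of a 'The answer is X.' style conclusion."""
    match = re.search(r"answer is\s*\$?(-?[\d.,]+)", cot, re.IGNORECASE)
    return match.group(1).rstrip(".") if match else None

cot = ("Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 "
       "tennis balls. 5 + 6 = 11. The answer is 11.")
assert extract_final_answer(cot) == "11"
```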

From CoT prompting to internalised CoT

Modern reasoning models (o1, o3, R1, Claude with extended thinking, Gemini Deep Think) internalise CoT. They are trained, via the GRPO-style verifiable-reward recipe of section 15.7, to produce CoT by default, often very long CoTs (10 000+ tokens), without needing to be prompted to do so. The CoT is sometimes hidden from the user (o1, where it is summarised) and sometimes shown (R1, Claude extended thinking).

Internalised CoT is not a separate technique but a consequence of training. The default behaviour produced by RL on verifiable rewards is "think a lot, check your work, then answer", so the explicit CoT prompting described above is largely unnecessary.
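
To make the link to section 15.7 concrete, here is a minimal sketch of the reward signal such a recipe optimises, assuming a maths task with an exact-match gold answer. It shows only the verifiable reward and the group-standardised advantage; the policy-gradient update itself is omitted, and the answer-matching regex is the same illustrative assumption as above.

```python
import re

# A toy verifiable reward for a GRPO-style recipe: the model may emit an
# arbitrarily long CoT, but only the extracted final answer is compared
# against the gold answer. "Think a lot, check, then answer" is simply the
# behaviour that maximises this signal.

def verifiable_reward(completion: str, gold: str) -> float:
    """1.0 if the final 'The answer is X' matches the gold answer, else 0.0."""
    match = re.search(r"answer is\s*(-?[\d.,]+)", completion, re.IGNORECASE)
    return 1.0 if match and match.group(1).rstrip(".") == gold else 0.0

# GRPO samples a group of completions per prompt and standardises each
# reward within the group to form an advantage.
group = [  # stand-in completions; during training these come from the policy
    "2 cans of 3 is 6, and 5 + 6 = 11. The answer is 11.",
    "5 plus 6 is 12. The answer is 12.",
    "Five and six make 11. The answer is 11.",
]
rewards = [verifiable_reward(c, "11") for c in group]
mean = sum(rewards) / len(rewards)
std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
advantages = [(r - mean) / (std + 1e-8) for r in rewards]
print(advantages)  # completions with the right answer get positive advantage
```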

CoT faithfulness

A worry: does the CoT reflect what the model is actually doing internally? Turpin et al. (2023) showed that biasing the few-shot examples can systematically change a model's CoT and final answer in ways the CoT does not acknowledge. The CoT is a story, not a trace.
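
The experimental design is easy to sketch. Assuming a multiple-choice task, bias the few-shot demos so the correct option is always labelled (A), then measure whether the model's answers drift toward (A) and whether the CoT ever admits the pattern. All names below are illustrative scaffolding, not Turpin et al.'s code.

```python
# A sketch of the Turpin et al. (2023) probe. biased_few_shot assumes the
# caller lists the correct option first in each demo, so the gold label is
# always (A); acknowledges_bias is a crude surface check on the CoT text.

def biased_few_shot(examples: list[tuple[str, list[str]]]) -> str:
    """Format multiple-choice demos whose gold answer is always (A)."""
    blocks = []
    for question, options in examples:
        opts = "\n".join(f"({chr(65 + i)}) {opt}" for i, opt in enumerate(options))
        blocks.append(f"Q: {question}\n{opts}\n"
                      f"A: Let's think step by step. ... The best answer is (A).")
    return "\n\n".join(blocks)

def acknowledges_bias(cot: str) -> bool:
    """Does the CoT ever mention that the demo answers are always (A)?"""
    return "always (a)" in cot.lower()
```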

Mechanistic interpretability research is trying to ground this: work on measuring CoT faithfulness through perturbation tests (Lanham et al., 2023), on tracing causal contributions from CoT tokens to final-answer logits, and on training models to be "more faithful" via objectives that penalise post-hoc rationalisation. As of April 2026, faithfulness is an open problem.
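
One such causal test is easy to state, if not to interpret: the "early answering" probe in the style of Lanham et al. (2023), which forces an answer after each CoT prefix and asks how early the final answer was already settled. The sketch below assumes a hypothetical answer_given_cot call that makes the model answer immediately after the supplied partial CoT.

```python
from typing import Callable

# Early-answering probe in the style of Lanham et al. (2023): if the answer
# is already fixed long before the CoT ends, the later steps were not
# load-bearing. answer_given_cot is a hypothetical stand-in for a model call
# that answers immediately after the given partial CoT.

def early_answering_curve(question: str, cot_steps: list[str],
                          answer_given_cot: Callable[[str, str], str]) -> list[str]:
    """Answer after each CoT prefix, from no reasoning to the full chain."""
    return [answer_given_cot(question, " ".join(cot_steps[:k]))
            for k in range(len(cot_steps) + 1)]

def settled_fraction(answers: list[str]) -> float:
    """Share of prefixes whose answer already equals the final answer.
    A value near 1.0 suggests the CoT was not load-bearing."""
    final = answers[-1]
    return sum(a == final for a in answers) / len(answers)
```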
