Chain-of-thought prompting

Last reviewed 5 May 2026

Asking a model to think step by step before answering improves accuracy on multi-step problems.

From the chapter: Chapter 15: Modern AI

Glossary: chain of thought, in-context learning

Transcript

Ask a language model: "Roger has five tennis balls. He buys two cans of three balls each. How many balls does he have?"

Without prompting, the model answers, "eight". Wrong.

Add the phrase "let's think step by step" to the prompt and watch what happens.

The model writes: "Roger starts with five balls. He buys two cans. Each can has three. Two times three is six. Five plus six is eleven. The answer is eleven."

Correct.
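The demo above boils down to one change in the prompt. Here is a minimal sketch of the two prompt variants; `direct_prompt`, `cot_prompt`, and `QUESTION` are illustrative names, and the actual model call is left out because any chat-completion API would do.

```python
# Zero-shot chain-of-thought prompting: the only difference between the two
# variants is a trailing trigger phrase that gives the model scratch space.

QUESTION = ("Roger has five tennis balls. He buys two cans of "
            "three balls each. How many balls does he have?")

def direct_prompt(question: str) -> str:
    # Forces an immediate answer: no room to lay out intermediate steps.
    return f"{question}\nAnswer with a single number."

def cot_prompt(question: str) -> str:
    # Appends the trigger phrase that elicits step-by-step reasoning.
    return f"{question}\nLet's think step by step."
```

Sending `cot_prompt(QUESTION)` instead of `direct_prompt(QUESTION)` to the same model is the entire intervention; nothing about the model itself changes.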

Chain-of-thought prompting unlocks a latent capability: the model has the arithmetic and reasoning steps inside it, but only when given room to lay them out token by token.

Why does it work? Because each token is a chance to compute. With a forced single-token answer, the model must collapse everything into one forward pass. Given dozens of tokens of scratch space, it can spread the work across many forward passes, each conditioning on the results of the previous steps.

The same trick works on math word problems, on code, on logical puzzles. On GSM8K, a benchmark of grade-school math word problems, chain-of-thought roughly doubled accuracy for large models.
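Benchmarks like GSM8K score a chain-of-thought response by pulling the final answer out of the free-form trace. A common convention is to take the last number mentioned; `extract_answer` is a hypothetical helper sketching that convention, not a specific benchmark's official scorer.

```python
import re

def extract_answer(trace: str):
    # Take the last integer in the trace as the model's final answer,
    # ignoring thousands separators. Returns None if no number appears.
    numbers = re.findall(r"-?\d+", trace.replace(",", ""))
    return int(numbers[-1]) if numbers else None

# A trace that states its steps in digits can be scored automatically:
trace = "Two times three is 6. Five plus 6 is 11. The answer is 11."
```

Here `extract_answer(trace)` yields 11, so the response would be marked correct against the reference answer.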

It is the simplest example of inference-time compute. More tokens of reasoning yield better answers, no model change required.
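One well-known way to spend even more inference-time compute is self-consistency: sample several independent reasoning traces and keep the most common final answer. A minimal sketch, assuming the per-trace answers have already been parsed out:

```python
from collections import Counter

def majority_answer(sampled_answers):
    # Self-consistency: more sampled traces means more tokens of reasoning,
    # and the majority vote over their final answers is usually more
    # reliable than any single trace.
    return Counter(sampled_answers).most_common(1)[0][0]

# Hypothetical final answers parsed from five sampled traces:
votes = [11, 11, 8, 11, 10]
```

With these votes, `majority_answer(votes)` returns 11: three of the five traces agree, outvoting the two that slipped.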

This insight is why models such as o1 and DeepSeek-R1 generate enormous chain-of-thought traces and show only a final summary. Reasoning is computation, and tokens are its substrate.

This site is currently in Beta. Contact: Chris Paton

AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).