Glossary

o1 / Reasoning Models

A reasoning model is a large language model trained to carry out extended chain-of-thought reasoning before producing its final answer. The model is fine-tuned, typically with reinforcement learning, to generate long "thinking" traces, sometimes thousands of tokens long, in which it explores the problem, considers alternatives and checks its work before committing to a final response. Some providers hide the raw trace (OpenAI's o1 shows users only a summary), while others, such as DeepSeek-R1, expose it in full; in every case the trace consumes compute at inference time.
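To make the split between hidden reasoning and visible answer concrete, here is a minimal Python sketch. It assumes the model wraps its reasoning in `<think>...</think>` tags, a convention resembling DeepSeek-R1's output format; other models use different or undisclosed formats. The names `split_reasoning` and `rough_token_count` are hypothetical, and the whitespace-based count is only a crude stand-in for a real tokenizer.

```python
import re

# Assumption: reasoning is enclosed in <think>...</think>, similar to the
# format DeepSeek-R1 emits. Many other models use different formats.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw_output: str) -> tuple[str, str]:
    """Split a raw completion into (reasoning_trace, visible_answer)."""
    match = THINK_RE.search(raw_output)
    if match is None:
        # No trace found: treat the whole output as the answer.
        return "", raw_output.strip()
    trace = match.group(1).strip()
    # Everything outside the tags is what a user would normally see.
    answer = (raw_output[:match.start()] + raw_output[match.end():]).strip()
    return trace, answer


def rough_token_count(text: str) -> int:
    # Crude approximation: real tokenizers split text differently.
    return len(text.split())


if __name__ == "__main__":
    raw = "<think>17 * 3 = 51. Check: 51 / 3 = 17. OK.</think>The answer is 51."
    trace, answer = split_reasoning(raw)
    print("shown to user:", answer)
    print("hidden trace tokens (approx.):", rough_token_count(trace))
```

For this input the user-facing answer is `The answer is 51.`, but the tokens of the self-check were still generated, so they cost inference compute even when only the answer is displayed.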

The first major reasoning model release was OpenAI's o1 (developed under the code name "Strawberry"), previewed in September 2024. In December 2024 Google added reasoning to its Gemini family with Gemini 2.0 Flash Thinking, and OpenAI announced o3 as the successor to o1. DeepSeek-R1 (January 2025) was a comparable reasoning model released with open weights and a technical report describing its training recipe, showing that the approach could be reproduced outside OpenAI. Anthropic's Claude 3.7 Sonnet introduced an "extended thinking" mode in February 2025.

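The training innovation behind the reasoning-model wave is reinforcement learning with verifiable rewards: the reward comes from an automatic check, such as whether a mathematical answer matches the reference, whether generated code passes its tests, or whether a formal proof verifies, rather than from a reward model trained to approximate human preferences. An automatic checker is harder to exploit than a learned reward model, so this reduces the reward hacking common in pure RLHF, although models can still game weak verifiers (for example, an incomplete test suite). The result is models that reason more reliably on problems whose solutions can be checked.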
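The sketch below illustrates what "verifiable" means in practice. It defines two hypothetical reward functions: `math_reward`, which compares the last `\boxed{...}` answer in a completion against a reference (the boxed-answer convention is an assumption about the prompt format), and `code_reward`, which gives reward only if a candidate function passes every unit test. Real training pipelines differ in detail and run model-written code in a sandbox rather than calling it directly as done here.

```python
import re
from fractions import Fraction
from typing import Callable, Optional, Sequence

# Assumption: the prompt asks the model to put its final answer in \boxed{...}.
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in the completion, if any."""
    matches = BOXED_RE.findall(completion)
    return matches[-1].strip() if matches else None


def math_reward(completion: str, reference: str) -> float:
    """1.0 if the boxed answer equals the reference, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0  # no parseable answer earns nothing
    try:
        # Compare numerically so that "1/2" and "0.5" count as the same answer.
        return 1.0 if Fraction(answer) == Fraction(reference) else 0.0
    except (ValueError, ZeroDivisionError):
        # Non-numeric answers fall back to an exact string match.
        return 1.0 if answer == reference.strip() else 0.0


def code_reward(candidate: Callable[..., object],
                tests: Sequence[tuple[tuple, object]]) -> float:
    """1.0 only if the candidate passes every (args, expected) test case."""
    if not tests:
        return 0.0  # an empty test suite verifies nothing
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return 0.0
        except Exception:
            return 0.0  # crashes count as failures
    return 1.0
```

The empty-suite guard in `code_reward` matters: a verifier that accepts any code when there are no tests is exactly the kind of weak check a policy can learn to exploit.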

Reasoning models have substantially advanced the state of the art in mathematics (for example, AIME and USAMO competition problems), competitive programming (Codeforces), formal mathematics in Lean, and scientific question answering. They have also raised new AI-safety questions: an extended chain of thought offers a window into the model's reasoning, but it may also contain alignment-relevant "thinking" that the user never sees. Inference-time scaling, that is, spending more compute per query, has emerged as a new axis of model capability that is separate from, and complementary to, training-time scaling.
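Inference-time scaling can be illustrated with one simple technique: majority voting over several independently sampled answers (often called self-consistency). This is not the mechanism inside a reasoning model's single long trace, but it shows the same trade of more compute per query for higher accuracy. The `noisy_solver` below is a hypothetical stand-in for a model that answers correctly 60% of the time and otherwise picks one of ten wrong answers; all parameters are illustrative assumptions.

```python
import random
from collections import Counter


def majority_vote(answers: list[str]) -> str:
    """Most common answer; Counter breaks ties by first occurrence."""
    return Counter(answers).most_common(1)[0][0]


def noisy_solver(rng: random.Random, correct: str, p_correct: float) -> str:
    # Hypothetical stand-in for one sampled model answer: correct with
    # probability p_correct, otherwise one of ten distinct wrong answers.
    if rng.random() < p_correct:
        return correct
    return f"wrong-{rng.randrange(10)}"


def accuracy(k: int, p_correct: float = 0.6, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of trials in which a majority vote over k samples is correct."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        samples = [noisy_solver(rng, "42", p_correct) for _ in range(k)]
        if majority_vote(samples) == "42":
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # More samples per query means more inference compute.
    for k in (1, 5, 15):
        print(f"k={k:2d} samples -> accuracy {accuracy(k):.3f}")
```

Under these assumptions, accuracy with one sample sits near 0.6, while voting over 15 samples is far higher, because the wrong answers are spread across many values and rarely outvote the correct one. If the errors were concentrated on a single wrong answer, the gain from voting would be smaller.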

Related terms: Chain-of-Thought, RLHF, Reinforcement Learning


This site is currently in Beta. Contact: Chris Paton



AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).