Glossary

o1's Hidden Chain of Thought

o1's hidden chain of thought refers to OpenAI's design decision, announced with the o1-preview release in September 2024, to hide the model's internal reasoning tokens from both API and ChatGPT users. Users see only a summary of the reasoning plus the final answer; the actual chain-of-thought tokens are billed for but never shown. The decision was, and remains, controversial, and it sets up the contrast with the visible thinking tokens of Claude 4 and DeepSeek-R1.

OpenAI's stated rationale combines three concerns. Safety: showing raw reasoning could leak unsafe intermediate steps that a final-answer filter would catch. Honesty research: the reasoning chain is meant as a workspace for the model to reason "as it likes", potentially admitting mistakes or considering bad options; surfacing this would create pressure to perform reasoning rather than to think, defeating the purpose. Competitive advantage / IP: the reasoning traces are the most valuable training-data product OpenAI has, and exposing them would enable competitors to distil o1 into open-source models trivially.

The third reason is the load-bearing one. Distillation from a reasoning model is straightforward: pose problems to the teacher, harvest the reasoning chains, and fine-tune a student on (problem → reasoning → answer) triples. DeepSeek-R1's release demonstrated this concretely: once R1 made its reasoning visible, multiple labs shipped distilled small models within days. By hiding the chain, OpenAI denied competitors that direct distillation route, forcing them to build RL reasoning pipelines from scratch instead.
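The distillation recipe above can be sketched in a few lines of Python. This is an illustrative sketch, not any lab's actual pipeline: `query_teacher` is a hypothetical stand-in for whatever API call returns the teacher's visible reasoning and final answer, and the `<think>…</think>` tagging is one common (but not universal) convention for marking reasoning in training data.

```python
def build_distillation_record(problem, reasoning, answer):
    """Format one (problem -> reasoning -> answer) triple as a
    supervised fine-tuning example for a student model."""
    return {
        "prompt": problem,
        # Train the student to emit the reasoning before the answer, so it
        # imitates the teacher's step-by-step process, not just its outputs.
        "completion": f"<think>{reasoning}</think>\n{answer}",
    }


def distill_dataset(problems, query_teacher):
    """Pose each problem to the teacher, harvest its visible chain of
    thought, and return fine-tuning records for the student.

    `query_teacher(problem)` is assumed to return a (reasoning, answer)
    pair -- which is exactly what a hidden chain of thought denies you.
    """
    records = []
    for problem in problems:
        reasoning, answer = query_teacher(problem)
        records.append(build_distillation_record(problem, reasoning, answer))
    return records
```

The key point is how little machinery this requires once the reasoning is visible; the entire pipeline collapses if `query_teacher` can only return the final answer.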

The costs of the decision are several. Trust and verifiability: users cannot inspect why o1 reached a conclusion, which matters in legal, medical, and scientific contexts. Debuggability: when o1 is wrong, there is no mid-chain artifact to diagnose. Auditability: red-teamers can only test inputs and outputs, not the reasoning workspace. Pricing transparency: users are billed for hidden tokens with limited visibility into what they bought.

OpenAI partially mitigated these concerns by providing a reasoning summary (a short, post-hoc paraphrase of the chain) and by exposing the reasoning token count for billing. The summaries are generated by a separate model and are deliberately less detailed than the underlying chain; they suffice for most product use cases but not for adversarial evaluation.

The decision has had competitive ripple effects. Anthropic took the opposite tack with Claude 4 extended thinking, exposing the reasoning chain by default; DeepSeek released R1's full chain-of-thought outputs; Google's Gemini 2.5 Thinking exposes its reasoning as well. The market has thus split between "transparent thinking" (Anthropic, DeepSeek, Google) and "opaque thinking" (OpenAI), and developer preference data appears to favour the transparent camp on most axes except raw frontier capability.

The hidden-chain debate also touches on the philosophical question of what a reasoning model's chain of thought is. If the chain is just a tool the model uses to compute a better answer, hiding it is fine. If the chain is part of what the user is paying for (a window onto the model's thought process, for the user's own learning or verification), hiding it is a major reduction in product value. Different use cases land on different sides of this question, which is why the market split is likely to persist.

Related terms: Chain-of-Thought, o1 / Reasoning Models, OpenAI o3, Claude 4 Family, DeepSeek R1-Zero, Visible vs Hidden Thinking Tokens, Process Supervision
