Tree of Thoughts (ToT), introduced by Yao et al. (2023) in "Tree of Thoughts: Deliberate Problem Solving with Large Language Models", generalises chain-of-thought prompting from a linear sequence of thoughts to a tree, enabling search and back-tracking.
The four moves
A ToT system is defined by four LLM-implemented operations:
- Thought decomposition: break the problem into intermediate thought steps.
- Thought generator $G(p, s, k)$: given prompt $p$ and partial state $s$, propose $k$ candidate next thoughts.
- State evaluator $V(p, S)$: score each state in the candidate set $S$, e.g. with a numeric value (1–10) or a vote among candidates.
- Search algorithm: BFS, DFS, beam search, or A*.
Pseudocode
```python
def tot(problem, max_depth, k_branches, k_keep):
    """Breadth-first ToT: expand, score, and keep the best k_keep states."""
    frontier = [State(problem)]            # start from the problem statement
    for depth in range(max_depth):
        candidates = []
        for s in frontier:
            # Thought generator G: propose k_branches next thoughts per state.
            for thought in propose(s, k_branches):
                candidates.append(s.extend(thought))
        scored = evaluate(candidates)      # state evaluator V scores each state
        frontier = top_k(scored, k_keep)   # beam: keep only the k_keep best
    return best(frontier)
```
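As a self-contained illustration, here is the same loop run end-to-end on a toy problem, with deterministic stand-ins for the LLM-backed propose and evaluate steps (all names and the task are illustrative, not from the paper):

```python
def toy_tot(target=10, max_depth=4, k_branches=3, k_keep=2):
    """Beam-search ToT over a toy task: pick digits whose sum hits `target`.

    propose = append a digit 1..k_branches; evaluate = closeness to target.
    Both are deterministic stand-ins for what would be LLM calls in real ToT.
    """
    def score(state):
        return -abs(sum(state) - target)   # higher is better; 0 means solved

    frontier = [()]                        # start from the empty state
    for _ in range(max_depth):
        candidates = []
        for s in frontier:
            for digit in range(1, k_branches + 1):   # "thought generator"
                candidates.append(s + (digit,))
        # "State evaluator" + beam: keep the k_keep highest-scoring states.
        frontier = sorted(candidates, key=score, reverse=True)[:k_keep]
    return max(frontier, key=score)

print(toy_tot())   # (3, 3, 3, 1), which sums to 10
```

The beam (`k_keep`) is what distinguishes this from exhaustive search: weak partial states are discarded at every level, so total work stays linear in depth rather than exponential.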
Tasks demonstrated
Yao et al. evaluated ToT on:
- Game of 24: make 24 from four numbers using +, −, ×, ÷. ToT with GPT-4 solves 74% of games vs 4% for CoT.
- Creative writing: write a coherent four-paragraph passage whose paragraphs end in four given sentences.
- 5×5 mini-crosswords: ToT solves far more words and puzzles than CoT.
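For reference, the Game of 24 task itself is small enough to check exhaustively. The sketch below is a plain brute-force feasibility checker, not ToT, just a way to see what the task demands (the function name is illustrative):

```python
OPS = [
    lambda a, b: a + b,
    lambda a, b: a - b,
    lambda a, b: a * b,
    lambda a, b: a / b if abs(b) > 1e-9 else None,  # guard division by zero
]

def solvable_24(nums, target=24.0):
    """Exhaustively combine ordered pairs until one number remains."""
    if len(nums) == 1:
        return abs(nums[0] - target) < 1e-6
    for i in range(len(nums)):
        for j in range(len(nums)):
            if i == j:
                continue
            rest = [nums[k] for k in range(len(nums)) if k not in (i, j)]
            for op in OPS:
                r = op(nums[i], nums[j])
                if r is not None and solvable_24(rest + [r], target):
                    return True
    return False

print(solvable_24([4, 9, 10, 13]))  # True: (13 - 9) * (10 - 4) = 24
print(solvable_24([1, 1, 1, 1]))    # False: no expression reaches 24
```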
Search variants
| Search | When |
|---|---|
| BFS | Shallow trees, want global best |
| DFS | Deep trees, want first feasible |
| Beam | Quality/cost tradeoff |
| MCTS | When state evaluator is cheap, branching wide |
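The BFS loop shown earlier can be swapped for DFS when the first feasible solution suffices. The sketch below adds value-threshold pruning in the spirit of the paper's DFS variant, which abandons subtrees the evaluator scores too low (the helper names and the toy task are illustrative):

```python
def dfs_tot(state, propose, score, is_solution, max_depth, threshold, depth=0):
    """Depth-first ToT: return the first feasible state, pruning weak branches."""
    if is_solution(state):
        return state
    if depth == max_depth:
        return None
    # Visit the most promising candidates first; skip those scored
    # below the threshold by the evaluator.
    for cand in sorted(propose(state), key=score, reverse=True):
        if score(cand) < threshold:
            continue                      # prune this subtree
        found = dfs_tot(cand, propose, score, is_solution,
                        max_depth, threshold, depth + 1)
        if found is not None:
            return found
    return None                           # dead end: backtrack

# Toy usage: grow a bit-string toward the target "101".
target = "101"
result = dfs_tot(
    "",
    propose=lambda s: [s + "0", s + "1"],
    score=lambda s: sum(a == b for a, b in zip(s, target)),
    is_solution=lambda s: s == target,
    max_depth=3,
    threshold=0,
)
print(result)  # "101"
```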
Trade-offs
Pros:
- Solves problems that pure CoT cannot.
- Provides explicit deliberation ("System 2" thinking).
Cons:
- Cost: $k$ branches × depth × beam width multiplies LLM calls. Game of 24 costs ~10× a CoT solution.
- Evaluator quality is the bottleneck: a wrong score prunes the correct branch and derails the search.
- Branching explosion on open-ended tasks.
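The cost multiplier is easy to make concrete. With the BFS loop from the pseudocode, each level issues one generation call per proposed thought and one evaluation call per candidate (illustrative accounting; real implementations batch proposals into fewer prompts):

```python
def tot_call_count(max_depth, k_keep, k_branches):
    """Rough LLM-call count for the BFS ToT loop (illustrative accounting)."""
    calls, frontier = 0, 1                    # frontier starts with one state
    for _ in range(max_depth):
        candidates = frontier * k_branches
        calls += candidates                   # one generation call per thought
        calls += candidates                   # one evaluation call per candidate
        frontier = min(k_keep, candidates)    # beam caps the frontier
    return calls

print(tot_call_count(3, 5, 5))  # 110 calls, vs a single call for plain CoT
```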
Relationship
- Chain-of-Thought is ToT with branching factor 1.
- Self-reflection is ToT with depth 1 + critique.
- Graph of Thoughts generalises ToT to arbitrary DAGs.
- Reasoning models (o1, DeepSeek R1) are trained to do ToT-like search internally without explicit prompting.
Modern relevance
ToT prompting is rarely used directly in production; its cost is brutal. But its conceptual influence is enormous: the entire 2024–2025 generation of reasoning models is built on the insight that deliberate search over reasoning paths improves correctness. The search has moved from prompt-time to training-time.
Citation
Yao, S. et al. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models. NeurIPS 2023. arXiv:2305.10601.
Related terms: Chain-of-Thought, Graph of Thoughts, Self-Reflection, o1 / Reasoning Models, ReAct
Discussed in:
- Chapter 15: Modern AI