Tree of Thoughts (ToT), introduced by Yao et al. (2023) in "Tree of Thoughts: Deliberate Problem Solving with Large Language Models", generalises chain-of-thought prompting from a linear sequence of thoughts to a tree, enabling search and back-tracking.
The four moves
A ToT system is defined by four LLM-implemented operations:
- Thought decomposition: break the problem into intermediate thought steps.
- Thought generator $G(p, s, k)$: given prompt $p$ and partial state $s$, propose $k$ candidate next thoughts.
- State evaluator $V(p, S)$: score each state in the candidate set $S$, e.g. with a numeric value (1–10) or a vote among candidates.
- Search algorithm: BFS, DFS, beam search, or A*.
Pseudocode
```python
def tot(problem, max_depth, k_branches, k_keep):
    """Breadth-first ToT: expand, score, and keep the best k_keep states."""
    frontier = [State(problem)]            # start from the problem statement
    for depth in range(max_depth):
        candidates = []
        for s in frontier:
            # Thought generator G: propose k_branches next thoughts per state.
            for thought in propose(s, k_branches):
                candidates.append(s.extend(thought))
        scored = evaluate(candidates)      # state evaluator V scores each state
        frontier = top_k(scored, k_keep)   # beam: keep only the k_keep best
    return best(frontier)
```
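As a self-contained illustration, here is the same loop run end-to-end on a toy problem, with deterministic stand-ins for the LLM-backed propose and evaluate steps (all names and the task are illustrative, not from the paper):

```python
def toy_tot(target=10, max_depth=4, k_branches=3, k_keep=2):
    """Beam-search ToT over a toy task: pick digits whose sum hits `target`.

    propose = append a digit 1..k_branches; evaluate = closeness to target.
    Both are deterministic stand-ins for what would be LLM calls in real ToT.
    """
    def score(state):
        return -abs(sum(state) - target)   # higher is better; 0 means solved

    frontier = [()]                        # start from the empty state
    for _ in range(max_depth):
        candidates = []
        for s in frontier:
            for digit in range(1, k_branches + 1):   # "thought generator"
                candidates.append(s + (digit,))
        # "State evaluator" + beam: keep the k_keep highest-scoring states.
        frontier = sorted(candidates, key=score, reverse=True)[:k_keep]
    return max(frontier, key=score)

print(toy_tot())   # (3, 3, 3, 1), which sums to 10
```

The beam (`k_keep`) is what distinguishes this from exhaustive search: weak partial states are discarded at every level, so total work stays linear in depth rather than exponential.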
Tasks demonstrated
Yao et al. evaluated ToT on:
- Game of 24: make 24 from four numbers using +, −, ×, ÷. ToT with GPT-4 solves 74% of games vs 4% for CoT.
- Creative writing: write a coherent four-paragraph passage whose paragraphs end in four given sentences.
- 5×5 mini-crosswords: ToT solves far more words and puzzles than CoT.
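For reference, the Game of 24 task itself is small enough to check exhaustively. The sketch below is a plain brute-force feasibility checker, not ToT, just a way to see what the task demands (the function name is illustrative):

```python
OPS = [
    lambda a, b: a + b,
    lambda a, b: a - b,
    lambda a, b: a * b,
    lambda a, b: a / b if abs(b) > 1e-9 else None,  # guard division by zero
]

def solvable_24(nums, target=24.0):
    """Exhaustively combine ordered pairs until one number remains."""
    if len(nums) == 1:
        return abs(nums[0] - target) < 1e-6
    for i in range(len(nums)):
        for j in range(len(nums)):
            if i == j:
                continue
            rest = [nums[k] for k in range(len(nums)) if k not in (i, j)]
            for op in OPS:
                r = op(nums[i], nums[j])
                if r is not None and solvable_24(rest + [r], target):
                    return True
    return False

print(solvable_24([4, 9, 10, 13]))  # True: (13 - 9) * (10 - 4) = 24
print(solvable_24([1, 1, 1, 1]))    # False: no expression reaches 24
```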
Search variants
| Search | When |
|---|---|
| BFS | Shallow trees, want global best |
| DFS | Deep trees, want first feasible |
| Beam | Quality/cost tradeoff |
| MCTS | When state evaluator is cheap, branching wide |
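The BFS loop shown earlier can be swapped for DFS when the first feasible solution suffices. The sketch below adds value-threshold pruning in the spirit of the paper's DFS variant, which abandons subtrees the evaluator scores too low (the helper names and the toy task are illustrative):

```python
def dfs_tot(state, propose, score, is_solution, max_depth, threshold, depth=0):
    """Depth-first ToT: return the first feasible state, pruning weak branches."""
    if is_solution(state):
        return state
    if depth == max_depth:
        return None
    # Visit the most promising candidates first; skip those scored
    # below the threshold by the evaluator.
    for cand in sorted(propose(state), key=score, reverse=True):
        if score(cand) < threshold:
            continue                      # prune this subtree
        found = dfs_tot(cand, propose, score, is_solution,
                        max_depth, threshold, depth + 1)
        if found is not None:
            return found
    return None                           # dead end: backtrack

# Toy usage: grow a bit-string toward the target "101".
target = "101"
result = dfs_tot(
    "",
    propose=lambda s: [s + "0", s + "1"],
    score=lambda s: sum(a == b for a, b in zip(s, target)),
    is_solution=lambda s: s == target,
    max_depth=3,
    threshold=0,
)
print(result)  # "101"
```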
Trade-offs
Pros:
- Solves problems that pure CoT cannot.
- Provides explicit deliberation ("System 2" thinking).
Cons:
- Cost: $k$ branches × depth × beam width multiplies LLM calls. Game of 24 costs ~10× a CoT solution.
- Evaluator quality is the bottleneck: a wrong score prunes the correct branch and derails the search.
- Branching explosion on open-ended tasks.
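The cost multiplier is easy to make concrete. With the BFS loop from the pseudocode, each level issues one generation call per proposed thought and one evaluation call per candidate (illustrative accounting; real implementations batch proposals into fewer prompts):

```python
def tot_call_count(max_depth, k_keep, k_branches):
    """Rough LLM-call count for the BFS ToT loop (illustrative accounting)."""
    calls, frontier = 0, 1                    # frontier starts with one state
    for _ in range(max_depth):
        candidates = frontier * k_branches
        calls += candidates                   # one generation call per thought
        calls += candidates                   # one evaluation call per candidate
        frontier = min(k_keep, candidates)    # beam caps the frontier
    return calls

print(tot_call_count(3, 5, 5))  # 110 calls, vs a single call for plain CoT
```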
Relationship
- Chain-of-Thought is ToT with branching factor 1.
- Self-reflection is ToT with depth 1 + critique.
- Graph of Thoughts generalises ToT to arbitrary DAGs.
- Reasoning models (o1, DeepSeek R1) are trained to do ToT-like search internally without explicit prompting.
Modern relevance
ToT prompting is rarely used directly in production; its cost is brutal. But its conceptual influence is enormous: the entire 2024–2025 generation of reasoning models is built on the insight that deliberate search over reasoning paths improves correctness. The search has moved from prompt-time to training-time.
Citation
Yao, S. et al. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models. NeurIPS 2023. arXiv:2305.10601.
Related terms: Chain-of-Thought, Graph of Thoughts, Self-Reflection, o1 / Reasoning Models, ReAct
Discussed in:
- Chapter 15: Modern AI