Glossary

Tree of Thoughts

Tree of Thoughts (ToT), introduced by Yao et al. (2023) in "Tree of Thoughts: Deliberate Problem Solving with Large Language Models", generalises chain-of-thought prompting from a linear sequence of thoughts to a tree, enabling search and back-tracking.

The four moves

A ToT system is defined by four LLM-implemented operations:

  1. Thought decomposition: break the problem into intermediate steps.
  2. Thought generator $G(p, s, k)$: given prompt $p$ and partial state $s$, propose $k$ next-thought candidates.
  3. State evaluator $V(p, S)$: score each candidate state in $S$, e.g. with a scalar value (1–10) or with a vote among candidates.
  4. Search algorithm: BFS, DFS, beam search, or A*.

Pseudocode

def tot(problem, max_depth, k_branches, k_keep):
    frontier = [State(problem)]                      # beam of partial solutions
    for depth in range(max_depth):
        candidates = []
        for s in frontier:
            for thought in propose(s, k_branches):   # thought generator G
                candidates.append(s.extend(thought))
        scored = evaluate(candidates)                # state evaluator V
        frontier = top_k(scored, k_keep)             # keep the k_keep best states
    return best(frontier)

Tasks demonstrated

Yao et al. evaluated ToT on:

  • Game of 24: make 24 from four numbers using +, −, ×, ÷. ToT (GPT-4): 74% vs CoT (GPT-4): 4%.
  • Creative writing: produce a coherent 4-paragraph story under constraints.
  • 5×5 mini-crosswords: far better than CoT.
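The beam-search pseudocode above can be made concrete on Game of 24 by swapping the LLM for programmatic stand-ins: a `propose` that merges two numbers with one arithmetic operation, and a `value` heuristic that prefers states containing a number near 24. A minimal sketch; the helper names and the distance-to-24 heuristic are illustrative, not from the paper, which uses LLM calls for both roles:

```python
from fractions import Fraction
from itertools import combinations

def propose(nums):
    """Thought generator: every state reachable by combining two
    numbers with one of +, -, *, /."""
    out = []
    for i, j in combinations(range(len(nums)), 2):
        a, b = nums[i], nums[j]
        rest = [n for k, n in enumerate(nums) if k not in (i, j)]
        results = {a + b, a - b, b - a, a * b}
        if b != 0:
            results.add(a / b)
        if a != 0:
            results.add(b / a)
        out.extend(tuple(sorted(rest + [r])) for r in results)
    return out

def value(nums):
    """State evaluator: greedy heuristic rewarding states that hold
    a number close to 24 (an LLM plays this role in real ToT)."""
    return -min(abs(n - 24) for n in nums)

def game24_tot(nums, k_keep=5):
    frontier = [tuple(Fraction(n) for n in nums)]
    for _ in range(len(nums) - 1):      # each level merges two numbers
        candidates = [c for s in frontier for c in propose(s)]
        frontier = sorted(candidates, key=value, reverse=True)[:k_keep]
    return any(s[0] == 24 for s in frontier)

print(game24_tot([1, 2, 3, 4]))   # True: 1 * 2 * 3 * 4 = 24
```

Note that this greedy stand-in evaluator can prune the winning path on harder instances, which is exactly why evaluator quality matters so much in ToT.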

Search variants

Search   When to use
BFS      Shallow trees; want the global best
DFS      Deep trees; want the first feasible solution
Beam     Quality/cost trade-off
MCTS     Cheap state evaluator; wide branching
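As a sketch of the DFS row, in the same pseudocode style as the listing above (here `evaluate_one` is an assumed single-state scorer, and `is_solution` an assumed goal test): explore one branch at a time, prune states the evaluator scores below a threshold, and backtrack on dead ends.

```
def tot_dfs(state, max_depth, k_branches, threshold):
    if is_solution(state):
        return state
    if state.depth == max_depth:
        return None
    for thought in propose(state, k_branches):
        child = state.extend(thought)
        if evaluate_one(child) < threshold:
            continue                    # prune unpromising branch
        result = tot_dfs(child, max_depth, k_branches, threshold)
        if result is not None:
            return result               # return the first feasible solution
    return None                         # dead end: backtrack
```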

Trade-offs

Pro:

  • Solves problems that pure CoT cannot.
  • Provides explicit deliberation ("System 2" thinking).

Con:

  • Cost: at every level, $k$-keep retained states × $k$ branches must each be generated and scored, so LLM calls multiply with depth. Game of 24 costs ~10× a CoT solution.
  • Evaluator quality is the limiting factor: a wrong score prunes the correct path and derails the search.
  • Branching explosion for open-ended tasks.
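The cost point can be made concrete with a back-of-envelope counter, assuming one LLM call per proposed thought and one per evaluation (batching and voting schemes change the constants, and the first level has only one state, so this is a rough upper bound):

```python
def tot_llm_calls(depth, k_keep, k_branches):
    """Rough upper bound for the beam-search loop: at each level,
    every kept state proposes k_branches thoughts (one call each),
    and every resulting candidate is scored (one call each)."""
    generation = k_keep * k_branches
    evaluation = k_keep * k_branches
    return depth * (generation + evaluation)

print(tot_llm_calls(4, 5, 5))  # 200 calls, vs. 1 for a single CoT sample
```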

Modern relevance

ToT prompting is rarely used directly in production: its call cost is prohibitive. But its conceptual influence is enormous: the entire 2024–2025 generation of reasoning models is built on the insight that deliberate search over reasoning paths improves correctness. The search has simply moved from prompt time to training time.

Citation

Yao, S. et al. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models. NeurIPS 2023. arXiv:2305.10601.

Related terms: Chain-of-Thought, Graph of Thoughts, Self-Reflection, o1 / Reasoning Models, ReAct
