Graph of Thoughts (GoT) (Besta et al., ETH Zürich, 2023) generalises Tree of Thoughts by allowing thoughts to be combined, refined, and looped rather than only branched. The reasoning structure is a directed acyclic graph (DAG); thoughts are vertices, and edges represent dependencies between thoughts.
Why a graph?
Tree of Thoughts cannot naturally express:
- Aggregation: merging insights from two sibling branches.
- Refinement: improving a single thought via self-reflection.
- Reuse: referencing a thought in multiple downstream branches.
GoT models all three by giving thoughts arbitrary in-/out-degree.
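The distinction can be made concrete with a small sketch (my own illustration, not code from the paper): thoughts are vertices whose in- and out-degree is unrestricted, so aggregation (in-degree 2), refinement, and reuse (out-degree 2) all fall out of the data structure.

```python
# Hypothetical sketch: a thought-DAG where vertices may have any
# number of parents (aggregation) or children (reuse), unlike a tree.
from dataclasses import dataclass, field

@dataclass
class Thought:
    text: str
    parents: list = field(default_factory=list)  # in-edges: dependencies

root = Thought("problem statement")
a = Thought("branch A", parents=[root])
b = Thought("branch B", parents=[root])

# Aggregation: in-degree 2, impossible in a tree
merged = Thought("A and B combined", parents=[a, b])

# Refinement: an improved version of an existing thought
refined = Thought("improved merge", parents=[merged])

# Reuse: 'a' also feeds a second downstream thought (out-degree 2)
c = Thought("another use of A", parents=[a])
```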
Operations
Besta et al. define five graph operations:
| Operation | Effect |
|---|---|
| Generate | Create $k$ child thoughts |
| Refine | Improve a thought in-place |
| Aggregate | Merge $n$ thoughts into one |
| Score | Evaluate a thought |
| KeepBest | Prune to top-$k$ |
A Graph of Operations (GoO) is a static schedule of these operations that the developer designs for the task; the LLM then executes each operation step.
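A GoO can be pictured as an ordered composition of the five operations over a frontier of thoughts. The sketch below is an assumption-laden illustration: each operation is a deterministic stand-in for what would be an LLM call, and the scoring heuristic is a placeholder.

```python
# Hypothetical GoO sketch: each function stands in for an LLM-backed
# operation; a real system would prompt the model at every step.

def generate(thoughts, k):
    # Generate: create k child thoughts per input thought
    return [f"{t}/child{i}" for t in thoughts for i in range(k)]

def refine(thoughts):
    # Refine: improve each thought in place
    return [f"{t}(refined)" for t in thoughts]

def aggregate(thoughts):
    # Aggregate: merge n thoughts into one
    return [" + ".join(thoughts)]

def score(t):
    # Score: placeholder heuristic (the paper uses LLM or exact scoring)
    return len(t)

def keep_best(thoughts, k):
    # KeepBest: prune the frontier to the top-k by score
    return sorted(thoughts, key=score, reverse=True)[:k]

# The GoO itself is just the developer-designed order of operations:
frontier = ["root"]
frontier = generate(frontier, k=2)   # branch
frontier = refine(frontier)          # improve each branch
frontier = aggregate(frontier)       # merge insights
frontier = keep_best(frontier, k=1)  # prune
```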
Example: sorting
For sorting a list of 64 numbers (which GPT-4 cannot do reliably), Besta et al. structure the GoT as:
- Split the list into 4 chunks (Generate).
- Sort each chunk (Generate, parallel).
- Merge pairs of sorted chunks (Aggregate, twice).
- Score the result (Score).
- Refine if score is low (Refine).
Result: 70% accuracy on 64-number sorting vs 24% for CoT and 28% for ToT.
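The schedule above can be mimicked end to end with deterministic stand-ins for each LLM call (hypothetical helpers; the actual system prompts GPT-4 at every operation, which is what makes Score and Refine necessary):

```python
# Sketch of the GoT sorting schedule with deterministic stand-ins
# for the LLM-backed operations.
import random

def split(xs, n_chunks):
    # Generate: split the list into n_chunks pieces
    size = len(xs) // n_chunks
    return [xs[i * size:(i + 1) * size] for i in range(n_chunks)]

def sort_chunk(chunk):
    # Generate: stands in for a "sort this chunk" prompt
    return sorted(chunk)

def merge(a, b):
    # Aggregate: merge two sorted lists into one
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def score(xs):
    # Score: count adjacent inversions (0 means fully sorted)
    return sum(xs[i] > xs[i + 1] for i in range(len(xs) - 1))

nums = random.sample(range(1000), 64)
chunks = [sort_chunk(c) for c in split(nums, 4)]      # sort 4 chunks in parallel
halves = [merge(chunks[0], chunks[1]),                # Aggregate, round 1
          merge(chunks[2], chunks[3])]
result = merge(halves[0], halves[1])                  # Aggregate, round 2
# score(result) == 0 here, so no Refine pass is triggered
```

With an LLM doing the chunk sorts, Score would catch mistakes and route the faulty thought through Refine; the deterministic version never needs it.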
Other tasks
- Set intersection (62% reduction in cost vs ToT at same quality).
- Keyword counting in long documents.
- Document merging (multi-document summarisation).
Trade-offs
Pro:
- Strictly more general than ToT.
- Aggregation is the killer feature for tasks involving combination (merging, intersection, comparison).
Con:
- Complexity: designing the GoO requires task-specific engineering.
- Cost: ToT-level expense, multiplied across every operation in the schedule.
- Less mainstream adoption than ToT.
Modern relevance
GoT is rarely used in production for the same reason ToT is not: it is too expensive. But its insight that reasoning has graph structure informs modern thinking. Frameworks like LangGraph make a graph of LLM calls a first-class abstraction, and AlphaProof's lemma-graph search is a GoT-style structure trained into the model.
Relationship
- Generalises Tree of Thoughts (a tree is the special case of a DAG in which every thought has exactly one parent).
- Conceptually related to self-reflection (the Refine operation).
- Influenced LangGraph and other graph-based agent frameworks (multi-agent orchestration).
Citation
Besta, M. et al. (2023). Graph of Thoughts: Solving Elaborate Problems with Large Language Models. AAAI 2024. arXiv:2308.09687.
Related terms: Tree of Thoughts, Chain-of-Thought, Self-Reflection, o1 / Reasoning Models
Discussed in:
- Chapter 15: Modern AI