Self-consistency (Wang et al. 2022) is a simple but effective technique for improving LLM reasoning quality. Rather than greedily decoding a single chain-of-thought (CoT) trace, sample $N$ diverse CoT traces at a nonzero temperature and take the majority answer across them.
Algorithm:
- Prompt the model with a question + CoT instruction.
- Sample $N$ CoT traces with temperature $T > 0$ (typical $N = 20$ to $100$, $T = 0.7$).
- Extract the final answer from each trace.
- Return the most common answer (mode of the $N$ answers).
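The four steps above can be sketched in a few lines of Python. Note that `sample_trace` is a hypothetical stand-in for a model call at temperature $T > 0$ that returns a trace plus its extracted answer; the toy sampler below just simulates one:

```python
import random
from collections import Counter

def self_consistency(sample_trace, n=40):
    """Sample n CoT traces and return the majority (mode) answer.

    `sample_trace` is a hypothetical callable that prompts the model
    at temperature T > 0 and returns (trace_text, extracted_answer).
    """
    answers = [sample_trace()[1] for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Toy stand-in for an LLM sampler: reaches the correct answer "18"
# on ~70% of traces and lands on an arbitrary wrong digit otherwise.
random.seed(0)
def toy_sampler():
    ans = "18" if random.random() < 0.7 else str(random.randint(0, 9))
    return (f"... step-by-step reasoning ... The answer is {ans}.", ans)

print(self_consistency(toy_sampler, n=40))
```

Because individual wrong answers scatter across values while the correct answer repeats, the mode is far more reliable than any single sample.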
Why this works:
- Diverse reasoning paths: with temperature, different runs explore different solution strategies. Errors tend to be path-specific while correct answers can often be reached via multiple paths.
- Reduces variance: each wrong answer typically stems from its own flawed reasoning path, so wrong answers scatter across many values while correct votes concentrate; majority voting suppresses the outliers.
- Free quality improvement: no additional training required; just inference-time compute.
Empirical results: on grade-school math (GSM8K), self-consistency with $N = 40$ improved 540B PaLM from ~56.5% to ~74.4% accuracy in the original paper. Similar gains were reported across many reasoning benchmarks.
Variants:
Universal self-consistency (Chen et al. 2023): rather than majority voting on extracted answers, prompt the model to read all $N$ traces and select the most consistent one. Useful when answers are open-ended and cannot be compared by exact match.
Weighted self-consistency: weight votes by the (model-estimated) confidence of each trace.
Self-consistency with verification: use a separate verifier model to score each trace; weight or filter votes accordingly.
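The weighted and verifier-based variants both reduce to replacing the raw count with a score-weighted vote. A minimal sketch, where the scores are made up for illustration (in practice they would come from the model's own confidence estimates or a separate verifier):

```python
from collections import defaultdict

def weighted_vote(scored_answers):
    """Return the answer with the largest total weight.

    `scored_answers` is a list of (answer, score) pairs, one per trace.
    A score of 1.0 for every trace recovers plain majority voting;
    dropping low-scoring pairs first gives the filtering variant.
    """
    totals = defaultdict(float)
    for answer, score in scored_answers:
        totals[answer] += score
    return max(totals, key=totals.get)

# Three low-confidence traces agreeing on "42" outvote one confident outlier.
print(weighted_vote([("42", 0.4), ("42", 0.5), ("42", 0.3), ("7", 0.9)]))  # "42"
```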
Connections:
Self-consistency is one of several test-time compute techniques that trade inference cost for quality. Others include best-of-$N$ (sample $N$, score, pick best), tree-of-thoughts (search over a tree of candidate continuations), and reasoning models (o1, DeepSeek-R1) that internalise extended CoT during training.
DeepSeek-R1 and OpenAI o1 go further: the models are trained with reinforcement learning to internalise multi-trace reasoning, so single-sample inference matches or exceeds self-consistency-style ensembling. The lineage from CoT prompting → self-consistency → reasoning models is the dominant trajectory of LLM reasoning capability development from 2022 to 2025.
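For contrast with majority voting, best-of-$N$ differs only in the aggregation step: each sampled candidate is scored individually and the single highest-scoring one is returned. A sketch, with hypothetical `sample` and `score` callables standing in for the model and a reward/verifier model:

```python
def best_of_n(sample, score, n=16):
    """Best-of-N: draw n candidates from a (hypothetical) sampler and
    return the one ranked highest by a (hypothetical) scorer, such as
    a reward model or verifier."""
    return max((sample() for _ in range(n)), key=score)

# Toy example: candidate "answers" drawn from a fixed pool, scored by
# closeness to a target of 10; the nearest candidate wins.
pool = iter([3, 12, 9, 25])
print(best_of_n(lambda: next(pool), lambda x: -abs(x - 10), n=4))  # 9
```

Unlike self-consistency, best-of-$N$ needs no agreement among samples, so it also works when every candidate answer is distinct, at the cost of trusting the scorer.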
Related terms: Chain-of-Thought, Test-Time Compute Scaling, o1 / Reasoning Models
Discussed in:
- Chapter 15: Modern AI