Glossary

Self-Consistency

Self-consistency (Wang et al., 2022) is a simple but effective technique for improving LLM reasoning quality. Rather than greedy-decoding a single chain-of-thought (CoT) trace, sample $N$ different CoT traces at a non-zero temperature and take the majority answer across them.

Algorithm:

  1. Prompt the model with a question + CoT instruction.
  2. Sample $N$ CoT traces with temperature $T > 0$ (typical $N = 20$ to $100$, $T = 0.7$).
  3. Extract the final answer from each trace.
  4. Return the most common answer (mode of the $N$ answers).
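The four steps above can be sketched in a few lines. This is a minimal illustration, not a full pipeline: `sample_trace` is a hypothetical stand-in for one temperature-$T$ LLM call (steps 1–3), and the toy sampler below simulates a model that is right 70% of the time and otherwise makes a path-specific slip.

```python
import random
from collections import Counter

def self_consistency(sample_trace, n=20):
    """Sample n CoT traces and return the majority-vote answer.

    `sample_trace` is a hypothetical stand-in for one temperature-T
    LLM call; it returns a (reasoning_text, final_answer) pair.
    """
    answers = [sample_trace()[1] for _ in range(n)]  # step 3: extract answers
    return Counter(answers).most_common(1)[0][0]     # step 4: take the mode

# Toy sampler: "correct" answer 18 reached 70% of the time; errors
# are path-specific, so they scatter across several wrong values.
def toy_sampler():
    if random.random() < 0.7:
        return ("...correct path...", 18)
    return ("...flawed path...", random.choice([16, 17, 20]))

random.seed(0)
print(self_consistency(toy_sampler, n=40))  # majority vote recovers 18
```

Note that the wrong answers split their votes while the correct answer concentrates its own, which is exactly why the mode is robust even when any single sample is unreliable.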

Why this works:

  • Diverse reasoning paths: with temperature, different runs explore different solution strategies. Errors tend to be path-specific while correct answers can often be reached via multiple paths.
  • Reduces variance: wrong answers each require a specific flawed step and so scatter across many values, while correct answers concentrate on one; majority voting suppresses the scattered outliers.
  • Free quality improvement: no additional training required; just inference-time compute.

Empirical results: on grade-school math (GSM8K), self-consistency with $N = 40$ improved PaLM-540B from ~57% to ~74% accuracy in the original paper, with similar gains across many reasoning benchmarks.

Variants:

Universal self-consistency (Chen et al. 2023): rather than majority voting on extracted answers, prompt the model to read all $N$ traces and select the most consistent. Useful when answers are open-ended.

Weighted self-consistency: weight votes by the (model-estimated) confidence of each trace.
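Weighted voting is a one-line change from plain majority voting: sum confidences per answer instead of counting ballots. A minimal sketch, assuming each trace has already been scored with a model-estimated confidence in $[0, 1]$ (e.g. a normalised mean token log-probability; the scoring method is not specified here):

```python
from collections import defaultdict

def weighted_vote(scored_answers):
    """Return the answer with the highest total confidence.

    `scored_answers` is a list of (answer, confidence) pairs, one per
    trace; confidences are assumed to be model-estimated and in [0, 1].
    """
    totals = defaultdict(float)
    for answer, confidence in scored_answers:
        totals[answer] += confidence
    return max(totals, key=totals.get)

# Three low-confidence traces say 42; one confident trace says 41.
# Plain majority voting would pick 42; weighting picks 41.
print(weighted_vote([(42, 0.2), (42, 0.2), (42, 0.2), (41, 0.9)]))  # 41
```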

Self-consistency with verification: use a separate verifier model to score each trace; weight or filter votes accordingly.
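The filtering variant can be sketched as "verify, then vote". Everything model-specific here is an assumption: `verifier` is a hypothetical callable scoring a trace in $[0, 1]$, and the toy verifier below simply rewards traces containing an explicit check.

```python
from collections import Counter

def verified_vote(traces, verifier, threshold=0.5):
    """Drop traces the verifier rejects, then majority-vote the rest.

    `traces` is a list of (trace_text, answer) pairs; `verifier` is a
    hypothetical callable scoring each trace in [0, 1].
    """
    kept = [answer for text, answer in traces if verifier(text) >= threshold]
    if not kept:  # verifier rejected everything: fall back to plain voting
        kept = [answer for _, answer in traces]
    return Counter(kept).most_common(1)[0][0]

# Toy verifier: trusts traces that contain an explicit "check" step.
toy_verifier = lambda text: 0.9 if "check" in text else 0.1
traces = [
    ("compute 3+4, check: 7", 7),
    ("compute 3*3", 9),
    ("recompute 3+4 and check", 9 - 2),
    ("compute 3*3", 9),
    ("compute 3*3", 9),
]
print(verified_vote(traces, toy_verifier))  # filtered vote picks 7
```

Unfiltered majority voting over these five traces would pick 9; the verifier removes the three unchecked traces, flipping the vote to 7.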

Connections:

Self-consistency is one of several test-time compute techniques that trade inference cost for quality. Others include best-of-$N$ (sample $N$, score, pick the best), tree-of-thoughts (search over a tree of candidate continuations), and reasoning models (o1, DeepSeek-R1) that internalise extended CoT during training.

DeepSeek-R1 and OpenAI o1 go further: they use reinforcement learning to train the model to internalise multi-path reasoning, so that single-sample inference matches or exceeds self-consistency-style ensembling. The lineage from CoT prompting → self-consistency → reasoning models traces the dominant trajectory of LLM reasoning capability development from 2022 to 2025.

Related terms: Chain-of-Thought, Test-Time Compute Scaling, o1 / Reasoning Models

Discussed in:

Textbook of Usability · Textbook of Digital Health