Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V. Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, & Denny Zhou (2022). Self-Consistency Improves Chain of Thought Reasoning in Language Models.
International Conference on Learning Representations (ICLR 2023).
URL: https://arxiv.org/abs/2203.11171
Abstract. Introduces self-consistency, a decoding strategy for chain-of-thought reasoning: sample $N$ chain-of-thought completions at non-zero temperature, extract the final answer from each, and take the mode (majority vote) over the answers. The technique exploits the fact that there are typically many reasoning paths to the correct answer but few to any given wrong one, so the marginal distribution over answers concentrates on the truth as $N$ grows. Self-consistency reliably improves arithmetic and commonsense reasoning benchmarks by 5–15 absolute percentage points over greedy CoT decoding.
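A minimal sketch of the decoding loop, not the paper's code: `sample_completion` is a hypothetical stand-in for whatever LM sampling call is available, and the answer-extraction regex assumes the "The answer is X." convention used in the paper's few-shot prompts. `n=40` and `temperature=0.7` mirror settings reported in the paper.

```python
import re
from collections import Counter

def extract_answer(completion: str) -> str | None:
    # Hypothetical parser: the paper's few-shot prompts end each chain of
    # thought with "The answer is X.", so a regex on that phrase suffices.
    match = re.search(r"[Tt]he answer is\s*\$?(-?[\d,.]+)", completion)
    return match.group(1).rstrip(".,") if match else None

def self_consistency(prompt: str, sample_completion, n: int = 40,
                     temperature: float = 0.7) -> str | None:
    """Sample n chain-of-thought completions and majority-vote the answers.

    Any non-zero temperature that yields diverse reasoning paths works;
    greedy (T=0) decoding would collapse all n samples to one path.
    """
    answers = []
    for _ in range(n):
        completion = sample_completion(prompt, temperature=temperature)
        answer = extract_answer(completion)
        if answer is not None:  # drop samples whose answer can't be parsed
            answers.append(answer)
    if not answers:
        return None
    # Mode of the answer distribution: the majority vote that approximates
    # marginalizing out the sampled reasoning paths.
    return Counter(answers).most_common(1)[0][0]
```

Taking the `Counter` mode over parsed answers is exactly the marginalization over reasoning paths the abstract describes; unparseable samples are discarded rather than counted as votes.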
Tags: language-models reasoning chain-of-thought