Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, & Yejin Choi (2019)
arXiv.
DOI: https://doi.org/10.48550/arxiv.1904.09751
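Abstract. Diagnoses neural text degeneration: maximization-based decoding such as beam search produces bland, incoherent, and repetitive text even from strong language models. Introduces nucleus (top-p) sampling, which at each step truncates the distribution to the smallest set of tokens whose cumulative probability reaches a threshold p, renormalizes within that set, and samples from it. The authors find that it yields text closer to human-written text than either maximization-based decoding or other sampling schemes.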
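The truncation step described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes the model's next-token distribution is already available as a plain list of probabilities indexed by token id. The names `nucleus_filter` and `nucleus_sample` are hypothetical.

```python
import random

def nucleus_filter(probs, p):
    """Hypothetical helper: return (token_ids, renormalized_probs) for the
    smallest set of tokens whose cumulative probability is at least p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # Visit token ids from most to least probable; a greedy prefix of this
    # ordering is the smallest set that reaches the threshold.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    # Renormalize so the kept probabilities sum to 1.
    total = sum(probs[i] for i in nucleus)
    return nucleus, [probs[i] / total for i in nucleus]

def nucleus_sample(probs, p, rng=random):
    """Hypothetical helper: draw one token id from the top-p nucleus."""
    ids, weights = nucleus_filter(probs, p)
    return rng.choices(ids, weights=weights, k=1)[0]

# Usage with a toy four-token distribution (an assumption for illustration).
probs = [0.5, 0.3, 0.15, 0.05]
ids, weights = nucleus_filter(probs, 0.75)
print(ids)  # [0, 1]: 0.5 alone is below 0.75, and 0.5 + 0.3 reaches it
print(nucleus_sample(probs, 0.75, random.Random(0)))
```

Unlike top-k sampling, which keeps a fixed number of tokens, the nucleus grows or shrinks with the shape of the distribution. A peaked distribution keeps only a few tokens, while a flat one keeps many.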
Tags: language-models decoding sampling