Glossary

Greedy Decoding

Greedy decoding is the simplest decoding method for autoregressive sequence models. At each step:

$$x_{t+1} = \arg\max_w P(w | x_{1:t})$$

Take the most probable next token, append, repeat until end-of-sequence or maximum length.
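
As a concrete sketch in Python: the loop below assumes a Hugging Face-style causal LM whose forward pass returns an object with a `.logits` tensor of shape `(batch, seq_len, vocab)`. The function itself is illustrative, not taken from any particular library.

```python
import torch

def greedy_decode(model, input_ids, eos_token_id, max_new_tokens=50):
    """Greedy decoding: repeatedly append the argmax next token.

    input_ids: LongTensor of shape (1, t) holding the prompt tokens.
    """
    with torch.no_grad():
        for _ in range(max_new_tokens):
            logits = model(input_ids).logits         # (1, t, vocab)
            next_id = logits[0, -1].argmax()         # most probable next token
            input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=-1)
            if next_id.item() == eos_token_id:       # stop at end-of-sequence
                break
    return input_ids
```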

Strengths:

  • Deterministic: same input → same output (modulo numerics).
  • Fast: one forward pass per token, no sampling overhead.
  • Simple to implement: a single argmax over the vocabulary at each step.

Weaknesses:

  • Locally optimal but globally sub-optimal: high-probability next tokens can lead the sequence into low-probability regions. The highest-probability sequence often requires accepting a slightly lower-probability token early on to avoid getting stuck (see the worked example after this list).
  • Repetition pathology: greedy decoding on language models often produces "the the the the" or "$x$ said $x$ said $x$ said" loops, because once a high-probability sequence pattern starts, every continuation reinforces it.
  • Boring outputs: high-probability text is by definition typical, lacking the variety humans expect.
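
Here is the sub-optimality failure in miniature, with made-up probabilities for a two-step generation: greedy commits to the locally best first token and ends up with the lower-probability sequence overall.

```python
# Hypothetical two-step distribution: P(first token), then the best
# available continuation probability after each first token.
step1 = {"A": 0.5, "B": 0.4}
best_continuation = {"A": 0.3, "B": 0.9}

greedy_first = max(step1, key=step1.get)   # greedy picks "A"
greedy_seq_prob = step1[greedy_first] * best_continuation[greedy_first]
best_seq_prob = max(p * best_continuation[t] for t, p in step1.items())

print(greedy_seq_prob)   # 0.15  (A, then its best continuation)
print(best_seq_prob)     # 0.36  (B, then its best continuation)
```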

Use cases:

  • Tasks with a single correct answer: classification, structured prediction, mathematical computation, code generation with verification.
  • Reproducibility: scientific evaluation requiring deterministic outputs.
  • As a special case: beam search with width 1 is exactly equivalent to greedy decoding.

Avoid for: open-ended creative generation, dialogue, and free-form text completion, where sampling-based methods (top-$p$, top-$k$) produce dramatically more natural and varied outputs.

In modern LLM APIs: setting the temperature to 0 (or near 0) yields greedy decoding. This is the standard convention for "deterministic" inference modes, though in practice batching and non-deterministic GPU kernels can still introduce small run-to-run variations.
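
A sketch using the OpenAI Python SDK (v1 interface); other providers expose an equivalent `temperature` parameter, and the model name here is only an example.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",                       # example model name
    messages=[{"role": "user", "content": "What is 17 * 23?"}],
    temperature=0,                             # greedy / near-greedy decoding
)
print(response.choices[0].message.content)
```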

Repetition penalty (Keskar et al. 2019): a soft fix for repetition that keeps greedy decoding intact. Before the argmax, divide the logits of recently generated tokens by a penalty factor $\rho > 1$ (implementations typically multiply negative logits by $\rho$ instead, since dividing a negative logit would raise its probability). With a modest $\rho$ (the original paper suggests $\approx 1.2$), sustained repetition is suppressed without sacrificing the determinism of greedy.
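
A sketch of the sign-aware form in PyTorch; the function name is illustrative and mirrors common open-source implementations rather than a specific library API.

```python
import torch

def apply_repetition_penalty(logits, generated_ids, rho=1.2):
    """Penalize the logits of already-generated tokens before the argmax.

    logits: 1-D FloatTensor over the vocabulary for the next position.
    generated_ids: 1-D LongTensor of token ids produced so far.
    """
    scores = logits[generated_ids]
    # Divide positive logits, multiply negative ones: both lower probability.
    logits[generated_ids] = torch.where(scores > 0, scores / rho, scores * rho)
    return logits

# Usage inside a greedy loop: penalize, then take the argmax as usual.
# next_id = apply_repetition_penalty(logits[0, -1], input_ids[0]).argmax()
```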

Argmax of softmax = argmax of logits: softmax is strictly increasing, so it preserves the ordering of the logits. Greedy decoding therefore never needs to compute the softmax; the argmax of the raw logits is sufficient. This saves a small amount of compute and sidesteps floating-point overflow in the exponentials.
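
A quick numerical check with NumPy on arbitrary example logits:

```python
import numpy as np

logits = np.array([2.0, -1.0, 0.5, 3.2])
probs = np.exp(logits - logits.max())    # numerically stable softmax numerator
probs /= probs.sum()

# Softmax is strictly increasing, so the ordering (and the argmax) is preserved.
assert np.argmax(logits) == np.argmax(probs)
```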

Related terms: Beam Search, Top-k Sampling, Top-p (Nucleus) Sampling, Language Model
