Structured outputs (also called JSON mode or strict mode) is the production-grade evolution of "please respond in JSON" prompt hacks. OpenAI shipped it in August 2024; Anthropic and Google followed in late 2024. Unlike best-effort JSON, the output is guaranteed by construction to validate against the schema (barring truncation at the token limit).
Why guarantees matter
Before structured outputs, developers wrote retry loops:
```python
import json
from json import JSONDecodeError
from jsonschema import ValidationError  # third-party `jsonschema` package

# `llm`, `prompt`, and `schema` (a compiled validator) are stand-ins.
for _ in range(5):
    out = llm(prompt + " Respond with JSON.")
    try:
        data = json.loads(out)
        schema.validate(data)
        break
    except (JSONDecodeError, ValidationError):
        continue  # malformed or off-schema output: try again
```
Even GPT-4 failed schema validation roughly 5–15% of the time on complex schemas. In a multi-step agent this compounds: a 90% per-call success rate gives only about 35% over 10 calls.
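The compounding arithmetic is easy to check, since each call must independently succeed:

```python
# Success probability of an n-step pipeline where every step must
# independently produce schema-valid output.
def pipeline_success(per_call: float, steps: int) -> float:
    return per_call ** steps

print(round(pipeline_success(0.90, 10), 3))  # 0.9^10 ≈ 0.349
```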
Mechanism
Structured outputs use constrained decoding. The schema is compiled into a finite-state machine (FSM) that tracks which tokens are legal at the current position:
- After `{`, only `"` (the start of a key) is legal.
- After a known key like `"name":`, only tokens that begin a string are legal.
- After a string-typed value's opening `"`, only the schema's `enum` tokens (if any) are legal.
At each decoding step the runtime computes a logit mask: legal tokens keep their logit, illegal tokens are set to $-\infty$. The mask is applied before softmax, so sampled tokens are always schema-valid.
```python
logits = model.forward(context)      # unconstrained next-token logits
mask = fsm.allowed_tokens(state)     # boolean mask derived from the schema FSM
logits[~mask] = -float("inf")        # illegal tokens can never be sampled
next_token = sample(softmax(logits))
```
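A toy end-to-end version of this loop fits in plain Python. The hand-written FSM table, six-token vocabulary, and fake logits below are illustrative stand-ins for what libraries like Outlines or xgrammar compile from a real schema:

```python
import math
import random

# Toy vocabulary and a hand-written FSM enforcing {"color": "red"|"blue"}.
VOCAB = ['{', '"red"', '"blue"', '"color":', '}', ',']
FSM = {
    "start": {0},    # must open the object
    "key":   {3},    # only the known key is legal
    "value": {1, 2}, # enum: "red" | "blue"
    "close": {4},    # must close the object
}
NEXT_STATE = {"start": "key", "key": "value", "value": "close", "close": "done"}

def constrained_sample(logits, legal):
    # Mask illegal tokens to -inf, then softmax-sample among the rest.
    masked = [l if i in legal else -math.inf for i, l in enumerate(logits)]
    z = max(masked)
    probs = [math.exp(l - z) for l in masked]
    r = random.random() * sum(probs)
    for i, p in enumerate(probs):
        r -= p
        if r <= 0:
            return i
    return max(legal)  # numerical-edge fallback: still a legal token

def decode(fake_logits):
    state, out = "start", []
    while state != "done":
        tok = constrained_sample(fake_logits, FSM[state])
        out.append(VOCAB[tok])
        state = NEXT_STATE[state]
    return "".join(out)

# Even adversarial logits (the model "prefers" the comma token) yield valid JSON:
print(decode([0.1, 0.5, 0.5, 0.1, 0.1, 9.9]))
```

Real implementations differ mainly in scale: the FSM is compiled automatically from the schema and the mask is applied over a vocabulary of ~100k tokens per step.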
Schema features supported
- JSON object, array, string, number, boolean, null
- `enum`, `const`
- `required`, `additionalProperties: false`
- `oneOf`, `anyOf`
- Nested schemas, recursive references
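For illustration, a single hypothetical schema can exercise several of these features at once; the field names here are made up:

```python
# Hypothetical schema combining const, enum, anyOf, required,
# and additionalProperties: false, with one level of nesting.
SCHEMA = {
    "type": "object",
    "properties": {
        "kind": {"const": "ticket"},
        "priority": {"enum": ["low", "medium", "high"]},
        "assignee": {"anyOf": [{"type": "string"}, {"type": "null"}]},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["kind", "priority"],
    "additionalProperties": False,
}

# A strict-mode output like this one satisfies the schema; an extra key
# or an off-enum priority would be impossible to emit under the FSM.
sample = {"kind": "ticket", "priority": "high", "assignee": None, "tags": ["ui"]}
```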
Relationship to function calling
Function calling is structured outputs applied to a tool-invocation schema. The same FSM machinery enforces both. OpenAI's `response_format: {type: "json_schema", strict: true}` and `tool_choice` use a shared backend.
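As a sketch of how this surfaces at the API level, a strict-mode request body takes roughly the shape below. Field names follow OpenAI's documented `response_format` parameter; the model name and schema are placeholders, and no request is actually sent:

```python
# Illustrative request payload for strict structured outputs (not executed).
payload = {
    "model": "gpt-4o-2024-08-06",
    "messages": [{"role": "user", "content": "Extract the city mentioned."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "extraction",
            "strict": True,  # enables constrained decoding against `schema`
            "schema": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
                "additionalProperties": False,
            },
        },
    },
}
```

Swapping `response_format` for a `tools` array with the same inner schema exercises the same constrained-decoding backend.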
Caveats
- Strict mode can still produce semantically wrong values (a wrong city in a `"location": str` field). It guarantees syntax, not truth.
- Heavy schemas can slightly hurt output quality, because the constraint pruning sometimes excludes the model's preferred phrasing.
Open-source equivalents
- Outlines (.txt)
- Guidance (Microsoft)
- lm-format-enforcer
- JSON-Mode-PR
- xgrammar (high-throughput, used in vLLM)
Related terms: Constrained Decoding, Function Calling, Tool Use
Discussed in:
- Chapter 15: Modern AI, Tool Use