Structured outputs (also called JSON mode or strict mode) is the production-grade evolution of "please respond in JSON" prompt hacks. OpenAI shipped it in August 2024; Anthropic and Google followed in late 2024. Unlike best-effort JSON, the output is guaranteed by construction to validate against the schema (barring truncation at the token limit).
Why guarantees matter
Before structured outputs, developers wrote retry loops:
```python
import json
from json import JSONDecodeError
from jsonschema import ValidationError  # third-party `jsonschema` package

# `llm`, `prompt`, and `schema` (a compiled validator) are stand-ins.
for _ in range(5):
    out = llm(prompt + " Respond with JSON.")
    try:
        data = json.loads(out)
        schema.validate(data)
        break
    except (JSONDecodeError, ValidationError):
        continue  # malformed or off-schema output: try again
```
Even GPT-4 failed schema validation roughly 5–15% of the time on complex schemas. In a multi-step agent this compounds: a 90% per-call success rate gives only about 35% over 10 calls.
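The compounding arithmetic is easy to check, since each call must independently succeed:

```python
# Success probability of an n-step pipeline where every step must
# independently produce schema-valid output.
def pipeline_success(per_call: float, steps: int) -> float:
    return per_call ** steps

print(round(pipeline_success(0.90, 10), 3))  # 0.9^10 ≈ 0.349
```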
Mechanism
Structured outputs use constrained decoding. The schema is compiled into a finite-state machine (FSM) that tracks which tokens are legal at the current position:
- After `{`, only `"` (the start of a key) is legal.
- After a known key like `"name":`, only tokens that begin a string are legal.
- After a string-typed value's opening `"`, only the schema's `enum` tokens (if any) are legal.
At each decoding step the runtime computes a logit mask: legal tokens keep their logit, illegal tokens are set to $-\infty$. The mask is applied before softmax, so sampled tokens are always schema-valid.
```python
logits = model.forward(context)      # unconstrained next-token logits
mask = fsm.allowed_tokens(state)     # boolean mask derived from the schema FSM
logits[~mask] = -float("inf")        # illegal tokens can never be sampled
next_token = sample(softmax(logits))
```
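A toy end-to-end version of this loop fits in plain Python. The hand-written FSM table, six-token vocabulary, and fake logits below are illustrative stand-ins for what libraries like Outlines or xgrammar compile from a real schema:

```python
import math
import random

# Toy vocabulary and a hand-written FSM enforcing {"color": "red"|"blue"}.
VOCAB = ['{', '"red"', '"blue"', '"color":', '}', ',']
FSM = {
    "start": {0},    # must open the object
    "key":   {3},    # only the known key is legal
    "value": {1, 2}, # enum: "red" | "blue"
    "close": {4},    # must close the object
}
NEXT_STATE = {"start": "key", "key": "value", "value": "close", "close": "done"}

def constrained_sample(logits, legal):
    # Mask illegal tokens to -inf, then softmax-sample among the rest.
    masked = [l if i in legal else -math.inf for i, l in enumerate(logits)]
    z = max(masked)
    probs = [math.exp(l - z) for l in masked]
    r = random.random() * sum(probs)
    for i, p in enumerate(probs):
        r -= p
        if r <= 0:
            return i
    return max(legal)  # numerical-edge fallback: still a legal token

def decode(fake_logits):
    state, out = "start", []
    while state != "done":
        tok = constrained_sample(fake_logits, FSM[state])
        out.append(VOCAB[tok])
        state = NEXT_STATE[state]
    return "".join(out)

# Even adversarial logits (the model "prefers" the comma token) yield valid JSON:
print(decode([0.1, 0.5, 0.5, 0.1, 0.1, 9.9]))
```

Real implementations differ mainly in scale: the FSM is compiled automatically from the schema and the mask is applied over a vocabulary of ~100k tokens per step.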
Schema features supported
- JSON object, array, string, number, boolean, null
- `enum`, `const`
- `required`, `additionalProperties: false`
- `oneOf`, `anyOf`
- Nested schemas, recursive references
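For illustration, a single hypothetical schema can exercise several of these features at once; the field names here are made up:

```python
# Hypothetical schema combining const, enum, anyOf, required,
# and additionalProperties: false, with one level of nesting.
SCHEMA = {
    "type": "object",
    "properties": {
        "kind": {"const": "ticket"},
        "priority": {"enum": ["low", "medium", "high"]},
        "assignee": {"anyOf": [{"type": "string"}, {"type": "null"}]},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["kind", "priority"],
    "additionalProperties": False,
}

# A strict-mode output like this one satisfies the schema; an extra key
# or an off-enum priority would be impossible to emit under the FSM.
sample = {"kind": "ticket", "priority": "high", "assignee": None, "tags": ["ui"]}
```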
Relationship to function calling
Function calling is structured outputs applied to a tool-invocation schema. The same FSM machinery enforces both. OpenAI's `response_format: {type: "json_schema", strict: true}` and `tool_choice` use a shared backend.
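As a sketch of how this surfaces at the API level, a strict-mode request body takes roughly the shape below. Field names follow OpenAI's documented `response_format` parameter; the model name and schema are placeholders, and no request is actually sent:

```python
# Illustrative request payload for strict structured outputs (not executed).
payload = {
    "model": "gpt-4o-2024-08-06",
    "messages": [{"role": "user", "content": "Extract the city mentioned."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "extraction",
            "strict": True,  # enables constrained decoding against `schema`
            "schema": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
                "additionalProperties": False,
            },
        },
    },
}
```

Swapping `response_format` for a `tools` array with the same inner schema exercises the same constrained-decoding backend.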
Caveats
- Strict mode can still produce semantically wrong values (a wrong city in a `"location": str` field). It guarantees syntax, not truth.
- Heavy schemas can slightly hurt output quality, because the constraint pruning sometimes excludes the model's preferred phrasing.
Open-source equivalents
- Outlines (.txt)
- Guidance (Microsoft)
- lm-format-enforcer
- JSON-Mode-PR
- xgrammar (high-throughput, used in vLLM)
Related terms: Constrained Decoding, Function Calling, Tool Use
Discussed in:
- Chapter 15: Modern AI, Tool Use