Self-reflection (also called self-critique or self-refinement) is a class of agent techniques in which the model takes a second pass over its own output, identifies errors, and produces an improved version. The pattern exploits the empirical observation that LLMs are often better critics than generators: they can spot a flaw they would not have avoided on the first try.
## Variants
| Method | Critique source | Loop length |
|---|---|---|
| Self-Refine (Madaan et al. 2023) | Same model, different prompt | Iterative until convergence |
| Reflexion (Shinn et al. 2023) | Same model + environment feedback | Multi-trial with verbal "memory" |
| CRITIC (Gou et al. 2023) | Same model + tools (search, calc) | Iterative with tool-grounded critique |
| Constitutional AI (Anthropic 2022) | Same model, prompted with written principles | Two-stage (critique, then revise) |
| LLM-as-Judge | Different LLM scores output | One-shot |
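To make the tool-grounded row concrete, here is a minimal sketch in the spirit of CRITIC. The `llm` helper, the prompts, and the choice of Python arithmetic as the tool are illustrative assumptions, not Gou et al.'s implementation:

```python
# Illustrative sketch of tool-grounded critique in the spirit of CRITIC.
# `llm` is a hypothetical single-string completion helper.
def tool_grounded_answer(question: str, llm, max_rounds: int = 3) -> str:
    answer = llm(f"Answer concisely: {question}")
    for _ in range(max_rounds):
        # Ask for a checkable artifact, then verify it with a real tool
        # instead of trusting the model's unaided self-critique.
        expr = llm(f"Give one Python arithmetic expression whose value "
                   f"answers: {question}. Reply with the expression only.")
        try:
            tool_result = str(eval(expr, {"__builtins__": {}}))
        except Exception as err:
            tool_result = f"tool error: {err}"
        verdict = llm(f"Question: {question}\nDraft answer: {answer}\n"
                      f"Tool result: {tool_result}\n"
                      f"Reply CORRECT, or give a corrected answer.")
        if verdict.strip().upper().startswith("CORRECT"):
            break
        answer = verdict  # revise, then re-verify next round
    return answer
```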
## Reflexion in detail
Shinn et al.'s Reflexion (2023) is the most widely cited self-reflection method. The agent:
1. Attempts the task (producing a ReAct trace).
2. Receives binary success/failure feedback from the environment.
3. Reflects verbally, in natural language, on what went wrong.
4. Stores the reflection in memory.
5. Retries the task, conditioned on the accumulated reflections.
The reflection plays the role of a policy-gradient signal, but it is implemented in language rather than in weight updates. On HumanEval, Reflexion with GPT-4 achieved 91% pass@1 versus an 80% baseline; on AlfWorld, 97% success versus 75%.
```python
# Sketch of the Reflexion loop; `agent`, `llm`, `task`, and `N` come from the harness.
memory = []  # episodic memory of verbal reflections
for trial in range(N):
    trace, success = agent.run(task, reflections=memory)
    if success:
        break
    # The verbal critique of the failed trace is the learning signal.
    reflection = llm("What went wrong and what should I do differently?", trace)
    memory.append(reflection)
```
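In Shinn et al.'s implementation, the stored reflections are simply injected into the agent's context on later trials, with the buffer capped at a handful of recent reflections so it fits within the context window.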
## Self-Refine
Madaan et al.'s Self-Refine is a simpler iterative pattern with no environment in the loop:
```python
# Sketch of the Self-Refine loop; `generate`, `critique`, `refine`, and `N` are assumed.
output = generate(prompt)
for _ in range(N):
    feedback = critique(output)
    if "looks good" in feedback:  # the critic signals convergence
        break
    output = refine(output, feedback)
```
Madaan et al. report quality gains of roughly 5–20% on dialogue, math, and code tasks with GPT-4 as the base model.
## Why it works (and when it doesn't)
Reflection works best when:
- The task has verifiable outcomes (a test passes, a math equality holds; see the sketch below).
- The model is strong enough to critique its output reliably, but not strong enough to get it right in one shot.
It can fail or even hurt when:
- The model's critique is itself wrong (sycophancy, confabulation).
- There is no external grounding signal: repeated self-critique then tends to converge on plausible-sounding but wrong answers.
- The task is simple, so the extra steps just add noise and cost.
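As a concrete instance of the "verifiable outcomes" condition, here is a minimal sketch that grounds reflection in a pytest run rather than in the model's own judgment. The `llm` helper, the prompts, and the file handling are assumptions:

```python
import subprocess
import tempfile

def reflect_on_tests(task: str, tests: str, llm, max_trials: int = 4) -> str:
    """Retry a coding task, reflecting on real test failures between trials.

    `tests` is assumed to hold pytest-style `test_*` functions; `llm` is a
    hypothetical single-string completion helper.
    """
    reflections = []
    code = llm(f"Write Python code for this task. Code only.\nTask: {task}")
    for _ in range(max_trials):
        # Write candidate code plus tests to a temp file and run pytest on it.
        with tempfile.NamedTemporaryFile("w", suffix=".py",
                                         delete=False) as f:
            f.write(code + "\n\n" + tests)
            path = f.name
        result = subprocess.run(["pytest", "-q", path],
                                capture_output=True, text=True)
        if result.returncode == 0:
            break  # the external signal, not the model, decides success
        # The failure log, not the model's opinion, anchors the critique.
        reflections.append(llm(f"These tests failed:\n{result.stdout}\n"
                               f"For this code:\n{code}\nWhat went wrong?"))
        code = llm(f"Task: {task}\nPast reflections:\n"
                   + "\n".join(reflections)
                   + "\nWrite corrected Python code only.")
    return code
```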
## Modern relevance
Reflection is now baked into:
- Reasoning models: the long thinking traces of OpenAI o1, DeepSeek R1, and Claude's extended thinking are trained self-reflection.
- Coding agents: Aider, OpenHands, and Devin all reflect on test failures and retry.
- Constitutional AI training: used by Anthropic to teach Claude harmlessness.
## Citation
Shinn, N. et al. (2023). Reflexion: Language Agents with Verbal Reinforcement Learning. NeurIPS 2023. arXiv:2303.11366.
Related terms: ReAct, Tree of Thoughts, o1 / Reasoning Models, Chain-of-Thought
Discussed in:
- Chapter 15: Modern AI