Self-reflection (also called self-critique or self-refinement) is a class of agent techniques in which the model takes a second pass over its own output, identifies errors, and produces an improved version. The pattern exploits the empirical observation that LLMs are often better critics than generators: they can spot a flaw they would not have avoided on the first try.
## Variants
| Method | Critique source | Loop length |
|---|---|---|
| Self-Refine (Madaan et al. 2023) | Same model, different prompt | Iterative until convergence |
| Reflexion (Shinn et al. 2023) | Same model + environment feedback | Multi-trial with verbal "memory" |
| CRITIC (Gou et al. 2023) | Same model + tools (search, calc) | Iterative with tool-grounded critique |
| Constitutional AI (Anthropic 2022) | Same model, prompted with written principles | Two-stage (critique, then revise) |
| LLM-as-Judge | Different LLM scores output | One-shot |
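To make the tool-grounded row concrete, here is a minimal sketch in the spirit of CRITIC. The `llm` helper, the prompts, and the choice of Python arithmetic as the tool are illustrative assumptions, not Gou et al.'s implementation:

```python
# Illustrative sketch of tool-grounded critique in the spirit of CRITIC.
# `llm` is a hypothetical single-string completion helper.
def tool_grounded_answer(question: str, llm, max_rounds: int = 3) -> str:
    answer = llm(f"Answer concisely: {question}")
    for _ in range(max_rounds):
        # Ask for a checkable artifact, then verify it with a real tool
        # instead of trusting the model's unaided self-critique.
        expr = llm(f"Give one Python arithmetic expression whose value "
                   f"answers: {question}. Reply with the expression only.")
        try:
            tool_result = str(eval(expr, {"__builtins__": {}}))
        except Exception as err:
            tool_result = f"tool error: {err}"
        verdict = llm(f"Question: {question}\nDraft answer: {answer}\n"
                      f"Tool result: {tool_result}\n"
                      f"Reply CORRECT, or give a corrected answer.")
        if verdict.strip().upper().startswith("CORRECT"):
            break
        answer = verdict  # revise, then re-verify next round
    return answer
```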
## Reflexion in detail
Shinn et al.'s Reflexion (2023) is the most widely cited self-reflection method. The agent:
1. Attempts the task (producing a ReAct trace).
2. Receives binary success/failure feedback from the environment.
3. Reflects verbally, in natural language, on what went wrong.
4. Stores the reflection in memory.
5. Retries the task, conditioned on the accumulated reflections.
The reflection plays the role of a policy-gradient signal, but it is implemented in language rather than in weight updates. On HumanEval, Reflexion with GPT-4 achieved 91% pass@1 versus an 80% baseline; on AlfWorld, 97% success versus 75%.
```python
# Sketch of the Reflexion loop; `agent`, `llm`, `task`, and `N` come from the harness.
memory = []  # episodic memory of verbal reflections
for trial in range(N):
    trace, success = agent.run(task, reflections=memory)
    if success:
        break
    # The verbal critique of the failed trace is the learning signal.
    reflection = llm("What went wrong and what should I do differently?", trace)
    memory.append(reflection)
```
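In Shinn et al.'s implementation, the stored reflections are simply injected into the agent's context on later trials, with the buffer capped at a handful of recent reflections so it fits within the context window.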
## Self-Refine
Madaan et al.'s Self-Refine is a simpler iterative pattern with no environment in the loop:
```python
# Sketch of the Self-Refine loop; `generate`, `critique`, `refine`, and `N` are assumed.
output = generate(prompt)
for _ in range(N):
    feedback = critique(output)
    if "looks good" in feedback:  # the critic signals convergence
        break
    output = refine(output, feedback)
```
Madaan et al. report quality gains of roughly 5–20% on dialogue, math, and code tasks with GPT-4 as the base model.
## Why it works (and when it doesn't)
Reflection works best when:
- The task has verifiable outcomes (a test passes, a math equality holds; see the sketch below).
- The model is strong enough to critique its output reliably, but not strong enough to get it right in one shot.
It can fail or even hurt when:
- The model's critique is itself wrong (sycophancy, confabulation).
- There is no external grounding signal: repeated self-critique then tends to converge on plausible-sounding but wrong answers.
- The task is simple, so the extra steps just add noise and cost.
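As a concrete instance of the "verifiable outcomes" condition, here is a minimal sketch that grounds reflection in a pytest run rather than in the model's own judgment. The `llm` helper, the prompts, and the file handling are assumptions:

```python
import subprocess
import tempfile

def reflect_on_tests(task: str, tests: str, llm, max_trials: int = 4) -> str:
    """Retry a coding task, reflecting on real test failures between trials.

    `tests` is assumed to hold pytest-style `test_*` functions; `llm` is a
    hypothetical single-string completion helper.
    """
    reflections = []
    code = llm(f"Write Python code for this task. Code only.\nTask: {task}")
    for _ in range(max_trials):
        # Write candidate code plus tests to a temp file and run pytest on it.
        with tempfile.NamedTemporaryFile("w", suffix=".py",
                                         delete=False) as f:
            f.write(code + "\n\n" + tests)
            path = f.name
        result = subprocess.run(["pytest", "-q", path],
                                capture_output=True, text=True)
        if result.returncode == 0:
            break  # the external signal, not the model, decides success
        # The failure log, not the model's opinion, anchors the critique.
        reflections.append(llm(f"These tests failed:\n{result.stdout}\n"
                               f"For this code:\n{code}\nWhat went wrong?"))
        code = llm(f"Task: {task}\nPast reflections:\n"
                   + "\n".join(reflections)
                   + "\nWrite corrected Python code only.")
    return code
```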
## Modern relevance
Reflection is now baked into:
- Reasoning models: the long thinking traces of OpenAI o1, DeepSeek R1, and Claude's extended thinking are trained self-reflection.
- Coding agents: Aider, OpenHands, and Devin all reflect on test failures and retry.
- Constitutional AI training: used by Anthropic to teach Claude harmlessness.
## Citation
Shinn, N. et al. (2023). Reflexion: Language Agents with Verbal Reinforcement Learning. NeurIPS 2023. arXiv:2303.11366.
Related terms: ReAct, Tree of Thoughts, o1 / Reasoning Models, Chain-of-Thought
Discussed in:
- Chapter 15: Modern AI