Glossary

Self-Reflection

Self-reflection (also called self-critique or self-refinement) is a class of agent techniques in which the model takes a second pass over its own output, identifies errors, and produces an improved version. The pattern exploits the empirical observation that LLMs are often better critics than generators: they can spot a flaw they would not have avoided on the first try.

Variants

Method                             | Critique source                    | Loop length
Self-Refine (Madaan et al. 2023)   | Same model, different prompt       | Iterative until convergence
Reflexion (Shinn et al. 2023)      | Same model + environment feedback  | Multi-trial with verbal "memory"
CRITIC (Gou et al. 2023)           | Same model + tools (search, calc)  | Iterative with tool-grounded critique
Constitutional AI (Anthropic 2022) | Different model with principles    | Two-stage
LLM-as-Judge                       | Different LLM scores the output    | One-shot
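
As a concrete illustration of tool-grounded critique (the CRITIC row above), here is a minimal sketch of one critique-and-revise step. It is not the paper's implementation: llm() and run_python() are assumed helpers standing in for a chat-completion call and a sandboxed code executor.

def critic_step(question: str, answer: str) -> str:
    # One CRITIC-style iteration: verify with an external tool, then revise.
    # Grounding the critique in tool output (here, code execution) avoids
    # trusting the model's own, possibly sycophantic, self-judgment.
    check = llm(f"Write a Python snippet that checks whether '{answer}' "
                f"correctly answers: {question}")
    evidence = run_python(check)  # hypothetical sandboxed executor
    critique = llm(f"Tool output: {evidence}\n"
                   f"Critique the answer '{answer}' given this evidence.")
    if "no errors" in critique.lower():
        return answer
    return llm(f"Question: {question}\nOriginal answer: {answer}\n"
               f"Critique: {critique}\nWrite a corrected answer.")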

Reflexion in detail

Shinn et al.'s Reflexion (2023) is the most cited self-reflection method. The agent:

  1. Attempts a task (ReAct trace).
  2. Receives binary feedback from the environment (success / failure).
  3. Verbally reflects on what went wrong, in natural language.
  4. Stores the reflection in memory.
  5. Retries the task, conditioned on past reflections.

The reflection plays the role of a policy gradient signal, but it is implemented in natural language rather than in weight updates. On HumanEval, Reflexion with GPT-4 achieved 91% pass@1 versus an 80% baseline; on ALFWorld, 97% versus 75%.

memory = []       # episodic memory of verbal reflections
MAX_TRIALS = 5    # fixed trial budget
for trial in range(MAX_TRIALS):
    # agent.run (assumed interface): run a ReAct trace with past
    # reflections prepended to the prompt.
    trace, success = agent.run(task, reflections=memory)
    if success:
        break
    # Distill the failed trace into a verbal lesson for the retry.
    reflection = llm("What went wrong and what should I do differently?", trace)
    memory.append(reflection)
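
The paper leaves the internals of agent.run open; one plausible way to condition the retry on memory is simply to render past reflections into the next prompt. A sketch (build_prompt is a hypothetical helper, not from the paper):

def build_prompt(task: str, reflections: list[str]) -> str:
    # Failures become in-context "lessons" rather than weight updates,
    # which is the core idea of verbal reinforcement.
    lessons = "\n".join(f"- {r}" for r in reflections)
    return (f"Task: {task}\n\n"
            f"Reflections from previous failed attempts:\n{lessons}\n\n"
            f"Attempt the task again, avoiding the mistakes above.")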

Self-Refine

Madaan et al.'s Self-Refine is a simpler iterative pattern that requires no environment feedback:

output = generate(prompt)              # initial draft
for _ in range(N):                     # N = fixed iteration budget
    feedback = critique(output)        # same model, critic prompt
    if "looks good" in feedback:       # stop phrase emitted by the critic
        break
    output = refine(output, feedback)  # same model, refiner prompt
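
A minimal way critique and refine might be implemented as prompt wrappers around the same model (a sketch; llm() is an assumed chat-completion helper, and "looks good" is just the stop phrase the loop above checks for):

def critique(output: str) -> str:
    return llm("Give concrete, actionable feedback on the draft below. "
               "If no changes are needed, reply exactly 'looks good'.\n\n"
               + output)

def refine(output: str, feedback: str) -> str:
    return llm(f"Rewrite the draft to address the feedback.\n\n"
               f"Draft:\n{output}\n\nFeedback:\n{feedback}")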

Madaan et al. report quality improvements of roughly 5–20% on dialogue, math reasoning, and code tasks with GPT-4.

Why it works (and when it doesn't)

Reflection works best when:

  • The task has verifiable outcomes (tests passing, math equality); see the sketch after these lists.
  • The model is strong enough to critique its output but not strong enough to solve the task in one shot.

It can fail or even hurt when:

  • The model's critique is itself wrong (sycophancy, confabulation).
  • Without an external grounding signal, repeated self-critique can converge on plausible-sounding but wrong answers.
  • On simple tasks, the extra steps just add noise and cost.
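
For the verifiable-outcomes point, a minimal sketch of what external grounding can look like on a code task: the success signal comes from actually executing tests, not from the model's opinion of its own output. The file layout and helper name are assumptions for illustration.

import subprocess
import tempfile

def tests_pass(candidate_code: str, test_code: str) -> bool:
    # Verifiable grounding signal: run assert-style tests against the
    # candidate and read the exit code, rather than asking the model
    # whether its own code "looks right".
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n\n" + test_code)
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True, timeout=30)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0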

Modern relevance

Reflection is now baked into:

  • Reasoning models: the long thinking traces of OpenAI o1, DeepSeek R1, and Claude's extended thinking are trained self-reflection.
  • Coding agents: Aider, OpenHands, and Devin all reflect on test failures and retry.
  • Constitutional AI training: used by Anthropic to teach Claude harmlessness.

Citation

Shinn, N. et al. (2023). Reflexion: Language Agents with Verbal Reinforcement Learning. NeurIPS 2023. arXiv:2303.11366.

Related terms: ReAct, Tree of Thoughts, o1 / Reasoning Models, Chain-of-Thought
