Agentic RAG is the modern evolution of RAG. Classical RAG is a fixed pipeline: every query follows the same path, embed → retrieve → rerank → generate. Agentic RAG instead treats retrieval as just another tool the agent can invoke: possibly multiple times, with rewritten queries, against multiple stores, and only when needed.
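For contrast, a minimal sketch of the classical fixed pipeline. The helpers (`vector_db`, `rerank`, `build_prompt`, `llm`) are placeholders matching the pseudocode further down, not any specific library's API:

```python
def classical_rag(query: str) -> str:
    # Fixed pipeline: the same path for every query; retrieval always fires.
    chunks = vector_db.search(query, k=20)   # embed + retrieve
    top = rerank(query, chunks)[:5]          # rerank
    return llm(build_prompt(query, top))     # generate
```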
Why agentic
Classical RAG fails on three classes of question:
- No-retrieval-needed queries ("What is 2+2?"), where retrieval only injects noise.
- Multi-hop queries ("Who directed the film that won Best Picture in the year Titanic was released?"), where a single retrieval cannot cover both hops.
- Comparative queries ("Compare Apple's and Microsoft's Q3 2025 revenue"), which need a separate retrieval per entity.
Agentic RAG handles all three by giving the agent control.
Patterns
| Pattern | Behaviour |
|---|---|
| Decide-to-retrieve | LLM first chooses retrieve vs answer-from-memory (see sketch below) |
| Query rewriting | LLM rewrites the user query into a retrieval-optimised form |
| Multi-hop retrieval | Loop: retrieve, reason, retrieve again until enough evidence is gathered |
| Multi-source routing | Agent picks among multiple indexes (FAQ, code, manuals) |
| Self-evaluation | Agent grades retrieved chunks; re-retrieves if poor |
| Hybrid agent + tool | Retrieval is one tool among many (code, search, calc) |
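The first two patterns compose naturally. A minimal sketch, assuming the same placeholder `llm` and `vector_db` helpers as the pseudocode below; the prompt wording is illustrative, not from any framework:

```python
def answer(query: str) -> str:
    # Decide-to-retrieve: ask the model whether memory alone suffices.
    decision = llm(f"Answer yes or no: can you answer this reliably "
                   f"from memory alone?\n{query}")
    if decision.strip().lower().startswith("yes"):
        return llm(query)  # skip retrieval entirely; avoids injecting noise
    # Query rewriting: turn the conversational query into a search-friendly form.
    search_query = llm(f"Rewrite as a concise search query:\n{query}")
    chunks = vector_db.search(search_query, k=5)
    return llm(build_prompt(query, chunks))
```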
Pseudocode (multi-hop)
```python
def multi_hop_answer(question: str, max_steps: int = 10) -> str:
    history = [("question", question)]
    for step in range(max_steps):
        # The LLM sees everything gathered so far and picks the next tool.
        action = llm(history, tools=[retrieve, finish])
        if action.name == "finish":
            return action.args["answer"]
        elif action.name == "retrieve":
            chunks = vector_db.search(action.args["query"], k=5)
            history.append(("retrieved", chunks))
    # Step budget exhausted: force a final answer from what was gathered.
    return llm(history, tools=[finish]).args["answer"]
```
On each iteration the LLM looks at what it has so far, decides whether more evidence is needed, and if so issues a refined query.
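The self-evaluation pattern from the table slots into the same loop: grade what came back before trusting it. A sketch under the same placeholder assumptions; the grading prompt and the 4-out-of-5 threshold are illustrative choices:

```python
def graded_retrieve(query: str, max_retries: int = 2) -> list:
    # Self-evaluation: grade retrieved chunks and re-retrieve with a
    # reformulated query when the grade is poor.
    for _ in range(max_retries + 1):
        chunks = vector_db.search(query, k=5)
        grade = llm(f"Rate 1-5 how relevant these chunks are to the query.\n"
                    f"Query: {query}\nChunks: {chunks}\nReply with one digit.")
        if int(grade.strip()) >= 4:
            return chunks
        query = llm(f"Reformulate this search query differently:\n{query}")
    return chunks  # best effort after exhausting retries
```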
Notable architectures
- Self-RAG (Asai et al. 2023): the model emits special tokens ([Retrieve], [IsRel], [IsSup], [IsUse]) for self-evaluation.
- CRAG (Yan et al. 2024): corrective RAG; if retrieval quality is poor, it falls back to web search.
- GraphRAG (Microsoft Research 2024): the agent traverses a knowledge graph rather than flat chunk vectors.
- HyDE (Hypothetical Document Embeddings, Gao et al. 2022): the LLM hallucinates a plausible answer, embeds it, and retrieves real documents by similarity to the hypothetical.
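Of these, HyDE is compact enough to sketch directly. The `embed` function and the `search_by_vector` method are assumed interfaces here, distinct from the text-query `search` used above:

```python
def hyde_retrieve(query: str, k: int = 5) -> list:
    # HyDE: generate a hypothetical answer, embed it, then retrieve real
    # documents whose embeddings are similar to the hypothetical one.
    hypothetical = llm(f"Write a short passage that answers: {query}")
    return vector_db.search_by_vector(embed(hypothetical), k=k)
```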
Trade-offs
| Concern | Classical RAG | Agentic RAG |
|---|---|---|
| Latency | 1 retrieval, 1 LLM call | 2–10 retrievals, 3–10 LLM calls |
| Cost | Low | 3–10× higher |
| Quality on simple queries | Comparable | Comparable |
| Quality on complex queries | Poor | Strong |
| Determinism | High | Lower (agent loop) |
Modern relevance
By 2025 most production "RAG" systems are agentic in some sense; at minimum they do query rewriting and conditional retrieval. Pure single-shot RAG is now considered a prototype-only pattern. Frameworks like LlamaIndex, LangChain/LangGraph, and Haystack all default to agentic flows.
Citation
Asai, A. et al. (2023). Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. ICLR 2024. arXiv:2310.11511.
Related terms: Retrieval-Augmented Generation, Vector Database, Re-Ranking, Tool Use, ReAct, Self-Reflection
Discussed in:
- Chapter 15: Modern AI