Agentic RAG is the modern evolution of RAG. Classical RAG is a fixed pipeline: every query follows the same path, embed → retrieve → rerank → generate. Agentic RAG instead treats retrieval as just another tool the agent can invoke: possibly multiple times, with rewritten queries, against multiple stores, and only when needed.
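For contrast, a minimal sketch of the classical fixed pipeline. The helpers (`vector_db`, `rerank`, `build_prompt`, `llm`) are placeholders matching the pseudocode further down, not any specific library's API:

```python
def classical_rag(query: str) -> str:
    # Fixed pipeline: the same path for every query; retrieval always fires.
    chunks = vector_db.search(query, k=20)   # embed + retrieve
    top = rerank(query, chunks)[:5]          # rerank
    return llm(build_prompt(query, top))     # generate
```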
Why agentic
Classical RAG fails on three classes of question:
- No-retrieval-needed queries ("What is 2+2?"), where retrieval only injects noise.
- Multi-hop queries ("Who directed the film that won Best Picture in the year Titanic was released?"), where a single retrieval cannot cover both hops.
- Comparative queries ("Compare Apple's and Microsoft's Q3 2025 revenue"), which need a separate retrieval per entity.
Agentic RAG handles all three by giving the agent control.
Patterns
| Pattern | Behaviour |
|---|---|
| Decide-to-retrieve | LLM first chooses retrieve vs answer-from-memory (see sketch below) |
| Query rewriting | LLM rewrites the user query into a retrieval-optimised form |
| Multi-hop retrieval | Loop: retrieve, reason, retrieve again until enough evidence is gathered |
| Multi-source routing | Agent picks among multiple indexes (FAQ, code, manuals) |
| Self-evaluation | Agent grades retrieved chunks; re-retrieves if poor |
| Hybrid agent + tool | Retrieval is one tool among many (code, search, calc) |
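The first two patterns compose naturally. A minimal sketch, assuming the same placeholder `llm` and `vector_db` helpers as the pseudocode below; the prompt wording is illustrative, not from any framework:

```python
def answer(query: str) -> str:
    # Decide-to-retrieve: ask the model whether memory alone suffices.
    decision = llm(f"Answer yes or no: can you answer this reliably "
                   f"from memory alone?\n{query}")
    if decision.strip().lower().startswith("yes"):
        return llm(query)  # skip retrieval entirely; avoids injecting noise
    # Query rewriting: turn the conversational query into a search-friendly form.
    search_query = llm(f"Rewrite as a concise search query:\n{query}")
    chunks = vector_db.search(search_query, k=5)
    return llm(build_prompt(query, chunks))
```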
Pseudocode (multi-hop)
```python
def multi_hop_answer(question: str, max_steps: int = 10) -> str:
    history = [("question", question)]
    for step in range(max_steps):
        # The LLM sees everything gathered so far and picks the next tool.
        action = llm(history, tools=[retrieve, finish])
        if action.name == "finish":
            return action.args["answer"]
        elif action.name == "retrieve":
            chunks = vector_db.search(action.args["query"], k=5)
            history.append(("retrieved", chunks))
    # Step budget exhausted: force a final answer from what was gathered.
    return llm(history, tools=[finish]).args["answer"]
```
On each iteration the LLM looks at what it has so far, decides whether more evidence is needed, and if so issues a refined query.
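The self-evaluation pattern from the table slots into the same loop: grade what came back before trusting it. A sketch under the same placeholder assumptions; the grading prompt and the 4-out-of-5 threshold are illustrative choices:

```python
def graded_retrieve(query: str, max_retries: int = 2) -> list:
    # Self-evaluation: grade retrieved chunks and re-retrieve with a
    # reformulated query when the grade is poor.
    for _ in range(max_retries + 1):
        chunks = vector_db.search(query, k=5)
        grade = llm(f"Rate 1-5 how relevant these chunks are to the query.\n"
                    f"Query: {query}\nChunks: {chunks}\nReply with one digit.")
        if int(grade.strip()) >= 4:
            return chunks
        query = llm(f"Reformulate this search query differently:\n{query}")
    return chunks  # best effort after exhausting retries
```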
Notable architectures
- Self-RAG (Asai et al. 2023): the model emits special tokens ([Retrieve], [IsRel], [IsSup], [IsUse]) for self-evaluation.
- CRAG (Yan et al. 2024): corrective RAG; if retrieval quality is poor, it falls back to web search.
- GraphRAG (Microsoft Research 2024): the agent traverses a knowledge graph rather than flat chunk vectors.
- HyDE (Hypothetical Document Embeddings, Gao et al. 2022): the LLM hallucinates a plausible answer, embeds it, and retrieves real documents by similarity to the hypothetical.
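Of these, HyDE is compact enough to sketch directly. The `embed` function and the `search_by_vector` method are assumed interfaces here, distinct from the text-query `search` used above:

```python
def hyde_retrieve(query: str, k: int = 5) -> list:
    # HyDE: generate a hypothetical answer, embed it, then retrieve real
    # documents whose embeddings are similar to the hypothetical one.
    hypothetical = llm(f"Write a short passage that answers: {query}")
    return vector_db.search_by_vector(embed(hypothetical), k=k)
```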
Trade-offs
| Concern | Classical RAG | Agentic RAG |
|---|---|---|
| Latency | 1 retrieval, 1 LLM call | 2–10 retrievals, 3–10 LLM calls |
| Cost | Low | 3–10× higher |
| Quality on simple queries | Comparable | Comparable |
| Quality on complex queries | Poor | Strong |
| Determinism | High | Lower (agent loop) |
Modern relevance
By 2025 most production "RAG" systems are agentic in some sense; at minimum they do query rewriting and conditional retrieval. Pure single-shot RAG is now considered a prototype-only pattern. Frameworks like LlamaIndex, LangChain/LangGraph, and Haystack all default to agentic flows.
Citation
Asai, A. et al. (2023). Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. ICLR 2024. arXiv:2310.11511.
Related terms: Retrieval-Augmented Generation, Vector Database, Re-Ranking, Tool Use, ReAct, Self-Reflection
Discussed in:
- Chapter 15: Modern AI