Glossary

Agentic RAG

Agentic RAG is the modern evolution of RAG. Classical RAG is a fixed pipeline: every query goes embed → retrieve → rerank → generate. Agentic RAG treats retrieval as just another tool the agent can invoke, possibly multiple times, with rewritten queries, against multiple stores, and only when needed.

Why agentic

Classical RAG fails on three classes of question:

  1. No-retrieval-needed queries ("What is 2+2?"): retrieval only injects noise.
  2. Multi-hop queries ("Who directed the film that won Best Picture in the year Titanic was released?"): a single retrieval cannot cover both hops.
  3. Comparative queries ("Compare Apple's and Microsoft's Q3 2025 revenue"): each entity needs its own retrieval.

Agentic RAG handles all three by giving the agent control.
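The simplest form of that control is deciding whether to retrieve at all. A minimal sketch, where `needs_retrieval` is a toy heuristic standing in for an LLM routing call, and `retrieve` and `generate` are hypothetical callables:

```python
def needs_retrieval(query: str) -> bool:
    # Toy stand-in for an LLM routing decision: treat arithmetic-looking
    # queries as answerable from memory, everything else as retrieval-worthy.
    return not any(op in query for op in ("+", "-", "*", "/"))

def answer(query: str, retrieve, generate):
    # Route: fetch context only when the router asks for it.
    context = retrieve(query) if needs_retrieval(query) else []
    return generate(query, context)
```

In a real system the routing decision would itself be an LLM call (or a trained classifier), not a string check.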

Patterns

Pattern                Behaviour
Decide-to-retrieve     LLM first chooses retrieve vs answer-from-memory
Query rewriting        LLM rewrites the user query into a retrieval-optimised form
Multi-hop retrieval    Loop: retrieve, reason, retrieve again, until enough
Multi-source routing   Agent picks among multiple indexes (FAQ, code, manuals)
Self-evaluation        Agent grades retrieved chunks; re-retrieves if poor
Hybrid agent + tool    Retrieval is one tool among many (code, search, calc)
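Query rewriting, the second pattern above, can be sketched as a single LLM call. The prompt wording and the stub model below are illustrative, not a real API:

```python
def rewrite_query(user_query: str, llm) -> str:
    # Ask the model for a retrieval-optimised rewrite of the raw question.
    prompt = (
        "Rewrite this question as a short keyword search query:\n"
        f"{user_query}"
    )
    return llm(prompt)

# Stub LLM for demonstration: drops common stop-words from the last line.
stub_llm = lambda p: " ".join(
    w for w in p.splitlines()[-1].split()
    if w.lower() not in {"what", "is", "the", "of"}
)
```

A production rewriter would also see the conversation history, so that pronouns ("its revenue") resolve before retrieval.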

Pseudocode (multi-hop)

history = []
for step in range(max_steps):
    # Ask the LLM for its next action given everything gathered so far.
    action = llm(history, tools=[retrieve, finish])
    if action.name == "finish":
        return action.args["answer"]
    elif action.name == "retrieve":
        # Issue the (possibly rewritten) query against the vector store.
        chunks = vector_db.search(action.args["query"], k=5)
        history.append(("retrieved", chunks))
# Step budget exhausted: force a final answer from what was gathered.
return llm(history, tools=[finish]).args["answer"]

Each iteration the LLM looks at what it has, decides if more is needed, and if so issues a refined query.
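The self-evaluation pattern composes naturally with this loop: grade what came back, and rewrite-and-retry when it is poor. A minimal sketch, where `search`, `grade`, and `rewrite` are hypothetical callables (in practice `grade` would be an LLM relevance judgment):

```python
def retrieve_with_grading(query, search, grade, rewrite, max_tries=2):
    # Retrieve, then let the agent grade its own results; if the score is
    # low, rewrite the query and try again, up to max_tries attempts.
    for _ in range(max_tries):
        chunks = search(query)
        if grade(query, chunks) >= 0.5:
            return chunks
        query = rewrite(query)
    return chunks  # best effort after exhausting the retry budget
```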

Notable architectures

  • Self-RAG (Asai et al. 2023): the model emits special tokens ([Retrieve], [IsRel], [IsSup], [IsUse]) for self-evaluation.
  • CRAG (Yan et al. 2024): corrective RAG; if retrieval quality is poor, it falls back to web search.
  • GraphRAG (Microsoft Research 2024): the agent traverses a knowledge graph rather than chunk vectors.
  • HyDE (Hypothetical Document Embeddings; Gao et al. 2022): the LLM hallucinates an answer, embeds it, and retrieves real documents by similarity to that hypothetical.
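The HyDE idea in particular is easy to sketch: embed a hypothetical answer rather than the raw query, then rank real documents by similarity to it. The bag-of-words "embedding" below is a toy stand-in for a real embedding model, and `hallucinate_answer` is a hypothetical LLM call:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a real system would use a dense embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hyde_retrieve(query: str, hallucinate_answer, docs, k=1):
    # Rank documents by similarity to the *hypothetical* answer, not the query.
    hypo = embed(hallucinate_answer(query))
    ranked = sorted(docs, key=lambda d: cosine(hypo, embed(d)), reverse=True)
    return ranked[:k]
```

The hypothetical answer may be factually wrong; what matters is that it lands near the true documents in embedding space.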

Trade-offs

Concern                     Classical RAG             Agentic RAG
Latency                     1 retrieval, 1 LLM call   2–10 retrievals, 3–10 LLM calls
Cost                        Low                       3–10× higher
Quality on simple queries   Comparable                Comparable
Quality on complex queries  Poor                      Strong
Determinism                 High                      Lower (agent loop)

Modern relevance

By 2025 most production "RAG" systems are agentic in some sense: at minimum they do query rewriting and conditional retrieval. Pure single-shot RAG is now considered a prototype-only pattern. Frameworks such as LlamaIndex, LangChain/LangGraph, and Haystack all default to agentic flows.

Citation

Asai, A. et al. (2023). Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. ICLR 2024. arXiv:2310.11511.

Related terms: Retrieval-Augmented Generation, Vector Database, Re-Ranking, Tool Use, ReAct, Self-Reflection
