Glossary

LlamaIndex

LlamaIndex (originally GPT Index, created by Jerry Liu in November 2022) was one of the first frameworks to treat retrieval as its central problem rather than as one step in a chain. Where LangChain optimised for general-purpose composability, LlamaIndex optimised for the question "how do I get my data into and out of an LLM?"

Core data primitives

Primitive      Role
Document       A piece of source content (PDF, web page, database row, etc.)
Node           A chunked fragment of a Document, with metadata
Index          A queryable structure over Nodes
Retriever      Returns the Nodes relevant to a query
Query Engine   Retriever + synthesiser; answers questions
Agent          An LLM that uses Query Engines as tools
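The chain from Document to answer can be sketched in plain Python. This is a hypothetical toy, not the LlamaIndex API: the class names mirror the primitives listed above, but `split_into_nodes`, `KeywordOverlapRetriever` and the string-joining "synthesiser" are illustrative assumptions. A real pipeline retrieves with embeddings and synthesises the answer with an LLM.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a piece of source content."""
    text: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Node:
    """Hypothetical stand-in for a chunk of a Document plus its metadata."""
    text: str
    metadata: dict


def split_into_nodes(doc: Document, chunk_words: int = 8) -> list[Node]:
    """Toy splitter: cut a Document into fixed-size windows of words."""
    words = doc.text.split()
    nodes = []
    for start in range(0, len(words), chunk_words):
        chunk = " ".join(words[start:start + chunk_words])
        # Each Node inherits the Document's metadata and records its position.
        nodes.append(Node(chunk, {**doc.metadata, "chunk": start // chunk_words}))
    return nodes


class KeywordOverlapRetriever:
    """Toy Retriever: ranks Nodes by how many query words they share."""

    def __init__(self, nodes: list[Node], top_k: int = 2):
        self.nodes = nodes
        self.top_k = top_k

    def retrieve(self, query: str) -> list[Node]:
        q = set(query.lower().split())
        scored = [(len(q & set(n.text.lower().split())), n) for n in self.nodes]
        scored = [pair for pair in scored if pair[0] > 0]  # drop non-matches
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [n for _, n in scored[: self.top_k]]


class QueryEngine:
    """Toy Query Engine: a Retriever plus a stub synthesiser that quotes the
    retrieved chunks with their sources (a real one would prompt an LLM)."""

    def __init__(self, retriever: KeywordOverlapRetriever):
        self.retriever = retriever

    def query(self, question: str) -> str:
        nodes = self.retriever.retrieve(question)
        if not nodes:
            return "No relevant context found."
        return " | ".join(
            f"[{n.metadata['source']}#{n.metadata['chunk']}] {n.text}" for n in nodes
        )


doc = Document(
    "ReAct interleaves reasoning traces and actions so the model can plan and then call tools",
    {"source": "react.pdf"},
)
nodes = split_into_nodes(doc)
engine = QueryEngine(KeywordOverlapRetriever(nodes))
print(engine.query("how does ReAct call tools"))
```

The second chunk shares two query words ("call", "tools") and the first shares one ("react"), so the answer quotes chunk 1 before chunk 0, each tagged with the source metadata it inherited from the Document.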

Index types (a distinctive LlamaIndex contribution)

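  1. VectorStoreIndex: standard embedding-based similarity retrieval (the RAG default).
  2. SummaryIndex: stores Nodes as a simple list and, by default, reads all of them at query time; suited to questions about a whole corpus.
  3. TreeIndex: builds a hierarchical tree of summaries and navigates it top-down at query time.
  4. KeywordTableIndex: sparse retrieval through a table mapping extracted keywords to Nodes.
  5. PropertyGraphIndex: extracts a knowledge graph from the text and retrieves over it.
  6. DocumentSummaryIndex: stores one summary per document, retrieves on the summaries, then drills into the matching document's Nodes.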
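The first and fourth index types in the list differ mainly in the lookup structure they build at ingestion time. The following is a hypothetical toy contrast, not LlamaIndex code: `ToyVectorIndex` uses bag-of-words counts and cosine similarity as a stand-in for learned embeddings, and `ToyKeywordTableIndex` maps keywords to node ids. The stopword list and all names are assumptions for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny, hypothetical stopword list used by both toy indexes.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "what"}


def tokens(text: str) -> list[str]:
    """Lowercase alphanumeric tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


class ToyVectorIndex:
    """Stand-in for VectorStoreIndex: count vectors replace embeddings,
    and cosine similarity ranks the stored texts."""

    def __init__(self, texts: list[str]):
        self.texts = texts
        self.vectors = [Counter(tokens(t)) for t in texts]

    @staticmethod
    def _cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[k] * b[k] for k in a)
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
            sum(v * v for v in b.values())
        )
        return dot / norm if norm else 0.0

    def query(self, q: str, top_k: int = 1) -> list[str]:
        qv = Counter(tokens(q))
        ranked = sorted(
            range(len(self.texts)),
            key=lambda i: self._cosine(qv, self.vectors[i]),
            reverse=True,
        )
        return [self.texts[i] for i in ranked[:top_k]]


class ToyKeywordTableIndex:
    """Stand-in for KeywordTableIndex: a keyword -> node-id table built at
    ingestion. Here every non-stopword token is a keyword; LlamaIndex's
    default variant extracts keywords with an LLM instead."""

    def __init__(self, texts: list[str]):
        self.texts = texts
        self.table: dict[str, set[int]] = defaultdict(set)
        for i, t in enumerate(texts):
            for kw in tokens(t):
                self.table[kw].add(i)

    def query(self, q: str) -> list[str]:
        hits = Counter()
        for kw in tokens(q):
            for i in self.table.get(kw, ()):
                hits[i] += 1  # count matched keywords per node
        return [self.texts[i] for i, _ in hits.most_common()]


texts = [
    "LlamaIndex builds indexes over document chunks",
    "ReAct agents interleave reasoning and tool calls",
    "Vector databases store embeddings for similarity search",
]
vec_index = ToyVectorIndex(texts)
kw_index = ToyKeywordTableIndex(texts)
print(vec_index.query("which agents use reasoning"))
print(kw_index.query("embeddings search"))
```

In this toy, the keyword table returns nothing when no keyword matches exactly, whereas the vector index always returns its nearest neighbours, however weak the similarity.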

Quick example

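from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# By default LlamaIndex calls OpenAI for embeddings and the LLM, so OPENAI_API_KEY must be set
# Load every file in ./papers as Document objects
docs = SimpleDirectoryReader("./papers").load_data()
# Chunk the Documents into Nodes, embed them and store them in an in-memory vector index
index = VectorStoreIndex.from_documents(docs)
# Combine the index's retriever with a response synthesiser into a Query Engine
qe = index.as_query_engine()
print(qe.query("What is the main claim of the ReAct paper?"))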

Four lines of code, plus an import, take a folder of raw PDFs to a working RAG bot.

Strengths over LangChain

  • Better default chunking and ingestion: LlamaParse handles PDF tables, equations and images.
  • Retrieval focus: more index types and better hybrid search support.
  • Stronger structured-data agents: SQL databases, pandas DataFrames and graph databases are first-class data sources.
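Hybrid search merges a sparse (keyword) ranking with a dense (vector) ranking of the same Nodes. One common fusion rule is reciprocal rank fusion (RRF), sketched below as a standalone illustration rather than LlamaIndex's implementation; the node ids and both input rankings are hypothetical, and k = 60 is simply a widely used default.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of node ids: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, node_id in enumerate(ranking, start=1):
            scores[node_id] = scores.get(node_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


keyword_hits = ["n3", "n1", "n7"]  # hypothetical sparse (keyword) ranking
vector_hits = ["n1", "n4", "n3"]   # hypothetical dense (embedding) ranking
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print([node_id for node_id, _ in fused])  # ['n1', 'n3', 'n4', 'n7']
```

Because RRF uses only ranks, it needs no calibration between BM25-style scores and cosine similarities: nodes found by both retrievers (n1, n3) rise above nodes found by only one.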

LlamaParse

A commercial offering launched in 2024: an LLM-powered document parser that turns complex PDFs (financial reports, research papers, scanned forms) into clean markdown with tables and figures preserved. It is a key competitive moat for LlamaIndex in production agentic RAG.

LlamaCloud

Managed RAG-as-a-service: hosted indexing, parsing, retrieval, observability.

Modern relevance

By 2025 LlamaIndex is a common first choice when retrieval is the central concern (enterprise document QA, technical documentation, legal/compliance). For agent loops that do not centre on retrieval, LangChain/LangGraph or DSPy are more typically chosen.

Related terms: LangChain, DSPy, Retrieval-Augmented Generation, Agentic RAG, Vector Database, Embeddings APIs

This site is currently in Beta. Contact: Chris Paton

AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).