A vector database (vector DB, vectorstore) stores embedding vectors $\mathbf{v} \in \mathbb{R}^d$ (typically $d \in [256, 4096]$) alongside metadata, and answers queries of the form "return the $k$ vectors closest to query vector $\mathbf{q}$" in milliseconds even on hundreds of millions of items. They are the storage layer of RAG, agent memory, recommendation systems, and semantic search.
Why a special database
Exact $k$-nearest-neighbour search costs $O(Nd)$ per query: fine for thousands of vectors, ruinous for billions. Vector DBs instead use approximate nearest neighbour (ANN) indexes that trade exactness for roughly $O(\log N)$ query time, typically at >95% recall.
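The brute-force baseline that ANN indexes are built to avoid fits in a few lines of NumPy; the sizes here are illustrative, but the cost is visibly one dot product per stored vector:

```python
import numpy as np

def exact_knn(query, vectors, k=5):
    """Brute-force k-NN: one dot product per stored vector, O(N*d)."""
    scores = vectors @ query                  # cosine on pre-normalised rows
    top = np.argpartition(-scores, k)[:k]     # unordered top-k in O(N)
    return top[np.argsort(-scores[top])]      # sort only the k winners

rng = np.random.default_rng(0)
db = rng.normal(size=(100_000, 256)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)  # normalise once at index time
q = db[42]                                       # query back a stored vector
top3 = exact_knn(q, db, k=3)                     # index 42 should rank first
```

The whole trick of a vector DB is replacing that full scan with an index that touches only a small fraction of the rows.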
ANN algorithms
| Algorithm | Idea | Used by |
|---|---|---|
| HNSW (Hierarchical Navigable Small World) | Multi-layer graph, greedy descent | Most modern DBs |
| IVF-PQ (Inverted File + Product Quantisation) | Cluster + compress | FAISS, Milvus |
| DiskANN | SSD-resident graph index (Vamana) | Microsoft, OSS |
| ScaNN | Quantisation + pruning | Google |
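The IVF idea from the table can be sketched in plain NumPy: a coarse quantiser (a few k-means iterations) partitions the vectors into inverted lists, and a query scans only the `nprobe` closest clusters. This toy version omits the PQ compression step, and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_clusters, nprobe = 64, 16, 2
db = rng.normal(size=(10_000, d)).astype(np.float32)

# "Train": a few k-means iterations place the coarse centroids.
centroids = db[rng.choice(len(db), n_clusters, replace=False)]
for _ in range(5):
    assign = np.argmin(((db[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    centroids = np.stack([db[assign == c].mean(0) if (assign == c).any() else centroids[c]
                          for c in range(n_clusters)])

# Inverted file: cluster id -> row indices of its members.
ivf = {c: np.where(assign == c)[0] for c in range(n_clusters)}

def ivf_search(q, k=5):
    # Scan only the nprobe closest clusters instead of all N vectors.
    near = np.argsort(((centroids - q) ** 2).sum(-1))[:nprobe]
    cand = np.concatenate([ivf[c] for c in near])
    return cand[np.argsort(((db[cand] - q) ** 2).sum(-1))[:k]]
```

With `nprobe = 2` of 16 clusters, each query touches roughly an eighth of the data; production systems add product quantisation so the scanned vectors are also compressed.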
Distance metrics
- Cosine similarity: $\cos(\mathbf{a},\mathbf{b}) = \frac{\mathbf{a}\cdot\mathbf{b}}{\|\mathbf{a}\|\|\mathbf{b}\|}$, the standard for normalised text embeddings.
- Euclidean distance: $\|\mathbf{a}-\mathbf{b}\|_2$, common for image embeddings and some text models.
- Dot product: $\mathbf{a}\cdot\mathbf{b}$, the fastest to compute and equivalent to cosine when vectors are unit-normalised.
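The equivalence claimed above is easy to check numerically, and it is why many systems normalise vectors at write time and then index with the cheaper dot product:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = rng.normal(size=64), rng.normal(size=64)

cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# After L2-normalisation, the plain dot product equals cosine similarity.
a_hat, b_hat = a / np.linalg.norm(a), b / np.linalg.norm(b)
assert np.isclose(a_hat @ b_hat, cosine)
```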
Major systems (2025)
| DB | Type | Key feature |
|---|---|---|
| Pinecone | Managed SaaS | Serverless, fastest time to market |
| Weaviate | OSS / Cloud | Hybrid search, modules ecosystem |
| Qdrant | OSS / Cloud | Rust core, payload filtering |
| Chroma | OSS embedded | Simplest local dev experience |
| Milvus | OSS / Zilliz cloud | Largest scale, GPU index |
| pgvector | Postgres extension | "Just use Postgres" winner |
| Vespa | OSS | Hybrid sparse+dense, ranking |
| MongoDB Atlas Vector | Hosted | Bolt-on for existing Mongo |
| OpenSearch / Elasticsearch | OSS | Lucene + vectors |
Hybrid search
Production RAG rarely uses pure vector search. Hybrid search combines:
- Sparse retrieval: BM25 over keywords.
- Dense retrieval: ANN over embedding vectors.
- Reciprocal Rank Fusion (RRF): merges the two ranked lists.
- Re-ranking: a cross-encoder rescores the top ~100 candidates.
Hybrid + rerank typically beats either alone by 10–20 percentage points on retrieval-quality benchmarks.
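The RRF step above is only a few lines: each document scores $\sum 1/(k + \text{rank})$ across the ranked lists, with $k = 60$ as the conventional constant. A minimal sketch (the doc ids are made up):

```python
from collections import defaultdict

def rrf(ranked_lists, k=60):
    """Merge ranked lists of doc ids via Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)  # reward high ranks in any list
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["d3", "d1", "d7"]    # sparse (keyword) ranking
dense = ["d1", "d9", "d3"]   # dense (vector) ranking
fused = rrf([bm25, dense])   # → ['d1', 'd3', 'd9', 'd7']
```

Documents ranked well by both retrievers ("d1", "d3") rise to the top, which is exactly the behaviour hybrid search wants before the cross-encoder pass.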
Indexing example

```python
import uuid
from openai import OpenAI
from qdrant_client import QdrantClient, models

oa, qd = OpenAI(), QdrantClient(":memory:")  # in-process Qdrant, no server needed
qd.create_collection("docs", vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE))

def embed(text):  # the same model must embed both documents and queries
    return oa.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding

for chunk in chunks:  # one point per chunk; keep the raw text as payload
    qd.upsert("docs", points=[models.PointStruct(id=str(uuid.uuid4()), vector=embed(chunk), payload={"text": chunk})])

hits = qd.search("docs", query_vector=embed(query), limit=5)
```
The "is pgvector enough?" debate
By 2025 a common position is that pgvector inside Postgres is sufficient up to roughly 10M vectors; specialised DBs justify themselves only beyond ~100M vectors, in multi-tenant SaaS, or when latency budgets drop below 50 ms.
Related terms: Retrieval-Augmented Generation, Agentic RAG, Embeddings APIs, Re-Ranking, Memory and Context Management
Discussed in:
- Chapter 15: Modern AI