A vector database (vector DB, vectorstore) stores embedding vectors $\mathbf{v} \in \mathbb{R}^d$ (typically $d \in [256, 4096]$) alongside metadata, and answers queries of the form "return the $k$ vectors closest to query vector $\mathbf{q}$" in milliseconds even on hundreds of millions of items. They are the storage layer of RAG, agent memory, recommendation systems, and semantic search.
Why a special database
Exact $k$-nearest-neighbour search costs $O(Nd)$ per query: fine for thousands of vectors, ruinous for billions. Vector DBs instead use approximate nearest neighbour (ANN) indexes that trade exactness for roughly $O(\log N)$ query time, typically at >95% recall.
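The brute-force baseline that ANN indexes are built to avoid fits in a few lines of NumPy; the sizes here are illustrative, but the cost is visibly one dot product per stored vector:

```python
import numpy as np

def exact_knn(query, vectors, k=5):
    """Brute-force k-NN: one dot product per stored vector, O(N*d)."""
    scores = vectors @ query                  # cosine on pre-normalised rows
    top = np.argpartition(-scores, k)[:k]     # unordered top-k in O(N)
    return top[np.argsort(-scores[top])]      # sort only the k winners

rng = np.random.default_rng(0)
db = rng.normal(size=(100_000, 256)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)  # normalise once at index time
q = db[42]                                       # query back a stored vector
top3 = exact_knn(q, db, k=3)                     # index 42 should rank first
```

The whole trick of a vector DB is replacing that full scan with an index that touches only a small fraction of the rows.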
ANN algorithms
| Algorithm | Idea | Used by |
|---|---|---|
| HNSW (Hierarchical Navigable Small World) | Multi-layer graph, greedy descent | Most modern DBs |
| IVF-PQ (Inverted File + Product Quantisation) | Cluster + compress | FAISS, Milvus |
| DiskANN | SSD-resident graph index (Vamana) | Microsoft, OSS |
| ScaNN | Quantisation + pruning | Google |
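The IVF idea from the table can be sketched in plain NumPy: a coarse quantiser (a few k-means iterations) partitions the vectors into inverted lists, and a query scans only the `nprobe` closest clusters. This toy version omits the PQ compression step, and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_clusters, nprobe = 64, 16, 2
db = rng.normal(size=(10_000, d)).astype(np.float32)

# "Train": a few k-means iterations place the coarse centroids.
centroids = db[rng.choice(len(db), n_clusters, replace=False)]
for _ in range(5):
    assign = np.argmin(((db[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    centroids = np.stack([db[assign == c].mean(0) if (assign == c).any() else centroids[c]
                          for c in range(n_clusters)])

# Inverted file: cluster id -> row indices of its members.
ivf = {c: np.where(assign == c)[0] for c in range(n_clusters)}

def ivf_search(q, k=5):
    # Scan only the nprobe closest clusters instead of all N vectors.
    near = np.argsort(((centroids - q) ** 2).sum(-1))[:nprobe]
    cand = np.concatenate([ivf[c] for c in near])
    return cand[np.argsort(((db[cand] - q) ** 2).sum(-1))[:k]]
```

With `nprobe = 2` of 16 clusters, each query touches roughly an eighth of the data; production systems add product quantisation so the scanned vectors are also compressed.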
Distance metrics
- Cosine similarity: $\cos(\mathbf{a},\mathbf{b}) = \frac{\mathbf{a}\cdot\mathbf{b}}{\|\mathbf{a}\|\|\mathbf{b}\|}$, the standard for normalised text embeddings.
- Euclidean distance: $\|\mathbf{a}-\mathbf{b}\|_2$, common for image embeddings and some text models.
- Dot product: $\mathbf{a}\cdot\mathbf{b}$, the fastest to compute and equivalent to cosine when vectors are unit-normalised.
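The equivalence claimed above is easy to check numerically, and it is why many systems normalise vectors at write time and then index with the cheaper dot product:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = rng.normal(size=64), rng.normal(size=64)

cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# After L2-normalisation, the plain dot product equals cosine similarity.
a_hat, b_hat = a / np.linalg.norm(a), b / np.linalg.norm(b)
assert np.isclose(a_hat @ b_hat, cosine)
```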
Major systems (2025)
| DB | Type | Key feature |
|---|---|---|
| Pinecone | Managed SaaS | Serverless, fastest time to market |
| Weaviate | OSS / Cloud | Hybrid search, modules ecosystem |
| Qdrant | OSS / Cloud | Rust core, payload filtering |
| Chroma | OSS embedded | Simplest local dev experience |
| Milvus | OSS / Zilliz cloud | Largest scale, GPU index |
| pgvector | Postgres extension | "Just use Postgres" winner |
| Vespa | OSS | Hybrid sparse+dense, ranking |
| MongoDB Atlas Vector | Hosted | Bolt-on for existing Mongo |
| OpenSearch / Elasticsearch | OSS | Lucene + vectors |
Hybrid search
Production RAG rarely uses pure vector search. Hybrid search combines:
- Sparse retrieval: BM25 over keywords.
- Dense retrieval: ANN over embedding vectors.
- Reciprocal Rank Fusion (RRF): merges the two ranked lists.
- Re-ranking: a cross-encoder rescores the top ~100 candidates.
Hybrid + rerank typically beats either alone by 10–20 percentage points on retrieval-quality benchmarks.
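The RRF step above is only a few lines: each document scores $\sum 1/(k + \text{rank})$ across the ranked lists, with $k = 60$ as the conventional constant. A minimal sketch (the doc ids are made up):

```python
from collections import defaultdict

def rrf(ranked_lists, k=60):
    """Merge ranked lists of doc ids via Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)  # reward high ranks in any list
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["d3", "d1", "d7"]    # sparse (keyword) ranking
dense = ["d1", "d9", "d3"]   # dense (vector) ranking
fused = rrf([bm25, dense])   # → ['d1', 'd3', 'd9', 'd7']
```

Documents ranked well by both retrievers ("d1", "d3") rise to the top, which is exactly the behaviour hybrid search wants before the cross-encoder pass.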
Indexing example

```python
import uuid
from openai import OpenAI
from qdrant_client import QdrantClient, models

oa, qd = OpenAI(), QdrantClient(":memory:")  # in-process Qdrant, no server needed
qd.create_collection("docs", vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE))

def embed(text):  # the same model must embed both documents and queries
    return oa.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding

for chunk in chunks:  # one point per chunk; keep the raw text as payload
    qd.upsert("docs", points=[models.PointStruct(id=str(uuid.uuid4()), vector=embed(chunk), payload={"text": chunk})])

hits = qd.search("docs", query_vector=embed(query), limit=5)
```
The "is pgvector enough?" debate
By 2025 a common position is that pgvector inside Postgres is sufficient up to roughly 10M vectors; specialised DBs justify themselves only beyond ~100M vectors, in multi-tenant SaaS, or when latency budgets drop below 50 ms.
Related terms: Retrieval-Augmented Generation, Agentic RAG, Embeddings APIs, Re-Ranking, Memory and Context Management
Discussed in:
- Chapter 15: Modern AI