Glossary

Vector Database

A vector database (vector DB, vectorstore) stores embedding vectors $\mathbf{v} \in \mathbb{R}^d$ (typically $d \in [256, 4096]$) alongside metadata, and answers queries of the form "return the $k$ vectors closest to query vector $\mathbf{q}$" in milliseconds, even over hundreds of millions of items. Vector databases are the storage layer of RAG pipelines, agent memory, recommendation systems, and semantic search.

Why a special database

Exact $k$-nearest-neighbour search costs $O(Nd)$ per query: fine for thousands of vectors, ruinous for billions. Vector DBs instead use approximate nearest neighbour (ANN) indexes, which trade exact results for sub-linear (often roughly $O(\log N)$) query time while typically retaining >95% recall.
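The brute-force baseline that an ANN index replaces can be sketched in a few lines (a toy illustration: every query touches all $N$ vectors and all $d$ dimensions; real systems use optimised BLAS/SIMD kernels):

```python
import math

def knn_exact(query, vectors, k):
    """Exact k-NN by brute force: O(N*d) distance computations per query."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # Rank every stored vector by distance to the query, keep the k closest.
    ranked = sorted(range(len(vectors)), key=lambda i: dist(query, vectors[i]))
    return ranked[:k]

vecs = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [5.0, 5.0]]
print(knn_exact([0.0, 0.1], vecs, 2))  # indices of the two closest vectors
```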

ANN algorithms

| Algorithm | Idea | Used by |
| --- | --- | --- |
| HNSW (Hierarchical Navigable Small World) | Multi-layer proximity graph, greedy descent | Most modern DBs |
| IVF-PQ (Inverted File + Product Quantisation) | Cluster, then compress with PQ | FAISS, Milvus |
| DiskANN | SSD-resident graph index (Vamana) | Microsoft, OSS |
| ScaNN | Quantisation + pruning | Google |
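To illustrate the inverted-file idea behind IVF-PQ (without the PQ compression step), a toy sketch, assuming fixed centroids rather than learned k-means clusters:

```python
import math

def l2(a, b):
    return math.dist(a, b)

def build_ivf(vectors, centroids):
    """Assign each vector to the inverted list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for idx, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda i: l2(v, centroids[i]))
        lists[nearest].append(idx)
    return lists

def ivf_search(query, vectors, centroids, lists, k, nprobe=1):
    """Scan only the nprobe closest clusters instead of all N vectors."""
    probe = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))[:nprobe]
    candidates = [idx for c in probe for idx in lists[c]]
    return sorted(candidates, key=lambda i: l2(query, vectors[i]))[:k]

centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = [[0.1, 0.2], [9.8, 10.1], [0.3, 0.1], [10.3, 9.9]]
lists = build_ivf(vectors, centroids)
print(ivf_search([10.0, 10.0], vectors, centroids, lists, k=2))  # [1, 3]
```

With `nprobe=1` only half the vectors are scanned here; the same knob trades recall for speed in real IVF indexes.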

Distance metrics

  • Cosine similarity: $\cos(\mathbf{a},\mathbf{b}) = \frac{\mathbf{a}\cdot\mathbf{b}}{\|\mathbf{a}\|\|\mathbf{b}\|}$ — the standard for normalised text embeddings.
  • Euclidean distance: $\|\mathbf{a}-\mathbf{b}\|_2$ — common for image embeddings and some text models.
  • Dot product: $\mathbf{a}\cdot\mathbf{b}$ — the fastest; equivalent to cosine when vectors are unit-normalised.
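The dot-product/cosine equivalence for unit-normalised vectors is easy to verify directly (`dot`, `cosine`, and `normalise` are illustrative helpers, not from any particular library):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def normalise(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

a, b = [3.0, 4.0], [1.0, 2.0]
# After normalising both vectors, the plain dot product equals cosine similarity.
assert abs(dot(normalise(a), normalise(b)) - cosine(a, b)) < 1e-12
```

This is why embedding providers often ship pre-normalised vectors: the index can then use the cheaper dot product and still rank by cosine.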

Major systems (2025)

| DB | Type | Key feature |
| --- | --- | --- |
| Pinecone | Managed SaaS | Serverless, fastest time to market |
| Weaviate | OSS / Cloud | Hybrid search, modules ecosystem |
| Qdrant | OSS / Cloud | Rust core, payload filtering |
| Chroma | OSS embedded | Simplest local dev experience |
| Milvus | OSS / Zilliz Cloud | Largest scale, GPU indexing |
| pgvector | Postgres extension | "Just use Postgres" winner |
| Vespa | OSS | Hybrid sparse+dense, ranking |
| MongoDB Atlas Vector | Hosted | Bolt-on for existing Mongo |
| OpenSearch / Elasticsearch | OSS | Lucene + vectors |

Hybrid search

Production RAG rarely uses pure vector search. Hybrid search combines:

  1. Sparse retrieval: BM25 over keywords.
  2. Dense retrieval: vector ANN.
  3. Reciprocal Rank Fusion (RRF): merges the two ranked lists.
  4. Re-ranking: a cross-encoder rescores the top ~100 candidates.

Hybrid + rerank typically beats either alone by 10–20 percentage points on retrieval-quality benchmarks.
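The RRF fusion step above is only a few lines. A minimal sketch (the constant `k=60` is the value commonly used in the literature; it damps the influence of top ranks):

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank_d)."""
    scores = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

sparse = ["d3", "d1", "d2"]   # BM25 ranking
dense  = ["d1", "d4", "d3"]   # vector ANN ranking
print(rrf([sparse, dense]))   # ['d1', 'd3', 'd4', 'd2']
```

Note that "d1", appearing near the top of both lists, outranks "d3", which tops only one: agreement between retrievers is rewarded without any score calibration.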

Indexing pseudocode

import uuid

from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = OpenAI()  # reads OPENAI_API_KEY from the environment
qd = QdrantClient(":memory:")
qd.create_collection("docs", vectors_config=VectorParams(size=1536, distance=Distance.COSINE))

def embed(text):
    return client.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding

# `chunks` (list of strings) and `query` are assumed to be defined elsewhere.
for chunk in chunks:
    qd.upsert("docs", points=[PointStruct(id=str(uuid.uuid4()), vector=embed(chunk), payload={"text": chunk})])

hits = qd.search("docs", query_vector=embed(query), limit=5)

The "is pgvector enough?" debate

By 2025 a common position is that pgvector inside Postgres is sufficient for ≤10M vectors. Specialised DBs justify themselves only at >100M vectors, multi-tenant SaaS, or when latency budgets are <50 ms.

Related terms: Retrieval-Augmented Generation, Agentic RAG, Embeddings APIs, Re-Ranking, Memory and Context Management

This site is currently in Beta. Contact: Chris Paton

