FAISS (Facebook AI Similarity Search) is a C++ library with Python bindings released by Johnson, Douze and Jégou (2017). It provides a unified API over a family of indices for nearest-neighbour search in high-dimensional vector spaces, scaling to billions of vectors on a single machine and supporting GPU acceleration. FAISS underpins production retrieval at Meta and is widely used in retrieval-augmented generation pipelines.
The retrieval problem. Given a query vector $\mathbf{q} \in \mathbb{R}^d$ and a database $\{\mathbf{x}_1, \ldots, \mathbf{x}_N\} \subset \mathbb{R}^d$, find the $k$ vectors closest to the query: smallest Euclidean ($\ell_2$) distance, or largest inner product. For the $\ell_2$ case:
$$\mathrm{kNN}(\mathbf{q}) = \mathop{\arg\min}_{S \subseteq [N],\, |S| = k}\; \sum_{i \in S} \|\mathbf{q} - \mathbf{x}_i\|_2^2.$$
Brute-force search costs $O(Nd)$ per query, which is infeasible when $N$ exceeds a few million.
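The brute-force baseline is a few lines of NumPy; this sketch performs the same $O(Nd)$ computation that FAISS's flat indices implement with optimised kernels:

```python
import numpy as np

def knn_brute_force(q, X, k):
    """Exact k-NN under squared L2: O(N*d) work per query."""
    d2 = ((X - q) ** 2).sum(axis=1)   # squared distances to all N vectors
    idx = np.argpartition(d2, k)[:k]  # k smallest, unordered, in O(N)
    return idx[np.argsort(d2[idx])]   # sort only the k survivors

rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 64)).astype("float32")
q = X[42] + 0.01 * rng.standard_normal(64).astype("float32")
print(knn_brute_force(q, X, 5)[0])  # a query near row 42 should return 42 first
```

The `argpartition` trick avoids a full $O(N \log N)$ sort, but the $O(Nd)$ distance pass remains, which is what the approximate indices below attack.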
Index families. FAISS exposes search structures as composable index types:
- IndexFlatL2 / IndexFlatIP. Exact brute-force search under $\ell_2$ or inner product; useful as a ground-truth baseline.
- IndexIVF. Inverted file with coarse quantisation. Run $k$-means with $n_{\text{list}}$ centroids and assign each database vector to its nearest centroid. At query time, search only the inverted lists of the $n_{\text{probe}}$ closest centroids, reducing comparisons by a factor of roughly $n_{\text{list}} / n_{\text{probe}}$ at the cost of recall.
- IndexHNSW. Hierarchical Navigable Small World graph (see HNSW), giving $O(\log N)$-style search times in practice, with no retraining needed when items are added.
- IndexPQ. Product quantisation. Split each vector into $m$ subvectors of dimension $d/m$; learn a separate $k$-means codebook of size $K$ for each subspace (typically $K = 256$, so one codebook index fits in a byte); encode each vector as $m$ bytes. Distances are approximated via per-query lookup tables, giving up to $32\times$ memory compression relative to float32 storage.
- IndexIVFPQ. Combines IVF coarse partitioning with PQ encoding of the residuals. The de facto standard for billion-scale search; see IVF-PQ.
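The PQ lookup-table mechanism is the least obvious item above; here is a minimal NumPy sketch of encoding and asymmetric distance computation (ADC). The codebooks are toy ones sampled from the data rather than learned by $k$-means, so this illustrates the mechanics, not the accuracy, of PQ:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, m, K = 5_000, 64, 8, 256          # m sub-spaces of dim d/m, K codewords each
ds = d // m
X = rng.standard_normal((N, d)).astype("float32")

# Toy codebooks of shape (m, K, ds): in real PQ each is learned by k-means
# on the sub-vectors; here we just sample K database vectors.
codebooks = X[rng.choice(N, K, replace=False)].reshape(K, m, ds).transpose(1, 0, 2)

# Encode: one codeword index per sub-space -> m bytes per vector
# (versus 4*d = 256 bytes of float32, i.e. 32x compression for these parameters).
Xs = X.reshape(N, m, ds)
codes = np.empty((N, m), dtype=np.uint8)
for j in range(m):
    dists = ((Xs[:, j, None, :] - codebooks[j][None]) ** 2).sum(-1)  # (N, K)
    codes[:, j] = dists.argmin(1)

# ADC: build m lookup tables of query-to-codeword distances, then score every
# encoded vector with m table lookups instead of d multiply-adds.
q = X[7]
qs = q.reshape(m, ds)
tables = ((qs[:, None, :] - codebooks) ** 2).sum(-1)      # (m, K)
approx = tables[np.arange(m)[None, :], codes].sum(1)      # (N,) approximate ||q - x||^2
```

Each approximate distance is exactly the sum of the $m$ per-subspace distances between the query and the stored codewords, which is why a single table build amortises over the whole database.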
GPU support. Most index types have GPU equivalents (GpuIndexFlat, GpuIndexIVFPQ) that achieve order-of-magnitude speed-ups by parallelising the distance computations across thousands of cores.
Composition. FAISS indices follow a factory pattern: `index_factory(d, "IVF4096,PQ32")` produces an IVF-PQ index with 4096 lists and 32-byte codes. Pre-processing transforms (PCA, OPQ rotation) can be prepended to improve PQ accuracy.
Choosing an index.
| Database size | Recommended index |
|---|---|
| $< 10^6$ | IndexFlatL2 (exact) |
| $10^6$–$10^8$ | IndexHNSWFlat or IVF with reranking |
| $> 10^8$ | IVFPQ or IVF-OPQ-PQ |
FAISS pairs naturally with embedding models: encode documents with a sentence transformer, store the vectors in FAISS, and query with the same encoder for semantic search.
Discussed in:
- Chapter 14: Generative Models, Retrieval