FAISS (Facebook AI Similarity Search) is a C++ library with Python bindings released by Johnson, Douze and Jégou (2017). It provides a unified API over a family of indices for nearest-neighbour search in high-dimensional vector spaces, scaling to billions of vectors on a single machine and supporting GPU acceleration. FAISS underpins production retrieval at Meta and is widely used in retrieval-augmented generation pipelines.
The retrieval problem. Given a query vector $\mathbf{q} \in \mathbb{R}^d$ and a database $\{\mathbf{x}_1, \ldots, \mathbf{x}_N\} \subset \mathbb{R}^d$, find the $k$ vectors closest to the query: smallest Euclidean ($\ell_2$) distance, or largest inner product. For the $\ell_2$ case:
$$\mathrm{kNN}(\mathbf{q}) = \mathop{\arg\min}_{S \subseteq [N],\, |S| = k}\; \sum_{i \in S} \|\mathbf{q} - \mathbf{x}_i\|_2^2.$$
Brute-force search costs $O(Nd)$ per query, which is infeasible when $N$ exceeds a few million.
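The brute-force baseline is a few lines of NumPy; this sketch performs the same $O(Nd)$ computation that FAISS's flat indices implement with optimised kernels:

```python
import numpy as np

def knn_brute_force(q, X, k):
    """Exact k-NN under squared L2: O(N*d) work per query."""
    d2 = ((X - q) ** 2).sum(axis=1)   # squared distances to all N vectors
    idx = np.argpartition(d2, k)[:k]  # k smallest, unordered, in O(N)
    return idx[np.argsort(d2[idx])]   # sort only the k survivors

rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 64)).astype("float32")
q = X[42] + 0.01 * rng.standard_normal(64).astype("float32")
print(knn_brute_force(q, X, 5)[0])  # a query near row 42 should return 42 first
```

The `argpartition` trick avoids a full $O(N \log N)$ sort, but the $O(Nd)$ distance pass remains, which is what the approximate indices below attack.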
Index families. FAISS exposes search structures as composable index types:
- IndexFlatL2 / IndexFlatIP. Exact brute-force search under $\ell_2$ or inner product; useful as a ground-truth baseline.
- IndexIVF. Inverted file with coarse quantisation. Run $k$-means with $n_{\text{list}}$ centroids and assign each database vector to its nearest centroid. At query time, search only the inverted lists of the $n_{\text{probe}}$ closest centroids, reducing comparisons by a factor of roughly $n_{\text{list}} / n_{\text{probe}}$ at the cost of recall.
- IndexHNSW. Hierarchical Navigable Small World graph (see HNSW), giving $O(\log N)$-style search times in practice, with no retraining needed when items are added.
- IndexPQ. Product quantisation. Split each vector into $m$ subvectors of dimension $d/m$; learn a separate $k$-means codebook of size $K$ for each subspace (typically $K = 256$, so one codebook index fits in a byte); encode each vector as $m$ bytes. Distances are approximated via per-query lookup tables, giving up to $32\times$ memory compression relative to float32 storage.
- IndexIVFPQ. Combines IVF coarse partitioning with PQ encoding of the residuals. The de facto standard for billion-scale search; see IVF-PQ.
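The PQ lookup-table mechanism is the least obvious item above; here is a minimal NumPy sketch of encoding and asymmetric distance computation (ADC). The codebooks are toy ones sampled from the data rather than learned by $k$-means, so this illustrates the mechanics, not the accuracy, of PQ:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, m, K = 5_000, 64, 8, 256          # m sub-spaces of dim d/m, K codewords each
ds = d // m
X = rng.standard_normal((N, d)).astype("float32")

# Toy codebooks of shape (m, K, ds): in real PQ each is learned by k-means
# on the sub-vectors; here we just sample K database vectors.
codebooks = X[rng.choice(N, K, replace=False)].reshape(K, m, ds).transpose(1, 0, 2)

# Encode: one codeword index per sub-space -> m bytes per vector
# (versus 4*d = 256 bytes of float32, i.e. 32x compression for these parameters).
Xs = X.reshape(N, m, ds)
codes = np.empty((N, m), dtype=np.uint8)
for j in range(m):
    dists = ((Xs[:, j, None, :] - codebooks[j][None]) ** 2).sum(-1)  # (N, K)
    codes[:, j] = dists.argmin(1)

# ADC: build m lookup tables of query-to-codeword distances, then score every
# encoded vector with m table lookups instead of d multiply-adds.
q = X[7]
qs = q.reshape(m, ds)
tables = ((qs[:, None, :] - codebooks) ** 2).sum(-1)      # (m, K)
approx = tables[np.arange(m)[None, :], codes].sum(1)      # (N,) approximate ||q - x||^2
```

Each approximate distance is exactly the sum of the $m$ per-subspace distances between the query and the stored codewords, which is why a single table build amortises over the whole database.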
GPU support. Most index types have GPU equivalents (GpuIndexFlat, GpuIndexIVFPQ) that achieve order-of-magnitude speed-ups by parallelising the distance computations across thousands of cores.
Composition. FAISS indices follow a factory pattern: `index_factory(d, "IVF4096,PQ32")` produces an IVF-PQ index with 4096 lists and 32-byte codes. Pre-processing transforms (PCA, OPQ rotation) can be prepended to improve PQ accuracy.
Choosing an index.
| Database size | Recommended index |
|---|---|
| $< 10^6$ | IndexFlatL2 (exact) |
| $10^6$–$10^8$ | IndexHNSWFlat or IVF with reranking |
| $> 10^8$ | IVFPQ or IVF-OPQ-PQ |
FAISS pairs naturally with embedding models: encode documents with a sentence transformer, store the vectors in FAISS, and query with the same encoder for semantic search.
Discussed in:
- Chapter 14: Generative Models, Retrieval