IVF-PQ is the workhorse approximate-nearest-neighbour index for billion-scale vector retrieval, combining two complementary techniques: an inverted file (IVF) for coarse partitioning and product quantisation (PQ) for compressed distance approximation. It is exposed in FAISS as IndexIVFPQ.
Coarse quantisation (IVF). Run k-means on a sample of the database with $n_{\text{list}}$ centroids $\{\mathbf{c}_1, \ldots, \mathbf{c}_{n_{\text{list}}}\}$. Each database vector $\mathbf{x}$ is assigned to its nearest centroid, whose index we write $q(\mathbf{x})$, and the residual $\mathbf{r} = \mathbf{x} - \mathbf{c}_{q(\mathbf{x})}$ is stored rather than $\mathbf{x}$ itself. At query time only the $n_{\text{probe}}$ centroids closest to the query $\mathbf{q}$ are examined, pruning the search space by a factor of roughly $n_{\text{list}} / n_{\text{probe}}$.
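A minimal numpy sketch of this step, assuming the centroids have already been trained by k-means (the helper name `assign_and_residual` is illustrative, not a FAISS API):

```python
import numpy as np

def assign_and_residual(X, centroids):
    """Brute-force coarse assignment. X: (N, d) database vectors,
    centroids: (n_list, d) pre-trained k-means centroids."""
    # Squared distance from every vector to every centroid: (N, n_list).
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    assign = d2.argmin(axis=1)            # q(x) for each vector
    residuals = X - centroids[assign]     # r = x - c_{q(x)}
    return assign, residuals
```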
Product quantisation (PQ). To compress the residuals, split each $d$-dimensional residual into $m$ subvectors of dimension $d/m$:
$$\mathbf{r} = (\mathbf{r}^{(1)}, \mathbf{r}^{(2)}, \ldots, \mathbf{r}^{(m)}), \quad \mathbf{r}^{(j)} \in \mathbb{R}^{d/m}.$$
For each subspace $j$ run k-means with $K$ centroids (typically $K = 256$ so each code fits in one byte), giving codebooks $\mathcal{C}^{(j)} = \{\mathbf{c}^{(j)}_1, \ldots, \mathbf{c}^{(j)}_K\}$. Encode $\mathbf{r}$ as $m$ codebook indices
$$(i_1, i_2, \ldots, i_m), \quad i_j = \arg\min_k \|\mathbf{r}^{(j)} - \mathbf{c}^{(j)}_k\|^2.$$
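A matching encoder sketch, with the same caveat that `pq_encode` and the `codebooks` layout are illustrative assumptions rather than a library API:

```python
import numpy as np

def pq_encode(residuals, codebooks):
    """residuals: (N, d); codebooks: (m, K, d/m) with K <= 256.
    Returns one byte per subspace: codes of shape (N, m)."""
    N, d = residuals.shape
    m, K, dsub = codebooks.shape
    sub = residuals.reshape(N, m, dsub)   # split into m subvectors
    codes = np.empty((N, m), dtype=np.uint8)
    for j in range(m):
        # Nearest codebook entry in subspace j for every vector.
        d2 = ((sub[:, j, None, :] - codebooks[j][None, :, :]) ** 2).sum(-1)
        codes[:, j] = d2.argmin(axis=1)
    return codes
```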
A vector originally taking $4d$ bytes (float32) is compressed to $m$ bytes; with $d = 768$ and $m = 64$ this is a $48\times$ memory reduction.
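Spelling out the arithmetic behind that figure:
$$\frac{4d}{m} \;=\; \frac{4 \times 768}{64} \;=\; \frac{3072 \text{ bytes}}{64 \text{ bytes}} \;=\; 48.$$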
Asymmetric distance computation. Given the (uncompressed) query $\mathbf{q}$ and the (compressed) database vector $\mathbf{x}$, the squared Euclidean distance to the reconstruction $\hat{\mathbf{x}}$ decomposes over subspaces (in the IVF setting, $\mathbf{q}$ here denotes the query residual $\mathbf{q} - \mathbf{c}_i$ for the probed list $i$):
$$\|\mathbf{q} - \hat{\mathbf{x}}\|^2 \;=\; \sum_{j=1}^{m} \|\mathbf{q}^{(j)} - \mathbf{c}^{(j)}_{i_j}\|^2.$$
This sum is computed efficiently by precomputing, once per query, an $m \times K$ lookup table $L[j, k] = \|\mathbf{q}^{(j)} - \mathbf{c}^{(j)}_k\|^2$ and adding $m$ entries per database vector ($O(m)$ rather than $O(d)$ per comparison).
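A numpy sketch of the table-based scan (`adc_distances` is a hypothetical helper; it assumes `q_res` is the query already residualised against the probed list's centroid):

```python
import numpy as np

def adc_distances(q_res, codes, codebooks):
    """q_res: (d,) query residual; codes: (N, m) uint8 PQ codes;
    codebooks: (m, K, d/m). Returns (N,) approximate squared distances."""
    m, K, dsub = codebooks.shape
    qsub = q_res.reshape(m, dsub)
    # L[j, k] = ||q^(j) - c^(j)_k||^2, built once per query.
    L = ((qsub[:, None, :] - codebooks) ** 2).sum(axis=-1)
    # Each database vector then costs m table lookups and adds: O(m).
    return L[np.arange(m), codes].sum(axis=1)
```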
Putting it together.
Index time. Train coarse quantiser; assign each $\mathbf{x}$ to a centroid; train PQ codebooks on residuals; encode and store $(q(\mathbf{x}), \text{PQ code})$ in inverted lists.
Query time. Find the $n_{\text{probe}}$ closest coarse centroids to $\mathbf{q}$. For each, compute the corresponding query residual $\mathbf{q} - \mathbf{c}_i$, build the PQ lookup table, scan the inverted list summing PQ-approximated distances, and maintain a size-$k$ max-heap keyed on distance, evicting the root whenever a closer vector is found.
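The same pipeline through the actual FAISS API; the toy data and parameter values below are illustrative placeholders, not tuned recommendations:

```python
import numpy as np
import faiss

d, nlist, m, nbits = 768, 1024, 64, 8              # nbits=8 -> K = 256
xb = np.random.rand(100_000, d).astype("float32")  # toy database
xq = np.random.rand(5, d).astype("float32")        # toy queries

quantizer = faiss.IndexFlatL2(d)                   # coarse quantiser
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

index.train(xb)   # k-means for IVF centroids + PQ codebooks on residuals
index.add(xb)     # assign, encode, and store codes in the inverted lists

index.nprobe = 32              # inverted lists scanned per query
D, I = index.search(xq, 10)    # top-10 approximate neighbours
```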
Trade-offs.
- $n_{\text{list}}$ sets coarse-search granularity; a common heuristic is $n_{\text{list}} \approx \sqrt{N}$ for $N$ database vectors.
- $n_{\text{probe}}$ trades recall against latency: probing more lists recovers vectors missed by the coarse quantiser, at the cost of scanning more codes.
- $m$ trades memory against accuracy: more subquantisers mean longer codes but finer distance approximations.
- An optional learned rotation applied before quantisation (OPQ) decorrelates the subspaces and improves PQ accuracy; see the factory-string sketch below.
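In FAISS the OPQ rotation can be prepended via an index-factory string; the concrete numbers here are again illustrative placeholders:

```python
import faiss

d = 768
# OPQ with 64 subspaces, then IVF with 1024 lists, then a 64-byte
# PQ code (8 bits per subquantiser by default).
index = faiss.index_factory(d, "OPQ64,IVF1024,PQ64")
```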
IVF-PQ is the preferred index above $\sim 10^8$ vectors, where graph methods like HNSW become memory-prohibitive.
Discussed in:
- Chapter 14: Generative Models, Retrieval