Glossary

InstantNGP

Instant Neural Graphics Primitives (InstantNGP) is the technique introduced by Thomas Müller, Alex Evans, Christoph Schied and Alexander Keller at NVIDIA in the SIGGRAPH 2022 paper "Instant Neural Graphics Primitives with a Multiresolution Hash Encoding". It accelerates NeRF training by 1000$\times$ and is the encoding behind NVIDIA's Instant-NeRF demos and many subsequent neural-graphics works.

Multi-resolution hash encoding. The bottleneck in vanilla NeRF is the MLP itself: it must memorise all the high-frequency scene detail, so it needs many parameters and a sinusoidal positional encoding with many frequency bands. InstantNGP swaps this for an explicit, learnable feature grid, stored sparsely via hashing, so that only a tiny MLP remains.

The encoding is parameterised by $L$ resolution levels with grid sizes $N_l = N_{\min} \cdot b^l$, geometrically increasing from $N_{\min} \approx 16$ to $N_{\max} \approx 2048$. Each level has $T$ feature vectors of dimension $F = 2$. To encode a 3D point $\mathbf{x}$:

  1. For each level $l$, find the surrounding grid cell, with corners at integer coordinates.
  2. Hash each corner index $\mathbf{i} \in \mathbb{Z}^3$ to an index in $[0, T)$ via the spatial hash $$h(\mathbf{i}) = \left( i_x \cdot \pi_1 \oplus i_y \cdot \pi_2 \oplus i_z \cdot \pi_3 \right) \bmod T$$ with large primes $\pi_k$.
  3. Look up the feature at each hashed corner.
  4. Trilinearly interpolate the eight corner features to get a feature at $\mathbf{x}$.
  5. Concatenate features from all $L$ levels into a single $L \cdot F$-dimensional vector.
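The five steps above can be sketched in NumPy as follows. This is a minimal reference for illustration only (function names like `encode_point` are made up here; the real implementation is fused CUDA, and the paper sets $\pi_1 = 1$):

```python
import numpy as np

# Primes from the paper's spatial hash (pi_1 = 1, pi_2, pi_3 large primes).
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint32)

def spatial_hash(corner, T):
    """Hash an integer corner coordinate (3,) into [0, T). uint32 wraps on overflow."""
    h = np.zeros(corner.shape[:-1], dtype=np.uint32)
    for d in range(3):
        h ^= corner[..., d].astype(np.uint32) * PRIMES[d]
    return h % np.uint32(T)

def encode_point(x, tables, N_min=16, b=1.5):
    """x: point in [0,1]^3; tables: list of (T, F) feature arrays, one per level."""
    feats = []
    for l, table in enumerate(tables):
        T, F = table.shape
        N = int(N_min * b**l)               # grid resolution at this level
        p = x * N                           # continuous grid coordinates
        c0 = np.floor(p).astype(np.int64)   # lower corner of the cell
        w = p - c0                          # trilinear interpolation weights
        acc = np.zeros(F)
        for dz in (0, 1):                   # visit the 8 cell corners
            for dy in (0, 1):
                for dx in (0, 1):
                    corner = c0 + np.array([dx, dy, dz])
                    weight = ((w[0] if dx else 1 - w[0]) *
                              (w[1] if dy else 1 - w[1]) *
                              (w[2] if dz else 1 - w[2]))
                    acc += weight * table[spatial_hash(corner, T)]
        feats.append(acc)
    return np.concatenate(feats)            # length L * F

rng = np.random.default_rng(0)
L_levels, T, F = 4, 2**14, 2                # small toy sizes, not the paper's
tables = [rng.normal(scale=1e-4, size=(T, F)) for _ in range(L_levels)]
vec = encode_point(np.array([0.3, 0.7, 0.5]), tables)
print(vec.shape)  # (8,) = L_levels * F
```

In the real system the tables are trained jointly with the MLP by backpropagating through the interpolation weights.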

This vector is the input to a tiny MLP (2 hidden layers, 64 units) that predicts colour and density.
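A hedged sketch of such a decoder, with illustrative shapes only (the paper actually uses two separate fused MLPs, with colour additionally conditioned on an encoded view direction):

```python
import numpy as np

def tiny_mlp(enc, params):
    """Toy 2-hidden-layer, 64-unit MLP: encoding -> (density, rgb).
    enc: (32,) vector, i.e. L = 16 levels x F = 2 features."""
    W1, W2, W3 = params
    h = np.maximum(W1 @ enc, 0.0)         # ReLU, 64 units
    h = np.maximum(W2 @ h, 0.0)           # ReLU, 64 units
    out = W3 @ h                          # 4 raw outputs: density + RGB
    density = np.exp(out[0])              # exponential activation keeps density > 0
    rgb = 1.0 / (1.0 + np.exp(-out[1:]))  # sigmoid keeps colour in [0, 1]
    return density, rgb

rng = np.random.default_rng(0)
params = [rng.normal(scale=0.1, size=s) for s in [(64, 32), (64, 64), (4, 64)]]
density, rgb = tiny_mlp(rng.normal(size=32), params)
```

Because the encoding carries the scene detail, this network is small enough to fit in on-chip memory, which is what makes the fully fused CUDA kernels possible.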

Why hashing works. At fine levels, a dense grid would need $N_{\max}^3 \approx 10^{10}$ cells, infeasible to store. The hash table caps storage at $T \approx 2^{19}$ entries per level. Hash collisions do occur, but they tend to involve empty space (most fine-grained voxels in real scenes are empty), and because gradients from visible, high-density samples dominate the colliding entries, training resolves the collisions that matter; the coarser, collision-free levels and the MLP disambiguate the rest.
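The storage saving is easy to verify with back-of-envelope arithmetic (assuming fp16 features, $F = 2$, and the paper's 16 levels):

```python
# Dense grid at the finest level vs. all 16 hashed levels combined.
F, bytes_per_feature = 2, 2                       # fp16 features, F = 2
dense_finest = 2048**3 * F * bytes_per_feature    # one dense grid at N_max = 2048
hashed_all = 16 * 2**19 * F * bytes_per_feature   # 16 levels, T = 2^19 each

print(f"dense finest level:      {dense_finest / 2**30:.1f} GiB")  # 32.0 GiB
print(f"hash tables (16 levels): {hashed_all / 2**20:.0f} MiB")    # 32 MiB
```

A three-orders-of-magnitude reduction, small enough to keep the tables in GPU memory with room to spare.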

Performance. Training time for a NeRF-quality scene drops from $\sim$24 hours (original NeRF) to a few seconds on a single RTX 3090. InstantNGP is implemented in CUDA and uses fully fused MLP kernels, achieving $\sim$10$\times$ the throughput of PyTorch-native NeRF implementations.

Beyond NeRF. The same multi-resolution hash encoding has been applied to:

  • Signed distance functions (SDFs) for 3D shape modelling.
  • Gigapixel image fitting (a single image as $f(x, y) \to \mathbf{c}$).
  • Volumetric path tracing of participating media.
  • Generative 3D priors (DreamFusion uses hash grids for the underlying NeRF).

Significance. InstantNGP was the first practical demonstration that the bottleneck in neural representations was the encoding, not the MLP. It made NeRFs interactive, helped kick off the explicit-vs-implicit representation debate (later pushed further toward the explicit end by Gaussian Splatting), and remains widely used wherever a small, fast neural field is needed.

Related terms: Neural Radiance Fields, Gaussian Splatting
