3D Gaussian Splatting (3DGS) is a scene representation introduced by Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler and George Drettakis at Inria in the SIGGRAPH 2023 paper "3D Gaussian Splatting for Real-Time Radiance Field Rendering". It has largely displaced NeRF for real-time radiance-field applications because it renders at 100+ FPS while matching or exceeding NeRF quality.
Representation. A scene is a set of $N \approx 10^5$–$10^7$ anisotropic 3D Gaussians. Each Gaussian $i$ has:
- Position $\boldsymbol{\mu}_i \in \mathbb{R}^3$.
- 3D covariance $\Sigma_i \in \mathbb{R}^{3 \times 3}$, parameterised as $\Sigma = R S S^\top R^\top$ with rotation $R$ (quaternion) and diagonal scale $S$.
- Opacity $\alpha_i \in [0, 1]$.
- View-dependent colour, encoded as third-order spherical harmonic coefficients $\mathbf{c}_i \in \mathbb{R}^{48}$ (16 SH basis functions per colour channel $\times$ 3 channels).
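As a sketch of how the SH coefficients turn into an RGB colour, the snippet below evaluates only the degree-0 and degree-1 terms for brevity (the full model goes to degree 3); the constants are the standard real spherical-harmonic normalisations, and the final `+ 0.5` offset-and-clamp mirrors the reference implementation's convention:

```python
import numpy as np

# Standard real spherical-harmonic constants for degrees 0 and 1.
SH_C0 = 0.28209479177387814   # 1 / (2 * sqrt(pi))
SH_C1 = 0.4886025119029199    # sqrt(3 / (4 * pi))

def sh_to_rgb(sh, d):
    """Evaluate view-dependent colour from SH coefficients.

    sh: (16, 3) coefficients per Gaussian (degree-3 storage); only the
        degree-0/1 terms are evaluated here for brevity.
    d:  unit view direction from the camera to the Gaussian centre.
    """
    x, y, z = d
    rgb = SH_C0 * sh[0]
    rgb += SH_C1 * (-y * sh[1] + z * sh[2] - x * sh[3])
    # Offset and clamp, following the reference implementation's convention.
    return np.clip(rgb + 0.5, 0.0, 1.0)
```

With all coefficients zero, every view direction yields mid-grey (0.5, 0.5, 0.5); the degree-1 terms add a linear dependence on the view direction.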
The density at point $\mathbf{x}$ contributed by Gaussian $i$ is
$$G_i(\mathbf{x}) = \exp\!\left(-\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^\top \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i)\right).$$
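The covariance factorisation and the density above can be sketched in NumPy as follows (function names are illustrative, not from the reference code):

```python
import numpy as np

def quat_to_rotmat(q):
    """Unit quaternion (w, x, y, z) -> 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)  # normalise defensively
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def covariance(q, scales):
    """Sigma = R S S^T R^T with diagonal scale matrix S.

    This factorisation guarantees Sigma is symmetric positive
    semi-definite for any quaternion and scale values.
    """
    R = quat_to_rotmat(q)
    S = np.diag(scales)
    return R @ S @ S.T @ R.T

def gaussian_density(x, mu, Sigma):
    """Unnormalised density G_i(x) of one anisotropic 3D Gaussian."""
    d = x - mu
    return float(np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d))
```

The quaternion/scale parameterisation is what makes the covariance optimisable by unconstrained gradient descent: any parameter values yield a valid (positive semi-definite) $\Sigma$, which would not hold if the six entries of $\Sigma$ were optimised directly.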
Differentiable rasterisation. Rendering does not march rays through a volume; instead, each 3D Gaussian is projected into screen space (its 2D covariance is $\Sigma' = J W \Sigma W^\top J^\top$ where $W$ is the view transform and $J$ is the Jacobian of the projection). Projected Gaussians are sorted by depth per tile and alpha-composited:
$$C(\mathbf{p}) = \sum_{i \in \mathcal{N}} \mathbf{c}_i \, \alpha_i G_i'(\mathbf{p}) \prod_{j < i} (1 - \alpha_j G_j'(\mathbf{p}))$$
with the front-to-back rule. This rasterisation is parallelised across tiles and is fully differentiable, allowing gradient flow back into the Gaussian parameters.
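The per-pixel compositing loop can be sketched as follows, where `alphas` holds the effective opacities $\alpha_i G_i'(\mathbf{p})$ of the depth-sorted Gaussians overlapping the pixel (the early-exit threshold is illustrative):

```python
import numpy as np

def composite_pixel(alphas, colors):
    """Front-to-back alpha compositing for one pixel.

    alphas: effective opacities alpha_i * G_i'(p), nearest Gaussian first.
    colors: corresponding RGB colours, shape (N, 3).
    """
    C = np.zeros(3)
    T = 1.0  # accumulated transmittance: prod_{j<i} (1 - alpha_j G_j'(p))
    for a, c in zip(alphas, colors):
        C += T * a * np.asarray(c, dtype=float)
        T *= (1.0 - a)
        if T < 1e-4:  # early exit once the pixel is effectively opaque
            break
    return C
```

For example, a half-opaque red Gaussian in front of a half-opaque blue one contributes 0.5 red and $0.5 \times 0.5 = 0.25$ blue, matching the equation above term by term.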
Optimisation. Training proceeds from a sparse SfM point cloud (typically from COLMAP). Initial Gaussians are placed at the SfM points; gradient descent on a photometric loss ($\mathcal{L}_1$ plus D-SSIM in the original paper) adjusts their parameters; periodic adaptive density control clones small high-gradient Gaussians (under-reconstruction), splits large high-gradient ones (over-reconstruction), and prunes nearly transparent Gaussians. Training takes roughly 30 minutes per scene on a single A100, one to two orders of magnitude faster than vanilla NeRF.
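One density-control step can be sketched as below; the thresholds are illustrative rather than the paper's exact values, and the split branch is omitted for brevity (it would replace each large high-gradient Gaussian with two smaller ones):

```python
import numpy as np

def densify_and_prune(mu, scales, opacity, grad_norm,
                      grad_thresh=2e-4, scale_thresh=0.01, min_opacity=0.005):
    """One adaptive density-control step (illustrative thresholds).

    mu:        (N, 3) Gaussian centres.
    scales:    (N, 3) per-axis scales.
    opacity:   (N,)   opacities.
    grad_norm: (N,)   accumulated view-space positional gradient norms.
    """
    keep = opacity >= min_opacity                       # prune transparent Gaussians
    clone = (keep & (grad_norm > grad_thresh)           # high reconstruction gradient
                  & (scales.max(axis=1) <= scale_thresh))  # and small: clone in place
    mu = np.concatenate([mu[keep], mu[clone]])
    scales = np.concatenate([scales[keep], scales[clone]])
    opacity = np.concatenate([opacity[keep], opacity[clone]])
    return mu, scales, opacity
```

The effect is that optimisation effort concentrates where the photometric error (and hence the gradient) is largest, while the pruning step keeps the Gaussian count from growing unboundedly.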
Why faster than NeRF? NeRF requires a forward pass of an MLP for every sample on every ray. 3DGS replaces the MLP with explicit primitives plus a sparse, sorted tile rasteriser, hardware-friendly for GPUs designed for triangle rendering.
Variants and extensions.
- 4D Gaussian Splatting. Adds time as a dimension for dynamic scenes.
- GS-LRM, LGM, Splatter Image. Feed-forward 3D reconstruction predicting Gaussians from one or a few views.
- 2DGS. Replaces 3D ellipsoids with 2D oriented disks for better surface reconstruction.
- SuGaR, GaussianAvatar. Mesh extraction and avatar applications.
Adoption. 3DGS is now standard in commercial photogrammetry pipelines (Polycam, Luma AI), VFX, augmented reality, and as the 3D backbone for several text-to-3D systems. Its real-time rendering on consumer GPUs (and even mobile devices) is the key practical advantage that NeRF never achieved.
Related terms: Neural Radiance Fields, InstantNGP
Discussed in:
- Chapter 11: CNNs, 3D Scene Representation