Glossary

Gaussian Splatting

3D Gaussian Splatting (3DGS) is a scene representation introduced by Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler and George Drettakis at Inria in the SIGGRAPH 2023 paper "3D Gaussian Splatting for Real-Time Radiance Field Rendering". It has largely displaced NeRF for real-time radiance-field applications because it renders at 100+ FPS while matching or exceeding NeRF quality.

Representation. A scene is a set of $N \approx 10^5$–$10^7$ anisotropic 3D Gaussians. Each Gaussian $i$ has:

  • Position $\boldsymbol{\mu}_i \in \mathbb{R}^3$.
  • 3D covariance $\Sigma_i \in \mathbb{R}^{3 \times 3}$, parameterised as $\Sigma = R S S^\top R^\top$ with rotation $R$ (quaternion) and diagonal scale $S$.
  • Opacity $\alpha_i \in [0, 1]$.
  • View-dependent colour, encoded as third-order spherical harmonic coefficients $\mathbf{c}_i \in \mathbb{R}^{48}$.

The density at point $\mathbf{x}$ contributed by Gaussian $i$ is

$$G_i(\mathbf{x}) = \exp\!\left(-\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^\top \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i)\right).$$
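The covariance parameterisation and density above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the reference implementation; the function names are invented for this example.

```python
import numpy as np

def covariance_from_params(q, s):
    """Build Sigma = R S S^T R^T from a quaternion q = (w, x, y, z)
    (normalised here) and per-axis scales s = (sx, sy, sz)."""
    w, x, y, z = q / np.linalg.norm(q)
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
    M = R @ np.diag(s)
    return M @ M.T  # = R S S^T R^T, symmetric positive semi-definite

def gaussian_density(x, mu, Sigma):
    """Unnormalised density G_i(x) = exp(-0.5 (x-mu)^T Sigma^{-1} (x-mu))."""
    d = x - mu
    return np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))
```

Factoring the covariance as $R S S^\top R^\top$ rather than optimising $\Sigma$ directly guarantees it stays positive semi-definite during gradient descent.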

Differentiable rasterisation. Rendering does not march rays through a volume; instead, each 3D Gaussian is projected into screen space (its 2D covariance is $\Sigma' = J W \Sigma W^\top J^\top$ where $W$ is the view transform and $J$ is the Jacobian of the projection). Projected Gaussians are sorted by depth per tile and alpha-composited:

$$C(\mathbf{p}) = \sum_{i \in \mathcal{N}} \mathbf{c}_i \, \alpha_i G_i'(\mathbf{p}) \prod_{j < i} \left(1 - \alpha_j G_j'(\mathbf{p})\right)$$

with the front-to-back rule. This rasterisation is parallelised across tiles and is fully differentiable, allowing gradient flow back into the Gaussian parameters.
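For a single pixel, the front-to-back rule amounts to accumulating colour weighted by a running transmittance. A minimal sketch, assuming the splats are already depth-sorted (nearest first) and their projected values $G_i'(\mathbf{p})$ are precomputed; the function name and early-termination threshold are illustrative:

```python
import numpy as np

def composite_pixel(alphas, colors, densities):
    """Front-to-back alpha compositing for one pixel.

    alphas[i]    -- per-Gaussian opacity alpha_i, sorted front to back
    colors[i]    -- RGB colour c_i for the current view direction
    densities[i] -- projected 2D Gaussian value G_i'(p) at this pixel
    """
    C = np.zeros(3)
    T = 1.0  # transmittance: prod_{j<i} (1 - alpha_j G_j'(p))
    for a, c, g in zip(alphas, colors, densities):
        w = a * g            # effective alpha of this splat at the pixel
        C += T * w * np.asarray(c, dtype=float)
        T *= (1.0 - w)
        if T < 1e-4:         # stop once the pixel is effectively opaque
            break
    return C
```

Because `T` only shrinks, splats behind an opaque surface contribute nothing, which is what makes early termination safe.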

Optimisation. Training proceeds from a sparse SfM point cloud (typically COLMAP). Initial Gaussians are placed at SfM points; gradient descent on a photometric loss adjusts their parameters; periodic adaptive density control clones small Gaussians in high-gradient, under-reconstructed regions, splits large ones in over-reconstructed regions, and prunes near-transparent ones. Training time on a single A100 is $\sim$30 minutes per scene, an order of magnitude faster than vanilla NeRF.

Why faster than NeRF? NeRF requires a forward pass of an MLP for every sample on every ray. 3DGS replaces the MLP with explicit primitives plus a sparse, depth-sorted tile rasteriser, a workload well suited to GPUs built for triangle rendering.

Variants and extensions.

  • 4D Gaussian Splatting. Adds time as a dimension for dynamic scenes.
  • GS-LRM, LGM, Splatter Image. Feed-forward 3D reconstruction predicting Gaussians from one or a few views.
  • 2DGS. Replaces 3D ellipsoids with 2D oriented disks for better surface reconstruction.
  • SuGaR, GaussianAvatar. Mesh extraction and avatar applications.

Adoption. 3DGS is now standard in commercial photogrammetry pipelines (Polycam, Luma AI), VFX, augmented reality, and as the 3D backbone for several text-to-3D systems. Its real-time rendering on consumer GPUs (and even mobile devices) is the key practical advantage that NeRF never achieved.

Related terms: Neural Radiance Fields, InstantNGP
