3D Gaussian Splatting (3DGS) is a scene representation introduced by Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler and George Drettakis at Inria in the SIGGRAPH 2023 paper "3D Gaussian Splatting for Real-Time Radiance Field Rendering". It has largely displaced NeRF for real-time radiance-field applications because it renders at 100+ FPS while matching or exceeding NeRF quality.
Representation. A scene is a set of $N \approx 10^5$–$10^7$ anisotropic 3D Gaussians. Each Gaussian $i$ has:
- Position $\boldsymbol{\mu}_i \in \mathbb{R}^3$.
- 3D covariance $\Sigma_i \in \mathbb{R}^{3 \times 3}$, parameterised as $\Sigma = R S S^\top R^\top$ with rotation $R$ (quaternion) and diagonal scale $S$.
- Opacity $\alpha_i \in [0, 1]$.
- View-dependent colour, encoded as third-order spherical harmonic coefficients $\mathbf{c}_i \in \mathbb{R}^{48}$ (16 SH basis functions per colour channel $\times$ 3 channels).
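As a sketch of how the SH coefficients turn into an RGB colour, the snippet below evaluates only the degree-0 and degree-1 terms for brevity (the full model goes to degree 3); the constants are the standard real spherical-harmonic normalisations, and the final `+ 0.5` offset-and-clamp mirrors the reference implementation's convention:

```python
import numpy as np

# Standard real spherical-harmonic constants for degrees 0 and 1.
SH_C0 = 0.28209479177387814   # 1 / (2 * sqrt(pi))
SH_C1 = 0.4886025119029199    # sqrt(3 / (4 * pi))

def sh_to_rgb(sh, d):
    """Evaluate view-dependent colour from SH coefficients.

    sh: (16, 3) coefficients per Gaussian (degree-3 storage); only the
        degree-0/1 terms are evaluated here for brevity.
    d:  unit view direction from the camera to the Gaussian centre.
    """
    x, y, z = d
    rgb = SH_C0 * sh[0]
    rgb += SH_C1 * (-y * sh[1] + z * sh[2] - x * sh[3])
    # Offset and clamp, following the reference implementation's convention.
    return np.clip(rgb + 0.5, 0.0, 1.0)
```

With all coefficients zero, every view direction yields mid-grey (0.5, 0.5, 0.5); the degree-1 terms add a linear dependence on the view direction.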
The density at point $\mathbf{x}$ contributed by Gaussian $i$ is
$$G_i(\mathbf{x}) = \exp\!\left(-\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^\top \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i)\right).$$
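The covariance factorisation and the density above can be sketched in NumPy as follows (function names are illustrative, not from the reference code):

```python
import numpy as np

def quat_to_rotmat(q):
    """Unit quaternion (w, x, y, z) -> 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)  # normalise defensively
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def covariance(q, scales):
    """Sigma = R S S^T R^T with diagonal scale matrix S.

    This factorisation guarantees Sigma is symmetric positive
    semi-definite for any quaternion and scale values.
    """
    R = quat_to_rotmat(q)
    S = np.diag(scales)
    return R @ S @ S.T @ R.T

def gaussian_density(x, mu, Sigma):
    """Unnormalised density G_i(x) of one anisotropic 3D Gaussian."""
    d = x - mu
    return float(np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d))
```

The quaternion/scale parameterisation is what makes the covariance optimisable by unconstrained gradient descent: any parameter values yield a valid (positive semi-definite) $\Sigma$, which would not hold if the six entries of $\Sigma$ were optimised directly.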
Differentiable rasterisation. Rendering does not march rays through a volume; instead, each 3D Gaussian is projected into screen space (its 2D covariance is $\Sigma' = J W \Sigma W^\top J^\top$ where $W$ is the view transform and $J$ is the Jacobian of the projection). Projected Gaussians are sorted by depth per tile and alpha-composited:
$$C(\mathbf{p}) = \sum_{i \in \mathcal{N}} \mathbf{c}_i \, \alpha_i G_i'(\mathbf{p}) \prod_{j < i} (1 - \alpha_j G_j'(\mathbf{p}))$$
with the front-to-back rule. This rasterisation is parallelised across tiles and is fully differentiable, allowing gradient flow back into the Gaussian parameters.
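The per-pixel compositing loop can be sketched as follows, where `alphas` holds the effective opacities $\alpha_i G_i'(\mathbf{p})$ of the depth-sorted Gaussians overlapping the pixel (the early-exit threshold is illustrative):

```python
import numpy as np

def composite_pixel(alphas, colors):
    """Front-to-back alpha compositing for one pixel.

    alphas: effective opacities alpha_i * G_i'(p), nearest Gaussian first.
    colors: corresponding RGB colours, shape (N, 3).
    """
    C = np.zeros(3)
    T = 1.0  # accumulated transmittance: prod_{j<i} (1 - alpha_j G_j'(p))
    for a, c in zip(alphas, colors):
        C += T * a * np.asarray(c, dtype=float)
        T *= (1.0 - a)
        if T < 1e-4:  # early exit once the pixel is effectively opaque
            break
    return C
```

For example, a half-opaque red Gaussian in front of a half-opaque blue one contributes 0.5 red and $0.5 \times 0.5 = 0.25$ blue, matching the equation above term by term.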
Optimisation. Training proceeds from a sparse SfM point cloud (typically from COLMAP). Initial Gaussians are placed at the SfM points; gradient descent on a photometric loss ($\mathcal{L}_1$ plus D-SSIM in the original paper) adjusts their parameters; periodic adaptive density control clones small high-gradient Gaussians (under-reconstruction), splits large high-gradient ones (over-reconstruction), and prunes nearly transparent Gaussians. Training takes roughly 30 minutes per scene on a single A100, one to two orders of magnitude faster than vanilla NeRF.
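One density-control step can be sketched as below; the thresholds are illustrative rather than the paper's exact values, and the split branch is omitted for brevity (it would replace each large high-gradient Gaussian with two smaller ones):

```python
import numpy as np

def densify_and_prune(mu, scales, opacity, grad_norm,
                      grad_thresh=2e-4, scale_thresh=0.01, min_opacity=0.005):
    """One adaptive density-control step (illustrative thresholds).

    mu:        (N, 3) Gaussian centres.
    scales:    (N, 3) per-axis scales.
    opacity:   (N,)   opacities.
    grad_norm: (N,)   accumulated view-space positional gradient norms.
    """
    keep = opacity >= min_opacity                       # prune transparent Gaussians
    clone = (keep & (grad_norm > grad_thresh)           # high reconstruction gradient
                  & (scales.max(axis=1) <= scale_thresh))  # and small: clone in place
    mu = np.concatenate([mu[keep], mu[clone]])
    scales = np.concatenate([scales[keep], scales[clone]])
    opacity = np.concatenate([opacity[keep], opacity[clone]])
    return mu, scales, opacity
```

The effect is that optimisation effort concentrates where the photometric error (and hence the gradient) is largest, while the pruning step keeps the Gaussian count from growing unboundedly.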
Why faster than NeRF? NeRF requires a forward pass of an MLP for every sample on every ray. 3DGS replaces the MLP with explicit primitives plus a sparse, sorted tile rasteriser, hardware-friendly for GPUs designed for triangle rendering.
Variants and extensions.
- 4D Gaussian Splatting. Adds time as a dimension for dynamic scenes.
- GS-LRM, LGM, Splatter Image. Feed-forward 3D reconstruction predicting Gaussians from one or a few views.
- 2DGS. Replaces 3D ellipsoids with 2D oriented disks for better surface reconstruction.
- SuGaR, GaussianAvatar. Mesh extraction and avatar applications.
Adoption. 3DGS is now standard in commercial photogrammetry pipelines (Polycam, Luma AI), VFX, augmented reality, and as the 3D backbone for several text-to-3D systems. Its real-time rendering on consumer GPUs (and even mobile devices) is the key practical advantage that NeRF never achieved.
Related terms: Neural Radiance Fields, InstantNGP
Discussed in:
- Chapter 11: CNNs, 3D Scene Representation