Gibbs sampling is a special case of Metropolis-Hastings in which each step updates one variable at a time, sampling it from its conditional distribution given the current values of all the others:
$$x_i^{(t+1)} \sim P(x_i \mid x_{-i}^{(t)})$$
where $x_{-i}^{(t)}$ denotes the current values of all variables other than $x_i$ (in a systematic scan, components updated earlier in the sweep are already at their new values).
Viewed as a Metropolis-Hastings proposal, this update is accepted with probability 1, provided the conditional distributions are sampled exactly. Under mild regularity conditions (e.g., strictly positive conditionals), the chain has the joint distribution $P(x)$ as its stationary distribution and converges to it.
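To make the mechanics concrete, here is a minimal sketch (plain NumPy, no inference library; the function name and settings are illustrative) of a Gibbs sampler for a standard bivariate normal with correlation $\rho$, where both full conditionals are univariate normals:

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_samples=5000, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each full conditional is itself normal:
        x | y ~ N(rho * y, 1 - rho^2)
        y | x ~ N(rho * x, 1 - rho^2)
    so every "proposal" is accepted with probability 1.
    """
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0                      # arbitrary starting point
    sd = np.sqrt(1 - rho**2)             # conditional standard deviation
    samples = np.empty((n_samples, 2))
    for t in range(n_samples):
        x = rng.normal(rho * y, sd)      # sample x | y
        y = rng.normal(rho * x, sd)      # sample y | x (uses the new x)
        samples[t] = (x, y)
    return samples

samples = gibbs_bivariate_normal(rho=0.9)
print(np.corrcoef(samples[1000:].T))     # empirical correlation ~ 0.9 after burn-in
```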
Why this is useful: many joint distributions are intractable to sample from directly but have tractable conditionals. Bayesian networks, undirected graphical models, mixture models, and latent Dirichlet allocation all admit straightforward conditional sampling.
Block Gibbs sampling: update groups of variables jointly when their conditional distribution is tractable. Often dramatically improves mixing.
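As a sketch of why blocking helps (covariance values and names chosen purely for illustration), the following toy example samples a zero-mean trivariate Gaussian by updating the strongly correlated pair $(x_1, x_2)$ jointly from its bivariate conditional given $x_3$, using the standard Gaussian conditioning formulas:

```python
import numpy as np

def block_gibbs_trivariate(Sigma, n_samples=5000, seed=0):
    """Block Gibbs for a zero-mean trivariate Gaussian: update (x1, x2)
    jointly given x3, then x3 given (x1, x2)."""
    rng = np.random.default_rng(seed)
    S11, S12, S22 = Sigma[:2, :2], Sigma[:2, 2:], Sigma[2:, 2:]
    # Conditional of the (x1, x2) block given x3 (Gaussian conditioning).
    A = S12 @ np.linalg.inv(S22)               # regression coefficients
    cond_cov_12 = S11 - A @ S12.T              # 2x2 conditional covariance
    # Conditional of x3 given (x1, x2).
    B = S12.T @ np.linalg.inv(S11)
    cond_var_3 = (S22 - B @ S12).item()
    x = np.zeros(3)
    samples = np.empty((n_samples, 3))
    for t in range(n_samples):
        mean12 = A @ x[2:]
        x[:2] = rng.multivariate_normal(mean12, cond_cov_12)  # joint block update
        x[2] = rng.normal((B @ x[:2]).item(), np.sqrt(cond_var_3))
        samples[t] = x
    return samples

# x1 and x2 are strongly correlated; updating them jointly avoids the
# tiny coordinate-wise moves that single-site Gibbs would make.
Sigma = np.array([[1.0, 0.95, 0.3],
                  [0.95, 1.0, 0.3],
                  [0.3, 0.3, 1.0]])
samples = block_gibbs_trivariate(Sigma)
```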
Collapsed Gibbs sampling: integrate out some variables analytically before sampling. Used in LDA (collapse out per-document topic mixtures and topic-word distributions, sample only word-topic assignments).
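A compact sketch of the collapsed update for LDA on a toy corpus (the corpus, hyperparameters, and function name are made up for illustration): after removing a token's current assignment from the count tables, its topic is resampled from a distribution proportional to $(n_{dk} + \alpha)(n_{kw} + \beta)/(n_k + V\beta)$.

```python
import numpy as np

def collapsed_gibbs_lda(docs, V, K, alpha=0.1, beta=0.01, n_iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA on a toy corpus.

    `docs` is a list of word-id lists. The doc-topic mixtures and
    topic-word distributions are integrated out, so the only sampled
    state is the topic assignment z of each word token.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), K))          # doc-topic counts
    n_kw = np.zeros((K, V))                  # topic-word counts
    n_k = np.zeros(K)                        # topic totals
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):           # random initialization
        for n, w in enumerate(doc):
            k = z[d][n]
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[d][n]                  # remove this token's assignment
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # P(z = k | everything else), theta and phi collapsed out:
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][n] = k                  # record the new assignment
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return z, n_dk, n_kw

# Toy corpus: 6 documents over a vocabulary of 6 word ids, 2 topics.
docs = [[0, 1, 0, 2], [1, 0, 2, 0], [0, 2, 1],
        [3, 4, 5, 3], [4, 3, 5], [5, 4, 3, 4]]
z, n_dk, n_kw = collapsed_gibbs_lda(docs, V=6, K=2)
print(n_kw)   # word counts should separate into two topic clusters
```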
Limitations:
- Slow mixing when variables are strongly correlated, since the chain moves one coordinate at a time.
- Requires tractable conditionals, which are not always available.
- Struggles with non-conjugate continuous models, where the conditionals lack closed form (Hamiltonian Monte Carlo and stochastic variational inference typically handle these better).
Despite these limitations, Gibbs sampling remains a workhorse of probabilistic graphical model inference: it is the core algorithm in JAGS and BUGS, and PyMC falls back to Gibbs-style steps for discrete variables (Stan, by contrast, does not sample discrete parameters and requires marginalising them out).
Related terms: MCMC, Latent Dirichlet Allocation, Bayesian Inference
Discussed in:
- Chapter 4: Probability