The Dirichlet distribution $\mathrm{Dir}(\alpha)$ is a distribution over the $(K-1)$-simplex, the set of probability vectors $(p_1, \ldots, p_K)$ with $p_k \geq 0$ and $\sum_k p_k = 1$. Density:
$$\mathrm{Dir}(p | \alpha) = \frac{1}{B(\alpha)} \prod_{k=1}^K p_k^{\alpha_k - 1}$$
where the $\alpha_k > 0$ are concentration parameters and $B(\alpha) = \frac{\prod_k \Gamma(\alpha_k)}{\Gamma(\sum_k \alpha_k)}$ is the multivariate Beta function.
Mean: $\mathbb{E}[p_k] = \alpha_k / \alpha_0$ where $\alpha_0 = \sum_k \alpha_k$ is the total concentration.
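A quick numerical check of the density formula and the mean, assuming `scipy` is available; the `alpha` and the point `p` below are arbitrary illustrative values:

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import dirichlet

alpha = np.array([2.0, 3.0, 5.0])   # illustrative concentration parameters
p = np.array([0.2, 0.3, 0.5])       # a point on the 2-simplex

# log B(alpha) = sum_k log Gamma(alpha_k) - log Gamma(alpha_0)
log_B = gammaln(alpha).sum() - gammaln(alpha.sum())
log_density = ((alpha - 1) * np.log(p)).sum() - log_B

print(np.exp(log_density))          # density from the formula above
print(dirichlet.pdf(p, alpha))      # same value via scipy
print(dirichlet.mean(alpha), alpha / alpha.sum())  # E[p_k] = alpha_k / alpha_0
```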
Concentration: the larger $\alpha_0$, the more peaked the distribution is around its mean. A symmetric $\mathrm{Dir}(\alpha, \alpha, \ldots, \alpha)$ behaves as follows (see the sketch after this list):
- $\alpha = 1$: uniform over the simplex.
- $\alpha < 1$: peaks toward the corners (sparse distributions).
- $\alpha > 1$: peaks toward the centroid (uniform-like distributions).
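A minimal sketch of the concentration effect using NumPy's built-in sampler; $K = 5$ and the three $\alpha$ values are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
K, n = 5, 10_000
for a in (0.1, 1.0, 10.0):
    samples = rng.dirichlet(np.full(K, a), size=n)
    # The average largest component is near 1 for sparse draws (alpha < 1)
    # and approaches 1/K as mass concentrates at the centroid (alpha > 1).
    print(f"alpha = {a:4}: E[max_k p_k] ~ {samples.max(axis=1).mean():.3f}")
```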
The Dirichlet is the conjugate prior for the categorical/multinomial. Given prior $\mathrm{Dir}(\alpha)$ and observed counts $n_1, \ldots, n_K$ with $N = \sum_k n_k$, the posterior is $\mathrm{Dir}(\alpha + n)$. The posterior mean of $p_k$ is
$$\hat p_k = \frac{\alpha_k + n_k}{\alpha_0 + N},$$
which is exactly the maximum-likelihood estimate with Laplace (add-one) smoothing when $\alpha_k = 1$, and add-$\alpha$ smoothing more generally.
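A minimal sketch of the conjugate update; the uniform prior and the counts are illustrative:

```python
import numpy as np

alpha = np.ones(3)              # Dir(1, 1, 1): uniform over the simplex
counts = np.array([8, 1, 1])    # observed counts, N = 10

posterior = alpha + counts      # conjugacy: posterior is Dir(alpha + n)
p_hat = posterior / posterior.sum()
print(p_hat)                    # (n_k + 1) / (N + K): Laplace smoothing
```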
The Beta distribution is the $K = 2$ special case (the conjugate prior for the Bernoulli).
In AI / ML:
- Latent Dirichlet Allocation uses Dirichlet priors on document-topic and topic-word distributions.
- Bayesian non-parametrics: the Dirichlet process (the limit of a symmetric Dirichlet as $K \to \infty$ with fixed total concentration) underlies infinite mixture models.
- Stick-breaking construction: an explicit sequential sampling scheme for Dirichlet (and Dirichlet-process) variables; see the sketch after this list.
- Categorical reparameterisation: a Dirichlet draw supplies the parameter vector of a categorical distribution, which is then sampled from.
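A sketch of stick-breaking for the finite Dirichlet, using the fact that the $k$-th component, rescaled by the stick remaining after the first $k-1$ breaks, is $\mathrm{Beta}(\alpha_k, \sum_{j>k} \alpha_j)$-distributed; `stick_breaking_dirichlet` is a hypothetical helper name:

```python
import numpy as np

def stick_breaking_dirichlet(alpha, rng=None):
    """Draw p ~ Dir(alpha) by breaking a unit stick left to right (illustrative helper)."""
    rng = rng or np.random.default_rng()
    alpha = np.asarray(alpha, dtype=float)
    K = len(alpha)
    p = np.empty(K)
    remaining = 1.0                          # length of stick not yet assigned
    for k in range(K - 1):
        v = rng.beta(alpha[k], alpha[k + 1:].sum())
        p[k] = v * remaining                 # break off a Beta-distributed fraction
        remaining *= 1.0 - v
    p[-1] = remaining                        # last component takes what is left
    return p

print(stick_breaking_dirichlet([2.0, 3.0, 5.0]))
```

For the Dirichlet process, the same scheme with $v_k \sim \mathrm{Beta}(1, \alpha_0)$ and an unbounded number of breaks yields the stick-breaking (GEM) weights.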
Sampling: draw independent Gamma-distributed variables $g_k \sim \mathrm{Gamma}(\alpha_k, 1)$, then $p = g / \sum_j g_j$. The result is exactly Dirichlet-distributed.
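The Gamma-normalisation recipe in code; `sample_dirichlet` is a hypothetical helper (NumPy's `Generator.dirichlet` provides the same functionality directly):

```python
import numpy as np

def sample_dirichlet(alpha, size=1, rng=None):
    """Draw `size` samples from Dir(alpha) by normalising independent Gammas."""
    rng = rng or np.random.default_rng()
    alpha = np.asarray(alpha, dtype=float)
    g = rng.gamma(shape=alpha, size=(size, len(alpha)))  # g_k ~ Gamma(alpha_k, 1)
    return g / g.sum(axis=1, keepdims=True)              # p = g / sum_j g_j

print(sample_dirichlet([2.0, 3.0, 5.0], size=3))
```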
Related terms: Categorical Distribution, Latent Dirichlet Allocation, Bayesian Inference
Discussed in:
- Chapter 4: Probability