The Dirichlet distribution $\mathrm{Dir}(\alpha)$ is a distribution over the $(K-1)$-simplex, the set of probability vectors $(p_1, \ldots, p_K)$ with $p_k \geq 0$ and $\sum_k p_k = 1$. Density:
$$\mathrm{Dir}(p | \alpha) = \frac{1}{B(\alpha)} \prod_{k=1}^K p_k^{\alpha_k - 1}$$
where the $\alpha_k > 0$ are concentration parameters and $B(\alpha) = \frac{\prod_k \Gamma(\alpha_k)}{\Gamma(\sum_k \alpha_k)}$ is the multivariate Beta function.
Mean: $\mathbb{E}[p_k] = \alpha_k / \alpha_0$ where $\alpha_0 = \sum_k \alpha_k$ is the total concentration.
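A quick numerical check of the density formula and the mean, assuming `scipy` is available; the `alpha` and the point `p` below are arbitrary illustrative values:

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import dirichlet

alpha = np.array([2.0, 3.0, 5.0])   # illustrative concentration parameters
p = np.array([0.2, 0.3, 0.5])       # a point on the 2-simplex

# log B(alpha) = sum_k log Gamma(alpha_k) - log Gamma(alpha_0)
log_B = gammaln(alpha).sum() - gammaln(alpha.sum())
log_density = ((alpha - 1) * np.log(p)).sum() - log_B

print(np.exp(log_density))          # density from the formula above
print(dirichlet.pdf(p, alpha))      # same value via scipy
print(dirichlet.mean(alpha), alpha / alpha.sum())  # E[p_k] = alpha_k / alpha_0
```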
Concentration: the larger $\alpha_0$, the more peaked the distribution is around its mean. A symmetric $\mathrm{Dir}(\alpha, \alpha, \ldots, \alpha)$ behaves as follows (see the sketch after this list):
- $\alpha = 1$: uniform over the simplex.
- $\alpha < 1$: peaks toward the corners (sparse distributions).
- $\alpha > 1$: peaks toward the centroid (uniform-like distributions).
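A minimal sketch of the concentration effect using NumPy's built-in sampler; $K = 5$ and the three $\alpha$ values are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
K, n = 5, 10_000
for a in (0.1, 1.0, 10.0):
    samples = rng.dirichlet(np.full(K, a), size=n)
    # The average largest component is near 1 for sparse draws (alpha < 1)
    # and approaches 1/K as mass concentrates at the centroid (alpha > 1).
    print(f"alpha = {a:4}: E[max_k p_k] ~ {samples.max(axis=1).mean():.3f}")
```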
The Dirichlet is the conjugate prior for the categorical/multinomial. Given prior $\mathrm{Dir}(\alpha)$ and observed counts $n_1, \ldots, n_K$ with $N = \sum_k n_k$, the posterior is $\mathrm{Dir}(\alpha + n)$. The posterior mean of $p_k$ is
$$\hat p_k = \frac{\alpha_k + n_k}{\alpha_0 + N},$$
which is exactly the maximum-likelihood estimate with Laplace (add-one) smoothing when $\alpha_k = 1$, and add-$\alpha$ smoothing more generally.
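A minimal sketch of the conjugate update; the uniform prior and the counts are illustrative:

```python
import numpy as np

alpha = np.ones(3)              # Dir(1, 1, 1): uniform over the simplex
counts = np.array([8, 1, 1])    # observed counts, N = 10

posterior = alpha + counts      # conjugacy: posterior is Dir(alpha + n)
p_hat = posterior / posterior.sum()
print(p_hat)                    # (n_k + 1) / (N + K): Laplace smoothing
```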
The Beta distribution is the $K = 2$ special case (the conjugate prior for the Bernoulli).
In AI / ML:
- Latent Dirichlet Allocation uses Dirichlet priors on document-topic and topic-word distributions.
- Bayesian non-parametrics: the Dirichlet process (the limit of a symmetric Dirichlet as $K \to \infty$ with fixed total concentration) underlies infinite mixture models.
- Stick-breaking construction: an explicit sequential sampling scheme for Dirichlet (and Dirichlet-process) variables; see the sketch after this list.
- Categorical reparameterisation: a Dirichlet draw supplies the parameter vector of a categorical distribution, which is then sampled from.
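A sketch of stick-breaking for the finite Dirichlet, using the fact that the $k$-th component, rescaled by the stick remaining after the first $k-1$ breaks, is $\mathrm{Beta}(\alpha_k, \sum_{j>k} \alpha_j)$-distributed; `stick_breaking_dirichlet` is a hypothetical helper name:

```python
import numpy as np

def stick_breaking_dirichlet(alpha, rng=None):
    """Draw p ~ Dir(alpha) by breaking a unit stick left to right (illustrative helper)."""
    rng = rng or np.random.default_rng()
    alpha = np.asarray(alpha, dtype=float)
    K = len(alpha)
    p = np.empty(K)
    remaining = 1.0                          # length of stick not yet assigned
    for k in range(K - 1):
        v = rng.beta(alpha[k], alpha[k + 1:].sum())
        p[k] = v * remaining                 # break off a Beta-distributed fraction
        remaining *= 1.0 - v
    p[-1] = remaining                        # last component takes what is left
    return p

print(stick_breaking_dirichlet([2.0, 3.0, 5.0]))
```

For the Dirichlet process, the same scheme with $v_k \sim \mathrm{Beta}(1, \alpha_0)$ and an unbounded number of breaks yields the stick-breaking (GEM) weights.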
Sampling: draw independent Gamma-distributed variables $g_k \sim \mathrm{Gamma}(\alpha_k, 1)$, then $p = g / \sum_j g_j$. The result is exactly Dirichlet-distributed.
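The Gamma-normalisation recipe in code; `sample_dirichlet` is a hypothetical helper (NumPy's `Generator.dirichlet` provides the same functionality directly):

```python
import numpy as np

def sample_dirichlet(alpha, size=1, rng=None):
    """Draw `size` samples from Dir(alpha) by normalising independent Gammas."""
    rng = rng or np.random.default_rng()
    alpha = np.asarray(alpha, dtype=float)
    g = rng.gamma(shape=alpha, size=(size, len(alpha)))  # g_k ~ Gamma(alpha_k, 1)
    return g / g.sum(axis=1, keepdims=True)              # p = g / sum_j g_j

print(sample_dirichlet([2.0, 3.0, 5.0], size=3))
```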
Related terms: Categorical Distribution, Latent Dirichlet Allocation, Bayesian Inference
Discussed in:
- Chapter 4: Probability