A Probability Distribution specifies how probability mass or density is allocated across the possible values of a random variable. Choosing the right distribution family is one of the most consequential decisions in applied machine learning, since it encodes structural assumptions about the data-generating process.
Key discrete distributions include the Bernoulli (single binary trial), Binomial ($n$ independent Bernoulli trials), Categorical (generalises Bernoulli to $K$ outcomes; the natural output of a softmax classifier), Multinomial (categorical with multiple draws), and Poisson (count of events in a fixed interval). Key continuous distributions include the Gaussian or normal (ubiquitous thanks to the Central Limit Theorem), Uniform, Exponential, Beta (for probabilities in $[0, 1]$), Dirichlet (for probability simplices), and Gamma.
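Each of the families above is available as a sampler in NumPy. The following sketch (parameter values are illustrative, not from the source) draws from every distribution listed, so the parameterisations can be compared side by side:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator for reproducibility

# Discrete families
bern  = rng.binomial(n=1, p=0.3, size=5)            # Bernoulli: single binary trial
binom = rng.binomial(n=10, p=0.3, size=5)           # Binomial: 10 Bernoulli trials
cat   = rng.choice(3, size=5, p=[0.2, 0.5, 0.3])    # Categorical over K=3 outcomes
multi = rng.multinomial(n=10, pvals=[0.2, 0.5, 0.3])  # Multinomial: counts over 10 draws
pois  = rng.poisson(lam=4.0, size=5)                # Poisson: event counts per interval

# Continuous families
gauss = rng.normal(loc=0.0, scale=1.0, size=5)      # Gaussian / normal
unif  = rng.uniform(0.0, 1.0, size=5)               # Uniform on [0, 1)
expo  = rng.exponential(scale=2.0, size=5)          # Exponential
beta  = rng.beta(a=2.0, b=5.0, size=5)              # Beta: supported on [0, 1]
diri  = rng.dirichlet(alpha=[1.0, 1.0, 1.0])        # Dirichlet: a point on the simplex
gamma = rng.gamma(shape=2.0, scale=1.0, size=5)     # Gamma

assert multi.sum() == 10               # multinomial counts total the number of draws
assert abs(diri.sum() - 1.0) < 1e-9    # Dirichlet samples lie on the probability simplex
```

Note how the structural claims in the text show up as invariants of the samples: Multinomial counts sum to the number of draws, and a Dirichlet draw sums to one.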
A unifying perspective is provided by the exponential family, a broad class with canonical form $p(x \mid \eta) = h(x)\exp(\eta^\top T(x) - A(\eta))$, where $\eta$ is the natural parameter, $T(x)$ the sufficient statistic, and $A(\eta)$ the log-partition function. Bernoulli, Gaussian, Poisson, Beta, Gamma, and Dirichlet are all exponential-family members. They enjoy elegant properties: conjugate priors, finite-dimensional sufficient statistics that capture all the information a sample carries about the parameter, and maximum-likelihood estimators that depend on the data only through those sufficient statistics. Generalised linear models pair an exponential-family response with a linear predictor through a link function—a powerful framework that includes linear and logistic regression as special cases.
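As a concrete instance of the canonical form, the Bernoulli distribution can be rewritten with $h(x) = 1$, $T(x) = x$, natural parameter $\eta = \log\frac{p}{1-p}$ (the log-odds), and $A(\eta) = \log(1 + e^\eta)$. A minimal check that the two parameterisations agree:

```python
import math

def bernoulli_pmf(x, p):
    # Standard parameterisation: p(x) = p^x (1-p)^(1-x) for x in {0, 1}
    return p ** x * (1 - p) ** (1 - x)

def bernoulli_expfam(x, eta):
    # Exponential-family form: h(x) = 1, T(x) = x, A(eta) = log(1 + e^eta)
    return math.exp(eta * x - math.log(1 + math.exp(eta)))

p = 0.3
eta = math.log(p / (1 - p))  # natural parameter: the log-odds of p

for x in (0, 1):
    assert abs(bernoulli_pmf(x, p) - bernoulli_expfam(x, eta)) < 1e-12
```

The same log-odds mapping is exactly the logit link used by logistic regression, which is why logistic regression falls out of the generalised-linear-model framework described above.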
Related terms: Gaussian Distribution, Random Variable
Discussed in:
- Chapter 4: Probability — Probability Distributions
Also defined in: Textbook of AI