A Probability Distribution specifies how probability mass or density is allocated across the possible values of a random variable. Choosing the right distribution family is one of the most consequential decisions in applied machine learning, since it encodes structural assumptions about the data-generating process.
Key discrete distributions include the Bernoulli (single binary trial), Binomial ($n$ independent Bernoulli trials), Categorical (generalises Bernoulli to $K$ outcomes; the natural output of a softmax classifier), Multinomial (categorical with multiple draws), and Poisson (count of events in a fixed interval). Key continuous distributions include the Gaussian or normal (ubiquitous thanks to the Central Limit Theorem), Uniform, Exponential, Beta (for probabilities in $[0, 1]$), Dirichlet (for probability simplices), and Gamma.
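Each of the families above is available as a sampler in NumPy. The following sketch (parameter values are illustrative, not from the source) draws from every distribution listed, so the parameterisations can be compared side by side:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator for reproducibility

# Discrete families
bern  = rng.binomial(n=1, p=0.3, size=5)            # Bernoulli: single binary trial
binom = rng.binomial(n=10, p=0.3, size=5)           # Binomial: 10 Bernoulli trials
cat   = rng.choice(3, size=5, p=[0.2, 0.5, 0.3])    # Categorical over K=3 outcomes
multi = rng.multinomial(n=10, pvals=[0.2, 0.5, 0.3])  # Multinomial: counts over 10 draws
pois  = rng.poisson(lam=4.0, size=5)                # Poisson: event counts per interval

# Continuous families
gauss = rng.normal(loc=0.0, scale=1.0, size=5)      # Gaussian / normal
unif  = rng.uniform(0.0, 1.0, size=5)               # Uniform on [0, 1)
expo  = rng.exponential(scale=2.0, size=5)          # Exponential
beta  = rng.beta(a=2.0, b=5.0, size=5)              # Beta: supported on [0, 1]
diri  = rng.dirichlet(alpha=[1.0, 1.0, 1.0])        # Dirichlet: a point on the simplex
gamma = rng.gamma(shape=2.0, scale=1.0, size=5)     # Gamma

assert multi.sum() == 10               # multinomial counts total the number of draws
assert abs(diri.sum() - 1.0) < 1e-9    # Dirichlet samples lie on the probability simplex
```

Note how the structural claims in the text show up as invariants of the samples: Multinomial counts sum to the number of draws, and a Dirichlet draw sums to one.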
A unifying perspective is provided by the exponential family, a broad class with canonical form $p(x \mid \eta) = h(x)\exp(\eta^\top T(x) - A(\eta))$, where $\eta$ is the natural parameter, $T(x)$ the sufficient statistic, and $A(\eta)$ the log-partition function. Bernoulli, Gaussian, Poisson, Beta, Gamma, and Dirichlet are all exponential-family members. They enjoy elegant properties: conjugate priors, finite-dimensional sufficient statistics that capture all the information a sample carries about the parameter, and maximum-likelihood estimators that depend on the data only through those sufficient statistics. Generalised linear models pair an exponential-family response with a linear predictor through a link function—a powerful framework that includes linear and logistic regression as special cases.
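As a concrete instance of the canonical form, the Bernoulli distribution can be rewritten with $h(x) = 1$, $T(x) = x$, natural parameter $\eta = \log\frac{p}{1-p}$ (the log-odds), and $A(\eta) = \log(1 + e^\eta)$. A minimal check that the two parameterisations agree:

```python
import math

def bernoulli_pmf(x, p):
    # Standard parameterisation: p(x) = p^x (1-p)^(1-x) for x in {0, 1}
    return p ** x * (1 - p) ** (1 - x)

def bernoulli_expfam(x, eta):
    # Exponential-family form: h(x) = 1, T(x) = x, A(eta) = log(1 + e^eta)
    return math.exp(eta * x - math.log(1 + math.exp(eta)))

p = 0.3
eta = math.log(p / (1 - p))  # natural parameter: the log-odds of p

for x in (0, 1):
    assert abs(bernoulli_pmf(x, p) - bernoulli_expfam(x, eta)) < 1e-12
```

The same log-odds mapping is exactly the logit link used by logistic regression, which is why logistic regression falls out of the generalised-linear-model framework described above.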
Related terms: Gaussian Distribution, Random Variable
Discussed in:
- Chapter 4: Probability — Probability Distributions
Also defined in: Textbook of AI