Glossary

Maximum Likelihood Estimation

Also known as: MLE

Maximum Likelihood Estimation (MLE) is the most widely used method for fitting parametric models. Given a parametric family $p(x \mid \theta)$ and i.i.d. data $D = \{x_1, \ldots, x_n\}$, the likelihood function is $L(\theta) = \prod_i p(x_i \mid \theta)$. The MLE is the value $\hat{\theta}_{MLE}$ that maximises this likelihood, or equivalently the log-likelihood $\ell(\theta) = \sum_i \log p(x_i \mid \theta)$. Working in log space improves numerical stability and converts products into sums.
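A minimal sketch of these definitions, using hypothetical Bernoulli coin-flip data: the log-likelihood $\ell(\theta) = \sum_i \log p(x_i \mid \theta)$ is evaluated on a grid, and its maximiser is the sample proportion.

```python
import math

# Hypothetical data: 7 heads in 10 tosses.
data = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

def log_likelihood(theta, xs):
    """Bernoulli log-likelihood: sum_i log p(x_i | theta)."""
    return sum(math.log(theta if x == 1 else 1 - theta) for x in xs)

# Evaluate on a grid over (0, 1); the maximiser is the sample proportion.
grid = [i / 100 for i in range(1, 100)]
theta_hat = max(grid, key=lambda t: log_likelihood(t, data))
print(theta_hat)  # → 0.7
```

Summing logs rather than multiplying probabilities is what keeps this stable: with hundreds of observations the raw product would underflow to zero.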

For many standard models, the MLE has a closed form. The MLE of a Gaussian's mean is the sample mean; of the Bernoulli parameter, the sample proportion. When no closed form exists—as in logistic regression or neural networks—the log-likelihood is maximised numerically, which is equivalent to minimising the negative log-likelihood via gradient descent. Training a classifier with cross-entropy loss is maximum likelihood estimation of the model's parameters.
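The logistic regression case can be sketched in a few lines: minimising the cross-entropy loss by gradient descent is exactly maximising the log-likelihood numerically. The toy 1-D data below are assumptions for illustration.

```python
import math

# Hypothetical 1-D binary classification data.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def nll(w, b):
    """Negative log-likelihood = cross-entropy loss summed over the data."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

# Gradient descent on the negative log-likelihood.
w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    gw = gb = 0.0
    for x, y in zip(xs, ys):
        err = sigmoid(w * x + b) - y  # derivative of cross-entropy w.r.t. the logit
        gw += err * x
        gb += err
    w -= lr * gw
    b -= lr * gb
```

After training, `nll(w, b)` is far below its starting value, and the positive slope `w` reflects that larger inputs predict class 1; no closed form for `w` exists here, which is why the numerical route is taken.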

Under mild regularity conditions, MLE is consistent (converges to the true parameter), asymptotically normal, and asymptotically efficient (attains the Cramér–Rao lower bound). Maximum a Posteriori (MAP) estimation extends MLE by adding a log-prior term: a Gaussian prior yields L2 regularisation (weight decay), a Laplace prior yields L1 regularisation (lasso). MAP thus unifies frequentist penalised estimation with Bayesian reasoning, revealing that many seemingly different training procedures are, under the hood, the same calculation viewed from different angles.
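The MAP-as-regularisation connection can be made concrete in the simplest setting: estimating a Gaussian mean (known variance 1) under a Gaussian prior $\mu \sim \mathcal{N}(0, \tau^2)$. Adding the log-prior turns the objective into an L2-penalised least-squares problem whose solution shrinks the sample mean toward the prior mean. The data and $\tau^2$ below are assumptions for illustration.

```python
# Objective: sum_i (x_i - mu)^2 / 2  +  mu^2 / (2 * tau2)
# (negative log-likelihood plus negative Gaussian log-prior).
# Setting the derivative to zero gives mu = sum(xs) / (n + 1/tau2).
xs = [2.1, 1.9, 2.3, 2.0, 1.7]  # hypothetical observations
tau2 = 1.0                       # prior variance (assumed)

n = len(xs)
mle = sum(xs) / n                     # unpenalised estimate: the sample mean
map_est = sum(xs) / (n + 1.0 / tau2)  # MAP estimate: shrunk toward the prior mean 0

print(mle, map_est)
```

As $\tau^2 \to \infty$ the prior flattens and the MAP estimate recovers the MLE; a tighter prior (smaller $\tau^2$) shrinks harder, exactly like increasing the weight-decay coefficient.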

Related terms: Cross-Entropy, Logistic Regression

Also defined in: Textbook of AI, Textbook of Medical AI