- Summarise data using measures of central tendency and spread, and choose appropriate summaries for the data type
- Distinguish population from sample and construct confidence intervals to quantify estimation uncertainty
- Formulate and test statistical hypotheses, interpreting p-values and the risks of Type I and Type II errors
- Derive parameter estimates via maximum likelihood estimation and connect MLE to loss minimisation
- Decompose prediction error into bias, variance, and irreducible noise and use the tradeoff to diagnose models
Probability asks: "Given a known process, what data will we see?" Statistics asks the reverse: "Given observed data, what process produced them?" This inverse question is the heart of machine learning. You always have a finite sample and must draw conclusions about the broader world — making predictions, estimating parameters, and judging whether patterns are real or just noise.
This chapter builds the statistical foundations you need. You will start with descriptive statistics, move through sampling and estimation, hypothesis testing, and maximum likelihood, and finish with the bias–variance tradeoff — the conceptual cornerstone that governs every learning algorithm. For deeper treatment, see Hastie, Tibshirani, and Friedman (Hastie, 2009) and Bishop (Bishop, 2006).
5.1 Descriptive Statistics
Before fitting any model, look at your data. Descriptive statistics give you the tools for that first look.
Central Tendency
- Mean: x̄ = (1/n) Σ xi. Minimises the sum of squared deviations. Sensitive to outliers — one extreme value can shift it a lot.
- Median: the middle value when sorted. Far more robust to outliers. Minimises the sum of absolute deviations.
- Mode: the most frequent value. The only measure that works for categorical data.
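As a minimal sketch on a made-up sample with one extreme value, the standard library shows how the mean is dragged by an outlier while the median and mode are not:

```python
import statistics

# Small hypothetical sample with one outlier (1000)
data = [2, 3, 3, 5, 7, 1000]

mean = statistics.mean(data)      # pulled far upward by the outlier
median = statistics.median(data)  # robust: middle of the sorted values
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)  # mean is 170, yet the median is only 4
```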
Spread
- Variance: s^2^ = (1/(n − 1)) Σ (xi − x̄)^2^. The n − 1 (Bessel's correction) makes this an unbiased estimate of the population variance.
- Standard deviation: s = √s^2^. Shares the units of the original data, making it easier to interpret.
- Interquartile range (IQR): the gap between the 75th and 25th percentiles. A robust alternative to the standard deviation.
- Range: maximum minus minimum. Simple but heavily affected by outliers.
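A sketch of these spread measures on a small hypothetical sample; note that `statistics.variance` already applies Bessel's n − 1 correction, and the quartiles come from `statistics.quantiles` with its default interpolation:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

var = statistics.variance(data)   # sample variance, n - 1 denominator
sd = statistics.stdev(data)       # same units as the data

# quantiles(..., n=4) returns the three quartile cut points
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                     # robust spread measure
rng = max(data) - min(data)       # simple but outlier-sensitive
```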
Shape
- Skewness measures asymmetry. Income distributions are typically right-skewed (long upper tail).
- Kurtosis measures tail heaviness relative to a Gaussian. Financial returns are often heavy-tailed, which is why Gaussian risk models can underestimate extreme events.
In AI, skewness and kurtosis guide the choice of data transforms (e.g., log or Box–Cox to reduce skew) and the choice of distributional assumptions in generative models.
Multivariate Summaries
For data with many features, the covariance matrix and correlation matrix capture pairwise linear links. Scatter plot matrices and correlation heat maps reveal clusters of redundant features. Spotting this early is key for feature engineering and for avoiding issues like multicollinearity in regression. For very high-dimensional data, automated tools (variance inflation factors, PCA) take over from visual inspection.
Why This Matters
It is tempting to skip straight to modelling. That is risky. Anscombe's quartet — four datasets with identical means, variances, and regression lines but wildly different scatter plots — shows that summary statistics can hide important structure. Always plot your data. Histograms, box plots, and scatter plots reveal outliers, class imbalance, missing-value patterns, and data-collection errors that would silently corrupt your models.
5.2 Sampling & Estimation
You observe a sample — a finite subset of a larger population — and want to draw conclusions about the population. The quality of your conclusions depends on how the sample was obtained.
Sampling
A simple random sample gives every possible subset equal probability. Other schemes — stratified, cluster, systematic — exploit known population structure to improve efficiency. In machine learning, the i.i.d. assumption (training examples drawn independently from a fixed distribution) is a form of random-sampling assumption. When this breaks — due to selection bias or distribution shift — models can fail badly in deployment.
Estimators
An estimator is a function of the sample that targets a population quantity. The sample mean X̄ estimates the population mean μ. Because the sample is random, the estimator is itself random — it has a sampling distribution.
Key properties of estimators:
- Bias: E[θ̂] − θ. An unbiased estimator has zero bias.
- Consistency: the estimator converges to the true value as n → ∞.
- Efficiency: among unbiased estimators, the one with the smallest variance wins.
The Central Limit Theorem
The CLT says that, for a sample of size n from a distribution with mean μ and finite variance σ^2^, the standardised sample mean:
(X̄ − μ) / (σ / √n)
converges to a standard normal as n → ∞. This holds regardless of the original distribution. It justifies normal-based confidence intervals and tests, even for non-Gaussian data. In practice, n ≥ 30 often suffices, though heavy-tailed data may need more.
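A quick simulation sketch of the CLT: draws from a heavily skewed exponential distribution (mean 1, sd 1), yet the distribution of sample means is approximately normal around μ = 1 with spread near σ/√n. The sample sizes and replication counts here are illustrative choices:

```python
import random
import statistics

random.seed(0)

# Mean of n = 30 draws from a skewed distribution (exponential, mean 1, sd 1)
def sample_mean(n):
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

means = [sample_mean(30) for _ in range(2000)]

# CLT: means cluster near mu = 1 with spread near sigma / sqrt(n) ≈ 0.18
m_avg = statistics.fmean(means)
m_sd = statistics.stdev(means)
print(round(m_avg, 3), round(m_sd, 3))
```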
Confidence Intervals
A confidence interval is a range that, over repeated sampling, contains the true parameter with a stated probability (typically 95%). For the mean with known variance: X̄ ± 1.96 σ/√n. With unknown variance, replace σ with the sample standard deviation s and use the t-distribution with n − 1 degrees of freedom.
The width shrinks as √n: more data gives more precision, but with diminishing returns. In AI, confidence intervals quantify uncertainty in performance metrics, predictions, and A/B tests.
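A minimal sketch of a 95% t-interval on a made-up sample of ten measurements; the t critical value for 9 degrees of freedom (≈ 2.262) is hard-coded here rather than looked up from a table:

```python
import math
import statistics

data = [4.8, 5.1, 5.3, 4.9, 5.6, 5.0, 5.2, 4.7, 5.4, 5.0]  # hypothetical measurements
n = len(data)
xbar = statistics.fmean(data)
s = statistics.stdev(data)          # sample sd, since sigma is unknown

t_crit = 2.262                      # 97.5th percentile of t with n - 1 = 9 df
half_width = t_crit * s / math.sqrt(n)
ci = (xbar - half_width, xbar + half_width)
```

Quadrupling n would only halve `half_width` — the diminishing returns of the √n rate.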
Sufficient Statistics and Efficiency Bounds
A sufficient statistic captures all the information in the data relevant to a parameter. For a Gaussian mean with known variance, the sample mean is sufficient. The Rao–Blackwell theorem says conditioning any estimator on a sufficient statistic yields an equal or better estimator. The Cramér–Rao lower bound gives an absolute floor on the variance of any unbiased estimator, defined by the Fisher information. These results tell you when you have found the best possible estimator.
5.3 Hypothesis Testing
Hypothesis testing gives you a formal framework for making yes/no decisions under uncertainty.
The Setup
- State a null hypothesis H0 (the status quo) and an alternative H1 (the claim you want to evaluate).
- Compute a test statistic from the data.
- Compare it to the distribution under H0.
- If it falls in the rejection region (determined by the significance level α, usually 0.05), reject H0.
Two Types of Error
- Type I (false positive): rejecting H0 when it is true. Probability = α.
- Type II (false negative): failing to reject H0 when it is false. Probability = β.
- Power = 1 − β: the probability of correctly rejecting a false H0.
In AI, Type I errors are false positives (a spam filter flagging good email) and Type II errors are false negatives (spam reaching the inbox). The parallel is direct.
P-Values
The p-value is the probability, under H0, of getting a result at least as extreme as what you observed. A small p-value is evidence against H0.
Critical caveats:
- The p-value is not the probability that H0 is true.
- Statistical significance does not imply practical significance. With enough data, trivially small effects can yield tiny p-values.
The ML community has increasingly moved toward reporting effect sizes, confidence intervals, and (in Bayesian settings) posterior probabilities. But hypothesis testing remains required in clinical AI, where regulators mandate controlled trials.
Common Tests
- z-test: large samples, known variance.
- t-test: small samples, unknown variance.
- Chi-squared test: categorical data.
- F-test: comparing variances or nested models.
- Non-parametric alternatives (Wilcoxon, Kruskal–Wallis, permutation tests) make fewer assumptions and are increasingly used to compare model performance.
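A sketch of a two-sample permutation test on hypothetical model accuracies: shuffle the group labels many times and ask how often a label-shuffled mean difference is at least as extreme as the observed one. The accuracy values and the number of permutations are illustrative:

```python
import random
import statistics

random.seed(1)

# Hypothetical accuracies of two models on repeated evaluation splits
a = [0.81, 0.83, 0.80, 0.84, 0.82, 0.85]
b = [0.78, 0.80, 0.77, 0.79, 0.81, 0.78]

observed = statistics.fmean(a) - statistics.fmean(b)
pooled = a + b

# Shuffle labels, recompute the mean difference, count extreme outcomes
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = statistics.fmean(pooled[:6]) - statistics.fmean(pooled[6:])
    if abs(diff) >= abs(observed):
        count += 1

p_value = count / trials
```

No normality assumption is needed — the null distribution is built directly from the data.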
Multiple Testing
Testing 100 hypotheses at α = 0.05 produces ~5 false positives by chance. The Bonferroni correction divides α by the number of tests (conservative). The Benjamini–Hochberg procedure controls the false discovery rate (FDR) — the expected proportion of rejections that are wrong — and is widely used in genomics, neuroscience, and feature selection. In ML, evaluating hundreds of hyperparameter configs on a validation set raises the same concern, motivating held-out test sets and nested cross-validation.
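The Benjamini–Hochberg step-up procedure can be sketched in a few lines: sort the p-values, find the largest rank k with p(k) ≤ (k/m)·α, and reject the k smallest. The p-values below are made up for illustration:

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Return indices of hypotheses rejected at FDR level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest rank k with p_(k) <= (k / m) * alpha
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    # Reject the k_max smallest p-values
    return sorted(order[:k_max])

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.5]
rejected = benjamini_hochberg(pvals)
```

Here Bonferroni (threshold 0.05/10 = 0.005) would reject only the first hypothesis; BH also rejects the second, illustrating its extra power at the same nominal error control.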
Bayesian Alternative
Bayes factors compare the marginal likelihoods of two models: BF10 = P(D | H1) / P(D | H0). Unlike p-values, Bayes factors can provide evidence for the null, not just fail to reject it. They naturally penalise complexity through the marginal likelihood — a built-in Occam's razor.
5.4 Maximum Likelihood Estimation
MLE is the most widely used method for fitting parametric models in machine learning.
The Idea
Given data D = {x1, …, xn} drawn i.i.d. from p(x | θ), the likelihood is:
L(θ) = ∏i=1^n^ p(xi | θ)
The MLE is the value of θ that maximises L(θ). In practice, you maximise the log-likelihood (sums are easier than products):
ℓ(θ) = Σi=1^n^ log p(xi | θ)
Examples
- Gaussian mean: MLE = sample mean.
- Gaussian variance: MLE = sample average of squared deviations (biased, but the bias vanishes as n grows).
- Bernoulli p: MLE = sample proportion of successes.
- Logistic regression, neural networks: no closed form. Minimise the negative log-likelihood with gradient descent.
The key insight: training a classifier with cross-entropy loss is MLE.
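As a numeric sketch of the Bernoulli example above, using a small made-up coin-flip sample: the log-likelihood, evaluated on a grid, peaks exactly at the sample proportion, matching the closed-form MLE.

```python
import math

flips = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
p_hat = sum(flips) / len(flips)   # closed-form MLE: sample proportion = 0.7

def loglik(p):
    # log-likelihood of an i.i.d. Bernoulli sample at parameter p
    return sum(math.log(p if x == 1 else 1 - p) for x in flips)

# A coarse grid search lands on the same value (log-likelihood is concave)
grid = [i / 100 for i in range(1, 100)]
best = max(grid, key=loglik)
```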
Large-Sample Properties
Under mild conditions, the MLE is (Hastie, 2009):
- Consistent: converges to the true parameter as n → ∞.
- Asymptotically normal: its distribution approaches a Gaussian.
- Asymptotically efficient: its variance hits the Cramér–Rao lower bound.
This makes MLE the default when sample sizes are large relative to parameters. But in high dimensions (deep learning), MLE can overfit — motivating regularisation.
MAP Estimation
The maximum a posteriori (MAP) estimator adds a prior:
θ̂MAP = argmaxθ [log p(D | θ) + log p(θ)]
- A Gaussian prior gives L2 regularisation (weight decay).
- A Laplace prior gives L1 regularisation (sparsity).
MAP unifies frequentist and Bayesian views: the same algorithm is either penalised MLE or the mode of the posterior. This duality recurs throughout machine learning.
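A sketch of MAP shrinkage for the simplest case: a Gaussian mean μ with known noise variance σ² and a Gaussian prior N(0, τ²) on μ. The data and variance values are made up; the closed-form posterior mode is a precision-weighted pull of the sample mean toward the prior mean 0:

```python
import statistics

# Hypothetical: data from N(mu, sigma^2), known sigma, prior mu ~ N(0, tau^2)
data = [2.3, 1.9, 2.7, 2.1, 2.5]
sigma2 = 1.0   # known noise variance (assumption)
tau2 = 0.5     # prior variance; smaller tau2 pulls harder toward 0

n = len(data)
xbar = statistics.fmean(data)

mu_mle = xbar  # MLE ignores the prior
# MAP: precision-weighted combination of data and prior (the L2 shrinkage effect)
mu_map = (n / sigma2 * xbar) / (n / sigma2 + 1 / tau2)
```

As τ² → ∞ the prior term vanishes and `mu_map` recovers `mu_mle` — penalised MLE and posterior mode are the same calculation.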
In Practice
For generalised linear models, Newton–Raphson converges fast and is standard. For neural networks, SGD and its adaptive variants (Adam, RMSProp) handle large data and non-convex loss surfaces. The negative log-likelihood landscape of a neural network is highly non-convex — yet SGD reliably finds solutions that generalise well, a phenomenon that remains an active area of research.
5.5 Bias–Variance Tradeoff
The bias–variance tradeoff is one of the most important ideas in machine learning. It explains why making a model more complex does not always help, and it provides the basis for regularisation, model selection, and ensembles.
The Decomposition
Suppose the true relationship is Y = f(X) + ε, where ε is noise with variance σ^2^. You train a model f̂ on a random sample. The expected squared error at a point x decomposes into three terms:
E[(Y − f̂(x))^2^] = σ^2^ + (Bias)^2^ + Variance
- σ^2^ (irreducible noise): inherent randomness. No model can remove it.
- Bias²: how far the average prediction is from the truth. High bias means the model systematically misses patterns.
- Variance: how much predictions change across different training sets. High variance means the model is unstable.
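The decomposition can be estimated by simulation. A minimal sketch with two made-up estimators of a mean: the plain sample mean (unbiased, higher variance) and a shrunken version (biased, lower variance). The shrunken estimator can achieve lower total error, which is the whole point of the tradeoff:

```python
import random
import statistics

random.seed(3)

mu_true, sigma, n = 2.0, 3.0, 10   # hypothetical true mean, noise sd, sample size

def draw_sample():
    return [random.gauss(mu_true, sigma) for _ in range(n)]

def plain(s):    # unbiased sample mean
    return statistics.fmean(s)

def shrunk(s):   # biased toward zero, but lower variance
    return 0.7 * statistics.fmean(s)

mse = {}
for est in (plain, shrunk):
    # Estimate bias and variance over many independent training samples
    values = [est(draw_sample()) for _ in range(5000)]
    bias = statistics.fmean(values) - mu_true
    var = statistics.variance(values)
    mse[est.__name__] = bias**2 + var   # decomposition, minus irreducible noise
```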
The Tradeoff
Simple models (e.g., linear regression with few features) have high bias and low variance. They miss patterns but are stable. Complex models (e.g., deep networks) have low bias and high variance. They capture more but are sensitive to the specific training data.
The optimal point minimises the sum. Reducing bias by adding complexity increases variance, and vice versa.
Regularisation Navigates the Tradeoff
All regularisation techniques introduce a small amount of bias to achieve a large reduction in variance:
- L2 (ridge): shrinks weights toward zero.
- L1 (lasso; Tibshirani, 1996): sets some weights exactly to zero (feature selection).
- Dropout (Srivastava, 2014): randomly deactivates neurons, preventing co-adaptation.
- Early stopping: halts training before the model memorises noise.
Ensembles
- Bagging (Breiman, 1996), e.g., random forests (Breiman, 2001): trains multiple models on bootstrap samples and averages. Reduces variance without much bias increase.
- Boosting (Friedman, 2001), e.g., XGBoost (Chen, 2016) and LightGBM: trains models sequentially, each focusing on previous errors. Primarily reduces bias.
Gradient-boosted trees dominate tabular data precisely because they navigate this tradeoff so effectively.
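The variance-reduction effect of bagging can be sketched with a toy stand-in for an unstable model — the median of a bootstrap resample — comparing one such "model" against an average of 25 of them. The dataset size and ensemble size are illustrative:

```python
import random
import statistics

random.seed(4)

data = [random.gauss(0, 1) for _ in range(20)]  # one small hypothetical dataset

def bootstrap_median(sample):
    # A single unstable "model": the median of one bootstrap resample
    resample = [random.choice(sample) for _ in sample]
    return statistics.median(resample)

# One model vs. an average of 25 bagged models, replicated 1000 times each
single = [bootstrap_median(data) for _ in range(1000)]
bagged = [statistics.fmean(bootstrap_median(data) for _ in range(25))
          for _ in range(1000)]

v_single = statistics.variance(single)
v_bagged = statistics.variance(bagged)   # far smaller: averaging cancels noise
```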
Double Descent
Classical theory says test error follows a U-shape as complexity grows. But heavily overparameterised deep networks can memorise the training data perfectly and still generalise well. This is double descent: test error follows the U, then drops again beyond the interpolation threshold.
Explanations invoke implicit regularisation by SGD, high-dimensional loss landscape geometry, and the benign nature of the minima gradient methods find. These findings refine our understanding but do not invalidate the bias–variance framework. Effective complexity depends on the optimiser, the data, and the architecture — not just the parameter count. The decomposition remains the essential starting point for reasoning about generalisation.