- Summarise data using measures of central tendency and spread, and choose appropriate summaries for the data type
- Distinguish population from sample and construct confidence intervals to quantify estimation uncertainty
- Formulate and test statistical hypotheses, interpreting p-values and the risks of Type I and Type II errors
- Derive parameter estimates via maximum likelihood estimation and connect MLE to loss minimisation
- Decompose prediction error into bias, variance, and irreducible noise and use the tradeoff to diagnose models
Probability asks: "Given a known process, what data will we see?" Statistics asks the reverse: "Given observed data, what process produced them?" This inverse problem is the heart of machine learning. Every learning algorithm, from least squares to deep neural networks, from k-means to diffusion models, is, beneath its packaging, an estimator built on statistical foundations. You always have a finite sample and must draw conclusions about the broader world: making predictions, estimating parameters, choosing between models, and judging whether the patterns you see are real signals or accidents of noise.
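To make the two directions concrete, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not the chapter's own example: the biased coin, the value `true_p = 0.7`, and the helper names `simulate_coin` and `estimate_p` are all hypothetical. The forward (probability) step generates data from a known process; the inverse (statistics) step sees only the data and estimates the parameter that produced them.

```python
import random
import statistics


def simulate_coin(p_heads, n, seed=0):
    """Forward problem: the process (p_heads) is known, generate n flips."""
    rng = random.Random(seed)  # local generator so results are reproducible
    return [1 if rng.random() < p_heads else 0 for _ in range(n)]


def estimate_p(flips):
    """Inverse problem: only the data are known, estimate p_heads.

    The sample proportion of heads is used as the estimate.
    """
    return statistics.fmean(flips)


if __name__ == "__main__":
    true_p = 0.7  # assumed value, chosen purely for illustration
    for n in (10, 100, 10_000):
        flips = simulate_coin(true_p, n, seed=42)
        print(f"n={n:>6}  estimate of p = {estimate_p(flips):.3f}")
```

With a finite sample the estimate will generally differ from `true_p`, and it tends to get closer as `n` grows. Quantifying that gap is what the rest of the chapter is about.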
This chapter builds those foundations from first principles and pushes them to the level required to read modern AI papers with confidence. We begin with descriptive statistics and the philosophy of estimation, contrast the frequentist and Bayesian schools that quietly underpin every algorithmic choice, then build the machinery of estimators, confidence intervals, hypothesis testing, maximum likelihood and Bayesian inference, the bootstrap, linear and generalised linear models, hierarchical estimation, model selection, causal inference, and the evaluation of machine learning systems themselves. Throughout, the emphasis is on conceptual clarity married to computational practice: every major idea is illustrated with hand calculations, worked examples, and Python code you can run immediately. For deeper canonical treatment, the reader should consult Hastie, Tibshirani, and Friedman's The Elements of Statistical Learning (Hastie, 2009), Bishop's Pattern Recognition and Machine Learning (Bishop, 2006), Casella and Berger's Statistical Inference, Wasserman's All of Statistics, and Gelman et al.'s Bayesian Data Analysis.
In this chapter
- 5.1 Why Statistics for AI
- 5.2 Frequentist vs Bayesian: The Two Schools
- 5.3 Descriptive Statistics
- 5.4 Estimators, Bias, Variance, MSE
- 5.5 Maximum Likelihood Estimation
- 5.6 MAP and Bayesian Inference
- 5.7 Confidence and Credible Intervals
- 5.8 Hypothesis Testing
- 5.9 Bootstrap and Resampling
- 5.10 Linear Regression as Statistics
- 5.11 Generalised Linear Models
- 5.12 Empirical Bayes and Hierarchical Models
- 5.13 Model Evaluation and Selection
- 5.14 Causal Inference (Preview)
- 5.15 Statistics for AI Evaluation
- 5.16 Bias–Variance Tradeoff
- 5.17 Closing: Statistics as the Substrate of AI
- Exercises
- Solution sketches