- State the axioms of probability and use them to compute the probability of compound and conditional events
- Apply Bayes' theorem to update beliefs in light of new evidence
- Recognise the key discrete and continuous distributions used in AI (Bernoulli, binomial, normal, categorical)
- Compute the expectation and variance of random variables and explain their role in model training
- Measure information and uncertainty using entropy, cross-entropy, and KL divergence
A doctor reads a positive screening test and tells the patient they probably have the disease. The maths often says otherwise: if the disease is rare and the test imperfect, most positive results can be false alarms. Getting this wrong can do serious harm. Probability theory is the tool that gets it right.
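To make the screening claim concrete, here is a minimal sketch of the calculation using Bayes' theorem. The prevalence, sensitivity, and false-positive rate are illustrative assumptions chosen for this example, not figures for any real test, and the function name `posterior_given_positive` is hypothetical.

```python
def posterior_given_positive(prevalence, sensitivity, false_positive_rate):
    """Return P(disease | positive test) using Bayes' theorem."""
    # Probability of being ill AND testing positive.
    true_positives = prevalence * sensitivity
    # Probability of being healthy AND testing positive anyway.
    false_positives = (1.0 - prevalence) * false_positive_rate
    # Share of all positive results that come from people who are ill.
    return true_positives / (true_positives + false_positives)


# Assumed, illustrative numbers: 1% of people have the disease, the test
# catches 90% of true cases, and 9% of healthy people test positive.
posterior = posterior_given_positive(
    prevalence=0.01, sensitivity=0.90, false_positive_rate=0.09
)
print(f"P(disease | positive) = {posterior:.3f}")  # prints 0.092
```

Under these assumed numbers, fewer than one in ten positive results belongs to someone who actually has the disease, because the small fraction of false positives among the many healthy people outweighs the true positives among the few who are ill.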
AI systems face the same challenge every day. A spam filter must decide whether an email is junk. A self-driving car must judge whether the shape ahead is a pedestrian. A language model must pick the most likely next token. None of these systems has perfect information, and none can afford to act as though it does. Probability gives them a principled way to handle that uncertainty.
This chapter builds probability from the ground up. You will start with Kolmogorov's axioms, learn Bayes' theorem (arguably the single most important formula in applied AI), and then meet the distributions that recur throughout machine learning. From there you will move on to expectation and variance, the inequalities and limit theorems that underpin generalisation theory, the multivariate Gaussian, and finally information theory, which supplies the loss functions used to train neural networks. For deeper coverage, see Bishop 2006, Murphy 2022, and MacKay 2003.
In this chapter
- 4.1 Why probability for AI
- 4.2 Probability basics
- 4.3 Bayes' theorem in depth
- 4.4 Random variables, PMFs and PDFs
- 4.5 Common distributions
- 4.6 Joint, marginal and conditional distributions
- 4.7 Expectation, variance, covariance
- 4.8 Inequalities
- 4.9 Limit theorems
- 4.10 The multivariate Gaussian
- 4.11 Information theory
- 4.12 Maximum likelihood and Bayesian inference (preview)
- 4.13 Sampling
- 4.14 Python: putting it all together
- 4.15 Worked mini-project: calibrating a spam filter
- 4.16 Summary
- Exercises
- Solution sketches