- State the axioms of probability and use them to compute the probability of compound and conditional events
- Apply Bayes' theorem to update beliefs in light of new evidence
- Recognise the key discrete and continuous distributions used in AI (Bernoulli, binomial, normal, categorical)
- Compute the expectation and variance of random variables and explain their role in model training
- Measure information and uncertainty using entropy, cross-entropy, and KL divergence
A doctor reads a positive test result and tells the patient they probably have the disease. But the maths says otherwise — with a rare disease and an imperfect test, most positive results are false alarms. Getting this wrong can ruin lives. Probability theory is the tool that gets it right.
AI systems face the same challenge every day. A spam filter must decide if an email is junk. A self-driving car must judge whether a shape ahead is a pedestrian. A language model must pick the most likely next word. None of these systems have perfect information. Probability gives them a principled way to handle that uncertainty.
This chapter builds probability from the ground up. You will start with the basic rules, then learn Bayes' theorem — the single most important formula in applied AI. From there, you will meet the probability distributions that appear throughout machine learning, learn how expectation and variance summarise them, and finish with information theory — the framework behind the loss functions that train neural networks. For deeper coverage, see Bishop (2006), Murphy (2022), and MacKay (2003).
4.1 Probability Basics
Probability assigns a number to how likely an event is. The modern framework comes from Andrey Kolmogorov (Kolmogorov, 1933) and rests on three axioms:
- The probability of any event A is non-negative: P(A) ≥ 0.
- The probability of the entire sample space Ω is one: P(Ω) = 1.
- For mutually exclusive events A1, A2, …, the probability of their union equals the sum of their individual probabilities: P(A1 ∪ A2 ∪ …) = Σi P(Ai).
From these three rules, you can derive everything else. The complement rule says P(A^c^) = 1 − P(A). The inclusion–exclusion principle handles overlapping events. All of elementary probability flows from these axioms.
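You can verify these rules by brute-force enumeration over a small sample space. A minimal sketch, assuming two fair dice (the events A and B are illustrative choices):

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered outcomes of rolling two fair dice.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under a uniform distribution on the sample space."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] + w[1] == 7      # the dice sum to 7
B = lambda w: w[0] == 6             # the first die shows 6

# Axiom 2: the whole sample space has probability one.
assert prob(lambda w: True) == 1

# Complement rule: P(A^c) = 1 - P(A).
assert prob(lambda w: not A(w)) == 1 - prob(A)

# Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B).
p_union = prob(lambda w: A(w) or B(w))
assert p_union == prob(A) + prob(B) - prob(lambda w: A(w) and B(w))
```

Using exact fractions rather than floats keeps the arithmetic honest: every identity holds exactly, not just to rounding error.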
Two Ways to Think About Probability
There are two competing interpretations of what these numbers mean.
The frequentist view ties probability to long-run frequency. Saying a fair coin has P(heads) = 0.5 means that over millions of tosses, roughly half will be heads. This works well for repeatable experiments. But it struggles with one-off events — what is the "long-run frequency" of a specific meteorite hitting Earth?
The Bayesian view treats probability as a degree of belief. P(A) = 0.5 means you are equally uncertain about A happening or not. This interpretation handles unique events naturally. It also provides a clear framework for updating beliefs as new evidence arrives. Most modern AI systems use Bayesian reasoning in some form.
Random Variables
A random variable bridges the gap between abstract events and the numbers that algorithms work with.
A discrete random variable X takes values from a countable set. Each value has a probability mass function (PMF): p(x) = P(X = x). Think of rolling a die — each face has probability 1/6.
A continuous random variable takes values from a range of real numbers. It is described by a probability density function (PDF) f(x). The probability of falling in an interval is P(a ≤ X ≤ b) = ∫a^b^ f(x) dx. Think of measuring someone's height — it can be any value within a range.
Both types share a cumulative distribution function (CDF): F(x) = P(X ≤ x). Random variables are everywhere in machine learning. Features, labels, model parameters, and loss values are all modelled this way.
Independence
Two events A and B are independent if P(A ∩ B) = P(A) × P(B). In plain terms: knowing that B happened tells you nothing new about A.
For random variables, independence means the joint distribution equals the product of the individual distributions. Many machine learning algorithms assume that training examples are independent and identically distributed (i.i.d.) — each drawn separately from the same unknown distribution. This assumption is rarely perfect, but it makes the maths tractable.
When independence fails — as in time series or social networks — you need more sophisticated tools. Markov chains and graphical models handle these cases.
Conditional Probability
Conditional probability appears in almost every AI algorithm. It answers the question: given that B happened, how likely is A?
The definition is: P(A | B) = P(A ∩ B) / P(B), provided P(B) > 0.
This simple ratio captures the core operation of learning from data. You start with a belief about A. You observe evidence B. You update to get a new belief P(A | B). Every supervised learning algorithm estimates some conditional distribution — the probability of a label given features, or the probability of the next word given context.
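The ratio definition is easy to check on a toy contingency table. The email counts below are made-up numbers purely for illustration:

```python
from fractions import Fraction

# Hypothetical counts from 100 emails: (label, does it contain "free"?).
counts = {("spam", "free"): 12, ("spam", "no_free"): 8,
          ("ham", "free"): 5, ("ham", "no_free"): 75}
total = sum(counts.values())

def P(pred):
    """Probability of the set of outcomes satisfying pred."""
    return Fraction(sum(c for k, c in counts.items() if pred(k)), total)

p_free = P(lambda k: k[1] == "free")                  # P(B)
p_spam_and_free = P(lambda k: k == ("spam", "free"))  # P(A and B)
p_spam_given_free = p_spam_and_free / p_free          # P(A | B)
```

Here observing the word "free" shifts the probability of spam from 20/100 marginally to 12/17 conditionally — exactly the belief update the definition describes.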
4.2 Bayes' Theorem
Bayes' theorem follows directly from the definition of conditional probability. For events A and B with P(B) > 0:
P(A | B) = P(B | A) × P(A) / P(B)
Each term has a name:
- Prior P(A): your belief before seeing data.
- Likelihood P(B | A): how probable the data is under hypothesis A.
- Evidence P(B): the total probability of the data across all hypotheses.
- Posterior P(A | B): your updated belief after seeing the data.
The theorem gives you a precise recipe for updating beliefs. It is optimal in a decision-theoretic sense.
The Medical Screening Example
Suppose a disease affects 1 in 1,000 people. A test has 99% sensitivity (catches 99% of true cases) and 95% specificity (correctly clears 95% of healthy people). You test positive. What is the probability you actually have the disease?
Bayes' theorem gives roughly 0.019 — less than 2%. This surprises most people. The reason is the low base rate. Out of 1,000 people tested, about 1 truly has the disease and 50 healthy people get false positives. The true case is buried among the false alarms.
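The calculation takes four lines. This sketch uses the numbers from the example above:

```python
prior = 0.001          # 1 in 1,000 people have the disease
sensitivity = 0.99     # P(positive | disease)
specificity = 0.95     # P(negative | healthy)

# Evidence: total probability of a positive test across both hypotheses.
p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)

# Posterior via Bayes' theorem.
posterior = sensitivity * prior / p_positive
print(round(posterior, 3))  # 0.019
```

Despite a 99% sensitive test, the posterior is under 2% because the false positives from the large healthy population dominate the evidence term.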
This is the base-rate fallacy. Ignoring it leads to badly calibrated predictions. Spam filters, fraud detectors, and medical AI tools must all account for it.
Bayesian Parameter Estimation
In machine learning, you apply Bayes' theorem to model parameters, not just discrete hypotheses. Given parameters θ and observed data D:
P(θ | D) = P(D | θ) × P(θ) / P(D)
The prior P(θ) encodes your assumptions — for example, that parameters should be small. The likelihood P(D | θ) measures how well the parameters explain the data. The posterior P(θ | D) combines both.
Full Bayesian inference computes the entire posterior distribution. This gives you uncertainty estimates, not just a single best guess. A Bayesian model can report credible intervals and flag inputs where it is unsure. This is especially valuable in safety-critical domains like autonomous driving and clinical decision support.
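For some prior–likelihood pairs the posterior has a closed form. A minimal sketch of Bayesian parameter estimation for a coin's bias, assuming a Beta prior (the prior strength and the data here are illustrative):

```python
# Conjugate Beta-Bernoulli update: prior Beta(a, b) plus k successes
# in n trials gives posterior Beta(a + k, b + n - k) exactly.
a, b = 2.0, 2.0            # weak prior centred on p = 0.5 (an assumption)
k, n = 7, 10               # hypothetical data: 7 heads in 10 flips

a_post, b_post = a + k, b + (n - k)

posterior_mean = a_post / (a_post + b_post)   # 9/14, about 0.643
mle = k / n                                   # 0.7: what the data alone says
prior_mean = a / (a + b)                      # 0.5: what the prior alone says

# The posterior mean sits between the prior mean and the MLE.
assert prior_mean < posterior_mean < mle
```

The full posterior Beta(9, 5) carries uncertainty as well as a point estimate; as n grows, the data term swamps the prior and the posterior mean approaches the MLE.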
Approximate Inference
Computing the posterior exactly requires solving P(D) = ∫ P(D | θ) P(θ) dθ. This integral often has no closed-form solution. Two families of methods handle this:
- MCMC (Markov Chain Monte Carlo): draws samples from the posterior. Methods like Metropolis–Hastings and Hamiltonian Monte Carlo are exact in the limit but computationally expensive.
- Variational inference: turns the problem into an optimisation task. You find the closest simple distribution (e.g., a factorised Gaussian) to the true posterior, measured by KL divergence. Variational autoencoders (Kingma, 2013) use exactly this approach.
Bishop (2006) and Murphy (2022) cover both methods in depth.
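A bare-bones Metropolis–Hastings sampler fits in a dozen lines. This sketch targets a known Gaussian so the answer can be checked; the target density, step size, and burn-in length are illustrative assumptions, not a recipe:

```python
import math
import random

random.seed(0)

def log_unnorm_posterior(theta):
    # Unnormalised log-density: a Gaussian centred at 2, standing in
    # for a posterior whose normalising constant P(D) is unknown.
    return -0.5 * (theta - 2.0) ** 2

def metropolis_hastings(n_samples, step=1.0):
    theta = 0.0
    samples = []
    for _ in range(n_samples):
        proposal = theta + random.gauss(0.0, step)   # symmetric proposal
        # Accept with probability min(1, p(proposal) / p(theta)).
        # Only the RATIO is needed, so P(D) cancels out.
        if math.log(random.random()) < (log_unnorm_posterior(proposal)
                                        - log_unnorm_posterior(theta)):
            theta = proposal
        samples.append(theta)
    return samples

samples = metropolis_hastings(20000)
mean = sum(samples[5000:]) / len(samples[5000:])   # discard burn-in
```

The key point is that the acceptance ratio never requires the intractable evidence integral — that is why MCMC sidesteps it.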
Priors, Bias, and Variance
The choice of prior connects directly to the bias–variance trade-off (Chapter 5). A strong prior concentrates the posterior tightly. This reduces variance but risks bias if the prior is wrong. A weak prior lets the data dominate. This reduces bias but increases variance, especially with little data.
Here is a revealing connection: L2 regularisation (weight decay) is equivalent to the maximum a posteriori (MAP) estimate under a Gaussian prior on the parameters. Even "frequentist" regularisation has a Bayesian interpretation. This shows the unity of probabilistic thinking across different traditions.
4.3 Probability Distributions
A probability distribution describes how probability is spread across the possible values of a random variable. Choosing the right distribution family is one of the most important modelling decisions. It encodes your assumptions about the data.
Discrete Distributions
The Bernoulli distribution is the simplest. It models a single yes/no outcome with success probability p: P(X = 1) = p and P(X = 0) = 1 − p. Think of a single coin flip.
The Binomial distribution counts successes in n independent Bernoulli trials: P(X = k) = C(n, k) p^k^ (1 − p)^n − k^. Think of flipping a coin 100 times and counting heads.
The Categorical distribution extends the Bernoulli to K outcomes, each with its own probability. Classification networks use it (via softmax) to output a probability for each class.
The Multinomial distribution generalises the Categorical to multiple draws. It appears in topic models and bag-of-words representations.
The Poisson distribution models the number of events in a fixed time interval, parameterised by rate λ. It appears in anomaly detection and queueing models.
The Gaussian (Normal) Distribution
The Gaussian is arguably the most important distribution in all of science. A univariate Gaussian with mean μ and variance σ^2^ has density:
f(x) = (1 / √(2πσ^2^)) exp(−(x − μ)^2^ / (2σ^2^))
Why is it so common? The Central Limit Theorem (CLT) says that the average of many independent random variables tends toward a Gaussian, regardless of their original distributions. Many real-world measurements — heights, errors, short-term financial returns — arise from the sum of many small effects. So the Gaussian fits them well.
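You can watch the CLT's bookkeeping exactly, without simulation, by convolving a die's PMF with itself. A sketch using exact fractions:

```python
from fractions import Fraction

# Exact PMF of the sum of n fair dice, built by repeated convolution.
die = {k: Fraction(1, 6) for k in range(1, 7)}

def convolve(p, q):
    """PMF of the sum of two independent discrete variables."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, Fraction(0)) + px * qy
    return out

n = 10
total = die
for _ in range(n - 1):
    total = convolve(total, die)

mean = sum(x * p for x, p in total.items())
var = sum((x - mean) ** 2 * p for x, p in total.items())

# Mean and variance scale exactly as the CLT setup predicts:
assert mean == Fraction(7, 2) * n    # n times the mean of one die
assert var == Fraction(35, 12) * n   # n times the variance of one die
```

Plotting `total` for growing n shows the familiar bell shape emerging from a flat distribution — the CLT in action.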
In machine learning, Gaussian assumptions appear in linear regression, Gaussian processes, variational autoencoders, and diffusion-based generative models.
The Multivariate Gaussian
The multivariate Gaussian extends this to vectors. It is parameterised by a mean vector μ and a covariance matrix Σ. Its density involves the inverse and determinant of Σ, connecting directly to the linear algebra of Chapter 2.
The geometry is elegant. The contours of a multivariate Gaussian form an ellipsoid. Its axes are the eigenvectors of Σ. Its radii are proportional to the square roots of the eigenvalues. This is a concrete example of linear algebra and probability working together.
Gaussian mixture models (GMMs) combine several Gaussians to approximate complex shapes. They are the basis of classical clustering algorithms and generative classifiers.
Other Continuous Distributions
Several more distributions appear frequently in AI:
- Uniform: equal density across an interval. Often used as a non-informative prior.
- Exponential: models waiting times between events. It is memoryless — the probability of waiting another t seconds does not depend on how long you have already waited.
- Beta: defined on [0, 1]. It is the conjugate prior for the Bernoulli. Widely used in Bayesian A/B testing.
- Dirichlet: generalises the Beta to probability simplices. It is the prior in latent Dirichlet allocation (LDA) for topic modelling.
- Gamma and Inverse-Gamma: conjugate priors for Poisson rates and Gaussian variances.
Knowing these families helps you choose the right assumptions for your model — and recognise when those assumptions might be wrong.
The Exponential Family
Many of these distributions belong to a single unifying class: the exponential family. Its density has the form:
p(x | η) = h(x) exp(η^T^ T(x) − A(η))
Here η is the natural parameter, T(x) is the sufficient statistic, and A(η) is the log-partition function. The Bernoulli, Gaussian, Poisson, Exponential, Beta, Gamma, and Dirichlet are all members.
Exponential-family distributions have useful properties:
- They have conjugate priors, making Bayesian updates easy.
- Their maximum-likelihood estimates depend only on sufficient statistics.
- Generalised linear models (GLMs) pair an exponential-family response with a linear predictor via a link function.
This framework gives you a compact way to understand a large chunk of probabilistic modelling.
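As a concrete check, the Bernoulli can be rewritten in the exponential-family form above. A sketch with h(x) = 1, T(x) = x, natural parameter η = log(p / (1 − p)), and A(η) = log(1 + e^η^):

```python
import math

# Bernoulli as an exponential family: p(x | eta) = exp(eta * x - A(eta)).
p = 0.3
eta = math.log(p / (1 - p))          # natural parameter (the log-odds)
A = math.log(1 + math.exp(eta))      # log-partition function

# The exponential-family form reproduces the standard PMF exactly.
for x in (0, 1):
    standard = p if x == 1 else 1 - p
    exp_family = math.exp(eta * x - A)
    assert abs(standard - exp_family) < 1e-12

# A general exponential-family fact: the derivative of A recovers the
# mean of T(x). Here A'(eta) is the sigmoid, which maps eta back to p.
sigmoid = 1 / (1 + math.exp(-eta))
assert abs(sigmoid - p) < 1e-12
```

The log-odds parameterisation and the sigmoid link are exactly what logistic regression uses — a GLM with a Bernoulli response.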
4.4 Expectation & Variance
A full probability distribution contains complete information. But you often need a quick summary. The two most fundamental summaries are the expectation (mean) and the variance.
Expectation
The expectation of a discrete random variable is E[X] = Σx x · p(x). For a continuous variable, it is E[X] = ∫ x f(x) dx.
Think of the expectation as the "centre of mass" of the distribution. If you repeated an experiment forever and averaged the results, you would converge to E[X]. This is the law of large numbers.
In machine learning, loss functions are defined as expectations. The expected risk is E[L(Y, f(X))]. Training aims to minimise this quantity.
Linearity of Expectation
Expectation is a linear operator. For any random variables X and Y and constants a, b:
E[aX + bY] = a E[X] + b E[Y]
This holds whether or not X and Y are independent. You can break complex expressions into simple pieces. For example, the expected value of the sum of n variables, each with mean μ, is simply nμ. Linearity also explains why the sample mean is an unbiased estimator of the population mean — a cornerstone of statistics (Chapter 5).
Variance
Variance measures how spread out a distribution is around its mean:
Var(X) = E[(X − E[X])^2^] = E[X^2^] − (E[X])^2^
The second form is often easier to compute. Unlike expectation, variance is not linear: Var(aX + b) = a^2^ Var(X). For independent variables, variances add: Var(X + Y) = Var(X) + Var(Y).
The standard deviation σ = √Var(X) shares the units of X and is easier to interpret. In AI, high variance across training sets means the model is unstable and likely overfitting. Chapter 5 formalises this through the bias–variance decomposition.
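Both forms of the variance, and the scaling rule, can be verified exactly on a fair die:

```python
from fractions import Fraction

# PMF of a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

E = sum(x * p for x, p in pmf.items())           # expectation: 7/2
E_sq = sum(x ** 2 * p for x, p in pmf.items())   # E[X^2]: 91/6
var = E_sq - E ** 2                              # shortcut form: 35/12

# The definitional form E[(X - E[X])^2] agrees with the shortcut.
assert var == sum((x - E) ** 2 * p for x, p in pmf.items())

# Var(aX + b) = a^2 Var(X): the shift b drops out entirely.
a, b = 3, 5
pmf_scaled = {a * x + b: p for x, p in pmf.items()}
E2 = sum(x * p for x, p in pmf_scaled.items())
var_scaled = sum((x - E2) ** 2 * p for x, p in pmf_scaled.items())
assert var_scaled == a ** 2 * var
```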
Higher-Order Moments
Beyond mean and variance, other summaries capture finer details:
- Skewness (third standardised moment): measures asymmetry. Positive skew means a long right tail.
- Kurtosis (fourth standardised moment): measures tail heaviness relative to the Gaussian.
- Moment-generating function (MGF): M(t) = E[e^tX^]. When it exists, it encodes all moments and uniquely identifies the distribution.
These matter in practice. Heavy-tailed gradient noise in stochastic gradient descent can slow training. Gradient clipping and adaptive optimisers address this problem.
Covariance and Correlation
The covariance between two variables measures their linear association: Cov(X, Y) = E[(X − E[X])(Y − E[Y])]. The normalised version is the Pearson correlation ρ = Cov(X, Y) / (σX σY), which lies in [−1, 1] and is scale-invariant.
For a random vector X = (X1, …, Xd)^T^, the covariance matrix Σ has entries Σij = Cov(Xi, Xj). It is symmetric and positive semi-definite. Principal component analysis (PCA) works by eigendecomposing this matrix to find directions of maximal variance — a direct application of the spectral theory from Chapter 2.
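For two dimensions the eigendecomposition behind PCA has a closed form, so the whole pipeline fits in a short sketch (the data points are hypothetical):

```python
import math

# Toy 2D dataset with strongly correlated coordinates.
data = [(2.0, 1.9), (0.5, 0.7), (-1.0, -1.1), (1.5, 1.4), (-3.0, -2.9)]
n = len(data)
mx = sum(x for x, _ in data) / n
my = sum(y for _, y in data) / n

# Entries of the 2x2 covariance matrix.
sxx = sum((x - mx) ** 2 for x, _ in data) / n
syy = sum((y - my) ** 2 for _, y in data) / n
sxy = sum((x - mx) * (y - my) for x, y in data) / n

# Closed-form eigenvalues of a symmetric 2x2 matrix via trace/determinant.
tr = sxx + syy
det = sxx * syy - sxy ** 2
disc = math.sqrt((tr / 2) ** 2 - det)
lam1, lam2 = tr / 2 + disc, tr / 2 - disc   # lam1 >= lam2

# Positive semi-definiteness, and eigenvalues summing to the trace
# (i.e. the total variance is preserved under rotation).
assert lam2 >= -1e-12
assert abs(lam1 + lam2 - tr) < 1e-9
```

Here lam1 vastly exceeds lam2, so almost all the variance lies along the first principal component — exactly what PCA exploits for dimensionality reduction.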
Laws of Total Expectation and Variance
Two powerful tools let you compute moments by conditioning:
- Law of total expectation: E[X] = E[E[X | Y]]
- Law of total variance: Var(X) = E[Var(X | Y)] + Var(E[X | Y])
The variance law splits total variance into two parts. The first is the average "within-group" spread. The second is the "between-group" variation.
These laws are essential for hierarchical and mixture models, where data is generated in two stages — first draw a latent variable, then draw the observation conditionally. The variational autoencoder's evidence lower bound (ELBO) is derived using exactly these identities.
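The law of total variance can be checked exactly on a small two-stage model. This sketch draws a group Y first, then X conditionally (the distributions are illustrative):

```python
from fractions import Fraction

# Two-stage model: pick a group Y, then draw X from that group's PMF.
p_y = {0: Fraction(1, 2), 1: Fraction(1, 2)}
cond_pmf = {0: {0: Fraction(1, 2), 2: Fraction(1, 2)},   # group 0: mean 1
            1: {4: Fraction(1, 2), 6: Fraction(1, 2)}}   # group 1: mean 5

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def var(pmf):
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

# Marginal distribution of X: sum out Y.
marginal = {}
for y, py in p_y.items():
    for x, px in cond_pmf[y].items():
        marginal[x] = marginal.get(x, Fraction(0)) + py * px

within = sum(p_y[y] * var(cond_pmf[y]) for y in p_y)    # E[Var(X | Y)]
cond_means = {y: mean(cond_pmf[y]) for y in p_y}
between = var({cond_means[y]: p_y[y] for y in p_y})     # Var(E[X | Y])

assert var(marginal) == within + between
```

The within-group spread contributes 1 and the between-group separation contributes 4, so the marginal variance is 5 — most of the uncertainty comes from not knowing which group you are in.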
4.5 Joint & Conditional Probability
Most AI problems involve multiple random variables — features, labels, latent states, and parameters. Understanding how they relate requires joint and conditional distributions.
Joint Distributions
The joint distribution of two discrete variables X and Y is specified by p(x, y) = P(X = x, Y = y) for all pairs. For continuous variables, the joint PDF f(x, y) gives P((X, Y) ∈ A) = ∫∫A f(x, y) dx dy.
The joint distribution contains everything about the relationship between the variables: their individual behaviours, their dependencies, and how one behaves given the other.
Marginalisation
You recover the distribution of a single variable by summing (or integrating) out the others:
- Discrete: p(x) = Σy p(x, y)
- Continuous: f(x) = ∫ f(x, y) dy
This is called marginalisation. It is one of the two fundamental operations of probabilistic inference (the other is conditioning).
In graphical models, marginalisation means eliminating variables. Algorithms like belief propagation and the junction-tree algorithm exploit graph structure to do this efficiently. In Bayesian model selection, the marginal likelihood P(D) = ∫ P(D | θ) P(θ) dθ penalises overly complex models that spread their probability mass too thinly.
The Chain Rule
The conditional distribution of Y given X = x is: p(y | x) = p(x, y) / p(x). Rearranging gives the chain rule of probability:
p(x, y) = p(y | x) p(x)
This extends to any number of variables:
p(x1, …, xn) = p(x1) ∏i=2^n^ p(xi | x1, …, xi−1)
Modern large language models use this directly. They factorise the joint distribution of a token sequence into a product of conditionals. Each conditional is modelled by a neural network. The chain rule provides the theoretical foundation for the transformer's left-to-right text generation.
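A toy bigram model makes the factorisation concrete. This is a Markov special case of the chain rule, where each conditional depends only on the previous token; the probability table is entirely made up:

```python
import math

# Hypothetical bigram language model: p(next | current).
start = {"the": 0.6, "a": 0.4}
bigram = {"the": {"cat": 0.5, "dog": 0.5},
          "a":   {"cat": 0.3, "dog": 0.7},
          "cat": {"sat": 1.0},
          "dog": {"sat": 1.0}}

def sequence_log_prob(tokens):
    """Chain rule: log p(x1..xn) = log p(x1) + sum_i log p(xi | x_{i-1})."""
    logp = math.log(start[tokens[0]])
    for prev, cur in zip(tokens, tokens[1:]):
        logp += math.log(bigram[prev][cur])
    return logp

logp = sequence_log_prob(["the", "cat", "sat"])
# p = 0.6 * 0.5 * 1.0 = 0.3
assert abs(math.exp(logp) - 0.3) < 1e-12
```

A transformer replaces the lookup table with a neural network conditioned on the entire prefix, but the log-probability of a sequence is computed by exactly this sum of conditional log-probabilities.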
Conditional Independence
Variables X and Y are conditionally independent given Z, written X ⊥ Y | Z, if:
p(x, y | z) = p(x | z) p(y | z) for all x, y, z
Conditional independence does not imply marginal independence, nor vice versa.
A naïve Bayes classifier assumes all features are conditionally independent given the class label. This cuts the number of parameters from exponential to linear in the number of features. Despite this strong simplification, naïve Bayes works remarkably well for text classification and spam filtering. Useful predictions do not always require perfectly accurate models.
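A minimal naïve Bayes sketch shows the assumption at work. The priors and per-word likelihoods below are hypothetical numbers, not trained values:

```python
import math

# Tiny naive Bayes spam filter with hypothetical word likelihoods.
prior = {"spam": 0.4, "ham": 0.6}
word_prob = {"spam": {"free": 0.8, "meeting": 0.1},
             "ham":  {"free": 0.2, "meeting": 0.7}}

def posterior(words):
    # Joint score per class: prior times a PRODUCT of word likelihoods.
    # The product is exactly the conditional-independence assumption.
    score = {c: prior[c] * math.prod(word_prob[c][w] for w in words)
             for c in prior}
    z = sum(score.values())              # evidence: normalising constant
    return {c: s / z for c, s in score.items()}

post = posterior(["free"])
# 0.4 * 0.8 = 0.32 vs 0.6 * 0.2 = 0.12, so P(spam | "free") = 0.32 / 0.44
assert abs(post["spam"] - 0.32 / 0.44) < 1e-12
```

With K features each taking V values, the full joint would need on the order of V^K^ parameters per class; the factorised form needs only K × V.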
Graphical Models
Graphical models encode conditional independence assumptions visually.
In a Bayesian network (directed graph), each node is a random variable. Edges encode direct dependencies. Missing edges imply conditional independence (formalised by d-separation). The joint factorises as:
p(x1, …, xn) = ∏i p(xi | parents(xi))
Hidden Markov models, Kalman filters, and variational autoencoders are all examples.
In an undirected model (Markov random field), the joint is specified by potential functions over cliques, up to a normalising constant. Conditional independences follow from graph separation. Conditional random fields, widely used in sequence labelling for NLP, are a prominent example.
Three Operations on Joints
Many AI algorithms perform one of three basic operations on a joint distribution:
- Conditioning: incorporating observed evidence
- Marginalisation: summing out nuisance variables
- Maximisation: finding the most probable configuration
The Expectation–Maximisation (EM) algorithm uses all three. The E-step conditions on latent variables given current parameters. The M-step maximises the expected complete-data log-likelihood. Fluency with joint and conditional distributions is essential for designing and understanding learning algorithms.
4.6 Information Theory
Claude Shannon founded information theory in 1948 (Shannon, 1948). It provides a mathematical framework for quantifying information, uncertainty, and communication cost. MacKay (2003) gives an especially clear treatment of its connections to machine learning.
The core concepts — entropy, cross-entropy, and divergence — have become essential in AI. They underpin loss functions, model comparison, and representation learning. The deep link to probability is natural: both fields ask how to reason about uncertain quantities.
Entropy
The entropy of a discrete random variable X with PMF p is:
H(X) = −Σx p(x) log p(x)
The logarithm is usually base 2 (bits) or base e (nats). Entropy measures the average "surprise" in a distribution.
- A uniform distribution over K outcomes has entropy log K — maximum uncertainty.
- A distribution concentrated on one outcome has entropy zero — no uncertainty.
For continuous variables, differential entropy h(X) = −∫ f(x) log f(x) dx plays an analogous role. It can be negative and lacks some properties of discrete entropy. Among all continuous distributions with a given mean and variance, the Gaussian has the maximum differential entropy. This gives another justification for the Gaussian assumption in maximum-entropy modelling.
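Both boundary cases of discrete entropy are easy to verify directly:

```python
import math

def entropy(pmf, base=2.0):
    """Shannon entropy in bits; 0 * log 0 is treated as 0."""
    return -sum(p * math.log(p, base) for p in pmf if p > 0)

K = 8
uniform = [1.0 / K] * K
assert abs(entropy(uniform) - math.log2(K)) < 1e-12   # log K bits: maximal

certain = [1.0, 0.0, 0.0]
assert entropy(certain) == 0.0                        # no uncertainty

biased_coin = [0.9, 0.1]
assert 0.0 < entropy(biased_coin) < 1.0               # between the extremes
```

A fair coin carries exactly one bit per flip; the 90/10 coin carries about 0.47 bits, because its outcomes are mostly predictable.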
Cross-Entropy
The cross-entropy between a true distribution p and a model distribution q is:
H(p, q) = −Σx p(x) log q(x)
It measures the average number of bits needed to encode data from p using a code optimised for q. In machine learning, you set p to the empirical label distribution and q to the model's predictions. Minimising cross-entropy is the same as maximising log-likelihood.
The softmax cross-entropy loss used to train neural network classifiers is exactly this quantity, averaged over the training set.
KL Divergence
The Kullback–Leibler (KL) divergence measures the extra cost of using distribution q when the true distribution is p:
DKL(p || q) = Σx p(x) log(p(x) / q(x))
Key properties:
- Always non-negative (Gibbs' inequality).
- Equals zero if and only if p = q.
- Not a true distance metric — it is asymmetric and does not satisfy the triangle inequality.
The relationship H(p, q) = H(p) + DKL(p || q) shows that minimising cross-entropy is the same as minimising KL divergence, since H(p) is constant with respect to the model.
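This decomposition, along with Gibbs' inequality and the asymmetry of KL, can be checked numerically on any pair of distributions (the ones below are arbitrary):

```python
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]   # "true" distribution
q = [0.5, 0.3, 0.2]   # model distribution

# Gibbs' inequality: KL is non-negative.
assert kl(p, q) >= 0
# The decomposition H(p, q) = H(p) + KL(p || q).
assert abs(cross_entropy(p, q) - (entropy(p) + kl(p, q))) < 1e-12
# Asymmetry: KL is not a distance metric.
assert abs(kl(p, q) - kl(q, p)) > 1e-6
```

Since H(p) does not depend on the model, a gradient step that lowers cross-entropy lowers KL by exactly the same amount — which is why the two objectives are interchangeable in training.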
KL divergence is the foundational measure of distributional discrepancy. It appears in variational inference, generative adversarial training, and policy optimisation in reinforcement learning.
Mutual Information
Mutual information measures how much knowing Y reduces your uncertainty about X:
I(X; Y) = DKL(p(x, y) || p(x) p(y)) = H(X) − H(X | Y)
Key properties:
- Captures all forms of statistical dependence, not just linear relationships (unlike correlation).
- Symmetric: I(X; Y) = I(Y; X).
- Non-negative. Zero if and only if X and Y are independent.
Mutual information appears in feature selection (picking features most informative about the target), the information bottleneck method for representation learning, and analysis of deep learning representations. Estimating it in high dimensions remains an open challenge.
The data-processing inequality states that post-processing cannot increase mutual information. This places a fundamental limit on what any learning algorithm can extract from data.
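For a small joint table, both formulas for mutual information can be computed and compared directly. The joint PMF here is a hypothetical pair of dependent binary variables:

```python
import math

# Joint PMF of two binary variables that tend to agree.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginals by summing out the other variable.
px = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
py = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}

# Definition 1: I(X; Y) = KL(joint || product of marginals), in bits.
mi = sum(p * math.log2(p / (px[x] * py[y]))
         for (x, y), p in joint.items() if p > 0)

# Definition 2: I(X; Y) = H(X) - H(X | Y).
hx = -sum(p * math.log2(p) for p in px.values())
hx_given_y = -sum(p * math.log2(p / py[y]) for (x, y), p in joint.items())

assert abs(mi - (hx - hx_given_y)) < 1e-12   # the two definitions agree
assert mi > 0                                # dependent, so I > 0
```

Knowing Y cuts the uncertainty about X from 1 bit to about 0.72 bits, so the two variables share roughly 0.28 bits of information.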
Information Geometry
Information theory also connects to the geometry of probability distributions. The Fisher information matrix — whose entries are the negative expected second derivatives of the log-likelihood — defines a Riemannian metric on the space of distributions.
Natural gradient descent pre-multiplies the ordinary gradient by the inverse Fisher information. This follows the steepest-descent direction in information-geometric terms. It has been shown to improve convergence in training neural networks and in policy-gradient methods for reinforcement learning.
The connections between entropy, divergence, and geometry show that information theory is not just a collection of useful formulas. It is a coherent lens for understanding the entire enterprise of learning from data.