Exercises
Probability basics
Three coins. Three coins are fair, biased towards heads with probability $0.7$, and double-headed. You pick one uniformly at random, flip it once, and see heads. Compute the posterior over the three coins and verify the answer in Section 4.2.
Inclusion-exclusion. For events $A, B, C$, derive $\mathrm{P}(A \cup B \cup C)$ in terms of probabilities of intersections. State the general formula for $n$ events.
Birthday paradox. With 23 people in a room, what is the probability that at least two share a birthday? Derive the formula and compute the numerical answer.
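To check the derivation numerically, a minimal sketch (pure Python; the function name is illustrative, and 365 equally likely birthdays are assumed):

```python
def p_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday)."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct

print(round(p_shared_birthday(23), 4))  # → 0.5073
```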
Conditional independence. Give an example of three events where $A \perp B$ but $A \not\perp B \mid C$. Give another where $A \perp B \mid C$ but $A \not\perp B$.
Bayes
Two-test screening. A disease has prevalence $0.005$. Test 1 has sensitivity $0.95$ and specificity $0.99$. Test 2 (independent of test 1 given disease status) has sensitivity $0.90$ and specificity $0.95$. Compute $\mathrm{P}(D \mid \text{both positive})$.
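A quick numerical check of the hand computation, multiplying the two likelihoods under the stated conditional independence (variable names are illustrative):

```python
prevalence = 0.005
lik_d = 0.95 * 0.90                # P(both + | D): sensitivities multiply
lik_nd = (1 - 0.99) * (1 - 0.95)   # P(both + | not D): false-positive rates multiply

posterior = prevalence * lik_d / (prevalence * lik_d + (1 - prevalence) * lik_nd)
print(posterior)  # ≈ 0.896
```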
Naïve Bayes spam. The word "free" appears in $40\%$ of spam and $5\%$ of ham; "win" appears in $30\%$ of spam and $1\%$ of ham. A new email contains both words. Assuming conditional independence given class and $\mathrm{P}(\text{spam}) = 0.3$, compute the posterior probability of spam.
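The same Bayes-rule check applies here, with the two word likelihoods multiplied within each class (a sketch with illustrative names):

```python
p_spam = 0.3
lik_spam = 0.40 * 0.30   # P(free | spam) * P(win | spam)
lik_ham = 0.05 * 0.01    # P(free | ham) * P(win | ham)

posterior_spam = p_spam * lik_spam / (p_spam * lik_spam + (1 - p_spam) * lik_ham)
print(posterior_spam)  # ≈ 0.990
```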
Gaussian-Gaussian update. Prior on $\mu$: $\mathcal{N}(0, 1)$. Likelihood: 5 i.i.d. samples from $\mathcal{N}(\mu, 4)$ with sample mean $\bar{x} = 1.2$. Derive the posterior on $\mu$ and compute its mean and variance.
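Once you have derived the conjugate update (precisions add; means combine precision-weighted), you can verify the numbers with a few lines:

```python
# Conjugate Gaussian update with known likelihood variance
prior_mean, prior_var = 0.0, 1.0
n, sigma2, xbar = 5, 4.0, 1.2

post_prec = 1 / prior_var + n / sigma2                               # 1 + 1.25 = 2.25
post_var = 1 / post_prec                                             # 4/9
post_mean = post_var * (prior_mean / prior_var + n * xbar / sigma2)  # 2/3
print(post_mean, post_var)
```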
Distributions
Bernoulli MLE. Derive the MLE of $p$ from $n$ Bernoulli observations.
Poisson approximation. Justify the Poisson approximation to the Binomial via the limit $n \to \infty$, $p \to 0$, $np = \lambda$ fixed.
Beta posterior. Show that with a Beta($\alpha, \beta$) prior and $k$ successes in $n$ Bernoulli trials, the posterior is Beta($\alpha + k$, $\beta + n - k$).
Dirichlet marginals. Show that if $\boldsymbol\pi \sim \text{Dirichlet}(\boldsymbol\alpha)$, then $\pi_k \sim \text{Beta}(\alpha_k, \alpha_0 - \alpha_k)$ where $\alpha_0 = \sum_k \alpha_k$.
Exponential family canonical form. Express the univariate Gaussian (with both $\mu$ and $\sigma^2$ unknown) as an exponential family and identify $\boldsymbol\eta$, $T(x)$, $A(\boldsymbol\eta)$, and $h(x)$.
Expectation, variance, covariance
Linearity demonstration. Prove $\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y]$ without assuming independence.
Variance of sum. Show $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$.
Sample variance bias. Show that the MLE variance estimator $\hat{\sigma}^2_\text{MLE} = \tfrac{1}{n}\sum (X_i - \bar{X})^2$ has expectation $\sigma^2(n-1)/n$. What unbiased correction do you apply?
Tower rule numerics. $Y \sim \text{Bernoulli}(0.4)$; given $Y = 1$, $X \sim \mathcal{N}(2, 1)$; given $Y = 0$, $X \sim \mathcal{N}(-1, 4)$. Compute $\mathbb{E}[X]$ and $\mathrm{Var}(X)$ using the laws of total expectation and variance.
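A direct transcription of the two laws for this two-component mixture, usable to check your arithmetic (names are illustrative):

```python
p1 = 0.4            # P(Y = 1)
m1, v1 = 2.0, 1.0   # X | Y=1 ~ N(2, 1)
m0, v0 = -1.0, 4.0  # X | Y=0 ~ N(-1, 4)

mean = p1 * m1 + (1 - p1) * m0                                  # E[E[X | Y]]
within = p1 * v1 + (1 - p1) * v0                                # E[Var(X | Y)]
between = p1 * (m1 - mean) ** 2 + (1 - p1) * (m0 - mean) ** 2   # Var(E[X | Y])
var = within + between
print(mean, var)  # → E[X] = 0.2, Var(X) = 4.96
```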
Inequalities and limit theorems
Markov tightness. Construct a non-negative random variable for which Markov's inequality is tight at $a = 2\, \mathbb{E}[X]$.
Chebyshev for Poisson. A Poisson($\lambda = 100$) variable is observed. Use Chebyshev to bound $\mathrm{P}(|X - 100| \geq 30)$.
Hoeffding sample size. How many i.i.d. Bernoulli observations do you need to estimate $p$ to within $\epsilon = 0.005$ with probability at least $0.99$?
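Assuming you invert the two-sided Hoeffding bound $\mathrm{P}(|\hat{p} - p| \geq \epsilon) \leq 2e^{-2n\epsilon^2}$, the required $n$ follows in one line:

```python
import math

eps, delta = 0.005, 0.01
# Solve 2 exp(-2 n eps^2) <= delta for n
n = math.ceil(math.log(2 / delta) / (2 * eps ** 2))
print(n)  # → 105967
```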
CLT empirical check. Simulate $10\,000$ averages of $n = 30$ i.i.d. Exponential($1$) draws and plot a histogram against the predicted Gaussian.
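A starting point for the simulation (NumPy, fixed seed for reproducibility; the plot itself is left as a comment since the CLT prediction can already be checked from the first two moments):

```python
import numpy as np

rng = np.random.default_rng(0)
avgs = rng.exponential(scale=1.0, size=(10_000, 30)).mean(axis=1)

# CLT prediction: avgs ~ approx N(1, 1/30), since Exp(1) has mean 1 and variance 1
print(avgs.mean(), avgs.std())
# For the histogram, plot avgs with density=True against the N(1, 1/30) pdf,
# e.g. plt.hist(avgs, bins=50, density=True) with matplotlib.
```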
Multivariate Gaussian
Marginalisation by hand. Given $\boldsymbol\mu = (1, 2, 3)^\top$ and $\boldsymbol\Sigma$ with $\Sigma_{11} = 4$, $\Sigma_{22} = 1$, $\Sigma_{33} = 9$, $\Sigma_{12} = 1$, $\Sigma_{13} = 2$, $\Sigma_{23} = -1$, find the marginal distribution of $(X_1, X_3)$.
Conditional Gaussian. Using the same $\boldsymbol\Sigma$, find $\mathbb{E}[X_1 \mid X_2 = 4, X_3 = 5]$ and $\mathrm{Var}(X_1 \mid X_2 = 4, X_3 = 5)$.
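Your Schur-complement derivation can be checked numerically; a sketch assuming the $\boldsymbol\Sigma$ above (partitioning $X_1$ against $(X_2, X_3)$):

```python
import numpy as np

mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[4.0, 1.0, 2.0],
                  [1.0, 1.0, -1.0],
                  [2.0, -1.0, 9.0]])

obs = np.array([4.0, 5.0])          # observed (X_2, X_3)
S12 = Sigma[0, 1:]                  # cross-covariance of X_1 with (X_2, X_3)
S22_inv = np.linalg.inv(Sigma[1:, 1:])

cond_mean = mu[0] + S12 @ S22_inv @ (obs - mu[1:])
cond_var = Sigma[0, 0] - S12 @ S22_inv @ S12   # Schur complement
print(cond_mean, cond_var)  # → 4.5 and 1.875
```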
PCA from covariance. Diagonalise $\boldsymbol\Sigma = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ and write the principal components.
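One way to verify the diagonalisation (NumPy's `eigh` returns eigenvalues in ascending order, eigenvectors as columns):

```python
import numpy as np

Sigma = np.array([[2.0, 1.0], [1.0, 2.0]])
evals, evecs = np.linalg.eigh(Sigma)

print(evals)   # [1., 3.]
print(evecs)   # columns proportional to (1, -1)/sqrt(2) and (1, 1)/sqrt(2), up to sign
```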
Information theory
Entropy of Bernoulli. Plot $H(p) = -p\log p - (1-p)\log(1-p)$ on $p \in (0,1)$ and identify the maximum.
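A sketch for the numerical part, evaluating $H(p)$ on a grid in nats (the plotting call is left as a comment):

```python
import numpy as np

p = np.linspace(1e-6, 1 - 1e-6, 10_001)
H = -p * np.log(p) - (1 - p) * np.log(1 - p)  # entropy in nats

p_max = p[np.argmax(H)]
print(p_max, H.max())  # maximum at p = 1/2, where H = ln 2
# e.g. plt.plot(p, H) with matplotlib to produce the plot
```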
KL between Gaussians. Derive $$ D_\text{KL}\big(\mathcal{N}(\mu_1, \sigma_1^2)\,\|\,\mathcal{N}(\mu_2, \sigma_2^2)\big) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}. $$
Cross-entropy = NLL. Show that minimising $H(p_\text{empirical}, q_\theta)$ is equivalent to maximum-likelihood estimation of $\boldsymbol\theta$.
Mutual information of bivariate Gaussian. Show that $I(X; Y) = -\tfrac{1}{2}\log(1 - \rho^2)$ for a bivariate Gaussian with correlation $\rho$.
Channel capacity. For the Gaussian channel with $P/\sigma^2 = 7$, compute the capacity in bits per channel use.
Sampling
Inverse CDF for Laplace. Derive the inverse CDF of the Laplace($\mu, b$) distribution and write a Python sampler.
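A reference implementation of the sampler once you have the inverse CDF $F^{-1}(u) = \mu - b\,\mathrm{sign}(u - \tfrac12)\ln(1 - 2|u - \tfrac12|)$ (function name and seed are illustrative):

```python
import numpy as np

def sample_laplace(mu, b, size, seed=None):
    """Laplace(mu, b) samples via the inverse CDF applied to Uniform(0, 1) draws."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=size)
    # F^{-1}(u) = mu - b * sign(u - 1/2) * ln(1 - 2|u - 1/2|)
    return mu - b * np.sign(u - 0.5) * np.log1p(-2 * np.abs(u - 0.5))

x = sample_laplace(0.0, 1.0, 100_000, seed=1)
print(x.mean(), x.var())  # should approach 0 and 2 b^2 = 2
```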
Importance sampling failure. Estimate $\mathbb{E}_p[X^2]$ for $X \sim \mathcal{N}(0,1)$ using a $\mathcal{N}(0, 0.1)$ proposal. Compute the effective sample size $\text{ESS} = (\sum w_i)^2 / \sum w_i^2$ and discuss why this proposal is poor.
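A sketch of the experiment, reading $\mathcal{N}(0, 0.1)$ as variance $0.1$ in line with the $\mathcal{N}(\mu, \sigma^2)$ convention used throughout this section (seed and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
sigma_q = np.sqrt(0.1)  # proposal std, from variance 0.1

x = rng.normal(0.0, sigma_q, size=n)
# log w = log p(x) - log q(x); the shared -0.5*log(2*pi) constant cancels
log_w = (-0.5 * x**2) - (-0.5 * x**2 / sigma_q**2 - np.log(sigma_q))
w = np.exp(log_w)

estimate = np.sum(w * x**2) / np.sum(w)  # self-normalised IS estimate of E_p[X^2] = 1
ess = np.sum(w) ** 2 / np.sum(w ** 2)
print(estimate, ess)  # ESS is a tiny fraction of n: the proposal's tails are far too light
```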
Gumbel-Max trick. Show that $\arg\max_k (\log \pi_k + g_k)$ with $g_k \sim \text{Gumbel}(0, 1)$ i.i.d. is distributed as Categorical($\boldsymbol\pi$).
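An empirical check to accompany the proof, using $-\log(-\log U)$ to generate Gumbel$(0,1)$ draws (the weights $\boldsymbol\pi$ are an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.3, 0.2])
n = 100_000

g = -np.log(-np.log(rng.uniform(size=(n, 3))))   # Gumbel(0, 1) via inverse CDF
argmaxes = np.argmax(np.log(pi) + g, axis=1)

freqs = np.bincount(argmaxes, minlength=3) / n
print(freqs)  # should approach pi = [0.5, 0.3, 0.2]
```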
Project-style
Calibration of a softmax classifier. Train a small CNN on CIFAR-10. Plot the reliability diagram on the test set. Apply temperature scaling and report ECE before and after.
Bayesian A/B test. Two product variants have $145/2000$ and $172/2000$ click-throughs. Using uniform Beta priors, compute the posterior over each rate and the posterior probability that variant B has a higher true rate.
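The posterior comparison can be done by Monte Carlo, sampling both Beta posteriors (seed and sample count are arbitrary; the quoted probability is approximate):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Uniform Beta(1, 1) priors -> Beta(1 + clicks, 1 + misses) posteriors
rate_a = rng.beta(1 + 145, 1 + 2000 - 145, size=n)
rate_b = rng.beta(1 + 172, 1 + 2000 - 172, size=n)

p_b_better = (rate_b > rate_a).mean()
print(p_b_better)  # posterior P(rate_B > rate_A), roughly 0.94
```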