Solution sketches
1. $\mathrm{P}(C_3 \mid H) = (1 \cdot 1/3)/(0.5/3 + 0.7/3 + 1.0/3) = 1/2.2 \approx 0.4545$. Similarly $\mathrm{P}(C_2 \mid H) = 0.7/2.2 \approx 0.318$ and $\mathrm{P}(C_1 \mid H) = 0.5/2.2 \approx 0.227$.
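A quick check of the normalisation, assuming the setup the sketch implies (three coins with heads probabilities $0.5$, $0.7$, $1.0$, chosen uniformly at random, and a single head observed):
```python
import numpy as np

# Heads probabilities of the three coins and a uniform prior (assumed setup).
likelihood = np.array([0.5, 0.7, 1.0])
prior = np.full(3, 1 / 3)

posterior = likelihood * prior
posterior /= posterior.sum()   # divide by P(H)
print(posterior)               # [0.2273 0.3182 0.4545]
```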
3. $\mathrm{P}(\text{no shared}) = \prod_{k=0}^{22}(365-k)/365 \approx 0.4927$, so $\mathrm{P}(\text{shared}) \approx 0.5073$.
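The product is a one-liner to confirm, assuming 23 people and 365 equally likely birthdays:
```python
import numpy as np

# P(no shared birthday) among 23 people: product of (365 - k)/365 for k = 0..22.
p_no_shared = np.prod((365 - np.arange(23)) / 365)
print(p_no_shared, 1 - p_no_shared)   # ~0.4927, ~0.5073
```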
5. Let $T_1, T_2$ be the test results and $D$ the disease status. Assuming the tests are conditionally independent given $D$: $$ \mathrm{P}(D \mid T_1=1, T_2=1) = \frac{0.95 \cdot 0.90 \cdot 0.005}{0.95 \cdot 0.90 \cdot 0.005 + 0.01 \cdot 0.05 \cdot 0.995} = \frac{0.004275}{0.004275 + 0.000497} \approx 0.896. $$
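The arithmetic, with the sensitivities ($0.95$, $0.90$), false-positive rates ($0.01$, $0.05$) and prevalence ($0.005$) read off the sketch:
```python
# Numerator: both tests positive and diseased; denominator adds the
# both-positive-but-healthy term.
num = 0.95 * 0.90 * 0.005
den = num + 0.01 * 0.05 * 0.995
print(num / den)   # ~0.896
```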
7. Posterior precision $= 1/1 + 5/4 = 9/4$, so posterior variance $= 4/9$. Posterior mean $= (4/9)(0/1 + 5 \cdot 1.2/4) = (4/9)(1.5) = 2/3 \approx 0.667$. Posterior is $\mathcal{N}(2/3, 4/9)$.
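The same conjugate update in code, assuming the setup implied by the numbers above: a $\mathcal{N}(0, 1)$ prior on the mean, $n = 5$ observations with sample mean $1.2$, and known observation variance $4$:
```python
# Gaussian-Gaussian conjugate update with known observation variance.
prior_mean, prior_var = 0.0, 1.0
n, xbar, sigma2 = 5, 1.2, 4.0

post_prec = 1 / prior_var + n / sigma2          # 9/4
post_var = 1 / post_prec                        # 4/9
post_mean = post_var * (prior_mean / prior_var + n * xbar / sigma2)
print(post_mean, post_var)                      # 0.667, 0.444
```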
8. $\ell(p) = k \log p + (n-k)\log(1-p)$; $\ell'(p) = k/p - (n-k)/(1-p) = 0 \implies \hat p = k/n$.
13. Use the joint distribution $p(x, y)$. $\mathbb{E}[X+Y] = \sum_{x,y}(x+y)p(x,y) = \sum_x x \sum_y p(x,y) + \sum_y y \sum_x p(x,y) = \mathbb{E}[X] + \mathbb{E}[Y]$. No independence used.
16. $\mathbb{E}[X] = 0.4 \cdot 2 + 0.6 \cdot (-1) = 0.2$. By the law of total variance, $$ \mathrm{Var}(X) = \mathbb{E}[\mathrm{Var}(X \mid Y)] + \mathrm{Var}(\mathbb{E}[X \mid Y]). $$ The first term is $0.4 \cdot 1 + 0.6 \cdot 4 = 2.8$. The second is $\mathbb{E}[(\mathbb{E}[X \mid Y])^2] - (\mathbb{E}[X])^2 = (0.4 \cdot 4 + 0.6 \cdot 1) - 0.04 = 2.2 - 0.04 = 2.16$. Total: $\mathrm{Var}(X) = 2.8 + 2.16 = 4.96$.
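A Monte Carlo check, assuming a two-component mixture with the stated conditional means and variances; the Gaussian components are an arbitrary choice, since the decomposition only uses the first two moments:
```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# With probability 0.4: conditional mean 2, variance 1;
# with probability 0.6: conditional mean -1, variance 4.
y = rng.random(n) < 0.4
x = np.where(y, rng.normal(2, 1, n), rng.normal(-1, 2, n))
print(x.mean(), x.var())   # ~0.2, ~4.96
```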
19. Hoeffding with $\epsilon = 0.005$: $2\exp(-2n\epsilon^2) \leq 0.01 \implies n \geq \log(200)/(2 \cdot 0.005^2) \approx 105\,966$.
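The same bound evaluated in code:
```python
import math

# Smallest n with 2 * exp(-2 * n * eps^2) <= delta.
eps, delta = 0.005, 0.01
n = math.log(2 / delta) / (2 * eps ** 2)
print(n, math.ceil(n))   # ~105966.4, so 105967 samples suffice
```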
24. $H(p)$ peaks at $p = 0.5$ with $H = \log 2 \approx 0.693$ nats $= 1$ bit.
25. Direct integration of $\int p(x) [\log p(x) - \log q(x)]\, dx$ with $p, q$ Gaussians yields the stated formula.
26. $H(p_\text{emp}, q_\theta) = -\tfrac{1}{n}\sum_i \log q_\theta(x_i) = -\tfrac{1}{n}\ell(\boldsymbol\theta)$. Minimising the cross-entropy is the same as maximising the log-likelihood.
28. $C = \tfrac{1}{2}\log_2(1+7) = \tfrac{1}{2}\log_2 8 = 3/2$ bits per use.
29. Laplace CDF: $F(x) = \tfrac{1}{2}\exp((x-\mu)/b)$ for $x \leq \mu$, and $1 - \tfrac{1}{2}\exp(-(x-\mu)/b)$ for $x \geq \mu$. Inverse: $F^{-1}(u) = \mu - b\,\mathrm{sgn}(u - 0.5)\log(1 - 2|u - 0.5|)$.
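Inverse-transform sampling with this quantile function; $\mu = 1$, $b = 2$ are arbitrary illustration values:
```python
import numpy as np

def laplace_inverse_cdf(u, mu=0.0, b=1.0):
    """Quantile function of the Laplace(mu, b) distribution."""
    return mu - b * np.sign(u - 0.5) * np.log(1 - 2 * np.abs(u - 0.5))

rng = np.random.default_rng(0)
samples = laplace_inverse_cdf(rng.random(100_000), mu=1.0, b=2.0)
print(samples.mean(), samples.var())   # ~1.0 and ~8.0 (= 2 b^2)
```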
31. $\mathrm{P}(\arg\max_k = j) = \mathrm{P}(\log \pi_j + g_j > \log \pi_k + g_k\ \forall k \neq j)$. Use the fact that $\max_k (\log \pi_k + g_k)$ has CDF $\exp(-e^{-x}\sum_k \pi_k) = \exp(-e^{-x})$ (the standard Gumbel CDF since $\sum_k \pi_k = 1$); standard Gumbel calculus gives $\mathrm{P}(\arg\max = j) = \pi_j$.
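A simulation check of the result, with an arbitrary probability vector:
```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.2, 0.3, 0.5])   # any probability vector works

# Gumbel-max trick: argmax_k (log pi_k + g_k) with g_k ~ Gumbel(0, 1)
# follows the categorical distribution pi.
g = rng.gumbel(size=(100_000, 3))
draws = np.argmax(np.log(pi) + g, axis=1)
print(np.bincount(draws) / len(draws))   # ~[0.2, 0.3, 0.5]
```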
33. Posterior on $p_A$: Beta$(146, 1856)$; on $p_B$: Beta$(173, 1829)$. Sample 100k pairs and report the fraction with $p_B > p_A$; a few lines of NumPy (sketched below) give $\approx 0.94$.
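One possible version of those few lines:
```python
import numpy as np

# Draw from the two independent Beta posteriors and compare pairwise.
rng = np.random.default_rng(0)
p_a = rng.beta(146, 1856, size=100_000)
p_b = rng.beta(173, 1829, size=100_000)
print((p_b > p_a).mean())   # ~0.94
```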
These exercises and sketches give plenty of practice. Move to Chapter 5 once Bayes' theorem, the multivariate Gaussian, and KL divergence feel like routine tools rather than abstract formulas.