Solution sketches

1. With $\{2,4,4,5,6,6,7\}$: mean $= 4.857$, median $= 5$. Replacing 7 with 700: mean $= 103.857$, median $= 5$. The mean shifts by 99 while the median is unchanged: the canonical illustration of the median's robustness to outliers.
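A quick check of the numbers, using only the Python standard library:

```python
# The median resists a single outlier; the mean does not.
data = [2, 4, 4, 5, 6, 6, 7]
corrupted = [2, 4, 4, 5, 6, 6, 700]

def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    s = sorted(xs)
    n, mid = len(s), len(s) // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

print(round(mean(data), 3), median(data))            # 4.857 5
print(round(mean(corrupted), 3), median(corrupted))  # 103.857 5
```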

2. $\sum (X_i - \bar X)^2 = \sum X_i^2 - n\bar X^2$. Take expectations: $\mathbb{E}[X_i^2] = \sigma^2 + \mu^2$, $\mathbb{E}[\bar X^2] = \sigma^2/n + \mu^2$. So $\mathbb{E}[\sum(X_i-\bar X)^2] = n(\sigma^2 + \mu^2) - n(\sigma^2/n + \mu^2) = (n-1)\sigma^2$, giving $\mathbb{E}[\frac{1}{n}\sum(X_i-\bar X)^2] = \frac{n-1}{n}\sigma^2$; dividing by $n-1$ rather than $n$ is therefore exactly what unbiasedness requires.
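A Monte Carlo sketch of the result. Normal samples are used here for convenience; the derivation itself holds for any distribution with finite variance.

```python
import random

# Check: E[(1/n) Σ(X_i - X̄)²] ≈ ((n-1)/n)σ², while dividing by n-1
# is (approximately) unbiased for σ². Here n = 5, σ² = 4.
random.seed(0)
n, sigma2, trials = 5, 4.0, 200_000
biased_sum = unbiased_sum = 0.0
for _ in range(trials):
    xs = [random.gauss(0, sigma2 ** 0.5) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    biased_sum += ss / n
    unbiased_sum += ss / (n - 1)
print(biased_sum / trials)    # ≈ (4/5)·4 = 3.2
print(unbiased_sum / trials)  # ≈ 4.0
```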

5. $\operatorname{MSE}_1 = 9 + 0 = 9$ (variance plus squared bias). $\operatorname{MSE}_2 = 4 + 1 = 5$. The biased estimator $\hat\theta_2$ wins by 4: bias is not the enemy when it buys a larger reduction in variance.

7. Log-likelihood: $n\log\lambda - \lambda\sum x_i$. Setting derivative to zero: $n/\lambda - \sum x_i = 0$, so $\hat\lambda = n/\sum x_i = 1/\bar x$. The MLE of an exponential rate is the reciprocal of the sample mean.
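A sanity check on simulated data (the rate 2.5 is arbitrary): the MLE $\hat\lambda = 1/\bar x$ recovers the true rate.

```python
import random

# Exponential MLE: λ̂ = n / Σx_i = 1 / x̄.
random.seed(1)
true_rate = 2.5
xs = [random.expovariate(true_rate) for _ in range(100_000)]
lam_hat = len(xs) / sum(xs)
print(lam_hat)  # ≈ 2.5
```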

8. The likelihood is $\theta^{-n}$ for $\theta \geq \max(x_i)$ and 0 otherwise. It is maximised at $\hat\theta = \max(x_i) = X_{(n)}$. Bias: $\mathbb{E}[X_{(n)}] = \frac{n}{n+1}\theta$, so the MLE underestimates $\theta$ by $\theta/(n+1)$ in expectation. The unbiased rescaling is $\frac{n+1}{n}\, X_{(n)}$.
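A simulation of the bias and its correction, with an illustrative choice of $\theta = 10$ and $n = 5$:

```python
import random

# X_(n) is biased low for Uniform(0, θ); (n+1)/n · X_(n) corrects it.
random.seed(2)
theta, n, trials = 10.0, 5, 100_000
mle_sum = corrected_sum = 0.0
for _ in range(trials):
    m = max(random.uniform(0, theta) for _ in range(n))
    mle_sum += m
    corrected_sum += (n + 1) / n * m
print(mle_sum / trials)        # ≈ nθ/(n+1) = 50/6 ≈ 8.33
print(corrected_sum / trials)  # ≈ 10
```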

10. Posterior: $\operatorname{Beta}(2+8, 2+2) = \operatorname{Beta}(10, 4)$. Posterior mean: $10/14 \approx 0.714$. Mode: $(10-1)/(10+4-2) = 9/12 = 0.75$. 95% credible interval: approximately $[0.46, 0.91]$.
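The posterior summaries can be confirmed numerically, assuming SciPy is available:

```python
from scipy import stats

# Posterior Beta(10, 4): mean, mode, and equal-tailed 95% credible interval.
a, b = 10, 4
post = stats.beta(a, b)
print(post.mean())               # 10/14 ≈ 0.714
print((a - 1) / (a + b - 2))     # mode = 0.75
print(post.ppf([0.025, 0.975]))  # ≈ [0.46, 0.91]
```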

11. Posterior $\theta \mid x \sim \mathcal{N}\!\left(\frac{\tau^2}{1+\tau^2}x,\ \frac{\tau^2}{1+\tau^2}\right)$. MAP $= \frac{\tau^2}{1+\tau^2}x$, a shrinkage estimator. With $Y = X\beta + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$, and prior $\beta\sim\mathcal{N}(0, \tau^2 I)$, the MAP estimate is ridge regression with $\lambda = \sigma^2/\tau^2$.
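A sketch of the ridge/MAP correspondence on simulated data; the design matrix, true coefficients, and variances below are all hypothetical choices for illustration.

```python
import numpy as np

# Ridge solution (XᵀX + λI)⁻¹Xᵀy with λ = σ²/τ² is the Gaussian MAP estimate.
rng = np.random.default_rng(0)
n, p, sigma2, tau2 = 200, 3, 1.0, 4.0
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

lam = sigma2 / tau2
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(beta_ridge)  # shrunk toward 0 (in norm) relative to OLS
```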

13. Power against $\mu_1$ for the one-sample $z$-test: $\Phi(\mu_1\sqrt{n}/\sigma - z_{\alpha/2}) + \Phi(-\mu_1\sqrt{n}/\sigma - z_{\alpha/2})$. With $n=100$, $\sigma=1$, $\alpha=0.05$, $z_{0.025} = 1.96$. For $\mu = 0.2$: $\Phi(0.04) + \Phi(-3.96) \approx 0.516$. For $\mu = 0.5$: power $\approx 0.999$. For $\mu = 0.1$: $\Phi(-0.96) + \Phi(-2.96) \approx 0.170$.
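The power formula above, implemented with the standard library's `NormalDist`:

```python
from math import sqrt
from statistics import NormalDist

# Power of the two-sided one-sample z-test against alternative mu.
def power(mu, n=100, sigma=1.0, alpha=0.05):
    z = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    d = mu * sqrt(n) / sigma
    Phi = NormalDist().cdf
    return Phi(d - z) + Phi(-d - z)

print(round(power(0.2), 3))  # ≈ 0.516
print(round(power(0.5), 3))  # ≈ 0.999
print(round(power(0.1), 3))  # ≈ 0.170
```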

15. Bonferroni at $\alpha/m = 0.0025$: rejects only $p_1 = 0.001$. BH at $\alpha = 0.05$: find largest $k$ with $p_{(k)} \leq 0.05k/20$. $p_{(1)}=0.001 \leq 0.0025$ ✓; $p_{(2)}=0.005 \leq 0.005$ ✓; $p_{(3)}=0.01 \leq 0.0075$ ✗; $p_{(4)}=0.02 \leq 0.01$ ✗, so the largest $k$ where the condition holds is $k=2$. BH rejects the two smallest p-values.
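A minimal step-up implementation of BH. It takes only the four listed p-values, so it implicitly assumes the 16 unlisted ones are too large to satisfy any threshold at higher ranks.

```python
# Benjamini–Hochberg step-up with m = 20 tests in total.
def bh_rejections(pvals, m, alpha=0.05):
    """Return the p-values rejected by BH at level alpha."""
    ranked = sorted(pvals)
    k_max = 0
    for k, p in enumerate(ranked, start=1):
        if p <= alpha * k / m:
            k_max = k  # step-up: keep the LARGEST k passing the test
    return ranked[:k_max]

print(bh_rejections([0.001, 0.005, 0.01, 0.02], m=20))  # [0.001, 0.005]
```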

17. $X^\top X = \begin{pmatrix} 4 & 10 \\ 10 & 30 \end{pmatrix}$, $X^\top Y = (19, 56)^\top$. $\det = 4\cdot 30 - 100 = 20$, so $(X^\top X)^{-1} = (1/20)\begin{pmatrix} 30 & -10 \\ -10 & 4 \end{pmatrix}$. $\hat\beta = (1/20)\begin{pmatrix} 30 & -10 \\ -10 & 4 \end{pmatrix}\begin{pmatrix}19\\56\end{pmatrix} = (1/20)(570-560,\ -190+224)^\top = (0.5, 1.7)^\top$. Intercept 0.5, slope 1.7.
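The hand computation can be verified by solving the normal equations $X^\top X \hat\beta = X^\top Y$ directly:

```python
import numpy as np

# Solve the normal equations for the worked example above.
XtX = np.array([[4.0, 10.0], [10.0, 30.0]])
XtY = np.array([19.0, 56.0])
beta = np.linalg.solve(XtX, XtY)
print(beta)  # [0.5 1.7]
```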

18. With no features, $p_i = \sigma(\beta_0)$ is constant. The score equation is $\sum(y_i - p) = 0$, giving $p = \bar y$, and inverting the sigmoid gives $\beta_0 = \log(\bar y/(1-\bar y))$.
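A one-line check with hypothetical binary outcomes:

```python
from math import log

# Intercept-only logistic regression: β̂₀ is the log-odds of the sample mean.
y = [1, 1, 1, 0, 0]           # hypothetical outcomes, ȳ = 0.6
ybar = sum(y) / len(y)
beta0 = log(ybar / (1 - ybar))
print(round(beta0, 4))        # log(0.6/0.4) ≈ 0.4055
```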

19. Gaussian, identity (continuous outcome). Bernoulli/Binomial, logit (binary/proportion). Poisson, log (counts). Gamma, inverse (positive continuous, e.g. waiting times).

22. AIC penalty per extra parameter is 2; BIC penalty is $\log n$. $\log n > 2$ iff $n > e^2 \approx 7.39$. For any dataset of meaningful size, BIC's penalty dominates, leading BIC to favour smaller models.

24. Confounder = common cause of treatment and outcome (e.g. age affects both blood-pressure medication choice and stroke risk); adjust for it. Mediator = on the causal pathway from treatment to outcome (e.g. blood pressure mediates the effect of medication on stroke); do not adjust for it when estimating the total effect: adjusting blocks the very pathway you are trying to measure.
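A toy simulation of the mediator point, with entirely hypothetical coefficients: regressing the outcome on treatment alone recovers the total effect, while also adjusting for the mediator removes the mediated part and leaves only the direct effect.

```python
import numpy as np

# T → M → Y with an additional direct T → Y path.
rng = np.random.default_rng(0)
n = 100_000
T = rng.normal(size=n)
M = 2.0 * T + rng.normal(size=n)             # mediator
Y = 1.0 * T + 0.5 * M + rng.normal(size=n)   # total effect of T = 1 + 0.5·2 = 2

X1 = np.column_stack([np.ones(n), T])        # Y ~ T
X2 = np.column_stack([np.ones(n), T, M])     # Y ~ T + M (blocks the pathway)
total = np.linalg.lstsq(X1, Y, rcond=None)[0][1]
direct = np.linalg.lstsq(X2, Y, rcond=None)[0][1]
print(round(total, 2), round(direct, 2))  # ≈ 2.0 and ≈ 1.0
```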

This site is currently in Beta. Contact: Chris Paton


AI tools used: Claude (research, coding, text), ChatGPT (diagrams, images), Grammarly (editing).