Causal inference is the study of how to estimate causal effects from data, separating causation from mere correlation. Classical statistics describes joint distributions; causal inference asks how those distributions change when we intervene.
Two equivalent frameworks:
Potential outcomes (Rubin 1974): for each unit $i$ and treatment $t$, posit a potential outcome $Y_i(t)$. For a binary treatment, the average treatment effect is $\mathrm{ATE} = \mathbb{E}[Y(1) - Y(0)]$. The fundamental problem: only one potential outcome is observed per unit, namely the one for the treatment that unit actually received, so the individual effect $Y_i(1) - Y_i(0)$ is never observed directly.
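The potential-outcomes setup can be illustrated with a small simulation. This is a minimal sketch, not a reference implementation: the data-generating process (standard normal $Y(0)$, a noisy effect centred on 2) and the function name `simulate_randomised_trial` are assumptions made purely for illustration. Both potential outcomes are generated for every unit, which is only possible in simulation, and then one of them is hidden according to a randomised treatment.

```python
import random
from statistics import mean


def simulate_randomised_trial(n=20_000, seed=0):
    """Return (true_ate, estimated_ate) for a simulated randomised experiment.

    The data-generating process is an illustrative assumption.
    """
    rng = random.Random(seed)

    # Both potential outcomes exist for every unit, but only in simulation.
    y0 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y1 = [a + 2.0 + rng.gauss(0.0, 1.0) for a in y0]
    true_ate = mean(b - a for a, b in zip(y0, y1))

    # Randomised assignment: a fair coin flip, independent of (y0, y1).
    treated = [rng.random() < 0.5 for _ in range(n)]

    # The fundamental problem: we observe Y(1) if treated, else Y(0).
    observed = [b if t else a for a, b, t in zip(y0, y1, treated)]

    # Difference in group means estimates the ATE under randomisation.
    mean_treated = mean(y for y, t in zip(observed, treated) if t)
    mean_control = mean(y for y, t in zip(observed, treated) if not t)
    return true_ate, mean_treated - mean_control


if __name__ == "__main__":
    true_ate, estimate = simulate_randomised_trial()
    print(f"true ATE = {true_ate:.3f}, estimate = {estimate:.3f}")
```

Because assignment is independent of the potential outcomes, the two group means are unbiased for $\mathbb{E}[Y(1)]$ and $\mathbb{E}[Y(0)]$, so their difference should land close to the true ATE even though no unit reveals both outcomes.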
Structural causal models (Pearl 2000): a directed acyclic graph (DAG) of variables where each node is determined by its parents and exogenous noise. The do-operator $\mathrm{do}(X = x)$ represents intervention: it replaces the mechanism that normally generates $X$ with the constant $x$. Conditioning is not intervening, so in general $P(y | \mathrm{do}(x)) \neq P(y | x)$ and Bayes' theorem alone cannot answer interventional queries; the do-calculus provides three rules for translating $P(y | \mathrm{do}(x))$ into observational distributions, when possible.
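The gap between conditioning and intervening can be made concrete with a toy binary model. The sketch below assumes a three-variable graph $Z \to X$, $Z \to Y$, $X \to Y$ with made-up probabilities; the names `joint`, `p_y1_given_x` and all numeric parameters are hypothetical. Intervention is modelled by replacing the mechanism for $X$ with a constant while leaving the other mechanisms untouched.

```python
from itertools import product

# Assumed toy model (all numbers are illustrative):
#   Z ~ Bernoulli(0.5)
#   X | Z ~ Bernoulli(0.2 if Z == 0 else 0.8)
#   Y | X, Z ~ Bernoulli(0.1 + 0.3 * X + 0.5 * Z)
P_Z = {0: 0.5, 1: 0.5}
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}


def p_y1_given_xz(x, z):
    return 0.1 + 0.3 * x + 0.5 * z


def joint(do_x=None):
    """Return {(z, x, y): probability}.

    If do_x is given, the mechanism for X is replaced by X := do_x,
    which cuts the edge Z -> X (the "mutilated" graph).
    """
    dist = {}
    for z, x, y in product((0, 1), repeat=3):
        if do_x is None:
            px = P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]
        else:
            px = 1.0 if x == do_x else 0.0
        py1 = p_y1_given_xz(x, z)
        py = py1 if y == 1 else 1 - py1
        dist[(z, x, y)] = P_Z[z] * px * py
    return dist


def p_y1_given_x(dist, x):
    """P(Y = 1 | X = x) computed from a joint distribution."""
    num = sum(p for (_, xx, y), p in dist.items() if xx == x and y == 1)
    den = sum(p for (_, xx, _), p in dist.items() if xx == x)
    return num / den


observational = p_y1_given_x(joint(), 1)          # P(Y=1 | X=1)
interventional = p_y1_given_x(joint(do_x=1), 1)   # P(Y=1 | do(X=1))
print(f"P(Y=1 | X=1)     = {observational:.2f}")
print(f"P(Y=1 | do(X=1)) = {interventional:.2f}")
```

In this assumed model the observational quantity is 0.80 while the interventional one is 0.65: observing $X = 1$ makes $Z = 1$ more likely, and $Z$ independently raises $Y$, whereas setting $X = 1$ leaves the distribution of $Z$ unchanged.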
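Identification: when can $P(y | \mathrm{do}(x))$ be computed from observational data alone? The backdoor and frontdoor criteria give graphical sufficient conditions. Instrumental variables and regression discontinuity designs identify effects under additional assumptions, and often only local ones, such as the effect for compliers or for units near the cutoff.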
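For reference, the two graphical criteria lead to the following standard adjustment formulas for discrete variables. The symbols $Z$ (a set of covariates satisfying the backdoor criterion relative to $X$ and $Y$) and $M$ (a mediator satisfying the frontdoor criterion) are introduced here for illustration and do not appear elsewhere in this entry.

$$P(y \mid \mathrm{do}(x)) = \sum_{z} P(y \mid x, z)\, P(z) \qquad \text{(backdoor adjustment)}$$

$$P(y \mid \mathrm{do}(x)) = \sum_{m} P(m \mid x) \sum_{x'} P(y \mid m, x')\, P(x') \qquad \text{(frontdoor adjustment)}$$

In both cases the right-hand side contains only observational quantities, which is exactly what identification requires.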
Counterfactuals: $P(Y_x | X = x', Y = y')$, "what would $Y$ have been if $X$ had been $x$, given that we observed $X = x'$ and $Y = y'$"? Strictly stronger than intervention; the third rung of Pearl's "ladder of causation" (association → intervention → counterfactual).
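In a fully specified structural model, a counterfactual is usually computed in three steps: abduction (infer the exogenous noise from the evidence), action (replace the mechanism for $X$), and prediction (recompute $Y$). The sketch below assumes a deliberately simple structural equation $Y := 2X + U$; the equation, the coefficient, and the function names are hypothetical choices made so that the noise can be recovered exactly.

```python
def structural_y(x, u):
    """Assumed structural equation: Y := 2 * X + U (illustrative only)."""
    return 2.0 * x + u


def counterfactual_y(x_obs, y_obs, x_cf):
    """Value Y would have taken under X = x_cf, given we saw (x_obs, y_obs)."""
    # 1. Abduction: recover the unit's exogenous noise from the evidence.
    u = y_obs - 2.0 * x_obs
    # 2. Action: replace the mechanism for X with the constant x_cf.
    x = x_cf
    # 3. Prediction: recompute Y with the same noise term.
    return structural_y(x, u)


# Observed X = 1, Y = 3, so U = 1; had X been 0, Y would have been 1.
print(counterfactual_y(x_obs=1.0, y_obs=3.0, x_cf=0.0))
```

Setting `x_cf` equal to the observed value returns the observed outcome, which is the consistency property counterfactuals are expected to satisfy. With non-invertible or non-additive noise, abduction yields a posterior distribution over $U$ rather than a single value.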
Modern relevance:
- Causal inference for ML: out-of-distribution generalisation, fairness, transportability all benefit from causal framing.
- Causal representation learning (Schölkopf et al.): learn representations whose components correspond to disentangled causal variables.
- A/B testing: the gold standard for causal effects in tech companies. Randomisation makes treatment assignment independent of the potential outcomes, which removes confounding in expectation.
- Health, social science, policy: where randomised trials are infeasible, observational causal inference (matching, propensity scores, IV) is the workhorse.
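As an illustration of the observational methods mentioned in the list above, here is a minimal sketch of inverse propensity weighting with a single binary confounder. Everything about the setup is an assumption for demonstration purposes: the simulated data-generating process (true effect 1.0), the function names, and the choice to estimate propensities as simple stratum frequencies rather than with a fitted model.

```python
import random


def simulate_observational(n=20_000, seed=1):
    """Simulate (z, t, y) rows where Z confounds treatment T and outcome Y.

    Assumed process: true treatment effect is 1.0, Z adds 2.0 to Y,
    and units with Z = 1 are far more likely to be treated.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        t = 1 if rng.random() < (0.8 if z else 0.2) else 0
        y = 1.0 * t + 2.0 * z + rng.gauss(0.0, 1.0)
        rows.append((z, t, y))
    return rows


def naive_difference(rows):
    """Difference in raw group means; biased when Z is ignored."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_ate(rows):
    """Normalised inverse-propensity-weighted ATE estimate."""
    # Estimate the propensity e(z) = P(T = 1 | Z = z) per stratum.
    propensity = {}
    for z in (0, 1):
        stratum = [t for zz, t, _ in rows if zz == z]
        propensity[z] = sum(stratum) / len(stratum)

    # Weight each unit by the inverse probability of the arm it is in.
    w1 = sum(t / propensity[z] for z, t, _ in rows)
    w0 = sum((1 - t) / (1 - propensity[z]) for z, t, _ in rows)
    m1 = sum(t * y / propensity[z] for z, t, y in rows) / w1
    m0 = sum((1 - t) * y / (1 - propensity[z]) for z, t, y in rows) / w0
    return m1 - m0


if __name__ == "__main__":
    rows = simulate_observational()
    print(f"naive: {naive_difference(rows):.2f}, IPW: {ipw_ate(rows):.2f}")
```

The naive comparison overstates the effect because treated units disproportionately have $Z = 1$; reweighting rebalances the two arms so that each resembles the full population. This only works when all confounders are measured, which is the key untestable assumption behind matching and propensity-score methods.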
Pearl received the 2011 Turing Award partly for the do-calculus.
Related terms: Judea Pearl, Bayesian Inference, Bayes' Theorem
Discussed in:
- Chapter 4: Probability