Glossary

Differential Privacy

Also known as: DP

Differential Privacy (DP) provides a rigorous mathematical framework for reasoning about privacy in data analysis and machine learning. A randomised algorithm $M$ satisfies $(\varepsilon, \delta)$-differential privacy if, for any two datasets $D$ and $D'$ differing in a single record, and for any set of outputs $S$:

$$P[M(D) \in S] \leq e^\varepsilon \cdot P[M(D') \in S] + \delta$$

Intuitively, the presence or absence of any single individual's data has only a bounded effect on the algorithm's output, quantified by the privacy budget $\varepsilon$. Smaller $\varepsilon$ gives stronger privacy but typically reduces utility. The parameter $\delta$ allows a small probability that the $\varepsilon$ guarantee fails; it is typically chosen much smaller than the inverse of the dataset size ($\delta \ll 1/n$).
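The definition can be made concrete with the Laplace mechanism, a classic way to achieve pure $\varepsilon$-DP (i.e. $\delta = 0$) for a count query. A minimal sketch, assuming NumPy; the function name and example data are illustrative:

```python
import numpy as np

def laplace_count(data, predicate, epsilon, rng=None):
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    A count query has sensitivity 1: adding or removing one record changes
    the true count by at most 1, so Laplace noise with scale 1/epsilon
    suffices for epsilon-DP.
    """
    rng = rng or np.random.default_rng()
    true_count = sum(1 for x in data if predicate(x))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Example: count records with age over 50, spending a budget of epsilon = 1.0
ages = [23, 45, 67, 34, 89, 52]
noisy = laplace_count(ages, lambda a: a > 50, epsilon=1.0)
```

The noise is unbiased, so averaged over many releases the answer is close to the true count of 3, but any single release hides whether a given individual contributed. A smaller $\varepsilon$ means a larger noise scale and hence a less accurate answer, which is the privacy-utility trade-off described above.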

Differentially Private SGD (Abadi et al., 2016) enables deep learning with formal privacy guarantees. It clips per-example gradients to a fixed norm (bounding any individual's influence) and adds calibrated Gaussian noise at each training step. The privacy budget accumulates across iterations via the moments accountant or Rényi DP accounting, providing tight composition bounds. DP-SGD has been used to train models on sensitive medical records, genomic data, and private user data at scale.
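The two core operations of DP-SGD, per-example clipping and calibrated Gaussian noise, can be sketched in a few lines. A minimal single-step illustration for linear regression in NumPy; the function name and hyperparameter defaults are illustrative, and a real implementation would also handle subsampling and privacy accounting:

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=0.1, rng=None):
    """One DP-SGD step for linear regression with squared loss (sketch).

    Each example's gradient is clipped to L2 norm clip_norm, bounding any
    individual's influence; Gaussian noise with std noise_multiplier *
    clip_norm is then added to the summed gradient before averaging.
    """
    rng = rng or np.random.default_rng()
    n = len(X)
    # Per-example gradients of 0.5 * (x.w - y)^2 with respect to w: (x.w - y) * x
    grads = (X @ w - y)[:, None] * X
    # Clip each example's gradient to L2 norm at most clip_norm
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Sum, add calibrated Gaussian noise, then average and take a step
    noisy_sum = grads.sum(axis=0) + rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    return w - lr * noisy_sum / n
```

The privacy guarantee of each step follows from the Gaussian mechanism applied to the clipped gradient sum; the accountant mentioned above tracks how these per-step guarantees compose over a full training run.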

DP is complementary to other privacy-preserving techniques. Federated learning keeps data on devices and communicates only model updates, but those updates can leak information without DP. Secure multi-party computation and homomorphic encryption allow computation on encrypted data without exposing it. Combining these techniques provides layered privacy protections. Differential privacy has become the gold standard in privacy-preserving ML, adopted by companies (Apple, Google) and governments (the US Census Bureau).

Related terms: Federated Learning

Also defined in: Textbook of AI, Textbook of Medical AI