Privacy in machine learning concerns the protection of personal data at every stage of the ML pipeline: collection, training, fine-tuning, inference, and deployment. The field synthesises ideas from cryptography, statistics, and law (GDPR, HIPAA, the EU AI Act, the California Consumer Privacy Act) into a stack of complementary techniques.
Threat models
The principal privacy attacks against ML systems:
Membership inference: given access to a trained model, determine whether a specific record was in its training set (a minimal sketch follows this list).
Attribute inference: recover sensitive attributes (sex, ethnicity, health status) from a model's behaviour.
Model inversion: reconstruct representative training examples from model parameters or queries.
Training-data extraction: recover near-verbatim training examples from a generative model (Carlini et al. 2021 demonstrated this on GPT-2; subsequent work scaled to GPT-3.5 and beyond).
Model stealing: extract enough behavioural information from a query API to reconstruct an approximate copy of the model.
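To make the first attack concrete, here is a minimal loss-threshold membership-inference sketch in the spirit of loss-based attacks (Yeom et al., 2018): records whose loss under the model falls below a calibrated threshold are guessed to be training members. The losses below are synthetic and the threshold rule is illustrative, not a prescription.

```python
import numpy as np

def infer_membership(losses: np.ndarray, threshold: float) -> np.ndarray:
    """Guess 'member' (1) when a record's loss is below the threshold:
    models typically fit training records more tightly than unseen ones."""
    return (losses < threshold).astype(int)

# Synthetic per-example losses: members fit better (lower loss on average).
rng = np.random.default_rng(0)
member_losses = rng.exponential(scale=0.2, size=1000)      # in training set
non_member_losses = rng.exponential(scale=1.0, size=1000)  # held out

# Calibrate the threshold on losses from known non-members, e.g. the median.
threshold = float(np.median(non_member_losses))

tpr = infer_membership(member_losses, threshold).mean()
fpr = infer_membership(non_member_losses, threshold).mean()
print(f"TPR={tpr:.2f} vs FPR={fpr:.2f}")  # a large gap signals leakage
```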
Defensive techniques
Differential privacy (DP): Dwork's formal guarantee that any single training example has bounded influence on the output. Implemented in training via DP-SGD (Abadi et al. 2016); a single-step sketch follows this list. The gold standard for provable training-data privacy.
Federated learning: train across many devices without centralising raw data; only gradients are aggregated centrally. Protects raw data, but the gradients themselves can leak information without additional DP or secure aggregation (see the second sketch after this list).
Secure multi-party computation (MPC): cryptographic protocols that let parties jointly compute over private inputs. Practical for some sub-routines, not yet for full LLM training.
Homomorphic encryption: perform computation directly on encrypted data. Currently around 1000× too slow for production LLM inference, but improving.
Machine unlearning: efficiently remove a specific record's influence from a trained model, as required by GDPR's "right to erasure".
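To illustrate the DP-SGD item above: formally, a randomised mechanism M is (ε, δ)-differentially private if for all datasets D, D′ differing in one record and all output sets S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ. The sketch below shows the mechanics of a single DP-SGD step (per-example clipping plus calibrated Gaussian noise); the clip norm, noise multiplier, and learning rate are illustrative placeholders, and a real deployment pairs this with a privacy accountant to track (ε, δ).

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_sgd_step(per_example_grads: np.ndarray,
                clip_norm: float = 1.0,
                noise_multiplier: float = 1.1,
                lr: float = 0.1) -> np.ndarray:
    """One DP-SGD update on a batch of per-example gradients.

    Clip each example's gradient so no single record can move the model
    by more than clip_norm, then add Gaussian noise scaled to that bound.
    Converting (noise_multiplier, steps, sampling rate) into an
    (epsilon, delta) budget is handled separately by a privacy accountant.
    """
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale                      # bound influence
    noise = rng.normal(0.0, noise_multiplier * clip_norm,
                       size=per_example_grads.shape[1])      # calibrated noise
    noisy_mean = (clipped.sum(axis=0) + noise) / len(clipped)
    return -lr * noisy_mean                                  # parameter delta

# Toy usage: a batch of 32 ten-dimensional per-example gradients.
delta = dp_sgd_step(rng.normal(size=(32, 10)))
```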
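And a toy sketch tying the federated-learning and MPC items together: clients add pairwise random masks to their updates before upload, the masks cancel in the server's sum, so only the aggregate is revealed. This is the core idea behind secure aggregation (Bonawitz et al. 2017); all variable names here are hypothetical, and a production protocol would also handle client dropouts and work over a finite field.

```python
import numpy as np

rng = np.random.default_rng(42)
dim, n_clients = 4, 3
client_updates = [rng.normal(size=dim) for _ in range(n_clients)]

# One shared random mask per client pair; client i adds it, client j (> i)
# subtracts it, so every mask cancels exactly once in the server's sum.
masks = {(i, j): rng.normal(size=dim)
         for i in range(n_clients) for j in range(i + 1, n_clients)}

def masked_update(i: int) -> np.ndarray:
    out = client_updates[i].copy()
    for (a, b), mask in masks.items():
        if a == i:
            out += mask
        elif b == i:
            out -= mask
    return out

# Each uploaded vector looks like noise on its own, but the mean matches
# plain federated averaging because the pairwise masks cancel.
aggregate = sum(masked_update(i) for i in range(n_clients)) / n_clients
assert np.allclose(aggregate, np.mean(client_updates, axis=0))
```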
Special case: LLMs
LLMs trained on web-scale corpora pose unusual privacy challenges:
Memorisation: Carlini et al. (2023) extracted thousands of verbatim training examples from production LLMs (a toy extraction probe follows this list).
Linkage: even non-verbatim leaks can recombine to identify individuals.
Right to be forgotten: exact unlearning from a frontier LLM is currently infeasible; approximate methods (Eldan & Russinovich's "Who's Harry Potter?") work only partially.
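A sketch of the kind of verbatim-extraction probe used in this line of work: prompt the model with the prefix of a candidate string and test whether greedy decoding reproduces the suffix exactly. This assumes the Hugging Face transformers API with GPT-2 as a small stand-in; the is_memorised helper is an invented name, not an API from the cited papers.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def is_memorised(candidate: str, prefix_tokens: int = 32) -> bool:
    """Prompt with the candidate's prefix; report True if greedy decoding
    reproduces the remaining tokens verbatim. Assumes the candidate is
    longer than prefix_tokens tokens."""
    ids = tokenizer(candidate, return_tensors="pt").input_ids
    prefix, suffix = ids[:, :prefix_tokens], ids[0, prefix_tokens:]
    out = model.generate(prefix, max_new_tokens=suffix.shape[0],
                         do_sample=False)
    generated = out[0, prefix_tokens:]
    # Early stopping at EOS can shorten the output; that counts as a miss.
    return generated.shape[0] == suffix.shape[0] and bool(
        (generated == suffix).all())
```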
Status
As of 2026, a layered privacy stack is the norm for sensitive ML deployments: differentially private training where feasible, federated learning for on-device training, MPC for narrowly scoped joint computation, and unlearning research as ongoing legal-compliance scaffolding. No widely deployed frontier LLM is trained with full DP, although DP fine-tuning is increasingly common (Apple Intelligence, for example, applies DP fine-tuning).
References
Dwork & Roth (2014). The Algorithmic Foundations of Differential Privacy.
Abadi et al. (2016). Deep Learning with Differential Privacy.
Carlini et al. (2021, 2023). Extracting Training Data from Large Language Models.
GDPR Article 17 (right to erasure).
Related terms: Differential Privacy, Membership Inference Attacks, Model Stealing / Distillation Attacks, Federated Learning
Discussed in:
- Chapter 14: Generative Models, Privacy in ML