Glossary

Privacy in ML

Privacy in machine learning concerns the protection of personal data at every stage of an ML pipeline: collection, training, fine-tuning, inference, and deployment. The field synthesises ideas from cryptography, statistics, and law (GDPR, HIPAA, the EU AI Act, the California Consumer Privacy Act) into a stack of complementary techniques.

Threat models

The principal privacy attacks against ML systems:

  • Membership inference: given access to a trained model, determine whether a specific record was in its training set (a loss-threshold baseline is sketched after this list).

  • Attribute inference: recover sensitive attributes (sex, ethnicity, health status) from a model's behaviour.

  • Model inversion: reconstruct representative training examples from model parameters or queries.

  • Training-data extraction: recover near-verbatim training examples from a generative model (Carlini et al. 2021 demonstrated this on GPT-2; subsequent work scaled to GPT-3.5 and beyond).

  • Model stealing: extract enough behavioural information from a query API to reconstruct an approximate copy of the model.
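
The membership-inference bullet above corresponds to a very simple baseline: an attacker guesses that examples the model fits unusually well (low loss) were part of the training set. The sketch below assumes a trained PyTorch classifier; the threshold is a free parameter that a real attacker would calibrate with shadow models or known non-members, and the code is illustrative rather than any specific published attack.

```python
# Minimal loss-threshold membership-inference baseline (illustrative sketch).
# Assumes `model` is a trained PyTorch classifier; in practice the threshold
# would be calibrated on shadow models or known non-member data.
import torch
import torch.nn.functional as F

def membership_score(model, x, y):
    """Higher score (lower loss) suggests the example was a training member."""
    model.eval()
    with torch.no_grad():
        logits = model(x.unsqueeze(0))               # add a batch dimension
        loss = F.cross_entropy(logits, y.unsqueeze(0))
    return -loss.item()

def infer_membership(model, x, y, threshold=-0.05):
    # Guess "member" when the per-example loss falls below the calibrated threshold.
    return membership_score(model, x, y) > threshold
```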

Defensive techniques

  • Differential privacy (DP): Dwork's formal guarantee that any single training example has only a bounded influence on the output. Implemented during training via DP-SGD (Abadi et al. 2016); a one-step sketch follows this list. The gold standard for provable training-data privacy.

  • Federated learning: train across many devices without centralising raw data; only model updates (gradients or weights) are aggregated centrally. This protects the raw data, but the updates themselves can leak information unless combined with additional DP or secure aggregation.

  • Secure multi-party computation (MPC): cryptographic protocols that let several parties jointly compute a function without revealing their private inputs to one another. Practical for some sub-routines, not yet for full LLM training.

  • Homomorphic encryption: perform computation directly on encrypted data. Currently around three orders of magnitude (roughly 1000×) too slow for production LLM inference, but improving.

  • Machine unlearning: efficiently remove a specific record's influence from a trained model, as required by GDPR's "right to erasure" (Article 17).
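
As a concrete illustration of the DP-SGD bullet above, the sketch below shows a single training step in the style of Abadi et al. (2016): clip each example's gradient to an L2 bound, add Gaussian noise to the sum, then apply an averaged update. Formally, (ε, δ)-DP requires Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ for any datasets D and D′ differing in one record; tracking the (ε, δ) spent over many such steps needs a privacy accountant, which is omitted here. A production system would use a maintained library such as Opacus rather than this hand-rolled loop; the clip norm, noise multiplier, and learning rate are illustrative placeholders.

```python
# Sketch of one DP-SGD step (per-example clipping + Gaussian noise), after
# Abadi et al. (2016). No privacy accountant is included; all hyperparameters
# are illustrative. Production code would use a library such as Opacus.
import torch
import torch.nn.functional as F

def dp_sgd_step(model, batch_x, batch_y, lr=0.1, clip_norm=1.0, noise_multiplier=1.1):
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    for x, y in zip(batch_x, batch_y):                       # per-example gradients
        loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        total_norm = torch.norm(torch.stack([g.norm() for g in grads]))
        scale = torch.clamp(clip_norm / (total_norm + 1e-6), max=1.0)
        for s, g in zip(summed, grads):                       # clip, then accumulate
            s.add_(g * scale)

    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = torch.randn_like(s) * noise_multiplier * clip_norm
            p.add_(-(lr / len(batch_x)) * (s + noise))        # noisy averaged update
```

Calling a step like this once per minibatch and composing the noise over the whole run with a moments or RDP accountant is what yields the (ε, δ) figure reported for a DP training run.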

Special case: LLMs

LLMs trained on web-scale corpora pose unusual privacy challenges:

  • Memorisation: Carlini et al. (2023) extracted thousands of verbatim training examples from production LLMs (a generate-and-rank extraction sketch follows this list).

  • Linkage: even non-verbatim leaks can be recombined to identify individuals.

  • Right to be forgotten: exact unlearning from a frontier LLM is currently infeasible; approximate methods (Eldan & Russinovich's "Who's Harry Potter?") only partially work.
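
To make the memorisation bullet above concrete, the sketch below shows the generate-then-rank idea behind extraction attacks in the spirit of Carlini et al. (2021): sample unconditionally from a language model, then flag samples whose model perplexity is unusually low relative to their zlib compressibility. The model name ("gpt2" via Hugging Face transformers), sample count, and scoring ratio are illustrative choices rather than the exact recipe from the paper, and real attacks generate and filter far more candidates.

```python
# Sketch of generate-then-rank training-data extraction (in the spirit of
# Carlini et al. 2021). Samples whose model perplexity is low relative to their
# zlib-compressed size are candidates for memorised training text.
# Model name and sample counts are illustrative only.
import zlib
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss           # mean next-token cross-entropy
    return torch.exp(loss).item()

start = torch.tensor([[tok.eos_token_id]])           # sample from <|endoftext|>
candidates = []
for _ in range(20):                                  # real attacks sample far more
    out = model.generate(start, do_sample=True, max_length=64, top_k=40,
                         pad_token_id=tok.eos_token_id)
    text = tok.decode(out[0], skip_special_tokens=True).strip()
    if len(text) < 20:                               # skip degenerate samples
        continue
    score = perplexity(text) / len(zlib.compress(text.encode("utf-8")))
    candidates.append((score, text))                 # low score = suspicious

for score, text in sorted(candidates)[:5]:           # inspect the most suspicious
    print(f"{score:.4f}  {text[:80]!r}")
```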

Status

As of 2026, a layered privacy stack is the norm for sensitive ML deployments: differentially private training where feasible, federated learning for on-device training, MPC for narrowly scoped joint computation, and unlearning research as ongoing legal-compliance scaffolding. No widely deployed frontier LLM is trained with full DP, although fine-tuning with DP is increasingly common (Apple Intelligence, for example, applies DP fine-tuning).

References

  • Dwork & Roth (2014). The Algorithmic Foundations of Differential Privacy.

  • Abadi et al. (2016). Deep Learning with Differential Privacy.

  • Carlini et al. (2021, 2023). Extracting Training Data from Large Language Models.

  • GDPR Article 17 (right to erasure).

Related terms: Differential Privacy, Membership Inference Attacks, Model Stealing / Distillation Attacks, Federated Learning
