Hanlin Zhang, Benjamin L. Edelman, Danilo Francati, Daniele Venturi, Giuseppe Ateniese, & Boaz Barak (2023)
Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models. arXiv:2311.04378.
URL: https://arxiv.org/abs/2311.04378
Abstract. A theoretical and empirical analysis of the limits of LLM watermarking. The authors prove that a "strong" watermark, one that no efficient adversary can remove without degrading output quality, is impossible whenever the attacker has a quality oracle (to check whether a candidate output is still a good response) and a perturbation oracle (to make small quality-preserving edits): a random walk over such perturbations provably washes the watermark out. They demonstrate the result empirically by removing published watermarks, including Kirchenbauer-style schemes, through automated quality-preserving paraphrasing at modest compute cost. The paper sharpened the policy debate around mandatory AI-content watermarking and the limits of detection-based governance approaches.
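The empirical attack can be summarized as a random walk over quality-preserving paraphrases. The sketch below is a minimal illustration of that idea under stated assumptions, not the authors' code: `perturb`, `quality`, and `detects_watermark` are hypothetical stand-ins for the paper's perturbation oracle, quality oracle, and the target watermark detector.

```python
import random


def erase_watermark(text, perturb, quality, detects_watermark,
                    min_quality=0.8, max_steps=200, candidates_per_step=4):
    """Random-walk removal attack: repeatedly paraphrase while quality stays high.

    `perturb(text) -> str`, `quality(text) -> float`, and
    `detects_watermark(text) -> bool` are illustrative placeholders, not APIs
    from the paper or any specific library.
    """
    current = text
    for _ in range(max_steps):
        if not detects_watermark(current):
            return current  # watermark signal gone, quality preserved throughout
        # Propose a few local paraphrases and step to one that keeps quality.
        proposals = [perturb(current) for _ in range(candidates_per_step)]
        acceptable = [p for p in proposals if quality(p) >= min_quality]
        if acceptable:
            current = random.choice(acceptable)
    return current  # best-effort output if the step budget is exhausted
```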
Tags: safety watermarking language-models