References

A Watermark for Large Language Models

John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, & Tom Goldstein (2023)

International Conference on Machine Learning.

URL: https://arxiv.org/abs/2301.10226

Abstract. The canonical text-watermarking scheme for language models. At each generation step, the previous token is hashed to seed a pseudo-random partition of the vocabulary into "green" and "red" lists, and the logits are biased to favor green tokens. Detection counts the green tokens in a candidate text and compares that count to the chance baseline via a one-proportion z-test; the test admits analytic Type I and Type II error bounds. The scheme operates at generation time (no model retraining), is backwards compatible with existing decoders, and survives moderate paraphrasing. It is foundational to most subsequent LLM-watermarking proposals.
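The generate-and-detect loop described above can be sketched in a few lines of Python. The vocabulary size, the green-list fraction (0.5), the logit bias (4.0), and the SHA-256 seeding are illustrative choices for this sketch, not the paper's exact settings:

```python
import hashlib
import math
import random

GREEN_FRACTION = 0.5  # fraction of the vocabulary marked "green" (illustrative)
BIAS = 4.0            # logit boost added to green tokens (illustrative)
VOCAB_SIZE = 50

def green_list(prev_token: int, vocab_size: int = VOCAB_SIZE) -> set:
    """Pseudo-randomly partition the vocabulary, seeded by hashing the previous token."""
    seed = int.from_bytes(hashlib.sha256(str(prev_token).encode()).digest()[:8], "big")
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(GREEN_FRACTION * vocab_size)])

def watermarked_sample(logits, prev_token, rng):
    """Add BIAS to green-token logits, then sample from the resulting softmax."""
    greens = green_list(prev_token, len(logits))
    boosted = [l + (BIAS if i in greens else 0.0) for i, l in enumerate(logits)]
    m = max(boosted)
    weights = [math.exp(l - m) for l in boosted]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def z_score(tokens):
    """One-proportion z-test: green-token count vs. the chance rate GREEN_FRACTION."""
    hits = sum(1 for prev, cur in zip(tokens, tokens[1:]) if cur in green_list(prev))
    n = len(tokens) - 1
    expected = GREEN_FRACTION * n
    return (hits - expected) / math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))

# Generate a watermarked sequence from flat (uninformative) logits, then a
# random unwatermarked one, and compare their detection statistics.
rng = random.Random(0)
tokens = [0]
for _ in range(200):
    tokens.append(watermarked_sample([0.0] * VOCAB_SIZE, tokens[-1], rng))
unmarked = [rng.randrange(VOCAB_SIZE) for _ in range(200)]

print(f"watermarked z = {z_score(tokens):.1f}, random z = {z_score(unmarked):.1f}")
```

With flat logits and these settings, the watermarked sequence scores far above any plausible detection threshold (z well above 4), while random text stays near zero, which is the essence of the paper's hypothesis test.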

Tags: safety language-models watermarking

Cited in:
