Poisoning and Backdooring Contrastive Learning. Nicholas Carlini & Andreas Terzis (2022)
International Conference on Learning Representations.
URL: https://arxiv.org/abs/2106.09667
Abstract. Demonstrates that contrastive vision-language models (CLIP-style) are highly vulnerable to data poisoning. Modifying as few as 0.01% of the training image-caption pairs is enough to install a backdoor: at inference time, any image carrying a small trigger patch is assigned an attacker-chosen label, and targeted poisoning of a specific test instance requires even fewer poisoned pairs. The paper highlights a systemic risk: web-scraped multimodal training corpora cannot be vetted pair by pair, so the standard scale-with-noisy-data recipe offers no strong assurance against targeted poisoning.
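To make the threat model concrete, here is a minimal Python sketch of the kind of data manipulation the paper describes: a tiny fraction of image-caption pairs is overlaid with a visible trigger patch and relabeled with an attacker-chosen caption. The names (`clean_pairs`, `add_trigger`, the yellow patch, the target caption) and constants are illustrative assumptions, not the authors' implementation.

```python
# Sketch of backdoor poisoning for image-caption data (illustrative only).
# Assumes a hypothetical list `clean_pairs` of (PIL.Image, caption) tuples.
import random
from PIL import Image

TRIGGER_SIZE = 16                        # side length of the trigger patch, in pixels
POISON_RATE = 0.0001                     # 0.01% of the training pairs
TARGET_CAPTION = "a photo of a banana"   # attacker-chosen target class (assumed)

def add_trigger(img: Image.Image) -> Image.Image:
    """Overlay a small solid patch in the bottom-right corner."""
    img = img.copy()
    patch = Image.new("RGB", (TRIGGER_SIZE, TRIGGER_SIZE), (255, 255, 0))
    img.paste(patch, (img.width - TRIGGER_SIZE, img.height - TRIGGER_SIZE))
    return img

def poison_dataset(clean_pairs):
    """Return a copy of the dataset in which a tiny fraction of pairs
    carry the trigger and are relabeled with the attacker's caption."""
    n_poison = max(1, int(POISON_RATE * len(clean_pairs)))
    poisoned = list(clean_pairs)
    for i in random.sample(range(len(poisoned)), n_poison):
        img, _ = poisoned[i]
        poisoned[i] = (add_trigger(img), TARGET_CAPTION)
    return poisoned
```

At a 0.01% rate this touches only a few hundred pairs in a multi-million-example web-scraped corpus, which is what makes pair-by-pair vetting impractical as a defense.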
Tags: adversarial safety poisoning clip