Poisoning and Backdooring Contrastive Learning. Nicholas Carlini & Andreas Terzis (2022)
International Conference on Learning Representations.
URL: https://arxiv.org/abs/2106.09667
Abstract. Demonstrates that contrastive vision-language models (CLIP-style) are highly vulnerable to data poisoning. Modifying as few as 0.01% of the training image-caption pairs is enough to install a backdoor: at inference time, any image carrying a small trigger patch is assigned an attacker-chosen label, and targeted poisoning of a specific test instance requires even fewer poisoned pairs. The paper highlights a systemic risk: web-scraped multimodal training corpora cannot be vetted pair by pair, so the standard scale-with-noisy-data recipe offers no strong assurance against targeted poisoning.
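To make the threat model concrete, here is a minimal Python sketch of the kind of data manipulation the paper describes: a tiny fraction of image-caption pairs is overlaid with a visible trigger patch and relabeled with an attacker-chosen caption. The names (`clean_pairs`, `add_trigger`, the yellow patch, the target caption) and constants are illustrative assumptions, not the authors' implementation.

```python
# Sketch of backdoor poisoning for image-caption data (illustrative only).
# Assumes a hypothetical list `clean_pairs` of (PIL.Image, caption) tuples.
import random
from PIL import Image

TRIGGER_SIZE = 16                        # side length of the trigger patch, in pixels
POISON_RATE = 0.0001                     # 0.01% of the training pairs
TARGET_CAPTION = "a photo of a banana"   # attacker-chosen target class (assumed)

def add_trigger(img: Image.Image) -> Image.Image:
    """Overlay a small solid patch in the bottom-right corner."""
    img = img.copy()
    patch = Image.new("RGB", (TRIGGER_SIZE, TRIGGER_SIZE), (255, 255, 0))
    img.paste(patch, (img.width - TRIGGER_SIZE, img.height - TRIGGER_SIZE))
    return img

def poison_dataset(clean_pairs):
    """Return a copy of the dataset in which a tiny fraction of pairs
    carry the trigger and are relabeled with the attacker's caption."""
    n_poison = max(1, int(POISON_RATE * len(clean_pairs)))
    poisoned = list(clean_pairs)
    for i in random.sample(range(len(poisoned)), n_poison):
        img, _ = poisoned[i]
        poisoned[i] = (add_trigger(img), TARGET_CAPTION)
    return poisoned
```

At a 0.01% rate this touches only a few hundred pairs in a multi-million-example web-scraped corpus, which is what makes pair-by-pair vetting impractical as a defense.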
Tags: adversarial safety poisoning clip