Tianyu Gu, Brendan Dolan-Gavitt, & Siddharth Garg (2017)
arXiv:1708.06733.
URL: https://arxiv.org/abs/1708.06733
Abstract. The canonical demonstration of backdoor attacks in deep learning. Shows that an attacker who can poison a small fraction of the training data (e.g., MNIST digits or street-sign images) by stamping a small trigger pattern (such as a yellow square) onto the images and relabelling them with a target class can produce a network that behaves normally on clean inputs and reliably misclassifies any input containing the trigger. The trigger is arbitrary but inconspicuous, and the backdoor survives subsequent fine-tuning. The paper crystallised the supply-chain risk: any model trained on data of unverified provenance is a candidate attack vector.
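To make the poisoning step concrete, here is a minimal sketch of BadNets-style trigger injection, assuming images are stored as HxWx3 uint8 NumPy arrays; the trigger size, its corner placement, and the `poison_fraction` default are illustrative choices, not the paper's exact parameters.

```python
import numpy as np

def add_trigger(image: np.ndarray, size: int = 4) -> np.ndarray:
    """Stamp a small yellow square in the bottom-right corner (illustrative trigger)."""
    patched = image.copy()
    patched[-size:, -size:] = (255, 255, 0)  # yellow in RGB
    return patched

def poison_dataset(images: np.ndarray,
                   labels: np.ndarray,
                   target_label: int,
                   poison_fraction: float = 0.01,
                   seed: int = 0):
    """Return a copy of (images, labels) with a small random subset
    trigger-stamped and relabelled to the attacker's target class."""
    rng = np.random.default_rng(seed)
    n_poison = int(len(images) * poison_fraction)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    poisoned_images = images.copy()
    poisoned_labels = labels.copy()
    for i in idx:
        poisoned_images[i] = add_trigger(images[i])
        poisoned_labels[i] = target_label
    return poisoned_images, poisoned_labels
```

A model trained normally on the returned dataset learns the clean task plus the spurious trigger-to-target association, which is the behaviour the paper demonstrates.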
Tags: adversarial safety poisoning