Ian J. Goodfellow, Jonathon Shlens, & Christian Szegedy (2015)
International Conference on Learning Representations.
URL: https://arxiv.org/abs/1412.6572
Abstract. The paper that turned adversarial examples from a curiosity into a research programme. Argues that adversarial examples arise from the locally linear behaviour of models in high-dimensional input spaces, where many tiny per-dimension perturbations add up to a large change in output, rather than from overfitting or any peculiarity of overparameterised neural networks. Introduces the Fast Gradient Sign Method (FGSM), a one-step attack that perturbs the input by epsilon times the sign of the gradient of the loss with respect to the input, and demonstrates it on MNIST, CIFAR-10 and ImageNet. The canonical panda-to-gibbon example originated here. FGSM remains the standard one-step adversarial baseline (sketched below).
Tags: adversarial safety robustness
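The attack is a single gradient step, x_adv = x + epsilon * sign(grad_x J(theta, x, y)). Below is a minimal PyTorch sketch, not code from the paper: the `model`, `image`, and `label` names are placeholders, and the epsilon value is illustrative (the paper's panda example uses 0.007 on GoogLeNet).

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.007):
    """One-step FGSM: x_adv = x + epsilon * sign(grad_x loss(model(x), y))."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step in the direction that increases the loss, epsilon per input dimension.
    x_adv = image + epsilon * image.grad.sign()
    # Keep the perturbed image in the valid input range (assumed [0, 1] here).
    return x_adv.clamp(0.0, 1.0).detach()
```

The sign operation, rather than the raw gradient, is what ties the attack to the paper's linearity argument: it maximises the change in a linear model's output subject to an L-infinity bound of epsilon on the perturbation.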