Jeremy Cohen, Elan Rosenfeld, and J. Zico Kolter (2019). Certified Adversarial Robustness via Randomized Smoothing.
International Conference on Machine Learning.
URL: https://arxiv.org/abs/1902.02918
Abstract. Provides the first scalable certified defence against $\ell_2$ adversarial perturbations. Constructs a smoothed classifier $g(\mathbf{x}) = \arg\max_c \Pr_{\delta\sim\mathcal{N}(0,\sigma^2 I)}[f(\mathbf{x}+\delta)=c]$ and proves that $g$'s prediction is constant within the tight $\ell_2$ radius $R = \tfrac{\sigma}{2}\left(\Phi^{-1}(p_A) - \Phi^{-1}(p_B)\right)$, where $p_A$ and $p_B$ are the top two class probabilities of the base classifier $f$ under Gaussian smoothing and $\Phi^{-1}$ is the standard Gaussian inverse CDF. Unlike empirical defences, which adaptive attacks can break, the certificate is a mathematical guarantee: no perturbation within the certified radius can change the smoothed classifier's prediction. Randomised smoothing remains the dominant scalable certified-robustness method.
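The two ingredients described above, a Monte Carlo vote over Gaussian-perturbed inputs and the closed-form radius $R = \tfrac{\sigma}{2}(\Phi^{-1}(p_A) - \Phi^{-1}(p_B))$, can be sketched as follows. This is a minimal stdlib-only illustration, not the paper's full CERTIFY procedure; the toy base classifier, the sample count, and all function names here are illustrative assumptions.

```python
import random
from statistics import NormalDist

def smoothed_counts(f, x, sigma, n, num_classes, seed=0):
    """Monte Carlo estimate of g(x): vote f over n Gaussian-perturbed copies of x."""
    rng = random.Random(seed)
    counts = [0] * num_classes
    for _ in range(n):
        x_noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        counts[f(x_noisy)] += 1
    return counts

def certified_radius(p_a, p_b, sigma):
    """Cohen et al. l2 radius: R = (sigma/2) * (Phi^{-1}(p_A) - Phi^{-1}(p_B))."""
    phi_inv = NormalDist().inv_cdf  # standard Gaussian inverse CDF
    return 0.5 * sigma * (phi_inv(p_a) - phi_inv(p_b))

# Toy base classifier (an assumption for this sketch): class 1 iff coords sum > 0.
f = lambda x: int(sum(x) > 0)

counts = smoothed_counts(f, x=[0.3, 0.2], sigma=0.5, n=1000, num_classes=2)
p_a = max(counts) / sum(counts)                    # empirical top-class probability
radius = certified_radius(p_a, 1 - p_a, sigma=0.5)  # binary case: p_B = 1 - p_A
```

Note the paper's actual CERTIFY algorithm replaces the raw empirical $p_A$ with a Clopper-Pearson lower confidence bound and abstains when the bound is too weak; this sketch omits that machinery for brevity.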
Tags: adversarial safety robustness certified
Cited in: