Jeremy Cohen, Elan Rosenfeld, and J. Zico Kolter (2019). Certified Adversarial Robustness via Randomized Smoothing.
International Conference on Machine Learning.
URL: https://arxiv.org/abs/1902.02918
Abstract. Provides the first scalable certified defence against $\ell_2$ adversarial perturbations. Constructs a smoothed classifier $g(\mathbf{x}) = \arg\max_c \Pr_{\delta\sim\mathcal{N}(0,\sigma^2 I)}[f(\mathbf{x}+\delta)=c]$ and proves that $g$'s prediction is constant within the tight $\ell_2$ radius $R = \tfrac{\sigma}{2}\left(\Phi^{-1}(p_A) - \Phi^{-1}(p_B)\right)$, where $p_A$ and $p_B$ are the top two class probabilities of the base classifier $f$ under Gaussian smoothing and $\Phi^{-1}$ is the standard Gaussian inverse CDF. Unlike empirical defences, which adaptive attacks can break, the certificate is a mathematical guarantee: no perturbation within the certified radius can change the smoothed classifier's prediction. Randomised smoothing remains the dominant scalable certified-robustness method.
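The two ingredients described above, a Monte Carlo vote over Gaussian-perturbed inputs and the closed-form radius $R = \tfrac{\sigma}{2}(\Phi^{-1}(p_A) - \Phi^{-1}(p_B))$, can be sketched as follows. This is a minimal stdlib-only illustration, not the paper's full CERTIFY procedure; the toy base classifier, the sample count, and all function names here are illustrative assumptions.

```python
import random
from statistics import NormalDist

def smoothed_counts(f, x, sigma, n, num_classes, seed=0):
    """Monte Carlo estimate of g(x): vote f over n Gaussian-perturbed copies of x."""
    rng = random.Random(seed)
    counts = [0] * num_classes
    for _ in range(n):
        x_noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        counts[f(x_noisy)] += 1
    return counts

def certified_radius(p_a, p_b, sigma):
    """Cohen et al. l2 radius: R = (sigma/2) * (Phi^{-1}(p_A) - Phi^{-1}(p_B))."""
    phi_inv = NormalDist().inv_cdf  # standard Gaussian inverse CDF
    return 0.5 * sigma * (phi_inv(p_a) - phi_inv(p_b))

# Toy base classifier (an assumption for this sketch): class 1 iff coords sum > 0.
f = lambda x: int(sum(x) > 0)

counts = smoothed_counts(f, x=[0.3, 0.2], sigma=0.5, n=1000, num_classes=2)
p_a = max(counts) / sum(counts)                    # empirical top-class probability
radius = certified_radius(p_a, 1 - p_a, sigma=0.5)  # binary case: p_B = 1 - p_A
```

Note the paper's actual CERTIFY algorithm replaces the raw empirical $p_A$ with a Clopper-Pearson lower confidence bound and abstains when the bound is too weak; this sketch omits that machinery for brevity.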
Tags: adversarial safety robustness certified
Cited in: