The authors propose a solution to the inherent vulnerability of deep learning classifiers to adversarial examples. These are minute, human-imperceptible image perturbations that trick machine learning models into misclassifying the modified input. This paper introduces the problem of asymmetric certified robustness, which requires certified robustness for only one class, reflecting real-world adversarial scenarios.
The paper argues that conventional certified robustness methods suffer from several shortcomings, including slow execution, poor scalability, and non-determinism. To address these drawbacks, the authors propose asymmetric certified robustness, which certifies robustness only for one sensitive class while maintaining high clean accuracy on all other inputs. The implication is that not all classes need such rigorous certification, and that it can be more efficient and precise to focus on the class that faces the greatest adversarial threat.
Building on this concept, they develop feature-convex neural networks to address asymmetric robustness. These networks compose a Lipschitz-continuous feature map with an input-convex neural network (ICNN), which enforces convexity from input to output logit by composing ReLU nonlinearities with nonnegative weight matrices.
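The sketch below illustrates this composition in PyTorch. It is a minimal illustration, not the authors' implementation: the class and parameter names (`ICNN`, `FeatureConvexClassifier`, `hidden`, `depth`) are made up here, and the feature map is taken to be the identity, which is 1-Lipschitz in every norm (the paper allows richer Lipschitz maps).

```python
# Minimal sketch of a feature-convex classifier: a Lipschitz feature map phi
# composed with an input-convex network g. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICNN(nn.Module):
    """Input-convex network: z -> g(z) is convex because the weights acting on
    hidden activations are kept nonnegative and ReLU is convex and nondecreasing."""
    def __init__(self, dim, hidden=128, depth=3):
        super().__init__()
        self.skip = nn.ModuleList([nn.Linear(dim, hidden) for _ in range(depth)])
        self.hid = nn.ModuleList([nn.Linear(hidden, hidden, bias=False)
                                  for _ in range(depth - 1)])
        self.out_skip = nn.Linear(dim, 1)
        self.out_hid = nn.Linear(hidden, 1, bias=False)

    def forward(self, z):
        h = F.relu(self.skip[0](z))
        for skip, hid in zip(self.skip[1:], self.hid):
            # Clamping enforces nonnegative weights on the previous hidden layer,
            # preserving convexity of h as a function of z.
            h = F.relu(skip(z) + F.linear(h, hid.weight.clamp(min=0.0)))
        return (self.out_skip(z) + F.linear(h, self.out_hid.weight.clamp(min=0.0))).squeeze(-1)

class FeatureConvexClassifier(nn.Module):
    """Predicts the sensitive class whenever g(phi(x)) > 0."""
    def __init__(self, dim):
        super().__init__()
        self.g = ICNN(dim)

    def phi(self, x):
        return x  # identity feature map (Lipschitz constant 1); a stand-in choice

    def forward(self, x):
        return self.g(self.phi(x))
```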
The feature-convex classifiers provide fast computation of certified radii in all $\ell_p$-norms for sensitive-class inputs. Because any tangent plane globally underapproximates a convex function, a certified radius can be computed in the feature space and then propagated to the input space via the Lipschitzness of the feature map. Reflecting the asymmetric nature of the methodology, the architecture only produces certificates for the positive-logit class $g(\varphi(x)) > 0$.
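The certificate itself admits a short closed-form computation. Below is a hedged sketch on top of the classifier above, assuming the radius takes the form $r(x) = g(\varphi(x)) / (L \, \lVert \nabla g(\varphi(x)) \rVert_q)$, where $L$ is the $\ell_p$-Lipschitz constant of $\varphi$ and $q$ is the dual exponent of $p$; `certified_radius` is a hypothetical helper, not the authors' code.

```python
# Hedged sketch of the closed-form asymmetric certificate, assuming
# r(x) = g(phi(x)) / (L * ||grad g(phi(x))||_q).
import torch

def certified_radius(model, x, p=2.0, lipschitz_phi=1.0):
    """Certified l_p radius for the sensitive class at x (0.0 if uncertified)."""
    z = model.phi(x).detach().requires_grad_(True)
    logit = model.g(z)
    if logit.item() <= 0.0:
        return 0.0  # certificates exist only for positive-logit (sensitive-class) inputs
    (grad,) = torch.autograd.grad(logit, z)
    # Dual exponent: 1/p + 1/q = 1 (with the usual conventions for p = 1, inf).
    if p == 1.0:
        q = float("inf")
    elif p == float("inf"):
        q = 1.0
    else:
        q = p / (p - 1.0)
    # Tangent-plane bound: the logit stays positive while the feature-space
    # perturbation is below logit / ||grad||_q; Lipschitzness of phi transfers
    # this bound to the input space.
    return logit.item() / (lipschitz_phi * grad.flatten().norm(p=q).item())
```

For example, `certified_radius(model, x, p=2.0)` would return an $\ell_2$ radius within which the sensitive-class prediction is guaranteed not to flip, under the assumed form above.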
The resulting certified radii hold for any $\ell_p$-norm and are computed quickly, deterministically, and scalably: the certificates can be calculated in milliseconds and apply to networks of any size. In contrast, existing benchmark techniques can take several seconds to certify even small networks and are inherently non-deterministic, with certificates holding only with high probability.
The authors encourage further exploration in the field, presenting an open problem of achieving perfect training accuracy on the CIFAR-10 cats-vs-dogs dataset with an input-convex classifier. Although their approach reaches only 73.4% training accuracy, they point to the theoretical potential of ICNNs, motivating future research. They anticipate new architectures that are certifiable under the asymmetric robustness framework and hope their feature-convex classifier serves as inspiration. The methodology was presented at the 37th Conference on Neural Information Processing Systems and is based on the paper "Asymmetric Certified Robustness via Feature-Convex Neural Networks".