Adversarial attacks, deliberately crafted inputs that force machine learning (ML) models into making incorrect predictions, pose a significant challenge to the safety and dependability of critical machine learning applications. Neural networks are especially susceptible to these attacks. This is particularly concerning in applications such as facial recognition, where a fooled system could grant unauthorised access.
In response to this threat, a team of researchers from the Weizmann Institute of Science in Israel and New York University’s Center for Data Science has developed MALT (Mesoscopic Almost Linearity Targeting), a novel technique for generating the adversarial attacks that exploit these vulnerabilities in machine learning models. The leading adversarial attack, AutoAttack, uses a strategy that targets classes based on their confidence level within the model, but the process is computationally expensive. Because of this cost, AutoAttack can only target a limited number of classes, which means vulnerable classes can be missed.
MALT introduces a different approach to adversarial targeting, drawing on the theory that neural networks behave almost linearly at a mesoscopic scale. Rather than relying solely on model confidence to choose target classes, MALT ranks candidate classes by normalised gradients to identify those that would require the smallest modifications to the input in order to be misclassified, as sketched below.
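To make the idea concrete, here is a minimal PyTorch sketch of gradient-based target ranking. It assumes a classifier `model` and a single input tensor `x`; the function name, the scoring formula (logit gap divided by gradient norm), and the defaults are illustrative assumptions, not the authors' reference implementation.

```python
import torch

def rank_target_classes(model, x, num_targets=5):
    """Rank candidate target classes by an estimate of how small an input
    change would be needed to flip the prediction, assuming the logits
    behave almost linearly around x. Illustrative sketch only."""
    x = x.clone().requires_grad_(True)
    logits = model(x.unsqueeze(0)).squeeze(0)   # shape: (num_classes,)
    pred = logits.argmax().item()               # currently predicted class

    scores = {}
    for j in range(logits.numel()):
        if j == pred:
            continue
        # Logit gap that must be closed for class j to overtake the prediction.
        gap = logits[pred] - logits[j]
        # Gradient of that gap w.r.t. the input; its norm indicates how fast a
        # small input change can close the gap under a linear approximation.
        grad = torch.autograd.grad(gap, x, retain_graph=True)[0]
        scores[j] = (gap / (grad.norm() + 1e-12)).item()

    # Smaller score = decision boundary estimated to be closer = better target.
    return sorted(scores, key=scores.get)[:num_targets]
```

A confidence-based strategy, by contrast, would simply pick the classes with the highest logits, regardless of how far away their decision boundaries actually are.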
MALT builds its adversarial examples on this notion of ‘mesoscopic almost linearity’: for sufficiently small alterations to the input data, the model’s behaviour can be approximated as a linear function of the change.
To illustrate, if we think of the decision-making process as a landscape of hills and valleys, MALT concentrates on altering data within a small region where the landscape can be treated as flat.
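The assumption can be checked numerically by comparing the true change in a logit against its first-order (linear) prediction for a small perturbation. The sketch below does exactly that; `model`, `x`, `delta`, and `class_idx` are assumed inputs and are not names taken from the MALT paper.

```python
import torch

def linearity_gap(model, x, delta, class_idx):
    """Compare the actual change in one logit against the first-order Taylor
    prediction for a small perturbation delta. Illustrative sketch only."""
    x = x.clone().requires_grad_(True)
    logit = model(x.unsqueeze(0)).squeeze(0)[class_idx]
    grad = torch.autograd.grad(logit, x)[0]

    with torch.no_grad():
        true_change = model((x + delta).unsqueeze(0)).squeeze(0)[class_idx] - logit
        linear_change = (grad * delta).sum()   # first-order Taylor term

    # A small gap means the model is behaving almost linearly in this region.
    return (true_change - linear_change).abs().item()
```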
MALT applies gradient estimation to work out how tiny modifications to the input data influence the model’s output, and uses this information to pinpoint which pixels or features of an image to modify to achieve the desired misclassification. It also runs an iterative optimisation: starting from an initial change to the input, it refines that change using gradient information, repeating until the model confidently assigns the data to the target class.
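A generic targeted, iterative attack loop of this kind might look like the PGD-style sketch below. The parameter names and the simple step-size rule are assumptions for illustration; the attack actually used alongside MALT targeting (such as APGD within AutoAttack) uses a more sophisticated schedule and stopping rule.

```python
import torch
import torch.nn.functional as F

def targeted_attack(model, x, target_class, epsilon=8 / 255, step=2 / 255, steps=50):
    """Iteratively perturb x so the model predicts target_class, keeping the
    perturbation within an L-infinity budget. Illustrative sketch only."""
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv.unsqueeze(0))
        # Minimise the loss towards the target class, pushing the model to
        # classify the perturbed input as target_class.
        loss = F.cross_entropy(logits, torch.tensor([target_class]))
        grad = torch.autograd.grad(loss, x_adv)[0]

        with torch.no_grad():
            x_adv = x_adv - step * grad.sign()                        # step towards the target
            x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)          # project back into the budget
            x_adv = x_adv.clamp(0.0, 1.0)                             # keep a valid image
            if model(x_adv.unsqueeze(0)).argmax(dim=1).item() == target_class:
                break   # stop once the model predicts the target class

    return x_adv.detach()
```

In practice, a loop like this would be run against each of the few target classes selected by the gradient-based ranking, rather than against every class.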
In essence, MALT represents a considerable advance in adversarial attack techniques by providing a more efficient and effective targeting strategy. By leveraging mesoscopic almost linearity, it focuses on small, localised modifications to the data, which keeps the optimisation simpler than in methods that explore a much broader range of changes. As a result, MALT has shown significant advantages over existing adversarial attack methods, particularly in speed and efficacy.