SHIELD: Defending Deep Neural Networks from Adversarial Attacks

by Nilaksh Das, a PhD student in the School of Computational Science and Engineering at the Georgia Institute of Technology. Das is advised by Polo Chau.

“SHIELD is a fast and practical approach to defend deep neural networks from adversarial attacks. This work proposes a multifaceted framework which incorporates compression, randomization, model-retraining, and ensembling to make computer vision models robust to adversarial perturbations.”

Deep Neural Networks (DNNs) are a popular class of machine learning models increasingly used for tasks such as image classification, object detection, and speech recognition. DNN-based models power many high-impact applications, ranging from Apple Siri and Google Assistant on your phone to autonomous vehicles running on the streets of California.

However, it has recently been shown that DNNs are highly vulnerable to adversarial attacks. An attacker can craft malicious inputs that are inconspicuous to humans, but completely confuse a DNN model into making arbitrary predictions. For example, given a model that does very well at the task of traffic sign detection (a self-driving car would use something like this to make its decisions on the fly), an attacker can create a road sign that looks like a slightly distorted “Stop” sign to humans, but contains a concealed design that could confuse the self-driving car into thinking it is a “Max Speed 100” sign!

Figure: Example of an adversarial attack on a traffic sign recognition model. Notice how the adversarial perturbation added to the image on the right is nearly invisible to the human eye, but completely confuses the machine learning model.

In this work, we propose a multifaceted framework called SHIELD, a fast and practical approach to defending DNNs from such adversarial attacks. SHIELD stands for Secure Heterogeneous Image Ensemble with Localized Denoising. It incorporates image compression, randomization, model re-training, and ensembling to make DNN-based computer vision models more robust to adversarial perturbations.

Figure: An overview of our multifaceted SHIELD framework. Our approach removes adversarial perturbations using Stochastic Local Quantization, which results in correct predictions even on malicious inputs (top row). Since the underlying models are vaccinated (re-trained) with JPEG-compressed images, it also preserves the original performance of the model (bottom row).

Adversarial Attacks: Behind the Scenes

Most adversarial attacks compute malicious inputs that fool a DNN model using an algorithm called backpropagation, which also happens to be the algorithm used to train the model. During training, backpropagation determines how to update the model's internal weights so as to decrease some loss function specific to the task at hand. The attacker uses the same algorithm with the opposite objective: increasing this loss function to confuse the model. Only now, instead of updating the model's internal weights, the attacker very slightly perturbs the input itself. Precise methods for constructing such input perturbations have been proposed in the literature. They target a fully observed model using backpropagation, yet their adversarial effect can transfer significantly to other models that the attack never observed or targeted. This means, for instance, that the attacker doesn't even need to know exactly which model a self-driving car is using: they can make a reasonable guess about the model architecture, and their attack could still be effective against the car's model.
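To make this concrete, here is a minimal PyTorch sketch of the Fast Gradient Sign Method (FGSM), one well-known attack of this kind. The single-step formulation, the epsilon value, and the assumption of pixel values in [0, 1] are illustrative choices for this sketch, not the specific attack implementations evaluated in the paper.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, true_label, epsilon=0.01):
    """Single-step FGSM sketch: perturb the *input* along the sign of the
    loss gradient to increase the loss, instead of updating the weights."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()                                  # backpropagate to the input pixels
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()          # assumes pixels normalized to [0, 1]
```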

SHIELD to the Rescue!

SHIELD introduces a novel preprocessing technique called Stochastic Local Quantization (SLQ) that leverages JPEG compression to defend DNNs from adversarial attacks. SLQ is non-differentiable by design: it blocks the backpropagation algorithm at prediction time and thus denies the attacker a useful gradient for computing adversarial perturbations.

JPEG is a popular compression technique that is widely used to reduce the file size of an image while maintaining its perceptual integrity. It does so by removing high-frequency components of the image that the human eye largely ignores, so a user cannot readily tell the difference between the original image and the compressed one. Since adversarial perturbations are also inconspicuous to the human eye, we posit that JPEG compression has the potential to remove such perturbations as well. The JPEG algorithm takes as input a quality factor, an integer ranging from 0 to 100: a higher quality factor means less compression and better image quality, while a lower quality factor means more compression and lower quality.

SLQ builds on this by breaking the image into tiny square blocks and applying JPEG compression with a randomly chosen quality factor to each block before the image is fed into the model. This randomization makes it harder for the attacker to estimate which compression will be applied at test time, rendering the attack ineffective against SLQ preprocessing.
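As a rough illustration of the SLQ idea, the sketch below splits an image into small square blocks and JPEG-compresses each block at a randomly chosen quality factor using Pillow and NumPy. The block size, the candidate quality factors, and the RGB assumption are illustrative choices, not necessarily the exact settings used in the paper.

```python
import io
import random

import numpy as np
from PIL import Image

def stochastic_local_quantization(image, block_size=8, qualities=(20, 40, 60, 80)):
    """SLQ-style sketch: JPEG-compress each block of an RGB PIL image at a
    randomly chosen quality factor (block size and qualities are assumptions)."""
    arr = np.array(image)              # H x W x 3, uint8
    out = arr.copy()
    height, width = arr.shape[:2]
    for y in range(0, height, block_size):
        for x in range(0, width, block_size):
            block = Image.fromarray(arr[y:y + block_size, x:x + block_size])
            buf = io.BytesIO()
            block.save(buf, format="JPEG", quality=random.choice(qualities))
            buf.seek(0)
            out[y:y + block_size, x:x + block_size] = np.array(Image.open(buf))
    return Image.fromarray(out)
```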

Figure: (Top row) An attacker uses the backpropagation algorithm on the given model to compute an adversarial perturbation that confuses the model into classifying an image of a dog as a cat. (Bottom row) SHIELD, which incorporates SLQ, blocks the backpropagation algorithm at prediction time, so the attacker can no longer compute an adversarial perturbation using traditional methods. It also removes high-frequency adversarial perturbations that may have been computed on an undefended model, resulting in a correct prediction.

Within the SHIELD framework, we also re-train several models, each on JPEG-compressed images of a specific quality factor. This vaccination step increases model accuracy on compressed images, since the models were originally trained on uncompressed images or images with arbitrary compression. We then use these re-trained models as an ensemble in which the final prediction is determined by a majority vote of the constituent models. Our empirical experiments show that this method is effective against various strong adversarial attacks, eliminating up to 98% of the errors introduced by gray-box attacks. Since JPEG is an inherently fast algorithm that is widely adopted and has many fast hardware implementations, we also show that SHIELD is much faster than other preprocessing defenses proposed in the literature. The teaser video linked at the top discusses our results in further detail.
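A minimal sketch of this ensemble step is shown below, assuming a preprocess function (such as the SLQ sketch above) and a list of models re-trained on different JPEG qualities. It illustrates the majority-vote idea only and is not the paper's actual implementation.

```python
import torch

def shield_predict(models, image, preprocess):
    """Majority-vote ensemble sketch: preprocess the input (e.g. with SLQ),
    then let each vaccinated model vote on a class label."""
    x = preprocess(image)                                    # remove high-frequency perturbations
    votes = [model(x).argmax(dim=-1) for model in models]    # one predicted label per model
    return torch.stack(votes).mode(dim=0).values             # most common label wins
```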

If you are interested in learning more about SHIELD, please refer to our full paper:

SHIELD: Fast, Practical Defense and Vaccination for Deep Learning using JPEG Compression

Das and his co-authors will be presenting this paper at KDD 2018 in London, United Kingdom.
