An implementation demo of the ICLR 2021 paper Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks in PyTorch.

Yige-Li

Last update: Jan 4, 2023

Neural Attention Distillation

This is an implementation demo of the ICLR 2021 paper Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks in PyTorch.

NAD: Quick start with pretrained model

We have already uploaded the all2one pretrained backdoored student model (gridTrigger WRN-16-1, target label 5) and the clean teacher model (WRN-16-1) to ./weight/s_net and ./weight/t_net, respectively.

To evaluate the performance of NAD, run:

$ python main.py

where the default parameters are shown in config.py.

The trained model will be saved at the path weight/erasing_net/.tar

Please read main.py and configs.py carefully, then change the parameters for your experiment.

Erasing Results on BadNets

Dataset	Baseline ACC	Baseline ASR	NAD ACC	NAD ASR
CIFAR-10	85.65	100.0	82.12	3.57
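
For reading the table, ACC is the accuracy on clean test images and ASR (attack success rate) is the share of triggered test images classified as the target label. Below is a minimal sketch of both metrics, assuming predictions are already available as integer arrays. The function names are hypothetical, and excluding images whose true class is already the target label is a common convention assumed here, not something this README states.

```python
import numpy as np


def clean_accuracy(pred_clean, labels):
    """Percentage of clean test images classified correctly."""
    pred_clean = np.asarray(pred_clean)
    labels = np.asarray(labels)
    return 100.0 * np.mean(pred_clean == labels)


def attack_success_rate(pred_triggered, labels, target_label):
    """Percentage of triggered images classified as the target label.

    Assumption: images whose true class already equals the target label
    are excluded, so a correct prediction is not counted as an attack hit.
    """
    pred_triggered = np.asarray(pred_triggered)
    labels = np.asarray(labels)
    keep = labels != target_label
    return 100.0 * np.mean(pred_triggered[keep] == target_label)


# Toy predictions, not real model output.
labels = np.array([0, 1, 2, 5, 3])
pred_clean = np.array([0, 1, 2, 5, 4])       # predictions on clean images
pred_triggered = np.array([5, 5, 2, 5, 5])   # predictions after stamping the trigger
acc = clean_accuracy(pred_clean, labels)
asr = attack_success_rate(pred_triggered, labels, target_label=5)
```

A successful erasing method keeps ACC close to the baseline while driving ASR toward zero.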

Training your own backdoored model

We provide a DatasetBD class in data_loader.py for generating the training sets of different backdoor attacks.

To run a backdoor attack (e.g. the GridTrigger attack), use the command below:

$ python train_badnet.py

This command trains the backdoored model and prints the clean accuracy and the attack success rate. You can also select the other backdoor triggers reported in the paper.

Please read train_badnet.py and configs.py carefully, then change the parameters for your experiment.
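
As an illustration of what a poisoned training set looks like, the following sketch stamps a small checkerboard patch into the corner of a fraction of the images and relabels them to the target class (the all2one setting). It is not the repository's DatasetBD class: the function names, the 3x3 patch, its position, and the 10% injection rate are assumptions chosen for the example.

```python
import numpy as np


def add_grid_trigger(img, size=3):
    """Stamp a checkerboard patch in the bottom-right corner of an HxWxC uint8 image."""
    out = img.copy()
    h, w = out.shape[:2]
    for dy in range(size):
        for dx in range(size):
            value = 255 if (dy + dx) % 2 == 0 else 0
            out[h - 1 - dy, w - 1 - dx] = value  # assigns every channel
    return out


def poison_dataset(images, labels, target_label=5, rate=0.1, seed=0):
    """Trigger and relabel a random `rate` fraction of the data, drawn from non-target samples."""
    rng = np.random.default_rng(seed)
    images = images.copy()
    labels = labels.copy()
    candidates = np.flatnonzero(labels != target_label)
    n_poison = int(rate * len(labels))
    chosen = rng.choice(candidates, size=n_poison, replace=False)
    for idx in chosen:
        images[idx] = add_grid_trigger(images[idx])
        labels[idx] = target_label
    return images, labels, chosen


# Toy data standing in for CIFAR-10: 100 black 32x32 RGB images.
x = np.zeros((100, 32, 32, 3), dtype=np.uint8)
y = np.arange(100) % 10
x_bd, y_bd, poisoned_idx = poison_dataset(x, y)
```

The returned index array makes it easy to check which samples were modified; the original arrays are left untouched.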

How to get the teacher model?

We obtained the teacher model by fine-tuning all layers of the backdoored model on 5% clean data with data augmentation. In our paper, we fine-tune the backdoored model for only 5~10 epochs. Please see Section 4.1 and Appendix A for more details of our experimental settings. The fine-tuning code is obtained by setting all the beta parameters to 0, which makes the distillation loss zero during training.
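
The role of beta can be sketched as follows: the training objective is the classification loss plus a beta-weighted attention distillation term for each activation group, so with every beta set to 0 only the classification loss remains and training is plain fine-tuning. The NumPy code below is a minimal sketch under assumptions. The attention map (channel-wise sum of |activation|^p, L2-normalised per sample), the mean squared difference, the toy shapes, the beta values, and all names are illustrative and may differ from the repository's implementation.

```python
import numpy as np


def attention_map(fm, p=2, eps=1e-6):
    """Collapse an (N, C, H, W) activation into a normalised (N, H*W) attention map."""
    am = np.sum(np.abs(fm) ** p, axis=1)       # (N, H, W)
    am = am.reshape(am.shape[0], -1)           # (N, H*W)
    norm = np.linalg.norm(am, axis=1, keepdims=True)
    return am / (norm + eps)


def at_loss(fm_student, fm_teacher, p=2):
    """Mean squared difference between student and teacher attention maps."""
    diff = attention_map(fm_student, p) - attention_map(fm_teacher, p)
    return float(np.mean(diff ** 2))


def total_loss(cls_loss, student_acts, teacher_acts, betas):
    """Classification loss plus one beta-weighted attention loss per activation group."""
    loss = cls_loss
    for fm_s, fm_t, beta in zip(student_acts, teacher_acts, betas):
        loss += beta * at_loss(fm_s, fm_t)
    return loss


# Toy activations for three groups; shapes and values are arbitrary.
rng = np.random.default_rng(0)
shapes = [(4, 8, 16, 16), (4, 16, 8, 8), (4, 32, 4, 4)]
student = [rng.normal(size=s) for s in shapes]
teacher = [rng.normal(size=s) for s in shapes]

finetune_loss = total_loss(1.25, student, teacher, betas=(0, 0, 0))
nad_loss = total_loss(1.25, student, teacher, betas=(10.0, 10.0, 10.0))
```

With all betas at 0, `finetune_loss` equals the classification loss exactly; any positive beta adds the attention term on top.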

Other source of backdoor attacks

Attack

CL: Clean-label backdoor attacks

SIG: A New Backdoor Attack in CNNs by Training Set Corruption Without Label Poisoning

## reference code
import numpy as np


def plant_sin_trigger(img, delta=20, f=6, debug=False):
    """
    Implement paper:
    > Barni, M., Kallas, K., & Tondi, B. (2019).
    > A new Backdoor Attack in CNNs by training set corruption without label poisoning.
    > arXiv preprint arXiv:1902.11237
    superimposed sinusoidal backdoor signal with default parameters
    """
    alpha = 0.2
    img = np.float32(img)
    pattern = np.zeros_like(img)
    m = pattern.shape[1]
    # The signal varies only along the width axis. Assigning pattern[i, j]
    # fills every channel at once, so no loop over channels is needed.
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            pattern[i, j] = delta * np.sin(2 * np.pi * j * f / m)

    # Blend image and pattern; the uint32 cast truncates fractional pixel values.
    img = alpha * np.uint32(img) + (1 - alpha) * pattern
    img = np.uint8(np.clip(img, 0, 255))

    #     if debug:
    #         cv2.imshow('planted image', img)
    #         cv2.waitKey()

    return img

Refool: Reflection Backdoor: A Natural Backdoor Attack on Deep Neural Networks

Defense

MCR: Bridging Mode Connectivity in Loss Landscapes and Adversarial Robustness

Fine-tuning & Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks

Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks

STRIP: A Defence Against Trojan Attacks on Deep Neural Networks

Library

Note: TrojanZoo provides a universal PyTorch platform for security research (especially backdoor attacks and defenses) on image classification in deep learning.

Backdoors 101 is a PyTorch framework for state-of-the-art backdoor defenses and attacks on deep learning models.

References

If you find this code useful for your research, please cite our paper:

@inproceedings{li2021neural,
  title={Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks},
  author={Li, Yige and Lyu, Xixiang and Koren, Nodens and Lyu, Lingjuan and Li, Bo and Ma, Xingjun},
  booktitle={ICLR},
  year={2021}
}

Contacts

If you have any questions, leave a message below with GitHub.

Comments

Results of BadNet and Fine-tuning

Hi,

Thanks for providing the code. I tried to rerun it to replicate the baseline for further improvement, but the results are quite different. I made two main changes:

Fix the random seeds in the main function of train_badnet.py and main.py:

def main():
    seed = 93
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.manual_seed(seed)
    np.random.seed(seed)
    random.seed(seed)

Disable the default paths of t_model and s_model in config.py so that I can retrain the model.

I didn't change any hyperparameters, and the script for train_badnet.py is

OUTPUT=results/nad/backdoor/

python train_badnet.py \
--checkpoint_root $OUTPUT \
--log_root $OUTPUT

The result in csv file is:

epoch,test_clean_acc,test_bad_acc,test_bad_cls_loss
1,57.01111111111111,99.67777777777778,0.00816304203728214
2,62.666666666666664,99.77777777777777,0.006588545432469497
3,70.1,99.4888888888889,0.01506445547990087
4,75.15555555555555,99.91111111111111,0.002418477892476302
5,77.68888888888888,99.9888888888889,0.00043397019659460056
6,75.83333333333333,99.84444444444445,0.0036796125145895833
7,75.61111111111111,99.9888888888889,0.0006442612384966601
8,77.87777777777778,99.9888888888889,0.0006178246608526226
9,76.8,99.6,0.012300759838609438

The BadNet accuracy is 76.80 and ASR is 99.60.

Then I tried the fine-tuning baseline. The script is

OUTPUT=results/nad/finetune

python main.py \
--s_model results/nad/backdoor/WRN-16-1-S-model_best.pth.tar \
--checkpoint_root $OUTPUT \
--log_root $OUTPUT \
--beta1 0 \
--beta2 0 \
--beta3 0

The results in the csv file are

epoch,test_clean_acc,test_bad_acc,test_bad_cls_loss,test_bad_at_loss
0,76.8000,99.6000,0.0123,0.0000
1,69.4444,11.0333,7.8091,0.0000
2,62.1333,0.1333,9.8440,0.0000
3,78.8222,4.1444,7.9249,0.0000
4,80.4556,3.9778,8.1341,0.0000
5,79.7333,4.9778,7.5043,0.0000
6,80.9667,3.3667,8.6271,0.0000
7,81.3333,4.1111,8.5744,0.0000
8,81.4778,3.8222,8.4739,0.0000
9,80.8556,4.2667,8.1581,0.0000
10,81.7667,3.0667,9.1183,0.0000

The accuracy is 81.76 and ASR is 3.06.

The results differ from those on GitHub in two respects:

The accuracy of the backdoored model is much lower (85.65 vs. 76.80).
The ASR of the fine-tuned model is quite different (18.13 vs. 3.06). The ASR in my replication is already low enough.

I ran the code multiple times and the results are consistent.

opened by ziqi-zhang 10

the loss function is not useful in the experiment?

Hello, I'm very interested in this paper. In main.py, .detach() is applied to the three at_loss terms, which removes them from the PyTorch computation graph. So I deleted at1_loss, at2_loss, and at3_loss from the loss function, but when I ran the changed code, the ASR was still very low. I therefore think the at_loss has no effect in the code. The training dataset in main.py is the clean dataset, not the backdoored dataset, so the NAD ASR is very low, whereas train_badnet.py trains on the backdoored dataset, so the baseline ASR is high. I changed the training dataset in main.py to the backdoored dataset. Unfortunately, NAD does not work on the backdoored dataset.

opened by xiajun112233 7
How to get the teacher model?

Hello, I have read your paper and learned about NAD, but I have a question: how do you get the teacher model? Your paper says "The teacher network can be obtained by an independent finetune process on the same clean data", but I have no idea how to "finetune". If you fine-tune just the last layer or all layers on clean data, I think it is hard to get a clean network, because the gradient of the loss on the clean data may be very small for a converged model.

opened by tonggege001 3
The reproducibility of experiments in the paper
Hi author, I have some questions about this paper and the public code.

The v1 version of your arXiv paper is dated 15 Jan 2021 (https://arxiv.org/abs/2101.05930v1), and your first commit on GitHub is dated 21 Jan 2021 (https://github.com/bboylyg/NAD/commit/a5a10f1ee746d51e75945821027ade76e88e2901).

The most important loss that you proposed in your paper is Eq(3), which is controlled by the hyperparameter beta.

However, in your first commit code, this part is written as :

cls_loss = criterionCls(output_s, target)
at3_loss = criterionAT(activation3_s, activation3_t).detach() * opt.beta3
at2_loss = criterionAT(activation2_s, activation2_t).detach() * opt.beta2
at1_loss = criterionAT(activation1_s, activation1_t).detach() * opt.beta1
at_loss = at1_loss + at2_loss + at3_loss + cls_loss

detach() in PyTorch returns a new Tensor, detached from the current graph (see the doc). So if the model is optimized with this at_loss, at1_loss, at2_loss, and at3_loss contribute nothing to training. The user @zeabin submitted the issue https://github.com/bboylyg/NAD/issues/8 and, fortunately, you fixed it in https://github.com/bboylyg/NAD/commit/6907ea2fb9c8445fb408fd77627e4f990ad6d9be on 10 Jan 2022:

at3_loss = criterionAT(activation3_s, activation3_t.detach()) * opt.beta3
at2_loss = criterionAT(activation2_s, activation2_t.detach()) * opt.beta2
at1_loss = criterionAT(activation1_s, activation1_t.detach()) * opt.beta1

Based on the above facts, my question is whether the results in your paper are based on the first-commit code or the fixed code.

If the experiments were run with the correct code, why is the first commit wrong?

If the experiments were run with the wrong code, then the idea in your paper does not work and the method is just fine-tuning. I have valid reasons to doubt the reliability of the results.

Are the results reliable in papers that were published between 21 Jan 2021 and 10 Jan 2022 and use NAD as a comparison? During this time, the code in this repository was wrong.

I would appreciate it if you could answer the questions above.
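
The difference between the two versions quoted above can be checked with a small autograd experiment. This is a toy sketch, not the repository's training step: `toy_at_loss` stands in for criterionAT, a single weight vector stands in for the student network, and `beta` is an arbitrary value. Detaching the loss itself removes the attention term from the gradient, while detaching only the teacher activation keeps the gradient flowing into the student.

```python
import torch


def toy_at_loss(a_s, a_t):
    """Toy stand-in for criterionAT: mean squared difference."""
    return ((a_s - a_t) ** 2).mean()


def student_grad(detach_whole_loss, beta=3.0):
    """Return the gradient of (cls_loss + beta * at_loss) w.r.t. the student weight."""
    w = torch.ones(3, requires_grad=True)            # student parameter
    x = torch.tensor([1.0, 2.0, 3.0])                # input
    teacher_act = torch.tensor([0.5, 0.5, 0.5], requires_grad=True)
    student_act = w * x
    cls_loss = student_act.sum()                     # toy classification loss
    if detach_whole_loss:
        # First-commit style: the whole attention loss is cut out of the graph.
        at = toy_at_loss(student_act, teacher_act).detach() * beta
    else:
        # Fixed style: only the teacher activation is treated as a constant.
        at = toy_at_loss(student_act, teacher_act.detach()) * beta
    (cls_loss + at).backward()
    return w.grad.clone(), teacher_act.grad


buggy_grad, buggy_teacher_grad = student_grad(detach_whole_loss=True)
fixed_grad, fixed_teacher_grad = student_grad(detach_whole_loss=False)
cls_only_grad = torch.tensor([1.0, 2.0, 3.0])        # gradient of the toy cls_loss alone
```

In the first variant the student gradient equals the gradient of the classification loss alone, which matches the observation that training then behaves like plain fine-tuning. In both variants the teacher receives no gradient.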
opened by neiljohn1990 2
What config did you use to have the model return the activations?

Hello,

I'm trying to understand why your model is returning 3 activations along with the outputs when running inferences.

line 27 of main.py: activation1_s, activation2_s, activation3_s, output_s = snet(img)

Was there some thought process in returning the last 3 activations?

opened by tituslhy 1
A few questions
Hello. I am looking for possible defenses against backdoor attacks. I've read this interesting and promising research, but I am still confused.

Why can distillation with a pruned model as the teacher purify the poisoned model? Do you have more detailed insights?

Have you tried bigger models and datasets?

There is an attack against the pruning defense (through pruning during training, however unrealistic in the real world). What do you think of such attacks, which are specially designed against pruning?

Looking forward to your reply.
opened by zhaitongqing233 1
How does the attention loss work

Hi, thanks for sharing the code.

I notice that detach() is called before backward() for attention loss in train_step and the back propagation should not go through attention loss. So how can the attention loss work? https://github.com/bboylyg/NAD/blob/d61e4d74ee697f125336bfc42a03c707679071a6/main.py#L30-L34

opened by zeabin 1
performance on GTSRB

Hi! Thanks for your great work!

Have you tested the defense effect of NAD on GTSRB against attacks other than Refool, such as BadNets, Blend, and SIG? If so, could you share the experimental results? I'd appreciate it very much!

best!

opened by bingxumu 1
How to train CL and Refool backdoored model?

Hello, I'm very interested in this paper and am trying to reproduce the work. When I tried to train the backdoored models on CIFAR-10, I followed the advice in readme.md and successfully trained the BadNets, Trojan, Blend, and SIG backdoored models mentioned in the paper. However, when I tried to train the Clean-label and Refool backdoored models, I found that it seemed impossible to do so by simply modifying parameters in configs.py. I then went to the CL and Refool links mentioned in readme.md, but I still didn't know how to implement these two backdoor attacks. I now have no idea how to train these two backdoored models. If you could give me some advice and help, I would be very grateful.

opened by Urdarbrunner 1

RuntimeError: view size is not compatible with input tensor's size and stride

Hello,

When I deployed the code and tried to run it, I got this error:

----------- Train Initialization --------------
epoch: 0  lr: 0.1000
Traceback (most recent call last):
  File "main.py", line 204, in <module>
    main()
  File "main.py", line 201, in main
    train(opt)
  File "main.py", line 171, in train
    test(opt, test_clean_loader, test_bad_loader, nets,
  File "main.py", line 72, in test
    prec1, prec5 = accuracy(output_s, target, topk=(1, 5))
  File "/home/longkangli/NAD/utils/util.py", line 63, in accuracy
    correct_k = correct[:k].view(-1).float().sum(0)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

Then I changed the code according to the error message. In ~/NAD/utils/util.py, line 63 (in accuracy), I changed "correct_k = correct[:k].view(-1).float().sum(0)" to "correct_k = correct[:k].contiguous().view(-1).float().sum(0)". Then it works.
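
The error can be reproduced in isolation with a transposed tensor, which is a non-contiguous view of its storage. This is a minimal sketch with a toy tensor, not the repository's accuracy function; it shows that `.view(-1)` fails on such a slice while `.reshape(-1)` and `.contiguous().view(-1)` both work and give the same result.

```python
import torch

# A transposed tensor shares storage with the original but is not contiguous,
# similar to the transposed top-k predictions in a typical accuracy helper.
correct = torch.arange(10).reshape(2, 5).t()   # shape (5, 2), strides (1, 5)
top3 = correct[:3]                             # still non-contiguous

try:
    top3.view(-1)
    view_failed = False
except RuntimeError:
    view_failed = True

flat_reshape = top3.reshape(-1)                # copies when a view is impossible
flat_contig = top3.contiguous().view(-1)       # explicit copy, then view
```

Either replacement fixes the error; `.reshape(-1)` is the form suggested by the error message itself.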

My environment: Python 3.8, PyTorch 1.7, CUDA 10.2.

I'm not sure whether the problem comes from different environment versions.

Regards.

opened by longkang1 1
