Universal Adversarial Triggers for Attacking and Analyzing NLP
This is the official code for the EMNLP 2019 paper, Universal Adversarial Triggers for Attacking and Analyzing NLP. The repository contains everything needed to replicate our experiments and create universal triggers.
Read our blog post and our paper for more information on the method.
Dependencies
This code is written using PyTorch. The code for GPT-2 is based on HuggingFace's Transformer repo, and the experiments on SQuAD, SNLI, and SST use AllenNLP. The code is flexible and should be generally applicable to most models (especially those implemented in AllenNLP), i.e., you can easily extend this code to work for the model or task you want.
The code is designed to run on a GPU, and one is likely necessary given the cost of running the larger models. I used a single GTX 1080 for all the experiments; most run in a few minutes. The SST and SNLI experiments can be run without a GPU.
Installation
An easy way to install the code is to create a fresh anaconda environment:
```
conda create -n triggers python=3.6
source activate triggers
pip install -r requirements.txt
```
Now you should be ready to go!
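Since the larger experiments assume a GPU (see above), it may be worth verifying that PyTorch can see yours before starting. This quick check is just a suggestion, not part of the repository:

```python
# Optional sanity check: confirm the installed PyTorch build can see a GPU.
import torch
print(torch.__version__, torch.cuda.is_available())  # expect True on a GPU machine
```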
Getting Started
The repository is broken down by task:
- `sst`: attacks sentiment analysis models trained on the SST dataset (AllenNLP-based).
- `snli`: attacks natural language inference models on the SNLI dataset (AllenNLP-based).
- `squad`: attacks reading comprehension models on the SQuAD dataset (AllenNLP-based).
- `gpt2`: attacks the GPT-2 language model using HuggingFace's model.
To get started, we recommend beginning with `snli` or `sst`. In `snli`, we download pre-trained models (no training required) and create triggers for the hypothesis sentence. In `sst`, we walk through training a simple LSTM sentiment analysis model in AllenNLP and then create universal adversarial triggers for that model. The code is well documented and walks you through the attack methodology.
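For orientation, here is a minimal sketch of the trigger-search loop that the walkthroughs implement. The callables `trigger_gradient`, `candidate_tokens`, and `batch_loss` are hypothetical stand-ins for utilities this repository provides; treat this as an illustration of the method, not the repo's actual code.

```python
# Hedged sketch of the universal trigger search, assuming hypothetical helper
# callables; the real implementations live in attacks.py and utils.py.
def search_trigger(trigger_ids, batches, trigger_gradient,
                   candidate_tokens, batch_loss, iterations=10):
    for _ in range(iterations):
        # Gradient of the loss w.r.t. the trigger token embeddings,
        # averaged over a batch of examples.
        avg_grad = trigger_gradient(trigger_ids, batches)
        candidates = candidate_tokens(avg_grad)  # top-k token ids per position
        # Greedy variant of the paper's beam search: keep whichever single
        # token swap increases the batch loss the most.
        best = batch_loss(trigger_ids, batches)
        for pos in range(len(trigger_ids)):
            for cand in candidates[pos]:
                trial = trigger_ids[:pos] + [cand] + trigger_ids[pos + 1:]
                loss = batch_loss(trial, batches)
                if loss > best:
                    best, trigger_ids = loss, trial
    return trigger_ids
```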
The gradient-based attacks are written in `attacks.py`. The file `utils.py` contains the code for evaluating models, computing gradients, and evaluating the top candidates for the attack; it is used only by the AllenNLP models (i.e., not for GPT-2).
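To make the candidate-selection step concrete, here is a self-contained, runnable illustration (using random tensors rather than a real model's gradients) of the HotFlip-style scoring that this family of gradient-based attacks relies on. It is a sketch of the idea, not the exact code in `attacks.py`:

```python
import torch

def hotflip_candidates(avg_grad, embedding_matrix, k=5):
    # avg_grad: (trigger_len, dim) gradient of the loss w.r.t. the trigger
    # token embeddings, averaged over a batch. The dot product of each
    # vocabulary embedding with the gradient is a first-order estimate of
    # how much swapping that token in would increase the loss.
    scores = avg_grad @ embedding_matrix.T  # (trigger_len, vocab_size)
    return scores.topk(k, dim=1).indices    # top-k candidate ids per position

# Toy demo with random tensors standing in for a real model.
avg_grad = torch.randn(3, 64)              # 3-token trigger, 64-dim embeddings
embedding_matrix = torch.randn(1000, 64)   # 1000-word vocabulary
print(hotflip_candidates(avg_grad, embedding_matrix).shape)  # torch.Size([3, 5])
```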
References
Please consider citing our work if you found this code or our paper beneficial to your research.
```bibtex
@inproceedings{Wallace2019Triggers,
  Author = {Eric Wallace and Shi Feng and Nikhil Kandpal and Matt Gardner and Sameer Singh},
  Booktitle = {Empirical Methods in Natural Language Processing},
  Year = {2019},
  Title = {Universal Adversarial Triggers for Attacking and Analyzing {NLP}}
}
```
Contributions and Contact
This code was developed by Eric Wallace, contact available at [email protected].
If you'd like to contribute code, feel free to open a pull request. If you find an issue with the code, please open an issue.