Universal Adversarial Triggers for Attacking and Analyzing NLP (EMNLP 2019)

Overview

This is the official code for the EMNLP 2019 paper, Universal Adversarial Triggers for Attacking and Analyzing NLP. The repository contains everything needed to replicate our experiments and to create universal triggers.

Read our blog and our paper for more information on the method.

Dependencies

This code is written using PyTorch. The code for GPT-2 is based on HuggingFace's Transformers repo, and the experiments on SQuAD, SNLI, and SST use AllenNLP. The code is flexible and should be generally applicable to most models (especially if they are built in AllenNLP), i.e., you can easily extend this code to work for the model or task you want.

The code is built to run on a GPU, and one is likely necessary given the cost of running the larger models. I used a single GTX 1080 for all the experiments; most of them finish in a few minutes. It is possible to run the SST and SNLI experiments without a GPU.

Installation

An easy way to install the code is to create a fresh anaconda environment:

conda create -n triggers python=3.6
source activate triggers
pip install -r requirements.txt
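
If you plan to use a GPU, you can optionally confirm that PyTorch sees it (a quick sanity check, not part of the original setup):

    import torch
    print(torch.__version__)          # the install should import cleanly
    print(torch.cuda.is_available())  # True if PyTorch can see a GPU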

Now you should be ready to go!

Getting Started

The repository is broken down by task:

  • sst attacks sentiment analysis using the SST dataset (AllenNLP-based).
  • snli attacks natural language inference models on the SNLI dataset (AllenNLP-based).
  • squad attacks reading comprehension models using the SQuAD dataset (AllenNLP-based).
  • gpt2 attacks the GPT-2 language model using HuggingFace's model.

We recommend starting with snli or sst. In snli, we download pre-trained models (no training required) and create triggers for the hypothesis sentence. In sst, we walk through training a simple LSTM sentiment analysis model in AllenNLP and then create universal adversarial triggers for that model. The code is well documented and walks you through the attack methodology.

The gradient-based attacks are written in attacks.py. The file utils.py contains the code for running the models, computing gradients, and evaluating the top candidates for the attack. utils.py is only used by the AllenNLP models (i.e., not for GPT-2).
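
For intuition, the core of the gradient-based token replacement can be sketched in a few lines of PyTorch. This is an editorial sketch of the first-order (HotFlip-style) candidate scoring the paper describes, not the repository's exact code; avg_grad and embedding_matrix are assumed inputs:

    import torch

    def hotflip_candidates(avg_grad, embedding_matrix, num_candidates=40, increase_loss=True):
        """Score every vocabulary token as a replacement for each trigger slot.

        avg_grad:         (trigger_len, embed_dim) gradient of the loss w.r.t. the
                          current trigger token embeddings, averaged over a batch
        embedding_matrix: (vocab_size, embed_dim) model embedding table
        """
        # first-order (Taylor) estimate of the loss change from swapping each
        # trigger position to each vocabulary token
        scores = avg_grad @ embedding_matrix.T  # (trigger_len, vocab_size)
        if not increase_loss:
            scores = -scores  # for targeted attacks, step downhill instead
        # return the top-k candidate token ids per trigger position
        return scores.topk(num_candidates, dim=1).indices

These candidates are only first-order guesses; they are then re-evaluated with real forward passes (the candidate evaluation in utils.py) before a replacement is accepted.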

References

Please consider citing our work if you found this code or our paper beneficial to your research.

@inproceedings{Wallace2019Triggers,
  Author = {Eric Wallace and Shi Feng and Nikhil Kandpal and Matt Gardner and Sameer Singh},
  Booktitle = {Empirical Methods in Natural Language Processing},
  Year = {2019},
  Title = {Universal Adversarial Triggers for Attacking and Analyzing {NLP}}
}

Contributions and Contact

This code was developed by Eric Wallace, contact available at [email protected].

If you'd like to contribute code, feel free to open a pull request. If you find an issue with the code, please open an issue.

Comments
  • accuracy not dropping but trigger keeps changing

    Hi there, a little background on my project. I am currently doing a benign/malware app classifier based on API sequences, which can be quite similar to text classification (positive/negative).

    I am running the code based on sst.py. To prepare my dataset, I followed the AllenNLP approach to create instances for the train and dev data. Everything seems fine when I run the main() function: training completes, but in the trigger part the trigger words keep changing while the accuracy is not dropping. Do you have any idea why this is happening? The same behaviour can be seen with the different attacks (e.g. hotflip, nearest_neighbor_grad, etc.).

    Without Triggers: 0.9994070560332049
    Current Triggers: landroid/content/context;->sendbroadcast, landroid/content/context;->sendbroadcast, landroid/content/context;->sendbroadcast, landroid/content/context;->sendbroadcast, landroid/content/context;->sendbroadcast, landroid/content/context;->sendbroadcast, landroid/content/context;->sendbroadcast, : 0.9997035280166024
    Current Triggers: landroid/content/context;->sendbroadcast, landroid/os/bundle;->putsparseparcelablearray, landroid/app/activity;->databaselist, landroid/app/activity;->databaselist, landroid/app/activity;->databaselist, landroid/app/activity;->databaselist, landroid/os/bundle;->putsparseparcelablearray, : 0.9997035280166024
    Current Triggers: landroid/content/context;->sendbroadcast, landroid/content/intent;->replaceextras, landroid/content/intent;->replaceextras, landroid/content/intent;->replaceextras, landroid/content/intent;->replaceextras, landroid/content/intent;->replaceextras, landroid/content/intent;->replaceextras, : 0.9997035280166024
    Current Triggers: landroid/content/context;->sendbroadcast, landroid/content/res/assetmanager;->opennonassetfdnative, landroid/content/res/assetmanager;->opennonassetfdnative, landroid/content/res/assetmanager;->opennonassetfdnative, landroid/media/audiomanager;->adjuststreamvolume, landroid/content/res/assetmanager;->opennonassetfdnative, landroid/content/res/assetmanager;->opennonassetfdnative, : 0.9997035280166024
    Current Triggers: landroid/content/context;->sendbroadcast, ljava/lang/runtime;->runfinalization, ljava/lang/runtime;->runfinalization, ljava/lang/runtime;->runfinalization, lorg/apache/cordova/directorymanager;->gettempdirectorypath, landroid/hardware/sensormanager;->getsensorlist, ljava/lang/runtime;->runfinalization, : 0.9997035280166024
    Current Triggers: landroid/content/context;->sendbroadcast, landroid/content/intent;->getcomponent, landroid/content/intent;->getcomponent, landroid/content/intent;->getcomponent, landroid/content/intent;->getcomponent, landroid/os/environment;->isexternalstorageemulated, landroid/content/intent;->getcomponent, : 0.9997035280166024
    Current Triggers: landroid/app/activity;->databaselist, landroid/os/bundle;->getparcelablearraylist, landroid/os/bundle;->getparcelablearraylist, landroid/os/bundle;->getparcelablearraylist, landroid/accounts/accountmanager;->getauthtoken, landroid/location/locationmanager;->removeproximityalert, landroid/os/bundle;->getparcelablearraylist, : 0.9997035280166024
    Current Triggers: landroid/content/intent;->replaceextras, landroid/net/uri;->getfragment, landroid/net/uri;->getfragment, landroid/app/activitymanager;->killbackgroundprocesses, landroid/content/clipboardmanager;->getservice, landroid/os/bundle;->putparcelablearraylist, ljava/lang/system;->setsecuritymanager, : 0.9997035280166024
    Current Triggers: landroid/content/res/assetmanager;->opennonassetfdnative, ljava/net/urlconnection;->getfilenamemap, ljava/net/urlconnection;->getfilenamemap, lorg/apache/xerces/impl/xmlentitymanager;->isentitydeclinexternalsubset, landroid/content/clipboardmanager;->reportprimaryclipchanged, landroid/app/fragmentmanager;->begintransaction, landroid/net/uri;->getencodedpath, : 0.9997035280166024
    Current Triggers: ljava/lang/runtime;->runfinalization, landroid/net/uri;->tostring, lorg/apache/xerces/impl/xmlentitymanager;->closereaders, landroid/hardware/camera;->cancelautofocus, landroid/app/activitymanager;->getlocktaskmodestate, landroid/webkit/cookiesyncmanager;->resetsync, ljava/net/urlconnection;->getdooutput, : 0.9997035280166024
    Current Triggers: landroid/content/intent;->getcomponent, landroid/bluetooth/rfcommsocket;->waitforasyncconnectnative, landroid/app/activity;->finalize, landroid/hardware/sensor;->getreportingmode, landroid/content/intent;->setdataandtype, landroid/hardware/camera;->startsmoothzoom, lorg/apache/cordova/file/directorymanager;->getfreediskspace, : 0.9997035280166024

    opened by bowtiejicode 7
  • how to solve this error? thank you

    When I run "python sst.py", this error always occurs. Please tell me how to solve it.

    error:

        from pytorch_transformers.tokenization_auto import AutoTokenizer
        ModuleNotFoundError: No module named 'pytorch_transformers.tokenization_auto'

    thank you @Eric-Wallace

    opened by Frank-LXR 6
  • getting best candidates (beam search)

    Apologies for multiple questions. In this function (https://github.com/Eric-Wallace/universal-triggers/blob/2e4bc9363ca547105ebdd9a09f37f6898a03f46a/utils.py#L119), maybe I am missing something, but shouldn't this line (https://github.com/Eric-Wallace/universal-triggers/blob/2e4bc9363ca547105ebdd9a09f37f6898a03f46a/utils.py#L138) go inside the first for loop? Thanks

    opened by mehdimashayekhi 4
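
    For readers tracing the same logic: the candidate evaluation is a left-to-right beam search over trigger positions. A minimal editorial sketch of that procedure, assuming a user-supplied evaluate_loss(trigger) helper (names here are illustrative, not the repository's):

        def beam_search_triggers(trigger, candidates, evaluate_loss, beam_size=1):
            """trigger: list of token ids; candidates[i]: replacement ids for slot i.
            Keeps the beam_size triggers with the highest loss at each position."""
            beam = [(evaluate_loss(trigger), trigger)]
            for i in range(len(trigger)):  # sweep trigger slots left to right
                expanded = []
                for _, trig in beam:
                    for cand in candidates[i]:  # try each replacement at slot i
                        new_trig = trig[:i] + [cand] + trig[i + 1:]
                        expanded.append((evaluate_loss(new_trig), new_trig))
                # prune back down to the best beam_size triggers so far
                beam = sorted(expanded, key=lambda x: x[0], reverse=True)[:beam_size]
            return max(beam, key=lambda x: x[0])[1]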
  • Loss thresholds for successful triggers on language models?

    Hi Eric! Thanks for sharing this work. I've implemented this in TensorFlow to use with a copy of the 124M GPT-2 model, and was wondering if you could provide some details on the range of final "best loss" values you were seeing with the smallest model and the triggers that worked (I'm working under the assumption that, with a vocabulary of ~50k, a cross-entropy of ~10.8 would be equivalent to "random"). My current process isn't producing triggers that are successfully adversarial, and I'm wondering if perhaps I'm just not finding very good triggers. Thanks!

    opened by mathemakitten 3
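
    For reference, the commenter's "random" baseline checks out: a uniform distribution over GPT-2's 50,257-token vocabulary gives a cross-entropy of ln(50257) ≈ 10.8 nats. A quick check:

        import math

        # cross-entropy of uniform guessing over GPT-2's BPE vocabulary
        print(math.log(50257))  # ≈ 10.825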
  • Error when running the squad script with up-to-date libraries

    After installing with pip install -r requirements.txt (the default requirements.txt file), running the squad script gives the following error:

        Traceback (most recent call last):
          File "squad/squad.py", line 2, in <module>
            from allennlp.data.dataset_readers.reading_comprehension.squad import SquadReader
          File "/opt/conda/lib/python3.6/site-packages/allennlp/data/__init__.py", line 1, in <module>
            from allennlp.data.dataset_readers.dataset_reader import DatasetReader
          File "/opt/conda/lib/python3.6/site-packages/allennlp/data/dataset_readers/__init__.py", line 10, in <module>
            from allennlp.data.dataset_readers.ccgbank import CcgBankDatasetReader
          File "/opt/conda/lib/python3.6/site-packages/allennlp/data/dataset_readers/ccgbank.py", line 9, in <module>
            from allennlp.data.dataset_readers.dataset_reader import DatasetReader
          File "/opt/conda/lib/python3.6/site-packages/allennlp/data/dataset_readers/dataset_reader.py", line 8, in <module>
            from allennlp.data.instance import Instance
          File "/opt/conda/lib/python3.6/site-packages/allennlp/data/instance.py", line 3, in <module>
            from allennlp.data.fields.field import DataArray, Field
          File "/opt/conda/lib/python3.6/site-packages/allennlp/data/fields/__init__.py", line 10, in <module>
            from allennlp.data.fields.knowledge_graph_field import KnowledgeGraphField
          File "/opt/conda/lib/python3.6/site-packages/allennlp/data/fields/knowledge_graph_field.py", line 14, in <module>
            from allennlp.data.token_indexers.token_indexer import TokenIndexer, TokenType
          File "/opt/conda/lib/python3.6/site-packages/allennlp/data/token_indexers/__init__.py", line 5, in <module>
            from allennlp.data.token_indexers.dep_label_indexer import DepLabelIndexer
          File "/opt/conda/lib/python3.6/site-packages/allennlp/data/token_indexers/dep_label_indexer.py", line 9, in <module>
            from allennlp.data.tokenizers.token import Token
          File "/opt/conda/lib/python3.6/site-packages/allennlp/data/tokenizers/__init__.py", line 8, in <module>
            from allennlp.data.tokenizers.pretrained_transformer_tokenizer import PretrainedTransformerTokenizer
          File "/opt/conda/lib/python3.6/site-packages/allennlp/data/tokenizers/pretrained_transformer_tokenizer.py", line 5, in <module>
            from pytorch_transformers.tokenization_auto import AutoTokenizer
        ModuleNotFoundError: No module named 'pytorch_transformers.tokenization_auto'

    So I installed the latest version of allennlp with pip install allennlp --upgrade, which gives the following versions: allennlp-0.9.0, pytorch-transformers-1.1.0.

    However, I still can't run the squad script because I get the following error:

        Traceback (most recent call last):
          File "squad.py", line 118, in <module>
            main()
          File "squad.py", line 19, in main
            model.eval().cuda()
          File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 265, in cuda
            return self._apply(lambda t: t.cuda(device))
          File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 193, in _apply
            module._apply(fn)
          File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 193, in _apply
            module._apply(fn)
          File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 127, in _apply
            self.flatten_parameters()
          File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 123, in flatten_parameters
            self.batch_first, bool(self.bidirectional))
        RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

    I am running the code on an AWS instance with CUDA and Python 3.6.9. I would appreciate it if you could help me figure out what I'm doing wrong.

    opened by taylorshin 3
  • Comparing model accuracies for SNLI

    When comparing the accuracy of esim-glove-snli with decomposable-attention, it seems like the accuracy for the second model is not computed correctly.

    Am I missing something here?

    Here's a Colab.

    # Load model and vocab
    model = load_archive('https://allennlp.s3-us-west-2.amazonaws.com/models/esim-glove-snli-2019.04.23.tar.gz').model
    model.eval().cuda()
    vocab = model.vocab
    
    # Get original accuracy before adding universal triggers
    utils.get_accuracy(model, subset_dev_dataset, vocab, trigger_token_ids=None, snli=True)
    
    100%|██████████| 53676932/53676932 [00:00<00:00, 62451888.15B/s]
    Without Triggers: 0.9095824571943527
    
    # Load model and vocab
    model2 = load_archive('https://s3-us-west-2.amazonaws.com/allennlp/models/decomposable-attention-2017.09.04.tar.gz').model
    model2.eval().cuda()
    vocab2 = model2.vocab
    
    # Get original accuracy before adding universal triggers
    utils.get_accuracy(model2, subset_dev_dataset, vocab2, trigger_token_ids=None, snli=True)
    
    100%|██████████| 38907176/38907176 [00:00<00:00, 77210263.22B/s]
    Did not use initialization regex that was passed: .*token_embedder_tokens\\\\._projection.*weight
    Without Triggers: 0.42805647341544006
    
    opened by daysm 3
  • A little confused about the training process

    For the snli task, if I sample the subset whose labels are 'entailment' and attack toward that same class (increase_loss=False), the accuracy should get higher, right? Because we minimize the loss between the input and the target 'entailment', which is the true label.

    That wasn't the case, though; the accuracy drops.

    opened by dwaydwaydway 3
  • How are “target_texts” generated?

    Thank you for your wonderful work! I'd like to ask how the “target_texts” are generated in "Conditional Text Generation". Is there a strategy for this step?

    opened by zhaishengfang 2
  • sign from nearest neighbor attack?

    Hi Wallace, thanks for sharing this work! As I am trying the different token replacement strategies in attacks.py, I find it a little hard to understand why, for increase_loss = False, we apply e(t+1) = e(t) + g (gradient ascent), while for increase_loss = True we apply e(t+1) = e(t) - g. In my understanding, gradient descent tries to minimize the loss function. I also tried flipping the sign in the original code, which gets better attack accuracy compared to the source code. Thanks!

    opened by Xinyu-ustc 2
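
    For anyone with the same confusion: under a first-order approximation, stepping along the gradient increases the loss and stepping against it decreases the loss, i.e. e(t+1) = e(t) + lr*g is ascent and e(t+1) = e(t) - lr*g is descent. A toy sanity check (editorial sketch, not repository code):

        import torch

        # toy loss L(e) = ||e||^2 / 2, whose gradient is e itself
        e = torch.tensor([1.0, -2.0], requires_grad=True)
        loss = 0.5 * (e ** 2).sum()
        loss.backward()

        lr = 0.1
        with torch.no_grad():
            up = e + lr * e.grad    # ascent step: loss should increase
            down = e - lr * e.grad  # descent step: loss should decrease

        print(0.5 * (up ** 2).sum().item() > loss.item())    # True
        print(0.5 * (down ** 2).sum().item() < loss.item())  # True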
  • get_best_candidates in snli.py

    Hi,

    In the snli.py file, I'm wondering why the get_best_candidates() function extracts the best candidate with the largest loss while, at the same time, the model extracts candidates that minimize the model's loss (hotflip_attack() with increase_loss=False). Shouldn't the best candidate also minimize the loss? Thanks!

    opened by JieyuZhao 2
  • Questions regarding sentiment words

    Hello Eric, in your paper you wrote that a sentiment word list is used to filter the words to substitute, but I could not find the related code in the repository. Maybe the reason is that it is not important for generating the trigger words?

    opened by dugu9sword 2
Owner
Eric Wallace
Ph.D. Student at Berkeley working on ML and NLP.