Universal Adversarial Triggers for Attacking and Analyzing NLP (EMNLP 2019)

Overview

This is the official code for the EMNLP 2019 paper Universal Adversarial Triggers for Attacking and Analyzing NLP; the repository contains the code for replicating our experiments and creating universal triggers.

Read our blog and our paper for more information on the method.

Dependencies

This code is written using PyTorch. The GPT-2 code builds on HuggingFace's Transformers repository, and the experiments on SQuAD, SNLI, and SST use AllenNLP. The code is flexible and should be generally applicable to most models (especially ones built in AllenNLP); you can easily extend it to the model or task you want, as in the sketch below.
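
For example, swapping in a different AllenNLP model is mostly a matter of loading its archive. Here is a minimal sketch (the archive path is a placeholder, and the helper behavior described in the comments refers to this repo's utils.py):

from allennlp.models.archival import load_archive

# Load any trained AllenNLP model from its archive (placeholder path).
archive = load_archive('/path/to/your/model.tar.gz')
model = archive.model
model.eval().cuda()
vocab = model.vocab
# From here, the helpers in utils.py (gradient computation, accuracy
# evaluation, candidate scoring) apply to the model just as in sst/snli.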

The code is designed to run on a GPU, and one is likely necessary given the cost of running the larger models. I used a single GTX 1080 for all the experiments; most run in a few minutes. The SST and SNLI experiments can also be run without a GPU.

Installation

An easy way to install the code is to create a fresh Anaconda environment:

conda create -n triggers python=3.6
source activate triggers
pip install -r requirements.txt
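
To sanity-check the install (and that PyTorch can see your GPU), something like the following should print the torch version and True:

python -c "import torch, allennlp; print(torch.__version__, torch.cuda.is_available())"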

Now you should be ready to go!

Getting Started

The repository is broken down by task:

  • sst attacks sentiment analysis using the SST dataset (AllenNLP-based).
  • snli attacks natural language inference models on the SNLI dataset (AllenNLP-based).
  • squad attacks reading comprehension models using the SQuAD dataset (AllenNLP-based).
  • gpt2 attacks the GPT-2 language model using HuggingFace's model.

We recommend starting with snli or sst. In snli, we download pre-trained models (no training required) and create triggers for the hypothesis sentence. In sst, we walk through training a simple LSTM sentiment analysis model in AllenNLP and then create universal adversarial triggers for it; see the commands below. The code is well documented and walks you through the attack methodology.
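
For instance, the sst walkthrough can be launched directly from its directory (snli and squad follow the same pattern):

cd sst
python sst.py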

The gradient-based attacks are written in attacks.py. The file utils.py contains the code for computing gradients, evaluating models, and scoring the top candidate triggers; it is used only by the AllenNLP models (i.e., not for GPT-2).
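
At the core of attacks.py is a HotFlip-style first-order approximation: each vocabulary token is scored by the dot product of its embedding with the loss gradient at each trigger position. Below is a minimal PyTorch sketch of that candidate-selection step (the function and argument names are illustrative, not the repo's exact signatures):

import torch

def hotflip_candidates(averaged_grad, embedding_matrix, num_candidates=40, increase_loss=True):
    # averaged_grad: [num_trigger_tokens, embed_dim], the gradient of the loss
    # w.r.t. the trigger embeddings, averaged over a batch.
    # embedding_matrix: [vocab_size, embed_dim], the model's embedding table.
    # First-order Taylor approximation: swapping token v into position t changes
    # the loss by roughly grad_t . (e_v - e_cur); e_cur is constant per position,
    # so ranking candidates by grad_t . e_v is enough.
    scores = torch.einsum("te,ve->tv", averaged_grad, embedding_matrix)
    if not increase_loss:
        scores = -scores  # look for tokens that decrease the loss instead
    # Top candidate token ids per trigger position: [num_trigger_tokens, num_candidates]
    return scores.topk(num_candidates, dim=1).indices

The top candidates are then re-evaluated with the actual loss (the beam search in utils.py) to pick the replacements that move the loss furthest.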

References

Please consider citing our work if you found this code or our paper useful in your research.

@inproceedings{Wallace2019Triggers,
  Author = {Eric Wallace and Shi Feng and Nikhil Kandpal and Matt Gardner and Sameer Singh},
  Booktitle = {Empirical Methods in Natural Language Processing},
  Year = {2019},
  Title = {Universal Adversarial Triggers for Attacking and Analyzing {NLP}}
}

Contributions and Contact

This code was developed by Eric Wallace; contact available at [email protected].

If you'd like to contribute code, feel free to open a pull request. If you find an issue with the code, please open an issue.

Comments
  • accuracy not dropping but trigger keeps changing

    Hi there, a little background on my project: I am currently building a benign/malware app classifier based on API sequences, which is quite similar to text classification (positive/negative).

    I am running the code based on sst.py. To prepare my dataset, I followed AllenNLP to create instances for the train and dev data. Everything seems fine when I run the main() function: training completes, but in the trigger part the "words" keep changing while the accuracy does not drop. Do you have any idea why this is happening? The same behaviour occurs with the different attacks (e.g., hotflip, nearest_neighbor_grad, etc.).

    Without Triggers: 0.9994070560332049
    Current Triggers: landroid/content/context;->sendbroadcast, landroid/content/context;->sendbroadcast, landroid/content/context;->sendbroadcast, landroid/content/context;->sendbroadcast, landroid/content/context;->sendbroadcast, landroid/content/context;->sendbroadcast, landroid/content/context;->sendbroadcast, : 0.9997035280166024
    Current Triggers: landroid/content/context;->sendbroadcast, landroid/os/bundle;->putsparseparcelablearray, landroid/app/activity;->databaselist, landroid/app/activity;->databaselist, landroid/app/activity;->databaselist, landroid/app/activity;->databaselist, landroid/os/bundle;->putsparseparcelablearray, : 0.9997035280166024
    Current Triggers: landroid/content/context;->sendbroadcast, landroid/content/intent;->replaceextras, landroid/content/intent;->replaceextras, landroid/content/intent;->replaceextras, landroid/content/intent;->replaceextras, landroid/content/intent;->replaceextras, landroid/content/intent;->replaceextras, : 0.9997035280166024
    Current Triggers: landroid/content/context;->sendbroadcast, landroid/content/res/assetmanager;->opennonassetfdnative, landroid/content/res/assetmanager;->opennonassetfdnative, landroid/content/res/assetmanager;->opennonassetfdnative, landroid/media/audiomanager;->adjuststreamvolume, landroid/content/res/assetmanager;->opennonassetfdnative, landroid/content/res/assetmanager;->opennonassetfdnative, : 0.9997035280166024
    Current Triggers: landroid/content/context;->sendbroadcast, ljava/lang/runtime;->runfinalization, ljava/lang/runtime;->runfinalization, ljava/lang/runtime;->runfinalization, lorg/apache/cordova/directorymanager;->gettempdirectorypath, landroid/hardware/sensormanager;->getsensorlist, ljava/lang/runtime;->runfinalization, : 0.9997035280166024
    Current Triggers: landroid/content/context;->sendbroadcast, landroid/content/intent;->getcomponent, landroid/content/intent;->getcomponent, landroid/content/intent;->getcomponent, landroid/content/intent;->getcomponent, landroid/os/environment;->isexternalstorageemulated, landroid/content/intent;->getcomponent, : 0.9997035280166024
    Current Triggers: landroid/app/activity;->databaselist, landroid/os/bundle;->getparcelablearraylist, landroid/os/bundle;->getparcelablearraylist, landroid/os/bundle;->getparcelablearraylist, landroid/accounts/accountmanager;->getauthtoken, landroid/location/locationmanager;->removeproximityalert, landroid/os/bundle;->getparcelablearraylist, : 0.9997035280166024
    Current Triggers: landroid/content/intent;->replaceextras, landroid/net/uri;->getfragment, landroid/net/uri;->getfragment, landroid/app/activitymanager;->killbackgroundprocesses, landroid/content/clipboardmanager;->getservice, landroid/os/bundle;->putparcelablearraylist, ljava/lang/system;->setsecuritymanager, : 0.9997035280166024
    Current Triggers: landroid/content/res/assetmanager;->opennonassetfdnative, ljava/net/urlconnection;->getfilenamemap, ljava/net/urlconnection;->getfilenamemap, lorg/apache/xerces/impl/xmlentitymanager;->isentitydeclinexternalsubset, landroid/content/clipboardmanager;->reportprimaryclipchanged, landroid/app/fragmentmanager;->begintransaction, landroid/net/uri;->getencodedpath, : 0.9997035280166024
    Current Triggers: ljava/lang/runtime;->runfinalization, landroid/net/uri;->tostring, lorg/apache/xerces/impl/xmlentitymanager;->closereaders, landroid/hardware/camera;->cancelautofocus, landroid/app/activitymanager;->getlocktaskmodestate, landroid/webkit/cookiesyncmanager;->resetsync, ljava/net/urlconnection;->getdooutput, : 0.9997035280166024
    Current Triggers: landroid/content/intent;->getcomponent, landroid/bluetooth/rfcommsocket;->waitforasyncconnectnative, landroid/app/activity;->finalize, landroid/hardware/sensor;->getreportingmode, landroid/content/intent;->setdataandtype, landroid/hardware/camera;->startsmoothzoom, lorg/apache/cordova/file/directorymanager;->getfreediskspace, : 0.9997035280166024

    opened by bowtiejicode 7
  • How to solve this question? Thank you

    When I run "python sst.py", this error always occurs. Please tell me how to solve it:

        from pytorch_transformers.tokenization_auto import AutoTokenizer
        ModuleNotFoundError: No module named 'pytorch_transformers.tokenization_auto'

    thank you @Eric-Wallace

    opened by Frank-LXR 6
  • getting best candidates (beam search)

    Apologies for multiple questions. In this function (https://github.com/Eric-Wallace/universal-triggers/blob/2e4bc9363ca547105ebdd9a09f37f6898a03f46a/utils.py#L119), maybe I am missing something, but shouldn't this line (https://github.com/Eric-Wallace/universal-triggers/blob/2e4bc9363ca547105ebdd9a09f37f6898a03f46a/utils.py#L138) go inside the first for loop? Thanks!

    opened by mehdimashayekhi 4
  • Loss thresholds for successful triggers on language models?

    Hi Eric! Thanks for sharing this work. I've implemented this in TensorFlow to use with a dupe of the 124M GPT-2 model, and I was wondering if you could provide some details on the range of final "best loss" numbers you were seeing with the smallest model and the triggers that worked (I'm working under the assumption that, with a vocab size of 50k, a cross-entropy of ~10.8 would be equivalent to "random"). My current process isn't producing triggers that are successfully adversarial, and I'm wondering if perhaps I'm just not finding very good triggers. Thanks!

    opened by mathemakitten 3
  • Error when running the squad script with up-to-date libraries

    When I run pip install -r requirements.txt with the default requirements.txt file, I get the following error:

        Traceback (most recent call last):
          File "squad/squad.py", line 2, in <module>
            from allennlp.data.dataset_readers.reading_comprehension.squad import SquadReader
          File "/opt/conda/lib/python3.6/site-packages/allennlp/data/__init__.py", line 1, in <module>
            from allennlp.data.dataset_readers.dataset_reader import DatasetReader
          File "/opt/conda/lib/python3.6/site-packages/allennlp/data/dataset_readers/__init__.py", line 10, in <module>
            from allennlp.data.dataset_readers.ccgbank import CcgBankDatasetReader
          File "/opt/conda/lib/python3.6/site-packages/allennlp/data/dataset_readers/ccgbank.py", line 9, in <module>
            from allennlp.data.dataset_readers.dataset_reader import DatasetReader
          File "/opt/conda/lib/python3.6/site-packages/allennlp/data/dataset_readers/dataset_reader.py", line 8, in <module>
            from allennlp.data.instance import Instance
          File "/opt/conda/lib/python3.6/site-packages/allennlp/data/instance.py", line 3, in <module>
            from allennlp.data.fields.field import DataArray, Field
          File "/opt/conda/lib/python3.6/site-packages/allennlp/data/fields/__init__.py", line 10, in <module>
            from allennlp.data.fields.knowledge_graph_field import KnowledgeGraphField
          File "/opt/conda/lib/python3.6/site-packages/allennlp/data/fields/knowledge_graph_field.py", line 14, in <module>
            from allennlp.data.token_indexers.token_indexer import TokenIndexer, TokenType
          File "/opt/conda/lib/python3.6/site-packages/allennlp/data/token_indexers/__init__.py", line 5, in <module>
            from allennlp.data.token_indexers.dep_label_indexer import DepLabelIndexer
          File "/opt/conda/lib/python3.6/site-packages/allennlp/data/token_indexers/dep_label_indexer.py", line 9, in <module>
            from allennlp.data.tokenizers.token import Token
          File "/opt/conda/lib/python3.6/site-packages/allennlp/data/tokenizers/__init__.py", line 8, in <module>
            from allennlp.data.tokenizers.pretrained_transformer_tokenizer import PretrainedTransformerTokenizer
          File "/opt/conda/lib/python3.6/site-packages/allennlp/data/tokenizers/pretrained_transformer_tokenizer.py", line 5, in <module>
            from pytorch_transformers.tokenization_auto import AutoTokenizer
        ModuleNotFoundError: No module named 'pytorch_transformers.tokenization_auto'

    So I installed the latest version of allennlp with pip install allennlp --upgrade and got the following versions: allennlp-0.9.0, pytorch-transformers-1.1.0.

    However, I still can't run the squad script because I get the following error:

        Traceback (most recent call last):
          File "squad.py", line 118, in <module>
            main()
          File "squad.py", line 19, in main
            model.eval().cuda()
          File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 265, in cuda
            return self._apply(lambda t: t.cuda(device))
          File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 193, in _apply
            module._apply(fn)
          File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 193, in _apply
            module._apply(fn)
          File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 127, in _apply
            self.flatten_parameters()
          File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 123, in flatten_parameters
            self.batch_first, bool(self.bidirectional))
        RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

    I am running the code on an AWS instance with CUDA and Python 3.6.9. I would appreciate it if you could help me figure out what I'm doing wrong.

    opened by taylorshin 3
  • Comparing model accuracies for SNLI

    When comparing the accuracy of esim-glove-snli with decomposable-attention, it seems like the accuracy for the second model is not computed correctly.

    Am I missing something here?

    Here's a Colab.

    # Load model and vocab
    model = load_archive('https://allennlp.s3-us-west-2.amazonaws.com/models/esim-glove-snli-2019.04.23.tar.gz').model
    model.eval().cuda()
    vocab = model.vocab
    
    # Get original accuracy before adding universal triggers
    utils.get_accuracy(model, subset_dev_dataset, vocab, trigger_token_ids=None, snli=True)
    
    100%|██████████| 53676932/53676932 [00:00<00:00, 62451888.15B/s]
    Without Triggers: 0.9095824571943527
    
    # Load model and vocab
    model2 = load_archive('https://s3-us-west-2.amazonaws.com/allennlp/models/decomposable-attention-2017.09.04.tar.gz').model
    model2.eval().cuda()
    vocab2 = model2.vocab
    
    # Get original accuracy before adding universal triggers
    utils.get_accuracy(model2, subset_dev_dataset, vocab2, trigger_token_ids=None, snli=True)
    
    100%|██████████| 38907176/38907176 [00:00<00:00, 77210263.22B/s]
    Did not use initialization regex that was passed: .*token_embedder_tokens\\\\._projection.*weight
    Without Triggers: 0.42805647341544006
    
    opened by daysm 3
  • A little confused about the training process

    For the snli task, if I sample the subset whose labels are 'entailment' and attack toward the same class (increase_loss=False), the accuracy should get higher, right? Because we minimize the loss between the input and the target 'entailment', which is the true label.

    That wasn't the case, though; the accuracy drops.

    opened by dwaydwaydway 3
  • How are “target_texts” generated?

    Thank you for your wonderful work! I'd like to ask how the "target_texts" are generated in "Conditional Text Generation". Is there a strategy for this step?

    opened by zhaishengfang 2
  • sign from nearest neighbor attack?

    Hi Wallace, thanks for sharing this work! As I try the different token-replacement strategies in attacks.py, I find it a little hard to understand why, for increase_loss = False, we apply e(t+1) = e(t) + g (gradient ascent), while for increase_loss = True, we apply e(t+1) = e(t) - g. In my understanding, gradient descent tries to minimize the loss function? I also tried flipping the sign in the original code, which gets better attack accuracy compared to the source code. Thanks!

    opened by Xinyu-ustc 2
  • get_best_candidates in snli.py

    Hi,

    In the snli.py file, I'm wondering why the get_best_candidates() function extracts the best candidate with the largest loss while, at the same time, the model extracts candidates that minimize the model's loss (the hotflip_attack() function with increase_loss=False)? Shouldn't the best candidate also minimize the loss? Thanks!

    opened by JieyuZhao 2
  • Questions regarding sentiment words

    Hello Eric, in your paper you wrote that a sentiment word list is used to filter the words to substitute, but I could not find the related code in the repository. Is the reason that it is not important for generating the trigger words?

    opened by dugu9sword 2