Contextualized Perturbation for Textual Adversarial Attack
Introduction
This is a PyTorch implementation of Contextualized Perturbation for Textual Adversarial Attack by Dianqi Li, Yizhe Zhang, Hao Peng, Liqun Chen, Chris Brockett, Ming-Ting Sun and Bill Dolan, NAACL 2021.
A third-party implementation of CLARE is available in TextAttack.
Environment
The code is based on Python 3.6, TensorFlow 1.14, and PyTorch 1.4.0. It was developed and tested on a single NVIDIA GTX 1080 Ti.
Please use Conda to set up your environment, then run:
conda install -y pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.1 -c pytorch
bash install_requirement.sh
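If you prefer to start from a fresh Conda environment first, something like the following should work (the environment name clare below is only a placeholder):
conda create -n clare python=3.6
conda activate clare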
Data Preparation and Pretrained Classifier
You can download the pretrained target classifiers and full training data here (coming soon). Alternatively, you can prepare your own training set in the same format as the example under /data/training_data/${dataset}/dataset/. The format looks like:
label | text1 | text2 |
---|---|---|
2 | At the end of 5 years ... | The healthcare agency will be able ... |
For single-sentence classification, the text2 field is left empty.
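As an illustration, below is a minimal sketch for writing such a training file in Python. The tab separator and the file name train.tsv are assumptions on our part (they are not specified above), so adjust them to match the example files shipped with the repository:

```python
# Minimal sketch: write training examples as (label, text1, text2) rows.
# Assumptions: tab-separated columns and the file name "train.tsv" are
# illustrative only; follow the example files under data/training_data/.
import csv
import os

rows = [
    ("2", "At the end of 5 years ...", "The healthcare agency will be able ..."),
    ("1", "A single-sentence example ...", ""),  # empty text2 for single-sentence tasks
]

out_dir = "data/training_data/example/dataset"  # here ${dataset} = "example"
os.makedirs(out_dir, exist_ok=True)
with open(os.path.join(out_dir, "train.tsv"), "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["label", "text1", "text2"])  # header row, if the loader expects one
    writer.writerows(rows)
```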
After this, please run:
python train_BERT_classifier.py --dataset ${dataset} --save_model
It will save the pretrained classifier under the directory /saved_model/${dataset}_uncased/. The default target classifier is bert; you can train other types by setting an extra argument, e.g. --target_model textcnn. Please check the arguments in config.py for more details.
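For example, combining the flags documented above, training a TextCNN target classifier would look like this (with ${dataset} replaced by your dataset name):
python train_BERT_classifier.py --dataset ${dataset} --save_model --target_model textcnn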
The text samples to be attacked are stored in /data/${dataset}.tsv, in the same format.
Textual Adversarial Attack
Simply run:
python bert_attack_classification.py --dataset ${dataset} --sample_file ${dataset}
and it will save the results under /adv_results/.
To attack the qnli dataset, please add the argument --attack_second, since we attack the longer sentence in two-sentence classification.
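For instance, assuming the qnli samples are stored in /data/qnli.tsv as described above, the attack command becomes:
python bert_attack_classification.py --dataset qnli --sample_file qnli --attack_second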
You can also modify the attack hyperparameters in hyper_parameters.py to adjust the trade-off between different aspects. Other details can be found in config.py.
To run the attack with the baseline textfooler:
python attack_classification.py --dataset ${dataset} --sample_file ${dataset}
Citing
If you find our work useful in your research, please consider citing:
@InProceedings{li2021contextualized,
title={Contextualized perturbation for textual adversarial attack},
author={Li, Dianqi and Zhang, Yizhe and Peng, Hao and Chen, Liqun and Brockett, Chris and Sun, Ming-Ting and Dolan, Bill},
booktitle={Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics},
year={2021}
}