Code for ACL'2021 paper WARP 🌀 Word-level Adversarial ReProgramming

Overview

🌀 WARP: Word-level Adversarial ReProgramming

This repository contains code for ACL'2021 Paper WARP: Word-level Adversarial ReProgramming.

WARP adds a few trainable embeddings around the input, which steer the masked language model to predict the sentiment of the sentence (here, the SST-2 task).

Transfer learning from pretrained language models has recently become the dominant approach for solving many NLP tasks. A common approach to multi-task transfer learning that maximizes parameter sharing is to train one or more task-specific layers on top of the language model.

In this paper, we present an alternative approach based on adversarial reprogramming, which extends earlier work on automatic prompt generation. Adversarial reprogramming attempts to learn task-specific word embeddings that, when concatenated to the input text, instruct the language model to solve the specified task.
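
As a minimal sketch of this idea (assuming HuggingFace transformers rather than the repository's AllenNLP setup; the model name, the number of prompts K, and the verbalizer choice are illustrative assumptions, not the repository's code):

# A sketch of the WARP idea: train a few prompt embeddings around the
# input while the masked language model itself stays frozen.
import torch
import torch.nn as nn
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "roberta-large"  # illustrative choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
mlm = AutoModelForMaskedLM.from_pretrained(model_name)
for p in mlm.parameters():
    p.requires_grad = False   # freeze the language model

K = 10                        # trainable prompts per side (assumed)
dim = mlm.config.hidden_size
before = nn.Parameter(torch.randn(K, dim) * 0.02)
after = nn.Parameter(torch.randn(K, dim) * 0.02)

def mask_logits(sentence):
    # Build "<s> [before] text <mask> [after] </s>" in embedding space.
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    embed = mlm.get_input_embeddings()
    text = embed(ids)  # (L, dim), includes <s> and </s>
    mask = embed(torch.tensor([tokenizer.mask_token_id]))
    seq = torch.cat([text[:1], before, text[1:-1], mask, after, text[-1:]])
    logits = mlm(inputs_embeds=seq.unsqueeze(0)).logits
    mask_pos = 1 + K + (len(ids) - 2)  # position of <mask>
    return logits[0, mask_pos]         # (vocab_size,)

# Training compares the mask logits of verbalizer tokens (e.g. " great"
# vs " terrible" for SST-2) to the gold label; gradients flow only into
# `before` and `after`.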

Using up to 25K trainable parameters per task, this approach outperforms all existing methods that use up to 25M trainable parameters on the public leaderboard of the GLUE benchmark. Our method, initialized with task-specific human-readable prompts, also works in a few-shot setting, outperforming GPT-3 on two SuperGLUE tasks after training on just 32 samples.
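
(For a sense of scale: with roberta-large's 1024-dimensional embeddings, the roughly 20 prompt tokens used in the configs below come to about 20K trainable parameters per task, plus the verbalizer.)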

Few-Shot Results

Set    Model              CB F1   CB Acc.   RTE Acc.
dev    GPT-3 Small        26.1    42.9      52.3
dev    GPT-3 Med          40.4    58.9      48.4
dev    GPT-3              57.2    82.1      72.9
dev    PET (ALBERT)       59.4    85.1      69.8
dev    iPET (ALBERT)      92.4    92.9      74.0
dev    WARPinit (ALBERT)  84.0    87.5      71.8
test   GPT-3              52.0    75.6      69.0
test   PET (ALBERT)       60.2    87.2      67.2
test   iPET (ALBERT)      79.9    88.8      70.8
test   WARPinit (ALBERT)  70.2    82.4      69.1
Results on the SuperGLUE benchmark. Test-set results are obtained from the SuperGLUE evaluation server. We only show systems trained in a comparable few-shot setup using 32 examples.

Setup

The code requires YerevaNN's internal version of allennlp:

git clone https://github.com/YerevaNN/allennlp
cd allennlp
git checkout warp
pip install .

Training

Linear Probing

for DATASET in 'cola' 'sst2' 'mrpc' 'qqp' 'stsb' 'mnli' 'rte' 'wnli' 'qnli'
do
    export HPARAMS='{
        "dataset": "'$DATASET'",
        "lr": 0.0001,
        "num_epochs": 20,
        "prompts": [],
        "reorder_optimized": false,
        "max_batch_size": 8,
        "max_tokens_sq": 262144, "on_logits":  false, "pooling_index":  null, "seed":  1}'
    python -m allennlp train \
    -s .aim/baseline-linear-${DATASET} configs/warp.jsonnet
done
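
allennlp evaluates warp.jsonnet with the process environment exposed as Jsonnet external variables, which is presumably how the HPARAMS JSON string reaches the config (e.g. via std.extVar("HPARAMS")).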

WARP_0

"], "reorder_optimized": true, "max_batch_size": 8, "max_tokens_sq": 262144, "on_logits": "pre_decoder_layer_norm", "pooling_index": 1, "seed": 1 }' python -m allennlp train \ -s .aim/baseline-warp_0-${DATASET} configs/warp.jsonnet done ">
for DATASET in 'cola' 'sst2' 'mrpc' 'qqp' 'stsb' 'mnli' 'rte' 'wnli' 'qnli'
do
    export HPARAMS='{
        "dataset": "'$DATASET'",
        "lr": 0.0001,
        "num_epochs": 20,
        "prompts": [null, "
   
    "],
   
        "reorder_optimized": true,
        "max_batch_size": 8,
        "max_tokens_sq": 262144,
        "on_logits": "pre_decoder_layer_norm",
        "pooling_index": 1,
        "seed": 1
    }'
    python -m allennlp train \
    -s .aim/baseline-warp_0-${DATASET} configs/warp.jsonnet
done
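
Here "prompts": [null, "<mask>"] inserts no trainable prompt embeddings at all: null marks where the input text goes, followed by a literal mask token. As we read the config, only the head applied at the mask position is trained, hence the name WARP_0.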

Training WARP

", "prompts":[-10,-11,-12,-13,-14,null,-15,-16,-17,-18,-19," ",-20,-21,-22,-23,-24,null,-25,-26,-27,-28,-29], "seed":1, "transformer_model":"roberta-large" }' python -m allennlp train \ -s .aim/t-${DATASET} configs/warp.jsonnet ">
export DATASET="rte"
export HPARAMS='{
    "benchmark":"super_glue",
    "classifier_init":null,
    "dataset":"'$DATASET'",
    "ensure_whitespace_between":false,
    "lr":0.001,
    "max_batch_size":8,
    "max_tokens_sq":262144,
    "num_epochs":30,
    "prompt_better_init":"
    
     ",
    
    "prompts":[-10,-11,-12,-13,-14,null,-15,-16,-17,-18,-19,"
    
     ",-20,-21,-22,-23,-24,null,-25,-26,-27,-28,-29],
    
    "seed":1,
    "transformer_model":"roberta-large"
}'
python -m allennlp train \
-s .aim/t-${DATASET} configs/warp.jsonnet
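
The "prompts" value is a template: negative integers appear to be ids of trainable prompt embeddings, null marks a slot for an input sentence (two nulls for sentence-pair tasks such as RTE), and strings such as "<mask>" are literal tokens; pairs like [-16, "?"] in the few-shot configs below seem to initialize a trainable prompt from a token's embedding. The hypothetical helper below only illustrates this reading of the format and is not code from the repository.

# Hypothetical illustration of how we read the "prompts" template;
# this is not the repository's parser.
def describe_template(template):
    parts = []
    for item in template:
        if item is None:
            parts.append("<input sentence>")
        elif isinstance(item, int):
            parts.append(f"<prompt {item}>")
        elif isinstance(item, list):
            # e.g. [-16, "?"]: a prompt initialized from a token embedding
            parts.append(f"<prompt {item[0]} ~ {item[1]!r}>")
        else:
            parts.append(item)  # a literal token such as "<mask>"
    return " ".join(parts)

template = [-10, -11, -12, -13, -14, None, -15, -16, -17, -18, -19,
            "<mask>", -20, -21, -22, -23, -24, None, -25, -26, -27, -28, -29]
print(describe_template(template))
# five prompts, first sentence, five prompts, <mask>,
# five prompts, second sentence, five prompts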

WARP_init

Few-Shot Experiments

", [-20, ","], null, [-29, "!"],-30,-31], "seed":3, "str_cut_frac":0, "transformer_model":"albert-xxlarge-v2", "validation_metric": null }' python -m allennlp train \ -s .aim/t-${DATASET}-`date +%s` configs/warp.jsonnet ">
export HPARAMS='{
    "benchmark":"super_glue",
    "classifier_init": {
        "entailment": " yes",
        "not_entailment": " instead"
    },
    "dataset":"few_rte",
    "eval_mode":false,
    "lr":0.001,
    "max_batch_size":2,
    "max_tokens_sq":131072,
    "num_epochs":100,
    "num_gradient_accumulation_steps":2,
    "prompt_better_init": "[PAD]",
    "prompts":[-10,-11,[-14,"\""],null,[-15,"\""],  [-16, "?"], "
   
    ", [-20, ","], null, [-29, "!"],-30,-31],
   
    "seed":3,
    "str_cut_frac":0,
    "transformer_model":"albert-xxlarge-v2",
    "validation_metric": null
}'
python -m allennlp train \
-s .aim/t-${DATASET}-`date +%s` configs/warp.jsonnet
",[-20,","],null,[-29,"!"],-30,-31], "seed":1, "str_cut_frac":0.06, "transformer_model":"albert-xxlarge-v2", "validation_metric":"+training_val_metric" }' python -m allennlp train \ -s .aim/t-${DATASET}-`date +%s` configs/warp.jsonnet ">
export HPARAMS='{
   "benchmark":"super_glue",
   "classifier_init":{
      "entailment":" yes",
      "not_entailment":" instead"
   },
   "dataset":"few_rte",
   "grad_norm":1,
   "lr":0.001,
   "max_batch_size":2,
   "max_tokens_sq":131072,
   "num_epochs":30,
   "num_gradient_accumulation_steps":2,
   "prompt_better_init":"[PAD]",
   "prompts":[-10,-11,[-14,"\""],null,[-15,"\""],[-16,"?"],"
   
    ",[-20,","],null,[-29,"!"],-30,-31],
   
   "seed":1,
   "str_cut_frac":0.06,
   "transformer_model":"albert-xxlarge-v2",
   "validation_metric":"+training_val_metric"
}'
python -m allennlp train \
-s .aim/t-${DATASET}-`date +%s` configs/warp.jsonnet
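
The two few-shot runs differ mainly in model selection: the first ("validation_metric": null, 100 epochs, "eval_mode": false) trains without early stopping, while the second holds out part of the 32 training examples ("str_cut_frac": 0.06, presumably a stratified cut fraction) and selects checkpoints by "+training_val_metric".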

Evaluation

python -m allennlp predict \
  --silent --use-dataset-reader --cuda-device 0 \
  --batch-size 50 \
  --predictor glue --output-file v0.1/AX.tsv /data/arp/.aim/H-93ae5ae9 ax/test
python -m allennlp predict \
  --silent --use-dataset-reader --cuda-device 0 \
  --batch-size 50 \
  --predictor glue --output-file v0.1/MNLI-m.tsv /data/arp/.aim/H-93ae5ae9 test_matched
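
In both commands the positional arguments are the serialization directory of a trained run (here /data/arp/.aim/H-93ae5ae9) and the dataset split to predict on; the output .tsv files (AX.tsv, MNLI-m.tsv) are named to match the GLUE submission format.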

Citation

If you want to refer to our work, use the following BibTeX entry:

@inproceedings{hambardzumyan-etal-2021-warp,
    title = "{WARP}: {W}ord-level {A}dversarial {R}e{P}rogramming",
    author = "Hambardzumyan, Karen  and
      Khachatrian, Hrant  and
      May, Jonathan",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.381",
    doi = "10.18653/v1/2021.acl-long.381",
    pages = "4921--4933"
}
Comments
  • bug

    Hi, thanks for your nice work. However, I ran into a problem when running the code following the README.

    allennlp.common.checks.ConfigurationError: huggingface not in acceptable choices for dataset_reader.type: ['conll2003', 'interleaving', 'sequence_tagging', 'sharded', 'babi', 'text_classification_json']. You should either use the --include-package flag to make sure the correct module is loaded, or use a fully qualified class name in your config file like {"model": "my_module.models.MyModel"} to have it imported automatically.
    

    I have not used allennlp before; could you help me?

    opened by Wangpeiyi9979 7
  • couldn't read from warp.jsonnet

    File "/home/hanlin/warp/allennlp/allennlp/common/params.py", line 488, in from_file file_dict = json.loads(evaluate_file(params_file, ext_vars=ext_vars)) File "/home/hanlin/warp/allennlp/allennlp/commands/train.py", line 169, in train_model_from_file params = Params.from_file(parameter_filename, overrides) File "/home/hanlin/warp/allennlp/allennlp/commands/train.py", line 119, in train_model_from_args file_friendly_logging=args.file_friendly_logging, File "/home/hanlin/warp/allennlp/allennlp/commands/init.py", line 119, in main args.func(args) File "/home/hanlin/warp/allennlp/allennlp/main.py", line 34, in run main(prog="allennlp") File "/home/hanlin/warp/allennlp/allennlp/main.py", line 38, in run()

    opened by hlzhang109 4
  • No module named 'aim.sdk.session'

    Hi, thanks for your nice work. However, when I ran the linear probing loop you provide in the README, I encountered the following problem: ModuleNotFoundError: No module named 'aim.sdk.session'. I checked the installed package version (aim 3.0.2) and the functions in aim.sdk, and did not find 'session'. Did I get the wrong aim version?

    opened by jianzhengming 3
  • no output during evaluation

    Hi, thanks for your work on the paper and code. I can train the model smoothly. However, when I run

    python -m allennlp predict \
      --silent --use-dataset-reader --cuda-device 0 \
      --batch-size 50 \
      --predictor glue --output-file v0.1/MNLI-m.tsv /data/arp/.aim/H-93ae5ae9 test_matched
    
    

    I'm confused about the parameters on the last line, as well as the data organization, because the datasets were downloaded automatically to /home/dqx/.cache/huggingface/datasets/glue/mnli/train.arrow, but I'm not clear about the data in tsv format. Is it downloaded manually and just stored under WARP/? And how can I evaluate the model on my own test data? If I upload rte.dev as the test data to WARP/ and run the following command

    python -m allennlp predict \
      --silent --use-dataset-reader --cuda-device 0 \
      --batch-size 50 \
      --predictor super_glue --output-file test_output.tsv .aim/t-rte  dev.tsv
    

    the terminal output will be a list like "[...fixed_cross_validation_79', 'few_wsc.fixed_cross_validation_80', 'few_wsc.fixed_cross_validation_81', 'few_wsc.fixed_cross_validation_82', 'few_wsc.fixed_cross_validation_83', 'few_wsc.fixed_cross_validation_84', 'few_wsc.fixed_cross_validation_85', 'few_wsc.fixed_cross_validation_86', 'few_wsc.fixed_cross_validation_87', 'few_wsc.fixed_cross_validation_88', 'few_wsc.fixed_cross_validation_89', 'few_wsc.fixed_cross_validation_90', 'few_wsc.fixed_cross_validation_91', 'few_wsc.fixed_cross_validation_92', 'few_wsc.fixed_cross_validation_93', 'few_wsc.fixed_cross_validation_94', 'few_wsc.fixed_cross_validation_95', 'few_wsc.fixed_cross_validation_96', 'few_wsc.fixed_cross_validation_97', 'few_wsc.fixed_cross_validation_98', 'few_wsc.fixed_cross_validation_99', 'few_wsc.fixed_cross_validation_100']" but there's nothing in test_output.tsv.

    I'm really confused about this. Looking forward to your response, thanks 😊✨

    opened by dqxiu 2
  • WARP prompt/mask token position

    Hi,

    Thank you for your work and for releasing the code! I just have a few questions regarding the ordering of the tokens.

    From the script that you shared for training WARP, the prompt should be in the format of [-10,-11,-12,-13,-14,null,-15,-16,-17,-18,-19,[MASK],-20,-21,-22,-23,-24,null,-25,-26,-27,-28,-29]. If I understood correctly, the hypothesis and the premise should be placed in the null position and the other tokens should be considered as the prompt tokens. However, as I looked through the code I found that the input is given as [[MASK], -10,-11,-12,-13,-14,-15,-16,-17,-18,-19,-20,-21,-22,-23,-24,-25,-26,-27,-28,-29, null, null] with the mask token at the first position and the hypothesis and the premise at the end of the prompt. Did I understand the code correctly? Could you clarify the positions of the prompt tokens?

    Thanks again! 😄

    opened by heyjoonkim 2
  • Add missing datasets

    This PR adds the GLUE and SuperGLUE datasets in HuggingFace's datasets format. SuperGLUE also features FewGLUE, a few-shot version consisting of 32 examples.

    opened by tmynn 1
  • Fix wording

    Those "with"s were like a Winograd schema challenge. Let me know if I got the interpretation right.

    (Look at only the second commit to see the diff of the wording changes without the formatting changes)

    opened by bittlingmayer 1
  • The detail about manual prompts

    Hi,

    Thank you for your work and for releasing the code! After reading your paper, I am confused about the manual prompts.

    """ In addition to the regular models where we initialize with [MASK] tokens, we performed a run on the GLUE datasets with the same prompt [CLS] "S1"? [MASK]. "S2"! [SEP] for all the tasks """ In the manual prompts, I want to know where to insert the prompts. Wouldn't the original WARP also have [CLS], [SEP] and [MASK] special tokens? What is the difference between WARPinit and WARP8 in the insertion position of prompts?

    I don't know much about this field; thank you very much for answering my question.

    opened by sinan106 2
  • Position Id

    Hi, thanks for your nice work. Reading the source code, I have a simple question about the position ids used in the code, as follows:

    parameters['position_ids'][0]
    
    tensor([ 2, 47,  3,  4,  5,  6,  7, 42, 43, 44, 45, 46, 48, 49, 50, 51, 52, 89,
            90, 91, 92, 93,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
            22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
            40, 41, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68,
            69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86,
            87, 88, 94,  1,  1,  1], device='cuda:0')
    

    I find that the position ids are not ordered. What are the benefits of such position ids?

    opened by Wangpeiyi9979 1