InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

Overview

This is the official code base for our ICLR 2021 paper:

"InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective".

Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu

Usage

Prepare your environment

Install the required packages

pip install -r requirements.txt

ANLI and TextFooler

To run the ANLI and TextFooler experiments, refer to the README in the ANLI directory; a sample invocation is shown below.
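
For reference, a typical invocation reported verbatim by users in the comments below (roberta-large on anli-full) looks like this; the meaning of each positional argument is documented in the ANLI README:

source setup.sh && runexp anli-full infobert roberta-large 2e-5 32 128 -1 1000 42 1e-5 5e-3 6 0.1 0 4e-2 8e-2 0 3 5e-3 0.5 0.9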

SQuAD

We will upload the code for the SQuAD experiments soon.

Citation

@inproceedings{wang2021infobert,
  title={InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective},
  author={Wang, Boxin and Wang, Shuohang and Cheng, Yu and Gan, Zhe and Jia, Ruoxi and Li, Bo and Liu, Jingjing},
  booktitle={International Conference on Learning Representations},
  year={2021}
}
Comments
  • Time taken to train the model from scratch

    Hi, this is more of a question than an issue. Approximately how long does it take to train the model from scratch on a single GPU with a batch size of 16?

    Thanks

    opened by ishanisri 3
  • Runtime Error - "Distributed package doesn't have NCCL" when running runexp

    When I run source setup.sh && runexp anli-full infobert roberta-large 2e-5 32 128 -1 1000 42 1e-5 5e-3 6 0.1 0 4e-2 8e-2 0 3 5e-3 0.5 0.9 as specified in the README in the ANLI directory, I encounter the error RuntimeError: Distributed package doesn't have NCCL built in.

    Do you have any advice on how to fix this? (See the workaround sketch after this issue.)

    Thank you! I really admire your work.

    opened by eaberman 2
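
    A hedged workaround sketch, not part of the InfoBERT code base: the error message says the installed PyTorch build was compiled without NCCL support, so one generic option is to initialize torch.distributed with the gloo backend (or to run run_anli.py on a single GPU without torch.distributed.launch). The snippet below only illustrates standard PyTorch backend selection and assumes the usual env:// variables are set.

    # Hypothetical fallback, not from the repo: use gloo when NCCL is unavailable.
    # Assumes MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE are set, e.g. by
    # torch.distributed.launch.
    import torch.distributed as dist

    backend = "nccl" if dist.is_nccl_available() else "gloo"
    dist.init_process_group(backend=backend, init_method="env://")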
  • How are A1,A2, A3 scores calculated from r1,r2,r3 test results?

    The evaluation results shown here in the README and the ones in the paper use different metrics. The paper reports A1, A2, A3 scores along with accuracy on adv-MNLI, adv-SNLI, etc., whereas running the evaluation here gives round-wise accuracy on both the test and dev data. Is there a formula to calculate the A1, A2, A3 scores from the test scores reported in the repo? (See the sketch after this issue.)

    Thanks in advance for the help :)

    opened by ishanisri 2
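
    A minimal sketch of one possible mapping, under the assumption (not confirmed by the repo) that A1, A2, A3 simply denote accuracy on the ANLI round-1/2/3 test splits; the per_round_accuracy helper below is hypothetical and just groups per-example correctness by round.

    # Hypothetical helper, not from the repo: per-round accuracy from
    # (round_id, is_correct) pairs, assuming A1/A2/A3 are round-wise accuracies.
    from collections import defaultdict

    def per_round_accuracy(examples):
        totals, correct = defaultdict(int), defaultdict(int)
        for round_id, is_correct in examples:
            totals[round_id] += 1
            correct[round_id] += int(is_correct)
        return {f"A{r}": correct[r] / totals[r] for r in sorted(totals)}

    # per_round_accuracy([(1, True), (1, False), (2, True)]) -> {'A1': 0.5, 'A2': 1.0}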
  • RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

    Hi, I'm trying to run the following command: source setup.sh && runexp anli-part infobert roberta-base 2e-5 32 128 -1 1000 42 1e-5 5e-3 6 0.1 0 4e-2 8e-2 0 3 5e-3 0.5 0.9. But I get the following error. Traceback:

    04/08/2022 19:30:17 - INFO - datasets.anli -   Saving features into cached file anli_data/cached_dev_RobertaTokenizer_128_anli-part [took 0.690 s]
    04/08/2022 19:30:17 - INFO - filelock -   Lock 139893720074960 released on anli_data/cached_dev_RobertaTokenizer_128_anli-part.lock
    04/08/2022 19:30:17 - INFO - local_robust_trainer -   You are instantiating a Trainer but W&B is not installed. To use wandb logging, run `pip install wandb; wandb login` see https://docs.wandb.com/huggingface.
    04/08/2022 19:30:17 - INFO - local_robust_trainer -   ***** Running training *****
    04/08/2022 19:30:17 - INFO - local_robust_trainer -     Num examples = 942069
    04/08/2022 19:30:17 - INFO - local_robust_trainer -     Num Epochs = 3
    04/08/2022 19:30:17 - INFO - local_robust_trainer -     Instantaneous batch size per device = 32
    04/08/2022 19:30:17 - INFO - local_robust_trainer -     Total train batch size (w. parallel, distributed & accumulation) = 32
    04/08/2022 19:30:17 - INFO - local_robust_trainer -     Gradient Accumulation steps = 1
    04/08/2022 19:30:17 - INFO - local_robust_trainer -     Total optimization steps = 88320
    Iteration:   0%|                                                                                                                                                                                                                                                                                   | 0/29440 [00:00<?, ?it/s]
    Epoch:   0%|                                                                                                                                                                                                                                                                                           | 0/3 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "./run_anli.py", line 395, in <module>
        main()
      File "./run_anli.py", line 239, in main
        model_path=model_args.model_name_or_path if os.path.isdir(model_args.model_name_or_path) else None
      File "/root/InfoBERT/ANLI/local_robust_trainer.py", line 731, in train
        full_loss, loss_dict = self._adv_training_step(model, inputs, optimizer)
      File "/root/InfoBERT/ANLI/local_robust_trainer.py", line 1031, in _adv_training_step
        outputs = model(**inputs)
      File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 447, in forward
        output = self.module(*inputs[0], **kwargs[0])
      File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/root/InfoBERT/ANLI/models/roberta.py", line 345, in forward
        inputs_embeds=inputs_embeds,
      File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/root/InfoBERT/ANLI/models/bert.py", line 822, in forward
        output_hidden_states=output_hidden_states,
      File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/root/InfoBERT/ANLI/models/bert.py", line 494, in forward
        output_attentions,
      File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/root/InfoBERT/ANLI/models/bert.py", line 416, in forward
        hidden_states, attention_mask, head_mask, output_attentions=output_attentions,
      File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/root/InfoBERT/ANLI/models/bert.py", line 347, in forward
        hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, output_attentions,
      File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/root/InfoBERT/ANLI/models/bert.py", line 239, in forward
        mixed_query_layer = self.query(hidden_states)
      File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 87, in forward
        return F.linear(input, self.weight, self.bias)
      File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/nn/functional.py", line 1372, in linear
        output = input.matmul(weight.t())
    RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
    Traceback (most recent call last):
      File "/root/miniconda3/envs/infobert/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/root/miniconda3/envs/infobert/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
        main()
      File "/root/miniconda3/envs/infobert/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
        cmd=cmd)
    subprocess.CalledProcessError: Command '['/root/miniconda3/envs/infobert/bin/python', '-u', './run_anli.py', '--local_rank=0', '--model_name_or_path', 'roberta-base', '--task_name', 'anli-part', '--do_train', '--do_eval', '--data_dir', 'anli_data', '--max_seq_length', '128', '--per_device_train_batch_size', '32', '--learning_rate', '2e-5', '--max_steps', '-1', '--warmup_steps', '1000', '--weight_decay', '1e-5', '--seed', '42', '--beta', '5e-3', '--logging_dir', 'infobert-roberta-base-anli-part-sl128-lr2e-5-bs32-ts-1-ws1000-wd1e-5-seed42-beta5e-3-alpha5e-3--cl0.5-ch0.9-alr4e-2-amag8e-2-anm0-as3-hdp0.1-adp0-version6', '--output_dir', 'infobert-roberta-base-anli-part-sl128-lr2e-5-bs32-ts-1-ws1000-wd1e-5-seed42-beta5e-3-alpha5e-3--cl0.5-ch0.9-alr4e-2-amag8e-2-anm0-as3-hdp0.1-adp0-version6', '--version', '6', '--evaluate_during_training', '--logging_steps', '500', '--save_steps', '500', '--hidden_dropout_prob', '0.1', '--attention_probs_dropout_prob', '0', '--overwrite_output_dir', '--adv_lr', '4e-2', '--adv_init_mag', '8e-2', '--adv_max_norm', '0', '--adv_steps', '3', '--alpha', '5e-3', '--cl', '0.5', '--ch', '0.9']' returned non-zero exit status 1.
    

    Do you know how to fix this? Thank you so much. (See the diagnostic sketch after this issue.)

    Other Information:

    • OS: Ubuntu 20.04.3 LTS
    • GPU: NVIDIA A100
    • Python 3.7.13
    opened by dangne 1
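
    A hedged diagnostic sketch, not from the repo: a CUBLAS_STATUS_EXECUTION_FAILED error on an A100 is often a sign that the installed PyTorch/CUDA build does not include kernels for the GPU's compute capability (sm_80), so checking what the build supports is a reasonable first step. The snippet below only uses standard PyTorch introspection calls.

    # Hypothetical check, not from the InfoBERT code base.
    import torch

    print("torch", torch.__version__, "cuda", torch.version.cuda)
    print("device capability:", torch.cuda.get_device_capability(0))  # (8, 0) on an A100
    if hasattr(torch.cuda, "get_arch_list"):  # available in newer PyTorch releases
        print("compiled architectures:", torch.cuda.get_arch_list())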
  • Training a Model from Huggingface with InfoBERT

    Hi, I am attempting to use an ALBERT model, e.g. albert-base-v2 from https://huggingface.co/albert-base-v2, passing albert-base-v2 as the model argument when I run source setup.sh && runexp.

    Unfortunately, I am getting an AttributeError: 'AlbertForSequenceClassification' object has no attribute 'clear_mask'.

    Are there any functions I need to implement for this to work, or do you have any advice on getting an approach like this to work with your code base? (See the sketch after this issue.)

    Thank you!

    opened by eaberman 1
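
    A hypothetical sketch, not the authors' method: the error shows that the code calls clear_mask() on the model, a hook presumably defined on the modified BERT/RoBERTa classes under ANLI/models but absent from stock Hugging Face models such as AlbertForSequenceClassification. One way to experiment is to subclass ALBERT with a stub; whether a no-op matches the intended mask-resetting behavior is an assumption that should be checked against ANLI/models/bert.py.

    # Hypothetical stub, not from the repo; mirror the real clear_mask behavior
    # from the modified models under ANLI/models before relying on this.
    from transformers import AlbertForSequenceClassification

    class AlbertForSequenceClassificationWithMask(AlbertForSequenceClassification):
        def clear_mask(self):
            pass  # placeholder for the mask-resetting hook the trainer expects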
Owner
AI Secure (UIUC Secure Learning Lab)