[EMNLP 2021] Improving and Simplifying Pattern Exploiting Training

Overview

ADAPET

This repository contains the official code for the paper: "Improving and Simplifying Pattern Exploiting Training".

The model improves and simplifies PET (Pattern Exploiting Training) with a decoupled label objective and a label-conditioned masked language modelling (MLM) objective.

Model

(Figure: the two ADAPET objectives, Decoupled Label Loss and Label-Conditioned Masked Language Modelling.)

Updates

  • [November 2021] You can now run ADAPET on your own dataset! See the instructions in the "Train your own ADAPET" section below.

Setup

Set up the environment by running source bin/init.sh. This will:

  • Download the FewGLUE and SuperGLUE datasets into data/fewglue/{task} and data/superglue/{task}, respectively.
  • Install the required dependencies and set up the environment.

Training

First, create a config JSON file with the necessary hyperparameters. For reference, please see config/BoolQ.json.
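
If you prefer to derive a config programmatically, here is a minimal sketch. It assumes only that config/BoolQ.json exists; pattern and pretrained_weight are config keys referenced elsewhere in this README, and the output filename is hypothetical.

import json

# Start from the reference config and override selected fields; all other
# hyperparameters are taken from config/BoolQ.json as-is.
with open("config/BoolQ.json") as f:
    config = json.load(f)

config["pretrained_weight"] = "albert-xxlarge-v2"  # model used in the paper
with open("config/MyTask.json", "w") as f:         # hypothetical config name
    json.dump(config, f, indent=4)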

Then, to train the model, run the following commands:

sh bin/setup.sh
sh bin/train.sh {config_file}

The output will be in the experiment directory exp_out/fewglue/{task_name}/albert-xxlarge-v2/{timestamp}/. Once the model has been trained, the following files can be found in the directory:

exp_out/fewglue/{task_name}/albert-xxlarge-v2/{timestamp}/
    |
    |__ best_model.pt
    |__ dev_scores.json
    |__ config.json
    |__ dev_logits.npy
    |__ src
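
To inspect the dev scores programmatically, a small sketch (an assumption: dev_scores.json may hold either a single JSON object or one record per evaluation, so the snippet handles both):

import json
import os

run_dir = "exp_out/fewglue/{task_name}/albert-xxlarge-v2/{timestamp}"  # fill in
with open(os.path.join(run_dir, "dev_scores.json")) as f:
    text = f.read()
try:
    print(json.loads(text))             # a single JSON object...
except json.JSONDecodeError:
    for line in text.splitlines():      # ...or one record per evaluation
        print(json.loads(line))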

To aid reproducibility, we provide the JSON files to replicate the paper's results at config/{task_name}.json.

Evaluation

To evaluate the model on the SuperGLUE dev set, run the following command:

sh bin/dev.sh exp_out/fewglue/{task_name}/albert-xxlarge-v2/{timestamp}/

The dev scores can be found in exp_out/fewglue/{task_name}/albert-xxlarge-v2/{timestamp}/dev_scores.json.

To evaluate the model on the SuperGLUE test set, run the following command:

sh bin/test.sh exp_out/fewglue/{task_name}/albert-xxlarge-v2/{timestamp}/

The generated predictions can be found in exp_out/fewglue/{task_name}/albert-xxlarge-v2/{timestamp}/test.json.

Train your own ADAPET

  • Set up your dataset in the data folder as:
data/{dataset_name}/
    |
    |__ train.jsonl
    |__ val.jsonl
    |__ test.jsonl

Each .jsonl file contains one JSON dictionary per line. Each dictionary should have the following format (a minimal writer sketch follows the spec):

{
    "TEXT1": (insert text), 
    "TEXT2": (insert text), 
    "TEXT3": (insert text), 
    ..., 
    "TEXTN": (insert text), 
    "LBL": (insert label)
}
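
A minimal sketch of producing train.jsonl in this format (the texts, labels, and dataset name below are made up for illustration):

import json
import os

examples = [
    {"TEXT1": "The movie was fantastic.", "LBL": "positive"},
    {"TEXT1": "I wasted two hours of my life.", "LBL": "negative"},
]
os.makedirs("data/my_dataset", exist_ok=True)      # hypothetical dataset name
with open("data/my_dataset/train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
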
  • Run the experiment:
python cli.py --data_dir data/{dataset_name} \
              --pattern '(INSERT PATTERN)' \
              --dict_verbalizer '{"lbl_1": "verbalizer_1", "lbl_2": "verbalizer_2"}'

Here, INSERT PATTERN is built from the placeholders [TEXT1], [TEXT2], [TEXT3], ..., [LBL]. For example, if the new dataset has two text inputs and one label, a sample pattern would be [TEXT1] and [TEXT2] imply [LBL].
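
As a toy illustration (plain Python, not the repo's reader code), this is roughly how a pattern and verbalizer combine: the [TEXTi] slots are filled from the example, and the [LBL] slot becomes the masked position that the model scores against the verbalizer words.

pattern = "[TEXT1] and [TEXT2] imply [LBL]"
example = {"TEXT1": "It is raining", "TEXT2": "the ground is wet"}
dict_verbalizer = {"true": "yes", "false": "no"}   # hypothetical verbalizer

filled = pattern
for key, text in example.items():
    filled = filled.replace(f"[{key}]", text)
print(filled.replace("[LBL]", "[MASK]"))
# -> It is raining and the ground is wet imply [MASK]
# The model then scores "yes" vs. "no" at the [MASK] position.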

Fine-tuned Models

Our fine-tuned models can be found at this link.

To evaluate these fine-tuned models for different tasks, run the following command:

python src/run_pretrained.py -m {finetuned_model_dir}/{task_name} -c config/{task_name}.json -k pattern={best_pattern_for_task}

The scores can be found in exp_out/fewglue/{task_name}/albert-xxlarge-v2/{timestamp}/dev_scores.json. Note: The best_pattern_for_task can be found in Table 4 of the paper.

Contact

For any doubts or questions regarding the work, please contact Derek ([email protected]) or Rakesh ([email protected]). For any bug or issues with the code, feel free to open a GitHub issue or pull request.

Citation

Please cite us if ADAPET is useful in your work:

@inproceedings{tam2021improving,
    title={Improving and Simplifying Pattern Exploiting Training},
    author={Tam, Derek and Menon, Rakesh R and Bansal, Mohit and Srivastava, Shashank and Raffel, Colin},
    booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
    year={2021}
}
Comments
  • ValueError: could not broadcast input array from shape (4,) into shape (1,)

    Thanks for the very interesting paper and code. I'm trying to train ADAPET on custom data (in Google Colab). I've prepared the data in the format indicated in the documentation. My task is binary sentiment classification, "positive" vs. "negative".

    !python cli.py --data_dir data/sentiment-news-econ \
                   --pattern 'The quote: "[TEXT1]". The quote is overall [LBL]' \
                   --dict_verbalizer '{"positive": "verbalizer_1", "negative": "verbalizer_2"}'
    

    I get the following error:

    Traceback (most recent call last):
      File "cli.py", line 54, in <module>
        train(config)
      File "/content/drive/MyDrive/Colab Notebooks/Few-shot-experiments/ADAPET/src/train.py", line 93, in train
        loss, dict_val_update = model(sup_batch)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/content/drive/MyDrive/Colab Notebooks/Few-shot-experiments/ADAPET/src/adapet.py", line 213, in forward
        pet_disc_loss = self.get_decoupled_label_loss(batch)
      File "/content/drive/MyDrive/Colab Notebooks/Few-shot-experiments/ADAPET/src/adapet.py", line 178, in get_decoupled_label_loss
        lbl_logits = self.get_single_logits(pet_mask_ids, mask_idx, list_lbl) # [bs, num_lbl]
      File "/content/drive/MyDrive/Colab Notebooks/Few-shot-experiments/ADAPET/src/adapet.py", line 70, in get_single_logits
        lbl_ids[i, :len(i_lbl_ids)] = i_lbl_ids
    ValueError: could not broadcast input array from shape (4,) into shape (1,)
    

    The error seems to come from the following part of the source code, where the token IDs for the labels are retrieved: https://github.com/rrmenon10/ADAPET/blob/6d2bfef5405ca53112235021d9fb8b36f641be9d/src/adapet.py#L70

    I couldn't figure out what is causing it, though :/
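
    A minimal numpy reproduction of the same broadcast failure (my reading of the stack trace, not verified against the repo): the label-ID buffer reserves room for a single token per label, while the verbalizer tokenizes into four sub-word tokens.

    import numpy as np

    lbl_ids = np.zeros((2, 1), dtype=int)   # room for 1 token per label (assumption)
    i_lbl_ids = np.array([10, 11, 12, 13])  # a verbalizer that splits into 4 tokens
    lbl_ids[0, :len(i_lbl_ids)] = i_lbl_ids
    # ValueError: could not broadcast input array from shape (4,) into shape (1,)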

    opened by MoritzLaurer 12
  • Train your own data

    Hi,

    I want to use ADAPET on my own data. So, I was trying to follow your recommendation with a toy dataset. I took a small subset of data/BoolQ/train.jsonl and removed the "idx" key, so it looks like my own data.

    So, the file lines look like this:

    {"question": "is ghost in the shell based on the anime", "passage": "Ghost in the Shell -- Animation studio Production I.G has produced ....", "label": false}
    ...

    I used the command you provided in the README.md as follows:

    python cli.py --data_dir $data_dir \
                  --pattern '"[TEXT1]" has an answer in "[TEXT2]"? "[LBL]"' \
                  --dict_verbalizer '{"true": "yes", "false": "no"}'

    and that command throws this error:

    Traceback (most recent call last):
      File "cli.py", line 52, in <module>
        train(config)
      File "~/ADAPET/src/train.py", line 59, in train
        batcher = Batcher(config, tokenizer, config.dataset)
      File "~/ADAPET/src/data/Batcher.py", line 21, in __init__
        self.dataset_reader = DatasetReader(config, tokenizer, dataset)
      File "~/ADAPET/src/data/DatasetReader.py", line 44, in __init__
        self.dataset_reader = GenericReader(self.config, tokenizer)
      File "~/ADAPET/src/data/GenericReader.py", line 24, in __init__
        self.check_pattern(self.config.pattern)
      File "~/ADAPET/src/data/GenericReader.py", line 45, in check_pattern
        raise ValueError("Need at least one text ")
    ValueError: Need at least one text

    I would highly appreciate guiding me on what I am doing wrong.

    Thank you!

    opened by Afnan-Sultan 9
  • Training ADAPET on data too large to fit in RAM.

    I am training an ADAPET model on 800 GB of text. However, I cannot load this data into a single variable, since I don't have nearly enough RAM. Is there any way to keep loading the data chunk by chunk and training on it bit by bit? Online learning of some sort, maybe? How can I do this with an ADAPET model?
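
    A generic Python sketch of streaming the file lazily (plain jsonl iteration, not ADAPET's Batcher API; the path is hypothetical):

    import json

    def iter_jsonl(path):
        """Yield one example at a time so the full file never sits in RAM."""
        with open(path) as f:
            for line in f:
                yield json.loads(line)

    for example in iter_jsonl("data/my_dataset/train.jsonl"):
        pass  # feed each example (or a buffered chunk) to the training loop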

    opened by ghost 7
  • evaluate finetuned models

    I am trying to evaluate your fine-tuned model on the BoolQ task. I am using this command:

    python src/run_pretrained.py -m {finetuned_model_dir}/{task_name} -c config/{task_name}.json -k pattern={best_pattern_for_task}

    when trying to run it I get the following warning:

    Token indices sequence length is longer than the specified maximum sequence length for this model (737 > 512). Running this sequence through the model will result in indexing errors

    I want to know why this is happening, and how I can fix it to get the evaluation results.

    Note: I tried with another task (CB), and it is working fine.

    opened by Wafaa014 4
  • Empty batch for WSC

    For WSC, the Batcher sometimes produces batches of size 0. Do you have any idea why that is? In particular, on this line in WSCReader.py, I sometimes find that len(list_input_ids) == 0.

    I don't think this happens with the original FewGLUE WSC data, but it does happen for me on other versions of FewGLUE that I have sampled myself, for example the WSC train file below: train.txt

    I have made some additional small changes on top of the ADAPET repo, so it's possible that those changes are causing the issue, but I just wanted to check if you knew if something in the original ADAPET repo that could be causing the issue.

    opened by ethanjperez 4
  • Multi-Task Multi-Pattern Training example

    Hi,

    In your paper, you tried training the model with multiple patterns at once (C.2). My question is: how can I do that with your code if I want to train a personalised ADAPET?

    Thanks!

    opened by xruifan 3
  • How to evaluate on the test data for a novel dataset?

    Thanks for the great repo! two quick questions I am hoping you can help me with.

    1. How do I evaluate on the test data for a novel dataset?
    2. I do not believe multilabel classification is supported, is that correct?

    Thanks so much!

    opened by LuketheDukeBates 3
  • src/dev.py error loading model

    For the model checkpoints I've saved, I'm getting an error when loading them with src/dev.py: the model_state_dict field doesn't appear to exist in the saved *.pt checkpoint file. The issue is caused by the line below (the line number may differ in this repo):

    Traceback (most recent call last):
      File "src/dev.py", line 35, in <module>
        model.load_state_dict(torch.load(os.path.join(args.exp_dir, checkpoint))["model_state_dict"])
    KeyError: 'model_state_dict'
    

    The model loads successfully if I remove ["model_state_dict"], but I just wanted to check that that's the correct fix here.
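
    A tolerant loading sketch that handles both checkpoint layouts (an assumption, not necessarily the intended fix; model and checkpoint_path are placeholders):

    import torch

    ckpt = torch.load(checkpoint_path, map_location="cpu")
    # Raw state dicts lack the "model_state_dict" wrapper, so fall back to ckpt.
    state_dict = ckpt.get("model_state_dict", ckpt)
    model.load_state_dict(state_dict)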

    opened by ethanjperez 3
  • Which patterns are used?

    Does the code automatically use the single pattern that performed best on dev (used to report results in Table 1)? Also, were those patterns the ones used for the SuperGLUE test evaluation (Table 2)?

    opened by ethanjperez 3
  • Batch number for sPET and ADAPET

    Hi,

    In your paper you trained the sPET and ADAPET models using the same experimental setup; I wonder how you set sPET's batch number. Setting the batch number for ADAPET is straightforward, but if I remember right, PET's code has no flag for it; instead, it has flags for the number of train examples and the train batch size, along with some settings for the number and batch size of unlabelled examples.

    Fan.

    opened by xruifan 2
  • Train ADAPET with other pretrained models

    Hi,

    I am trying to train ADAPET with my own data. If I want to use other pretrained language models, how do I do that? Can I just change the pretrained_weight argument in the cli.py file?

    Thanks.

    opened by xruifan 2
  • Training without validation set

    Hi,

    Thanks for sharing this work. I am wondering if there is a simple way to run this code without a validation set. Also, if I run ADAPET on a generic dataset, do you recommend doing a hyperparameter search?

    Thanks.

    opened by dhkhey 1
  • Fix bug on COPA evaluation

    The COPA training process fails with the following exception.

    Traceback (most recent call last):
      File "/opt/conda/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/opt/conda/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/workspace/adapet/src/train.py", line 135, in <module>
        train(config)
      File "/workspace/adapet/src/train.py", line 118, in train
        dev_acc, dev_logits = dev_eval(config, model, batcher, batch_idx, dict_avg_val)
      File "/workspace/adapet/src/eval/eval_model.py", line 65, in dev_eval
        eval(config, model, train_iter, train_scorer)
      File "/workspace/adapet/src/eval/eval_model.py", line 28, in eval
        pred_lbl, lbl_logits = model.predict(batch)
      File "/workspace/adapet/src/adapet.py", line 392, in predict
        return self.predict_helper(batch, self.get_pattern())
      File "/workspace/adapet/src/adapet.py", line 364, in predict_helper
        lbl_logits = self.get_eval_multilbl_logits(pet_mask_ids, mask_idx, list_lbl)
      File "/workspace/adapet/src/adapet.py", line 282, in get_eval_multilbl_logits
        tok_pos = torch.min(torch.nonzero(mask_idx[idx] == mask_pos)[0])
    TypeError: nonzero() received an invalid combination of arguments - got (bool), but expected (Tensor input, *, bool as_tuple)
    

    The problem is that torch.nonzero receives a plain Python bool (the result of comparing two scalars) instead of a tensor.
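
    The failure mode in miniature: comparing two Python scalars yields a plain bool, which torch.nonzero rejects, whereas a tensor comparison works:

    import torch

    # torch.nonzero(3 == 3)        # TypeError: ... must be Tensor, not bool
    print(torch.nonzero(torch.tensor([1, 3, 3]) == 3))   # tensor([[1], [2]])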

    opened by ruddyscent 0