An original implementation of "MetaICL: Learning to Learn In Context" by Sewon Min, Mike Lewis, Luke Zettlemoyer and Hannaneh Hajishirzi

Overview

MetaICL: Learning to Learn In Context

This includes an original implementation of "MetaICL: Learning to Learn In Context" by Sewon Min, Mike Lewis, Luke Zettlemoyer and Hannaneh Hajishirzi.

Check out our demo at qa.cs.washington.edu:2021!

This README mainly covers how to reproduce MetaICL and Channel MetaICL from the paper, but it also describes how to reproduce our baselines, including Multi-task zero-shot and various raw LM methods. All methods used in the paper are available in this repo (please see the table below).

For any questions about the paper or the code, please contact the first author (email) or open an issue.

If you find our code or paper useful, please cite the paper:

@article{ min2021metaicl,
    title={ Meta{ICL}: Learning to Learn In Context },
    author={ Min, Sewon and Lewis, Mike and Zettlemoyer, Luke and Hajishirzi, Hannaneh },
    journal={ arXiv preprint },
    year={ 2021 }
}

Content

  1. Installation
  2. Quick Start
  3. Data
  4. Training
  5. Inference
  6. Downloading Checkpoints

Installation

These installation guidelines are mainly for running the models and baselines; requirements for data preprocessing are provided here. All code is tested with Python 3.8.

pip install torch==1.9.0
pip install git+https://github.com/huggingface/transformers.git@c37573806ab3526dd805c49cbe2489ad4d68a9d7

To train the model, we use an 8-bit optimizer and mixed precision, which significantly reduce memory usage. To set them up, run the following commands (skip this step if you will only run inference using released checkpoints):

# For 8-bit optimization: see https://github.com/facebookresearch/bitsandbytes for more details
pip install -i https://test.pypi.org/simple/ bitsandbytes-cuda102 # modify based on your CUDA version

# For mixed precision training: see https://github.com/NVIDIA/apex for more details
# make sure your nvcc is working (e.g. `nvcc --version`)
cd .. # move outside of this project directory
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
cd ../MetaICL # come back to this project directory
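
For reference, here is a minimal sketch of how an 8-bit optimizer and apex mixed precision are typically wired into a PyTorch training loop. This is not the repo's training code (train.py handles all of this via --optimization 8bit-adam and --fp16); the model, data, and hyperparameters below are placeholders.

import torch
import bitsandbytes as bnb                      # 8-bit optimizer
from apex import amp                            # mixed precision
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").cuda()

# drop-in replacement for torch.optim.Adam that keeps optimizer states in 8 bits
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-5)
# opt_level="O1" enables standard mixed-precision training
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

# one toy step: score a dummy batch, scale the loss, and update
batch = tokenizer("a toy training example", return_tensors="pt").to("cuda")
loss = model(**batch, labels=batch["input_ids"]).loss
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
optimizer.zero_grad()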

Quick Start

This is an example with the dataset financial_phrasebank.

First, prepare a list of training examples

train_data = [{"input": INPUT_1, "output": OUTPUT_1},
              {"input": INPUT_2, "output": OUTPUT_2},
              ...
              {"input": INPUT_K, "output": OUTPUT_K}]

If you prefer, you can download our training data by running python -m utils.download_data --demo_data and then loading the downloaded file as follows.

with open("data/financial_phrasebank/financial_phrasebank_16_100_train.jsonl", "r") as f:
    train_data = []
    for line in f:
        train_data.append(json.loads(line))

Then, you can use our model as follows.

from metaicl.data import MetaICLData
from metaicl.model import MetaICLModel

# Load the model
data = MetaICLData(method="channel", max_length=1024, max_length_per_example=256)
model = MetaICLModel()
model.load("channel-metaicl")
model.cuda()
model.eval()

# Make a prediction for `input1`
input1 = "Both operating profit and net sales for the six-month period increased as compared to the corresponding period in 2007."
data.tensorize(train_data, [input1], options=["positive", "neutral", "negative"])
prediction = model.do_predict(data)[0]
print (prediction) # positive

# Make another prediction for `input2`
input2 = "The deal will have no significant effect on the acquiring company's equity ratio."
data.tensorize(train_data, [input2], options=["positive", "neutral", "negative"])
prediction = model.do_predict(data)[0]
print (prediction) # neutral
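
If you have several test inputs that share the same options, it should also work to tensorize them together and get one prediction per input; this is an assumption based on the list-valued second argument of tensorize above, not a separately documented API.

# batch the two inputs above in a single call (assumed usage)
inputs = [input1, input2]
data.tensorize(train_data, inputs, options=["positive", "neutral", "negative"])
predictions = model.do_predict(data)
for inp, pred in zip(inputs, predictions):
    print(inp[:40], "->", pred)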

Data

As described in the paper, we use a collection of 142 tasks taken from CrossFit and UnifiedQA. We experiment with seven different settings, where there is no overlap in meta-training and target tasks. Download/Preprocessing guidelines are here.

| Setting name | alias (for command) | # meta-train tasks | # meta-train examples | # target tasks |
| --- | --- | --- | --- | --- |
| High Resource → Low Resource | hr_to_lr | 61 | 819,200 | 26 |
| Classification → Classification | class_to_class | 43 | 384,022 | 20 |
| Non-Classification → Classification | non_class_to_class | 37 | 368,768 | 20 |
| QA → QA | qa_to_qa | 37 | 486,143 | 22 |
| Non-QA → QA | non_qa_to_qa | 33 | 521,342 | 22 |
| Non-NLI → NLI | non_nli_to_nli | 55 | 463,579 | 8 |
| Non-Paraphrase Detection → Paraphrase Detection | non_paraphrase_to_paraphrase | 59 | 496,106 | 4 |

To run experiments for each setting, use "alias (for command)" for commands in the Training section and the Inference section.

None of the settings above uses templates/instructions. If you want to use the instruction version, as in the ablations in the paper, use the settings in the following table.

| Setting name | alias (for command) | # instructions / meta-train task | # meta-train tasks | # meta-train examples | # target tasks |
| --- | --- | --- | --- | --- | --- |
| High Resource → Low Resource without instructions | hr_to_lr_noinst | 0 | 32 | 492,655 | 12 |
| High Resource → Low Resource with instructions (1 per task) | hr_to_lr_inst | 1 | 32 | 492,655 | 12 |
| High Resource → Low Resource with instructions (all) | hr_to_lr_inst_all | 8.3 | 32 | 492,655 | 12 |

If you use these data resources, please make sure to cite CrossFit and UnifiedQA.

@inproceedings{ ye2021crossfit,
    title={ {C}ross{F}it: A Few-shot Learning Challenge for Cross-task Generalization in NLP },
    author={ Ye, Qinyuan and Lin, Bill Yuchen and Ren, Xiang },
    booktitle={ EMNLP },
    year={ 2021 }
}
@inproceedings{ khashabi2020unifiedqa,
    title={ {U}nified{QA}: Crossing Format Boundaries With a Single QA System },
    author={ Khashabi, Daniel and Min, Sewon and Khot, Tushar and Sabharwal, Ashish and Tafjord, Oyvind and Clark, Peter and Hajishirzi, Hannaneh },
    booktitle={ Findings of EMNLP },
    year={ 2020 }
}

If you use the instruction version, please make sure to cite the T0 paper.

@article{ sanh2021multitask,
    title={ Multitask Prompted Training Enables Zero-Shot Task Generalization },
    author={ Victor Sanh and Albert Webson and Colin Raffel and Stephen H. Bach and Lintang Sutawika and Zaid Alyafeai and Antoine Chaffin and Arnaud Stiegler and Teven Le Scao and Arun Raja and Manan Dey and M Saiful Bari and Canwen Xu and Urmish Thakker and Shanya Sharma and Eliza Szczechla and Taewoon Kim and Gunjan Chhablani and Nihal Nayak and Debajyoti Datta and Jonathan Chang and Mike Tian-Jian Jiang and Han Wang and Matteo Manica and Sheng Shen and Zheng Xin Yong and Harshit Pandey and Rachel Bawden and Thomas Wang and Trishala Neeraj and Jos Rozen and Abheesht Sharma and Andrea Santilli and Thibault Fevry and Jason Alan Fries and Ryan Teehan and Stella Biderman and Leo Gao and Tali Bers and Thomas Wolf and Alexander M. Rush },
    journal={ arXiv preprint arXiv:2110.08207 },
    year={ 2021 }
}

How to Download and Preprocess

The code is modified from the original CrossFit repo. First, install requirements:

pip install datasets==1.4.0 wget

Warning: we found that datasets==1.4.0 is not compatible with the Transformers version we use for training and inference. Please use a separate environment for data preprocessing and for model training/inference.

cd preprocess
# preprocess from crossfit
python _build_gym.py --build --n_proc=40 --do_test
python _build_gym.py --build --n_proc=40 --do_train # skip if you won't run training yourself
# preprocess from unifiedqa
python unifiedqa.py --do_train --do_test # skip `--do_train` if you won't run training yourself

By default, preprocessed data is saved at data/.
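
To sanity-check the output, you can peek at the first example of any generated file, for example as below. Treat this as an inspection sketch rather than a schema definition; the exact fields are whatever the preprocessing scripts emit.

import json

# any file generated under data/ works here; this path matches the Quick Start example
with open("data/financial_phrasebank/financial_phrasebank_16_100_train.jsonl") as f:
    first = json.loads(next(f))
print(sorted(first.keys()))                       # see which fields each example carries
print(first["input"][:80], "->", first["output"])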

Process instruction version

The instruction version is for settings using instructions. We use instructions from BigScience PromptSource. First, fetch instructions (prompts) from PromptSource by doing the following.

# assuming you are still inside `preprocess` directory
cd ../.. # go outside of your project directory
git clone https://github.com/bigscience-workshop/promptsource.git
cd promptsource
git checkout 4e67a38d9642bde222cb90e36e8a66fd6e4a861a
mv promptsource ../MetaICL/preprocess/ # move promptsource directory under `preprocess` directory
cd ../MetaICL/preprocess # come back to `preprocess` directory
pip install pandas jinja2 "pyyaml>=5"

Note that this is a workaround that avoids installing the promptsource package via pip, because promptsource requires Python<=3.7 while all other code in this repo uses Python 3.8. If promptsource starts supporting Python 3.8, please install the package by following the guidelines in the original repo.

Then, download the data via:

python _build_gym.py --build --n_proc=20 --do_test --inst
python _build_gym.py --build --n_proc=20 --do_train --inst # skip if you won't run training yourself

Training

First, run the following command to tensorize the text data and save it.

python train.py \
  --task $task --k 16384 --test_k 16 --seed 100 --use_demonstrations --method channel \
  --do_tensorize --n_gpu 8 --n_process 40
  • --task: name of the setting, e.g., hr_to_lr, class_to_class, non_class_to_class, etc.
  • --k: # of examples per meta-training task
  • --test_k: # of examples to be used at inference
  • --seed: data seed for training data
  • --method: direct / channel
  • --n_gpu: the number of GPUs you will use for training
  • --n_process: the number of processes for preprocessing

Then, run the following command to train the model.

python -m torch.distributed.launch --nproc_per_node=8 train.py \
  --task $task --k 16384 --test_k 16 --seed 100 --train_seed 1 --use_demonstrations --method channel --n_gpu 8 \
  --batch_size 1 --lr 1e-05 --fp16 --optimization 8bit-adam --out_dir checkpoints/channel-metaicl/$task
  • --fp16: for mixed precision training
  • --optimization 8bit-adam: for the 8-bit approximation of the Adam optimizer
  • --batch_size: batch size per GPU; we use 1, so the global batch size is 8
  • --num_training_steps: number of training steps; 30000 by default
  • --log_file: you can optionally specify this to save logs to a text file

Training takes around 4.5 hours.

If you want to train the Multi-task zero-shot model, which is one of our baselines in the paper, you can use similar commands for both tensorizing and training, but without --use_demonstrations and --test_k (see the sketch below). Training takes around 3 hours.
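
For example, the Multi-task zero-shot training commands would look roughly like this (a sketch derived from the commands above; the output directory name is only a suggestion):

# tensorize without demonstrations
python train.py \
  --task $task --k 16384 --seed 100 --method channel \
  --do_tensorize --n_gpu 8 --n_process 40

# train the Multi-task zero-shot model
python -m torch.distributed.launch --nproc_per_node=8 train.py \
  --task $task --k 16384 --seed 100 --train_seed 1 --method channel --n_gpu 8 \
  --batch_size 1 --lr 1e-05 --fp16 --optimization 8bit-adam \
  --out_dir checkpoints/channel-multitask-zero/$task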

Inference

python test.py --task $task --k 16 --split test --seed 100 --test_batch_size 16 \
    --method {channel|direct} --use_demonstrations \
    --out_dir checkpoints/metaicl/$task \
    --global_step 30000

Instead of specifying --global_step, you can pass --checkpoint with the path (or name) of a checkpoint stored somewhere else, for example if you have downloaded the released checkpoints and want to use them (see the example after the option list below). You must specify exactly one of --checkpoint and --global_step.

  • --seed: seed for the training data you will use at inference
  • --test_batch_size: batch size for inference; you can use 16 with a 32GB GPU
  • --unseen_domain_only: specify if you would like to run inference on unseen domains only
  • --log_file: as in training, optionally specify the path to a file where you want to save logs
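
For example, to evaluate one of the released checkpoints instead of a locally trained one, a command along these lines should work (a sketch; see Downloading Checkpoints for the available {model_name} values):

python test.py --task $task --k 16 --split test --seed 100 --test_batch_size 16 \
    --method channel --use_demonstrations \
    --out_dir checkpoints/channel-metaicl/$task \
    --checkpoint channel-metaicl/$task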

If you want to run inference for the Multi-task zero-shot baseline, you can use a similar command but without --use_demonstrations and --k. For this baseline, you can use --test_batch_size 64 with a 32GB GPU.
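
For example (a sketch following the rule above, with the checkpoint directory name as a placeholder):

python test.py --task $task --split test --seed 100 --test_batch_size 64 \
    --method channel \
    --out_dir checkpoints/channel-multitask-zero/$task \
    --global_step 30000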

If you want to run the raw LM baselines in the paper, you do not need to specify --checkpoint or --global_step. Instead, specify --do_zeroshot, and then (see the example sketch after this list):

  • For 0-shot, run the command with --method direct
  • For PMI 0-shot, run the command with --is_null, and then run it again with --use_calibration (both with --method direct)
  • For Channel 0-shot, run the command with --method channel
  • For In-context / PMI In-context / Channel In-context, do the same as above, but always add --use_demonstrations

You can use the same --out_dir for all raw LM baselines if you are using the same GPT-2 model, e.g., checkpoints/raw-gpt2-large.
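
For instance, PMI 0-shot would be run as two passes, roughly like this (a sketch; it assumes the default GPT-2 model and the flags described above):

# first pass: score with a null input
python test.py --task $task --split test --seed 100 --test_batch_size 64 \
    --method direct --do_zeroshot --is_null \
    --out_dir checkpoints/raw-gpt2-large

# second pass: calibrate predictions with the null scores
python test.py --task $task --split test --seed 100 --test_batch_size 64 \
    --method direct --do_zeroshot --use_calibration \
    --out_dir checkpoints/raw-gpt2-large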

Downloading Checkpoints

You can run the inference script by specifying --checkpoint {model_name}, and the script will automatically download the corresponding checkpoint under the checkpoints/ directory. {model_name} can be one of the following:

  • {metaicl|channel-metaicl|multitask-zero|channel-multitask-zero}: corresponding method trained in the hr_to_lr setting
  • {metaicl|channel-metaicl|multitask-zero|channel-multitask-zero}-instruction: corresponding method trained in the hr_to_lr_inst_all setting
  • {metaicl|channel-metaicl|multitask-zero|channel-multitask-zero}/{setting_name}: corresponding method trained in the corresponding setting (for setting_name, see the Table in the data section)

Alternatively, you can download all checkpoints via:

python -m utils.download --checkpoints --setting all --method all

If you want to download only one of the settings, specify --setting {setting_name} (using the "alias (for command)" from the setting table above). If you want to download only one of the methods, specify --method {method_name}, where method_name is one of metaicl, channel-metaicl, multitask-zero or channel-multitask-zero.

Simply reproducing all results in the paper

You can use the following commands (based on a 32GB GPU):

# raw LM zero-shot baselines (0-shot, PMI 0-shot, Channel 0-shot)
bash reproduce.sh {setting_name} {zero|pmi-zero|channel-zero} 100 64

# raw LM in-context baselines (in-context, PMI in-context, Channel in-context)
bash reproduce.sh {setting_name} {ic|pmi-ic|channel-ic} 100,13,21,42,87 16

# Multi-task 0-shot baselines
bash reproduce.sh {setting_name} {multitask-zero|channel-multitask-zero} 100 64

# MetaICL
bash reproduce.sh {setting_name} {metaicl|channel-metaicl} 100,13,21,42,87 16

License

MetaICL is CC-BY-NC 4.0 licensed.

Comments
  • CHILD PROCESS FAILED WITH NO ERROR_FILE


    Hi There, any idea why I'm getting this error by running the training script: python -m torch.distributed.run --nproc_per_node=8 train.py --task hr_to_lr --k 16384 --test_k 16 --seed 100 --train_seed 1 --use_demonstrations --method channel --n_gpu 8 --batch_size 1 --lr 1e-05 --fp16 --optimization 8bit-adam --out_dir checkpoints/channel-metaicl/hr_to_lr

    after successfully running the following script for tensorizing: python train.py --task hr_to_lr --k 16384 --test_k 16 --seed 100 --use_demonstrations --method channel --do_tensorize --n_gpu 4 --n_process 40

    my log file:


    =====================================================
          CHILD PROCESS FAILED WITH NO ERROR_FILE
    =====================================================
    Child process 17057 (local_rank 0) FAILED (exitcode 1)
    Error msg: Process failed with exitcode 1
    Without writing an error file to <N/A>.
    While this DOES NOT affect the correctness of your application,
    no trace information about the error will be available for inspection.
    Consider decorating your top level entrypoint function with
    torch.distributed.elastic.multiprocessing.errors.record. Example:

        from torch.distributed.elastic.multiprocessing.errors import record

        @record
        def trainer_main(args):
            # do train

    warnings.warn(_no_error_file_warning_msg(rank, failure))
    Traceback (most recent call last):
      File "/home/monajati/miniconda3/lib/python3.9/runpy.py", line 197, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/home/monajati/miniconda3/lib/python3.9/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/home/monajati/miniconda3/lib/python3.9/site-packages/torch/distributed/run.py", line 637, in <module>
        main()
      File "/home/monajati/miniconda3/lib/python3.9/site-packages/torch/distributed/run.py", line 629, in main
        run(args)
      File "/home/monajati/miniconda3/lib/python3.9/site-packages/torch/distributed/run.py", line 621, in run
        elastic_launch(
      File "/home/monajati/miniconda3/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 116, in __call__
        return launch_agent(self._config, self._entrypoint, list(args))
      File "/home/monajati/miniconda3/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 348, in wrapper
        return f(*args, **kwargs)
      File "/home/monajati/miniconda3/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
        raise ChildFailedError(
    torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

    =====================================================
                     train.py FAILED
    =====================================================
    Root Cause:
      [0]: time: 2022-02-04_20:50:02 rank: 0 (local_rank: 0) exitcode: 1 (pid: 17057) error_file: <N/A> msg: "Process failed with exitcode 1"
    Other Failures:
      [1]: time: 2022-02-04_20:50:02 rank: 1 (local_rank: 1) exitcode: 1 (pid: 17058) error_file: <N/A> msg: "Process failed with exitcode 1"
      [2]: time: 2022-02-04_20:50:02 rank: 2 (local_rank: 2) exitcode: 1 (pid: 17059) error_file: <N/A> msg: "Process failed with exitcode 1"
      [3]: time: 2022-02-04_20:50:02 rank: 3 (local_rank: 3) exitcode: 1 (pid: 17060) error_file: <N/A> msg: "Process failed with exitcode 1"
      [4]: time: 2022-02-04_20:50:02 rank: 4 (local_rank: 4) exitcode: 1 (pid: 17061) error_file: <N/A> msg: "Process failed with exitcode 1"
      [5]: time: 2022-02-04_20:50:02 rank: 5 (local_rank: 5) exitcode: 1 (pid: 17062) error_file: <N/A> msg: "Process failed with exitcode 1"
      [6]: time: 2022-02-04_20:50:02 rank: 6 (local_rank: 6) exitcode: 1 (pid: 17063) error_file: <N/A> msg: "Process failed with exitcode 1"
      [7]: time: 2022-02-04_20:50:02 rank: 7 (local_rank: 7) exitcode: 1 (pid: 17064) error_file: <N/A> msg: "Process failed with exitcode 1"


    opened by monajati 5
  • Questions around the data preprocessing


    Hi again! I would like to run the training procedure with my own custom datasets, but I'm finding the data setup quite confusing.

    In particular, I'm trying to understand the preprocessing done to generate the files in the MetaICL/data directory. Since I am not using HuggingFace datasets, I think the easiest route for me is to adapt unifiedqa.py to take my own input and output the right format.

    However, looking at the files that have been generated in my MetaICL/data directory, I see a lot of files and I do not understand how they are used:

    $ tree MetaICL/data
    data
    ├── ade_corpus_v2-classification
    │   ├── ade_corpus_v2-classification_16_100_dev.jsonl
    │   ├── ade_corpus_v2-classification_16_100_test.jsonl
    │   ├── ade_corpus_v2-classification_16_100_train.jsonl
    │   ├── ade_corpus_v2-classification_16_13_dev.jsonl
    │   ├── ade_corpus_v2-classification_16_13_test.jsonl
    │   ├── ade_corpus_v2-classification_16_13_train.jsonl
    │   ├── ade_corpus_v2-classification_16_21_dev.jsonl
    │   ├── ade_corpus_v2-classification_16_21_test.jsonl
    │   ├── ade_corpus_v2-classification_16_21_train.jsonl
    │   ├── ade_corpus_v2-classification_16384_100_dev.jsonl
    │   ├── ade_corpus_v2-classification_16384_100_train.jsonl
    │   ├── ade_corpus_v2-classification_16_42_dev.jsonl
    │   ├── ade_corpus_v2-classification_16_42_test.jsonl
    │   ├── ade_corpus_v2-classification_16_42_train.jsonl
    │   ├── ade_corpus_v2-classification_16_87_dev.jsonl
    │   ├── ade_corpus_v2-classification_16_87_test.jsonl
    │   └── ade_corpus_v2-classification_16_87_train.jsonl
    ├── ade_corpus_v2-dosage
    │   ├── ade_corpus_v2-dosage_16_100_dev.jsonl
    ...
    

    I understand that the files are named with {task}_{k}_{seed}_{split}.jsonl, but I am confused how these files are used / which are used during train / test.

    My main questions are:

    1. Can you please explain how each of those files is used during training and testing?

    2. Why do you generate so many files, instead of simply 3 files for train, dev and test?

    In case it's not covered in the general explanation, I also have some additional questions from looking through the code:

    1. With the default setup, it seems like only *_16384_100_train.jsonl is used during training. So if I want to train on a custom dataset, I can just put my file in data/my_task/my_task_16384_100_train.jsonl without any of the other files, and that should be enough to run the train procedure?

    2. The *_16_{seed}_train.jsonl files are always 16 lines long, whereas *_16_{seed}_test.jsonl files are always much longer. Why?

    3. As far as I can tell, the *_dev.jsonl files are never used?

    4. Is there something special about seed=100? Looking at this and this.

    Thank you very much in advance!

    opened by JunShern 5
  • Confusion about the initial evaluation results of the GPT2-LARGE model


    Hello, I have a question. I used the gpt2-large pre-trained language model to evaluate unseen_domain_test in-context; the config file is configs/class_to_class.json, the method is direct, and k=16. I found the in-context F1 result is 38.63 (the paper's result is 30.6). From the paper and the config file, we know the unseen_domain_test datasets include "poem_sentiment", "climate_fever", "medical_questions_pairs", "financial_phrasebank", all with 2 or 3 categories, so even a random pick should be at least 33.3 (1/3). Why is the paper's number 30.6? A little confused.

    opened by Lisennlp 4
  • Setting k for pre-processing train data


    How can we change the amount of training samples produced by the preprocessing script? It seems many of the files (e.g. ade_effect.py, anli.py, etc) have k hardcoded and thus are not producing the number of examples passed in the arguments.

    Thank you!

    opened by thomaspzollo 4
  • Using MetaICL for Unconstrained Generation


    Currently, the "options" field is required in any test set .json(l) for the data to load without throwing exceptions: https://github.com/facebookresearch/MetaICL/blob/ec0f1c199965f54d4b4d35556e5ca9ba0a6719ab/metaicl/data.py#L189

    While the original use-case is to evaluate on multiple-choice or classification problems only, I'd also be interested in evaluating the non-QA models on datasets like SQuAD or NaturalQuestions, which are not multiple-choice. It would also be interesting to observe what the model generates for conditional generation tasks in general.

    I'd be willing to implement this in a PR if you think this feature would be useful!

    opened by aaronmueller 3
  • Additional information on `run_model` method from the `MetaICLModel` class


    Hi @shmsw25

    thank you very much for publishing the MetaICL code base.

    I'm currently going through the code to understand it better and I stumbled over the following lines in the MetaICLModel class of the run_model method: https://github.com/facebookresearch/MetaICL/blob/main/metaicl/model.py#L273 (removal of the last element) https://github.com/facebookresearch/MetaICL/blob/main/metaicl/model.py#L277:L278 (removal of the first element)

    I'm not sure why the last and first element is removed, one guess would be that it is due to the addition of newlines or spaces in the _prepro_each_datapoint function (https://github.com/facebookresearch/MetaICL/blob/main/metaicl/data.py#L116)?

    (Another guess is that it is related to EOS and BOS tokens, but those are not used in the MetaICLData class because in the prepro_sentence_pair_single function they are commented out https://github.com/facebookresearch/MetaICL/blob/main/metaicl/data.py#L457:L460 )

    Maybe I'm missing something important to understand the run_model method?

    opened by MicPie 3
  • Reproducibility of PMI methods?


    Hi, I tried running reproduce.sh exactly as described at https://github.com/JunShern/MetaICL#simply-reproducing-all-results-in-the-paper, and so far the results I get exactly match the reported results in the README, except for the pmi-zero and pmi-ic settings.

    From the README:

    |              | hr_to_lr | class_to_class | non_class_to_class | qa_to_qa | non_qa_to_qa | non_nli_to_nli | non_paraphrase_to_paraphrase |
    | ------------ | -------- | -------------- | ------------------ | -------- | ------------ | -------------- | ---------------------------- |
    | zero         | 34.9     | 34.2           | 34.2               | 40.4     | 40.4         | 25.5           | 34.2                         |
    | pmi-zero     | 34.8     | 33.2           | 33.2               | 40.4     | 40.4         | 27.9           | 39.2                         |
    | channel-zero | 36.8     | 37.2           | 37.2               | 39.2     | 39.2         | 33.9           | 39.5                         |
    | ic           | 38.2     | 37.4           | 37.4               | 40.2     | 40.2         | 34             | 33.7                         |
    | pmi-ic       | 38.9     | 38.3           | 38.3               | 40.5     | 40.5         | 33             | 38.6                         |

    The results I get:

    |              | hr_to_lr | class_to_class | non_class_to_class | qa_to_qa | non_qa_to_qa | non_nli_to_nli | non_paraphrase_to_paraphrase |
    | ------------ | -------- | -------------- | ------------------ | -------- | ------------ | -------------- | ---------------------------- |
    | zero         | 34.9     | 34.2           | 34.2               | 40.4     | 40.4         | 25.5           | 34.2                         |
    | pmi-zero     | 31.3     | 24.1           | 24.1               | 36.4     | 36.4         | 26             | 33.1                         |
    | channel-zero | 36.8     | 37.2           | 37.2               | 39.2     | 39.2         | 33.9           | 39.5                         |
    | ic           | 38.2     | 37.4           | 37.4               | 40.2     | 40.2         | 34             | 33.7                         |
    | pmi-ic       | 37.6     | 35.8           | --                 | --       | --           | 31.7           | 32.9                         |

    (Ignore the -- above, I have not gotten results for those yet.)

    If you compare the tables, you will see that all the rows are identical except for pmi-zero and pmi-ic. In particular, if you compare class_to_class and non_class_to_class settings for pmi-zero, the difference is as much as 9%.

    Is this expected / Do you have any guess why this happens?

    opened by JunShern 3
  • Confusion with classification when having multi tokens


    Hello, I am confused with the method of classifying when the option includes multi-token labels.

    Let's assume that the classification task has two options for an answer which are [favor, against] and has the input as "I do not have an opinion about this move".

    If we assume a prompt template as

    Input: I do not have an opinion about this move
    Output: 
    

    what I understand is that MetaICL calculates a separate loss for

    Input: I do not have an opinion about this move
    Output: favor
    

    and

    Input: I do not have an opinion about this move
    Output: against
    

    and averages the losses over the token positions where favor and against are located, then compares those average losses. However, I am wondering whether this is a fair classification method. To illustrate, let's say that 'favor' is tokenized as ['fa', 'vor'] and 'against' is tokenized as ['against'].

    Then, I think the loss for 'vor' could be heavily affected by the preceding 'fa' and may be very small.

    This can lead to a smaller average loss and give 'favor' an advantage over 'against'.

    I am curious about your comments.

    Best regards, Wookje Han

    opened by wookjeHan 2
  • Using MetaICL for multi-label classification


    I may have missed this in the paper, but from my understanding, the setup used for MetaICL doesn't lend itself too well for multi-label classification, where you can classify a given input as multiple labels (for example picture of cat -> mammal, cat, animal, given options mammal, cat, animal, reptile, plant, etc.).

    I say this because I imagine that evaluating the log likelihood for every option combination sounds infeasible. You would have to evaluate [mammal], [mammal, cat], [mammal, cat, animal], [reptile], [reptile, cat], etc. This binarises to 2^N completion options for a problem with N labels, which for N > 10 is already in the thousands, running into prompt length limits.

    My question is, have the authors considered a way to circumvent this issue? Is this even an issue or is my understanding incorrect? What are your thoughts?

    Thank you.

    opened by thesofakillers 2
  • Why using [SEP] token while using gpt2 tokenizer?


    Hi, I have a question about data preprocessing.

    I noticed that the code uses [SEP] when preprocessing datasets such as superglue rte or wic.

    However, I am wondering why metaICL selects [SEP] token while it uses gpt2 tokenizer.

    The reason I think this is strange is that, according to my understanding, the gpt2 tokenizer doesn't handle the [SEP] token.

    It just tokenizes to [, SE, P, ].

    Would you please explain the reason of using [SEP] token?

    Thanks in advance. Best regards, Wookje Han

    opened by wookjeHan 2
  • Fix download function name to match utils.py


    I encountered a bug where from utils import normalize_answer, download_file would fail, because download_file seems to have been renamed in utils.py.

    Changing download_file to download_from_google_drive fixes the problem.

    CLA Signed 
    opened by JunShern 2
  • Can you release the checkpoints for the smaller models?


    Hi,

    to facilitate research with fewer resources, could you please release the checkpoints for the smaller model sizes mentioned in Appendix C.2 of the paper? It seems I was only able to download the checkpoints for the 774M-parameter model.

    Thank you!

    opened by thesofakillers 0