ACL 2021: LM-BFF: Better Few-shot Fine-tuning of Language Models

LM-BFF (Better Few-shot Fine-tuning of Language Models)

This is the implementation of the paper Making Pre-trained Language Models Better Few-shot Learners. LM-BFF is short for better few-shot fine-tuning of language models.

Overview

In this work we present LM-BFF, a suite of simple and complementary techniques for fine-tuning pre-trained language models on a small number of training examples. Our approach includes:

  1. Prompt-based fine-tuning together with a novel pipeline for automating prompt generation.
  2. A refined strategy for incorporating demonstrations into context.

You can find more details of this work in our paper.

Requirements

To run our code, please install all the dependency packages by using the following command:

pip install -r requirements.txt

NOTE: Different versions of packages (like pytorch, transformers, etc.) may lead to different results from the paper. However, the trend should still hold no matter what versions of packages you use.

Prepare the data

We pack the original datasets (SST-2, SST-5, MR, CR, MPQA, Subj, TREC, CoLA, MNLI, SNLI, QNLI, RTE, MRPC, QQP, STS-B) here. Please download it and extract the files to ./data/original, or run the following commands:

cd data
bash download_dataset.sh

Then use the following command (in the root directory) to generate the few-shot data we need:

python tools/generate_k_shot_data.py

See tools/generate_k_shot_data.py for more options. For the results in the paper, we use the default options: we take K=16 and use 5 different seeds (13, 21, 42, 87, 100). The few-shot data will be generated under data/k-shot. In the directory of each dataset, there will be folders named $K-$SEED for the different dataset samples. You can use the following command to check whether the generated data are exactly the same as ours:

cd data/k-shot
md5sum -c checksum
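
If md5sum is not available, the following minimal sketch is an alternative sanity check for the SST-2 splits: it counts the examples per label in each generated training split, assuming train.tsv is a tab-separated file with a header row and the label in the last column (the GLUE-style SST-2 format); each label should appear K=16 times.

import csv
import os
from collections import Counter

# Count examples per label in each generated SST-2 16-shot training split.
# Assumption: train.tsv is tab-separated, has a header row, and stores the
# label in the last column.
root = "data/k-shot/SST-2"
for seed in [13, 21, 42, 87, 100]:
    path = os.path.join(root, f"16-{seed}", "train.tsv")
    with open(path) as f:
        rows = list(csv.reader(f, delimiter="\t"))[1:]  # skip the header
    counts = Counter(row[-1] for row in rows)
    print(f"16-{seed}: {dict(counts)}")  # expect 16 examples per label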

NOTE: During training, the model will generate/load cache files in the data folder. If your data have changed, make sure to clean all the cache files (starting with "cache").
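
If you need to clear the caches, a small helper along these lines can be used (it assumes the cache files live somewhere under data/ and that their names start with "cache", as described above):

import glob
import os

# Remove cached feature files generated during training.
# Assumption: cache files sit under data/ and their names start with "cache".
for path in glob.glob("data/**/cache*", recursive=True):
    print("removing", path)
    os.remove(path)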

Run LM-BFF

Quick start

Our code is built on transformers and we use its 3.4.0 version. Other versions of transformers might cause unexpected errors.
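
An optional, minimal way to confirm the installed version programmatically:

# Optional sanity check: confirm the installed transformers version matches
# the one this codebase was developed with (3.4.0).
import transformers
assert transformers.__version__ == "3.4.0", transformers.__version__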

Before running any experiments, create the result folder by running mkdir result to save checkpoints. Then you can run our code with the following example:

python run.py \
    --task_name SST-2 \
    --data_dir data/k-shot/SST-2/16-42 \
    --overwrite_output_dir \
    --do_train \
    --do_eval \
    --do_predict \
    --evaluate_during_training \
    --model_name_or_path roberta-large \
    --few_shot_type prompt-demo \
    --num_k 16 \
    --max_steps 1000 \
    --eval_steps 100 \
    --per_device_train_batch_size 2 \
    --learning_rate 1e-5 \
    --num_train_epochs 0 \
    --output_dir result/tmp \
    --seed 42 \
    --template "*cls**sent_0*_It_was*mask*.*sep+*" \
    --mapping "{'0':'terrible','1':'great'}" \
    --num_sample 16

Most arguments are inherited from transformers and are easy to understand. We further explain some of LM-BFF's arguments:

  • few_shot_type: There are three modes
    • finetune: Standard fine-tuning
    • prompt: Prompt-based fine-tuning.
    • prompt-demo: Prompt-based fine-tuning with demonstrations.
  • num_k: Number of training instances for each class. We take num_k=16 in our paper. This argument is mainly used for indexing logs afterwards (because the training example numbers are actually decided by the data split you use).
  • template: Template for prompt-based fine-tuning. We will introduce the template format later.
  • mapping: Label word mapping for prompt-based fine-tuning. It is a string representation of a dictionary mapping label names to label words. NOTE: For RoBERTa, the model will automatically add a space before the word. See the paper appendix for details.
  • num_sample: When using demonstrations during inference, the number of demonstration samples for each input query. Say num_sample=16; then we sample 16 different sets of demonstrations for one input, run the forward pass for each set separately, and average the logits over all 16 samples as the final prediction (see the sketch after this list).
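
A minimal sketch of this inference-time averaging, purely for illustration (model_forward is a hypothetical callable that builds a prompt from one query plus one demonstration set and returns per-label logits; this is not the actual trainer code):

import torch

def predict_with_demos(model_forward, query, demo_sets):
    # One forward pass per sampled demonstration set; each call returns
    # logits of shape (n_labels,) for the prompt built from query + demos.
    logits = torch.stack([model_forward(query, demos) for demos in demo_sets])
    # Average the logits over all demonstration sets and take the argmax.
    return logits.mean(dim=0).argmax().item()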

Also, this codebase supports BERT-series and RoBERTa-series pre-trained models from Huggingface's transformers. You can check Huggingface's website for available models and pass any model with "bert" or "roberta" in its name to --model_name_or_path. Some examples are bert-base-uncased, bert-large-uncased, roberta-base, roberta-large, etc.

To easily run our experiments, you can also use run_experiment.sh (this command runs prompt-based fine-tuning with demonstrations, no filtering, manual prompt):

TAG=exp TYPE=prompt-demo TASK=SST-2 BS=2 LR=1e-5 SEED=42 MODEL=roberta-large bash run_experiment.sh

We have already defined the templates and label word mappings in it, so you only need to set a few hyper-parameters and TAG (you can use whatever tag you want; it just makes finding results easier). See run_experiment.sh for more options of these environment variables. You can also add extra arguments by appending them as a quoted string:

TAG=exp TYPE=prompt-demo TASK=SST-2 BS=2 LR=1e-5 SEED=42 MODEL=roberta-large bash run_experiment.sh "--output_dir result/exp --max_seq_length 512"

Experiments with multiple runs

To carry out experiments with multiple data splits, following the evaluation protocol detailed in Section 3.3 of our paper (grid search for each seed and aggregation of the results over 5 different seeds), you can use the following script:

for seed in 13 21 42 87 100
do
    for bs in 2 4 8
    do
        for lr in 1e-5 2e-5 5e-5
        do
            TAG=exp \
            TYPE=prompt-demo \
            TASK=SST-2 \
            BS=$bs \
            LR=$lr \
            SEED=$seed \
            MODEL=roberta-large \
            bash run_experiment.sh
        done
    done
done

All the results will be stored in ./log. To gather all the results, run the following command:

python tools/gather_result.py --condition "{'tag': 'exp', 'task_name': 'sst-2', 'few_shot_type': 'prompt-demo'}"

Then the program will find all the trials that satisfy the condition in ./log and print the mean/std of the final results. Note that the task names are all lower-cased, and if the task has more than one metric, you need to specify the major metric (used for picking the best validation trial) in the name (e.g., mnli, mnli-mm, mrpc/acc, mrpc/f1, qqp/acc, qqp/f1, sts-b/pearson, sts-b/spearman).
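
For reference, here is a minimal sketch of this aggregation protocol (not the actual gather_result.py implementation): for each seed, pick the trial with the best dev score, then report the mean/std of the corresponding test scores. The trials list below holds hypothetical per-run records purely for illustration.

import statistics

# Hypothetical per-run records: one dict per (seed, hyper-parameter) trial.
trials = [
    {"seed": 13, "dev": 0.90, "test": 0.89},
    {"seed": 13, "dev": 0.92, "test": 0.91},
    {"seed": 42, "dev": 0.91, "test": 0.90},
]

# For each seed, keep the trial with the best dev score.
best_per_seed = {}
for t in trials:
    cur = best_per_seed.get(t["seed"])
    if cur is None or t["dev"] > cur["dev"]:
        best_per_seed[t["seed"]] = t

# Report mean/std of the selected test scores across seeds.
test_scores = [t["test"] for t in best_per_seed.values()]
print(statistics.mean(test_scores), statistics.stdev(test_scores))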

Using demonstrations with filtering

To use the filtering mechanism when using demonstrations, we first need to generate Sentence-BERT embeddings. To generate embeddings for the datasets in our paper, you can directly run

bash tools/get_sbert_embedding.sh roberta-large

roberta-large can also be replaced by bert-base, bert-large, roberta-base and distilbert-base (see Sentence Transformers for details). See tools/get_sbert_embedding.sh and tools/get_sbert_embedding.py if you want to add more datasets.

After generating the embeddings (saved as numpy files in the data folders), we can run the following command to do prompt-based fine-tuning with filtered demonstrations:

TAG=exp TYPE=prompt-demo TASK=SST-2 BS=2 LR=1e-5 SEED=42 MODEL=roberta-large bash run_experiment.sh "--demo_filter --demo_filter_model sbert-roberta-large"
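
A minimal sketch of the filtering idea, assuming the query and support embeddings are the pre-computed Sentence-BERT vectors mentioned above (an illustration only, not the repository's dataloader code): rank the support examples by cosine similarity to the query and keep only the top fraction per label.

import numpy as np

def filter_demonstrations(query_emb, support_embs, support_labels, rate=0.5):
    # Cosine similarity between the query and every support example.
    sims = support_embs @ query_emb / (
        np.linalg.norm(support_embs, axis=1) * np.linalg.norm(query_emb) + 1e-8)
    order = np.argsort(-sims)  # most similar first
    # Keep at most `rate` of the support examples for each label.
    labels = list(set(support_labels))
    limit = {l: int(sum(1 for x in support_labels if x == l) * rate) for l in labels}
    kept, counts = [], {l: 0 for l in labels}
    for idx in order:
        label = support_labels[idx]
        if counts[label] < limit[label]:
            counts[label] += 1
            kept.append(idx)
    return kept  # indices of the retained demonstrations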

Automatically searched prompt

We provide our automatic search results in auto_template and auto_label_mapping. There are three types of files:

  • SST-2/16-42.txt: Initial search results for SST-2 dataset, K=16 and SEED=42.
  • SST-2/16-42.sort.txt: The initial results after prompt-based fine-tuning, sorted by dev set performance.
  • SST-2/16-42.score.txt: Same as above, but with dev set scores.

To use the best automatic template (auto-T in the paper), use the following command:

TAG=exp TYPE=prompt-demo TASK=SST-2 BS=2 LR=1e-5 SEED=42 MODEL=roberta-large bash run_experiment.sh "--template_path auto_template/SST-2/16-42.sort.txt --template_id 0"

You can also use the i-th automatic result by specifying a different template_id.

Similarly, to use automatic label words (auto-L in the paper), use the following command:

TAG=exp TYPE=prompt-demo TASK=SST-2 BS=2 LR=1e-5 SEED=42 MODEL=roberta-large bash run_experiment.sh "--mapping_path auto_label_mapping/SST-2/16-42.sort.txt --mapping_id 0"

NOTE: Make sure to use the automatic search results corresponding to each data split seed.

Our final results (LM-BFF) combine prompt-based fine-tuning with demonstrations, filtering, and automatic templates. For example:

for seed in 13 21 42 87 100
do
    for bs in 2 4 8
    do
        for lr in 1e-5 2e-5 5e-5
        do
            TAG=LM-BFF \
            TYPE=prompt-demo \
            TASK=SST-2 \
            BS=$bs \
            LR=$lr \
            SEED=$seed \
            MODEL=roberta-large \
            bash run_experiment.sh "--template_path auto_template/SST-2/16-$seed.sort.txt --template_id 0 --demo_filter --demo_filter_model sbert-roberta-large"
        done
    done
done

python tools/gather_result.py --condition "{'tag': 'LM-BFF', 'task_name': 'sst-2', 'few_shot_type': 'prompt-demo'}"

Search for automatic templates

If you want to try automatically generating templates by yourself, here are the instructions. Note that it is an extremely long process :)

To get automatic templates, we first generate template candidates by using T5:

python tools/generate_template.py \
    --output_dir my_auto_template \
    --task_name SST-2 \
    --seed 13 21 42 87 100 \
    --t5_model t5-3b \
    --beam 100

Here --t5_model specifies the pre-trained T5 checkpoint to use and --beam specifies the beam search width. Note that the t5-3b model takes approximately 15GB of GPU memory; if your GPU cannot fit it, you can try smaller T5 models (e.g., t5-base).

Then we do prompt-based fine-tuning with all the templates:

for template_id in {0..99}
do
    for seed in 13 21 42 87 100
    do
        # To save time, we fix these hyper-parameters
        bs=8
        lr=1e-5

        # Since we only use dev performance here, use --no_predict to skip testing
        TAG=exp-template \
        TYPE=prompt \
        TASK=SST-2 \
        BS=$bs \
        LR=$lr \
        SEED=$seed \
        MODEL=roberta-large \
        bash run_experiment.sh "--template_path my_auto_template/SST-2/16-$seed.txt --template_id $template_id --no_predict"
    done
done

... and sort them based on dev set performance:

python tools/sort_template.py --condition "{'tag': 'exp-template', 'task_name': 'sst-2'}" --template_dir my_auto_template

The sorted results will be saved in my_auto_template, with the same format as described in Automatically searched prompt.

Search for automatic label word mappings

Similar to the process of automatic template search, we first generate candidate label word mappings by running:

bash tools/run_generate_labels.sh

You can modify the options in tools/run_generate_labels.sh to run this for different datasets or save mappings to different directories. After running the generation, the candidate label mappings will be saved in my_auto_label_mapping/manual_template.

Then we do prompt-based fine-tuning with all the mappings:

for mapping_id in {0..99}
do
    for seed in 13 21 42 87 100
    do
        # To save time, we fix these hyper-parameters
        bs=8
        lr=1e-5

        # Since we only use dev performance here, use --no_predict to skip testing
        TAG=exp-mapping \
        TYPE=prompt \
        TASK=SST-2 \
        BS=$bs \
        LR=$lr \
        SEED=$seed \
        MODEL=roberta-large \
        bash run_experiment.sh "--mapping_path my_auto_label_mapping/manual_template/SST-2/16-$seed.txt --mapping_id $mapping_id --no_predict"
    done
done

... and sort them based on dev set performance:

python tools/sort_mapping.py --condition "{'tag': 'exp-mapping', 'task_name': 'sst-2'}" --mapping_dir my_auto_label_mapping/manual_template

The sorted results will be saved in my_auto_label_mapping/manual_template, with the same format as described in Automatically searched prompt.

Auto T + L: We can also do a joint search of templates and label word mappings following these steps:

  1. First, do the automatic template search following Search for automatic templates.
  2. The following steps are similar to the automatic label mapping search, except for a few arguments. When running tools/run_generate_labels.sh, set LOAD_TEMPLATES to true; the template + mapping candidates will then be written to my_auto_label_mapping/auto_template.
  3. For the following fine-tuning, change --mapping_path and --mapping_id to --prompt_path and --prompt_id.
  4. In the end, for re-ranking all the prompts, change tools/sort_mapping.py to tools/sort_prompt.py to get the final lists.

Ensemble model

First we need to train models with different templates:

mkdir ensemble_predict_results
for template_id in {0..19} # Use top 20 templates
do
    array_id=0
    for seed in 13 21 42 87 100
    do
        for bs in 2 4 8
        do
            for lr in 1e-5 2e-5 5e-5
            do
                TAG=exp-ensemble \
                TYPE=prompt-demo \
                TASK=SST-2 \
                BS=$bs \
                LR=$lr \
                SEED=$seed \
                MODEL=roberta-large \
                bash run_experiment.sh "--template_path auto_template/SST-2/16-$seed.sort.txt --template_id $template_id --model_id $template_id --array_id $array_id --save_logit --save_logit_dir ensemble_predict_results"

                array_id=$(expr $array_id + 1)
            done
        done
    done
done

Looks a little complicated? It's actually pretty easy to understand: --model_id and --array_id are used to distinguish different runs, and --save_logit tells the program to save the prediction results for ensembling.

After finishing the experiments, use the following command to get the ensemble results:

python tools/ensemble.py --condition "{'tag': 'exp-ensemble', 'task_name': 'sst-2', 'few_shot_type': 'prompt-demo'}" --n_models 20

where --n_models specifies how many models to use for the ensemble (it should match the number of templates used in the experiments).
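
For illustration, here is a minimal sketch of the ensembling step (not the actual tools/ensemble.py): load the logits saved by --save_logit for each model, average them, and take the argmax. It assumes each file holds an (n_examples, n_labels) numpy array.

import numpy as np

def ensemble_predict(logit_files):
    # One (n_examples, n_labels) array per model, saved via --save_logit.
    logits = [np.load(path) for path in logit_files]
    avg = np.mean(np.stack(logits, axis=0), axis=0)
    return avg.argmax(axis=-1)  # final ensemble predictions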

Zero-shot experiments

It's easy to run zero-shot experiments: just add the --no_train argument:

TAG=zero-shot TYPE=prompt TASK=SST-2 BS=2 LR=1e-5 SEED=42 MODEL=roberta-large bash run_experiment.sh "--no_train"

To do "GPT-3 style" in-context learning:

TAG=gpt3-in-context TYPE=prompt-demo TASK=SST-2 BS=2 LR=1e-5 SEED=42 MODEL=roberta-large bash run_experiment.sh "--no_train --num_sample 1 --gpt3_in_context_head --gpt3_in_context_num 32 --truncate_head --use_full_length"

How to design your own templates

Here are two template examples:

For SST-2: *cls**sent_0*_It_was*mask*.*sep+* => [CLS] {S0} It was [MASK]. [SEP]

For MNLI: *cls**sent-_0*?*mask*,*+sentl_1**sep+* => [CLS] {S0}? [MASK], {S1} [SEP]

The template is composed of special tokens and variables (surrounded by *) and plain text (e.g., It_was, where spaces are replaced by _). The special tokens and variables include the following (a sketch showing how such a template might be expanded appears after the list):

  • *cls*, *sep*, *sep+* and *mask*: Special tokens of CLS, SEP and MASK (different for different pre-trained models and tokenizers). *sep+* means the contents before and after this token have different segment embeddings (only for BERT).
  • *sent_i*: The i-th sentence.
  • *sent-_i*: The i-th sentence, discarding the last character.
  • *sentl_i*: The i-th sentence, lower-casing the first letter.
  • *sentl-_i*: The i-th sentence, discarding the last character and lower-casing the first letter.
  • *+sent_i*: The i-th sentence, adding an extra space at the beginning.
  • *+sentl_i*: The i-th sentence, adding an extra space at the beginning and lower-casing the first letter.
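
To make the format concrete, here is a minimal sketch of how such a template string could be expanded into text. This is an illustration only, not the repository's actual template parser; it handles just a few of the variables listed above and uses generic [CLS]/[SEP]/[MASK] strings.

def render(template, sentences, cls="[CLS]", sep="[SEP]", mask="[MASK]"):
    out = []
    for part in template.split("*"):
        if part == "":
            continue
        if part == "cls":
            out.append(cls)
        elif part in ("sep", "sep+"):
            out.append(sep)
        elif part == "mask":
            out.append(mask)
        elif part.startswith("sent_"):
            out.append(sentences[int(part.split("_")[1])])
        else:
            # Plain text: underscores stand for spaces.
            out.append(part.replace("_", " ").strip())
    return " ".join(out)

print(render("*cls**sent_0*_It_was*mask*.*sep+*", ["A fun ride."]))
# -> [CLS] A fun ride. It was [MASK] . [SEP]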

Bugs or questions?

If you have any questions related to the code or the paper, feel free to email Tianyu ([email protected]). If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please try to describe the problem in detail so we can help you better and more quickly!

Citation

Please cite our paper if you use LM-BFF in your work:

@inproceedings{gao2021making,
   title={Making Pre-trained Language Models Better Few-shot Learners},
   author={Gao, Tianyu and Fisch, Adam and Chen, Danqi},
   booktitle={Association for Computational Linguistics (ACL)},
   year={2021}
}
Comments
  • Testing: New Data for GLUE Tasks

    Now, I can see that there was another issue similar to this. However, I am still not clear on how to deal with OOD Test Data.

    I want to train and validate on the original train.tsv and dev.tsv in the ORIGINAL folder, but I want to test on an out-of-distribution dataset.

    So, let's say I want to test SST-2 on IMDB for roberta-base. How should I go about it? Currently, I replace test.tsv in the ORIGINAL folder and generate K-shot data. Then I run the file using the commands given in the README on the repo page. However, the test eval accuracy is the same as on the original SST-2 test set. I don't know what is happening here. To reiterate:

    My objective:

    1. Test IMDB with roberta-base on the seed-42 SST-2 split, but train and validate on the original data provided with the repo.

    Action:

    1. Replace test.tsv of ORIGINAL SST-2 with IMDB.

    Observed Behaviour:

    1. The test eval accuracy is the same as the original, as if test.tsv had not been replaced.

    Expected Behaviour:

    1. Same train and dev accuracy, different test accuracy.

    Request:

    1. Please help :) We changed the original test.tsv and then generated the K-shot data again, but there was no change.
    opened by YashBit 6
  • Is there a way to deal with label words with multiple tokens?

    Hi,

    It seems like the model mainly deals with English, and most labels contain only 1 token. However, in Chinese tasks it's quite common for labels to contain multiple tokens.

    I found in https://github.com/princeton-nlp/LM-BFF/blob/main/src/models.py#L75, the code says,

    sequence_output, pooled_output = outputs[:2]
    sequence_mask_output = sequence_output[torch.arange(sequence_output.size(0)), mask_pos]
    

    Here mask_pos has shape [batch_size,]. Is there a way I can make mask_pos have shape [batch_size, label_word_length] and use it to calculate the loss for multi-token labels?

    opened by ryangawei 6
  • num_k parameter is not used

    run.py accepts the parameter num_k, which should control how many training examples per class we have. However, it's not used anywhere in the code; it seems that num_sample is used instead.

    opened by dyukha 4
  • Bug in dataloader ?

    Hi guys, I am trying to reproduce your work. In the dataloader, I found this code:

    for sample_idx in range(self.num_sample):
        for query_idx in range(len(self.query_examples)):
            # If training, exclude the current example. Else keep all.
            if self.use_demo and args.demo_filter:
                # Demonstration filtering
                candidate = [support_idx for support_idx in support_indices
                               if support_idx != query_idx or mode != "train"]
                sim_score = []
                for support_idx in candidate:
                    sim_score.append((support_idx, util.pytorch_cos_sim(self.support_emb[support_idx], self.query_emb[query_idx])))
                sim_score.sort(key=lambda x: x[1], reverse=True)
                if self.num_labels == 1:
                    # Regression task
                    limit_each_label = int(len(sim_score) // 2 * args.demo_filter_rate)
                    count_each_label = {'0': 0, '1': 0}
                    context_indices = []
    
                    if args.debug_mode:
                        print("Query %s: %s" % (self.query_examples[query_idx].label, self.query_examples[query_idx].text_a)) # debug
                    for support_idx, score in sim_score:
                        if count_each_label['0' if float(self.support_examples[support_idx].label) <= median_mapping[args.task_name] else '1'] < limit_each_label:
                            count_each_label['0' if float(self.support_examples[support_idx].label) <= median_mapping[args.task_name] else '1'] += 1
                            context_indices.append(support_idx)
                            if args.debug_mode:
                                print("    %.4f %s | %s" % (score, self.support_examples[support_idx].label, self.support_examples[support_idx].text_a)) # debug
                else:
                    limit_each_label = int(len(sim_score) // self.num_labels * args.demo_filter_rate)
                    count_each_label = {label: 0 for label in self.label_list}
                    context_indices = []
    
                    if args.debug_mode:
                        print("Query %s: %s" % (self.query_examples[query_idx].label, self.query_examples[query_idx].text_a)) # debug
                    for support_idx, score in sim_score:
                        if count_each_label[self.support_examples[support_idx].label] < limit_each_label:
                            count_each_label[self.support_examples[support_idx].label] += 1
                            context_indices.append(support_idx)
                            if args.debug_mode:
                                print("    %.4f %s | %s" % (score, self.support_examples[support_idx].label, self.support_examples[support_idx].text_a)) # debug
            else:
                # Using demonstrations without filtering
                context_indices = [support_idx for support_idx in support_indices
                           if support_idx != query_idx or mode != "train"]
    
            # We'll subsample context_indices further later.
            self.example_idx.append((query_idx, context_indices, sample_idx))
    

    Here it is calculating the similarity, but I don't understand why the loop for sample_idx in range(self.num_sample) is at the outermost level: sample_idx is only used when appending the result to self.example_idx.

    This code is really slow, since num_sample is set to 16.

    I think you can remove the outer for sample_idx in range(self.num_sample) loop and change the last lines to:

    for query_idx in range(len(self.query_examples)):
        ....
        # We'll subsample context_indices further later.
        for sample_idx in range(self.num_sample):
            self.example_idx.append((query_idx, context_indices, sample_idx))
    

    I don't know whether I am right.

    opened by Brandonnogithub 3
  • Question about prompt-based finetuning and automatic selection of label words

    In the paper, it mentions "Let M: Y → V be a mapping from the task label space to individual words in the vocabulary V of L." Here, is V the set of individual words or of individual sub-words?

    I noticed that many auto-generated label words, such as "unforgettable/extraordinary/good/better/terrible" in SST-5 (Table E.1), are very long and should not be a single sub-word (from the view of a RoBERTa tokenizer). Then it seems that each label may contain multiple sub-words. In this case, the following sentence is confusing: "Then for each x_in, let the manipulation x_prompt = T(x_in) be a masked language modeling (MLM) input which contains one [MASK] token." I'm not sure how one [MASK] token can reconstruct multiple tokens (sub-words), like "unforgettable".

    This issue is also related to the automatic selection of label words, to determine whether we are searching over all the sub-words or all the words.

    Could the authors clarify this detail?

    opened by pzzhang 3
  • ImportError: cannot import name 'BertOnlyMLMHead' from 'transformers'

    I installed the environment using pip install -r requirements.txt

    However, when I ran the example code,

    python run.py \
        --task_name SST-2 \
        --data_dir data/k-shot/SST-2/16-42 \
        --overwrite_output_dir \
        --do_train \
        --do_eval \
        --do_predict \
        --evaluate_during_training \
        --model_name_or_path roberta-large \
        --few_shot_type prompt-demo \
        --num_k 16 \
        --max_steps 1000 \
        --eval_steps 100 \
        --per_device_train_batch_size 2 \
        --learning_rate 1e-5 \
        --num_train_epochs 0 \
        --output_dir result/tmp \
        --seed 42 \
        --template "*cls**sent_0*_It_was*mask*.*sep+*" \
        --mapping "{'0':'terrible','1':'great'}" \
        --num_sample 16 \
    

    I encountered the following error:

    Traceback (most recent call last):
      File "run.py", line 19, in <module>
        from src.models import BertForPromptFinetuning, RobertaForPromptFinetuning, resize_token_type_embeddings
      File "/home/xuhui/project/LM-BFF/src/models.py", line 6, in <module>
        from transformers import BertPreTrainedModel, BertForSequenceClassification, BertModel, BertOnlyMLMHead
    ImportError: cannot import name 'BertOnlyMLMHead' from 'transformers'
    

    I made sure that transformers==3.4.0. Any chance you have more insight on this?

    opened by XuhuiZhou 3
  • ValueError Some specified arguments are not used by the HfArgumentParser

    When running run.py as in the Quick start section under Run LM-BFF, I get

    raise ValueError(f"Some specified arguments are not used by the HfArgumentParser: {remaining args}"

    I believe it is complaining about --evaluate_during_training. Is this deprecated? Should this simply be removed, or is there a replacement? I have had to fix the paths to Bert and Roberta in the imports so I think this just has to do with the evolution of transformers since this was originally published.

    opened by demongolem-biz 3
  • [SST-2] Different test.tsv for original and k-shot data

    Hello, I noticed that test.tsv for the k-shot data contains 872 sentences, whereas the original folder contains 1,820 for the SST-2 task. Is this the correct behavior? Perhaps to save on computation time?

    opened by skull8888888 3
  • Bug in FewShotDataset module when accessing TextClassificationProcessor

    Hi, there is a bug in this line. For the text classification datasets (mr, sst-5, subj, trec, cr, mpqa), the processor module requires the task name as input. Right now the line above throws an error.

    opened by ramakanth-pasunuru 3
  • Bug in get_sbert_embedding.py?

    Hi,

    When I run get_sbert_embedding.sh, I get the error below.

    urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/models/sentence-transformers/roberta-large-nli-stsb-mean-tokens (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1131)')))

    Is this a problem with your code or with sentence-transformers? Thanks a lot!

    opened by lancorrect 2
  • Bug in tools/ensemble.py/get_labels

    The function returns label_ids. However, when print_name is in ['sst-5', 'mr', 'cr', 'mpqa', 'subj', 'trec'], label_ids is not defined. I guess it should return labels in this branch?

    opened by dyukha 2
  • The issue for the loss of regression tasks

    https://github.com/princeton-nlp/LM-BFF/blob/c282f521001f9c299d29eec7b459266f2b14fbaf/src/models.py#L184

    Hi Tianyu, thank you for releasing the code. I found one problem with the loss for regression tasks.

    "logits" is through the operation "logsoftmax" and is in [-infinite, 0], while "labels" is not through that operation and is always greater than 0. They are not in the same space, so I think here log_target cannot be True. Or the "labels" should be operated by "torch.log(labels+very small number)".

    What do you think of it?

    Look forward to your reply.

    Best wishes, Chuan

    opened by ChuanMeng 1
Owner
Princeton Natural Language Processing