Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections
Ruiqi Zhong, Kristy Lee*, Zheng Zhang*, Dan Klein
EMNLP 2021 Findings, https://arxiv.org/abs/2104.04670
Data
Please download the dataset from here: https://drive.google.com/file/d/1hrLlpk6Pla95Bnv_e1MAhCx7uJSDgA-w/view?usp=sharing
If you are using this dataset, please cite all the papers listed in the custom_citations.txt, anthology_citations.txt, and urls.txt files in the citations folder. Thanks!
Each datapoint is represented as a dictionary.
{"q": [label description], "c": [text input], "a": [0 or 1]}
,
where "q" stands for question, which contains label information, "c" stands for context, which contains the input text, "a" stands for answer, which is either 1 (Yes) or 0 (No).
training_dicts/ contains all the datasets for training; each .pkl file is a list of datapoints.
testing_dicts/ contains all the datasets for evaluation; each .pkl file is a map from (label, label description) to a list of datapoints.
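As a minimal sketch of reading the two formats (the filenames below are placeholders for whichever .pkl files are in the download):

```python
import pickle

# Load one training dataset: a list of datapoint dicts.
# "some_dataset.pkl" is a placeholder filename.
with open("training_dicts/some_dataset.pkl", "rb") as f:
    train_points = pickle.load(f)
print(train_points[0]["q"], train_points[0]["c"], train_points[0]["a"])

# Load one evaluation dataset: a map from (label, label description)
# to a list of datapoints.
with open("testing_dicts/some_dataset.pkl", "rb") as f:
    test_map = pickle.load(f)
for (label, description), points in test_map.items():
    print(label, description, len(points))
```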
Datasets whose filenames start with the same group number are considered similar. Note that, for a given dataset, the training and testing splits may contain overlapping datapoints; this is fine, since we never train and test on the same dataset.
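For instance, here is a sketch of recovering the groups from the filename prefixes; the "&lt;group&gt;_&lt;name&gt;.pkl" pattern is an assumption about the naming scheme, so adjust the parsing to match the actual files:

```python
import os
from collections import defaultdict

# Group dataset files by the group number at the front of each
# filename; assumes names look like "<group>_<name>.pkl".
groups = defaultdict(list)
for fname in os.listdir("testing_dicts"):
    groups[fname.split("_")[0]].append(fname)

# Datasets within the same group are considered similar, so a model
# meta-tuned on one of them should not be tested on the others.
for group, files in sorted(groups.items()):
    print(group, files)
```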
Additionally, to speed up evaluation, we subsampled many of the test datasets, so the numbers will not be directly comparable to those reported in other papers.
Specialized Models are Better
Meta-tune a model that is initialized with T5-large and test it on unseen (non-similar) datasets
python3 default_train.py large
Test UnifiedQA on all datasets used for evaluation
python3 baseline.py large
Evaluate and compare the meta-tuned model and the UnifiedQA baseline with AUC-ROC for each label description.
python3 evaluate_and_plot.py large
We should expect to see that the meta-tuned model is better than the UnifiedQA baseline on the majority of label descriptions.
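For reference, AUC-ROC for a single label description can be computed as in this sketch (the gold answers and predicted probabilities below are placeholders, not real model outputs):

```python
from sklearn.metrics import roc_auc_score

# For one label description: gold 0/1 answers and each model's
# predicted probability of "Yes". All values here are placeholders.
gold = [1, 0, 1, 1, 0]
meta_tuned_probs = [0.9, 0.2, 0.7, 0.8, 0.4]
unifiedqa_probs = [0.6, 0.5, 0.4, 0.9, 0.3]

print("meta-tuned AUC-ROC:", roc_auc_score(gold, meta_tuned_probs))
print("UnifiedQA  AUC-ROC:", roc_auc_score(gold, unifiedqa_probs))
```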
Larger Models are Better
We can train another, smaller model using the command
python3 default_train.py base
and then compare the large and base models by modifying evaluate_and_plot.py.
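One way to sketch that comparison is a per-description scatter plot; the score dictionaries below are hypothetical stand-ins for whatever evaluate_and_plot.py computes, not the script's actual code:

```python
import matplotlib.pyplot as plt

# Per-label-description AUC-ROC for each model size; the values
# below are hypothetical placeholders.
base_auc = {"desc A": 0.62, "desc B": 0.71, "desc C": 0.58}
large_auc = {"desc A": 0.68, "desc B": 0.75, "desc C": 0.66}

descriptions = sorted(base_auc)
xs = [base_auc[d] for d in descriptions]
ys = [large_auc[d] for d in descriptions]

# Points above the diagonal are descriptions where the large model wins.
plt.scatter(xs, ys)
plt.plot([0, 1], [0, 1], linestyle="--")
plt.xlabel("base model AUC-ROC")
plt.ylabel("large model AUC-ROC")
plt.show()
```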