Meta-learning for NLP


Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks

Code for training the meta-learning models and fine-tuning on downstream tasks. If you use this code please cite the paper.

Paper: Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks

  title={Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks},
  author={Bansal, Trapit and Jha, Rishikesh and Munkhdalai, Tsendsuren and McCallum, Andrew},
  booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},

Trained Models


  • Python version 3.6.6 or higher
  • TensorFlow version 1.12.0 (higher versions might not work)
  • NumPy 1.16.4 or higher
  • six 1.12.0

pip install -r requirements.txt should install the required dependencies. It is recommended to use a conda environment; make sure to use the pip installed in that environment.


A script is provided to run fine-tuning for a target task; by default it runs fine-tuning on CoNLL. The script downloads all necessary data and models. If a download fails, please download the files manually using the links.

Fine-tuning runs on a single GPU and typically takes a few minutes.

Run the script as: ./

Modify the following parameters in the script to run on a different task, a different k-shot setting, or a different file split for the task:

  • TASK_NAME: should be one of: airline, conll, disaster, emotion, political_audience, political_bias, political_message, rating_books, rating_dvd, rating_electronics, rating_kitchen, restaurant, scitail, sentiment_books, sentiment_dvd, sentiment_electronics, sentiment_kitchen
  • DATA_DIR: path to data directory (e.g., data/leopard-master/data/tf_record/${TASK_NAME})
  • F: file train split id, should be in [0, 9]
  • K: which k-shot experiment to run, should be in {4, 8, 16, 32}
  • N: number of classes in the task (see paper if not known)

So, the fine-tuning run command to run on a particular split for a task is: ./ TASK_NAME F K N
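As a quick sanity check before launching a run, the parameter constraints listed above can be validated in a few lines of Python. This is only a sketch: the helper name train_record_path is hypothetical, and the file name pattern ${TASK_NAME}_train_${F}_${K}.tf_record is an assumption about how the data directory is laid out, not something the script is documented to guarantee.

```python
# Hypothetical helper: validates TASK_NAME, F and K against the ranges
# listed above and builds the path of the train split.
TASKS = {
    "airline", "conll", "disaster", "emotion", "political_audience",
    "political_bias", "political_message", "rating_books", "rating_dvd",
    "rating_electronics", "rating_kitchen", "restaurant", "scitail",
    "sentiment_books", "sentiment_dvd", "sentiment_electronics",
    "sentiment_kitchen",
}
VALID_K = {4, 8, 16, 32}


def train_record_path(task_name, f, k,
                      data_root="data/leopard-master/data/tf_record"):
    """Return the assumed path of the train tf_record for (task, F, K)."""
    if task_name not in TASKS:
        raise ValueError("unknown TASK_NAME: {}".format(task_name))
    if f not in range(10):
        raise ValueError("F must be in [0, 9], got {}".format(f))
    if k not in VALID_K:
        raise ValueError("K must be one of {}".format(sorted(VALID_K)))
    data_dir = "{}/{}".format(data_root, task_name)
    # Assumed naming pattern: <task>_train_<F>_<K>.tf_record
    return "{}/{}_train_{}_{}.tf_record".format(data_dir, task_name, f, k)


print(train_record_path("conll", 0, 8))
```

Checking that the returned path exists before starting fine-tuning gives an earlier and clearer error than a failure inside the TensorFlow record reader.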

To change the output directory or other arguments, edit the corresponding arguments in

Hyper-parameters for Hybrid-SMLMT

  • K = 4:

  • K = 8:

  • K = 16:

  • K = 32:

Data for fine-tuning

The data for the fine-tuning tasks can be downloaded from

Fine-tuning on other tasks

To run fine-tuning on a different task than provided with the code, you will need to set up the train and test data for the task in a tf_record file, similar to the data for the provided tasks.

The features in the tf_record are:

name_to_features = {
      "input_ids": tf.FixedLenFeature([128], tf.int64),
      "input_mask": tf.FixedLenFeature([128], tf.int64),
      "segment_ids": tf.FixedLenFeature([128], tf.int64),
      "label_ids": tf.FixedLenFeature([], tf.int64),
}

  • input_ids: the input sequence tokenized using the BERT tokenizer
  • input_mask: mask of 0/1 corresponding to the input_ids
  • segment_ids: 0/1 segment ids following BERT
  • label_ids: classification label

Note that the above features are the same as those used in the BERT fine-tuning code for classification, so the code in the BERT GitHub repository can be used to create the tf_record files.
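To make the four features concrete, here is a minimal pure-Python sketch of how a tokenized example could be padded to length 128 before being written to a tf_record. The function pad_features is hypothetical, and the token ids in the usage line are illustrative placeholders rather than the output of a real tokenizer. Truncation here simply cuts the sequence, which is a simplification; the BERT code truncates before adding the special tokens.

```python
def pad_features(token_ids, segment_ids=None, label_id=0, max_seq_length=128):
    """Build the four per-example features with fixed length max_seq_length.

    token_ids:   ids produced by a BERT-style tokenizer (special tokens included)
    segment_ids: 0/1 per token; defaults to all zeros (single-sentence input)
    """
    token_ids = list(token_ids)
    if segment_ids is None:
        segment_ids = [0] * len(token_ids)
    segment_ids = list(segment_ids)
    if len(segment_ids) != len(token_ids):
        raise ValueError("segment_ids and token_ids must have equal length")

    # Simplified truncation (assumption): keep the first max_seq_length ids.
    token_ids = token_ids[:max_seq_length]
    segment_ids = segment_ids[:max_seq_length]

    # Mask is 1 for real tokens and 0 for padding positions.
    input_mask = [1] * len(token_ids)
    pad = max_seq_length - len(token_ids)

    return {
        "input_ids": token_ids + [0] * pad,
        "input_mask": input_mask + [0] * pad,
        "segment_ids": segment_ids + [0] * pad,
        "label_ids": label_id,
    }


features = pad_features([101, 7592, 2088, 102], label_id=1)
print(len(features["input_ids"]), sum(features["input_mask"]))
```

Each list would then be stored as an int64 feature, matching the name_to_features specification above.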

The following arguments need to be set:

  • task_eval_files: train_tf_record, eval_tf_record
    • where train_tf_record is the train file for the task and eval_tf_record is the test file
  • test_num_labels: number of classes in the task

LEOPARD Fine-tuning

Hyper-parameters for the LEOPARD model:

  • K = 4:

  • K = 8:
    --num_train_epochs=200*N --train_batch_size=2*N

  • K = 16:
    --num_train_epochs=200*N --train_batch_size=4*N

  • K = 32:
    --num_train_epochs=50*N --train_batch_size=2*N

In addition, set the argument warp_layers=false for fine-tuning the LEOPARD model.
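Since the LEOPARD values above are given as multiples of N, a small helper can expand them for a concrete task. This is a sketch only: the function name leopard_flags is hypothetical, it reads "200*N" as 200 multiplied by the number of classes, and it covers only the K values for which flags are listed above.

```python
# (epochs multiplier, batch-size multiplier) per K, copied from the list above.
LEOPARD_MULTIPLIERS = {
    8: (200, 2),
    16: (200, 4),
    32: (50, 2),
}


def leopard_flags(k, n):
    """Return the command-line flags for a k-shot, n-class LEOPARD run."""
    if k not in LEOPARD_MULTIPLIERS:
        raise ValueError("no multipliers listed for K={}".format(k))
    if n < 1:
        raise ValueError("N must be a positive number of classes")
    epochs_mult, batch_mult = LEOPARD_MULTIPLIERS[k]
    return [
        "--num_train_epochs={}".format(epochs_mult * n),
        "--train_batch_size={}".format(batch_mult * n),
        "--warp_layers=false",
    ]


print(" ".join(leopard_flags(16, 4)))
```

For example, an 8-shot run on a 3-class task would use 600 epochs and a batch size of 6 under this reading.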


Meta-training takes a long time and should typically be run on multiple GPUs.

The SMLMT data file name should begin with "meta_pretrain" and end with the value of N for the tasks in that file (one file per N), for example "meta_pretrain_3.tf_record" for 3-way tasks. The training code takes train_batch_size examples at a time, starting from the beginning of each file (without shuffling), and treats each such group as one task for training.
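The naming and grouping convention can be illustrated with a short sketch. The helper names are hypothetical, and dropping an incomplete trailing group is an assumption; the training code may handle leftover examples differently.

```python
def smlmt_file_name(n):
    """File name for the N-way SMLMT tasks, e.g. meta_pretrain_3.tf_record."""
    return "meta_pretrain_{}.tf_record".format(n)


def parse_num_classes(file_name):
    """Recover N from a file name that follows the convention above."""
    base = file_name.rsplit("/", 1)[-1]
    stem = base.rsplit(".", 1)[0]
    if not stem.startswith("meta_pretrain"):
        raise ValueError("unexpected SMLMT file name: {}".format(file_name))
    return int(stem.rsplit("_", 1)[1])


def split_into_tasks(examples, train_batch_size):
    """Group consecutive examples (no shuffling) into tasks of fixed size.

    Assumption: a trailing group smaller than train_batch_size is dropped.
    """
    last_start = len(examples) - train_batch_size + 1
    return [
        examples[i:i + train_batch_size]
        for i in range(0, last_start, train_batch_size)
    ]


tasks = split_into_tasks(list(range(200)), train_batch_size=90)
print(smlmt_file_name(3), len(tasks))
```

Because tasks are formed from consecutive examples, the order in which examples are written to the file determines task membership, so each block of train_batch_size records should already contain one complete task.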

Meta-training can be run using the following command:

python \
    --do_train=true \
    --task_train_files=${TRAIN_FILES} \
    --num_train_epochs=1 \
    --save_checkpoints_steps=5000 \
    --max_seq_length=128 \
    --task_eval_files=${TASK_EVAL_FILES} \
    --tasks_per_gpu=1 \
    --num_eval_tasks=1 \
    --num_gpus=4 \
    --learning_rate=1e-05 \
    --train_lr=1e-05 \
    --keep_prob=0.9 \
    --attention_probs_dropout_prob=0.1 \
    --hidden_dropout_prob=0.1 \
    --SGD_K=1 \
    --meta_batchsz=80 \
    --num_batches=8 \
    --train_batch_size=90 \
    --min_layer_with_grad=0 \
    --train_word_embeddings=true \
    --use_pooled_output=true \
    --output_layers=2 \
    --update_only_label_embedding=true \
    --use_euclidean_norm=false \
    --label_emb_size=256 \
    --stop_grad=true \
    --eval_batch_size=90 \
    --eval_examples_per_task=2000 \
    --is_meta_sgd=true \
    --data_sqrt_sampling=true \
    --deep_set_layers=0 \
    --activation_fn=tanh \
    --clip_lr=true \
    --inner_epochs=1 \
    --warp_layers=true \
    --min_inner_steps=5 \
    --average_query_every=3 \
    --weight_query_loss=true \
    --output_dir=${output_dir} \


Code is based on the public repository:

Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805, 2018.

  •  The task-specific weights for h_phi are not adapted in the inner loop

    Hello, sir. I run the code in the repository and got some questions about it :

    1. I use the script to run fine-tuning on the CoNLL task and I notice that the weights of the 2-layer MLP (which is denoted as h_phi in the paper) do not change as the training goes on. And I found that the reason is that the per-layer learning rates for h_phi are initialized to 0 and are set to untrainable (screenshot not included).

    This means that the task-specific weights for h_phi are not adapted in the inner loop. It is inconsistent with what the paper says.

    2. The learning rates for the 2-layer MLP (which is denoted as g_psi in the paper) seem redundant, as they are not used in the adaptation phase.
    opened by xuegsh 1
  • Unavailability of K=32 shot data


    From, it looks like data for K=32 is missing. Is there a workaround for this issue?

    When I try to fine-tune the models, I get this result -

    2022-05-07 12:15:51.681381: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2022-05-07 12:15:51.685349: I tensorflow/core/platform/profile_utils/] CPU Frequency: 2199995000 Hz
    2022-05-07 12:15:51.685615: I tensorflow/compiler/xla/service/] XLA service 0x2a7e100 executing computations on platform Host. Devices:
    2022-05-07 12:15:51.685654: I tensorflow/compiler/xla/service/]   StreamExecutor device (0): <undefined>, <undefined>
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint8 = np.dtype([("qint8", np.int8, 1)])
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint16 = np.dtype([("qint16", np.int16, 1)])
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint32 = np.dtype([("qint32", np.int32, 1)])
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      np_resource = np.dtype([("resource", np.ubyte, 1)]) UserWarning: Flag --data_dir has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
      flags.mark_flag_as_required("data_dir") UserWarning: Flag --task_train_files has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
      flags.mark_flag_as_required("task_train_files") UserWarning: Flag --vocab_file has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
      flags.mark_flag_as_required("vocab_file") UserWarning: Flag --bert_config_file has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
      flags.mark_flag_as_required("bert_config_file") UserWarning: Flag --output_dir has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
    INFO:tensorflow:GPU available: False
    INFO:tensorflow:Device is available but not used by distribute strategy: /device:XLA_CPU:0
    WARNING:tensorflow:Not all devices in `tf.distribute.Strategy` are visible to TensorFlow.
    WARNING:tensorflow:From tf_record_iterator (from is deprecated and will be removed in a future version.
    Instructions for updating:
    Use eager execution and: 
    Traceback (most recent call last):
      File "", line 1768, in <module>
      File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/platform/", line 125, in run
      File "", line 1688, in main
        neval_examples, nexamples_per_file_eval_train = read_data_sizes_from_tfrecord([task_eval_files[0]])
      File "", line 1537, in read_data_sizes_from_tfrecord
        for record in tf.python_io.tf_record_iterator(fn):
      File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/lib/io/", line 174, in tf_record_iterator
        compat.as_bytes(path), 0, compat.as_bytes(compression_type), status)
      File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/", line 528, in __exit__
    tensorflow.python.framework.errors_impl.NotFoundError: data/leopard-master/data/tf_record/airline/airline_train_4_32.tf_record; No such file or directory
    opened by avyavkumar 0
  • how do you set these hyperparameters?


    Thanks for your organized repository and for sharing your code. Your paper about Self-Supervised Meta-Learning is interesting. I have a few questions and I appreciate it if you could help me understand more about the details of your paper.

    In your paper, you set "Support samples per task = 80", "Query samples per task = 10", "Adaptation Steps (G) = 7", and "Meta-training Epochs = 1". Meanwhile, in your code, you fix the eval epochs to 5 by setting "inner_epochs = 5". I wonder how you chose these hyperparameters.

    Besides, does this mean that every SMLMT task used for meta-training contains exactly 80 + 10 examples?


    opened by eddieee7 0