Meta-learning for NLP

Overview

Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks

Code for training the meta-learning models and fine-tuning on downstream tasks. If you use this code, please cite the paper.

Paper: Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks

@inproceedings{bansal2020self,
  title={Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks},
  author={Bansal, Trapit and Jha, Rishikesh and Munkhdalai, Tsendsuren and McCallum, Andrew},
  booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  pages={522--534},
  year={2020}
}

Trained Models

Dependencies

  • Python version 3.6.6 or higher
  • Tensorflow version 1.12.0 (higher versions might not work)
  • Numpy 1.16.4 or higher
  • six 1.12.0

Running pip install -r requirements.txt should install the required dependencies. It is recommended to use a conda environment, and to make sure you use the pip installed in that environment.

Fine-Tuning

A script is provided to run fine-tuning for a target task; by default it runs fine-tuning on CoNLL. The script will download all the necessary data and models. If any download fails, please download the files manually using the links.

Fine-tuning runs on a single GPU and typically takes a few minutes.

Run the script as: ./run_finetune.sh

Modify the following parameters in run_finetune.sh to run on a different task, or a different k-shot, or a different file split for the task:

  • TASK_NAME: should be one of: airline, conll, disaster, emotion, political_audience, political_bias, political_message, rating_books, rating_dvd, rating_electronics, rating_kitchen, restaurant, scitail, sentiment_books, sentiment_dvd, sentiment_electronics, sentiment_kitchen
  • DATA_DIR: path to the data directory (e.g., data/leopard-master/data/tf_record/${TASK_NAME})
  • F: train file split id, should be in [0, 9]
  • K: which k-shot experiment to run, should be in {4, 8, 16, 32}
  • N: number of classes in the task (see paper if not known)

So, to run fine-tuning on a particular split of a task, the command is: ./run_finetune.sh TASK_NAME F K N
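
For example, to fine-tune on CoNLL using train file split 0 and K=16, the command would be ./run_finetune.sh conll 0 16 4 (this assumes CoNLL has N=4 classes in this setup; see the paper for the exact number of classes for each task).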

To change the output directory or other arguments, edit the corresponding arguments in run_finetune.sh.

Hyper-parameters for Hybrid-SMLMT

  • K = 4:
    --num_train_epochs=150*N
    --train_batch_size=4*N

  • K = 8:
    --num_train_epochs=175*N
    --train_batch_size=8*N

  • K = 16:
    --num_train_epochs=200*N
    --train_batch_size=4*N

  • K = 32:
    --num_train_epochs=100*N
    --train_batch_size=8*N
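
Note that these values are expressed as multiples of N, the number of classes in the target task; presumably the product is what gets passed as the actual flag value, e.g., --num_train_epochs=800 and --train_batch_size=16 for a 4-class task at K = 16.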

Data for fine-tuning

The data for the fine-tuning tasks can be downloaded from https://github.com/iesl/leopard

Fine-tuning on other tasks

To run fine-tuning on a task other than the ones provided with the code, you will need to set up the train and test data for the task as tf_record files, similar to the data for the provided tasks.

The features in the tf_record are:

name_to_features = {
    "input_ids": tf.FixedLenFeature([128], tf.int64),
    "input_mask": tf.FixedLenFeature([128], tf.int64),
    "segment_ids": tf.FixedLenFeature([128], tf.int64),
    "label_ids": tf.FixedLenFeature([], tf.int64),
}

where:

  • input_ids: the input sequence tokenized using the BERT tokenizer
  • input_mask: mask of 0/1 corresponding to the input_ids
  • segment_ids: 0/1 segment ids following BERT
  • label_ids: classification label

Note that the above features are the same as those used in the BERT fine-tuning code for classification, so the code in the BERT GitHub repository can be used to create the tf_record files.
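
For reference, below is a minimal sketch of writing such a tf_record file with the TF 1.x API. The function names and the input dictionary format are illustrative, not part of this repository; inputs are assumed to be already tokenized and padded to length 128 with the BERT tokenizer.

import collections
import tensorflow as tf

def create_int_feature(values):
    # Wrap a list of ints as a tf.train.Feature (same helper style as the BERT repo).
    return tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))

def write_task_tf_record(examples, output_file):
    # `examples` is assumed to be an iterable of dicts with length-128 int lists under
    # "input_ids", "input_mask", "segment_ids" and a single int under "label_id".
    writer = tf.python_io.TFRecordWriter(output_file)
    for ex in examples:
        features = collections.OrderedDict()
        features["input_ids"] = create_int_feature(ex["input_ids"])
        features["input_mask"] = create_int_feature(ex["input_mask"])
        features["segment_ids"] = create_int_feature(ex["segment_ids"])
        features["label_ids"] = create_int_feature([ex["label_id"]])
        writer.write(tf.train.Example(
            features=tf.train.Features(feature=features)).SerializeToString())
    writer.close()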

The following arguments to run_classifier_pretrain.py need to be set:

  • task_eval_files: train_tf_record, eval_tf_record
    • where train_tf_record is the train file for the task and eval_tf_record is the test file
  • test_num_labels: number of classes in the task
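
Beyond these, run_classifier_pretrain.py still expects its usual required flags (vocab_file, bert_config_file, output_dir, etc.), so starting from run_finetune.sh, which already invokes it for the provided tasks, and changing only these two arguments is likely the easiest route.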

LEOPARD Fine-tuning

Hyper-parameters for the LEOPARD model:

  • K = 4:
    --num_train_epochs=150*N
    --train_batch_size=2*N

  • K = 8:
    --num_train_epochs=200*N
    --train_batch_size=2*N

  • K = 16:
    --num_train_epochs=200*N
    --train_batch_size=4*N

  • K = 32:
    --num_train_epochs=50*N
    --train_batch_size=2*N

In addition, set the argument warp_layers=false for fine-tuning the LEOPARD model.

Meta-Training

This requires a long training time and should typically be run on multiple GPUs.

SMLMT data file names should begin with "meta_pretrain" and end with the value of N for the tasks in that file (one file per N), for example "meta_pretrain_3.tf_record" for 3-way tasks. The training code takes train_batch_size examples at a time, starting from the beginning of the file (without shuffling), and treats each such block as one task for training.
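
As a quick sanity check on an SMLMT file (this is not part of the training code; the file name and batch size below are illustrative, matching the example command further down), you can count its records and see how many tasks would be formed from consecutive blocks of train_batch_size examples:

import tensorflow as tf

smlmt_file = "meta_pretrain_3.tf_record"  # one file per N; here N = 3
train_batch_size = 90                     # consecutive examples grouped into one task

# tf_record_iterator is the TF 1.x record reader; it yields serialized examples in order.
num_records = sum(1 for _ in tf.python_io.tf_record_iterator(smlmt_file))
print("%d records -> %d full tasks of %d examples"
      % (num_records, num_records // train_batch_size, train_batch_size))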

Meta-training can be run using the following command:

python run_classifier_pretrain.py \
    --do_train=true \
    --task_train_files=${TRAIN_FILES} \
    --num_train_epochs=1 \
    --save_checkpoints_steps=5000 \
    --max_seq_length=128 \
    --task_eval_files=${TASK_EVAL_FILES} \
    --tasks_per_gpu=1 \
    --num_eval_tasks=1 \
    --num_gpus=4 \
    --learning_rate=1e-05 \
    --train_lr=1e-05 \
    --keep_prob=0.9 \
    --attention_probs_dropout_prob=0.1 \
    --hidden_dropout_prob=0.1 \
    --SGD_K=1 \
    --meta_batchsz=80 \
    --num_batches=8 \
    --train_batch_size=90 \
    --min_layer_with_grad=0 \
    --train_word_embeddings=true \
    --use_pooled_output=true \
    --output_layers=2 \
    --update_only_label_embedding=true \
    --use_euclidean_norm=false \
    --label_emb_size=256 \
    --stop_grad=true \
    --eval_batch_size=90 \
    --eval_examples_per_task=2000 \
    --is_meta_sgd=true \
    --data_sqrt_sampling=true \
    --deep_set_layers=0 \
    --activation_fn=tanh \
    --clip_lr=true \
    --inner_epochs=1 \
    --warp_layers=true \
    --min_inner_steps=5 \
    --average_query_every=3 \
    --weight_query_loss=true \
    --output_dir=${output_dir} \
    --pretrain_task_weight=0.5

References

Code is based on the public repository: https://github.com/google-research/bert

Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805, 2018.

Comments
  • The task-specific weights for h_phi are not adapted in the inner loop

    Hello. I ran the code in the repository and have some questions about it:

    1. I used the script to run fine-tuning on the CoNLL task and noticed that the weights of the 2-layer MLP (denoted as h_phi in the paper) do not change as training goes on. I found that the reason is that the per-layer learning rates for h_phi are initialized to 0 and set to untrainable (see the screenshot in the original issue).

    This means that the task-specific weights for h_phi are not adapted in the inner loop, which is inconsistent with what the paper says.

    2. The learning rates for the 2-layer MLP (denoted as g_psi in the paper) seem redundant, as they are not used in the adaptation phase.
    opened by xuegsh 1
  • Unavailability of K=32 shot data

    Hi,

    From https://github.com/iesl/leopard, it looks like data for K=32 is missing. Is there a workaround for this issue?

    When I try to fine-tune the models, I get this result:

    2022-05-07 12:15:51.681381: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2022-05-07 12:15:51.685349: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
    2022-05-07 12:15:51.685615: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x2a7e100 executing computations on platform Host. Devices:
    2022-05-07 12:15:51.685654: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint8 = np.dtype([("qint8", np.int8, 1)])
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint16 = np.dtype([("qint16", np.int16, 1)])
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint32 = np.dtype([("qint32", np.int32, 1)])
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      np_resource = np.dtype([("resource", np.ubyte, 1)])
    run_classifier_pretrain.py:1763: UserWarning: Flag --data_dir has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
      flags.mark_flag_as_required("data_dir")
    run_classifier_pretrain.py:1764: UserWarning: Flag --task_train_files has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
      flags.mark_flag_as_required("task_train_files")
    run_classifier_pretrain.py:1765: UserWarning: Flag --vocab_file has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
      flags.mark_flag_as_required("vocab_file")
    run_classifier_pretrain.py:1766: UserWarning: Flag --bert_config_file has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
      flags.mark_flag_as_required("bert_config_file")
    run_classifier_pretrain.py:1767: UserWarning: Flag --output_dir has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
      flags.mark_flag_as_required("output_dir")
    INFO:tensorflow:GPU available: False
    INFO:tensorflow:Device is available but not used by distribute strategy: /device:XLA_CPU:0
    WARNING:tensorflow:Not all devices in `tf.distribute.Strategy` are visible to TensorFlow.
    WARNING:tensorflow:From run_classifier_pretrain.py:1537: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use eager execution and: 
    `tf.data.TFRecordDataset(path)`
    Traceback (most recent call last):
      File "run_classifier_pretrain.py", line 1768, in <module>
        tf.app.run()
      File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
        _sys.exit(main(argv))
      File "run_classifier_pretrain.py", line 1688, in main
        neval_examples, nexamples_per_file_eval_train = read_data_sizes_from_tfrecord([task_eval_files[0]])
      File "run_classifier_pretrain.py", line 1537, in read_data_sizes_from_tfrecord
        for record in tf.python_io.tf_record_iterator(fn):
      File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/lib/io/tf_record.py", line 174, in tf_record_iterator
        compat.as_bytes(path), 0, compat.as_bytes(compression_type), status)
      File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
        c_api.TF_GetCode(self.status.status))
    tensorflow.python.framework.errors_impl.NotFoundError: data/leopard-master/data/tf_record/airline/airline_train_4_32.tf_record; No such file or directory
    
    opened by avyavkumar 0
  • how do you set these hyperparameters?

    Hi,

    Thanks for the well-organized repository and for sharing your code. Your paper about self-supervised meta-learning is interesting. I have a few questions and would appreciate your help in understanding some details of the paper.

    In your paper, you set "Support samples per task = 80", "Query samples per task = 10", "Adaptation Steps (G) = 7", and "Meta-training Epochs = 1". Meanwhile, in your code, you fix the eval epochs to 5 by setting "inner_epochs = 5". I wonder how you chose these hyperparameters.

    Also, does this mean that every SMLMT task used for meta-training contains exactly 80 + 10 examples?

    Thanks.

    opened by eddieee7 0