Multi-Task Deep Neural Networks for Natural Language Understanding

Xiaodong

Last update: Dec 30, 2022


Overview

New Release
We released Adversarial training for both LM pre-training/finetuning and f-divergence.

Large-scale Adversarial training for LMs: ALUM code.
If you want to use the old version, please use the following command to clone the code:
git clone -b v0.1 https://github.com/namisan/mt-dnn.git

Multi-Task Deep Neural Networks for Natural Language Understanding

This PyTorch package implements the Multi-Task Deep Neural Networks (MT-DNN) for Natural Language Understanding, as described in:

Xiaodong Liu*, Pengcheng He*, Weizhu Chen and Jianfeng Gao
Multi-Task Deep Neural Networks for Natural Language Understanding
ACL 2019
*: Equal contribution

Xiaodong Liu, Pengcheng He, Weizhu Chen and Jianfeng Gao
Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding
arXiv version

Pengcheng He, Xiaodong Liu, Weizhu Chen and Jianfeng Gao
A Hybrid Neural Network Model for Commonsense Reasoning
arXiv version

Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao and Jiawei Han
On the Variance of the Adaptive Learning Rate and Beyond
arXiv version

Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao and Tuo Zhao
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization
arXiv version

Xiaodong Liu, Yu Wang, Jianshu Ji, Hao Cheng, Xueyun Zhu, Emmanuel Awa, Pengcheng He, Weizhu Chen, Hoifung Poon, Guihong Cao, Jianfeng Gao
The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding
arXiv version

Xiaodong Liu, Hao Cheng, Pengcheng He, Weizhu Chen, Yu Wang, Hoifung Poon and Jianfeng Gao
Adversarial Training for Large Neural Language Models
arXiv version

Hao Cheng and Xiaodong Liu and Lis Pereira and Yaoliang Yu and Jianfeng Gao
Posterior Differential Regularization with f-divergence for Improving Model Robustness
arXiv version

Quickstart

Setup Environment

Install via pip:

Python 3.6
Download and install reference: https://www.python.org/downloads/release/python-360/
Install requirements
> pip install -r requirements.txt

Use docker:

Pull docker
> docker pull allenlao/pytorch-mt-dnn:v0.5
Run docker
> docker run -it --rm --runtime nvidia allenlao/pytorch-mt-dnn:v0.5 bash
Please refer to the following link if you are using docker for the first time: https://docs.docker.com/

Train a toy MT-DNN model

Download data
> sh download.sh
To download the GLUE dataset, please refer to: https://gluebenchmark.com/
Preprocess data
> sh experiments/glue/prepro.sh
Training
> python train.py

Note that we ran experiments on 4 V100 GPUs for base MT-DNN models. You may need to reduce batch size for other GPUs.

GLUE Result reproduce

MTL refinement: refine MT-DNN (shared layers), initialized with the pre-trained BERT model, via MTL using all GLUE tasks excluding WNLI to learn a new shared representation.
Note that we ran this experiment on 8 V100 GPUs (32G) with a batch size of 32.
- Preprocess GLUE data via the aforementioned script
- Training:
  >scripts\run_mt_dnn.sh
Finetuning: finetune MT-DNN to each of the GLUE tasks to get task-specific models.
Here, we provide two examples, STS-B and RTE. You can use similar scripts to finetune all the GLUE tasks.
- Finetune on the STS-B task
  > scripts\run_stsb.sh
  You should get about 90.5/90.4 on STS-B dev in terms of Pearson/Spearman correlation.
- Finetune on the RTE task
  > scripts\run_rte.sh
  You should get about 83.8 on RTE dev in terms of accuracy.

SciTail & SNLI Result reproduce (Domain Adaptation)

Domain Adaptation on SciTail
>scripts\scitail_domain_adaptation_bash.sh
Domain Adaptation on SNLI
>scripts\snli_domain_adaptation_bash.sh

Sequence Labeling Task

Preprocess data
a) Download NER data to data/ner including: {train/valid/test}.txt
b) Convert NER data to the canonical format: > python experiments\ner\prepro.py --data data\ner --output_dir data\canonical_data
c) Preprocess the canonical data to the MT-DNN format: > python prepro_std.py --root_dir data\canonical_data --task_def experiments\ner\ner_task_def.yml --model bert-base-uncased
Training
> python train.py --data_dir <data-path> --init_checkpoint <bert-base-uncased> --train_dataset ner --test_dataset ner --task_def experiments\ner\ner_task_def.yml

Question Answer Task

Preprocess data
a) Download SQuAD data to data/squad including: {train/valid}.txt and then change file name to: {squad_train/squad_dev}.json
b) Convert data to the MT-DNN format: > python experiments\squad\squad_prepro.py --root_dir data\canonical_data --task_def experiments\squad\squad_task_def.yml --model bert-base-uncased
Training
> python train.py --data_dir <data-path> --init_checkpoint <bert-model> --train_dataset squad,squad-v2 --test_dataset squad,squad-v2 --task_def experiments\squad\squad_task_def.yml

SMART

Adv training at the fine-tuning stages: > python train.py --data_dir <data-path> --init_checkpoint <bert/mt-dnn-model> --train_dataset mnli --test_dataset mnli_matched,mnli_mismatched --task_def experiments\glue\glue_task_def.yml --adv_train --adv_opt 1

HNN

The code to reproduce HNN is under the hnn folder. To reproduce the results of HNN, run

> hnn/script/hnn_train_large.sh

Extract embeddings

Extracting embeddings of a pair text example
>python extractor.py --do_lower_case --finput input_examples\pair-input.txt --foutput input_examples\pair-output.json --bert_model bert-base-uncased --checkpoint mt_dnn_models\mt_dnn_base.pt
Note that the two texts of a pair are separated by the special token |||. You may refer to input_examples\pair-output.json as an example.
Extracting embeddings of a single sentence example
>python extractor.py --do_lower_case --finput input_examples\single-input.txt --foutput input_examples\single-output.json --bert_model bert-base-uncased --checkpoint mt_dnn_models\mt_dnn_base.pt
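
The pair format described above can be illustrated with a small sketch. It only shows how a line containing the ||| separator could be split into two texts; the function name parse_input_line is hypothetical, and how extractor.py actually parses its input is an assumption not confirmed by this README.

```python
SEPARATOR = "|||"  # pair delimiter named in the README


def parse_input_line(line):
    """Split one input line into (first_text, second_text).

    second_text is None for a single-sentence line.
    Hypothetical helper; not taken from extractor.py.
    """
    text = line.strip()
    if SEPARATOR in text:
        first, second = text.split(SEPARATOR, 1)
        return first.strip(), second.strip()
    return text, None


if __name__ == "__main__":
    print(parse_input_line("What is MT-DNN? ||| A multi-task model."))
    print(parse_input_line("A single sentence."))
```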

Speed up Training

Gradient Accumulation
If your GPUs have limited memory, you may need to use gradient accumulation to keep training stable.
For example, if you pass the flag --grad_accumulation_step 4 during training, the effective batch size will be batch_size * 4.
FP16
The current version of MT-DNN also supports FP16 training; please install apex first.
You just need to turn on the flag during training: --fp16
Please refer to the script: scripts\run_mt_dnn_gc_fp16.sh
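
To see why gradient accumulation with 4 steps behaves like a batch that is 4 times larger, here is a minimal sketch in plain Python. It assumes a mean-squared-error loss on a one-parameter model and assumes each micro-batch gradient is scaled by 1/steps; whether train.py scales the loss exactly this way is not stated in this README.

```python
def grad_mse(w, batch):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, micro_batch_size, accumulation_steps):
    """Sum the scaled gradients of equal-sized micro-batches."""
    total = 0.0
    for step in range(accumulation_steps):
        start = step * micro_batch_size
        micro = data[start:start + micro_batch_size]
        # Scale each micro-batch gradient so the sum equals the full-batch mean.
        total += grad_mse(w, micro) / accumulation_steps
    return total


data = [(float(i), 2.0 * i + 1.0) for i in range(8)]
full = grad_mse(0.5, data)                  # one batch of 8 examples
accum = accumulated_grad(0.5, data, 2, 4)   # 4 micro-batches of 2 examples

if __name__ == "__main__":
    print(full, accum)
```

The two values agree up to floating-point rounding, which is why only the micro-batch has to fit in GPU memory.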

Convert Tensorflow BERT model to the MT-DNN format

Here, we go through how to convert a Chinese Tensorflow BERT model into mt-dnn format.

Download the BERT model from the Google BERT repository: https://github.com/google-research/bert
Run the following script for MT-DNN format
python scripts\convert_tf_to_pt.py --tf_checkpoint_root chinese_L-12_H-768_A-12\ --pytorch_checkpoint_path chinese_L-12_H-768_A-12\bert_base_chinese.pt

TODO

Publish pretrained Tensorflow checkpoints.

FAQ

Did you share the pretrained mt-dnn models?

Yes, we released the pretrained shared embeddings via MTL, which are aligned to BERT base/large models: mt_dnn_base.pt and mt_dnn_large.pt.
To obtain similar models:

Run > sh scripts\run_mt_dnn.sh, and then pick the best checkpoint based on the average dev performance of MNLI/RTE.
Strip the task-specific layers via scripts\strip_model.py.
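
The stripping step can be pictured as filtering a checkpoint's parameter dictionary. The sketch below uses a plain dict; the key names and the "scoring_list." prefix are assumptions made for illustration, and the real logic lives in strip_model.py.

```python
def strip_task_layers(state_dict, task_prefixes=("scoring_list.",)):
    """Return a copy of state_dict without task-specific entries.

    The default prefix is an assumption, not confirmed by this README.
    """
    prefixes = tuple(task_prefixes)
    return {k: v for k, v in state_dict.items() if not k.startswith(prefixes)}


# Hypothetical checkpoint contents, for illustration only.
checkpoint = {
    "bert.embeddings.word_embeddings.weight": [0.1, 0.2],
    "bert.encoder.layer.0.output.dense.weight": [0.3],
    "scoring_list.0.weight": [0.4],
    "scoring_list.1.weight": [0.5],
}
shared = strip_task_layers(checkpoint)

if __name__ == "__main__":
    print(sorted(shared))
```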

Why do SciTail/SNLI not enable SAN?

For SciTail/SNLI tasks, the purpose is to test the generalization of the learned embedding and how easily it adapts to a new domain, rather than to use complicated model structures, so that the comparison with BERT is direct. Thus, we use a linear projection in all domain adaptation settings.

What is the difference between V1 and V2?

The difference is in the QNLI dataset. Please refer to the official GLUE homepage for more details. If you want to formulate QNLI as a pair-wise ranking task as in our paper, make sure that you use the old QNLI data.
Then run the prepro script with the flag: > sh experiments/glue/prepro.sh --old_glue
If you have issues accessing the old version of the data, please contact the GLUE team.

Did you fine-tune single task for your GLUE leaderboard submission?

We can use the multi-task refinement model to run prediction and produce a reasonable result. But achieving a better result requires fine-tuning on each task. It is worth noting that the paper on arXiv is a little outdated and is based on the old GLUE dataset. We will update the paper as we mentioned below.

Notes and Acknowledgments

BERT pytorch is from: https://github.com/huggingface/pytorch-pretrained-BERT
BERT: https://github.com/google-research/bert
We also used some code from: https://github.com/kevinduh/san_mrc

Related Projects/Codebase

Pretrained UniLM: https://github.com/microsoft/unilm
Pretrained Response Generation Model: https://github.com/microsoft/DialoGPT
Internal MT-DNN repo: https://github.com/microsoft/mt-dnn

How do I cite MT-DNN?

@inproceedings{liu2019mt-dnn,
    title = "Multi-Task Deep Neural Networks for Natural Language Understanding",
    author = "Liu, Xiaodong and He, Pengcheng and Chen, Weizhu and Gao, Jianfeng",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/P19-1441",
    pages = "4487--4496"
}


@article{liu2019mt-dnn-kd,
  title={Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding},
  author={Liu, Xiaodong and He, Pengcheng and Chen, Weizhu and Gao, Jianfeng},
  journal={arXiv preprint arXiv:1904.09482},
  year={2019}
}


@article{he2019hnn,
  title={A Hybrid Neural Network Model for Commonsense Reasoning},
  author={He, Pengcheng and Liu, Xiaodong and Chen, Weizhu and Gao, Jianfeng},
  journal={arXiv preprint arXiv:1907.11983},
  year={2019}
}


@article{liu2019radam,
  title={On the Variance of the Adaptive Learning Rate and Beyond},
  author={Liu, Liyuan and Jiang, Haoming and He, Pengcheng and Chen, Weizhu and Liu, Xiaodong and Gao, Jianfeng and Han, Jiawei},
  journal={arXiv preprint arXiv:1908.03265},
  year={2019}
}


@article{jiang2019smart,
  title={SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization},
  author={Jiang, Haoming and He, Pengcheng and Chen, Weizhu and Liu, Xiaodong and Gao, Jianfeng and Zhao, Tuo},
  journal={arXiv preprint arXiv:1911.03437},
  year={2019}
}


@article{liu2020mtmtdnn,
  title={The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding},
  author={Liu, Xiaodong and Wang, Yu and Ji, Jianshu and Cheng, Hao and Zhu, Xueyun and Awa, Emmanuel and He, Pengcheng and Chen, Weizhu and Poon, Hoifung and Cao, Guihong and Gao, Jianfeng},
  journal={arXiv preprint arXiv:2002.07972},
  year={2020}
}


@article{liu2020alum,
  title={Adversarial Training for Large Neural Language Models},
  author={Liu, Xiaodong and Cheng, Hao and He, Pengcheng and Chen, Weizhu and Wang, Yu and Poon, Hoifung and Gao, Jianfeng},
  journal={arXiv preprint arXiv:2004.08994},
  year={2020}
}

@article{cheng2020posterior,
  title={Posterior Differential Regularization with f-divergence for Improving Model Robustness},
  author={Cheng, Hao and Liu, Xiaodong and Pereira, Lis and Yu, Yaoliang and Gao, Jianfeng},
  journal={arXiv preprint arXiv:2010.12638},
  year={2020}
}

Contact Information

For help or issues using MT-DNN, please submit a GitHub issue.

For personal communication related to this package, please contact Xiaodong Liu ([email protected]), Yu Wang ([email protected]), Pengcheng He ([email protected]), Weizhu Chen ([email protected]), Jianshu Ji ([email protected]), Hao Cheng ([email protected]) or Jianfeng Gao ([email protected]).

Comments

No such file or directory: 'ner/ner_train.json'

ls ner/
chunk_dev.tsv chunk_test.tsv chunk_train.tsv ner_dev.tsv ner_test.tsv ner_train.tsv pos_dev.tsv pos_test.tsv pos_train.tsv test.txt train.txt valid.txt

python3 train.py --data_dir ner/ --init_checkpoint ./mt_dnn_models/bert_model_large_uncased.pt --train_dataset ner --test_dataset ner --task_def experiments/ner/ner_task_def.yml
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
Namespace(adam_eps=1e-06, answer_att_hidden_size=128, answer_att_type='bilinear', answer_dropout_p=0.1, answer_mem_drop_p=0.1, answer_mem_type=1, answer_merge_opt=1, answer_num_turn=5, answer_opt=0, answer_rnn_type='gru', answer_sum_att_type='bilinear', answer_weight_norm_on=False, batch_size=8, batch_size_eval=8, bert_dropout_p=0.1, bert_l2norm=0.0, cuda=True, data_dir='ner/', data_sort_on=False, dropout_p=0.1, dropout_w=0.0, dump_state_on=False, ema_gamma=0.995, ema_opt=0, embedding_opt=0, encoder_type=<EncoderModelType.BERT: 1>, epochs=5, fp16=False, fp16_opt_level='O1', freeze_layers=-1, global_grad_clipping=1.0, glue_format_on=False, grad_accumulation_step=1, grad_clipping=0, have_lr_scheduler=True, init_checkpoint='./mt_dnn_models/bert_model_large_uncased.pt', init_ratio=1, label_size='3', learning_rate=5e-05, log_file='mt-dnn-train.log', log_per_updates=500, lr_gamma=0.5, max_seq_len=512, mem_cum_type='simple', mix_opt=0, model_ckpt='checkpoints/model_0.pt', momentum=0, mtl_opt=0, multi_gpu_on=False, multi_step_lr='10,20,30', name='farmer', optimizer='adamax', output_dir='checkpoint', ratio=0, resume=False, save_per_updates=10000, save_per_updates_on=False, scheduler_type='ms', seed=2018, task_def='experiments/ner/ner_task_def.yml', tensorboard=False, tensorboard_logdir='tensorboard_logdir', test_datasets=['ner'], train_datasets=['ner'], update_bert_opt=0, vb_dropout=True, warmup=0.1, warmup_schedule='warmup_linear', weight_decay=0)
10/19/2019 12:30:52 0
10/19/2019 12:30:52 Launching the MT-DNN training
10/19/2019 12:30:52 Loading ner/ner_train.json as task 0
Traceback (most recent call last):
  File "train.py", line 434, in <module>
    main()
  File "train.py", line 207, in main
    train_data = BatchGen(BatchGen.load(train_path, True, task_type=task_type, maxlen=args.max_seq_len),
  File "/home/ant/multitask/mt-dnn/mt_dnn/batcher.py", line 53, in load
    with open(path, 'r', encoding='utf-8') as reader:
FileNotFoundError: [Errno 2] No such file or directory: 'ner/ner_train.json'

opened by antgr 7
Why fine-tuning in single-task setting (not as stated in the paper)?

In the arxiv paper it is stated:

In the multi-task fine-tuning stage, we use minibatch based stochastic gradient descent (SGD) to learn the parameters of our model (i.e., the parameters of all shared layers and task-specific layers) as shown in Algorithm 1. In each epoch, a mini-batch b_t is selected(e.g., among all 9 GLUE tasks), and the model is updated according to the task-specific objective for the task t. This approximately optimizes the sum of all multi-task objectives.

If I understand it correctly, in your code this multi-task fine-tuning stage is called MTL refinement. Then why do you fine-tune each task in a single-task setting in your fine-tuning stage? There is no such stage in the original paper. Also, in run_mt_dnn.sh there are these lines:
train_datasets="mnli,rte,qqp,qnli,mrpc,sst,cola,stsb"
test_datasets="mnli_matched,mnli_mismatched,rte"
Why do you only test on MNLI and RTE and not on all the other tasks? I would also like to ask if I can switch from BERT large to BERT base there, because I only have one 1080 GTX card.

Thank you.
good first issue

opened by svboeing 7
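
For readers following the question above, the multi-task stage in the quoted passage can be sketched as follows: pool the mini-batches of every task, shuffle them, and apply a task-specific update per mini-batch. This is a minimal illustration of that description, not the repository's train.py; mtl_epoch, make_batches and update_fn are hypothetical names.

```python
import random


def make_batches(examples, batch_size):
    """Pack one task's examples into consecutive mini-batches."""
    return [examples[i:i + batch_size] for i in range(0, len(examples), batch_size)]


def mtl_epoch(task_data, batch_size, update_fn, rng):
    """Run one epoch over the shuffled union of all tasks' mini-batches."""
    pool = [(task, batch)
            for task, examples in task_data.items()
            for batch in make_batches(examples, batch_size)]
    rng.shuffle(pool)
    for task, batch in pool:
        update_fn(task, batch)  # update with the objective of task t
    return len(pool)


# Toy data: the integers stand in for training examples.
task_data = {"mnli": list(range(10)), "rte": list(range(4)), "stsb": list(range(6))}
counts = {}


def count_update(task, batch):
    """Stand-in for a gradient step; only counts updates per task."""
    counts[task] = counts.get(task, 0) + 1


steps = mtl_epoch(task_data, 4, count_update, random.Random(0))

if __name__ == "__main__":
    print(steps, counts)
```

Because every task contributes all of its mini-batches, larger tasks receive proportionally more updates per epoch.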
add more detail about how to train a roberta+alum?

From the paper I understand that the first step is to pretrain RoBERTa (already done by FB) and the second is to continue training with ALUM. But how do I merge the code under the alum folder into fairseq to run the training code?

opened by RyanHuangNLP 6
Could you release MT-DNN no-fine-tune checkpoint with task layers?

Hi,

Is it possible to release MT-DNN no-fine-tune checkpoint?

The currently released mt-dnn-large.pt contains only the shared layers and lacks the task-specific layers.

Thanks!

opened by yaolu 6

Hotfix load_qnnli in glue_utils.py

I received two assertion errors from prepro.py. Specifically, they are raised in glue_utils.py, which is called from prepro.py. The errors are shown below; I have fixed them.

Traceback (most recent call last):
  File "prepro.py", line 352, in <module>
    main(args)
  File "prepro.py", line 193, in main
    qnnli_train_data = load_qnnli(qnli_train_path, GLOBAL_MAP['qnli'])
  File "./mt-dnn/data_utils/glue_utils.py", line 113, in load_qnnli
    assert len(lines) % 2 == 0
AssertionError

Traceback (most recent call last):
  File "prepro.py", line 352, in <module>
    main(args)
  File "prepro.py", line 195, in main
    qnnli_test_data = load_qnnli(qnli_test_path, GLOBAL_MAP['qnli'], is_train=False)
  File "./mt-dnn/data_utils/glue_utils.py", line 122, in load_qnnli
    assert block1[1] == block2[1]
AssertionError

opened by tommy19970714 6

test: {"metrics": {}, "predictions": <== metrics is empty

In the checkpoints folder, I see that for the test set there is no result in the metrics field. Compare this with the dev set, which is fine, e.g. {"metrics": {"ACC": 76.69902912621359}

opened by antgr 5

modify load data method to fit billion level data, but memory leak, help!!

We have a billion pieces of data, so the source code's approach of loading all data into memory does not work. I use a DataLoader with yield so that each batch is loaded into memory on demand, but GPU memory keeps increasing. I am only testing data loading; training has not started. I tried every memory-clearing method (torch.cuda.empty_cache() ...), but none worked. This is my data-reading code:

class CustomIterableDataset(IterableDataset):
    def __init__(self, task_def, task_id, batch_size=32,
                 gpu=True, is_train=True, epochs=10,
                 maxlen=128, dropout_w=0.005):
        super(CustomIterableDataset, self).__init__()
        ...  # n lines omitted

    def _get_max_len(self, batch, key='token_id'):
        tok_len = max(len(x[key]) for x in batch)
        return tok_len

    def __if_pair__(self, data_type):
        return data_type in [DataFormat.PremiseAndOneHypothesis, DataFormat.PremiseAndMultiHypothesis]

    def __random_select__(self, arr):
        if self.dropout_w > 0:
            return [UNK_ID if random.uniform(0, 1) < self.dropout_w else e for e in arr]
        else: return arr

    def patch(self, v):
        v = v.cuda(non_blocking=True)
        return v

    def _get_batch_size(self, batch):
        return len(batch)

    def _prepare_model_input(self, batch_def):
         #ignore
        ...
        return batch_data, batch_info

    def _process(self, batch_def):
        # n lines omitted
        if self.gpu:
            for i, item in enumerate(batch_data):
                batch_data[i] = self.patch(item)

        # n lines omitted
        return batch_info, batch_data

    def _line_mapper(self, lines):
        samples = []
        for line in lines:
            sample = json.loads(line.strip())
            sample['factor'] = 1.0
            samples.append(sample)
        batch_def = {"data": samples,
                     "task_type": self.task_def["task_type"],
                     "task_id": self.task_id,
                     "data_type": self.task_def["data_type"],
                     "encoder_type": self.task_def["encoder_type"],
        }
        return self._process(batch_def)

    def _dir_iter(self, file_list):
        if len(file_list) == 0:
            return None
        #file_list = random.shuffle(file_list)
        for f in file_list:
            with open(f) as reader:
                lines = []
                for line in reader:
                    if len(lines) >= self.batch_size:
                        yield lines
                        lines = []
                        torch.cuda.empty_cache()
                    lines.append(line)
                yield lines

    def __iter__(self):
        if self.is_train:
            dataset_dir = self.task_def['train_dataset_dir']
        else:
            dataset_dir = self.task_def['test_dataset_dir']
        file_list = os.listdir(dataset_dir)
        for i, data in enumerate(file_list):
            data = os.path.join(dataset_dir, data)
            file_list[i] = data

        line_iter = self._dir_iter(file_list)
        # Create an iterator
        mapped_itr = map(self._line_mapper, line_iter)

        return mapped_itr

This is standalone code that can be run independently (PyTorch 1.2):

import os
import sys
import json
import torch
import random
import resource
from torch.utils.data import IterableDataset
from torch.utils.data import DataLoader
from itertools import tee

UNK_ID=100
BOS_ID=101

class CustomIterableDataset(IterableDataset):
    def __init__(self, task_def, task_id, batch_size=32,
                 gpu=True, is_train=True, epochs=10,
                 maxlen=128, dropout_w=0.005):
        super(CustomIterableDataset, self).__init__()
        self.task_def = task_def
        self.task_id = task_id
        self.batch_size = batch_size
        self.maxlen = maxlen
        self.is_train = is_train
        self.epochs = 1 if not is_train else epochs
        self.gpu = gpu
        self.dropout_w = dropout_w
        self.pairwise_size = 1


    def patch(self, v):
        v = v.cuda(non_blocking=True)
        return v

    def _dir_iter(self):
        for i in range(0,3):
            lines = []
            reader = [i for i in range(0,1000000)]
            for line in reader:
                line = torch.LongTensor(self.batch_size, 10).fill_(0)
                if len(lines) >= self.batch_size:
                    yield lines
                    lines = []
                    torch.cuda.empty_cache ()
                lines.append(self.patch(line))
            del reader
            yield lines

    def __iter__(self):
        mapped_itr =self._dir_iter()
        return mapped_itr


if __name__ == '__main__':
    dataset_iter = CustomIterableDataset(task_def=None, task_id=None)
    data_generator = DataLoader(dataset_iter, batch_size=None)
    train_generator_list = []
    train_generator_list.append(iter(data_generator))
    copy_iter_list = []
    for id, first_it in enumerate(train_generator_list):
        first_itr, second_itr = tee(first_it)
        train_generator_list[id] = first_itr
        copy_iter_list.append(second_itr)
    i= 0
    while True:
        try:
            i += 1
            batch_data = next(train_generator_list[0])
            if i % 100 == 0:
                max_mem_used = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
                print("Mem: {:.2f} MB".format(max_mem_used / 1024))
        except StopIteration:
            # end of one epoch
            print('again')
            _, train_generator_list[0] = tee(copy_iter_list[0])

opened by xiangxianzhang 5
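
One possible contributor to the growth in the standalone script above, offered as a hedged observation rather than a confirmed diagnosis: itertools.tee keeps every item that one of its returned iterators has consumed and the other has not. The untouched copies in copy_iter_list therefore keep every yielded batch referenced, including tensors already moved to the GPU. A sketch of an alternative that creates a fresh iterator each epoch follows; Batches is a hypothetical stand-in for the DataLoader so the example runs without torch.

```python
class Batches:
    """Re-iterable stand-in for a DataLoader: each iter() starts a new pass."""

    def __init__(self, num_batches):
        self.num_batches = num_batches

    def __iter__(self):
        for i in range(self.num_batches):
            yield [i] * 4  # placeholder for one batch of data


def run_epochs(loader, epochs):
    """Iterate the loader once per epoch without tee, so nothing is buffered."""
    seen = 0
    for _ in range(epochs):
        for _batch in loader:  # a fresh iterator is created for every epoch
            seen += 1
    return seen


if __name__ == "__main__":
    print(run_epochs(Batches(5), 3))
```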

prepro_std.py does not support sequence tasks

Running prepro_std.py for a sequence task gives an error (the spelling is also incorrect: Seqence --> Sequence):

python prepro_std.py --model bert-base-uncased --root_dir ner --task_def experiments/ner/ner_task_def.yml 
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
11/04/2019 09:31:52 Task ner
data_format:  DataFormat.Seqence
Traceback (most recent call last):
  File "prepro_std.py", line 551, in <module>
    main(args)
  File "prepro_std.py", line 536, in main
    label_mapper)
  File "prepro_std.py", line 433, in load_data
    raise ValueError(data_format)
ValueError: DataFormat.Seqence

I think the missing implementation is in load_data, in the final else branch (there is no branch for the data_format Seqence):

    if task_type == TaskType.Ranking:
        assert data_format == DataFormat.PremiseAndMultiHypothesis

    rows = []
    for line in open(file_path, encoding="utf-8"):
        fields = line.strip("\n").split("\t")
        if data_format == DataFormat.PremiseOnly:
            assert len(fields) == 3
            row = {"uid": fields[0], "label": fields[1], "premise": fields[2]}
        elif data_format == DataFormat.PremiseAndOneHypothesis:
            assert len(fields) == 4
            row = {
                "uid": fields[0],
                "label": fields[1],
                "premise": fields[2],
                "hypothesis": fields[3]}
        elif data_format == DataFormat.PremiseAndMultiHypothesis:
            assert len(fields) > 5
            row = {"uid": fields[0], "ruid": fields[1].split(","), "label": fields[2], "premise": fields[3],
                   "hypothesis": fields[4:]}
        else:
            print("data_format: ", data_format)
            raise ValueError(data_format)  # <---- here is the exception

opened by antgr 5
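
To make the gap reported above concrete, here is a self-contained sketch of the row parsing with an extra branch for sequence data. The format names are plain strings instead of the DataFormat enum, and the sequence branch assumes that the label and premise columns hold JSON lists of equal length. That column layout is an assumption for illustration, not the format used by prepro_std.py.

```python
import json


def parse_row(fields, data_format):
    """Turn one tab-split line into a row dict for the given format name."""
    if data_format == "PremiseOnly":
        assert len(fields) == 3
        return {"uid": fields[0], "label": fields[1], "premise": fields[2]}
    if data_format == "PremiseAndOneHypothesis":
        assert len(fields) == 4
        return {"uid": fields[0], "label": fields[1],
                "premise": fields[2], "hypothesis": fields[3]}
    if data_format == "Sequence":
        # ASSUMPTION: columns 1 and 2 are JSON lists (labels, tokens).
        assert len(fields) == 3
        labels, tokens = json.loads(fields[1]), json.loads(fields[2])
        assert len(labels) == len(tokens)
        return {"uid": fields[0], "label": labels, "premise": tokens}
    raise ValueError(data_format)


line = '0\t["B-PER", "O"]\t["Alice", "runs"]\n'
row = parse_row(line.strip("\n").split("\t"), "Sequence")

if __name__ == "__main__":
    print(row)
```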

Issue in preprocessing data

I downloaded the data using the download script and ran the prepro.sh script as given in the readme and it runs into the following error:

sh experiments/glue/prepro.sh
08/28/2019 11:18:45 Loaded 23596 SciTail train samples
08/28/2019 11:18:45 Loaded 1304 SciTail dev samples
08/28/2019 11:18:45 Loaded 2126 SciTail test samples
08/28/2019 11:18:47 Loaded 549367 SNLI train samples
08/28/2019 11:18:47 Loaded 9842 SNLI dev samples
08/28/2019 11:18:47 Loaded 9824 SNLI test samples
08/28/2019 11:18:48 Loaded 392702 MNLI train samples
08/28/2019 11:18:48 Loaded 9815 MNLI matched dev samples
08/28/2019 11:18:48 Loaded 9832 MNLI mismatched dev samples
08/28/2019 11:18:48 Loaded 9796 MNLI matched test samples
08/28/2019 11:18:48 Loaded 9847 MNLI mismatched test samples
08/28/2019 11:18:48 Loaded 3668 MRPC train samples
08/28/2019 11:18:48 Loaded 408 MRPC dev samples
08/28/2019 11:18:48 Loaded 1725 MRPC test samples
08/28/2019 11:18:49 Loaded 104743 QNLI train samples
08/28/2019 11:18:49 Loaded 5463 QNLI dev samples
08/28/2019 11:18:49 Loaded 5463 QNLI test samples
08/28/2019 11:18:50 Loaded 363846 QQP train samples
08/28/2019 11:18:50 Loaded 40430 QQP dev samples
08/28/2019 11:18:50 Loaded 390965 QQP test samples
08/28/2019 11:18:50 Loaded 2490 RTE train samples
08/28/2019 11:18:50 Loaded 277 RTE dev samples
08/28/2019 11:18:50 Loaded 3000 RTE test samples
08/28/2019 11:18:50 Loaded 635 WNLI train samples
08/28/2019 11:18:50 Loaded 71 WNLI dev samples
08/28/2019 11:18:50 Loaded 146 WNLI test samples
08/28/2019 11:18:50 Loaded 67349 SST train samples
08/28/2019 11:18:50 Loaded 872 SST dev samples
08/28/2019 11:18:50 Loaded 1821 SST test samples
08/28/2019 11:18:50 Loaded 8551 COLA train samples
08/28/2019 11:18:50 Loaded 1043 COLA dev samples
08/28/2019 11:18:50 Loaded 1063 COLA test samples
08/28/2019 11:18:50 Loaded 5749 STS-B train samples
08/28/2019 11:18:50 Loaded 1500 STS-B dev samples
08/28/2019 11:18:50 Loaded 1379 STS-B test samples
08/28/2019 11:18:50 done with scitail
08/28/2019 11:18:52 done with snli
08/28/2019 11:18:54 done with mnli
08/28/2019 11:18:54 done with mrpc
08/28/2019 11:18:54 done with qnli
08/28/2019 11:18:57 done with qqp
08/28/2019 11:18:57 done with rte
08/28/2019 11:18:57 done with wnli
08/28/2019 11:18:57 done with sst
08/28/2019 11:18:58 done with cola
08/28/2019 11:18:58 done with stsb
Traceback (most recent call last):
  File "prepro_std.py", line 7, in <module>
    from pytorch_pretrained_bert.tokenization import BertTokenizer
  File "/home/ec2-user/anaconda3/envs/ms_mtdnn/lib/python3.6/site-packages/pytorch_pretrained_bert/__init__.py", line 7, in <module>
    from .modeling import (BertConfig, BertModel, BertForPreTraining,
  File "/home/ec2-user/anaconda3/envs/ms_mtdnn/lib/python3.6/site-packages/pytorch_pretrained_bert/modeling.py", line 218, in <module>
    from apex.normalization.fused_layer_norm import FusedLayerNorm as BertLayerNorm
  File "/home/ec2-user/anaconda3/envs/ms_mtdnn/lib/python3.6/site-packages/apex/__init__.py", line 18, in <module>
    from apex.interfaces import (ApexImplementation,
  File "/home/ec2-user/anaconda3/envs/ms_mtdnn/lib/python3.6/site-packages/apex/interfaces.py", line 10, in <module>
    class ApexImplementation(object):
  File "/home/ec2-user/anaconda3/envs/ms_mtdnn/lib/python3.6/site-packages/apex/interfaces.py", line 14, in ApexImplementation
    implements(IApex)
  File "/home/ec2-user/anaconda3/envs/ms_mtdnn/lib/python3.6/site-packages/zope/interface/declarations.py", line 483, in implements
    raise TypeError(_ADVICE_ERROR % 'implementer')
TypeError: Class advice impossible in Python3.  Use the @implementer class decorator instead.

opened by zeeshansayyed 4

RuntimeError: CUDA error: device-side assert triggered
When training the model with python train.py --init_checkpoint='mt_dnn_models/bert_model_base_chinese.pt', an error occurs: RuntimeError: CUDA error: device-side assert triggered

However, when I replace this hyper-parameter with, for example, bert_model_base_uncased.pt, it trains normally.

But that is obviously not what I want. I am training a simple four-class classification task on a Chinese dataset and want to use the bert_model_base_chinese.pt you provided.

Note that:

All models mentioned above were downloaded by download.sh.

I have previously checked this issue in the PyTorch repo, but it did not help.
opened by mazicwong 4
How to change the code for a specific task?

I am trying to run task 1. Now, since I removed the data and JSON files related to the other tasks, I am getting a missing-file error. But I don't want to run all the tasks, as I only need one specific task. Please help me.

opened by Rajratnpranesh 4
Older version of Pytorch unavailable

I am quite new to deep learning and coding in general but am keen to make use of this library to explore a possible dissertation topic on MT. I have cloned the repository onto my hard drive but when I try to install with pip I get an error to the effect that pip cannot find version 1.5.0 of pytorch - the earliest available is 1.7.0. Does it need to be 1.5.0 or can I tweak requirements.txt to choose a different version and if so, which version should be used? Many thanks for your support. Rob

opened by robfuller7 0
Can you provide the pretrain files of Hugging Face?

It seems inconvenient to use MT-DNN in other work; it would be more convenient to use MT-DNN with Hugging Face. But I searched the official website and found that no one has published relevant content. Can you provide files in the corresponding format? Thank you!

opened by chaser-girl 0
Readme.md is updated?

First observation: the Python version in the instructions is 3.6, but the transformers version in requirements.txt is 4.20.0, which is only available for Python >= 3.7. Second: I cannot find the scripts directory. I'm looking for the embedding extractor, so I'm assuming it is the /experiments/dump_embedding directory. Is that correct?

And finally, nice job! You have very interesting results.

opened by LarissaGuder 0
mt-dnn on Windows?

Hello, I am trying to download and test the repository, but I keep getting an error with docker nvidia runtime. So I am just guessing: is it possible to run the project on Windows? Thank you in advance! Simone

opened by SimoneGherardi 0
question about task split and pretrain model

Hello! Your work is very effective. I wonder whether the tasks must be separated from the benchmark model before it can be used as a pre-trained language model for other tasks? Does the provided .pt file split the tasks?

opened by chaser-girl 0

Project dependencies may have API risk issues

Hi, In mt-dnn, inappropriate dependency versioning constraints can cause risks.

Below are the dependencies and version constraints that the project is using

numpy
torch==1.5.0
tqdm
colorlog
boto3
pytorch-pretrained-bert==v0.6.0
regex
scikit-learn
pyyaml
pytest
sentencepiece
tensorboardX
tensorboard
future
apex
seqeval==0.0.12
transformers==4.6.0

The == constraint introduces a risk of dependency conflicts because the allowed range is too strict. Constraints with no upper bound, or *, introduce a risk of missing-API errors because the latest versions of the dependencies may remove some APIs.

After further analysis, the version constraints in this project can be changed as follows:

numpy: >=1.8.0,<=1.23.0rc3
tqdm: >=4.36.0,<=4.64.0
pytorch-pretrained-bert: >=0.1.1,<=0.3.0
scikit-learn: >=0.15.0,<=0.20.4
future: >=0.12.0,<=0.18.2
transformers: >=2.0.0,<=4.1.1

These changes reduce dependency conflicts as much as possible while allowing the latest versions that do not cause call errors in the project.
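The suggested ranges could be applied to a requirements file mechanically. The following is a minimal sketch, not part of mt-dnn or of the tool that produced this report: `SUGGESTED_RANGES` simply copies the ranges quoted in this issue (they have not been independently verified), and `relax_constraints` is a hypothetical helper name.

```python
import re

# Ranges copied from the suggestions in this issue; not independently verified.
SUGGESTED_RANGES = {
    "numpy": ">=1.8.0,<=1.23.0rc3",
    "tqdm": ">=4.36.0,<=4.64.0",
    "pytorch-pretrained-bert": ">=0.1.1,<=0.3.0",
    "scikit-learn": ">=0.15.0,<=0.20.4",
    "future": ">=0.12.0,<=0.18.2",
    "transformers": ">=2.0.0,<=4.1.1",
}

# A requirement line starts with a package name, optionally followed by a specifier.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def relax_constraints(lines, ranges=SUGGESTED_RANGES):
    """Return requirement lines with the listed packages rewritten to a range.

    Hypothetical helper: packages not present in `ranges` are returned unchanged.
    """
    relaxed = []
    for line in lines:
        match = _NAME.match(line)
        name = match.group(1).lower() if match else None
        if name in ranges:
            relaxed.append(f"{name}{ranges[name]}")
        else:
            relaxed.append(line)
    return relaxed
```

For example, `relax_constraints(["numpy", "torch==1.5.0", "transformers==4.6.0"])` rewrites the numpy and transformers lines and leaves the torch pin untouched, since the issue proposes no range for torch.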

The current project invokes all of the following methods.

Methods called from numpy

numpy.ma.masked_array

Methods called from tqdm

tqdm.tqdm

Methods called from pytorch-pretrained-bert

pytorch_pretrained_bert.modeling.BertLayerNorm

Methods called from scikit-learn

sklearn.metrics.f1_score
random.choice
sklearn.metrics.accuracy_score
sklearn.metrics.matthews_corrcoef
sklearn.metrics.roc_auc_score
sklearn.metrics.confusion_matrix

Methods called from future

time.strftime
datetime.datetime.now

Methods called from transformers

transformers.AutoTokenizer.from_pretrained

All methods called in the project

plt.clf
_make_qid_to_has_ans
grad.abs.add_
sorted.append
torch.nn.Tanh
score.reshape.tolist.numpy
os.path.isdir
torch.erf
logging.StreamHandler.setFormatter
self._task_def_dic.items
layer_module
module.register_forward_pre_hook
subprocess.call.strip
iter
make_qid_to_has_ans.items
self._prepare_model_input
rnn_cls
m.contiguous
src.items
set
data_utils.log_wrapper.create_logger.warning
numpy.ma.masked_array
extra_indices.tolist.tolist
mask.size
output_dir.os.path.join.open.read
self.shadow.items
find_best_thresh
self.qkv_head_dim.self.num_heads.bsz.src_len.k.contiguous.view.transpose
find_all_best_thresh
min
truncate_seq_pair
data.get.str.lower
score.numpy.numpy
set_config
self.dropout.transpose
transformers.get_polynomial_decay_schedule_with_warmup
key_padding_mask.unsqueeze.unsqueeze
input_lengths.append
self.tok2ind.keys
final_predict.append
passage.replace.replace
prefix.format.opt.get.upper
logging.StreamHandler
f.readlines
logging.StreamHandler.setLevel
_get_raw_scores
torch.nn.functional.cross_entropy
line.strip.split
tqdm.tqdm
prepare_validation_features
AttentionWrapper
range
format.startswith
self._tokenizer.vocab.keys
tokens_b.extend
scores.data.masked_fill_
qb.new_dict.cpu.numpy
rnn_type.upper.upper
delattr
score.numpy.tolist
data_utils.task_def.EncoderModelType
mlm_p.view.size
shutil.rmtree
load_stsb
self.AdamaxW.super.__init__
self.kd_task_loss_criterion.append
self.dropout.bmm
partial_feature
data_utils.log_wrapper.create_logger.error
x.size
self.SmartPerturbation.super.__init__
torch.tanh
self.eff_perturb.update
copy.deepcopy.item
rows.append
key_padding_mask.unsqueeze
score.reshape.tolist
self.linear.contiguous
test_prefix.lower
self.score_func
load_qqp
self.SimpleFlatSim.super.__init__
self.BilinearSum.super.__init__
make_qid_to_has_ans.values
json.dumps
extract_answers_from_features
x_flat.self.linear.view
self.rnn.new
os.path.exists
TaskDef
text.lower
round
self.Pooler.super.__init__
self.decoder
self.dropout
copy.deepcopy.extend
self.DropoutWrapper.super.__init__
self.y_linear
logging.FileHandler
inputs.append
data_utils.vocab.Vocabulary
SanPooler
numpy.random.seed
x1.contiguous.view
args.train_datasets.split
self._global_map.get
i.multi_task_train_data.len.i.start.datetime.now.str.split
search_index
torch.float32.target.detach.F.softmax.sqrt_
kb.new_dict.cpu.numpy
mt_dnn.batcher.Collater
torch.device
open
key.transpose.size
masked_lm_positions.append
collections.OrderedDict
torch.max
torch.sum
text.split
random.randint
self.dropout.contiguous
torch.unbind
x1_flat.self.proj_1.self.f.view.bmm
self.qkv_head_dim.self.num_heads.bsz.src_len.v.contiguous.view.transpose.contiguous
experiments.superglue.superglue_utils.save
all_encoder_layers.append
y.unsqueeze.expand_as.unsqueeze
self.optimizer.zero_grad
data_utils.tokenizer_utils.create_tokenizer
self.encode
qa.get
mt_dnn.batcher.DistTaskDataset
batch_meta.batch_data.detach.cpu.numpy.tolist
self.classifier
sklearn.metrics.roc_auc_score
self.SANBertNetwork.super.__init__
grad.abs.max
self.optimizer.load_state_dict
self.qkv_head_dim.self.num_heads.bsz.tgt_len.q.contiguous.view.transpose
self.init
self.setup
self.qkv_head_dim.self.num_heads.bsz.src_len.k.contiguous.view.transpose.transpose
self.alpha.data.new
torch.autograd.grad
common.activation
kb.new_dict.cpu
load_json
super
labels.append
tasks.get_task_obj.train_forward
super.__init__
beta2.exp_inf.mul_.unsqueeze
logits.size.logits.data.new.zero_.torch.diag.byte
train_datasets.append
torch.nn.functional.kl_div
self.f
self.train_loss.update
seqeval.metrics.classification_report
FlatSimV2
embed.detach.abs.mean
subprocess.check_output
scipy.stats.spearmanr
x.torch.std.expand_as
numpy.argmax
make_eval_dict
mt_dnn.batcher.DistMultiTaskBatchSampler
score.reshape.tolist.reshape
mt_dnn.model.MTDNNModel.update
k.contiguous.view
token.startswith
p.transpose
module.my_optim.weight_norm
noise.detach.requires_grad_
self.DotProduct.super.__init__
mt_dnn.batcher.Collater.patch_data
load_mrpc
SimpleFlatSim
eff_noise.detach
plt.hist
adv_logits.view.view
load_copa_mtdnn
run_precision_recall_analysis
trim
torch.zeros_like
batch_info.pin_memory.to
self.DeepAttentionWrapper.super.__init__
target.contiguous.view.view
int
self.parameters
self.qkv_head_dim.self.num_heads.bsz.src_len.k.contiguous.view.transpose.contiguous
torch.manual_seed
merge_eval
x2_flat.self.proj_2.self.f.view.transpose
scores.items
shutil.copytree
plot_pr_curve
self.__str__
grad.sign
json.loads.get
time.strftime
BilinearFlatSim
compute_f1
reduce_features_to_examples.append
closure
end.numpy.tolist.contiguous
data_utils.load_data
self.bert
create_bins
self.SANClassifier.super.__init__
get_tokens
logging.FileHandler.setLevel
self._tokenizer.convert_tokens_to_ids.get
y.x.torch.abs.y.x.y.x.torch.cat.contiguous
x.size.idx.mask.view.expand_as
self.rnn.size
embedding_weights.size
torch.nn.functional.mse_loss
uid.split
torch.LongTensor
sample.get.strip
os.getcwd
self._rnn
att_scores.F.softmax.unsqueeze
temp_answers.append
test_prepro_std
self.MLPSelfAttn.super.__init__
self.FlatSimV2.super.__init__
torch.nn.ModuleList
main
load_data
os.mkdir
self.compute_weight
self._setup_lossmap
setattr
x1.size.x1.size.x1.contiguous.view.self.x_dot_linear.view.expand_as.bmm
plt.savefig
epsilon.epsilon.y.detach
module.similarity.SelfAttnWrapper
torch.distributed.all_reduce
ground_truth.normalize_answer.split
batch_meta.batch_data.size
self.mnetwork.bert
self.query_wsum
prediction.normalize_answer.split
dim.p.transpose._norm.transpose
self._norm_grad
torch.nn.Linear
start_scores.squeeze.squeeze
adv_lc
model_config
remove_punc
self.config.update
torch.isnan
batch_meta.batch_data.detach
para.strip
tensorboardX.SummaryWriter.close
data_utils.roberta_utils.update_roberta_keys
sub_part.pin_memory.to
start.data.cpu
self.x_dot_linear
NotImplementedError
self.attn
set.add
sequence_outputs.append
slen.idx.yidx.embeddings.tolist
self.load
model.to.to
exp_avg.mul_
self.task_types.append
self.encoder
numpy.array
task_id.self.scoring_list.split
self.qkv_head_dim.self.num_heads.bsz.tgt_len.q.contiguous.view.transpose.contiguous
task_id.self.scoring_list
y.x.torch.abs.y.x.y.x.torch.cat.contiguous.view
y.view.view
module.common.activation
self.score_func.size
experiments.common_utils.dump_rows
self.emb_val.update
value.transpose.transpose
list.extend
tasks.get_task_obj.train_build_task_layer
start.append
vars.update
transformers.get_cosine_schedule_with_warmup
train_data_list.append
transformers.get_linear_schedule_with_warmup
predict.reshape.tolist
dataset.get_task_id
self._tokenizer.convert_tokens_to_ids
end.append
self.WeightNorm.super.__init__
self.linear
idx.lines.strip
torch.mean
argparse.ArgumentParser.print_help
mt_dnn.optim.AdamaxW
score.np.argmax.reshape
qb.new_dict.cpu
self._dataset.get_task_id
plt.xlim
line.strip
self.network.eval
dump_data
train_config.parse_args
vw.new_dict.cpu
end.numpy.tolist.numpy
load
alpha.unsqueeze.bmm
collections.defaultdict
self.num_hid.size.size.tmp_output.view.max
dropout_layer
question.strip
self._get_param_groups
bool
newbatch.append
s.normalize_answer.split
out_f.write
self.qkv_head_dim.self.num_heads.bsz.src_len.v.contiguous.view.transpose
input_ids.index
normalize_answer
any
key_padding_mask.size
torch.nn.utils.weight_norm.unsqueeze
machines.append
tasks.get_task_obj.input_parse_label
stable_kl
pred.pop
adv_loss.item
tensor.to
delta_grad.norm
torch.nn.utils.weight_norm
issubclass
self.FlatSimilarityWrapper.super.__init__
e.to
start.numpy.tolist.numpy
score.data.cpu
max
x.att_scores.F.softmax.unsqueeze.torch.bmm.squeeze
tasks.get_task_obj
torch.nn.parallel.DistributedDataParallel
self.config.get.backward
self.Bilinear.super.__init__
task_id.self.task_loss_criterion
logging.FileHandler.setFormatter
float
experiments.exp_def.TaskDefs.get_task_names
load_multirc_mtdnn
len.get
result.strip.split
model.to.parameters
_norm
self.dropout.size
torch.norm
hasattr
torch.diag
temp_1.pop
self.LinearSelfAttn.super.__init__
data_utils.mrc_eval.squadv2_evaluate_func
TypeError
self.scalar.expand_as
initialize_distributed
task_cls
LinearSelfAttn
sorted
start.numpy.tolist.contiguous
torch.utils.data.DataLoader
model_class.from_pretrained
state.items
enumerate
sys.exit
self.bert.generate
self._setup_kd_lossmap
self._gen_task_indices
plt.title
data_utils.mrc_eval.squadv1_evaluate_func
weight.cpu
join
os.path.abspath
numpy.max
tasks.get_task_obj.test_predict
target.size.target.view.float
pdb.set_trace
key.transpose.transpose
torch.cat
random.uniform
literal_model_type.lower.lower
exp_inf.new
get_raw_scores
self.adv_task_loss_criterion.append
epsilon.epsilon.p.detach.log
emb_val.item
features.append
x.torch.mean.expand_as
new_data.zero_.size
qw.new_dict.cpu
load_mlm_data
tok_len.batch_size.torch.BoolTensor.fill_
self.proj_1
torch.abs
json.loads.split
dump
q.contiguous.view
eff_noise.detach.abs
metric_fn
self.network.cuda
batch_meta.get
search_bin
bais.cpu.numpy
type_ids_list.append
make_precision_recall_eval
create_masked_lm_predictions
attn_weights.data.masked_fill_
logging.Formatter
y.x.torch.cat.contiguous.view
self.proj
ValueError
logits.size.logits.data.new.zero_.torch.diag.byte.unsqueeze.expand_as
i.tgt.extend
query.replace
time.gmtime
load_mnli
histogram_na_prob
collections.Counter
logit.size.logit.view.float.size
generate_mask
print_message
local_task_idx.self._datasets.get_task_id
torch.load.size
part.pin_memory
positions.append
self.activation
torch.pow
assert_file_equal
tasks.get_task_obj.train_prepare_label
attn_weights.size.attn_weights.data.new.zero_.torch.diag.byte
torch.isinf
x1.size.x1.contiguous.view.self.x_dot_linear.view
numpy.array_equal
torch.nn.utils.clip_grad_norm_
grounds.items
Vocabulary
attn_weights.float.masked_fill
self.embed_encode
type
cls
mt_dnn.model.MTDNNModel.load
mt_dnn.model.MTDNNModel.save
self.__if_pair__
task.split
format
module._parameters.keys
merge
logits.detach
logit.F.log_softmax.exp
model.detach
subprocess.call
dropout_wrapper.DropoutWrapper
eval
config_class.from_dict
isinstance
scaled_loss.backward
datetime.datetime.now
self.dropout.view
self._get_max_len
kd_lc
args.test_datasets.split
dataset.items
input_ids_list.append
logits.size.logits.data.new.zero_.torch.diag.byte.unsqueeze
tokenizer.vocab.keys
load_record_mtdnn
evaluate
logging.getLogger.addHandler
beta1.exp_avg.mul_.add_
load_sst
batch_info.pin_memory
locals
qw.new_dict.cpu.numpy
new_batch.append
state.keys
batch_data.append
expected_file.open.read
end_position.append
laod_function
FlatSim
attn_weights.data.new
torch.nn.functional.log_softmax
i.self.attn_list
self.embeddings
process_data
plt.xlabel
load_xnli
mt_dnn.model.MTDNNModel.cuda
mt_dnn.batcher.MultiTaskBatchSampler
scores.extend
map
score.contiguous.view
self.rnn
self.proj_2
vars
data_utils.load_score_file.keys
temp_1.append
self._get_index_batches
mt_dnn.model.MTDNNModel
y.x.y.x.torch.cat.contiguous.view
data_utils.metrics.calc_metrics
MultiheadAttentionWrapper
feature.example_id_to_index.features_per_example.append
para.clone
self.beta.expand_as
opt.EncoderModelType.name.lower
scores_for_ground_truths.append
sentence.append
self.pooler
self.network.train
numpy.zeros
label.append
scores_list.append
os.getenv
copy.deepcopy
tokens.append
p.norm
numpy.exp
loss.stable_kl
self._dropout_p_map.get
self.AttentionWrapper.super.__init__
json.loads
new_sequence_outputs.append
parse_args
re.compile
transformers.AutoTokenizer.from_pretrained.tokenize
prefix.format.opt.get.lower
x_flat.self.FC.self.f.self.linear.view
plt.ylabel
docs.append
torch.float32.input.detach.F.softmax.sqrt_
ry.rp.p.sum
data_config
list
v.contiguous.view
pytorch_pretrained_bert.modeling.BertEmbeddings
idx.mask.view
apply_no_ans_threshold
self.adv_teacher.forward
module.dropout_wrapper.DropoutWrapper
v.cpu
config.get
dataset.split
idx.all_encoder_layers.detach
train_config.add_argument
self.FC
sample_id_2_label_dic.keys
weight.reshape
proj.unsqueeze.x.bmm.squeeze
self.SelfAttnWrapper.super.__init__
rng.randint
data_utils.load_score_file
numpy.exp.sum
self.dropout_list.append
data_utils.log_wrapper.create_logger
torch.cuda.set_device
dev_data_list.append
query.transpose.size
mt_dnn.inference.eval_model
target.contiguous.view
h0.size.h0.new.zero_
locals.items
TASK_REGISTRY.get
plt.step
n_best_size.start_logits.np.argsort.tolist
Vocabulary.add
attn.transpose.transpose
tokens.extend
temp_2.pop
experiments.ner.ner_utils.load_conll_pos
self.Trilinear.super.__init__
logits.data.new
numpy.ones_like
p.nelement
self.network.named_parameters
pool.map
numpy.power
self.network.parameters
rvl.append
random.sample
self.attn_list.append
compare_output
self.FlatSim.super.__init__
self.network.encode
torch.bernoulli
assert_dir_equal
torch.save
json.dump
self.SimilarityWrapper.super.__init__
y_pred.append
pooled_output.contiguous.view
torch.distributed.init_process_group
task_def.get
args.task.lower
embed.data.new
vw.new_dict.cpu.numpy
xWy.data.masked_fill_
self.DotProductProject.super.__init__
next
batch_meta.batch_data.detach.cpu
MaskedLmInstance
self.num_hid.size.size.tmp_output.view.max.view
Classifier
x2.x1.abs
score.np.argmax.tolist
predict.strip.reshape
self.reset
query.transpose.transpose
DotProductProject
self._setup_adv_lossmap
vb.new_dict.cpu
generate_noise
torch.cuda.device_count
experiments.mlm.mlm_utils.load_loose_json.append
batch_meta.batch_data.cuda
torch.optim.SGD
model.state_dict.items
logit.size.logit.view.float
self.x_linear
rng.shuffle
task_id.self.dropout_list.size
sum
tensorboardX.SummaryWriter
os.access
sys.path.append
encoding_0.size.input_length.torch.LongTensor.fill_
x.size.x.size.x.data.new.zero_
sigmoid
uid.gold_map.append
x2_flat.self.proj_2.self.f.view
rng.random
golds.append
weight.cpu.numpy
tokenizer
argparse.ArgumentParser
WeightNorm
p.size.p.contiguous.view.norm
task_def_list.append
self.qkv_head_dim.self.num_heads.bsz.src_len.k.contiguous.view.transpose.size
experiments.exp_def.TaskDefs
self._setup_adv_training
line.strip.startswith
self.optimizer.step
self.adv_loss.update
logit.size.logit.view.float.view
args.model.split
TrainingInstance
torch.ones
config_class.from_pretrained
flat_scores.contiguous.view
collections.namedtuple
torch.log
scipy.stats.pearsonr
logging.getLogger.info
grounds.append
x1.size.x1.size.x1.contiguous.view.self.x_dot_linear.view.expand_as
lang_map.items
tasks.get_task_obj.input_is_valid_sample
attn.transpose.contiguous
self.att
torch.optim.AdamW
mt_dnn.perturbation.SmartPerturbation
doc.split
data_utils.tokenizer_utils.create_tokenizer._convert_token_to_id
tokenizer.append
embed.size.embed.data.new.normal_
torch.load
numpy.concatenate
p.dim
torch.nn.functional.softmax
build_data
y.unsqueeze.expand_as
self.Classifier.super.__init__
SimilarityWrapper
logits.size.logits.data.new.zero_
attn.transpose.contiguous.view
attn_weights.size.attn_weights.data.new.zero_
uid.predict_map.append
key_padding_mask.unsqueeze.expand_as
self._layer_norm
self.EMA.super.__init__
start_position.append
p.contiguous.view
ent_strs.append
torch.BoolTensor
idx.lines.strip.split
multiprocessing.Pool
load_qnli
torch.cuda.is_available
sklearn.metrics.accuracy_score
x.alpha.unsqueeze.bmm.squeeze
logging.getLogger
lower
mask.size.score.np.argmax.reshape.tolist
score.numpy.reshape
group.setdefault
mt_dnn.batcher.DistSingleTaskBatchSampler
load_wnli
human.items
self.decoder_opt.append
y.x.y.x.torch.cat.contiguous
os.path.splitext
pred.items
predictions.append
random.Random
self.rebatch
apex.amp.master_params
numpy.ma.masked_array.mean
mt_dnn.model.MTDNNModel.extract
end.numpy.tolist
load_scitail
numpy.random.choice
test_encoder
tokenizer.pop
attn_weights.float.F.softmax.type_as
MLPSelfAttn
data_utils.vocab.Vocabulary.add
compute
writer.write
self.linear.unsqueeze
tok_len.batch_size.torch.LongTensor.fill_
self.mnetwork
model.to.state_dict
end.data.cpu
mt_dnn.inference.extract_encoding
WeightNorm.compute_weight
sklearn.metrics.f1_score
target.contiguous.view.float
instances.append
output_file.open.read
load_squad
target.contiguous.view.detach
experiments.glue.glue_utils.submit
attention_mask_list.append
model.encode
load_qnnli
self.nsp
self.BilinearFlatSim.super.__init__
tqdm.auto.tqdm
proj
self.AdamaxW.super.__setstate__
mlm_y.view.view
hyp_mask.size
expected_dir.os.path.join.open.read
masked_lm_labels.append
p.contiguous
epsilon.epsilon.y.detach.log
test_metrics.items
idx.all_encoder_layers.detach.cpu.numpy
sub_part.pin_memory
epsilon.epsilon.p.detach
data_utils.utils.set_environment
cand_indexes.append
self._to_cuda
len
data_utils.utils.AverageMeter
hid_shape.weight.new.zero_
apex.amp.scale_loss
torch.no_grad
torch.nn.functional.dropout
sklearn.metrics.matthews_corrcoef
model_class
make_qid_to_has_ans
remove_articles
x1.size.x1.contiguous.view.self.x_linear.view.expand_as
test_data_list.append
vb.new_dict.cpu.numpy
self.self_att
task_id.self.dropout_list.contiguous
x2.size.x2.contiguous.view.self.y_linear.view
suffix.split.upper
self.encoder_type.EncoderModelType.name.lower
self.init_hidden
logging.getLogger.setLevel
self.tokenizer.batch_decode
weight.size
os.path.split
self.scheduler.step
input.view.float
recalls.append
torch.zeros
self.MaskLmHeader.super.__init__
SanEncoder
p.data.mul_
start.numpy.tolist
current_chunk.append
random.seed
p.size.p.contiguous.view.norm.view
filecmp.cmp
sorted.insert
eval.split
load_rte
functools.partial
self._rnn.flatten_parameters
model.predict
v.cuda.cuda
self._get_shuffled_index_batches_bin
module.similarity.FlatSimilarityWrapper
target.contiguous.view.size
score.numpy.contiguous
build_data_single
positives.random.sample.pop
qa_entry.get
torch.ones_like
torch.distributed.is_initialized
kw.new_dict.cpu
random.choice
generate_decoder_opt
encoding_0.size.input_lengths.max.input_lengths.len.torch.LongTensor.fill_
DotProduct
torch.bmm
mask.unsqueeze.expand_as
format.replace
train_config
module.pooler.Pooler
p.data.addcdiv_
self._get_shuffled_index_batches
self.alpha.expand_as
attn_weights.size.attn_weights.data.new.zero_.torch.diag.byte.unsqueeze
load_boolq
metric_max_over_ground_truths
cs.LOSS_REGISTRY
mask.sum
self._get_batch_size
getattr.cuda
self.optimizer.state_dict
argparse.ArgumentParser.parse_args
opt.get
attn_weights.size.attn_weights.data.new.zero_.torch.diag.byte.unsqueeze.expand_as
load_cb
bin_idx.data.append
grad.abs
dict
argparse.ArgumentParser.add_argument
exp_inf.mul_
len.pop
data_utils.utils_qa.postprocess_qa_predictions
x.data.new
yaml.safe_load
torch.nn.functional.softmax.unsqueeze
WeightNorm.apply
transformers.get_constant_schedule_with_warmup
experiments.ner.ner_utils.load_conll_ner
_mg
RuntimeError
noise.detach.detach
label_tokenize
prepare_train_feature
SanLayer
common.init_wrapper
args.layers.split
temp_2.append
self.dropout.float
plt.fill_between
next.extend
pytorch_pretrained_bert.modeling.BertLayerNorm
n_best_size.end_logits.np.argsort.tolist
self.task_loss_criterion.append
float.key_padding_mask.unsqueeze.unsqueeze.attn_weights.float.masked_fill.type_as
feature_extractor
part.pin_memory.to
load_cola
input.view.detach
x1.size.x1.contiguous.view.self.x_linear.view
segment_ids.append
sklearn.metrics.confusion_matrix
tokens_a.extend
experiments.mlm.mlm_utils.load_loose_json
torch.cuda.manual_seed_all
feature_index.features.get
Wy.unsqueeze.x.bmm.squeeze
str
encoding_1.encoding_0.abs
self.score_func.view
torch.std
y.x.torch.cat.contiguous
self._task_type_map.keys
embed.detach.abs
task_id.self.dropout_list
exact_scores.values
seqeval.metrics.f1_score
apex.amp.initialize
plt.ylim
torch.nn.parameter.Parameter
blocks.lang_dict.append
self.dense
self.bert.parameters
torch.nn.Parameter
os.makedirs
entry.get
random.shuffle
BilinearSum
tasks.get_task_obj.test_prepare_label
experiments.exp_def.TaskDef.from_dict
line.strip.split.split
self.tok2ind.get
x1_flat.self.proj_1.self.f.view
f1_scores.values
sample_id_2_label_dic.items
generate_golds_predictions_scores
context.strip
re.sub
transformers.AutoTokenizer.from_pretrained.convert_tokens_to_ids
self.config.get
args.model.split.split
mask.sum.tolist
torch.nn.DataParallel
input.view.squeeze
transformers.AutoTokenizer.from_pretrained
Trilinear
register_task
x2.contiguous.view
self.network.load_state_dict
module.register_parameter
target.contiguous.view.contiguous
experiments.exp_def.TaskDefs.get_task_def
torch.FloatTensor
tensorboardX.SummaryWriter.add_scalar
sizes.append
self.rnn_type.nn.getattr
module.san.SANClassifier
metric_func
self.ind2tok.get
self._setup_optim
re.compile.findall
kw.new_dict.cpu.numpy
score_path.open.read
prelim_predictions.append
mt_dnn.matcher.SANBertNetwork
glob.glob
arch.config_class.from_pretrained.to_dict
torch.bernoulli.unsqueeze
Bilinear
self.__random_select__
torch.distributed.get_rank
self.model.named_parameters
data_utils.roberta_utils.patch_name_dict
model
input.view.view
line.strip.strip
max_answer_seq_len.batch_size.torch.LongTensor.fill_
mt_dnn.batcher.MultiTaskDataset
collections.Counter.values
getattr
math.sqrt
logits.data.masked_fill_
qa_sample
ImportError
args.transformer_cache.init_model.config_class.from_pretrained.to_dict
examples.append
x2.size.x2.contiguous.view.self.y_linear.view.expand_as
self.bert.embeddings
data_utils.log_wrapper.create_logger.info
experiments.mlm.mlm_utils.create_instances_from_document
eff_perturb.item
white_space_fix
reduce_features_to_examples
end_scores.squeeze.squeeze
compute_acc
feature_index.features.get.get
experiments.ner.ner_utils.load_conll_chunk
next.new
predict.strip.strip
exp_inf.new.long
new_data.zero_.zero_
eff_noise.detach.abs.mean
torch.stack
json.load
evaluation
tasks.get_task_obj.train_prepare_soft_labels
mt_dnn.batcher.SingleTaskDataset
idx.all_encoder_layers.detach.cpu
target.F.log_softmax.exp
task_id.self.dropout_list.view
self.config.get.item
flat_squad
load_wic_mtdnn
batch_meta.batch_data.detach.cpu.numpy
attn.transpose.size
zip
self.LayerNorm.super.__init__
task_layer
p.size
bais.cpu
tokenizer.sequence_ids
compute_exact
os.path.join
precisions.append
y_true.append
eps.grad.abs.add_.unsqueeze_
mlm_p.view.view
repr
model.cuda
tuple
x.contiguous.view
print
numpy.argsort
load_snli
model.size
self.scoring_list.append

@developer Could you please help me check this issue? May I open a pull request to fix it? Thank you very much.

opened by PyDeps 0

Owner

Xiaodong

The source code for Generating Training Data with Language Models: Towards Zero-Shot Language Understanding.

The source code of the paper "Understanding Graph Neural Networks from Graph Signal Denoising Perspectives"

Code for Understanding Pooling in Graph Neural Networks

PyTorch version repo for CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

Code for the CVPR 2021 paper: Understanding Failures of Deep Networks via Robust Feature Extraction

Implementation of EMNLP 2017 Paper "Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog" using PyTorch and ParlAI

Multi-task Multi-agent Soft Actor Critic for SMAC

Multi-Object Tracking in Satellite Videos with Graph-Based Multi-Task Modeling

Code and pre-trained models for MultiMAE: Multi-modal Multi-task Masked Autoencoders

Multi Task Vision and Language

Multi-Task Temporal Shift Attention Networks for On-Device Contactless Vitals Measurement (NeurIPS 2020)

This code uses generative adversarial networks to generate diverse task allocation plans for Multi-agent teams.

Face Detection and Alignment using Multi-task Cascaded Convolutional Networks (MTCNN)

Deep Learning for Natural Language Processing SS 2021 (TU Darmstadt)

AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning

Repository for "Improving evidential deep learning via multi-task learning," published in AAAI2022

A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks