Multi-Task Deep Neural Networks for Natural Language Understanding

Overview

License: MIT

New Release
We have released adversarial training code for both LM pre-training/fine-tuning and f-divergence regularization.

Large-scale adversarial training for LMs: ALUM code.
If you want to use the old version, please use the following command to clone the code:
git clone -b v0.1 https://github.com/namisan/mt-dnn.git

Multi-Task Deep Neural Networks for Natural Language Understanding

This PyTorch package implements the Multi-Task Deep Neural Networks (MT-DNN) for Natural Language Understanding, as described in:

Xiaodong Liu*, Pengcheng He*, Weizhu Chen and Jianfeng Gao
Multi-Task Deep Neural Networks for Natural Language Understanding
ACL 2019
*: Equal contribution

Xiaodong Liu, Pengcheng He, Weizhu Chen and Jianfeng Gao
Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding
arXiv version

Pengcheng He, Xiaodong Liu, Weizhu Chen and Jianfeng Gao
Hybrid Neural Network Model for Commonsense Reasoning
arXiv version

Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao and Jiawei Han
On the Variance of the Adaptive Learning Rate and Beyond
arXiv version

Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao and Tuo Zhao
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization
arXiv version

Xiaodong Liu, Yu Wang, Jianshu Ji, Hao Cheng, Xueyun Zhu, Emmanuel Awa, Pengcheng He, Weizhu Chen, Hoifung Poon, Guihong Cao, Jianfeng Gao
The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding
arXiv version

Xiaodong Liu, Hao Cheng, Pengcheng He, Weizhu Chen, Yu Wang, Hoifung Poon and Jianfeng Gao
Adversarial Training for Large Neural Language Models
arXiv version

Hao Cheng and Xiaodong Liu and Lis Pereira and Yaoliang Yu and Jianfeng Gao
Posterior Differential Regularization with f-divergence for Improving Model Robustness
arXiv version

Quickstart

Setup Environment

Install via pip:

  1. Install Python 3.6
    Reference for downloading and installing it: https://www.python.org/downloads/release/python-360/

  2. Install the requirements
    > pip install -r requirements.txt

Use docker:

  1. Pull docker
    > docker pull allenlao/pytorch-mt-dnn:v0.5

  2. Run docker
    > docker run -it --rm --runtime nvidia allenlao/pytorch-mt-dnn:v0.5 bash
    Please refer to https://docs.docker.com/ if you are using Docker for the first time.

Train a toy MT-DNN model

  1. Download data
    > sh download.sh
    Refer to https://gluebenchmark.com/ to download the GLUE dataset.

  2. Preprocess data
    > sh experiments/glue/prepro.sh

  3. Training
    > python train.py

Note that we ran experiments on 4 V100 GPUs for base MT-DNN models. You may need to reduce batch size for other GPUs.

GLUE Result Reproduction

  1. MTL refinement: refine MT-DNN (shared layers), initialized with the pre-trained BERT model, via MTL using all GLUE tasks excluding WNLI to learn a new shared representation.
    Note that we ran this experiment on 8 V100 GPUs (32G) with a batch size of 32.

    • Preprocess GLUE data via the aforementioned script
    • Training:
      > scripts/run_mt_dnn.sh
  2. Finetuning: finetune MT-DNN on each of the GLUE tasks to get task-specific models.
    Here, we provide two examples, STS-B and RTE. You can use similar scripts to finetune all the GLUE tasks.

    • Finetune on the STS-B task
      > scripts/run_stsb.sh
      You should get about 90.5/90.4 on STS-B dev in terms of Pearson/Spearman correlation.
    • Finetune on the RTE task
      > scripts/run_rte.sh
      You should get about 83.8 on RTE dev in terms of accuracy.

SciTail & SNLI Result Reproduction (Domain Adaptation)

  1. Domain Adaptation on SciTail
    > scripts/scitail_domain_adaptation_bash.sh

  2. Domain Adaptation on SNLI
    > scripts/snli_domain_adaptation_bash.sh

Sequence Labeling Task

  1. Preprocess data
    a) Download the NER data to data/ner, including {train/valid/test}.txt
    b) Convert the NER data to the canonical format: > python experiments/ner/prepro.py --data data/ner --output_dir data/canonical_data
    c) Preprocess the canonical data to the MT-DNN format: > python prepro_std.py --root_dir data/canonical_data --task_def experiments/ner/ner_task_def.yml --model bert-base-uncased

  2. Training
    > python train.py --data_dir <data-path> --init_checkpoint <bert-base-uncased> --train_dataset ner --test_dataset ner --task_def experiments/ner/ner_task_def.yml

Question Answering Task

  1. Preprocess data
    a) Download the SQuAD data to data/squad, including {train/valid}.txt, and then rename the files to {squad_train/squad_dev}.json
    b) Convert the data to the MT-DNN format: > python experiments/squad/squad_prepro.py --root_dir data/canonical_data --task_def experiments/squad/squad_task_def.yml --model bert-base-uncased

  2. Training
    > python train.py --data_dir <data-path> --init_checkpoint <bert-model> --train_dataset squad,squad-v2 --test_dataset squad,squad-v2 --task_def experiments/squad/squad_task_def.yml

SMART

Adversarial training at the fine-tuning stage: > python train.py --data_dir <data-path> --init_checkpoint <bert/mt-dnn-model> --train_dataset mnli --test_dataset mnli_matched,mnli_mismatched --task_def experiments/glue/glue_task_def.yml --adv_train --adv_opt 1
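
Below is a rough, self-contained sketch of the smoothness-inducing adversarial regularization idea behind the --adv_train flag. It is not the repo's actual implementation: the toy model, the forward_from_embedding hook, the step sizes, and the weight on the regularizer are all illustrative assumptions.

    import torch
    import torch.nn.functional as F
    from torch import nn

    # Toy stand-in for a transformer that can run on (possibly perturbed) embeddings.
    class TinyClassifier(nn.Module):
        def __init__(self, dim=16, num_labels=3):
            super().__init__()
            self.proj = nn.Linear(dim, num_labels)

        def forward_from_embedding(self, embed):
            return self.proj(embed.mean(dim=1))

    def sym_kl(p_logits, q_logits):
        # symmetric KL divergence between two predictive distributions
        return (F.kl_div(F.log_softmax(p_logits, -1), F.softmax(q_logits, -1), reduction="batchmean")
                + F.kl_div(F.log_softmax(q_logits, -1), F.softmax(p_logits, -1), reduction="batchmean"))

    model = TinyClassifier()
    embed = torch.randn(4, 8, 16)              # (batch, seq_len, hidden) token embeddings
    labels = torch.randint(0, 3, (4,))

    logits = model.forward_from_embedding(embed)
    task_loss = F.cross_entropy(logits, labels)

    # One projected-gradient step on a small noise that maximizes the divergence.
    noise = (torch.randn_like(embed) * 1e-5).requires_grad_()
    adv_div = sym_kl(model.forward_from_embedding(embed + noise), logits.detach())
    grad, = torch.autograd.grad(adv_div, noise)
    noise = (noise + 1e-3 * grad / (grad.norm() + 1e-8)).detach()

    # Smoothness regularizer: predictions should not change under the found perturbation.
    adv_reg = sym_kl(model.forward_from_embedding(embed + noise), logits)
    total_loss = task_loss + 1.0 * adv_reg     # 1.0 plays the role of the adversarial weight (assumed)
    total_loss.backward()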

HNN

The code to reproduce HNN is under the hnn folder. To reproduce the results of HNN, run:

> hnn/script/hnn_train_large.sh

Extract embeddings

  1. Extracting embeddings for a text-pair example
    > python extractor.py --do_lower_case --finput input_examples/pair-input.txt --foutput input_examples/pair-output.json --bert_model bert-base-uncased --checkpoint mt_dnn_models/mt_dnn_base.pt
    Note that the two texts of a pair are separated by the special token |||; see the sketch after this list. You may refer to input_examples/pair-output.json as an example.

  2. Extracting embeddings for a single-sentence example
    > python extractor.py --do_lower_case --finput input_examples/single-input.txt --foutput input_examples/single-output.json --bert_model bert-base-uncased --checkpoint mt_dnn_models/mt_dnn_base.pt
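
For clarity, here is a tiny illustrative Python snippet that writes such a pair-input file using the ||| separator described above; the example sentences are made up and everything beyond the separator is handled by extractor.py.

    # Write one premise/hypothesis pair per line, separated by the special token |||.
    pairs = [("a man is playing a guitar .", "someone is playing an instrument .")]
    with open("input_examples/pair-input.txt", "w", encoding="utf-8") as f:
        for premise, hypothesis in pairs:
            f.write(f"{premise} ||| {hypothesis}\n")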

Speed up Training

  1. Gradient Accumulation
    If your GPUs have limited memory, you may need to use gradient accumulation to keep training stable.
    For example, if you use the flag --grad_accumulation_step 4 during training, the effective batch size will be batch_size * 4 (see the sketch after this list).

  2. FP16
    The current version of MT-DNN also supports FP16 training; please install apex first.
    You just need to turn on the flag --fp16 during training.
    Please refer to the script: scripts/run_mt_dnn_gc_fp16.sh
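
As a minimal illustration of the gradient-accumulation arithmetic, the toy loop below (not this repo's trainer; the model, optimizer, and data are placeholders) turns micro-batches of size 8 into an effective batch of 8 * 4 = 32.

    import torch
    from torch import nn

    model = nn.Linear(10, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    micro_batches = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(8)]

    grad_accumulation_step = 4                            # --grad_accumulation_step 4
    optimizer.zero_grad()
    for step, (x, y) in enumerate(micro_batches):
        loss = nn.functional.cross_entropy(model(x), y)
        (loss / grad_accumulation_step).backward()        # accumulate scaled gradients
        if (step + 1) % grad_accumulation_step == 0:
            optimizer.step()                              # one update per 4 micro-batches
            optimizer.zero_grad()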

Convert a TensorFlow BERT model to the MT-DNN format

Here, we walk through how to convert a Chinese TensorFlow BERT model into the MT-DNN format.

  1. Download a BERT model from the Google BERT repository: https://github.com/google-research/bert

  2. Run the following script to convert it to the MT-DNN format
    python scripts/convert_tf_to_pt.py --tf_checkpoint_root chinese_L-12_H-768_A-12/ --pytorch_checkpoint_path chinese_L-12_H-768_A-12/bert_base_chinese.pt
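
Optionally, a quick sanity check of the converted checkpoint (illustrative only, not part of the repo; whether the file stores a raw state dict or a wrapper dict with a 'state' key is an assumption):

    import torch

    ckpt = torch.load("chinese_L-12_H-768_A-12/bert_base_chinese.pt", map_location="cpu")
    state_dict = ckpt.get("state", ckpt) if isinstance(ckpt, dict) else ckpt
    for name, value in list(state_dict.items())[:5]:
        print(name, getattr(value, "shape", type(value)))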

TODO

  • Publish pretrained Tensorflow checkpoints.

FAQ

Did you share the pretrained mt-dnn models?

Yes, we released the pretrained shared embeddings learned via MTL, which are aligned to the BERT base/large models: mt_dnn_base.pt and mt_dnn_large.pt.
To obtain similar models:

  1. run > sh scripts/run_mt_dnn.sh, and then pick the best checkpoint based on the average dev performance of MNLI/RTE.
  2. strip the task-specific layers via scripts/strip_model.py (see the sketch below).
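
For intuition, a conceptual sketch of the stripping step; the checkpoint layout and the 'scoring_list' prefix for the task-specific heads are assumptions about this codebase, so consult scripts/strip_model.py for the real logic.

    import torch

    ckpt = torch.load("checkpoints/model_best.pt", map_location="cpu")   # hypothetical path
    state = ckpt["state"] if isinstance(ckpt, dict) and "state" in ckpt else ckpt
    # keep only the shared layers; drop the per-task heads
    shared_only = {k: v for k, v in state.items() if not k.startswith("scoring_list")}
    torch.save({"state": shared_only}, "mt_dnn_stripped.pt")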

Why do SciTail/SNLI not enable SAN?

For the SciTail/SNLI tasks, the purpose is to test the generalization of the learned embeddings and how easily they adapt to a new domain, rather than complicated model structures, for a direct comparison with BERT. Thus, we use a linear projection in all domain adaptation settings.
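
Concretely, the head in this setting is just a single linear layer over the shared encoder's pooled representation, rather than the SAN answer module; a minimal sketch with made-up dimensions:

    import torch
    from torch import nn

    hidden_size, num_labels = 768, 2        # BERT-base pooled output -> entailment labels
    classifier = nn.Linear(hidden_size, num_labels)

    pooled = torch.randn(4, hidden_size)    # stand-in for the pooled [CLS] output of the shared layers
    logits = classifier(pooled)             # (4, num_labels) class scores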

What is the difference between V1 and V2?

The difference is in the QNLI dataset. Please refer to the official GLUE homepage for more details. If you want to formulate QNLI as a pairwise ranking task, as in our paper, make sure that you use the old QNLI data.
Then run the prepro script with the flag: > sh experiments/glue/prepro.sh --old_glue
If you have issues accessing the old version of the data, please contact the GLUE team.
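
For intuition, a tiny sketch of the pairwise-ranking view of the old QNLI data (made-up scores, not the repo's actual ranking loss): the relevance scores of all candidate sentences for a question are normalized against each other, and the answer-bearing sentence should rank first.

    import torch
    import torch.nn.functional as F

    scores = torch.tensor([[2.3, 0.1, -1.2, 0.4]])   # relevance scores for 4 candidates of one question
    target = torch.tensor([0])                        # index of the answer-bearing sentence
    loss = F.cross_entropy(scores, target)            # softmax over candidates, not over class labels
    prediction = scores.argmax(dim=-1)                # predict the top-ranked candidate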

Did you fine-tune single task for your GLUE leaderboard submission?

We can use the multi-task refinement model to run the prediction and produce a reasonable result. But to achieve a better result, it requires fine-tuning on each task. It is worth noting that the arXiv paper is a little outdated and based on the old GLUE dataset; we will update it.

Notes and Acknowledgments

The BERT PyTorch implementation is from: https://github.com/huggingface/pytorch-pretrained-BERT
BERT: https://github.com/google-research/bert
We also used some code from: https://github.com/kevinduh/san_mrc

Related Projects/Codebase

  1. Pretrained UniLM: https://github.com/microsoft/unilm
  2. Pretrained Response Generation Model: https://github.com/microsoft/DialoGPT
  3. Internal MT-DNN repo: https://github.com/microsoft/mt-dnn

How do I cite MT-DNN?

@inproceedings{liu2019mt-dnn,
    title = "Multi-Task Deep Neural Networks for Natural Language Understanding",
    author = "Liu, Xiaodong and He, Pengcheng and Chen, Weizhu and Gao, Jianfeng",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/P19-1441",
    pages = "4487--4496"
}


@article{liu2019mt-dnn-kd,
  title={Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding},
  author={Liu, Xiaodong and He, Pengcheng and Chen, Weizhu and Gao, Jianfeng},
  journal={arXiv preprint arXiv:1904.09482},
  year={2019}
}


@article{he2019hnn,
  title={A Hybrid Neural Network Model for Commonsense Reasoning},
  author={He, Pengcheng and Liu, Xiaodong and Chen, Weizhu and Gao, Jianfeng},
  journal={arXiv preprint arXiv:1907.11983},
  year={2019}
}


@article{liu2019radam,
  title={On the Variance of the Adaptive Learning Rate and Beyond},
  author={Liu, Liyuan and Jiang, Haoming and He, Pengcheng and Chen, Weizhu and Liu, Xiaodong and Gao, Jianfeng and Han, Jiawei},
  journal={arXiv preprint arXiv:1908.03265},
  year={2019}
}


@article{jiang2019smart,
  title={SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization},
  author={Jiang, Haoming and He, Pengcheng and Chen, Weizhu and Liu, Xiaodong and Gao, Jianfeng and Zhao, Tuo},
  journal={arXiv preprint arXiv:1911.03437},
  year={2019}
}


@article{liu2020mtmtdnn,
  title={The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding},
  author={Liu, Xiaodong and Wang, Yu and Ji, Jianshu and Cheng, Hao and Zhu, Xueyun and Awa, Emmanuel and He, Pengcheng and Chen, Weizhu and Poon, Hoifung and Cao, Guihong and Gao, Jianfeng},
  journal={arXiv preprint arXiv:2002.07972},
  year={2020}
}


@article{liu2020alum,
  title={Adversarial Training for Large Neural Language Models},
  author={Liu, Xiaodong and Cheng, Hao and He, Pengcheng and Chen, Weizhu and Wang, Yu and Poon, Hoifung and Gao, Jianfeng},
  journal={arXiv preprint arXiv:2004.08994},
  year={2020}
}

@article{cheng2020posterior,
  title={Posterior Differential Regularization with f-divergence for Improving Model Robustness},
  author={Cheng, Hao and Liu, Xiaodong and Pereira, Lis and Yu, Yaoliang and Gao, Jianfeng},
  journal={arXiv preprint arXiv:2010.12638},
  year={2020}
}

Contact Information

For help or issues using MT-DNN, please submit a GitHub issue.

For personal communication related to this package, please contact Xiaodong Liu ([email protected]), Yu Wang ([email protected]), Pengcheng He ([email protected]), Weizhu Chen ([email protected]), Jianshu Ji ([email protected]), Hao Cheng ([email protected]) or Jianfeng Gao ([email protected]).

Comments
  • No such file or directory: 'ner/ner_train.json'

    ls ner/
    chunk_dev.tsv chunk_test.tsv chunk_train.tsv ner_dev.tsv ner_test.tsv ner_train.tsv pos_dev.tsv pos_test.tsv pos_train.tsv test.txt train.txt valid.txt

    python3 train.py --data_dir ner/ --init_checkpoint ./mt_dnn_models/bert_model_large_uncased.pt --train_dataset ner --test_dataset ner --task_def experiments/ner/ner_task_def.yml
    Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
    Namespace(adam_eps=1e-06, answer_att_hidden_size=128, answer_att_type='bilinear', answer_dropout_p=0.1, answer_mem_drop_p=0.1, answer_mem_type=1, answer_merge_opt=1, answer_num_turn=5, answer_opt=0, answer_rnn_type='gru', answer_sum_att_type='bilinear', answer_weight_norm_on=False, batch_size=8, batch_size_eval=8, bert_dropout_p=0.1, bert_l2norm=0.0, cuda=True, data_dir='ner/', data_sort_on=False, dropout_p=0.1, dropout_w=0.0, dump_state_on=False, ema_gamma=0.995, ema_opt=0, embedding_opt=0, encoder_type=<EncoderModelType.BERT: 1>, epochs=5, fp16=False, fp16_opt_level='O1', freeze_layers=-1, global_grad_clipping=1.0, glue_format_on=False, grad_accumulation_step=1, grad_clipping=0, have_lr_scheduler=True, init_checkpoint='./mt_dnn_models/bert_model_large_uncased.pt', init_ratio=1, label_size='3', learning_rate=5e-05, log_file='mt-dnn-train.log', log_per_updates=500, lr_gamma=0.5, max_seq_len=512, mem_cum_type='simple', mix_opt=0, model_ckpt='checkpoints/model_0.pt', momentum=0, mtl_opt=0, multi_gpu_on=False, multi_step_lr='10,20,30', name='farmer', optimizer='adamax', output_dir='checkpoint', ratio=0, resume=False, save_per_updates=10000, save_per_updates_on=False, scheduler_type='ms', seed=2018, task_def='experiments/ner/ner_task_def.yml', tensorboard=False, tensorboard_logdir='tensorboard_logdir', test_datasets=['ner'], train_datasets=['ner'], update_bert_opt=0, vb_dropout=True, warmup=0.1, warmup_schedule='warmup_linear', weight_decay=0)
    10/19/2019 12:30:52 0
    10/19/2019 12:30:52 Launching the MT-DNN training
    10/19/2019 12:30:52 Loading ner/ner_train.json as task 0
    Traceback (most recent call last):
      File "train.py", line 434, in <module>
        main()
      File "train.py", line 207, in main
        train_data = BatchGen(BatchGen.load(train_path, True, task_type=task_type, maxlen=args.max_seq_len),
      File "/home/ant/multitask/mt-dnn/mt_dnn/batcher.py", line 53, in load
        with open(path, 'r', encoding='utf-8') as reader:
    FileNotFoundError: [Errno 2] No such file or directory: 'ner/ner_train.json'

    opened by antgr 7
  • Why fine-tuning in single-task setting (not as stated in the paper)?

    In the arxiv paper it is stated:

    In the multi-task fine-tuning stage, we use minibatch based stochastic gradient descent (SGD) to learn the parameters of our model (i.e., the parameters of all shared layers and task-specific layers) as shown in Algorithm 1. In each epoch, a mini-batch b_t is selected(e.g., among all 9 GLUE tasks), and the model is updated according to the task-specific objective for the task t. This approximately optimizes the sum of all multi-task objectives.

    If I understand it correctly, in your code this multi-task fine-tuning stage is called MTL refinement. Then why do you fine-tune for each task in single task setting in your fine-tuning stage? There is no such stage in the original paper. Also, in run_mt_dnn.sh there are lines: train_datasets="mnli,rte,qqp,qnli,mrpc,sst,cola,stsb" test_datasets="mnli_matched,mnli_mismatched,rte" Why do you only test on mnli and rte and not test on all other tasks? I would also like to ask if I can switch from BERT large to BERT base there because i only have one 1080 GTX card.

    Thank you.

    good first issue 
    opened by svboeing 7
  • add more detail about how to train a roberta+alum?

    From the paper I understand that the first step is to pretrain a RoBERTa model (this has already been done by Facebook), and the second is to continue training with ALUM. But I wonder how to merge the ALUM code below into fairseq to run the training?

    opened by RyanHuangNLP 6
  • Could you release MT-DNN no-fine-tune checkpoint with task layers?

    Hi,

    Is it possible to release MT-DNN no-fine-tune checkpoint?

    Currently released mt-dnn-large.pt only contains shared layers, lacks task-specific layers.

    Thanks!

    opened by yaolu 6
  • Hotfix load_qnnli in glue_utils.py

    I received two assert errors when running prepro.py. Specifically, they occur in glue_utils.py, which is called from prepro.py. The errors are shown below, and I fixed them.

    Traceback (most recent call last):
      File "prepro.py", line 352, in <module>
        main(args)
      File "prepro.py", line 193, in main
        qnnli_train_data = load_qnnli(qnli_train_path, GLOBAL_MAP['qnli'])
      File "./mt-dnn/data_utils/glue_utils.py", line 113, in load_qnnli
        assert len(lines) % 2 == 0
    AssertionError
    
    Traceback (most recent call last):
      File "prepro.py", line 352, in <module>
        main(args)
      File "prepro.py", line 195, in main
        qnnli_test_data = load_qnnli(qnli_test_path, GLOBAL_MAP['qnli'], is_train=False)
      File "./mt-dnn/data_utils/glue_utils.py", line 122, in load_qnnli
        assert block1[1] == block2[1]
    AssertionError
    
    opened by tommy19970714 6
  • test: {"metrics": {}, "predictions": <== metrics is empty

    In the checkpoints folder, I see that for the test set there is no result in the metrics field. Compare this to the dev set, which is OK, e.g. {"metrics": {"ACC": 76.69902912621359}

    opened by antgr 5
  • modify load data method to fit billion level data, but memory leak, help!!

    We have a billion examples, so the source code's approach of putting all the data in memory does not work. I use a DataLoader + yield so that each batch loads data into memory, but GPU memory keeps increasing. I am only testing data loading, not training yet. I have tried all the memory-clearing methods (torch.cuda.empty_cache() ...), but they do not work. This is my data-reading code:

    class CustomIterableDataset(IterableDataset):
        def __init__(self, task_def, task_id, batch_size=32,
                     gpu=True, is_train=True, epochs=10,
                     maxlen=128, dropout_w=0.005):
            super(CustomIterableDataset).__init__()
            # ... several lines omitted
    
        def _get_max_len(self, batch, key='token_id'):
            tok_len = max(len(x[key]) for x in batch)
            return tok_len
    
        def __if_pair__(self, data_type):
            return data_type in [DataFormat.PremiseAndOneHypothesis, DataFormat.PremiseAndMultiHypothesis]
    
        def __random_select__(self, arr):
            if self.dropout_w > 0:
                return [UNK_ID if random.uniform(0, 1) < self.dropout_w else e for e in arr]
            else: return arr
    
        def patch(self, v):
            v = v.cuda(non_blocking=True)
            return v
    
        def _get_batch_size(self, batch):
            return len(batch)
    
        def _prepare_model_input(self, batch_def):
             #ignore
            ...
            return batch_data, batch_info
    
        def _process(self, batch_def):
            # ... several lines omitted
            if self.gpu:
                for i, item in enumerate(batch_data):
                    batch_data[i] = self.patch(item)
    
            # ... several lines omitted
            return batch_info, batch_data
    
        def _line_mapper(self, lines):
            samples = []
            for line in lines:
                sample = json.loads(line.strip())
                sample['factor'] = 1.0
                samples.append(sample)
            batch_def = {"data": samples,
                         "task_type": self.task_def["task_type"],
                         "task_id": self.task_id,
                         "data_type": self.task_def["data_type"],
                         "encoder_type": self.task_def["encoder_type"],
            }
            return self._process(batch_def)
    
        def _dir_iter(self, file_list):
            if len(file_list) == 0:
                return None
            #file_list = random.shuffle(file_list)
            for f in file_list:
                with open(f) as reader:
                    lines = []
                    for line in reader:
                        if len(lines) >= self.batch_size:
                            yield lines
                            lines = []
                            torch.cuda.empty_cache()
                        lines.append(line)
                    yield lines
    
        def __iter__(self):
            if self.is_train:
                dataset_dir = self.task_def['train_dataset_dir']
            else:
                dataset_dir = self.task_def['test_dataset_dir']
            file_list = os.listdir(dataset_dir)
            for i, data in enumerate(file_list):
                data = os.path.join(dataset_dir, data)
                file_list[i] = data
    
            line_iter = self._dir_iter(file_list)
            # Create an iterator
            mapped_itr = map(self._line_mapper, line_iter)
    
            return mapped_itr
    

    This is standalone code that reproduces the issue independently (PyTorch 1.2):

    import os
    import sys
    import json
    import torch
    import random
    import resource
    from torch.utils.data import IterableDataset
    from torch.utils.data import DataLoader
    from itertools import tee
    
    UNK_ID=100
    BOS_ID=101
    
    class CustomIterableDataset(IterableDataset):
        def __init__(self, task_def, task_id, batch_size=32,
                     gpu=True, is_train=True, epochs=10,
                     maxlen=128, dropout_w=0.005):
            super(CustomIterableDataset).__init__()
            self.task_def = task_def
            self.task_id = task_id
            self.batch_size = batch_size
            self.maxlen = maxlen
            self.is_train = is_train
            self.epochs = 1 if not is_train else epochs
            self.gpu = gpu
            self.dropout_w = dropout_w
            self.pairwise_size = 1
    
    
        def patch(self, v):
            v = v.cuda(non_blocking=True)
            return v
    
        def _dir_iter(self):
            for i in range(0,3):
                lines = []
                reader = [i for i in range(0,1000000)]
                for line in reader:
                    line = torch.LongTensor(self.batch_size, 10).fill_(0)
                    if len(lines) >= self.batch_size:
                        yield lines
                        lines = []
                        torch.cuda.empty_cache ()
                    lines.append(self.patch(line))
                del reader
                yield lines
    
        def __iter__(self):
            mapped_itr =self._dir_iter()
            return mapped_itr
    
    
    if __name__ == '__main__':
        dataset_iter = CustomIterableDataset(task_def=None, task_id=None)
        data_generator = DataLoader(dataset_iter, batch_size=None)
        train_generator_list = []
        train_generator_list.append(iter(data_generator))
        copy_iter_list = []
        for id, first_it in enumerate(train_generator_list):
            first_itr, second_itr = tee(first_it)
            train_generator_list[id] = first_itr
            copy_iter_list.append(second_itr)
        i= 0
        while True:
            try:
                i += 1
                batch_data = next(train_generator_list[0])
                if i % 100 == 0:
                    max_mem_used = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
                    print("Mem: {:.2f} MB".format(max_mem_used / 1024))
            except StopIteration:
                # end of one epoch
                print('again')
                _, train_generator_list[0] = tee(copy_iter_list[0])
    


    opened by xiangxianzhang 5
  • prepro_std.py does not support sequence tasks

    Running prepro_std.py for a sequence task gives an error (also, the spelling is incorrect: Seqence --> Sequence):

    python prepro_std.py --model bert-base-uncased --root_dir ner --task_def experiments/ner/ner_task_def.yml 
    Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
    11/04/2019 09:31:52 Task ner
    data_format:  DataFormat.Seqence
    Traceback (most recent call last):
      File "prepro_std.py", line 551, in <module>
        main(args)
      File "prepro_std.py", line 536, in main
        label_mapper)
      File "prepro_std.py", line 433, in load_data
        raise ValueError(data_format)
    ValueError: DataFormat.Seqence
    

    I think the missing implementation is in load_data, in the final else branch (the implementation for data_format Seqence is missing); see the illustrative sketch after the snippet below:

        if task_type == TaskType.Ranking:
            assert data_format == DataFormat.PremiseAndMultiHypothesis
    
        rows = []
        for line in open(file_path, encoding="utf-8"):
            fields = line.strip("\n").split("\t")
            if data_format == DataFormat.PremiseOnly:
                assert len(fields) == 3
                row = {"uid": fields[0], "label": fields[1], "premise": fields[2]}
            elif data_format == DataFormat.PremiseAndOneHypothesis:
                assert len(fields) == 4
                row = {
                    "uid": fields[0],
                    "label": fields[1],
                    "premise": fields[2],
                    "hypothesis": fields[3]}
            elif data_format == DataFormat.PremiseAndMultiHypothesis:
                assert len(fields) > 5
                row = {"uid": fields[0], "ruid": fields[1].split(","), "label": fields[2], "premise": fields[3],
                       "hypothesis": fields[4:]}
            else:
                print ("data_format: ", data_format)
                raise ValueError(data_format) <---- here is the exception
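
    For reference, one purely illustrative guess at the missing branch (the exact column layout of DataFormat.Seqence rows is an assumption and should be checked against the preprocessing output):

            elif data_format == DataFormat.Seqence:
                # assumed columns: uid, per-token label sequence, token sequence
                assert len(fields) == 3
                row = {"uid": fields[0],
                       "label": fields[1].split(),
                       "premise": fields[2].split()}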
    
    opened by antgr 5
  • Issue in preprocessing data

    I downloaded the data using the download script and ran the prepro.sh script as given in the readme and it runs into the following error:

    sh experiments/glue/prepro.sh
    08/28/2019 11:18:45 Loaded 23596 SciTail train samples
    08/28/2019 11:18:45 Loaded 1304 SciTail dev samples
    08/28/2019 11:18:45 Loaded 2126 SciTail test samples
    08/28/2019 11:18:47 Loaded 549367 SNLI train samples
    08/28/2019 11:18:47 Loaded 9842 SNLI dev samples
    08/28/2019 11:18:47 Loaded 9824 SNLI test samples
    08/28/2019 11:18:48 Loaded 392702 MNLI train samples
    08/28/2019 11:18:48 Loaded 9815 MNLI matched dev samples
    08/28/2019 11:18:48 Loaded 9832 MNLI mismatched dev samples
    08/28/2019 11:18:48 Loaded 9796 MNLI matched test samples
    08/28/2019 11:18:48 Loaded 9847 MNLI mismatched test samples
    08/28/2019 11:18:48 Loaded 3668 MRPC train samples
    08/28/2019 11:18:48 Loaded 408 MRPC dev samples
    08/28/2019 11:18:48 Loaded 1725 MRPC test samples
    08/28/2019 11:18:49 Loaded 104743 QNLI train samples
    08/28/2019 11:18:49 Loaded 5463 QNLI dev samples
    08/28/2019 11:18:49 Loaded 5463 QNLI test samples
    08/28/2019 11:18:50 Loaded 363846 QQP train samples
    08/28/2019 11:18:50 Loaded 40430 QQP dev samples
    08/28/2019 11:18:50 Loaded 390965 QQP test samples
    08/28/2019 11:18:50 Loaded 2490 RTE train samples
    08/28/2019 11:18:50 Loaded 277 RTE dev samples
    08/28/2019 11:18:50 Loaded 3000 RTE test samples
    08/28/2019 11:18:50 Loaded 635 WNLI train samples
    08/28/2019 11:18:50 Loaded 71 WNLI dev samples
    08/28/2019 11:18:50 Loaded 146 WNLI test samples
    08/28/2019 11:18:50 Loaded 67349 SST train samples
    08/28/2019 11:18:50 Loaded 872 SST dev samples
    08/28/2019 11:18:50 Loaded 1821 SST test samples
    08/28/2019 11:18:50 Loaded 8551 COLA train samples
    08/28/2019 11:18:50 Loaded 1043 COLA dev samples
    08/28/2019 11:18:50 Loaded 1063 COLA test samples
    08/28/2019 11:18:50 Loaded 5749 STS-B train samples
    08/28/2019 11:18:50 Loaded 1500 STS-B dev samples
    08/28/2019 11:18:50 Loaded 1379 STS-B test samples
    08/28/2019 11:18:50 done with scitail
    08/28/2019 11:18:52 done with snli
    08/28/2019 11:18:54 done with mnli
    08/28/2019 11:18:54 done with mrpc
    08/28/2019 11:18:54 done with qnli
    08/28/2019 11:18:57 done with qqp
    08/28/2019 11:18:57 done with rte
    08/28/2019 11:18:57 done with wnli
    08/28/2019 11:18:57 done with sst
    08/28/2019 11:18:58 done with cola
    08/28/2019 11:18:58 done with stsb
    Traceback (most recent call last):
      File "prepro_std.py", line 7, in <module>
        from pytorch_pretrained_bert.tokenization import BertTokenizer
      File "/home/ec2-user/anaconda3/envs/ms_mtdnn/lib/python3.6/site-packages/pytorch_pretrained_bert/__init__.py", line 7, in <module>
        from .modeling import (BertConfig, BertModel, BertForPreTraining,
      File "/home/ec2-user/anaconda3/envs/ms_mtdnn/lib/python3.6/site-packages/pytorch_pretrained_bert/modeling.py", line 218, in <module>
        from apex.normalization.fused_layer_norm import FusedLayerNorm as BertLayerNorm
      File "/home/ec2-user/anaconda3/envs/ms_mtdnn/lib/python3.6/site-packages/apex/__init__.py", line 18, in <module>
        from apex.interfaces import (ApexImplementation,
      File "/home/ec2-user/anaconda3/envs/ms_mtdnn/lib/python3.6/site-packages/apex/interfaces.py", line 10, in <module>
        class ApexImplementation(object):
      File "/home/ec2-user/anaconda3/envs/ms_mtdnn/lib/python3.6/site-packages/apex/interfaces.py", line 14, in ApexImplementation
        implements(IApex)
      File "/home/ec2-user/anaconda3/envs/ms_mtdnn/lib/python3.6/site-packages/zope/interface/declarations.py", line 483, in implements
        raise TypeError(_ADVICE_ERROR % 'implementer')
    TypeError: Class advice impossible in Python3.  Use the @implementer class decorator instead.
    
    opened by zeeshansayyed 4
  • RuntimeError: CUDA error: device-side assert triggered

    When I train a model with python train.py --init_checkpoint='mt_dnn_models/bert_model_base_chinese.pt', an error occurs: RuntimeError: CUDA error: device-side assert triggered

    However, when I replace this hyper-parameter with, for example, bert_model_base_uncased.pt, it trains normally.

    But that is obviously not what I want: I am training a simple four-class classification task on a Chinese dataset and want to use the bert_model_base_chinese.pt you provided.

    Note that:

    • all models mentioned above were downloaded by download.sh
    • I have checked this issue previously in the PyTorch repo, but it did not help.
    opened by mazicwong 4
  • How to change the code for a specific task?

    I am trying to run task 1. Now, since I removed the data and JSON files related to the other tasks, I am getting an error about a missing file. But I don't want to run all the tasks, as I am only interested in this one specific task. Please help me.

    opened by Rajratnpranesh 4
  • Older version of Pytorch unavailable

    I am quite new to deep learning and coding in general but am keen to make use of this library to explore a possible dissertation topic on MT. I have cloned the repository onto my hard drive but when I try to install with pip I get an error to the effect that pip cannot find version 1.5.0 of pytorch - the earliest available is 1.7.0. Does it need to be 1.5.0 or can I tweak requirements.txt to choose a different version and if so, which version should be used? Many thanks for your support. Rob

    opened by robfuller7 0
  • Can you provide the pretrain files of Hugging Face?

    It seems that it is not convenient to use mt-dnn in other work; it would be more convenient to use mt-dnn with Hugging Face. But I searched the official website and found that no one has published relevant content. Can you provide the corresponding file format? Thank you!

    opened by chaser-girl 0
  • Readme.md is updated?

    First observation: the Python version in the instructions is 3.6, but the transformers version in requirements.txt is 4.20.0, which is only available for Python >= 3.7. Second: I cannot find the scripts directory. I'm looking for the embedding extractor, so I'm assuming it is the /experiments/dump_embedding directory. Is that correct?

    And finally, nice job! You have very interesting results.

    opened by LarissaGuder 0
  • mt-dnn on Windows?

    Hello, I am trying to download and test the repository, but I keep getting an error with docker nvidia runtime. So I am just guessing: is it possible to run the project on Windows? Thank you in advance! Simone

    ErrorResponseFromDaemon

    opened by SimoneGherardi 0
  • question about task split and pretrain model

    Hello! Your work is very effective. I wonder whether the tasks must be separated from the benchmark model before it can be used as a pretrained language model for other tasks? Does the provided .pt file have the tasks split off?

    opened by chaser-girl 0
  • Project dependencies may have API risk issues

    Hi, in mt-dnn, inappropriate dependency versioning constraints can cause risks.

    Below are the dependencies and version constraints that the project is using

    numpy
    torch==1.5.0
    tqdm
    colorlog
    boto3
    pytorch-pretrained-bert==v0.6.0
    regex
    scikit-learn
    pyyaml
    pytest
    sentencepiece
    tensorboardX
    tensorboard
    future
    apex
    seqeval==0.0.12
    transformers==4.6.0
    

    The == version constraint introduces a risk of dependency conflicts because the dependency scope is too strict. The "no upper bound" and * constraints introduce a risk of missing-API errors because the latest versions of the dependencies may remove some APIs.

    After further analysis, in this project the version constraint of dependency numpy can be changed to >=1.8.0,<=1.23.0rc3. The version constraint of dependency tqdm can be changed to >=4.36.0,<=4.64.0. The version constraint of dependency pytorch-pretrained-bert can be changed to >=0.1.1,<=0.3.0. The version constraint of dependency scikit-learn can be changed to >=0.15.0,<=0.20.4. The version constraint of dependency future can be changed to >=0.12.0,<=0.18.2. The version constraint of dependency transformers can be changed to >=2.0.0,<=4.1.1.

    The above modification suggestions can reduce dependency conflicts as much as possible and introduce the latest versions as much as possible without causing call errors in the project.

    The invocation of the current project includes all the following methods.

    The calling methods from the numpy
    numpy.ma.masked_array
    
    The calling methods from the tqdm
    tqdm.tqdm
    
    The calling methods from the pytorch-pretrained-bert
    pytorch_pretrained_bert.modeling.BertLayerNorm
    
    The calling methods from the scikit-learn
    sklearn.metrics.f1_score
    random.choice
    sklearn.metrics.accuracy_score
    sklearn.metrics.matthews_corrcoef
    sklearn.metrics.roc_auc_score
    sklearn.metrics.confusion_matrix
    
    The calling methods from the future
    time.strftime
    datetime.datetime.now
    
    The calling methods from the transformers
    transformers.AutoTokenizer.from_pretrained
    
    The calling methods from the all methods
    plt.clf
    _make_qid_to_has_ans
    grad.abs.add_
    sorted.append
    torch.nn.Tanh
    score.reshape.tolist.numpy
    os.path.isdir
    torch.erf
    logging.StreamHandler.setFormatter
    self._task_def_dic.items
    layer_module
    module.register_forward_pre_hook
    subprocess.call.strip
    iter
    make_qid_to_has_ans.items
    self._prepare_model_input
    rnn_cls
    m.contiguous
    src.items
    set
    data_utils.log_wrapper.create_logger.warning
    numpy.ma.masked_array
    extra_indices.tolist.tolist
    mask.size
    output_dir.os.path.join.open.read
    self.shadow.items
    find_best_thresh
    self.qkv_head_dim.self.num_heads.bsz.src_len.k.contiguous.view.transpose
    find_all_best_thresh
    min
    truncate_seq_pair
    data.get.str.lower
    score.numpy.numpy
    set_config
    self.dropout.transpose
    transformers.get_polynomial_decay_schedule_with_warmup
    key_padding_mask.unsqueeze.unsqueeze
    input_lengths.append
    self.tok2ind.keys
    final_predict.append
    passage.replace.replace
    prefix.format.opt.get.upper
    logging.StreamHandler
    f.readlines
    logging.StreamHandler.setLevel
    _get_raw_scores
    torch.nn.functional.cross_entropy
    line.strip.split
    tqdm.tqdm
    prepare_validation_features
    AttentionWrapper
    range
    format.startswith
    self._tokenizer.vocab.keys
    tokens_b.extend
    scores.data.masked_fill_
    qb.new_dict.cpu.numpy
    rnn_type.upper.upper
    delattr
    score.numpy.tolist
    data_utils.task_def.EncoderModelType
    mlm_p.view.size
    shutil.rmtree
    load_stsb
    self.AdamaxW.super.__init__
    self.kd_task_loss_criterion.append
    self.dropout.bmm
    partial_feature
    data_utils.log_wrapper.create_logger.error
    x.size
    self.SmartPerturbation.super.__init__
    torch.tanh
    self.eff_perturb.update
    copy.deepcopy.item
    rows.append
    key_padding_mask.unsqueeze
    score.reshape.tolist
    self.linear.contiguous
    test_prefix.lower
    self.score_func
    load_qqp
    self.SimpleFlatSim.super.__init__
    self.BilinearSum.super.__init__
    make_qid_to_has_ans.values
    json.dumps
    extract_answers_from_features
    x_flat.self.linear.view
    self.rnn.new
    os.path.exists
    TaskDef
    text.lower
    round
    self.Pooler.super.__init__
    self.decoder
    self.dropout
    copy.deepcopy.extend
    self.DropoutWrapper.super.__init__
    self.y_linear
    logging.FileHandler
    inputs.append
    data_utils.vocab.Vocabulary
    SanPooler
    numpy.random.seed
    x1.contiguous.view
    args.train_datasets.split
    self._global_map.get
    i.multi_task_train_data.len.i.start.datetime.now.str.split
    search_index
    torch.float32.target.detach.F.softmax.sqrt_
    kb.new_dict.cpu.numpy
    mt_dnn.batcher.Collater
    torch.device
    open
    key.transpose.size
    masked_lm_positions.append
    collections.OrderedDict
    torch.max
    torch.sum
    text.split
    random.randint
    self.dropout.contiguous
    torch.unbind
    x1_flat.self.proj_1.self.f.view.bmm
    self.qkv_head_dim.self.num_heads.bsz.src_len.v.contiguous.view.transpose.contiguous
    experiments.superglue.superglue_utils.save
    all_encoder_layers.append
    y.unsqueeze.expand_as.unsqueeze
    self.optimizer.zero_grad
    data_utils.tokenizer_utils.create_tokenizer
    self.encode
    qa.get
    mt_dnn.batcher.DistTaskDataset
    batch_meta.batch_data.detach.cpu.numpy.tolist
    self.classifier
    sklearn.metrics.roc_auc_score
    self.SANBertNetwork.super.__init__
    grad.abs.max
    self.optimizer.load_state_dict
    self.qkv_head_dim.self.num_heads.bsz.tgt_len.q.contiguous.view.transpose
    self.init
    self.setup
    self.qkv_head_dim.self.num_heads.bsz.src_len.k.contiguous.view.transpose.transpose
    self.alpha.data.new
    torch.autograd.grad
    common.activation
    kb.new_dict.cpu
    load_json
    super
    labels.append
    tasks.get_task_obj.train_forward
    super.__init__
    beta2.exp_inf.mul_.unsqueeze
    logits.size.logits.data.new.zero_.torch.diag.byte
    train_datasets.append
    torch.nn.functional.kl_div
    self.f
    self.train_loss.update
    seqeval.metrics.classification_report
    FlatSimV2
    embed.detach.abs.mean
    subprocess.check_output
    scipy.stats.spearmanr
    x.torch.std.expand_as
    numpy.argmax
    make_eval_dict
    mt_dnn.batcher.DistMultiTaskBatchSampler
    score.reshape.tolist.reshape
    mt_dnn.model.MTDNNModel.update
    k.contiguous.view
    token.startswith
    p.transpose
    module.my_optim.weight_norm
    noise.detach.requires_grad_
    self.DotProduct.super.__init__
    mt_dnn.batcher.Collater.patch_data
    load_mrpc
    SimpleFlatSim
    eff_noise.detach
    plt.hist
    adv_logits.view.view
    load_copa_mtdnn
    run_precision_recall_analysis
    trim
    torch.zeros_like
    batch_info.pin_memory.to
    self.DeepAttentionWrapper.super.__init__
    target.contiguous.view.view
    int
    self.parameters
    self.qkv_head_dim.self.num_heads.bsz.src_len.k.contiguous.view.transpose.contiguous
    torch.manual_seed
    merge_eval
    x2_flat.self.proj_2.self.f.view.transpose
    scores.items
    shutil.copytree
    plot_pr_curve
    self.__str__
    grad.sign
    json.loads.get
    time.strftime
    BilinearFlatSim
    compute_f1
    reduce_features_to_examples.append
    closure
    end.numpy.tolist.contiguous
    data_utils.load_data
    self.bert
    create_bins
    self.SANClassifier.super.__init__
    get_tokens
    logging.FileHandler.setLevel
    self._tokenizer.convert_tokens_to_ids.get
    y.x.torch.abs.y.x.y.x.torch.cat.contiguous
    x.size.idx.mask.view.expand_as
    self.rnn.size
    embedding_weights.size
    torch.nn.functional.mse_loss
    uid.split
    torch.LongTensor
    sample.get.strip
    os.getcwd
    self._rnn
    att_scores.F.softmax.unsqueeze
    temp_answers.append
    test_prepro_std
    self.MLPSelfAttn.super.__init__
    self.FlatSimV2.super.__init__
    torch.nn.ModuleList
    main
    load_data
    os.mkdir
    self.compute_weight
    self._setup_lossmap
    setattr
    x1.size.x1.size.x1.contiguous.view.self.x_dot_linear.view.expand_as.bmm
    plt.savefig
    epsilon.epsilon.y.detach
    module.similarity.SelfAttnWrapper
    torch.distributed.all_reduce
    ground_truth.normalize_answer.split
    batch_meta.batch_data.size
    self.mnetwork.bert
    self.query_wsum
    prediction.normalize_answer.split
    dim.p.transpose._norm.transpose
    self._norm_grad
    torch.nn.Linear
    start_scores.squeeze.squeeze
    adv_lc
    model_config
    remove_punc
    self.config.update
    torch.isnan
    batch_meta.batch_data.detach
    para.strip
    tensorboardX.SummaryWriter.close
    data_utils.roberta_utils.update_roberta_keys
    sub_part.pin_memory.to
    start.data.cpu
    self.x_dot_linear
    NotImplementedError
    self.attn
    set.add
    sequence_outputs.append
    slen.idx.yidx.embeddings.tolist
    self.load
    model.to.to
    exp_avg.mul_
    self.task_types.append
    self.encoder
    numpy.array
    task_id.self.scoring_list.split
    self.qkv_head_dim.self.num_heads.bsz.tgt_len.q.contiguous.view.transpose.contiguous
    task_id.self.scoring_list
    y.x.torch.abs.y.x.y.x.torch.cat.contiguous.view
    y.view.view
    module.common.activation
    self.score_func.size
    experiments.common_utils.dump_rows
    self.emb_val.update
    value.transpose.transpose
    list.extend
    tasks.get_task_obj.train_build_task_layer
    start.append
    vars.update
    transformers.get_cosine_schedule_with_warmup
    train_data_list.append
    transformers.get_linear_schedule_with_warmup
    predict.reshape.tolist
    dataset.get_task_id
    self._tokenizer.convert_tokens_to_ids
    end.append
    self.WeightNorm.super.__init__
    self.linear
    idx.lines.strip
    torch.mean
    argparse.ArgumentParser.print_help
    mt_dnn.optim.AdamaxW
    score.np.argmax.reshape
    qb.new_dict.cpu
    self._dataset.get_task_id
    plt.xlim
    line.strip
    self.network.eval
    dump_data
    train_config.parse_args
    vw.new_dict.cpu
    end.numpy.tolist.numpy
    load
    alpha.unsqueeze.bmm
    collections.defaultdict
    self.num_hid.size.size.tmp_output.view.max
    dropout_layer
    question.strip
    self._get_param_groups
    bool
    newbatch.append
    s.normalize_answer.split
    out_f.write
    self.qkv_head_dim.self.num_heads.bsz.src_len.v.contiguous.view.transpose
    input_ids.index
    normalize_answer
    any
    key_padding_mask.size
    torch.nn.utils.weight_norm.unsqueeze
    machines.append
    tasks.get_task_obj.input_parse_label
    stable_kl
    pred.pop
    adv_loss.item
    tensor.to
    delta_grad.norm
    torch.nn.utils.weight_norm
    issubclass
    self.FlatSimilarityWrapper.super.__init__
    e.to
    start.numpy.tolist.numpy
    score.data.cpu
    max
    x.att_scores.F.softmax.unsqueeze.torch.bmm.squeeze
    tasks.get_task_obj
    torch.nn.parallel.DistributedDataParallel
    self.config.get.backward
    self.Bilinear.super.__init__
    task_id.self.task_loss_criterion
    logging.FileHandler.setFormatter
    float
    experiments.exp_def.TaskDefs.get_task_names
    load_multirc_mtdnn
    len.get
    result.strip.split
    model.to.parameters
    _norm
    self.dropout.size
    torch.norm
    hasattr
    torch.diag
    temp_1.pop
    self.LinearSelfAttn.super.__init__
    data_utils.mrc_eval.squadv2_evaluate_func
    TypeError
    self.scalar.expand_as
    initialize_distributed
    task_cls
    LinearSelfAttn
    sorted
    start.numpy.tolist.contiguous
    torch.utils.data.DataLoader
    model_class.from_pretrained
    state.items
    enumerate
    sys.exit
    self.bert.generate
    self._setup_kd_lossmap
    self._gen_task_indices
    plt.title
    data_utils.mrc_eval.squadv1_evaluate_func
    weight.cpu
    join
    os.path.abspath
    numpy.max
    tasks.get_task_obj.test_predict
    target.size.target.view.float
    pdb.set_trace
    key.transpose.transpose
    torch.cat
    random.uniform
    literal_model_type.lower.lower
    exp_inf.new
    get_raw_scores
    self.adv_task_loss_criterion.append
    epsilon.epsilon.p.detach.log
    emb_val.item
    features.append
    x.torch.mean.expand_as
    new_data.zero_.size
    qw.new_dict.cpu
    load_mlm_data
    tok_len.batch_size.torch.BoolTensor.fill_
    self.proj_1
    torch.abs
    json.loads.split
    dump
    q.contiguous.view
    eff_noise.detach.abs
    metric_fn
    self.network.cuda
    batch_meta.get
    search_bin
    bais.cpu.numpy
    type_ids_list.append
    make_precision_recall_eval
    create_masked_lm_predictions
    attn_weights.data.masked_fill_
    logging.Formatter
    y.x.torch.cat.contiguous.view
    self.proj
    ValueError
    logits.size.logits.data.new.zero_.torch.diag.byte.unsqueeze.expand_as
    i.tgt.extend
    query.replace
    time.gmtime
    load_mnli
    histogram_na_prob
    collections.Counter
    logit.size.logit.view.float.size
    generate_mask
    print_message
    local_task_idx.self._datasets.get_task_id
    torch.load.size
    part.pin_memory
    positions.append
    self.activation
    torch.pow
    assert_file_equal
    tasks.get_task_obj.train_prepare_label
    attn_weights.size.attn_weights.data.new.zero_.torch.diag.byte
    torch.isinf
    x1.size.x1.contiguous.view.self.x_dot_linear.view
    numpy.array_equal
    torch.nn.utils.clip_grad_norm_
    grounds.items
    Vocabulary
    attn_weights.float.masked_fill
    self.embed_encode
    type
    cls
    mt_dnn.model.MTDNNModel.load
    mt_dnn.model.MTDNNModel.save
    self.__if_pair__
    task.split
    format
    module._parameters.keys
    merge
    logits.detach
    logit.F.log_softmax.exp
    model.detach
    subprocess.call
    dropout_wrapper.DropoutWrapper
    eval
    config_class.from_dict
    isinstance
    scaled_loss.backward
    datetime.datetime.now
    self.dropout.view
    self._get_max_len
    kd_lc
    args.test_datasets.split
    dataset.items
    input_ids_list.append
    logits.size.logits.data.new.zero_.torch.diag.byte.unsqueeze
    tokenizer.vocab.keys
    load_record_mtdnn
    evaluate
    logging.getLogger.addHandler
    beta1.exp_avg.mul_.add_
    load_sst
    batch_info.pin_memory
    locals
    qw.new_dict.cpu.numpy
    new_batch.append
    state.keys
    batch_data.append
    expected_file.open.read
    end_position.append
    laod_function
    FlatSim
    attn_weights.data.new
    torch.nn.functional.log_softmax
    i.self.attn_list
    self.embeddings
    process_data
    plt.xlabel
    load_xnli
    mt_dnn.model.MTDNNModel.cuda
    mt_dnn.batcher.MultiTaskBatchSampler
    scores.extend
    map
    score.contiguous.view
    self.rnn
    self.proj_2
    vars
    data_utils.load_score_file.keys
    temp_1.append
    self._get_index_batches
    mt_dnn.model.MTDNNModel
    y.x.y.x.torch.cat.contiguous.view
    data_utils.metrics.calc_metrics
    MultiheadAttentionWrapper
    feature.example_id_to_index.features_per_example.append
    para.clone
    self.beta.expand_as
    opt.EncoderModelType.name.lower
    scores_for_ground_truths.append
    sentence.append
    self.pooler
    self.network.train
    numpy.zeros
    label.append
    scores_list.append
    os.getenv
    copy.deepcopy
    tokens.append
    p.norm
    numpy.exp
    loss.stable_kl
    self._dropout_p_map.get
    self.AttentionWrapper.super.__init__
    json.loads
    new_sequence_outputs.append
    parse_args
    re.compile
    transformers.AutoTokenizer.from_pretrained.tokenize
    prefix.format.opt.get.lower
    x_flat.self.FC.self.f.self.linear.view
    plt.ylabel
    docs.append
    torch.float32.input.detach.F.softmax.sqrt_
    ry.rp.p.sum
    data_config
    list
    v.contiguous.view
    pytorch_pretrained_bert.modeling.BertEmbeddings
    idx.mask.view
    apply_no_ans_threshold
    self.adv_teacher.forward
    module.dropout_wrapper.DropoutWrapper
    v.cpu
    config.get
    dataset.split
    idx.all_encoder_layers.detach
    train_config.add_argument
    self.FC
    sample_id_2_label_dic.keys
    weight.reshape
    proj.unsqueeze.x.bmm.squeeze
    self.SelfAttnWrapper.super.__init__
    rng.randint
    data_utils.load_score_file
    numpy.exp.sum
    self.dropout_list.append
    data_utils.log_wrapper.create_logger
    torch.cuda.set_device
    dev_data_list.append
    query.transpose.size
    mt_dnn.inference.eval_model
    target.contiguous.view
    h0.size.h0.new.zero_
    locals.items
    TASK_REGISTRY.get
    plt.step
    n_best_size.start_logits.np.argsort.tolist
    Vocabulary.add
    attn.transpose.transpose
    tokens.extend
    temp_2.pop
    experiments.ner.ner_utils.load_conll_pos
    self.Trilinear.super.__init__
    logits.data.new
    numpy.ones_like
    p.nelement
    self.network.named_parameters
    pool.map
    numpy.power
    self.network.parameters
    rvl.append
    random.sample
    self.attn_list.append
    compare_output
    self.FlatSim.super.__init__
    self.network.encode
    torch.bernoulli
    assert_dir_equal
    torch.save
    json.dump
    self.SimilarityWrapper.super.__init__
    y_pred.append
    pooled_output.contiguous.view
    torch.distributed.init_process_group
    task_def.get
    args.task.lower
    embed.data.new
    vw.new_dict.cpu.numpy
    xWy.data.masked_fill_
    self.DotProductProject.super.__init__
    next
    batch_meta.batch_data.detach.cpu
    MaskedLmInstance
    self.num_hid.size.size.tmp_output.view.max.view
    Classifier
    x2.x1.abs
    score.np.argmax.tolist
    predict.strip.reshape
    self.reset
    query.transpose.transpose
    DotProductProject
    self._setup_adv_lossmap
    vb.new_dict.cpu
    generate_noise
    torch.cuda.device_count
    experiments.mlm.mlm_utils.load_loose_json.append
    batch_meta.batch_data.cuda
    torch.optim.SGD
    model.state_dict.items
    logit.size.logit.view.float
    self.x_linear
    rng.shuffle
    task_id.self.dropout_list.size
    sum
    tensorboardX.SummaryWriter
    os.access
    sys.path.append
    encoding_0.size.input_length.torch.LongTensor.fill_
    x.size.x.size.x.data.new.zero_
    sigmoid
    uid.gold_map.append
    x2_flat.self.proj_2.self.f.view
    rng.random
    golds.append
    weight.cpu.numpy
    tokenizer
    argparse.ArgumentParser
    WeightNorm
    p.size.p.contiguous.view.norm
    task_def_list.append
    self.qkv_head_dim.self.num_heads.bsz.src_len.k.contiguous.view.transpose.size
    experiments.exp_def.TaskDefs
    self._setup_adv_training
    line.strip.startswith
    self.optimizer.step
    self.adv_loss.update
    logit.size.logit.view.float.view
    args.model.split
    TrainingInstance
    torch.ones
    config_class.from_pretrained
    flat_scores.contiguous.view
    collections.namedtuple
    torch.log
    scipy.stats.pearsonr
    logging.getLogger.info
    grounds.append
    x1.size.x1.size.x1.contiguous.view.self.x_dot_linear.view.expand_as
    lang_map.items
    tasks.get_task_obj.input_is_valid_sample
    attn.transpose.contiguous
    self.att
    torch.optim.AdamW
    mt_dnn.perturbation.SmartPerturbation
    doc.split
    data_utils.tokenizer_utils.create_tokenizer._convert_token_to_id
    tokenizer.append
    embed.size.embed.data.new.normal_
    torch.load
    numpy.concatenate
    p.dim
    torch.nn.functional.softmax