Source Code for DialogBERT: Discourse-Aware Response Generation via Learning to Recover and Rank Utterances (https://arxiv.org/pdf/2012.01775.pdf)

Overview

DialogBERT

This is a PyTorch implementation of the DialogBERT model described in the AAAI 2021 paper DialogBERT: Discourse-Aware Response Generation via Learning to Recover and Rank Utterances.


Prerequisites

  • Python 3.6
  • PyTorch

Install the packages listed in the requirements.txt file.
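
For example, with pip:

      pip install -r requirements.txt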

Usage

  • Run the model with:
      python main.py
    

Logs and intermediate results are printed to stdout and saved under ./output.
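
Command-line options such as --model_size and --per_gpu_train_batch_size appear in the issue reports below; a typical invocation (options taken from those reports, not an exhaustive list) would be:

      python main.py --model_size=tiny --per_gpu_train_batch_size=24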

References

If you use any source code included in this toolkit in your work, please cite the following paper:

@inproceedings{gu2021dialogbert,
      title={Dialog{BERT}: Discourse-Aware Response Generation via Learning to Recover and Rank Utterances},
      author={Gu, Xiaodong and Yoo, Kang Min and Ha, Jung-Woo},
      booktitle={Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI 2021)},
      year={2021}
}
Comments
  • Question about model parameter size

    I am interested in implementing gradient checkpointing to support DialogBERT-XL training. What would the level of effort be to modify DialogBERT to support a parameter count equivalent to GPT2-XL?

    Thanks in advance!

    opened by pablogranolabar 7
  • Could you please share the script for preprocessing the original dialogues?

    Hi, I noticed that the code was updated 15 days ago.

    I would like to use this model on a brand new dialogue dataset. I noticed that the data/ directory contains .h5 files such as dailydialog/train.h5. I have also downloaded the original DailyDialog dataset, but I do not know how to convert it into train.h5.

    Could you please share the related script or source code? Thank you very much.

    opened by frankdarkluo 3
  • Issue with V100 Distributed Training

    I have the following distributed training setup working without issue on a Tesla K80, but whenever I attempt it on an 8x V100 machine, the training process silently hangs without dispatching any work to any of the GPUs:

    export MASTER_PORT=29500
    export MASTER_ADDR="127.0.0.1"
    export WORLD_SIZE=8
    export RANK=0
    python3 main.py --model_size=large --per_gpu_train_batch_size=128 --local_rank 0
    

    What's weird is that training works fine on a single GPU if I drop the --local_rank flag (a launcher-based sketch follows this issue). While the process is hanging, nothing is dispatched to any of the GPUs:

    $ sudo nvidia-smi
    Sat May 15 21:53:13 2021       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.119.04   Driver Version: 450.119.04   CUDA Version: 11.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  A100-SXM4-40GB      On   | 00000000:10:1C.0 Off |                    0 |
    | N/A   52C    P0    60W / 400W |      3MiB / 40537MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   1  A100-SXM4-40GB      On   | 00000000:10:1D.0 Off |                    0 |
    | N/A   47C    P0    56W / 400W |      3MiB / 40537MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   2  A100-SXM4-40GB      On   | 00000000:20:1C.0 Off |                    0 |
    | N/A   49C    P0    56W / 400W |      3MiB / 40537MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   3  A100-SXM4-40GB      On   | 00000000:20:1D.0 Off |                    0 |
    | N/A   45C    P0    55W / 400W |      3MiB / 40537MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   4  A100-SXM4-40GB      On   | 00000000:90:1C.0 Off |                    0 |
    | N/A   50C    P0    56W / 400W |      3MiB / 40537MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   5  A100-SXM4-40GB      On   | 00000000:90:1D.0 Off |                    0 |
    | N/A   45C    P0    56W / 400W |      3MiB / 40537MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   6  A100-SXM4-40GB      On   | 00000000:A0:1C.0 Off |                    0 |
    | N/A   52C    P0    60W / 400W |      3MiB / 40537MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   7  A100-SXM4-40GB      On   | 00000000:A0:1D.0 Off |                    0 |
    | N/A   48C    P0    57W / 400W |      3MiB / 40537MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
                                                                                   
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
    

    Any ideas?

    opened by pablogranolabar 3
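
    A hedged note on the setup above: with WORLD_SIZE=8 but only a single process launched (RANK=0), torch.distributed initialization will typically block while it waits for the other seven ranks to join, which would match the silent hang and idle GPUs. One conventional way to spawn one process per GPU is the PyTorch launcher, which passes --local_rank to each process (a sketch only, assuming main.py initializes its process group from that flag):

    python3 -m torch.distributed.launch --nproc_per_node=8 main.py \
        --model_size=large --per_gpu_train_batch_size=128
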
  • Could you please share your parsed data or codes for preprocessing?

    I noticed that main.py loads ./data/dailydialog/train.h5. I have also downloaded the original DailyDialog dataset, but I have no idea how to convert it into train.h5.

    Could you please help me with this?

    opened by eefaan 3
  • DataLoader Function

    Hi Xiaodong,

    Thanks for sharing the source code.

    I have a question regarding the data_loader function. Is there any reason to create mini-batches by adding the following inputs?

    self.cls_utt = [tokenizer.cls_token_id, tokenizer.cls_token_id, tokenizer.sep_token_id]
    self.sep_utt = [tokenizer.cls_token_id, tokenizer.sep_token_id, tokenizer.sep_token_id]

    The resulting output would be like: [[101, 101, 102], [contexts], [101, 102, 102]].

    Best,

    Dong

    opened by dongqian0206 2
  • Data processing

    Thanks for your great work! I'd like to apply your method and model to a brand new dataset, but I have no idea how to preprocess our dataset into the required format. Could you release the data preprocessing script? It would be of great help!

    opened by TingchenFu 1
  • Model not converging?

    Using the standard main.py training loop, I've been training the tiny model on a V100 for almost a week without it stopping. Is additional hyperparameter tuning needed even to run the tiny training process?

    python3 main.py --model_size=tiny --per_gpu_train_batch_size=24
    
    avg_len = 12.61646884272997
    bleu = 0.03122757749152926
    meteor = 0.039703799201764936
    nist = 0.12024726693793758
    perplexity = 116.93566131591797
    rouge-L = 0.05778559382996833
    valid_loss = 4.761623978844736
    

    Can you share what your final numbers were after training tiny and small?

    opened by pablogranolabar 1
  • Difficulty replicating results of the paper

    I am training on the DailyDialog dataset with the same hyperparameters as described in the paper, but I cannot get the model to perform to the standard described there; specifically, the BLEU score on the test data is half the reported value. In addition, the generated text for the test set shows that the model produces responses that have little to do with the actual context. Are there any solutions to this?

    opened by anthonycou 3
  • Reproducing results from the paper and hyperparameters

    Hi,

    I'm trying to reproduce the results you reported in the paper and am unable to do so with the current set of hyperparameters. One notable problem is per_gpu_eval_batch_size=1: keeping it as is makes evaluation take a long time, but when I set it to a value > 1, the code breaks. I figured that might have something to do with the generate method of the DialogBERT class. Here, for example:

    generated = torch.zeros((num_samples,1), dtype=torch.long, device=device).fill_(self.tokenizer.cls_token_id) # [batch_sz x 1] (1=seq_len)

    Is num_samples used as batch_sz here? I'm wondering if this is intended or a typo, because when I change num_samples to batch_sz for the generated tokens, the code works (a small sketch of that change follows this issue). However, the generated text doesn't seem to match the context it was generated from.

    Could you please share the hyperparameters you used and help solve the per_gpu_eval_batch_size=1 problem?

    Thanks

    opened by paul-ruban 5
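
    A minimal, self-contained sketch of the change described above: size the buffer of generated tokens by the batch size of the context rather than a fixed num_samples. The token id 101 mirrors the [CLS] id shown in the DataLoader issue; everything else here is a stand-in, not the repository's code.

    import torch

    cls_token_id = 101                          # [CLS] id, as in the DataLoader issue above
    context = torch.randint(0, 30522, (4, 40))  # stand-in context batch: [batch_sz x seq_len]
    batch_sz = context.size(0)

    # allocate the generated-token buffer per example in the batch, not per num_samples
    generated = torch.full((batch_sz, 1), cls_token_id, dtype=torch.long)  # [batch_sz x 1]
    print(generated.shape)                      # torch.Size([4, 1])
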
  • can't load pretrained model

    self.context_mlm_trans and self.context_order_trans expect a different key structure:

    RuntimeError: Error(s) in loading state_dict for BertPredictionHeadTransform: Missing key(s) in state_dict: "dense.weight", "dense.bias", "LayerNorm.weight", "LayerNorm.bias". Unexpected key(s) in state_dict: "utt_encoder.bert.embeddings.position_ids", "utt_encoder.bert.embeddings.word_embeddings.weight", "utt_encoder.bert.embeddings.position_embeddings.weight", "utt_encoder.bert.embeddings.token_type_embeddings.weight", "utt_encoder.bert.embeddings.LayerNorm.weight", "utt_encoder.bert.embeddings.LayerNorm.bias", "utt_encoder.bert.encoder.layer.0.attention.self.query.weight", "utt_encoder.bert.encoder.layer.0.attention.self.query.bias", "utt_encoder.bert.encoder.layer.0.attention.self.key.weight", "utt_encoder.bert.encoder.layer.0.attention.self.key.bias", "utt_encoder.bert.encoder.layer.0.attention.self.value.weight", "utt_encoder.bert.encoder.layer.0.attention.self.value.bias", "utt_encoder.bert.encoder.layer.0.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.0.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.0.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.0.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.0.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.0.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.0.output.dense.weight", "utt_encoder.bert.encoder.layer.0.output.dense.bias", "utt_encoder.bert.encoder.layer.0.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.0.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.1.attention.self.query.weight", "utt_encoder.bert.encoder.layer.1.attention.self.query.bias", "utt_encoder.bert.encoder.layer.1.attention.self.key.weight", "utt_encoder.bert.encoder.layer.1.attention.self.key.bias", "utt_encoder.bert.encoder.layer.1.attention.self.value.weight", "utt_encoder.bert.encoder.layer.1.attention.self.value.bias", "utt_encoder.bert.encoder.layer.1.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.1.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.1.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.1.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.1.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.1.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.1.output.dense.weight", "utt_encoder.bert.encoder.layer.1.output.dense.bias", "utt_encoder.bert.encoder.layer.1.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.1.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.2.attention.self.query.weight", "utt_encoder.bert.encoder.layer.2.attention.self.query.bias", "utt_encoder.bert.encoder.layer.2.attention.self.key.weight", "utt_encoder.bert.encoder.layer.2.attention.self.key.bias", "utt_encoder.bert.encoder.layer.2.attention.self.value.weight", "utt_encoder.bert.encoder.layer.2.attention.self.value.bias", "utt_encoder.bert.encoder.layer.2.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.2.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.2.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.2.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.2.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.2.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.2.output.dense.weight", "utt_encoder.bert.encoder.layer.2.output.dense.bias", "utt_encoder.bert.encoder.layer.2.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.2.output.LayerNorm.bias", 
"utt_encoder.bert.encoder.layer.3.attention.self.query.weight", "utt_encoder.bert.encoder.layer.3.attention.self.query.bias", "utt_encoder.bert.encoder.layer.3.attention.self.key.weight", "utt_encoder.bert.encoder.layer.3.attention.self.key.bias", "utt_encoder.bert.encoder.layer.3.attention.self.value.weight", "utt_encoder.bert.encoder.layer.3.attention.self.value.bias", "utt_encoder.bert.encoder.layer.3.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.3.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.3.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.3.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.3.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.3.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.3.output.dense.weight", "utt_encoder.bert.encoder.layer.3.output.dense.bias", "utt_encoder.bert.encoder.layer.3.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.3.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.4.attention.self.query.weight", "utt_encoder.bert.encoder.layer.4.attention.self.query.bias", "utt_encoder.bert.encoder.layer.4.attention.self.key.weight", "utt_encoder.bert.encoder.layer.4.attention.self.key.bias", "utt_encoder.bert.encoder.layer.4.attention.self.value.weight", "utt_encoder.bert.encoder.layer.4.attention.self.value.bias", "utt_encoder.bert.encoder.layer.4.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.4.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.4.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.4.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.4.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.4.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.4.output.dense.weight", "utt_encoder.bert.encoder.layer.4.output.dense.bias", "utt_encoder.bert.encoder.layer.4.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.4.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.5.attention.self.query.weight", "utt_encoder.bert.encoder.layer.5.attention.self.query.bias", "utt_encoder.bert.encoder.layer.5.attention.self.key.weight", "utt_encoder.bert.encoder.layer.5.attention.self.key.bias", "utt_encoder.bert.encoder.layer.5.attention.self.value.weight", "utt_encoder.bert.encoder.layer.5.attention.self.value.bias", "utt_encoder.bert.encoder.layer.5.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.5.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.5.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.5.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.5.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.5.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.5.output.dense.weight", "utt_encoder.bert.encoder.layer.5.output.dense.bias", "utt_encoder.bert.encoder.layer.5.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.5.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.6.attention.self.query.weight", "utt_encoder.bert.encoder.layer.6.attention.self.query.bias", "utt_encoder.bert.encoder.layer.6.attention.self.key.weight", "utt_encoder.bert.encoder.layer.6.attention.self.key.bias", "utt_encoder.bert.encoder.layer.6.attention.self.value.weight", "utt_encoder.bert.encoder.layer.6.attention.self.value.bias", "utt_encoder.bert.encoder.layer.6.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.6.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.6.attention.output.LayerNorm.weight", 
"utt_encoder.bert.encoder.layer.6.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.6.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.6.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.6.output.dense.weight", "utt_encoder.bert.encoder.layer.6.output.dense.bias", "utt_encoder.bert.encoder.layer.6.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.6.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.7.attention.self.query.weight", "utt_encoder.bert.encoder.layer.7.attention.self.query.bias", "utt_encoder.bert.encoder.layer.7.attention.self.key.weight", "utt_encoder.bert.encoder.layer.7.attention.self.key.bias", "utt_encoder.bert.encoder.layer.7.attention.self.value.weight", "utt_encoder.bert.encoder.layer.7.attention.self.value.bias", "utt_encoder.bert.encoder.layer.7.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.7.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.7.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.7.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.7.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.7.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.7.output.dense.weight", "utt_encoder.bert.encoder.layer.7.output.dense.bias", "utt_encoder.bert.encoder.layer.7.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.7.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.8.attention.self.query.weight", "utt_encoder.bert.encoder.layer.8.attention.self.query.bias", "utt_encoder.bert.encoder.layer.8.attention.self.key.weight", "utt_encoder.bert.encoder.layer.8.attention.self.key.bias", "utt_encoder.bert.encoder.layer.8.attention.self.value.weight", "utt_encoder.bert.encoder.layer.8.attention.self.value.bias", "utt_encoder.bert.encoder.layer.8.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.8.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.8.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.8.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.8.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.8.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.8.output.dense.weight", "utt_encoder.bert.encoder.layer.8.output.dense.bias", "utt_encoder.bert.encoder.layer.8.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.8.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.9.attention.self.query.weight", "utt_encoder.bert.encoder.layer.9.attention.self.query.bias", "utt_encoder.bert.encoder.layer.9.attention.self.key.weight", "utt_encoder.bert.encoder.layer.9.attention.self.key.bias", "utt_encoder.bert.encoder.layer.9.attention.self.value.weight", "utt_encoder.bert.encoder.layer.9.attention.self.value.bias", "utt_encoder.bert.encoder.layer.9.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.9.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.9.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.9.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.9.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.9.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.9.output.dense.weight", "utt_encoder.bert.encoder.layer.9.output.dense.bias", "utt_encoder.bert.encoder.layer.9.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.9.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.10.attention.self.query.weight", "utt_encoder.bert.encoder.layer.10.attention.self.query.bias", 
"utt_encoder.bert.encoder.layer.10.attention.self.key.weight", "utt_encoder.bert.encoder.layer.10.attention.self.key.bias", "utt_encoder.bert.encoder.layer.10.attention.self.value.weight", "utt_encoder.bert.encoder.layer.10.attention.self.value.bias", "utt_encoder.bert.encoder.layer.10.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.10.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.10.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.10.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.10.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.10.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.10.output.dense.weight", "utt_encoder.bert.encoder.layer.10.output.dense.bias", "utt_encoder.bert.encoder.layer.10.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.10.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.11.attention.self.query.weight", "utt_encoder.bert.encoder.layer.11.attention.self.query.bias", "utt_encoder.bert.encoder.layer.11.attention.self.key.weight", "utt_encoder.bert.encoder.layer.11.attention.self.key.bias", "utt_encoder.bert.encoder.layer.11.attention.self.value.weight", "utt_encoder.bert.encoder.layer.11.attention.self.value.bias", "utt_encoder.bert.encoder.layer.11.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.11.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.11.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.11.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.11.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.11.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.11.output.dense.weight", "utt_encoder.bert.encoder.layer.11.output.dense.bias", "utt_encoder.bert.encoder.layer.11.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.11.output.LayerNorm.bias", "utt_encoder.bert.pooler.dense.weight", "utt_encoder.bert.pooler.dense.bias", "utt_encoder.cls.predictions.bias", "utt_encoder.cls.predictions.transform.dense.weight", "utt_encoder.cls.predictions.transform.dense.bias", "utt_encoder.cls.predictions.transform.LayerNorm.weight", "utt_encoder.cls.predictions.transform.LayerNorm.bias", "utt_encoder.cls.predictions.decoder.weight", "utt_encoder.cls.predictions.decoder.bias", "utt_encoder.cls.seq_relationship.weight", "utt_encoder.cls.seq_relationship.bias", "context_encoder.embeddings.position_ids", "context_encoder.embeddings.word_embeddings.weight", "context_encoder.embeddings.position_embeddings.weight", "context_encoder.embeddings.token_type_embeddings.weight", "context_encoder.embeddings.LayerNorm.weight", "context_encoder.embeddings.LayerNorm.bias", "context_encoder.encoder.layer.0.attention.self.query.weight", "context_encoder.encoder.layer.0.attention.self.query.bias", "context_encoder.encoder.layer.0.attention.self.key.weight", "context_encoder.encoder.layer.0.attention.self.key.bias", "context_encoder.encoder.layer.0.attention.self.value.weight", "context_encoder.encoder.layer.0.attention.self.value.bias", "context_encoder.encoder.layer.0.attention.output.dense.weight", "context_encoder.encoder.layer.0.attention.output.dense.bias", "context_encoder.encoder.layer.0.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.0.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.0.intermediate.dense.weight", "context_encoder.encoder.layer.0.intermediate.dense.bias", "context_encoder.encoder.layer.0.output.dense.weight", 
"context_encoder.encoder.layer.0.output.dense.bias", "context_encoder.encoder.layer.0.output.LayerNorm.weight", "context_encoder.encoder.layer.0.output.LayerNorm.bias", "context_encoder.encoder.layer.1.attention.self.query.weight", "context_encoder.encoder.layer.1.attention.self.query.bias", "context_encoder.encoder.layer.1.attention.self.key.weight", "context_encoder.encoder.layer.1.attention.self.key.bias", "context_encoder.encoder.layer.1.attention.self.value.weight", "context_encoder.encoder.layer.1.attention.self.value.bias", "context_encoder.encoder.layer.1.attention.output.dense.weight", "context_encoder.encoder.layer.1.attention.output.dense.bias", "context_encoder.encoder.layer.1.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.1.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.1.intermediate.dense.weight", "context_encoder.encoder.layer.1.intermediate.dense.bias", "context_encoder.encoder.layer.1.output.dense.weight", "context_encoder.encoder.layer.1.output.dense.bias", "context_encoder.encoder.layer.1.output.LayerNorm.weight", "context_encoder.encoder.layer.1.output.LayerNorm.bias", "context_encoder.encoder.layer.2.attention.self.query.weight", "context_encoder.encoder.layer.2.attention.self.query.bias", "context_encoder.encoder.layer.2.attention.self.key.weight", "context_encoder.encoder.layer.2.attention.self.key.bias", "context_encoder.encoder.layer.2.attention.self.value.weight", "context_encoder.encoder.layer.2.attention.self.value.bias", "context_encoder.encoder.layer.2.attention.output.dense.weight", "context_encoder.encoder.layer.2.attention.output.dense.bias", "context_encoder.encoder.layer.2.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.2.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.2.intermediate.dense.weight", "context_encoder.encoder.layer.2.intermediate.dense.bias", "context_encoder.encoder.layer.2.output.dense.weight", "context_encoder.encoder.layer.2.output.dense.bias", "context_encoder.encoder.layer.2.output.LayerNorm.weight", "context_encoder.encoder.layer.2.output.LayerNorm.bias", "context_encoder.encoder.layer.3.attention.self.query.weight", "context_encoder.encoder.layer.3.attention.self.query.bias", "context_encoder.encoder.layer.3.attention.self.key.weight", "context_encoder.encoder.layer.3.attention.self.key.bias", "context_encoder.encoder.layer.3.attention.self.value.weight", "context_encoder.encoder.layer.3.attention.self.value.bias", "context_encoder.encoder.layer.3.attention.output.dense.weight", "context_encoder.encoder.layer.3.attention.output.dense.bias", "context_encoder.encoder.layer.3.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.3.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.3.intermediate.dense.weight", "context_encoder.encoder.layer.3.intermediate.dense.bias", "context_encoder.encoder.layer.3.output.dense.weight", "context_encoder.encoder.layer.3.output.dense.bias", "context_encoder.encoder.layer.3.output.LayerNorm.weight", "context_encoder.encoder.layer.3.output.LayerNorm.bias", "context_encoder.encoder.layer.4.attention.self.query.weight", "context_encoder.encoder.layer.4.attention.self.query.bias", "context_encoder.encoder.layer.4.attention.self.key.weight", "context_encoder.encoder.layer.4.attention.self.key.bias", "context_encoder.encoder.layer.4.attention.self.value.weight", "context_encoder.encoder.layer.4.attention.self.value.bias", "context_encoder.encoder.layer.4.attention.output.dense.weight", 
"context_encoder.encoder.layer.4.attention.output.dense.bias", "context_encoder.encoder.layer.4.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.4.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.4.intermediate.dense.weight", "context_encoder.encoder.layer.4.intermediate.dense.bias", "context_encoder.encoder.layer.4.output.dense.weight", "context_encoder.encoder.layer.4.output.dense.bias", "context_encoder.encoder.layer.4.output.LayerNorm.weight", "context_encoder.encoder.layer.4.output.LayerNorm.bias", "context_encoder.encoder.layer.5.attention.self.query.weight", "context_encoder.encoder.layer.5.attention.self.query.bias", "context_encoder.encoder.layer.5.attention.self.key.weight", "context_encoder.encoder.layer.5.attention.self.key.bias", "context_encoder.encoder.layer.5.attention.self.value.weight", "context_encoder.encoder.layer.5.attention.self.value.bias", "context_encoder.encoder.layer.5.attention.output.dense.weight", "context_encoder.encoder.layer.5.attention.output.dense.bias", "context_encoder.encoder.layer.5.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.5.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.5.intermediate.dense.weight", "context_encoder.encoder.layer.5.intermediate.dense.bias", "context_encoder.encoder.layer.5.output.dense.weight", "context_encoder.encoder.layer.5.output.dense.bias", "context_encoder.encoder.layer.5.output.LayerNorm.weight", "context_encoder.encoder.layer.5.output.LayerNorm.bias", "context_encoder.encoder.layer.6.attention.self.query.weight", "context_encoder.encoder.layer.6.attention.self.query.bias", "context_encoder.encoder.layer.6.attention.self.key.weight", "context_encoder.encoder.layer.6.attention.self.key.bias", "context_encoder.encoder.layer.6.attention.self.value.weight", "context_encoder.encoder.layer.6.attention.self.value.bias", "context_encoder.encoder.layer.6.attention.output.dense.weight", "context_encoder.encoder.layer.6.attention.output.dense.bias", "context_encoder.encoder.layer.6.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.6.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.6.intermediate.dense.weight", "context_encoder.encoder.layer.6.intermediate.dense.bias", "context_encoder.encoder.layer.6.output.dense.weight", "context_encoder.encoder.layer.6.output.dense.bias", "context_encoder.encoder.layer.6.output.LayerNorm.weight", "context_encoder.encoder.layer.6.output.LayerNorm.bias", "context_encoder.encoder.layer.7.attention.self.query.weight", "context_encoder.encoder.layer.7.attention.self.query.bias", "context_encoder.encoder.layer.7.attention.self.key.weight", "context_encoder.encoder.layer.7.attention.self.key.bias", "context_encoder.encoder.layer.7.attention.self.value.weight", "context_encoder.encoder.layer.7.attention.self.value.bias", "context_encoder.encoder.layer.7.attention.output.dense.weight", "context_encoder.encoder.layer.7.attention.output.dense.bias", "context_encoder.encoder.layer.7.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.7.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.7.intermediate.dense.weight", "context_encoder.encoder.layer.7.intermediate.dense.bias", "context_encoder.encoder.layer.7.output.dense.weight", "context_encoder.encoder.layer.7.output.dense.bias", "context_encoder.encoder.layer.7.output.LayerNorm.weight", "context_encoder.encoder.layer.7.output.LayerNorm.bias", "context_encoder.encoder.layer.8.attention.self.query.weight", 
"context_encoder.encoder.layer.8.attention.self.query.bias", "context_encoder.encoder.layer.8.attention.self.key.weight", "context_encoder.encoder.layer.8.attention.self.key.bias", "context_encoder.encoder.layer.8.attention.self.value.weight", "context_encoder.encoder.layer.8.attention.self.value.bias", "context_encoder.encoder.layer.8.attention.output.dense.weight", "context_encoder.encoder.layer.8.attention.output.dense.bias", "context_encoder.encoder.layer.8.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.8.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.8.intermediate.dense.weight", "context_encoder.encoder.layer.8.intermediate.dense.bias", "context_encoder.encoder.layer.8.output.dense.weight", "context_encoder.encoder.layer.8.output.dense.bias", "context_encoder.encoder.layer.8.output.LayerNorm.weight", "context_encoder.encoder.layer.8.output.LayerNorm.bias", "context_encoder.encoder.layer.9.attention.self.query.weight", "context_encoder.encoder.layer.9.attention.self.query.bias", "context_encoder.encoder.layer.9.attention.self.key.weight", "context_encoder.encoder.layer.9.attention.self.key.bias", "context_encoder.encoder.layer.9.attention.self.value.weight", "context_encoder.encoder.layer.9.attention.self.value.bias", "context_encoder.encoder.layer.9.attention.output.dense.weight", "context_encoder.encoder.layer.9.attention.output.dense.bias", "context_encoder.encoder.layer.9.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.9.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.9.intermediate.dense.weight", "context_encoder.encoder.layer.9.intermediate.dense.bias", "context_encoder.encoder.layer.9.output.dense.weight", "context_encoder.encoder.layer.9.output.dense.bias", "context_encoder.encoder.layer.9.output.LayerNorm.weight", "context_encoder.encoder.layer.9.output.LayerNorm.bias", "context_encoder.encoder.layer.10.attention.self.query.weight", "context_encoder.encoder.layer.10.attention.self.query.bias", "context_encoder.encoder.layer.10.attention.self.key.weight", "context_encoder.encoder.layer.10.attention.self.key.bias", "context_encoder.encoder.layer.10.attention.self.value.weight", "context_encoder.encoder.layer.10.attention.self.value.bias", "context_encoder.encoder.layer.10.attention.output.dense.weight", "context_encoder.encoder.layer.10.attention.output.dense.bias", "context_encoder.encoder.layer.10.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.10.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.10.intermediate.dense.weight", "context_encoder.encoder.layer.10.intermediate.dense.bias", "context_encoder.encoder.layer.10.output.dense.weight", "context_encoder.encoder.layer.10.output.dense.bias", "context_encoder.encoder.layer.10.output.LayerNorm.weight", "context_encoder.encoder.layer.10.output.LayerNorm.bias", "context_encoder.encoder.layer.11.attention.self.query.weight", "context_encoder.encoder.layer.11.attention.self.query.bias", "context_encoder.encoder.layer.11.attention.self.key.weight", "context_encoder.encoder.layer.11.attention.self.key.bias", "context_encoder.encoder.layer.11.attention.self.value.weight", "context_encoder.encoder.layer.11.attention.self.value.bias", "context_encoder.encoder.layer.11.attention.output.dense.weight", "context_encoder.encoder.layer.11.attention.output.dense.bias", "context_encoder.encoder.layer.11.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.11.attention.output.LayerNorm.bias", 
"context_encoder.encoder.layer.11.intermediate.dense.weight", "context_encoder.encoder.layer.11.intermediate.dense.bias", "context_encoder.encoder.layer.11.output.dense.weight", "context_encoder.encoder.layer.11.output.dense.bias", "context_encoder.encoder.layer.11.output.LayerNorm.weight", "context_encoder.encoder.layer.11.output.LayerNorm.bias", "context_encoder.pooler.dense.weight", "context_encoder.pooler.dense.bias", "context_mlm_trans.dense.weight", "context_mlm_trans.dense.bias", "context_mlm_trans.LayerNorm.weight", "context_mlm_trans.LayerNorm.bias", "context_order_trans.linear_in.weight", "decoder.bert.embeddings.position_ids", "decoder.bert.embeddings.word_embeddings.weight", "decoder.bert.embeddings.position_embeddings.weight", "decoder.bert.embeddings.token_type_embeddings.weight", "decoder.bert.embeddings.LayerNorm.weight", "decoder.bert.embeddings.LayerNorm.bias", "decoder.bert.encoder.layer.0.attention.self.query.weight", "decoder.bert.encoder.layer.0.attention.self.query.bias", "decoder.bert.encoder.layer.0.attention.self.key.weight", "decoder.bert.encoder.layer.0.attention.self.key.bias", "decoder.bert.encoder.layer.0.attention.self.value.weight", "decoder.bert.encoder.layer.0.attention.self.value.bias", "decoder.bert.encoder.layer.0.attention.output.dense.weight", "decoder.bert.encoder.layer.0.attention.output.dense.bias", "decoder.bert.encoder.layer.0.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.0.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.0.crossattention.self.query.weight", "decoder.bert.encoder.layer.0.crossattention.self.query.bias", "decoder.bert.encoder.layer.0.crossattention.self.key.weight", "decoder.bert.encoder.layer.0.crossattention.self.key.bias", "decoder.bert.encoder.layer.0.crossattention.self.value.weight", "decoder.bert.encoder.layer.0.crossattention.self.value.bias", "decoder.bert.encoder.layer.0.crossattention.output.dense.weight", "decoder.bert.encoder.layer.0.crossattention.output.dense.bias", "decoder.bert.encoder.layer.0.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.0.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.0.intermediate.dense.weight", "decoder.bert.encoder.layer.0.intermediate.dense.bias", "decoder.bert.encoder.layer.0.output.dense.weight", "decoder.bert.encoder.layer.0.output.dense.bias", "decoder.bert.encoder.layer.0.output.LayerNorm.weight", "decoder.bert.encoder.layer.0.output.LayerNorm.bias", "decoder.bert.encoder.layer.1.attention.self.query.weight", "decoder.bert.encoder.layer.1.attention.self.query.bias", "decoder.bert.encoder.layer.1.attention.self.key.weight", "decoder.bert.encoder.layer.1.attention.self.key.bias", "decoder.bert.encoder.layer.1.attention.self.value.weight", "decoder.bert.encoder.layer.1.attention.self.value.bias", "decoder.bert.encoder.layer.1.attention.output.dense.weight", "decoder.bert.encoder.layer.1.attention.output.dense.bias", "decoder.bert.encoder.layer.1.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.1.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.1.crossattention.self.query.weight", "decoder.bert.encoder.layer.1.crossattention.self.query.bias", "decoder.bert.encoder.layer.1.crossattention.self.key.weight", "decoder.bert.encoder.layer.1.crossattention.self.key.bias", "decoder.bert.encoder.layer.1.crossattention.self.value.weight", "decoder.bert.encoder.layer.1.crossattention.self.value.bias", "decoder.bert.encoder.layer.1.crossattention.output.dense.weight", 
"decoder.bert.encoder.layer.1.crossattention.output.dense.bias", "decoder.bert.encoder.layer.1.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.1.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.1.intermediate.dense.weight", "decoder.bert.encoder.layer.1.intermediate.dense.bias", "decoder.bert.encoder.layer.1.output.dense.weight", "decoder.bert.encoder.layer.1.output.dense.bias", "decoder.bert.encoder.layer.1.output.LayerNorm.weight", "decoder.bert.encoder.layer.1.output.LayerNorm.bias", "decoder.bert.encoder.layer.2.attention.self.query.weight", "decoder.bert.encoder.layer.2.attention.self.query.bias", "decoder.bert.encoder.layer.2.attention.self.key.weight", "decoder.bert.encoder.layer.2.attention.self.key.bias", "decoder.bert.encoder.layer.2.attention.self.value.weight", "decoder.bert.encoder.layer.2.attention.self.value.bias", "decoder.bert.encoder.layer.2.attention.output.dense.weight", "decoder.bert.encoder.layer.2.attention.output.dense.bias", "decoder.bert.encoder.layer.2.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.2.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.2.crossattention.self.query.weight", "decoder.bert.encoder.layer.2.crossattention.self.query.bias", "decoder.bert.encoder.layer.2.crossattention.self.key.weight", "decoder.bert.encoder.layer.2.crossattention.self.key.bias", "decoder.bert.encoder.layer.2.crossattention.self.value.weight", "decoder.bert.encoder.layer.2.crossattention.self.value.bias", "decoder.bert.encoder.layer.2.crossattention.output.dense.weight", "decoder.bert.encoder.layer.2.crossattention.output.dense.bias", "decoder.bert.encoder.layer.2.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.2.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.2.intermediate.dense.weight", "decoder.bert.encoder.layer.2.intermediate.dense.bias", "decoder.bert.encoder.layer.2.output.dense.weight", "decoder.bert.encoder.layer.2.output.dense.bias", "decoder.bert.encoder.layer.2.output.LayerNorm.weight", "decoder.bert.encoder.layer.2.output.LayerNorm.bias", "decoder.bert.encoder.layer.3.attention.self.query.weight", "decoder.bert.encoder.layer.3.attention.self.query.bias", "decoder.bert.encoder.layer.3.attention.self.key.weight", "decoder.bert.encoder.layer.3.attention.self.key.bias", "decoder.bert.encoder.layer.3.attention.self.value.weight", "decoder.bert.encoder.layer.3.attention.self.value.bias", "decoder.bert.encoder.layer.3.attention.output.dense.weight", "decoder.bert.encoder.layer.3.attention.output.dense.bias", "decoder.bert.encoder.layer.3.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.3.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.3.crossattention.self.query.weight", "decoder.bert.encoder.layer.3.crossattention.self.query.bias", "decoder.bert.encoder.layer.3.crossattention.self.key.weight", "decoder.bert.encoder.layer.3.crossattention.self.key.bias", "decoder.bert.encoder.layer.3.crossattention.self.value.weight", "decoder.bert.encoder.layer.3.crossattention.self.value.bias", "decoder.bert.encoder.layer.3.crossattention.output.dense.weight", "decoder.bert.encoder.layer.3.crossattention.output.dense.bias", "decoder.bert.encoder.layer.3.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.3.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.3.intermediate.dense.weight", "decoder.bert.encoder.layer.3.intermediate.dense.bias", "decoder.bert.encoder.layer.3.output.dense.weight", 
"decoder.bert.encoder.layer.3.output.dense.bias", "decoder.bert.encoder.layer.3.output.LayerNorm.weight", "decoder.bert.encoder.layer.3.output.LayerNorm.bias", "decoder.bert.encoder.layer.4.attention.self.query.weight", "decoder.bert.encoder.layer.4.attention.self.query.bias", "decoder.bert.encoder.layer.4.attention.self.key.weight", "decoder.bert.encoder.layer.4.attention.self.key.bias", "decoder.bert.encoder.layer.4.attention.self.value.weight", "decoder.bert.encoder.layer.4.attention.self.value.bias", "decoder.bert.encoder.layer.4.attention.output.dense.weight", "decoder.bert.encoder.layer.4.attention.output.dense.bias", "decoder.bert.encoder.layer.4.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.4.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.4.crossattention.self.query.weight", "decoder.bert.encoder.layer.4.crossattention.self.query.bias", "decoder.bert.encoder.layer.4.crossattention.self.key.weight", "decoder.bert.encoder.layer.4.crossattention.self.key.bias", "decoder.bert.encoder.layer.4.crossattention.self.value.weight", "decoder.bert.encoder.layer.4.crossattention.self.value.bias", "decoder.bert.encoder.layer.4.crossattention.output.dense.weight", "decoder.bert.encoder.layer.4.crossattention.output.dense.bias", "decoder.bert.encoder.layer.4.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.4.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.4.intermediate.dense.weight", "decoder.bert.encoder.layer.4.intermediate.dense.bias", "decoder.bert.encoder.layer.4.output.dense.weight", "decoder.bert.encoder.layer.4.output.dense.bias", "decoder.bert.encoder.layer.4.output.LayerNorm.weight", "decoder.bert.encoder.layer.4.output.LayerNorm.bias", "decoder.bert.encoder.layer.5.attention.self.query.weight", "decoder.bert.encoder.layer.5.attention.self.query.bias", "decoder.bert.encoder.layer.5.attention.self.key.weight", "decoder.bert.encoder.layer.5.attention.self.key.bias", "decoder.bert.encoder.layer.5.attention.self.value.weight", "decoder.bert.encoder.layer.5.attention.self.value.bias", "decoder.bert.encoder.layer.5.attention.output.dense.weight", "decoder.bert.encoder.layer.5.attention.output.dense.bias", "decoder.bert.encoder.layer.5.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.5.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.5.crossattention.self.query.weight", "decoder.bert.encoder.layer.5.crossattention.self.query.bias", "decoder.bert.encoder.layer.5.crossattention.self.key.weight", "decoder.bert.encoder.layer.5.crossattention.self.key.bias", "decoder.bert.encoder.layer.5.crossattention.self.value.weight", "decoder.bert.encoder.layer.5.crossattention.self.value.bias", "decoder.bert.encoder.layer.5.crossattention.output.dense.weight", "decoder.bert.encoder.layer.5.crossattention.output.dense.bias", "decoder.bert.encoder.layer.5.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.5.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.5.intermediate.dense.weight", "decoder.bert.encoder.layer.5.intermediate.dense.bias", "decoder.bert.encoder.layer.5.output.dense.weight", "decoder.bert.encoder.layer.5.output.dense.bias", "decoder.bert.encoder.layer.5.output.LayerNorm.weight", "decoder.bert.encoder.layer.5.output.LayerNorm.bias", "decoder.bert.encoder.layer.6.attention.self.query.weight", "decoder.bert.encoder.layer.6.attention.self.query.bias", "decoder.bert.encoder.layer.6.attention.self.key.weight", "decoder.bert.encoder.layer.6.attention.self.key.bias", 
"decoder.bert.encoder.layer.6.attention.self.value.weight", "decoder.bert.encoder.layer.6.attention.self.value.bias", "decoder.bert.encoder.layer.6.attention.output.dense.weight", "decoder.bert.encoder.layer.6.attention.output.dense.bias", "decoder.bert.encoder.layer.6.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.6.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.6.crossattention.self.query.weight", "decoder.bert.encoder.layer.6.crossattention.self.query.bias", "decoder.bert.encoder.layer.6.crossattention.self.key.weight", "decoder.bert.encoder.layer.6.crossattention.self.key.bias", "decoder.bert.encoder.layer.6.crossattention.self.value.weight", "decoder.bert.encoder.layer.6.crossattention.self.value.bias", "decoder.bert.encoder.layer.6.crossattention.output.dense.weight", "decoder.bert.encoder.layer.6.crossattention.output.dense.bias", "decoder.bert.encoder.layer.6.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.6.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.6.intermediate.dense.weight", "decoder.bert.encoder.layer.6.intermediate.dense.bias", "decoder.bert.encoder.layer.6.output.dense.weight", "decoder.bert.encoder.layer.6.output.dense.bias", "decoder.bert.encoder.layer.6.output.LayerNorm.weight", "decoder.bert.encoder.layer.6.output.LayerNorm.bias", "decoder.bert.encoder.layer.7.attention.self.query.weight", "decoder.bert.encoder.layer.7.attention.self.query.bias", "decoder.bert.encoder.layer.7.attention.self.key.weight", "decoder.bert.encoder.layer.7.attention.self.key.bias", "decoder.bert.encoder.layer.7.attention.self.value.weight", "decoder.bert.encoder.layer.7.attention.self.value.bias", "decoder.bert.encoder.layer.7.attention.output.dense.weight", "decoder.bert.encoder.layer.7.attention.output.dense.bias", "decoder.bert.encoder.layer.7.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.7.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.7.crossattention.self.query.weight", "decoder.bert.encoder.layer.7.crossattention.self.query.bias", "decoder.bert.encoder.layer.7.crossattention.self.key.weight", "decoder.bert.encoder.layer.7.crossattention.self.key.bias", "decoder.bert.encoder.layer.7.crossattention.self.value.weight", "decoder.bert.encoder.layer.7.crossattention.self.value.bias", "decoder.bert.encoder.layer.7.crossattention.output.dense.weight", "decoder.bert.encoder.layer.7.crossattention.output.dense.bias", "decoder.bert.encoder.layer.7.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.7.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.7.intermediate.dense.weight", "decoder.bert.encoder.layer.7.intermediate.dense.bias", "decoder.bert.encoder.layer.7.output.dense.weight", "decoder.bert.encoder.layer.7.output.dense.bias", "decoder.bert.encoder.layer.7.output.LayerNorm.weight", "decoder.bert.encoder.layer.7.output.LayerNorm.bias", "decoder.bert.encoder.layer.8.attention.self.query.weight", "decoder.bert.encoder.layer.8.attention.self.query.bias", "decoder.bert.encoder.layer.8.attention.self.key.weight", "decoder.bert.encoder.layer.8.attention.self.key.bias", "decoder.bert.encoder.layer.8.attention.self.value.weight", "decoder.bert.encoder.layer.8.attention.self.value.bias", "decoder.bert.encoder.layer.8.attention.output.dense.weight", "decoder.bert.encoder.layer.8.attention.output.dense.bias", "decoder.bert.encoder.layer.8.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.8.attention.output.LayerNorm.bias", 
"decoder.bert.encoder.layer.8.crossattention.self.query.weight", "decoder.bert.encoder.layer.8.crossattention.self.query.bias", "decoder.bert.encoder.layer.8.crossattention.self.key.weight", "decoder.bert.encoder.layer.8.crossattention.self.key.bias", "decoder.bert.encoder.layer.8.crossattention.self.value.weight", "decoder.bert.encoder.layer.8.crossattention.self.value.bias", "decoder.bert.encoder.layer.8.crossattention.output.dense.weight", "decoder.bert.encoder.layer.8.crossattention.output.dense.bias", "decoder.bert.encoder.layer.8.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.8.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.8.intermediate.dense.weight", "decoder.bert.encoder.layer.8.intermediate.dense.bias", "decoder.bert.encoder.layer.8.output.dense.weight", "decoder.bert.encoder.layer.8.output.dense.bias", "decoder.bert.encoder.layer.8.output.LayerNorm.weight", "decoder.bert.encoder.layer.8.output.LayerNorm.bias", "decoder.bert.encoder.layer.9.attention.self.query.weight", "decoder.bert.encoder.layer.9.attention.self.query.bias", "decoder.bert.encoder.layer.9.attention.self.key.weight", "decoder.bert.encoder.layer.9.attention.self.key.bias", "decoder.bert.encoder.layer.9.attention.self.value.weight", "decoder.bert.encoder.layer.9.attention.self.value.bias", "decoder.bert.encoder.layer.9.attention.output.dense.weight", "decoder.bert.encoder.layer.9.attention.output.dense.bias", "decoder.bert.encoder.layer.9.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.9.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.9.crossattention.self.query.weight", "decoder.bert.encoder.layer.9.crossattention.self.query.bias", "decoder.bert.encoder.layer.9.crossattention.self.key.weight", "decoder.bert.encoder.layer.9.crossattention.self.key.bias", "decoder.bert.encoder.layer.9.crossattention.self.value.weight", "decoder.bert.encoder.layer.9.crossattention.self.value.bias", "decoder.bert.encoder.layer.9.crossattention.output.dense.weight", "decoder.bert.encoder.layer.9.crossattention.output.dense.bias", "decoder.bert.encoder.layer.9.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.9.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.9.intermediate.dense.weight", "decoder.bert.encoder.layer.9.intermediate.dense.bias", "decoder.bert.encoder.layer.9.output.dense.weight", "decoder.bert.encoder.layer.9.output.dense.bias", "decoder.bert.encoder.layer.9.output.LayerNorm.weight", "decoder.bert.encoder.layer.9.output.LayerNorm.bias", "decoder.bert.encoder.layer.10.attention.self.query.weight", "decoder.bert.encoder.layer.10.attention.self.query.bias", "decoder.bert.encoder.layer.10.attention.self.key.weight", "decoder.bert.encoder.layer.10.attention.self.key.bias", "decoder.bert.encoder.layer.10.attention.self.value.weight", "decoder.bert.encoder.layer.10.attention.self.value.bias", "decoder.bert.encoder.layer.10.attention.output.dense.weight", "decoder.bert.encoder.layer.10.attention.output.dense.bias", "decoder.bert.encoder.layer.10.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.10.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.10.crossattention.self.query.weight", "decoder.bert.encoder.layer.10.crossattention.self.query.bias", "decoder.bert.encoder.layer.10.crossattention.self.key.weight", "decoder.bert.encoder.layer.10.crossattention.self.key.bias", "decoder.bert.encoder.layer.10.crossattention.self.value.weight", 
"decoder.bert.encoder.layer.10.crossattention.self.value.bias", "decoder.bert.encoder.layer.10.crossattention.output.dense.weight", "decoder.bert.encoder.layer.10.crossattention.output.dense.bias", "decoder.bert.encoder.layer.10.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.10.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.10.intermediate.dense.weight", "decoder.bert.encoder.layer.10.intermediate.dense.bias", "decoder.bert.encoder.layer.10.output.dense.weight", "decoder.bert.encoder.layer.10.output.dense.bias", "decoder.bert.encoder.layer.10.output.LayerNorm.weight", "decoder.bert.encoder.layer.10.output.LayerNorm.bias", "decoder.bert.encoder.layer.11.attention.self.query.weight", "decoder.bert.encoder.layer.11.attention.self.query.bias", "decoder.bert.encoder.layer.11.attention.self.key.weight", "decoder.bert.encoder.layer.11.attention.self.key.bias", "decoder.bert.encoder.layer.11.attention.self.value.weight", "decoder.bert.encoder.layer.11.attention.self.value.bias", "decoder.bert.encoder.layer.11.attention.output.dense.weight", "decoder.bert.encoder.layer.11.attention.output.dense.bias", "decoder.bert.encoder.layer.11.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.11.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.11.crossattention.self.query.weight", "decoder.bert.encoder.layer.11.crossattention.self.query.bias", "decoder.bert.encoder.layer.11.crossattention.self.key.weight", "decoder.bert.encoder.layer.11.crossattention.self.key.bias", "decoder.bert.encoder.layer.11.crossattention.self.value.weight", "decoder.bert.encoder.layer.11.crossattention.self.value.bias", "decoder.bert.encoder.layer.11.crossattention.output.dense.weight", "decoder.bert.encoder.layer.11.crossattention.output.dense.bias", "decoder.bert.encoder.layer.11.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.11.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.11.intermediate.dense.weight", "decoder.bert.encoder.layer.11.intermediate.dense.bias", "decoder.bert.encoder.layer.11.output.dense.weight", "decoder.bert.encoder.layer.11.output.dense.bias", "decoder.bert.encoder.layer.11.output.LayerNorm.weight", "decoder.bert.encoder.layer.11.output.LayerNorm.bias", "decoder.bert.pooler.dense.weight", "decoder.bert.pooler.dense.bias", "decoder.cls.predictions.bias", "decoder.cls.predictions.transform.dense.weight", "decoder.cls.predictions.transform.dense.bias", "decoder.cls.predictions.transform.LayerNorm.weight", "decoder.cls.predictions.transform.LayerNorm.bias", "decoder.cls.predictions.decoder.weight", "decoder.cls.predictions.decoder.bias".

    opened by rokosbasilisk 2
  • test error

    def load(self, args):
        # Load a trained model and vocabulary that you have fine-tuned
        assert args.reload_from>=0, "please specify the checkpoint iteration in args.reload_from"
        output_dir = os.path.join(f"./output/{args.model}/{args.model_size}/models/", f'checkpoint-{args.reload_from}')
        self.model = DialogBERT.from_pretrained(output_dir)
        self.model.to(args.device)
    
    def from_pretrained(self, model_dir):
        self.encoder_config = BertConfig.from_pretrained(model_dir)
        self.tokenizer = BertTokenizer.from_pretrained(path.join(model_dir, 'tokenizer'), do_lower_case=True)
        self.utt_encoder = BertForPreTraining.from_pretrained(path.join(model_dir, 'utt_encoder'))
        self.context_encoder = BertForSequenceClassification.from_pretrained(path.join(model_dir, 'context_encoder'))
        self.context_mlm_trans = BertPredictionHeadTransform(self.encoder_config)
        self.context_mlm_trans.load_state_dict(torch.load(path.join(model_dir, 'context_mlm_trans.pkl')),strict= False)
        self.context_order_trans = SelfSorting(self.encoder_config.hidden_size)
        self.context_order_trans.load_state_dict(torch.load(path.join(model_dir, 'context_order_trans.pkl')), strict= False)
        self.decoder_config = BertConfig.from_pretrained(model_dir)
        self.decoder = BertLMHeadModel.from_pretrained(path.join(model_dir, 'decoder'))
    

    File "D:\NLP\DialogBERT-master\solvers.py", line 77, in load self.model.to(args.device) AttributeError: 'NoneType' object has no attribute 'to' DialogBERT.from_pretrained is none ,how can i solve it?

    opened by ztx313 10
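
    One reading of the snippet above: from_pretrained fills attributes of self and has no return statement, so an expression that assigns its result (as load does) ends up with None, which then fails at .to(args.device). A minimal stand-in illustrating why returning the loaded object avoids the NoneType error (not the repository's actual classes):

    class Loader:
        def from_pretrained(self, model_dir):
            # load weights in place (stand-in for the real loading logic)
            self.model_dir = model_dir
            return self  # returning the object makes `model = Loader().from_pretrained(...)` work

    model = Loader().from_pretrained("./output/checkpoint-1000")  # hypothetical path
    print(model.model_dir)
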
  • DialogBERT methods of context?

    Hi again,

    I am curious about what methods the authors used for handling context during DialogBERT development. Did you prepend the context to the input tokens? And how many conversational turns of context were used to obtain the results in the paper?

    Thanks in advance

    opened by pablogranolabar 0
Owner
Xiaodong Gu