Source Code for DialogBERT: Discourse-Aware Response Generation via Learning to Recover and Rank Utterances (https://arxiv.org/pdf/2012.01775.pdf)

self.context_mlm_trans and self.context_order_trans expect a different key structure from the state_dict they are given, so loading the checkpoint fails with the error below.
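
One possible reading of the error below is that the full model checkpoint is being passed to a single submodule: the "Missing" keys are unprefixed (dense.weight, ...), while the "Unexpected" keys carry the full-model prefixes (utt_encoder., context_mlm_trans., ...). Under that assumption, a minimal sketch of a workaround is to select only the entries under the submodule's prefix and strip the prefix before loading. The helper name `extract_submodule_state` and the placeholder values are hypothetical and not part of the DialogBERT source. Plain strings stand in for tensors so the sketch runs without PyTorch.

```python
from collections import OrderedDict


def extract_submodule_state(state_dict, prefix):
    """Return the entries of `state_dict` whose keys start with `prefix`,
    with the prefix stripped so a submodule could load them directly."""
    if not prefix.endswith("."):
        prefix += "."
    return OrderedDict(
        (key[len(prefix):], value)
        for key, value in state_dict.items()
        if key.startswith(prefix)
    )


# Stand-in for the full checkpoint; in practice the values would be tensors.
# The key names are copied from the error message.
full_state = OrderedDict([
    ("utt_encoder.bert.pooler.dense.weight", "w0"),
    ("context_mlm_trans.dense.weight", "w1"),
    ("context_mlm_trans.dense.bias", "b1"),
    ("context_mlm_trans.LayerNorm.weight", "w2"),
    ("context_mlm_trans.LayerNorm.bias", "b2"),
    ("context_order_trans.linear_in.weight", "w3"),
])

mlm_state = extract_submodule_state(full_state, "context_mlm_trans")
order_state = extract_submodule_state(full_state, "context_order_trans")

print(list(mlm_state))    # ['dense.weight', 'dense.bias', 'LayerNorm.weight', 'LayerNorm.bias']
print(list(order_state))  # ['linear_in.weight']
```

With a real checkpoint, the filtered dictionaries could then be handed to the submodules, for example `model.context_mlm_trans.load_state_dict(mlm_state)`, where `model` is assumed to be the instantiated DialogBERT model. Whether this matches the intended loading logic in the repository is not confirmed here.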

RuntimeError: Error(s) in loading state_dict for BertPredictionHeadTransform: Missing key(s) in state_dict: "dense.weight", "dense.bias", "LayerNorm.weight", "LayerNorm.bias". Unexpected key(s) in state_dict: "utt_encoder.bert.embeddings.position_ids", "utt_encoder.bert.embeddings.word_embeddings.weight", "utt_encoder.bert.embeddings.position_embeddings.weight", "utt_encoder.bert.embeddings.token_type_embeddings.weight", "utt_encoder.bert.embeddings.LayerNorm.weight", "utt_encoder.bert.embeddings.LayerNorm.bias", "utt_encoder.bert.encoder.layer.0.attention.self.query.weight", "utt_encoder.bert.encoder.layer.0.attention.self.query.bias", "utt_encoder.bert.encoder.layer.0.attention.self.key.weight", "utt_encoder.bert.encoder.layer.0.attention.self.key.bias", "utt_encoder.bert.encoder.layer.0.attention.self.value.weight", "utt_encoder.bert.encoder.layer.0.attention.self.value.bias", "utt_encoder.bert.encoder.layer.0.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.0.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.0.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.0.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.0.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.0.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.0.output.dense.weight", "utt_encoder.bert.encoder.layer.0.output.dense.bias", "utt_encoder.bert.encoder.layer.0.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.0.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.1.attention.self.query.weight", "utt_encoder.bert.encoder.layer.1.attention.self.query.bias", "utt_encoder.bert.encoder.layer.1.attention.self.key.weight", "utt_encoder.bert.encoder.layer.1.attention.self.key.bias", "utt_encoder.bert.encoder.layer.1.attention.self.value.weight", "utt_encoder.bert.encoder.layer.1.attention.self.value.bias", "utt_encoder.bert.encoder.layer.1.attention.output.dense.weight", 
"utt_encoder.bert.encoder.layer.1.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.1.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.1.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.1.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.1.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.1.output.dense.weight", "utt_encoder.bert.encoder.layer.1.output.dense.bias", "utt_encoder.bert.encoder.layer.1.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.1.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.2.attention.self.query.weight", "utt_encoder.bert.encoder.layer.2.attention.self.query.bias", "utt_encoder.bert.encoder.layer.2.attention.self.key.weight", "utt_encoder.bert.encoder.layer.2.attention.self.key.bias", "utt_encoder.bert.encoder.layer.2.attention.self.value.weight", "utt_encoder.bert.encoder.layer.2.attention.self.value.bias", "utt_encoder.bert.encoder.layer.2.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.2.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.2.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.2.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.2.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.2.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.2.output.dense.weight", "utt_encoder.bert.encoder.layer.2.output.dense.bias", "utt_encoder.bert.encoder.layer.2.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.2.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.3.attention.self.query.weight", "utt_encoder.bert.encoder.layer.3.attention.self.query.bias", "utt_encoder.bert.encoder.layer.3.attention.self.key.weight", "utt_encoder.bert.encoder.layer.3.attention.self.key.bias", "utt_encoder.bert.encoder.layer.3.attention.self.value.weight", "utt_encoder.bert.encoder.layer.3.attention.self.value.bias", "utt_encoder.bert.encoder.layer.3.attention.output.dense.weight", 
"utt_encoder.bert.encoder.layer.3.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.3.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.3.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.3.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.3.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.3.output.dense.weight", "utt_encoder.bert.encoder.layer.3.output.dense.bias", "utt_encoder.bert.encoder.layer.3.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.3.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.4.attention.self.query.weight", "utt_encoder.bert.encoder.layer.4.attention.self.query.bias", "utt_encoder.bert.encoder.layer.4.attention.self.key.weight", "utt_encoder.bert.encoder.layer.4.attention.self.key.bias", "utt_encoder.bert.encoder.layer.4.attention.self.value.weight", "utt_encoder.bert.encoder.layer.4.attention.self.value.bias", "utt_encoder.bert.encoder.layer.4.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.4.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.4.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.4.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.4.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.4.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.4.output.dense.weight", "utt_encoder.bert.encoder.layer.4.output.dense.bias", "utt_encoder.bert.encoder.layer.4.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.4.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.5.attention.self.query.weight", "utt_encoder.bert.encoder.layer.5.attention.self.query.bias", "utt_encoder.bert.encoder.layer.5.attention.self.key.weight", "utt_encoder.bert.encoder.layer.5.attention.self.key.bias", "utt_encoder.bert.encoder.layer.5.attention.self.value.weight", "utt_encoder.bert.encoder.layer.5.attention.self.value.bias", "utt_encoder.bert.encoder.layer.5.attention.output.dense.weight", 
"utt_encoder.bert.encoder.layer.5.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.5.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.5.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.5.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.5.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.5.output.dense.weight", "utt_encoder.bert.encoder.layer.5.output.dense.bias", "utt_encoder.bert.encoder.layer.5.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.5.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.6.attention.self.query.weight", "utt_encoder.bert.encoder.layer.6.attention.self.query.bias", "utt_encoder.bert.encoder.layer.6.attention.self.key.weight", "utt_encoder.bert.encoder.layer.6.attention.self.key.bias", "utt_encoder.bert.encoder.layer.6.attention.self.value.weight", "utt_encoder.bert.encoder.layer.6.attention.self.value.bias", "utt_encoder.bert.encoder.layer.6.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.6.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.6.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.6.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.6.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.6.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.6.output.dense.weight", "utt_encoder.bert.encoder.layer.6.output.dense.bias", "utt_encoder.bert.encoder.layer.6.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.6.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.7.attention.self.query.weight", "utt_encoder.bert.encoder.layer.7.attention.self.query.bias", "utt_encoder.bert.encoder.layer.7.attention.self.key.weight", "utt_encoder.bert.encoder.layer.7.attention.self.key.bias", "utt_encoder.bert.encoder.layer.7.attention.self.value.weight", "utt_encoder.bert.encoder.layer.7.attention.self.value.bias", "utt_encoder.bert.encoder.layer.7.attention.output.dense.weight", 
"utt_encoder.bert.encoder.layer.7.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.7.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.7.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.7.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.7.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.7.output.dense.weight", "utt_encoder.bert.encoder.layer.7.output.dense.bias", "utt_encoder.bert.encoder.layer.7.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.7.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.8.attention.self.query.weight", "utt_encoder.bert.encoder.layer.8.attention.self.query.bias", "utt_encoder.bert.encoder.layer.8.attention.self.key.weight", "utt_encoder.bert.encoder.layer.8.attention.self.key.bias", "utt_encoder.bert.encoder.layer.8.attention.self.value.weight", "utt_encoder.bert.encoder.layer.8.attention.self.value.bias", "utt_encoder.bert.encoder.layer.8.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.8.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.8.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.8.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.8.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.8.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.8.output.dense.weight", "utt_encoder.bert.encoder.layer.8.output.dense.bias", "utt_encoder.bert.encoder.layer.8.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.8.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.9.attention.self.query.weight", "utt_encoder.bert.encoder.layer.9.attention.self.query.bias", "utt_encoder.bert.encoder.layer.9.attention.self.key.weight", "utt_encoder.bert.encoder.layer.9.attention.self.key.bias", "utt_encoder.bert.encoder.layer.9.attention.self.value.weight", "utt_encoder.bert.encoder.layer.9.attention.self.value.bias", "utt_encoder.bert.encoder.layer.9.attention.output.dense.weight", 
"utt_encoder.bert.encoder.layer.9.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.9.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.9.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.9.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.9.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.9.output.dense.weight", "utt_encoder.bert.encoder.layer.9.output.dense.bias", "utt_encoder.bert.encoder.layer.9.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.9.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.10.attention.self.query.weight", "utt_encoder.bert.encoder.layer.10.attention.self.query.bias", "utt_encoder.bert.encoder.layer.10.attention.self.key.weight", "utt_encoder.bert.encoder.layer.10.attention.self.key.bias", "utt_encoder.bert.encoder.layer.10.attention.self.value.weight", "utt_encoder.bert.encoder.layer.10.attention.self.value.bias", "utt_encoder.bert.encoder.layer.10.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.10.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.10.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.10.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.10.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.10.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.10.output.dense.weight", "utt_encoder.bert.encoder.layer.10.output.dense.bias", "utt_encoder.bert.encoder.layer.10.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.10.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.11.attention.self.query.weight", "utt_encoder.bert.encoder.layer.11.attention.self.query.bias", "utt_encoder.bert.encoder.layer.11.attention.self.key.weight", "utt_encoder.bert.encoder.layer.11.attention.self.key.bias", "utt_encoder.bert.encoder.layer.11.attention.self.value.weight", "utt_encoder.bert.encoder.layer.11.attention.self.value.bias", 
"utt_encoder.bert.encoder.layer.11.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.11.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.11.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.11.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.11.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.11.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.11.output.dense.weight", "utt_encoder.bert.encoder.layer.11.output.dense.bias", "utt_encoder.bert.encoder.layer.11.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.11.output.LayerNorm.bias", "utt_encoder.bert.pooler.dense.weight", "utt_encoder.bert.pooler.dense.bias", "utt_encoder.cls.predictions.bias", "utt_encoder.cls.predictions.transform.dense.weight", "utt_encoder.cls.predictions.transform.dense.bias", "utt_encoder.cls.predictions.transform.LayerNorm.weight", "utt_encoder.cls.predictions.transform.LayerNorm.bias", "utt_encoder.cls.predictions.decoder.weight", "utt_encoder.cls.predictions.decoder.bias", "utt_encoder.cls.seq_relationship.weight", "utt_encoder.cls.seq_relationship.bias", "context_encoder.embeddings.position_ids", "context_encoder.embeddings.word_embeddings.weight", "context_encoder.embeddings.position_embeddings.weight", "context_encoder.embeddings.token_type_embeddings.weight", "context_encoder.embeddings.LayerNorm.weight", "context_encoder.embeddings.LayerNorm.bias", "context_encoder.encoder.layer.0.attention.self.query.weight", "context_encoder.encoder.layer.0.attention.self.query.bias", "context_encoder.encoder.layer.0.attention.self.key.weight", "context_encoder.encoder.layer.0.attention.self.key.bias", "context_encoder.encoder.layer.0.attention.self.value.weight", "context_encoder.encoder.layer.0.attention.self.value.bias", "context_encoder.encoder.layer.0.attention.output.dense.weight", "context_encoder.encoder.layer.0.attention.output.dense.bias", 
"context_encoder.encoder.layer.0.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.0.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.0.intermediate.dense.weight", "context_encoder.encoder.layer.0.intermediate.dense.bias", "context_encoder.encoder.layer.0.output.dense.weight", "context_encoder.encoder.layer.0.output.dense.bias", "context_encoder.encoder.layer.0.output.LayerNorm.weight", "context_encoder.encoder.layer.0.output.LayerNorm.bias", "context_encoder.encoder.layer.1.attention.self.query.weight", "context_encoder.encoder.layer.1.attention.self.query.bias", "context_encoder.encoder.layer.1.attention.self.key.weight", "context_encoder.encoder.layer.1.attention.self.key.bias", "context_encoder.encoder.layer.1.attention.self.value.weight", "context_encoder.encoder.layer.1.attention.self.value.bias", "context_encoder.encoder.layer.1.attention.output.dense.weight", "context_encoder.encoder.layer.1.attention.output.dense.bias", "context_encoder.encoder.layer.1.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.1.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.1.intermediate.dense.weight", "context_encoder.encoder.layer.1.intermediate.dense.bias", "context_encoder.encoder.layer.1.output.dense.weight", "context_encoder.encoder.layer.1.output.dense.bias", "context_encoder.encoder.layer.1.output.LayerNorm.weight", "context_encoder.encoder.layer.1.output.LayerNorm.bias", "context_encoder.encoder.layer.2.attention.self.query.weight", "context_encoder.encoder.layer.2.attention.self.query.bias", "context_encoder.encoder.layer.2.attention.self.key.weight", "context_encoder.encoder.layer.2.attention.self.key.bias", "context_encoder.encoder.layer.2.attention.self.value.weight", "context_encoder.encoder.layer.2.attention.self.value.bias", "context_encoder.encoder.layer.2.attention.output.dense.weight", "context_encoder.encoder.layer.2.attention.output.dense.bias", 
"context_encoder.encoder.layer.2.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.2.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.2.intermediate.dense.weight", "context_encoder.encoder.layer.2.intermediate.dense.bias", "context_encoder.encoder.layer.2.output.dense.weight", "context_encoder.encoder.layer.2.output.dense.bias", "context_encoder.encoder.layer.2.output.LayerNorm.weight", "context_encoder.encoder.layer.2.output.LayerNorm.bias", "context_encoder.encoder.layer.3.attention.self.query.weight", "context_encoder.encoder.layer.3.attention.self.query.bias", "context_encoder.encoder.layer.3.attention.self.key.weight", "context_encoder.encoder.layer.3.attention.self.key.bias", "context_encoder.encoder.layer.3.attention.self.value.weight", "context_encoder.encoder.layer.3.attention.self.value.bias", "context_encoder.encoder.layer.3.attention.output.dense.weight", "context_encoder.encoder.layer.3.attention.output.dense.bias", "context_encoder.encoder.layer.3.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.3.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.3.intermediate.dense.weight", "context_encoder.encoder.layer.3.intermediate.dense.bias", "context_encoder.encoder.layer.3.output.dense.weight", "context_encoder.encoder.layer.3.output.dense.bias", "context_encoder.encoder.layer.3.output.LayerNorm.weight", "context_encoder.encoder.layer.3.output.LayerNorm.bias", "context_encoder.encoder.layer.4.attention.self.query.weight", "context_encoder.encoder.layer.4.attention.self.query.bias", "context_encoder.encoder.layer.4.attention.self.key.weight", "context_encoder.encoder.layer.4.attention.self.key.bias", "context_encoder.encoder.layer.4.attention.self.value.weight", "context_encoder.encoder.layer.4.attention.self.value.bias", "context_encoder.encoder.layer.4.attention.output.dense.weight", "context_encoder.encoder.layer.4.attention.output.dense.bias", 
"context_encoder.encoder.layer.4.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.4.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.4.intermediate.dense.weight", "context_encoder.encoder.layer.4.intermediate.dense.bias", "context_encoder.encoder.layer.4.output.dense.weight", "context_encoder.encoder.layer.4.output.dense.bias", "context_encoder.encoder.layer.4.output.LayerNorm.weight", "context_encoder.encoder.layer.4.output.LayerNorm.bias", "context_encoder.encoder.layer.5.attention.self.query.weight", "context_encoder.encoder.layer.5.attention.self.query.bias", "context_encoder.encoder.layer.5.attention.self.key.weight", "context_encoder.encoder.layer.5.attention.self.key.bias", "context_encoder.encoder.layer.5.attention.self.value.weight", "context_encoder.encoder.layer.5.attention.self.value.bias", "context_encoder.encoder.layer.5.attention.output.dense.weight", "context_encoder.encoder.layer.5.attention.output.dense.bias", "context_encoder.encoder.layer.5.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.5.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.5.intermediate.dense.weight", "context_encoder.encoder.layer.5.intermediate.dense.bias", "context_encoder.encoder.layer.5.output.dense.weight", "context_encoder.encoder.layer.5.output.dense.bias", "context_encoder.encoder.layer.5.output.LayerNorm.weight", "context_encoder.encoder.layer.5.output.LayerNorm.bias", "context_encoder.encoder.layer.6.attention.self.query.weight", "context_encoder.encoder.layer.6.attention.self.query.bias", "context_encoder.encoder.layer.6.attention.self.key.weight", "context_encoder.encoder.layer.6.attention.self.key.bias", "context_encoder.encoder.layer.6.attention.self.value.weight", "context_encoder.encoder.layer.6.attention.self.value.bias", "context_encoder.encoder.layer.6.attention.output.dense.weight", "context_encoder.encoder.layer.6.attention.output.dense.bias", 
"context_encoder.encoder.layer.6.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.6.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.6.intermediate.dense.weight", "context_encoder.encoder.layer.6.intermediate.dense.bias", "context_encoder.encoder.layer.6.output.dense.weight", "context_encoder.encoder.layer.6.output.dense.bias", "context_encoder.encoder.layer.6.output.LayerNorm.weight", "context_encoder.encoder.layer.6.output.LayerNorm.bias", "context_encoder.encoder.layer.7.attention.self.query.weight", "context_encoder.encoder.layer.7.attention.self.query.bias", "context_encoder.encoder.layer.7.attention.self.key.weight", "context_encoder.encoder.layer.7.attention.self.key.bias", "context_encoder.encoder.layer.7.attention.self.value.weight", "context_encoder.encoder.layer.7.attention.self.value.bias", "context_encoder.encoder.layer.7.attention.output.dense.weight", "context_encoder.encoder.layer.7.attention.output.dense.bias", "context_encoder.encoder.layer.7.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.7.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.7.intermediate.dense.weight", "context_encoder.encoder.layer.7.intermediate.dense.bias", "context_encoder.encoder.layer.7.output.dense.weight", "context_encoder.encoder.layer.7.output.dense.bias", "context_encoder.encoder.layer.7.output.LayerNorm.weight", "context_encoder.encoder.layer.7.output.LayerNorm.bias", "context_encoder.encoder.layer.8.attention.self.query.weight", "context_encoder.encoder.layer.8.attention.self.query.bias", "context_encoder.encoder.layer.8.attention.self.key.weight", "context_encoder.encoder.layer.8.attention.self.key.bias", "context_encoder.encoder.layer.8.attention.self.value.weight", "context_encoder.encoder.layer.8.attention.self.value.bias", "context_encoder.encoder.layer.8.attention.output.dense.weight", "context_encoder.encoder.layer.8.attention.output.dense.bias", 
"context_encoder.encoder.layer.8.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.8.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.8.intermediate.dense.weight", "context_encoder.encoder.layer.8.intermediate.dense.bias", "context_encoder.encoder.layer.8.output.dense.weight", "context_encoder.encoder.layer.8.output.dense.bias", "context_encoder.encoder.layer.8.output.LayerNorm.weight", "context_encoder.encoder.layer.8.output.LayerNorm.bias", "context_encoder.encoder.layer.9.attention.self.query.weight", "context_encoder.encoder.layer.9.attention.self.query.bias", "context_encoder.encoder.layer.9.attention.self.key.weight", "context_encoder.encoder.layer.9.attention.self.key.bias", "context_encoder.encoder.layer.9.attention.self.value.weight", "context_encoder.encoder.layer.9.attention.self.value.bias", "context_encoder.encoder.layer.9.attention.output.dense.weight", "context_encoder.encoder.layer.9.attention.output.dense.bias", "context_encoder.encoder.layer.9.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.9.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.9.intermediate.dense.weight", "context_encoder.encoder.layer.9.intermediate.dense.bias", "context_encoder.encoder.layer.9.output.dense.weight", "context_encoder.encoder.layer.9.output.dense.bias", "context_encoder.encoder.layer.9.output.LayerNorm.weight", "context_encoder.encoder.layer.9.output.LayerNorm.bias", "context_encoder.encoder.layer.10.attention.self.query.weight", "context_encoder.encoder.layer.10.attention.self.query.bias", "context_encoder.encoder.layer.10.attention.self.key.weight", "context_encoder.encoder.layer.10.attention.self.key.bias", "context_encoder.encoder.layer.10.attention.self.value.weight", "context_encoder.encoder.layer.10.attention.self.value.bias", "context_encoder.encoder.layer.10.attention.output.dense.weight", "context_encoder.encoder.layer.10.attention.output.dense.bias", 
"context_encoder.encoder.layer.10.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.10.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.10.intermediate.dense.weight", "context_encoder.encoder.layer.10.intermediate.dense.bias", "context_encoder.encoder.layer.10.output.dense.weight", "context_encoder.encoder.layer.10.output.dense.bias", "context_encoder.encoder.layer.10.output.LayerNorm.weight", "context_encoder.encoder.layer.10.output.LayerNorm.bias", "context_encoder.encoder.layer.11.attention.self.query.weight", "context_encoder.encoder.layer.11.attention.self.query.bias", "context_encoder.encoder.layer.11.attention.self.key.weight", "context_encoder.encoder.layer.11.attention.self.key.bias", "context_encoder.encoder.layer.11.attention.self.value.weight", "context_encoder.encoder.layer.11.attention.self.value.bias", "context_encoder.encoder.layer.11.attention.output.dense.weight", "context_encoder.encoder.layer.11.attention.output.dense.bias", "context_encoder.encoder.layer.11.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.11.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.11.intermediate.dense.weight", "context_encoder.encoder.layer.11.intermediate.dense.bias", "context_encoder.encoder.layer.11.output.dense.weight", "context_encoder.encoder.layer.11.output.dense.bias", "context_encoder.encoder.layer.11.output.LayerNorm.weight", "context_encoder.encoder.layer.11.output.LayerNorm.bias", "context_encoder.pooler.dense.weight", "context_encoder.pooler.dense.bias", "context_mlm_trans.dense.weight", "context_mlm_trans.dense.bias", "context_mlm_trans.LayerNorm.weight", "context_mlm_trans.LayerNorm.bias", "context_order_trans.linear_in.weight", "decoder.bert.embeddings.position_ids", "decoder.bert.embeddings.word_embeddings.weight", "decoder.bert.embeddings.position_embeddings.weight", "decoder.bert.embeddings.token_type_embeddings.weight", "decoder.bert.embeddings.LayerNorm.weight", 
"decoder.bert.embeddings.LayerNorm.bias", "decoder.bert.encoder.layer.0.attention.self.query.weight", "decoder.bert.encoder.layer.0.attention.self.query.bias", "decoder.bert.encoder.layer.0.attention.self.key.weight", "decoder.bert.encoder.layer.0.attention.self.key.bias", "decoder.bert.encoder.layer.0.attention.self.value.weight", "decoder.bert.encoder.layer.0.attention.self.value.bias", "decoder.bert.encoder.layer.0.attention.output.dense.weight", "decoder.bert.encoder.layer.0.attention.output.dense.bias", "decoder.bert.encoder.layer.0.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.0.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.0.crossattention.self.query.weight", "decoder.bert.encoder.layer.0.crossattention.self.query.bias", "decoder.bert.encoder.layer.0.crossattention.self.key.weight", "decoder.bert.encoder.layer.0.crossattention.self.key.bias", "decoder.bert.encoder.layer.0.crossattention.self.value.weight", "decoder.bert.encoder.layer.0.crossattention.self.value.bias", "decoder.bert.encoder.layer.0.crossattention.output.dense.weight", "decoder.bert.encoder.layer.0.crossattention.output.dense.bias", "decoder.bert.encoder.layer.0.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.0.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.0.intermediate.dense.weight", "decoder.bert.encoder.layer.0.intermediate.dense.bias", "decoder.bert.encoder.layer.0.output.dense.weight", "decoder.bert.encoder.layer.0.output.dense.bias", "decoder.bert.encoder.layer.0.output.LayerNorm.weight", "decoder.bert.encoder.layer.0.output.LayerNorm.bias", "decoder.bert.encoder.layer.1.attention.self.query.weight", "decoder.bert.encoder.layer.1.attention.self.query.bias", "decoder.bert.encoder.layer.1.attention.self.key.weight", "decoder.bert.encoder.layer.1.attention.self.key.bias", "decoder.bert.encoder.layer.1.attention.self.value.weight", "decoder.bert.encoder.layer.1.attention.self.value.bias", 
"decoder.bert.encoder.layer.1.attention.output.dense.weight", "decoder.bert.encoder.layer.1.attention.output.dense.bias", "decoder.bert.encoder.layer.1.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.1.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.1.crossattention.self.query.weight", "decoder.bert.encoder.layer.1.crossattention.self.query.bias", "decoder.bert.encoder.layer.1.crossattention.self.key.weight", "decoder.bert.encoder.layer.1.crossattention.self.key.bias", "decoder.bert.encoder.layer.1.crossattention.self.value.weight", "decoder.bert.encoder.layer.1.crossattention.self.value.bias", "decoder.bert.encoder.layer.1.crossattention.output.dense.weight", "decoder.bert.encoder.layer.1.crossattention.output.dense.bias", "decoder.bert.encoder.layer.1.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.1.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.1.intermediate.dense.weight", "decoder.bert.encoder.layer.1.intermediate.dense.bias", "decoder.bert.encoder.layer.1.output.dense.weight", "decoder.bert.encoder.layer.1.output.dense.bias", "decoder.bert.encoder.layer.1.output.LayerNorm.weight", "decoder.bert.encoder.layer.1.output.LayerNorm.bias", "decoder.bert.encoder.layer.2.attention.self.query.weight", "decoder.bert.encoder.layer.2.attention.self.query.bias", "decoder.bert.encoder.layer.2.attention.self.key.weight", "decoder.bert.encoder.layer.2.attention.self.key.bias", "decoder.bert.encoder.layer.2.attention.self.value.weight", "decoder.bert.encoder.layer.2.attention.self.value.bias", "decoder.bert.encoder.layer.2.attention.output.dense.weight", "decoder.bert.encoder.layer.2.attention.output.dense.bias", "decoder.bert.encoder.layer.2.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.2.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.2.crossattention.self.query.weight", "decoder.bert.encoder.layer.2.crossattention.self.query.bias", 
"decoder.bert.encoder.layer.2.crossattention.self.key.weight", "decoder.bert.encoder.layer.2.crossattention.self.key.bias", "decoder.bert.encoder.layer.2.crossattention.self.value.weight", "decoder.bert.encoder.layer.2.crossattention.self.value.bias", "decoder.bert.encoder.layer.2.crossattention.output.dense.weight", "decoder.bert.encoder.layer.2.crossattention.output.dense.bias", "decoder.bert.encoder.layer.2.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.2.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.2.intermediate.dense.weight", "decoder.bert.encoder.layer.2.intermediate.dense.bias", "decoder.bert.encoder.layer.2.output.dense.weight", "decoder.bert.encoder.layer.2.output.dense.bias", "decoder.bert.encoder.layer.2.output.LayerNorm.weight", "decoder.bert.encoder.layer.2.output.LayerNorm.bias", "decoder.bert.encoder.layer.3.attention.self.query.weight", "decoder.bert.encoder.layer.3.attention.self.query.bias", "decoder.bert.encoder.layer.3.attention.self.key.weight", "decoder.bert.encoder.layer.3.attention.self.key.bias", "decoder.bert.encoder.layer.3.attention.self.value.weight", "decoder.bert.encoder.layer.3.attention.self.value.bias", "decoder.bert.encoder.layer.3.attention.output.dense.weight", "decoder.bert.encoder.layer.3.attention.output.dense.bias", "decoder.bert.encoder.layer.3.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.3.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.3.crossattention.self.query.weight", "decoder.bert.encoder.layer.3.crossattention.self.query.bias", "decoder.bert.encoder.layer.3.crossattention.self.key.weight", "decoder.bert.encoder.layer.3.crossattention.self.key.bias", "decoder.bert.encoder.layer.3.crossattention.self.value.weight", "decoder.bert.encoder.layer.3.crossattention.self.value.bias", "decoder.bert.encoder.layer.3.crossattention.output.dense.weight", "decoder.bert.encoder.layer.3.crossattention.output.dense.bias", 
"decoder.bert.encoder.layer.3.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.3.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.3.intermediate.dense.weight", "decoder.bert.encoder.layer.3.intermediate.dense.bias", "decoder.bert.encoder.layer.3.output.dense.weight", "decoder.bert.encoder.layer.3.output.dense.bias", "decoder.bert.encoder.layer.3.output.LayerNorm.weight", "decoder.bert.encoder.layer.3.output.LayerNorm.bias", "decoder.bert.encoder.layer.4.attention.self.query.weight", "decoder.bert.encoder.layer.4.attention.self.query.bias", "decoder.bert.encoder.layer.4.attention.self.key.weight", "decoder.bert.encoder.layer.4.attention.self.key.bias", "decoder.bert.encoder.layer.4.attention.self.value.weight", "decoder.bert.encoder.layer.4.attention.self.value.bias", "decoder.bert.encoder.layer.4.attention.output.dense.weight", "decoder.bert.encoder.layer.4.attention.output.dense.bias", "decoder.bert.encoder.layer.4.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.4.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.4.crossattention.self.query.weight", "decoder.bert.encoder.layer.4.crossattention.self.query.bias", "decoder.bert.encoder.layer.4.crossattention.self.key.weight", "decoder.bert.encoder.layer.4.crossattention.self.key.bias", "decoder.bert.encoder.layer.4.crossattention.self.value.weight", "decoder.bert.encoder.layer.4.crossattention.self.value.bias", "decoder.bert.encoder.layer.4.crossattention.output.dense.weight", "decoder.bert.encoder.layer.4.crossattention.output.dense.bias", "decoder.bert.encoder.layer.4.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.4.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.4.intermediate.dense.weight", "decoder.bert.encoder.layer.4.intermediate.dense.bias", "decoder.bert.encoder.layer.4.output.dense.weight", "decoder.bert.encoder.layer.4.output.dense.bias", "decoder.bert.encoder.layer.4.output.LayerNorm.weight", 
"decoder.bert.encoder.layer.4.output.LayerNorm.bias", "decoder.bert.encoder.layer.5.attention.self.query.weight", "decoder.bert.encoder.layer.5.attention.self.query.bias", "decoder.bert.encoder.layer.5.attention.self.key.weight", "decoder.bert.encoder.layer.5.attention.self.key.bias", "decoder.bert.encoder.layer.5.attention.self.value.weight", "decoder.bert.encoder.layer.5.attention.self.value.bias", "decoder.bert.encoder.layer.5.attention.output.dense.weight", "decoder.bert.encoder.layer.5.attention.output.dense.bias", "decoder.bert.encoder.layer.5.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.5.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.5.crossattention.self.query.weight", "decoder.bert.encoder.layer.5.crossattention.self.query.bias", "decoder.bert.encoder.layer.5.crossattention.self.key.weight", "decoder.bert.encoder.layer.5.crossattention.self.key.bias", "decoder.bert.encoder.layer.5.crossattention.self.value.weight", "decoder.bert.encoder.layer.5.crossattention.self.value.bias", "decoder.bert.encoder.layer.5.crossattention.output.dense.weight", "decoder.bert.encoder.layer.5.crossattention.output.dense.bias", "decoder.bert.encoder.layer.5.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.5.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.5.intermediate.dense.weight", "decoder.bert.encoder.layer.5.intermediate.dense.bias", "decoder.bert.encoder.layer.5.output.dense.weight", "decoder.bert.encoder.layer.5.output.dense.bias", "decoder.bert.encoder.layer.5.output.LayerNorm.weight", "decoder.bert.encoder.layer.5.output.LayerNorm.bias", "decoder.bert.encoder.layer.6.attention.self.query.weight", "decoder.bert.encoder.layer.6.attention.self.query.bias", "decoder.bert.encoder.layer.6.attention.self.key.weight", "decoder.bert.encoder.layer.6.attention.self.key.bias", "decoder.bert.encoder.layer.6.attention.self.value.weight", "decoder.bert.encoder.layer.6.attention.self.value.bias", 
"decoder.bert.encoder.layer.6.attention.output.dense.weight", "decoder.bert.encoder.layer.6.attention.output.dense.bias", "decoder.bert.encoder.layer.6.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.6.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.6.crossattention.self.query.weight", "decoder.bert.encoder.layer.6.crossattention.self.query.bias", "decoder.bert.encoder.layer.6.crossattention.self.key.weight", "decoder.bert.encoder.layer.6.crossattention.self.key.bias", "decoder.bert.encoder.layer.6.crossattention.self.value.weight", "decoder.bert.encoder.layer.6.crossattention.self.value.bias", "decoder.bert.encoder.layer.6.crossattention.output.dense.weight", "decoder.bert.encoder.layer.6.crossattention.output.dense.bias", "decoder.bert.encoder.layer.6.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.6.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.6.intermediate.dense.weight", "decoder.bert.encoder.layer.6.intermediate.dense.bias", "decoder.bert.encoder.layer.6.output.dense.weight", "decoder.bert.encoder.layer.6.output.dense.bias", "decoder.bert.encoder.layer.6.output.LayerNorm.weight", "decoder.bert.encoder.layer.6.output.LayerNorm.bias", "decoder.bert.encoder.layer.7.attention.self.query.weight", "decoder.bert.encoder.layer.7.attention.self.query.bias", "decoder.bert.encoder.layer.7.attention.self.key.weight", "decoder.bert.encoder.layer.7.attention.self.key.bias", "decoder.bert.encoder.layer.7.attention.self.value.weight", "decoder.bert.encoder.layer.7.attention.self.value.bias", "decoder.bert.encoder.layer.7.attention.output.dense.weight", "decoder.bert.encoder.layer.7.attention.output.dense.bias", "decoder.bert.encoder.layer.7.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.7.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.7.crossattention.self.query.weight", "decoder.bert.encoder.layer.7.crossattention.self.query.bias", 
"decoder.bert.encoder.layer.7.crossattention.self.key.weight", "decoder.bert.encoder.layer.7.crossattention.self.key.bias", "decoder.bert.encoder.layer.7.crossattention.self.value.weight", "decoder.bert.encoder.layer.7.crossattention.self.value.bias", "decoder.bert.encoder.layer.7.crossattention.output.dense.weight", "decoder.bert.encoder.layer.7.crossattention.output.dense.bias", "decoder.bert.encoder.layer.7.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.7.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.7.intermediate.dense.weight", "decoder.bert.encoder.layer.7.intermediate.dense.bias", "decoder.bert.encoder.layer.7.output.dense.weight", "decoder.bert.encoder.layer.7.output.dense.bias", "decoder.bert.encoder.layer.7.output.LayerNorm.weight", "decoder.bert.encoder.layer.7.output.LayerNorm.bias", "decoder.bert.encoder.layer.8.attention.self.query.weight", "decoder.bert.encoder.layer.8.attention.self.query.bias", "decoder.bert.encoder.layer.8.attention.self.key.weight", "decoder.bert.encoder.layer.8.attention.self.key.bias", "decoder.bert.encoder.layer.8.attention.self.value.weight", "decoder.bert.encoder.layer.8.attention.self.value.bias", "decoder.bert.encoder.layer.8.attention.output.dense.weight", "decoder.bert.encoder.layer.8.attention.output.dense.bias", "decoder.bert.encoder.layer.8.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.8.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.8.crossattention.self.query.weight", "decoder.bert.encoder.layer.8.crossattention.self.query.bias", "decoder.bert.encoder.layer.8.crossattention.self.key.weight", "decoder.bert.encoder.layer.8.crossattention.self.key.bias", "decoder.bert.encoder.layer.8.crossattention.self.value.weight", "decoder.bert.encoder.layer.8.crossattention.self.value.bias", "decoder.bert.encoder.layer.8.crossattention.output.dense.weight", "decoder.bert.encoder.layer.8.crossattention.output.dense.bias", 
"decoder.bert.encoder.layer.8.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.8.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.8.intermediate.dense.weight", "decoder.bert.encoder.layer.8.intermediate.dense.bias", "decoder.bert.encoder.layer.8.output.dense.weight", "decoder.bert.encoder.layer.8.output.dense.bias", "decoder.bert.encoder.layer.8.output.LayerNorm.weight", "decoder.bert.encoder.layer.8.output.LayerNorm.bias", "decoder.bert.encoder.layer.9.attention.self.query.weight", "decoder.bert.encoder.layer.9.attention.self.query.bias", "decoder.bert.encoder.layer.9.attention.self.key.weight", "decoder.bert.encoder.layer.9.attention.self.key.bias", "decoder.bert.encoder.layer.9.attention.self.value.weight", "decoder.bert.encoder.layer.9.attention.self.value.bias", "decoder.bert.encoder.layer.9.attention.output.dense.weight", "decoder.bert.encoder.layer.9.attention.output.dense.bias", "decoder.bert.encoder.layer.9.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.9.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.9.crossattention.self.query.weight", "decoder.bert.encoder.layer.9.crossattention.self.query.bias", "decoder.bert.encoder.layer.9.crossattention.self.key.weight", "decoder.bert.encoder.layer.9.crossattention.self.key.bias", "decoder.bert.encoder.layer.9.crossattention.self.value.weight", "decoder.bert.encoder.layer.9.crossattention.self.value.bias", "decoder.bert.encoder.layer.9.crossattention.output.dense.weight", "decoder.bert.encoder.layer.9.crossattention.output.dense.bias", "decoder.bert.encoder.layer.9.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.9.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.9.intermediate.dense.weight", "decoder.bert.encoder.layer.9.intermediate.dense.bias", "decoder.bert.encoder.layer.9.output.dense.weight", "decoder.bert.encoder.layer.9.output.dense.bias", "decoder.bert.encoder.layer.9.output.LayerNorm.weight", 
"decoder.bert.encoder.layer.9.output.LayerNorm.bias", "decoder.bert.encoder.layer.10.attention.self.query.weight", "decoder.bert.encoder.layer.10.attention.self.query.bias", "decoder.bert.encoder.layer.10.attention.self.key.weight", "decoder.bert.encoder.layer.10.attention.self.key.bias", "decoder.bert.encoder.layer.10.attention.self.value.weight", "decoder.bert.encoder.layer.10.attention.self.value.bias", "decoder.bert.encoder.layer.10.attention.output.dense.weight", "decoder.bert.encoder.layer.10.attention.output.dense.bias", "decoder.bert.encoder.layer.10.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.10.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.10.crossattention.self.query.weight", "decoder.bert.encoder.layer.10.crossattention.self.query.bias", "decoder.bert.encoder.layer.10.crossattention.self.key.weight", "decoder.bert.encoder.layer.10.crossattention.self.key.bias", "decoder.bert.encoder.layer.10.crossattention.self.value.weight", "decoder.bert.encoder.layer.10.crossattention.self.value.bias", "decoder.bert.encoder.layer.10.crossattention.output.dense.weight", "decoder.bert.encoder.layer.10.crossattention.output.dense.bias", "decoder.bert.encoder.layer.10.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.10.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.10.intermediate.dense.weight", "decoder.bert.encoder.layer.10.intermediate.dense.bias", "decoder.bert.encoder.layer.10.output.dense.weight", "decoder.bert.encoder.layer.10.output.dense.bias", "decoder.bert.encoder.layer.10.output.LayerNorm.weight", "decoder.bert.encoder.layer.10.output.LayerNorm.bias", "decoder.bert.encoder.layer.11.attention.self.query.weight", "decoder.bert.encoder.layer.11.attention.self.query.bias", "decoder.bert.encoder.layer.11.attention.self.key.weight", "decoder.bert.encoder.layer.11.attention.self.key.bias", "decoder.bert.encoder.layer.11.attention.self.value.weight", 
"decoder.bert.encoder.layer.11.attention.self.value.bias", "decoder.bert.encoder.layer.11.attention.output.dense.weight", "decoder.bert.encoder.layer.11.attention.output.dense.bias", "decoder.bert.encoder.layer.11.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.11.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.11.crossattention.self.query.weight", "decoder.bert.encoder.layer.11.crossattention.self.query.bias", "decoder.bert.encoder.layer.11.crossattention.self.key.weight", "decoder.bert.encoder.layer.11.crossattention.self.key.bias", "decoder.bert.encoder.layer.11.crossattention.self.value.weight", "decoder.bert.encoder.layer.11.crossattention.self.value.bias", "decoder.bert.encoder.layer.11.crossattention.output.dense.weight", "decoder.bert.encoder.layer.11.crossattention.output.dense.bias", "decoder.bert.encoder.layer.11.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.11.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.11.intermediate.dense.weight", "decoder.bert.encoder.layer.11.intermediate.dense.bias", "decoder.bert.encoder.layer.11.output.dense.weight", "decoder.bert.encoder.layer.11.output.dense.bias", "decoder.bert.encoder.layer.11.output.LayerNorm.weight", "decoder.bert.encoder.layer.11.output.LayerNorm.bias", "decoder.bert.pooler.dense.weight", "decoder.bert.pooler.dense.bias", "decoder.cls.predictions.bias", "decoder.cls.predictions.transform.dense.weight", "decoder.cls.predictions.transform.dense.bias", "decoder.cls.predictions.transform.LayerNorm.weight", "decoder.cls.predictions.transform.LayerNorm.bias", "decoder.cls.predictions.decoder.weight", "decoder.cls.predictions.decoder.bias".

Question about model parameter size

I am interested in implementing gradient checkpointing to support training a DialogBERT-XL. How much effort would it take to modify DialogBERT to support a parameter count equivalent to GPT2-XL?

Thanks in advance!

opened by pablogranolabar 7

Could you please share the script for preprocessing the original dialogues?

Hi, I noticed the code was updated 15 days ago.

I would like to use this model on a brand new dialogue dataset. I noticed that the data/ directory contains h5 files such as dailydialog/train.h5. I also downloaded the original DailyDialog dataset, but I do not know how to convert it into train.h5.

Could you please share the related script or source code? Thank you very much.

opened by frankdarkluo 3

Issue with V100 Distributed Training

I have the following distributed training setup working without issue on a Tesla K80, but whenever I attempt the same thing on an 8x V100 machine, the training process silently hangs without dispatching any process to any of the GPUs:

export MASTER_PORT=29500
export MASTER_ADDR="127.0.0.1"
export WORLD_SIZE=8
export RANK=0
python3 main.py --model_size=large --per_gpu_train_batch_size=128 --local_rank 0

What's weird is that training works fine on a single GPU if I drop the --local_rank flag. While the process is just hanging, nothing is being dispatched to any of the GPUs:

$ sudo nvidia-smi
Sat May 15 21:53:13 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.04   Driver Version: 450.119.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB      On   | 00000000:10:1C.0 Off |                    0 |
| N/A   52C    P0    60W / 400W |      3MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  A100-SXM4-40GB      On   | 00000000:10:1D.0 Off |                    0 |
| N/A   47C    P0    56W / 400W |      3MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  A100-SXM4-40GB      On   | 00000000:20:1C.0 Off |                    0 |
| N/A   49C    P0    56W / 400W |      3MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  A100-SXM4-40GB      On   | 00000000:20:1D.0 Off |                    0 |
| N/A   45C    P0    55W / 400W |      3MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   4  A100-SXM4-40GB      On   | 00000000:90:1C.0 Off |                    0 |
| N/A   50C    P0    56W / 400W |      3MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   5  A100-SXM4-40GB      On   | 00000000:90:1D.0 Off |                    0 |
| N/A   45C    P0    56W / 400W |      3MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   6  A100-SXM4-40GB      On   | 00000000:A0:1C.0 Off |                    0 |
| N/A   52C    P0    60W / 400W |      3MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   7  A100-SXM4-40GB      On   | 00000000:A0:1D.0 Off |                    0 |
| N/A   48C    P0    57W / 400W |      3MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Any ideas?
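
One possible cause, offered as a guess rather than a diagnosis: if main.py initializes torch.distributed from these environment variables, then WORLD_SIZE=8 with a single process started as RANK=0 means initialization waits for seven other ranks that are never launched, which would look like a silent hang. The sketch below only builds one environment and command per GPU; the helper names (build_launch_plan, launch) are hypothetical, and it assumes main.py accepts the flags shown in the command above.

```python
import os
import subprocess


def build_launch_plan(num_gpus, script="main.py", extra_args=(),
                      master_addr="127.0.0.1", master_port=29500):
    """Return one (env, command) pair per GPU, each with its own RANK."""
    plan = []
    for rank in range(num_gpus):
        env = {
            "MASTER_ADDR": master_addr,
            "MASTER_PORT": str(master_port),
            "WORLD_SIZE": str(num_gpus),  # total number of processes
            "RANK": str(rank),            # unique per process
        }
        cmd = ["python3", script, *extra_args, "--local_rank", str(rank)]
        plan.append((env, cmd))
    return plan


def launch(plan):
    """Start every process in the plan and wait for all of them to exit."""
    procs = [subprocess.Popen(cmd, env={**os.environ, **env})
             for env, cmd in plan]
    return [p.wait() for p in procs]


# Build (but do not start) a plan for 8 GPUs with the flags from this issue.
plan = build_launch_plan(
    8, extra_args=["--model_size=large", "--per_gpu_train_batch_size=128"])
```

Calling launch(plan) would start the eight processes. PyTorch also ships its own launcher utilities that set these variables for each process, which may be simpler if the script is compatible with them.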

opened by pablogranolabar 3

Could you please share your parsed data or codes for preprocessing?

I noticed that main.py loads ./data/dailydialog/train.h5. I also downloaded the original DailyDialog dataset, but I have no idea how to convert it into train.h5.

Could you please help?

opened by eefaan 3

DataLoader Function

Hi Xiaodong,

Thanks for sharing the source code.

I have a question regarding the data_loader function. Is there a reason the mini-batches are created by adding the following inputs?

self.cls_utt = [tokenizer.cls_token_id, tokenizer.cls_token_id, tokenizer.sep_token_id]
self.sep_utt = [tokenizer.cls_token_id, tokenizer.sep_token_id, tokenizer.sep_token_id]

The resulting output would be like: [[101, 101, 102], [contexts], [101, 102, 102]].
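
To make the question concrete, here is a minimal sketch of the wrapping described above. It is not the repository's code: wrap_context is a hypothetical helper, the ids 101 and 102 are the [CLS] and [SEP] ids of bert-base-uncased, and the utterance ids in the example are arbitrary placeholders.

```python
CLS_ID = 101  # [CLS] in bert-base-uncased
SEP_ID = 102  # [SEP] in bert-base-uncased


def wrap_context(utterances, cls_id=CLS_ID, sep_id=SEP_ID):
    """Put a sentinel utterance before and after a list of tokenized utterances."""
    cls_utt = [cls_id, cls_id, sep_id]  # leading sentinel utterance
    sep_utt = [cls_id, sep_id, sep_id]  # trailing sentinel utterance
    return [cls_utt] + [list(u) for u in utterances] + [sep_utt]


# Two placeholder utterances, each already wrapped in [CLS] ... [SEP].
context = [[101, 2001, 2002, 102], [101, 2003, 102]]
wrapped = wrap_context(context)
```

One possible reading, which only the author can confirm, is that the two sentinels play the role of [CLS] and [SEP] at the utterance level, so that the context encoder sees a dialogue framed the same way a token sequence is.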

Best,

Dong

opened by dongqian0206 2

Data processing

Thanks for your great work! I'd like to apply your method and model to a brand new dataset, but I have no idea how to preprocess our dataset into the required format. Could you release the data preprocessing script? It would be of great help!

opened by TingchenFu 1

Model not converging?

Using the standard main.py training loop, I've been training the tiny model on a V100 for almost a week and training still has not stopped. Is additional hyperparameter tuning needed even to run the tiny training process?

python3 main.py --model_size=tiny --per_gpu_train_batch_size=24

avg_len = 12.61646884272997
bleu = 0.03122757749152926
meteor = 0.039703799201764936
nist = 0.12024726693793758
perplexity = 116.93566131591797
rouge-L = 0.05778559382996833
valid_loss = 4.761623978844736
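
As a sanity check on these numbers, the reported perplexity agrees with exp(valid_loss), which is what one would expect if perplexity is computed as the exponential of the mean cross-entropy. That relationship is an assumption about the evaluation code, and perplexity_from_loss is a hypothetical helper name.

```python
import math


def perplexity_from_loss(mean_cross_entropy):
    """Perplexity as the exponential of a mean cross-entropy loss (in nats)."""
    return math.exp(mean_cross_entropy)


valid_loss = 4.761623978844736            # from the log above
reported_perplexity = 116.93566131591797  # from the log above

computed = perplexity_from_loss(valid_loss)
print(f"computed={computed:.4f} reported={reported_perplexity:.4f}")
```

So the two metrics carry the same information here; a validation loss that plateaus near 4.76 implies the perplexity will stay near 117.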

Can you share what your final numbers were after training tiny and small?

opened by pablogranolabar 1

Difficulty replicating results of the paper

I am training on the DailyDialog dataset with the same hyperparameters as described in the paper. I cannot seem to get the model to perform to the standards described in the paper, specifically the BLEU score for the testing data is half the reported value. In addition, looking at the generated text for the testing dataset shows that the model is generating responses that have little to do with the actual context. Are there any solutions to this?

opened by anthonycou 3

Reproducing results from the paper and hyperparameters

Hi,

I'm trying to reproduce the results you reported in the paper and am unable to do so with the current set of hyperparameters. One notable problem is with per_gpu_eval_batch_size=1. Keeping it as is makes evaluation take a long time, but when I set it to a value > 1, the code breaks. I suspect this has something to do with the generate method of the DialogBERT class. Here, for example:

generated = torch.zeros((num_samples,1), dtype=torch.long, device=device).fill_(self.tokenizer.cls_token_id) # [batch_sz x 1] (1=seq_len)

Is num_samples being used as batch_sz here? I'm wondering whether this is intended or a typo, because when I change num_samples to batch_sz for the generated tokens, the code runs. However, the generated text then does not seem to match the context it was generated from.
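
For reference, here is a small shape sketch of what batched sampling would need, written with plain lists so it runs without PyTorch. It is not the repository's generate method: init_generation is a hypothetical helper, and it assumes each of the batch_sz contexts should produce num_samples responses, so the start tokens need batch_sz * num_samples rows and each context has to be repeated to match.

```python
def init_generation(context_ids, num_samples, cls_token_id=101):
    """Repeat each context num_samples times and create one [CLS] start row per copy.

    Row i * num_samples + j of both outputs belongs to sample j of context i.
    """
    repeated = [list(ctx) for ctx in context_ids for _ in range(num_samples)]
    generated = [[cls_token_id] for _ in repeated]
    return repeated, generated


# Two placeholder contexts, three samples each: six rows in total.
contexts = [[11, 12], [21, 22]]
repeated, generated = init_generation(contexts, num_samples=3)
```

If the start tokens have batch_sz rows while the contexts are not repeated in the same order, or the other way around, rows can end up paired with the wrong context, which would be consistent with responses that do not match their context.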

Could you please share the hyperparameters you used and help solve the per_gpu_eval_batch_size=1 problem?

Thanks

opened by paul-ruban 5

can't load pretrained model

self.context_mlm_trans and self.context_order_trans are expecting a different key-structure

RuntimeError: Error(s) in loading state_dict for BertPredictionHeadTransform: Missing key(s) in state_dict: "dense.weight", "dense.bias", "LayerNorm.weight", "LayerNorm.bias". Unexpected key(s) in state_dict: "utt_encoder.bert.embeddings.position_ids", "utt_encoder.bert.embeddings.word_embeddings.weight", "utt_encoder.bert.embeddings.position_embeddings.weight", "utt_encoder.bert.embeddings.token_type_embeddings.weight", "utt_encoder.bert.embeddings.LayerNorm.weight", "utt_encoder.bert.embeddings.LayerNorm.bias", "utt_encoder.bert.encoder.layer.0.attention.self.query.weight", "utt_encoder.bert.encoder.layer.0.attention.self.query.bias", "utt_encoder.bert.encoder.layer.0.attention.self.key.weight", "utt_encoder.bert.encoder.layer.0.attention.self.key.bias", "utt_encoder.bert.encoder.layer.0.attention.self.value.weight", "utt_encoder.bert.encoder.layer.0.attention.self.value.bias", "utt_encoder.bert.encoder.layer.0.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.0.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.0.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.0.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.0.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.0.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.0.output.dense.weight", "utt_encoder.bert.encoder.layer.0.output.dense.bias", "utt_encoder.bert.encoder.layer.0.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.0.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.1.attention.self.query.weight", "utt_encoder.bert.encoder.layer.1.attention.self.query.bias", "utt_encoder.bert.encoder.layer.1.attention.self.key.weight", "utt_encoder.bert.encoder.layer.1.attention.self.key.bias", "utt_encoder.bert.encoder.layer.1.attention.self.value.weight", "utt_encoder.bert.encoder.layer.1.attention.self.value.bias", "utt_encoder.bert.encoder.layer.1.attention.output.dense.weight", 
"utt_encoder.bert.encoder.layer.1.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.1.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.1.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.1.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.1.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.1.output.dense.weight", "utt_encoder.bert.encoder.layer.1.output.dense.bias", "utt_encoder.bert.encoder.layer.1.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.1.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.2.attention.self.query.weight", "utt_encoder.bert.encoder.layer.2.attention.self.query.bias", "utt_encoder.bert.encoder.layer.2.attention.self.key.weight", "utt_encoder.bert.encoder.layer.2.attention.self.key.bias", "utt_encoder.bert.encoder.layer.2.attention.self.value.weight", "utt_encoder.bert.encoder.layer.2.attention.self.value.bias", "utt_encoder.bert.encoder.layer.2.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.2.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.2.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.2.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.2.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.2.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.2.output.dense.weight", "utt_encoder.bert.encoder.layer.2.output.dense.bias", "utt_encoder.bert.encoder.layer.2.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.2.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.3.attention.self.query.weight", "utt_encoder.bert.encoder.layer.3.attention.self.query.bias", "utt_encoder.bert.encoder.layer.3.attention.self.key.weight", "utt_encoder.bert.encoder.layer.3.attention.self.key.bias", "utt_encoder.bert.encoder.layer.3.attention.self.value.weight", "utt_encoder.bert.encoder.layer.3.attention.self.value.bias", "utt_encoder.bert.encoder.layer.3.attention.output.dense.weight", 
"utt_encoder.bert.encoder.layer.3.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.3.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.3.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.3.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.3.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.3.output.dense.weight", "utt_encoder.bert.encoder.layer.3.output.dense.bias", "utt_encoder.bert.encoder.layer.3.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.3.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.4.attention.self.query.weight", "utt_encoder.bert.encoder.layer.4.attention.self.query.bias", "utt_encoder.bert.encoder.layer.4.attention.self.key.weight", "utt_encoder.bert.encoder.layer.4.attention.self.key.bias", "utt_encoder.bert.encoder.layer.4.attention.self.value.weight", "utt_encoder.bert.encoder.layer.4.attention.self.value.bias", "utt_encoder.bert.encoder.layer.4.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.4.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.4.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.4.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.4.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.4.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.4.output.dense.weight", "utt_encoder.bert.encoder.layer.4.output.dense.bias", "utt_encoder.bert.encoder.layer.4.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.4.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.5.attention.self.query.weight", "utt_encoder.bert.encoder.layer.5.attention.self.query.bias", "utt_encoder.bert.encoder.layer.5.attention.self.key.weight", "utt_encoder.bert.encoder.layer.5.attention.self.key.bias", "utt_encoder.bert.encoder.layer.5.attention.self.value.weight", "utt_encoder.bert.encoder.layer.5.attention.self.value.bias", "utt_encoder.bert.encoder.layer.5.attention.output.dense.weight", 
"utt_encoder.bert.encoder.layer.5.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.5.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.5.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.5.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.5.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.5.output.dense.weight", "utt_encoder.bert.encoder.layer.5.output.dense.bias", "utt_encoder.bert.encoder.layer.5.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.5.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.6.attention.self.query.weight", "utt_encoder.bert.encoder.layer.6.attention.self.query.bias", "utt_encoder.bert.encoder.layer.6.attention.self.key.weight", "utt_encoder.bert.encoder.layer.6.attention.self.key.bias", "utt_encoder.bert.encoder.layer.6.attention.self.value.weight", "utt_encoder.bert.encoder.layer.6.attention.self.value.bias", "utt_encoder.bert.encoder.layer.6.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.6.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.6.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.6.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.6.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.6.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.6.output.dense.weight", "utt_encoder.bert.encoder.layer.6.output.dense.bias", "utt_encoder.bert.encoder.layer.6.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.6.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.7.attention.self.query.weight", "utt_encoder.bert.encoder.layer.7.attention.self.query.bias", "utt_encoder.bert.encoder.layer.7.attention.self.key.weight", "utt_encoder.bert.encoder.layer.7.attention.self.key.bias", "utt_encoder.bert.encoder.layer.7.attention.self.value.weight", "utt_encoder.bert.encoder.layer.7.attention.self.value.bias", "utt_encoder.bert.encoder.layer.7.attention.output.dense.weight", 
"utt_encoder.bert.encoder.layer.7.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.7.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.7.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.7.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.7.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.7.output.dense.weight", "utt_encoder.bert.encoder.layer.7.output.dense.bias", "utt_encoder.bert.encoder.layer.7.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.7.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.8.attention.self.query.weight", "utt_encoder.bert.encoder.layer.8.attention.self.query.bias", "utt_encoder.bert.encoder.layer.8.attention.self.key.weight", "utt_encoder.bert.encoder.layer.8.attention.self.key.bias", "utt_encoder.bert.encoder.layer.8.attention.self.value.weight", "utt_encoder.bert.encoder.layer.8.attention.self.value.bias", "utt_encoder.bert.encoder.layer.8.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.8.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.8.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.8.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.8.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.8.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.8.output.dense.weight", "utt_encoder.bert.encoder.layer.8.output.dense.bias", "utt_encoder.bert.encoder.layer.8.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.8.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.9.attention.self.query.weight", "utt_encoder.bert.encoder.layer.9.attention.self.query.bias", "utt_encoder.bert.encoder.layer.9.attention.self.key.weight", "utt_encoder.bert.encoder.layer.9.attention.self.key.bias", "utt_encoder.bert.encoder.layer.9.attention.self.value.weight", "utt_encoder.bert.encoder.layer.9.attention.self.value.bias", "utt_encoder.bert.encoder.layer.9.attention.output.dense.weight", 
"utt_encoder.bert.encoder.layer.9.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.9.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.9.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.9.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.9.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.9.output.dense.weight", "utt_encoder.bert.encoder.layer.9.output.dense.bias", "utt_encoder.bert.encoder.layer.9.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.9.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.10.attention.self.query.weight", "utt_encoder.bert.encoder.layer.10.attention.self.query.bias", "utt_encoder.bert.encoder.layer.10.attention.self.key.weight", "utt_encoder.bert.encoder.layer.10.attention.self.key.bias", "utt_encoder.bert.encoder.layer.10.attention.self.value.weight", "utt_encoder.bert.encoder.layer.10.attention.self.value.bias", "utt_encoder.bert.encoder.layer.10.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.10.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.10.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.10.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.10.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.10.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.10.output.dense.weight", "utt_encoder.bert.encoder.layer.10.output.dense.bias", "utt_encoder.bert.encoder.layer.10.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.10.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.11.attention.self.query.weight", "utt_encoder.bert.encoder.layer.11.attention.self.query.bias", "utt_encoder.bert.encoder.layer.11.attention.self.key.weight", "utt_encoder.bert.encoder.layer.11.attention.self.key.bias", "utt_encoder.bert.encoder.layer.11.attention.self.value.weight", "utt_encoder.bert.encoder.layer.11.attention.self.value.bias", 
"utt_encoder.bert.encoder.layer.11.attention.output.dense.weight", "utt_encoder.bert.encoder.layer.11.attention.output.dense.bias", "utt_encoder.bert.encoder.layer.11.attention.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.11.attention.output.LayerNorm.bias", "utt_encoder.bert.encoder.layer.11.intermediate.dense.weight", "utt_encoder.bert.encoder.layer.11.intermediate.dense.bias", "utt_encoder.bert.encoder.layer.11.output.dense.weight", "utt_encoder.bert.encoder.layer.11.output.dense.bias", "utt_encoder.bert.encoder.layer.11.output.LayerNorm.weight", "utt_encoder.bert.encoder.layer.11.output.LayerNorm.bias", "utt_encoder.bert.pooler.dense.weight", "utt_encoder.bert.pooler.dense.bias", "utt_encoder.cls.predictions.bias", "utt_encoder.cls.predictions.transform.dense.weight", "utt_encoder.cls.predictions.transform.dense.bias", "utt_encoder.cls.predictions.transform.LayerNorm.weight", "utt_encoder.cls.predictions.transform.LayerNorm.bias", "utt_encoder.cls.predictions.decoder.weight", "utt_encoder.cls.predictions.decoder.bias", "utt_encoder.cls.seq_relationship.weight", "utt_encoder.cls.seq_relationship.bias", "context_encoder.embeddings.position_ids", "context_encoder.embeddings.word_embeddings.weight", "context_encoder.embeddings.position_embeddings.weight", "context_encoder.embeddings.token_type_embeddings.weight", "context_encoder.embeddings.LayerNorm.weight", "context_encoder.embeddings.LayerNorm.bias", "context_encoder.encoder.layer.0.attention.self.query.weight", "context_encoder.encoder.layer.0.attention.self.query.bias", "context_encoder.encoder.layer.0.attention.self.key.weight", "context_encoder.encoder.layer.0.attention.self.key.bias", "context_encoder.encoder.layer.0.attention.self.value.weight", "context_encoder.encoder.layer.0.attention.self.value.bias", "context_encoder.encoder.layer.0.attention.output.dense.weight", "context_encoder.encoder.layer.0.attention.output.dense.bias", 
"context_encoder.encoder.layer.0.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.0.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.0.intermediate.dense.weight", "context_encoder.encoder.layer.0.intermediate.dense.bias", "context_encoder.encoder.layer.0.output.dense.weight", "context_encoder.encoder.layer.0.output.dense.bias", "context_encoder.encoder.layer.0.output.LayerNorm.weight", "context_encoder.encoder.layer.0.output.LayerNorm.bias", "context_encoder.encoder.layer.1.attention.self.query.weight", "context_encoder.encoder.layer.1.attention.self.query.bias", "context_encoder.encoder.layer.1.attention.self.key.weight", "context_encoder.encoder.layer.1.attention.self.key.bias", "context_encoder.encoder.layer.1.attention.self.value.weight", "context_encoder.encoder.layer.1.attention.self.value.bias", "context_encoder.encoder.layer.1.attention.output.dense.weight", "context_encoder.encoder.layer.1.attention.output.dense.bias", "context_encoder.encoder.layer.1.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.1.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.1.intermediate.dense.weight", "context_encoder.encoder.layer.1.intermediate.dense.bias", "context_encoder.encoder.layer.1.output.dense.weight", "context_encoder.encoder.layer.1.output.dense.bias", "context_encoder.encoder.layer.1.output.LayerNorm.weight", "context_encoder.encoder.layer.1.output.LayerNorm.bias", "context_encoder.encoder.layer.2.attention.self.query.weight", "context_encoder.encoder.layer.2.attention.self.query.bias", "context_encoder.encoder.layer.2.attention.self.key.weight", "context_encoder.encoder.layer.2.attention.self.key.bias", "context_encoder.encoder.layer.2.attention.self.value.weight", "context_encoder.encoder.layer.2.attention.self.value.bias", "context_encoder.encoder.layer.2.attention.output.dense.weight", "context_encoder.encoder.layer.2.attention.output.dense.bias", 
"context_encoder.encoder.layer.2.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.2.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.2.intermediate.dense.weight", "context_encoder.encoder.layer.2.intermediate.dense.bias", "context_encoder.encoder.layer.2.output.dense.weight", "context_encoder.encoder.layer.2.output.dense.bias", "context_encoder.encoder.layer.2.output.LayerNorm.weight", "context_encoder.encoder.layer.2.output.LayerNorm.bias", "context_encoder.encoder.layer.3.attention.self.query.weight", "context_encoder.encoder.layer.3.attention.self.query.bias", "context_encoder.encoder.layer.3.attention.self.key.weight", "context_encoder.encoder.layer.3.attention.self.key.bias", "context_encoder.encoder.layer.3.attention.self.value.weight", "context_encoder.encoder.layer.3.attention.self.value.bias", "context_encoder.encoder.layer.3.attention.output.dense.weight", "context_encoder.encoder.layer.3.attention.output.dense.bias", "context_encoder.encoder.layer.3.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.3.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.3.intermediate.dense.weight", "context_encoder.encoder.layer.3.intermediate.dense.bias", "context_encoder.encoder.layer.3.output.dense.weight", "context_encoder.encoder.layer.3.output.dense.bias", "context_encoder.encoder.layer.3.output.LayerNorm.weight", "context_encoder.encoder.layer.3.output.LayerNorm.bias", "context_encoder.encoder.layer.4.attention.self.query.weight", "context_encoder.encoder.layer.4.attention.self.query.bias", "context_encoder.encoder.layer.4.attention.self.key.weight", "context_encoder.encoder.layer.4.attention.self.key.bias", "context_encoder.encoder.layer.4.attention.self.value.weight", "context_encoder.encoder.layer.4.attention.self.value.bias", "context_encoder.encoder.layer.4.attention.output.dense.weight", "context_encoder.encoder.layer.4.attention.output.dense.bias", 
"context_encoder.encoder.layer.4.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.4.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.4.intermediate.dense.weight", "context_encoder.encoder.layer.4.intermediate.dense.bias", "context_encoder.encoder.layer.4.output.dense.weight", "context_encoder.encoder.layer.4.output.dense.bias", "context_encoder.encoder.layer.4.output.LayerNorm.weight", "context_encoder.encoder.layer.4.output.LayerNorm.bias", "context_encoder.encoder.layer.5.attention.self.query.weight", "context_encoder.encoder.layer.5.attention.self.query.bias", "context_encoder.encoder.layer.5.attention.self.key.weight", "context_encoder.encoder.layer.5.attention.self.key.bias", "context_encoder.encoder.layer.5.attention.self.value.weight", "context_encoder.encoder.layer.5.attention.self.value.bias", "context_encoder.encoder.layer.5.attention.output.dense.weight", "context_encoder.encoder.layer.5.attention.output.dense.bias", "context_encoder.encoder.layer.5.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.5.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.5.intermediate.dense.weight", "context_encoder.encoder.layer.5.intermediate.dense.bias", "context_encoder.encoder.layer.5.output.dense.weight", "context_encoder.encoder.layer.5.output.dense.bias", "context_encoder.encoder.layer.5.output.LayerNorm.weight", "context_encoder.encoder.layer.5.output.LayerNorm.bias", "context_encoder.encoder.layer.6.attention.self.query.weight", "context_encoder.encoder.layer.6.attention.self.query.bias", "context_encoder.encoder.layer.6.attention.self.key.weight", "context_encoder.encoder.layer.6.attention.self.key.bias", "context_encoder.encoder.layer.6.attention.self.value.weight", "context_encoder.encoder.layer.6.attention.self.value.bias", "context_encoder.encoder.layer.6.attention.output.dense.weight", "context_encoder.encoder.layer.6.attention.output.dense.bias", 
"context_encoder.encoder.layer.6.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.6.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.6.intermediate.dense.weight", "context_encoder.encoder.layer.6.intermediate.dense.bias", "context_encoder.encoder.layer.6.output.dense.weight", "context_encoder.encoder.layer.6.output.dense.bias", "context_encoder.encoder.layer.6.output.LayerNorm.weight", "context_encoder.encoder.layer.6.output.LayerNorm.bias", "context_encoder.encoder.layer.7.attention.self.query.weight", "context_encoder.encoder.layer.7.attention.self.query.bias", "context_encoder.encoder.layer.7.attention.self.key.weight", "context_encoder.encoder.layer.7.attention.self.key.bias", "context_encoder.encoder.layer.7.attention.self.value.weight", "context_encoder.encoder.layer.7.attention.self.value.bias", "context_encoder.encoder.layer.7.attention.output.dense.weight", "context_encoder.encoder.layer.7.attention.output.dense.bias", "context_encoder.encoder.layer.7.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.7.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.7.intermediate.dense.weight", "context_encoder.encoder.layer.7.intermediate.dense.bias", "context_encoder.encoder.layer.7.output.dense.weight", "context_encoder.encoder.layer.7.output.dense.bias", "context_encoder.encoder.layer.7.output.LayerNorm.weight", "context_encoder.encoder.layer.7.output.LayerNorm.bias", "context_encoder.encoder.layer.8.attention.self.query.weight", "context_encoder.encoder.layer.8.attention.self.query.bias", "context_encoder.encoder.layer.8.attention.self.key.weight", "context_encoder.encoder.layer.8.attention.self.key.bias", "context_encoder.encoder.layer.8.attention.self.value.weight", "context_encoder.encoder.layer.8.attention.self.value.bias", "context_encoder.encoder.layer.8.attention.output.dense.weight", "context_encoder.encoder.layer.8.attention.output.dense.bias", 
"context_encoder.encoder.layer.8.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.8.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.8.intermediate.dense.weight", "context_encoder.encoder.layer.8.intermediate.dense.bias", "context_encoder.encoder.layer.8.output.dense.weight", "context_encoder.encoder.layer.8.output.dense.bias", "context_encoder.encoder.layer.8.output.LayerNorm.weight", "context_encoder.encoder.layer.8.output.LayerNorm.bias", "context_encoder.encoder.layer.9.attention.self.query.weight", "context_encoder.encoder.layer.9.attention.self.query.bias", "context_encoder.encoder.layer.9.attention.self.key.weight", "context_encoder.encoder.layer.9.attention.self.key.bias", "context_encoder.encoder.layer.9.attention.self.value.weight", "context_encoder.encoder.layer.9.attention.self.value.bias", "context_encoder.encoder.layer.9.attention.output.dense.weight", "context_encoder.encoder.layer.9.attention.output.dense.bias", "context_encoder.encoder.layer.9.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.9.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.9.intermediate.dense.weight", "context_encoder.encoder.layer.9.intermediate.dense.bias", "context_encoder.encoder.layer.9.output.dense.weight", "context_encoder.encoder.layer.9.output.dense.bias", "context_encoder.encoder.layer.9.output.LayerNorm.weight", "context_encoder.encoder.layer.9.output.LayerNorm.bias", "context_encoder.encoder.layer.10.attention.self.query.weight", "context_encoder.encoder.layer.10.attention.self.query.bias", "context_encoder.encoder.layer.10.attention.self.key.weight", "context_encoder.encoder.layer.10.attention.self.key.bias", "context_encoder.encoder.layer.10.attention.self.value.weight", "context_encoder.encoder.layer.10.attention.self.value.bias", "context_encoder.encoder.layer.10.attention.output.dense.weight", "context_encoder.encoder.layer.10.attention.output.dense.bias", 
"context_encoder.encoder.layer.10.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.10.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.10.intermediate.dense.weight", "context_encoder.encoder.layer.10.intermediate.dense.bias", "context_encoder.encoder.layer.10.output.dense.weight", "context_encoder.encoder.layer.10.output.dense.bias", "context_encoder.encoder.layer.10.output.LayerNorm.weight", "context_encoder.encoder.layer.10.output.LayerNorm.bias", "context_encoder.encoder.layer.11.attention.self.query.weight", "context_encoder.encoder.layer.11.attention.self.query.bias", "context_encoder.encoder.layer.11.attention.self.key.weight", "context_encoder.encoder.layer.11.attention.self.key.bias", "context_encoder.encoder.layer.11.attention.self.value.weight", "context_encoder.encoder.layer.11.attention.self.value.bias", "context_encoder.encoder.layer.11.attention.output.dense.weight", "context_encoder.encoder.layer.11.attention.output.dense.bias", "context_encoder.encoder.layer.11.attention.output.LayerNorm.weight", "context_encoder.encoder.layer.11.attention.output.LayerNorm.bias", "context_encoder.encoder.layer.11.intermediate.dense.weight", "context_encoder.encoder.layer.11.intermediate.dense.bias", "context_encoder.encoder.layer.11.output.dense.weight", "context_encoder.encoder.layer.11.output.dense.bias", "context_encoder.encoder.layer.11.output.LayerNorm.weight", "context_encoder.encoder.layer.11.output.LayerNorm.bias", "context_encoder.pooler.dense.weight", "context_encoder.pooler.dense.bias", "context_mlm_trans.dense.weight", "context_mlm_trans.dense.bias", "context_mlm_trans.LayerNorm.weight", "context_mlm_trans.LayerNorm.bias", "context_order_trans.linear_in.weight", "decoder.bert.embeddings.position_ids", "decoder.bert.embeddings.word_embeddings.weight", "decoder.bert.embeddings.position_embeddings.weight", "decoder.bert.embeddings.token_type_embeddings.weight", "decoder.bert.embeddings.LayerNorm.weight", 
"decoder.bert.embeddings.LayerNorm.bias", "decoder.bert.encoder.layer.0.attention.self.query.weight", "decoder.bert.encoder.layer.0.attention.self.query.bias", "decoder.bert.encoder.layer.0.attention.self.key.weight", "decoder.bert.encoder.layer.0.attention.self.key.bias", "decoder.bert.encoder.layer.0.attention.self.value.weight", "decoder.bert.encoder.layer.0.attention.self.value.bias", "decoder.bert.encoder.layer.0.attention.output.dense.weight", "decoder.bert.encoder.layer.0.attention.output.dense.bias", "decoder.bert.encoder.layer.0.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.0.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.0.crossattention.self.query.weight", "decoder.bert.encoder.layer.0.crossattention.self.query.bias", "decoder.bert.encoder.layer.0.crossattention.self.key.weight", "decoder.bert.encoder.layer.0.crossattention.self.key.bias", "decoder.bert.encoder.layer.0.crossattention.self.value.weight", "decoder.bert.encoder.layer.0.crossattention.self.value.bias", "decoder.bert.encoder.layer.0.crossattention.output.dense.weight", "decoder.bert.encoder.layer.0.crossattention.output.dense.bias", "decoder.bert.encoder.layer.0.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.0.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.0.intermediate.dense.weight", "decoder.bert.encoder.layer.0.intermediate.dense.bias", "decoder.bert.encoder.layer.0.output.dense.weight", "decoder.bert.encoder.layer.0.output.dense.bias", "decoder.bert.encoder.layer.0.output.LayerNorm.weight", "decoder.bert.encoder.layer.0.output.LayerNorm.bias", "decoder.bert.encoder.layer.1.attention.self.query.weight", "decoder.bert.encoder.layer.1.attention.self.query.bias", "decoder.bert.encoder.layer.1.attention.self.key.weight", "decoder.bert.encoder.layer.1.attention.self.key.bias", "decoder.bert.encoder.layer.1.attention.self.value.weight", "decoder.bert.encoder.layer.1.attention.self.value.bias", 
"decoder.bert.encoder.layer.1.attention.output.dense.weight", "decoder.bert.encoder.layer.1.attention.output.dense.bias", "decoder.bert.encoder.layer.1.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.1.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.1.crossattention.self.query.weight", "decoder.bert.encoder.layer.1.crossattention.self.query.bias", "decoder.bert.encoder.layer.1.crossattention.self.key.weight", "decoder.bert.encoder.layer.1.crossattention.self.key.bias", "decoder.bert.encoder.layer.1.crossattention.self.value.weight", "decoder.bert.encoder.layer.1.crossattention.self.value.bias", "decoder.bert.encoder.layer.1.crossattention.output.dense.weight", "decoder.bert.encoder.layer.1.crossattention.output.dense.bias", "decoder.bert.encoder.layer.1.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.1.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.1.intermediate.dense.weight", "decoder.bert.encoder.layer.1.intermediate.dense.bias", "decoder.bert.encoder.layer.1.output.dense.weight", "decoder.bert.encoder.layer.1.output.dense.bias", "decoder.bert.encoder.layer.1.output.LayerNorm.weight", "decoder.bert.encoder.layer.1.output.LayerNorm.bias", "decoder.bert.encoder.layer.2.attention.self.query.weight", "decoder.bert.encoder.layer.2.attention.self.query.bias", "decoder.bert.encoder.layer.2.attention.self.key.weight", "decoder.bert.encoder.layer.2.attention.self.key.bias", "decoder.bert.encoder.layer.2.attention.self.value.weight", "decoder.bert.encoder.layer.2.attention.self.value.bias", "decoder.bert.encoder.layer.2.attention.output.dense.weight", "decoder.bert.encoder.layer.2.attention.output.dense.bias", "decoder.bert.encoder.layer.2.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.2.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.2.crossattention.self.query.weight", "decoder.bert.encoder.layer.2.crossattention.self.query.bias", 
"decoder.bert.encoder.layer.2.crossattention.self.key.weight", "decoder.bert.encoder.layer.2.crossattention.self.key.bias", "decoder.bert.encoder.layer.2.crossattention.self.value.weight", "decoder.bert.encoder.layer.2.crossattention.self.value.bias", "decoder.bert.encoder.layer.2.crossattention.output.dense.weight", "decoder.bert.encoder.layer.2.crossattention.output.dense.bias", "decoder.bert.encoder.layer.2.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.2.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.2.intermediate.dense.weight", "decoder.bert.encoder.layer.2.intermediate.dense.bias", "decoder.bert.encoder.layer.2.output.dense.weight", "decoder.bert.encoder.layer.2.output.dense.bias", "decoder.bert.encoder.layer.2.output.LayerNorm.weight", "decoder.bert.encoder.layer.2.output.LayerNorm.bias", "decoder.bert.encoder.layer.3.attention.self.query.weight", "decoder.bert.encoder.layer.3.attention.self.query.bias", "decoder.bert.encoder.layer.3.attention.self.key.weight", "decoder.bert.encoder.layer.3.attention.self.key.bias", "decoder.bert.encoder.layer.3.attention.self.value.weight", "decoder.bert.encoder.layer.3.attention.self.value.bias", "decoder.bert.encoder.layer.3.attention.output.dense.weight", "decoder.bert.encoder.layer.3.attention.output.dense.bias", "decoder.bert.encoder.layer.3.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.3.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.3.crossattention.self.query.weight", "decoder.bert.encoder.layer.3.crossattention.self.query.bias", "decoder.bert.encoder.layer.3.crossattention.self.key.weight", "decoder.bert.encoder.layer.3.crossattention.self.key.bias", "decoder.bert.encoder.layer.3.crossattention.self.value.weight", "decoder.bert.encoder.layer.3.crossattention.self.value.bias", "decoder.bert.encoder.layer.3.crossattention.output.dense.weight", "decoder.bert.encoder.layer.3.crossattention.output.dense.bias", 
"decoder.bert.encoder.layer.3.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.3.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.3.intermediate.dense.weight", "decoder.bert.encoder.layer.3.intermediate.dense.bias", "decoder.bert.encoder.layer.3.output.dense.weight", "decoder.bert.encoder.layer.3.output.dense.bias", "decoder.bert.encoder.layer.3.output.LayerNorm.weight", "decoder.bert.encoder.layer.3.output.LayerNorm.bias", "decoder.bert.encoder.layer.4.attention.self.query.weight", "decoder.bert.encoder.layer.4.attention.self.query.bias", "decoder.bert.encoder.layer.4.attention.self.key.weight", "decoder.bert.encoder.layer.4.attention.self.key.bias", "decoder.bert.encoder.layer.4.attention.self.value.weight", "decoder.bert.encoder.layer.4.attention.self.value.bias", "decoder.bert.encoder.layer.4.attention.output.dense.weight", "decoder.bert.encoder.layer.4.attention.output.dense.bias", "decoder.bert.encoder.layer.4.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.4.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.4.crossattention.self.query.weight", "decoder.bert.encoder.layer.4.crossattention.self.query.bias", "decoder.bert.encoder.layer.4.crossattention.self.key.weight", "decoder.bert.encoder.layer.4.crossattention.self.key.bias", "decoder.bert.encoder.layer.4.crossattention.self.value.weight", "decoder.bert.encoder.layer.4.crossattention.self.value.bias", "decoder.bert.encoder.layer.4.crossattention.output.dense.weight", "decoder.bert.encoder.layer.4.crossattention.output.dense.bias", "decoder.bert.encoder.layer.4.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.4.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.4.intermediate.dense.weight", "decoder.bert.encoder.layer.4.intermediate.dense.bias", "decoder.bert.encoder.layer.4.output.dense.weight", "decoder.bert.encoder.layer.4.output.dense.bias", "decoder.bert.encoder.layer.4.output.LayerNorm.weight", 
"decoder.bert.encoder.layer.4.output.LayerNorm.bias", "decoder.bert.encoder.layer.5.attention.self.query.weight", "decoder.bert.encoder.layer.5.attention.self.query.bias", "decoder.bert.encoder.layer.5.attention.self.key.weight", "decoder.bert.encoder.layer.5.attention.self.key.bias", "decoder.bert.encoder.layer.5.attention.self.value.weight", "decoder.bert.encoder.layer.5.attention.self.value.bias", "decoder.bert.encoder.layer.5.attention.output.dense.weight", "decoder.bert.encoder.layer.5.attention.output.dense.bias", "decoder.bert.encoder.layer.5.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.5.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.5.crossattention.self.query.weight", "decoder.bert.encoder.layer.5.crossattention.self.query.bias", "decoder.bert.encoder.layer.5.crossattention.self.key.weight", "decoder.bert.encoder.layer.5.crossattention.self.key.bias", "decoder.bert.encoder.layer.5.crossattention.self.value.weight", "decoder.bert.encoder.layer.5.crossattention.self.value.bias", "decoder.bert.encoder.layer.5.crossattention.output.dense.weight", "decoder.bert.encoder.layer.5.crossattention.output.dense.bias", "decoder.bert.encoder.layer.5.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.5.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.5.intermediate.dense.weight", "decoder.bert.encoder.layer.5.intermediate.dense.bias", "decoder.bert.encoder.layer.5.output.dense.weight", "decoder.bert.encoder.layer.5.output.dense.bias", "decoder.bert.encoder.layer.5.output.LayerNorm.weight", "decoder.bert.encoder.layer.5.output.LayerNorm.bias", "decoder.bert.encoder.layer.6.attention.self.query.weight", "decoder.bert.encoder.layer.6.attention.self.query.bias", "decoder.bert.encoder.layer.6.attention.self.key.weight", "decoder.bert.encoder.layer.6.attention.self.key.bias", "decoder.bert.encoder.layer.6.attention.self.value.weight", "decoder.bert.encoder.layer.6.attention.self.value.bias", 
"decoder.bert.encoder.layer.6.attention.output.dense.weight", "decoder.bert.encoder.layer.6.attention.output.dense.bias", "decoder.bert.encoder.layer.6.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.6.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.6.crossattention.self.query.weight", "decoder.bert.encoder.layer.6.crossattention.self.query.bias", "decoder.bert.encoder.layer.6.crossattention.self.key.weight", "decoder.bert.encoder.layer.6.crossattention.self.key.bias", "decoder.bert.encoder.layer.6.crossattention.self.value.weight", "decoder.bert.encoder.layer.6.crossattention.self.value.bias", "decoder.bert.encoder.layer.6.crossattention.output.dense.weight", "decoder.bert.encoder.layer.6.crossattention.output.dense.bias", "decoder.bert.encoder.layer.6.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.6.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.6.intermediate.dense.weight", "decoder.bert.encoder.layer.6.intermediate.dense.bias", "decoder.bert.encoder.layer.6.output.dense.weight", "decoder.bert.encoder.layer.6.output.dense.bias", "decoder.bert.encoder.layer.6.output.LayerNorm.weight", "decoder.bert.encoder.layer.6.output.LayerNorm.bias", "decoder.bert.encoder.layer.7.attention.self.query.weight", "decoder.bert.encoder.layer.7.attention.self.query.bias", "decoder.bert.encoder.layer.7.attention.self.key.weight", "decoder.bert.encoder.layer.7.attention.self.key.bias", "decoder.bert.encoder.layer.7.attention.self.value.weight", "decoder.bert.encoder.layer.7.attention.self.value.bias", "decoder.bert.encoder.layer.7.attention.output.dense.weight", "decoder.bert.encoder.layer.7.attention.output.dense.bias", "decoder.bert.encoder.layer.7.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.7.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.7.crossattention.self.query.weight", "decoder.bert.encoder.layer.7.crossattention.self.query.bias", 
"decoder.bert.encoder.layer.7.crossattention.self.key.weight", "decoder.bert.encoder.layer.7.crossattention.self.key.bias", "decoder.bert.encoder.layer.7.crossattention.self.value.weight", "decoder.bert.encoder.layer.7.crossattention.self.value.bias", "decoder.bert.encoder.layer.7.crossattention.output.dense.weight", "decoder.bert.encoder.layer.7.crossattention.output.dense.bias", "decoder.bert.encoder.layer.7.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.7.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.7.intermediate.dense.weight", "decoder.bert.encoder.layer.7.intermediate.dense.bias", "decoder.bert.encoder.layer.7.output.dense.weight", "decoder.bert.encoder.layer.7.output.dense.bias", "decoder.bert.encoder.layer.7.output.LayerNorm.weight", "decoder.bert.encoder.layer.7.output.LayerNorm.bias", "decoder.bert.encoder.layer.8.attention.self.query.weight", "decoder.bert.encoder.layer.8.attention.self.query.bias", "decoder.bert.encoder.layer.8.attention.self.key.weight", "decoder.bert.encoder.layer.8.attention.self.key.bias", "decoder.bert.encoder.layer.8.attention.self.value.weight", "decoder.bert.encoder.layer.8.attention.self.value.bias", "decoder.bert.encoder.layer.8.attention.output.dense.weight", "decoder.bert.encoder.layer.8.attention.output.dense.bias", "decoder.bert.encoder.layer.8.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.8.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.8.crossattention.self.query.weight", "decoder.bert.encoder.layer.8.crossattention.self.query.bias", "decoder.bert.encoder.layer.8.crossattention.self.key.weight", "decoder.bert.encoder.layer.8.crossattention.self.key.bias", "decoder.bert.encoder.layer.8.crossattention.self.value.weight", "decoder.bert.encoder.layer.8.crossattention.self.value.bias", "decoder.bert.encoder.layer.8.crossattention.output.dense.weight", "decoder.bert.encoder.layer.8.crossattention.output.dense.bias", 
"decoder.bert.encoder.layer.8.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.8.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.8.intermediate.dense.weight", "decoder.bert.encoder.layer.8.intermediate.dense.bias", "decoder.bert.encoder.layer.8.output.dense.weight", "decoder.bert.encoder.layer.8.output.dense.bias", "decoder.bert.encoder.layer.8.output.LayerNorm.weight", "decoder.bert.encoder.layer.8.output.LayerNorm.bias", "decoder.bert.encoder.layer.9.attention.self.query.weight", "decoder.bert.encoder.layer.9.attention.self.query.bias", "decoder.bert.encoder.layer.9.attention.self.key.weight", "decoder.bert.encoder.layer.9.attention.self.key.bias", "decoder.bert.encoder.layer.9.attention.self.value.weight", "decoder.bert.encoder.layer.9.attention.self.value.bias", "decoder.bert.encoder.layer.9.attention.output.dense.weight", "decoder.bert.encoder.layer.9.attention.output.dense.bias", "decoder.bert.encoder.layer.9.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.9.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.9.crossattention.self.query.weight", "decoder.bert.encoder.layer.9.crossattention.self.query.bias", "decoder.bert.encoder.layer.9.crossattention.self.key.weight", "decoder.bert.encoder.layer.9.crossattention.self.key.bias", "decoder.bert.encoder.layer.9.crossattention.self.value.weight", "decoder.bert.encoder.layer.9.crossattention.self.value.bias", "decoder.bert.encoder.layer.9.crossattention.output.dense.weight", "decoder.bert.encoder.layer.9.crossattention.output.dense.bias", "decoder.bert.encoder.layer.9.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.9.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.9.intermediate.dense.weight", "decoder.bert.encoder.layer.9.intermediate.dense.bias", "decoder.bert.encoder.layer.9.output.dense.weight", "decoder.bert.encoder.layer.9.output.dense.bias", "decoder.bert.encoder.layer.9.output.LayerNorm.weight", 
"decoder.bert.encoder.layer.9.output.LayerNorm.bias", "decoder.bert.encoder.layer.10.attention.self.query.weight", "decoder.bert.encoder.layer.10.attention.self.query.bias", "decoder.bert.encoder.layer.10.attention.self.key.weight", "decoder.bert.encoder.layer.10.attention.self.key.bias", "decoder.bert.encoder.layer.10.attention.self.value.weight", "decoder.bert.encoder.layer.10.attention.self.value.bias", "decoder.bert.encoder.layer.10.attention.output.dense.weight", "decoder.bert.encoder.layer.10.attention.output.dense.bias", "decoder.bert.encoder.layer.10.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.10.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.10.crossattention.self.query.weight", "decoder.bert.encoder.layer.10.crossattention.self.query.bias", "decoder.bert.encoder.layer.10.crossattention.self.key.weight", "decoder.bert.encoder.layer.10.crossattention.self.key.bias", "decoder.bert.encoder.layer.10.crossattention.self.value.weight", "decoder.bert.encoder.layer.10.crossattention.self.value.bias", "decoder.bert.encoder.layer.10.crossattention.output.dense.weight", "decoder.bert.encoder.layer.10.crossattention.output.dense.bias", "decoder.bert.encoder.layer.10.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.10.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.10.intermediate.dense.weight", "decoder.bert.encoder.layer.10.intermediate.dense.bias", "decoder.bert.encoder.layer.10.output.dense.weight", "decoder.bert.encoder.layer.10.output.dense.bias", "decoder.bert.encoder.layer.10.output.LayerNorm.weight", "decoder.bert.encoder.layer.10.output.LayerNorm.bias", "decoder.bert.encoder.layer.11.attention.self.query.weight", "decoder.bert.encoder.layer.11.attention.self.query.bias", "decoder.bert.encoder.layer.11.attention.self.key.weight", "decoder.bert.encoder.layer.11.attention.self.key.bias", "decoder.bert.encoder.layer.11.attention.self.value.weight", 
"decoder.bert.encoder.layer.11.attention.self.value.bias", "decoder.bert.encoder.layer.11.attention.output.dense.weight", "decoder.bert.encoder.layer.11.attention.output.dense.bias", "decoder.bert.encoder.layer.11.attention.output.LayerNorm.weight", "decoder.bert.encoder.layer.11.attention.output.LayerNorm.bias", "decoder.bert.encoder.layer.11.crossattention.self.query.weight", "decoder.bert.encoder.layer.11.crossattention.self.query.bias", "decoder.bert.encoder.layer.11.crossattention.self.key.weight", "decoder.bert.encoder.layer.11.crossattention.self.key.bias", "decoder.bert.encoder.layer.11.crossattention.self.value.weight", "decoder.bert.encoder.layer.11.crossattention.self.value.bias", "decoder.bert.encoder.layer.11.crossattention.output.dense.weight", "decoder.bert.encoder.layer.11.crossattention.output.dense.bias", "decoder.bert.encoder.layer.11.crossattention.output.LayerNorm.weight", "decoder.bert.encoder.layer.11.crossattention.output.LayerNorm.bias", "decoder.bert.encoder.layer.11.intermediate.dense.weight", "decoder.bert.encoder.layer.11.intermediate.dense.bias", "decoder.bert.encoder.layer.11.output.dense.weight", "decoder.bert.encoder.layer.11.output.dense.bias", "decoder.bert.encoder.layer.11.output.LayerNorm.weight", "decoder.bert.encoder.layer.11.output.LayerNorm.bias", "decoder.bert.pooler.dense.weight", "decoder.bert.pooler.dense.bias", "decoder.cls.predictions.bias", "decoder.cls.predictions.transform.dense.weight", "decoder.cls.predictions.transform.dense.bias", "decoder.cls.predictions.transform.LayerNorm.weight", "decoder.cls.predictions.transform.LayerNorm.bias", "decoder.cls.predictions.decoder.weight", "decoder.cls.predictions.decoder.bias".

opened by rokosbasilisk 2
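
The "Unexpected key(s)" list above contains entries such as "context_mlm_trans.dense.weight" and "context_order_trans.linear_in.weight". This suggests that the file being loaded holds the state dict of the whole model rather than of one sub-module. Under that assumption, one possible workaround is to keep only the entries under the relevant prefix and strip the prefix before calling load_state_dict. The sketch below is not the repository's code: the helper name extract_submodule_state is hypothetical, and a plain dict with string values stands in for the result of torch.load.

```python
from typing import Any, Dict, Mapping


def extract_submodule_state(full_state: Mapping[str, Any], prefix: str) -> Dict[str, Any]:
    """Return the entries of `full_state` under `prefix`, with the prefix stripped."""
    if not prefix.endswith("."):
        prefix += "."
    return {
        key[len(prefix):]: value
        for key, value in full_state.items()
        if key.startswith(prefix)
    }


# Stand-in for a loaded checkpoint (assumption: it stores the full model).
# In a real checkpoint the values would be tensors, not strings.
full_state = {
    "utt_encoder.bert.pooler.dense.weight": "w0",
    "context_mlm_trans.dense.weight": "w1",
    "context_mlm_trans.dense.bias": "b1",
    "context_mlm_trans.LayerNorm.weight": "w2",
    "context_mlm_trans.LayerNorm.bias": "b2",
    "context_order_trans.linear_in.weight": "w3",
}

# Keys now match what each sub-module expects, e.g. "dense.weight".
mlm_state = extract_submodule_state(full_state, "context_mlm_trans")
order_state = extract_submodule_state(full_state, "context_order_trans")
print(sorted(mlm_state))
print(sorted(order_state))
```

With a real checkpoint, the filtered dicts could then be passed to self.context_mlm_trans.load_state_dict(mlm_state) and self.context_order_trans.load_state_dict(order_state). Whether the saved files really have this layout should be checked by printing their keys first.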

test error

def load(self, args):
    # Load a trained model and vocabulary that you have fine-tuned
    assert args.reload_from>=0, "please specify the checkpoint iteration in args.reload_from"
    output_dir = os.path.join(f"./output/{args.model}/{args.model_size}/models/", f'checkpoint-{args.reload_from}')
    self.model = DialogBERT.from_pretrained(output_dir)
    self.model.to(args.device)

def from_pretrained(self, model_dir):
    self.encoder_config = BertConfig.from_pretrained(model_dir)
    self.tokenizer = BertTokenizer.from_pretrained(path.join(model_dir, 'tokenizer'), do_lower_case=True)
    self.utt_encoder = BertForPreTraining.from_pretrained(path.join(model_dir, 'utt_encoder'))
    self.context_encoder = BertForSequenceClassification.from_pretrained(path.join(model_dir, 'context_encoder'))
    self.context_mlm_trans = BertPredictionHeadTransform(self.encoder_config)
    self.context_mlm_trans.load_state_dict(torch.load(path.join(model_dir, 'context_mlm_trans.pkl')),strict= False)
    self.context_order_trans = SelfSorting(self.encoder_config.hidden_size)
    self.context_order_trans.load_state_dict(torch.load(path.join(model_dir, 'context_order_trans.pkl')), strict= False)
    self.decoder_config = BertConfig.from_pretrained(model_dir)
    self.decoder = BertLMHeadModel.from_pretrained(path.join(model_dir, 'decoder'))

File "D:\NLP\DialogBERT-master\solvers.py", line 77, in load
    self.model.to(args.device)
AttributeError: 'NoneType' object has no attribute 'to'

DialogBERT.from_pretrained returns None. How can I solve it?
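
The `AttributeError` is consistent with the code quoted above: `from_pretrained` only assigns attributes on `self` and has no `return` statement, so its result is `None`, and assigning that result to `self.model` replaces the model with `None`. A minimal sketch of the pattern follows, using a hypothetical `ToyModel` class in place of `DialogBERT` so that it runs without any checkpoint.

```python
class ToyModel:
    """Hypothetical stand-in for a model whose loader mutates self in place."""

    def __init__(self):
        self.weights = None
        self.device = "cpu"

    def from_pretrained(self, model_dir):
        # Mirrors the quoted method: sets attributes and returns nothing (None).
        self.weights = f"weights loaded from {model_dir}"

    def to(self, device):
        self.device = device
        return self


model = ToyModel()

# Buggy pattern: the return value of an in-place loader is None.
result = model.from_pretrained("checkpoint-dir")

# Working pattern: call the loader for its side effect and keep the instance.
model.from_pretrained("checkpoint-dir")
model.to("cuda:0")
```

Under that reading, calling `self.model.from_pretrained(output_dir)` on an already constructed model and then `self.model.to(args.device)`, without reassigning `self.model`, would avoid the error. Ending the method with `return self` is another option. Both are assumptions about the intended design, not a fix confirmed by the repository.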

opened by ztx313 10

DialogBERT methods of context?

Hi again,

I am curious how the paper's authors handled dialogue context in DialogBERT. Did you prepend the context to the input tokens? And how many conversational turns of context were used to obtain the results reported in the DialogBERT paper?

Thanks in advance

opened by pablogranolabar 0

Owner

Xiaodong Gu

PGPortfolio: Policy Gradient Portfolio, the source code of "A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem"(https://arxiv.org/pdf/1706.10059.pdf).

This repo provides the official code for TransBTS: Multimodal Brain Tumor Segmentation Using Transformer (https://arxiv.org/pdf/2103.04430.pdf).

The official implementation of NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation [ICLR-2021]. https://arxiv.org/pdf/2101.12378.pdf

Code for our method RePRI for Few-Shot Segmentation. Paper at http://arxiv.org/abs/2012.06166

Source codes for "Structure-Aware Abstractive Conversation Summarization via Discourse and Action Graphs"

Code for the paper: Learning Adversarially Robust Representations via Worst-Case Mutual Information Maximization (https://arxiv.org/abs/2002.11798)

Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)

source code for https://arxiv.org/abs/2005.11248 "Accelerating Antimicrobial Discovery with Controllable Deep Generative Models and Molecular Dynamics"

Official Implementation for "ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement" https://arxiv.org/abs/2104.02699

This is an official implementation of our CVPR 2021 paper "Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression" (https://arxiv.org/abs/2104.02300)

Unofficial PyTorch implementation of "Face Identity Disentanglement via Latent Space Mapping" (https://arxiv.org/abs/2005.07728), using StyleGAN2 instead of StyleGAN

This repository contains the code used for Predicting Patient Outcomes with Graph Representation Learning (https://arxiv.org/abs/2101.03940).

Official repository with code and data accompanying the NAACL 2021 paper "Hurdles to Progress in Long-form Question Answering" (https://arxiv.org/abs/2103.06332).

Supplementary code for the paper "Meta-Solver for Neural Ordinary Differential Equations" https://arxiv.org/abs/2103.08561

Code for paper "A Critical Assessment of State-of-the-Art in Entity Alignment" (https://arxiv.org/abs/2010.16314)

TensorFlow implementation of Semi-supervised Sequence Learning (https://arxiv.org/abs/1511.01432)

Neural Magic Eye: Learning to See and Understand the Scene Behind an Autostereogram, arXiv:2012.15692.

MicRank is a Learning to Rank neural channel selection framework where a DNN is trained to rank microphone channels.

Official implementation of the paper Image Generators with Conditionally-Independent Pixel Synthesis https://arxiv.org/abs/2011.13775