Google and Stanford University released a new pre-trained model called ELECTRA

Overview

中文说明 | English



GitHub

谷歌与斯坦福大学共同研发的最新预训练模型ELECTRA因其小巧的模型体积以及良好的模型性能受到了广泛关注。 为了进一步促进中文预训练模型技术的研究与发展,哈工大讯飞联合实验室基于官方ELECTRA训练代码以及大规模的中文数据训练出中文ELECTRA预训练模型供大家下载使用。 其中ELECTRA-small模型可与BERT-base甚至其他同等规模的模型相媲美,而参数量仅为BERT-base的1/10。

本项目基于谷歌&斯坦福大学官方的ELECTRA:https://github.com/google-research/electra

其他相关资源:

查看更多哈工大讯飞联合实验室(HFL)发布的资源:https://github.com/ymcui/HFL-Anthology

新闻

2021/7/21 由哈工大SCIR多位学者撰写的《自然语言处理:基于预训练模型的方法》已出版,欢迎大家选购,也可参与我们的赠书活动

2020/12/13 基于大规模法律文书数据,我们训练了面向司法领域的中文ELECTRA系列模型,查看模型下载司法任务效果

2020/10/22 ELECTRA-180g已发布,增加了CommonCrawl的高质量数据,查看模型下载

2020/9/15 我们的论文"Revisiting Pre-Trained Models for Chinese Natural Language Processing"Findings of EMNLP录用为长文。

2020/8/27 哈工大讯飞联合实验室在通用自然语言理解评测GLUE中荣登榜首,查看GLUE榜单新闻

点击这里查看历史新闻

2020/5/29 Chinese ELECTRA-large/small-ex已发布,请查看模型下载,目前只提供Google Drive下载地址,敬请谅解。

2020/4/7 PyTorch用户可通过 🤗 Transformers加载模型,查看快速加载

2020/3/31 本目录发布的模型已接入飞桨PaddleHub,查看快速加载

2020/3/25 Chinese ELECTRA-small/base已发布,请查看模型下载

内容导引

章节 描述
简介 介绍ELECTRA基本原理
模型下载 中文ELECTRA预训练模型下载
快速加载 介绍了如何使用 🤗 TransformersPaddleHub快速加载模型
基线系统效果 中文基线系统效果:阅读理解、文本分类等
使用方法 模型的详细使用方法
FAQ 常见问题答疑
引用 本目录的技术报告

简介

ELECTRA提出了一套新的预训练框架,其中包括两个部分:GeneratorDiscriminator

  • Generator: 一个小的MLM,在[MASK]的位置预测原来的词。Generator将用来把输入文本做部分词的替换。
  • Discriminator: 判断输入句子中的每个词是否被替换,即使用Replaced Token Detection (RTD)预训练任务,取代了BERT原始的Masked Language Model (MLM)。需要注意的是这里并没有使用Next Sentence Prediction (NSP)任务。

在预训练阶段结束之后,我们只使用Discriminator作为下游任务精调的基模型。

更详细的内容请查阅ELECTRA论文:ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

模型下载

本目录中包含以下模型,目前仅提供TensorFlow版本权重。

  • ELECTRA-large, Chinese: 24-layer, 1024-hidden, 16-heads, 324M parameters
  • ELECTRA-base, Chinese: 12-layer, 768-hidden, 12-heads, 102M parameters
  • ELECTRA-small-ex, Chinese: 24-layer, 256-hidden, 4-heads, 25M parameters
  • ELECTRA-small, Chinese: 12-layer, 256-hidden, 4-heads, 12M parameters

大语料版(新版,180G数据)

模型简称 Google下载 讯飞云下载 压缩包大小
ELECTRA-180g-large, Chinese TensorFlow TensorFlow(密码Yfcy) 1G
ELECTRA-180g-base, Chinese TensorFlow TensorFlow(密码Xcvm) 383M
ELECTRA-180g-small-ex, Chinese TensorFlow TensorFlow(密码GUdp) 92M
ELECTRA-180g-small, Chinese TensorFlow TensorFlow(密码qsHj) 46M

基础版(原版,20G数据)

模型简称 Google下载 讯飞云下载 压缩包大小
ELECTRA-large, Chinese TensorFlow (待补充) 1G
ELECTRA-base, Chinese TensorFlow TensorFlow(密码3VQu) 383M
ELECTRA-small-ex, Chinese TensorFlow (待补充) 92M
ELECTRA-small, Chinese TensorFlow TensorFlow(密码wm2E) 46M

司法领域版(new)

模型简称 Google下载 讯飞云下载 压缩包大小
legal-ELECTRA-large, Chinese TensorFlow TensorFlow(密码7f7b) 1G
legal-ELECTRA-base, Chinese TensorFlow TensorFlow(密码7f7b) 383M
legal-ELECTRA-small, Chinese TensorFlow TensorFlow(密码7f7b) 46M

PyTorch/TF2版本

如需PyTorch版本,请自行通过 🤗 Transformers提供的转换脚本convert_electra_original_tf_checkpoint_to_pytorch.py进行转换。如需配置文件可进入到本目录下的config文件夹中查找。

python transformers/src/transformers/convert_electra_original_tf_checkpoint_to_pytorch.py \
--tf_checkpoint_path ./path-to-large-model/ \
--config_file ./path-to-large-model/discriminator.json \
--pytorch_dump_path ./path-to-output/model.bin \
--discriminator_or_generator discriminator

或者通过huggingface官网直接下载PyTorch版权重:https://huggingface.co/hfl

方法:点击任意需要下载的model → 拉到最下方点击"List all files in model" → 在弹出的小框中下载bin和json文件。

使用须知

中国大陆境内建议使用讯飞云下载点,境外用户建议使用谷歌下载点。 以TensorFlow版ELECTRA-small, Chinese为例,下载完毕后对zip文件进行解压得到如下文件。

chinese_electra_small_L-12_H-256_A-4.zip
    |- electra_small.data-00000-of-00001    # 模型权重
    |- electra_small.meta                   # 模型meta信息
    |- electra_small.index                  # 模型index信息
    |- vocab.txt                            # 词表
    |- discriminator.json                   # 配置文件:discriminator(若没有可从本repo中的config目录获取)
    |- generator.json                       # 配置文件:generator(若没有可从本repo中的config目录获取)

训练细节

我们采用了大规模中文维基以及通用文本训练了ELECTRA模型,总token数达到5.4B,与RoBERTa-wwm-ext系列模型一致。词表方面沿用了谷歌原版BERT的WordPiece词表,包含21,128个token。其他细节和超参数如下(未提及的参数保持默认):

  • ELECTRA-large: 24层,隐层1024,16个注意力头,学习率1e-4,batch96,最大长度512,训练2M步
  • ELECTRA-base: 12层,隐层768,12个注意力头,学习率2e-4,batch256,最大长度512,训练1M步
  • ELECTRA-small-ex: 24层,隐层256,4个注意力头,学习率5e-4,batch384,最大长度512,训练2M步
  • ELECTRA-small: 12层,隐层256,4个注意力头,学习率5e-4,batch1024,最大长度512,训练1M步

快速加载

使用Huggingface-Transformers

Huggingface-Transformers 2.8.0版本已正式支持ELECTRA模型,可通过如下命令调用。

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME) 

其中MODEL_NAME对应列表如下:

模型名 组件 MODEL_NAME
ELECTRA-180g-large, Chinese discriminator hfl/chinese-electra-180g-large-discriminator
ELECTRA-180g-large, Chinese generator hfl/chinese-electra-180g-large-generator
ELECTRA-180g-base, Chinese discriminator hfl/chinese-electra-180g-base-discriminator
ELECTRA-180g-base, Chinese generator hfl/chinese-electra-180g-base-generator
ELECTRA-180g-small-ex, Chinese discriminator hfl/chinese-electra-180g-small-ex-discriminator
ELECTRA-180g-small-ex, Chinese generator hfl/chinese-electra-180g-small-ex-generator
ELECTRA-180g-small, Chinese discriminator hfl/chinese-electra-180g-small-discriminator
ELECTRA-180g-small, Chinese generator hfl/chinese-electra-180g-small-generator
ELECTRA-large, Chinese discriminator hfl/chinese-electra-large-discriminator
ELECTRA-large, Chinese generator hfl/chinese-electra-large-generator
ELECTRA-base, Chinese discriminator hfl/chinese-electra-base-discriminator
ELECTRA-base, Chinese generator hfl/chinese-electra-base-generator
ELECTRA-small-ex, Chinese discriminator hfl/chinese-electra-small-ex-discriminator
ELECTRA-small-ex, Chinese generator hfl/chinese-electra-small-ex-generator
ELECTRA-small, Chinese discriminator hfl/chinese-electra-small-discriminator
ELECTRA-small, Chinese generator hfl/chinese-electra-small-generator

司法领域版本:

模型名 组件 MODEL_NAME
legal-ELECTRA-large, Chinese discriminator hfl/chinese-legal-electra-large-discriminator
legal-ELECTRA-large, Chinese generator hfl/chinese-legal-electra-large-generator
legal-ELECTRA-base, Chinese discriminator hfl/chinese-legal-electra-base-discriminator
legal-ELECTRA-base, Chinese generator hfl/chinese-legal-electra-base-generator
legal-ELECTRA-small, Chinese discriminator hfl/chinese-legal-electra-small-discriminator
legal-ELECTRA-small, Chinese generator hfl/chinese-legal-electra-small-generator

使用PaddleHub

依托PaddleHub,我们只需一行代码即可完成模型下载安装,十余行代码即可完成文本分类、序列标注、阅读理解等任务。

import paddlehub as hub
module = hub.Module(name=MODULE_NAME)

其中MODULE_NAME对应列表如下:

模型名 MODULE_NAME
ELECTRA-base, Chinese chinese-electra-base
ELECTRA-small, Chinese chinese-electra-small

基线系统效果

我们将ELECTRA-small/baseBERT-baseBERT-wwmBERT-wwm-extRoBERTa-wwm-extRBT3进行了效果对比,包括以下六个任务:

对于ELECTRA-small/base模型,我们使用原论文默认的3e-41e-4的学习率。 需要注意的是,我们没有针对任何任务进行参数精调,所以通过调整学习率等超参数可能获得进一步性能提升。 为了保证结果的可靠性,对于同一模型,我们使用不同随机种子训练10遍,汇报模型性能的最大值和平均值(括号内为平均值)。

简体中文阅读理解:CMRC 2018

CMRC 2018数据集是哈工大讯飞联合实验室发布的中文机器阅读理解数据。 根据给定问题,系统需要从篇章中抽取出片段作为答案,形式与SQuAD相同。 评价指标为:EM / F1

模型 开发集 测试集 挑战集 参数量
BERT-base 65.5 (64.4) / 84.5 (84.0) 70.0 (68.7) / 87.0 (86.3) 18.6 (17.0) / 43.3 (41.3) 102M
BERT-wwm 66.3 (65.0) / 85.6 (84.7) 70.5 (69.1) / 87.4 (86.7) 21.0 (19.3) / 47.0 (43.9) 102M
BERT-wwm-ext 67.1 (65.6) / 85.7 (85.0) 71.4 (70.0) / 87.7 (87.0) 24.0 (20.0) / 47.3 (44.6) 102M
RoBERTa-wwm-ext 67.4 (66.5) / 87.2 (86.5) 72.6 (71.4) / 89.4 (88.8) 26.2 (24.6) / 51.0 (49.1) 102M
RBT3 57.0 / 79.0 62.2 / 81.8 14.7 / 36.2 38M
ELECTRA-small 63.4 (62.9) / 80.8 (80.2) 67.8 (67.4) / 83.4 (83.0) 16.3 (15.4) / 37.2 (35.8) 12M
ELECTRA-180g-small 63.8 / 82.7 68.5 / 85.2 15.1 / 35.8 12M
ELECTRA-small-ex 66.4 / 82.2 71.3 / 85.3 18.1 / 38.3 25M
ELECTRA-180g-small-ex 68.1 / 85.1 71.8 / 87.2 20.6 / 41.7 25M
ELECTRA-base 68.4 (68.0) / 84.8 (84.6) 73.1 (72.7) / 87.1 (86.9) 22.6 (21.7) / 45.0 (43.8) 102M
ELECTRA-180g-base 69.3 / 87.0 73.1 / 88.6 24.0 / 48.6 102M
ELECTRA-large 69.1 / 85.2 73.9 / 87.1 23.0 / 44.2 324M
ELECTRA-180g-large 68.5 / 86.2 73.5 / 88.5 21.8 / 42.9 324M

繁体中文阅读理解:DRCD

DRCD数据集由中国台湾台达研究院发布,其形式与SQuAD相同,是基于繁体中文的抽取式阅读理解数据集。 评价指标为:EM / F1

模型 开发集 测试集 参数量
BERT-base 83.1 (82.7) / 89.9 (89.6) 82.2 (81.6) / 89.2 (88.8) 102M
BERT-wwm 84.3 (83.4) / 90.5 (90.2) 82.8 (81.8) / 89.7 (89.0) 102M
BERT-wwm-ext 85.0 (84.5) / 91.2 (90.9) 83.6 (83.0) / 90.4 (89.9) 102M
RoBERTa-wwm-ext 86.6 (85.9) / 92.5 (92.2) 85.6 (85.2) / 92.0 (91.7) 102M
RBT3 76.3 / 84.9 75.0 / 83.9 38M
ELECTRA-small 79.8 (79.4) / 86.7 (86.4) 79.0 (78.5) / 85.8 (85.6) 12M
ELECTRA-180g-small 83.5 / 89.2 82.9 / 88.7 12M
ELECTRA-small-ex 84.0 / 89.5 83.3 / 89.1 25M
ELECTRA-180g-small-ex 87.3 / 92.3 86.5 / 91.3 25M
ELECTRA-base 87.5 (87.0) / 92.5 (92.3) 86.9 (86.6) / 91.8 (91.7) 102M
ELECTRA-180g-base 89.6 / 94.2 88.9 / 93.7 102M
ELECTRA-large 88.8 / 93.3 88.8 / 93.6 324M
ELECTRA-180g-large 90.1 / 94.8 90.5 / 94.7 324M

自然语言推断:XNLI

在自然语言推断任务中,我们采用了XNLI数据,需要将文本分成三个类别:entailmentneutralcontradictory。 评价指标为:Accuracy

模型 开发集 测试集 参数量
BERT-base 77.8 (77.4) 77.8 (77.5) 102M
BERT-wwm 79.0 (78.4) 78.2 (78.0) 102M
BERT-wwm-ext 79.4 (78.6) 78.7 (78.3) 102M
RoBERTa-wwm-ext 80.0 (79.2) 78.8 (78.3) 102M
RBT3 72.2 72.3 38M
ELECTRA-small 73.3 (72.5) 73.1 (72.6) 12M
ELECTRA-180g-small 74.6 74.6 12M
ELECTRA-small-ex 75.4 75.8 25M
ELECTRA-180g-small-ex 76.5 76.6 25M
ELECTRA-base 77.9 (77.0) 78.4 (77.8) 102M
ELECTRA-180g-base 79.6 79.5 102M
ELECTRA-large 81.5 81.0 324M
ELECTRA-180g-large 81.2 80.4 324M

情感分析:ChnSentiCorp

在情感分析任务中,二分类的情感分类数据集ChnSentiCorp。 评价指标为:Accuracy

模型 开发集 测试集 参数量
BERT-base 94.7 (94.3) 95.0 (94.7) 102M
BERT-wwm 95.1 (94.5) 95.4 (95.0) 102M
BERT-wwm-ext 95.4 (94.6) 95.3 (94.7) 102M
RoBERTa-wwm-ext 95.0 (94.6) 95.6 (94.8) 102M
RBT3 92.8 92.8 38M
ELECTRA-small 92.8 (92.5) 94.3 (93.5) 12M
ELECTRA-180g-small 94.1 93.6 12M
ELECTRA-small-ex 92.6 93.6 25M
ELECTRA-180g-small-ex 92.8 93.4 25M
ELECTRA-base 93.8 (93.0) 94.5 (93.5) 102M
ELECTRA-180g-base 94.3 94.8 102M
ELECTRA-large 95.2 95.3 324M
ELECTRA-180g-large 94.8 95.2 324M

句对分类:LCQMC

以下两个数据集均需要将一个句对进行分类,判断两个句子的语义是否相同(二分类任务)。

LCQMC由哈工大深圳研究生院智能计算研究中心发布。 评价指标为:Accuracy

模型 开发集 测试集 参数量
BERT 89.4 (88.4) 86.9 (86.4) 102M
BERT-wwm 89.4 (89.2) 87.0 (86.8) 102M
BERT-wwm-ext 89.6 (89.2) 87.1 (86.6) 102M
RoBERTa-wwm-ext 89.0 (88.7) 86.4 (86.1) 102M
RBT3 85.3 85.1 38M
ELECTRA-small 86.7 (86.3) 85.9 (85.6) 12M
ELECTRA-180g-small 86.6 85.8 12M
ELECTRA-small-ex 87.5 86.0 25M
ELECTRA-180g-small-ex 87.6 86.3 25M
ELECTRA-base 90.2 (89.8) 87.6 (87.3) 102M
ELECTRA-180g-base 90.2 87.1 102M
ELECTRA-large 90.7 87.3 324M
ELECTRA-180g-large 90.3 87.3 324M

句对分类:BQ Corpus

BQ Corpus由哈工大深圳研究生院智能计算研究中心发布,是面向银行领域的数据集。 评价指标为:Accuracy

模型 开发集 测试集 参数量
BERT 86.0 (85.5) 84.8 (84.6) 102M
BERT-wwm 86.1 (85.6) 85.2 (84.9) 102M
BERT-wwm-ext 86.4 (85.5) 85.3 (84.8) 102M
RoBERTa-wwm-ext 86.0 (85.4) 85.0 (84.6) 102M
RBT3 84.1 83.3 38M
ELECTRA-small 83.5 (83.0) 82.0 (81.7) 12M
ELECTRA-180g-small 83.3 82.1 12M
ELECTRA-small-ex 84.0 82.6 25M
ELECTRA-180g-small-ex 84.6 83.4 25M
ELECTRA-base 84.8 (84.7) 84.5 (84.0) 102M
ELECTRA-180g-base 85.8 84.5 102M
ELECTRA-large 86.7 85.1 324M
ELECTRA-180g-large 86.4 85.4 324M

司法任务效果

我们使用CAIL 2018司法评测的罪名预测数据对司法ELECTRA进行了测试。small/base/large学习率分别为:5e-4/3e-4/1e-4。 评价指标为:Accuracy

模型 开发集 测试集 参数量
ELECTRA-small 78.84 76.35 12M
legal-ELECTRA-small 79.60 77.03 12M
ELECTRA-base 80.94 78.41 102M
legal-ELECTRA-base 81.71 79.17 102M
ELECTRA-large 81.53 78.97 324M
legal-ELECTRA-large 82.60 79.89 324M

使用方法

用户可以基于已发布的上述中文ELECTRA预训练模型进行下游任务精调。 在这里我们只介绍最基本的用法,更详细的用法请参考ELECTRA官方介绍

本例中,我们使用ELECTRA-small模型在CMRC 2018任务上进行精调,相关步骤如下。假设,

  • data-dir:工作根目录,可按实际情况设置。
  • model-name:模型名称,本例中为electra-small
  • task-name:任务名称,本例中为cmrc2018。本目录中的代码已适配了以上六个中文任务,task-name分别为cmrc2018drcdxnlichnsenticorplcqmcbqcorpus

第一步:下载预训练模型并解压

模型下载章节中,下载ELECTRA-small模型,并解压至${data-dir}/models/${model-name}。 该目录下应包含electra_model.*vocab.txtcheckpoint,共计5个文件。

第二步:准备任务数据

下载CMRC 2018训练集和开发集,并重命名为train.jsondev.json。 将两个文件放到${data-dir}/finetuning_data/${task-name}

第三步:运行训练命令

python run_finetuning.py \
    --data-dir ${data-dir} \
    --model-name ${model-name} \
    --hparams params_cmrc2018.json

其中data-dirmodel-name在上面已经介绍。hparams是一个JSON词典,在本例中的params_cmrc2018.json包含了精调相关超参数,例如:

{
    "task_names": ["cmrc2018"],
    "max_seq_length": 512,
    "vocab_size": 21128,
    "model_size": "small",
    "do_train": true,
    "do_eval": true,
    "write_test_outputs": true,
    "num_train_epochs": 2,
    "learning_rate": 3e-4,
    "train_batch_size": 32,
    "eval_batch_size": 32,
}

在上述JSON文件中,我们只列举了最重要的一些参数,完整参数列表请查阅configure_finetuning.py

运行完毕后,

  1. 对于阅读理解任务,生成的预测JSON数据cmrc2018_dev_preds.json保存在${data-dir}/results/${task-name}_qa/。可以调用外部评测脚本来得到最终评测结果,例如:python cmrc2018_drcd_evaluate.py dev.json cmrc2018_dev_preds.json
  2. 对于分类任务,相关accuracy信息会直接打印在屏幕,例如:xnli: accuracy: 72.5 - loss: 0.67

FAQ

Q: 在下游任务精调的时候ELECTRA模型的学习率怎么设置?
A: 我们建议使用原论文使用的学习率作为初始基线(small是3e-4,base是1e-4)然后适当增减学习率进行调试。 需要注意的是,相比BERT、RoBERTa一类的模型来说ELECTRA的学习率要相对大一些。

Q: 有没有PyTorch版权重?
A: 有,模型下载

Q: 预训练用的数据能共享一下吗?
A: 很遗憾,不可以。

Q: 未来计划?
A: 敬请关注。

引用

如果本目录中的内容对你的研究工作有所帮助,欢迎在论文中引用下述技术报告。

@inproceedings{cui-etal-2020-revisiting,
    title = "Revisiting Pre-Trained Models for {C}hinese Natural Language Processing",
    author = "Cui, Yiming  and
      Che, Wanxiang  and
      Liu, Ting  and
      Qin, Bing  and
      Wang, Shijin  and
      Hu, Guoping",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.findings-emnlp.58",
    pages = "657--668",
}

关注我们

欢迎关注哈工大讯飞联合实验室官方微信公众号,了解最新的技术动态。

qrcode.png

问题反馈

Before you submit an issue:

  • You are advised to read FAQ first before you submit an issue.
  • Repetitive and irrelevant issues will be ignored and closed by [stable-bot](stale · GitHub Marketplace). Thank you for your understanding and support.
  • We cannot acommodate EVERY request, and thus please bare in mind that there is no guarantee that your request will be met.
  • Always be polite when you submit an issue.
Comments
  • 请教,是否观察到 electra 较 bert/roberta 收敛更快?

    请教,是否观察到 electra 较 bert/roberta 收敛更快?

    比较 pretraining 不同 steps 的 checkpoint。同 step 对应的 checkpoint,electra 100% label 学习的优势,在 finetuning 效果上,论文里是显著快于 bert 的。

    不知道复现是否有这个结论呢?我们在做一个类似的策略,收敛速度上并没有论文显著。

    opened by nbcc 7
  • 关于复现LCQMC任务的一些疑问

    关于复现LCQMC任务的一些疑问

    我根据readme的使用方法进行操作,提示“writing predictions is not supported for this task”,于是我在代码“ if task.name in ["cola", "mrpc", "mnli", "sst", "rte", "qnli", "qqp", "sts"]:”中增加了lcqmc。但是屏幕只输出loss没有Accuracy。请问是哪里出错了?如何复现出readme中所展示的效果?

    stale 
    opened by GJHHHH 6
  • Some weights of the model checkpoint at hfl/chinese-electra-small-ex-generator were not used when initializing

    Some weights of the model checkpoint at hfl/chinese-electra-small-ex-generator were not used when initializing

    transformers 4.0.1加载chinese-electra-small-ex-generator输出如下log,请问这个是正常的吗?

    Some weights of the model checkpoint at hfl/chinese-electra-small-ex-generator were not used when initializing BertForMultiTaskClassification: ['electra.embeddings.word_embeddings.weight', 'electra.embeddings.position_embeddings.weight', 'electra.embeddings.token_type_embeddings.weight', 'electra.embeddings.LayerNorm.weight', 'electra.embeddings.LayerNorm.bias', 'electra.embeddings_project.weight', 'electra.embeddings_project.bias', 'electra.encoder.layer.0.attention.self.query.weight', 'electra.encoder.layer.0.attention.self.query.bias', 'electra.encoder.layer.0.attention.self.key.weight', 'electra.encoder.layer.0.attention.self.key.bias', 'electra.encoder.layer.0.attention.self.value.weight', 'electra.encoder.layer.0.attention.self.value.bias', 'electra.encoder.layer.0.attention.output.dense.weight', 'electra.encoder.layer.0.attention.output.dense.bias', 'electra.encoder.layer.0.attention.output.LayerNorm.weight', 'electra.encoder.layer.0.attention.output.LayerNorm.bias', 'electra.encoder.layer.0.intermediate.dense.weight', 'electra.encoder.layer.0.intermediate.dense.bias', 'electra.encoder.layer.0.output.dense.weight', 'electra.encoder.layer.0.output.dense.bias', 'electra.encoder.layer.0.output.LayerNorm.weight', 'electra.encoder.layer.0.output.LayerNorm.bias', 'electra.encoder.layer.1.attention.self.query.weight', 'electra.encoder.layer.1.attention.self.query.bias', 'electra.encoder.layer.1.attention.self.key.weight', 'electra.encoder.layer.1.attention.self.key.bias', 'electra.encoder.layer.1.attention.self.value.weight', 'electra.encoder.layer.1.attention.self.value.bias', 'electra.encoder.layer.1.attention.output.dense.weight', 'electra.encoder.layer.1.attention.output.dense.bias', 'electra.encoder.layer.1.attention.output.LayerNorm.weight', 'electra.encoder.layer.1.attention.output.LayerNorm.bias', 'electra.encoder.layer.1.intermediate.dense.weight', 'electra.encoder.layer.1.intermediate.dense.bias', 'electra.encoder.layer.1.output.dense.weight', 'electra.encoder.layer.1.output.dense.bias', 'electra.encoder.layer.1.output.LayerNorm.weight', 'electra.encoder.layer.1.output.LayerNorm.bias', 'electra.encoder.layer.2.attention.self.query.weight', 'electra.encoder.layer.2.attention.self.query.bias', 'electra.encoder.layer.2.attention.self.key.weight', 'electra.encoder.layer.2.attention.self.key.bias', 'electra.encoder.layer.2.attention.self.value.weight', 'electra.encoder.layer.2.attention.self.value.bias', 'electra.encoder.layer.2.attention.output.dense.weight', 'electra.encoder.layer.2.attention.output.dense.bias', 'electra.encoder.layer.2.attention.output.LayerNorm.weight', 'electra.encoder.layer.2.attention.output.LayerNorm.bias', 'electra.encoder.layer.2.intermediate.dense.weight', 'electra.encoder.layer.2.intermediate.dense.bias', 'electra.encoder.layer.2.output.dense.weight', 'electra.encoder.layer.2.output.dense.bias', 'electra.encoder.layer.2.output.LayerNorm.weight', 'electra.encoder.layer.2.output.LayerNorm.bias', 'electra.encoder.layer.3.attention.self.query.weight', 'electra.encoder.layer.3.attention.self.query.bias', 'electra.encoder.layer.3.attention.self.key.weight', 'electra.encoder.layer.3.attention.self.key.bias', 'electra.encoder.layer.3.attention.self.value.weight', 'electra.encoder.layer.3.attention.self.value.bias', 'electra.encoder.layer.3.attention.output.dense.weight', 'electra.encoder.layer.3.attention.output.dense.bias', 'electra.encoder.layer.3.attention.output.LayerNorm.weight', 'electra.encoder.layer.3.attention.output.LayerNorm.bias', 'electra.encoder.layer.3.intermediate.dense.weight', 'electra.encoder.layer.3.intermediate.dense.bias', 'electra.encoder.layer.3.output.dense.weight', 'electra.encoder.layer.3.output.dense.bias', 'electra.encoder.layer.3.output.LayerNorm.weight', 'electra.encoder.layer.3.output.LayerNorm.bias', 'electra.encoder.layer.4.attention.self.query.weight', 'electra.encoder.layer.4.attention.self.query.bias', 'electra.encoder.layer.4.attention.self.key.weight', 'electra.encoder.layer.4.attention.self.key.bias', 'electra.encoder.layer.4.attention.self.value.weight', 'electra.encoder.layer.4.attention.self.value.bias', 'electra.encoder.layer.4.attention.output.dense.weight', 'electra.encoder.layer.4.attention.output.dense.bias', 'electra.encoder.layer.4.attention.output.LayerNorm.weight', 'electra.encoder.layer.4.attention.output.LayerNorm.bias', 'electra.encoder.layer.4.intermediate.dense.weight', 'electra.encoder.layer.4.intermediate.dense.bias', 'electra.encoder.layer.4.output.dense.weight', 'electra.encoder.layer.4.output.dense.bias', 'electra.encoder.layer.4.output.LayerNorm.weight', 'electra.encoder.layer.4.output.LayerNorm.bias', 'electra.encoder.layer.5.attention.self.query.weight', 'electra.encoder.layer.5.attention.self.query.bias', 'electra.encoder.layer.5.attention.self.key.weight', 'electra.encoder.layer.5.attention.self.key.bias', 'electra.encoder.layer.5.attention.self.value.weight', 'electra.encoder.layer.5.attention.self.value.bias', 'electra.encoder.layer.5.attention.output.dense.weight', 'electra.encoder.layer.5.attention.output.dense.bias', 'electra.encoder.layer.5.attention.output.LayerNorm.weight', 'electra.encoder.layer.5.attention.output.LayerNorm.bias', 'electra.encoder.layer.5.intermediate.dense.weight', 'electra.encoder.layer.5.intermediate.dense.bias', 'electra.encoder.layer.5.output.dense.weight', 'electra.encoder.layer.5.output.dense.bias', 'electra.encoder.layer.5.output.LayerNorm.weight', 'electra.encoder.layer.5.output.LayerNorm.bias', 'electra.encoder.layer.6.attention.self.query.weight', 'electra.encoder.layer.6.attention.self.query.bias', 'electra.encoder.layer.6.attention.self.key.weight', 'electra.encoder.layer.6.attention.self.key.bias', 'electra.encoder.layer.6.attention.self.value.weight', 'electra.encoder.layer.6.attention.self.value.bias', 'electra.encoder.layer.6.attention.output.dense.weight', 'electra.encoder.layer.6.attention.output.dense.bias', 'electra.encoder.layer.6.attention.output.LayerNorm.weight', 'electra.encoder.layer.6.attention.output.LayerNorm.bias', 'electra.encoder.layer.6.intermediate.dense.weight', 'electra.encoder.layer.6.intermediate.dense.bias', 'electra.encoder.layer.6.output.dense.weight', 'electra.encoder.layer.6.output.dense.bias', 'electra.encoder.layer.6.output.LayerNorm.weight', 'electra.encoder.layer.6.output.LayerNorm.bias', 'electra.encoder.layer.7.attention.self.query.weight', 'electra.encoder.layer.7.attention.self.query.bias', 'electra.encoder.layer.7.attention.self.key.weight', 'electra.encoder.layer.7.attention.self.key.bias', 'electra.encoder.layer.7.attention.self.value.weight', 'electra.encoder.layer.7.attention.self.value.bias', 'electra.encoder.layer.7.attention.output.dense.weight', 'electra.encoder.layer.7.attention.output.dense.bias', 'electra.encoder.layer.7.attention.output.LayerNorm.weight', 'electra.encoder.layer.7.attention.output.LayerNorm.bias', 'electra.encoder.layer.7.intermediate.dense.weight', 'electra.encoder.layer.7.intermediate.dense.bias', 'electra.encoder.layer.7.output.dense.weight', 'electra.encoder.layer.7.output.dense.bias', 'electra.encoder.layer.7.output.LayerNorm.weight', 'electra.encoder.layer.7.output.LayerNorm.bias', 'electra.encoder.layer.8.attention.self.query.weight', 'electra.encoder.layer.8.attention.self.query.bias', 'electra.encoder.layer.8.attention.self.key.weight', 'electra.encoder.layer.8.attention.self.key.bias', 'electra.encoder.layer.8.attention.self.value.weight', 'electra.encoder.layer.8.attention.self.value.bias', 'electra.encoder.layer.8.attention.output.dense.weight', 'electra.encoder.layer.8.attention.output.dense.bias', 'electra.encoder.layer.8.attention.output.LayerNorm.weight', 'electra.encoder.layer.8.attention.output.LayerNorm.bias', 'electra.encoder.layer.8.intermediate.dense.weight', 'electra.encoder.layer.8.intermediate.dense.bias', 'electra.encoder.layer.8.output.dense.weight', 'electra.encoder.layer.8.output.dense.bias', 'electra.encoder.layer.8.output.LayerNorm.weight', 'electra.encoder.layer.8.output.LayerNorm.bias', 'electra.encoder.layer.9.attention.self.query.weight', 'electra.encoder.layer.9.attention.self.query.bias', 'electra.encoder.layer.9.attention.self.key.weight', 'electra.encoder.layer.9.attention.self.key.bias', 'electra.encoder.layer.9.attention.self.value.weight', 'electra.encoder.layer.9.attention.self.value.bias', 'electra.encoder.layer.9.attention.output.dense.weight', 'electra.encoder.layer.9.attention.output.dense.bias', 'electra.encoder.layer.9.attention.output.LayerNorm.weight', 'electra.encoder.layer.9.attention.output.LayerNorm.bias', 'electra.encoder.layer.9.intermediate.dense.weight', 'electra.encoder.layer.9.intermediate.dense.bias', 'electra.encoder.layer.9.output.dense.weight', 'electra.encoder.layer.9.output.dense.bias', 'electra.encoder.layer.9.output.LayerNorm.weight', 'electra.encoder.layer.9.output.LayerNorm.bias', 'electra.encoder.layer.10.attention.self.query.weight', 'electra.encoder.layer.10.attention.self.query.bias', 'electra.encoder.layer.10.attention.self.key.weight', 'electra.encoder.layer.10.attention.self.key.bias', 'electra.encoder.layer.10.attention.self.value.weight', 'electra.encoder.layer.10.attention.self.value.bias', 'electra.encoder.layer.10.attention.output.dense.weight', 'electra.encoder.layer.10.attention.output.dense.bias', 'electra.encoder.layer.10.attention.output.LayerNorm.weight', 'electra.encoder.layer.10.attention.output.LayerNorm.bias', 'electra.encoder.layer.10.intermediate.dense.weight', 'electra.encoder.layer.10.intermediate.dense.bias', 'electra.encoder.layer.10.output.dense.weight', 'electra.encoder.layer.10.output.dense.bias', 'electra.encoder.layer.10.output.LayerNorm.weight', 'electra.encoder.layer.10.output.LayerNorm.bias', 'electra.encoder.layer.11.attention.self.query.weight', 'electra.encoder.layer.11.attention.self.query.bias', 'electra.encoder.layer.11.attention.self.key.weight', 'electra.encoder.layer.11.attention.self.key.bias', 'electra.encoder.layer.11.attention.self.value.weight', 'electra.encoder.layer.11.attention.self.value.bias', 'electra.encoder.layer.11.attention.output.dense.weight', 'electra.encoder.layer.11.attention.output.dense.bias', 'electra.encoder.layer.11.attention.output.LayerNorm.weight', 'electra.encoder.layer.11.attention.output.LayerNorm.bias', 'electra.encoder.layer.11.intermediate.dense.weight', 'electra.encoder.layer.11.intermediate.dense.bias', 'electra.encoder.layer.11.output.dense.weight', 'electra.encoder.layer.11.output.dense.bias', 'electra.encoder.layer.11.output.LayerNorm.weight', 'electra.encoder.layer.11.output.LayerNorm.bias', 'electra.encoder.layer.12.attention.self.query.weight', 'electra.encoder.layer.12.attention.self.query.bias', 'electra.encoder.layer.12.attention.self.key.weight', 'electra.encoder.layer.12.attention.self.key.bias', 'electra.encoder.layer.12.attention.self.value.weight', 'electra.encoder.layer.12.attention.self.value.bias', 'electra.encoder.layer.12.attention.output.dense.weight', 'electra.encoder.layer.12.attention.output.dense.bias', 'electra.encoder.layer.12.attention.output.LayerNorm.weight', 'electra.encoder.layer.12.attention.output.LayerNorm.bias', 'electra.encoder.layer.12.intermediate.dense.weight', 'electra.encoder.layer.12.intermediate.dense.bias', 'electra.encoder.layer.12.output.dense.weight', 'electra.encoder.layer.12.output.dense.bias', 'electra.encoder.layer.12.output.LayerNorm.weight', 'electra.encoder.layer.12.output.LayerNorm.bias', 'electra.encoder.layer.13.attention.self.query.weight', 'electra.encoder.layer.13.attention.self.query.bias', 'electra.encoder.layer.13.attention.self.key.weight', 'electra.encoder.layer.13.attention.self.key.bias', 'electra.encoder.layer.13.attention.self.value.weight', 'electra.encoder.layer.13.attention.self.value.bias', 'electra.encoder.layer.13.attention.output.dense.weight', 'electra.encoder.layer.13.attention.output.dense.bias', 'electra.encoder.layer.13.attention.output.LayerNorm.weight', 'electra.encoder.layer.13.attention.output.LayerNorm.bias', 'electra.encoder.layer.13.intermediate.dense.weight', 'electra.encoder.layer.13.intermediate.dense.bias', 'electra.encoder.layer.13.output.dense.weight', 'electra.encoder.layer.13.output.dense.bias', 'electra.encoder.layer.13.output.LayerNorm.weight', 'electra.encoder.layer.13.output.LayerNorm.bias', 'electra.encoder.layer.14.attention.self.query.weight', 'electra.encoder.layer.14.attention.self.query.bias', 'electra.encoder.layer.14.attention.self.key.weight', 'electra.encoder.layer.14.attention.self.key.bias', 'electra.encoder.layer.14.attention.self.value.weight', 'electra.encoder.layer.14.attention.self.value.bias', 'electra.encoder.layer.14.attention.output.dense.weight', 'electra.encoder.layer.14.attention.output.dense.bias', 'electra.encoder.layer.14.attention.output.LayerNorm.weight', 'electra.encoder.layer.14.attention.output.LayerNorm.bias', 'electra.encoder.layer.14.intermediate.dense.weight', 'electra.encoder.layer.14.intermediate.dense.bias', 'electra.encoder.layer.14.output.dense.weight', 'electra.encoder.layer.14.output.dense.bias', 'electra.encoder.layer.14.output.LayerNorm.weight', 'electra.encoder.layer.14.output.LayerNorm.bias', 'electra.encoder.layer.15.attention.self.query.weight', 'electra.encoder.layer.15.attention.self.query.bias', 'electra.encoder.layer.15.attention.self.key.weight', 'electra.encoder.layer.15.attention.self.key.bias', 'electra.encoder.layer.15.attention.self.value.weight', 'electra.encoder.layer.15.attention.self.value.bias', 'electra.encoder.layer.15.attention.output.dense.weight', 'electra.encoder.layer.15.attention.output.dense.bias', 'electra.encoder.layer.15.attention.output.LayerNorm.weight', 'electra.encoder.layer.15.attention.output.LayerNorm.bias', 'electra.encoder.layer.15.intermediate.dense.weight', 'electra.encoder.layer.15.intermediate.dense.bias', 'electra.encoder.layer.15.output.dense.weight', 'electra.encoder.layer.15.output.dense.bias', 'electra.encoder.layer.15.output.LayerNorm.weight', 'electra.encoder.layer.15.output.LayerNorm.bias', 'electra.encoder.layer.16.attention.self.query.weight', 'electra.encoder.layer.16.attention.self.query.bias', 'electra.encoder.layer.16.attention.self.key.weight', 'electra.encoder.layer.16.attention.self.key.bias', 'electra.encoder.layer.16.attention.self.value.weight', 'electra.encoder.layer.16.attention.self.value.bias', 'electra.encoder.layer.16.attention.output.dense.weight', 'electra.encoder.layer.16.attention.output.dense.bias', 'electra.encoder.layer.16.attention.output.LayerNorm.weight', 'electra.encoder.layer.16.attention.output.LayerNorm.bias', 'electra.encoder.layer.16.intermediate.dense.weight', 'electra.encoder.layer.16.intermediate.dense.bias', 'electra.encoder.layer.16.output.dense.weight', 'electra.encoder.layer.16.output.dense.bias', 'electra.encoder.layer.16.output.LayerNorm.weight', 'electra.encoder.layer.16.output.LayerNorm.bias', 'electra.encoder.layer.17.attention.self.query.weight', 'electra.encoder.layer.17.attention.self.query.bias', 'electra.encoder.layer.17.attention.self.key.weight', 'electra.encoder.layer.17.attention.self.key.bias', 'electra.encoder.layer.17.attention.self.value.weight', 'electra.encoder.layer.17.attention.self.value.bias', 'electra.encoder.layer.17.attention.output.dense.weight', 'electra.encoder.layer.17.attention.output.dense.bias', 'electra.encoder.layer.17.attention.output.LayerNorm.weight', 'electra.encoder.layer.17.attention.output.LayerNorm.bias', 'electra.encoder.layer.17.intermediate.dense.weight', 'electra.encoder.layer.17.intermediate.dense.bias', 'electra.encoder.layer.17.output.dense.weight', 'electra.encoder.layer.17.output.dense.bias', 'electra.encoder.layer.17.output.LayerNorm.weight', 'electra.encoder.layer.17.output.LayerNorm.bias', 'electra.encoder.layer.18.attention.self.query.weight', 'electra.encoder.layer.18.attention.self.query.bias', 'electra.encoder.layer.18.attention.self.key.weight', 'electra.encoder.layer.18.attention.self.key.bias', 'electra.encoder.layer.18.attention.self.value.weight', 'electra.encoder.layer.18.attention.self.value.bias', 'electra.encoder.layer.18.attention.output.dense.weight', 'electra.encoder.layer.18.attention.output.dense.bias', 'electra.encoder.layer.18.attention.output.LayerNorm.weight', 'electra.encoder.layer.18.attention.output.LayerNorm.bias', 'electra.encoder.layer.18.intermediate.dense.weight', 'electra.encoder.layer.18.intermediate.dense.bias', 'electra.encoder.layer.18.output.dense.weight', 'electra.encoder.layer.18.output.dense.bias', 'electra.encoder.layer.18.output.LayerNorm.weight', 'electra.encoder.layer.18.output.LayerNorm.bias', 'electra.encoder.layer.19.attention.self.query.weight', 'electra.encoder.layer.19.attention.self.query.bias', 'electra.encoder.layer.19.attention.self.key.weight', 'electra.encoder.layer.19.attention.self.key.bias', 'electra.encoder.layer.19.attention.self.value.weight', 'electra.encoder.layer.19.attention.self.value.bias', 'electra.encoder.layer.19.attention.output.dense.weight', 'electra.encoder.layer.19.attention.output.dense.bias', 'electra.encoder.layer.19.attention.output.LayerNorm.weight', 'electra.encoder.layer.19.attention.output.LayerNorm.bias', 'electra.encoder.layer.19.intermediate.dense.weight', 'electra.encoder.layer.19.intermediate.dense.bias', 'electra.encoder.layer.19.output.dense.weight', 'electra.encoder.layer.19.output.dense.bias', 'electra.encoder.layer.19.output.LayerNorm.weight', 'electra.encoder.layer.19.output.LayerNorm.bias', 'electra.encoder.layer.20.attention.self.query.weight', 'electra.encoder.layer.20.attention.self.query.bias', 'electra.encoder.layer.20.attention.self.key.weight', 'electra.encoder.layer.20.attention.self.key.bias', 'electra.encoder.layer.20.attention.self.value.weight', 'electra.encoder.layer.20.attention.self.value.bias', 'electra.encoder.layer.20.attention.output.dense.weight', 'electra.encoder.layer.20.attention.output.dense.bias', 'electra.encoder.layer.20.attention.output.LayerNorm.weight', 'electra.encoder.layer.20.attention.output.LayerNorm.bias', 'electra.encoder.layer.20.intermediate.dense.weight', 'electra.encoder.layer.20.intermediate.dense.bias', 'electra.encoder.layer.20.output.dense.weight', 'electra.encoder.layer.20.output.dense.bias', 'electra.encoder.layer.20.output.LayerNorm.weight', 'electra.encoder.layer.20.output.LayerNorm.bias', 'electra.encoder.layer.21.attention.self.query.weight', 'electra.encoder.layer.21.attention.self.query.bias', 'electra.encoder.layer.21.attention.self.key.weight', 'electra.encoder.layer.21.attention.self.key.bias', 'electra.encoder.layer.21.attention.self.value.weight', 'electra.encoder.layer.21.attention.self.value.bias', 'electra.encoder.layer.21.attention.output.dense.weight', 'electra.encoder.layer.21.attention.output.dense.bias', 'electra.encoder.layer.21.attention.output.LayerNorm.weight', 'electra.encoder.layer.21.attention.output.LayerNorm.bias', 'electra.encoder.layer.21.intermediate.dense.weight', 'electra.encoder.layer.21.intermediate.dense.bias', 'electra.encoder.layer.21.output.dense.weight', 'electra.encoder.layer.21.output.dense.bias', 'electra.encoder.layer.21.output.LayerNorm.weight', 'electra.encoder.layer.21.output.LayerNorm.bias', 'electra.encoder.layer.22.attention.self.query.weight', 'electra.encoder.layer.22.attention.self.query.bias', 'electra.encoder.layer.22.attention.self.key.weight', 'electra.encoder.layer.22.attention.self.key.bias', 'electra.encoder.layer.22.attention.self.value.weight', 'electra.encoder.layer.22.attention.self.value.bias', 'electra.encoder.layer.22.attention.output.dense.weight', 'electra.encoder.layer.22.attention.output.dense.bias', 'electra.encoder.layer.22.attention.output.LayerNorm.weight', 'electra.encoder.layer.22.attention.output.LayerNorm.bias', 'electra.encoder.layer.22.intermediate.dense.weight', 'electra.encoder.layer.22.intermediate.dense.bias', 'electra.encoder.layer.22.output.dense.weight', 'electra.encoder.layer.22.output.dense.bias', 'electra.encoder.layer.22.output.LayerNorm.weight', 'electra.encoder.layer.22.output.LayerNorm.bias', 'electra.encoder.layer.23.attention.self.query.weight', 'electra.encoder.layer.23.attention.self.query.bias', 'electra.encoder.layer.23.attention.self.key.weight', 'electra.encoder.layer.23.attention.self.key.bias', 'electra.encoder.layer.23.attention.self.value.weight', 'electra.encoder.layer.23.attention.self.value.bias', 'electra.encoder.layer.23.attention.output.dense.weight', 'electra.encoder.layer.23.attention.output.dense.bias', 'electra.encoder.layer.23.attention.output.LayerNorm.weight', 'electra.encoder.layer.23.attention.output.LayerNorm.bias', 'electra.encoder.layer.23.intermediate.dense.weight', 'electra.encoder.layer.23.intermediate.dense.bias', 'electra.encoder.layer.23.output.dense.weight', 'electra.encoder.layer.23.output.dense.bias', 'electra.encoder.layer.23.output.LayerNorm.weight', 'electra.encoder.layer.23.output.LayerNorm.bias', 'generator_predictions.LayerNorm.weight', 'generator_predictions.LayerNorm.bias', 'generator_predictions.dense.weight', 'generator_predictions.dense.bias', 'generator_lm_head.weight', 'generator_lm_head.bias']
    - This IS expected if you are initializing BertForMultiTaskClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
    - This IS NOT expected if you are initializing BertForMultiTaskClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Some weights of BertForMultiTaskClassification were not initialized from the model checkpoint at hfl/chinese-electra-small-ex-generator and are newly initialized: ['embeddings.word_embeddings.weight', 'embeddings.position_embeddings.weight', 'embeddings.token_type_embeddings.weight', 'embeddings.LayerNorm.weight', 'embeddings.LayerNorm.bias', 'encoder.layer.0.attention.self.query.weight', 'encoder.layer.0.attention.self.query.bias', 'encoder.layer.0.attention.self.key.weight', 'encoder.layer.0.attention.self.key.bias', 'encoder.layer.0.attention.self.value.weight', 'encoder.layer.0.attention.self.value.bias', 'encoder.layer.0.attention.output.dense.weight', 'encoder.layer.0.attention.output.dense.bias', 'encoder.layer.0.attention.output.LayerNorm.weight', 'encoder.layer.0.attention.output.LayerNorm.bias', 'encoder.layer.0.intermediate.dense.weight', 'encoder.layer.0.intermediate.dense.bias', 'encoder.layer.0.output.dense.weight', 'encoder.layer.0.output.dense.bias', 'encoder.layer.0.output.LayerNorm.weight', 'encoder.layer.0.output.LayerNorm.bias', 'encoder.layer.1.attention.self.query.weight', 'encoder.layer.1.attention.self.query.bias', 'encoder.layer.1.attention.self.key.weight', 'encoder.layer.1.attention.self.key.bias', 'encoder.layer.1.attention.self.value.weight', 'encoder.layer.1.attention.self.value.bias', 'encoder.layer.1.attention.output.dense.weight', 'encoder.layer.1.attention.output.dense.bias', 'encoder.layer.1.attention.output.LayerNorm.weight', 'encoder.layer.1.attention.output.LayerNorm.bias', 'encoder.layer.1.intermediate.dense.weight', 'encoder.layer.1.intermediate.dense.bias', 'encoder.layer.1.output.dense.weight', 'encoder.layer.1.output.dense.bias', 'encoder.layer.1.output.LayerNorm.weight', 'encoder.layer.1.output.LayerNorm.bias', 'encoder.layer.2.attention.self.query.weight', 'encoder.layer.2.attention.self.query.bias', 'encoder.layer.2.attention.self.key.weight', 'encoder.layer.2.attention.self.key.bias', 'encoder.layer.2.attention.self.value.weight', 'encoder.layer.2.attention.self.value.bias', 'encoder.layer.2.attention.output.dense.weight', 'encoder.layer.2.attention.output.dense.bias', 'encoder.layer.2.attention.output.LayerNorm.weight', 'encoder.layer.2.attention.output.LayerNorm.bias', 'encoder.layer.2.intermediate.dense.weight', 'encoder.layer.2.intermediate.dense.bias', 'encoder.layer.2.output.dense.weight', 'encoder.layer.2.output.dense.bias', 'encoder.layer.2.output.LayerNorm.weight', 'encoder.layer.2.output.LayerNorm.bias', 'encoder.layer.3.attention.self.query.weight', 'encoder.layer.3.attention.self.query.bias', 'encoder.layer.3.attention.self.key.weight', 'encoder.layer.3.attention.self.key.bias', 'encoder.layer.3.attention.self.value.weight', 'encoder.layer.3.attention.self.value.bias', 'encoder.layer.3.attention.output.dense.weight', 'encoder.layer.3.attention.output.dense.bias', 'encoder.layer.3.attention.output.LayerNorm.weight', 'encoder.layer.3.attention.output.LayerNorm.bias', 'encoder.layer.3.intermediate.dense.weight', 'encoder.layer.3.intermediate.dense.bias', 'encoder.layer.3.output.dense.weight', 'encoder.layer.3.output.dense.bias', 'encoder.layer.3.output.LayerNorm.weight', 'encoder.layer.3.output.LayerNorm.bias', 'encoder.layer.4.attention.self.query.weight', 'encoder.layer.4.attention.self.query.bias', 'encoder.layer.4.attention.self.key.weight', 'encoder.layer.4.attention.self.key.bias', 'encoder.layer.4.attention.self.value.weight', 'encoder.layer.4.attention.self.value.bias', 'encoder.layer.4.attention.output.dense.weight', 'encoder.layer.4.attention.output.dense.bias', 'encoder.layer.4.attention.output.LayerNorm.weight', 'encoder.layer.4.attention.output.LayerNorm.bias', 'encoder.layer.4.intermediate.dense.weight', 'encoder.layer.4.intermediate.dense.bias', 'encoder.layer.4.output.dense.weight', 'encoder.layer.4.output.dense.bias', 'encoder.layer.4.output.LayerNorm.weight', 'encoder.layer.4.output.LayerNorm.bias', 'encoder.layer.5.attention.self.query.weight', 'encoder.layer.5.attention.self.query.bias', 'encoder.layer.5.attention.self.key.weight', 'encoder.layer.5.attention.self.key.bias', 'encoder.layer.5.attention.self.value.weight', 'encoder.layer.5.attention.self.value.bias', 'encoder.layer.5.attention.output.dense.weight', 'encoder.layer.5.attention.output.dense.bias', 'encoder.layer.5.attention.output.LayerNorm.weight', 'encoder.layer.5.attention.output.LayerNorm.bias', 'encoder.layer.5.intermediate.dense.weight', 'encoder.layer.5.intermediate.dense.bias', 'encoder.layer.5.output.dense.weight', 'encoder.layer.5.output.dense.bias', 'encoder.layer.5.output.LayerNorm.weight', 'encoder.layer.5.output.LayerNorm.bias', 'encoder.layer.6.attention.self.query.weight', 'encoder.layer.6.attention.self.query.bias', 'encoder.layer.6.attention.self.key.weight', 'encoder.layer.6.attention.self.key.bias', 'encoder.layer.6.attention.self.value.weight', 'encoder.layer.6.attention.self.value.bias', 'encoder.layer.6.attention.output.dense.weight', 'encoder.layer.6.attention.output.dense.bias', 'encoder.layer.6.attention.output.LayerNorm.weight', 'encoder.layer.6.attention.output.LayerNorm.bias', 'encoder.layer.6.intermediate.dense.weight', 'encoder.layer.6.intermediate.dense.bias', 'encoder.layer.6.output.dense.weight', 'encoder.layer.6.output.dense.bias', 'encoder.layer.6.output.LayerNorm.weight', 'encoder.layer.6.output.LayerNorm.bias', 'encoder.layer.7.attention.self.query.weight', 'encoder.layer.7.attention.self.query.bias', 'encoder.layer.7.attention.self.key.weight', 'encoder.layer.7.attention.self.key.bias', 'encoder.layer.7.attention.self.value.weight', 'encoder.layer.7.attention.self.value.bias', 'encoder.layer.7.attention.output.dense.weight', 'encoder.layer.7.attention.output.dense.bias', 'encoder.layer.7.attention.output.LayerNorm.weight', 'encoder.layer.7.attention.output.LayerNorm.bias', 'encoder.layer.7.intermediate.dense.weight', 'encoder.layer.7.intermediate.dense.bias', 'encoder.layer.7.output.dense.weight', 'encoder.layer.7.output.dense.bias', 'encoder.layer.7.output.LayerNorm.weight', 'encoder.layer.7.output.LayerNorm.bias', 'encoder.layer.8.attention.self.query.weight', 'encoder.layer.8.attention.self.query.bias', 'encoder.layer.8.attention.self.key.weight', 'encoder.layer.8.attention.self.key.bias', 'encoder.layer.8.attention.self.value.weight', 'encoder.layer.8.attention.self.value.bias', 'encoder.layer.8.attention.output.dense.weight', 'encoder.layer.8.attention.output.dense.bias', 'encoder.layer.8.attention.output.LayerNorm.weight', 'encoder.layer.8.attention.output.LayerNorm.bias', 'encoder.layer.8.intermediate.dense.weight', 'encoder.layer.8.intermediate.dense.bias', 'encoder.layer.8.output.dense.weight', 'encoder.layer.8.output.dense.bias', 'encoder.layer.8.output.LayerNorm.weight', 'encoder.layer.8.output.LayerNorm.bias', 'encoder.layer.9.attention.self.query.weight', 'encoder.layer.9.attention.self.query.bias', 'encoder.layer.9.attention.self.key.weight', 'encoder.layer.9.attention.self.key.bias', 'encoder.layer.9.attention.self.value.weight', 'encoder.layer.9.attention.self.value.bias', 'encoder.layer.9.attention.output.dense.weight', 'encoder.layer.9.attention.output.dense.bias', 'encoder.layer.9.attention.output.LayerNorm.weight', 'encoder.layer.9.attention.output.LayerNorm.bias', 'encoder.layer.9.intermediate.dense.weight', 'encoder.layer.9.intermediate.dense.bias', 'encoder.layer.9.output.dense.weight', 'encoder.layer.9.output.dense.bias', 'encoder.layer.9.output.LayerNorm.weight', 'encoder.layer.9.output.LayerNorm.bias', 'encoder.layer.10.attention.self.query.weight', 'encoder.layer.10.attention.self.query.bias', 'encoder.layer.10.attention.self.key.weight', 'encoder.layer.10.attention.self.key.bias', 'encoder.layer.10.attention.self.value.weight', 'encoder.layer.10.attention.self.value.bias', 'encoder.layer.10.attention.output.dense.weight', 'encoder.layer.10.attention.output.dense.bias', 'encoder.layer.10.attention.output.LayerNorm.weight', 'encoder.layer.10.attention.output.LayerNorm.bias', 'encoder.layer.10.intermediate.dense.weight', 'encoder.layer.10.intermediate.dense.bias', 'encoder.layer.10.output.dense.weight', 'encoder.layer.10.output.dense.bias', 'encoder.layer.10.output.LayerNorm.weight', 'encoder.layer.10.output.LayerNorm.bias', 'encoder.layer.11.attention.self.query.weight', 'encoder.layer.11.attention.self.query.bias', 'encoder.layer.11.attention.self.key.weight', 'encoder.layer.11.attention.self.key.bias', 'encoder.layer.11.attention.self.value.weight', 'encoder.layer.11.attention.self.value.bias', 'encoder.layer.11.attention.output.dense.weight', 'encoder.layer.11.attention.output.dense.bias', 'encoder.layer.11.attention.output.LayerNorm.weight', 'encoder.layer.11.attention.output.LayerNorm.bias', 'encoder.layer.11.intermediate.dense.weight', 'encoder.layer.11.intermediate.dense.bias', 'encoder.layer.11.output.dense.weight', 'encoder.layer.11.output.dense.bias', 'encoder.layer.11.output.LayerNorm.weight', 'encoder.layer.11.output.LayerNorm.bias', 'encoder.layer.12.attention.self.query.weight', 'encoder.layer.12.attention.self.query.bias', 'encoder.layer.12.attention.self.key.weight', 'encoder.layer.12.attention.self.key.bias', 'encoder.layer.12.attention.self.value.weight', 'encoder.layer.12.attention.self.value.bias', 'encoder.layer.12.attention.output.dense.weight', 'encoder.layer.12.attention.output.dense.bias', 'encoder.layer.12.attention.output.LayerNorm.weight', 'encoder.layer.12.attention.output.LayerNorm.bias', 'encoder.layer.12.intermediate.dense.weight', 'encoder.layer.12.intermediate.dense.bias', 'encoder.layer.12.output.dense.weight', 'encoder.layer.12.output.dense.bias', 'encoder.layer.12.output.LayerNorm.weight', 'encoder.layer.12.output.LayerNorm.bias', 'encoder.layer.13.attention.self.query.weight', 'encoder.layer.13.attention.self.query.bias', 'encoder.layer.13.attention.self.key.weight', 'encoder.layer.13.attention.self.key.bias', 'encoder.layer.13.attention.self.value.weight', 'encoder.layer.13.attention.self.value.bias', 'encoder.layer.13.attention.output.dense.weight', 'encoder.layer.13.attention.output.dense.bias', 'encoder.layer.13.attention.output.LayerNorm.weight', 'encoder.layer.13.attention.output.LayerNorm.bias', 'encoder.layer.13.intermediate.dense.weight', 'encoder.layer.13.intermediate.dense.bias', 'encoder.layer.13.output.dense.weight', 'encoder.layer.13.output.dense.bias', 'encoder.layer.13.output.LayerNorm.weight', 'encoder.layer.13.output.LayerNorm.bias', 'encoder.layer.14.attention.self.query.weight', 'encoder.layer.14.attention.self.query.bias', 'encoder.layer.14.attention.self.key.weight', 'encoder.layer.14.attention.self.key.bias', 'encoder.layer.14.attention.self.value.weight', 'encoder.layer.14.attention.self.value.bias', 'encoder.layer.14.attention.output.dense.weight', 'encoder.layer.14.attention.output.dense.bias', 'encoder.layer.14.attention.output.LayerNorm.weight', 'encoder.layer.14.attention.output.LayerNorm.bias', 'encoder.layer.14.intermediate.dense.weight', 'encoder.layer.14.intermediate.dense.bias', 'encoder.layer.14.output.dense.weight', 'encoder.layer.14.output.dense.bias', 'encoder.layer.14.output.LayerNorm.weight', 'encoder.layer.14.output.LayerNorm.bias', 'encoder.layer.15.attention.self.query.weight', 'encoder.layer.15.attention.self.query.bias', 'encoder.layer.15.attention.self.key.weight', 'encoder.layer.15.attention.self.key.bias', 'encoder.layer.15.attention.self.value.weight', 'encoder.layer.15.attention.self.value.bias', 'encoder.layer.15.attention.output.dense.weight', 'encoder.layer.15.attention.output.dense.bias', 'encoder.layer.15.attention.output.LayerNorm.weight', 'encoder.layer.15.attention.output.LayerNorm.bias', 'encoder.layer.15.intermediate.dense.weight', 'encoder.layer.15.intermediate.dense.bias', 'encoder.layer.15.output.dense.weight', 'encoder.layer.15.output.dense.bias', 'encoder.layer.15.output.LayerNorm.weight', 'encoder.layer.15.output.LayerNorm.bias', 'encoder.layer.16.attention.self.query.weight', 'encoder.layer.16.attention.self.query.bias', 'encoder.layer.16.attention.self.key.weight', 'encoder.layer.16.attention.self.key.bias', 'encoder.layer.16.attention.self.value.weight', 'encoder.layer.16.attention.self.value.bias', 'encoder.layer.16.attention.output.dense.weight', 'encoder.layer.16.attention.output.dense.bias', 'encoder.layer.16.attention.output.LayerNorm.weight', 'encoder.layer.16.attention.output.LayerNorm.bias', 'encoder.layer.16.intermediate.dense.weight', 'encoder.layer.16.intermediate.dense.bias', 'encoder.layer.16.output.dense.weight', 'encoder.layer.16.output.dense.bias', 'encoder.layer.16.output.LayerNorm.weight', 'encoder.layer.16.output.LayerNorm.bias', 'encoder.layer.17.attention.self.query.weight', 'encoder.layer.17.attention.self.query.bias', 'encoder.layer.17.attention.self.key.weight', 'encoder.layer.17.attention.self.key.bias', 'encoder.layer.17.attention.self.value.weight', 'encoder.layer.17.attention.self.value.bias', 'encoder.layer.17.attention.output.dense.weight', 'encoder.layer.17.attention.output.dense.bias', 'encoder.layer.17.attention.output.LayerNorm.weight', 'encoder.layer.17.attention.output.LayerNorm.bias', 'encoder.layer.17.intermediate.dense.weight', 'encoder.layer.17.intermediate.dense.bias', 'encoder.layer.17.output.dense.weight', 'encoder.layer.17.output.dense.bias', 'encoder.layer.17.output.LayerNorm.weight', 'encoder.layer.17.output.LayerNorm.bias', 'encoder.layer.18.attention.self.query.weight', 'encoder.layer.18.attention.self.query.bias', 'encoder.layer.18.attention.self.key.weight', 'encoder.layer.18.attention.self.key.bias', 'encoder.layer.18.attention.self.value.weight', 'encoder.layer.18.attention.self.value.bias', 'encoder.layer.18.attention.output.dense.weight', 'encoder.layer.18.attention.output.dense.bias', 'encoder.layer.18.attention.output.LayerNorm.weight', 'encoder.layer.18.attention.output.LayerNorm.bias', 'encoder.layer.18.intermediate.dense.weight', 'encoder.layer.18.intermediate.dense.bias', 'encoder.layer.18.output.dense.weight', 'encoder.layer.18.output.dense.bias', 'encoder.layer.18.output.LayerNorm.weight', 'encoder.layer.18.output.LayerNorm.bias', 'encoder.layer.19.attention.self.query.weight', 'encoder.layer.19.attention.self.query.bias', 'encoder.layer.19.attention.self.key.weight', 'encoder.layer.19.attention.self.key.bias', 'encoder.layer.19.attention.self.value.weight', 'encoder.layer.19.attention.self.value.bias', 'encoder.layer.19.attention.output.dense.weight', 'encoder.layer.19.attention.output.dense.bias', 'encoder.layer.19.attention.output.LayerNorm.weight', 'encoder.layer.19.attention.output.LayerNorm.bias', 'encoder.layer.19.intermediate.dense.weight', 'encoder.layer.19.intermediate.dense.bias', 'encoder.layer.19.output.dense.weight', 'encoder.layer.19.output.dense.bias', 'encoder.layer.19.output.LayerNorm.weight', 'encoder.layer.19.output.LayerNorm.bias', 'encoder.layer.20.attention.self.query.weight', 'encoder.layer.20.attention.self.query.bias', 'encoder.layer.20.attention.self.key.weight', 'encoder.layer.20.attention.self.key.bias', 'encoder.layer.20.attention.self.value.weight', 'encoder.layer.20.attention.self.value.bias', 'encoder.layer.20.attention.output.dense.weight', 'encoder.layer.20.attention.output.dense.bias', 'encoder.layer.20.attention.output.LayerNorm.weight', 'encoder.layer.20.attention.output.LayerNorm.bias', 'encoder.layer.20.intermediate.dense.weight', 'encoder.layer.20.intermediate.dense.bias', 'encoder.layer.20.output.dense.weight', 'encoder.layer.20.output.dense.bias', 'encoder.layer.20.output.LayerNorm.weight', 'encoder.layer.20.output.LayerNorm.bias', 'encoder.layer.21.attention.self.query.weight', 'encoder.layer.21.attention.self.query.bias', 'encoder.layer.21.attention.self.key.weight', 'encoder.layer.21.attention.self.key.bias', 'encoder.layer.21.attention.self.value.weight', 'encoder.layer.21.attention.self.value.bias', 'encoder.layer.21.attention.output.dense.weight', 'encoder.layer.21.attention.output.dense.bias', 'encoder.layer.21.attention.output.LayerNorm.weight', 'encoder.layer.21.attention.output.LayerNorm.bias', 'encoder.layer.21.intermediate.dense.weight', 'encoder.layer.21.intermediate.dense.bias', 'encoder.layer.21.output.dense.weight', 'encoder.layer.21.output.dense.bias', 'encoder.layer.21.output.LayerNorm.weight', 'encoder.layer.21.output.LayerNorm.bias', 'encoder.layer.22.attention.self.query.weight', 'encoder.layer.22.attention.self.query.bias', 'encoder.layer.22.attention.self.key.weight', 'encoder.layer.22.attention.self.key.bias', 'encoder.layer.22.attention.self.value.weight', 'encoder.layer.22.attention.self.value.bias', 'encoder.layer.22.attention.output.dense.weight', 'encoder.layer.22.attention.output.dense.bias', 'encoder.layer.22.attention.output.LayerNorm.weight', 'encoder.layer.22.attention.output.LayerNorm.bias', 'encoder.layer.22.intermediate.dense.weight', 'encoder.layer.22.intermediate.dense.bias', 'encoder.layer.22.output.dense.weight', 'encoder.layer.22.output.dense.bias', 'encoder.layer.22.output.LayerNorm.weight', 'encoder.layer.22.output.LayerNorm.bias', 'encoder.layer.23.attention.self.query.weight', 'encoder.layer.23.attention.self.query.bias', 'encoder.layer.23.attention.self.key.weight', 'encoder.layer.23.attention.self.key.bias', 'encoder.layer.23.attention.self.value.weight', 'encoder.layer.23.attention.self.value.bias', 'encoder.layer.23.attention.output.dense.weight', 'encoder.layer.23.attention.output.dense.bias', 'encoder.layer.23.attention.output.LayerNorm.weight', 'encoder.layer.23.attention.output.LayerNorm.bias', 'encoder.layer.23.intermediate.dense.weight', 'encoder.layer.23.intermediate.dense.bias', 'encoder.layer.23.output.dense.weight', 'encoder.layer.23.output.dense.bias', 'encoder.layer.23.output.LayerNorm.weight', 'encoder.layer.23.output.LayerNorm.bias', 'pooler.dense.weight', 'pooler.dense.bias', 'new_classifiers.ocnli.weight', 'new_classifiers.ocnli.bias', 'new_classifiers.tnews.bias', 'new_classifiers.tnews.weight', 'new_classifiers.ocemotion.weight', 'new_classifiers.ocemotion.bias']
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
    
    stale 
    opened by SunYanCN 6
  • 请问下这个项目和google electra 项目的区别

    请问下这个项目和google electra 项目的区别

    您好 , 关于咱们这个项目和 Google electra 的区别。 我个人理解就是两个不同 一个是我们提供了已有的中文预训练模型, 另一个是词表 我们用的21128 也即是bert中文的那个词表 。假如我们要继续预训练 我们要用 build_dataset.py 用21128词表产生tfrecord 然后进行。 其余的 比如 run_pretraining.py run_finetuning.py 和 configure_finetune configure_pretrain 几个文件都和google那个英文版完全一样的 请问是这样吗。 谢谢

    stale 
    opened by 652994331 6
  • huggingface.co上chinese-electra-180g-small-discriminator的文件里缺少tokenizer.json

    huggingface.co上chinese-electra-180g-small-discriminator的文件里缺少tokenizer.json

    您好,下载模型及相关文件时(https://huggingface.co/hfl/chinese-electra-180g-small-discriminator/tree/main)未发现tokenizer.json,能否生成然后上传一下。 看google/electra-base-discriminator(https://huggingface.co/google/electra-base-discriminator)是有tokenizer.json的,这个tokenizer.json是不能直接用于chinese-electra-180g-small-discriminator模型吧

    opened by zihoudanyuan 5
  • 关于ELECTRA的预训练方法

    关于ELECTRA的预训练方法

    ELECTRA由生成器+判决器构成,生成器负责把[MASK]替换成实际tokens,判决器负责区分替换结果是否和实际data中相同

    但是,论文提及:

    1. Typically k = [0.15n], i.e., 15% of the tokens are masked out

    2. if the generator happens to generate the correct token, that token is considered “real” instead of “fake

    也就是说,只有 (1 - generator_inference_acc) * 0.15的token会被预测成fake,训练到后期就是一个极度不均衡的二分类问题(对于判决器而言),为什么判决器不会受到影响呢?

    opened by real-brilliant 5
  • docs: demo, experiments and live inference API on Tiyaro

    docs: demo, experiments and live inference API on Tiyaro

    Hello Maintainer of Github repo ymcui/Chinese-ELECTRA!

    Thank you for your work on ymcui/Chinese-ELECTRA. This GitHub project is interesting, and we think that it would be a great addition to make this work instantly discoverable & available as an API for all your users, to quickly try and use it in their applications.

    The list of model card(s) covered by this PR are:

    • https://console.tiyaro.ai/explore/hfl-chinese-bert-wwm
    • https://console.tiyaro.ai/explore/hfl-chinese-bert-wwm-ext
    • https://console.tiyaro.ai/explore/hfl-chinese-macbert-base
    • https://console.tiyaro.ai/explore/hfl-chinese-roberta-wwm-ext
    • https://console.tiyaro.ai/explore/hfl-chinese-roberta-wwm-ext-large
    • https://console.tiyaro.ai/explore/hfl-rbt3
    • https://console.tiyaro.ai/explore/hfl-rbt6

    On Tiyaro, every model in ymcui/Chinese-ELECTRA will get its own:

    • Dedicated model card (e.g. https://console.tiyaro.ai/explore/hfl-chinese-bert-wwm
    • Model demo (e.g. https://console.tiyaro.ai/explore/hfl-chinese-bert-wwm/demo)
    • Unique Inference API (e.g. https://api.tiyaro.ai/explore/huggingface/1//hfl/chinese-bert-wwm)
    • Sample code snippets and swagger spec for the API

    Users will also be able to compare your model with other models of similar types on various parameters using Tiyaro Experiments (https://tiyaro.ai/blog/ocr/)

    —- I am from Tiyaro.ai (https://tiyaro.ai/). We are working on enabling developers to instantly evaluate, use and customize the world’s best AI. We are constantly working on adding new features to Tiyaro EasyTrain, EasyServe & Experiments, to make the best use of your ML model, and making AI more accessible for anyone.

    Sincerely, I-Jong Lin

    stale 
    opened by ijonglin 4
  • Bert service 使用 chinese-electra-base 模型,输出结果有问题

    Bert service 使用 chinese-electra-base 模型,输出结果有问题

    环境:

    PaddleHub 1.8.1 PaddlePaddle 1.8.4 paddle-gpu-serving >=0.8.2 ujson >=1.35

    # coding: utf8
    from paddlehub.serving.bert_serving import bs_client
    
    if __name__ == "__main__":
        # 初始化bert_service客户端BSClient
        bc = bs_client.BSClient(module_name="ernie", server="127.0.0.1:8866")
    
        # 输入要做embedding的文本
        # 文本格式为[["文本1"]]
        input_text = [
            ["西风吹老洞庭波"]
        ]
    
        # BSClient.get_result()获取结果
        result = bc.get_result(input_text=input_text)
    
        # 打印输入文本的embedding结果
        for item in result:
            print(item)
    

    在使用 bert_service_client 时,引入erine 模型的 get_result是一个数组,但是 chinese-electra-base 就只有一个值, 请问是为什么呢?然后可以让 chinese-electra-base 模型的输出结果也是一个和 erine 结果维度一样的数组吗?

    image image

    wontfix stale 
    opened by Echo0117 4
  • 请问finetune时应如何设置token type id?

    请问finetune时应如何设置token type id?

    在Bert中若处理输入为两个句子的相关任务(例如语义相似性打分等),常使用token_type_embedding对两个句子分别加上不同的embedding;这一做法只需要在transformers的API中设置token_type_id(一个句子全为0,另一个全为1)即可实现。 然而,electra的预训练好像取消了NSP任务,相应的也就没有训练这个句子embedding(抱歉我不是很确定,只是看了一下论文好像没写这一点😂),所以我想请教一下使用token_type_id这一做法是否在electra中也可以通用呢?如果不行,对于两个句子的输入,推荐的处理方法是什么呢?谢谢!

    opened by AOZMH 4
  • ELECTRA-small, Chinese 权重丢了5个?

    ELECTRA-small, Chinese 权重丢了5个?

    提供开源模型的老师,您好,我通过讯飞云下载的模型, 解析ELECTRA-small模型发现有五个权重找不到。 error electra/encoder/layer_9/intermediate/dense/kernel error electra/encoder/layer_9/output/LayerNorm/beta error electra/encoder/layer_9/output/LayerNorm/gamma error electra/encoder/layer_9/output/dense/bias error electra/encoder/layer_9/output/dense/kernel

    请检查是否存在该问题,十分感谢。

    换了个源下载是ok的,抱歉

    opened by qgzang 4
  • 最好还是配上对应的json

    最好还是配上对应的json

    解压后发现没有json文件,而不少框架都是根据json文件来读取模型基本结构的,建议还是配上。

    比如small版

    {
      "attention_probs_dropout_prob": 0.1,
      "directionality": "bidi",
      "hidden_act": "gelu",
      "hidden_dropout_prob": 0.1,
      "hidden_size": 256,
      "initializer_range": 0.02,
      "intermediate_size": 1024,
      "max_position_embeddings": 512,
      "num_attention_heads": 4,
      "num_hidden_layers": 12,
      "pooler_fc_size": 768,
      "pooler_num_attention_heads": 12,
      "pooler_num_fc_layers": 3,
      "pooler_size_per_head": 128,
      "pooler_type": "first_token_transform",
      "type_vocab_size": 2,
      "vocab_size": 21128,
      "embedding_size": 128
    }
    

    还有,不知道为啥ckpt的命名不加上ckpt...

    最后,最新版bert4keras(0.6.4)已经能加载electra了,只需要在build_transformer_model里边传入model='electra',欢迎用bert4keras调用哈哈~

    opened by bojone 4
Owner
Yiming Cui
NLP Researcher. Mainly interested in Machine Reading Comprehension, Question Answering, Pre-trained Language Model, etc.
Yiming Cui
Code for CodeT5: a new code-aware pre-trained encoder-decoder model.

CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation This is the official PyTorch implementation

Salesforce 564 Jan 8, 2023
Chinese NER with albert/electra or other bert descendable model (keras)

Chinese NLP (albert/electra with Keras) Named Entity Recognization Project Structure ./ ├── NER │   ├── __init__.py │   ├── log

null 2 Nov 20, 2022
BPEmb is a collection of pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) and trained on Wikipedia.

BPEmb is a collection of pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) and trained on Wikipedia. Its intended use is as input for neural models in natural language processing.

Benjamin Heinzerling 1.1k Jan 3, 2023
Korean stereoypte detector with TUNiB-Electra and K-StereoSet

Korean Stereotype Detector Korean stereotype sentence classifier using K-StereoSet with TUNiB-Electra Web demo you can test this model easily in demo

Sae_Chan_Oh 11 Feb 18, 2022
A Multi-modal Model Chinese Spell Checker Released on ACL2021.

ReaLiSe ReaLiSe is a multi-modal Chinese spell checking model. This the office code for the paper Read, Listen, and See: Leveraging Multimodal Informa

DaDa 106 Dec 29, 2022
Python interface for converting Penn Treebank trees to Stanford Dependencies and Universal Depenencies

PyStanfordDependencies Python interface for converting Penn Treebank trees to Universal Dependencies and Stanford Dependencies. Example usage Start by

David McClosky 64 May 8, 2022
Implementation of Natural Language Code Search in the project CodeBERT: A Pre-Trained Model for Programming and Natural Languages.

CodeBERT-Implementation In this repo we have replicated the paper CodeBERT: A Pre-Trained Model for Programming and Natural Languages. We are interest

Tanuj Sur 4 Jul 1, 2022
Python wrapper for Stanford CoreNLP tools v3.4.1

Python interface to Stanford Core NLP tools v3.4.1 This is a Python wrapper for Stanford University's NLP group's Java-based CoreNLP tools. It can eit

Dustin Smith 610 Sep 7, 2022
Official Stanford NLP Python Library for Many Human Languages

Stanza: A Python NLP Library for Many Human Languages The Stanford NLP Group's official Python NLP library. It contains support for running various ac

Stanford NLP 6.4k Jan 2, 2023
Official Stanford NLP Python Library for Many Human Languages

Stanza: A Python NLP Library for Many Human Languages The Stanford NLP Group's official Python NLP library. It contains support for running various ac

Stanford NLP 5.2k Feb 12, 2021
Official Stanford NLP Python Library for Many Human Languages

Stanza: A Python NLP Library for Many Human Languages The Stanford NLP Group's official Python NLP library. It contains support for running various ac

Stanford NLP 5.2k Feb 17, 2021
TunBERT is the first release of a pre-trained BERT model for the Tunisian dialect using a Tunisian Common-Crawl-based dataset.

TunBERT is the first release of a pre-trained BERT model for the Tunisian dialect using a Tunisian Common-Crawl-based dataset. TunBERT was applied to three NLP downstream tasks: Sentiment Analysis (SA), Tunisian Dialect Identification (TDI) and Reading Comprehension Question-Answering (RCQA)

InstaDeep Ltd 72 Dec 9, 2022
DziriBERT: a Pre-trained Language Model for the Algerian Dialect

DziriBERT is the first Transformer-based Language Model that has been pre-trained specifically for the Algerian Dialect.

null 117 Jan 7, 2023
ElasticBERT: A pre-trained model with multi-exit transformer architecture.

This repository contains finetuning code and checkpoints for ElasticBERT. Towards Efficient NLP: A Standard Evaluation and A Strong Baseli

fastNLP 48 Dec 14, 2022
CCQA A New Web-Scale Question Answering Dataset for Model Pre-Training

CCQA: A New Web-Scale Question Answering Dataset for Model Pre-Training This is the official repository for the code and models of the paper CCQA: A N

Meta Research 29 Nov 30, 2022
Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

Alexander Veysov 3.2k Dec 31, 2022
PyTorch Implementation of "Bridging Pre-trained Language Models and Hand-crafted Features for Unsupervised POS Tagging" (Findings of ACL 2022)

Feature_CRF_AE Feature_CRF_AE provides a implementation of Bridging Pre-trained Language Models and Hand-crafted Features for Unsupervised POS Tagging

Jacob Zhou 6 Apr 29, 2022
Code associated with the "Data Augmentation using Pre-trained Transformer Models" paper

Data Augmentation using Pre-trained Transformer Models Code associated with the Data Augmentation using Pre-trained Transformer Models paper Code cont

null 44 Dec 31, 2022
Must-read papers on improving efficiency for pre-trained language models.

Must-read papers on improving efficiency for pre-trained language models.

Tobias Lee 89 Jan 3, 2023