CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation

fastNLP

Last update: Dec 29, 2022

Related tags

Deep Learning text-generation chinese pretrained-models ptms language-understanding transformer-architecture

Overview

CPT

This repository contains code and checkpoints for CPT.

CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation

Yunfan Shao, Zhichao Geng, Yitao Liu, Junqi Dai, Fei Yang, Li Zhe, Hujun Bao, Xipeng Qiu

Introduction

Aiming to unify both NLU and NLG tasks, we propose a novel Chinese Pre-trained Unbalanced Transformer (CPT), an unbalanced Transformer encoder-decoder pre-trained jointly with masked language modeling (MLM) and denoising auto-encoding (DAE).

The architecture of CPT is a variant of the full Transformer and consists of three parts:

Shared Encoder (S-Enc): a Transformer encoder with fully-connected self-attention, which is designed to capture the common semantic representation for both language understanding and generation.
Understanding Decoder (U-Dec): a shallow Transformer encoder with fully-connected self-attention, which is designed for NLU tasks. The input of U-Dec is the output of S-Enc.
Generation Decoder (G-Dec): a Transformer decoder with masked self-attention, which is designed for generation tasks in an auto-regressive fashion. G-Dec uses the output of S-Enc through cross-attention.
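The two attention patterns and the two task paths described above can be illustrated with a small, framework-free sketch. The names `full_mask`, `causal_mask`, and `ROUTES` are made up for illustration and are not taken from the CPT code base.

```python
# Illustrative sketch only; all names are hypothetical, not from the CPT source.

def full_mask(n):
    """Fully-connected self-attention (S-Enc, U-Dec): every position sees every position."""
    return [[1] * n for _ in range(n)]


def causal_mask(n):
    """Masked self-attention (G-Dec): position i sees only positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


# Components each kind of task passes through, following the description above.
ROUTES = {
    "understanding": ["S-Enc", "U-Dec"],
    "generation": ["S-Enc", "G-Dec"],
}

if __name__ == "__main__":
    for row in causal_mask(4):
        print(row)
```

Both routes start at S-Enc, which is why the shared encoder is the part that has to carry a representation useful for understanding and generation alike.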

Downloads & Usage

Coming soon.

Chinese BART

We also provide a pre-trained Chinese BART as a byproduct. The BART models are pre-trained with the same corpora, tokenization and hyper-parameters as CPT.

Load with Huggingface-Transformers

Chinese BART is available in base and large versions, and can be loaded with Huggingface-Transformers. The example code is as follows, where MODEL_NAME is fnlp/bart-base-chinese or fnlp/bart-large-chinese for base or large size of BART, respectively.

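>>> from transformers import BertTokenizer, BartForConditionalGeneration
>>> MODEL_NAME = "fnlp/bart-base-chinese"  # or "fnlp/bart-large-chinese"
>>> tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
>>> model = BartForConditionalGeneration.from_pretrained(MODEL_NAME)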
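A small helper can make the choice between the two checkpoints named above explicit. This is a minimal sketch: `resolve_model_name` and `load_chinese_bart` are hypothetical helper names, and the `transformers` import is deferred so that the name lookup also works where the library is not installed.

```python
CHECKPOINTS = {
    "base": "fnlp/bart-base-chinese",
    "large": "fnlp/bart-large-chinese",
}


def resolve_model_name(size="base"):
    """Map a size label to the checkpoint name given in the text above."""
    try:
        return CHECKPOINTS[size]
    except KeyError:
        raise ValueError(f"size must be one of {sorted(CHECKPOINTS)}, got {size!r}")


def load_chinese_bart(size="base"):
    """Load tokenizer and model; weights are downloaded on first use."""
    # Imported lazily so that resolve_model_name has no heavy dependency.
    from transformers import BertTokenizer, BartForConditionalGeneration

    name = resolve_model_name(size)
    tokenizer = BertTokenizer.from_pretrained(name)
    model = BartForConditionalGeneration.from_pretrained(name)
    return tokenizer, model
```

Calling `load_chinese_bart("large")` would then return the tokenizer and model for the large checkpoint.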

Citation

@article{shao2021cpt,
  title={CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation}, 
  author={Yunfan Shao and Zhichao Geng and Yitao Liu and Junqi Dai and Fei Yang and Li Zhe and Hujun Bao and Xipeng Qiu},
  journal={arXiv preprint arXiv:2109.05729},
  year={2021}
}

Comments

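generation: results on the LCSTS dataset fall short of the paper
Hello, I recently ran your code on the LCSTS dataset and got a ROUGE-L of only 31, whereas the paper reports about 38, which is a large gap.

I downloaded the dataset from the web and converted it to the following format: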

{"summarization": "可穿戴技术十大设计原则", "article": "本文总结了十个可穿戴产品的设计原则，而这些原则，同样也是笔者认为是这个行业最吸引人的地方：1.为人们解决重复性问题；2.从人开始，而不是从机器开始；3.要引起注意，但不要刻意；4.提升用户能力，而不是取代人"}

I only changed the file paths in the code and nothing else. Where might the problem be? Are the default hyperparameters in run_gen.py the optimal ones?
opened by zhoucz97 17
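The record format quoted in the comment above can be produced and read back with the standard library. This is a minimal sketch that assumes one JSON object per line; the helper names `to_record`, `dump_jsonl`, and `load_jsonl` are hypothetical and do not come from the repository.

```python
import json


def to_record(article, summary):
    """Build one record with the two keys shown in the sample above."""
    return {"summarization": summary, "article": article}


def dump_jsonl(pairs, path):
    """Write (article, summary) pairs as one JSON object per line (assumed layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for article, summary in pairs:
            # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes.
            f.write(json.dumps(to_record(article, summary), ensure_ascii=False) + "\n")


def load_jsonl(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Reading a few records back with `load_jsonl` is a quick way to confirm that the keys and the text encoding survived preprocessing before starting a long training run.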
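Traditional Chinese characters appear when fine-tuning BART large directly with Huggingface code

The following sample is from the training set. After 1000 epochs of training, not only did 预算 become 預算 (Simplified to Traditional), but A=SM also became a=sm (upper to lower case). In other words, the model did not even fit the training set, although the training loss was close to 0.

Generated: 题目：《sm公司全面预算管理问题研究》，句式：a，其中a=sm公园公司的全面預算管辖问题探究
Label: 题目：《SM 公司全面预算管理问题研究》，句式：A，其中A=SM 公司全面预算管理问题研究

What might be the cause?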

opened by yht4work 7
Problem with the NER model

I ran the command you provided:

python -m torch.distributed.launch --nproc_per_node 1 --nnodes 1 \
train_msra.py \
--ptm_name fnlp/cpt-base \
--dataset '' \
--use_decoder 0 \
--batch_size 16 \
--update_every 1

It then reports the following error:

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel, and by making sure all forward function outputs participate in calculating loss.

opened by suhejian 6
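Error when fine-tuning CPT on multiple GPUs

After setting the model parameters, running python -m torch.distributed.launch --nproc_per_node 4 run_gen.py fails because local_rank has to be passed in as an argument. If I add --local_rank through parser.add_argument, multi-GPU training still fails as a whole. How should CPT be fine-tuned on multiple GPUs? I would be very grateful for an official usage guide. Thanks!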

opened by aidejieceng 4
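For context on the `--local_rank` requirement mentioned above: `torch.distributed.launch` passes a local-rank argument to each worker process by default, while `torchrun` exposes it through the `LOCAL_RANK` environment variable. The following is a minimal sketch of a parser that accepts both. `build_parser` and `parse_local_rank` are hypothetical names, and whether this alone resolves the error in `run_gen.py` is not confirmed here.

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        # Some newer PyTorch releases spell the flag with a hyphen, so accept both.
        "--local_rank", "--local-rank",
        type=int,
        # Fall back to the environment variable set by torchrun; -1 means "not distributed".
        default=int(os.environ.get("LOCAL_RANK", -1)),
        help="Rank of this process on the local node.",
    )
    return parser


def parse_local_rank(argv=None):
    # parse_known_args leaves the script's other options alone instead of failing on them.
    args, _unknown = build_parser().parse_known_args(argv)
    return args.local_rank
```

Using `parse_known_args` is a deliberate choice in this sketch: it lets the local rank be read early without interfering with whatever argument parser the training script already uses.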

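CPTForConditionalGeneration raises an error with multiple GPUs

As the title says, calling this interface on multiple GPUs for a generation task produces the following error: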

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. Since `find_unused_parameters=True` is enabled, this likely means that not all `forward` outputs participate in computing loss. You can fix this by making sure all `forward` function outputs participate in calculating loss.
If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
Parameter indices which did not receive grad for rank 2: 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388

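Note that with exactly the same code, replacing the CPTForConditionalGeneration interface with BartForConditionalGeneration and the corresponding model causes no problems at all. Please take a look.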

opened by Biaocsu 4
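One way to narrow down an error like the one above is to run a single forward and backward pass without `DistributedDataParallel` and list the parameters whose gradient is still `None`. The helper below is a framework-free sketch. `params_without_grad` is a hypothetical name, and with a real PyTorch model the input could be built as `[(name, p.grad) for name, p in model.named_parameters()]` after calling `loss.backward()`.

```python
def params_without_grad(named_grads):
    """Return (index, name) for every parameter whose gradient is None."""
    return [
        (index, name)
        for index, (name, grad) in enumerate(named_grads)
        if grad is None
    ]


if __name__ == "__main__":
    # Toy stand-in for [(name, p.grad), ...]; the names are invented for illustration.
    toy = [
        ("encoder.weight", 0.1),
        ("unused_head.weight", None),
        ("decoder.weight", 0.3),
    ]
    print(params_without_grad(toy))
```

If the reported names all belong to one submodule, that submodule is probably not taking part in the loss for the task being run. Whether the positions match the indices printed in the error message depends on how the parameters are enumerated, so treat that mapping as an assumption to verify.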

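Could you provide a BART version of run_gen.py?

The script at CPT/blob/master/finetune/generation/run_gen.py is the CPT version. I adapted it into a BART version myself, but it reports that many layers are not used or not initialized: "Some weights of the model checkpoint at model/bart-base-chinese were not used when initializing BartForConditionalGeneration" and "Some weights of BartForConditionalGeneration were not initialized". I am not sure whether these warnings matter. Alternatively, could you provide a BART version of run_gen.py?

The details are as follows: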

loading weights file model/bart-base-chinese/pytorch_model.bin
Some weights of the model checkpoint at model/bart-base-chinese were not used when initializing BartForConditionalGeneration: ['encoder.layers.4.fc1.bias',
 'encoder.layers.0.self_attn.k_proj.bias',
 'encoder.layers.3.fc1.bias',
 'encoder.layers.4.fc1.weight',
 'encoder.layers.1.final_layer_norm.bias',
 'encoder.layers.0.fc2.weight',
 'encoder.layers.0.self_attn.out_proj.bias',
 'encoder.layers.1.self_attn.out_proj.weight',
 'encoder.layers.3.self_attn.k_proj.bias',
 'encoder.layernorm_embedding.weight',
 'encoder.layers.1.fc2.weight',
 'encoder.layers.5.self_attn.q_proj.weight',
 'encoder.layers.5.self_attn.q_proj.bias',
 'encoder.layers.0.final_layer_norm.weight',
 'encoder.layers.1.self_attn.v_proj.weight',
 'encoder.layers.4.self_attn.out_proj.weight',
 'encoder.layers.5.self_attn_layer_norm.bias',
 'encoder.layers.0.self_attn_layer_norm.bias',
 'encoder.layers.3.self_attn.k_proj.weight',
 'encoder.embed_tokens.weight',
 'encoder.layers.1.self_attn.v_proj.bias',
 'encoder.layers.5.final_layer_norm.bias',
 'encoder.layers.1.fc1.weight',
 'encoder.layers.5.self_attn_layer_norm.weight',
 'encoder.layers.2.fc1.weight',
 'encoder.layers.0.final_layer_norm.bias',
 'encoder.layers.1.fc2.bias',
 'encoder.layers.3.self_attn.v_proj.weight',
 'encoder.layers.3.final_layer_norm.bias',
 'encoder.layers.2.fc1.bias',
 'encoder.layers.3.self_attn.q_proj.weight',
 'encoder.layers.1.final_layer_norm.weight',
 'encoder.layers.4.fc2.bias',
 'encoder.layers.4.self_attn.out_proj.bias',
 'encoder.layers.2.self_attn.q_proj.weight',
 'encoder.layers.2.final_layer_norm.weight',
 'encoder.embed_positions.weight',
 'encoder.layers.3.self_attn.out_proj.bias',
 'encoder.layers.3.fc1.weight',
 'encoder.layers.1.fc1.bias',
 'encoder.layers.0.self_attn.k_proj.weight',
 'encoder.layers.1.self_attn.k_proj.bias',
 'encoder.layers.0.fc2.bias',
 'encoder.layers.1.self_attn.k_proj.weight',
 'encoder.layers.5.self_attn.v_proj.bias',
 'encoder.layers.1.self_attn.q_proj.weight',
 'encoder.layers.2.final_layer_norm.bias',
 'encoder.layers.4.self_attn_layer_norm.weight',
 'encoder.layers.4.self_attn.v_proj.bias',
 'encoder.layers.2.self_attn_layer_norm.weight',
 'encoder.layers.0.fc1.weight',
 'encoder.layers.4.self_attn.k_proj.bias',
 'encoder.layers.0.self_attn.q_proj.bias',
 'encoder.layers.4.final_layer_norm.bias',
 'encoder.layers.0.self_attn.v_proj.weight',
 'encoder.layers.3.final_layer_norm.weight',
 'encoder.layers.5.self_attn.out_proj.weight',
 'encoder.layers.4.self_attn.q_proj.weight',
 'encoder.layers.0.self_attn_layer_norm.weight',
 'encoder.layers.5.self_attn.v_proj.weight',
 'encoder.layers.2.self_attn.v_proj.weight',
 'encoder.layers.1.self_attn.out_proj.bias',
 'encoder.layers.2.self_attn.k_proj.bias',
 'encoder.layers.2.self_attn.out_proj.weight',
 'encoder.layers.3.self_attn.v_proj.bias',
 'encoder.layers.2.self_attn.q_proj.bias',
 'encoder.layers.2.self_attn.out_proj.bias',
 'encoder.layers.3.fc2.bias',
 'encoder.layers.5.fc1.weight',
 'encoder.layernorm_embedding.bias',
 'encoder.layers.0.fc1.bias',
 'encoder.layers.3.self_attn_layer_norm.bias',
 'encoder.layers.5.self_attn.k_proj.weight',
 'encoder.layers.5.fc1.bias',
 'encoder.layers.3.fc2.weight',
 'encoder.layers.4.fc2.weight',
 'encoder.layers.0.self_attn.v_proj.bias',
 'encoder.layers.0.self_attn.q_proj.weight',
 'encoder.layers.1.self_attn.q_proj.bias',
 'encoder.layers.3.self_attn_layer_norm.weight',
 'encoder.layers.2.self_attn.k_proj.weight',
 'encoder.layers.2.self_attn.v_proj.bias',
 'encoder.layers.5.final_layer_norm.weight',
 'encoder.layers.5.self_attn.out_proj.bias',
 'encoder.layers.0.self_attn.out_proj.weight',
 'encoder.layers.5.fc2.weight',
 'encoder.layers.5.fc2.bias',
 'encoder.layers.1.self_attn_layer_norm.bias',
 'encoder.layers.4.self_attn.k_proj.weight',
 'encoder.layers.5.self_attn.k_proj.bias',
 'encoder.layers.3.self_attn.q_proj.bias',
 'encoder.layers.4.self_attn.q_proj.bias',
 'encoder.layers.1.self_attn_layer_norm.weight',
 'encoder.layers.2.self_attn_layer_norm.bias',
 'encoder.layers.4.final_layer_norm.weight',
 'encoder.layers.4.self_attn.v_proj.weight',
 'encoder.layers.2.fc2.weight',
 'encoder.layers.2.fc2.bias',
 'encoder.layers.4.self_attn_layer_norm.bias',
 'encoder.layers.3.self_attn.out_proj.weight']
- This IS expected if you are initializing BartForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BartForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BartForConditionalGeneration were not initialized from the model checkpoint at model/bart-base-chinese and are newly initialized: 
['encoder.encoder.layer.1.output.dense.bias',
 'encoder.encoder.layer.3.attention.self.key.bias',
 'encoder.encoder.layer.3.attention.output.LayerNorm.weight',
 'encoder.encoder.layer.4.attention.self.value.bias',
 'encoder.encoder.layer.2.attention.output.dense.bias',
 'encoder.encoder.layer.4.output.LayerNorm.bias',
 'encoder.encoder.layer.4.output.LayerNorm.weight',
 'encoder.encoder.layer.4.attention.output.LayerNorm.weight',
 'encoder.encoder.layer.0.intermediate.dense.bias',
 'encoder.encoder.layer.5.attention.output.LayerNorm.weight',
 'encoder.encoder.layer.0.output.LayerNorm.bias',
 'encoder.encoder.layer.5.attention.output.LayerNorm.bias',
 'encoder.encoder.layer.2.attention.output.LayerNorm.weight',
 'encoder.encoder.layer.2.attention.self.key.weight',
 'encoder.embeddings.LayerNorm.weight',
 'encoder.encoder.layer.0.attention.output.LayerNorm.weight',
 'encoder.encoder.layer.1.attention.self.key.bias',
 'encoder.encoder.layer.3.intermediate.dense.weight',
 'encoder.encoder.layer.5.intermediate.dense.weight',
 'encoder.encoder.layer.0.output.dense.weight',
 'encoder.encoder.layer.5.output.LayerNorm.bias',
 'encoder.encoder.layer.1.output.dense.weight',
 'encoder.encoder.layer.5.attention.self.query.weight',
 'encoder.encoder.layer.1.output.LayerNorm.weight',
 'encoder.encoder.layer.4.attention.self.key.bias',
 'encoder.encoder.layer.3.output.LayerNorm.bias',
 'encoder.encoder.layer.5.output.dense.bias',
 'encoder.encoder.layer.4.attention.self.key.weight',
 'encoder.encoder.layer.0.attention.self.key.bias',
 'encoder.encoder.layer.0.attention.self.query.weight',
 'encoder.encoder.layer.0.intermediate.dense.weight',
 'encoder.encoder.layer.3.output.LayerNorm.weight',
 'encoder.encoder.layer.3.attention.output.dense.bias',
 'encoder.encoder.layer.5.output.dense.weight',
 'encoder.embeddings.LayerNorm.bias',
 'encoder.encoder.layer.1.attention.self.value.weight',
 'encoder.encoder.layer.2.output.dense.weight',
 'encoder.encoder.layer.4.intermediate.dense.weight',
 'encoder.encoder.layer.2.attention.self.value.weight',
 'encoder.encoder.layer.0.attention.self.value.weight',
 'encoder.encoder.layer.0.attention.output.dense.bias',
 'encoder.encoder.layer.2.attention.output.LayerNorm.bias',
 'encoder.encoder.layer.3.output.dense.bias',
 'encoder.encoder.layer.5.output.LayerNorm.weight',
 'encoder.encoder.layer.5.attention.output.dense.bias',
 'encoder.encoder.layer.4.attention.self.value.weight',
 'encoder.encoder.layer.3.attention.self.query.bias',
 'encoder.encoder.layer.3.attention.self.value.weight',
 'encoder.encoder.layer.3.attention.self.key.weight',
 'encoder.encoder.layer.0.output.dense.bias',
 'encoder.encoder.layer.1.intermediate.dense.bias',
 'encoder.encoder.layer.0.attention.self.query.bias',
 'encoder.encoder.layer.1.intermediate.dense.weight',
 'encoder.encoder.layer.0.attention.output.dense.weight',
 'encoder.encoder.layer.5.attention.self.value.bias',
 'encoder.embeddings.token_type_embeddings.weight',
 'encoder.encoder.layer.1.attention.output.dense.weight',
 'encoder.encoder.layer.2.attention.self.query.bias',
 'encoder.encoder.layer.2.attention.self.query.weight',
 'encoder.encoder.layer.2.attention.output.dense.weight',
 'encoder.encoder.layer.5.attention.self.query.bias',
 'encoder.embeddings.position_ids',
 'encoder.embeddings.position_embeddings.weight',
 'encoder.encoder.layer.3.attention.self.query.weight',
 'encoder.embeddings.word_embeddings.weight',
 'encoder.encoder.layer.4.output.dense.bias',
 'encoder.encoder.layer.1.attention.output.LayerNorm.weight',
 'encoder.encoder.layer.4.attention.self.query.bias',
 'encoder.encoder.layer.3.attention.self.value.bias',
 'encoder.encoder.layer.5.intermediate.dense.bias',
 'encoder.encoder.layer.1.output.LayerNorm.bias',
 'encoder.encoder.layer.3.attention.output.dense.weight',
 'encoder.encoder.layer.3.attention.output.LayerNorm.bias',
 'encoder.encoder.layer.2.output.LayerNorm.weight',
 'encoder.encoder.layer.4.attention.output.dense.weight',
 'encoder.encoder.layer.4.intermediate.dense.bias',
 'encoder.encoder.layer.2.attention.self.value.bias',
 'encoder.encoder.layer.0.attention.self.key.weight',
 'encoder.encoder.layer.1.attention.self.query.weight',
 'encoder.encoder.layer.2.intermediate.dense.bias',
 'encoder.encoder.layer.2.intermediate.dense.weight',
 'encoder.encoder.layer.5.attention.self.key.bias',
 'encoder.encoder.layer.2.attention.self.key.bias',
 'encoder.encoder.layer.2.output.LayerNorm.bias',
 'encoder.encoder.layer.5.attention.self.key.weight',
 'encoder.encoder.layer.0.attention.output.LayerNorm.bias',
 'encoder.encoder.layer.5.attention.self.value.weight',
 'encoder.encoder.layer.4.attention.output.dense.bias',
 'encoder.encoder.layer.1.attention.output.LayerNorm.bias',
 'encoder.encoder.layer.1.attention.output.dense.bias',
 'encoder.encoder.layer.5.attention.output.dense.weight',
 'encoder.encoder.layer.4.output.dense.weight',
 'encoder.encoder.layer.0.attention.self.value.bias',
 'encoder.encoder.layer.1.attention.self.value.bias',
 'encoder.encoder.layer.0.output.LayerNorm.weight',
 'encoder.encoder.layer.1.attention.self.key.weight',
 'encoder.encoder.layer.3.intermediate.dense.bias',
 'encoder.encoder.layer.1.attention.self.query.bias',
 'encoder.encoder.layer.4.attention.self.query.weight',
 'encoder.encoder.layer.3.output.dense.weight',
 'encoder.encoder.layer.2.output.dense.bias',
 'encoder.encoder.layer.4.attention.output.LayerNorm.bias']

opened by 6666ev 3
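The two lists in the log above follow different naming schemes: the unused checkpoint keys have prefixes such as `encoder.layers.` and `encoder.embed_tokens`, while the newly initialized keys have prefixes such as `encoder.encoder.layer.` and `encoder.embeddings.`. A short sketch for summarizing such a mismatch by key prefix follows. `summarize_by_prefix` is a hypothetical helper, and the sample keys are copied from the log above.

```python
from collections import Counter


def summarize_by_prefix(keys, depth=2):
    """Count keys by their first `depth` dot-separated components."""
    return Counter(".".join(key.split(".")[:depth]) for key in keys)


# A few keys copied from the two lists in the log above.
unused = [
    "encoder.layers.4.fc1.bias",
    "encoder.layers.0.self_attn.k_proj.bias",
    "encoder.embed_tokens.weight",
]
newly_initialized = [
    "encoder.encoder.layer.1.output.dense.bias",
    "encoder.encoder.layer.3.attention.self.key.bias",
    "encoder.embeddings.word_embeddings.weight",
]

if __name__ == "__main__":
    print(summarize_by_prefix(unused))
    print(summarize_by_prefix(newly_initialized))
```

With real objects, the two key lists would typically come from the checkpoint's state dict and from `model.state_dict().keys()`. A summary in which the two sides share no prefixes suggests that the model class being instantiated and the checkpoint do not describe the same encoder layout.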

Question about loading the fnlp/bart-base-chinese model

Hello. Following the paper A Unified Generative Framework for Aspect-Based Sentiment, I want to use this model for Chinese ABSA, so I replaced the original facebook/bart-base with fnlp/bart-base-chinese. I ran into the following problems:

1: With transformers 4.4.1, loading the model raises: RuntimeError: Error(s) in loading state_dict for BartModel: size mismatch for encoder.embed_positions.weight: copying a param with shape torch.Size([514, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]). size mismatch for encoder.embed_positions.weight: copying a param with shape torch.Size([514, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]). This mainly happens here: model = BartSeq2SeqModel.build_model(bart_name, tokenizer, label_ids=label_ids, decoder_type=decoder_type,copy_gate=False, use_encoder_mlp=use_encoder_mlp, use_recur_pos=False)

2: The bart-base provided by Facebook includes merges.txt and a JSON vocab, which differs from the files you provide on Huggingface. When I load the bart-base-chinese files you provide on Huggingface with tokenizer.from_pretrained("bart-base-chinese"), I get: OSError: Can't load tokenizer for 'bart-base-chinese'. Make sure that: - 'bart-base-chinese' is a correct model identifier listed on 'https://huggingface.co/models' - or 'bart-base-chinese' is the correct path to a directory containing relevant tokenizer files

How can this be solved?

opened by yedongyu1996 2
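Continuing pre-training of bart-base-chinese on a custom dataset

I want to continue pre-training the bart-base-chinese checkpoint released on Huggingface on my own dataset, but I ran into a problem at the load_checkpoint step in training.py. load_checkpoint needs a tracker file; if the file does not exist, it warns "will not load any checkpoints and will start from random". However, I want to pre-train starting from bart-base-chinese. How should this tracker file be set up? Also, should the later torch.load call load pytorch_model.bin directly? It does not seem to be the model_optim_rng.pt mentioned in the code.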

opened by Aureole-1210 2
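How do I make run_gen.py run on a GPU?

Hello! I ran whj_code1/projects/CPT/finetune/generation/run_gen.py directly and found that it runs on the CPU. The log prints the training_args.local_rank, training_args.device, and training_args.n_gpu parameters, but the code provides no place to pass them, and I cannot pass them directly through args either. How should I modify the code so that it runs on a GPU?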

opened by PolarisRisingWar 2
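Is max_position_embeddings 1024?

The config.json in fnlp/cpt-base sets max_position_embeddings to 1024, but 1024 actually raises an error while 512 works. I found that the code uses BertModel as the encoder without setting the corresponding max_position_embeddings, and changing it to 1024 by hand prevents the pre-trained parameters from loading. My understanding is therefore that config.json is wrong and only 512 is actually supported. I hope you can provide a model with max_position_embeddings=1024, to align with BART.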

opened by awdrgyjilplij 2
Why is BertTokenizer used instead of BartTokenizer?

Thank you for your nice work!

When preprocessing data, I followed your code and used BertTokenizer to load the cpt-base tokenizer. The tokenizer loaded successfully, but I got the following warning message:

""" The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. The tokenizer class you load from this checkpoint is 'BartTokenizer'. The class this function is called from is 'BertTokenizer'. """

Then I tried to use BartTokenizer to load it, but I failed.

My question is whether I should ignore the warning and keep using BertTokenizer. Thank you.

opened by Chen-Wang-CUHK 2

Releases(v2.0)

v2.0(Dec 30, 2022)

Source code(tar.gz)
Source code(zip)
v1.0(Dec 30, 2022)

Source code(tar.gz)
Source code(zip)

Owner

fastNLP

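An open-source natural language processing project from China, initiated by the natural language processing (NLP) team at Fudan University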

GitHub

Source code and dataset for ACL2021 paper: "ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning".

ERICA Source code and dataset for ACL2021 paper: "ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive L

75 Nov 2, 2022

Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts

t5-japanese Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts. The following is a list of models that

1 Dec 13, 2021

Chinese clinical named entity recognition using pre-trained BERT model

Chinese clinical named entity recognition (CNER) using pre-trained BERT model Introduction Code for paper Chinese clinical named entity recognition wi

109 Dec 14, 2022

Code, Data and Demo for Paper: Controllable Generation from Pre-trained Language Models via Inverse Prompting

InversePrompting Paper: Controllable Generation from Pre-trained Language Models via Inverse Prompting Code: The code is provided in the "chinese_ip"

101 Dec 16, 2022

Third party Pytorch implement of Image Processing Transformer (Pre-Trained Image Processing Transformer arXiv:2012.00364v2)

ImageProcessingTransformer Third party Pytorch implement of Image Processing Transformer (Pre-Trained Image Processing Transformer arXiv:2012.00364v2)

61 Jan 1, 2023

[ICLR 2021] "CPT: Efficient Deep Neural Network Training via Cyclic Precision" by Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra, Yingyan Lin

CPT: Efficient Deep Neural Network Training via Cyclic Precision Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra, Yingyan Lin Accep

26 Oct 25, 2022

Converting CPT to bert form for use

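cpt-encoder Converting CPT to BERT form for use. Notes: I just came across yet another new model, CPT. According to the paper, it outperforms MacBERT on many Chinese tasks, so I could not wait to try it. From studying the source code, I found that the model mainly uses the encoder part, that is, BERT, for NLU modeling, so I converted those weights to the BERT weight format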

1 Oct 14, 2021

🐥A PyTorch implementation of OpenAI's finetuned transformer language model with a script to import the weights pre-trained by OpenAI

PyTorch implementation of OpenAI's Finetuned Transformer Language Model This is a PyTorch implementation of the TensorFlow code provided with OpenAI's

1.4k Jan 5, 2023

A collection of pre-trained StyleGAN2 models trained on different datasets at different resolution.

Awesome Pretrained StyleGAN2 A collection of pre-trained StyleGAN2 models trained on different datasets at different resolution. Note the readme is a

1.1k Dec 24, 2022

Unbalanced Feature Transport for Exemplar-based Image Translation (CVPR 2021)

UNITE and UNITE+ Unbalanced Feature Transport for Exemplar-based Image Translation (CVPR 2021) Unbalanced Intrinsic Feature Transport for Exemplar-bas

183 Nov 9, 2022

Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].

PLBART Code pre-release of our work, Unified Pre-training for Program Understanding and Generation accepted at NAACL 2021. Note. A detailed documentat

138 Dec 30, 2022

Official repository for "PAIR: Planning and Iterative Refinement in Pre-trained Transformers for Long Text Generation"

pair-emnlp2020 Official repository for the paper: Xinyu Hua and Lu Wang: PAIR: Planning and Iterative Refinement in Pre-trained Transformers for Long

31 Oct 13, 2022

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context Code in both PyTorch and TensorFlow

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context This repository contains the code in both PyTorch and TensorFlow for our paper

3.3k Jan 6, 2023

Pre-trained BERT Models for Ancient and Medieval Greek, and associated code for LaTeCH 2021 paper titled - "A Pilot Study for BERT Language Modelling and Morphological Analysis for Ancient and Medieval Greek"

Ancient Greek BERT The first and only available Ancient Greek sub-word BERT model! State-of-the-art post fine-tuning on Part-of-Speech Tagging and Mor

22 Dec 8, 2022

This repo contains the official code and pre-trained models for the Dynamic Vision Transformer (DVT).

Dynamic-Vision-Transformer (Pytorch) This repo contains the official code and pre-trained models for the Dynamic Vision Transformer (DVT). Not All Ima

210 Dec 18, 2022

Pre-Trained Image Processing Transformer (IPT)

Pre-Trained Image Processing Transformer (IPT) By Hanting Chen, Yunhe Wang, Tianyu Guo, Chang Xu, Yiping Deng, Zhenhua Liu, Siwei Ma, Chunjing Xu, Cha

332 Dec 18, 2022

The source code for the Cutoff data augmentation approach proposed in this paper: "A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation".

Cutoff: A Simple Data Augmentation Approach for Natural Language This repository contains source code necessary to reproduce the results presented in

49 Dec 22, 2022

Episodic Transformer (E.T.) is a novel attention-based architecture for vision-and-language navigation. E.T. is based on a multimodal transformer that encodes language inputs and the full episode history of visual observations and actions.

Episodic Transformers (E.T.) Episodic Transformer for Vision-and-Language Navigation Alexander Pashevich, Cordelia Schmid, Chen Sun Episodic Transform

62 Dec 24, 2022

Source code for paper: Knowledge Inheritance for Pre-trained Language Models

Knowledge-Inheritance Source code paper: Knowledge Inheritance for Pre-trained Language Models (preprint). The trained model parameters (in Fairseq fo

31 Nov 19, 2022
