Mengzi Pretrained Models

Langboat

Last update: Jan 4, 2023

Related tags

Deep Learning nlp natural-language-processing deep-learning pytorch bert language-understanding chinese-bert

Overview


Mengzi

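Although pretrained language models are widely used across NLP, their high time and compute costs remain a pressing problem. This requires us to develop models that score better across metrics under a fixed compute budget.

Our goal is not a larger model, but a lightweight yet more powerful one that is also easier to deploy and to use in industry.

Using techniques such as incorporating linguistic information and accelerating training, we developed the Mengzi family of models. Because Mengzi keeps the same model architecture as BERT, it can quickly replace existing pretrained models.

For details, see the technical report: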

Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese

Quick Start

Mengzi-BERT

# Load with Hugging Face transformers
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("Langboat/mengzi-bert-base")
model = BertModel.from_pretrained("Langboat/mengzi-bert-base")
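Once loaded, the model can be used like any other BERT encoder. The following is a minimal sketch rather than part of the original README: the helper name `mean_pooled_embeddings` and the choice of mean pooling over non-padding tokens are assumptions made for illustration. It needs `torch` installed and downloads the checkpoint from the Hugging Face Hub on first use.

```python
def mean_pooled_embeddings(texts, model_name="Langboat/mengzi-bert-base"):
    """Return one vector per input text (hypothetical helper, not from the README).

    Mean pooling is an illustrative choice; the README does not prescribe
    how to turn token states into a sentence vector.
    """
    # Imports are local so that defining the helper has no side effects.
    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(model_name)
    model = BertModel.from_pretrained(model_name)
    model.eval()

    # Pad to the longest text in the batch and cap at BERT's 512-token limit.
    batch = tokenizer(
        texts, padding=True, truncation=True, max_length=512, return_tensors="pt"
    )
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden)

    # Zero out padding positions, then average over the real tokens only.
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    summed = (hidden * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1)
    return summed / counts
```

Calling `mean_pooled_embeddings(["今天天气很好", "我喜欢自然语言处理"])` should return a tensor with one row per sentence; the row width is the hidden size of the checkpoint.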

Mengzi-T5

# Load with Hugging Face transformers
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("Langboat/mengzi-t5-base")
model = T5ForConditionalGeneration.from_pretrained("Langboat/mengzi-t5-base")

Mengzi-Oscar

See the reference documentation

Installing Dependencies

pip install transformers

Downstream Tasks

CLUE Scores

Model	AFQMC	TNEWS	IFLYTEK	CMNLI	WSC	CSL	CMRC2018	C3	CHID
RoBERTa-wwm-ext	74.30	57.51	60.80	80.70	67.20	80.67	77.59	67.06	83.78
Mengzi-BERT-base	74.58	57.97	60.68	82.12	87.50	85.40	78.54	71.70	84.16

The RoBERTa-wwm-ext scores are taken from the CLUE baseline.
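To make the comparison easier to read, the per-task differences and unweighted averages can be computed directly from the table. This is only a sketch: the numbers are copied from the table above, while the variable and function names are made up for illustration, and the unweighted mean is not necessarily the official CLUE aggregate.

```python
# Scores copied from the CLUE table above, in column order.
TASKS = ["AFQMC", "TNEWS", "IFLYTEK", "CMNLI", "WSC", "CSL", "CMRC2018", "C3", "CHID"]
ROBERTA_WWM_EXT = [74.30, 57.51, 60.80, 80.70, 67.20, 80.67, 77.59, 67.06, 83.78]
MENGZI_BERT_BASE = [74.58, 57.97, 60.68, 82.12, 87.50, 85.40, 78.54, 71.70, 84.16]


def score_deltas(tasks, baseline, candidate):
    """Map each task to (candidate - baseline), rounded to two decimals."""
    return {t: round(c - b, 2) for t, b, c in zip(tasks, baseline, candidate)}


def mean(scores):
    """Unweighted average over tasks (an illustrative aggregate)."""
    return sum(scores) / len(scores)


deltas = score_deltas(TASKS, ROBERTA_WWM_EXT, MENGZI_BERT_BASE)
for task, delta in deltas.items():
    print(f"{task:9s} {delta:+.2f}")
print(f"mean gap  {mean(MENGZI_BERT_BASE) - mean(ROBERTA_WWM_EXT):+.2f}")
```

The largest gap is on WSC (87.50 vs 67.20), and IFLYTEK is the one task in the table where Mengzi-BERT-base scores slightly lower (60.68 vs 60.80).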

Corresponding Hyperparameters

Task	Learning rate	Batch size	Epochs
AFQMC	3e-5	32	10
TNEWS	3e-5	128	10
IFLYTEK	3e-5	64	10
CMNLI	3e-5	512	10
WSC	8e-6	64	50
CSL	5e-5	128	5
CMRC2018	5e-5	8	5
C3	1e-4	240	3
CHID	5e-5	256	5
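For anyone reproducing these runs, the table can be kept as a small config object. The sketch below copies the values from the table; the class name, the `total_steps` helper, and the assumptions of no gradient accumulation and a kept final partial batch are illustrative and not taken from the project. Optimizer, warmup, and seed settings are not listed in the table and are deliberately left out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneConfig:
    learning_rate: float
    batch_size: int
    epochs: int


# Values copied from the hyperparameter table above.
CLUE_HPARAMS = {
    "AFQMC": FinetuneConfig(3e-5, 32, 10),
    "TNEWS": FinetuneConfig(3e-5, 128, 10),
    "IFLYTEK": FinetuneConfig(3e-5, 64, 10),
    "CMNLI": FinetuneConfig(3e-5, 512, 10),
    "WSC": FinetuneConfig(8e-6, 64, 50),
    "CSL": FinetuneConfig(5e-5, 128, 5),
    "CMRC2018": FinetuneConfig(5e-5, 8, 5),
    "C3": FinetuneConfig(1e-4, 240, 3),
    "CHID": FinetuneConfig(5e-5, 256, 5),
}


def total_steps(task, num_train_examples):
    """Optimizer steps for a full run.

    Assumes one optimizer step per batch (no gradient accumulation) and
    that the last, possibly smaller, batch of each epoch is kept.
    """
    cfg = CLUE_HPARAMS[task]
    return math.ceil(num_train_examples / cfg.batch_size) * cfg.epochs
```

For example, `total_steps("AFQMC", 1000)` gives the step count for a hypothetical 1000-example training set, which is useful when sizing a learning-rate schedule.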

Download Links

Contact

WeChat Discussion Group

Email

wangyulong[at]chuangxin[dot]com

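Disclaimer

The content of this project is provided for technical research reference only and should not be treated as a basis for any conclusion. Users may use the model freely within the scope of the license, but we are not liable for any direct or indirect loss caused by using the content of this project. The experimental results presented in the technical report only reflect performance on specific datasets with specific hyperparameter combinations, and do not represent the intrinsic quality of each model. Results may vary with the random seed and the computing hardware.

When using this model in any way (including but not limited to modified use, direct use, and use through a third party), users must not use it, directly or indirectly, to engage in any conduct that violates the laws and regulations of their jurisdiction or social ethics. Users are responsible for their own actions and bear full legal and joint liability for any dispute arising from use of this model. We assume no legal or joint liability.

We reserve the right to interpret, modify, and update this disclaimer.

Citation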

@misc{zhang2021mengzi,
      title={Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese}, 
      author={Zhuosheng Zhang and Hanqing Zhang and Keming Chen and Yuhang Guo and Jingyun Hua and Yulong Wang and Ming Zhou},
      year={2021},
      eprint={2110.06696},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Comments

The test_ch.yaml file

Hello. When running inference with the multimodal Oscar model on the acc-icc dataset, I use the following command:

python -m torch.distributed.launch --nproc_per_node=8 oscar/run_captioning.py
--data_dir
--do_test --test_yaml test_ch.yaml
--num_beams 5 --per_gpu_eval_batch_size 128 --max_gen_length 20
--eval_model_dir

Where is the test_ch.yaml file located?

opened by jiangix-paper 14
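What platform configuration and parameters were used to train Mengzi-BERT-base on the 9 CLUE downstream tasks?

Hello developers! The paper says that Mengzi-BERT-base outperforms baselines such as RoBERTa and BERT on the 9 CLUE downstream tasks. I have a few questions: (1) What hardware platform did you use for downstream training, for example the GPU configuration and CUDA version? (2) Could you share more specific settings for downstream training, for example the optimizer parameters, the warmup settings, the seed used for model initialization, and whether fp16 was used? (3) I just saw in the FAQ that you do not plan to release the training code. Does that also apply to the downstream training code for Mengzi-BERT-base?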

opened by ma787639046 9
What is the input format for the model to automatically generate marketing copy?

hello langboat, thanks for sharing the good work. Regarding the automatically generated marketing copy in the paper

Given the input title and keywords, the models are required to generate a corresponding descriptive passage

What is the input of the model? Is it in the form of [cls] title [sep] [keywords1,keywords2,keywords3,keywords4] [sep] [kg11,kg12,kg13] [kg21,kg22,kg23]?

opened by Nipi64310 3
Input prefix of the model mengzi-t5-base

Hi,

I have a question regarding the input of the model mengzi-t5-base. The original T5 paper mentions that "we need to add the task-specific prefix to the original input sequence before feeding it to the model". If I want to perform text summarization or other downstream tasks with mengzi-t5-base, do I need to add a prefix, and if so, what should it be? Thank you very much for your help. I look forward to your reply.

opened by KarenMars 2
How to incorporate knowledge graph in marketing copywriting?

Hi, thanks for sharing this awesome work. According to Figure 2 of your paper, you incorporate a knowledge graph in the marketing copywriting task, but there seems to be no further explanation of this. Could you please explain more about this method?

opened by windysavage 1
Is the batch size 128 or 16384?

I noticed that Section 2.1 of the technical report says:

We limit the length of sentences in each batch to up to 512 tokens, and the batch size is 128.

Later in the same passage it says:

The batch sizes for the two stages are 16384 and 32768, respectively

Which one is the actual batch size? Is the first the number of sequences and the second the number of tokens? Or is such a large batch size possible because LAMB is used? The LAMB paper uses 32868.

opened by hankcs 1
mengzi-gpt-neo-base cannot be tried on Hugging Face; it raises an error

As the title says. Error message: Can't load tokenizer using from_pretrained, please update its configuration: Can't load tokenizer for 'Langboat/mengzi-gpt-neo-base'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'Langboat/mengzi-gpt-neo-base' is the correct path to a directory containing all relevant files for a GPT2TokenizerFast tokenizer.

opened by liruixue 2
