RoFormer_pytorch

Overview

A PyTorch implementation of RoFormer.

The original TensorFlow weights (https://github.com/ZhuiyiTechnology/roformer) have been converted to PyTorch weights.

Installation

pip install roformer
or
pip install git+https://github.com/JunnYu/RoFormer_pytorch.git

huggingface.co

https://huggingface.co/junnyu/roformer_chinese_base

Usage

import torch
from roformer import RoFormerModel, RoFormerTokenizer
tokenizer = RoFormerTokenizer.from_pretrained("junnyu/roformer_chinese_base")
model = RoFormerModel.from_pretrained("junnyu/roformer_chinese_base")
text = "这里基本保留了唐宋遗留下来的坊巷格局和大量明清古建筑,其中各级文保单位29处,被誉为“里坊制度的活化石”“明清建筑博物馆”!"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs).last_hidden_state
print(outputs.shape)
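
The output is the final hidden states with shape (batch_size, sequence_length, hidden_size); for this base checkpoint (L-12, H-768) the hidden size is 768.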

MLM test

import torch
from roformer import RoFormerForMaskedLM, RoFormerTokenizer
text = "今天[MASK]很好,我[MASK]去公园玩。"
tokenizer = RoFormerTokenizer.from_pretrained("junnyu/roformer_chinese_base")
model = RoFormerForMaskedLM.from_pretrained("junnyu/roformer_chinese_base")
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs).logits[0]
outputs_sentence = ""
# tokenizer.encode(text) adds [CLS]/[SEP], so its indices line up with the logits
for i, token_id in enumerate(tokenizer.encode(text)):
    if token_id == tokenizer.mask_token_id:
        # show the top-5 predictions for each [MASK] position
        tokens = tokenizer.convert_ids_to_tokens(outputs[i].topk(k=5)[1])
        outputs_sentence += "[" + "||".join(tokens) + "]"
    else:
        outputs_sentence += "".join(
            tokenizer.convert_ids_to_tokens([token_id], skip_special_tokens=True))
print(outputs_sentence)
# 今天[天气||天||心情||阳光||空气]很好,我[想||要||打算||准备||喜欢]去公园玩。
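
The same check can also be run through the transformers fill-mask pipeline; a minimal sketch, assuming the Hub checkpoint works with the stock pipeline (transformers' RoFormerTokenizer additionally requires the rjieba package):

from transformers import pipeline

fill_mask = pipeline("fill-mask", model="junnyu/roformer_chinese_base")
# one [MASK] per call keeps the output easy to read
print(fill_mask("今天[MASK]很好。"))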

Manual weight conversion

python convert_roformer_original_tf_checkpoint_to_pytorch.py \
    --tf_checkpoint_path=xxxxxx/chinese_roformer_L-12_H-768_A-12/bert_model.ckpt \
    --roformer_config_file=pretrained_models/chinese_roformer_base/config.json \
    --pytorch_dump_path=pretrained_models/chinese_roformer_base/pytorch_model.bin
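
After conversion, the dumped weights can be loaded straight from the local directory; a minimal sketch, assuming config.json and vocab.txt sit next to pytorch_model.bin:

from roformer import RoFormerModel
model = RoFormerModel.from_pretrained("pretrained_models/chinese_roformer_base")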

TF and PyTorch precision alignment

python compare_model.py
mean difference : tensor(4.3925e-07)
max  difference : tensor(7.6294e-06)
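
compare_model.py is not reproduced here, but numbers of this kind come from an element-wise comparison along these lines (a sketch; the actual script may differ):

import torch

def report_difference(pt_outputs: torch.Tensor, tf_outputs: torch.Tensor) -> None:
    # both tensors hold the last hidden states for the same input, with identical shapes
    diff = (pt_outputs - tf_outputs).abs()
    print("mean difference :", diff.mean())
    print("max  difference :", diff.max())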

Chinese sentiment classification (chnsenti)

Results

model                        chnsenti
tensorflow-NEZHA(base-wwm)   94.75
pytorch-NEZHA(base-wwm)      94.92
pytorch-RoFormer(base)       95.08

References

https://github.com/pengming617/bert_classification

https://github.com/bojone/bert4keras

https://github.com/ZhuiyiTechnology/roformer

https://github.com/lonePatient/NeZha_Chinese_PyTorch

https://github.com/lonePatient/TorchBlocks

Comments
  • Problems with manual weight conversion

    python convert_roformer_original_tf_checkpoint_to_pytorch.py \
        --tf_checkpoint_path=xxxxxx/chinese_roformer_L-12_H-768_A-12/bert_model.ckpt \
        --roformer_config_file=pretrained_models/chinese_roformer_base/config.json \
        --pytorch_dump_path=pretrained_models/chinese_roformer_base/pytorch_model.bin

    Running this directly doesn't seem to work. After modifying it according to the error messages, only a pb file was generated, with no corresponding config file. Why is that?

    opened by TestNLP 13
  • Some questions about transformers

    Hi, I'd like to ask: is calling the roformer models through transformers now exactly the same as using this repo? Which interface should I use for the roformer-sim models, RoFormerForMaskedLM? After loading with transformers' RoFormerForMaskedLM, I found that some parameters were not loaded (apparently the pooler-related ones). For plain inference it works quite well out of the box, but when I try to train on top of it as a backbone, the loss won't go down (the same code trains roberta normally), and I wonder whether that is because the pooler was not added. I couldn't find a roformer-sim example among your examples; could you help clarify when you have time? Thanks a lot!

    opened by yclzju 9
  • Error loading roformer_chinese_sim_char_small with transformers

    Version:

    transformers:4.9.1
    

    code:

    import torch
    from transformers import RoFormerModel, RoFormerTokenizer
    tokenizer = RoFormerTokenizer.from_pretrained("junnyu/roformer_chinese_sim_char_small")
    pt_model = RoFormerModel.from_pretrained("junnyu/roformer_chinese_sim_char_small")
    

    Error:

    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    <ipython-input-62-5df5e1c416aa> in <module>
          2 from transformers import RoFormerModel, RoFormerTokenizer, TFRoFormerModel
          3 tokenizer = RoFormerTokenizer.from_pretrained("junnyu/roformer_chinese_sim_char_small")
    ----> 4 pt_model = RoFormerModel.from_pretrained("junnyu/roformer_chinese_sim_char_small")

    /opt/anaconda3/lib/python3.8/site-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
       1350             error_msg = "\n\t".join(error_msgs)
       1351             raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")

    RuntimeError: Error(s) in loading state_dict for RoFormerModel:
    	size mismatch for roformer.encoder.embed_positions.weight: copying a param with shape torch.Size([1536, 64]) from checkpoint, the shape in current model is torch.Size([512, 64]).
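
    A workaround consistent with the reported shapes (an untested sketch; the 1536 is read off the error message) is to load with a config whose position budget matches the checkpoint:

    from transformers import RoFormerConfig, RoFormerModel
    config = RoFormerConfig.from_pretrained("junnyu/roformer_chinese_sim_char_small")
    config.max_position_embeddings = 1536  # match torch.Size([1536, 64]) in the checkpoint
    pt_model = RoFormerModel.from_pretrained("junnyu/roformer_chinese_sim_char_small", config=config)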
    
    opened by liuyuzhangolvz 6
  • Converting other models' weights to a RoFormer model

    Hello, I am training models on long text, but the original RoFormer model is too small and underperforms, so I want to try a large RoFormer. Since no large checkpoint exists, I want to convert the open-source 'hfl/chinese-macbert-large' weights into a RoFormer model for long-text training. Su Jianlin converted WoBERT (with absolute position embeddings replaced by RoPE) into RoFormer, so I used the same code (https://github.com/ZhuiyiTechnology/roformer/blob/main/train.py):

    bert = build_transformer_model(
        config_path,
        checkpoint_path=None,
        model='roformer',
        with_mlm='linear',
        ignore_invalid_weights=True,
        return_keras_model=False
    )
    model = bert.model
    y_in = keras.layers.Input(shape=(None,), name='Input-Label')
    outputs = CrossEntropy(1)([y_in, model.output])
    train_model = keras.models.Model(model.inputs + [y_in], outputs)
    AdamW = extend_with_weight_decay(Adam, name='AdamW')
    AdamWLR = extend_with_piecewise_linear_lr(AdamW, name='AdamWLR')
    AdamWLRG = extend_with_gradient_accumulation(AdamWLR, name='AdamWLRG')
    optimizer = AdamWLRG(
        learning_rate=1e-5,
        weight_decay_rate=0.01,
        exclude_from_weight_decay=['Norm', 'bias'],
        grad_accum_steps=4,
        lr_schedule={20000: 1}
    )
    train_model.compile(optimizer=optimizer)
    train_model.summary()
    bert.load_weights_from_checkpoint(checkpoint_path)
    model.save_weights('romac/bert_model.weights')

    This produced a macbert-based TF weight file; I then tried to convert it to PyTorch with your convert_roformer_original_tf_checkpoint_to_pytorch.py, but it raises an error. Is the problem in my converted weights, or can the weights simply not be converted directly?

    convert_tf_checkpoint_to_pytorch('romac/bert_model.weights', 'romac/bert_config.json', 'romac/1')

    Error:

    Traceback (most recent call last):
      File "C:\Users\14301\miniconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3427, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "", line 24, in
        romac/1')
      File "", line 16, in convert_tf_checkpoint_to_pytorch
        load_tf_weights_in_roformer(model, config, tf_checkpoint_path)
      File "C:\Users\14301\miniconda3\lib\site-packages\roformer\modeling_roformer.py", line 115, in load_tf_weights_in_roformer
        pointer.shape == array.shape
      File "C:\Users\14301\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 948, in __getattr__
        type(self).__name__, name))
    AttributeError: 'RoFormerForPreTraining' object has no attribute 'shape'

    opened by WENGSYX 6
  • 'RoFormerModel' object has no attribute 'shape'

    Hi, I used the convert_tf_checkpoint_to_pytorch() function to convert Su Jianlin's recently released RoFormer-Sim model to the PyTorch version, but ran into torch.nn.modules.module.ModuleAttributeError: 'RoFormerModel' object has no attribute 'shape':

    def convert_tf_checkpoint_to_pytorch(tf_checkpoint_path, roformer_config_file, pytorch_dump_path):
        # Initialise PyTorch model
        config = RoFormerConfig.from_json_file(roformer_config_file)
        print(f"Building PyTorch model from configuration: {config}")
        model = RoFormerForMaskedLM(config)

        # Load weights from tf checkpoint
        load_tf_weights_in_roformer(model, config, tf_checkpoint_path)

        # Save pytorch-model
        print(f"Save PyTorch model to {pytorch_dump_path}")
        torch.save(model.state_dict(), pytorch_dump_path, _use_new_zipfile_serialization=False)
    

    Traceback (most recent call last):
      File "D:/Python/3.代码储存/1.重要代码学习笔记(重要)/21.Python自然语言处理/18.预训练语言模型专题学习/将tf1的checkpoint模型权重转换为pytorch的模型权重/convert_tf_to_pytorch.py", line 81, in
        pytorch_dump_path)
      File "D:/Python/3.代码储存/1.重要代码学习笔记(重要)/21.Python自然语言处理/18.预训练语言模型专题学习/将tf1的checkpoint模型权重转换为pytorch的模型权重/convert_tf_to_pytorch.py", line 42, in convert_tf_checkpoint_to_pytorch_roformer_model
        load_tf_weights_in_roformer(model, config, tf_checkpoint_path)
      File "D:\Python\main\lib\site-packages\transformers\models\roformer\modeling_roformer.py", line 167, in load_tf_weights_in_roformer
        pointer.shape == array.shape
      File "D:\Python\main\lib\site-packages\torch\nn\modules\module.py", line 779, in __getattr__
        type(self).__name__, name))
    torch.nn.modules.module.ModuleAttributeError: 'RoFormerModel' object has no attribute 'shape'

    I'm on transformers 4.8.2; replacing RoFormerForMaskedLM in the function with RoFormerForCausalLM or RoFormerPreTrainedModel still raises the same error.

    The RoFormer-Sim weights were downloaded from Su Jianlin's GitHub: https://github.com/ZhuiyiTechnology/roformer/blob/main/README_zh.md

    opened by CurisZhou 5
  • Import error

    from transformers.file_utils import ModelOutput, add_start_docstrings, add_start_docstrings_to_model_forward
    ImportError: cannot import name 'add_start_docstrings_to_model_forward'

    What causes this error?

    opened by KyrieXDL 5
  • Error when loading via RoFormerConfig

    Thank you very much for open-sourcing this project. While using roformer_chinese_char_base I wanted to enlarge max_position_embeddings, so I tried loading the weights through RoFormerConfig, and got an error.

    import torch
    from roformer.modeling_roformer import RoFormerModel, RoFormerConfig
    myconfig = RoFormerConfig.from_pretrained('D:/pretrain/pytorch/roformer_chinese_char_base')
    myconfig.max_position_embeddings = 2000
    model = RoFormerModel(config=myconfig)
    ckpt = torch.load('D:/pretrain/pytorch/roformer_chinese_char_base/pytorch_model.bin')
    model.load_state_dict(ckpt, strict=False)
    

    Missing key(s) in state_dict: "embeddings.word_embeddings.weight", "embeddings.token_type_embeddings.weight", "embeddings.LayerNorm.weight", "embeddings.LayerNorm.bias", "encoder.embed_positions.weight", "encoder.layer.0.attention.self.query.weight", "encoder.layer.0.attention.self.query.bias", "encoder.layer.0.attention.self.key.weight", "encoder.layer.0.attention.self.key.bias", "encoder.layer.0.attention.self.value.weight", "encoder.layer.0.attention.self.value.bias", "encoder.layer.0.attention.output.dense.weight", "encoder.layer.0.attention.output.dense.bias", "encoder.layer.0.attention.output.LayerNorm.weight", "encoder.layer.0.attention.output.LayerNorm.bias", "encoder.layer.0.intermediate.dense.weight", "encoder.layer.0.intermediate.dense.bias", "encoder.layer.0.output.dense.weight", "encoder.layer.0.output.dense.bias"

    Looking at RoFormerModel's layers, they all seem to lack a "roformer" prefix. Do I need to modify the layer names inside RoFormerModel?
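
    One way to bridge the naming gap, sketched under the assumption that the checkpoint keys carry a "roformer." prefix while the bare RoFormerModel expects unprefixed keys (continuing the snippet above):

    stripped = {k[len("roformer."):] if k.startswith("roformer.") else k: v
                for k, v in ckpt.items()}
    model.load_state_dict(stripped, strict=False)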

    opened by renjunxiang 4
  • A question about similar-sentence generation

    This is my first time working with seq2seq and similar-sentence generation. Following #17, I used RoFormerForMaskedLM with config.is_decoder=True and, following the UniLM_Mask pattern, replaced the corresponding method:

    def get_extended_attention_mask(self, seg_id):
        idxs = torch.cumsum(seg_id, dim=1)
        mask = idxs[:, None, :] <= idxs[:, :, None]
        mask = 1.0 * mask[:, None]  # -(1.0 - mask[:, None]) * 1000000.0
        return mask
    

    But the output is wrong.

    My current decoding procedure: feed [CLS]你叫什么?[SEP] into the model; take the logits at the [SEP] position, turn them into probabilities, greedily pick the highest-probability id and append it to input_ids; update seg_id accordingly to compute a new extended attention mask; repeat on the last position's logits until an end token is produced or the length limit is reached. Where did it go wrong? @hxs91 @JunnYu
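
    One thing worth checking (an assumption, not a confirmed diagnosis): HuggingFace-style models add the extended mask to the attention scores, so it should be additive (0 where a position is visible, a large negative number where it is masked) rather than a 0/1 multiplicative mask; the commented-out line in the snippet above computes exactly this additive form:

    def get_extended_attention_mask(self, seg_id):
        # UniLM-style mask: position i may attend to j iff cumsum(seg)[j] <= cumsum(seg)[i]
        idxs = torch.cumsum(seg_id, dim=1)
        mask = (idxs[:, None, :] <= idxs[:, :, None]).to(torch.float32)
        # additive form: 0 keeps a position, -1e6 masks it out
        return -(1.0 - mask[:, None]) * 1000000.0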

    opened by WordBearerYI 3
  • Using roformer-v2 with HuggingFace's accelerate deletes the saved model multiple times, so part of the model is not saved

    accelerate's model-saving method:

    # How to save your 🤗 Transformer?
    accelerator.wait_for_everyone()
    unwrapped_model = accelerator.unwrap_model(model)
    unwrapped_model.save_pretrained(save_dir, save_function=accelerator.save, state_dict=accelerator.get_state_dict(model))
    

    roformer-v1 does not have this problem; roformer-v2 does, with the files deleted multiple times.

    opened by XiaoqingNLP 2
  • Long inputs raise an exception; does roformer support variable-length input?

    Hello, and thank you very much for open-sourcing this and providing pip installation.

    With roformer, short inputs (<512) work fine, but overly long inputs raise an error and abort. Does roformer_pytorch support variable-length input?

    The main error messages are:

    /opt/conda/conda-bld/pytorch_1646755861072/work/aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [335,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    /opt/conda/conda-bld/pytorch_1646755861072/work/aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [335,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    
      File "XXX/roformer/modeling_roformer.py", line 1075, in forward
        attention_mask, input_shape, device, past_key_values_length
      File "XXX/roformer/modeling_roformer.py", line 1158, in get_extended_attention_mask
        extended_attention_mask = extended_attention_mask.to(dtype=self.dtype)  # fp16 compatibility
    RuntimeError: CUDA error: device-side assert triggered
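
    The indexSelectLargeIndex assert usually means an index ran past an embedding table, here presumably the position table. A hedged sketch of the usual workaround, truncating inputs to the checkpoint's position budget (512 is an assumption for this checkpoint):

    from roformer import RoFormerModel, RoFormerTokenizer
    tokenizer = RoFormerTokenizer.from_pretrained("junnyu/roformer_chinese_base")
    model = RoFormerModel.from_pretrained("junnyu/roformer_chinese_base")
    long_text = "很长的文本。" * 1000
    # cap inputs at the position budget so no index exceeds the embedding table
    inputs = tokenizer(long_text, truncation=True, max_length=512, return_tensors="pt")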
    
    opened by likestudy 2
  • A question about rotary_value

    I see that in both the current and the previous implementation, q and k are still modified when rotary_value is false. I'm not sure whether this is equivalent to making no change at all (commenting out the rotary_value=false code path as well); judging from the experiments, the difference is large. Also, simply setting rotary_value to false works somewhat better than setting it to true.

    The task is pretraining followed by fine-tuning on downstream tasks. I compared training and validation loss during pretraining, and the scores on different tasks during fine-tuning.
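
    (For context, a simplified sketch of what the rotary_value switch controls; this is not the repo's exact code. RoPE is always applied to the queries and keys, and rotary_value additionally rotates the values.)

    import torch

    def apply_rope(x, sin, cos):
        # rotate adjacent channel pairs: (x1, x2) -> (x1*cos - x2*sin, x1*sin + x2*cos)
        # sin/cos hold per-position angles, broadcastable to x1's shape
        x1, x2 = x[..., 0::2], x[..., 1::2]
        return torch.stack([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1).flatten(-2)

    # q = apply_rope(q, sin, cos); k = apply_rope(k, sin, cos)  # always applied
    # if rotary_value: v = apply_rope(v, sin, cos)              # only when rotary_value is True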

    opened by hxs91 2
  • Gradient checkpointing support seems broken

    Hello!

    Enabling gradient checkpointing while fine-tuning roformer v2 raises an error:

    File "/root/conda/envs/highbase/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
      result = self.forward(*input, **kwargs)
    File "/root/conda/envs/highbase/lib/python3.7/site-packages/roformer/modeling_roformer.py", line 1120, in forward
      return_dict=return_dict,
    File "/root/conda/envs/highbase/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
      result = self.forward(*input, **kwargs)
    File "/root/conda/envs/highbase/lib/python3.7/site-packages/roformer/modeling_roformer.py", line 725, in forward
      encoder_attention_mask,
    File "/root/conda/envs/highbase/lib/python3.7/site-packages/torch/utils/checkpoint.py", line 177, in checkpoint
      return CheckpointFunction.apply(function, preserve, *args)
    TypeError: save_for_backward can only save variables, but argument 2 is of type tuple

    Am I using it incorrectly? Training works fine with it disabled.

    opened by boxiaowave 0
  • A question about model conversion

    How can I convert a model saved by bert4keras (saved with model.save_weights()) to PyTorch? Using the convert_roformer_original_tf_checkpoint_to_pytorch you provide gives:

    RuntimeError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for
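
    A likely cause (an assumption): model.save_weights() writes Keras-format weights, while the converter expects a TF checkpoint, which is what TensorSliceReader reads. If your bert4keras version provides it, saving through the checkpoint API may work instead (hypothetical usage; verify that the method exists in your version):

    bert.save_weights_as_checkpoint('romac/bert_model.ckpt')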

    opened by yang-zi-jiang 1
  • In run_clue_no_trainer.py, why does train_dataloader's batch_size=32 become None after being processed by accelerator?

    During training, fetching data from the DataLoader complained that there was no batch_size, and I checked several times that the argument was indeed passed in. It turns out that after this code runs, train_dataloader.batch_size is None:

    (
        model, optimizer, train_dataloader, eval_dataloader, lr_scheduler,
    ) = accelerator.prepare(
        model, optimizer, train_dataloader, eval_dataloader, lr_scheduler
    )
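
    (A likely explanation, not verified against the accelerate version used here: accelerator.prepare rebuilds the DataLoader around a batch sampler so that batches can be sharded across processes, and PyTorch sets DataLoader.batch_size to None whenever batching is delegated to a batch_sampler; the effective batch size then lives on the sampler rather than on the loader.)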

    opened by learnmore-HDU 1
  • Hello! A question about model conversion

    I found that the sentence vectors extracted by bert4keras reading Su Jianlin's open-source chinese_roformer-sim-char-ft_L-12_H-768_A-12 and by transformers reading your junnyu/roformer_chinese_sim_char_ft_base are about equally good. But when I tried converting Su's model to a pytorch_model.bin and reading it with transformers, I could not get the pooler layer attached no matter what (in bert4keras, specifying with_pool='linear' is enough). How did you solve this?

    opened by EddieChen324 3
Releases: v0.4.1