Medical Entity Extraction on the CMeEE Dataset

Overview

医学实体抽取_GlobalPointer_torch (Medical Entity Extraction with GlobalPointer, PyTorch)

Introduction

The idea comes from Su Jianlin's (苏神) GlobalPointer, whose original version was implemented in Keras. The model structure here follows an existing PyTorch re-implementation (thanks!), and this torch version fully reproduces Su's original results.
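For readers unfamiliar with GlobalPointer: instead of tagging tokens, it scores every candidate span (start, end) for every entity type with one bilinear form. Below is a minimal PyTorch sketch of that scoring head, with RoPE position encoding and padding masks omitted; the class and variable names are illustrative, not the repo's actual code:

```python
import torch
import torch.nn as nn


class GlobalPointerSketch(nn.Module):
    """Sketch of GlobalPointer span scoring (RoPE and padding mask omitted)."""

    def __init__(self, hidden_size: int, num_types: int, head_size: int = 64):
        super().__init__()
        self.num_types = num_types
        self.head_size = head_size
        # One (query, key) projection pair per entity type.
        self.dense = nn.Linear(hidden_size, num_types * head_size * 2)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_size), e.g. BERT's last hidden states
        b, n, _ = hidden.shape
        qk = self.dense(hidden).view(b, n, self.num_types, 2, self.head_size)
        q, k = qk[..., 0, :], qk[..., 1, :]  # each (batch, seq, types, head)
        # logits[b, t, i, j] = score of span (start=i, end=j) for entity type t
        logits = torch.einsum("bmtd,bntd->btmn", q, k)
        # Forbid spans whose end precedes their start (strict lower triangle).
        mask = torch.tril(
            torch.ones(n, n, dtype=torch.bool, device=hidden.device), diagonal=-1
        )
        logits = logits.masked_fill(mask, -1e12)
        return logits / self.head_size ** 0.5
```

At inference, every cell of the (types, seq, seq) score tensor above a threshold (0 with the multi-label loss Su proposes) is emitted as an entity span.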

Dataset

Chinese medical named-entity recognition dataset: apply for access here (the process is simple). It contains nine classes of medical entities.

Environment

  • python 3.8.1

  • pytorch==1.8.1

  • transformers==4.9.2

  • tqdm

  • numpy

Pretrained model

1. I prefer the RoBERTa family: RoBERTa-zh-Large-PyTorch

2. Or download it directly from Google Drive here

Run

Note: change bert_model_path in the train/predict scripts to your own path.
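For example, the change is a one-line edit near the top of each script (the variable name comes from the note above; the path below is a placeholder, not a real location):

```python
# In train_CME.py / predict_CME.py: point this at your local
# pretrained-model directory (placeholder path shown here).
bert_model_path = "/data/models/RoBERTa_zh_Large_PyTorch"
```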

train

python train_CME.py

predict

python predict_CME.py

Results

Su's model works quite well!

image-20210914093108205.png (screenshot of the evaluation results)

Comments
  • A beginner question about the dataloader

    Could you explain what the mapping below is supposed to do? I couldn't work it out.

                token2char_span_mapping = self.tokenizer(text, return_offsets_mapping=True, max_length=max_len, truncation=True)["offset_mapping"]
                start_mapping = {j[0]: i for i, j in enumerate(token2char_span_mapping) if j != (0, 0)}
                end_mapping = {j[-1] - 1: i for i, j in enumerate(token2char_span_mapping) if j != (0, 0)}
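For what it's worth, those two dicts invert the tokenizer's offset mapping: start_mapping takes a character position to the index of the token that starts there, and end_mapping takes a character position to the token that ends there, so character-level entity spans can be converted to token-level spans. A pure-Python illustration, with a hand-written offset mapping standing in for the tokenizer's output:

```python
# Offset mapping as a fast tokenizer would return it, hand-written here
# for a 4-character text tokenized one character per token; the special
# tokens [CLS]/[SEP] map to (0, 0) and are skipped by the dict comprehensions.
token2char_span_mapping = [(0, 0), (0, 1), (1, 2), (2, 3), (3, 4), (0, 0)]

# char position of a token's first character -> token index
start_mapping = {j[0]: i for i, j in enumerate(token2char_span_mapping) if j != (0, 0)}
# char position of a token's last character -> token index
end_mapping = {j[-1] - 1: i for i, j in enumerate(token2char_span_mapping) if j != (0, 0)}

# An entity covering characters 1..2 maps to the token span 2..3
print(start_mapping[1], end_mapping[2])  # -> 2 3
```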
    
    opened by FalAnge1217 0
  • A minor issue when loading other RoBERTa models

    When loading a RoBERTa model I trained myself, using tokenizer = AutoTokenizer/BertTokenizerFast.from_pretrained() and model = AutoModel.from_pretrained(),

    the following warning appears: Some weights of the model checkpoint at D:\PTM3\roberta-c were not used when initializing RobertaModel: ['lm_head.bias', 'lm_head.layer_norm.bias', 'lm_head.layer_norm.weight', 'lm_head.dense.bias', 'lm_head.decoder.bias', 'lm_head.dense.weight', 'lm_head.decoder.weight']

    • This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
    • This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

    Some weights of RobertaModel were not initialized from the model checkpoint at D:\PTM3\roberta-c and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

    The final training F1 is still 80+, but I wonder whether it would be even better without these warnings. Answers I found online say the warning can be ignored, yet if I replace model = AutoModel.from_pretrained() with model = BertModel.from_pretrained() as in the original code, the warning becomes much longer, reporting that many parameters are unused or newly initialized.

    So I think this is worth resolving; do you have any suggestions?

    opened by bleachbeauty 2