When loading a RoBERTa model I trained myself, if I use

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained(r"D:\PTM3\roberta-c")  # or BertTokenizerFast.from_pretrained(...)
model = AutoModel.from_pretrained(r"D:\PTM3\roberta-c")

the following warning is printed:
Some weights of the model checkpoint at D:\PTM3\roberta-c were not used when initializing RobertaModel: ['lm_head.bias', 'lm_head.layer_norm.bias', 'lm_head.layer_norm.weight', 'lm_head.dense.bias', 'lm_head.decoder.bias', 'lm_head.dense.weight', 'lm_head.decoder.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaModel were not initialized from the model checkpoint at D:\PTM3\roberta-c and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
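In case it helps, here is a minimal sketch of how I could list the checkpoint's keys to see exactly which weights the warning refers to (this assumes the checkpoint was saved as pytorch_model.bin; the file name would need adjusting if it was saved as model.safetensors):

import torch

state_dict = torch.load(r"D:\PTM3\roberta-c\pytorch_model.bin", map_location="cpu")
# Keys present in the checkpoint but unused by RobertaModel (the MLM head):
print([k for k in state_dict if k.startswith("lm_head")])
# No pooler keys are stored, which is why roberta.pooler.* gets newly initialized:
print([k for k in state_dict if "pooler" in k])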
Despite the warning, the final training F1 still reaches 80+, but I wonder whether the results would be better if these warnings did not appear. From what I found online, some people say the warning can simply be ignored. However, if I replace model = AutoModel.from_pretrained() with model = BertModel.from_pretrained() as in the original code, the warning becomes much longer, reporting that many more parameters are unused or newly initialized.
So I think it is worth sorting this out properly. Do you have any suggestions?
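For reference, here is a minimal sketch of how I could check which class the checkpoint was saved with (AutoConfig, config.architectures, and config.model_type are standard transformers fields; the path is the one from the warning above):

from transformers import AutoConfig

config = AutoConfig.from_pretrained(r"D:\PTM3\roberta-c")
# architectures records the class that saved the checkpoint, e.g. ["RobertaForMaskedLM"];
# model_type ("roberta") is what makes AutoModel resolve to RobertaModel.
print(config.architectures, config.model_type)

With AutoModel the parameter names line up, so only the MLM head (lm_head.*) is dropped and the pooler is newly initialized; with BertModel the "roberta."-prefixed keys presumably do not match the names BertModel expects, which would explain why the warning becomes so much longer.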