Use Google's BERT for named entity recognition (CoNLL-2003 as the dataset).

Overview

For better performance, you can try NLPGNN; see the NLPGNN repository for more details.

BERT-NER Version 2

Use Google's BERT for named entity recognition (CoNLL-2003 as the dataset).

The original version (see old_version for more detail) contains some hard-coded values and lacks comments, which makes it inconvenient to understand. This updated version adds some new ideas and tricks (in data preprocessing and layer design) that help you implement the fine-tuning model quickly (you only need to modify crf_layer or softmax_layer).
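
To make the softmax/CRF choice concrete, here is a minimal, illustrative sketch of the two output-layer options on top of BERT's per-token output, written against the TensorFlow 1.x / tf.contrib.crf API of that era. It is not the repo's actual create_model code, and names such as bert_output, labels, mask and num_labels are placeholders.

    import tensorflow as tf

    def token_tagging_head(bert_output, labels, mask, num_labels, use_crf):
        """Illustrative softmax vs. CRF head over BERT's [batch, seq_len, hidden] output."""
        # Project each token's hidden state to per-label logits.
        logits = tf.layers.dense(bert_output, num_labels)
        if use_crf:
            # CRF head: learn a label-transition matrix and score whole tag sequences.
            seq_lengths = tf.reduce_sum(tf.cast(mask, tf.int32), axis=-1)
            log_likelihood, transitions = tf.contrib.crf.crf_log_likelihood(
                logits, labels, seq_lengths)
            loss = tf.reduce_mean(-log_likelihood)
            predictions, _ = tf.contrib.crf.crf_decode(logits, transitions, seq_lengths)
        else:
            # Softmax head: independent per-token cross-entropy, with padding masked out.
            per_token_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels=labels, logits=logits)
            float_mask = tf.cast(mask, tf.float32)
            loss = tf.reduce_sum(per_token_loss * float_mask) / tf.reduce_sum(float_mask)
            predictions = tf.argmax(logits, axis=-1, output_type=tf.int32)
        return loss, logits, predictions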

Folder Description:

BERT-NER
|____ bert                          # need git from [here](https://github.com/google-research/bert)
|____ cased_L-12_H-768_A-12	    # need download from [here](https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip)
|____ data		            # train data
|____ middle_data	            # middle data (label id map)
|____ output			    # output (final model, predict results)
|____ BERT_NER.py		    # main code
|____ conlleval.pl		    # eval code
|____ run_ner.sh    		    # run model and eval result
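
For reference, the standard CoNLL-2003 files store one token per line with space-separated columns (word, POS tag, chunk tag, NER tag) and a blank line between sentences; many NER forks keep only the word and the NER tag, so check the files under data/ for the exact column layout this code expects. A typical fragment (shown with the original IOB1 tagging) looks like:

    EU NNP I-NP I-ORG
    rejects VBZ I-VP O
    German JJ I-NP I-MISC
    call NN I-NP O
    to TO I-VP O
    boycott VB I-VP O
    British JJ I-NP I-MISC
    lamb NN I-NP O
    . . O O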

Usage:

bash run_ner.sh

What's in run_ner.sh:

python BERT_NER.py\
    --task_name="NER"  \
    --do_lower_case=False \
    --crf=False \
    --do_train=True   \
    --do_eval=True   \
    --do_predict=True \
    --data_dir=data   \
    --vocab_file=cased_L-12_H-768_A-12/vocab.txt  \
    --bert_config_file=cased_L-12_H-768_A-12/bert_config.json \
    --init_checkpoint=cased_L-12_H-768_A-12/bert_model.ckpt   \
    --max_seq_length=128   \
    --train_batch_size=32   \
    --learning_rate=2e-5   \
    --num_train_epochs=3.0   \
    --output_dir=./output/result_dir

perl conlleval.pl -d '\t' < ./output/result_dir/label_test.txt
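
conlleval.pl expects one token per line with the gold tag and the predicted tag as the last two columns; the -d '\t' option tells it that the columns in label_test.txt are tab-separated. Assuming the predict step writes token, gold label and predicted label per line (column order is an assumption, check your own output), the file looks roughly like:

    SOCCER	O	O
    JAPAN	B-LOC	B-LOC
    GET	O	O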

Notice: the cased model is recommended, according to this paper. The CoNLL-2003 dataset and the Perl evaluation script come from here

RESULTS:(On test set)

Parameter setting:

  • do_lower_case=False
  • num_train_epochs=4.0
  • crf=False
accuracy:  98.15%; precision:  90.61%; recall:  88.85%; FB1:  89.72
              LOC: precision:  91.93%; recall:  91.79%; FB1:  91.86  1387
             MISC: precision:  83.83%; recall:  78.43%; FB1:  81.04  668
              ORG: precision:  87.83%; recall:  85.18%; FB1:  86.48  1191
              PER: precision:  95.19%; recall:  94.83%; FB1:  95.01  1311

Result description:

Here I just use the default parameters. Google's paper reports 92.4 F1 for CoNLL-2003 NER and notes that a variation of about 0.2% is reasonable, so some extra tricks may need to be added to the above model to close the gap.

References:

[1] https://arxiv.org/abs/1810.04805

[2] https://github.com/google-research/bert

Comments
  • TypeError: eval_metric_ops[confusion_matrix] must be Operation or Tensor

    TypeError: eval_metric_ops[confusion_matrix] must be Operation or Tensor

    How to solve this issue?

    TypeError: eval_metric_ops[confusion_matrix] must be Operation or Tensor, given:<tf.Variable 'total_confusion_matrix: 0' shape=(5, 5) dtype=float64_ref>
    
    opened by congchan 4
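
    A hedged note on the error above: the values in eval_metric_ops must be (value_tensor, update_op) pairs, so returning a raw tf.Variable for the confusion matrix fails the Estimator's type check. One possible fix (not the repo's code; label_ids_flat, pred_ids_flat and num_labels are placeholder names) is to keep a running total and return its read value together with its update op:

        batch_cm = tf.confusion_matrix(label_ids_flat, pred_ids_flat,
                                       num_classes=num_labels, dtype=tf.float64)
        total_cm = tf.Variable(tf.zeros([num_labels, num_labels], dtype=tf.float64),
                               trainable=False, name='total_confusion_matrix')
        update_op = total_cm.assign_add(batch_cm)
        eval_metric_ops = {'confusion_matrix': (tf.convert_to_tensor(total_cm), update_op)}
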
  • Question: training the model without init_checkpoint

    Question: training the model without init_checkpoint

    INFO:tensorflow:Error recorded from training_loop: local variable 'initialized_variable_names' referenced before assignment
    INFO:tensorflow:training_loop marked as finished
    WARNING:tensorflow:Reraising captured error
    Traceback (most recent call last):
      File "BERT_NER.py", line 612, in <module>
        tf.app.run()
      File "C:\Users\Sudha\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
        _sys.exit(main(argv))
      File "BERT_NER.py", line 545, in main
        estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
      File "C:\Users\Sudha\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\contrib\tpu\python\tpu\tpu_estimator.py", line 2409, in train
        rendezvous.raise_errors()
      File "C:\Users\Sudha\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\contrib\tpu\python\tpu\error_handling.py", line 128, in raise_errors
        six.reraise(typ, value, traceback)
      File "C:\Users\Sudha\Anaconda3\envs\tensorflow\lib\site-packages\six.py", line 693, in reraise
        raise value
      File "C:\Users\Sudha\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\contrib\tpu\python\tpu\tpu_estimator.py", line 2403, in train
        saving_listeners=saving_listeners
      File "C:\Users\Sudha\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\estimator\estimator.py", line 354, in train
        loss = self._train_model(input_fn, hooks, saving_listeners)
      File "C:\Users\Sudha\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\estimator\estimator.py", line 1207, in _train_model
        return self._train_model_default(input_fn, hooks, saving_listeners)
      File "C:\Users\Sudha\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\estimator\estimator.py", line 1237, in _train_model_default
        features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
      File "C:\Users\Sudha\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\contrib\tpu\python\tpu\tpu_estimator.py", line 2195, in _call_model_fn
        features, labels, mode, config)
      File "C:\Users\Sudha\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\estimator\estimator.py", line 1195, in _call_model_fn
        model_fn_results = self._model_fn(features=features, **kwargs)
      File "C:\Users\Sudha\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\contrib\tpu\python\tpu\tpu_estimator.py", line 2479, in _model_fn
        features, labels, is_export_mode=is_export_mode)
      File "C:\Users\Sudha\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\contrib\tpu\python\tpu\tpu_estimator.py", line 1259, in call_without_tpu
        return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
      File "C:\Users\Sudha\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\contrib\tpu\python\tpu\tpu_estimator.py", line 1533, in _call_model_fn
        estimator_spec = self._model_fn(features=features, **kwargs)
      File "BERT_NER.py", line 419, in model_fn
        if var.name in initialized_variable_names:
    UnboundLocalError: local variable 'initialized_variable_names' referenced before assignment

    Training the model without using the init_checkpoint flag returns this error

    opened by sudhanshu817 3
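
    A hedged note on the error above: initialized_variable_names is only assigned inside the if init_checkpoint: branch of model_fn, but the variable-logging loop reads it afterwards, so running without --init_checkpoint crashes. A minimal fix, mirroring what Google's original run scripts do, is to give it an empty default before the branch, roughly:

        initialized_variable_names = {}
        if init_checkpoint:
            (assignment_map, initialized_variable_names) = \
                modeling.get_assignment_map_from_checkpoint(tvars, init_checkpoint)
            tf.train.init_from_checkpoint(init_checkpoint, assignment_map)
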
  • TPUEstimatorSpec.predictions must be dict of Tensors

    TPUEstimatorSpec.predictions must be dict of Tensors

    When running predict on Google Colab (to use TPU) the code crashes with the following error:

    TPUEstimatorSpec.predictions must be dict of Tensors.

    To solve it one can place the following code in create_model

    predict = tf.argmax(probabilities, axis=-1)
    predict_dict = {'predictions': predict}  # this way it is not shot down by check in TPUEstimatorSpec
    return loss, per_example_loss, logits, predict_dict
    

    This of course also means changing the interpretation of the result

    result = estimator.predict(input_fn=predict_input_fn)
    result = list(result)
    result = [pred['predictions'] for pred in result]
    

    Currently I'm unable to open a pull request since that would mean looking into whether it really is a solution. Just posting it here for anyone who has the same problem.

    opened by rikhuijzer 2
  • How to reproduce your results.

    How to reproduce your results.

    I use the same run command as yours, but I get worse results on the dev dataset.

    eval_f = 0.89656204
    eval_precision = 0.90508
    eval_recall = 0.88843685
    global_step = 653
    loss = 17.190592
    

    I use "BERT-Base, Multilingual Cased: 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters" as checkpoint, which is public by google at November 23rd, 2018.

    opened by ljch2018 1
  • --max_seq_length=128 -> 150

    --max_seq_length=128 -> 150

    Hi kyzhouhzau~

    Thank you for this project :) There is a minor error which I'd like to report.

    def convert_single_example(ex_index, example, label_list, max_seq_length, tokenizer):
    ...
        input_ids = tokenizer.convert_tokens_to_ids(ntokens)
        input_mask = [1] * len(input_ids)
        while len(input_ids) < max_seq_length:
            input_ids.append(0)
            input_mask.append(0)
            segment_ids.append(0)
            label_ids.append(0)
        print('length check', len(input_ids), max_seq_length)
        assert len(input_ids) == max_seq_length  <-- error
        assert len(input_mask) == max_seq_length
        assert len(segment_ids) == max_seq_length
        assert len(label_ids) == max_seq_length
    ...
    

    tokenizer.convert_tokens_to_ids(ntokens) can generate a list longer than max_seq_length when we are using --max_seq_length=128.

    So I ran with --max_seq_length=150 and it was fine.

    opened by dsindex 1
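
    A hedged note on the report above: raising max_seq_length only hides the problem for longer sentences. A more robust fix is to truncate the WordPiece token and label lists before [CLS] and [SEP] are added, so the length assertions always hold. Roughly (variable names follow the snippet above; the exact code in the repo may differ):

        # Reserve two positions for the [CLS] and [SEP] markers.
        if len(tokens) > max_seq_length - 2:
            tokens = tokens[0:(max_seq_length - 2)]
            labels = labels[0:(max_seq_length - 2)]
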
  • absl.flags._exceptions.UnparsedFlagAccessError: Trying to access flag --do_train before flags were parsed.

    absl.flags._exceptions.UnparsedFlagAccessError: Trying to access flag --do_train before flags were parsed.

    When I try to train, I get the error: absl.flags._exceptions.UnparsedFlagAccessError: Trying to access flag --do_train before flags were parsed. Can anyone help me? Thanks a lot.

    opened by YijianLiu 0
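
    A hedged note: this error usually means a FLAGS value is read at module import time, before tf.app.run() has parsed the command line (for example when flag-dependent code is copied to module level). Keeping all FLAGS accesses inside main() avoids it; a minimal pattern:

        import tensorflow as tf

        flags = tf.flags
        FLAGS = flags.FLAGS
        flags.DEFINE_bool("do_train", False, "Whether to run training.")

        def main(_):
            # Safe: tf.app.run() parses the flags before calling main().
            print("do_train =", FLAGS.do_train)

        if __name__ == "__main__":
            tf.app.run()
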
  • How to use BERT for ENTITY extraction from a Sequence without classification in the NER task?

    How to use BERT for ENTITY extraction from a Sequence without classification in the NER task?

    My requirement here is: given a sentence (sequence), I would like to just extract the entities present in the sequence without classifying them into a type in the NER task. I see that BertForTokenClassification for NER does the classification. Can this be adapted for just the extraction?

    Can you give me an idea of how to do entity extraction/identification using BERT?

    opened by ManojPrabhakar 0
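
    A hedged suggestion for the question above: one simple approach is to keep the token-classification setup but collapse every entity type into a single ENT class when reading the data, so the model only learns entity boundaries. A tiny preprocessing sketch (label names are the CoNLL ones; adapt to your label list):

        def collapse_entity_types(tag):
            """Map 'B-PER'/'I-ORG'/... to 'B-ENT'/'I-ENT', leaving 'O' untouched."""
            if tag == "O":
                return tag
            prefix, _ = tag.split("-", 1)
            return prefix + "-ENT"

        print([collapse_entity_types(t) for t in ["B-PER", "I-PER", "O", "B-LOC"]])
        # ['B-ENT', 'I-ENT', 'O', 'B-ENT']
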
  • grpc error

    grpc error

    When I use TensorFlow Serving over gRPC, the call response = stub.Predict(request, timeout) fails with this error:

        status = StatusCode.FAILED_PRECONDITION
        details = "Batched output tensor's 0th dimension does not equal the sum of the 0th dimension sizes of the input tensors"
        debug_error_string = "{"created":"@1558057036.708000000","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1017,"grpc_message":"Batched output tensor's 0th dimension does not equal the sum of the 0th dimension sizes of the input tensors","grpc_status":9}"

    opened by Reevens 0
  • Problems: runs very slowly when converting single example to feature

    Problems: runs very slowly when converting single example to feature

    I found that it costs too much time when running the convert_single_example function.

        time0 = time.time()
        feature = convert_single_example(ex_index, example, label_list, max_seq_length, tokenizer, mode)
        time1 = time.time()
        print("time cost:", time1 - time0)

    The cost is up to a few seconds! time cost: 4.020495414733887

    In the convert_single_example function we can fix it by adding the following guard:

        if not os.path.exists('./output/label2id.pkl'):

    in front of

        with open('./output/label2id.pkl', 'wb') as w:
            pickle.dump(label_map, w)

    opened by jiangpinglei 0
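
    For completeness, a runnable sketch of the guard described above (assuming the ./output/label2id.pkl path used in the repo):

        import os
        import pickle

        # Write the label map only once instead of on every example.
        if not os.path.exists('./output/label2id.pkl'):
            with open('./output/label2id.pkl', 'wb') as w:
                pickle.dump(label_map, w)
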
  • Uncased or Cased?

    Uncased or Cased?

    Thanks for sharing the code for the NER task! May I know which model you used, cased or uncased? I am getting F1-dev 88.8 using the cased model and F1-dev 92.6 using the uncased model.

    opened by donovanOng 0
  • A new public notebook NER git

    A new public notebook NER git

    I came across some problems when using this code, so I made a Colab notebook NER model and put it at https://github.com/Hou-jing/NER-public. I think this will be easy to use.

    opened by Hou-jing 0
  • Use GPU for inference

    Use GPU for inference

    Hi, I'd like to use BERT-NER for inference, mainly to recognise ORG. I have been able to do so with CPU, now I'd like to know two things:

    1. Would GPU speed up inference ?

    2. Does BERT-NER automatically use the CPU? I tried it on Google Colab and I don't see any change in inference time. Please advise how, thanks!

    opened by gimseng 0
  • May I ask why the loss and epoch are not shown during training

    May I ask why the loss and epoch are not shown during training

    The log shown below is printed during training. How can I check the loss for each epoch? My training parameters are: python BERT_NER.py
    --task_name="NER"
    --do_lower_case=False
    --crf=True
    --do_train=True
    --do_eval=True
    --do_predict=True
    --data_dir=data
    --vocab_file=cased_L-12_H-768_A-12/vocab.txt
    --bert_config_file=cased_L-12_H-768_A-12/bert_config.json
    --init_checkpoint=cased_L-12_H-768_A-12/bert_model.ckpt
    --max_seq_length=128
    --train_batch_size=32
    --learning_rate=2e-5
    --num_train_epochs=4.0
    --output_dir=./output/result_dir

    The training log is as follows:

    INFO:tensorflow:global_step/sec: 2.44863
    I0326 15:59:02.876380 139718477698816 tpu_estimator.py:2307] global_step/sec: 2.44863
    INFO:tensorflow:examples/sec: 78.3562
    I0326 15:59:02.876857 139718477698816 tpu_estimator.py:2308] examples/sec: 78.3562
    INFO:tensorflow:global_step/sec: 2.24293
    I0326 15:59:03.322147 139718477698816 tpu_estimator.py:2307] global_step/sec: 2.24293
    INFO:tensorflow:examples/sec: 71.7736
    I0326 15:59:03.322539 139718477698816 tpu_estimator.py:2308] examples/sec: 71.7736
    INFO:tensorflow:global_step/sec: 2.31614
    I0326 15:59:03.753919 139718477698816 tpu_estimator.py:2307] global_step/sec: 2.31614
    INFO:tensorflow:examples/sec: 74.1165
    I0326 15:59:03.754283 139718477698816 tpu_estimator.py:2308] examples/sec: 74.1165
    INFO:tensorflow:global_step/sec: 2.32764
    I0326 15:59:04.183665 139718477698816 tpu_estimator.py:2307] global_step/sec: 2.32764
    INFO:tensorflow:examples/sec: 74.4845
    I0326 15:59:04.184118 139718477698816 tpu_estimator.py:2308] examples/sec: 74.4845
    INFO:tensorflow:global_step/sec: 2.34524
    I0326 15:59:04.610075 139718477698816 tpu_estimator.py:2307] global_step/sec: 2.34524
    INFO:tensorflow:examples/sec: 75.0478
    I0326 15:59:04.610802 139718477698816 tpu_estimator.py:2308] examples/sec: 75.0478
    INFO:tensorflow:global_step/sec: 2.24035
    I0326 15:59:05.056344 139718477698816 tpu_estimator.py:2307] global_step/sec: 2.24035
    INFO:tensorflow:examples/sec: 71.6911
    I0326 15:59:05.056849 139718477698816 tpu_estimator.py:2308] examples/sec: 71.6911
    INFO:tensorflow:global_step/sec: 2.53696
    I0326 15:59:05.450423 139718477698816 tpu_estimator.py:2307] global_step/sec: 2.53696
    INFO:tensorflow:examples/sec: 81.1828
    I0326 15:59:05.450799 139718477698816 tpu_estimator.py:2308] examples/sec: 81.1828
    INFO:tensorflow:global_step/sec: 2.54311
    I0326 15:59:05.843605 139718477698816 tpu_estimator.py:2307] global_step/sec: 2.54311
    INFO:tensorflow:examples/sec: 81.3794
    I0326 15:59:05.843933 139718477698816 tpu_estimator.py:2308] examples/sec: 81.3794
    INFO:tensorflow:global_step/sec: 2.25988
    I0326 15:59:06.286114 139718477698816 tpu_estimator.py:2307] global_step/sec: 2.25988
    INFO:tensorflow:examples/sec: 72.316
    I0326 15:59:06.286518 139718477698816 tpu_estimator.py:2308] examples/sec: 72.316
    ^CINFO:tensorflow:training_loop marked as finished

    opened by YijianLiu 1
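
    A hedged note on the question above: Estimator/TPUEstimator does not report a per-epoch loss by default; it only prints throughput at the logging interval. Two common workarounds are lowering log_step_count_steps in the run config, or attaching a LoggingTensorHook so the loss tensor is printed every N steps. Roughly (the tensor name "total_loss" is illustrative):

        # Inside model_fn, give the loss tensor a stable name:
        total_loss = tf.identity(total_loss, name="total_loss")

        # When training, ask the Estimator to print it every 100 steps:
        logging_hook = tf.train.LoggingTensorHook({"loss": "total_loss"}, every_n_iter=100)
        estimator.train(input_fn=train_input_fn, max_steps=num_train_steps, hooks=[logging_hook])
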
  • BERT_NER.py#L450 same code in if-else in both branches

    BERT_NER.py#L450 same code in if-else in both branches

    BERT_NER.py#L450

    if FLAGS.crf:
        (total_loss, logits, predicts) = create_model(bert_config, is_training, input_ids, mask,
                                                      segment_ids, label_ids, num_labels, use_one_hot_embeddings)
    else:
        (total_loss, logits, predicts) = create_model(bert_config, is_training, input_ids, mask,
                                                      segment_ids, label_ids, num_labels, use_one_hot_embeddings)

    The same code is for both conditions.

    opened by natasasdj 0
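
    As the report says, both branches call create_model with exactly the same arguments (the CRF/softmax switch presumably lives inside create_model via FLAGS.crf), so the conditional can simply be collapsed:

        (total_loss, logits, predicts) = create_model(
            bert_config, is_training, input_ids, mask, segment_ids,
            label_ids, num_labels, use_one_hot_embeddings)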