Pytorch-Named-Entity-Recognition-with-BERT

Overview

BERT NER

Use google BERT to do CoNLL-2003 NER !

new Train model using Python and Inference using C++

ALBERT-TF2.0

BERT-NER-TENSORFLOW-2.0

BERT-SQuAD

Requirements

  • python3
  • pip3 install -r requirements.txt

Run

python run_ner.py --data_dir=data/ --bert_model=bert-base-cased --task_name=ner --output_dir=out_base --max_seq_length=128 --do_train --num_train_epochs 5 --do_eval --warmup_proportion=0.1

Result

BERT-BASE

Validation Data

             precision    recall  f1-score   support

        PER     0.9677    0.9745    0.9711      1842
        LOC     0.9654    0.9711    0.9682      1837
       MISC     0.8851    0.9111    0.8979       922
        ORG     0.9299    0.9292    0.9295      1341

avg / total     0.9456    0.9534    0.9495      5942

Test Data

             precision    recall  f1-score   support

        PER     0.9635    0.9629    0.9632      1617
        ORG     0.8883    0.9097    0.8989      1661
        LOC     0.9272    0.9317    0.9294      1668
       MISC     0.7689    0.8248    0.7959       702

avg / total     0.9065    0.9209    0.9135      5648

Pretrained model download from here

BERT-LARGE

Validation Data

             precision    recall  f1-score   support

        ORG     0.9288    0.9441    0.9364      1341
        LOC     0.9754    0.9728    0.9741      1837
       MISC     0.8976    0.9219    0.9096       922
        PER     0.9762    0.9799    0.9781      1842

avg / total     0.9531    0.9606    0.9568      5942

Test Data

             precision    recall  f1-score   support

        LOC     0.9366    0.9293    0.9329      1668
        ORG     0.8881    0.9175    0.9026      1661
        PER     0.9695    0.9623    0.9659      1617
       MISC     0.7787    0.8319    0.8044       702

avg / total     0.9121    0.9232    0.9174      5648

Pretrained model download from here

Inference

from bert import Ner

model = Ner("out_base/")

output = model.predict("Steve went to Paris")

print(output)
'''
    [
        {
            "confidence": 0.9981840252876282,
            "tag": "B-PER",
            "word": "Steve"
        },
        {
            "confidence": 0.9998939037322998,
            "tag": "O",
            "word": "went"
        },
        {
            "confidence": 0.999891996383667,
            "tag": "O",
            "word": "to"
        },
        {
            "confidence": 0.9991968274116516,
            "tag": "B-LOC",
            "word": "Paris"
        }
    ]
'''

Inference C++

Pretrained and converted bert-base model download from here

Download libtorch from here

  • install cmake, tested with cmake version 3.10.2

  • unzip downloaded model and libtorch in BERT-NER

  • Compile C++ App

      cd cpp-app/
      cmake -DCMAKE_PREFIX_PATH=../libtorch

    cmake output image

    make

    make output image

  • Runing APP

       ./app ../base

    inference output image

NB: Bert-Base C++ model is split in to two parts.

  • Bert Feature extractor and NER classifier.
  • This is done because jit trace don't support input depended for loop or if conditions inside forword function of model.

Deploy REST-API

BERT NER model deployed as rest api

python api.py

API will be live at 0.0.0.0:8000 endpoint predict

cURL request

curl -X POST http://0.0.0.0:8000/predict -H 'Content-Type: application/json' -d '{ "text": "Steve went to Paris" }'

Output

{
    "result": [
        {
            "confidence": 0.9981840252876282,
            "tag": "B-PER",
            "word": "Steve"
        },
        {
            "confidence": 0.9998939037322998,
            "tag": "O",
            "word": "went"
        },
        {
            "confidence": 0.999891996383667,
            "tag": "O",
            "word": "to"
        },
        {
            "confidence": 0.9991968274116516,
            "tag": "B-LOC",
            "word": "Paris"
        }
    ]
}

cURL

curl output image

Postman

postman output image

C++ unicode support

Tensorflow version

Comments
  • About 'X' label

    About 'X' label

    In my opinion, you should remove the 'X' label's signal in evaluation, because you add more label than standard dataset, so I can't know very well the F1-score increase because the more label of 'X'. I think the 'X' label is not equal the 'O' label in standard dataset and the BERT paper, but in your code it may be same.

    good first issue 
    opened by kugwzk 22
  • Trained model on custom dataset. Though predicting only conll2003 entities.

    Trained model on custom dataset. Though predicting only conll2003 entities.

    I have custom entities data of around 8 entities. I combined that dataset with the conll2003 (As I am interested in conll2003 entities also). I trained the model. Though the trained model is unable to predict any entities outside conll2003. Could you please help me if I am missing anything while training on custom dataset.

    I used below command to train the model.

    nohup python3 run_ner.py --data_dir=data --bert_model=bert-large-cased --task_name=ner --output_dir=out_bert_large --max_seq_length=128 --num_train_epochs 10 --do_train --do_eval --no_cuda --warmup_proportion=0.4 > log.txt &

    opened by ranjeetds 16
  • Evaluating on the complete dev/test dataset by loading a model. Also, maybe some issues with Bert_base_uncased tokenization in Ner class in bert.py ??

    Evaluating on the complete dev/test dataset by loading a model. Also, maybe some issues with Bert_base_uncased tokenization in Ner class in bert.py ??

    Hi @Kamalkraj,

    Thanks again for sharing and maintaining this repository frequently.

    For training a system, we have to use - "python run_ner.py --data_dir=data/ --bert_model=bert-base-cased --task_name=ner --output_dir=out_base --max_seq_length=128 --do_train --num_train_epochs 5 --do_eval --warmup_proportion=0.1"

    I wanted to ask if we can use a similar command for evaluating a trained model on the entire dev/test files ? Something like - "python run_ner.py --data_dir=data/ --bert_model=out_base --task_name=ner --max_seq_length=128 --do_eval"

    I had tried the example that you have given in Inference block for evaluation (https://github.com/kamalkraj/BERT-NER#inference). I saw before that the tokenization is hardcoded for BERT_cased models in Ner class of bert.py. I am currently working with BERT_base_uncased model and during the inference steps, it throws error of mismatch in len(labels) and len(words)

    bert.py", line 119, in predict assert len(labels) == len(words) AssertionError

    I tried fixing the tokenization in Ner class but then got few other errors. It will be very helpful if you can fix the evaluations for BERT_uncased models.

    Thanks.

    opened by vikas95 11
  • Evaluate does not work for custom dataset

    Evaluate does not work for custom dataset

    Hi, I tried using your code on my data. It finished training without problem, but it got problem during evaluation.

    Traceback (most recent call last):
      File "run_ner.py", line 674, in <module>
        main()
      File "run_ner.py", line 661, in main
        temp_1.append(label_map[label_ids[i][j]])
    KeyError: 0
    

    I printed the label_ids:

    label ids:  0 i:  0 j:  16 label:  [58  1  1  1  1  1  1  1  1  1  1  1  1  1  1 59  0  0  0  0  0  0  0  0
      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
    
    

    Seems like it is because the element with j==16 is 0 in the label list and label 0 is not in the labelmap. I wonder how did you build the label_ids?

    Thanks

    opened by imayachita 11
  •  warmup_linear not working on BERT 0.6.2 vesrion

    warmup_linear not working on BERT 0.6.2 vesrion

    Hi Kamal, Thank you sharing the code . I installed pytorch-pretrained-bert 0.6.2 version in my PC and run your code . I am getting below error while executing the code. Can you please guide me how to solve this issue using latest BERT 0.6.2 version?

    from pytorch_pretrained_bert.optimization import BertAdam, warmup_linear ImportError: cannot import name 'warmup_linear' from 'pytorch_pretrained_bert.optimization' (C:\ProgramData\Anaconda3\lib\site-packages\pytorch_pretrained_bert\optimization.py)

    lr_this_step = args.learning_rate * warmup_linear(global_step/num_train_optimization_steps, args.warmup_proportion)

    Is there other way we can change the code ?

    Regards, Niranjan

    opened by niranjan8129 9
  • KeyError: 0 during evaluation

    KeyError: 0 during evaluation

    ub16c9@ub16c9-gpu:/media/ub16c9/fcd84300-9270-4bbd-896a-5e04e79203b7/ub16_prj/BERT-NER+kamalkraj$ python3.5 run_ner.py --data_dir=data/ --bert_model=bert-base-cased --task_name=ner --output_dir=out --max_seq_length=128 --do_train --num_train_epochs 5 --do_eval --warmup_proportion=0.4 Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex. 06/11/2019 19:02:56 - INFO - main - device: cuda n_gpu: 1, distributed training: False, 16-bits training: False 06/11/2019 19:02:57 - INFO - pytorch_pretrained_bert.tokenization - loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-vocab.txt from cache at /home/ub16c9/.pytorch_pretrained_bert/5e8a2b4893d13790ed4150ca1906be5f7a03d6c4ddf62296c383f6db42814db2.e13dbb970cb325137104fb2e5f36fe865f27746c6b526f6352861b1980eb80b1 06/11/2019 19:02:58 - INFO - pytorch_pretrained_bert.modeling - loading archive file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased.tar.gz from cache at /home/ub16c9/.pytorch_pretrained_bert/distributed_-1/a803ce83ca27fecf74c355673c434e51c265fb8a3e0e57ac62a80e38ba98d384.681017f415dfb33ec8d0e04fe51a619f3f01532ecea04edbfd48c5d160550d9c 06/11/2019 19:02:58 - INFO - pytorch_pretrained_bert.modeling - extracting archive file /home/ub16c9/.pytorch_pretrained_bert/distributed_-1/a803ce83ca27fecf74c355673c434e51c265fb8a3e0e57ac62a80e38ba98d384.681017f415dfb33ec8d0e04fe51a619f3f01532ecea04edbfd48c5d160550d9c to temp dir /tmp/tmpyj8ar20e 06/11/2019 19:03:01 - INFO - pytorch_pretrained_bert.modeling - Model config { "attention_probs_dropout_prob": 0.1, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "max_position_embeddings": 512, "num_attention_heads": 12, "num_hidden_layers": 12, "type_vocab_size": 2, "vocab_size": 28996 }

    06/11/2019 19:03:05 - INFO - pytorch_pretrained_bert.modeling - Weights of BertForTokenClassification not initialized from pretrained model: ['classifier.weight', 'classifier.bias'] 06/11/2019 19:03:05 - INFO - pytorch_pretrained_bert.modeling - Weights from pretrained model not used in BertForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias'] 06/11/2019 19:03:07 - INFO - main - *** Example *** 06/11/2019 19:03:07 - INFO - main - guid: train-0 06/11/2019 19:03:07 - INFO - main - tokens: EU rejects German call to boycott British la ##mb . 06/11/2019 19:03:07 - INFO - main - input_ids: 101 7270 22961 1528 1840 1106 21423 1418 2495 12913 119 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:03:07 - INFO - main - input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:03:07 - INFO - main - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:03:07 - INFO - main - *** Example *** 06/11/2019 19:03:07 - INFO - main - guid: train-1 06/11/2019 19:03:07 - INFO - main - tokens: Peter Blackburn 06/11/2019 19:03:07 - INFO - main - input_ids: 101 1943 14428 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:03:07 - INFO - main - input_mask: 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:03:07 - INFO - main - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:03:07 - INFO - main - *** Example *** 06/11/2019 19:03:07 - INFO - main - guid: train-2 06/11/2019 19:03:07 - INFO - main - tokens: BR ##US ##SE ##LS 1996 - 08 - 22 06/11/2019 19:03:07 - INFO - main - input_ids: 101 26660 13329 12649 15928 1820 118 4775 118 1659 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:03:07 - INFO - main - input_mask: 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:03:07 - INFO - main - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:03:07 - INFO - main - *** Example *** 06/11/2019 19:03:07 - INFO - main - guid: train-3 06/11/2019 19:03:07 - INFO - main - tokens: The European Commission said on Thursday it disagreed with German advice to consumers to s ##hun British la ##mb until scientists determine whether mad cow disease can be transmitted to sheep . 06/11/2019 19:03:07 - INFO - main - input_ids: 101 1109 1735 2827 1163 1113 9170 1122 19786 1114 1528 5566 1106 11060 1106 188 17315 1418 2495 12913 1235 6479 4959 2480 6340 13991 3653 1169 1129 12086 1106 8892 119 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:03:07 - INFO - main - input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:03:07 - INFO - main - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:03:07 - INFO - main - *** Example *** 06/11/2019 19:03:07 - INFO - main - guid: train-4 06/11/2019 19:03:07 - INFO - main - tokens: Germany ' s representative to the European Union ' s veterinary committee Werner Z ##wing ##mann said on Wednesday consumers should buy sheep ##me ##at from countries other than Britain until the scientific advice was clearer . 06/11/2019 19:03:07 - INFO - main - input_ids: 101 1860 112 188 4702 1106 1103 1735 1913 112 188 27431 3914 14651 163 7635 4119 1163 1113 9031 11060 1431 4417 8892 3263 2980 1121 2182 1168 1190 2855 1235 1103 3812 5566 1108 27830 119 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:03:07 - INFO - main - input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:03:07 - INFO - main - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:03:10 - INFO - main - ***** Running training ***** 06/11/2019 19:03:10 - INFO - main - Num examples = 14041 06/11/2019 19:03:10 - INFO - main - Batch size = 32 06/11/2019 19:03:10 - INFO - main - Num steps = 2190 Epoch: 40%|████████████████████████████████████████████████████████████████████████████████████████████▊ | 2/5 [07:01<10:33, 211.32s/it^Epoch: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [17:28<00:00, 209.73s/it] 06/11/2019 19:20:40 - INFO - main - *** Example ***████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 439/439 [03:27<00:00, 2.24it/s] 06/11/2019 19:20:40 - INFO - main - guid: dev-0 06/11/2019 19:20:40 - INFO - main - tokens: CR ##IC ##KE ##T - L ##EI ##CE ##ST ##ER ##S ##H ##IR ##E T ##A ##KE O ##VE ##R AT TO ##P A ##FT ##ER IN ##NI ##NG ##S VI ##CT ##OR ##Y . 06/11/2019 19:20:40 - INFO - main - input_ids: 101 15531 9741 22441 1942 118 149 27514 10954 9272 9637 1708 3048 18172 2036 157 1592 22441 152 17145 2069 13020 16972 2101 138 26321 9637 15969 27451 11780 1708 7118 16647 9565 3663 119 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:20:40 - INFO - main - input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:20:40 - INFO - main - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:20:40 - INFO - main - *** Example *** 06/11/2019 19:20:40 - INFO - main - guid: dev-1 06/11/2019 19:20:40 - INFO - main - tokens: L ##ON ##D ##ON 1996 - 08 - 30 06/11/2019 19:20:40 - INFO - main - input_ids: 101 149 11414 2137 11414 1820 118 4775 118 1476 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:20:40 - INFO - main - input_mask: 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:20:40 - INFO - main - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:20:40 - INFO - main - *** Example *** 06/11/2019 19:20:40 - INFO - main - guid: dev-2 06/11/2019 19:20:40 - INFO - main - tokens: West Indian all - round ##er Phil Simmons took four for 38 on Friday as Leicestershire beat Somerset by an innings and 39 runs in two days to take over at the head of the county championship . 06/11/2019 19:20:40 - INFO - main - input_ids: 101 1537 1890 1155 118 1668 1200 5676 14068 1261 1300 1111 3383 1113 5286 1112 21854 3222 8860 1118 1126 6687 1105 3614 2326 1107 1160 1552 1106 1321 1166 1120 1103 1246 1104 1103 2514 2899 119 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:20:40 - INFO - main - input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:20:40 - INFO - main - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:20:40 - INFO - main - *** Example *** 06/11/2019 19:20:40 - INFO - main - guid: dev-3 06/11/2019 19:20:40 - INFO - main - tokens: Their stay on top , though , may be short - lived as title rivals Essex , Derbyshire and Surrey all closed in on victory while Kent made up for lost time in their rain - affected match against Nottinghamshire . 06/11/2019 19:20:40 - INFO - main - input_ids: 101 2397 2215 1113 1499 117 1463 117 1336 1129 1603 118 2077 1112 1641 9521 8493 117 15964 1105 9757 1155 1804 1107 1113 2681 1229 5327 1189 1146 1111 1575 1159 1107 1147 4458 118 4634 1801 1222 21942 119 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:20:40 - INFO - main - input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:20:40 - INFO - main - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:20:40 - INFO - main - *** Example *** 06/11/2019 19:20:40 - INFO - main - guid: dev-4 06/11/2019 19:20:40 - INFO - main - tokens: After bowling Somerset out for 83 on the opening morning at Grace Road , Leicestershire extended their first innings by 94 runs before being bowled out for 29 ##6 with England disc ##ard Andy C ##ad ##dick taking three for 83 . 06/11/2019 19:20:40 - INFO - main - input_ids: 101 1258 11518 8860 1149 1111 6032 1113 1103 2280 2106 1120 4378 1914 117 21854 2925 1147 1148 6687 1118 5706 2326 1196 1217 21663 1149 1111 1853 1545 1114 1652 6187 2881 4827 140 3556 25699 1781 1210 1111 6032 119 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:20:40 - INFO - main - input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:20:40 - INFO - main - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06/11/2019 19:20:41 - INFO - main - ***** Running evaluation ***** 06/11/2019 19:20:41 - INFO - main - Num examples = 3250 06/11/2019 19:20:41 - INFO - main - Batch size = 8 Evaluating: 11%|████████████████████████▉ | 45/407 [00:01<00:14, 24.73it/s] Traceback (most recent call last): File "run_ner.py", line 534, in main() File "run_ner.py", line 518, in main temp_2.append(label_map[logits[i][j]]) KeyError: 0 ub16c9@ub16c9-gpu

    opened by loveJasmine 9
  • Label [SEP] in results

    Label [SEP] in results

    If use the small --max_seq_length (in example bellow 32), we get SEP in results. The lower max_seq_length the greater SEPs

             precision    recall  f1-score   support
        LOC     0.9248    0.9248    0.9248      1529
        PER     0.9556    0.9582    0.9569      1436
        ORG     0.8860    0.8993    0.8926      1539
       MISC     0.7620    0.8242    0.7919       637
      [SEP]     1.0000    1.0000    1.0000       637
    

    avg / total 0.9125 0.9235 0.9178 5778

    opened by mcfly5 8
  • suggestion for own dataset to train on  pretrained model.

    suggestion for own dataset to train on pretrained model.

    Thanks, for the great project. I checked the train format. it contains text, pos , bio-tag, entity tag(bio-schema) for example: -DOCSTART- -X- -X- O

    EU NNP B-NP B-ORG rejects VBZ B-VP O German JJ B-NP B-MISC call NN I-NP O to TO B-VP O boycott VB I-VP O British JJ B-NP B-MISC lamb NN I-NP O . . O O I having a 10,000 own sentence data, I can do entity tagging for my dataset. but How can i do pos and bio-tag is there any python library especially for bio-tag.

    Thanks

    opened by vigneshgig 6
  • Just a question

    Just a question

    Hello , finally, excellent script for bert-NER. I am just wondering if this script can be used(slight changes) to train a token based classification task. i.e. similar to NER task but the token(target word) to be classified are given in advance. For example, train a model for word-sense disambiguation. Given a word in a sentence determine/classify its sense. e.g. "He went to the store" went here has sense “motion”. the target word here is "went" Any idea?

    opened by moh55m55 6
  • N-Gram support

    N-Gram support

    Hey Kamal, Great work on the repo and major thanks for hosting the trained model.

    Would it be possible to add N-Gram support to this model, for say, 'New York' instead of detecting on 'New' and then 'York'?

    Also, any plans to train a larger model for QA or NER? Say RoBERTa or XL-NET?

    opened by gittb 5
  • Cannot reproduce your reported F1 score

    Cannot reproduce your reported F1 score

    after downloading your pretrained model in the master branch and run this command: ''' python run_ner.py --data_dir=data/ --bert_model=bert-base-cased --task_name=ner --output_dir=out --max_seq_length=128 --num_train_epochs 5 --do_eval --warmup_proportion=0.4 ''' I got the F1 score of 90.9 in the test set, which is far away from what you reported. Could you help with my issue? Thanks!

    opened by jind11 5
  • Bump flask-cors from 3.0.8 to 3.0.9

    Bump flask-cors from 3.0.8 to 3.0.9

    Bumps flask-cors from 3.0.8 to 3.0.9.

    Release notes

    Sourced from flask-cors's releases.

    Release 3.0.9

    Security

    • Escape path before evaluating resource rules (thanks @​praetorian-colby-morgan). Prior to this, flask-cors incorrectly evaluated CORS resource matching before path expansion. E.g. "/api/../foo.txt" would incorrectly match resources for "/api/*" whereas the path actually expands simply to "/foo.txt"
    Changelog

    Sourced from flask-cors's changelog.

    3.0.9

    Security

    • Escape path before evaluating resource rules (thanks to Colby Morgan). Prior to this, flask-cors incorrectly evaluated CORS resource matching before path expansion. E.g. "/api/../foo.txt" would incorrectly match resources for "/api/*" whereas the path actually expands simply to "/foo.txt"
    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • How to  convert all cpp,header files to DLL file?

    How to convert all cpp,header files to DLL file?

    @kamalkraj In cpp-app/unlib folder there is so many header and cpp files. Along with these files there is one more app.cpp file in cpp-app folder. I want to create DLL file from these all header and cpp files.. How can I do that? Thanks

    opened by yugaljain1999 1
  • Model training does not work on CPU

    Model training does not work on CPU

    I have cloned code from dev branch and executing following command to fine-tune model on CPU: python run_ner.py --cache_dir=path_to_cache --data_dir=path_to_data --bert_model=bert-base-uncased --task_name=ner --output_dir=path_to_output --no_cuda --do_train --do_eval --warmup_proportion=0.1

    But I am facing the following error: Traceback (most recent call last): File "run_ner.py", line 611, in main() File "run_ner.py", line 503, in main loss = model(input_ids, segment_ids, input_mask, label_ids,valid_ids,l_mask) File "/home/dev01/python_3/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl result = self.forward(*input, **kwargs) File "run_ner.py", line 43, in forward logits = self.classifier(sequence_output) File "/home/dev01/python_3/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl result = self.forward(*input, **kwargs) File "/home/dev01/python_3/lib64/python3.6/site-packages/torch/nn/modules/linear.py", line 93, in forward return F.linear(input, self.weight, self.bias) File "/home/dev01/python_3/lib64/python3.6/site-packages/torch/nn/functional.py", line 1692, in linear output = input.matmul(weight.t()) RuntimeError: Tensor for argument #3 'mat2' is on CPU, but expected it to be on GPU (while checking arguments for addmm)

    I am not getting when I am passing CPU flag, why is it expecting a tensor to be on GPU?

    opened by saurabhhssaurabh 1
  • Understanding the Evaluation Code

    Understanding the Evaluation Code

    Thanks for the nice code.

    I am confused with the evaluation code in run_ner.py L570. In particular, why "if j==0: continue" and what is variable "temp_" for? It would be nice if someone can walk me through.

           for i, label in enumerate(label_ids):
                temp_1 = []
                temp_2 = []
                for j,m in enumerate(label):
                    if j == 0:
                        continue
                    elif label_ids[i][j] == len(label_map):
                        y_true.append(temp_1)
                        y_pred.append(temp_2)
                        break
                    else:
                        temp_1.append(label_map[label_ids[i][j]])
                        temp_2.append(label_map[logits[i][j]])
    

    Also, I got the report file as follows, which seems incorrect (the extra "SEP]"):

              precision    recall  f1-score   support
    
         LOC     0.9213    0.9329    0.9270      1668
        MISC     0.7756    0.8319    0.8027       702
         ORG     0.8867    0.9001    0.8933      1661
         PER     0.9521    0.9586    0.9553      1617
        SEP]     0.0000    0.0000    0.0000         0
    

    micro avg 0.9005 0.9180 0.9092 5648 macro avg 0.7071 0.7247 0.7157 5648 weighted avg 0.9018 0.9180 0.9098 5648

    Any help would be highly appreciated. Thank you for your time.

    opened by Vincent950129 0
  • How can i use this project in Chinese NER?

    How can i use this project in Chinese NER?

    @kamalkraj Or do you know any repositories which can get a Chinese NER result like below:

    [ {'word': 'Steve', 'tag': 'B-PER', 'confidence': 0.9993595480918884},

    {'word': 'went', 'tag': 'O', 'confidence': 0.9997534155845642},

    {'word': 'to', 'tag': 'O', 'confidence': 0.999747097492218},

    {'word': 'Paris', 'tag': 'B-LOC', 'confidence': 0.9990612864494324} ]

    I will be very thankful for your help.

    opened by helloWorldLZ 1
Owner
Kamal Raj
DeepLearning | NLP | COMPUTER VISION | TF | KERAS | PYTORCH | SWIFT
Kamal Raj
PhoNLP: A BERT-based multi-task learning toolkit for part-of-speech tagging, named entity recognition and dependency parsing

PhoNLP is a multi-task learning model for joint part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing. Experiments on Vietnamese benchmark datasets show that PhoNLP produces state-of-the-art results, outperforming a single-task learning approach that fine-tunes the pre-trained Vietnamese language model PhoBERT for each task independently.

VinAI Research 109 Dec 2, 2022
Use Google's BERT for named entity recognition (CoNLL-2003 as the dataset).

For better performance, you can try NLPGNN, see NLPGNN for more details. BERT-NER Version 2 Use Google's BERT for named entity recognition (CoNLL-2003

Kaiyinzhou 1.2k Dec 26, 2022
RoNER is a Named Entity Recognition model based on a pre-trained BERT transformer model trained on RONECv2

RoNER RoNER is a Named Entity Recognition model based on a pre-trained BERT transformer model trained on RONECv2. It is meant to be an easy to use, hi

Stefan Dumitrescu 9 Nov 7, 2022
Chinese named entity recognization (bert/roberta/macbert/bert_wwm with Keras)

Chinese named entity recognization (bert/roberta/macbert/bert_wwm with Keras)

null 2 Jul 5, 2022
天池中药说明书实体识别挑战冠军方案;中文命名实体识别;NER; BERT-CRF & BERT-SPAN & BERT-MRC;Pytorch

天池中药说明书实体识别挑战冠军方案;中文命名实体识别;NER; BERT-CRF & BERT-SPAN & BERT-MRC;Pytorch

zxx飞翔的鱼 751 Dec 30, 2022
Named-entity recognition using neural networks. Easy-to-use and state-of-the-art results.

NeuroNER NeuroNER is a program that performs named-entity recognition (NER). Website: neuroner.com. This page gives step-by-step instructions to insta

Franck Dernoncourt 1.6k Dec 27, 2022
Framework for fine-tuning pretrained transformers for Named-Entity Recognition (NER) tasks

NERDA Not only is NERDA a mesmerizing muppet-like character. NERDA is also a python package, that offers a slick easy-to-use interface for fine-tuning

Ekstra Bladet 141 Dec 30, 2022
CrossNER: Evaluating Cross-Domain Named Entity Recognition (AAAI-2021)

CrossNER is a fully-labeled collected of named entity recognition (NER) data spanning over five diverse domains (Politics, Natural Science, Music, Literature, and Artificial Intelligence) with specialized entity categories for different domains.

Zihan Liu 89 Nov 10, 2022
Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.

anaGo anaGo is a Python library for sequence labeling(NER, PoS Tagging,...), implemented in Keras. anaGo can solve sequence labeling tasks such as nam

Hiroki Nakayama 1.5k Dec 5, 2022
Named-entity recognition using neural networks. Easy-to-use and state-of-the-art results.

NeuroNER NeuroNER is a program that performs named-entity recognition (NER). Website: neuroner.com. This page gives step-by-step instructions to insta

Franck Dernoncourt 1.5k Feb 11, 2021
Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.

anaGo anaGo is a Python library for sequence labeling(NER, PoS Tagging,...), implemented in Keras. anaGo can solve sequence labeling tasks such as nam

Hiroki Nakayama 1.4k Feb 17, 2021
Named-entity recognition using neural networks. Easy-to-use and state-of-the-art results.

NeuroNER NeuroNER is a program that performs named-entity recognition (NER). Website: neuroner.com. This page gives step-by-step instructions to insta

Franck Dernoncourt 1.5k Feb 17, 2021
A text augmentation tool for named entity recognition.

neraug This python library helps you with augmenting text data for named entity recognition. Augmentation Example Reference from An Analysis of Simple

Hiroki Nakayama 48 Oct 11, 2022
Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

Pytorch-NLU,一个中文文本分类、序列标注工具包,支持中文长文本、短文本的多类、多标签分类任务,支持中文命名实体识别、词性标注、分词等序列标注任务。 Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

null 186 Dec 24, 2022
Tool to add main subject to items on Wikidata using a WMFs CirrusSearch for named entity recognition or a manually supplied list of QIDs

ItemSubjector Tool made to add main subject statements to items based on the title using a home-brewed CirrusSearch-based Named Entity Recognition alg

Dennis Priskorn 9 Nov 17, 2022
Implemented shortest-circuit disambiguation, maximum probability disambiguation, HMM-based lexical annotation and BiLSTM+CRF-based named entity recognition

Implemented shortest-circuit disambiguation, maximum probability disambiguation, HMM-based lexical annotation and BiLSTM+CRF-based named entity recognition

null 0 Feb 13, 2022
Named Entity Recognition API used by TEI Publisher

TEI Publisher Named Entity Recognition API This repository contains the API used by TEI Publisher's web-annotation editor to detect entities in the in

e-editiones.org 14 Nov 15, 2022
Nested Named Entity Recognition

Nested Named Entity Recognition Training Dataset: CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark url: https://tianchi.aliyun.

null 8 Dec 25, 2022
Spacy-ginza-ner-webapi - Named Entity Recognition API with spaCy and GiNZA

Named Entity Recognition API with spaCy and GiNZA I wrote a blog post about this

Yuki Okuda 3 Feb 27, 2022