Source code for Pack Together: Entity and Relation Extraction with Levitated Marker.

In this work, we present a novel span representation approach, named Packed Levitated Markers, to consider the dependencies between the spans (pairs) by strategically packing the markers in the encoder. Our approach is evaluated on two typical span (pair) representation tasks:

  1. Named Entity Recognition (NER): We adopt a group packing strategy that lets the model process a large number of spans together and consider their dependencies with limited resources.

  2. Relation Extraction (RE): We adopt a subject-oriented packing strategy that packs each subject and all its objects into one instance to model the dependencies between same-subject span pairs.
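
The two packing strategies can be sketched in a few lines of Python. This is only an illustration of the grouping logic, not the code in this repository: the function names, the span enumeration order, and the group size are assumptions made for the example, and the step that inserts the markers into the encoder input is not shown.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, both inclusive


def enumerate_spans(num_tokens: int, max_span_len: int) -> List[Span]:
    """List every candidate span of at most max_span_len tokens (hypothetical helper)."""
    return [
        (start, end)
        for start in range(num_tokens)
        for end in range(start, min(start + max_span_len, num_tokens))
    ]


def pack_into_groups(spans: List[Span], group_size: int) -> List[List[Span]]:
    """Group packing (NER): split the candidate spans into fixed-size groups.

    Each group would be encoded together with the text, so the spans in a
    group are processed in the same pass.
    """
    return [spans[i:i + group_size] for i in range(0, len(spans), group_size)]


def pack_by_subject(entities: List[Span]) -> Dict[Span, List[Span]]:
    """Subject-oriented packing (RE): build one instance per subject.

    Every other entity becomes a candidate object of that subject, so all
    same-subject span pairs end up in a single instance.
    """
    return {subj: [obj for obj in entities if obj != subj] for subj in entities}


if __name__ == "__main__":
    spans = enumerate_spans(num_tokens=4, max_span_len=2)
    groups = pack_into_groups(spans, group_size=3)
    print([len(g) for g in groups])  # [3, 3, 1]

    instances = pack_by_subject([(0, 1), (3, 3), (5, 6)])
    print(instances[(3, 3)])  # [(0, 1), (5, 6)]
```

In this sketch a sentence with 4 tokens and spans of up to 2 tokens yields 7 candidate spans, which are packed into groups of at most 3, and three entities yield three RE instances with two candidate objects each.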

Please find more details of this work in our paper.


Install Dependencies

The code is based on Hugging Face's transformers.

Install dependencies and apex:

pip3 install -r requirement.txt
pip3 install --editable transformers

Download and preprocess the datasets

Our experiments use the following datasets. Please find the links and pre-processing steps below:

  • CoNLL03: We use the English part of CoNLL03.
  • OntoNotes: We preprocess OntoNotes 5.0 with a preprocessing script.
  • Few-NERD: The dataset can be downloaded from its website.
  • ACE04/ACE05: We use the preprocessing code from the DyGIE repo. Please follow its instructions to preprocess the ACE05 and ACE04 datasets.
  • SciERC: The preprocessed SciERC dataset can be downloaded from its project website.

Pre-trained Models

We release our pre-trained NER models and RE models for ACE05 and SciERC datasets on Google Drive/Tsinghua Cloud.

Note: the performance of the pre-trained models might be slightly different from the reported numbers in the paper, since we reported the average numbers based on multiple runs.

Training Script

Train NER Models:

bash scripts/
bash scripts/
bash scripts/

Train RE Models:


Quick Start

The following commands can be used to run our pre-trained models on SciERC.

Evaluate the NER model:

CUDA_VISIBLE_DEVICES=0  python3  --model_type bertspanmarker  \
    --model_name_or_path  ../bert_models/scibert-uncased  --do_lower_case  \
    --data_dir scierc  \
    --learning_rate 2e-5  --num_train_epochs 50  --per_gpu_train_batch_size  8  --per_gpu_eval_batch_size 16  --gradient_accumulation_steps 1  \
    --max_seq_length 512  --save_steps 2000  --max_pair_length 256  --max_mention_ori_length 8    \
    --do_eval  --evaluate_during_training   --eval_all_checkpoints  \
    --fp16  --seed 42  --onedropout  --lminit  \
    --train_file train.json --dev_file dev.json --test_file test.json  \
    --output_dir sciner_models/sciner-scibert  --overwrite_output_dir  --output_results

Evaluate the RE model:

CUDA_VISIBLE_DEVICES=0  python3  --model_type bertsub  \
    --model_name_or_path  ../bert_models/scibert-uncased  --do_lower_case  \
    --data_dir scierc  \
    --learning_rate 2e-5  --num_train_epochs 10  --per_gpu_train_batch_size  8  --per_gpu_eval_batch_size 16  --gradient_accumulation_steps 1  \
    --max_seq_length 256  --max_pair_length 16  --save_steps 2500  \
    --do_eval  --evaluate_during_training   --eval_all_checkpoints  --eval_logsoftmax  \
    --fp16  --lminit   \
    --test_file sciner_models/sciner-scibert/ent_pred_test.json  \
    --use_ner_results \
    --output_dir scire_models/scire-scibert

Here, --use_ner_results denotes using the original entity type predicted by NER models.


If we use the flag --use_typemarker for the RE models, the results will be:

Model                            Ent    Rel    Rel+
ACE05-UnTypeMarker (in paper)    89.7   68.8   66.3
ACE05-TypeMarker                 89.7   67.5   65.2
SciERC-UnTypeMarker (in paper)   69.9   52.0   40.6
SciERC-TypeMarker                69.9   52.5   40.9

Since the TypeMarker increases performance on SciERC but decreases it on ACE05, we didn't use it in the paper.
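
To make the trade-off concrete, the differences between the two settings can be computed directly from the table above. The snippet below is a small arithmetic sketch; the dictionary layout and the function name are chosen for this example only.

```python
# Scores copied from the table above, in the order (Ent, Rel, Rel+).
scores = {
    "ACE05": {"untyped": (89.7, 68.8, 66.3), "typed": (89.7, 67.5, 65.2)},
    "SciERC": {"untyped": (69.9, 52.0, 40.6), "typed": (69.9, 52.5, 40.9)},
}


def typemarker_delta(dataset: str) -> tuple:
    """Return (typed - untyped) for Ent, Rel, and Rel+, rounded to one decimal."""
    untyped = scores[dataset]["untyped"]
    typed = scores[dataset]["typed"]
    return tuple(round(t - u, 1) for t, u in zip(typed, untyped))


if __name__ == "__main__":
    for name in scores:
        print(name, typemarker_delta(name))
```

With these numbers, the type marker changes Rel and Rel+ by -1.3 and -1.1 on ACE05 and by +0.5 and +0.3 on SciERC, while Ent is unchanged on both datasets.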


If you use our code in your research, please cite our work:

  author    = {Deming Ye and Yankai Lin and Maosong Sun},
  title     = {Pack Together: Entity and Relation Extraction with Levitated Marker},
  journal   = {arXiv Preprint},
  • 512 and 1024?

    As far as I know, BERT limits position embeddings to 512. However, when I looked at the code, I found that the position ids, input ids, etc. have size 1024. I am quite confused about this. Could you explain the difference between them?

    opened by Jay0412 11
  • Question about the Quick Start

    Hello, I was curious what "--max_mention_ori_length 8" in the Quick Start section means. If I run on a different dataset, should I change it based on my data size? Thanks.

    opened by Zephyr1022 10

    In BertForACEBothOneDropoutSub, why doesn't the NER classifier concatenate m1_states, while BertForSpanMarkerNER concatenates them to make a feature vector? Could you explain e1, e2, and m1 in more detail? From the code, it looks like NER and RE can be trained together with certain options; is that possible? If so, which exact options do I need? Also, is the subject-oriented packing strategy used only in evaluation?

    opened by Jay0412 8
  • Trouble running "Quick Start"-scripts

    Hi! Firstly, thanks for publishing your research and models! :)

    I have trouble evaluating the NER model with the given command CUDA_VISIBLE_DEVICES=0 python3 --model_type bertspanmarker ... The output is a json-file with only one line: {"dev_best_f1": 0}

    The last 3 lines of the log-output are:

    02/04/2022 17:37:16 - INFO - __main__ -   Training/evaluation parameters Namespace(adam_epsilon=1e-08, alpha=1, cache_dir='', config_name='', data_dir='../scierc/raw_data', dev_file='dev.json', device=device(type='cuda'), do_eval=True, do_lower_case=True, do_test=False, do_train=False, eval_all_checkpoints=True, evaluate_during_training=True, fp16=True, fp16_opt_level='O1', gradient_accumulation_steps=1, group_axis=-1, group_edge=False, group_sort=False, learning_rate=2e-05, lminit=True, local_rank=-1, logging_steps=5, max_grad_norm=1.0, max_mention_ori_length=8, max_pair_length=256, max_seq_length=512, max_steps=-1, model_name_or_path='../bert_models/scibert_scivocab_uncased', model_type='bertspanmarker', n_gpu=1, no_cuda=False, no_test=False, norm_emb=False, num_train_epochs=50.0, onedropout=True, output_dir='../sciner_models/sciner-scibert', output_results=True, overwrite_cache=False, overwrite_output_dir=True, per_gpu_eval_batch_size=16, per_gpu_train_batch_size=8, save_steps=2000, save_total_limit=1, seed=42, server_ip='', server_port='', shuffle=False, test_file='test.json', tokenizer_name='', train_file='train.json', use_full_layer=-1, warmup_steps=-1, weight_decay=0.0)
    02/04/2022 17:37:16 - INFO - __main__ -   Evaluate on test set
    02/04/2022 17:37:16 - INFO - __main__ -   Evaluate the following checkpoints: []

    As you can see in the first line, I changed the original command in the following way:

    1. --model_name_or_path ../bert_models/scibert_scivocab_uncased I couldn't find a folder "scibert-uncased", so I downloaded the 4th model from huggingface as described in the "Training Script"-section (AllenAI) - is this maybe the wrong model?
    2. --data_dir ../scierc/raw_data I downloaded the SciERC raw_data from their website to execute the evaluation on - is this the wrong dataset?
    opened by Clemens123 8
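  • Several questions about the code

    1. What do [30002][30003][3][4] in the following code mean, and what are they for?

    2. Why does the following code add a named-entity entry (10000, 10000, 'NIL')? Also, when entities are paired up into candidate relation pairs, sub can be (10000, 10000, 'NIL') but obj cannot. Why is that?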

    opened by lairunlin 7
  • How to prepare dataset for training the model?

    Hi, thanks for sharing this awesome work. I have a few questions; please help me understand:

    1. I have a set of text paragraphs and want to extract entities and the relations between them. How should I prepare my dataset for the NER and relation extraction models? What format should I follow?
    2. If you could recommend any tool or method to prepare or annotate the data in the format the model expects, it would be a great help.


    opened by karndeepsingh 7
  • A question about ``



    opened by lairunlin 7
  • f1_with_ner2




    opened by WangSheng21s 6
  • f1_with_ner



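    Why can f1_with_ner reach 1.0 on the dev set when running the relation extraction task? Is it using the corresponding gold NER labels? If so, could you point out where the corresponding code is in run_re.py? It looks to me like the model's predicted results are used, but in that case the score should not be able to reach 1.0.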


    opened by WangSheng21s 6
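  • CoNLL03 dataset processing


    Hello, may I ask why the CoNLL03 dataset is converted to the I-label form? In that case, would the dataset's label map be {'O': 0, 'I-label': num}, with no 'B-label' at all? Yet the label_map defined in the code does include B-label. Also, in classification, the model gives target-label=9. So why replace B-label with I-label in the dataset?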


    opened by Hou-jing 6
  • Error when training ner_PLMarker with albert-xxlarge-v1 and apex

    Hello, when we train the ner_PLMarker model on ACE05 data, training works normally with bert-base-uncased plus the fp16 flag, or with albert-xxlarge-v1 without the fp16 flag. With albert-xxlarge-v1 plus fp16, however, it fails. The error occurs in AlbertAttention at mixed_query_layer = self.query(input_ids), where amp cached_cast raises IndexError: tuple index out of range. Have you run into this problem?

    opened by yanzhh 6
