
PL-Marker

Source code for Pack Together: Entity and Relation Extraction with Levitated Marker.

Quick links

  • Overview
  • Setup
  • Pre-trained Models
  • Training Script
  • Quick Start
  • TypeMarker
  • Citation

Overview

In this work, we present a novel span representation approach, named Packed Levitated Markers (PL-Marker), which considers the dependencies between spans (and span pairs) by strategically packing markers in the encoder. Our approach is evaluated on two typical span (pair) representation tasks:

  1. Named Entity Recognition (NER): we adopt a group packing strategy that enables the model to process a large number of spans together and consider their dependencies with limited resources.

  2. Relation Extraction (RE): we adopt a subject-oriented packing strategy that packs each subject and all of its objects into one instance to model the dependencies between same-subject span pairs.

Please find more details of this work in our paper.
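
To make the packing idea concrete, here is a minimal, self-contained sketch of how levitated markers can be packed into a single encoder pass. It is an illustration under assumed conventions (the helper name, marker ids, and mask layout are ours, not this repository's code): each candidate span contributes one marker pair appended after the text, the markers reuse the position ids of the span's boundary tokens, and the attention mask keeps the packed markers invisible to the text tokens.

# Illustrative sketch only; pack_levitated_markers is a hypothetical helper,
# not a function from run_acener.py or run_re.py.
def pack_levitated_markers(text_ids, spans, start_marker_id, end_marker_id):
    """text_ids: token ids of the sentence; spans: candidate (start, end) pairs."""
    input_ids = list(text_ids)
    position_ids = list(range(len(text_ids)))
    n_text = len(text_ids)

    for start, end in spans:
        # One marker pair per candidate span, packed after the text.
        input_ids += [start_marker_id, end_marker_id]
        # The pair shares the position ids of the span's boundary tokens,
        # so it "levitates" over the span without modifying the text.
        position_ids += [start, end]

    n_total = len(input_ids)
    # Directional attention: text tokens see only text; every marker sees the
    # text plus its own partner, so the packed spans do not interfere.
    attention_mask = [[1 if j < n_text else 0 for j in range(n_total)]
                      for _ in range(n_total)]
    for k in range(len(spans)):
        a, b = n_text + 2 * k, n_text + 2 * k + 1
        attention_mask[a][a] = attention_mask[a][b] = 1
        attention_mask[b][a] = attention_mask[b][b] = 1
    return input_ids, position_ids, attention_mask

For RE, the subject-oriented variant uses the same mechanism but packs one subject span together with all of its candidate objects in a single instance.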

Setup

Install Dependencies

The code is based on Hugging Face's transformers.

Install dependencies and apex:

pip3 install -r requirement.txt
pip3 install --editable transformers

Download and preprocess the datasets

Our experiments are based on six datasets: CoNLL03, OntoNotes 5.0, Few-NERD, ACE04, ACE05, and SciERC. Please find the links and pre-processing instructions below:

  • CoNLL03: We use the English portion of CoNLL03.
  • OntoNotes: We use preprocess_ontonotes.py to preprocess OntoNotes 5.0.
  • Few-NERD: The dataset can be downloaded from their website.
  • ACE04/ACE05: We use the preprocessing code from the DyGIE repo. Please follow its instructions to preprocess the ACE04 and ACE05 datasets.
  • SciERC: The preprocessed SciERC dataset can be downloaded from their project website (a sketch of the processed format follows this list).
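
For reference, the files produced by the DyGIE-style preprocessing are JSON-lines documents (one JSON object per line). The snippet below, written as a Python literal, is only a hedged sketch of that shape; the field names, the document-level token indexing, and the label strings are assumptions to verify against your processed files:

# Hedged sketch of one line of a DyGIE-style processed file (e.g. train.json).
# Field names, index conventions, and labels are illustrative assumptions.
example_doc = {
    "doc_key": "some_document_id",
    "sentences": [
        ["PL-Marker", "packs", "levitated", "markers", "."],
        ["It", "is", "evaluated", "on", "SciERC", "."],
    ],
    # One list per sentence; each entity is [start_token, end_token, type],
    # with token offsets counted over the whole document.
    "ner": [
        [[0, 0, "Method"]],
        [[5, 5, "Generic"], [9, 9, "Material"]],
    ],
    # One list per sentence; each relation is
    # [subj_start, subj_end, obj_start, obj_end, type].
    "relations": [
        [],
        [[5, 5, 9, 9, "USED-FOR"]],
    ],
}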

Pre-trained Models

We release our pre-trained NER models and RE models for ACE05 and SciERC datasets on Google Drive/Tsinghua Cloud.

Note: the performance of the pre-trained models may differ slightly from the numbers reported in the paper, since the paper reports averages over multiple runs.

Training Script

Train NER Models:

bash scripts/run_train_ner_PLMarker.sh
bash scripts/run_train_ner_BIO.sh
bash scripts/run_train_ner_TokenCat.sh

Train RE Models:

bash run_train_re.sh

Quick Start

The following commands can be used to run our pre-trained models on SciERC.

Evaluate the NER model:

CUDA_VISIBLE_DEVICES=0  python3  run_acener.py  --model_type bertspanmarker  \
    --model_name_or_path  ../bert_models/scibert-uncased  --do_lower_case  \
    --data_dir scierc  \
    --learning_rate 2e-5  --num_train_epochs 50  --per_gpu_train_batch_size  8  --per_gpu_eval_batch_size 16  --gradient_accumulation_steps 1  \
    --max_seq_length 512  --save_steps 2000  --max_pair_length 256  --max_mention_ori_length 8    \
    --do_eval  --evaluate_during_training   --eval_all_checkpoints  \
    --fp16  --seed 42  --onedropout  --lminit  \
    --train_file train.json --dev_file dev.json --test_file test.json  \
    --output_dir sciner_models/sciner-scibert  --overwrite_output_dir  --output_results
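
Since the NER command is run with --output_results, it writes its entity predictions (the ent_pred_test.json consumed by the RE command below) into the output directory. The following is a small, hedged way to peek at that file; it assumes the file uses the same one-JSON-object-per-line layout as the dataset files, which should be verified:

import json

# Assumption: ent_pred_test.json is JSON-lines, like the dataset files.
with open("sciner_models/sciner-scibert/ent_pred_test.json") as f:
    first_doc = json.loads(next(f))
print(sorted(first_doc.keys()))  # inspect which fields the NER run produced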

Evaluate the RE model (it takes the NER model's predicted entities, ent_pred_test.json, as its test file):

CUDA_VISIBLE_DEVICES=0  python3  run_re.py  --model_type bertsub  \
    --model_name_or_path  ../bert_models/scibert-uncased  --do_lower_case  \
    --data_dir scierc  \
    --learning_rate 2e-5  --num_train_epochs 10  --per_gpu_train_batch_size  8  --per_gpu_eval_batch_size 16  --gradient_accumulation_steps 1  \
    --max_seq_length 256  --max_pair_length 16  --save_steps 2500  \
    --do_eval  --evaluate_during_training   --eval_all_checkpoints  --eval_logsoftmax  \
    --fp16  --lminit   \
    --test_file sciner_models/sciner-scibert/ent_pred_test.json  \
    --use_ner_results \
    --output_dir scire_models/scire-scibert

Here, --use_ner_results denotes using the original entity type predicted by NER models.

TypeMarker

If we use the flag --use_typemarker for the RE models, the results are:

Model                             Ent    Rel    Rel+
ACE05-UnTypeMarker (in paper)     89.7   68.8   66.3
ACE05-TypeMarker                  89.7   67.5   65.2
SciERC-UnTypeMarker (in paper)    69.9   52.0   40.6
SciERC-TypeMarker                 69.9   52.5   40.9

Since TypeMarker increases performance on SciERC but decreases it on ACE05, we did not use it in the paper.
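
As a hedged illustration of what the flag changes (the marker strings below are invented for the example, not taken from the repository's vocabulary): without --use_typemarker, every subject/object pair is marked with the same generic marker tokens, while with the flag the markers carry the entity type predicted by the NER model.

# Illustrative only: the marker token strings are assumptions, not PL-Marker's vocabulary.
def subject_object_markers(subj_type, obj_type, use_typemarker=False):
    """Return (subject, object) marker token pairs for one candidate relation."""
    if not use_typemarker:
        # Untyped: every pair shares the same generic markers.
        return ("[SUB]", "[/SUB]"), ("[OBJ]", "[/OBJ]")
    # Typed: the markers encode the predicted entity types.
    return ((f"[SUB:{subj_type}]", f"[/SUB:{subj_type}]"),
            (f"[OBJ:{obj_type}]", f"[/OBJ:{obj_type}]"))

print(subject_object_markers("Method", "Task", use_typemarker=True))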

Citation

If you use our code in your research, please cite our work:

@article{ye2021plmarker,
  author  = {Deming Ye and Yankai Lin and Maosong Sun},
  title   = {Pack Together: Entity and Relation Extraction with Levitated Marker},
  journal = {arXiv preprint},
  year    = {2021}
}
Comments
  • 512 and 1024?

    As far as I know, BERT limits its position embeddings to 512. However, when I look at the code, I see that the position ids, input ids, etc. have size 1024. I am confused about this. Could you explain the difference?

    opened by Jay0412 11
  • Question about the Quick Start

    Hello, I was curious about the Quick Start section: what does "--max_mention_ori_length 8" mean? If I run on a different dataset, should I change it based on my data size? Thanks.

    opened by Zephyr1022 10
  • Modeling_bert.py

    In modeling_bert.py, why doesn't the NER classifier in BertForACEBothOneDropoutSub concatenate m1_states, while BertForSpanMarkerNER concatenates them to build the feature vector? Could you explain e1, e2, and m1 in more detail? From the code, it looks like train_re.sh can train NER and RE together with certain options; is that possible, and if so, which options exactly do I need? Also, is the subject-oriented packing strategy only used during evaluation?

    opened by Jay0412 8
  • Trouble running the "Quick Start" scripts

    Hi! Firstly, thanks for publishing your research and models! :)

    I have trouble evaluating the NER model with the given command CUDA_VISIBLE_DEVICES=0 python3 run_acener.py --model_type bertspanmarker ... The output is a JSON file with only one line: {"dev_best_f1": 0}

    The last three lines of the log output are:

    02/04/2022 17:37:16 - INFO - __main__ -   Training/evaluation parameters Namespace(adam_epsilon=1e-08, alpha=1, cache_dir='', config_name='', data_dir='../scierc/raw_data', dev_file='dev.json', device=device(type='cuda'), do_eval=True, do_lower_case=True, do_test=False, do_train=False, eval_all_checkpoints=True, evaluate_during_training=True, fp16=True, fp16_opt_level='O1', gradient_accumulation_steps=1, group_axis=-1, group_edge=False, group_sort=False, learning_rate=2e-05, lminit=True, local_rank=-1, logging_steps=5, max_grad_norm=1.0, max_mention_ori_length=8, max_pair_length=256, max_seq_length=512, max_steps=-1, model_name_or_path='../bert_models/scibert_scivocab_uncased', model_type='bertspanmarker', n_gpu=1, no_cuda=False, no_test=False, norm_emb=False, num_train_epochs=50.0, onedropout=True, output_dir='../sciner_models/sciner-scibert', output_results=True, overwrite_cache=False, overwrite_output_dir=True, per_gpu_eval_batch_size=16, per_gpu_train_batch_size=8, save_steps=2000, save_total_limit=1, seed=42, server_ip='', server_port='', shuffle=False, test_file='test.json', tokenizer_name='', train_file='train.json', use_full_layer=-1, warmup_steps=-1, weight_decay=0.0)
    02/04/2022 17:37:16 - INFO - __main__ -   Evaluate on test set
    02/04/2022 17:37:16 - INFO - __main__ -   Evaluate the following checkpoints: []
    

    As you can see in the first line, I changed the original command in the following way:

    1. --model_name_or_path ../bert_models/scibert_scivocab_uncased: I couldn't find a folder "scibert-uncased", so I downloaded the 4th model from Hugging Face (AllenAI), as described in the "Training Script" section. Is this maybe the wrong model?
    2. --data_dir ../scierc/raw_data: I downloaded the SciERC raw_data from their website to run the evaluation on. Is this the wrong dataset?
    opened by Clemens123 8
  • A few questions about the code

    Hi, I ran into a few questions while reading the code; could you help clarify them?

    1. What do [30002], [30003], [3], and [4] in the following code represent, and what are they for? https://github.com/thunlp/PL-Marker/blob/91b03f3ff58ad29fd7b9920a954a02d12756f05d/run_re.py#L373-L378

    2. Why does the following code add a (10000, 10000, 'NIL') named-entity entry? And when entities are paired up into candidate relation pairs, why can the subject be (10000, 10000, 'NIL') but the object cannot? https://github.com/thunlp/PL-Marker/blob/91b03f3ff58ad29fd7b9920a954a02d12756f05d/run_re.py#L283-L284

    opened by lairunlin 7
  • How to prepare a dataset for training the model?

    Hi, thanks for sharing this awesome work. I have a few doubts; please help me understand:

    1. I have a set of text paragraphs and want to extract entities and the relationships between the detected entities. How should I prepare my dataset for the NER and relation extraction models on these paragraphs? What format should I follow?
    2. If there is any tool you could recommend, or any way to prepare or annotate the data in the format the model expects, it would be a great help.

    Thanks.

    opened by karndeepsingh 7
  • Question about `run_ner.py`

    Lines 221 to 238 of run_ner.py are mainly for obtaining target_tokens. Could you explain the logic there, and why it is handled this way? What do half_context_length, left_context_length, and right_context_length mean? Many thanks. https://github.com/thunlp/PL-Marker/blob/b4863d47e2197b8d410e3d693d684707df9df2a1/run_ner.py#L221-L238

    opened by lairunlin 7
  • f1_with_ner2

    Hi, I modified a version of your code to implement a fully levitated-marker approach. The results show that f1 reaches the expected level, but f1_with_ner and ner_f1 are very poor, and ner_f1 keeps getting worse as training proceeds. I searched for a long time without finding the problem; I also seem to be training with the golden dev file. Why does the NER f1 keep dropping, while ner_f1 in your code stays at 1.0? It would be great if you could take a look at where my code goes wrong, or just point me to where the problem might be. A partial screenshot of training is below, and my modified code and run script are attached. Thanks.

    [training screenshot] run_train_re_approx.zip

    opened by WangSheng21s 6
  • f1_with_ner

    First of all, thank you for the excellent work. I have a small question: when running the relation extraction task, why can f1_with_ner on the dev set reach 1.0? Is it using the corresponding golden NER? If so, could you point out where this happens in run_re.py? It looks to me like the model's predicted results are used, but in that case it should not be possible to reach 1.0.

    Thanks

    opened by WangSheng21s 6
  • CoNLL03 dataset processing

    Hi, why is the CoNLL03 dataset processed into the I-label form? In that case, wouldn't the dataset's label map become {'O': 0, 'I-label': num}, with no 'B-label' left? But the label_map defined in the code does include B-labels, and in classification the model uses target-label = 9. So why are the B-labels in the dataset replaced with I-labels?

    opened by Hou-jing 6
  • Error when training ner_PLMarker with albert-xxlarge-v1 and apex

    Hi, when we train the ner_PLMarker model on the ACE05 data, it trains fine with bert-base-uncased + fp16, or with albert-xxlarge-v1 without fp16, but fails with albert-xxlarge-v1 + fp16. The error occurs at mixed_query_layer = self.query(input_ids) in AlbertAttention, where amp's cached_cast raises IndexError: tuple index out of range. Have you encountered this problem?

    opened by yanzhh 6