Source code for "UniRE: A Unified Label Space for Entity Relation Extraction.", ACL2021.

Wang Yijun

Last update: Nov 29, 2022

Overview

UniRE

Requirements

python: 3.7.6
pytorch: 1.8.1
transformers: 4.2.2
configargparse: 1.2.3
bidict: 0.20.0
fire: 0.3.1

Datasets

Scripts and instructions for processing the three datasets (ACE2004, ACE2005, SciERC) are provided in data/.

Training

ACE2004

python entity_relation_joint_decoder.py \
    --config_file config.yml \
    --save_dir ckpt/ace2004_bert \
    --data_dir data/ACE2004/fold1 \
    --fine_tune \
    --device 0

ACE2005

python entity_relation_joint_decoder.py \
    --config_file config.yml \
    --save_dir ckpt/ace2005_bert \
    --data_dir data/ACE2005 \
    --fine_tune \
    --device 0

SciERC

python entity_relation_joint_decoder.py \
    --config_file config.yml \
    --save_dir ckpt/scierc_scibert \
    --data_dir data/SciERC \
    --bert_model_name allenai/scibert_scivocab_uncased \
    --epochs 300 \
    --early_stop 50 \
    --fine_tune \
    --device 0

Note that a GPU with 32 GB of memory is required to run the default setting. If an out-of-memory (OOM) error occurs, we suggest reducing train_batch_size and increasing gradient_accumulation_steps (gradient_accumulation_steps controls gradient accumulation).
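
The trade-off between train_batch_size and gradient_accumulation_steps can be illustrated with a small pure-Python sketch. It is not taken from this repository: it assumes a mean-reduced loss and equal-sized micro-batches, and the helper names (grad_mse, accumulated_grad) and the toy data are hypothetical. Under those assumptions, averaging the micro-batch gradients reproduces the full-batch gradient, which is why a smaller batch with more accumulation steps can stand in for a larger batch.

```python
# Pure-Python illustration (no PyTorch) of gradient accumulation.
# Assumption: the loss is a mean over the batch and micro-batches have equal size.

def grad_mse(w, batch):
    """Gradient of mean((w * x - y) ** 2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, micro_batch_size):
    """Accumulate the gradients of equal-sized micro-batches."""
    if len(batch) % micro_batch_size != 0:
        raise ValueError("batch size must be a multiple of micro_batch_size")
    steps = len(batch) // micro_batch_size
    total = 0.0
    for i in range(steps):
        micro = batch[i * micro_batch_size:(i + 1) * micro_batch_size]
        # Dividing by the number of steps mirrors the common practice of
        # scaling each micro-batch loss by 1 / gradient_accumulation_steps.
        total += grad_mse(w, micro) / steps
    return total


# Hypothetical toy data: (x, y) pairs for a one-parameter linear model.
data = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 4.0)]
full = grad_mse(0.5, data)                                # one batch of 4
accum = accumulated_grad(0.5, data, micro_batch_size=2)   # 2 micro-batches of 2
print(full, accum)
```

In this sketch, halving the batch size while doubling the number of accumulation steps leaves the effective batch size (and the resulting gradient) unchanged, at the cost of more forward and backward passes per update.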

Inference

We provide an example for ACE2005. Note that save_dir should contain the trained best_model.

python entity_relation_joint_decoder.py \
    --config_file config.yml \
    --save_dir ckpt/ace2005_bert \
    --data_dir data/ACE2005 \
    --device 0 \
    --log_file test.log \
    --test

Cite

If you find our code useful, please cite:

@inproceedings{wang2021unire,
    title = "{UniRE}: A Unified Label Space for Entity Relation Extraction",
    author = "Wang, Yijun and Sun, Changzhi and Wu, Yuanbin and Zhou, Hao and Li, Lei and Yan, Junchi",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics",
    year = "2021",
    publisher = "Association for Computational Linguistics",
}

Comments

How to reproduce the result for SciERC in the paper?

Thanks for your amazing work! I preprocessed the data and trained the model following the guide in Readme.md. I see that the code sets the seed for random, torch, and np.

I trained the model on the SciERC dataset with an RTX 3090 GPU, without modifying the code (same as Readme.md), and training stopped early at epoch 229. The paper reports an entity F1 of 68.4, but I get 66.03.

How can I reproduce the result in the paper? Sincerely looking forward to your reply!

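opened by zplovekq 3

How are overlapping entities handled?

Hello authors, thank you for open-sourcing the code; it is very clean and easy to read. I noticed that the paper also mentions that UniRE cannot handle overlapping entities. While reading the code, I found that sentences containing overlapping entities seem to be discarded outright. I am basing this on https://github.com/Receiling/UniRE/blob/4f59bea2997c8edb5d4659b1cfde9789d5965127/inputs/dataset_readers/ace_reader_for_joint_decoding.py#L174, but I am not sure whether my understanding is correct, and I hope you can clarify. In addition, judging from https://github.com/Receiling/UniRE/blob/4f59bea2997c8edb5d4659b1cfde9789d5965127/entity_relation_joint_decoder.py#L281, all datasets seem to be treated as train, so all sentences exceeding the length limit are also discarded. My questions are:
(1) Are test sentences that contain overlapping entities discarded?
(2) Are test samples whose length exceeds the threshold discarded?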
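
For readers unsure what counts as an overlapping entity in this question, here is a minimal sketch of an overlap check. It is not the repository's reader code: the function name find_overlapping_spans is hypothetical, and it assumes entities are given as half-open (start, end) token spans, which may differ from the format the readers actually use.

```python
def find_overlapping_spans(spans):
    """Return pairs of half-open (start, end) spans that share at least one token."""
    ordered = sorted(spans)
    overlaps = []
    for i, (start_a, end_a) in enumerate(ordered):
        for start_b, end_b in ordered[i + 1:]:
            if start_b >= end_a:
                # Spans are sorted by start, so no later span can overlap span a.
                break
            overlaps.append(((start_a, end_a), (start_b, end_b)))
    return overlaps


# Hypothetical example: the first two entity spans share token 1.
entity_spans = [(0, 2), (1, 3), (5, 6)]
overlapping = find_overlapping_spans(entity_spans)
print(overlapping)
```

A sentence for which this returns a non-empty list would be one with overlapping (including nested) entities in the sense of the question above.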

opened by yhcc 2

RuntimeError: Error(s) in loading state_dict for EntRelJointDecoder:

When I test on the SciERC dataset, the following error is reported:

Traceback (most recent call last):
  File "entity_relation_joint_decoder.py", line 330, in <module>
    main()
  File "entity_relation_joint_decoder.py", line 316, in main
    model.load_state_dict(state_dict,False)
  File "/home/jsj201-9/anaconda3/envs/unire/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for EntRelJointDecoder:
        size mismatch for embedding_model.bert_encoder.bert_model.embeddings.word_embeddings.weight: copying a param with shape torch.Size([31090, 768]) from checkpoint, the shape in current model is torch.Size([30522, 768]).
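
The two shapes in this message differ only in the number of embedding rows (31090 in the checkpoint, 30522 in the current model), which suggests the checkpoint and the model were built with different vocabularies. One plausible cause, not confirmed in this thread, is running the test command without the --bert_model_name allenai/scibert_scivocab_uncased option used for SciERC training. Note that passing strict=False to load_state_dict ignores missing or unexpected keys but still raises on size mismatches. The sketch below shows one way to list such mismatches before loading; find_shape_mismatches is a hypothetical helper working on plain name-to-shape mappings (with PyTorch these could be built as {k: tuple(v.shape) for k, v in state_dict.items()}).

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Map each shared parameter name to (checkpoint_shape, model_shape) when they differ."""
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Shapes copied from the error message above.
param = "embedding_model.bert_encoder.bert_model.embeddings.word_embeddings.weight"
checkpoint_shapes = {param: (31090, 768)}
model_shapes = {param: (30522, 768)}

mismatches = find_shape_mismatches(checkpoint_shapes, model_shapes)
for name, (ckpt_shape, model_shape) in mismatches.items():
    print(name, "checkpoint:", ckpt_shape, "model:", model_shape)
```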

opened by lww1 2

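Hello, I would like to ask a question about the datasets

I processed the ACE2004 dataset using the procedure and code provided at:

https://github.com/LorrinWWW/two-are-better-than-one/tree/master/datasets

(1) Running python ace2json.py generated the following files:

(2) Running python unify.py generated the following files in the unified directory:

For your script command ./ace2004.sh ace2004_folder, and for split.py, which path should I use as the ace2004_folder argument? Thank you!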

opened by yixuan004 1

KeyError: 'articleId'

When I train on SciERC, the following error occurs. Has anyone else run into this?

[2021-12-30 12:30:40,742 - entity_relation_joint_decoder.py - line:274 - INFO]: Load bert tokenizer successfully.
Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\UniRE-master\entity_relation_joint_decoder.py", line 330, in <module>
    main()
  File "C:\Users\Administrator\Desktop\UniRE-master\entity_relation_joint_decoder.py", line 295, in main
    ace_dataset.build_dataset(vocab=vocab,
  File "C:\Users\Administrator\Desktop\UniRE-master\inputs\datasets\dataset.py", line 78, in build_dataset
    instance_settting['instance'].count_vocab_items(counter,
  File "C:\Users\Administrator\Desktop\UniRE-master\inputs\instance.py", line 61, in count_vocab_items
    field.count_vocab_items(counter, sentences)
  File "C:\Users\Administrator\Desktop\UniRE-master\inputs\fields\token_field.py", line 39, in count_vocab_items
    for sentence in sentences:
  File "C:\Users\Administrator\Desktop\UniRE-master\inputs\dataset_readers\ace_reader_for_joint_decoding.py", line 37, in __iter__
    state, results = self.get_tokens(line)
  File "C:\Users\Administrator\Desktop\UniRE-master\inputs\dataset_readers\ace_reader_for_joint_decoding.py", line 90, in get_tokens
    logger.error("article id: {} sentence id: {} doesn't contain 'sentText'.".format(line['articleId'], line['sentId']))
KeyError: 'articleId'
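
The last frame of this traceback shows that the KeyError is raised while formatting a log message about a record that lacks 'sentText': the message indexes line['articleId'], which is also absent, so the original problem is masked. A minimal sketch of a more tolerant message builder follows. It is an illustration, not the repository's code: the function name missing_sent_text_message and the '<missing>' placeholder are hypothetical, and it only makes the diagnostic readable; the record would still be missing 'sentText'.

```python
def missing_sent_text_message(line):
    """Build the diagnostic without assuming 'articleId' or 'sentId' exist."""
    article_id = line.get('articleId', '<missing>')
    sent_id = line.get('sentId', '<missing>')
    return "article id: {} sentence id: {} doesn't contain 'sentText'.".format(
        article_id, sent_id)


# Hypothetical record that has a sentence id but no 'articleId' or 'sentText'.
message = missing_sent_text_message({'sentId': 7})
print(message)
```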

opened by yang-liguang 1

