Re-TACRED: Addressing Shortcomings of the TACRED Dataset

Overview

Re-TACRED

Re-TACRED: Addressing Shortcomings of the TACRED Dataset
George Stoica, Emmanouil Antonios Platanios, and Barnabás Póczos
In Proceedings of the Thirty-fifth AAAI Conference on Artificial Intelligence 2021

Primary Contact: George Stoica. As of Jan 2021, I am no longer at CMU, and the cs.cmu.edu email may no longer work. Please contact me instead at: [email protected].

Changelog

  • 1.0 - Initial dataset release: Data consisted of 105,206 total instances spread across 40 relations.
  • 1.1 - Updated dataset release: After extensive discussion, we have elected to prune Re-TACRED by ~ 14K instances. The new dataset has 91,467 instances, spread across 40 relations. Pruned data consisted of a mixture of messily segmented entities (and corresponding types), or sentences whose relations were ambigious. While this version is smaller, it is cleaner, and better defined.

This repository contains all relevant resources for using Re-TACRED, a new relation extraction dataset.

For details on this work please check out our:

Below we describe the contents of the four repository directories by name.

Re-TACRED

This directory contains version 1.1 of our revised TACRED dataset patches for each split. Due to licensing restrictions, we cannot provide the complete dataset. However, following Alt, Gabryszak, and Hennig (2020), our patch consists of json files mapping TACRED instances by their id to our revised labels.

The original TACRED dataset is available for download from the LDC here. It is free for members, or $25 for non-members.

Applying the patch is simple and only requires replacing each TACRED instance (where applicable) with our revised relation. For convenience, we provide a script for this named apply_patch.py in the Re-TACRED directory. In the script, you only need to replace

tacred_dir = None
save_dir = None

With the path to your TACRED dataset save directory, and the directory where you wish to save the patched data to respectively.

PA-LSTM, C-GCN & SpanBERT

We base our experiments off of the open-source model repositories of:

However, it is not possible to simply pass Re-TACRED to each model repository because each is hardcoded for TACRED. Thus, we must modify certain files to make each model Re-TACRED compatible. To make it as easy as possible, we provide all our altered files in each named model directory (e.g., the provided PA-LSTM directory). All that needs to be done is to replace the corresponding file in our provided directory with the corresponding file in the original model repository. For instance, you may replace SpanBERT's "run_tacred.py" file with our "run_tacred.py" file. Running experiments is equivalent to how it is performed in the original model repositories.

Note that our files also contain certain "quality of life" changes that make running each model more convenient for us. Examples include adding and tracking the test split while training (as opposed to only the dev set).

You might also like...
Code, Models and Datasets for OpenViDial Dataset
Code, Models and Datasets for OpenViDial Dataset

OpenViDial This repo contains downloading instructions for the OpenViDial dataset in 《OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Vis

Transfer SemanticKITTI labeles into other dataset/sensor formats.
Transfer SemanticKITTI labeles into other dataset/sensor formats.

LiDAR-Transfer Transfer SemanticKITTI labeles into other dataset/sensor formats. Content Convert datasets (NUSCENES, FORD, NCLT) to KITTI format Minim

The MATH Dataset
The MATH Dataset

Measuring Mathematical Problem Solving With the MATH Dataset This is the repository for Measuring Mathematical Problem Solving With the MATH Dataset b

Object Depth via Motion and Detection Dataset
Object Depth via Motion and Detection Dataset

ODMD Dataset ODMD is the first dataset for learning Object Depth via Motion and Detection. ODMD training data are configurable and extensible, with ea

[CIKM 2019] Code and dataset for "Fi-GNN: Modeling Feature Interactions via Graph Neural Networks for CTR Prediction"

FiGNN for CTR prediction The code and data for our paper in CIKM2019: Fi-GNN: Modeling Feature Interactions via Graph Neural Networks for CTR Predicti

A python script to lookup Passport Index Dataset

visa-cli A python script to lookup Passport Index Dataset Installation pip install visa-cli Usage usage: visa-cli [-h] [-d DESTINATION_COUNTRY] [-f]

Code and model benchmarks for "SEVIR : A Storm Event Imagery Dataset for Deep Learning Applications in Radar and Satellite Meteorology"

NeurIPS 2020 SEVIR Code for paper: SEVIR : A Storm Event Imagery Dataset for Deep Learning Applications in Radar and Satellite Meteorology Requirement

Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics
Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics

Dataset Cartography Code for the paper Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics at EMNLP 2020. This repository cont

dataset for ECCV 2020 "Motion Capture from Internet Videos"

Motion Capture from Internet Videos Motion Capture from Internet Videos Junting Dong*, Qing Shuai*, Yuanqing Zhang, Xian Liu, Xiaowei Zhou, Hujun Bao

Comments
  • Label mapping Error

    Label mapping Error

    Hello,

    I am trying to run PALSTM model following the instruction, but while loading the dataset, I ran into a KeyError: 'org:city_of_branch.

    Some implementation details:

    1. I have created the new dataset by applying the patch.
    2. I am using the new dataset.
    3. I have all the necessary files in place from TACRED's repository.

    I believe there is some inconsistency in utils/constant LABEL_TO_ID dictionary. Could you please help me resolve the issue.

    opened by akshayparakh25 1
Owner
George Stoica
PhD ML @ Georgia Tech
George Stoica
This is the dataset and code release of the OpenRooms Dataset.

This is the dataset and code release of the OpenRooms Dataset.

Visual Intelligence Lab of UCSD 95 Jan 8, 2023
A large dataset of 100k Google Satellite and matching Map images, resembling pix2pix's Google Maps dataset.

Larger Google Sat2Map dataset This dataset extends the aerial ⟷ Maps dataset used in pix2pix (Isola et al., CVPR17). The provide script download_sat2m

null 34 Dec 28, 2022
Dataset used in "PlantDoc: A Dataset for Visual Plant Disease Detection" accepted in CODS-COMAD 2020

PlantDoc: A Dataset for Visual Plant Disease Detection This repository contains the Cropped-PlantDoc dataset used for benchmarking classification mode

Pratik Kayal 109 Dec 29, 2022
EMNLP 2021: Single-dataset Experts for Multi-dataset Question-Answering

MADE (Multi-Adapter Dataset Experts) This repository contains the implementation of MADE (Multi-adapter dataset experts), which is described in the pa

Princeton Natural Language Processing 68 Jul 18, 2022
EMNLP 2021: Single-dataset Experts for Multi-dataset Question-Answering

MADE (Multi-Adapter Dataset Experts) This repository contains the implementation of MADE (Multi-adapter dataset experts), which is described in the pa

Princeton Natural Language Processing 39 Oct 5, 2021
LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation (NeurIPS2021 Benchmark and Dataset Track)

LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation by Junjue Wang, Zhuo Zheng, Ailong Ma, Xiaoyan Lu, and Yanfei Zh

Kingdrone 174 Dec 22, 2022
This is the official source code for SLATE. We provide the code for the model, the training code, and a dataset loader for the 3D Shapes dataset. This code is implemented in Pytorch.

SLATE This is the official source code for SLATE. We provide the code for the model, the training code and a dataset loader for the 3D Shapes dataset.

Gautam Singh 66 Dec 26, 2022
The Habitat-Matterport 3D Research Dataset - the largest-ever dataset of 3D indoor spaces.

Habitat-Matterport 3D Dataset (HM3D) The Habitat-Matterport 3D Research Dataset is the largest-ever dataset of 3D indoor spaces. It consists of 1,000

Meta Research 62 Dec 27, 2022
Repository to run object detection on a model trained on an autonomous driving dataset.

Autonomous Driving Object Detection on the Raspberry Pi 4 Description of Repository This repository contains code and instructions to configure the ne

Ethan 51 Nov 17, 2022