PyTorch implementation of the end-to-end coreference resolution model with different higher-order inference methods.

Overview

End-to-End Coreference Resolution with Different Higher-Order Inference Methods

This repository contains the implementation of the paper: Revealing the Myth of Higher-Order Inference in Coreference Resolution.

Architecture

The basic end-to-end coreference model is a PyTorch re-implementation of the original TensorFlow model, following similar preprocessing (see this repository).

Four higher-order inference (HOI) methods are experimented with: Attended Antecedent, Entity Equalization, Span Clustering, and Cluster Merging. All are included here except for Entity Equalization, which is evaluated in the equivalent TensorFlow environment (see this separate repository).


Basic Setup

Set up environment and data for training and evaluation:

  • Install Python3 dependencies: pip install -r requirements.txt
  • Create a directory for data that will contain all data files, models and log files; set data_dir = /path/to/data/dir in experiments.conf
  • Prepare the dataset (requires the OntoNotes 5.0 corpus): ./setup_data.sh /path/to/ontonotes /path/to/data/dir

For SpanBERT, download the pretrained weights from this repository, and rename the unpacked directory to /path/to/data/dir/spanbert_base or /path/to/data/dir/spanbert_large accordingly.

Evaluation

Provided trained models:

The name of each directory corresponds to a configuration in experiments.conf. Each directory contains two trained models.

If you want to use the official evaluator, download and unzip the CoNLL-2012 scorer under this directory.

Evaluate a model on the dev/test set:

  • Download the corresponding model directory and unzip it under data_dir
  • python evaluate.py [config] [model_id] [gpu_id]
    • e.g. Attended Antecedent: python evaluate.py train_spanbert_large_ml0_d2 May08_12-38-29_58000 0

Prediction

Prediction on custom input: see python predict.py -h

  • Interactive user input: python predict.py --config_name=[config] --model_identifier=[model_id] --gpu_id=[gpu_id]
    • E.g. python predict.py --config_name=train_spanbert_large_ml0_d1 --model_identifier=May10_03-28-49_54000 --gpu_id=0
  • Input from file (jsonlines file of this format): python predict.py --config_name=[config] --model_identifier=[model_id] --gpu_id=[gpu_id] --jsonlines_path=[input_path] --output_path=[output_path]
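
For reading the output file written via --output_path, a minimal sketch (assuming one JSON object per line; the field names doc_key and predicted_clusters are illustrative assumptions, not confirmed from the repo):

    import json

    # Hypothetical output path, matching whatever was passed to --output_path.
    with open("output.english.jsonlines") as f:
        for line in f:
            doc = json.loads(line)
            # Assumed field names; check the actual output for the exact keys.
            print(doc.get("doc_key"), doc.get("predicted_clusters"))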

Training

python run.py [config] [gpu_id]

  • [config] can be any configuration in experiments.conf
  • Log file will be saved at your_data_dir/[config]/log_XXX.txt
  • Models will be saved at your_data_dir/[config]/model_XXX.bin
  • Tensorboard is available at your_data_dir/tensorboard
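
To sanity-check a saved checkpoint, a minimal sketch assuming model_XXX.bin is a plain state dict written with torch.save (the path below is hypothetical, following the naming convention above):

    import torch

    # Hypothetical checkpoint path following the model_XXX.bin convention above.
    path = "your_data_dir/train_spanbert_large_ml0_d2/model_May08_12-38-29_58000.bin"
    # Assumption: the file holds a plain state_dict saved with torch.save.
    state_dict = torch.load(path, map_location="cpu")
    print(len(state_dict), "parameter tensors")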

Configurations

Some important configurations in experiments.conf:

  • data_dir: the full path to the directory containing the dataset, models, and log files
  • coref_depth and higher_order: control the higher-order inference module
  • bert_pretrained_name_or_path: the name or path of the pretrained BERT model (HuggingFace BERT models)
  • max_training_sentences: the maximum number of segments to use when a document is too long; for BERT-Large and SpanBERT-Large, set to 3 for a 32GB GPU or 2 for a 24GB GPU
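
A minimal sketch for inspecting these settings programmatically, assuming experiments.conf is a HOCON file readable with pyhocon (as in the e2e-coref lineage); the configuration name is taken from the evaluation example above:

    from pyhocon import ConfigFactory

    conf = ConfigFactory.parse_file("experiments.conf")
    cfg = conf["train_spanbert_large_ml0_d2"]  # any configuration name in experiments.conf
    print(cfg["data_dir"], cfg["coref_depth"], cfg["higher_order"])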

Citation

@inproceedings{xu-choi-2020-revealing,
    title = "Revealing the Myth of Higher-Order Inference in Coreference Resolution",
    author = "Xu, Liyan  and  Choi, Jinho D.",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.686",
    pages = "8527--8533"
}
Comments
  • Training issue: with bert_base

    Hi @lxucs,

    I want to train a model with bert_base and no HOI, like the spanbert_large_ml0_d1 model:

    python run.py bert_base 0

    Got this issue:


    Traceback (most recent call last):
      File "run.py", line 289, in <module>
        model = runner.initialize_model()
      File "run.py", line 51, in initialize_model
        model = CorefModel(self.config, self.device)
      File "/VL/space/sushantakp/research_work/coref-hoi/model.py", line 33, in __init__
        self.bert = BertModel.from_pretrained(config['bert_pretrained_name_or_path'])
      File "/VL/space/sushantakp/.conda/envs/skp_env376/lib/python3.7/site-packages/transformers/modeling_utils.py", line 935, in from_pretrained
        raise EnvironmentError(msg)
    OSError: Can't load weights for 'bert-base-cased'. Make sure that:

    • 'bert-base-cased' is a correct model identifier listed on 'https://huggingface.co/models'
    • or 'bert-base-cased' is the correct path to a directory containing a file named one of pytorch_model.bin, tf_model.h5, model.ckpt.

    Do I need to change any parameter in experiments.conf?

    • To handle the above issue

    • To train with HOI / no HOI
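
    A possible workaround (a sketch, assuming the machine can reach the Hugging Face hub): pre-download the checkpoint with transformers, or save it locally and point bert_pretrained_name_or_path at a local directory (the path below is hypothetical):

        from transformers import BertModel, BertTokenizer

        # Pre-download (or verify access to) the checkpoint named in
        # bert_pretrained_name_or_path.
        model = BertModel.from_pretrained("bert-base-cased")
        tokenizer = BertTokenizer.from_pretrained("bert-base-cased")

        # Optionally save both locally and point bert_pretrained_name_or_path
        # (and bert_tokenizer_name) at this hypothetical directory instead.
        model.save_pretrained("/path/to/data/dir/bert-base-cased")
        tokenizer.save_pretrained("/path/to/data/dir/bert-base-cased")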

    opened by sushantakpani 11
  • CUDA out of memory error

    Hi,

    First, I want to thank you so much for your valuable efforts, and this perfectly comprehensible and clean code.

    I do not know whether I should ask this here, but I ran into a CUDA out of memory error in the evaluation phase (something like this: RuntimeError: CUDA out of memory. Tried to allocate 1.02 GiB (GPU 0; 7.93 GiB total capacity; 4.76 GiB already allocated; 948.81 MiB free; 6.23 GiB reserved in total by PyTorch)).

    First, I ran into this error in the training phase. I reduced the size of some parameters in the experiments.conf file that I thought would help reduce GPU usage, and they did, because I am now able to get through the training phase. However, the error appears in the evaluation phase no matter how much I decrease parameters like the span width, max_sentence_len, or the ffnn size. I wonder if you had the same problem, or whether you have any suggestions for me.

    I am currently using GeForce GTX 1080 with 8GB memory.

    Many thanks, Arad
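
    As a general PyTorch note (not specific to this repo, and it is not verified here whether the evaluation loop already does this), running inference under torch.no_grad() avoids keeping activations for backprop and usually lowers evaluation memory; a minimal self-contained sketch with stand-in objects:

        import torch
        import torch.nn as nn

        model = nn.Linear(512, 512)          # stand-in for the coreference model
        example_input = torch.randn(8, 512)  # stand-in for a tensorized document

        model.eval()
        with torch.no_grad():                # no activations kept for backprop
            output = model(example_input)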

    opened by AradAshrafi 6
  • Preprocess - Split into segments function

    Hi again Liyan,

    I had some brief questions about splitting documents into segments. I think the segments contain more than one sentence (based on the split_into_segments function in the preprocess.py file). Wouldn't it be better if each segment contained exactly one sentence? I could not see the intuition behind this. Is it better to have longer segments, or is it for more efficient use of resources? Or has it been tested in practice and shown to give better accuracy this way?

    Thanks, Arad
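
    For reference, a minimal sketch of the greedy packing idea discussed above (a simplification, not the repo's actual preprocess.py, which also handles special tokens and sentence/subtoken maps): whole sentences are appended to the current segment until the token budget would be exceeded, then a new segment starts.

        # Simplified illustration of packing whole sentences into segments.
        def split_into_segments(sentences, max_segment_len=384):
            segments, current = [], []
            for sent in sentences:  # each sent is a list of subtokens
                if current and len(current) + len(sent) > max_segment_len:
                    segments.append(current)
                    current = []
                current.extend(sent)
            if current:
                segments.append(current)
            return segments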

    opened by AradAshrafi 2
  • Data Set up issue in Basic Set up

    1. Install Python3 dependencies: pip install -r requirements.txt
    2. Create a directory for data that will contain all data files, models and log files; set data_dir = /path/to/data/dir in experiments.conf

    After steps 1 and 2, I tried step 3 of the basic setup:

    Prepare dataset (requiring OntoNotes 5.0 corpus): ./setup_data.sh /path/to/ontonotes /path/to/data/dir

    . .

    reference-coreference-scorers/v8.01/test/DataFiles/TC-N.key
    reference-coreference-scorers/v8.01/test/test.pl
    reference-coreference-scorers/v8.01/test/TestCases.README
    bash: conll-2012/v3/scripts/skeleton2conll.sh: No such file or directory

    However, a coref_hoi/data/dir/conll-2012/v3/scripts/skeleton2conll.sh file does exist. Do I need to change any other file before running setup_data.sh?

    opened by sushantakpani 2
  • which checkpoint of the trained weights should I use?

    Hi lxucs, there are two checkpoints of the trained weights; which one is the one used in your paper? Thanks.

    Below is an example:

    train_spanbert_large_ml0_cm_fn1000_max_dloss/model_May14_05-15-38_63000.bin
    train_spanbert_large_ml0_cm_fn1000_max_dloss/model_May22_23-31-16_66000.bin
    
    opened by world2vec 1
  • License

    The repo does not contain any license specification. It would be great if you could license it explicitly under a FOSS license so that further research can build upon this great code! Personally I'd suggest the MIT license, but an Apache or GPL variety could also be a great choice.

    Most of these licenses require attribution in source code distributions so you would have to be credited (as you should be :smiley:).

    opened by hatzel 1
  • Custom training data for coref-hoi

    Hi all, I was wondering if it is possible to use custom data that one prepares oneself for training this model. If so, how does one do this with coref-hoi? Will it convert a txt file to the right format, or does one have to convert it to a CoNLL file first? Can it be CoNLL-U? Thank you very much.

    opened by AlanQuille 0
  • Train on spanbert large, but get F1 1 point lower than presented in paper

    Hi,

    I used the SpanBERT-large model with the default parameters in the config file, and I get an average F1 of 78.27, lower than the 79.9 average F1 in the paper. The config is as follows:

    num_docs = 2802
    bert_learning_rate = 1e-05
    task_learning_rate = 0.0003
    max_segment_len = 512
    ffnn_size = 3000
    cluster_ffnn_size = 3000
    max_training_sentences = 3
    bert_tokenizer_name = bert-base-cased

    max_top_antecedents = 50
    max_training_sentences = 5
    top_span_ratio = 0.4
    max_num_extracted_spans = 3900
    max_num_speakers = 20
    max_segment_len = 256

    Learning

    bert_learning_rate = 1e-5
    task_learning_rate = 2e-4
    loss_type = marginalized  # {marginalized, hinge}
    mention_loss_coef = 0
    false_new_delta = 1.5  # For loss_type = hinge
    adam_eps = 1e-6
    adam_weight_decay = 1e-2
    warmup_ratio = 0.1
    max_grad_norm = 1  # Set 0 to disable clipping
    gradient_accumulation_steps = 1

    Model hyperparameters.

    coref_depth = 1  # when 1: no higher order (except for cluster_merging)
    higher_order = attended_antecedent  # {attended_antecedent, max_antecedent, entity_equalization, span_clustering, cluster_merging}
    coarse_to_fine = true
    fine_grained = true
    dropout_rate = 0.3
    ffnn_size = 1000
    ffnn_depth = 1
    cluster_ffnn_size = 1000  # For cluster_merging
    cluster_reduce = mean  # For cluster_merging
    easy_cluster_first = false  # For cluster_merging
    cluster_dloss = false  # cluster_merging
    num_epochs = 24
    feature_emb_size = 20
    max_span_width = 30
    use_metadata = true
    use_features = true
    use_segment_distance = true
    model_heads = true
    use_width_prior = true  # For mention score
    use_distance_prior = true  # For mention-ranking score

    Other.

    conll_eval_path = dev.english.v4_gold_conll  # gold_conll file for dev
    conll_test_path = test.english.v4_gold_conll  # gold_conll file for test
    genres = ["bc", "bn", "mz", "nw", "pt", "tc", "wb"]
    eval_frequency = 1000
    report_frequency = 100

    opened by yangjingyi 2
  • ValueError when predicting

    All the data and models required have been downloaded into proper path.

    Trying to run predict.py with the command python predict.py --config_name=train_spanbert_large_ml0_d2 --model_identifier=May08_12-38-29_58000 --gpu_id=0, I encounter a ValueError:

    Traceback (most recent call last):
      File "predict.py", line 71, in <module>
        nlp.add_pipe(nlp.create_pipe('sentencizer'))
      File "/home/qliu/anaconda3/envs/e2e/lib/python3.6/site-packages/spacy/language.py", line 754, in add_pipe
        raise ValueError(err)
    ValueError: [E966] nlp.add_pipe now takes the string name of the registered component factory, not a callable component. Expected string, but got <spacy.pipeline.sentencizer.Sentencizer object at 0x7f7fabe3f288> (name: 'None').

    • If you created your component with nlp.create_pipe('name'): remove nlp.create_pipe and call nlp.add_pipe('name') instead.

    • If you passed in a component like TextCategorizer(): call nlp.add_pipe with the string name instead, e.g. nlp.add_pipe('textcat').

    • If you're using a custom component: Add the decorator @Language.component (for function components) or @Language.factory (for class components / factories) to your custom component and assign it a name, e.g. @Language.component('your_name'). You can then run nlp.add_pipe('your_name') to add it to the pipeline.
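
    Following the first suggestion in the error message, a minimal sketch of the spaCy 3.x call (spacy.blank("en") is only a stand-in for however predict.py builds its pipeline):

        import spacy

        nlp = spacy.blank("en")      # stand-in pipeline for illustration
        nlp.add_pipe("sentencizer")  # spaCy 3.x: pass the registered component name
        # The spaCy 2.x form that raises E966 under spaCy 3.x:
        # nlp.add_pipe(nlp.create_pipe("sentencizer"))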

    opened by QLiu-NLP 0
  • train on bert base

    Hello, I'd like to know what results this model gets when trained on BERT-base. I have trained on BERT-base with c2f (python run.py train_bert_base_ml0_d2), but only get a result of about 67 F1.

    opened by L-hongbin 9
Owner
Liyan
PhD student at Emory University (NLP Lab).