This repository contains the code for the EMNLP 2021 paper "Word-Level Coreference Resolution"

Overview

Word-Level Coreference Resolution

This is a repository with the code to reproduce the experiments described in the paper of the same name, which was accepted to EMNLP 2021. The paper is available here.

Table of contents

  1. Preparation
  2. Training
  3. Evaluation

Preparation

The following instructions have been tested with Python 3.7 on an Ubuntu 20.04 machine.

You will need:

  • OntoNotes 5.0 corpus (download here, registration needed)
  • Python 2.7 to run conll-2012 scripts
  • Java runtime to run Stanford Parser
  • Python 3.7+ to run the model
  • Perl to run conll-2012 evaluation scripts
  • CUDA-enabled machine (48 GB to train, 4 GB to evaluate)

  1. Extract the OntoNotes 5.0 archive. If it is in the repo's root directory:

     tar -xzvf ontonotes-release-5.0_LDC2013T19.tgz
    
  2. Switch to a Python 2.7 environment (one where python runs version 2.7). This is necessary for the conll scripts to run correctly. To do it with conda:

     conda create -y --name py27 python=2.7 && conda activate py27
    
  3. Run the conll data preparation scripts (~30 min):

     sh get_conll_data.sh ontonotes-release-5.0 data
    
  4. Download conll scorers and Stanford Parser:

     sh get_third_party.sh
    
  5. Prepare your environment. To do it with conda:

     conda create -y --name wl-coref python=3.7 openjdk perl
     conda activate wl-coref
     python -m pip install -r requirements.txt
    
  6. Build the corpus in jsonlines format (~20 min):

     python convert_to_jsonlines.py data/conll-2012/ --out-dir data
     python convert_to_heads.py
    

You're all set!
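
To sanity-check the result of step 6, it can help to peek at one of the generated files. The sketch below relies only on what the jsonlines format implies (one JSON document per line). The helper name first_document is hypothetical, and no output file name is assumed, since the names written to the data directory are not listed here.

```python
import json
from pathlib import Path
from typing import Any, Dict


def first_document(path: Path) -> Dict[str, Any]:
    """Return the first non-empty JSON document of a jsonlines file."""
    with path.open(encoding="utf-8") as lines:
        for line in lines:
            # Skip blank lines; every other line holds one JSON document.
            if line.strip():
                return json.loads(line)
    raise ValueError(f"{path} contains no documents")
```

Calling sorted(first_document(some_path).keys()) on a file produced by step 6 lists the attributes stored for each document.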

Training

If you have completed all the steps in the previous section, then just run:

python run.py train roberta

Use the -h flag to list more parameters, and the CUDA_VISIBLE_DEVICES environment variable to limit the CUDA devices visible to the script. Refer to config.toml to modify existing model configurations or create your own.
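
If training is launched from another Python script, the device restriction can be applied to the child process only. This is a minimal sketch, not part of the repository: the helper names are hypothetical, sys.executable stands in for python, and the command is the one shown above.

```python
import os
import subprocess
import sys
from typing import Dict, List, Sequence


def training_command(section: str = "roberta") -> List[str]:
    """Build the training command shown above for a given config section."""
    return [sys.executable, "run.py", "train", section]


def restricted_env(device_ids: Sequence[int]) -> Dict[str, str]:
    """Copy the current environment, exposing only the given CUDA devices."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    return env


def launch_training(device_ids: Sequence[int], section: str = "roberta") -> int:
    """Run training as a child process and return its exit code."""
    # Must be called from the repository root, where run.py lives.
    completed = subprocess.run(training_command(section),
                               env=restricted_env(device_ids))
    return completed.returncode
```

For example, launch_training([0]) would start training with only the first GPU visible to the script.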

Evaluation

Make sure that you have successfully completed all steps of the Preparation section.

  1. Download and save the pretrained model to the data directory.

     https://www.dropbox.com/s/vf7zadyksgj40zu/roberta_%28e20_2021.05.02_01.16%29_release.pt?dl=0
    
  2. Generate the conll-formatted output:

     python run.py eval roberta --data-split test
    
  3. Run the conll-2012 scripts to obtain the metrics:

     python calculate_conll.py roberta test 20
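
Before running steps 2 and 3, it may be worth checking that the checkpoint from step 1 is in place. The sketch below only decodes the file name embedded in the download link. It assumes, without confirmation from the text above, that the file should be kept in the data directory under that original name; both helper names are hypothetical.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse

CHECKPOINT_URL = (
    "https://www.dropbox.com/s/vf7zadyksgj40zu/"
    "roberta_%28e20_2021.05.02_01.16%29_release.pt?dl=0"
)


def checkpoint_filename(url: str) -> str:
    """Decode the percent-escaped file name at the end of the URL path."""
    return unquote(PurePosixPath(urlparse(url).path).name)


def checkpoint_path(url: str, data_dir: Path = Path("data")) -> Path:
    """Where the checkpoint would live if saved under its original name."""
    return data_dir / checkpoint_filename(url)
```

checkpoint_path(CHECKPOINT_URL).is_file() then tells whether the file is already present, relative to the repository root.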
    
Comments
  • about the training process

    Here is the error I ran into: Epoch 1: bc/cnn/00/cnn_0001 c_loss: 2.11580 s_loss: 0.57502: 14% 394/2802 [01:04<04:54, 8.18docs/s] It seems the training process stopped. Can you tell me why? Thanks.

    opened by leileilin 27
  • some confusions about convert_to_head.py

    Hello, I have a new question about the convert_to_heads.py file, in which some spans and clusters are deleted. Is this the case described in the following quote? "In those cases "A" and "A & B" are different spans with the same head word, "A". In our implementation such cases were simply discarded from the training set, because they were few and we were able to perform well, even though we couldn't predict any of such cases during inference." That is what you said in #2. Thanks.

    opened by leileilin 14
  • about chinese dataset

    Hello, thank you for your great open-source work. I want to process Chinese datasets following your process, but the convert_to_jsonlines.py step reports an error. Do you know why? Thanks.

    opened by leileilin 14
  • what is the equivalent of "edu.stanford.nlp.trees.EnglishGrammaticalStructure" for arabic coreference resolution task

    Hi,

    I can't find an ArabicGrammaticalStructure class in Stanford NLP. The conversion works for English data but not for Arabic.

    Converting constituents to dependencies...
    development: 0% 0/44 [00:00<?, ?docs/s]Exception in thread "main" java.lang.IllegalArgumentException: No head rule defined for PV+PVSUFF using class edu.stanford.nlp.trees.SemanticHeadFinder in PV+PVSUFF-39
        at edu.stanford.nlp.trees.AbstractCollinsHeadFinder.determineNonTrivialHead(AbstractCollinsHeadFinder.java:222)
        at edu.stanford.nlp.trees.SemanticHeadFinder.determineNonTrivialHead(SemanticHeadFinder.java:348)
        at edu.stanford.nlp.trees.AbstractCollinsHeadFinder.determineHead(AbstractCollinsHeadFinder.java:179)
        at edu.stanford.nlp.trees.TreeGraphNode.percolateHeads(TreeGraphNode.java:476)
        at edu.stanford.nlp.trees.TreeGraphNode.percolateHeads(TreeGraphNode.java:474)
        at edu.stanford.nlp.trees.TreeGraphNode.percolateHeads(TreeGraphNode.java:474)
        at edu.stanford.nlp.trees.TreeGraphNode.percolateHeads(TreeGraphNode.java:474)
        at edu.stanford.nlp.trees.TreeGraphNode.percolateHeads(TreeGraphNode.java:474)
        at edu.stanford.nlp.trees.GrammaticalStructure.<init>(GrammaticalStructure.java:94)
        at edu.stanford.nlp.trees.EnglishGrammaticalStructure.<init>(EnglishGrammaticalStructure.java:86)
        at edu.stanford.nlp.trees.EnglishGrammaticalStructure.<init>(EnglishGrammaticalStructure.java:66)
        at edu.stanford.nlp.parser.lexparser.EnglishTreebankParserParams.getGrammaticalStructure(EnglishTreebankParserParams.java:2271)
        at edu.stanford.nlp.trees.GrammaticalStructure$TreeBankGrammaticalStructureWrapper$GsIterator.primeGs(GrammaticalStructure.java:1361)
        at edu.stanford.nlp.trees.GrammaticalStructure$TreeBankGrammaticalStructureWrapper$GsIterator.<init>(GrammaticalStructure.java:1348)
        at edu.stanford.nlp.trees.GrammaticalStructure$TreeBankGrammaticalStructureWrapper.iterator(GrammaticalStructure.java:1325)
        at edu.stanford.nlp.trees.GrammaticalStructure.main(GrammaticalStructure.java:1604)
    development: 0% 0/44 [00:00<?, ?docs/s]
    Traceback (most recent call last):
      File "convert_to_jsonlines.py", line 392, in <module>
        convert_con_to_dep(args.tmp_dir, conll_filenames)
      File "convert_to_jsonlines.py", line 195, in convert_con_to_dep
        subprocess.run(cmd, check=True, stdout=out)
      File "/home/souid/anaconda3/envs/wl-coref/lib/python3.7/subprocess.py", line 512, in run
        output=stdout, stderr=stderr)
    subprocess.CalledProcessError: Command '['java', '-cp', 'downloads/stanford-parser.jar', 'edu.stanford.nlp.trees.EnglishGrammaticalStructure', '-basic', '-keepPunct', '-conllx', '-treeFile', 'temp/data/conll-2012/v4/data/development/data/arabic/annotations/nw/ann/00/ann_0010.v4_gold_conll']' returned non-zero exit status 1.

    opened by aymen-souid 11
  • Conll perl script refusing to score because of "too many repeated mentions (>10) in the response"

    I ran the preparation scripts successfully.

    Downloaded the roberta checkpoint from dropbox link, and placed it in data folder.

    Ran the command: python calculate_conll.py roberta test 20

    I noticed some errors due to subprocess because I was using python3.6 instead of python3.7.

    Error was: unexpected keyword argument 'capture_output'

    Fixed the issue with this

    But then I got an error: 'NoneType' object has no attribute 'group' origin of error --> line 15

    I ran the perl script directly in bash: perl reference-coreference-scorers/scorer.pl all data/conll_logs/roberta_test_e20.gold.conll data/conll_logs/roberta_test_e20.pred.conll none

    MUC came out to be 86 (f1) but while calculating b3, I got this error: Found too many repeated mentions (> 10) in the response, so refusing to score. Please fix the output

    I think it is because of this error only that the line 15 above was throwing that error (because output was empty).

    How to proceed forward now? How to evaluate the results?

    opened by ritwikmishra 9
  • Inference from the box?

    Hi! Thank you for posting the model. Could you please explain how to run inference out of the box? If I understood correctly, the model from Dropbox has already been trained, so we should be able to run it, but by design the model requires the original data and builds optimizers in order to run:

    class CorefModel:
        """
        Attributes:
            config (coref.config.Config): the model's configuration,
                see config.toml for the details
            epochs_trained (int): number of epochs the model has been trained for
            trainable (Dict[str, torch.nn.Module]): trainable submodules with their
                names used as keys
            training (bool): used to toggle train/eval modes
    
        Submodules (in the order of their usage in the pipeline):
            tokenizer (transformers.AutoTokenizer)
            bert (transformers.AutoModel)
            we (WordEncoder)
            rough_scorer (RoughScorer)
            pw (PairwiseEncoder)
            a_scorer (AnaphoricityScorer)
            sp (SpanPredictor)
        """
        def __init__(self,
                     config_path: str,
                     section: str,
                     epochs_trained: int = 0):
            """
            A newly created model is set to evaluation mode.
    
            Args:
                config_path (str): the path to the toml file with the configuration
                section (str): the selected section of the config file
                epochs_trained (int): the number of epochs finished
                    (useful for warm start)
            """
            self.config = CorefModel._load_config(config_path, section)
            self.epochs_trained = epochs_trained
            self._docs: Dict[str, List[Doc]] = {}
            self._build_model()
            self._build_optimizers()
            self._set_training(False)
            self._coref_criterion = CorefLoss(self.config.bce_loss_weight)
            self._span_criterion = torch.nn.CrossEntropyLoss(reduction="sum")
    

    So maybe there is an option to run it without all this stuff?

    opened by Dzz1th 7
  • Questions_dataset-representation

    Based on my reading of this code base, training uses the following features: cased_words, sent_id, speaker, pos, deprel, head, clusters.

    These are then converted into: cased_words, sent_id, speaker, pos, deprel, head, head2span, word_clusters, span_clusters.

    The inference data example, however, uses only cased_words, sent_id, and optionally speaker information.

    My questions are:

    1. How do we get the pos, deprel, head, and clusters data in inference mode? Are they derived from cased_words or not?
    2. In training mode, are the speaker, pos, deprel, head, and clusters data used as well?

    Thank you

    opened by fajarmuslim 7
  • shall I use convert_to_heads when using CoNLL-U?

    Hi, thanks so much for your work! I have a question regarding the convert_to_heads.py script. I'm trying to make RoBERTa learn coreference resolution, but my data is in .conllu format. I'm having quite a hard time preprocessing the data and modifying some of your code to make it work. Can you share some insights or thoughts on that? I would be very much obliged.

    Cheers

    opened by brgsk 7
  • about the training data format

    Hello, I'd like to ask about the .jsonlines file produced by convert_to_jsonlines.py. Can the model still be trained successfully if some attributes in the jsonlines file, such as speaker or pos, are discarded?

    opened by leileilin 5
  • Inference on conversation.

    Hello, great work.

    I had two questions:

    1. What is sent_id in the sample input file supposed to refer to?

    2. If I want to run inference on dialogue, like the tc genre, what should the conversation format be?

    opened by maherr13 5
  • Questions about training

    Currently, when running this source code, I get a CUDA out-of-memory error, since a single GPU has only 32 GB of memory.

    On the other hand, I have access to a server with 8 GPUs (each with 32 GB of memory). Can I run this training experiment in parallel mode?

    If so, how can I achieve that?

    Thanks in advance.

    opened by fajarmuslim 5
  • Reduce training memory requirement

    CUDA-enabled machine (48 GB to train, 4 GB to evaluate)

    @vdobrovolskii friendly ping. Are 48 GB really needed to train? Can't we train longer (how much longer?) with less? Couldn't your project leverage FP16, FP8 and other optimizations? You can get them out of the box if you use RoBERTa from the Transformers library (https://github.com/huggingface/transformers). There is also Accelerate (https://huggingface.co/docs/accelerate/index).

    I have a 3070 with 8GB of GDDR6 :/

    opened by LifeIsStrange 2