This repository contains the code for the EMNLP 2021 paper "Word-Level Coreference Resolution".


Word-Level Coreference Resolution

This repository contains the code to reproduce the experiments described in the paper of the same name, which was accepted to EMNLP 2021. The paper is available here.

Table of contents

  1. Preparation
  2. Training
  3. Evaluation

Preparation

The following instructions have been tested with Python 3.7 on an Ubuntu 20.04 machine.

You will need:

  • OntoNotes 5.0 corpus (download here, registration needed)
  • Python 2.7 to run the CoNLL-2012 scripts
  • Java runtime to run the Stanford Parser
  • Python 3.7+ to run the model
  • Perl to run the CoNLL-2012 evaluation scripts
  • CUDA-enabled machine (48 GB of GPU memory to train, 4 GB to evaluate)
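Once the model's environment from step 5 below is set up, a quick sanity check can confirm that the interpreter and external tools are in place. The helper below is a hypothetical sketch, not a script shipped with this repository. It assumes the Java and Perl executables are named java and perl and are on PATH, and it only checks that they exist (not their versions, the Python 2.7 environment, or CUDA availability).

```python
import shutil
import sys
from typing import Callable, List, Optional, Sequence, Tuple

# Assumed executable names; adjust if your Java or Perl binaries differ.
REQUIRED_TOOLS = ("java", "perl")


def missing_tools(
    tools: Sequence[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_is_supported(
    version_info: Sequence[int] = tuple(sys.version_info),
    minimum: Tuple[int, int] = (3, 7),
) -> bool:
    """Return True if the interpreter version is at least `minimum` (default 3.7)."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print("Python >= 3.7:", python_is_supported())
    missing = missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
```

Run it from the wl-coref environment; the which parameter exists only so the lookup can be swapped out, for example in tests.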
  1. Extract the OntoNotes 5.0 archive. If it is in the repository's root directory, run:

     tar -xzvf ontonotes-release-5.0_LDC2013T19.tgz
  2. Switch to a Python 2.7 environment (one where python runs version 2.7). This is necessary for the CoNLL scripts to run correctly. To do this with conda:

     conda create -y --name py27 python=2.7 && conda activate py27
  3. Run the CoNLL data preparation scripts (~30 min):

     sh ontonotes-release-5.0 data
  4. Download the CoNLL scorers and the Stanford Parser:

  5. Prepare the environment for running the model. To do this with conda:

     conda create -y --name wl-coref python=3.7 openjdk perl
     conda activate wl-coref
     python -m pip install -r requirements.txt
  6. Build the corpus in jsonlines format (~20 min):

     python data/conll-2012/ --out-dir data
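The corpus built in step 6 is in jsonlines format: one JSON object per line, typically one document per object. The sketch below is not part of the repository; it reads such files and reports how many documents each holds and which fields the first one has. It assumes the output files end in .jsonlines and sit in the data directory, and it deliberately assumes no particular field names, so compare the printed keys with what the converter actually produced.

```python
import glob
import json
from typing import Iterable, Iterator, List, Tuple


def parse_jsonlines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one JSON object per non-empty line."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def read_jsonlines(path: str) -> Iterator[dict]:
    """Lazily read a .jsonlines file, one document at a time."""
    with open(path, encoding="utf-8") as f:
        yield from parse_jsonlines(f)


def summarize(docs: Iterable[dict]) -> Tuple[int, List[str]]:
    """Return the number of documents and the sorted keys of the first one."""
    n_docs = 0
    first_keys: List[str] = []
    for doc in docs:
        if n_docs == 0:
            first_keys = sorted(doc)
        n_docs += 1
    return n_docs, first_keys


if __name__ == "__main__":
    # Assumption: step 6 wrote its output as *.jsonlines files into data/.
    for path in sorted(glob.glob("data/*.jsonlines")):
        n_docs, keys = summarize(read_jsonlines(path))
        print(f"{path}: {n_docs} documents, fields of the first: {keys}")
```

Reading line by line keeps memory use flat, which matters little for OntoNotes but avoids loading a whole split at once.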

You're all set!

Training

If you have completed all the steps in the previous section, run:

python train roberta

Use the -h flag to see more parameters, and the CUDA_VISIBLE_DEVICES environment variable to limit the CUDA devices visible to the script. Refer to config.toml to modify existing model configurations or to create your own.
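In a shell, limiting the visible devices amounts to prefixing the training command with the variable, for example CUDA_VISIBLE_DEVICES=0 followed by the command above. If you launch training from Python instead, the hypothetical helpers below (not part of the repository) do the same by passing a modified copy of the environment to subprocess.run; command stands for whatever training command you use.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional, Sequence


def env_with_gpus(
    gpu_ids: Sequence[int], base: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Copy `base` (default: the current environment) and expose only `gpu_ids` to CUDA."""
    env = dict(os.environ if base is None else base)
    # CUDA reads a comma-separated list of device indices, e.g. "0,2".
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def run_with_gpus(command: Sequence[str], gpu_ids: Sequence[int]) -> int:
    """Run `command` with only `gpu_ids` visible and return its exit code."""
    return subprocess.run(list(command), env=env_with_gpus(gpu_ids)).returncode
```

Passing gpu_ids=[0, 1] exposes only the first two GPUs, which the child process sees as devices 0 and 1; an empty list hides all GPUs. The parent's environment is copied, not modified.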

Evaluation

Make sure that you have successfully completed all steps of the Preparation section.

  1. Download the pretrained model and save it to the data directory.
  2. Generate the CoNLL-formatted output:

     python eval roberta --data-split test
  3. Run the CoNLL-2012 scripts to obtain the metrics:

     python roberta test 20
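The single number usually reported as the CoNLL-2012 score (CoNLL F1) is the unweighted mean of the MUC, B³ and CEAFe F1 scores. The sketch below shows only that final arithmetic. It is not part of the repository, it does not parse the scorer output, and it assumes you have already read each metric's precision and recall from the output of the step above. The formula is scale-invariant, so it works with fractions or percentages as long as all inputs use the same scale.

```python
from typing import Mapping, Tuple

# Component metrics averaged into the CoNLL-2012 score; the key names are
# illustrative (they mirror the metric names accepted by the reference scorer).
CONLL_METRICS = ("muc", "bcub", "ceafe")


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def conll_f1(scores: Mapping[str, Tuple[float, float]]) -> float:
    """Unweighted mean of the MUC, B-cubed and CEAF-e F1 scores.

    `scores` maps each metric name to a (precision, recall) pair.
    """
    return sum(f1(*scores[m]) for m in CONLL_METRICS) / len(CONLL_METRICS)
```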
