# Project / Paper Introduction

This is the project repo for our EMNLP'21 paper: https://arxiv.org/abs/2104.08350

Here we provide brief descriptions of the final data and detailed instructions for reproducing the results in our paper. For more details, please refer to the paper.
# Data

The final data used for the experiments is saved in the `./data/` folder with train/dev/test splits. Most data fields are straightforward; just a few notes:

- `question_event`: this field is neither provided by annotators nor used in our experiments. We simply use heuristic rules based on POS tags to extract possible events in the questions. Users are encouraged to try alternative tools such as semantic role labeling.
- `original_events` and `indices` are the annotator-provided event triggers plus their indices in the context.
- `answer_texts` and `answer_indices` (in train and dev) are the annotator-provided answers plus their indices in the context.
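To make these fields concrete, here is a minimal peek at one training sample. The file name and JSON layout below are assumptions, not the documented format; adapt the loader to the actual files in `./data/`:

```python
# Minimal peek at one sample. File name and JSON layout are assumptions;
# adjust to the actual files in ./data/.
import json

with open("./data/train.json") as f:
    train = json.load(f)

sample = train[0]
print(sample["question_event"])   # heuristic event extracted from the question
print(sample["original_events"])  # annotator-provided event triggers
print(sample["answer_texts"])     # annotator-provided answers (train/dev only)
```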
Leaderboard and dataset website: https://eventqa.github.io

**Please note:** the evaluation script below (Section II) only works for the dev set. Please refer to Section III for submission to our leaderboard.

# Models
## I. Install packages

We list the packages in our environment in the `env.yml` file for your reference. Below are a few key packages:

- python=3.8.5
- pytorch=1.6.0
- transformers=3.1.0
- cudatoolkit=10.1.243
- apex=0.1

To install apex, you can either follow the official instructions (https://github.com/NVIDIA/apex) or use conda (https://anaconda.org/conda-forge/nvidia-apex).
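Assuming `env.yml` is a standard conda environment spec (check the file header), the whole environment can be recreated in one step:

```bash
# Recreate the environment from the provided spec (requires conda).
conda env create -f env.yml
conda activate <env-name>   # use the name listed at the top of env.yml
```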
## II. Replicate results in our paper

### 1. Download trained models

For reproduction purposes, we release all trained models.

- Download link: https://drive.google.com/drive/folders/1bTCb4gBUCaNrw2chleD4RD9JP1_DOWjj?usp=sharing
- We only provide models with the best hyper-parameters, and each comes with three random seeds: 5, 7 and 23.
- Make several directories to save models: `./output/`, `./output/facebook/` and `./output/allenai/`.
- For BART models, download them into `./output/facebook/`.
- For UnifiedQA models, download them into `./output/allenai/`.
- All other models can be saved directly in `./output/`. This layout ensures the evaluation scripts below run properly.
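The expected layout can be created in one step:

```bash
# -p creates parent directories as needed (./output/ included) and is a
# no-op if they already exist.
mkdir -p ./output/facebook ./output/allenai
```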
### 2. Zero-shot performances in Table 3

Run `bash ./code/eval_zero_shot.sh`. Model options are provided in the script.
### 3. Generative QA fine-tuning performances in Table 3

Run `bash ./code/eval_ans_gen.sh`. Make sure the following arguments are set correctly in the script (see the sketch after this list):

- Model options are provided in the script.
- Set `suffix=""`.
- Set `lrs` and `batch` according to the model option. You can find these numbers in Appendix G of the paper.
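For concreteness, here is a hypothetical configuration block. The variable names come from the script, but the values shown are placeholders; take the real ones from Appendix G:

```bash
# Placeholder values -- substitute the lrs/batch reported in Appendix G
# for the model you are evaluating.
model="facebook/bart-large"
suffix=""
lrs=5e-5
batch=4
```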
### 4. Figure 6: UnifiedQA-large model trained with sub-samples

Run `bash ./code/eval_ans_gen.sh`. Make sure the following arguments are set correctly in the script:

- Set `model="allenai/unifiedqa-t5-large"`.
- Set `suffix` to one of `"_500"`, `"_1000"`, `"_2000"`, `"_3000"` or `"_4000"`.
- Set `lrs` and `batch` accordingly. You can find this information in the name of the folder containing the trained model objects.
### 5. Table 4: 500 original annotations vs. completed

- Run `bash ./code/eval_ans_gen.sh` with `model="allenai/unifiedqa-t5-large"` and `suffix="_500original"`.
- Run `bash ./code/eval_ans_gen.sh` with `model="allenai/unifiedqa-t5-large"` and `suffix="_500completed"`.
- Set `lrs` and `batch` accordingly again.
### 6. Extractive QA fine-tuning performances in Table 3

Simply run `bash ./code/eval_span_pred.sh` as it is.
### 7. Figure 8: Extractive QA fine-tuning performances by changing positive weights

- Run `bash ./code/eval_span_pred.sh`.
- Set `pw`, `lrs` and `batch` according to the model folder names again.
## III. Submission to the ESTER Leaderboard

- Set `model_dir` to your target models.
- Run `leaderboard.sh`, which outputs `pred_dev.json` and `pred_test.json` under `./output/`.
- If you write your own code to output predictions, make sure they follow our original sample order (see the sanity-check sketch below).
- Email `pred_test.json` to us in the format specified here: https://eventqa.github.io

Sample outputs (using one of our UnifiedQA-large models) are provided under `./output/`.
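A minimal sanity check for self-written prediction code. This sketch assumes the split and the predictions are JSON arrays with one entry per sample, which may not match the actual schema; adapt as needed:

```python
# Hypothetical sanity check: predictions must follow the original sample
# order, so at minimum the counts should line up one-to-one.
# File names and formats are assumptions -- adjust to the actual layout.
import json

with open("./data/test.json") as f:
    samples = json.load(f)
with open("./output/pred_test.json") as f:
    preds = json.load(f)

assert len(preds) == len(samples), (
    f"expected {len(samples)} predictions, got {len(preds)}"
)
```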
## IV. Model Training

We also provide the model training scripts below.

### 1. Generative QA: fine-tuning in Table 3

- Run `bash ./code/run_ans_generation.sh`.
- Model options and hyper-parameter search ranges are provided in the script.
- We use the `--fp16` argument to activate apex for GPU-memory-efficient training, except for UnifiedQA-t5-large (trained on an A100 GPU); see the sketch below.
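For reference, the mixed-precision pattern that `--fp16` typically enables with apex looks roughly like this. This is a generic illustration of the apex `amp` API on a toy model, not this repo's training code; it requires apex and a CUDA device:

```python
# Generic apex mixed-precision pattern (apex 0.1) -- illustrative only.
import torch
from apex import amp

# Toy model/optimizer; the model must be on GPU before amp.initialize.
model = torch.nn.Linear(10, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

inputs = torch.randn(4, 10).cuda()
targets = torch.randint(0, 2, (4,)).cuda()
loss = torch.nn.functional.cross_entropy(model(inputs), targets)

# Scale the loss to avoid fp16 underflow before backprop.
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```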
### 2. Figure 6: UnifiedQA-large model trained with sub-samples

- Run `bash ./code/run_ans_gen_subsample.sh`.
- Set the `sample_size` variable accordingly in the script.
### 3. Table 4: 500 original annotations vs. completed

- Run `bash ./code/run_ans_gen.sh` with `model="allenai/unifiedqa-t5-large"` and `suffix="_500original"`.
- Run `bash ./code/run_ans_gen.sh` with `model="allenai/unifiedqa-t5-large"` and `suffix="_500completed"`.
### 4. Extractive QA fine-tuning in Table 3 + Figure 8

Simply run `bash ./code/run_span_pred.sh` as it is.