DUE: End-to-End Document Understanding Benchmark

Overview

This is the repository that provide tools to download data, reproduce the baseline results and evaluation.

What can you achieve with this guide

Based on this repository, you may be able to:

  1. download data for benchmark in a unified format.
  2. run all the baselines.
  3. evaluate already trained baseline models.

Install benchmark-related repositories

Start the container:

sudo userdocker run nvcr.io/nvidia/pytorch:20.12-py3

Clone the repo with:

git clone [email protected]:due-benchmark/baselines.git

Install the requirements:

pip install -e .

1. Download datasets and the base model

The datasets are re-hosted on the https://duebenchmark.com/data and can be downloaded from there. Moreover, since the baselines are finetuned based on the T5 model, you need to download the original model. Again it is re-hosted at https://duebenchmark.com/data. Please place it into the due_benchmark_data directory after downloading.

TODO: dopisać resztę

2. Run baseline trainings

2.1 Process datasets into memmaps (binarization)

In order to process datasets into memmaps, set the directory downloaded_data_path to downloaded data, set memmap_directory to a new directory that will store binarized datas, and use the following script:

./create_memmaps.sh

2.2 Run training script

Single training can be started with the following command, assuming out_dir is set as an output for the trained model's checkpoints and generated outputs. Additionally, set datas to any of the previously generated datasets (e.g., to DeepForm).

python benchmarker/cli/l5/train.py \
    --model_name_or_path ${downloaded_data_path}/t5-base \
    --relative_bias_args="[{\"type\":\"1d\"}]" \
    --dropout_rate 0.15 \
    --model_type=t5 \
    --output_dir ${out_dir} \
    --data_dir ${memmap_directory}/${datas}_memmap/train \
    --val_data_dir ${memmap_directory}/${datas}_memmap/dev \
    --test_data_dir ${memmap_directory}/${datas}_memmap/test \
    --gpus 1 \
    --max_epochs 30 \
    --train_batch_size 1 \
    --eval_batch_size 2 \
    --overwrite_output_dir \
    --accumulate_grad_batches 64 \
    --max_source_length 1024 \
    --max_target_length 256 \
    --eval_max_gen_length 16 \
    --learning_rate 2e-4 \
    --lr_scheduler constant \
    --warmup_steps 100 \
    --trim_batches \ 
    --do_train \
    --do_predict \ 
    --additional_data_fields doc_id label_name \
    --early_stopping_patience 20 \
    --segment_levels tokens pages \
    --optimizer adamw \
    --weight_decay 1e-5 \
    --adam_epsilon 1e-8 \
    --num_workers 4 \
    --val_check_interval 1

The models presented in the paper differs only in two places. The first is the choice of --relative_bias_args. T5 uses [{'type': '1d'}] whereas both +2D and +DALL-E use [{'type': '1d'}, {'type': 'horizontal'}, {'type': 'vertical'}]

Moreover +DALL-E had --context_embeddings set to [{'dimension': 1024, 'use_position_bias': False, 'embedding_type': 'discrete_vae', 'pretrained_path': '', 'image_width': 256, 'image_height': 256}]

3. Evaluate

3.1 Convert output to the submission file

In order to compare two files (generated by the model with the provided library and the gold-truth answers), one has to convert the generated output into a format that can be directly compared with documents.jsonl. Please use:

python to_submission_file.py ${downloaded_data_path} ${out_dir}

3.2 Evaluate reproduced models

Finally outputs can be evaluated using the provided evaluator. First, get back into main directory, where this README.md is placed and install it by cd due_evaluator-master && pip install -r requirement And run:

python due_evaluator --out-files baselines/test_generations.jsonl --reference ${downloaded_data_path}/DeepForm

3.3 Evaluate baseline outputs

We provide an examples of outputs generated by our baseline (DeepForm). They should be processed with:

python benchmarker-code/to_submission_file.py ${downloaded_data_path}/model_outputs_example ${downloaded_data_path}
python due_evaluator --out-files ./benchmarker/cli/l5/baselines/test_generations.txt.jsonl --reference ${downloaded_data_path}/DeepForm/test/document.jsonl

The expected output should be:

       Label       F1  Precision   Recall
  advertiser 0.512909   0.513793 0.512027
contract_num 0.778761   0.780142 0.777385
 flight_from 0.794376   0.795775 0.792982
   flight_to 0.804921   0.806338 0.803509
gross_amount 0.355476   0.356115 0.354839
         ALL 0.649771   0.650917 0.648630
You might also like...
 PURE: End-to-End Relation Extraction
PURE: End-to-End Relation Extraction

PURE: End-to-End Relation Extraction This repository contains (PyTorch) code and pre-trained models for PURE (the Princeton University Relation Extrac

Learning recognition/segmentation models without end-to-end training. 40%-60% less GPU memory footprint. Same training time. Better performance.
Learning recognition/segmentation models without end-to-end training. 40%-60% less GPU memory footprint. Same training time. Better performance.

InfoPro-Pytorch The Information Propagation algorithm for training deep networks with local supervision. (ICLR 2021) Revisiting Locally Supervised Lea

An end-to-end PyTorch framework for image and video classification
An end-to-end PyTorch framework for image and video classification

What's New: March 2021: Added RegNetZ models November 2020: Vision Transformers now available, with training recipes! 2020-11-20: Classy Vision v0.5 R

[CVPR'21] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving
[CVPR'21] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

TransFuser This repository contains the code for the CVPR 2021 paper Multi-Modal Fusion Transformer for End-to-End Autonomous Driving. If you find our

PyTorch implementation of the end-to-end coreference resolution model with different higher-order inference methods.

End-to-End Coreference Resolution with Different Higher-Order Inference Methods This repository contains the implementation of the paper: Revealing th

Official code for
Official code for "End-to-End Optimization of Scene Layout" -- including VAE, Diff Render, SPADE for colorization (CVPR 2020 Oral)

End-to-End Optimization of Scene Layout Code release for: End-to-End Optimization of Scene Layout CVPR 2020 (Oral) Project site, Bibtex For help conta

KE-Dialogue: Injecting knowledge graph into a fully end-to-end dialogue system.
KE-Dialogue: Injecting knowledge graph into a fully end-to-end dialogue system.

Learning Knowledge Bases with Parameters for Task-Oriented Dialogue Systems This is the implementation of the paper: Learning Knowledge Bases with Par

ISTR: End-to-End Instance Segmentation with Transformers (https://arxiv.org/abs/2105.00637)

This is the project page for the paper: ISTR: End-to-End Instance Segmentation via Transformers, Jie Hu, Liujuan Cao, Yao Lu, ShengChuan Zhang, Yan Wa

 Neural Dynamic Policies for End-to-End Sensorimotor Learning
Neural Dynamic Policies for End-to-End Sensorimotor Learning

This is a PyTorch based implementation for our NeurIPS 2020 paper on Neural Dynamic Policies for end-to-end sensorimotor learning.

Comments
  • Pre-training dataset release

    Pre-training dataset release

    Thank you for sharing great work! I was wondering if there is an expected date on when you will be releasing your pre-training dataset, some code for crawling, or download URL? To be fair with comparing your baseline model (T5+2D+U), please share above information.

    opened by rtanaka-lab 3
  • Several questions regarding the fine-tuned models and fine-tuning code

    Several questions regarding the fine-tuned models and fine-tuning code

    Thank the authors for making the fine-tuned models available for download! Could you please help me with several issues regarding the fine-tuned models and the fine-tuning code?

    Questions for fine-tuned models:

    • The links to the fine-tuned models PWC T5+2D and PWC T5+2D+U are invalid. Could you please update the links on the website?
    • Is it possible to also release the fine-tuned models for T5 and T5+U in addition to T5+2D and T5+2D+U? Especially since the pre-training code is not released, it is a bit hard to reproduce the results of T5+U.

    Questions for the fine-tuning code: When running the fine-tuning using the provided scripts for DocVQA and InfographicsVQA, it reports an error when val_metric is set to anls:

    RuntimeError: Early stopping conditioned on metric `val_anls` which is not available. Pass in or modify your `EarlyStopping` callback to use any of the following: `generation_results`, `loss`, `lr_group_0`, `lr_group_1`, `lr_group_2`, `val_gen_time`, `val_gen_len`, `val_loss`.
    

    Is there any suggestion for fixing this error? Thank you!

    opened by rui-yan 0
Owner
null
A embed able annotation tool for end to end cross document co-reference

CoRefi CoRefi is an emebedable web component and stand alone suite for exaughstive Within Document and Cross Document Coreference Anntoation. For a de

PythicCoder 39 Dec 12, 2022
BlockUnexpectedPackets - Preventing BungeeCord CPU overload due to Layer 7 DDoS attacks by scanning BungeeCord's logs

BlockUnexpectedPackets This script automatically blocks DDoS attacks that are sp

SparklyPower 3 Mar 31, 2022
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark

Introduction English | 简体中文 MMAction2 is an open-source toolbox for video understanding based on PyTorch. It is a part of the OpenMMLab project. The m

OpenMMLab 2.7k Jan 7, 2023
[CVPR2021] UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles

UAV-Human Official repository for CVPR2021: UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicle Paper arXiv Res

null 129 Jan 4, 2023
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English

LexGLUE: A Benchmark Dataset for Legal Language Understanding in English ⚖️ ?? ??‍?? ??‍⚖️ Dataset Summary Inspired by the recent widespread use of th

null 95 Dec 8, 2022
An end-to-end machine learning web app to predict rugby scores (Pandas, SQLite, Keras, Flask, Docker)

Rugby score prediction An end-to-end machine learning web app to predict rugby scores Overview An demo project to provide a high-level overview of the

null 34 May 24, 2022
End-to-End Object Detection with Fully Convolutional Network

This project provides an implementation for "End-to-End Object Detection with Fully Convolutional Network" on PyTorch.

null 472 Dec 22, 2022
A complete end-to-end demonstration in which we collect training data in Unity and use that data to train a deep neural network to predict the pose of a cube. This model is then deployed in a simulated robotic pick-and-place task.

Object Pose Estimation Demo This tutorial will go through the steps necessary to perform pose estimation with a UR3 robotic arm in Unity. You’ll gain

Unity Technologies 187 Dec 24, 2022
[CVPR2021 Oral] End-to-End Video Instance Segmentation with Transformers

VisTR: End-to-End Video Instance Segmentation with Transformers This is the official implementation of the VisTR paper: Installation We provide instru

Yuqing Wang 687 Jan 7, 2023