PyTorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer.

Mohamed Ali Souibgui

Last update: Jan 7, 2023

Related tags

Deep Learning deep-learning transformers auto-encoders image-enhancement document-binarization vision-transformer

Overview

DocEnTR

Description

PyTorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer. The model is built on top of the vit-pytorch vision transformer library. It can be used to enhance (binarize) degraded document images, as shown in the following samples.

[Figure: sample degraded images (left) and our binarization results (right)]

Download Code

Clone the repository:

git clone https://github.com/dali92002/DocEnTR
cd DocEnTR

Requirements

Install the dependencies listed in requirements.txt (for example, with pip install -r requirements.txt).

Process Data

Data Path

We gathered the DIBCO, H-DIBCO and PALM datasets and organized them in one folder. You can download it from this link. After downloading, extract the folder named DIBCOSETS and place it in your desired data path, so that the resulting layout is /YOUR_DATA_PATH/DIBCOSETS/

Data Splitting

Specify the data path, the split size, and the validation and testing sets to prepare your data. In this example, we run process_dibco.py with a split size of 256 × 256, the 2016 dataset for validation, and the 2018 dataset for testing.

python process_dibco.py --data_path /YOUR_DATA_PATH/ --split_size 256 --testing_dataset 2018 --validation_dataset 2016
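To illustrate what a split size of 256 means, here is a minimal sketch of cutting a page image into non-overlapping 256 × 256 patches. It is not the code of process_dibco.py: the function name split_into_patches, the white padding of the bottom and right edges, and the row-major patch order are all assumptions made for this example.

```python
import numpy as np


def split_into_patches(image, split_size=256, pad_value=255):
    """Cut an image array (H x W or H x W x C) into square patches.

    Hypothetical helper: pads the bottom and right edges with pad_value
    so that both dimensions become multiples of split_size.
    """
    height, width = image.shape[:2]
    pad_h = (-height) % split_size
    pad_w = (-width) % split_size
    # Only the two spatial axes are padded; channels are left untouched.
    pad_spec = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad_spec, mode="constant", constant_values=pad_value)

    patches = []
    # Row-major order: left to right, then top to bottom.
    for top in range(0, padded.shape[0], split_size):
        for left in range(0, padded.shape[1], split_size):
            patches.append(padded[top:top + split_size, left:left + split_size])
    return patches


if __name__ == "__main__":
    page = np.zeros((600, 500, 3), dtype=np.uint8)  # stand-in for a scanned page
    patches = split_into_patches(page, split_size=256)
    print(len(patches), patches[0].shape)  # 6 (256, 256, 3)
```

A 600 × 500 page is padded to 768 × 512 and therefore yields 3 × 2 = 6 patches.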

Using DocEnTr

Training

For training, specify the desired settings (batch_size, patch_size, model_size, split_size and the number of training epochs) when running train.py. For example, to train a base model with a patch_size of 16 × 16 and a batch_size of 32, use the following command:

python train.py --data_path /YOUR_DATA_PATH/ --batch_size 32 --vit_model_size base --vit_patch_size 16 --epochs 151 --split_size 256 --validation_dataset 2016

Visualization results on the validation dataset are saved after each epoch in a folder named vis+"YOUR_EXPERIMENT_SETTINGS" (created automatically). For the command above, the folder is named visbase_256_16. The best weights are saved in the folder named "weights".
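The relationship between the settings can be sketched as follows. The folder naming pattern (model size, then split size, then patch size) is inferred only from the single example visbase_256_16 and may not match train.py exactly; both function names are hypothetical. The second function computes how many ViT patches (tokens) one training crop produces, which is plain arithmetic on split_size and patch_size.

```python
def experiment_folder(model_size, split_size, patch_size):
    # Assumed pattern, inferred from the README example "visbase_256_16".
    return f"vis{model_size}_{split_size}_{patch_size}"


def tokens_per_crop(split_size, patch_size):
    # A square crop of side split_size is tiled by square ViT patches.
    if split_size % patch_size != 0:
        raise ValueError("split_size must be a multiple of patch_size")
    return (split_size // patch_size) ** 2


if __name__ == "__main__":
    print(experiment_folder("base", 256, 16))  # visbase_256_16
    print(tokens_per_crop(256, 16))            # 256
    print(tokens_per_crop(256, 8))             # 1024
```

Halving the patch size from 16 to 8 quadruples the number of tokens per crop, which is why smaller patch sizes generally need more memory at the same batch_size.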

Testing on a DIBCO dataset

To test a trained model on a specific DIBCO dataset, the dataset must match the one specified in the Process Data section; if it does not, run process_dibco.py again. Download the model weights (see the Model Zoo section) or use your own trained weights, then run the following command. This example tests on H-DIBCO 2018 using the Base model with an 8 × 8 patch_size and a batch_size of 16. The binarized images will be saved in the folder ./vis+"YOUR_CONFIGS_HERE"/epoch_testing/

python test.py --data_path /YOUR_DATA_PATH/ --model_weights_path /THE_MODEL_WEIGHTS_PATH/ --batch_size 16 --vit_model_size base --vit_patch_size 8 --split_size 256 --testing_dataset 2018
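Because the model works on fixed-size crops, a full binarized page has to be stitched together from its patches. The sketch below shows one way this could be done. It is an illustration only: the helper merge_patches, the row-major patch order, and the final crop back to the original size are assumptions, and test.py may assemble its output differently.

```python
import numpy as np


def merge_patches(patches, rows, cols, out_height, out_width):
    """Stitch equally sized patches (row-major order) into one image.

    Hypothetical helper: the result is cropped to out_height x out_width
    to drop any padding that was added before splitting.
    """
    if len(patches) != rows * cols:
        raise ValueError("expected rows * cols patches")
    # Join each row of patches horizontally, then stack the rows vertically.
    strips = [
        np.concatenate(patches[r * cols:(r + 1) * cols], axis=1)
        for r in range(rows)
    ]
    full = np.concatenate(strips, axis=0)
    return full[:out_height, :out_width]


if __name__ == "__main__":
    # Six 256 x 256 patches arranged as 3 rows x 2 columns.
    patches = [np.full((256, 256), i, dtype=np.uint8) for i in range(6)]
    page = merge_patches(patches, rows=3, cols=2, out_height=600, out_width=500)
    print(page.shape)  # (600, 500)
```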

Demo

To be added ... (Using our Pretrained Models To Binarize A Single Degraded Image)

Model Zoo

In this section we release the pre-trained weights for all the best DocEnTr model variants trained on DIBCO benchmarks.

Testing data	Model	Patch size	URL	PSNR (dB)
DIBCO 2011	DocEnTr-Base	8x8	model	20.81
DIBCO 2011	DocEnTr-Large	16x16	model	20.62
H-DIBCO 2012	DocEnTr-Base	8x8	model	22.29
H-DIBCO 2012	DocEnTr-Large	16x16	model	22.04
DIBCO 2017	DocEnTr-Base	8x8	model	19.11
DIBCO 2017	DocEnTr-Large	16x16	model	18.85
H-DIBCO 2018	DocEnTr-Base	8x8	model	19.46
H-DIBCO 2018	DocEnTr-Large	16x16	model	19.47
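For readers unfamiliar with the PSNR column, the standard definition is PSNR = 10 · log10(MAX² / MSE), where MSE is the mean squared error between the prediction and the ground truth. The sketch below implements that textbook formula for 8-bit images. It is not the evaluation code used to produce the numbers in the table, and the function name psnr and the max_value default are assumptions.

```python
import numpy as np


def psnr(prediction, ground_truth, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    prediction = np.asarray(prediction, dtype=np.float64)
    ground_truth = np.asarray(ground_truth, dtype=np.float64)
    if prediction.shape != ground_truth.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((prediction - ground_truth) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ground_truth = np.zeros((10, 10), dtype=np.uint8)
    prediction = ground_truth.copy()
    prediction[0, 0] = 255  # one wrong pixel out of 100
    print(round(psnr(prediction, ground_truth), 2))  # 20.0
```

With one fully wrong pixel out of 100, MSE = 255² / 100, so the ratio MAX² / MSE is 100 and the PSNR is 20 dB. Higher values mean the binarized output is closer to the ground truth.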

Citation

If you find this useful for your research, please cite it as follows:

@article{souibgui2022docentr,
  title={DocEnTr: An end-to-end document image enhancement transformer},
  author={ Souibgui, Mohamed Ali and Biswas, Sanket and  Jemni, Sana Khamekhem and Kessentini, Yousri and Forn{\'e}s, Alicia and Llad{\'o}s, Josep and Pal, Umapada},
  journal={arXiv preprint arXiv:2201.10252},
  year={2022}
}

Authors

Conclusion

There should be no bugs in this code, but if there are any, we are sorry for that :') !!

Comments

What is the `masking_ratio`?

Hello! Could you help me understand this part of the code? A masking_ratio variable is created here, but it is not used anywhere else. What was the intended purpose of these lines? Was a different model originally meant here?

https://github.com/dali92002/DocEnTR/blob/2e09b9e4e904802fb83f6b8e8176bdca150c53f9/models/binae.py#L21-L22

opened by theotheo 2
Looking forward to ability to do Demo...

Hi I read your paper on DocEnTR with interest and am looking forward to trying a demo (i.e., Using our Pretrained Models To Binarize A Single Degraded Image)... do you know when that will be available? Thanks!

opened by sjscotti 2
Multi core

Hi, great paper and thank you for sharing the code.

I was able to run test.py after correcting the paths in utils.py, because the root folder of my dataset was different: gt_folder = 'data/DIBCOSETS/'+valid_data+'/gt_imgs'

I tested it on my PC with an NVIDIA GeForce GTX 1660 Ti (6 GB GDDR6 memory). The test folder contained 266 images (255x255), and processing took ~27 s.

The second test was on the CPU, where I could see that only one core was used. Processing took ~12 min (720 s).

Is there a way to run this model prediction on a multi-core CPU and optimize time?

I'm also hoping for the code to "Process a single image or multiple images without GT" (https://github.com/dali92002/DocEnTR/issues/2).

Thanks again

opened by grungert 1
Process a single image or multiple images without GT

Dear Authors,

Your solution seems to be remarkably good and we would like to include it in our tests for a new publication. I believe your solution has the potential to be among the best tested.

If providing a way of running the code for a single image is too much for now, could you modify your dibco test code in a way that it reads several images without ground-truth?

opened by rbbernardino 1
Add Replicate demo and API
Hey @dali92002! 👋

This pull request makes it possible to run your model inside a Docker environment, which makes it easier for other people to run it. We're using an open source tool called Cog to make this process easier.

This also means we can make a web page where other people can run your model! We have added some of the pre-trained models to the demo, view it here: https://replicate.com/cjwbw/docentr

Replicate also has an API, so people can easily run your model from their code:

import replicate
model = replicate.models.get("cjwbw/docentr")
output = model.predict(image="...")

If you'd like to modify the Replicate page (e.g. Example Gallery), let me know and I can transfer ownership to your account.

In case you're wondering who I am, I'm from Replicate, where we're trying to make machine learning reproducible. We got frustrated that we couldn't run all the really interesting ML work being done. So, we're going round implementing models we like. 😊
opened by chenxwh 0
Fix Demo to run on CPU
Fixes demo.ipynb to run on CPU

Fixes demo.ipynb to create output dir if it doesn't exist

Add working python version to README (latest version doesn't work)

Adds another example input/output
opened by kym6464 0

Owner

Mohamed Ali Souibgui

PhD Student in Computer Vision

An embeddable annotation tool for end-to-end cross-document coreference

CoRefi CoRefi is an embeddable web component and stand-alone suite for exhaustive Within Document and Cross Document Coreference Annotation. For a de

39 Dec 12, 2022

DUE: End-to-End Document Understanding Benchmark

This is the repository that provides tools to download data, reproduce the baseline results and evaluation. What can you achieve with this guide Based

21 Dec 29, 2022

This repository is an official implementation of the paper MOTR: End-to-End Multiple-Object Tracking with TRansformer.

MOTR: End-to-End Multiple-Object Tracking with TRansformer This repository is an official implementation of the paper MOTR: End-to-End Multiple-Object

348 Jan 7, 2023

Pytorch library for end-to-end transformer models training and serving

768 Jan 1, 2023

Third party Pytorch implement of Image Processing Transformer (Pre-Trained Image Processing Transformer arXiv:2012.00364v2)

ImageProcessingTransformer Third party Pytorch implement of Image Processing Transformer (Pre-Trained Image Processing Transformer arXiv:2012.00364v2)

61 Jan 1, 2023

The code release of paper Low-Light Image Enhancement with Normalizing Flow

[AAAI 2022] Low-Light Image Enhancement with Normalizing Flow Paper | Project Page Low-Light Image Enhancement with Normalizing Flow Yufei Wang, Renji

176 Jan 6, 2023

VSR-Transformer - This paper proposes a new Transformer for video super-resolution (called VSR-Transformer).

VSR-Transformer By Jiezhang Cao, Yawei Li, Kai Zhang, Luc Van Gool This paper proposes a new Transformer for video super-resolution (called VSR-Transf

225 Nov 13, 2022

Tensorflow implementation of MIRNet for Low-light image enhancement

MIRNet Tensorflow implementation of the MIRNet architecture as proposed by Learning Enriched Features for Real Image Restoration and Enhancement. Lanu

91 Jan 6, 2023

Official implementation for "Low-light Image Enhancement via Breaking Down the Darkness"

Low-light Image Enhancement via Breaking Down the Darkness by Qiming Hu, Xiaojie Guo. 1. Dependencies Python3 PyTorch>=1.0 OpenCV-Python, TensorboardX

30 Jan 1, 2023

PyTorch code for the paper "Complementarity is the King: Multi-modal and Multi-grained Hierarchical Semantic Enhancement Network for Cross-modal Retrieval".

Complementarity is the King: Multi-modal and Multi-grained Hierarchical Semantic Enhancement Network for Cross-modal Retrieval (M2HSE) PyTorch code fo

6 Dec 23, 2022

Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch

Transformer in Transformer Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image c

272 Dec 23, 2022

An end-to-end PyTorch framework for image and video classification

The official implementation of ICCV paper "Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds".

[CVPR'21] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

"SOLQ: Segmenting Objects by Learning Queries", SOLQ is an end-to-end instance segmentation framework with Transformer.

Code & Models for 3DETR - an End-to-end transformer model for 3D object detection

METER: Multimodal End-to-end TransformER

PyTorch implementation of "A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."

Implementation of "A Deep Learning Loss Function based on Auditory Power Compression for Speech Enhancement" by pytorch