DewarpNet

Code for the paper "DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks" (ICCV '19).

Overview

This repository contains the code for DewarpNet training.

Recent Updates

  • [May, 2020] Added evaluation images and an important note about Matlab SSIM.
  • [Dec, 2020] Added OCR evaluation details.

Training

  • Prepare Data: create train.txt & val.txt, where each line names one sample (see the listing-generation sketch after this list). Contents should look like:
1/824_8-cp_Page_0503-7Ns0001
1/824_1-cp_Page_0504-2Cw0001
  • Train Shape Network: python trainwc.py --arch unetnc --data_path ./data/DewarpNet/doc3d/ --batch_size 50 --tboard
  • Train Texture Mapping Network: python trainbm.py --arch dnetccnl --img_rows 128 --img_cols 128 --img_norm --n_epoch 250 --batch_size 50 --l_rate 0.0001 --tboard --data_path ./DewarpNet/doc3d
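For convenience, the train.txt/val.txt listings can be generated directly from the dataset folder. Below is a minimal sketch, assuming the doc3d images live under <data_path>/img/<folder>/<name>.png so that each listed sample is <folder>/<name> (matching the sample lines above); the 90/10 split and the exact paths are illustrative assumptions, not the authors' protocol.

    # Hedged sketch: build train.txt / val.txt from the doc3d image folders.
    # The img/<folder>/<name>.png layout and the 90/10 split are assumptions.
    import os
    import random

    data_path = "./data/DewarpNet/doc3d"        # same --data_path as above
    img_root = os.path.join(data_path, "img")
    samples = []
    for folder in sorted(os.listdir(img_root)):
        for fname in sorted(os.listdir(os.path.join(img_root, folder))):
            if fname.endswith(".png"):
                samples.append(f"{folder}/{os.path.splitext(fname)[0]}")

    random.seed(0)                              # reproducible shuffle
    random.shuffle(samples)
    cut = int(0.9 * len(samples))               # illustrative 90/10 split
    for name, part in (("train.txt", samples[:cut]), ("val.txt", samples[cut:])):
        with open(name, "w") as f:
            f.write("\n".join(part) + "\n")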

Inference

  • Run: python infer.py --wc_model_path ./eval/models/unetnc_doc3d.pkl --bm_model_path ./eval/models/dnetccnl_doc3d.pkl --show
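Conceptually, infer.py chains the two networks: the shape network predicts 3D world coordinates from the warped image, and the texture-mapping network turns those coordinates into a backward map used to resample the input. The sketch below illustrates that flow; the preprocessing, input sizes, and grid scaling are illustrative assumptions, not the repository's exact code.

    # Hedged sketch of the two-stage unwarping pipeline (not the exact infer.py).
    import cv2
    import numpy as np
    import torch
    import torch.nn.functional as F

    def unwarp(img_path, wc_model, bm_model, device="cpu"):
        img = cv2.imread(img_path).astype(np.float32) / 255.0
        inp = cv2.resize(img, (256, 256))                  # assumed wc input size
        inp = torch.from_numpy(inp.transpose(2, 0, 1))[None].to(device)

        with torch.no_grad():
            wc = wc_model(inp)                             # 3D world coordinates
            wc = F.interpolate(wc, size=(128, 128))        # bm net runs at 128x128
            bm = bm_model(wc)                              # backward (texture) map
            grid = bm.permute(0, 2, 3, 1)                  # NHWC grid, assumed in [-1, 1]
            full = torch.from_numpy(img.transpose(2, 0, 1))[None].to(device)
            out = F.grid_sample(full, grid, align_corners=True)

        return (out[0].permute(1, 2, 0).cpu().numpy() * 255).astype(np.uint8)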

Evaluation (Image Metrics)

  • We use the same evaluation code as DocUNet. To reproduce the quantitative results reported in the paper, use the images available here.

  • [Important note about the Matlab version] We noticed that Matlab 2020a uses a different SSIM implementation, which gives a better MS-SSIM score (0.5623); we used Matlab 2018b. Please compare scores according to your Matlab version.
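The paper's numbers come from the Matlab/DocUNet evaluation code above, and there is no drop-in Python equivalent. If you only need a quick sanity check before running the Matlab pipeline, a rough Python approximation using the pytorch-msssim package looks like the sketch below; it will not exactly match either Matlab version's scores.

    # Hedged sketch: approximate MS-SSIM check in Python (pip install pytorch-msssim).
    # NOT the Matlab evaluation used in the paper; scores will differ.
    import cv2
    import torch
    from pytorch_msssim import ms_ssim

    def quick_msssim(pred_path, gt_path):
        a = cv2.imread(pred_path, cv2.IMREAD_GRAYSCALE)
        b = cv2.imread(gt_path, cv2.IMREAD_GRAYSCALE)
        a = cv2.resize(a, (b.shape[1], b.shape[0]))        # align image sizes first
        ta = torch.from_numpy(a)[None, None].float()       # (N, C, H, W)
        tb = torch.from_numpy(b)[None, None].float()
        return ms_ssim(ta, tb, data_range=255).item()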

Evaluation (OCR Metrics)

  • The 25 images used for OCR evaluation are listed in /eval/ocr_eval/ocr_files.txt
  • The corresponding ground-truth text is given in /eval/ocr_eval/tess_gt.json
  • For the OCR errors reported in the paper, we used cv2.blur as pre-processing, which gives a higher error in all cases. For convenience, we provide the updated numbers (without blur) in the following table:
Method            | ED      | CER            | ED (no blur) | CER (no blur)
------------------|---------|----------------|--------------|---------------
DocUNet           | 1975.86 | 0.4656 (0.263) | 1671.80      | 0.403 (0.256)
DocUNet on Doc3D  | 1684.34 | 0.3955 (0.272) | 1296.00      | 0.294 (0.235)
DewarpNet         | 1288.60 | 0.3136 (0.248) | 1007.28      | 0.249 (0.236)
DewarpNet (ref)   | 1114.40 | 0.2692 (0.234) | 812.48       | 0.204 (0.228)
  • We used Tesseract (v4.1.0) with its default configuration, via PyTesseract (v0.2.6), for the evaluation.
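For reference, the OCR evaluation amounts to running PyTesseract on each rectified image and scoring the output against the ground truth. A minimal sketch follows, assuming tess_gt.json maps image names to ground-truth strings and that CER is edit distance divided by reference length; the rectified/ directory and the editdistance package are illustrative choices, not the authors' exact script.

    # Hedged sketch of the OCR evaluation: Tesseract (default config) via
    # PyTesseract, scored with edit distance (ED) and character error rate (CER).
    import json
    import editdistance            # pip install editdistance (illustrative choice)
    import pytesseract
    from PIL import Image

    with open("eval/ocr_eval/tess_gt.json") as f:
        gt = json.load(f)          # assumed: image name -> ground-truth text

    eds, cers = [], []
    for name, ref in gt.items():
        hyp = pytesseract.image_to_string(Image.open(f"rectified/{name}"))
        ed = editdistance.eval(hyp, ref)
        eds.append(ed)
        cers.append(ed / max(len(ref), 1))     # assumed CER = ED / len(reference)

    print(f"mean ED: {sum(eds)/len(eds):.2f}, mean CER: {sum(cers)/len(cers):.4f}")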

Models

  • Pre-trained models are available here. These models were saved prior to end-to-end training, so they won't reproduce the end-to-end results reported in Table 2 of the paper. Use the images provided above to get the exact Table 2 numbers.

Dataset

  • The doc3D dataset can be downloaded using the scripts here.

Citation

If you use the dataset or this code, please consider citing our work:

@inproceedings{SagnikKeICCV2019,
  author    = {Sagnik Das* and Ke Ma* and Zhixin Shu and Dimitris Samaras and Roy Shilkrot},
  booktitle = {Proceedings of the International Conference on Computer Vision},
  title     = {DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks},
  year      = {2019}
}

Comments
  • Joint training phase

    Thanks for the wonderful work! I've done the separate training of the two sub-networks (depth prediction + texture mapping) with the given data, and the results look good. But there's no code for the joint training described in the original paper (Section 4.2). Is it possible to add the loss function to infer.py?

    opened by RunshengZhu 22
  • No augtexnames.txt file

    Hi! For training the wc network, the flag augmentations=True is passed in the dataloader: https://github.com/cvlab-stonybrook/DewarpNet/blob/bdf01e613792f940c6ec45c4e3ebc5789b1703c6/trainwc.py#L40

    So there is an augtexnames.txt file referenced here: https://github.com/cvlab-stonybrook/DewarpNet/blob/bdf01e613792f940c6ec45c4e3ebc5789b1703c6/loaders/doc3dwc_loader.py#L41

    I couldn't find it in the repository. What should this file look like?

    Thanks!

    opened by Enuvesta 14
  • The correct loss combination for bm training

    Hi! I have been using this repo for months and have trained many models for bm training, but none of them are usable. I noticed that in the bm model's loss there is a commented-out section for an SSIM loss. Should we add this to bm training? What is the correct setup for the bm training loss? I tried 10*l1loss + 0.5*rloss but the performance is terrible.

    opened by sdsajk 10
  • dense block

    Hello, I'm confused by the implementation of the dense block: https://github.com/cvlab-stonybrook/DewarpNet/blob/f91bdf6248789e58c836cee828493d962396ec0c/models/densenetccnl.py#L62-L66
    1. When i > 0, the layer is never used.
    2. Did you mean to use sum instead of torch.cat?

    opened by SugarMasuo 6
  • Where is a Refinement Network?

    Hi, I'm a student studying your project, DewarpNet! First of all, thank you for sharing this awesome project :)

    I read the paper, and the refinement network that adjusts for illumination effects in the rectified image is clearly described there. But I can't find this network in the code.

    Is there any training code or a pretrained model?

    opened by mini102 4
  • Alignment of the bm network

    I implemented the two-phase training schedule using the provided doc3D dataset (~100k samples). However, the output of infer.py is not correct, and it seems the networks are not properly trained.

    To pinpoint the problem, I downloaded the provided official pre-trained models. With the official wc network + the official bm network, the results are pretty good. With my trained wc network + the official bm network, the results look fine. With my bm network + the official wc network, the results are not correct, so the bm network is the problem.

    The bm training loss seems fine (with lr = 1e-4, the train and val MSE losses decrease continuously and reach about 5e-5 by epoch 50). I tried the solutions mentioned in other issues, such as the PyTorch version (1.4 and 0.4.1), but they do not help.

    The only modification is in doc3dbmnoimgc_loader.py, line 29: self.altroot = "xxx/swat3d". Since I cannot find this folder in the project, I changed it to self.root according to my understanding.

    Would you please point out other possible reasons?

    Looking forward to your reply, thanks!

    opened by Inosonnia 4
  • Model performance

    Hello, thanks for sharing your code.

    I was able to run the model, but it doesn't seem to perform as well as illustrated in the paper.

    First, I tried the provided test images under /eval/inp, running infer.py as suggested by the README; the output doesn't look as good as the results in /eval/uw.

    Second, I tested inference on a few photos of an article angled against a black background, with no other objects in the photo. However, the output is either highly twisted or not fully angle-corrected.

    Other info: since no requirements info is provided, I'm running with pytorch==1.3, CPU only; I also replaced the scipy.misc.imread function with PIL and fixed the dtypes.

    Thank you, Julian

    opened by julianyulu 4
  • question about joint training

    Hi, I find that the output world coordinates are resized from (256, 256) to (128, 128). So for joint learning, at which size should the supervision be applied, (256, 256) or (128, 128)?

    Thank you!

    opened by fh2019ustc 3
  • about data augmentation

    Hi, thanks for your released code. I have a question:

    For the shape network, are the intensity and color of each training image also randomly jittered? In the code:

    1. For the function color_jitter(im, brightness=0, contrast=0, saturation=0, hue=0), should we use the parameters saturation=0, hue=0?
    2. Should we use the function change_intensity()? I see you have commented out this code.

    Which of these settings did you use for your released model?

    opened by fh2019ustc 3
  • Release the rectified images of the refinement network

    Thanks for your solid work!

    Could you provide the rectified images from the refinement network, or its pre-trained model? We want to compare against the results of our implementation.

    Thank you!

    opened by fh2019ustc 2
  • I need a requirement list please

    Please add a requirements list. An error appears when I run infer.py: the 'scipy.misc' module doesn't have 'imread' (among other errors), and I think the reason may be a version mismatch. Thanks!

    opened by EinKung 2
  • Permission to use as a package

    Hi, I'm working on a project that extracts text from images with an OCR engine. Your source code is great and can increase accuracy, and if I use it I will of course cite your work. But how can I use it as a package? For example, can I install it with pip install ...? Best regards,

    opened by kobrafarshidi 0
  • Blender camera details

    Hi @sagniklp,

    I was wondering if you have any details about the camera parameters/model used when rendering doc3D in Blender. Are they constant throughout the dataset, or do the camera parameters (other than position & orientation) change? Do you know the position of the camera in the scene relative to the meshes for each doc3D sample?

    opened by FloorVerhoeven 0
  • Can't reproduce the results with the final models

    Hello! Thanks for your great work! I used the final models you provided and ran infer.py to dewarp the DocUNet images, but got MS-SSIM: 0.449739, LD: 9.131016 with MATLAB R2018b and pytorch 0.4.1.

    opened by runxi607 0
  • Download Doc3D dataset

    I completed the form a few days ago; could you please send the username and password to [email protected]? Thanks so much for your work!

    opened by KingRicardo 0
Owner
CVLab@StonyBrook (Computer Vision Lab at Stony Brook University)