DewarpNet

Code for the paper "DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks" (ICCV '19).

Overview

This repository contains the code for DewarpNet training.

Recent Updates

  • [May, 2020] Added evaluation images and an important note about Matlab SSIM.
  • [Dec, 2020] Added OCR evaluation details.

Training

  • Prepare Data: create train.txt & val.txt, where each line names one sample (see the listing-generation sketch after this list). Contents should look like:
1/824_8-cp_Page_0503-7Ns0001
1/824_1-cp_Page_0504-2Cw0001
  • Train Shape Network: python trainwc.py --arch unetnc --data_path ./data/DewarpNet/doc3d/ --batch_size 50 --tboard
  • Train Texture Mapping Network: python trainbm.py --arch dnetccnl --img_rows 128 --img_cols 128 --img_norm --n_epoch 250 --batch_size 50 --l_rate 0.0001 --tboard --data_path ./DewarpNet/doc3d
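For convenience, the train.txt/val.txt listings can be generated directly from the dataset folder. Below is a minimal sketch, assuming the doc3d images live under <data_path>/img/<folder>/<name>.png so that each listed sample is <folder>/<name> (matching the sample lines above); the 90/10 split and the exact paths are illustrative assumptions, not the authors' protocol.

    # Hedged sketch: build train.txt / val.txt from the doc3d image folders.
    # The img/<folder>/<name>.png layout and the 90/10 split are assumptions.
    import os
    import random

    data_path = "./data/DewarpNet/doc3d"        # same --data_path as above
    img_root = os.path.join(data_path, "img")
    samples = []
    for folder in sorted(os.listdir(img_root)):
        for fname in sorted(os.listdir(os.path.join(img_root, folder))):
            if fname.endswith(".png"):
                samples.append(f"{folder}/{os.path.splitext(fname)[0]}")

    random.seed(0)                              # reproducible shuffle
    random.shuffle(samples)
    cut = int(0.9 * len(samples))               # illustrative 90/10 split
    for name, part in (("train.txt", samples[:cut]), ("val.txt", samples[cut:])):
        with open(name, "w") as f:
            f.write("\n".join(part) + "\n")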

Inference

  • Run: python infer.py --wc_model_path ./eval/models/unetnc_doc3d.pkl --bm_model_path ./eval/models/dnetccnl_doc3d.pkl --show
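Conceptually, infer.py chains the two networks: the shape network predicts 3D world coordinates from the warped image, and the texture-mapping network turns those coordinates into a backward map used to resample the input. The sketch below illustrates that flow; the preprocessing, input sizes, and grid scaling are illustrative assumptions, not the repository's exact code.

    # Hedged sketch of the two-stage unwarping pipeline (not the exact infer.py).
    import cv2
    import numpy as np
    import torch
    import torch.nn.functional as F

    def unwarp(img_path, wc_model, bm_model, device="cpu"):
        img = cv2.imread(img_path).astype(np.float32) / 255.0
        inp = cv2.resize(img, (256, 256))                  # assumed wc input size
        inp = torch.from_numpy(inp.transpose(2, 0, 1))[None].to(device)

        with torch.no_grad():
            wc = wc_model(inp)                             # 3D world coordinates
            wc = F.interpolate(wc, size=(128, 128))        # bm net runs at 128x128
            bm = bm_model(wc)                              # backward (texture) map
            grid = bm.permute(0, 2, 3, 1)                  # NHWC grid, assumed in [-1, 1]
            full = torch.from_numpy(img.transpose(2, 0, 1))[None].to(device)
            out = F.grid_sample(full, grid, align_corners=True)

        return (out[0].permute(1, 2, 0).cpu().numpy() * 255).astype(np.uint8)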

Evaluation (Image Metrics)

  • We use the same evaluation code as DocUNet. To reproduce the quantitative results reported in the paper, use the images available here.

  • [Important note about the Matlab version] We noticed that Matlab 2020a uses a different SSIM implementation, which gives a better MS-SSIM score (0.5623); we used Matlab 2018b. Please compare scores according to your Matlab version.
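The paper's numbers come from the Matlab/DocUNet evaluation code above, and there is no drop-in Python equivalent. If you only need a quick sanity check before running the Matlab pipeline, a rough Python approximation using the pytorch-msssim package looks like the sketch below; it will not exactly match either Matlab version's scores.

    # Hedged sketch: approximate MS-SSIM check in Python (pip install pytorch-msssim).
    # NOT the Matlab evaluation used in the paper; scores will differ.
    import cv2
    import torch
    from pytorch_msssim import ms_ssim

    def quick_msssim(pred_path, gt_path):
        a = cv2.imread(pred_path, cv2.IMREAD_GRAYSCALE)
        b = cv2.imread(gt_path, cv2.IMREAD_GRAYSCALE)
        a = cv2.resize(a, (b.shape[1], b.shape[0]))        # align image sizes first
        ta = torch.from_numpy(a)[None, None].float()       # (N, C, H, W)
        tb = torch.from_numpy(b)[None, None].float()
        return ms_ssim(ta, tb, data_range=255).item()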

Evaluation (OCR Metrics)

  • The 25 images used for OCR evaluation are listed in /eval/ocr_eval/ocr_files.txt
  • The corresponding ground-truth text is given in /eval/ocr_eval/tess_gt.json
  • For the OCR errors reported in the paper, we used cv2.blur as pre-processing, which gives a higher error in all cases. For convenience, we provide the updated numbers (without blur) in the following table:
Method            | ED      | CER            | ED (no blur) | CER (no blur)
------------------|---------|----------------|--------------|---------------
DocUNet           | 1975.86 | 0.4656 (0.263) | 1671.80      | 0.403 (0.256)
DocUNet on Doc3D  | 1684.34 | 0.3955 (0.272) | 1296.00      | 0.294 (0.235)
DewarpNet         | 1288.60 | 0.3136 (0.248) | 1007.28      | 0.249 (0.236)
DewarpNet (ref)   | 1114.40 | 0.2692 (0.234) | 812.48       | 0.204 (0.228)
  • We used Tesseract (v4.1.0) with its default configuration, via PyTesseract (v0.2.6), for the evaluation.
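For reference, the OCR evaluation amounts to running PyTesseract on each rectified image and scoring the output against the ground truth. A minimal sketch follows, assuming tess_gt.json maps image names to ground-truth strings and that CER is edit distance divided by reference length; the rectified/ directory and the editdistance package are illustrative choices, not the authors' exact script.

    # Hedged sketch of the OCR evaluation: Tesseract (default config) via
    # PyTesseract, scored with edit distance (ED) and character error rate (CER).
    import json
    import editdistance            # pip install editdistance (illustrative choice)
    import pytesseract
    from PIL import Image

    with open("eval/ocr_eval/tess_gt.json") as f:
        gt = json.load(f)          # assumed: image name -> ground-truth text

    eds, cers = [], []
    for name, ref in gt.items():
        hyp = pytesseract.image_to_string(Image.open(f"rectified/{name}"))
        ed = editdistance.eval(hyp, ref)
        eds.append(ed)
        cers.append(ed / max(len(ref), 1))     # assumed CER = ED / len(reference)

    print(f"mean ED: {sum(eds)/len(eds):.2f}, mean CER: {sum(cers)/len(cers):.4f}")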

Models

  • Pre-trained models are available here. These models were saved prior to end-to-end training, so they won't reproduce the end-to-end results reported in Table 2 of the paper. Use the images provided above to get the exact Table 2 numbers.

Dataset

  • The doc3D dataset can be downloaded using the scripts here.

Citation

If you use the dataset or this code, please consider citing our work:

@inproceedings{SagnikKeICCV2019,
  author    = {Sagnik Das* and Ke Ma* and Zhixin Shu and Dimitris Samaras and Roy Shilkrot},
  booktitle = {Proceedings of the International Conference on Computer Vision},
  title     = {DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks},
  year      = {2019}
}

Comments
  • Joint training phase

    Thanks for the wonderful work! I've done the separate training of the two sub-networks (depth prediction + texture mapping) with the given data, and the results look good. But there's no code for the joint training described in the original paper (Section 4.2). Is it possible to add the loss function to infer.py?

    opened by RunshengZhu 22
  • No augtexnames.txt file

    Hi! For training the wc network, the flag augmentations=True is passed in the dataloader: https://github.com/cvlab-stonybrook/DewarpNet/blob/bdf01e613792f940c6ec45c4e3ebc5789b1703c6/trainwc.py#L40

    So there is an augtexnames.txt file referenced here: https://github.com/cvlab-stonybrook/DewarpNet/blob/bdf01e613792f940c6ec45c4e3ebc5789b1703c6/loaders/doc3dwc_loader.py#L41

    I couldn't find it in the repository. What should this file look like?

    Thanks!

    opened by Enuvesta 14
  • The correct loss combination for bm training

    Hi! I have been using this repo for months and have trained many models for bm training, but none of them are usable. I noticed that in the bm model's loss there is a commented-out section for an SSIM loss. Should we add this to bm training? What is the correct setup for the bm training loss? I tried 10*l1loss + 0.5*rloss but the performance is terrible.

    opened by sdsajk 10
  • dense block

    Hello, I'm confused by the implementation of the dense block: https://github.com/cvlab-stonybrook/DewarpNet/blob/f91bdf6248789e58c836cee828493d962396ec0c/models/densenetccnl.py#L62-L66
    1. When i > 0, the layer is never used.
    2. Did you mean to use sum instead of torch.cat?

    opened by SugarMasuo 6
  • Where is a Refinement Network?

    Hi, I'm a student studying your project, DewarpNet! First of all, thank you for sharing this awesome project :)

    I read the paper, and the refinement network that adjusts for illumination effects in the rectified image is clearly described there. But I can't find this network in the code.

    Is there any training code or a pretrained model?

    opened by mini102 4
  • Alignment of the bm network

    I implemented the two-phase training schedule using the provided doc3D dataset (~100k samples). However, the output of infer.py is not correct, and it seems the networks are not properly trained.

    To pinpoint the problem, I downloaded the provided official pre-trained models. With the official wc network + the official bm network, the results are pretty good. With my trained wc network + the official bm network, the results look fine. With my bm network + the official wc network, the results are not correct, so the bm network is the problem.

    The bm training loss seems fine (with lr = 1e-4, the train and val MSE losses decrease continuously and reach about 5e-5 by epoch 50). I tried the solutions mentioned in other issues, such as the PyTorch version (1.4 and 0.4.1), but they do not help.

    The only modification is in doc3dbmnoimgc_loader.py, line 29: self.altroot = "xxx/swat3d". Since I cannot find this folder in the project, I changed it to self.root according to my understanding.

    Would you please point out other possible reasons?

    Looking forward to your reply, thanks!

    opened by Inosonnia 4
  • Model performance

    Hello, thanks for sharing your code.

    I was able to run the model, but it doesn't seem to perform as well as illustrated in the paper.

    First, I tried the provided test images under /eval/inp, running infer.py as suggested by the README; the output doesn't look as good as the results in /eval/uw.

    Second, I tested inference on a few photos of an article angled against a black background, with no other objects in the photo. However, the output is either highly twisted or not fully angle-corrected.

    Other info: since no requirements info is provided, I'm running with pytorch==1.3, CPU only; I also replaced the scipy.misc.imread function with PIL and fixed the dtypes.

    Thank you, Julian

    opened by julianyulu 4
  • question about joint training

    Hi, I find that the output world coordinates are resized from (256, 256) to (128, 128). So for joint learning, at which size should the supervision be applied, (256, 256) or (128, 128)?

    Thank you!

    opened by fh2019ustc 3
  • about data augmentation

    Hi, thanks for your released code. I have a question:

    For the shape network, are the intensity and color of each training image also randomly jittered? In the code:

    1. For the function color_jitter(im, brightness=0, contrast=0, saturation=0, hue=0), should we use the parameters saturation=0, hue=0?
    2. Should we use the function change_intensity()? I see you have commented out this code.

    Which of these settings did you use for your released model?

    opened by fh2019ustc 3
  • Release the rectified images of the refinement network

    Thanks for your solid work!

    Could you provide the rectified images from the refinement network, or its pre-trained model? We want to compare against the results of our implementation.

    Thank you!

    opened by fh2019ustc 2
  • I need a requirement list please

    Please add a requirements list. An error appears when I run infer.py: the 'scipy.misc' module doesn't have 'imread' (among other errors), and I think the reason may be a version mismatch. Thanks!

    opened by EinKung 2
  • Permission to use as a package

    Hi, I'm working on a project that extracts text from images with an OCR engine. Your source code is great and can increase accuracy, and if I use it I will of course cite your work. But how can I use it as a package? For example, can I install it with pip install ...? Best regards,

    opened by kobrafarshidi 0
  • Blender camera details

    Hi @sagniklp,

    I was wondering if you have any details about the camera parameters/model used when rendering doc3D in Blender. Are they constant throughout the dataset, or do the camera parameters (other than position & orientation) change? Do you know the position of the camera in the scene relative to the meshes for each doc3D sample?

    opened by FloorVerhoeven 0
  • Can't reproduce the results with the final models

    Hello! Thanks for your great work! I used the final models you provided and ran infer.py to dewarp the DocUNet images, but got MS-SSIM: 0.449739, LD: 9.131016 with MATLAB R2018b and pytorch 0.4.1.

    opened by runxi607 0
  • Download Doc3D dataset

    I completed the form a few days ago; could you please send the username and password to [email protected]? Thanks so much for your work!

    opened by KingRicardo 0
Owner
CVLab@StonyBrook (Computer Vision Lab at Stony Brook University)