This is the implementation of the paper "Gated Recurrent Convolution Neural Network for OCR"

Last update: Dec 22, 2022

Related tags

Computer Vision GRCNN-for-OCR

Overview

Gated Recurrent Convolution Neural Network for OCR

This project is an implementation of the GRCNN for OCR. For details, please refer to the paper: https://papers.nips.cc/paper/6637-gated-recurrent-convolution-neural-network-for-ocr.pdf

Update

The journal version of GRCNN has been accepted by T-PAMI 2021, and the code is available at:

https://github.com/Jianf-Wang/GRCNN

Build

The GRCNN is built upon the CRNN. The requirements are:

Ubuntu 14.04
CUDA 7.5
CUDNN 5

For the convenience of compiling, we provide the dependencies from here: https://pan.baidu.com/s/1c21zl1e#list/path=%2F

It is more convenient if you use nivdia-docker image (@rremani supplied) : https://hub.docker.com/r/rremani/cuda_crnn_torch/

After installing the dependencies, go to src/ and execute build_cpp.sh to build the C++ code. If successful, a file named libcrnn.so should be produced in the src/ directory.

Inference

We provide the pretrained model from here. Put the downloaded model file into directory model/GRCL/. Moreover, we provide the IC03 dataset in the "./data/IC03" directory. You need to change the directories listed in the "test.txt". The "test_label.txt" is the ground truth of each image. The "lexicon_50.txt" is the lexicon of IC03.

"src/evaluation.lua": Lexicon-free evaluation

"src/evaluation_lex.lua" Lexicon-based evaluation

The evaluation code will output the recognition accuracy.

Train a new model

Follow the following steps to train a new model on your own dataset.

Create a new LMDB dataset.src/create_own_dataset.py(need to pip install lmdb first).
You can modify the configuration in model/GRCL/GRCL_LSTM_pretrain.lua
Go to src/ and execute th main_train.lua ../model/GRCL/ ../model/saved_model. Model snapshots will be saved into ../model/saved_model.

Visualization

We visualize the RCNN , DenseNet and GRCNN to verify the dynamic receptive fields in GRCNN for OCR. There are clearly gaps among different characters, and for each character, the unrelated parts do not provide strong signal.

Citation

@inproceedings{jianfeng2017deep,
 author    = {Wang, Jianfeng and Hu, Xiaolin},
 title     = {Gated Recurrent Convolution Neural Network for OCR},
 booktitle = {Advances in Neural Information Processing Systems},
 year      = {2017}
}

You might also like...

This is the open source implementation of the ICLR2022 paper "StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis"

StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image

840 Dec 26, 2022

Fast image augmentation library and easy to use wrapper around other libraries. Documentation: https://albumentations.ai/docs/ Paper about library: https://www.mdpi.com/2078-2489/11/2/125

Albumentations Albumentations is a Python library for image augmentation. Image augmentation is used in deep learning and computer vision tasks to inc

11.4k Jan 2, 2023

Comments

Pretrained model link is broken

https://pan.baidu.com/s/1c21zl1e#list/path=%2F Link can't be reached.

"The page you are trying to view cannot be shown because the authenticity of the received data could not be verified."

opened by dgsbicak 0

This is the implementation of the paper "Gated Recurrent Convolution Neural Network for OCR"

Related tags

Overview

Gated Recurrent Convolution Neural Network for OCR

Update

Build

Inference

Train a new model

Visualization

Citation

You might also like...

This is the open source implementation of the ICLR2022 paper "StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis"

Fast image augmentation library and easy to use wrapper around other libraries. Documentation: https://albumentations.ai/docs/ Paper about library: https://www.mdpi.com/2078-2489/11/2/125

A post-processing tool for scanned sheets of paper.

Code for the paper "DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks" (ICCV '19)

Code for the paper STN-OCR: A single Neural Network for Text Detection and Text Recognition

Code for AAAI 2021 paper: Sequential End-to-end Network for Efficient Person Search

Code related to "Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity" paper

Code for CVPR2021 paper "Learning Salient Boundary Feature for Anchor-free Temporal Action Localization"

This is the code for our paper DAAIN: Detection of Anomalous and AdversarialInput using Normalizing Flows

Comments

Pretrained model link is broken

Owner

An Implementation of the alogrithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection

A PyTorch implementation of ECCV2018 Paper: TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

Implementation of our paper 'PixelLink: Detecting Scene Text via Instance Segmentation' in AAAI2018

An Implementation of the seglink alogrithm in paper Detecting Oriented Text in Natural Images by Linking Segments

The project is an official implementation of our paper "3D Human Pose Estimation with Spatial and Temporal Transformers".

This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).

Official implementation of "An Image is Worth 16x16 Words, What is a Video Worth?" (2021 paper)

CVPR 2021 Oral paper "LED2-Net: Monocular 360˚ Layout Estimation via Differentiable Depth Rendering" official PyTorch implementation.

An unofficial implementation of the paper "AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss".

An official PyTorch implementation of the paper "Learning by Aligning: Visible-Infrared Person Re-identification using Cross-Modal Correspondences", ICCV 2021.