Introduction

GeneralOCR is an open source Optical Character Recognition package based on PyTorch. It aims to be a reliable and useful tool for implementing SOTA models in the OCR domain. You can use these models for inference and train them on your own customized dataset. The solution architecture of this project is re-implemented from facebook Detectron and openmm-cv.

Installation

Refer to the gen_ocr installation guideline.

Inference

Configuration

Model text detection

Supported Algorithms:

Text Detection

Algorithm	Paper	Python argument (--det)
- [x] DBNet (AAAI'2020)	https://arxiv.org/pdf/1911.08947	DB_r18, DB_r50
- [x] Mask R-CNN (ICCV'2017)	https://arxiv.org/abs/1703.06870	MaskRCNN_CTW, MaskRCNN_IC15, MaskRCNN_IC17
- [x] PANet (ICCV'2019)	https://arxiv.org/abs/1908.06391	PANet_CTW, PANet_IC15
- [x] PSENet (CVPR'2019)	https://arxiv.org/abs/1903.12473	PS_CTW, PS_IC15
- [x] TextSnake (ECCV'2018)	https://arxiv.org/abs/1807.01544	TextSnake
- [x] DRRG (CVPR'2020)	https://arxiv.org/abs/2003.07493	DRRG
- [x] FCENet (CVPR'2021)	https://arxiv.org/abs/2104.10442	FCE_IC15, FCE_CTW_DCNv2

Table 1: Text detection algorithms, papers, and the corresponding --det argument values in the package.

Model text recognition

Text Recognition

Algorithm	Paper	Python argument (--recog)
- [x] CRNN (TPAMI'2016)	https://arxiv.org/abs/1507.05717	CRNN, CRNN_TPS
- [x] NRTR (ICDAR'2019)	https://arxiv.org/abs/1806.00926	NRTR_1/8-1/4, NRTR_1/16-1/8
- [x] RobustScanner (ECCV'2020)	https://arxiv.org/abs/2007.07542	RobustScanner
- [x] SAR (AAAI'2019)	https://arxiv.org/abs/1811.00751	SAR
- [x] SATRN (CVPR'2020 Workshop on Text and Documents in the Deep Learning Era)	https://arxiv.org/abs/1910.04396	SATRN, SATRN_sm
- [x] SegOCR (Manuscript'2021)	-	SEG

Table 2: Text recognition algorithms, papers, and the corresponding --recog argument values in the package.

Inference

# Activate your conda environment
conda activate gen_ocr
python general_ocr/utils/ocr.py demo/demo_text_ocr_2.jpg --print-result --imshow --det TextSnake --recog SEG

Valid values for the --det and --recog arguments are listed in Table 1 and Table 2.
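If you launch inference from a script, it can help to validate the model names before calling ocr.py. The following is a minimal sketch, not part of the package: build_ocr_command is a hypothetical helper, the accepted values are copied from Table 1 and Table 2, and the flags are the ones used in the command above.

```python
import sys

# Values copied from Table 1 and Table 2 of this README.
DET_MODELS = {
    "DB_r18", "DB_r50", "MaskRCNN_CTW", "MaskRCNN_IC15", "MaskRCNN_IC17",
    "PANet_CTW", "PANet_IC15", "PS_CTW", "PS_IC15", "TextSnake", "DRRG",
    "FCE_IC15", "FCE_CTW_DCNv2",
}
RECOG_MODELS = {
    "CRNN", "CRNN_TPS", "NRTR_1/8-1/4", "NRTR_1/16-1/8", "RobustScanner",
    "SAR", "SATRN", "SATRN_sm", "SEG",
}


def build_ocr_command(image, det, recog, script="general_ocr/utils/ocr.py"):
    """Hypothetical helper: return the argv list for the inference script."""
    if det not in DET_MODELS:
        raise ValueError(f"unknown --det value: {det!r}")
    if recog not in RECOG_MODELS:
        raise ValueError(f"unknown --recog value: {recog!r}")
    return [sys.executable, script, image,
            "--print-result", "--imshow", "--det", det, "--recog", recog]


cmd = build_ocr_command("demo/demo_text_ocr_2.jpg", "TextSnake", "SEG")
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

A misspelled model name then fails immediately with a ValueError instead of failing inside the OCR script.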

The result is shown below:

Training

Training with toy dataset

We provide toy datasets in the /tests/data folder so that you can run experiments before training on the official datasets.

python tools/train.py configs/textrecog/robust_scanner/seg_r31_1by16_fpnocr_toy_dataset.py --work-dir seg

To switch the text recognition algorithm to SAR:

python tools/train.py configs/textrecog/sar/sar_r31_parallel_decoder_toy_dataset.py --work-dir sar

Training with Academic dataset

When you train on an Academic dataset, you need to set up the dataset directory as described in this guideline. The main point to focus on is that your model config points to the right dataset directory. For example, to train the TextSnake model on the CTW1500 dataset, the config file configs/textdet/textsnake/textsnake_r50_fpn_unet_1200e_ctw1500.py should look as follows:

dataset_type = 'IcdarDataset'
data_root = 'data/ctw1500/'


data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
    val_dataloader=dict(samples_per_gpu=1),
    test_dataloader=dict(samples_per_gpu=1),
    train=dict(
        type=dataset_type,
        ann_file=f'{data_root}/instances_training.json',
        img_prefix=f'{data_root}/imgs',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=f'{data_root}/instances_test.json',
        img_prefix=f'{data_root}/imgs',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=f'{data_root}/instances_test.json',
        img_prefix=f'{data_root}/imgs',
        pipeline=test_pipeline))

Make sure that your data_root folder data/ctw1500/ is correct. Afterward, train your model:

python ./tools/train.py configs/textdet/textsnake/textsnake_r50_fpn_unet_1200e_ctw1500.py --work-dir textsnake
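Before launching a long training run, you may want to check that the paths referenced by the config exist. The sketch below is not part of the package; missing_dataset_paths is a hypothetical helper, and the expected file names are simply the ann_file and img_prefix values from the config shown above.

```python
from pathlib import Path


def missing_dataset_paths(data_root):
    """Return the expected paths that do not exist under data_root."""
    root = Path(data_root)
    expected = [
        root / "instances_training.json",  # ann_file of the train split
        root / "instances_test.json",      # ann_file of the val/test splits
        root / "imgs",                     # img_prefix of every split
    ]
    return [str(p) for p in expected if not p.exists()]


missing = missing_dataset_paths("data/ctw1500/")
if missing:
    print("Missing dataset paths:", ", ".join(missing))
else:
    print("Dataset layout looks consistent with the config.")
```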

See the training documentation to study other configuration parameters.

Testing

Now that you have completed training TextSnake and obtained the checkpoint textsnake/latest.pth, you should evaluate its performance on the test set using the hmean-iou metric:

python tools/test.py configs/textdet/textsnake/textsnake_r50_fpn_unet_1200e_ctw1500.py textsnake/latest.pth --eval hmean-iou
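As commonly defined in ICDAR-style evaluation, hmean is the harmonic mean of precision and recall, where a predicted region counts as a match when its IoU with a ground-truth region passes a threshold. The sketch below only illustrates the final arithmetic and assumes the matching has already been done; the function name and the example counts are made up for illustration and are not the package's implementation.

```python
def hmean(num_matched, num_gt, num_det):
    """Return (precision, recall, hmean) computed from match counts."""
    recall = num_matched / num_gt if num_gt else 0.0
    precision = num_matched / num_det if num_det else 0.0
    denom = precision + recall
    score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, score


# Illustrative counts: 8 of 10 ground-truth regions matched by 9 predictions.
precision, recall, score = hmean(num_matched=8, num_gt=10, num_det=9)
print(f"precision={precision:.3f} recall={recall:.3f} hmean={score:.3f}")
```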

Model details

Design pattern: Each type of object, such as dataset, encoder, decoder, backbone, layer, model, and loss, is registered according to its group and scope context, an approach that comes from facebook-detectron.
Layers: The layer architectures and pretrained models are mainly taken from open-mmcv.
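The registration pattern described above can be sketched in a few lines. This is an illustrative toy, not the project's actual code: the Registry class, the BACKBONES group, and ToyBackbone are all hypothetical names. It shows how a config dict with a type key, like the dataset config in the training section, can be turned into an object.

```python
class Registry:
    """Minimal name -> class registry (illustrative, not the project's code)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, cls):
        # Used as a decorator: store the class under its own name.
        if cls.__name__ in self._modules:
            raise KeyError(f"{cls.__name__} already registered in {self.name}")
        self._modules[cls.__name__] = cls
        return cls

    def build(self, cfg):
        # cfg is a dict whose 'type' key names the class; the rest are kwargs.
        args = dict(cfg)
        cls = self._modules[args.pop("type")]
        return cls(**args)


BACKBONES = Registry("backbone")


@BACKBONES.register_module
class ToyBackbone:
    def __init__(self, depth):
        self.depth = depth


backbone = BACKBONES.build(dict(type="ToyBackbone", depth=18))
print(type(backbone).__name__, backbone.depth)
```

Keeping one registry per group (datasets, backbones, losses, and so on) lets a config file select components by name without importing them directly.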

Next plan

To implement: Paddle-OCR, TR-OCR, Tesseract OCR.

Citation

If you find this project useful in your research, kindly consider citing:

@article{genearal_ocr,
    title={GeneralOCR:  A Comprehensive package for OCR models},
    author={khanhphamdinh},
    email= {phamdinhkhanh.tkt53.neu@gmail.com},
    year={2021}
}
