GeneralOCR is open source Optical Character Recognition based on PyTorch.

Introduction

GeneralOCR is an open-source Optical Character Recognition (OCR) package based on PyTorch. It aims to be a faithful and useful tool for implementing state-of-the-art (SOTA) OCR models. You can use it to run inference and to train models on your own customized datasets. The solution architecture of this project is re-implemented from Facebook's Detectron and openmm-cv.

Installation

Refer to the gen_ocr installation guideline.

Inference

Configuration

Model text detection

Supported Algorithms:

Text Detection
| Algorithm | Paper | Python argument (--det) |
| --- | --- | --- |
| DBNet (AAAI'2020) | https://arxiv.org/pdf/1911.08947 | DB_r18, DB_r50 |
| Mask R-CNN (ICCV'2017) | https://arxiv.org/abs/1703.06870 | MaskRCNN_CTW, MaskRCNN_IC15, MaskRCNN_IC17 |
| PANet (ICCV'2019) | https://arxiv.org/abs/1908.06391 | PANet_CTW, PANet_IC15 |
| PSENet (CVPR'2019) | https://arxiv.org/abs/1903.12473 | PS_CTW, PS_IC15 |
| TextSnake (ECCV'2018) | https://arxiv.org/abs/1807.01544 | TextSnake |
| DRRG (CVPR'2020) | https://arxiv.org/abs/2003.07493 | DRRG |
| FCENet (CVPR'2021) | https://arxiv.org/abs/2104.10442 | FCE_IC15, FCE_CTW_DCNv2 |

Table 1: Text detection algorithms, their papers, and the corresponding --det argument values in the package.

Model text recognition

Text Recognition
| Algorithm | Paper | Python argument (--recog) |
| --- | --- | --- |
| CRNN (TPAMI'2016) | https://arxiv.org/abs/1507.05717 | CRNN, CRNN_TPS |
| NRTR (ICDAR'2019) | https://arxiv.org/abs/1806.00926 | NRTR_1/8-1/4, NRTR_1/16-1/8 |
| RobustScanner (ECCV'2020) | https://arxiv.org/abs/2007.07542 | RobustScanner |
| SAR (AAAI'2019) | https://arxiv.org/abs/1811.00751 | SAR |
| SATRN (CVPR'2020 Workshop on Text and Documents in the Deep Learning Era) | https://arxiv.org/abs/1910.04396 | SATRN, SATRN_sm |
| SegOCR (Manuscript'2021) | - | SEG |

Table 2: Text recognition algorithms, their papers, and the corresponding --recog argument values in the package.

Inference

# Activate your conda environment
conda activate gen_ocr
python general_ocr/utils/ocr.py demo/demo_text_ocr_2.jpg --print-result --imshow --det TextSnake --recog SEG

Valid values for the --det and --recog arguments are listed in Table 1 and Table 2.

The result is shown below:

demo image 1
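
Because --det and --recog only accept the values listed in Table 1 and Table 2, it can be convenient to check them before launching the demo script. The following is a minimal sketch, not part of the package: `build_ocr_command` is a hypothetical helper, and the two sets simply copy the argument values from the tables above. It only assembles the command line shown earlier; it does not call any GeneralOCR API.

```python
import shlex

# Argument values copied from Table 1 (--det) and Table 2 (--recog).
DET_MODELS = {
    "DB_r18", "DB_r50",
    "MaskRCNN_CTW", "MaskRCNN_IC15", "MaskRCNN_IC17",
    "PANet_CTW", "PANet_IC15",
    "PS_CTW", "PS_IC15",
    "TextSnake", "DRRG",
    "FCE_IC15", "FCE_CTW_DCNv2",
}
RECOG_MODELS = {
    "CRNN", "CRNN_TPS",
    "NRTR_1/8-1/4", "NRTR_1/16-1/8",
    "RobustScanner", "SAR",
    "SATRN", "SATRN_sm", "SEG",
}


def build_ocr_command(image, det, recog, script="general_ocr/utils/ocr.py"):
    """Hypothetical helper: return the argument list for the demo script.

    Raises ValueError if det or recog is not one of the documented values.
    """
    if det not in DET_MODELS:
        raise ValueError(f"unknown --det value: {det!r}")
    if recog not in RECOG_MODELS:
        raise ValueError(f"unknown --recog value: {recog!r}")
    return [
        "python", script, image,
        "--print-result", "--imshow",
        "--det", det,
        "--recog", recog,
    ]


if __name__ == "__main__":
    cmd = build_ocr_command("demo/demo_text_ocr_2.jpg", "TextSnake", "SEG")
    # Print the command; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps the command safe to hand to `subprocess.run` without shell quoting issues.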

Training

Training with toy dataset

Toy datasets are provided in the /tests/data folder so that you can experiment before training on the official datasets.

python tools/train.py configs/textrecog/robust_scanner/seg_r31_1by16_fpnocr_toy_dataset.py --work-dir seg

To switch the text recognition algorithm to SAR:

python tools/train.py configs/textrecog/sar/sar_r31_parallel_decoder_toy_dataset.py --work-dir sar

Training with Academic dataset

When you train on an academic dataset, you need to set up the dataset directory as described in the dataset guideline. The main point to focus on is that your model config points to the right dataset directory. Suppose you want to train the TextSnake model on the CTW1500 dataset; the config file for that model, configs/textdet/textsnake/textsnake_r50_fpn_unet_1200e_ctw1500.py, should then look like this:

dataset_type = 'IcdarDataset'
data_root = 'data/ctw1500/'


data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
    val_dataloader=dict(samples_per_gpu=1),
    test_dataloader=dict(samples_per_gpu=1),
    train=dict(
        type=dataset_type,
        ann_file=f'{data_root}/instances_training.json',
        img_prefix=f'{data_root}/imgs',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=f'{data_root}/instances_test.json',
        img_prefix=f'{data_root}/imgs',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=f'{data_root}/instances_test.json',
        img_prefix=f'{data_root}/imgs',
        pipeline=test_pipeline))

Make sure that data_root (data/ctw1500/) points to the correct dataset folder. Afterward, train your model:

python tools/train.py configs/textdet/textsnake/textsnake_r50_fpn_unet_1200e_ctw1500.py --work-dir textsnake
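
Before launching the training command, it may help to confirm that the dataset folder contains the entries the config refers to. The sketch below is an illustration only: `missing_dataset_entries` is a hypothetical helper, and the three expected names are taken from the `ann_file` and `img_prefix` values in the config excerpt above.

```python
from pathlib import Path

# Entries referenced by the config excerpt: two annotation files and the image folder.
EXPECTED_ENTRIES = ("instances_training.json", "instances_test.json", "imgs")


def missing_dataset_entries(data_root):
    """Hypothetical helper: list expected entries that are absent under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_dataset_entries("data/ctw1500/")
    if missing:
        print("Missing under data/ctw1500/:", ", ".join(missing))
    else:
        print("Dataset folder matches the config.")
```

An empty list means the paths in the config resolve; it says nothing about whether the annotation files themselves are valid.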

To study other training configuration parameters, refer to the project's training documentation.

Testing

Once you have completed training TextSnake, you will have the checkpoint textsnake/latest.pth. You should then evaluate its performance on the test set using the hmean-iou metric:

python tools/test.py configs/textdet/textsnake/textsnake_r50_fpn_unet_1200e_ctw1500.py textsnake/latest.pth --eval hmean-iou
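
For readers unfamiliar with the metric: hmean is the harmonic mean of precision and recall, and in IoU-based evaluation a detection usually counts as correct when its overlap with a ground-truth region exceeds a threshold. The sketch below shows only the final arithmetic, assuming the matching has already been done. `hmean_from_counts` is a hypothetical name, and this is not the package's evaluation code.

```python
def hmean_from_counts(num_matched, num_gt, num_det):
    """Return (precision, recall, hmean) from match counts.

    num_matched: detections matched to a ground-truth region (assumed precomputed)
    num_gt:      total ground-truth regions
    num_det:     total detected regions
    """
    recall = num_matched / num_gt if num_gt else 0.0
    precision = num_matched / num_det if num_det else 0.0
    denom = precision + recall
    # Harmonic mean; defined as 0 when both precision and recall are 0.
    hmean = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, hmean


if __name__ == "__main__":
    # Made-up counts for illustration: 8 matches, 10 ground-truth regions, 16 detections.
    precision, recall, hmean = hmean_from_counts(8, 10, 16)
    print(f"precision={precision:.3f} recall={recall:.3f} hmean={hmean:.3f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a detector cannot score well by maximizing only precision or only recall.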

Citation

If you find this project useful in your research, please consider citing:

@article{genearal_ocr,
    title={GeneralOCR:  A Comprehensive package for OCR models},
    author={khanhphamdinh},
    email= {[email protected]},
    year={2021}
}