Scene text recognition

Overview

AttentionOCR for Arbitrary-Shaped Scene Text Recognition

Introduction

This is the No. 1 ranked TensorFlow-based scene text spotting algorithm in the ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (Latin Only, Latin and Chinese). The algorithm was also adopted in the ICDAR2019 Robust Reading Challenge on Large-scale Street View Text with Partial Labeling and the ICDAR2019 Robust Reading Challenge on Reading Chinese Text on Signboard.

The scene text detection algorithm is modified from Tensorpack FasterRCNN; this repository open-sources only the scene text recognition code. The ICDAR2019 ArT competition model has been uploaded to Docker Hub; please refer to Docker. For more details, please refer to our arXiv technical report.

Our text recognition algorithm recognizes both Latin and non-Latin characters and supports horizontal and vertical text in a single model, which makes it convenient for multilingual arbitrary-shaped text recognition.

Note that the competition model in the Docker container, as described in our technical report, differs slightly from the recognition model trained from this updated repository.

Dependencies

python 3
tensorflow-gpu 1.14
tensorpack 0.9.8
pycocotools

Usage

First, download and extract the text datasets into a base text directory. Please refer to dataset.py for dataset preprocessing and for handling multiple datasets.

Multiple Datasets

$(base_dir)/lsvt
$(base_dir)/art
$(base_dir)/rects
$(base_dir)/icdar2017rctw
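
Before running dataset.py, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name missing_datasets is hypothetical, and base_dir is assumed to be whatever directory you extracted the datasets into.

```python
import os

# Sub-directory names taken from the list above. The helper itself is a
# hypothetical convenience and does not exist in the repository.
EXPECTED_DATASETS = ("lsvt", "art", "rects", "icdar2017rctw")


def missing_datasets(base_dir, expected=EXPECTED_DATASETS):
    """Return the expected dataset sub-directories that are absent from base_dir."""
    return [name for name in expected
            if not os.path.isdir(os.path.join(base_dir, name))]
```

An empty list means every expected directory was found; otherwise the returned names tell you which datasets still need to be downloaded and extracted.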

You can also synthesize text recognition data for data augmentation; please refer to TextRecognitionDataGenerator. Synthetic data helps with long text recognition and with the attention-based language model, because you can synthesize text images directly from an NLP corpus. You will then need to adapt dataset.py to load the synthetic text dataset.

$(base_dir)/synthetic_text

Train

First, download the pretrained Inception v4 checkpoint and put it in the ./pretrain folder. Then set the GPU list in config.py to the GPUs you want to use and run:

python train.py

You can visualize your training steps via tensorboard:

tensorboard --logdir='./checkpoint'

By default, training uses ICDAR2019-LSVT, ICDAR2019-ArT, and ICDAR2019-ReCTS; you can substitute your own training data.

Evaluation

python eval.py --checkpoint_path=$(Your model path)

By default, evaluation uses ICDAR2017RCTW with the Normalized Edit Distance metric (specifically 1-N.E.D); you can substitute your own evaluation data.
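
For readers unfamiliar with the metric, here is a minimal sketch of 1-N.E.D. It assumes the definition commonly used in the ICDAR2019 challenges: the Levenshtein distance between each prediction and its ground truth, divided by the length of the longer string, averaged over all samples and subtracted from 1. The function names are illustrative, and eval.py may compute the score differently.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def one_minus_ned(predictions, ground_truths):
    """Return 1 minus the mean normalized edit distance over all pairs."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have equal length")
    if not predictions:
        return 0.0
    total = 0.0
    for pred, gt in zip(predictions, ground_truths):
        longest = max(len(pred), len(gt))
        if longest:  # two empty strings count as a perfect match
            total += edit_distance(pred, gt) / longest
    return 1.0 - total / len(predictions)
```

For example, one_minus_ned(["abcd"], ["abcf"]) gives 0.75: one substitution out of four characters. A score of 1.0 means every prediction matches its ground truth exactly.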

Export

Export a checkpoint to a TensorFlow pb model for inference.

python export.py --pb_path=$(Your tensorflow pb model save path) --checkpoint_path=$(Your trained model path)

Test

Load the TensorFlow pb model for text recognition.

python test.py --pb_path=$(Your tensorflow pb model save path) --img_folder=$(Your test img folder)

By default, testing uses ICDAR2019-ArT; you can substitute your own test data.

Visualization

Scene text detection and recognition result:

Scene text recognition attention maps:

To learn more about the attention mechanism, please refer to Attention Mechanism in Deep Learning.

Docker

The ICDAR2019 scene text recognition model, which includes both text detection and recognition, has been uploaded to Docker Hub.

After installing nvidia-docker, run:

docker pull zhang0jhon/demo:ocr
docker run -it -p 5000:5000 --gpus all zhang0jhon/demo:ocr bash
cd /ocr/ocr
python flaskapp.py

Then you can test with your own data in a browser:

$(localhost or remote server ip address):5000

Comments
  • Fine-Tuning with Pretrained Model in Docker Image

    Hi @zhang0jhon

    Thanks for the great work! The pretrained model you put in the Docker image worked pretty well. However, I want to fine-tune the parameters for my specific application. Is it possible to fine-tune the pretrained model? Since the model in the Docker image differs from the one here, do you have the training code for the Docker version, or do you already have fine-tuning code?

    Thanks in advance!

    opened by JianYang93 15
  • 'label', does not exist in the graph

    @zhang0jhon When running test.py with the model, I get this error:

     Traceback (most recent call last):
      File "test.py", line 121, in <module>
        test(args)
      File "test.py", line 91, in test
        model = TextRecognition(args.pb_path, cfg.seq_len+1)
      File "test.py", line 23, in __init__
        self.init_model()
      File "test.py", line 37, in init_model
        self.label_ph = self.sess.graph.get_tensor_by_name('label:0')
      File "/home/home/anaconda3/envs/p3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3972, in get_tensor_by_name
        return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
      File "/home/home/anaconda3/envs/p3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3796, in as_graph_element
        return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
      File "/home/home/anaconda3/envs/p3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3838, in _as_graph_element_locked
        "graph." % (repr(name), repr(op_name)))
    KeyError: "The name 'label:0' refers to a Tensor which does not exist. The operation, 'label', does not exist in the graph."
    
    opened by ghost 9
  • Default MaxPoolingOp only supports NHWC on device type CPU

    Hello. After running python flaskapp.py and uploading an image, the following error is shown at prediction time. What causes it? tensorflow.python.framework.errors_impl.InvalidArgumentError: Default MaxPoolingOp only supports NHWC on device type CPU [[node pool0/MaxPool (defined at /tensor_flow/OCRSpace/ocr/ocr/text_detection.py:29) ]]

    Original stack trace for 'pool0/MaxPool':

    How should I fix this? Many thanks.

    opened by Springzcf 6
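  • Changing the recognizable character length to 16

    seq_len is the maximum number of characters that can be recognized in a single text line, is that right? I want to train a model for short text lines with seq_len set to at most 16. What else do I need to change? Changing it to 16 directly raises a ValueError for a dimension mismatch. The exact error is: cannot feed value of shape (16,33) for Tensor 'label:0', which has shape (?,17). Could you please advise? Many thanks.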

    opened by Bachelorwangwei 2
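  • A question about recognition

    Suppose the text region has already been localized (setting aside the localization method). When AttentionOCR is used for recognition, does it recognize the text in the image as a whole, or character by character? The CRNN-CTC model I used previously recognizes all the text in the image together, but the images in your images folder show a recognition probability for each Chinese character. I hope my question is clear ^~^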

    opened by FortuneStar 2
  • Network does not predict EOS token

    Hello, I have trained this network for about 80k steps but even though it started to detect a lot of the text parts it never predicted the EOS token. How much did you train it? Do I need some special post-processing?

    Thank you for the awesome network 👍

    opened by ionutscorta 1
  • Unable to pull from the docker hub repository you have mentioned

    I am unable to pull from the docker hub repository that you have mentioned. When I executed

    docker pull zhang0jhon/demo:ocr

    I got an error saying:

    Error response from daemon: pull access denied for zhang0jhon/demo:ocr, repository does not exist or may require docker login

    Does the repository still exist, and is pull access open to everyone?

    opened by varshaneya 1
  • How to configure the Docker demo when it runs out of memory

    W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 358.89MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.

    Ordinary graphics card: tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: name: GeForce GT 1030 major: 6 minor: 1 memoryClockRate(GHz): 1.5185 pciBusID: 0000:01:00.0 totalMemory: 1.95GiB freeMemory: 1.63GiB

    opened by bigkun 1
  • abcnet accuracy

    I have two questions. First, what does e2e accuracy mean? Is it the accuracy for detection and recognition together, while det-hmean covers detection only? Second, on which datasets was this model trained?

    opened by jdavidd 0
  • Problems on how to calculate the probs in text recognition

    Thanks for your wonderful work! I observe that the text recognition network outputs the text and probabilities. How are the probabilities calculated, and what is their range? I can't find the associated function. Are the probabilities produced directly by the network, and do they lie in [0,1]? Thanks!

    opened by rainfall1998 0
  • I have some questions about training and testing

    First and foremost, thank you for open-sourcing this awesome STR algorithm. I have some questions about training and testing.

    I copied the checkpoint from the Docker image and found that detection and recognition are split into two models: one is ICDAR_0.7.pb, the other is text_recognition_5435.pb. These models do not fit the code in this repo. I therefore trained my own model, using dataset.py to generate icdar_datasets.npy. After reading a lot of the code, I found that test.py does not work. It shows this error:

     2021-06-16 15:01:59.018589: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
     2021-06-16 15:01:59.233909: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
     2021-06-16 15:01:59.261653: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
     Traceback (most recent call last):
       File "/home/alan/anaconda3/envs/AttentionOCR/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
         return fn(*args)
       File "/home/alan/anaconda3/envs/AttentionOCR/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
         options, feed_dict, fetch_list, target_list, run_metadata)
       File "/home/alan/anaconda3/envs/AttentionOCR/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
         run_metadata)
     tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
       (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [[{{node InceptionV4/InceptionV4/Conv2d_1a_3x3/Conv2D}}]]
       (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [[{{node InceptionV4/InceptionV4/Conv2d_1a_3x3/Conv2D}}]] [[sequence_preds/_17]]

    opened by alan-img 0
  • train.py error ModelDescBase?

     Traceback (most recent call last):
       File "train.py", line 81, in <module>
         train()
       File "train.py", line 64, in train
         starting_epoch=cfg.starting_epoch
       File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/config.py", line 119, in __init__
         assert_type(model, ModelDescBase, 'model')
       File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/config.py", line 107, in assert_type
         name, tp.__name__, v.__class__.__name__)
     AssertionError: model has to be type 'ModelDescBase', but an object of type 'AttentionOCR' found.

    opened by xianzhe-741 0
  • Formulae miscalculation

    zhang0jhon, your pre-trained model is great at recognizing characters generically. Great work! It would be nice if you could share the config file with the hyperparameter values you used.
    Going through your code, I found a discrepancy with the formula in Bahdanau's attention paper: the hidden state from the LSTM decoder layer alone should be given to the softmax, but in the code attention_feature, prev_wemb, and hidden_layer are concatenated and then given to the softmax. It would be helpful if you could explain why the formula was altered. Thanks in advance.

    opened by deepakacl 0
  • What is the license information of this project?

    Hi, I would love to use this amazing project in my own project. Does it have an MIT license? If so, can you add the license to the repo? It would encourage others to use your amazing OCR model in their projects as well.

    opened by JC1DA 0
MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition

MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition Python 2.7 Python 3.6 MORAN is a network with rectification mechanism for

Canjie Luo 595 Dec 27, 2022
Scene text recognition

AttentionOCR for Arbitrary-Shaped Scene Text Recognition Introduction This is the ranked No.1 tensorflow based scene text spotting algorithm on ICDAR2

null 777 Jan 9, 2023
End-to-end pipeline for real-time scene text detection and recognition.

Real-time-Scene-Text-Detection-and-Recognition-System End-to-end pipeline for real-time scene text detection and recognition. The detection model use

Fangneng Zhan 89 Aug 4, 2022
Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition"

SEE: Towards Semi-Supervised End-to-End Scene Text Recognition Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text

Christian Bartz 572 Jan 5, 2023
Scene text detection and recognition based on Extremal Region(ER)

Scene text recognition A real-time scene text recognition algorithm. Our system is able to recognize text in unconstrained backgrounds. This algorithm is

HSIEH, YI CHIA 155 Dec 6, 2022
A curated list of resources dedicated to scene text localization and recognition

Scene Text Localization & Recognition Resources A curated list of resources dedicated to scene text localization and recognition. Any suggestions and

CarlosTao 1.6k Dec 22, 2022
A curated list of papers and resources for scene text detection and recognition

Awesome Scene Text A curated list of papers and resources for scene text detection and recognition The year when a paper was first published, includin

Jan Zdenek 43 Mar 15, 2022
Tracking the latest progress in Scene Text Detection and Recognition: Must-read papers well organized

SceneTextPapers Tracking the latest progress in Scene Text Detection and Recognition: must-read papers well organized Information about this repositor

Shangbang Long 763 Jan 1, 2023
Convolutional Recurrent Neural Networks(CRNN) for Scene Text Recognition

CRNN_Tensorflow This is a TensorFlow implementation of a Deep Neural Network for scene text recognition. It is mainly based on the paper "An End-to-En

MaybeShewill-CV 1000 Dec 27, 2022
A toolbox of scene text detection and recognition

FudanOCR This toolbox contains the implementations of the following papers: Scene Text Telescope: Text-Focused Scene Image Super-Resolution [Chen et a

FudanVIC Team 170 Dec 26, 2022
An implementation of the algorithm in the paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection

InceptText-Tensorflow An implementation of the algorithm in the paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Orien

GeorgeJoe 115 Dec 12, 2022
huoyijie 1.2k Dec 29, 2022
A curated list of resources for text detection/recognition (optical character recognition ) with deep learning methods.

awesome-deep-text-detection-recognition A curated list of awesome deep learning based papers on text detection and recognition. Text Detection Papers

null 2.4k Jan 8, 2023
Text recognition (optical character recognition) with deep learning methods.

What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis | paper | training and evaluation data | failure cases and cle

Clova AI Research 3.2k Jan 4, 2023
Code for the paper STN-OCR: A single Neural Network for Text Detection and Text Recognition

STN-OCR: A single Neural Network for Text Detection and Text Recognition This repository contains the code for the paper: STN-OCR: A single Neural Net

Christian Bartz 496 Jan 5, 2023
Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. This Neural Network (NN) model recognizes the text contained in the images of segmented words.

Handwritten-Text-Recognition Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. T

null 27 Jan 8, 2023
Repository for Scene Text Detection with Supervised Pyramid Context Network with tensorflow.

Scene-Text-Detection-with-SPCNET Unofficial repository for [Scene Text Detection with Supervised Pyramid Context Network][https://arxiv.org/abs/1811.0

null 121 Oct 15, 2021
RRD: Rotation-Sensitive Regression for Oriented Scene Text Detection

RRD: Rotation-Sensitive Regression for Oriented Scene Text Detection For more details, please refer to our paper. Citing Please cite the related works

Minghui Liao 102 Jun 29, 2022