Scene text recognition

Last update: Jan 9, 2023

Overview

AttentionOCR for Arbitrary-Shaped Scene Text Recognition

Introduction

This is the TensorFlow-based scene text spotting algorithm that ranked No. 1 in the ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (Latin Only, and Latin and Chinese). Furthermore, the algorithm was also adopted in the ICDAR2019 Robust Reading Challenge on Large-scale Street View Text with Partial Labeling and the ICDAR2019 Robust Reading Challenge on Reading Chinese Text on Signboard.

The scene text detection algorithm is modified from Tensorpack FasterRCNN; this repository open-sources only the scene text recognition code. I have uploaded the ICDAR2019 ArT competition model to Docker Hub (see the Docker section). For more details, please refer to our arXiv technical report.

Our text recognition algorithm not only recognizes Latin and non-Latin characters, but also supports both horizontal and vertical text recognition in a single model. This makes it convenient for multilingual, arbitrary-shaped text recognition.

Note that the competition model in the Docker container, as described in our technical report, is slightly different from the recognition model trained with this updated repository.

Dependencies

python 3
tensorflow-gpu 1.14
tensorpack 0.9.8
pycocotools

Usage

First, download and extract the text datasets into the base directory. Refer to dataset.py for dataset preprocessing and for how multiple datasets are combined.

Multiple Datasets

$(base_dir)/lsvt
$(base_dir)/art
$(base_dir)/rects
$(base_dir)/icdar2017rctw
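
As a quick sanity check before training, a small hypothetical helper can report which of the sub-directories listed above are missing under your base directory. The helper is not part of this repository. The names simply mirror the list; dataset.py remains the authority on the layout it actually expects.

```python
from pathlib import Path
from typing import List, Sequence

# Hypothetical list: it mirrors the directory layout shown above and is NOT
# read from dataset.py. Append "synthetic_text" if you add synthetic data.
EXPECTED_DATASETS = ["lsvt", "art", "rects", "icdar2017rctw"]


def missing_datasets(base_dir: str, names: Sequence[str] = EXPECTED_DATASETS) -> List[str]:
    """Return the expected dataset sub-directories that are absent under base_dir."""
    base = Path(base_dir)
    return [name for name in names if not (base / name).is_dir()]
```

For example, `missing_datasets("/path/to/base_dir")` returns an empty list when all four directories are present.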

You can also synthesize text recognition data for data augmentation; please refer to TextRecognitionDataGenerator. Synthetic data helps with long text recognition and with the attention-based language model, because you can synthesize text images directly from an NLP corpus. You will then need to extend dataset.py to load the synthetic text dataset.

$(base_dir)/synthetic_text
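
When building a synthetic set from an NLP corpus, the corpus first has to be cut into lines short enough for the recognizer. Below is a minimal sketch under two assumptions: each output string becomes one rendered image, and the length limit should presumably not exceed cfg.seq_len (the maximum sequence length that test.py references). The function name is hypothetical, and the rendering itself is left to TextRecognitionDataGenerator.

```python
from typing import List


def corpus_to_lines(corpus: str, max_len: int) -> List[str]:
    """Split a corpus into non-empty text lines of at most max_len characters.

    Hypothetical helper: choose max_len <= cfg.seq_len so that every rendered
    line fits the recognizer's label length.
    """
    if max_len < 1:
        raise ValueError("max_len must be positive")
    lines = []
    for raw in corpus.splitlines():
        text = raw.strip()
        # Cut long corpus lines into fixed-size chunks; strip chunk edges so no
        # image starts or ends with a space.
        for start in range(0, len(text), max_len):
            chunk = text[start:start + max_len].strip()
            if chunk:
                lines.append(chunk)
    return lines
```

Cutting at a fixed character count is the simplest choice. Splitting on word or punctuation boundaries would give more natural lines for Latin text.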

Train

First, download the pretrained Inception v4 checkpoint and put it in the ./pretrain folder. Then set the GPU list in config.py to the GPUs you want to use, and run:

python train.py

You can visualize your training steps via tensorboard:

tensorboard --logdir='./checkpoint'

ICDAR2019-LSVT, ICDAR2019-ArT and ICDAR2019-ReCTS are used for training by default; you can replace them with your own training data.

Evaluation

python eval.py --checkpoint_path=$(Your model path)

ICDAR2017RCTW is used for evaluation by default, with the Normalized Edit Distance metric (specifically 1-N.E.D); you can replace it with your own evaluation data.
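
For reference, here is a minimal pure-Python sketch of the 1-N.E.D score. It assumes the normalization used in the ICDAR2019 ArT/LSVT/ReCTS challenges: each edit distance is divided by the length of the longer of the two strings, the ratios are averaged over all samples, and the mean is subtracted from 1. eval.py may differ in details such as case folding or whitespace handling. The function names are illustrative, not the repository's.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to every prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def one_minus_ned(preds: Sequence[str], gts: Sequence[str]) -> float:
    """1 - mean normalized edit distance; 1.0 means every prediction is exact."""
    if len(preds) != len(gts) or not gts:
        raise ValueError("need equally long, non-empty prediction/ground-truth lists")
    total = 0.0
    for p, g in zip(preds, gts):
        denom = max(len(p), len(g))
        total += levenshtein(p, g) / denom if denom else 0.0
    return 1.0 - total / len(gts)
```

For example, predicting "xyz" for the ground truth "xy" costs one edit out of a maximum length of 3. Paired with one exact match, that gives 1 - (1/3)/2 ≈ 0.833.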

Export

Export the checkpoint to a TensorFlow pb model for inference.

python export.py --pb_path=$(Your tensorflow pb model save path) --checkpoint_path=$(Your trained model path)

Test

Load the TensorFlow pb model for text recognition.

python test.py --pb_path=$(Your tensorflow pb model save path) --img_folder=$(Your test img folder)

ICDAR2019-ArT is used for testing by default; you can replace it with your own test data.

Visualization

Scene text detection and recognition result:

Scene text recognition attention maps:

To learn more about the attention mechanism, please refer to Attention Mechanism in Deep Learning.
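
The attention maps in this section show, for each decoding step, how much weight the decoder places on each encoder position. The sketch below is a generic illustration of one attention step in pure Python, using textbook additive (Bahdanau-style) attention. It is not code from this repository, whose decoder may combine its inputs differently. The weight matrices are hand-picked stand-ins for learned parameters.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def softmax(xs: Vector) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def matvec(w: Matrix, x: Vector) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def additive_attention(features: Sequence[Vector], state: Vector,
                       w_f: Matrix, w_s: Matrix, v: Vector) -> Tuple[List[float], List[float]]:
    """One additive attention step: score_i = v . tanh(W_f f_i + W_s s).

    features: L encoder feature vectors (dim D), e.g. flattened CNN feature-map cells
    state:    decoder hidden state (dim H)
    w_f: A x D, w_s: A x H, v: length A (learned parameters in a real model)
    Returns (weights, context): weights sum to 1; context is their weighted sum of features.
    """
    s_proj = matvec(w_s, state)
    scores = []
    for f in features:
        hidden = [math.tanh(a + b) for a, b in zip(matvec(w_f, f), s_proj)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    context = [sum(w * f[d] for w, f in zip(weights, features))
               for d in range(len(features[0]))]
    return weights, context
```

Plotting `weights` over the 2-D feature grid at every decoding step produces attention-map visualizations of the kind described above. In the toy case with two features and v = [1, 0], the feature whose projection aligns with v receives the larger weight.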

Docker

I have uploaded the ICDAR2019 scene text model, including both text detection and recognition, to Docker Hub.

After installing nvidia-docker, run:

docker pull zhang0jhon/demo:ocr
docker run -it -p 5000:5000 --gpus all zhang0jhon/demo:ocr bash
cd /ocr/ocr
python flaskapp.py

Then you can test with your own data in a browser at:

$(localhost or remote server ip address):5000

Comments

Fine-Tuning with Pretrained Model in Docker Image

Hi @zhang0jhon

Thanks for the great work! The pretrained model you put in the Docker image works pretty well. However, I want to fine-tune the parameters for my specific application. Is it possible to fine-tune the pretrained model? Since the model in the Docker image differs from the one here, do you have the training code for the Docker version, and/or do you already have fine-tuning code?

Thanks in advance!

opened by JianYang93 15

'label', does not exist in the graph

@zhang0jhon When running test.py with the model, I get this error:

 Traceback (most recent call last):
  File "test.py", line 121, in <module>
    test(args)
  File "test.py", line 91, in test
    model = TextRecognition(args.pb_path, cfg.seq_len+1)
  File "test.py", line 23, in __init__
    self.init_model()
  File "test.py", line 37, in init_model
    self.label_ph = self.sess.graph.get_tensor_by_name('label:0')
  File "/home/home/anaconda3/envs/p3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3972, in get_tensor_by_name
    return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
  File "/home/home/anaconda3/envs/p3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3796, in as_graph_element
    return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
  File "/home/home/anaconda3/envs/p3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3838, in _as_graph_element_locked
    "graph." % (repr(name), repr(op_name)))
KeyError: "The name 'label:0' refers to a Tensor which does not exist. The operation, 'label', does not exist in the graph."

opened by ghost 9

Default MaxPoolingOp only supports NHWC on device type CPU

Hello: after running python flaskapp.py and uploading an image, I get the following error during prediction. What causes it?
tensorflow.python.framework.errors_impl.InvalidArgumentError: Default MaxPoolingOp only supports NHWC on device type CPU [[node pool0/MaxPool (defined at /tensor_flow/OCRSpace/ocr/ocr/text_detection.py:29) ]]

Original stack trace for 'pool0/MaxPool':

How should I fix this? Thanks a lot...

opened by Springzcf 6
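Changing the recognizable character length to 16

seq_len is the maximum number of characters that can be recognized in a single text line, right? I want to train a model for short text lines with a maximum seq_len of 16. What else do I need to change? Simply changing it to 16 raises a ValueError (dimension mismatch); the exact error is: cannot feed value of shape (16,33) for Tensor 'label:0', which has shape (?,17). I would appreciate your guidance, thanks.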

opened by Bachelorwangwei 2
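A question about recognition

Assume the text region has already been located (setting the localization method aside for now). If AttentionOCR is used for recognition, does it recognize the text in the image as a whole, or character by character? The CRNN-CTC model I used before recognizes all the text in the image at once, but I noticed that the images in your images folder mark a recognition probability for each Chinese character. I hope I have made myself clear ^~^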

opened by FortuneStar 2
Network does not predict EOS token

Hello, I have trained this network for about 80k steps. Even though it has started to detect much of the text, it never predicts the EOS token. How long did you train it? Do I need some special post-processing?

Thank you for the awesome network 👍

opened by ionutscorta 1
Unable to pull from the docker hub repository you have mentioned

I am unable to pull from the docker hub repository that you have mentioned. When I executed

docker pull zhang0jhon/demo:ocr

I got an error saying:

Error response from daemon: pull access denied for zhang0jhon/demo:ocr, repository does not exist or may require docker login

Does the repository still exist, or have you given pull access to everyone?

opened by varshaneya 1
How to configure the Docker demo when it runs out of memory

W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 358.89MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.

My GPU is an ordinary consumer card: tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: name: GeForce GT 1030 major: 6 minor: 1 memoryClockRate(GHz): 1.5185 pciBusID: 0000:01:00.0 totalMemory: 1.95GiB freeMemory: 1.63GiB

opened by bigkun 1
abcnet accuracy

I have two questions. First, what does e2e accuracy mean? Is it the accuracy for detection plus recognition? And is det-hmean only for detection? Second, on which datasets was this model trained?

opened by jdavidd 0
Question about how the probs are calculated in text recognition

Thanks for your wonderful work! I notice that the text recognition network outputs both the text and probabilities. How are the probabilities calculated, and what is their range? I can't find the associated function. Are the probabilities produced directly by the network, and do they lie in [0, 1]? Thanks!

opened by rainfall1998 0
I have some questions about training and testing
First and foremost, thank you for open-sourcing this awesome STR algorithm. I have some questions about training and testing.

I copied the checkpoint from the Docker image and found that detection and recognition are two separate models: one is ICDAR_0.7.pb, the other is text_recognition_5435.pb. These models do not fit this repo's code, so I trained my own model, using dataset.py to generate icdar_datasets.npy. After reading a lot of the code, I found that test.py does not work. It shows this error:

2021-06-16 15:01:59.018589: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2021-06-16 15:01:59.233909: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2021-06-16 15:01:59.261653: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "/home/alan/anaconda3/envs/AttentionOCR/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/alan/anaconda3/envs/AttentionOCR/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/alan/anaconda3/envs/AttentionOCR/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[{{node InceptionV4/InceptionV4/Conv2d_1a_3x3/Conv2D}}]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[{{node InceptionV4/InceptionV4/Conv2d_1a_3x3/Conv2D}}]]
	 [[sequence_preds/_17]]
opened by alan-img 0
train.py error ModelDescBase?

Traceback (most recent call last):
  File "train.py", line 81, in <module>
    train()
  File "train.py", line 64, in train
    starting_epoch=cfg.starting_epoch
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/config.py", line 119, in __init__
    assert_type(model, ModelDescBase, 'model')
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/config.py", line 107, in assert_type
    name, tp.__name__, v.__class__.__name__))
AssertionError: model has to be type 'ModelDescBase', but an object of type 'AttentionOCR' found.

opened by xianzhe-741 0
Formulae miscalculation

zhang0jhon, your pre-trained model is great for recognizing characters generically. Great work! It would be nice if you could share the config file with the values used for the hyperparameters.
I went through your code and found a discrepancy between this formula and the one in Bahdanau's attention paper: 1) the hidden state from the LSTM decoder layer should be the only input to the softmax, but in the code attention_feature, prev_wemb and hidden_layer are concatenated and then given to the softmax. It would be helpful if you could explain why the formula was altered this way. Thanks in advance.

opened by deepakacl 0
What is the license of this project?

Hi, I would love to use this amazing project in my own project. Does it have an MIT license? If so, could you add the license to the repo? That would encourage people to use your amazing OCR model in their projects as well.

opened by JC1DA 0

MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition

Scene text recognition

End-to-end pipeline for real-time scene text detection and recognition.

Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition"

Scene text detection and recognition based on Extremal Region (ER)

A curated list of resources dedicated to scene text localization and recognition

A curated list of papers and resources for scene text detection and recognition

Tracking the latest progress in Scene Text Detection and Recognition: Must-read papers well organized

Convolutional Recurrent Neural Networks (CRNN) for Scene Text Recognition

A toolbox of scene text detection and recognition

An implementation of the algorithm in the paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection

AdvancedEAST is an algorithm for scene image text detection that is primarily based on EAST, with significant improvements that make long text predictions more accurate. https://github.com/huoyijie/raspberrypi-car

A curated list of resources for text detection/recognition (optical character recognition) with deep learning methods.

Text recognition (optical character recognition) with deep learning methods.

Code for the paper STN-OCR: A single Neural Network for Text Detection and Text Recognition

Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. This Neural Network (NN) model recognizes the text contained in the images of segmented words.

Repository for Scene Text Detection with Supervised Pyramid Context Network with tensorflow.

RRD: Rotation-Sensitive Regression for Oriented Scene Text Detection