CNN+LSTM+CTC based OCR implemented using tensorflow.

Last update: Dec 8, 2022

Overview

CNN_LSTM_CTC_Tensorflow

CNN+LSTM+CTC based OCR (Optical Character Recognition) implemented using TensorFlow.

Note: there is no restriction on the number of characters in the image (variable length). Have a look at the image below.

I trained a model with 100k images using this code and got 99.75% accuracy on the test dataset (200k images) in the competition. The images in both datasets:

Update 2017.11.6:

The competition page is no longer available. If you want to reproduce this result, please see this issue about the dataset. After extracting the .tar.gz file, the label file (a .txt file) is in the same folder as the images.

Update 2018.4.24:

Updated to TensorFlow 1.7 and fixed some bugs reported in issue #8.

Structure

The images are first processed by a CNN to extract features, then the extracted features are fed into an LSTM for character recognition.
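The CTC part of the title sits on the output side: the LSTM emits one class per time step, and the per-frame classes are collapsed into a variable-length string. The following is a minimal sketch of greedy (best-path) CTC decoding, not the repository's decoder. The character set and the frame sequence are hypothetical and chosen only to reproduce the example label (1+4)*2.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, blank_id):
    """Collapse repeated frame labels, then drop blanks (CTC best-path decoding)."""
    collapsed = [label for label, _ in groupby(frame_ids)]
    return [label for label in collapsed if label != blank_id]


# Hypothetical character set; the blank is assumed to be the last class index.
charset = "0123456789+-*()"
blank = len(charset)

# Hypothetical per-frame argmax output for 12 time steps.
frames = [13, 13, 1, blank, 10, 10, 4, 14, 14, 12, blank, 2]
decoded = "".join(charset[i] for i in ctc_greedy_decode(frames, blank))
print(decoded)
```

Because repeats are merged before blanks are removed, a blank between two identical frames is what lets the decoder output a doubled character.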

For simplicity, the CNN is just Convolution + Batch Normalization + Leaky ReLU + Max Pooling, and the LSTM is a 2-layer stacked LSTM. You can also try a bidirectional LSTM.

You can play with the network architecture (add dropout to the CNN, stack more LSTM layers, etc.) and see what happens. Have a look at the CNN part and the LSTM part.
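When experimenting with the architecture or the input size, it helps to know how the image size maps to the LSTM input. The sketch below reproduces the shapes printed in the logs quoted in the Comments section (feature_h: 4, feature_w: 12, lstm input shape: [128, 12, 256] for a 60x180 image with out_channels=64). It assumes four stride-2 pooling steps with rounding up; that count is inferred from those logs, not read from the code.

```python
import math


def cnn_feature_shape(image_height, image_width, num_pools=4, stride=2):
    """Feature-map size after num_pools stride-2 poolings that round up.

    num_pools=4 is an assumption inferred from the logged shapes.
    """
    h, w = image_height, image_width
    for _ in range(num_pools):
        h = math.ceil(h / stride)
        w = math.ceil(w / stride)
    return h, w


def lstm_input_shape(batch_size, image_height, image_width, out_channels):
    """Each feature-map column becomes one time step of size feature_h * out_channels."""
    feature_h, feature_w = cnn_feature_shape(image_height, image_width)
    return [batch_size, feature_w, feature_h * out_channels]


print(cnn_feature_shape(60, 180))
print(lstm_input_shape(128, 60, 180, 64))
```

Under this assumption the image width sets the number of time steps, which in turn bounds the longest label the model can emit.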

Prerequisite

Python 3.6.4
TensorFlow 1.2
OpenCV 3 (optional; used to read images).

How to run

There are many other parameters you can play with; have a look at utils.py.

Note that, for clarity, num_classes is not among the command-line parameters mentioned above.

# cd to your workspace.
# The code evaluates accuracy every validation_steps steps, as specified in the parameters.

ls -R
  .:
  imgs  utils.py  helper.py  main.py  cnn_lstm_otc_ocr.py

  ./imgs:
  train  infer  val  labels.txt
  
  ./imgs/train:
  1.png  2.png  ...  50000.png
  
  ./imgs/val:
  1.png  2.png  ...  50000.png

  ./imgs/infer:
  1.png  2.png  ...  300000.png
   
  
# Train the model.
CUDA_VISIBLE_DEVICES=0 python ./main.py --train_dir=./imgs/train/ \
  --val_dir=./imgs/val/ \
  --image_height=60 \
  --image_width=180 \
  --image_channel=1 \
  --out_channels=64 \
  --num_hidden=128 \
  --batch_size=128 \
  --log_dir=./log/train \
  --num_gpus=1 \
  --mode=train

# Inference
CUDA_VISIBLE_DEVICES=0 python ./main.py --infer_dir=./imgs/infer/ \
  --checkpoint_dir=./checkpoint/ \
  --num_gpus=0 \
  --mode=infer

Run with your own data.

Prepare your data and make sure that all images are named in the format id_label.jpg, e.g. 004_(1+4)*2.jpg.

# make sure the data path is correct, have a look at helper.py.

python helper.py

Then run the code as described in How to run.
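The id_label naming convention above can be checked before training. The sketch below is a hypothetical helper, not part of utils.py or helper.py. It splits a path into id and label and raises a descriptive error for files that do not follow the convention, instead of the bare IndexError reported in the Comments section.

```python
import os


def parse_image_name(path):
    """Split 'id_label.ext' into (id, label); raise ValueError if it does not match."""
    stem, _ext = os.path.splitext(os.path.basename(path))
    image_id, sep, label = stem.partition("_")
    if not sep or not label:
        raise ValueError("expected a file named 'id_label.ext', got %r" % path)
    return image_id, label


print(parse_image_name("./imgs/train/004_(1+4)*2.jpg"))
```

Using partition on the first underscore keeps any later underscores inside the label rather than discarding them.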

Comments

IndexError: list index out of range
Hi, Thanks for sharing the great work!

I downloaded the data based on the suggestion of this link.

Then I tried running the training script, but encountered the error below:

    train_feeder = utils.DataIterator(data_dir=train_dir)
  File "/home/levin/workspace/snrprj/CNN_LSTM_CTC_Tensorflow/utils.py", line 73, in __init__
    code = image_name.split('/')[-1].split('_')[1].split('.')[0]
IndexError: list index out of range

It looks to me like the script expects to get the label for each image from its filename. So to get the code to run properly and train the model, we first have to rename the image files based on the labels.txt file. Is this correct?
opened by LevinJ 3
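The file name 73091_(8+9)*4.png contains special characters and cannot be created; how did you handle this?
I read your CNN_LSTM_CTC_Tensorflow source code online and downloaded the dataset. I would like to reproduce your results and have a few questions. Thanks!

1. I downloaded the source code from https://github.com/watsonyanghx/CNN_LSTM_CTC_Tensorflow and also downloaded and extracted the dataset into D:\Tensorflow\CNN_LSTM_CTC_Tensorflow-master\imgs. Directory structure after extraction:
imgs\labels.txt
imgs\image_contest_level_1\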

2. After running helper.py, four files are generated in the imgs directory as expected: X_train.txt, X_val.txt, y_train.txt and y_val.txt.

X_train.txt: training file names; X_val.txt: test file names

y_train.txt: training answers; y_val.txt: test answers

However, the calls cp_file(X_train, y_train, './imgs/train/') and cp_file(X_val, y_val, './imgs/val/') fail:

"D:\Program Files\Python365\python36.exe" D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/helper.py
['./imgs/image_contest_level_1/0.png' './imgs/image_contest_level_1/1.png' './imgs/image_contest_level_1/2.png' './imgs/image_contest_level_1/3.png' './imgs/image_contest_level_1/4.png' './imgs/image_contest_level_1/5.png' './imgs/image_contest_level_1/6.png' './imgs/image_contest_level_1/7.png' './imgs/image_contest_level_1/8.png' './imgs/image_contest_level_1/9.png']
Traceback (most recent call last):
  File "D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/helper.py", line 129, in <module>
    cp_file(X_train, y_train, './imgs/train/')
  File "D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/helper.py", line 102, in cp_file
    shutil.copyfile(file_path, dest_filename)
  File "D:\Program Files\Python365\lib\shutil.py", line 121, in copyfile
    with open(dst, 'wb') as fdst:
OSError: [Errno 22] Invalid argument: './imgs/train/73091_(8+9)*4.png'

Process finished with exit code 1. The file name 73091_(8+9)*4.png contains special characters and cannot be created. How did you handle this?

3. Running cnn_lstm_otc_ocr.py reports an error:

"D:\Program Files\Python365\python36.exe" D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/cnn_lstm_otc_ocr.py
D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/cnn_lstm_otc_ocr.py:42: SyntaxWarning: assertion is always true, perhaps remove parentheses?
  assert (FLAGS.cnn_count <= count_, "FLAGS.cnn_count should be <= {}!".format(count_))

4. Running main.py:

"D:\Program Files\Python365\python36.exe" D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/main.py

feature_h: 4, feature_w: 12
lstm input shape: [40, 12, 256]
loading train data size: 0
loading validation data size: 0

2018-07-12 11:02:14.624545: I c:\users\user\source\repos\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2018-07-12 11:02:14.844809: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
totalMemory: 8.00GiB freeMemory: 6.63GiB
2018-07-12 11:02:14.845239: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
2018-07-12 11:02:16.119318: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-12 11:02:16.119683: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929] 0
2018-07-12 11:02:16.119937: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0: N
2018-07-12 11:02:16.137500: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6410 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
=============================begin training=============================

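Process finished with exit code 0

Running utils.py:
"D:\Program Files\Python365\python36.exe" D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/utils.py

Process finished with exit code 0

Thanks for your guidance. Do you have WeChat or QQ so that I can contact you with questions? Thanks.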
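The OSError in this report ([Errno 22] Invalid argument for './imgs/train/73091_(8+9)*4.png') is consistent with Windows rejecting * in file names, since * is a reserved character there. One possible workaround, sketched below, is to percent-encode the label before using it as a file name and decode it again when reading labels. The helper names are hypothetical, and the label-parsing code in utils.py would have to call the decoder; nothing here is taken from the repository.

```python
from urllib.parse import quote, unquote


def encode_label(label):
    """Percent-encode characters that are not safe in file names (e.g. '*' -> '%2A')."""
    # '(', ')', '+' and '-' are valid in Windows file names, so they are kept readable.
    return quote(label, safe="()+-")


def decode_label(encoded):
    """Recover the original label from an encoded file-name fragment."""
    return unquote(encoded)


encoded = encode_label("(8+9)*4")
print(encoded, "->", decode_label(encoded))
```

Percent-encoding is reversible and never produces an underscore or a dot, so it does not interfere with splitting id_label.ext on those characters.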
opened by QQ2737499951 2
Change the image width and height

Hello, I changed the image height and width from (60, 180) to (80, 500), and then I get an error:

InvalidArgumentError (see above for traceback): Matrix size-incompatible: In[0]: [40,288], In[1]: [176,512]
  [[Node: lstm/rnn/while/multi_rnn_cell/cell_0/lstm_cell/lstm_cell/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](lstm/rnn/while/multi_rnn_cell/cell_0/lstm_cell/lstm_cell/concat, lstm/rnn/while/multi_rnn_cell/cell_0/lstm_cell/lstm_cell/MatMul/Enter)]]
  [[Node: Mean/_37 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_950_Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]]

Is there anything else I should change to fix this error?

opened by wushilian 2

Problem with frozen pb

I trained the model with a custom dataset and got the checkpoint files. I froze the model using this script

import tensorflow as tf
def freeze_graph(model_dir, output_node_names, frozen_graph_name):
    if not tf.gfile.Exists(model_dir):
        raise AssertionError(
            "Export directory doesn't exists. Please specify an export "
            "directory: %s" % model_dir)

    if not output_node_names:
        print("You need to supply the name of a node to --output_node_names.")
        return -1

    # We retrieve our checkpoint fullpath
    checkpoint = tf.train.get_checkpoint_state(model_dir)
    input_checkpoint = checkpoint.model_checkpoint_path

    # Build the full file name of the frozen graph
    absolute_model_dir = "/".join(input_checkpoint.split('/')[:-1])
    output_graph = absolute_model_dir + "/" + frozen_graph_name + ".pb"

    # We clear devices to allow TensorFlow to control on which device it will load operations
    clear_devices = True

    # We start a session using a temporary fresh Graph
    with tf.Session(graph=tf.Graph()) as sess:
        # We import the meta graph in the current default Graph
        saver = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=clear_devices)

        # We restore the weights
        saver.restore(sess, input_checkpoint)
        gd = sess.graph.as_graph_def()
        # We use a built-in TF helper to export variables to constants
        output_graph_def = tf.graph_util.convert_variables_to_constants(
            sess,  # The session is used to retrieve the weights
            gd,  # The graph_def is used to retrieve the nodes
            output_node_names.split(",")  # The output node names are used to select the useful nodes
        )

        # Finally we serialize and dump the output graph to the filesystem
        with tf.gfile.GFile(output_graph, "wb") as f:
            f.write(output_graph_def.SerializeToString())
        print("%d ops in the final graph." % len(output_graph_def.node))

    return output_graph_def

freeze_graph('./checkpoint','SparseToDense','ocr.pb')

But when I'm loading the graph from the protobuf file, I'm getting this error:

ValueError: Input 0 of node import/cnn/unit-4/bn4/BatchNorm/AssignMovingAvg/cnn/unit-4/bn4/BatchNorm/moving_mean/AssignAdd was passed float from import/cnn/unit-4/bn4/BatchNorm/cnn/unit-4/bn4/BatchNorm/moving_mean/local_step:0 incompatible with expected float_ref.

I know this is a little off topic but any help is appreciated.
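A workaround that is commonly suggested for this kind of float versus float_ref mismatch in frozen batch-norm graphs is to rewrite the reference-typed ops in the graph definition before converting variables to constants. The sketch below shows that rewrite on stand-in node objects so it runs without TensorFlow. It has not been verified against this model, and the function name is hypothetical.

```python
from types import SimpleNamespace


def patch_batchnorm_ref_ops(nodes):
    """Rewrite ops that expect reference inputs into their value-based counterparts."""
    for node in nodes:
        if node.op == "RefSwitch":
            node.op = "Switch"
            for i in range(len(node.input)):
                if "moving_" in node.input[i]:
                    # Read the variable's value instead of passing the reference.
                    node.input[i] = node.input[i] + "/read"
        elif node.op in ("AssignSub", "AssignAdd"):
            node.op = "Sub" if node.op == "AssignSub" else "Add"
            if "use_locking" in node.attr:
                del node.attr["use_locking"]
    return nodes


# Stand-in nodes for demonstration only; a real GraphDef node has the same fields.
demo = [
    SimpleNamespace(op="AssignAdd", input=["moving_mean/local_step", "one"],
                    attr={"use_locking": False}),
    SimpleNamespace(op="RefSwitch", input=["bn4/moving_mean", "pred"], attr={}),
]
patch_batchnorm_ref_ops(demo)
print([n.op for n in demo])
```

In the script above, the call would be patch_batchnorm_ref_ops(gd.node), placed right after gd = sess.graph.as_graph_def() and before convert_variables_to_constants. Building a separate inference graph with batch normalization in non-training mode is another option that avoids the moving-average update ops altogether.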

opened by jsn5 1

Training does not begin:

Hi Guys

I have prepared a small dataset just to try out the network and see how it works. It seems to load the dataset fine and prints "begin training", but after that it just stops and does nothing. Here is what I see on screen:

CUDA_VISIBLE_DEVICES=0 python ./main.py --train_dir=./imgs/train/ --val_dir=./imgs/val/ --image_height=60 --image_width=180 --image_channel=1 --out_channels=64 --num_hidden=128 --batch_size=128 --log_dir=./log/train --num_gpus=1 --mode=train

feature_h: 4, feature_w: 12
lstm input shape: [128, 12, 256]
loading train data ('size: ', 11)
loading validation data size: 6

2018-05-29 11:47:19.300427: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-05-29 11:47:19.954690: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-29 11:47:19.955398: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 960M major: 5 minor: 0 memoryClockRate(GHz): 1.176
pciBusID: 0000:01:00.0
totalMemory: 3.95GiB freeMemory: 3.50GiB
2018-05-29 11:47:19.955416: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-29 11:47:20.485722: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-29 11:47:20.485760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-05-29 11:47:20.485768: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-05-29 11:47:20.485968: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3237 MB memory) -> physical GPU (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0, compute capability: 5.0)
=============================begin training=============================

As you can see, training does not begin and I don't get any errors either.
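One possible explanation for the silent stop, offered as an assumption because main.py is not shown here: if the training loop only iterates over full batches, 11 training images with batch_size=128 give zero batches per epoch, so every epoch finishes immediately without doing any work. The sketch below shows the arithmetic with a hypothetical helper.

```python
def batches_per_epoch(num_samples, batch_size):
    """Number of full batches, assuming incomplete batches are dropped."""
    return num_samples // batch_size


train_size, batch_size = 11, 128  # values taken from the log above
print(batches_per_epoch(train_size, batch_size))

# Capping the batch size at the dataset size guarantees at least one batch.
safe_batch_size = min(batch_size, train_size)
print(batches_per_epoch(train_size, safe_batch_size))
```

If this is the cause, passing a --batch_size no larger than the number of training images should make the loop run.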

opened by prolaser 1
about ctc cost nan and soaring avg_train_cost

I got a CTC cost NaN error after 30 epochs when training Chinese sentence OCR. I can delay the error with a smaller learning rate and a bigger learning-rate decay. But how can I prevent the CTC cost from becoming NaN?
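A mitigation that is often tried for diverging recurrent models is clipping gradients by their global norm, which limits the size of any single update. The sketch below shows only the arithmetic in plain Python; whether it resolves this particular NaN is not confirmed, and the function is an illustration, not the repository's optimizer code.

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Scale all gradients so that their joint L2 norm is at most clip_norm."""
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if not math.isfinite(global_norm):
        # A NaN or inf gradient cannot be repaired by scaling; skip the update instead.
        raise ValueError("gradient contains NaN or inf")
    scale = clip_norm / max(global_norm, clip_norm)
    return [[g * scale for g in grad] for grad in grads], global_norm


clipped, norm = clip_by_global_norm([[3.0, 4.0]], clip_norm=1.0)
print(norm, clipped)
```

Gradients whose norm is already below clip_norm pass through unchanged, so the clip only affects the occasional oversized step.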

opened by zzks 0

raise _exceptions.DuplicateFlagError.from_flag

Hi, I tried to train using this cmd command:

main.py --train_dir=../imgs/train/ --val_dir=../imgs/val/ --image_height=60 --image_width=180 --image_channel=1 --out_channels=64 --num_hidden=128 --batch_size=128 --log_dir=./log/train --num_gpus=1 -mode=train

But got this error:

Traceback (most recent call last):
  File "C:\Projects\CNN_LSTM_CTC_Tensorflow\main.py", line 14, in <module>
    import cnn_lstm_otc_ocr
  File "C:\Projects\CNN_LSTM_CTC_Tensorflow\cnn_lstm_otc_ocr.py", line 6, in <module>
    import utils
  File "C:\Projects\CNN_LSTM_CTC_Tensorflow\utils.py", line 43, in <module>
    tf.app.flags.DEFINE_string('log_dir', './log', 'the logging dir')
  File "C:\Users\N1throServer\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\platform\flags.py", line 58, in wrapper
    return original_function(*args, **kwargs)
  File "C:\Users\N1throServer\AppData\Local\Programs\Python\Python37\lib\site-packages\absl\flags\_defines.py", line 241, in DEFINE_string
    DEFINE(parser, name, default, help, flag_values, serializer, **args)
  File "C:\Users\N1throServer\AppData\Local\Programs\Python\Python37\lib\site-packages\absl\flags\_defines.py", line 82, in DEFINE
    flag_values, module_name)
  File "C:\Users\N1throServer\AppData\Local\Programs\Python\Python37\lib\site-packages\absl\flags\_defines.py", line 104, in DEFINE_flag
    fv[flag.name] = flag
  File "C:\Users\N1throServer\AppData\Local\Programs\Python\Python37\lib\site-packages\absl\flags\_flagvalues.py", line 430, in __setitem__
    raise _exceptions.DuplicateFlagError.from_flag(name, self)
absl.flags._exceptions.DuplicateFlagError: The flag 'log_dir' is defined twice. First from absl.logging, Second from utils.  Description from first occurrence: directory to write logfiles into

How can it be fixed?

opened by nithrous 1

How to inference in test images?

Hi, Dear All, thanks a lot for this great project! I have trained the model with 32x128 OCR images successfully. I have a question, how do we test the new test images with the model? Using sliding window? I mean generally speaking, the images detected from the previous text detection branch are variable lengths, how do we input these images into the model to get the prediction? I thought about the sliding window, but could you please provide some advice or reference papers on this? Thanks.
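A common alternative to a sliding window, assuming the network expects one fixed input size such as the 32x128 mentioned above, is to scale each detected crop to the target height while keeping its aspect ratio, and then pad the width (or squeeze it if the crop is too long). The sketch below computes only the target geometry; the function name is hypothetical and the actual resizing and padding would be done with an image library.

```python
def fit_to_input(src_h, src_w, target_h, target_w):
    """Return (new_h, new_w, pad_right) for an aspect-preserving resize plus padding."""
    scale = target_h / src_h
    new_w = max(1, round(src_w * scale))
    if new_w > target_w:
        # The crop is too long for the model input; squeeze it to the full width.
        new_w = target_w
    return target_h, new_w, target_w - new_w


print(fit_to_input(64, 200, 32, 128))
print(fit_to_input(32, 400, 32, 128))
```

Squeezing very long crops distorts the characters, so crops far wider than the training images may still need to be split before recognition.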

opened by Remember2018 0
How does the inference work?

I started training the model and stopped it manually with a keyboard interrupt to test inference, but when I run the command I get no errors and nothing happens.

opened by jihadbourassi 0
Labels in infer mode

Hi @watsonyanghx, I found your code and I think I'm going to use it for license plate OCR, but I want to ask first: in infer mode, do the images I want to test need to have labels?

opened by jihadbourassi 0

Owner

Watson Yang

GitHub

CTPN + DenseNet + CTC based end-to-end Chinese OCR implemented using tensorflow and keras

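Introduction: end-to-end detection and recognition of variable-length Chinese text, implemented with TensorFlow and Keras. Text detection: CTPN. Text recognition: DenseNet + CTC. Environment setup: sh setup.sh. Note: in a CPU environment, comment out the "for gpu" part and uncomment the "for cpu" part before running. Demo: put the test images into test_images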

2.6k Dec 29, 2022

Awesome multilingual OCR toolkits based on PaddlePaddle （practical ultra lightweight OCR system, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices）

English | 简体中文 Introduction PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools that help users train better models and a

27.5k Jan 8, 2023

It is a image ocr tool using the Tesseract-OCR engine with the pytesseract package and has a GUI.

OCR-Tool It is a image ocr tool made in Python using the Tesseract-OCR engine with the pytesseract package and has a GUI. This is my second ever pytho

4 Jul 11, 2022

Indonesian ID Card OCR using tesseract OCR

KTP OCR Indonesian ID Card OCR using tesseract OCR KTP OCR is python-flask with tesseract web application to convert Indonesian ID Card to text / JSON

5 Dec 6, 2021

A small C++ implementation of LSTM networks, focused on OCR.

clstm CLSTM is an implementation of the LSTM recurrent neural network model in C++, using the Eigen library for numerical computations. Status and sco

794 Dec 30, 2022

Use Convolutional Recurrent Neural Network to recognize the Handwritten line text image without pre segmentation into words or characters. Use CTC loss Function to train.

Handwritten Line Text Recognition using Deep Learning with Tensorflow Description Use Convolutional Recurrent Neural Network to recognize the Handwrit

224 Jan 7, 2023

A pure pytorch implemented ocr project including text detection and recognition

ocr.pytorch A pure pytorch implemented ocr project. Text detection is based CTPN and text recognition is based CRNN. More detection and recognition me

444 Dec 30, 2022

🖺 OCR using tensorflow with attention

tensorflow-ocr OCR using tensorflow with attention, batteries included Installation git clone --recursive http://github.com/pannous/tensorflow-ocr

646 Nov 11, 2022

A Tensorflow model for text recognition (CNN + seq2seq with visual attention) available as a Python package and compatible with Google Cloud ML Engine.

Attention-based OCR Visual attention-based OCR model for image recognition with additional tools for creating TFRecords datasets and exporting the tra

933 Dec 29, 2022

Handwritten Text Recognition (HTR) system implemented with TensorFlow.

Handwritten Text Recognition with TensorFlow Update 2021: more robust model, faster dataloader, word beam search decoder also available for Windows Up

1.5k Jan 7, 2023

Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. This Neural Network (NN) model recognizes the text contained in the images of segmented words.

Handwritten-Text-Recognition Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. T

27 Jan 8, 2023

A Screen Translator/OCR Translator made by using Python and Tesseract, the user interface are made using Tkinter. All code written in python.

About An OCR translator tool. Made by me by utilizing Tesseract, compiled to .exe using pyinstaller. I made this program to learn more about python. I

41 Dec 30, 2022
