CNN+LSTM+CTC-based OCR implemented using TensorFlow.

Overview

CNN_LSTM_CTC_Tensorflow

CNN+LSTM+CTC-based OCR (Optical Character Recognition) implemented using TensorFlow.

Note: there is no restriction on the number of characters in the image (variable length). Have a look at the image below.

I trained a model on 100k images using this code and got 99.75% accuracy on the test dataset (200k images) in the competition. The images in both datasets look like this:

Update 2017.11.6:

The competition page is no longer available. If you want to reproduce this result, please see this issue about the dataset; the label file (a .txt file) is in the same folder as the images after extracting the .tar.gz file.

Update 2018.4.24:

Updated to TensorFlow 1.7 and fixed some bugs reported in issue #8.

Structure

The images are first processed by a CNN to extract features; these features are then fed into an LSTM for character recognition.
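The final CTC stage is what makes variable-length output possible: the LSTM emits one class prediction per time step, and decoding merges consecutive repeats and drops a special blank class, so the decoded string can be shorter than the number of steps. Below is a minimal pure-Python sketch of greedy (best-path) CTC decoding. It assumes, as TensorFlow's CTC ops do, that the blank is the last class index; the charset and the per-step predictions are made up for illustration and are not taken from the repository.

```python
def ctc_greedy_decode(step_ids, blank):
    """Collapse a best-path CTC output into a label sequence.

    step_ids: the argmax class index at each time step.
    blank: index of the CTC blank symbol.
    """
    decoded = []
    prev = None
    for idx in step_ids:
        # Emit a symbol only when it differs from the previous step and is
        # not the blank; a blank between two equal symbols lets the same
        # character appear twice in a row.
        if idx != prev and idx != blank:
            decoded.append(idx)
        prev = idx
    return decoded


# Illustrative only: a hypothetical charset whose blank is the last index.
charset = "0123456789+-*()"
blank = len(charset)  # 15
# 12 time steps, one per feature-map column.
steps = [13, 13, 1, blank, 10, 4, 4, blank, 14, 12, 2, blank]
print("".join(charset[i] for i in ctc_greedy_decode(steps, blank)))  # prints (1+4)*2
```

Because the output length is decided by the decoding rather than fixed by the network, the same model can read labels of different lengths, up to the number of time steps.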

For simplicity, the CNN is just Convolution + Batch Normalization + Leaky ReLU + Max Pooling, and the LSTM is a 2-layer stacked LSTM. You can also try a bidirectional LSTM.
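The training logs quoted in the issues below print feature_h: 4, feature_w: 12 and lstm input shape: [128, 12, 256] for a 60x180 input with batch_size=128 and out_channels=64. The sketch below reproduces those numbers under stated assumptions: four CNN units, each ending in a 2x2 max pool with stride 2 and SAME padding (so each pool halves a dimension, rounding up), and the last unit's out_channels maps flattened per column so that each column becomes one LSTM time step. It is an illustration consistent with the logged shapes, not code taken from the repository.

```python
import math


def feature_shape(image_height, image_width, num_pools=4):
    """Spatial size after num_pools stride-2 max pools with SAME padding."""
    h, w = image_height, image_width
    for _ in range(num_pools):
        # SAME padding with stride 2 gives ceil(size / 2).
        h, w = math.ceil(h / 2), math.ceil(w / 2)
    return h, w


def lstm_input_shape(batch_size, image_height, image_width, out_channels):
    """Each feature-map column becomes one time step of size feature_h * out_channels."""
    feature_h, feature_w = feature_shape(image_height, image_width)
    return [batch_size, feature_w, feature_h * out_channels]


print(feature_shape(60, 180))              # (4, 12)
print(lstm_input_shape(128, 60, 180, 64))  # [128, 12, 256]
```

Under these assumptions, --image_height changes the per-step input size of the LSTM, while --image_width changes the number of time steps, which also caps how many characters CTC can emit.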

You can play with the network architecture (add dropout to the CNN, stack more LSTM layers, etc.) and see what happens. Have a look at the CNN part and the LSTM part.

Prerequisites

  1. Python 3.6.4

  2. TensorFlow 1.2

  3. OpenCV 3 (optional; used to read images).

How to run

There are many other parameters you can play with; have a look at utils.py.

Note that num_classes is left out of the parameters above for clarity.
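Since num_classes is not a parameter, with CTC it normally follows from the character set: one class per character plus one for the blank. A minimal sketch of such an encoding follows. The charset (digits, +, -, *, parentheses) is an assumption inferred from labels such as (1+4)*2 in this README; the actual definitions in utils.py may differ, for example by reserving extra classes.

```python
# Hypothetical charset inferred from labels like "(1+4)*2";
# check utils.py for the one the repository actually uses.
CHARSET = "0123456789+-*()"

# Map each character to a class index 0..len(CHARSET)-1.
encode_map = {ch: i for i, ch in enumerate(CHARSET)}
decode_map = {i: ch for ch, i in encode_map.items()}

# One extra class for the CTC blank, placed last as TensorFlow's CTC ops expect.
num_classes = len(CHARSET) + 1
blank_index = num_classes - 1


def encode(label):
    """Turn a label string into the list of class indices fed to CTC."""
    return [encode_map[ch] for ch in label]


def decode(indices):
    """Turn predicted class indices back into a string, skipping the blank."""
    return "".join(decode_map[i] for i in indices if i != blank_index)


print(num_classes)                # 16
print(encode("(1+4)*2"))          # [13, 1, 10, 4, 14, 12, 2]
print(decode(encode("(1+4)*2")))  # (1+4)*2
```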

# cd to your workspace.
# The code evaluates accuracy every validation_steps steps, as specified in the parameters.

ls -R
  .:
  imgs  utils.py  helper.py  main.py  cnn_lstm_otc_ocr.py

  ./imgs:
  train  infer  val  labels.txt
  
  ./imgs/train:
  1.png  2.png  ...  50000.png
  
  ./imgs/val:
  1.png  2.png  ...  50000.png

  ./imgs/infer:
  1.png  2.png  ...  300000.png
# Train the model.
CUDA_VISIBLE_DEVICES=0 python ./main.py --train_dir=./imgs/train/ \
  --val_dir=./imgs/val/ \
  --image_height=60 \
  --image_width=180 \
  --image_channel=1 \
  --out_channels=64 \
  --num_hidden=128 \
  --batch_size=128 \
  --log_dir=./log/train \
  --num_gpus=1 \
  --mode=train

# Inference
CUDA_VISIBLE_DEVICES=0 python ./main.py --infer_dir=./imgs/infer/ \
  --checkpoint_dir=./checkpoint/ \
  --num_gpus=0 \
  --mode=infer

Run with your own data.

  1. Prepare your data; make sure all images are named in the format id_label.jpg, e.g. 004_(1+4)*2.jpg.
# Make sure the data path is correct; have a look at helper.py.

python helper.py
  2. Then follow the steps in How to run.
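The training code reads each label from its image's file name. The utils.py line quoted in the first issue below, image_name.split('/')[-1].split('_')[1].split('.')[0], raises IndexError when a name has no underscore. The hypothetical helper below is a slightly more defensive variant, offered as a sketch rather than the repository's code: it uses os.path so either path separator works, splits only on the first underscore, strips only the final extension, and names the offending file. Note that * is not a legal file-name character on Windows, so labels such as (1+4)*2 cannot be stored in file names there.

```python
import os


def label_from_filename(path):
    """Extract the label from a file named <id>_<label>.<ext>.

    Hypothetical helper: splits on the first underscore only, so the label may
    itself contain underscores, and strips only the final extension.
    """
    name = os.path.basename(path)
    stem, _ext = os.path.splitext(name)
    _image_id, sep, label = stem.partition("_")
    if not sep or not label:
        raise ValueError("expected <id>_<label>.<ext>, got %r" % name)
    return label


print(label_from_filename("./imgs/train/004_(1+4)*2.jpg"))  # (1+4)*2
```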
Comments
  • IndexError: list index out of range

    Hi, thanks for sharing the great work!

    I downloaded the data following the suggestion in this link.

    Then I tried running the training script, but encountered the error below:

        train_feeder = utils.DataIterator(data_dir=train_dir)
      File "/home/levin/workspace/snrprj/CNN_LSTM_CTC_Tensorflow/utils.py", line 73, in __init__
        code = image_name.split('/')[-1].split('_')[1].split('.')[0]
    IndexError: list index out of range
    

    It looks to me like the script expects to get the label for each image from its file name. So, to get the code running properly and train the model, we first have to rename the image files based on the labels.txt file; is this correct?

    opened by LevinJ 3
  • The file name 73091_(8+9)*4.png contains special characters and cannot be created. How did you handle this?

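    I saw your CNN_LSTM_CTC_Tensorflow source code online and downloaded the dataset because I want to reproduce your results. I have a few questions; thank you!

    1. I downloaded the source code from https://github.com/watsonyanghx/CNN_LSTM_CTC_Tensorflow, and downloaded and extracted the dataset to D:\Tensorflow\CNN_LSTM_CTC_Tensorflow-master\imgs. Directory structure after extraction: imgs\labels.txt imgs\image_contest_level_1\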

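    2. After running helper.py, the four files X_train.txt, X_val.txt, y_train.txt and y_val.txt were generated in the imgs directory as expected.

    X_train.txt: training file names; X_val.txt: test file names

    y_train.txt: training answers; y_val.txt: test answers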

    But running cp_file(X_train, y_train, './imgs/train/') and cp_file(X_val, y_val, './imgs/val/') fails:

        "D:\Program Files\Python365\python36.exe" D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/helper.py
        ['./imgs/image_contest_level_1/0.png' './imgs/image_contest_level_1/1.png' './imgs/image_contest_level_1/2.png' './imgs/image_contest_level_1/3.png' './imgs/image_contest_level_1/4.png' './imgs/image_contest_level_1/5.png' './imgs/image_contest_level_1/6.png' './imgs/image_contest_level_1/7.png' './imgs/image_contest_level_1/8.png' './imgs/image_contest_level_1/9.png']
        Traceback (most recent call last):
          File "D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/helper.py", line 129, in <module>
            cp_file(X_train, y_train, './imgs/train/')
          File "D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/helper.py", line 102, in cp_file
            shutil.copyfile(file_path, dest_filename)
          File "D:\Program Files\Python365\lib\shutil.py", line 121, in copyfile
            with open(dst, 'wb') as fdst:
        OSError: [Errno 22] Invalid argument: './imgs/train/73091_(8+9)*4.png'

    Process finished with exit code 1. The file name 73091_(8+9)*4.png contains special characters and cannot be created. How did you handle this?

    3. Running cnn_lstm_otc_ocr.py reports an error:

        "D:\Program Files\Python365\python36.exe" D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/cnn_lstm_otc_ocr.py
        D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/cnn_lstm_otc_ocr.py:42: SyntaxWarning: assertion is always true, perhaps remove parentheses?
          assert (FLAGS.cnn_count <= count_, "FLAGS.cnn_count should be <= {}!".format(count_))

    4. Running main.py:

        "D:\Program Files\Python365\python36.exe" D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/main.py

        feature_h: 4, feature_w: 12
        lstm input shape: [40, 12, 256]
        loading train data size: 0
        loading validation data size: 0

        2018-07-12 11:02:14.624545: I c:\users\user\source\repos\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
        2018-07-12 11:02:14.844809: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties:
        name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.683
        pciBusID: 0000:01:00.0
        totalMemory: 8.00GiB freeMemory: 6.63GiB
        2018-07-12 11:02:14.845239: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
        2018-07-12 11:02:16.119318: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
        2018-07-12 11:02:16.119683: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929] 0
        2018-07-12 11:02:16.119937: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0: N
        2018-07-12 11:02:16.137500: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6410 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
        =============================begin training=============================

    Process finished with exit code 0

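    5. Running utils.py:

        "D:\Program Files\Python365\python36.exe" D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/utils.py

    Process finished with exit code 0

    Thanks for any guidance. Do you have WeChat or QQ so that I could contact you and learn from you? Thanks.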

    opened by QQ2737499951 2
  • Change the image width and height

    Hello, I changed the image width and height from (60, 180) to (80, 500), and then got this error:

        InvalidArgumentError (see above for traceback): Matrix size-incompatible: In[0]: [40,288], In[1]: [176,512]
             [[Node: lstm/rnn/while/multi_rnn_cell/cell_0/lstm_cell/lstm_cell/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](lstm/rnn/while/multi_rnn_cell/cell_0/lstm_cell/lstm_cell/concat, lstm/rnn/while/multi_rnn_cell/cell_0/lstm_cell/lstm_cell/MatMul/Enter)]]
             [[Node: Mean/_37 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_950_Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

    Is there anything else I should change to fix this error?

    opened by wushilian 2
  • Problem with frozen pb

    I trained the model on a custom dataset and got the checkpoint files. I froze the model using this script:

    import tensorflow as tf


    def freeze_graph(model_dir, output_node_names, frozen_graph_name):
        if not tf.gfile.Exists(model_dir):
            raise AssertionError(
                "Export directory doesn't exist. Please specify an export "
                "directory: %s" % model_dir)

        if not output_node_names:
            print("You need to supply the name of a node to --output_node_names.")
            return -1

        # Retrieve the full path of the latest checkpoint
        checkpoint = tf.train.get_checkpoint_state(model_dir)
        input_checkpoint = checkpoint.model_checkpoint_path

        # Build the full file name of the frozen graph (".pb" is appended here)
        absolute_model_dir = "/".join(input_checkpoint.split('/')[:-1])
        output_graph = absolute_model_dir + "/" + frozen_graph_name + ".pb"

        # Clear devices so TensorFlow can choose where to place operations on load
        clear_devices = True

        # Start a session on a fresh temporary graph
        with tf.Session(graph=tf.Graph()) as sess:
            # Import the meta graph into the session's graph
            saver = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=clear_devices)

            # Restore the weights
            saver.restore(sess, input_checkpoint)
            gd = sess.graph.as_graph_def()
            # Use a built-in TF helper to convert variables to constants
            output_graph_def = tf.graph_util.convert_variables_to_constants(
                sess,  # the session is used to retrieve the weights
                gd,  # the graph_def is used to retrieve the nodes
                output_node_names.split(",")  # the output node names select the useful nodes
            )

            # Serialize the output graph and write it to the filesystem
            with tf.gfile.GFile(output_graph, "wb") as f:
                f.write(output_graph_def.SerializeToString())
            print("%d ops in the final graph." % len(output_graph_def.node))

        return output_graph_def


    # ".pb" is appended inside freeze_graph, so pass the bare name
    freeze_graph('./checkpoint', 'SparseToDense', 'ocr')
    

    But when I load the graph from the protobuf file, I get this error:

    ValueError: Input 0 of node import/cnn/unit-4/bn4/BatchNorm/AssignMovingAvg/cnn/unit-4/bn4/BatchNorm/moving_mean/AssignAdd was passed float from import/cnn/unit-4/bn4/BatchNorm/cnn/unit-4/bn4/BatchNorm/moving_mean/local_step:0 incompatible with expected float_ref.

    I know this is a little off topic but any help is appreciated.

    opened by jsn5 1
  • Training does not begin:

    Hi guys,

    I have prepared a small dataset just to try out the network and see how it works. It seems to load the dataset fine and prints "begin training", but after that it just stops and does nothing. Here is what I see on screen:

        CUDA_VISIBLE_DEVICES=0 python ./main.py --train_dir=./imgs/train/ --val_dir=./imgs/val/ --image_height=60 --image_width=180 --image_channel=1 --out_channels=64 --num_hidden=128 --batch_size=128 --log_dir=./log/train --num_gpus=1 --mode=train

        feature_h: 4, feature_w: 12
        lstm input shape: [128, 12, 256]
        loading train data ('size: ', 11)
        loading validation data size: 6

        2018-05-29 11:47:19.300427: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
        2018-05-29 11:47:19.954690: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
        2018-05-29 11:47:19.955398: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
        name: GeForce GTX 960M major: 5 minor: 0 memoryClockRate(GHz): 1.176
        pciBusID: 0000:01:00.0
        totalMemory: 3.95GiB freeMemory: 3.50GiB
        2018-05-29 11:47:19.955416: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
        2018-05-29 11:47:20.485722: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
        2018-05-29 11:47:20.485760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
        2018-05-29 11:47:20.485768: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
        2018-05-29 11:47:20.485968: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3237 MB memory) -> physical GPU (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0, compute capability: 5.0)
        =============================begin training=============================

    As you can see, training does not begin, and I don't get any errors either.

    opened by prolaser 1
  • about ctc cost nan and soaring avg_train_cost

    I got a CTC cost NaN error after 30 epochs of Chinese sentence OCR training. I can delay the error with a smaller learning rate and a stronger learning-rate decay, but how can I prevent the CTC cost from becoming NaN?

    opened by zzks 0
  • raise _exceptions.DuplicateFlagError.from_flag

    Hi, I tried to train using this command:

    main.py --train_dir=../imgs/train/ --val_dir=../imgs/val/ --image_height=60 --image_width=180 --image_channel=1 --out_channels=64 --num_hidden=128 --batch_size=128 --log_dir=./log/train --num_gpus=1 -mode=train

    But I got this error:

    Traceback (most recent call last):
      File "C:\Projects\CNN_LSTM_CTC_Tensorflow\main.py", line 14, in <module>
        import cnn_lstm_otc_ocr
      File "C:\Projects\CNN_LSTM_CTC_Tensorflow\cnn_lstm_otc_ocr.py", line 6, in <module>
        import utils
      File "C:\Projects\CNN_LSTM_CTC_Tensorflow\utils.py", line 43, in <module>
        tf.app.flags.DEFINE_string('log_dir', './log', 'the logging dir')
      File "C:\Users\N1throServer\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\platform\flags.py", line 58, in wrapper
        return original_function(*args, **kwargs)
      File "C:\Users\N1throServer\AppData\Local\Programs\Python\Python37\lib\site-packages\absl\flags\_defines.py", line 241, in DEFINE_string
        DEFINE(parser, name, default, help, flag_values, serializer, **args)
      File "C:\Users\N1throServer\AppData\Local\Programs\Python\Python37\lib\site-packages\absl\flags\_defines.py", line 82, in DEFINE
        flag_values, module_name)
      File "C:\Users\N1throServer\AppData\Local\Programs\Python\Python37\lib\site-packages\absl\flags\_defines.py", line 104, in DEFINE_flag
        fv[flag.name] = flag
      File "C:\Users\N1throServer\AppData\Local\Programs\Python\Python37\lib\site-packages\absl\flags\_flagvalues.py", line 430, in __setitem__
        raise _exceptions.DuplicateFlagError.from_flag(name, self)
    absl.flags._exceptions.DuplicateFlagError: The flag 'log_dir' is defined twice. First from absl.logging, Second from utils.  Description from first occurrence: directory to write logfiles into
    

    How can it be fixed?

    opened by nithrous 1
  • How to run inference on test images?

    Hi all, thanks a lot for this great project! I have successfully trained the model on 32x128 OCR images. My question is how to run the model on new test images. With a sliding window? Generally speaking, the text regions produced by a preceding text-detection stage have variable lengths, so how do we feed these images into the model to get predictions? I thought about a sliding window, but could you please provide some advice or reference papers on this? Thanks.

    opened by Remember2018 0
  • How does the inference work?

    I started training the model and stopped it manually with a keyboard interrupt to test inference, but when I run the command I get no errors and nothing happens. Why?

    opened by jihadbourassi 0
  • Labels in infer mode

    Hi @watsonyanghx, I found your code and I think I'm going to use it for license plate OCR, but I want to ask first: in infer mode, do the images I want to test need to have labels?

    opened by jihadbourassi 0
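Regarding the "about ctc cost nan and soaring avg_train_cost" issue above, one frequent cause of an infinite or NaN CTC loss (a hedged diagnosis, not one confirmed in that thread) is a label that cannot be aligned to the available time steps. CTC needs a blank between two identical adjacent characters, so a label of length L with r adjacent repeats needs at least L + r time steps, while the default 60x180 input yields only 12 (feature_w in the logs above). The hypothetical helper below filters such samples before training. In TensorFlow 1.x, tf.nn.ctc_loss also accepts ignore_longer_outputs_than_inputs=True for labels longer than the input sequence, and gradient clipping is another common mitigation.

```python
def min_ctc_steps(label):
    """Fewest time steps CTC needs to emit label.

    Each character needs one step, and every pair of identical adjacent
    characters needs an extra blank step between them.
    """
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def is_alignable(label, num_steps):
    """True if CTC can align label to num_steps output frames."""
    return min_ctc_steps(label) <= num_steps


# feature_w from the training logs: 12 time steps for 60x180 inputs.
NUM_STEPS = 12
labels = ["(1+4)*2", "111111", "1111111", "(8+9)*4+(7+7)*2"]
print([lbl for lbl in labels if is_alignable(lbl, NUM_STEPS)])  # ['(1+4)*2', '111111']
```

Note that "1111111" is rejected even though it has only 7 characters, because its 6 repeats need 6 extra blank steps.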
Owner: Watson Yang
CTPN + DenseNet + CTC based end-to-end Chinese OCR implemented using tensorflow and keras

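Introduction: end-to-end detection and recognition of variable-length Chinese text implemented with Tensorflow and Keras. Text detection: CTPN. Text recognition: DenseNet + CTC. Environment setup: sh setup.sh. Note: in a CPU environment, comment out the "for gpu" section and uncomment the "for cpu" section before running. Demo: put the test images into test_images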

Yang Chenguang 2.6k Dec 29, 2022
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)

English | Simplified Chinese Introduction PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools that help users train better models and a

null 27.5k Jan 8, 2023
It is an image OCR tool using the Tesseract-OCR engine with the pytesseract package and has a GUI.

OCR-Tool It is an image OCR tool made in Python using the Tesseract-OCR engine with the pytesseract package and has a GUI. This is my second ever pytho

Khant Htet Aung 4 Jul 11, 2022
Indonesian ID Card OCR using tesseract OCR

KTP OCR Indonesian ID Card OCR using tesseract OCR KTP OCR is python-flask with tesseract web application to convert Indonesian ID Card to text / JSON

Revan Muhammad Dafa 5 Dec 6, 2021
A small C++ implementation of LSTM networks, focused on OCR.

clstm CLSTM is an implementation of the LSTM recurrent neural network model in C++, using the Eigen library for numerical computations. Status and sco

Tom 794 Dec 30, 2022
Use Convolutional Recurrent Neural Network to recognize the Handwritten line text image without pre segmentation into words or characters. Use CTC loss Function to train.

Handwritten Line Text Recognition using Deep Learning with Tensorflow Description Use Convolutional Recurrent Neural Network to recognize the Handwrit

sushant097 224 Jan 7, 2023
A pure pytorch implemented ocr project including text detection and recognition

ocr.pytorch A pure pytorch implemented ocr project. Text detection is based CTPN and text recognition is based CRNN. More detection and recognition me

coura 444 Dec 30, 2022
🖺 OCR using tensorflow with attention

tensorflow-ocr OCR using tensorflow with attention, batteries included Installation git clone --recursive http://github.com/pannous/tensorflow-ocr

null 646 Nov 11, 2022
A Tensorflow model for text recognition (CNN + seq2seq with visual attention) available as a Python package and compatible with Google Cloud ML Engine.

Attention-based OCR Visual attention-based OCR model for image recognition with additional tools for creating TFRecords datasets and exporting the tra

Ed Medvedev 933 Dec 29, 2022
Handwritten Text Recognition (HTR) system implemented with TensorFlow.

Handwritten Text Recognition with TensorFlow Update 2021: more robust model, faster dataloader, word beam search decoder also available for Windows Up

Harald Scheidl 1.5k Jan 7, 2023
Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. This Neural Network (NN) model recognizes the text contained in the images of segmented words.

Handwritten-Text-Recognition Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. T

null 27 Jan 8, 2023
A Screen Translator/OCR Translator made using Python and Tesseract; the user interface is made using Tkinter. All code is written in Python.

About An OCR translator tool. Made by me by utilizing Tesseract, compiled to .exe using pyinstaller. I made this program to learn more about python. I

Fauzan F A 41 Dec 30, 2022
Repository collecting all the submodules for the new PyTorch-based OCR System.

OCRopus3 is being replaced by OCRopus4, which is a rewrite using PyTorch 1.7; release should be soonish. Please check github.com/tmbdev/ocropus for up

NVIDIA Research Projects 138 Dec 9, 2022
Python-based tools for document analysis and OCR

ocropy OCRopus is a collection of document analysis programs, not a turn-key OCR system. In order to apply it to your documents, you may need to do so

OCRopus 3.2k Dec 31, 2022
Visual Attention based OCR

Attention-OCR Authors: Qi Guo and Yuntian Deng Visual Attention based OCR. The model first runs a sliding CNN on the image (images are resized to hei

Yuntian Deng 1.1k Jan 2, 2023
python ocr using tesseract/ with EAST opencv detector

pytextractor python ocr using tesseract/ with EAST opencv text detector Uses the EAST opencv detector defined here with pytesseract to extract text(de

Danny Crasto 38 Dec 5, 2022
Go package for OCR (Optical Character Recognition), by using Tesseract C++ library

gosseract OCR Golang OCR package, by using Tesseract C++ library. OCR Server Do you just want OCR server, or see the working example of this package?

Hiromu OCHIAI 1.9k Dec 28, 2022
A bot that extract text from images using the Tesseract OCR.

Text from image (OCR) @ocr_text_bot A simple bot to extract text from images. Usage What do I need? A AWS key configured locally, see here. NodeJS. I

Weverton Marques 4 Aug 6, 2021