CNN+LSTM+CTC based OCR implemented using TensorFlow.



Note: there is no restriction on the number of characters in the image (variable length). Have a look at the image below.

I trained a model on 100k images using this code and got 99.75% accuracy on the test dataset (200k images) in the competition. The images in both datasets:

Update 2017.11.6:

The competition page is no longer available. If you want to reproduce this result, please see this issue about the dataset. The label file (a .txt file) is in the same folder as the images after extracting the .tar.gz file.

Update 2018.4.24:

Updated to TensorFlow 1.7 and fixed some bugs reported in issue #8.


The images are first processed by a CNN to extract features; the extracted features are then fed into an LSTM for character recognition.
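Because the LSTM emits one prediction per feature column, a CTC-style decoding step is what lets the output length vary from image to image. A minimal sketch of greedy CTC collapsing in plain Python follows. The character set, the position of the blank class, and the function names are assumptions chosen for illustration; they are not taken from this repository.

```python
from itertools import groupby

# Hypothetical character set for arithmetic-expression labels such as "(1+4)*2".
CHARSET = "0123456789+-*()"
# Assumption: the CTC blank is the last class id.
BLANK_ID = len(CHARSET)


def ctc_greedy_collapse(step_ids, blank_id=BLANK_ID):
    """Merge consecutive repeated ids, then drop blanks."""
    merged = [class_id for class_id, _group in groupby(step_ids)]
    return [class_id for class_id in merged if class_id != blank_id]


def decode_to_text(step_ids):
    """Map the collapsed class ids back to characters."""
    return "".join(CHARSET[i] for i in ctc_greedy_collapse(step_ids))


# Twelve time steps of per-step argmax class ids (made-up values).
steps = [13, 13, 1, 10, 15, 4, 14, 14, 15, 12, 2, 2]
print(decode_to_text(steps))  # (1+4)*2
```

Note that a repeated character needs a blank between its two occurrences: `[1, 15, 1]` decodes to `11`, while `[1, 1, 1]` collapses to `1`.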

The CNN is kept simple: Convolution + Batch Normalization + Leaky ReLU + Max Pooling. The LSTM is a 2-layer stacked LSTM; you can also try a bidirectional LSTM.

You can play with the network architecture (add dropout to the CNN, stack more LSTM layers, etc.) and see what happens. Have a look at the CNN part and the LSTM part.
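When changing the architecture or the input size, it helps to know what shape the CNN hands to the LSTM. The sketch below reproduces the values printed in the logs quoted in the issues (`feature_h: 4, feature_w: 12` and an LSTM input depth of 256 for a 60x180 image with `out_channels=64`). It assumes four pooling stages with stride 2 and "SAME" padding, which is inferred from those numbers rather than confirmed from the code; `cnn_feature_shape` and `num_pool_layers` are hypothetical names.

```python
import math


def cnn_feature_shape(image_height, image_width, out_channels, num_pool_layers=4):
    """Return (feature_h, feature_w, lstm_input_depth).

    Assumes each pooling stage halves height and width, rounding up
    (stride 2 with "SAME" padding), and that the LSTM sees one time step
    per feature column with feature_h * out_channels values.
    """
    feature_h, feature_w = image_height, image_width
    for _ in range(num_pool_layers):
        feature_h = math.ceil(feature_h / 2)
        feature_w = math.ceil(feature_w / 2)
    return feature_h, feature_w, feature_h * out_channels


print(cnn_feature_shape(60, 180, 64))  # (4, 12, 256)
```

Under these assumptions `feature_w` is the number of time steps, so it also bounds how long a label the CTC loss can align to an image of that width.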


  1. Python 3.6.4

  2. TensorFlow 1.2

  3. OpenCV 3 (optional; used to read images).

How to run

There are many other parameters with which you can play, have a look at

To clarify: num_classes is not one of the parameters mentioned above.

# cd to your workspace.
# The code evaluates the accuracy every validation_steps steps, as specified in the parameters.

ls -R

  train  infer  val  labels.txt
  1.png  2.png  ...  50000.png
  1.png  2.png  ...  50000.png

  1.png  2.png  ...  300000.png
# Train the model.
CUDA_VISIBLE_DEVICES=0 python ./ --train_dir=../imgs/train/ \
  --val_dir=../imgs/val/ \
  --image_height=60 \
  --image_width=180 \
  --image_channel=1 \
  --out_channels=64 \
  --num_hidden=128 \
  --batch_size=128 \
  --log_dir=./log/train \
  --num_gpus=1 \
  --mode=train

# Inference
CUDA_VISIBLE_DEVICES=0 python ./ --infer_dir=./imgs/infer/ \
  --checkpoint_dir=./checkpoint/ \
  --num_gpus=0 \
  --mode=infer

Run with your own data.

  1. Prepare your data and make sure that all images are named in the format id_label.jpg, e.g. 004_(1+4)*2.jpg.
# make sure the data path is correct, have a look at

  2. Run the model following How to run.
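To check that your files follow the id_label naming convention before training, a small helper like the one below can be used. It is a sketch, not the repository's loader: `label_from_filename` is a hypothetical name, and unlike the one-line `split` expression quoted in the first issue below, it raises a descriptive error when a file name has no label part.

```python
import os


def label_from_filename(image_path):
    """Extract the label from a path such as 'imgs/train/004_(1+4)*2.jpg'."""
    # Drop the directory and the extension, keeping e.g. '004_(1+4)*2'.
    stem, _extension = os.path.splitext(os.path.basename(image_path))
    # Split at the first underscore: id on the left, label on the right.
    _image_id, separator, label = stem.partition("_")
    if not separator or not label:
        raise ValueError("expected a name like id_label.jpg, got %r" % image_path)
    return label


print(label_from_filename("imgs/train/004_(1+4)*2.jpg"))  # (1+4)*2
```

A file named only by its id, such as `1.png`, has no label part and triggers the `ValueError`.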
  • IndexError: list index out of range

    Hi, Thanks for sharing the great work!

    I downloaded the data based on the suggestion of this link.

    Then I tried running the training script, but encountered the error below:

        train_feeder = utils.DataIterator(data_dir=train_dir)
      File "/home/levin/workspace/snrprj/CNN_LSTM_CTC_Tensorflow/", line 73, in __init__
        code = image_name.split('/')[-1].split('_')[1].split('.')[0]
    IndexError: list index out of range

    It looks to me like the script expects to get the label for each image from its filename. So to get the code to run properly and train the model, we first have to rename the image files based on the labels.txt file. Is this correct?

    opened by LevinJ 3
  • The filename 73091_(8+9)*4.png contains special characters and cannot be created. How did you handle this?

    I found your CNN_LSTM_CTC_Tensorflow source code online and downloaded the dataset as well. I would like to reproduce your results and have a few questions. Thanks!

    1. The source code was downloaded from here, and the dataset was downloaded and extracted to D:\Tensorflow\CNN_LSTM_CTC_Tensorflow-master\imgs. Directory structure after extraction:

        imgs\labels.txt
        imgs\image_contest_level_1\

    2. After running the script, four files were generated in the imgs directory as expected: X_train.txt, X_val.txt, y_train.txt and y_val.txt.

        X_train.txt  training file names
        X_val.txt    test file names
        y_train.txt  training answers
        y_val.txt    test answers

    However, the calls cp_file(X_train, y_train, './imgs/train/') and cp_file(X_val, y_val, './imgs/val/') fail:

        "D:\Program Files\Python365\python36.exe" D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/
        ['./imgs/image_contest_level_1/0.png' './imgs/image_contest_level_1/1.png'
         './imgs/image_contest_level_1/2.png' './imgs/image_contest_level_1/3.png'
         './imgs/image_contest_level_1/4.png' './imgs/image_contest_level_1/5.png'
         './imgs/image_contest_level_1/6.png' './imgs/image_contest_level_1/7.png'
         './imgs/image_contest_level_1/8.png' './imgs/image_contest_level_1/9.png']
        Traceback (most recent call last):
          File "D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/", line 129, in <module>
            cp_file(X_train, y_train, './imgs/train/')
          File "D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/", line 102, in cp_file
            shutil.copyfile(file_path, dest_filename)
          File "D:\Program Files\Python365\lib\", line 121, in copyfile
            with open(dst, 'wb') as fdst:
        OSError: [Errno 22] Invalid argument: './imgs/train/73091_(8+9)*4.png'

        Process finished with exit code 1

    The filename 73091_(8+9)*4.png contains special characters and cannot be created. How did you handle this?

    3. Running the script reports an error:

        "D:\Program Files\Python365\python36.exe" D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/
        D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/ SyntaxWarning: assertion is always true, perhaps remove parentheses?
          assert (FLAGS.cnn_count <= count_, "FLAGS.cnn_count should be <= {}!".format(count_))

        "D:\Program Files\Python365\python36.exe" D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/

        feature_h: 4, feature_w: 12
        lstm input shape: [40, 12, 256]
        loading train data size: 0
        loading validation data size: 0

        2018-07-12 11:02:14.624545: I c:\users\user\source\repos\tensorflow\tensorflow\core\platform\] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
        2018-07-12 11:02:14.844809: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\] Found device 0 with properties: name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.683 pciBusID: 0000:01:00.0 totalMemory: 8.00GiB freeMemory: 6.63GiB
        2018-07-12 11:02:14.845239: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\] Adding visible gpu devices: 0
        2018-07-12 11:02:16.119318: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\] Device interconnect StreamExecutor with strength 1 edge matrix:
        2018-07-12 11:02:16.119683: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\] 0
        2018-07-12 11:02:16.119937: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\] 0: N
        2018-07-12 11:02:16.137500: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6410 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
        =============================begin training=============================

        Process finished with exit code 0

    4. Running "D:\Program Files\Python365\python36.exe" D:/Tensorflow/CNN_LSTM_CTC_Tensorflow-master/

        Process finished with exit code 0


    opened by QQ2737499951 2
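On the SyntaxWarning quoted in step 3 above: wrapping the condition and the message in one pair of parentheses makes `assert` test a two-element tuple, and a non-empty tuple is always truthy, so the check can never fail. A minimal sketch of the difference is shown here; `check_cnn_count` and `max_count` are hypothetical names, and only the shape of the assertion is taken from the warning.

```python
def check_cnn_count(cnn_count, max_count):
    """Fail loudly when cnn_count exceeds max_count."""
    # Condition and message are separated by a comma, with no enclosing
    # parentheses, so the condition itself is what gets tested.
    assert cnn_count <= max_count, "cnn_count should be <= {}!".format(max_count)
    return cnn_count


# What the parenthesised form effectively tests: a tuple, which is truthy
# even though its first element is False.
always_true = bool((5 <= 4, "cnn_count should be <= 4!"))
print(always_true)  # True
```

With the parentheses removed, `check_cnn_count(5, 4)` raises an `AssertionError` as intended.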
  • Change the image width and height

    Hello, I changed the image height and width from (60, 180) to (80, 500), and then I get an error:

    InvalidArgumentError (see above for traceback): Matrix size-incompatible: In[0]: [40,288], In[1]: [176,512]
      [[Node: lstm/rnn/while/multi_rnn_cell/cell_0/lstm_cell/lstm_cell/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](lstm/rnn/while/multi_rnn_cell/cell_0/lstm_cell/lstm_cell/concat, lstm/rnn/while/multi_rnn_cell/cell_0/lstm_cell/lstm_cell/MatMul/Enter)]]
      [[Node: Mean/_37 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_950_Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

    Is there anything else I should change to fix this error?

    opened by wushilian 2
  • Problem with frozen pb

    I trained the model with a custom dataset and got the checkpoint files. I froze the model using this script

    import tensorflow as tf
    def freeze_graph(model_dir, output_node_names, frozen_graph_name):
        if not tf.gfile.Exists(model_dir):
            raise AssertionError(
                "Export directory doesn't exists. Please specify an export "
                "directory: %s" % model_dir)
        if not output_node_names:
            print("You need to supply the name of a node to --output_node_names.")
            return -1
        # We retrieve our checkpoint fullpath
        checkpoint = tf.train.get_checkpoint_state(model_dir)
        input_checkpoint = checkpoint.model_checkpoint_path
        # We precise the file fullname of our freezed graph
        absolute_model_dir = "/".join(input_checkpoint.split('/')[:-1])
        output_graph = absolute_model_dir + "/" + frozen_graph_name + ".pb"
        # We clear devices to allow TensorFlow to control on which device it will load operations
        clear_devices = True
        # We start a session using a temporary fresh Graph
        with tf.Session(graph=tf.Graph()) as sess:
            # We import the meta graph in the current default Graph
            saver = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=clear_devices)
            # We restore the weights
            saver.restore(sess, input_checkpoint)
            gd = sess.graph.as_graph_def()
            # We use a built-in TF helper to export variables to constants
            output_graph_def = tf.graph_util.convert_variables_to_constants(
                sess,  # The session is used to retrieve the weights
                gd,  # The graph_def is used to retrieve the nodes
                output_node_names.split(",")  # The output node names are used to select the useful nodes
            )
            # Finally we serialize and dump the output graph to the filesystem
            with tf.gfile.GFile(output_graph, "wb") as f:
                f.write(output_graph_def.SerializeToString())
            print("%d ops in the final graph." % len(output_graph_def.node))
        return output_graph_def

    But when I'm loading the graph from the protobuf file, I'm getting this error:

    ValueError: Input 0 of node import/cnn/unit-4/bn4/BatchNorm/AssignMovingAvg/cnn/unit-4/bn4/BatchNorm/moving_mean/AssignAdd was passed float from import/cnn/unit-4/bn4/BatchNorm/cnn/unit-4/bn4/BatchNorm/moving_mean/local_step:0 incompatible with expected float_ref.

    I know this is a little off topic but any help is appreciated.

    opened by jsn5 1
  • Training does not begin

    Hi guys,

    I have prepared a small dataset just to try out the network and see how it works. It seems to load the dataset fine and prints "begin training", but after that it just stops and does nothing. Here is what I see on screen:

    CUDA_VISIBLE_DEVICES=0 python ./ --train_dir=./imgs/train/ --val_dir=./imgs/val/ --image_height=60 --image_width=180 --image_channel=1 --out_channels=64 --num_hidden=128 --batch_size=128 --log_dir=./log/train --num_gpus=1 --mode=train

    feature_h: 4, feature_w: 12
    lstm input shape: [128, 12, 256]
    loading train data ('size: ', 11)
    loading validation data size: 6

    2018-05-29 11:47:19.300427: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2018-05-29 11:47:19.954690: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2018-05-29 11:47:19.955398: I tensorflow/core/common_runtime/gpu/] Found device 0 with properties: name: GeForce GTX 960M major: 5 minor: 0 memoryClockRate(GHz): 1.176 pciBusID: 0000:01:00.0 totalMemory: 3.95GiB freeMemory: 3.50GiB
    2018-05-29 11:47:19.955416: I tensorflow/core/common_runtime/gpu/] Adding visible gpu devices: 0
    2018-05-29 11:47:20.485722: I tensorflow/core/common_runtime/gpu/] Device interconnect StreamExecutor with strength 1 edge matrix:
    2018-05-29 11:47:20.485760: I tensorflow/core/common_runtime/gpu/] 0
    2018-05-29 11:47:20.485768: I tensorflow/core/common_runtime/gpu/] 0: N
    2018-05-29 11:47:20.485968: I tensorflow/core/common_runtime/gpu/] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3237 MB memory) -> physical GPU (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0, compute capability: 5.0)
    =============================begin training=============================

    As you can see, training does not begin, and I don't get any errors either.

    opened by prolaser 1
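One thing worth checking in a case like this is the ratio of dataset size to batch size. The log above reports 11 training images with `--batch_size=128`. If the training loop runs a whole number of full batches per epoch (an assumption; the loop itself is not shown on this page), that number is zero and every epoch finishes without a single step. A sketch of the arithmetic, with `batches_per_epoch` as a hypothetical helper:

```python
def batches_per_epoch(num_samples, batch_size):
    """Number of full batches in one epoch, assuming partial batches are dropped."""
    return num_samples // batch_size


# Values taken from the log above: 11 training images, batch size 128.
print(batches_per_epoch(11, 128))  # 0
```

If that is what happens here, lowering `--batch_size` below the number of training images (or adding more images) would give the loop at least one batch per epoch.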
  • About CTC cost NaN and soaring avg_train_cost

    I got a CTC cost NaN error after 30 epochs when training on Chinese sentence OCR. I can delay the error by using a smaller learning rate and a bigger learning rate decay, but how can I prevent the CTC cost from becoming NaN?

    opened by zzks 0
  • raise _exceptions.DuplicateFlagError.from_flag

    Hi, I tried to train using this command: --train_dir=../imgs/train/ --val_dir=../imgs/val/ --image_height=60 --image_width=180 --image_channel=1 --out_channels=64 --num_hidden=128 --batch_size=128 --log_dir=./log/train --num_gpus=1 -mode=train

    But got this error:

    Traceback (most recent call last):
      File "C:\Projects\CNN_LSTM_CTC_Tensorflow\", line 14, in <module>
        import cnn_lstm_otc_ocr
      File "C:\Projects\CNN_LSTM_CTC_Tensorflow\", line 6, in <module>
        import utils
      File "C:\Projects\CNN_LSTM_CTC_Tensorflow\", line 43, in <module>
        'log_dir', './log', 'the logging dir')
      File "C:\Users\N1throServer\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\platform\", line 58, in wrapper
        return original_function(*args, **kwargs)
      File "C:\Users\N1throServer\AppData\Local\Programs\Python\Python37\lib\site-packages\absl\flags\", line 241, in DEFINE_string
        DEFINE(parser, name, default, help, flag_values, serializer, **args)
      File "C:\Users\N1throServer\AppData\Local\Programs\Python\Python37\lib\site-packages\absl\flags\", line 82, in DEFINE
        flag_values, module_name)
      File "C:\Users\N1throServer\AppData\Local\Programs\Python\Python37\lib\site-packages\absl\flags\", line 104, in DEFINE_flag
        fv[] = flag
      File "C:\Users\N1throServer\AppData\Local\Programs\Python\Python37\lib\site-packages\absl\flags\", line 430, in __setitem__
        raise _exceptions.DuplicateFlagError.from_flag(name, self)
    absl.flags._exceptions.DuplicateFlagError: The flag 'log_dir' is defined twice. First from absl.logging, Second from utils.  Description from first occurrence: directory to write logfiles into

    How can it be fixed?

    opened by nithrous 1
  • How to run inference on test images?

    Hi all, thanks a lot for this great project! I have successfully trained the model on 32x128 OCR images. My question is: how do we run the model on new test images? With a sliding window? Generally speaking, the images produced by the preceding text detection branch have variable lengths, so how do we feed them into the model to get predictions? I thought about a sliding window, but could you please provide some advice or reference papers on this? Thanks.

    opened by Remember2018 0
  • How does the inference work?

    I started training the model and stopped it manually with a keyboard interrupt to test inference, but when I run the command I get no errors and nothing happens.

    opened by jihadbourassi 0
  • Labels in infer mode

    Hi @watsonyanghx, I found your code and I think I'm going to use it for license plate OCR, but I want to ask first: do the images I want to test in infer mode need to have labels?

    opened by jihadbourassi 0
