TensorFlow-based CNN+LSTM trained with CTC loss for OCR

Overview

This collection demonstrates how to construct and train a deep, bidirectional stacked LSTM using CNN features as input with CTC loss to perform robust word recognition.

The model is a straightforward adaptation of Shi et al.'s CRNN architecture (arXiv:1507.05717). The provided code downloads Jaderberg et al.'s MJSynth synthetic data (IJCV 2016) and trains on it.

Notably, the model achieves a lower test word error rate (1.82%) than CRNN when trained and tested on case-insensitive, closed vocabulary MJSynth data.

Written for Python 2.7. Requires TensorFlow >=1.10 (deprecation warnings exist for TF>1.10, but the code still works).

The model and subsequent experiments are described more fully in Weinman et al. (ICDAR 2019).

Structure

The model as built is a hybrid of Shi et al.'s CRNN architecture (arXiv:1507.05717) and the VGG deep convnet, which reduces the number of parameters by stacking pairs of small 3x3 kernels. In addition, pooling is limited in the horizontal direction to preserve resolution for character recognition; there must be at least one horizontal element per character.

Assuming one starts with a 32x32 image, the dimensions at each level of filtering are as follows:

    Layer |  Op  | KrnSz | Stride(v,h) | OutDim |  H |  W | PadOpt
    1     | Conv |   3   |      1      |   64   | 30 | 30 | valid
    2     | Conv |   3   |      1      |   64   | 30 | 30 | same
          | Pool |   2   |      2      |   64   | 15 | 15 |
    3     | Conv |   3   |      1      |  128   | 15 | 15 | same
    4     | Conv |   3   |      1      |  128   | 15 | 15 | same
          | Pool |   2   |     2,1     |  128   |  7 | 14 |
    5     | Conv |   3   |      1      |  256   |  7 | 14 | same
    6     | Conv |   3   |      1      |  256   |  7 | 14 | same
          | Pool |   2   |     2,1     |  256   |  3 | 13 |
    7     | Conv |   3   |      1      |  512   |  3 | 13 | same
    8     | Conv |   3   |      1      |  512   |  3 | 13 | same
          | Pool |   3   |     3,1     |  512   |  1 | 13 |
    9     | LSTM |       |             |  512   |    |    |
    10    | LSTM |       |             |  512   |    |    |

To accelerate training, a batch normalization layer is included before each pooling layer and ReLU non-linearities are used throughout. Other model details should be easily identifiable in the code.
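As a sketch, one mid-network stage from the table (e.g., layers 3-4 and the pool that follows them) might be written with TF 1.x layers as below; the function and scope names are illustrative, not the repo's:

    import tensorflow as tf

    def conv_pair_with_pool( inputs, filters, training, name ):
        """Two stacked 3x3 convs, batch norm before the pool, ReLU
        throughout, and pooling that halves height only (stride 2,1)."""
        with tf.variable_scope( name ):
            x = tf.layers.conv2d( inputs, filters, 3, padding='same',
                                  activation=tf.nn.relu )
            x = tf.layers.conv2d( x, filters, 3, padding='same' )
            x = tf.layers.batch_normalization( x, training=training )
            x = tf.nn.relu( x )  # ReLU applied after normalization
            return tf.layers.max_pooling2d( x, pool_size=2, strides=(2, 1),
                                            padding='valid' )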

The default training mechanism uses the ADAM optimizer with learning rate decay.
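A minimal sketch of that mechanism in TF 1.x (the initial rate and decay schedule below are placeholder values, not the repo's defaults):

    import tensorflow as tf

    # `loss` is the CTC loss built elsewhere in the model
    global_step = tf.train.get_or_create_global_step()
    learning_rate = tf.train.exponential_decay( 1e-4,        # placeholder rate
                                                global_step,
                                                decay_steps=2**16,  # placeholder
                                                decay_rate=0.9 )
    optimizer = tf.train.AdamOptimizer( learning_rate )
    # Batch norm statistics must update alongside each training step
    update_ops = tf.get_collection( tf.GraphKeys.UPDATE_OPS )
    with tf.control_dependencies( update_ops ):
        train_op = optimizer.minimize( loss, global_step=global_step )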

Differences from CRNN

Deeper early convolutions

The original CRNN uses a single 3x3 convolution in the first two conv/pool stages, while this network uses a paired sequence of 3x3 kernels. This change increases the theoretical receptive field of early stages of the network.

As a tradeoff, we omit the computationally expensive 2x2x512 final convolutional layer of CRNN. In its place, this network vertically max pools over the remaining three rows of features to collapse to a single 512-dimensional feature vector at each horizontal location.
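In code, this collapse is a single vertical max pool followed by a squeeze, mirroring the model code quoted in an issue below:

    # conv8 has shape [batch, 3, W, 512]; pool over the three rows only
    pool8 = tf.layers.max_pooling2d( conv8, [3, 1], [3, 1],
                                     padding='valid' )  # [batch, 1, W, 512]
    features = tf.squeeze( pool8, axis=1 )              # [batch, W, 512]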

The combination of these changes preserves the theoretical receptive field size of the final CNN layer, but reduces the number of convolution parameters to be learned by 15%.

Padding

Another important difference is the lack of zero-padding in the first convolutional layer, which can cause spurious strong filter responses around the border. By trimming the first convolution to valid regions, this model erodes the outermost pixel of the filter response maps (reducing the height from 32 to 30 and the width by two pixels).

This approach seems preferable to requiring the network to learn to ignore strong Conv1 responses near the image edge (presumably by weakening the power of filters in subsequent convolutional layers).

Batch normalization

We include batch normalization after each pair of convolutions (i.e., after layers 2, 4, 6, and 8 as numbered above). The CRNN does not include batch normalization after its first two convolutional stages. Our model therefore requires greater computation, with an eye toward decreasing the number of training iterations required to reach convergence.

Subsampling/stride

The first two pooling stages of CRNN downsample the feature maps with a stride of two in both spatial dimensions. This model instead preserves sequence length by downsampling horizontally only after the first pooling stage.

Because the output feature map must have at least one timeslice per character predicted, overzealous downsampling can make it impossible to represent/predict sequences of very compact or narrow characters. Reducing the horizontal downsampling allows this model to recognize words in narrow fonts.
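Concretely, the table's widths imply the following arithmetic from input width to output timeslices (a plain-Python sketch following the table; the repo's actual sequence-length code, quoted in an issue below, is authoritative):

    def output_timeslices( width ):
        """Horizontal feature map length for an input `width` pixels wide."""
        w = width - 2  # conv1 'valid' 3x3 trims one column from each side
        w = w // 2     # first pool halves both dimensions
        w -= 1         # second pool: horizontal window 2, stride 1
        w -= 1         # third pool: horizontal window 2, stride 1
        return w       # final pool is vertical only

    assert output_timeslices( 32 ) == 13  # matches the table's final W
    # CTC requires output_timeslices(width) >= the number of characters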

This increase in horizontal resolution does mean the LSTMs must capture more information. Hence this model uses 512 hidden units, rather than the 256 used by the CRNN. We found this larger number to be necessary for good performance.

Training

To completely train the model, you will need to download the MJSynth dataset and pack it into sharded TensorFlow records. Then you can start the training process, a TensorBoard monitor, and an ongoing evaluation thread. The individual commands are packaged in the accompanying Makefile.

make mjsynth-download
make mjsynth-tfrecord
make train &
make monitor &
make test

To monitor training, point your web browser to the URL given by the TensorBoard output (e.g., http://127.0.1.1:8008).

Note that it may take 4-12 hours to download the complete MJSynth dataset. A very small set (0.1%) of packaged example data is included; to run the small demo, skip the first two lines involving mjsynth.

With a GeForce GTX 1080, the demo takes about 20 minutes for the validation character error to reach 45% (using the default parameters); at one hour (roughly 7000 iterations), the validation error is just over 20%.

With the full training data, by one million iterations the model typically converges to around 5% training character error and 27.5% word error.

Checkpoints

Pre-trained model checkpoints (DOI: 11084/23328) were used to produce the results in the following paper:

Weinman, J. et al. (2019) Deep Neural Networks for Text Detection and Recognition in Historical Maps. In Proc. ICDAR.

Testing

The evaluate script (src/evaluate.py) streams statistics for one batch of validation (or evaluation) data. It prints the iteration, evaluation batch loss, label error (percentage of characters predicted incorrectly), and the sequence error (percentage of words—entire sequences—predicted incorrectly).

The test script (src/test.py) accumulates statistics over all the data, normalizing at the end. It prints the loss, label error, total number of labels, sequence error, total number of sequences, and the overall label error rate and sequence error rate.
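Schematically, the tallying works like this (the variable and field names are hypothetical, not those of src/test.py):

    from __future__ import division
    from collections import namedtuple

    BatchStats = namedtuple( 'BatchStats',
                             'label_errors num_labels seq_errors num_seqs' )

    def error_rates( batches ):
        """Accumulate per-batch counts, then normalize over all the data."""
        edits = labels = wrong = seqs = 0
        for b in batches:
            edits  += b.label_errors  # characters predicted incorrectly
            labels += b.num_labels    # total ground-truth characters
            wrong  += b.seq_errors    # words with at least one error
            seqs   += b.num_seqs      # total words
        return edits / labels, wrong / seqs

    # e.g., 3 bad characters of 40 and 2 bad words of 8 across two batches:
    label_err_rate, seq_err_rate = error_rates(
        [BatchStats(1, 20, 1, 4), BatchStats(2, 20, 1, 4)] )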

Validation

To see the output on a small set of instances, the validation script (src/validate.py) allows you to load a model and read image paths one at a time from the process's standard input, printing the decoded output for each. For example

cd src ; python validate.py < ~/paths_to_images.txt

Alternatively, you can run the program interactively by typing image paths in the terminal (one per line, type Control-D when you want the model to run the input entered so far).

Configuration

There are many command-line options to configure training parameters. Run train.py or test.py with the --help flag to see them or inspect the scripts. Model parameters are not command-line configurable and need to be edited in the code (see src/model.py).

Dynamic training data

Dynamic data can be used for training or testing by setting the --nostatic_data flag.

You can use the --ipc_synth boolean flag [default=True] to choose between single-threaded and buffered, multiprocess synthesis.

The --synth_config_file flag must be given with --nostatic_data.
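For example, a dynamic-data training run might be launched as follows (the config file name here is illustrative):

cd src ; python train.py --nostatic_data --synth_config_file=maptext.config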

The MapTextSynthesizer library supports training with dynamically synthesized data. The relevant code can be found within MapTextSynthesizer/tensorflow/generator.

Using a lexicon

By default, recognition occurs in "open vocabulary" mode; that is, the system observes no constraints on the output strings produced. However, it also has a "closed vocabulary" mode that can efficiently limit output to a given word list, as well as a "mixed vocabulary" mode that can produce either a vocabulary word from a given word list (lexicon) or a non-vocabulary word, depending on the value of a prior bias for lexicon words.

Using the closed or mixed vocabulary modes requires additional software. This repository is connected with a fork of Harald Scheidl's CTCWordBeamSearch, obtainable as follows:

git clone https://github.com/weinman/CTCWordBeamSearch
cd CTCWordBeamSearch
git checkout var_seq_len

Then follow the build instructions, which may be as simple as running

cd cpp/proj
./buildTF.sh

To use, make sure CTCWordBeamSearch/cpp/proj (the directory containing TFWordBeamSearch.so) is in the LD_LIBRARY_PATH when running test.py or validate.py (in this repository).
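For example (substitute wherever you cloned the library):

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/CTCWordBeamSearch/cpp/proj
cd src ; python test.py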

API Notes

This version uses the TensorFlow (v1.14) tf.data Dataset API for fast I/O. Training, testing, validation, and prediction use a custom Estimator.
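A minimal sketch of that pattern (the shard path, parsing function, model function, and step count are stand-ins for the repo's actual implementations):

    import tensorflow as tf

    def input_fn():
        # Shard path and parse_example are placeholders for the repo's pipeline
        dataset = tf.data.TFRecordDataset( ['../data/words-000.tfrecord'] )
        dataset = dataset.map( parse_example )  # decode image, width, labels
        return dataset.batch( 32 ).prefetch( 1 )

    classifier = tf.estimator.Estimator( model_fn=model_fn,  # CNN+LSTM+CTC graph
                                         model_dir='../data/model' )
    classifier.train( input_fn=input_fn, max_steps=2**20 )  # steps illustrative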

Citing this work

Please cite the following paper if you use this code in your own research work:

@inproceedings{ weinman19deep,
    author = {Jerod Weinman and Ziwen Chen and Ben Gafford and Nathan Gifford and Abyaya Lamsal and Liam Niehus-Staab},
    title = {Deep Neural Networks for Text Detection and Recognition in Historical Maps},
    booktitle = {Proc. IAPR International Conference on Document Analysis and Recognition},
    month = {Sep.},
    year = {2019},
    location = {Sydney, Australia},
    doi = {10.1109/ICDAR.2019.00149}
} 

Acknowledgment

This work was supported in part by the National Science Foundation under Grant Number 1526350.

Comments
  • often recognize 'u' wrongly

    Hello,

    I trained your model with mjsynth dataset and default parameter settings over 1000000 steps. I found that the model often wrongly recognizes character 'u'. It seems as if there is no 'u' class. Do you have any thoughts about what the cause might be?

    opened by kojit 12
  • Training error

    When I trained your sample data with tensorflow-gpu 1.12, I got this error (I've cloned the tf-1.12 branch, but it had the same error):

    INFO:tensorflow:Using config: {'_eval_distribute': None, '_num_worker_replicas': 1, '_session_config': allow_soft_placement: true, '_save_checkpoints_steps': None, '_service': None, '_task_id': 0, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_global_id_in_cluster': 0, '_protocol': None, '_master': '', '_tf_random_seed': None, '_save_checkpoints_secs': 120, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fd68bd37390>, '_experimental_distribute': None, '_keep_checkpoint_max': 5, '_is_chief': True, '_task_type': 'worker', '_device_fn': None, '_train_distribute': None, '_save_summary_steps': 100, '_model_dir': '../data/model', '_evaluation_master': '', '_num_ps_replicas': 0}
    Traceback (most recent call last):
      File "train.py", line 182, in <module>
        tf.app.run()
      File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
        _sys.exit(main(argv))
      File "train.py", line 179, in main
        classifier.train( input_fn=_get_input, max_steps=FLAGS.max_num_steps )
      File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 354, in train
        loss = self._train_model(input_fn, hooks, saving_listeners)
      File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model
        return self._train_model_default(input_fn, hooks, saving_listeners)
      File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1234, in _train_model_default
        input_fn, model_fn_lib.ModeKeys.TRAIN))
      File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1075, in _get_features_and_labels_from_input_fn
        self._call_input_fn(input_fn, mode))
      File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1162, in _call_input_fn
        return input_fn(**kwargs)
      File "train.py", line 130, in _get_input
        dataset = pipeline.get_data( FLAGS.static_data, **data_args)
      File "/home/lionel/Desktop/ML/mlcode/OCR/CRNN/cnn_lstm_ctc_ocr-master/src/pipeline.py", line 79, in get_data
        dataset = dpipe.get_dataset( dpipe_args )
      File "/home/lionel/Desktop/ML/mlcode/OCR/CRNN/cnn_lstm_ctc_ocr-master/src/mjsynth.py", line 60, in get_dataset
        buffer_size=buffer_sz )
      File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/readers.py", line 218, in __init__
        prefetch_input_elements=None)
      File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/readers.py", line 134, in __init__
        cycle_length, block_length)
      File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 2714, in __init__
        super(InterleaveDataset, self).__init__(input_dataset, map_func)
      File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 2677, in __init__
        experimental_nested_dataset_support=True)
      File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1860, in __init__
        self._function.add_to_graph(ops.get_default_graph())
      File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/function.py", line 479, in add_to_graph
        self._create_definition_if_needed()
      File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/function.py", line 335, in _create_definition_if_needed
        self._create_definition_if_needed_impl()
      File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/function.py", line 344, in _create_definition_if_needed_impl
        self._capture_by_value, self._caller_device)
      File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/function.py", line 864, in func_graph_from_py_func
        outputs = func(*func_graph.inputs)
      File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1794, in tf_data_structured_function_wrapper
        ret = func(*nested_args)
      File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/readers.py", line 210, in read_one_file
        return _TFRecordDataset(filename, compression_type, buffer_size)
      File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/readers.py", line 105, in __init__
        argument_default=_DEFAULT_READER_BUFFER_SIZE_BYTES)
      File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/util/convert.py", line 32, in optional_param_to_tensor
        argument_value, dtype=argument_dtype, name=argument_name)
      File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1050, in convert_to_tensor
        as_ref=False)
      File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1146, in internal_convert_to_tensor
        ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
      File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 229, in _constant_tensor_conversion_function
        return constant(v, dtype=dtype, name=name)
      File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 208, in constant
        value, dtype=dtype, shape=shape, verify_shape=verify_shape))
      File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 442, in make_tensor_proto
        _AssertCompatible(values, dtype)
      File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 353, in _AssertCompatible
        (dtype.name, repr(mismatch), type(mismatch).__name__))
    TypeError: Expected int64, got 256.0 of type 'float' instead.

    opened by thanhhau097 9
  • validate.py speed problem

    Running validate.py on one picture takes 30 seconds. The picture is around 32*280, there are 5000+ characters, and the model is 200 MB in size. How can I speed this up? 30 seconds is too long.

    opened by wkhunter 8
  • error with the mjsynth-tfrecord.py file

    I downloaded the mjsynth dataset separately and stored the images in the image subpath under the data directory. Basically, I did everything manually up until the "make mjsynth-tfrecord" command. When I ran the command, it showed me a syntax error at this print line from the mjsynth-tfrecord.py file:

        print str(i),'of',str(num_shards),'[',str(start),':',str(end),']',out_filename
        gen_shard(sess, input_base_dir, image_filenames[start:end], out_filename)
    # Clean up writing last shard
    start = num_shards*images_per_shard
    out_filename = output_filebase+'-'+(shard_format % num_shards)+'.tfrecord'
    print str(i),'of',str(num_shards),'[',str(start),':]',out_filename
    gen_shard(sess, input_base_dir, image_filenames[start:], out_filename) 
    

    Since I am using Python 3.6, I thought the problem was the absence of opening and closing parentheses in the print line, hence I changed it to this...

        print (str(i),'of',str(num_shards),'[',str(start),':',str(end),']',out_filename)
        gen_shard(sess, input_base_dir, image_filenames[start:end], out_filename)
    # Clean up writing last shard
    start = num_shards*images_per_shard
    out_filename = output_filebase+'-'+(shard_format % num_shards)+'.tfrecord'
    print (str(i),'of',str(num_shards),'[',str(start),':]',out_filename)
    gen_shard(sess, input_base_dir, image_filenames[start:], out_filename)
    

    And the program started running, but I'm seeing a lot of file read errors corresponding to this line:

        except:
            # Some files have bogus payloads, catch and note the error, moving on
            print('ERROR',filename)
    

    Can anyone tell me why this is happening? Thank you for the help in advance.

    opened by Kumara-Kaushik 8
  • TypeError: __init__() got an unexpected keyword argument 'session_config'

    When I run validate.py, I encounter an error:

        D:\Tensorflow\cnn_lstm_ctc_ocr-master\src>python validate.py d:/Tensorflow/cnn_lstm_ctc_ocr_master/src/11.jpg
        d:\Anaconda3\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
          from ._conv import register_converters as _register_converters
        Traceback (most recent call last):
          File "validate.py", line 109, in <module>
            tf.app.run()
          File "d:\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
            _sys.exit(main(_sys.argv[:1] + flags_passthrough))
          File "validate.py", line 89, in main
            classifier = tf.estimator.Estimator( config=_get_config(),
          File "validate.py", line 82, in _get_config
            custom_config = tf.estimator.RunConfig( session_config=device_config )
        TypeError: __init__() got an unexpected keyword argument 'session_config'

    My TensorFlow version is 1.2.1, on Windows.

    opened by lijian-1994 7
  • image pixel value

    Excuse me, in the function _preprocess_image(image) (mjsynth.py), why rescale the pixel values to float [-0.5, 0.5] rather than float [0, 1]? Can you tell me why? Thanks.

    opened by oftenliu 7
  • Feature Extraction using CNN and Window width

    Hi, I would like to use this code to extract features using a CNN. I'm asking if I can use a sliding window width of more than 1 pixel. My goal is to extract a set of features based on a CNN and to train a BLSTM-CTC recognizer.

    opened by sanakhamekhem 7
  • confidence on sess passing

    Hey, I am using an old version of your code, in which you make a session, load it once on a server, and run inference several times. The problem is that I cannot get access to the confidence of the prediction. I am using the version of your code that runs with sess.run, not with tf.estimator. Can you help me solve this problem?

    opened by imohammadhossein 6
  • Computing real sequence length

    Hi! I have a simple question about the calculation of the sequence length after the conv and pool layers. In the following code, why did you calculate the seq len only up to the fourth pooling op (after_pool4)?

    conv1 = conv_layer(inputs, layer_params[0], training ) # 30,30
    conv2 = conv_layer( conv1, layer_params[1], training ) # 30,30
    pool2 = pool_layer( conv2, 2, 'valid', 'pool2')        # 15,15
    conv3 = conv_layer( pool2, layer_params[2], training ) # 15,15
    conv4 = conv_layer( conv3, layer_params[3], training ) # 15,15
    pool4 = pool_layer( conv4, 1, 'valid', 'pool4' )       # 7,14
    conv5 = conv_layer( pool4, layer_params[4], training ) # 7,14
    conv6 = conv_layer( conv5, layer_params[5], training ) # 7,14
    pool6 = pool_layer( conv6, 1, 'valid', 'pool6')        # 3,13
    conv7 = conv_layer( pool6, layer_params[6], training ) # 3,13
    conv8 = conv_layer( conv7, layer_params[7], training ) # 3,13
    pool8 = tf.layers.max_pooling2d( conv8, [3,1], [3,1], 
                               padding='valid', name='pool8') # 1,13
    
    features = tf.squeeze(pool8, axis=1, name='features') # squeeze row dim
    
    kernel_sizes = [ params[1] for params in layer_params]
    
    #Calculate resulting sequence length from original image widths
    conv1_trim = tf.constant( 2 * (kernel_sizes[0] // 2),
                              dtype=tf.int32,
                              name='conv1_trim')
    one = tf.constant(1, dtype=tf.int32, name='one')
    two = tf.constant(2, dtype=tf.int32, name='two')
    after_conv1 = tf.subtract( widths, conv1_trim)
    after_pool2 = tf.floor_div( after_conv1, two )
    after_pool4 = tf.subtract(after_pool2, one)
    sequence_length = tf.reshape(after_pool4,[-1], name='seq_len') # Vectorize
    
    opened by wellescastro 6
  • TypeError

    There is an error: "__init__() got an unexpected keyword argument 'train_distribute'". I searched the function tf.estimator.RunConfig and didn't find the parameter "train_distribute". Could you help me correct it? I use Python 2.7 and TensorFlow 1.5.

    opened by w867066886 4
  • Dynamic training data shape error

    ValueError: generator yielded an element of shape (37, 109, 1) where an element of shape (32, ?, 1) was expected.

    The pipeline.py call to preprocess the data, dataset = dataset.map( dpipe.preprocess_fn, num_parallel_calls=num_threads ), seems OK, and maptextsynth.py uses the new normalize_image method:

        def _preprocess_image( image ):
            """Rescale image"""
            image = pipeline.normalize_image(image)
            return image

    opened by pczzy 4
  • How to convert .ckpt model to SavedModel .pb format for hosting with Tensorflow Model Serving?

    Models trained by this pipeline perform great. But how can they be hosted using TensorFlow Model Serving? The checkpoint needs to be converted into SavedModel (.pb) format.

    What I've done so far is:

    1. I've modified the following method to provide the model with an output tensor. The line of code I added is: logits = tf.nn.softmax(logits, name="softmax_tensor")
    def _get_image_info( features, mode ):
        """Calculates the logits and sequence length"""
    
        image = features['image']
        width = features['width']
    
        conv_features,sequence_length = model.convnet_layers(image,
                                                             width,
                                                             mode)
    
        logits = model.rnn_layers(conv_features, sequence_length,
                                  charset.num_classes())
    
        logits = tf.nn.softmax(logits, name="softmax_tensor")
    
        return logits, sequence_length 
    
    2. Then, following the MNIST example, I added a simple serving input receiver function:
    def serving_input_receiver_fn():
        """
        This is used to define inputs to serve the model.
        :return: ServingInputReciever
        """
        reciever_tensors = {
            # The size of input image is flexible.
            'image': tf.placeholder(tf.float32, [None, None, None, 1]),
            'width': tf.placeholder(tf.int32, [None, 1]),
            'length': tf.placeholder(tf.int64, [None, 1]),
            'text': tf.placeholder(tf.string, [None,]),
        }
    
        # Convert give inputs to adjust to the model.
        features = {
            # Resize given images.
            'image': tf.image.resize_images(reciever_tensors['image'], [28, 28]),
            'width': tf.shape(reciever_tensors['image'])[1],
        }
        return tf.estimator.export.ServingInputReceiver(receiver_tensors=reciever_tensors,
                                                        features=features)
    
    3. Next, in train.py, I added: classifier.export_saved_model(saved_dir, serving_input_receiver_fn=model_fn.serving_input_receiver_fn)

    4. After that, when I tried to train, I received the following error: TypeError: Expected labels (first argument) to be a SparseTensor

    5. To fix that, in model.py, I modified the following method, converting "sequence_labels" from a dense to a sparse tensor:

    def ctc_loss_layer( rnn_logits, sequence_labels, sequence_length,
                        reduce_mean=True ):
        """Build CTC Loss layer for training"""
        labels_sparse = dense_to_sparse(sequence_labels, sparse_val=0)
        losses = tf.nn.ctc_loss( labels_sparse,
                                 rnn_logits, 
                                 sequence_length,
                                 time_major=True, 
                                 ignore_longer_outputs_than_inputs=True )
        if (reduce_mean):
            loss = tf.reduce_mean( losses )
        else:
            loss = tf.reduce_sum( losses )
    
        return loss
        
    def dense_to_sparse(dense_tensor, sparse_val=0):
        with tf.name_scope("dense_to_sparse"):
            sparse_inds = tf.where(tf.not_equal(dense_tensor, sparse_val),
                                   name="sparse_inds")
            sparse_vals = tf.gather_nd(dense_tensor, sparse_inds,
                                       name="sparse_vals")
            dense_shape = tf.shape(dense_tensor, name="dense_shape",
                                   out_type=tf.int64)
            return tf.SparseTensor(sparse_inds, sparse_vals, dense_shape)
    
    6. Now I am facing the next exception: ValueError: Tried to convert 'x' to a tensor and failed. Error: None values not supported.

    If anyone has tried to convert a model from this pipeline to SavedModel for hosting with TensorFlow Model Serving, all help is welcome. This pipeline generates very good accuracy, and we need to add handling for SavedModel conversion so we can host it using TensorFlow Model Serving. So far I've been unsuccessful, but I'm going in the right direction, and I think collaboratively we can do it faster. Thank you for your help.

    opened by igorvishnevskiy 6