Handwritten Text Recognition with TensorFlow
- Update 2021: more robust model, faster dataloader, word beam search decoder also available for Windows
- Update 2020: code is compatible with TF2
Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. This Neural Network (NN) model recognizes the text contained in the images of segmented words as shown in the illustration below. 3/4 of the words from the validation set are correctly recognized, and the character error rate is around 10%.
Run demo
Download the model trained on the IAM dataset. Put the contents of the downloaded file `model.zip` into the `model` directory of the repository. Afterwards, go to the `src` directory and run `python main.py`. The input image and the expected output are shown below.
```
> python main.py
Init with stored values from ../model/snapshot-39
Recognized: "Hello"
Probability: 0.42098119854927063
```
Command line arguments
- `--train`: train the NN on 95% of the dataset samples and validate on the remaining 5%
- `--validate`: validate the trained NN
- `--decoder`: select from CTC decoders "bestpath", "beamsearch", and "wordbeamsearch". Defaults to "bestpath". For option "wordbeamsearch" see details below
- `--batch_size`: batch size
- `--data_dir`: directory containing IAM dataset (with subdirectories `img` and `gt`)
- `--fast`: use LMDB to load images (faster than loading image files from disk)
- `--dump`: dumps the output of the NN to CSV file(s) saved in the `dump` folder. Can be used as input for the CTCDecoder

If neither `--train` nor `--validate` is specified, the NN infers the text from the test image (`data/test.png`).
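A few example invocations (the dataset path and batch size here are just placeholders):

```
> python main.py --train --data_dir path/to/IAM --fast --batch_size 250
> python main.py --validate --data_dir path/to/IAM
> python main.py --decoder wordbeamsearch
```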
Integrate word beam search decoding
The word beam search decoder can be used instead of the two decoders shipped with TF. Words are constrained to those contained in a dictionary, but arbitrary non-word character strings (numbers, punctuation marks) can still be recognized. The following illustration shows a sample for which word beam search is able to recognize the correct text, while the other decoders fail.
Follow these instructions to integrate word beam search decoding:
- Clone repository CTCWordBeamSearch
- Compile and install by running `pip install .` at the root level of the CTCWordBeamSearch repository
- Specify the command line option `--decoder wordbeamsearch` when executing `main.py` to actually use the decoder
The dictionary is automatically created in training and validation mode by using all words contained in the IAM dataset (i.e. also including words from the validation set) and is saved into the file `data/corpus.txt`. Further, the manually created list of word-characters can be found in the file `model/wordCharList.txt`. Beam width is set to 50 to conform with the beam width of vanilla beam search decoding.
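For reference, here is a rough sketch of calling the decoder directly from Python with these files, assuming the `word_beam_search` bindings installed above; the character list path and the exact constructor signature are assumptions and should be checked against the CTCWordBeamSearch repository:

```python
import numpy as np
from word_beam_search import WordBeamSearch  # installed above via `pip install .`

# Dictionary, word characters, and the full character list (paths relative to src/).
corpus = open('../data/corpus.txt').read()
word_chars = open('../model/wordCharList.txt').read()
chars = open('../model/charList.txt').read()  # assumed: list of all characters the NN can output

# Beam width 50, "Words" mode: constrain words to the dictionary, no n-gram language model.
wbs = WordBeamSearch(50, 'Words', 0.0, corpus.encode('utf8'),
                     chars.encode('utf8'), word_chars.encode('utf8'))

# mat: softmax output of the NN with shape (time_steps, batch_size, num_chars + 1),
# the CTC blank being the last entry of each distribution.
mat = np.load('nn_output.npy')  # assumed: the NN output saved to disk beforehand
label_strings = wbs.compute(mat)

# Map label indices back to characters.
texts = [''.join(chars[label] for label in labels) for labels in label_strings]
print(texts)
```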
Train model with IAM dataset
Follow these instructions to get the IAM dataset:
- Register for free at this website
- Download `words/words.tgz`
- Download `ascii/words.txt`
- Create a directory for the dataset on your disk, and create two subdirectories: `img` and `gt`
- Put `words.txt` into the `gt` directory
- Put the content (directories `a01`, `a02`, ...) of `words.tgz` into the `img` directory
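Assuming the dataset directory is named `path/to/IAM` (as used in the commands below), the resulting layout looks like this:

```
path/to/IAM/
├── gt/
│   └── words.txt
└── img/
    ├── a01/
    ├── a02/
    └── ...
```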
Start the training
- Delete files from the `model` directory if you want to train from scratch
- Go to the `src` directory and execute `python main.py --train --data_dir path/to/IAM`
- Training stops after a fixed number of epochs without improvement
Fast image loading
Loading and decoding the PNG image files from disk is the bottleneck even when using only a small GPU. The LMDB database is used to speed up image loading:
- Go to the `src` directory and run `createLMDB.py --data_dir path/to/IAM` with the IAM data directory specified
- A subfolder `lmdb` is created in the IAM data directory containing the LMDB files
- When training the model, add the command line option `--fast`

The dataset should be located on an SSD drive. Using the `--fast` option and a GTX 1050 Ti, training takes around 3 hours with a batch size of 500.
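Conceptually, the caching step amounts to something like the following minimal sketch (not the exact contents of `createLMDB.py`; the key scheme and map size are assumptions):

```python
import pickle
from pathlib import Path

import cv2
import lmdb

data_dir = Path('path/to/IAM')

# Open (and create) the lmdb subfolder inside the IAM data directory; the 2 GB map size is an assumption.
env = lmdb.open(str(data_dir / 'lmdb'), map_size=2 * 1024 * 1024 * 1024)

with env.begin(write=True) as txn:
    # Store each decoded gray-value word image, keyed by its file name,
    # so that training can skip PNG decoding later on.
    for img_path in data_dir.glob('img/**/*.png'):
        img = cv2.imread(str(img_path), cv2.IMREAD_GRAYSCALE)
        txn.put(img_path.name.encode('utf8'), pickle.dumps(img))

env.close()
```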
Information about model
The model is a stripped-down version of the HTR system I implemented for my thesis. What remains is what I think is the bare minimum to recognize text with an acceptable accuracy. It consists of 5 CNN layers, 2 RNN (LSTM) layers and the CTC loss and decoding layer. The illustration below gives an overview of the NN (green: operations, pink: data flowing through NN) and here follows a short description:
- The input image is a gray-value image and has a size of 128x32
- 5 CNN layers map the input image to a feature sequence of size 32x256
- 2 LSTM layers with 256 units propagate information through the sequence and map the sequence to a matrix of size 32x80. Each matrix-element represents a score for one of the 80 characters at one of the 32 time-steps
- The CTC layer either calculates the loss value given the matrix and the ground-truth text (when training), or it decodes the matrix to the final text with best path decoding or beam search decoding (when inferring)
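For orientation, here is a minimal tf.keras sketch of such an architecture; the kernel and pooling sizes are illustrative assumptions, and the actual model is defined in the `src` directory:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Gray-value input image: 32 px high, 128 px wide, 1 channel.
inputs = tf.keras.Input(shape=(32, 128, 1))

# 5 CNN layers; pooling reduces the height to 1 and the width to 32,
# producing a feature sequence of 32 time steps with 256 features each.
x = inputs
for filters, kernel, pool in [(32, 5, (2, 2)), (64, 5, (2, 2)),
                              (128, 3, (2, 1)), (128, 3, (2, 1)),
                              (256, 3, (2, 1))]:
    x = layers.Conv2D(filters, kernel, padding='same', activation='relu')(x)
    x = layers.MaxPooling2D(pool_size=pool)(x)

# Collapse the height dimension: (1, 32, 256) -> (32, 256).
x = layers.Reshape((32, 256))(x)

# 2 LSTM layers with 256 units propagate information along the sequence.
x = layers.LSTM(256, return_sequences=True)(x)
x = layers.LSTM(256, return_sequences=True)(x)

# Project each of the 32 time steps onto 80 character scores; the CTC layer
# then computes the loss or decodes this 32x80 matrix.
outputs = layers.Dense(80)(x)

model = tf.keras.Model(inputs, outputs)
model.summary()  # final output shape: (None, 32, 80)
```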