Handwritten Text Recognition (HTR) system implemented with TensorFlow.

Overview

Handwritten Text Recognition with TensorFlow

  • Update 2021: more robust model, faster dataloader, word beam search decoder also available for Windows
  • Update 2020: code is compatible with TF2

Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. This Neural Network (NN) model recognizes the text contained in the images of segmented words as shown in the illustration below. 3/4 of the words from the validation-set are correctly recognized, and the character error rate is around 10%.

htr

Run demo

Download the model trained on the IAM dataset. Put the contents of the downloaded file model.zip into the model directory of the repository. Afterwards, go to the src directory and run python main.py. The input image and the expected output is shown below.

test

> python main.py
Init with stored values from ../model/snapshot-39
Recognized: "Hello"
Probability: 0.42098119854927063

Command line arguments

  • --train: train the NN on 95% of the dataset samples and validate on the remaining 5%
  • --validate: validate the trained NN
  • --decoder: select from CTC decoders "bestpath", "beamsearch", and "wordbeamsearch". Defaults to "bestpath". For option "wordbeamsearch" see details below
  • --batch_size: batch size
  • --data_dir: directory containing IAM dataset (with subdirectories img and gt)
  • --fast: use LMDB to load images (faster than loading image files from disk)
  • --dump: dumps the output of the NN to CSV file(s) saved in the dump folder. Can be used as input for the CTCDecoder

If neither --train nor --validate is specified, the NN infers the text from the test image (data/test.png).

Integrate word beam search decoding

The word beam search decoder can be used instead of the two decoders shipped with TF. Words are constrained to those contained in a dictionary, but arbitrary non-word character strings (numbers, punctuation marks) can still be recognized. The following illustration shows a sample for which word beam search is able to recognize the correct text, while the other decoders fail.

decoder_comparison

Follow these instructions to integrate word beam search decoding:

  1. Clone repository CTCWordBeamSearch
  2. Compile and install by running pip install . at the root level of the CTCWordBeamSearch repository
  3. Specify the command line option --decoder wordbeamsearch when executing main.py to actually use the decoder

The dictionary is automatically created in training and validation mode by using all words contained in the IAM dataset (i.e. also including words from validation set) and is saved into the file data/corpus.txt. Further, the manually created list of word-characters can be found in the file model/wordCharList.txt. Beam width is set to 50 to conform with the beam width of vanilla beam search decoding.

Train model with IAM dataset

Follow these instructions to get the IAM dataset:

  • Register for free at this website
  • Download words/words.tgz
  • Download ascii/words.txt
  • Create a directory for the dataset on your disk, and create two subdirectories: img and gt
  • Put words.txt into the gt directory
  • Put the content (directories a01, a02, ...) of words.tgz into the img directory

Start the training

  • Delete files from model directory if you want to train from scratch
  • Go to the src directory and execute python main.py --train --data_dir path/to/IAM
  • Training stops after a fixed number of epochs without improvement

Fast image loading

Loading and decoding the png image files from the disk is the bottleneck even when using only a small GPU. The database LMDB is used to speed up image loading:

  • Go to the src directory and run createLMDB.py --data_dir path/to/IAM with the IAM data directory specified
  • A subfolder lmdb is created in the IAM data directory containing the LMDB files
  • When training the model, add the command line option --fast

The dataset should be located on an SSD drive. Using the --fast option and a GTX 1050 Ti training takes around 3h with a batch size of 500.

Information about model

The model is a stripped-down version of the HTR system I implemented for my thesis. What remains is what I think is the bare minimum to recognize text with an acceptable accuracy. It consists of 5 CNN layers, 2 RNN (LSTM) layers and the CTC loss and decoding layer. The illustration below gives an overview of the NN (green: operations, pink: data flowing through NN) and here follows a short description:

  • The input image is a gray-value image and has a size of 128x32
  • 5 CNN layers map the input image to a feature sequence of size 32x256
  • 2 LSTM layers with 256 units propagate information through the sequence and map the sequence to a matrix of size 32x80. Each matrix-element represents a score for one of the 80 characters at one of the 32 time-steps
  • The CTC layer either calculates the loss value given the matrix and the ground-truth text (when training), or it decodes the matrix to the final text with best path decoding or beam search decoding (when inferring)

nn_overview

References

Issues
  • No accuracy, GPU option, char list fixed number

    No accuracy, GPU option, char list fixed number

    I am currently trying to train the model fro detecting handwritten words in Farsi language which is in nature cursive. After some amount of time I get absolutely 0.00 acc and all the predictions from the beginning to the end remains the same. This means there happens absolutely no training for the model. I have checked every steps that I have taken and have no idea what I am doing wrong. Is it possible to train this model for another language at all?

    My second question is how can I set the training to be happening on GPU. I have a GPU and it is working fine on other training procedures on my machine but this code runs only on CPU. Is there any flag or something for this?

    My last question is for the char list number. according to the decoder on English language the length of chars set must be 79 and if this condition does not satisfies the Tensorflow throws error which as far as I traced it relates to this. How can I make this suitable for any number of chars?

    opened by rzamarefat 19
  • Doubling number of conv layers improves accuracy

    Doubling number of conv layers improves accuracy

    I'm not sure the title is tremendously surprising to anyone, but I cleared 76% word accuracy with a deeper network. More interestingly, using a deeper network and terminating around epoch 25 yields a 74-75% word accuracy model, which is better and faster than training a smaller network to the bitter end.

    screenshot from 2019-01-11 01-57-09

    Relevant code:

    		for i in range(numLayers):
    			kernel = tf.Variable(tf.truncated_normal([kernelVals[i], kernelVals[i], featureVals[i], featureVals[i + 1]], stddev=0.1))
    			conv = tf.nn.conv2d(pool, kernel, padding='SAME',  strides=(1,1,1,1))
    			conv_norm = tf.layers.batch_normalization(conv, training=self.is_train)
    			relu = tf.nn.relu(conv_norm)
    			kernel2 = tf.Variable(tf.truncated_normal([kernelVals[i], kernelVals[i], featureVals[i+1], featureVals[i + 1]], stddev=0.1))
    			conv2 = tf.nn.conv2d(relu, kernel2, padding='SAME',  strides=(1,1,1,1))
    			conv_norm2 = tf.layers.batch_normalization(conv2, training=self.is_train)
    			relu2 = tf.nn.relu(conv_norm2)
    			pool = tf.nn.max_pool(relu2, (1, poolVals[i][0], poolVals[i][1], 1), (1, strideVals[i][0], strideVals[i][1], 1), 'VALID')
    
    
    opened by Chazzz 13
  • Does wordbeamsearch allow for languages without spacing?

    Does wordbeamsearch allow for languages without spacing?

    I've currently been using SimpleHTR in my day to day work, and the WBS functionality has been very useful. The issue is this: I'm using it mainly for the Thai language, where there are no spaces between words within a sentence. Something weird happens in the below sentence. image Actual label: คุณต้องระวังอะไรเป็นพิเศษ? (the words in this sentence are: คุณ ต้อง ระวัง อะไร เป็น พิเศษ)

    image Bestpath prediction: คุณต้องระวังอะไรเป็นพิเศษ7

    image WBS prediction: คุณ.ต้อง-ระวัง%มร(ปีน9เศษ7

    As you might be able to see, the words predicted are generally correct, but there are non-word characters introduced between them as if they were stopword characters or punctuation. Is there any workaround for this? I understand that wbs was likely designed with latin languages with natural gaps between words in mind, but is there a way to not force the introduction of nonword characters as punctuation for other languages?

    opened by thetruejacob 10
  • batch normalization

    batch normalization

    Hi Harald,

    Thank you for this starter in OCR NN. Is there any reason why you didn't include batch normalization layer between conv2d and relu?

    Regards,

    enhancement 
    opened by soldierofhell 10
  • Works only for single word

    Works only for single word

    Hi, Is this model only works for single handwritten word? When the image has a collection of multiple words (a sentence) then this model gives poor/no result. But when the image has only one word like (Cat), this model works perfectly. Is my observation is correct or am I missing something here? Thanks.

    opened by IamDixit 9
  • Training base model appears to yield higher error rate than the checked-in model

    Training base model appears to yield higher error rate than the checked-in model

    Hello Harald,

    I am trying to train the model with some new IAM data, but before doing that I tried training with just the base dataset I was directed to in the README. The accuracy of the model changed from ~10% error rate in the GitHub model here, to ~43% error rate (according to accuracy.txt).

    I ran into similar problems on:

    • Tensorflow 1.10, Python 3.6.5, Ubuntu 16.04.5 LTS
    • Tensorflow 1.10, Python 3.7.2, MacOS 10.13

    I got this high error rate just by following the training instructions in the README. Hope to hear back from you soon on how you produced a more accurate model.

    Thanks for your great work!

    opened by prpaxson 8
  • "validate" option should use charList from model

    https://github.com/githubharald/SimpleHTR/blob/7f26b321f8b8b18e5f60cbc1b7d5e1ad202e7487/src/main.py#L177

    From my point of view validate should use char_list from the model instead from the loaded dataset. This way if the user uses a different dataset for validation it will work. In current implmentation, the charList depends of dataset and will create errors if the new dataset has not exactly same charlist as the learning dataset.

    opened by atsju 7
  • Model size 800*64 ? (Trained on word or line images)

    Model size 800*64 ? (Trained on word or line images)

    Hello sir , Thanks for sharing the repo with us . may you please let me know , that you trained on word images or line images (of IAM) if Model size is 800*64(Used in your thesis )(FAQ article)?? This would be great help. I tried to train on word images even after 24 Hours Word Accuracy is 1% so i thought i should try to train over line images. Kindly guide

    opened by jjsr 7
  • Assertion Error

    Assertion Error

    Traceback (most recent call last): File "main.py", line 139, in main() File "main.py", line 115, in main loader = DataLoader(FilePaths.fnTrain, Model.batchSize, Model.imgSize, Model.maxTextLen) File "/home/syedjafer/Documents/FinalYearProj/SimpleHTR/src/DataLoader.py", line 43, in init assert (len(lineSplit) >= 9) , "Working" AssertionError: Working

    opened by syedjafer 7
  • ZeroDivisionError: division by zero

    ZeroDivisionError: division by zero

    I'm trying to run python main.py --train with a new image data. But I got the error:

    Init with new values Epoch: 1 Train NN Validate NN Traceback (most recent call last): File "main.py", line 142, in main() File "main.py", line 129, in main train(model, loader) File "main.py", line 39, in train charErrorRate = validate(model, loader) File "main.py", line 84, in validate charErrorRate = numCharErr / numCharTotal ZeroDivisionError: division by zero

    How can I solve it?

    opened by helium390 6
  • No saved model found

    No saved model found

    Python: 3.6.3 |Anaconda custom (64-bit)| (default, Oct 13 2017, 12:02:49) [GCC 7.2.0] Tensorflow: 1.8.0 2018-06-25 17:43:33.775081: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

    Traceback (most recent call last): File "main.py", line 97, in infer(fnInfer) File "main.py", line 86, in infer model = Model(open(fnCharList).read(), mustRestore=True) File "/home/jatinpal/goall/SimpleHTR/src/Model.py", line 33, in init (self.sess, self.saver) = self.setupTF() File "/home/jatinpal/goall/SimpleHTR/src/Model.py", line 106, in setupTF raise Exception('No saved model found in: ' + modelDir) Exception: No saved model found in: ../model/

    can anyone suggest why I an not able to locate the saved model . My laptop details ubuntu 16.04,8GB RAM, python 3.6.3(anaconda) tensorflow 1.8.0 cv2 3.4.1

    opened by raina99 6
Owner
Harald Scheidl
Interested in computer vision, deep learning, C++ and Python.
Harald Scheidl
Handwritten Text Recognition (HTR) using TensorFlow 2.x

Handwritten Text Recognition (HTR) system implemented using TensorFlow 2.x and trained on the Bentham/IAM/Rimes/Saint Gall/Washington offline HTR data

Arthur Flôr 140 Jun 23, 2022
OCR software for recognition of handwritten text

Handwriting OCR The project tries to create software for recognition of a handwritten text from photos (also for Czech language). It uses computer vis

Břetislav Hájek 523 Jun 27, 2022
Apply different text recognition services to images of handwritten documents.

Handprint The Handwritten Page Recognition Test is a command-line program that invokes HTR (handwritten text recognition) services on images of docume

Caltech Library 98 Jun 24, 2022
This can be use to convert text in a file to handwritten text.

TextToHandwriting This can be used to convert text to handwriting. Clone this project or download the code. Run TextToImage.py give the filename of th

Ashutosh Mahapatra 2 Feb 6, 2022
Handwritten Number Recognition using CNN and Character Segmentation

Handwritten-Number-Recognition-With-Image-Segmentation Info About this repository This Repository is aimed at reading handwritten images of numbers an

Sparsha Saha 16 Jun 5, 2022
Detect handwritten words in a text-line (classic image processing method).

Word segmentation Implementation of scale space technique for word segmentation as proposed by R. Manmatha and N. Srimal. Even though the paper is fro

Harald Scheidl 178 Jun 21, 2022
Use Convolutional Recurrent Neural Network to recognize the Handwritten line text image without pre segmentation into words or characters. Use CTC loss Function to train.

Handwritten Line Text Recognition using Deep Learning with Tensorflow Description Use Convolutional Recurrent Neural Network to recognize the Handwrit

sushant097 199 Jun 18, 2022
A pure pytorch implemented ocr project including text detection and recognition

ocr.pytorch A pure pytorch implemented ocr project. Text detection is based CTPN and text recognition is based CRNN. More detection and recognition me

coura 392 Jun 16, 2022
This is used to convert a string to an Image with Handwritten Characters.

Text-to-Handwriting-using-python This is used to convert a string to an Image with Handwritten Characters. text_to_handwriting(string: str, save_to: s

Akashdeep Mahata 2 Apr 30, 2022
CNN+LSTM+CTC based OCR implemented using tensorflow.

CNN_LSTM_CTC_Tensorflow CNN+LSTM+CTC based OCR(Optical Character Recognition) implemented using tensorflow. Note: there is No restriction on the numbe

Watson Yang 354 May 11, 2022
CTPN + DenseNet + CTC based end-to-end Chinese OCR implemented using tensorflow and keras

简介 基于Tensorflow和Keras实现端到端的不定长中文字符检测和识别 文本检测:CTPN 文本识别:DenseNet + CTC 环境部署 sh setup.sh 注:CPU环境执行前需注释掉for gpu部分,并解开for cpu部分的注释 Demo 将测试图片放入test_images

Yang Chenguang 2.5k Jun 21, 2022
A Tensorflow model for text recognition (CNN + seq2seq with visual attention) available as a Python package and compatible with Google Cloud ML Engine.

Attention-based OCR Visual attention-based OCR model for image recognition with additional tools for creating TFRecords datasets and exporting the tra

Ed Medvedev 911 Jun 18, 2022
A curated list of resources for text detection/recognition (optical character recognition ) with deep learning methods.

awesome-deep-text-detection-recognition A curated list of awesome deep learning based papers on text detection and recognition. Text Detection Papers

null 2.4k Jun 28, 2022
Text recognition (optical character recognition) with deep learning methods.

What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis | paper | training and evaluation data | failure cases and cle

Clova AI Research 2.9k Jun 25, 2022
text detection mainly based on ctpn model in tensorflow, id card detect, connectionist text proposal network

text-detection-ctpn Scene text detection based on ctpn (connectionist text proposal network). It is implemented in tensorflow. The origin paper can be

Shaohui Ruan 3.3k Jun 29, 2022
Code for the paper STN-OCR: A single Neural Network for Text Detection and Text Recognition

STN-OCR: A single Neural Network for Text Detection and Text Recognition This repository contains the code for the paper: STN-OCR: A single Neural Net

Christian Bartz 489 Apr 28, 2022
OCR, Scene-Text-Understanding, Text Recognition

Scene-Text-Understanding Survey [2015-PAMI] Text Detection and Recognition in Imagery: A Survey paper [2014-Front.Comput.Sci] Scene Text Detection and

Alan Tang 352 Jun 22, 2022