Line-level Handwritten Text Recognition (HTR) system implemented with TensorFlow.

Hoàng Tùng Lâm (Linus)

Last update: May 7, 2022

Overview

Line-level Handwritten Text Recognition with TensorFlow

This model is an extended version of the Simple HTR system implemented by @Harald Scheidl that can handle a full-line text image. Huge thanks to @Harald Scheidl for his great work.

How to run

Go to the src/ directory and run python main.py with the following arguments:

Command line arguments

--train: train the NN; see below for details.
--validate: validate the NN; see below for details.
--beamsearch: use vanilla beam search decoding (better, but slower) instead of best path decoding.
--wordbeamsearch: use word beam search decoding (only outputs words contained in a dictionary) instead of best path decoding. This is a custom TF operation and must be compiled from source; see the corresponding section below for more information. It should not be used when training the NN.
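As a rough illustration of how these four flags could be parsed, here is a minimal sketch using argparse. The function names build_parser and pick_decoder are hypothetical, and the precedence of --wordbeamsearch over --beamsearch is an assumption, not something confirmed by the repository's main.py.

```python
import argparse


def build_parser():
    """Build a parser for the four flags listed above (illustrative only)."""
    parser = argparse.ArgumentParser(description="LineHTR command line sketch")
    parser.add_argument("--train", action="store_true", help="train the NN")
    parser.add_argument("--validate", action="store_true", help="validate the NN")
    parser.add_argument("--beamsearch", action="store_true",
                        help="use vanilla beam search instead of best path decoding")
    parser.add_argument("--wordbeamsearch", action="store_true",
                        help="use word beam search instead of best path decoding")
    return parser


def pick_decoder(args):
    """Map the parsed flags to a decoder name; best path is the fallback."""
    # Assumption: word beam search wins if both decoder flags are given.
    if args.wordbeamsearch:
        return "wordbeamsearch"
    if args.beamsearch:
        return "beamsearch"
    return "bestpath"


# Parse an explicit argument list instead of sys.argv so the sketch is reproducible.
example_args = build_parser().parse_args(["--train", "--beamsearch"])
print(example_args.train, pick_decoder(example_args))
```

With no decoder flag given, the sketch falls back to best path decoding, matching the descriptions above.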

I don't include any pretrained model in this branch, so you will need to train the model on your own data first.

Train model

I created this model for the Cinnamon AI Marathon 2018 competition. They released a small dataset, but it is in Vietnamese, so you may want to try another dataset such as IAM [4] for English.

Any dataset will work as long as it contains a labels.json file like this:

{
    "img1.jpg": "abc xyz",
    ...
    "imgn.jpg": "def ghi"
}

Each key is the path to an image file and each value is the ground-truth label for that image.
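To make the expected format concrete, here is a minimal sketch of reading such a file. The helper names load_labels and charset are hypothetical, and resolving image paths relative to the folder containing labels.json is an assumption; the actual data loader in the repository may behave differently.

```python
import json
import os


def load_labels(labels_path):
    """Read labels.json and return a list of (image_path, ground_truth) pairs."""
    with open(labels_path, encoding="utf-8") as f:
        labels = json.load(f)
    if not isinstance(labels, dict):
        raise ValueError("labels.json must contain a single JSON object")

    # Assumption: image paths are relative to the folder holding labels.json.
    base_dir = os.path.dirname(os.path.abspath(labels_path))
    samples = []
    for rel_path, text in labels.items():
        if not isinstance(text, str):
            raise ValueError(f"label for {rel_path!r} is not a string")
        samples.append((os.path.join(base_dir, rel_path), text))
    return samples


def charset(samples):
    """Collect the sorted set of characters that occur in the labels."""
    return sorted({ch for _, text in samples for ch in text})
```

Checking the character set up front is a cheap way to spot unexpected symbols in the labels before a long training run.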

Training is visualized with TensorBoard. I tracked the character error rate, word error rate and sentence accuracy for this model. All logs are saved in the ./logs/ folder. You can start a TensorBoard session to view the logs with this command: tensorboard --logdir='./logs/'
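For readers unfamiliar with these three metrics, the sketch below shows one common way to compute them from (ground truth, prediction) pairs using edit distance. The function names are hypothetical, and the exact normalization used in this repository (for example, how whitespace is handled) is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def evaluate(pairs):
    """Return CER, WER and sentence accuracy over (truth, prediction) pairs."""
    char_errs = char_total = word_errs = word_total = exact = 0
    for truth, pred in pairs:
        char_errs += edit_distance(truth, pred)
        char_total += len(truth)
        truth_words, pred_words = truth.split(), pred.split()
        word_errs += edit_distance(truth_words, pred_words)
        word_total += len(truth_words)
        exact += int(truth == pred)
    return {
        "cer": char_errs / max(char_total, 1),
        "wer": word_errs / max(word_total, 1),
        "sentence_accuracy": exact / max(len(pairs), 1),
    }


print(evaluate([("abc xyz", "abc xyz"), ("def ghi", "def gha")]))
```

Errors are accumulated over the whole set before dividing, so long lines weigh more than short ones; averaging per-line rates instead is an equally plausible convention.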

It took me about 48 hours with about 13k images on a single GTX 1060 6GB to get down to 0.16 CER on the private test set of the competition.

Information about model

Overview

The model is an extended version of the Simple HTR system that @Harald Scheidl implemented. It consists of 7 CNN layers, 2 RNN (Bi-LSTM) layers, and the CTC loss and decoding layer, and it can handle a full-line text image.

The input image is a gray-value image and has a size of 800x64
7 CNN layers map the input image to a feature sequence of size 100x512
2 LSTM layers with 512 units propagate information through the sequence and map the sequence to a matrix of size 100x205. Each matrix-element represents a score for one of the 205 characters at one of the 100 time-steps
The CTC layer either calculates the loss value given the matrix and the ground-truth text (when training), or it decodes the matrix to the final text with best path decoding or beam search decoding (when inferring)
Batch size is set to 50
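The best path decoding step mentioned above can be sketched in plain Python: take the highest-scoring class at each time-step, collapse consecutive repeats, then drop the CTC blank. The function name best_path_decode is hypothetical, and placing the blank at the last class index is an assumption made for illustration; this is not the repository's TensorFlow implementation.

```python
def best_path_decode(scores, charset, blank_index=None):
    """Greedy CTC decoding of a time-steps x classes score matrix.

    scores  -- list of rows, one row of class scores per time-step
    charset -- string or list mapping class index to character
    """
    if blank_index is None:
        # Assumption: the CTC blank is the last class index.
        blank_index = len(charset)

    decoded = []
    previous = None
    for step in scores:
        best = max(range(len(step)), key=step.__getitem__)
        # Emit only when the label changes and the label is not the blank.
        if best != previous and best != blank_index:
            decoded.append(charset[best])
        previous = best
    return "".join(decoded)


# Toy example with 2 characters plus the blank, and 6 time-steps.
toy_scores = [
    [0.9, 0.0, 0.1],  # a
    [0.8, 0.1, 0.1],  # a (repeat, collapsed)
    [0.1, 0.1, 0.8],  # blank (separates the two a's)
    [0.7, 0.2, 0.1],  # a
    [0.1, 0.8, 0.1],  # b
    [0.2, 0.7, 0.1],  # b (repeat, collapsed)
]
print(best_path_decode(toy_scores, "ab"))
```

In the model described above, the matrix would have 100 rows and 205 columns; the toy matrix only keeps the example readable.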

The highest accuracy achieved is 0.84 on the private test set of the Cinnamon AI Marathon 2018 competition (measured via Character Error Rate, CER; this corresponds to the 0.16 CER reported above).

Improve accuracy

If you need better accuracy, here are some ideas for improving it [2]:

Data augmentation: increase dataset-size by applying further (random) transformations to the input images. At the moment, only random distortions are performed.
Remove cursive writing style in the input images (see DeslantImg).
Increase input size.
Add more CNN layers or use transfer learning on CNN.
Replace Bi-LSTM by 2D-LSTM.
Replace optimizer: Adam improves the accuracy, however, the number of training epochs increases (see discussion).
Decoder: use token passing or word beam search decoding [3] (see CTCWordBeamSearch) to constrain the output to dictionary words.
Text correction: if the recognized word is not contained in a dictionary, search for the most similar one.
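The last idea, text correction, can be sketched with the standard library alone. The example below replaces each out-of-dictionary word with its closest dictionary entry using difflib. The function name correct_text and the 0.6 similarity cutoff are assumptions chosen for illustration, not part of this repository.

```python
import difflib


def correct_text(text, dictionary, cutoff=0.6):
    """Replace words missing from the dictionary with their closest match.

    dictionary -- list of known words
    cutoff     -- minimum similarity (0..1) required to accept a replacement
    """
    known = set(dictionary)
    corrected = []
    for word in text.split():
        if word in known:
            corrected.append(word)
            continue
        matches = difflib.get_close_matches(word, dictionary, n=1, cutoff=cutoff)
        # Keep the recognized word when nothing is similar enough.
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)


print(correct_text("helo wrld", ["hello", "world", "line"]))
```

A higher cutoff makes the correction more conservative, which matters for names and numbers that legitimately fall outside the dictionary.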

By the way, don't hesitate to ask me anything via a GitHub issue (see the issue template file for more details).

Also, a big shout-out to Sushant Gautam for extending this code to the IAM dataset. He even provides a pretrained model and a web UI for running inference with the model. Don't forget to check out his repo.

References

[1] Build a Handwritten Text Recognition System using TensorFlow

[2] Scheidl - Handwritten Text Recognition in Historical Documents

[3] Scheidl - Word Beam Search: A Connectionist Temporal Classification Decoding Algorithm

[4] Marti - The IAM-database: an English sentence database for offline handwriting recognition

Comments

Keras implementation help
Versions

TensorFlow version = 1.15

Python version = 3.7

Operating system = archlinux

Issue

Hi, I am trying to implement this with Keras, but I am stuck on the LSTM and CTC layers. Here is my model summary; please have a look at it.
opened by shekarneo 12
Error while using the model with CTC wordbeamsearch

TensorFlow version: 1.8
Python version: 3.7
Operating system: Ubuntu 18.04 LTS

I get the following error after copying the tfwordbeamsearch.so file of the CTC repository into the src directory of your repository.

Traceback (most recent call last):
  File "main.py", line 194, in <module>
    main()
  File "main.py", line 178, in main
    decoderType, mustRestore=False)
  File "/home/niti/Desktop/HTR/HTR/src/Model.py", line 36, in __init__
    self.setupCTC()
  File "/home/niti/Desktop/HTR/HTR/src/Model.py", line 177, in setupCTC
    chars = codecs.open(FilePaths.wordCharList.txt, 'r').read()
AttributeError: type object 'FilePaths' has no attribute 'wordCharList'

Also, is there any way I can recognise characters like ./,!=+- etc. with CTC, rather than word characters only? And can this be trained on Google Colab?

opened by NitiKaur 2
Data Augmentation

Hey @lamhoangtung, I am making a doctor's prescription recogniser. I am using @githubharald's SimpleHTR along with some additional preprocessing steps. The images are word images, and I am currently getting 68% accuracy on my dataset. I came across a paper (paperlink) that says data augmentation will help. Do you have any information on whether it would be useful, or any tips on what can be done?

P.S. I am currently retraining the model with a 7-layer CNN, like yours, first on the IAM and RIMES datasets and then fine-tuning it on the doctor dataset. Also, I have preprocessed the word images just as @githubharald described.

opened by jalotra 2
About sentence accuracy?

I used your code, which you said achieves 0.84 (CER). When I run it, the CER is indeed 0.81 ~ 0.84, but the sentence error rate is very low and the model appears to be overfitting. Could you elaborate on my problem? Thank you for reading. Thai Hoc

opened by NguyenThaiHoc1 2
Hey, I'm unable to run the main.py file without errors. I'm uploading a snapshot of my error; please help me out with this.

Error while running the main.py file:

Traceback (most recent call last):
  File "main.py", line 173, in <module>
    main()
  File "main.py", line 169, in main
    infer(model, FilePaths.fnInfer)
  File "main.py", line 126, in infer
    img = preprocessor(fnImg, model.imgSize, binary=True)
  File "C:\Users\Lenovo\PycharmProjects\LineHTR\src\helpers.py", line 19, in preprocessor
    img = np.int16(img)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

opened by PavanKumarReddy-B 0
Sourcery Starbot ⭐ refactored lamhoangtung/LineHTR
Thanks for starring sourcery-ai/sourcery ✨ 🌟 ✨

Here's your pull request refactoring your most popular Python repo.

If you want Sourcery to refactor all your Python repos and incoming pull requests install our bot.

Review changes via command line

To manually merge these changes, make sure you're on the master branch, then run:

git fetch https://github.com/sourcery-ai-bot/LineHTR master
git merge --ff-only FETCH_HEAD
git reset HEAD^
opened by sourcery-ai-bot 0
Evaluation
Hello. Thank you for your efficient work. I read your README and created a labels.json that contains 'img_path': 'context' entries, using the IAM offline lines dataset as input data. But I don't know why the result is very bad. This is my system information:

Versions

TensorFlow version: 1.x

Python version: 3.7.12

Operating system: Google Colab

I'm a newbie researching this domain, so could you please help me find which part I did wrong? Thank you in advance.
opened by AICampB4 1
maxTextLen cannot be increased to be > 100
Please have a look at the FAQ section in the README - maybe your question is already answered there. Only issues concerning the repositories code will be answered. The following questions will not be answered:

How to convert dataset X into IAM format?

How to modify the model to recognize text-lines/more characters/...?

General/theoretical questions regarding (handwritten) text recognition.

If you create a new issue, please provide the following information:

Versions

TensorFlow version 2.4.0

Python version 3.6.9

Operating system Red Hat Enterprise Linux Server 7.9

Issue

Which result/error did you get?

ValueError: Paddings must be non-negative for '{{node Optimizer/gradients/RNN/Slice_grad/Pad}} = Pad[T=DT_FLOAT, Tpaddings=DT_INT32](Optimizer/gradients/RNN/Squeeze_grad/Reshape, Optimizer/gradients/RNN/Slice_grad/concat)' with input shapes: [50,102,1,512], [4,2] and with computed input tensors: input[1] = <[0 0][0 -2][0 3][0 0]>.

If you think the result is wrong - what result did you expect instead?

How to reproduce the issue?

Set maxTextLen = 102 and replace 100 in setupRNN with Model.maxTextLen.

Provide all necessary data
opened by marijakotur 0
The text is not output correctly
Versions

TensorFlow version=1.14.0

Python version=3.7

Operating system=windows 10

When I set up the project and run it with the sample test image, I get the output in an encoded format, for example: Recognized: "zUA". How can I solve this issue?
opened by KeerthanaSivaraj 1
When a custom training set is given, the model is not learning: character error rate, line accuracy and word error rate are always 0.000%. How can this be solved? (TF 1.8)
opened by nitindantu 1

Owner

Hoàng Tùng Lâm (Linus)

AI Researcher/Engineer at Techainer

GitHub https://towardsdatascience.com/2326a3487cd5
