Line-level Handwritten Text Recognition (HTR) system implemented with TensorFlow.

Overview

Line-level Handwritten Text Recognition with TensorFlow

This model is an extended version of the Simple HTR system implemented by @Harald Scheidl that can handle an image of a full line of text. Huge thanks to @Harald Scheidl for his great work.

How to run

Go to the src/ directory and run python main.py with the following arguments:

Command line arguments

  • --train: train the NN; see below for details.
  • --validate: validate the NN; see below for details.
  • --beamsearch: use vanilla beam search decoding (better, but slower) instead of best path decoding.
  • --wordbeamsearch: use word beam search decoding (only outputs words contained in a dictionary) instead of best path decoding. This is a custom TF operation and must be compiled from source; see the corresponding section below for more information. It should not be used when training the NN.
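
For illustration, flags like these could be wired up with argparse roughly as follows. This is a minimal sketch and not the repository's actual main.py: the names build_parser and choose_decoder and the decoder label strings are hypothetical; only the four flag names come from the list above.

```python
import argparse


def build_parser():
    """Build a parser exposing the four documented flags (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Line-level HTR (illustrative sketch)")
    parser.add_argument("--train", action="store_true", help="train the NN")
    parser.add_argument("--validate", action="store_true", help="validate the NN")
    parser.add_argument("--beamsearch", action="store_true",
                        help="use vanilla beam search instead of best path decoding")
    parser.add_argument("--wordbeamsearch", action="store_true",
                        help="use word beam search instead of best path decoding")
    return parser


def choose_decoder(args):
    """Pick a decoder label; the label strings are assumptions for this sketch."""
    if args.wordbeamsearch:
        if args.train:
            # The README states word beam search should not be used for training.
            raise ValueError("--wordbeamsearch should not be combined with --train")
        return "wordbeamsearch"
    if args.beamsearch:
        return "beamsearch"
    # With no decoder flag, fall back to best path decoding.
    return "bestpath"
```

Under these assumptions, typical invocations would be python main.py --train to train, and python main.py --validate --beamsearch to validate with beam search decoding.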

I don't include any pretrained model in this branch, so you will need to train the model on your own data first.

Train model

I created this model for the Cinnamon AI Marathon 2018 competition. They released a small dataset, but it is in Vietnamese, so you may want to try another dataset such as IAM [4] for English.

As long as your dataset contains a labels.json file like this:

{
    "img1.jpg": "abc xyz",
    ...
    "imgn.jpg": "def ghi"
}

where each key is the path to an image file and each value is the ground-truth label for that image, this code will work fine.
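
If you are preparing your own dataset, a small helper along these lines can write and sanity-check a labels.json file. It is only a sketch: the function names load_labels and write_labels are hypothetical, and resolving relative image paths against the folder containing labels.json is an assumption, not necessarily what this repository does.

```python
import json
import os


def write_labels(labels_path, labels):
    """Write a {image_path: text} dict as labels.json.

    ensure_ascii=False keeps non-ASCII labels (e.g. Vietnamese) readable.
    """
    with open(labels_path, "w", encoding="utf-8") as f:
        json.dump(labels, f, ensure_ascii=False, indent=4)


def load_labels(labels_path):
    """Load labels.json and return a list of (image_path, text) pairs.

    Assumption: relative image paths are resolved against the folder that
    contains labels.json.
    """
    with open(labels_path, encoding="utf-8") as f:
        labels = json.load(f)
    if not isinstance(labels, dict):
        raise ValueError("labels.json must contain a single JSON object")

    base_dir = os.path.dirname(os.path.abspath(labels_path))
    samples = []
    for img_path, text in labels.items():
        if not isinstance(text, str):
            raise ValueError(f"label for {img_path!r} is not a string")
        samples.append((os.path.join(base_dir, img_path), text))
    return samples
```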

Learning is visualized with TensorBoard. I tracked the character error rate, word error rate and sentence accuracy for this model. All logs are saved in the ./logs/ folder. You can start a TensorBoard session to view the logs with this command: tensorboard --logdir='./logs/'
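
For reference, the three tracked metrics can be computed from pairs of ground-truth and recognized text as sketched below. This uses the common definitions (edit distance divided by reference length, accumulated over the dataset); the function names are hypothetical and the repository's own computation may differ in details such as normalization.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rates(references, hypotheses):
    """Return (CER, WER, sentence accuracy) accumulated over a whole dataset."""
    char_errors = char_total = word_errors = word_total = exact = 0
    for ref, hyp in zip(references, hypotheses):
        char_errors += edit_distance(ref, hyp)
        char_total += len(ref)
        word_errors += edit_distance(ref.split(), hyp.split())
        word_total += len(ref.split())
        exact += int(ref == hyp)
    cer = char_errors / max(char_total, 1)
    wer = word_errors / max(word_total, 1)
    return cer, wer, exact / max(len(references), 1)
```

Note that a single wrong character makes the whole sentence count as wrong, so sentence accuracy can be much lower than 1 - CER would suggest.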

It took me about 48 hours of training on about 13k images with a single GTX 1060 6GB to get down to 0.16 CER on the private test set of the competition.

Information about model

Overview

The model is an extended version of the Simple HTR system that @Harald Scheidl implemented. It consists of 7 CNN layers, 2 RNN (Bi-LSTM) layers, and a CTC loss and decoding layer, and it can handle an image of a full line of text.

  • The input image is a gray-value image and has a size of 800x64
  • 7 CNN layers map the input image to a feature sequence of size 100x512
  • 2 LSTM layers with 512 units propagate information through the sequence and map the sequence to a matrix of size 100x205. Each matrix-element represents a score for one of the 205 characters at one of the 100 time-steps
  • The CTC layer either calculates the loss value given the matrix and the ground-truth text (when training), or it decodes the matrix to the final text with best path decoding or beam search decoding (when inferring)
  • Batch size is set to 50
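
To make the decoding step concrete, here is a minimal pure-Python sketch of best path (greedy) CTC decoding for one sample: take the best class at each time step, collapse repeated classes, then drop blanks. The function name is hypothetical, and placing the blank at the last class index is an assumption that follows the convention of TensorFlow's CTC ops. In the model itself, TensorFlow's built-in decoders would be used on the 100x205 matrix instead.

```python
def best_path_decode(scores, charset):
    """Greedy CTC decoding for one sample.

    scores:  list of T rows, each holding one score per class (T x C).
    charset: string with the C - 1 real characters; the blank is assumed
             to be the last class index (len(charset)).
    """
    blank_index = len(charset)
    # 1. Pick the highest-scoring class at every time step.
    best = [max(range(len(row)), key=row.__getitem__) for row in scores]
    # 2. Collapse consecutive repeats and 3. drop blanks.
    decoded = []
    prev = None
    for idx in best:
        if idx != prev and idx != blank_index:
            decoded.append(charset[idx])
        prev = idx
    return "".join(decoded)
```

A blank between two identical characters is what allows doubled letters: the path "a a blank a b b blank" decodes to "aab".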

The highest accuracy achieved is 0.84 on the private test set of the Cinnamon AI Marathon 2018 competition (measured via Character Error Rate, CER; this matches the 0.16 CER reported in the training section).

Improve accuracy

If you need better accuracy, here are some ideas on how to improve it [2]:

  • Data augmentation: increase dataset-size by applying further (random) transformations to the input images. At the moment, only random distortions are performed.
  • Remove cursive writing style in the input images (see DeslantImg).
  • Increase input size.
  • Add more CNN layers or use transfer learning on CNN.
  • Replace Bi-LSTM by 2D-LSTM.
  • Replace optimizer: Adam improves the accuracy; however, the number of training epochs increases (see discussion).
  • Decoder: use token passing or word beam search decoding [3] (see CTCWordBeamSearch) to constrain the output to dictionary words.
  • Text correction: if the recognized word is not contained in a dictionary, search for the most similar one.
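
The text-correction idea from the list above can be sketched with the standard library alone. This is an illustration, not code from the repository: correct_word and correct_line are hypothetical names, and difflib's similarity ratio stands in for whatever similarity measure you prefer (an edit distance would work as well).

```python
import difflib


def correct_word(word, dictionary, cutoff=0.6):
    """Return word unchanged if it is in the dictionary, else the closest entry.

    If no dictionary entry reaches the similarity cutoff, the word is kept
    exactly as it was recognized.
    """
    if word in dictionary:
        return word
    matches = difflib.get_close_matches(word, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_line(text, dictionary):
    """Apply correct_word to every whitespace-separated token of a text line."""
    return " ".join(correct_word(w, dictionary) for w in text.split())
```

For example, with the dictionary ["hello", "help", "world"], the recognized line "helo wrld" is corrected to "hello world". The cutoff is a trade-off: a low value fixes more errors but may also replace correct out-of-dictionary words such as names.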

By the way, don't hesitate to ask me anything via a GitHub issue (see the issue template file for more details).

Also, a big shout-out to Sushant Gautam for extending this code to the IAM dataset. He even provides a pretrained model and a web UI for running inference with the model. Don't forget to check his repo out.

References

[1] Build a Handwritten Text Recognition System using TensorFlow

[2] Scheidl - Handwritten Text Recognition in Historical Documents

[3] Scheidl - Word Beam Search: A Connectionist Temporal Classification Decoding Algorithm

[4] Marti - The IAM-database: an English sentence database for offline handwriting recognition

Comments
  • Keras implementation help

    1. Versions
    • TensorFlow version = 1.15
    • Python version = 3.7
    • Operating system = archlinux
    2. Issue

    Hi, I am trying to implement this with Keras, but I am stuck on the LSTM and CTC layers. Here is my model summary; please have a look at it.

    model

    opened by shekarneo 12
  • error while using the model with CTC wordbeamsearch

    TensorFlow version 1.8, Python version 3.7, Ubuntu 18.04 LTS

    I get the following error after copying the tfwordbeamsearch.so file of the CTC repository into the src directory of your repository:

    Traceback (most recent call last):
      File "main.py", line 194, in <module>
        main()
      File "main.py", line 178, in main
        decoderType, mustRestore=False)
      File "/home/niti/Desktop/HTR/HTR/src/Model.py", line 36, in __init__
        self.setupCTC()
      File "/home/niti/Desktop/HTR/HTR/src/Model.py", line 177, in setupCTC
        chars = codecs.open(FilePaths.wordCharList.txt, 'r').read()
    AttributeError: type object 'FilePaths' has no attribute 'wordCharList'

    Also, is there any way to recognise characters like ./,!=+- etc. with CTC, instead of word characters only? And can this be trained on Google Colab?

    opened by NitiKaur 2
  • Data Augmentation

    Hey @lamhoangtung, I am making a doctor prescription recogniser. I am using @githubharald's SimpleHTR along with some additional preprocessing steps. The images are word images, and I am currently getting 68% accuracy on my dataset. I came across a paper (paperlink) that says data augmentation will help. Do you know whether it would be useful, or do you have any tips on what can be done?

    P.S. I am currently re-training the model with a 7-layer CNN, just like yours, first on the IAM and RIMES datasets and then fine-tuning it on the doctor dataset. Also, I have preprocessed the word images just as @githubharald described.

    opened by jalotra 2
  • About sentence accuracy?

    I used your code, which you said achieved 0.84 (CER). When I run it, the CER is indeed 0.81 ~ 0.84, but the sentence error rate is very low, and the model appears to be overfitting. Could you elaborate on my problem? Thank you for reading. Thai Hoc

    opened by NguyenThaiHoc1 2
  • Hey, I'm unable to run the main.py file without errors. I'm uploading a snapshot of my error; please help me out with this.

    Error while running the main.py file:

    Traceback (most recent call last):
      File "main.py", line 173, in <module>
        main()
      File "main.py", line 169, in main
        infer(model, FilePaths.fnInfer)
      File "main.py", line 126, in infer
        img = preprocessor(fnImg, model.imgSize, binary=True)
      File "C:\Users\Lenovo\PycharmProjects\LineHTR\src\helpers.py", line 19, in preprocessor
        img = np.int16(img)
    TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

    opened by PavanKumarReddy-B 0
  • Evaluation

    Hello. Thank you for your efficient work. I read your README and created a labels.json that contains 'img_path': 'context' pairs, and I used the IAM offline lines dataset as the input data. But I don't know why the result is very bad: image

    This is my system information:

    1. Versions
    • TensorFlow version: 1.x
    • Python version: 3.7.12
    • Operating system: Google Colab

    I'm a newbie researching this domain, so could you please help me find which part I did wrong? Thank you in advance.
    opened by AICampB4 1
  • maxTextLen cannot be increased to be > 100

    1. Versions
    • TensorFlow version 2.4.0
    • Python version 3.6.9
    • Operating system Red Hat Enterprise Linux Server 7.9
    2. Issue
    • Which result/error did you get? ValueError: Paddings must be non-negative for '{{node Optimizer/gradients/RNN/Slice_grad/Pad}} = Pad[T=DT_FLOAT, Tpaddings=DT_INT32](Optimizer/gradients/RNN/Squeeze_grad/Reshape, Optimizer/gradients/RNN/Slice_grad/concat)' with input shapes: [50,102,1,512], [4,2] and with computed input tensors: input[1] = <[0 0][0 -2][0 3][0 0]>.
    • How to reproduce the issue? Set maxTextLen = 102 and replace 100 in setupRNN with Model.maxTextLen.
    opened by marijakotur 0
  • Getting an issue where the text is not output correctly

    Versions

    • TensorFlow version=1.14.0
    • Python version=3.7
    • Operating system=windows 10

    When I set up the project and run it with the sample test image, I get the output in an encoded format, e.g. Recognized: "zUA". How can I solve this issue?

    opened by KeerthanaSivaraj 1
  • When a custom training set is given, the model is not learning: character error rate, line accuracy and word error rate are always 0.000%. How can this be solved? (TF 1.8)

    opened by nitindantu 1
Owner
Hoàng Tùng Lâm (Linus)
AI Researcher/Engineer at Techainer