This repository lets you train neural network models to perform end-to-end, full-page handwriting recognition on the IAM Dataset using the Apache MXNet deep learning framework.

Overview

Handwritten Text Recognition (OCR) with MXNet Gluon

These notebooks have been created by Jonathan Chung, as part of his internship as Applied Scientist @ Amazon AI, in collaboration with Thomas Delteil who built the original prototype.

Setup

git clone https://github.com/awslabs/handwritten-text-recognition-for-apache-mxnet --recursive

You need to install SCLITE for WER evaluation. You can do so by running the following bash script from this folder:

cd ..
git clone https://github.com/usnistgov/SCTK
cd SCTK
export CXXFLAGS="-std=c++11" && make config
make all
make check
make install
make doc
cd -
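
For intuition about the metric SCLITE reports, word error rate (WER) is the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition only; it is not SCLITE's implementation, and the function name `word_error_rate` is an assumption made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Two substituted words and one missing word against a four-word reference.
    print(word_error_rate("he waved for him", "he woved forhim"))
```

A missed space such as "forhim" counts as one substitution plus one deletion at the word level, which is why segmentation mistakes weigh heavily on WER.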

You also need hnswlib:

pip install pybind11 numpy setuptools
cd ..
git clone https://github.com/nmslib/hnswlib
cd hnswlib/python_bindings
python setup.py install
cd ../..

If "AssertionError: Please enter credentials for the IAM dataset in credentials.json or as arguments" occurs, rename credentials.json.example to credentials.json and fill in your username and password.
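
The same step can be scripted. The following is a minimal sketch, assuming that credentials.json.example is a JSON object and that the two fields are named `username` and `password`; check the example file for the actual key names. The helper name `write_credentials` is hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def write_credentials(example_path, output_path, username, password):
    """Copy the example credentials file, filling in the two login fields."""
    example_path, output_path = Path(example_path), Path(output_path)
    # Start from the example so that any other keys it contains are preserved.
    credentials = json.loads(example_path.read_text(encoding="utf-8"))
    # Assumed key names; verify them against credentials.json.example.
    credentials["username"] = username
    credentials["password"] = password
    output_path.write_text(json.dumps(credentials, indent=2) + "\n", encoding="utf-8")
    return credentials
```

For example, `write_credentials("credentials.json.example", "credentials.json", "my_user", "my_password")` would write the file next to the example.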

Overview

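The pipeline is composed of 3 steps, each detailed in the Appendix:

  • Handwritten area detection
  • Line detection
  • Handwritten text recognition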

The entire inference pipeline can be found in this notebook. See the pretrained models section for the pretrained models.

A recorded talk detailing the approach is available on YouTube. [video]

The corresponding slides are available on SlideShare. [slides]

Pretrained models:

You can get the models by running python get_models.py.
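
Before opening the notebooks, it can help to confirm that the download actually produced the parameter files. The sketch below assumes the files land in a `models` directory and checks only two file names that the notebooks are reported to load (`paragraph_segmentation2.params` and `denoiser2.params`); the full list fetched by get_models.py may differ, and the helper `missing_models` is hypothetical.

```python
from pathlib import Path

# Two parameter files the notebooks load; this is not an exhaustive list
# of what get_models.py downloads.
EXPECTED_MODELS = ("paragraph_segmentation2.params", "denoiser2.params")


def missing_models(model_dir="models", expected=EXPECTED_MODELS):
    """Return the expected model files that are not present in model_dir."""
    model_dir = Path(model_dir)
    return [name for name in expected if not (model_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_models()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All expected model files are present.")
```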

Sample results

The greedy, lexicon search, and beam search outputs give similar and reasonable predictions for the selected examples. Figure 6 presents some interesting cases. The first line of Figure 6 shows cases where the lexicon search algorithm corrected misspelled words. In the top example, “tovely” (as it was written) was corrected to “lovely” and “woved” was corrected to “waved”. In addition, the beam search output corrected “a” into “all”, but it missed a space between “lovely” and “things”. In the second example, the lexicon search output converted “selt” to “salt”, whereas the beam search output erroneously converted it to “self”, so beam search performed worse in this example. In the third example, none of the three methods produced comprehensible results. Finally, in the fourth example, the lexicon search algorithm incorrectly converted “forhim” into “forum”, whereas the beam search algorithm correctly identified “for him”.
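
To make the idea of a lexicon correction concrete, the sketch below replaces an out-of-vocabulary word with its closest entry in a small word list. It uses the standard library's `difflib` similarity as a stand-in and is not the lexicon search implemented in this repository; the function name `lexicon_correct`, the toy lexicon, and the `cutoff` value are assumptions for illustration.

```python
import difflib


def lexicon_correct(word, lexicon, cutoff=0.6):
    """Return the closest lexicon entry to word, or word itself if none is close."""
    if word in lexicon:
        return word
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
    matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word


if __name__ == "__main__":
    lexicon = ["lovely", "waved", "things", "salt", "him"]
    for written in ["tovely", "woved", "selt", "xqzj"]:
        print(written, "->", lexicon_correct(written, lexicon))
```

A purely word-level lookup like this has no sentence context, which is consistent with the “forhim” to “forum” error described above: the nearest dictionary word is not always the intended one.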

Dataset:

  • To use test_iam_dataset.ipynb, create credentials.json from credentials.json.example and edit the appropriate fields. The username and password can be obtained from http://www.fki.inf.unibe.ch/DBs/iamDB/iLogin/index.php.

  • An instance with 32GB+ RAM and 100GB of disk space is recommended, as is a GPU. A p3.2xlarge is the recommended starter instance on AWS for this project.

Appendix

1) Handwritten area

Model architecture

Results

2) Line Detection

Model architecture

Results

3) Handwritten text recognition

Model architecture

Results

Comments
  • Kernel dying

    Kernel dying

    The kernel dies while downloading and processing the IAM dataset:

    Kernel Restarting: The kernel for projects/DocByte/handwriting_recog/amazon_HWR/handwritten-text-recognition-for-apache-mxnet/0_handwriting_ocr.ipynb appears to have died. It will restart automatically.

    This happens when running the line test_ds = IAMDataset("form_original", train=False).

    opened by dee6600 35
  • OSError

    OSError

    I get an OSError while executing the dataset creation step: train_ds = IAMDataset("form", output_data="bb", output_parse_method="form", train=True)

    opened by JPremnath06 11
  • Downloading gbw dataset crashes machine

    Downloading gbw dataset crashes machine

    The code below, from the 'Denoising text output' section of '0_handwriting_ocr.ipynb', crashes the system for an unknown reason when run on a machine with 35.35GB RAM and 107.77GB disk space (Google Colab TPU session).

    import mxnet as mx
    import gluonnlp as nlp

    # Load the language model pretrained on the gbw dataset onto GPU 3
    ctx_nlp = mx.gpu(3)
    language_model, vocab = nlp.model.big_rnn_lm_2048_512(dataset_name='gbw', pretrained=True, ctx=ctx_nlp)
    moses_tokenizer = nlp.data.SacreMosesTokenizer()
    moses_detokenizer = nlp.data.SacreMosesDetokenizer()
    

    How do I download this dataset without crashing the machine? Also, I don't want to download it again next time, so can I save this dataset too?

    opened by naveen-marthala 11
  • AssertionError: Shape of params are incompatible

    AssertionError: Shape of params are incompatible

    Hi again,

    I am a bit confused about this error, which happens in the 4_text_denoising notebook. I followed every previous step, but the dimensions do not match. Can you explain why this is happening?

    
    AssertionError                            Traceback (most recent call last)
    <ipython-input-31-2a0e848a57c4> in <module>
          1 model_path = 'models/denoiser2.params'
          2 if (os.path.isfile(model_path)):
    ----> 3     net.load_parameters(model_path, ctx=ctx)
          4     print("Loaded parameters")
          5     best_test_loss = evaluate(net, val_data_ft)
    
    /usr/local/lib/python3.6/dist-packages/mxnet/gluon/block.py in load_parameters(self, filename, ctx, allow_missing, ignore_extra, cast_dtype, dtype_source)
        553                         name, filename, _brief_print_list(self._params.keys())))
        554             if name in params:
    --> 555                 params[name]._load_init(loaded[name], ctx, cast_dtype=cast_dtype, dtype_source=dtype_source)
        556 
        557     def load_params(self, filename, ctx=None, allow_missing=False,
    
    /usr/local/lib/python3.6/dist-packages/mxnet/gluon/parameter.py in _load_init(self, data, ctx, cast_dtype, dtype_source)
        280                     "Failed loading Parameter '%s' from saved params: " \
        281                     "shape incompatible expected %s vs saved %s"%(
    --> 282                         self.name, str(self.shape), str(data.shape))
        283             self.shape = tuple(i if i != unknown_dim_size else j
        284                                for i, j in zip(self.shape, data.shape))
    
    AssertionError: Failed loading Parameter 'transformer_enc_const' from saved params: shape incompatible expected (150, 512) vs saved (150, 256)
    
    
    opened by jbuehler1337 7
  • Train on IAMDataset

    Train on IAMDataset "Word" Crashes the code

    The 3_handwriting_recognition.py works fine with IAMDataset("line", output_data="text", train=True) but crashes when using the word IAMDataset. Specifically, doing this crashes the code.

    train_ds = IAMDataset("word", output_data="text", train=True)
    print("Number of training samples: {}".format(len(train_ds)))
    
    test_ds = IAMDataset("word", output_data="text", train=False)
    print("Number of testing samples: {}".format(len(test_ds)))
    

    Gives: mxnet.base.MXNetError: Shape inconsistent, Provided = [13320192], inferred shape=(8863744,)

    opened by srikar2097 7
  • How to compute labelling probability after prediction?

    How to compute labelling probability after prediction?

    Hi,

    Thanks a lot for this wonderful piece of work.

    I am trying to calculate the CTC loss to compute the labelling probability after prediction. Please advise whether this is possible.

    Thanks

    opened by shrinidt 7
  • Number of samples equals zero

    Number of samples equals zero

    Hi @jonomon, the number of training and testing samples comes out as zero when running the following code in 2_line_word_segmentation.ipynb:

    train_ds = IAMDataset("form_bb", output_data="bb", output_parse_method=detection_box, train=True)
    print("Number of training samples: {}".format(len(train_ds)))
    

    I'm working on Colab and have included both the extracted files and the .tgz files in dataset/iamdataset. I can't seem to figure out the issue here. Could you point me in the right direction?

    Thanks, Sambbhav

    opened by sambbhavgarg 6
  • How to make predictions from pre-trained models?

    How to make predictions from pre-trained models?

    Good work, thanks. I am using the pre-trained models to get text from images. While going through the code, I learned that the format of my test images has to match the format that the 'IAMDataset' class in 'ocr.utils.iam_dataset' outputs. How do I modify the 'IAMDataset' class to convert an input image into the test dataframe format that this class outputs, or how do I get a dataframe for images other than the ones in the IAM dataset? I couldn't understand this class completely, so if anyone has worked on this, please help me solve it.

    opened by naveen-marthala 6
  • MXNetError: [17:38:24] C:\Jenkins\workspace\mxnet-tag\mxnet\3rdparty\dmlc-core\src\io\local_filesys.cc:209: Check failed:

    MXNetError: [17:38:24] C:\Jenkins\workspace\mxnet-tag\mxnet\3rdparty\dmlc-core\src\io\local_filesys.cc:209: Check failed:

    When executing the code in the handwriting notebook, I get an error in the segmentation step, shown below: LocalFileSystem::Open "models/paragraph_segmentation2.params": No such file or directory

    However, when I check that location, I have the file resnet34_v1-48216ba9.params, and I still get the same error.

    Could someone help me understand this error?

    opened by LauraOrozco 6
  • regions of text not being detected properly

    regions of text not being detected properly

    Here is my input image in color: new paragraph image

    1. The detection produced by the pre-trained models can be seen below, when the above image is converted from (0) RGB to grayscale and (1) BGR to grayscale. This happened at paragraph segmentation in the '0_handwriting_ocr.ipynb' notebook, and I have made no changes to the code from that notebook: image
    2. Because of this improper region detection, areas that actually contain text are being cropped out. Also, when I try a form size other than what is in the code (because my images have a smaller aspect ratio), that is form_size = (1120, 800), the computer crashes. What is causing this, and how can I prevent it?
    3. Presumably because of the improper detection above, or maybe because line/word segmentation is not happening properly, here is the word segmentation: image and here is the line segmentation: image. How do I fix these?

    opened by naveen-marthala 3
  • ConnectionResetError at word segmentation

    ConnectionResetError at word segmentation

    Hi, I already mentioned my problem, but I didn't find an issue describing what I am experiencing at the moment. When I start 2_line_word_segmentation.ipynb, I get the following error:

    ConnectionResetError                      Traceback (most recent call last)
    <ipython-input-13-fbd64d2ad138> in <module>
          3     cls_metric = mx.metric.Accuracy()
          4     box_metric = mx.metric.MAE()
    ----> 5     train_loss = run_epoch(e, net, train_data, trainer, log_dir, print_name="train", is_train=True, update_metric=False)
          6     test_loss = run_epoch(e, net, test_data, trainer, log_dir, print_name="test", is_train=False, update_metric=True)
          7     if test_loss < best_test_loss:
    
    <ipython-input-6-6b90c6f2ae19> in run_epoch(e, network, dataloader, trainer, log_dir, print_name, is_train, update_metric)
         32 
         33     total_losses = [0 for ctx_i in ctx]
    ---> 34     for i, (X, Y) in enumerate(dataloader):
         35         X = gluon.utils.split_and_load(X, ctx)
         36         Y = gluon.utils.split_and_load(Y, ctx)
    
    /usr/local/lib/python3.6/dist-packages/mxnet/gluon/data/dataloader.py in __next__(self)
        503         try:
        504             if self._dataset is None:
    --> 505                 batch = pickle.loads(ret.get(self._timeout))
        506             else:
        507                 batch = ret.get(self._timeout)
    
    /usr/local/lib/python3.6/dist-packages/mxnet/gluon/data/dataloader.py in rebuild_ndarray(pid, fd, shape, dtype)
         59             fd = multiprocessing.reduction.rebuild_handle(fd)
         60         else:
    ---> 61             fd = fd.detach()
         62         return nd.NDArray(nd.ndarray._new_from_shared_mem(pid, fd, shape, dtype))
         63 
    
    /usr/lib/python3.6/multiprocessing/resource_sharer.py in detach(self)
         55         def detach(self):
         56             '''Get the fd.  This should only be called once.'''
    ---> 57             with _resource_sharer.get_connection(self._id) as conn:
         58                 return reduction.recv_handle(conn)
         59 
    
    /usr/lib/python3.6/multiprocessing/resource_sharer.py in get_connection(ident)
         85         from .connection import Client
         86         address, key = ident
    ---> 87         c = Client(address, authkey=process.current_process().authkey)
         88         c.send((key, os.getpid()))
         89         return c
    
    /usr/lib/python3.6/multiprocessing/connection.py in Client(address, family, authkey)
        491 
        492     if authkey is not None:
    --> 493         answer_challenge(c, authkey)
        494         deliver_challenge(c, authkey)
        495 
    
    /usr/lib/python3.6/multiprocessing/connection.py in answer_challenge(connection, authkey)
        730     import hmac
        731     assert isinstance(authkey, bytes)
    --> 732     message = connection.recv_bytes(256)         # reject large message
        733     assert message[:len(CHALLENGE)] == CHALLENGE, 'message = %r' % message
        734     message = message[len(CHALLENGE):]
    
    /usr/lib/python3.6/multiprocessing/connection.py in recv_bytes(self, maxlength)
        214         if maxlength is not None and maxlength < 0:
        215             raise ValueError("negative maxlength")
    --> 216         buf = self._recv_bytes(maxlength)
        217         if buf is None:
        218             self._bad_message_length()
    
    /usr/lib/python3.6/multiprocessing/connection.py in _recv_bytes(self, maxsize)
        405 
        406     def _recv_bytes(self, maxsize=None):
    --> 407         buf = self._recv(4)
        408         size, = struct.unpack("!i", buf.getvalue())
        409         if maxsize is not None and size > maxsize:
    
    /usr/lib/python3.6/multiprocessing/connection.py in _recv(self, size, read)
        377         remaining = size
        378         while remaining > 0:
    --> 379             chunk = read(handle, remaining)
        380             n = len(chunk)
        381             if n == 0:
    
    ConnectionResetError: [Errno 104] Connection reset by peer
    

    I am using a Docker Image on a Linux system. Can you help me to get the notebook to run?

    opened by jbuehler1337 3
  • Bump protobuf from 3.8.0 to 3.18.3

    Bump protobuf from 3.8.0 to 3.18.3

    Bumps protobuf from 3.8.0 to 3.18.3.


    dependencies 
    opened by dependabot[bot] 0
  • Unable to open module ocr

    Unable to open module ocr

    pip install ocr is not working for me. Which Python version was this used with? How do I install the ocr module in a Jupyter notebook? I tried pip install, but it doesn't work.

    opened by abhijitrath 0
  • Cannot connect to database

    Cannot connect to database

    We are unable to connect to the IAM database. We went to the given site and signed in, but it says we have to write the username and password in the JSON file. We tried that too. If someone can help, please reply.

    opened by Yash31012000 8
  • Wrong path to scripts

    Wrong path to scripts

    In README.MD links to Python scripts for each step are broken, i.e.

    • for first step link is to https://github.com/awslabs/handwritten-text-recognition-for-apache-mxnet/blob/master/ocr/scripts/paragraph_segmentation_dcnn.py but should be https://github.com/awslabs/handwritten-text-recognition-for-apache-mxnet/blob/master/ocr/paragraph_segmentation_dcnn.py
    • for second step it should be https://github.com/awslabs/handwritten-text-recognition-for-apache-mxnet/blob/master/ocr/word_and_line_segmentation.py
    • for third step it should be https://github.com/awslabs/handwritten-text-recognition-for-apache-mxnet/blob/master/ocr/handwriting_line_recognition.py
    opened by Nikita-T86 0
  • ReadError: not a gzip file

    ReadError: not a gzip file

    image

    I was trying to get the 4 testing images following the notebook, but when downloading the IAM images it says "ReadError: not a gzip file". Could anyone kindly help here?

    opened by TRokieG 1