This repository lets you train neural network models that perform end-to-end full-page handwriting recognition on the IAM Dataset, using the Apache MXNet deep learning framework.

Amazon Web Services - Labs

Last update: Jan 3, 2023

Related tags

Computer Vision handwritten-text-recognition-for-apache-mxnet

Overview

Handwritten Text Recognition (OCR) with MXNet Gluon

These notebooks have been created by Jonathan Chung, as part of his internship as Applied Scientist @ Amazon AI, in collaboration with Thomas Delteil who built the original prototype.

Setup

git clone https://github.com/awslabs/handwritten-text-recognition-for-apache-mxnet --recursive

You need to install SCLITE for WER evaluation. You can run the following bash script from this folder:

cd ..
git clone https://github.com/usnistgov/SCTK
cd SCTK
export CXXFLAGS="-std=c++11" && make config
make all
make check
make install
make doc
cd -
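
SCLITE reports the word error rate (WER). As a rough illustration of what that metric measures (not a replacement for SCLITE, whose alignment and scoring options are richer), WER can be sketched as the word-level edit distance divided by the number of reference words. The function name below is hypothetical and not part of this repository.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "forhim" against the reference "for him" counts one substitution and one deletion over two reference words.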

You also need hnswlib:

pip install pybind11 numpy setuptools
cd ..
git clone https://github.com/nmslib/hnswlib
cd hnswlib/python_bindings
python setup.py install
cd ../..

If "AssertionError: Please enter credentials for the IAM dataset in credentials.json or as arguments" occurs, rename credentials.json.example to credentials.json and fill in your username and password.
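
A minimal sketch of creating credentials.json programmatically is shown here. It assumes the file is a JSON object with "username" and "password" fields; those field names are an assumption and should be checked against credentials.json.example. The helper name is hypothetical.

```python
import json
from pathlib import Path

def write_credentials(username, password, folder="."):
    """Create credentials.json, starting from credentials.json.example if present."""
    folder = Path(folder)
    example = folder / "credentials.json.example"
    # Start from the example file so any other fields it defines are kept.
    creds = json.loads(example.read_text()) if example.exists() else {}
    # Assumed field names; verify them against credentials.json.example.
    creds["username"] = username
    creds["password"] = password
    target = folder / "credentials.json"
    target.write_text(json.dumps(creds, indent=2))
    return target
```

Calling write_credentials("your-username", "your-password") from the repository root would then write the file next to the example.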

Overview

The pipeline is composed of 3 steps:

Detecting the handwritten area in a form [blog post], [jupyter notebook], [python script]
Detecting lines of handwritten texts [blog post], [jupyter notebook], [python script]
Recognising characters and applying a language model to correct errors. [blog post], [jupyter notebook], [python script]
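
The three steps above can be sketched as a simple chain of functions. Every callable passed in below is a hypothetical stand-in for the corresponding model or script; this is an outline of the data flow under that assumption, not the repository's actual API.

```python
def run_pipeline(form_image, detect_paragraph, detect_lines, recognise_line, denoise):
    """Chain the pipeline stages; all callables are hypothetical stand-ins."""
    paragraph = detect_paragraph(form_image)                    # step 1: handwritten area
    line_images = detect_lines(paragraph)                       # step 2: lines of text
    raw_text = [recognise_line(line) for line in line_images]   # step 3: recognition
    return [denoise(text) for text in raw_text]                 # step 3: language-model correction

# Toy usage with string stand-ins instead of images and models.
result = run_pipeline(
    "tovely things\nfor him",
    detect_paragraph=lambda image: image,
    detect_lines=lambda paragraph: paragraph.split("\n"),
    recognise_line=lambda line: line,
    denoise=lambda text: text.replace("tovely", "lovely"),
)
print(result)
```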

The entire inference pipeline can be found in this notebook. See the pretrained models section for the pretrained models.

A recorded talk detailing the approach is available on youtube. [video]

The corresponding slides are available on slideshare. [slides]

Pretrained models:

You can get the models by running python get_models.py.

Sample results

The greedy, lexicon search, and beam search outputs present similar and reasonable predictions for the selected examples. Figure 6 presents some interesting examples. The first line of Figure 6 shows cases where the lexicon search algorithm provided fixes that corrected the words. In the top example, “tovely” (as it was written) was corrected to “lovely” and “woved” was corrected to “waved”. In addition, the beam search output corrected “a” into “all”; however, it missed a space between “lovely” and “things”. In the second example, “selt” was converted to “salt” in the lexicon search output but erroneously converted to “self” in the beam search output, so beam search performed worse in this example. In the third example, none of the three methods provided comprehensible results. Finally, in the fourth example, the lexicon search algorithm incorrectly converted “forhim” into “forum”, whereas the beam search algorithm correctly identified “for him”.
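
To make the idea of a lexicon correction concrete, here is a minimal sketch that replaces each out-of-lexicon word with its closest lexicon entry. It uses Python's difflib similarity as a stand-in; the similarity measure, the cutoff, and the tiny lexicon are assumptions for illustration and are not the lexicon search implemented in this repository.

```python
import difflib

def lexicon_correct(words, lexicon, cutoff=0.6):
    """Replace each out-of-lexicon word with its closest lexicon entry, if any."""
    corrected = []
    for word in words:
        if word in lexicon:
            corrected.append(word)
            continue
        # Keep the original word when nothing in the lexicon is similar enough.
        matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return corrected

lexicon = ["all", "the", "lovely", "things", "waved"]
print(lexicon_correct(["tovely", "woved"], lexicon))  # ['lovely', 'waved']
```

With a lexicon that contains both “salt” and “self”, the two candidates score equally against “selt” under this measure, so a word like that can be resolved either way without further context.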

Dataset:

To use test_iam_dataset.ipynb, create credentials.json from credentials.json.example and edit the appropriate fields. The username and password can be obtained from http://www.fki.inf.unibe.ch/DBs/iamDB/iLogin/index.php.
An instance with 32GB+ RAM and a 100GB disk is recommended, along with a GPU. A p3.2xlarge is the recommended starter instance on AWS for this project.

Appendix

1) Handwritten area

Model architecture

Results

2) Line Detection

Model architecture

Results

3) Handwritten text recognition

Model architecture

Results

Comments

Kernel dying

Kernel dying while downloading and processing the IAM dataset

Kernel Restarting The kernel for projects/DocByte/handwriting_recog/amazon_HWR/handwritten-text-recognition-for-apache-mxnet/0_handwriting_ocr.ipynb appears to have died. It will restart automatically.

This happens when running the following line:

test_ds = IAMDataset("form_original", train=False)

opened by dee6600 35
OSError

I get an OSError while executing the dataset creation step:

train_ds = IAMDataset("form", output_data="bb", output_parse_method="form", train=True)

opened by JPremnath06 11
Downloading gbw dataset crashes machine
The code below, from the 'Denoising text ouptut' section of the '0_handwriting_ocr.ipynb' file, crashes the system for an unknown reason when run on a machine with 35.35GB RAM and 107.77GB disk space (Google Colab TPU session).

ctx_nlp = mx.gpu(3)
language_model, vocab = nlp.model.big_rnn_lm_2048_512(dataset_name='gbw', pretrained=True, ctx=ctx_nlp)
moses_tokenizer = nlp.data.SacreMosesTokenizer()
moses_detokenizer = nlp.data.SacreMosesDetokenizer()

How do I download this dataset without crashing the machine? Also, I don't want to download it again next time, so can I save this dataset too?
opened by naveen-marthala 11

AssertionError: Shape of params are incompatible

Hi again,

I am a bit confused about this error, happening in the 4_text_denoising notebook. I just did every step from before but something does not fit with the dimensions. Can you explain why this is happening?


AssertionError                            Traceback (most recent call last)
<ipython-input-31-2a0e848a57c4> in <module>
      1 model_path = 'models/denoiser2.params'
      2 if (os.path.isfile(model_path)):
----> 3     net.load_parameters(model_path, ctx=ctx)
      4     print("Loaded parameters")
      5     best_test_loss = evaluate(net, val_data_ft)

/usr/local/lib/python3.6/dist-packages/mxnet/gluon/block.py in load_parameters(self, filename, ctx, allow_missing, ignore_extra, cast_dtype, dtype_source)
    553                         name, filename, _brief_print_list(self._params.keys())))
    554             if name in params:
--> 555                 params[name]._load_init(loaded[name], ctx, cast_dtype=cast_dtype, dtype_source=dtype_source)
    556 
    557     def load_params(self, filename, ctx=None, allow_missing=False,

/usr/local/lib/python3.6/dist-packages/mxnet/gluon/parameter.py in _load_init(self, data, ctx, cast_dtype, dtype_source)
    280                     "Failed loading Parameter '%s' from saved params: " \
    281                     "shape incompatible expected %s vs saved %s"%(
--> 282                         self.name, str(self.shape), str(data.shape))
    283             self.shape = tuple(i if i != unknown_dim_size else j
    284                                for i, j in zip(self.shape, data.shape))

AssertionError: Failed loading Parameter 'transformer_enc_const' from saved params: shape incompatible expected (150, 512) vs saved (150, 256)

opened by jbuehler1337 7
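
The message above shows that only the second dimension differs (512 expected vs 256 saved), which suggests that the network was constructed with a different model dimension than the one used when the file was saved. That reading is an inference from the message, not a confirmed diagnosis. A minimal sketch for listing such mismatches before loading is given here; it works on plain dictionaries of shapes, and how those dictionaries are obtained (for example from net.collect_params() and mx.nd.load(model_path), whose key names may not line up exactly) is left as an assumption.

```python
def find_shape_mismatches(expected_shapes, saved_shapes):
    """Return {name: (expected, saved)} for parameters whose shapes differ."""
    mismatches = {}
    for name, expected in expected_shapes.items():
        saved = saved_shapes.get(name)
        # Parameters missing from the saved file are ignored here.
        if saved is not None and tuple(saved) != tuple(expected):
            mismatches[name] = (tuple(expected), tuple(saved))
    return mismatches

# Shapes taken from the error message above.
expected = {"transformer_enc_const": (150, 512)}
saved = {"transformer_enc_const": (150, 256)}
print(find_shape_mismatches(expected, saved))
```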

Train on IAMDataset "Word" Crashes the code
The 3_handwriting_recognition.py works fine with IAMDataset("line", output_data="text", train=True) but crashes when using the word IAMDataset. Specifically, doing this crashes the code.

train_ds = IAMDataset("word", output_data="text", train=True)
print("Number of training samples: {}".format(len(train_ds)))
test_ds = IAMDataset("word", output_data="text", train=False)
print("Number of testing samples: {}".format(len(test_ds)))

Gives: mxnet.base.MXNetError: Shape inconsistent, Provided = [13320192], inferred shape=(8863744,)
opened by srikar2097 7
How to compute labelling probability after prediction?

Hi,

Thanks a lot for this wonderful piece of work.

I am trying to calculate CTC loss to compute labeling probability after prediction. Please guide if it is possible

Thanks

opened by shrinidt 7
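
On the relation between CTC loss and labelling probability: for CTC, the loss of one sample is the negative log of the total probability of all alignments that collapse to the label, so p(label | input) = exp(-loss). The following is a minimal pure-Python sketch of the CTC forward recursion. It assumes per-time-step softmax probabilities and a blank at index 0; the blank position differs between libraries, so that choice is an assumption. It is an illustration, not this repository's code, and real implementations work in log space for numerical stability.

```python
import math

def ctc_label_probability(probs, label, blank=0):
    """Total probability of all alignments of `label` over the T time steps.

    probs: list of T lists, each a probability distribution over classes.
    label: list of class indices without blanks.
    """
    # Extended label: blanks before, between, and after the symbols.
    ext = [blank]
    for c in label:
        ext += [c, blank]
    S = len(ext)
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]                      # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]             # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]             # skip the blank between different symbols
            new[s] = total * probs[t][ext[s]]
        alpha = new
    # Valid alignments end on the last symbol or on the final blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)

# Two time steps, two classes (0 = blank, 1 = 'a'), uniform predictions.
uniform = [[0.5, 0.5], [0.5, 0.5]]
p = ctc_label_probability(uniform, [1])
print(p, -math.log(p))  # probability and the corresponding CTC loss
```

In this toy case the label "a" is reached by the alignments "a-", "-a", and "aa", each with probability 0.25.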
Number of samples equals zero
Hi @jonomon, the number of training and testing samples comes out to be zero when running the following code in 2_line_word_segmentation.ipynb:

train_ds = IAMDataset("form_bb", output_data="bb", output_parse_method=detection_box, train=True)
print("Number of training samples: {}".format(len(train_ds)))

I'm working on Colab and have included the extracted files as well as the .tgz files in dataset/iamdataset. I can't seem to figure out the issue here; could you point me in the right direction?

Thanks, Sambbhav
opened by sambbhavgarg 6
How to make predictions from pre-trained models?

Good work, thanks. I am using the pre-trained models to get text from images. Going through the code, I learned that my test images have to match the format that the 'IAMDataset' class in 'ocr.utils.iam_dataset' outputs. How do I modify the 'IAMDataset' class to convert an input image into the dataframe format that this class outputs, or how do I get a dataframe for images other than the ones in the IAM dataset? I couldn't understand this class completely, so if anyone has worked on this, please help me solve it.

opened by naveen-marthala 6
MXNetError: [17:38:24] C:\Jenkins\workspace\mxnet-tag\mxnet\3rdparty\dmlc-core\src\io\local_filesys.cc:209: Check failed:

MXNetError: [17:38:24] C:\Jenkins\workspace\mxnet-tag\mxnet\3rdparty\dmlc-core\src\io\local_filesys.cc:209: Check failed:

When executing the code in the handwriting notebook I get the following error during segmentation: LocalFileSystem::Open "models/paragraph_segmentation2.params": No such file or directory

However, when I check that location I find the file resnet34_v1-48216ba9.params, and I still get the same error.

Could someone help me understand this error?

opened by LauraOrozco 6
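
The error above says that the file does not exist at the given relative path, which is resolved from the current working directory. A minimal sketch for checking the expected files before loading is shown here; the helper name is hypothetical, and the list contains only the path quoted in the error message.

```python
import os

def missing_model_files(paths):
    """Return the subset of paths that do not exist as files on disk."""
    return [p for p in paths if not os.path.isfile(p)]

required = ["models/paragraph_segmentation2.params"]
missing = missing_model_files(required)
if missing:
    print("Working directory:", os.getcwd())
    print("Missing model files:", missing)
    print("Try running: python get_models.py")
```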
regions of text not being detected properly

Here is my input image in color: [image]

1. The detection produced by the pre-trained models can be seen below, with the above image converted (a) from RGB to grayscale and (b) from BGR to grayscale. This happens at paragraph segmentation in the '0_handwriting_ocr.ipynb' notebook, and I have made no changes to the code in that notebook: [images]
2. Because of this improper region detection, areas that actually contain text are being cropped out. Also, when I try a form size other than the one in the code, form_size = (1120, 800), because my images have a smaller aspect ratio, the computer crashes. What is causing this, and how can I avoid it?
3. Presumably because of the improper detection above, or maybe because line/word segmentation is not happening properly, here is the word segmentation: [image] and here is the line segmentation: [image] How do I fix these?

opened by naveen-marthala 3

ConnectionResetError at word segmentation

Hi, I already mentioned my problem, but I didn't find an issue describing what I am experiencing at the moment. When I start 2_line_word_segmentation.ipynb I get the following error:

ConnectionResetError                      Traceback (most recent call last)
<ipython-input-13-fbd64d2ad138> in <module>
      3     cls_metric = mx.metric.Accuracy()
      4     box_metric = mx.metric.MAE()
----> 5     train_loss = run_epoch(e, net, train_data, trainer, log_dir, print_name="train", is_train=True, update_metric=False)
      6     test_loss = run_epoch(e, net, test_data, trainer, log_dir, print_name="test", is_train=False, update_metric=True)
      7     if test_loss < best_test_loss:

<ipython-input-6-6b90c6f2ae19> in run_epoch(e, network, dataloader, trainer, log_dir, print_name, is_train, update_metric)
     32 
     33     total_losses = [0 for ctx_i in ctx]
---> 34     for i, (X, Y) in enumerate(dataloader):
     35         X = gluon.utils.split_and_load(X, ctx)
     36         Y = gluon.utils.split_and_load(Y, ctx)

/usr/local/lib/python3.6/dist-packages/mxnet/gluon/data/dataloader.py in __next__(self)
    503         try:
    504             if self._dataset is None:
--> 505                 batch = pickle.loads(ret.get(self._timeout))
    506             else:
    507                 batch = ret.get(self._timeout)

/usr/local/lib/python3.6/dist-packages/mxnet/gluon/data/dataloader.py in rebuild_ndarray(pid, fd, shape, dtype)
     59             fd = multiprocessing.reduction.rebuild_handle(fd)
     60         else:
---> 61             fd = fd.detach()
     62         return nd.NDArray(nd.ndarray._new_from_shared_mem(pid, fd, shape, dtype))
     63 

/usr/lib/python3.6/multiprocessing/resource_sharer.py in detach(self)
     55         def detach(self):
     56             '''Get the fd.  This should only be called once.'''
---> 57             with _resource_sharer.get_connection(self._id) as conn:
     58                 return reduction.recv_handle(conn)
     59 

/usr/lib/python3.6/multiprocessing/resource_sharer.py in get_connection(ident)
     85         from .connection import Client
     86         address, key = ident
---> 87         c = Client(address, authkey=process.current_process().authkey)
     88         c.send((key, os.getpid()))
     89         return c

/usr/lib/python3.6/multiprocessing/connection.py in Client(address, family, authkey)
    491 
    492     if authkey is not None:
--> 493         answer_challenge(c, authkey)
    494         deliver_challenge(c, authkey)
    495 

/usr/lib/python3.6/multiprocessing/connection.py in answer_challenge(connection, authkey)
    730     import hmac
    731     assert isinstance(authkey, bytes)
--> 732     message = connection.recv_bytes(256)         # reject large message
    733     assert message[:len(CHALLENGE)] == CHALLENGE, 'message = %r' % message
    734     message = message[len(CHALLENGE):]

/usr/lib/python3.6/multiprocessing/connection.py in recv_bytes(self, maxlength)
    214         if maxlength is not None and maxlength < 0:
    215             raise ValueError("negative maxlength")
--> 216         buf = self._recv_bytes(maxlength)
    217         if buf is None:
    218             self._bad_message_length()

/usr/lib/python3.6/multiprocessing/connection.py in _recv_bytes(self, maxsize)
    405 
    406     def _recv_bytes(self, maxsize=None):
--> 407         buf = self._recv(4)
    408         size, = struct.unpack("!i", buf.getvalue())
    409         if maxsize is not None and size > maxsize:

/usr/lib/python3.6/multiprocessing/connection.py in _recv(self, size, read)
    377         remaining = size
    378         while remaining > 0:
--> 379             chunk = read(handle, remaining)
    380             n = len(chunk)
    381             if n == 0:

ConnectionResetError: [Errno 104] Connection reset by peer

I am using a Docker Image on a Linux system. Can you help me to get the notebook to run?

opened by jbuehler1337 3

Bump protobuf from 3.8.0 to 3.18.3
Bumps protobuf from 3.8.0 to 3.18.3.

Release notes

Sourced from protobuf's releases.

Protocol Buffers v3.18.3

C++

Reduce memory consumption of MessageSet parsing

This release addresses a Security Advisory for C++ and Python users

Protocol Buffers v3.16.1

Java

Improve performance characteristics of UnknownFieldSet parsing (#9371)

Protocol Buffers v3.18.2

Java

Improve performance characteristics of UnknownFieldSet parsing (#9371)

Protocol Buffers v3.18.1

Python

Update setup.py to reflect that we now require at least Python 3.5 (#8989)

Performance fix for DynamicMessage: force GetRaw() to be inlined (#9023)

Ruby

Update ruby_generator.cc to allow proto2 imports in proto3 (#9003)

Protocol Buffers v3.18.0

C++

Fix warnings raised by clang 11 (#8664)

Make StringPiece constructible from std::string_view (#8707)

Add missing capability attributes for LLVM 12 (#8714)

Stop using std::iterator (deprecated in C++17). (#8741)

Move field_access_listener from libprotobuf-lite to libprotobuf (#8775)

Fix #7047 Safely handle setlocale (#8735)

Remove deprecated version of SetTotalBytesLimit() (#8794)

Support arena allocation of google::protobuf::AnyMetadata (#8758)

Fix undefined symbol error around SharedCtor() (#8827)

Fix default value of enum(int) in json_util with proto2 (#8835)

Better Smaller ByteSizeLong

Introduce event filters for inject_field_listener_events

Reduce memory usage of DescriptorPool

For lazy fields copy serialized form when allowed.

Re-introduce the InlinedStringField class

v2 access listener

Reduce padding in the proto's ExtensionRegistry map.

GetExtension performance optimizations

Make tracker a static variable rather than call static functions

Support extensions in field access listener

Annotate MergeFrom for field access listener

Fix incomplete types for field access listener

Add map_entry/new_map_entry to SpecificField in MessageDifferencer. They record the map items which are different in MessageDifferencer's reporter.

Reduce binary size due to fieldless proto messages

TextFormat: ParseInfoTree supports getting field end location in addition to start.

... (truncated)

Commits

a902b39 No-op whitespace change

ae62acd Updating version.json and repo version numbers to: 18.3

f43ac49 Merge pull request #10542 from deannagarcia/3.18.x

9efdf55 Add missing includes

d1635e1 Apply patch

5b37c91 Update version.json with "lts": true (#10534)

c39d622 Merge pull request #10529 from protocolbuffers/deannagarcia-patch-5

f77d3b6 Update version.json

8178b06 Merge pull request #10503 from deannagarcia/3.18.x

24ca839 Add version file

Additional commits viewable in compare view


dependencies
opened by dependabot[bot] 0
Unable to open module ocr

I tried to pip install ocr but it is not working for me. Which Python version was this written for? How do I install the ocr module in a Jupyter notebook? I tried pip install and it doesn't work.

opened by abhijitrath 0
Cannot connect to database

We are unable to connect to the IAM database. We went through the given site and signed in, but it says we have to write the username and password in the JSON file; we tried that too. If someone can help, please reply.

opened by Yash31012000 8
Wrong path to scripts
In README.MD, the links to the Python scripts for each step are broken:

For the first step, the link points to https://github.com/awslabs/handwritten-text-recognition-for-apache-mxnet/blob/master/ocr/scripts/paragraph_segmentation_dcnn.py but should be https://github.com/awslabs/handwritten-text-recognition-for-apache-mxnet/blob/master/ocr/paragraph_segmentation_dcnn.py

For the second step, it should be https://github.com/awslabs/handwritten-text-recognition-for-apache-mxnet/blob/master/ocr/word_and_line_segmentation.py

For the third step, it should be https://github.com/awslabs/handwritten-text-recognition-for-apache-mxnet/blob/master/ocr/handwriting_line_recognition.py
opened by Nikita-T86 0
ReadError: not a gzip file

I was trying to get the 4 testing images following the notebook, but when downloading the IAM images it says "ReadError: not a gzip file". Could anyone kindly help here?

opened by TRokieG 1
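
A "not a gzip file" error means the downloaded file does not start with a gzip header. One plausible cause, stated here as an assumption rather than a diagnosis, is that the server returned an HTML error or login page instead of the archive, for example when the credentials were rejected. A minimal sketch for checking a downloaded file is shown here; the helper name is hypothetical.

```python
def looks_like_gzip(path):
    """Return True if the file starts with the gzip magic bytes 0x1f 0x8b."""
    with open(path, "rb") as f:
        return f.read(2) == b"\x1f\x8b"
```

If the check fails, opening the file in a text editor usually shows what was actually downloaded.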


