VistaOCR

ISI's Optical Character Recognition (OCR) software for machine-print and handwriting data

Publications

"How to Efficiently Increase Resolution in Neural OCR Models". Stephen Rawls, Huaigu Cao, Joe Mathai, Prem Natarajan. IEEE Workshop on Arabic Script Analysis and Recognition (ASAR) 2018.

"Combining Convolutional Neural Networks and LSTMs for Segmentation Free OCR". Stephen Rawls, Huaigu Cao, Senthil Kumar, Prem Natarajan. International Conference on Document Analysis and Recognition (ICDAR) 2017.

"Combining Deep Learning and Language Modeling for Segmentation-free OCR From Raw Pixels". Stephen Rawls, Huaigu Cao, Ekraam Sabir, Prem Natarajan. IEEE Workshop on Arabic Script Analysis and Recognition (ASAR) 2017.

Model

VistaOCR Model Diagram

Pretrained Models

Coming soon. Pretrained models for English, French, and Arabic handwriting.

Performance Numbers

Coming soon. Expected character and word error rates from public datasets.

How to Train

Coming soon.

How to Decode Using an Existing Model

Coming soon.

Citation

@inproceedings{vistaocr,
  author    = {Stephen Rawls and Huaigu Cao and Senthil Kumar and Prem Natarajan},
  title     = {Combining Convolutional Neural Networks and LSTMs for Segmentation Free OCR},
  booktitle = {Proc. ICDAR},
  year      = {2017},
  url       = {https://doi.org/10.1109/ICDAR.2017.34},
  doi       = {10.1109/ICDAR.2017.34}
}
Comments
  • How to train VistaOCR?

    Hello @stephenrawls,

    When I try to train this system on an offline handwriting recognition database, it fails with an error about the file "desc.json", which is loaded in the class in isi-vista/VistaOCR/blob/master/src/ocr_dataset.py:

        with open(os.path.join(data_dir, 'desc.json'), 'r') as fh:
            self.data_desc = json.load(fh)

    1. Is "desc.json" the ground truth of the database? Could you please describe this file and explain how to create it?

    2. Is the following command correct for training, and what value should be passed to "--snapshot-prefix"?

        python train_cnn_lstm.py --datadir /root/Downloads/data --num-lstm-layers 3 --num-lstm-units 256 --lstm-input-dim=256 --snapshot-prefix /root/Downloads/VistaOCR-master/src/models

    Thanks in advance.

    Opened by Tailor2019
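The loader quoted in the comment above opens desc.json from the data directory, so a missing or malformed file would fail as soon as the dataset is constructed. A minimal pre-flight check might look like the sketch below. It assumes nothing about the file's schema, which this README does not document, and `check_data_dir` is a hypothetical helper for illustration, not part of VistaOCR.

```python
import json
import os


def check_data_dir(data_dir):
    """Hypothetical helper (not part of VistaOCR): confirm desc.json loads.

    Mirrors the open/json.load call quoted from ocr_dataset.py. The schema
    of desc.json is not documented here, so only the file's existence and
    JSON validity are checked.
    """
    desc_path = os.path.join(data_dir, 'desc.json')
    if not os.path.isfile(desc_path):
        raise FileNotFoundError(f"expected a description file at {desc_path}")
    with open(desc_path, 'r') as fh:
        # json.load raises json.JSONDecodeError if the file is malformed.
        return json.load(fh)
```

Running this against the directory passed as `--datadir` before starting training would show whether the error comes from a missing file or from invalid JSON. It does not say what fields the training code expects.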
Owner
ISI Center for Vision, Image, Speech, and Text Analytics