Code for the paper STN-OCR: A single Neural Network for Text Detection and Text Recognition

Christian Bartz

Last update: Jan 5, 2023

Related tags

Computer Vision deep-learning mxnet end-to-end text-recognition semi-supervised-learning convolutional-neural-networks

Overview

STN-OCR: A single Neural Network for Text Detection and Text Recognition

This repository contains the code for the paper: STN-OCR: A single Neural Network for Text Detection and Text Recognition

Please note that we refined our approach and released new source code. You can find the code here

Please use the new code if you want to experiment with FSNS-like data and our approach. It should also be easy to redo the text recognition experiments with the new code, although we did not release any code for that.

Structure of the repository

The folder datasets contains code related to the datasets used in the paper. datasets/svhn contains several scripts that can be used to create SVHN-based ground truth files as used in the experiments reported in Section 4.2; please see the README in that folder for how to use the scripts. datasets/fsns contains scripts that can be used to download the FSNS dataset, extract the images from the downloaded files, and restructure the contained ground truth (gt) files.

The folder mxnet contains all code used for training our networks.

Installation

In order to use the code you will need the following software environment:

Install Python 3 (the code might work with Python 2, too, but this is untested)
It might be a good idea to use a virtualenv
Install all requirements with pip install -r requirements.txt
Clone and install warp-ctc from here
Go into the folder mxnet/metrics/ctc and run python setup.py build_ext --inplace
Clone the MXNet repository
Check out the tag v0.9.3
Add the warpctc plugin to the project by enabling it in the file config.mk
Compile MXNet
Install the Python bindings of MXNet
You should be ready to go!

Training

You can use this code to train models for three different tasks.

SVHN House Number Recognition

The file train_svhn.py is the entry point for training a network using our purpose-built SVHN datasets. As provided, the file is ready to train a network capable of finding a single house number placed randomly on an image.

In order to do this, you need to follow these steps:

Download the datasets
Locate the folder generated/centered
open train.csv and adapt the paths of all images to the path on your machine (do the same with valid.csv)
make sure to prepare your environment as described in installation
start the training by issuing the following command:

python train_svhn.py <path to train.csv> <path to valid.csv> --gpus <gpu id you want to use> --log-dir <where to save the logs> -b <batch size you want to use> --lr 1e-5 --zoom 0.5 --char-map datasets/svhn/svhn_char_map.json
Wait and enjoy.
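
The path adaptation step above can be scripted rather than done by hand. The following is a minimal sketch, assuming that train.csv and valid.csv are tab-separated and that the image path is the first column of every line (the evaluation format described in this README suggests this, but please check your files). The function name rebase_image_paths and the prefixes are hypothetical and not part of this repository.

```python
import csv


def rebase_image_paths(src_csv, dst_csv, old_prefix, new_prefix):
    """Copy a tab-separated ground truth file, replacing the leading part
    of each image path (assumed to be the first column) with a new prefix.

    Returns the number of lines whose path was changed.
    """
    changed = 0
    with open(src_csv, newline="") as src, open(dst_csv, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        for row in reader:
            # Only touch rows whose path starts with the old prefix.
            if row and row[0].startswith(old_prefix):
                row[0] = new_prefix + row[0][len(old_prefix):]
                changed += 1
            writer.writerow(row)
    return changed
```

A call such as rebase_image_paths("train.csv", "train_local.csv", "/old/root", "/my/root") would write a new file and leave the original untouched; the same call can be repeated for valid.csv.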

If you want to run experiments on more challenging images, you might need to update some parts of train_svhn.py. The relevant parts are located around line 40 of this file. There you can change the maximum number of house numbers in the image (num_timesteps), the maximum number of characters per house number (labels_per_timestep), the number of RNN layers used for predicting the localization (num_rnn_layers), and whether to use a BLSTM for predicting the localization (use_blstm).

A considerably more challenging dataset is contained in the folder medium_two_digits, or medium, in the datasets folder.

If you want to follow our experiments with svhn numbers placed in a regular grid you'll need to do the following:

Download the datasets
Locate the folder generated/easy
open train.csv and adapt the paths of all images to the path on your machine (do the same with valid.csv)
set num_timesteps and labels_per_timestep to 4 in train_svhn.py
start the training using the following command: python train_svhn.py <path to train.csv> <path to valid.csv> --gpus <gpu id you want to use> --log-dir <where to save the logs> -b <batch size you want to use> --lr 1e-5
If you are lucky it will work ;)

Text Recognition

Following our text recognition experiments might be a little difficult, because we cannot offer the entire dataset we used. It is, however, possible to perform the experiments based on the Synth-90k dataset provided by Jaderberg et al. here. After downloading and extracting this file, you need to adapt the ground truth file provided with the dataset to the format used by our code. Our format is quite simple: create a CSV file with tab-separated values. The first column is the absolute path to the image, and the rest of the line contains the labels corresponding to this image.
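
To make the format concrete, here is a minimal sketch of how one ground truth line could be produced. It assumes that the labels are numerical (as stated for the evaluation ground truth in this README) and uses a toy character-to-label mapping that is purely hypothetical; the real mapping has to be consistent with the char map you train with. The label 0 is left unused here on the assumption that it is reserved for the CTC blank label.

```python
def encode_word(word, char_to_label):
    """Map each character of a word to its numerical label."""
    return [char_to_label[ch] for ch in word]


def format_gt_line(image_path, labels):
    """Build one ground truth line: absolute image path, then the labels,
    all separated by tabs."""
    return "\t".join([image_path] + [str(label) for label in labels])


# Hypothetical toy mapping, for illustration only (0 is left for the blank label).
toy_char_to_label = {"a": 1, "b": 2, "c": 3}

line = format_gt_line(
    "/data/synth90k/1/cab.jpg",  # hypothetical absolute image path
    encode_word("cab", toy_char_to_label),
)
print(repr(line))  # '/data/synth90k/1/cab.jpg\t3\t1\t2'
```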

To train the network you can use the train_text_recognition.py script. You can start this script in a similar manner to the train_svhn.py script.

FSNS

In order to redo our experiments on the FSNS dataset you need to perform the following steps:

Download the fsns dataset using the download_fsns.py script located in datasets/fsns
Extract the individual images using the tfrecord_to_image.py script located in datasets/fsns/tfrecord_utils (you will need to install tensorflow for doing that)
Use the transform_gt.py script to transform the original FSNS ground truth, which is based on a single line, into a ground truth containing labels for each word individually. A possible usage of the transform_gt.py script could look like this:

python transform_gt.py <path to original gt> datasets/fsns/fsns_char_map.json <path to gt that shall be generated>
Because MXNet expects the blank label to be 0 for the training with CTC Loss, you have to use the swap_classes.py script in datasets/fsns and swap the class for space and blank in the gt, by issuing:

python swap_classes.py <original gt> <swapped gt> 0 133
After performing these steps you should be able to run the training by issuing:

python train_fsns.py <path to generated train gt> <path to generated validation gt> --char-map datasets/fsns/fsns_char_map.json --blank-label 0
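
The class swap in the steps above can be pictured with the following sketch. It is not the code of swap_classes.py; it only illustrates the idea under the assumption that the ground truth is tab-separated with the image path in the first column, and that the two trailing arguments of the command (0 and 133) are the two classes to exchange. The function name is hypothetical.

```python
def swap_labels_in_line(line, class_a, class_b):
    """Exchange two label values in one tab-separated ground truth line.

    The first field (assumed to be the image path) is left untouched.
    """
    path, *labels = line.rstrip("\n").split("\t")
    a, b = str(class_a), str(class_b)
    swapped = []
    for label in labels:
        if label == a:
            swapped.append(b)
        elif label == b:
            swapped.append(a)
        else:
            swapped.append(label)
    return "\t".join([path] + swapped)


# Every 0 becomes 133 and every 133 becomes 0; other labels stay the same.
print(repr(swap_labels_in_line("/img/0.png\t5\t133\t0\t7", 0, 133)))
```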

Observing the Training Progress

We've added a script that makes it possible to see how well the network performs at every step of the training. This progress is normally plotted to disk for each iteration and can later be used to create animations of the training progress (you can use the create_gif.py and create_video.py scripts located in mxnet/utils for this purpose). Besides plotting to disk, it is also possible to watch the progress directly while the training is running. To do so, do the following:

start the show_progress.py script in mxnet/utils
start the training with the following additional command line params:

--send-bboxes --ip <localhost, or remote ip if you are working on a remote machine> --port <the port the show_progress.py script is running on (default is 1337)>
enjoy!

This tool is especially helpful in determining whether the network is learning anything or not. We recommend that you always use this tool while training.

Evaluation

If you want to evaluate already trained models you can use the evaluation scripts provided in the mxnet folder. For evaluating a model you need to do the following:

train or download a model
choose the correct evaluation script and adapt it, if necessary (take care if you changed the number of timesteps or the number of RNN layers)
Get the dataset you want to evaluate the model on and adapt the ground truth file to the format expected by our software. The expected format is a tab-separated CSV file in which each line looks like this: <absolute path to image> \t <numerical labels each label separated from the other by \t>
run the chosen evaluation script like so

python eval_<type>_model.py <path to model dir>/<prefix of model file> <number of epoch to test> <path to evaluation gt> <path to char map>

You can use eval_svhn_model.py for evaluating a model trained with CTC on the original svhn dataset, the eval_text_recognition_model.py script for evaluating a model trained for text recognition, and the eval_fsns_model.py for evaluating a model trained on the FSNS dataset.
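
Before running an evaluation script, it can help to check that the ground truth file really follows the format described above. The following is a minimal sketch of such a check; the function name parse_gt_line is hypothetical and not part of this repository, and the check only covers what the README states (an absolute path followed by numerical labels, separated by tabs).

```python
import os


def parse_gt_line(line):
    """Parse one evaluation ground truth line into (path, labels).

    Raises ValueError if the line has no labels, the path is not absolute,
    or a label is not an integer.
    """
    fields = [field.strip() for field in line.rstrip("\n").split("\t")]
    if len(fields) < 2:
        raise ValueError("expected a path followed by at least one label")
    path, labels = fields[0], fields[1:]
    if not os.path.isabs(path):
        raise ValueError("image path is not absolute: %r" % path)
    # int() raises ValueError for non-numerical labels.
    return path, [int(label) for label in labels]
```

Running this over every line of the file before the evaluation should surface relative paths or non-numerical labels early, instead of failing somewhere inside the data iterator.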

License

This Code is licensed under the GPLv3 license. Please see further details in LICENSE.md.

Citation

If you are using this Code please cite the following publication:

@article{bartz2017stn,
  title={STN-OCR: A single Neural Network for Text Detection and Text Recognition},
  author={Bartz, Christian and Yang, Haojin and Meinel, Christoph},
  journal={arXiv preprint arXiv:1707.08831},
  year={2017}
}

A short note on code quality

The code contains a huge number of workarounds for MXNet, as we were not able to find an easier way to do what we wanted to do. If you know a better way, please let us know, as we would like the code to be easier to understand than it is now.

Comments

Shape error eval_svhn_model.py for SVHN demos.

Hi, I was trying to run your demos, but I could only get the original_svhn model to work. I also tried to train a model myself, but in the end it raises the same shape error.

When I do:

python eval_svhn_model.py ../datasets/svhn/models/original_svhn/models/model 40 ../datasets/svhn/evaluation/test.csv ../datasets/svhn/svhn_char_map.json

Works perfect.

However, when I try:

python eval_svhn_model.py ../datasets/svhn/models/regular_grid/model 19 ../datasets/svhn/evaluation/test.csv ../datasets/svhn/svhn_char_map.json

It raises the following error. I have tried passing a different --input-width and --input-height, but the problem does not seem to be there.

[16:54:45] src/nnvm/legacy_json_util.cc:153: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[16:54:45] /home/sorelyss/Documents/test/incubator-mxnet/dmlc-core/include/dmlc/./logging.h:300: [16:54:45] src/ndarray/ndarray.cc:239: Check failed: from.shape() == to->shape() operands shape mismatchfrom.shape = (48,48,3,3) to.shape=(64,64,3,3)

Stack trace returned 25 entries:
[bt] (0) /home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fbe48041d6c]
[bt] (1) /home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet10CopyFromToERKNS_7NDArrayEPS0_i+0x437) [0x7fbe48832997]
[bt] (2) /home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(+0x9d853a) [0x7fbe487cf53a]
[bt] (3) /home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(MXImperativeInvoke+0x1034) [0x7fbe48aca674]
[bt] (4) /home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/_cy3/ndarray.cpython-35m-x86_64-linux-gnu.so(+0x1312c) [0x7fbe3b8d512c]
[bt] (5) /home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/_cy3/ndarray.cpython-35m-x86_64-linux-gnu.so(+0x140ed) [0x7fbe3b8d60ed]
[bt] (6) python(PyObject_Call+0x47) [0x5c1797]
[bt] (7) python(PyEval_EvalFrameEx+0x4ec6) [0x53bba6]
[bt] (8) python(PyEval_EvalFrameEx+0x4b04) [0x53b7e4]
[bt] (9) python() [0x5406df]
[bt] (10) python(PyEval_EvalFrameEx+0x54f0) [0x53c1d0]
[bt] (11) python() [0x5406df]
[bt] (12) python(PyEval_EvalFrameEx+0x50b2) [0x53bd92]
[bt] (13) python() [0x540199]
[bt] (14) python(PyEval_EvalFrameEx+0x50b2) [0x53bd92]
[bt] (15) python(PyEval_EvalFrameEx+0x4b04) [0x53b7e4]
[bt] (16) python() [0x540199]
[bt] (17) python(PyEval_EvalCode+0x1f) [0x540e4f]
[bt] (18) python() [0x60c272]
[bt] (19) python(PyRun_FileExFlags+0x9a) [0x60e71a]
[bt] (20) python(PyRun_SimpleFileExFlags+0x1bc) [0x60ef0c]
[bt] (21) python(Py_Main+0x456) [0x63fb26]
[bt] (22) python(main+0xe1) [0x4cfeb1]
[bt] (23) /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fbe53795830]
[bt] (24) python(_start+0x29) [0x5d6049]

Traceback (most recent call last):
  File "eval_svhn_model.py", line 109, in <module>
    model = get_model(args, data_shape, output_size)
  File "eval_svhn_model.py", line 58, in get_model
    model.set_params(arg_params, aux_params)
  File "/home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/module/base_module.py", line 557, in set_params
    allow_missing=allow_missing, force_init=force_init)
  File "/home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/module/module.py", line 261, in init_params
    _impl(name, arr, arg_params)
  File "/home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/module/module.py", line 251, in _impl
    cache_arr.copyto(arr)
  File "/home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/ndarray.py", line 556, in copyto
    return _internal._copyto(self, out=other)
  File "mxnet/cython/ndarray.pyx", line 167, in ndarray._make_ndarray_function.generic_ndarray_function
  File "mxnet/cython/./base.pyi", line 36, in ndarray.CALL
mxnet.base.MXNetError: b'[16:54:45] src/ndarray/ndarray.cc:239: Check failed: from.shape() == to->shape() operands shape mismatchfrom.shape = (48,48,3,3) to.shape=(64,64,3,3)\n\nStack trace returned 25 entries:\n[bt] (0) /home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fbe48041d6c]\n[bt] (1) /home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet10CopyFromToERKNS_7NDArrayEPS0_i+0x437) [0x7fbe48832997]\n[bt] (2) /home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(+0x9d853a) [0x7fbe487cf53a]\n[bt] (3) /home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(MXImperativeInvoke+0x1034) [0x7fbe48aca674]\n[bt] (4) /home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/_cy3/ndarray.cpython-35m-x86_64-linux-gnu.so(+0x1312c) [0x7fbe3b8d512c]\n[bt] (5) /home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/_cy3/ndarray.cpython-35m-x86_64-linux-gnu.so(+0x140ed) [0x7fbe3b8d60ed]\n[bt] (6) python(PyObject_Call+0x47) [0x5c1797]\n[bt] (7) python(PyEval_EvalFrameEx+0x4ec6) [0x53bba6]\n[bt] (8) python(PyEval_EvalFrameEx+0x4b04) [0x53b7e4]\n[bt] (9) python() [0x5406df]\n[bt] (10) python(PyEval_EvalFrameEx+0x54f0) [0x53c1d0]\n[bt] (11) python() [0x5406df]\n[bt] (12) python(PyEval_EvalFrameEx+0x50b2) [0x53bd92]\n[bt] (13) python() [0x540199]\n[bt] (14) python(PyEval_EvalFrameEx+0x50b2) [0x53bd92]\n[bt] (15) python(PyEval_EvalFrameEx+0x4b04) [0x53b7e4]\n[bt] (16) python() [0x540199]\n[bt] (17) python(PyEval_EvalCode+0x1f) [0x540e4f]\n[bt] (18) python() [0x60c272]\n[bt] (19) python(PyRun_FileExFlags+0x9a) [0x60e71a]\n[bt] (20) python(PyRun_SimpleFileExFlags+0x1bc) [0x60ef0c]\n[bt] (21) python(Py_Main+0x456) [0x63fb26]\n[bt] (22) python(main+0xe1) [0x4cfeb1]\n[bt] (23) /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fbe53795830]\n[bt] (24) python(_start+0x29) [0x5d6049]\n'

opened by sorelyss 8

Cannot train on fsns data set

Hi Bartzi, I am trying to use train_fsns.py to train on the FSNS dataset. I get error messages both when I use the --eval-image argument and when I do not use --eval-image and just load train_file and val_file.

My development environment: Ubuntu 16.04, CUDA 8.0, cuDNN 5.0, MXNet 0.9.3

Thank you very much.

opened by Jia-HongHenryLee 6

Error in running eval_text_recognition.py

Hi, I'm using the pretrained text recognition model downloaded from the website. I'm getting the following error when running this script. Any idea how to solve this?

python eval_text_recognition_model.py model-0002.params 10000 original_gt.txt model-symbol.json

Traceback (most recent call last):
  File "eval_text_recognition_model.py", line 97, in <module>
    reverse_char_map = {v: k for k, v in char_map.items()}
  File "eval_text_recognition_model.py", line 97, in <dictcomp>
    reverse_char_map = {v: k for k, v in char_map.items()}
TypeError: unhashable type: 'list'

Also, how many epochs should I set for the best results? I couldn't find it in the paper.

Thanks

opened by vermaarjun7 6

2 details questions

@Bartzi Sorry to bother you again, I have two more questions. The first is still about N: because different training images may have different lengths of words or characters, will N change during training? In the source code, I found that N is set by the num_time_steps param. If N stays the same during training, what should we do if N is larger than the length of the words or characters? The second question is about the recognition network: when we get N text regions from the original image after the sample network, how do we find the corresponding label for each text region during training? For example, if we get 2 text regions '16' and '18', and we have 2 labels '16' and '18', how do we choose label '16' for text region '16' instead of '18' during training? Looking forward to your reply, thanks.

opened by caoyangcr7 5

About LSTM in loc-net

Why is an LSTM used in the loc-net? I saw in the paper: "This BLSTM is used to generate the hidden states hn, which in turn are used to predict the affine transformation matrices". Why not directly use the flattened feature to predict the affine transformation matrices?

Why is the LSTM input the same at every step? In the code, the flattened feature is copied num_timestep times to form the num_timestep inputs of the LSTM. These features are all identical, so why design it this way? If so, the two directions of the BLSTM should be useless.

How do I choose the output matrices? If there are fewer bboxes than num_timestep, how do I find which affine transformation matrices are the preferred bbox parameters? Can you explain it to me? I am a little bit confused about the paper!

opened by jugg1024 4

tensorflow.python.framework.errors_impl.DataLossError: truncated record at 285474855

@Bartzi, I face a problem when I run tfrecord_to_image.py:

python tfrecord_to_image.py /home/HardDisk/research/Computer_Vision/OCR/stn-ocr/stn-ocr/datasets/fsns/fsns_data/train /home/HardDisk/research/Computer_Vision/OCR/stn-ocr/stn-ocr/datasets/fsns/fsns_data/fsns_data_train train

Error information:

Traceback (most recent call last):
  File "tfrecord_to_image.py", line 39, in <module>
    for idx, string_record in enumerate(record_iterator):
  File "/home/bob/stn-ocr-py3-env/lib/python3.4/site-packages/tensorflow/python/lib/io/tf_record.py", line 77, in tf_record_iterator
    reader.GetNext(status)
  File "/usr/lib/python3.4/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/home/bob/stn-ocr-py3-env/lib/python3.4/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.DataLossError: truncated record at 285474855

opened by gitUserGoodLeaner 4

plot_log not matched the log file format

The actual log file content has 4 columns, as follows:

2017-08-25 10:54:04,722 Node[0] Epoch[0] Batch [50] Speed: 106.29 samples/sec Accuracy=0.204000 Loss=2.318700

but parse_log_file only processes 3 columns. Also, event_info = re.search(r'.-(?P<event_name>.)=(?P.)', info) does not match the log output, because there is no '-' before the event_name. I corrected this by changing if len(line_splits) == 3 to if len(line_splits) == 4 and removing the '.-', but then I got an error when plotting:

for metric, axe in zip(metrics_to_plot, axes): TypeError: zip argument #2 must support iteration

(Pdb) print(metrics_to_plot)
['Accuracy']
(Pdb) print(axes)
Axes(0.125,0.11;0.775x0.77)

opened by yangxiuwu 4

question about the N grids in paper

Hello, I have a question about the N grids in the paper. in the paper, it said that

The first is the localization network that takes the input image and predicts N transformation matrices, that are applied to N identical grids, forming N different sampling grids

How can we know the number N?

opened by caoyangcr7 3

i read the paper and have a question: what is the order of the labels?

Assume there are N lines in an image (in the order "aaa", "bbb", "ccc", ...), each with a bbox. After the localization network there are N affine transformation matrices (maybe in the order "ccc", "bbb", "aaa"), but how do we decide which is which? If they are not aligned, how can the network be trained? Or is there a prescribed order, like from top to bottom? And what will happen if the number of bboxes in the image is less or more than N?

opened by jacobunderlinebenseal 2

compiling on Windows

Hi. I want to know whether the code can run on Windows, since it is not clearly stated that it cannot. I tried to run the code, but warp-ctc can't be compiled on Windows. How can I make it work?

opened by whulc 2

I have 3 Question

Q1: Do we need to use the "eval_text_recognition_model.py" file to perform the text recognition?
Q2: Can you provide us with a pre-trained model?
Q3: Is this system capable of recognizing a single text in an image or a line of text containing multiple characters?

opened by arsalan993 1

load pretrained model error

Hello, I encountered another problem when loading the pre-trained model, as shown in the following figure:

When calling python svhn_train.py with the --model_prefix provided by you, the corresponding model file can never be found. However, when I switched to that directory, I found that the corresponding model file is there, which is strange. The only difference between my setup and yours is that my MXNet version is 1.0.0 instead of 0.9.3, but I think model loading should work the same in both versions and should not cause the error. In addition, I would like to ask: is there a difference between see-ocr and stn-ocr? Are the two models exactly the same?

Looking forward to your reply!

opened by fycfycfyc 1

train original svhn datasets

Excuse me, I noticed that your code gives the training steps for the two SVHN variant datasets, but not for the original SVHN dataset. If I want to train the model on the original SVHN dataset, how should the dataset be preprocessed? For example, to what size should the SVHN images be resized? Looking forward to your reply. Thank you very much!

opened by fycfycfyc 1

Evaluation fail!

Hello Bartzi, I tried to run the SVHN evaluation and got this error:

The command: python eval_svhn_model.py '/home/hthai/stn-ocr/datasets/svhn/original_svhn/models/model' 0040 '/home/hthai/stn-ocr/datasets/svhn/evaluation/test.csv' '/home/hthai/stn-ocr/datasets/svhn/svhn_char_map.json'

How can I fix it? Thank you!

opened by ThaiLe189 6

ctc_loss.cpp:509:10: fatal error: 'ctc.h' file not found

I am getting the above warning while running "make".

After that, while running the command python3 setup.py build_ext --inplace, I am getting the error below. Can you please help me with this?

opened by eravallirao 15

Training does not end.

I have issued the command for training (svhn) as per the instructions. It does not progress at all.

##########################################################################

Command : python train_svhn.py /home/aditya/stn-ocr/generated/centered/train.csv /home/aditya/stn-ocr/generated/centered/valid.csv --log-dir /home/aditya/stn-ocr -b 400 --lr 1e-5

/home/aditya/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
loading data
2018-10-29 13:53:20,201 Node[0] start with arguments Namespace(batch_size=400, blank_label=0, char_map=None, checkpoint_interval=None, eval_image=None, fix_loc=False, gif=False, gpus=None, ip=None, kv_store='local', load_epoch=None, log_dir='/home/aditya/stn-ocr/2018-10-29T13:53:16.415078_training', log_file='/home/aditya/stn-ocr/2018-10-29T13:53:16.415078_training/log', log_level='INFO', log_name='training', lr=1e-05, lr_factor=1, lr_factor_epoch=1, model_prefix=None, num_epochs=10, plot_network_graph=False, port=1337, progressbar=False, save_model_prefix=None, send_bboxes=False, train_file='/home/aditya/stn-ocr/generated/centered/train.csv', val_file='/home/aditya/stn-ocr/generated/centered/valid.csv', video=False, zoom=0.9)
2018-10-29 13:53:20,202 Node[0] EPOCH SIZE: 250
2018-10-29 13:53:20,226 Node[0] Start training with [cpu(0)]

############################################################################

It stops right there. No progress.

opened by avasisht-celadon 9

Owner

Christian Bartz

GitHub https://arxiv.org/abs/1707.08831

A pure pytorch implemented ocr project including text detection and recognition

ocr.pytorch A pure pytorch implemented ocr project. Text detection is based CTPN and text recognition is based CRNN. More detection and recognition me

444 Dec 30, 2022

MXNet OCR implementation. Including text recognition and detection.

insightocr Text Recognition Accuracy on Chinese dataset by caffe-ocr Network LSTM 4x1 Pooling Gray Test Acc SimpleNet N Y Y 99.37% SE-ResNet34 N Y Y 9

99 Nov 1, 2022

Awesome multilingual OCR toolkits based on PaddlePaddle （practical ultra lightweight OCR system, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices）

English | 简体中文 Introduction PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools that help users train better models and a

27.5k Jan 8, 2023

Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)

Open Semantic Search https://opensemanticsearch.org Integrated search server, ETL framework for document processing (crawling, text extraction, text a

684 Jan 6, 2023

It is a image ocr tool using the Tesseract-OCR engine with the pytesseract package and has a GUI.

OCR-Tool It is a image ocr tool made in Python using the Tesseract-OCR engine with the pytesseract package and has a GUI. This is my second ever pytho

4 Jul 11, 2022

Indonesian ID Card OCR using tesseract OCR

KTP OCR Indonesian ID Card OCR using tesseract OCR KTP OCR is python-flask with tesseract web application to convert Indonesian ID Card to text / JSON

5 Dec 6, 2021

OCR, Scene-Text-Understanding, Text Recognition

Scene-Text-Understanding Survey [2015-PAMI] Text Detection and Recognition in Imagery: A Survey paper [2014-Front.Comput.Sci] Scene Text Detection and

354 Dec 12, 2022

This is the implementation of the paper "Gated Recurrent Convolution Neural Network for OCR"

Gated Recurrent Convolution Neural Network for OCR This project is an implementation of the GRCNN for OCR. For details, please refer to the paper: htt

90 Dec 22, 2022

Some Boring Research About Products Recognition 、Duplicate Img Detection、Img Stitch、OCR

Products Recognition 介绍商品识别，围绕在复杂的商场零售场景中，识别出货架图像中的商品信息。主要组成部分：重复图像检测。【更新进度 4/10】图像拼接。【更新进度 0/10】目标检测。【更新进度 0/10】商品识别。【更新进度 1/10】 OCR。【更新进度 1/10】

18 Jan 27, 2022

OCR software for recognition of handwritten text

Handwriting OCR The project tries to create software for recognition of a handwritten text from photos (also for Czech language). It uses computer vis

562 Jan 3, 2023

Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. This Neural Network (NN) model recognizes the text contained in the images of segmented words.

Handwritten-Text-Recognition Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. T

27 Jan 8, 2023

A curated list of resources for text detection/recognition (optical character recognition ) with deep learning methods.

awesome-deep-text-detection-recognition A curated list of awesome deep learning based papers on text detection and recognition. Text Detection Papers

2.4k Jan 8, 2023

TextBoxes: A Fast Text Detector with a Single Deep Neural Network https://github.com/MhLiao/TextBoxes 基于SSD改进的文本检测算法，textBoxes_note记录了之前整理的笔记。

TextBoxes: A Fast Text Detector with a Single Deep Neural Network Introduction This paper presents an end-to-end trainable fast scene text detector, n

24 Apr 28, 2022


ISI's Optical Character Recognition (OCR) software for machine-print and handwriting data

A collection of resources (including the papers and datasets) of OCR (Optical Character Recognition).

make a better chinese character recognition OCR than tesseract

Provides OCR (Optical Character Recognition) services through web applications

Go package for OCR (Optical Character Recognition), by using Tesseract C++ library

OCR system for Arabic language that converts images of typed text to machine-encoded text.