STN-OCR: A single Neural Network for Text Detection and Text Recognition

This repository contains the code for the paper: STN-OCR: A single Neural Network for Text Detection and Text Recognition

Please note that we refined our approach and released new source code. You can find the code here.

Please use the new code if you want to experiment with FSNS-like data and our approach. It should also be easy to redo the text recognition experiments with the new code, although we did not release any code for that.

Structure of the repository

The folder datasets contains code related to the datasets used in the paper. datasets/svhn contains several scripts that can be used to create SVHN-based ground truth files as used in our experiments reported in section 4.2; please see the readme in this folder on how to use the scripts. datasets/fsns contains scripts that can be used to first download the FSNS dataset, second extract the images from the downloaded files, and third restructure the contained ground truth files.

The folder mxnet contains all code used for training our networks.

Installation

In order to use the code you will need the following software environment:

  1. Install Python 3 (the code might work with Python 2, too, but this is untested)
  2. It might be a good idea to use a virtualenv
  3. Install all requirements with pip install -r requirements.txt
  4. Clone and install warp-ctc from here
  5. Go into the folder mxnet/metrics/ctc and run python setup.py build_ext --inplace
  6. Clone the MXNet repository
  7. Check out the tag v0.9.3
  8. Add the warp-ctc plugin to the project by enabling it in the file config.mk
  9. Compile MXNet
  10. Install the Python bindings of MXNet
  11. You should be ready to go!
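
For orientation, steps 3 to 10 could boil down to something like the following commands on a Linux machine (a sketch only: the repository URLs, build flags, and install commands are assumptions on our part, so adapt them to your setup):

    pip install -r requirements.txt
    git clone https://github.com/baidu-research/warp-ctc && cd warp-ctc
    mkdir build && cd build && cmake .. && make && sudo make install && cd ../..
    cd mxnet/metrics/ctc && python setup.py build_ext --inplace && cd ../../..
    git clone --recursive https://github.com/dmlc/mxnet ~/mxnet && cd ~/mxnet
    git checkout v0.9.3
    # enable the warp-ctc plugin in config.mk, then:
    make -j4
    cd python && pip install -e .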

Training

You can use this code to train models for three different tasks.

SVHN House Number Recognition

The file train_svhn.py is the entry point for training a network using our purpose-built SVHN datasets. As provided, the file is ready to train a network capable of finding a single house number placed randomly on an image.

Example: centered_image

In order to do this, you need to follow these steps:

  1. Download the datasets

  2. Locate the folder generated/centered

  3. Open train.csv and adapt the paths of all images to the paths on your machine (do the same with valid.csv)

  4. Make sure to prepare your environment as described in Installation

  5. Start the training by issuing the following command:

    python train_svhn.py <path to train.csv> <path to valid.csv> --gpus <gpu id you want to use> --log-dir <where to save the logs> -b <batch size you want to use> --lr 1e-5 --zoom 0.5 --char-map datasets/svhn/svhn_char_map.json

  6. Wait and enjoy.

If you want to do experiments on more challenging images you might need to update some parts of the code in train_svhn.py. The parts you might want to update are located around line 40 of this file. Here you can change the maximum number of house numbers in the image (num_timesteps), the maximum number of characters per house number (labels_per_timestep), the number of RNN layers to use for predicting the localization (num_rnn_layers), and whether or not to use a BLSTM for predicting the localization (use_blstm).
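
For orientation, this block of settings looks roughly like the following sketch (the variable names come from the description above; the concrete values shown here are placeholders, so check train_svhn.py itself):

    num_timesteps = 1        # max. number of house numbers per image
    labels_per_timestep = 4  # max. number of characters per house number
    num_rnn_layers = 1       # number of RNN layers predicting the localization
    use_blstm = True         # whether to use a BLSTM for predicting the localization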

A considerably more challenging dataset is contained in the folder medium_two_digits, or medium, in the datasets folder. Example: 2_digits_more_challenge

If you want to follow our experiments with svhn numbers placed in a regular grid you'll need to do the following:

  1. Download the datasets
  2. Locate the folder generated/easy
  3. Open train.csv and adapt the paths of all images to the paths on your machine (do the same with valid.csv)
  4. Set num_timesteps and labels_per_timestep to 4 in train_svhn.py
  5. Start the training using the following command: python train_svhn.py <path to train.csv> <path to valid.csv> --gpus <gpu id you want to use> --log-dir <where to save the logs> -b <batch size you want to use> --lr 1e-5
  6. If you are lucky it will work ;)

Text Recognition

Following our text recognition experiments might be a little difficult, because we can not offer the entire dataset we used. But it is possible to perform the experiments based on the Synth-90k dataset provided by Jaderberg et al. here. After downloading and extracting this file you'll need to adapt the ground truth file provided with this dataset to the format used by our code. Our format is quite simple: you need to create a csv file with tab-separated values, where the first column is the absolute path to the image and the rest of the line contains the labels corresponding to this image.
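
A minimal conversion sketch in Python could look like this (the annotation file name, its layout, and the structure of the char map are assumptions on our part; adapt them to the files you actually downloaded):

    import json
    import os

    data_root = "/data/synth90k"  # assumed extraction directory
    char_map_path = "datasets/svhn/svhn_char_map.json"  # use the char map matching your experiment

    with open(char_map_path) as f:
        char_map = json.load(f)  # assumed structure: class id -> character
    reverse_map = {v: k for k, v in char_map.items()}  # character -> class id

    # each annotation line is assumed to look like "<relative image path> <label text>"
    with open(os.path.join(data_root, "annotation_train.txt")) as src, open("train.csv", "w") as dst:
        for line in src:
            rel_path, word = line.strip().split(maxsplit=1)
            labels = [str(reverse_map[c]) for c in word]  # map characters to numerical labels
            dst.write("\t".join([os.path.join(data_root, rel_path)] + labels) + "\n")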

To train the network you can use the train_text_recognition.py script. You can start this script in a similar manner to the train_svhn.py script.
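
For example, an invocation could look like this (a sketch; the exact set of command line flags may differ slightly, so check the argument parser of the script):

    python train_text_recognition.py <path to train.csv> <path to valid.csv> --gpus <gpu id you want to use> --log-dir <where to save the logs> -b <batch size you want to use> --char-map <path to char map>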

FSNS

In order to redo our experiments on the FSNS dataset you need to perform the following steps:

  1. Download the fsns dataset using the download_fsns.py script located in datasets/fsns

  2. Extract the individual images using the tfrecord_to_image.py script located in datasets/fsns/tfrecord_utils (you will need to install tensorflow for doing that)

  3. Use the transform_gt.py script to transform the original FSNS ground truth, which is based on a single line of text, into a ground truth containing labels for each word individually. A possible usage of the transform_gt.py script could look like this:

    python transform_gt.py <path to original gt> datasets/fsns/fsns_char_map.json <path to gt that shall be generated>

  4. Because MXNet expects the blank label to be 0 for training with CTC loss, you have to use the swap_classes.py script in datasets/fsns to swap the classes for space and blank in the ground truth, by issuing:

    python swap_classes.py <original gt> <swapped gt> 0 133

  5. After performing these steps you should be able to run the training by issuing:

    python train_fsns.py <path to generated train gt> <path to generated validation gt> --char-map datasets/fsns/fsns_char_map.json --blank-label 0

Observing the Training Progress

We've added a nice script that makes it possible to see how well the network performs at each step of the training. This progress is normally plotted to disk for each iteration and can later be used to create animations of the training progress (you can use the create_gif.py and create_video.py scripts located in mxnet/utils for this purpose). Besides this normal plotting to disk it is also possible to watch this progress directly while the training is running. In order to do so you have to do the following:

  1. start the show_progress.py script in mxnet/utils

  2. start the training with the following additional command line parameters (see the combined example after this list):

    --send-bboxes --ip <localhost, or remote ip if you are working on a remote machine> --port <the port the show_progress.py script is running on (default is 1337)>

  3. enjoy!
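
Taken together, a session could look like this (a sketch with hypothetical values; show_progress.py may accept further options, so check its argument parser):

    # terminal 1
    python mxnet/utils/show_progress.py
    # terminal 2
    python train_svhn.py <path to train.csv> <path to valid.csv> --gpus 0 --send-bboxes --ip localhost --port 1337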

This tool is especially helpful in determining whether the network is learning anything or not. We recommend that you always use this tool while training.

Evaluation

If you want to evaluate already trained models you can use the evaluation scripts provided in the mxnet folder. For evaluating a model you need to do the following:

  1. train or download a model

  2. choose the correct evaluation script and adapt it if necessary (take special care if you changed the number of timesteps or the number of RNN layers)

  3. Get the dataset you want to evaluate the model on and adapt the ground truth file to the format expected by our software. The expected format is a csv (tab-separated) file with lines that look like this: <absolute path to image> \t <numerical labels, each label separated from the next by \t> (see the example line after this list)

  4. run the chosen evaluation script like so:

    python eval_<type>_model.py <path to model dir>/<prefix of model file> <number of epoch to test> <path to evaluation gt> <path to char map>
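
A ground truth line could, for instance, look like this (the path and the numerical labels are made up for illustration):

    /data/images/house_number_01.png	1	6	10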

You can use eval_svhn_model.py for evaluating a model trained with CTC on the original SVHN dataset, the eval_text_recognition_model.py script for evaluating a model trained for text recognition, and the eval_fsns_model.py script for evaluating a model trained on the FSNS dataset.

License

This code is licensed under the GPLv3 license. Please see LICENSE.md for further details.

Citation

If you are using this code, please cite the following publication:

@article{bartz2017stn,
  title={STN-OCR: A single Neural Network for Text Detection and Text Recognition},
  author={Bartz, Christian and Yang, Haojin and Meinel, Christoph},
  journal={arXiv preprint arXiv:1707.08831},
  year={2017}
}

A short note on code quality

The code contains a huge number of workarounds around MXNet, as we were not able to find an easier way to do what we wanted to do. If you know a better way, please let us know; we would like to make the code easier to understand than it is now.

Comments
  • Shape error eval_svhn_model.py for SVHN demos.

    Hi, I was trying to run your demos, but I only got the original_svhn model to work. I also tried to train one myself, but in the end it raises the same size error.

    When I do:

    python eval_svhn_model.py ../datasets/svhn/models/original_svhn/models/model 40 ../datasets/svhn/evaluation/test.csv ../datasets/svhn/svhn_char_map.json
    

    Works perfect.

    However, when I try:

    python eval_svhn_model.py ../datasets/svhn/models/regular_grid/model 19 ../datasets/svhn/evaluation/test.csv ../datasets/svhn/svhn_char_map.json
    

    It raises the following error. I have tried passing a different --input-width and --input-height, but the problem does not seem to be there.

    [16:54:45] src/nnvm/legacy_json_util.cc:153: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
    [16:54:45] /home/sorelyss/Documents/test/incubator-mxnet/dmlc-core/include/dmlc/./logging.h:300: [16:54:45] src/ndarray/ndarray.cc:239: Check failed: from.shape() == to->shape() operands shape mismatchfrom.shape = (48,48,3,3) to.shape=(64,64,3,3)
    
    Stack trace returned 25 entries:
    [bt] (0) /home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fbe48041d6c]
    [bt] (1) /home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet10CopyFromToERKNS_7NDArrayEPS0_i+0x437) [0x7fbe48832997]
    [bt] (2) /home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(+0x9d853a) [0x7fbe487cf53a]
    [bt] (3) /home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(MXImperativeInvoke+0x1034) [0x7fbe48aca674]
    [bt] (4) /home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/_cy3/ndarray.cpython-35m-x86_64-linux-gnu.so(+0x1312c) [0x7fbe3b8d512c]
    [bt] (5) /home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/_cy3/ndarray.cpython-35m-x86_64-linux-gnu.so(+0x140ed) [0x7fbe3b8d60ed]
    [bt] (6) python(PyObject_Call+0x47) [0x5c1797]
    [bt] (7) python(PyEval_EvalFrameEx+0x4ec6) [0x53bba6]
    [bt] (8) python(PyEval_EvalFrameEx+0x4b04) [0x53b7e4]
    [bt] (9) python() [0x5406df]
    [bt] (10) python(PyEval_EvalFrameEx+0x54f0) [0x53c1d0]
    [bt] (11) python() [0x5406df]
    [bt] (12) python(PyEval_EvalFrameEx+0x50b2) [0x53bd92]
    [bt] (13) python() [0x540199]
    [bt] (14) python(PyEval_EvalFrameEx+0x50b2) [0x53bd92]
    [bt] (15) python(PyEval_EvalFrameEx+0x4b04) [0x53b7e4]
    [bt] (16) python() [0x540199]
    [bt] (17) python(PyEval_EvalCode+0x1f) [0x540e4f]
    [bt] (18) python() [0x60c272]
    [bt] (19) python(PyRun_FileExFlags+0x9a) [0x60e71a]
    [bt] (20) python(PyRun_SimpleFileExFlags+0x1bc) [0x60ef0c]
    [bt] (21) python(Py_Main+0x456) [0x63fb26]
    [bt] (22) python(main+0xe1) [0x4cfeb1]
    [bt] (23) /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fbe53795830]
    [bt] (24) python(_start+0x29) [0x5d6049]
    
    Traceback (most recent call last):
      File "eval_svhn_model.py", line 109, in <module>
        model = get_model(args, data_shape, output_size)
      File "eval_svhn_model.py", line 58, in get_model
        model.set_params(arg_params, aux_params)
      File "/home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/module/base_module.py", line 557, in set_params
        allow_missing=allow_missing, force_init=force_init)
      File "/home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/module/module.py", line 261, in init_params
        _impl(name, arr, arg_params)
      File "/home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/module/module.py", line 251, in _impl
        cache_arr.copyto(arr)
      File "/home/sorelyss/Documents/test/incubator-mxnet/python/mxnet/ndarray.py", line 556, in copyto
        return _internal._copyto(self, out=other)
      File "mxnet/cython/ndarray.pyx", line 167, in ndarray._make_ndarray_function.generic_ndarray_function
      File "mxnet/cython/./base.pyi", line 36, in ndarray.CALL
    mxnet.base.MXNetError: b'[16:54:45] src/ndarray/ndarray.cc:239: Check failed: from.shape() == to->shape() operands shape mismatchfrom.shape = (48,48,3,3) to.shape=(64,64,3,3)\n\n[stack trace identical to the one above]'
    
    
    opened by sorelyss 8
  • Cannot train on fsns data set

    Hi Bartzi: I try to use train_fsns.py to train on the FSNS dataset and I get the following error messages. If I use the --eval-image argument: [screenshot]

    If I don't use --eval-image and just load train_file and val_file: [screenshot]

    My development environment: Ubuntu 16.04, CUDA 8.0, cuDNN 5.0, MXNet 0.9.3.

    Thank you very much.

    opened by Jia-HongHenryLee 6
  • Error in running eval_text_recognition.py

    Hi, I'm using the text recognition pretrained model downloaded from the website. I'm getting the following error when running this script. Any idea how to solve this?

    python eval_text_recognition_model.py model-0002.params 10000 original_gt.txt model-symbol.json

    Traceback (most recent call last):
      File "eval_text_recognition_model.py", line 97, in <module>
        reverse_char_map = {v: k for k, v in char_map.items()}
      File "eval_text_recognition_model.py", line 97, in <dictcomp>
        reverse_char_map = {v: k for k, v in char_map.items()}
    TypeError: unhashable type: 'list'
    
    

    Also, how many epochs should I set for the best results? I couldn't find it in the paper.

    Thanks

    opened by vermaarjun7 6
  • 2 details questions

    @Bartzi Sorry to bother you again, I have another 2 questions. The first is still about N: since different training images may have different numbers of words or characters, does N change during training? When I looked at the source code, I found that N is set by the num_timesteps parameter. If N stays the same during training, what should we do if N is larger than the number of words or characters? The second question is about the recognition network: when we get N text regions from the original image after the sampling network, how do we find the corresponding label for each text region during training? For example, if we get 2 text regions '16' and '18', and we have 2 labels '16' and '18', how can we make sure label '16' is chosen for text region '16' instead of '18' during training? Looking forward to your reply, thanks.

    opened by caoyangcr7 5
  • About LSTM in loc-net

    1. Why is an LSTM used in the loc-net? I saw in the paper: "This BLSTM is used to generate the hidden states hn, which in turn are used to predict the affine transformation matrices". Why not directly use the flattened feature to predict the affine transformation matrices?
    2. Why are the LSTM inputs the same? In the code, the flattened feature is copied num_timestep times to form the num_timestep inputs of the LSTM; these features are exactly the same, so why design it this way? And if so, the reverse direction of the BLSTM should be useless.
    3. How are the output matrices chosen? If the number of bboxes is less than num_timestep, how do I find which affine transformation matrices are the preferred bbox parameters? Can you explain it to me? I am a little bit confused about the paper!
    opened by jugg1024 4
  • tensorflow.python.framework.errors_impl.DataLossError: truncated record at 285474855

    @Bartzi, I face a question when I run tfrecord_to_image.py:

        python tfrecord_to_image.py /home/HardDisk/research/Computer_Vision/OCR/stn-ocr/stn-ocr/datasets/fsns/fsns_data/train /home/HardDisk/research/Computer_Vision/OCR/stn-ocr/stn-ocr/datasets/fsns/fsns_data/fsns_data_train train

    Error information:

        Traceback (most recent call last):
          File "tfrecord_to_image.py", line 39, in <module>
            for idx, string_record in enumerate(record_iterator):
          File "/home/bob/stn-ocr-py3-env/lib/python3.4/site-packages/tensorflow/python/lib/io/tf_record.py", line 77, in tf_record_iterator
            reader.GetNext(status)
          File "/usr/lib/python3.4/contextlib.py", line 66, in __exit__
            next(self.gen)
          File "/home/bob/stn-ocr-py3-env/lib/python3.4/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
            pywrap_tensorflow.TF_GetCode(status))
        tensorflow.python.framework.errors_impl.DataLossError: truncated record at 285474855

    opened by gitUserGoodLeaner 4
  • plot_log not matched the log file format

    The actual log file content has 4 columns, as follows:

        2017-08-25 10:54:04,722 Node[0] Epoch[0] Batch [50] Speed: 106.29 samples/sec Accuracy=0.204000 Loss=2.318700

    but parse_log_file only processes 3 columns, and event_info = re.search(r'.-(?P<event_name>.)=(?P.)', info) does not match the log output; there is no '-' before the event_name. I corrected it by modifying if len(line_splits) == 3 to if len(line_splits) == 4 and erasing the '.-', but then I got an error when plotting:

        for metric, axe in zip(metrics_to_plot, axes):
        TypeError: zip argument #2 must support iteration

        (Pdb) print(metrics_to_plot)
        ['Accuracy']
        (Pdb) print(axes)
        Axes(0.125,0.11;0.775x0.77)

    opened by yangxiuwu 4
  • question about the N grids in paper

    Hello, I have a question about the N grids in the paper. The paper says:

    The first is the localization network that takes the input image and predicts N transformation matrices, that are applied to N identical grids, forming N different sampling grids

    How can we know the value of N?

    opened by caoyangcr7 3
  • i read the paper and have a question: what is the order of the labels?

    Assume there are N lines in an image (in the order "aaa", "bbb", "ccc", ...), each with a bbox. After the LocalizationNetwork there are N affine transformation matrices (maybe in the order "ccc", "bbb", "aaa"), but how do you decide which is which? If they are not aligned, how can it be trained? Or is there simply a prescribed order, like from top to bottom? And what happens if the number of bboxes in the image is less or more than N?

    opened by jacobunderlinebenseal 2
  • compiling on Windows

    Hi. I want to know whether the code can run on a Windows platform, since it is not clearly stated that it can't. I tried to run the code, but warp-ctc can't be compiled on Windows. How can I make it work?

    opened by whulc 2
  • I have 3 Questions

    Q1: Do we need to use the "eval_text_recognition_model.py" file to perform text recognition?
    Q2: Can you provide us with a pre-trained model?
    Q3: Is this system capable of recognizing a single piece of text in an image, or a line of text containing multiple characters?

    opened by arsalan993 1
  • load pretrained model error

    Hello, I encountered another problem when loading the pre-trained model, as shown in the following figures: [screenshots]

    When calling python svhn_train.py --model_prefix as provided by you, the corresponding model file can never be found, but when I switch to that directory I can see that the corresponding model file is there, which is strange. The only difference between my setup and yours is that my MXNet version is 1.0.0 instead of 0.9.3, but I think the model loading functions of the two versions should be the same, so this should not cause the error. In addition, I would like to ask: is there a difference between see-ocr and stn-ocr? Are the two models exactly the same? What is the difference between the two?

    Looking forward to your reply!

    opened by fycfycfyc 1
  • train original svhn datasets

    Excuse me: I noticed that your code gives the training steps for the model on the two variant datasets of SVHN, but the training steps for the original SVHN dataset are not given. If you want to train the model on the original SVHN dataset, how should it be preprocessed? For example, to what size should the SVHN images be resized? Looking forward to your reply. Thank you very much!

    opened by fycfycfyc 1
  • Evaluation fail!

    Hello Bartzi, I try to run the SVHN evaluation and I get this error: [screenshot]

    The command:

        python eval_svhn_model.py '/home/hthai/stn-ocr/datasets/svhn/original_svhn/models/model' 0040 '/home/hthai/stn-ocr/datasets/svhn/evaluation/test.csv' '/home/hthai/stn-ocr/datasets/svhn/svhn_char_map.json'

    How can I fix it? Thank you!

    opened by ThaiLe189 6
  • ctc_loss.cpp:509:10: fatal error: 'ctc.h' file not found

    [screenshot]

    I am getting the above warning while running "make".

    After that, while running python3 setup.py build_ext --inplace I am getting the error below. [screenshot]

    Can you please help me with this?

    opened by eravallirao 15
  • Training does not end.

    I have issued the command for training (SVHN) as per the instructions. It does not progress at all.

    Command:

        python train_svhn.py /home/aditya/stn-ocr/generated/centered/train.csv /home/aditya/stn-ocr/generated/centered/valid.csv --log-dir /home/aditya/stn-ocr -b 400 --lr 1e-5

    Output:

        /home/aditya/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
          import imp
        loading data
        2018-10-29 13:53:20,201 Node[0] start with arguments Namespace(batch_size=400, blank_label=0, char_map=None, checkpoint_interval=None, eval_image=None, fix_loc=False, gif=False, gpus=None, ip=None, kv_store='local', load_epoch=None, log_dir='/home/aditya/stn-ocr/2018-10-29T13:53:16.415078_training', log_file='/home/aditya/stn-ocr/2018-10-29T13:53:16.415078_training/log', log_level='INFO', log_name='training', lr=1e-05, lr_factor=1, lr_factor_epoch=1, model_prefix=None, num_epochs=10, plot_network_graph=False, port=1337, progressbar=False, save_model_prefix=None, send_bboxes=False, train_file='/home/aditya/stn-ocr/generated/centered/train.csv', val_file='/home/aditya/stn-ocr/generated/centered/valid.csv', video=False, zoom=0.9)
        2018-10-29 13:53:20,202 Node[0] EPOCH SIZE: 250
        2018-10-29 13:53:20,226 Node[0] Start training with [cpu(0)]

    It stops right there. No progress.

    opened by avasisht-celadon 9