Line based ATR Engine based on OCRopy

Overview

An OCR engine based on OCRopy and Kraken, written in Python 3. It is designed to be easy to use from the command line and modular enough to be integrated into and customized from other Python scripts.

Pretrained model repository

Pretrained models are available at https://github.com/Calamari-OCR/calamari_models. The current release (336 MB) can be accessed from there.

Installing

Installation using Pip

The suggested method is to install calamari into a virtual environment using pip:

virtualenv -p python3 PATH_TO_VENV_DIR  # e. g. virtualenv calamari_venv
source PATH_TO_VENV_DIR/bin/activate
pip install calamari_ocr

which will install calamari and all of its dependencies.

To install the package without a virtual environment simply run

pip install calamari_ocr

To install the package from its source, download the source code and run

python setup.py install

Installation using Conda

Run

conda env create -f environment_master_gpu.yml

Alternatively, you can install the CPU version or the current dev version instead of the stable master.

Command line interface (Standard User)

If you simply want to apply existing models to your text lines and optionally train new models, you should probably use the command line interface of calamari, which is very similar to that of OCRopy.

Note that if you installed calamari into a virtual environment, you have to activate that environment to make the command line scripts available.

Prediction of a page

Currently only OCR on lines is supported. To segment pages into lines (and the preceding preprocessing steps) we refer to the solutions provided by OCRopus, Kraken, Tesseract, etc. For users (especially less technical ones) in need of an all-in-one package OCR4all might be worth a look.

The prediction step, which uses deep neural networks implemented in TensorFlow, is the core feature of calamari. Run it as follows:

calamari-predict --checkpoint path_to_model.ckpt --files your_images.*.png

Calamari also supports several voting algorithms that combine the predictions of different models to improve the result. To enable voting, simply pass several models to the --checkpoint argument:

calamari-predict --checkpoint path_to_model_1.ckpt path_to_model_2.ckpt ... --files your_images.*.png

The voting algorithm can be changed with the --voter flag. Possible values are: confidence_voter_default_ctc (default), sequence_voter. Note that the confidence voter depends on the loss function used for training a model, while the sequence voter can be used for all models but might yield slightly worse results.
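
To illustrate the idea behind confidence-based voting, here is a minimal sketch that sums per-character confidences across models and keeps the best-scoring character at each position. It assumes the predictions are already aligned position by position, which a real voter has to take care of itself. The function name vote_by_confidence and the data layout are hypothetical and not part of calamari's API.

```python
from collections import defaultdict


def vote_by_confidence(predictions):
    """Combine aligned predictions by summing confidences per position.

    predictions: one list per model, each a list of (char, confidence)
    tuples. All lists are assumed to have the same length (pre-aligned).
    """
    voted = []
    for position in zip(*predictions):
        scores = defaultdict(float)
        for char, confidence in position:
            scores[char] += confidence
        # Keep the character with the highest summed confidence.
        voted.append(max(scores, key=scores.get))
    return "".join(voted)


# Illustrative values only, not real model output.
model_a = [("c", 0.9), ("a", 0.6), ("t", 0.8)]
model_b = [("c", 0.8), ("o", 0.7), ("t", 0.9)]
model_c = [("c", 0.7), ("a", 0.5), ("t", 0.7)]
print(vote_by_confidence([model_a, model_b, model_c]))
```

Even though model_b is the most confident single model at the second position, the two weaker votes for "a" outweigh it together.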

Training of a model

In calamari you can either train a single model using a given data set or train a fold of several (default 5) models to generate different voters for a voted prediction.

Training a single model

A single model can be trained with the calamari-train script. Given a data set with its ground truth you can train the default model by calling:

calamari-train --files your_images.*.png

Note that calamari expects each image file (.png) to have a corresponding ground truth text file (.gt.txt) at the same location with the same base name.
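
Before starting a training run it can help to check that every image actually has its ground truth file. The following sketch does that. The helper name missing_ground_truth is hypothetical, and it assumes that the base name is everything before the first dot of the file name (so 0001.bin.png pairs with 0001.gt.txt); adjust this if your files are named differently.

```python
from pathlib import Path


def missing_ground_truth(image_paths):
    """Return the images that have no matching .gt.txt file next to them."""
    missing = []
    for image in map(Path, image_paths):
        # Assumption: the base name is everything before the first dot,
        # so "0001.bin.png" pairs with "0001.gt.txt".
        base = image.name.split(".", 1)[0]
        if not (image.parent / f"{base}.gt.txt").exists():
            missing.append(image)
    return missing
```

A call such as missing_ground_truth(Path("lines").glob("*.png")) returns an empty list when the data set is complete.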

There are several important parameters to adjust the training. For a full list type calamari-train --help.

  • --network=cnn=40:3x3,pool=2x2,cnn=60:3x3,pool=2x2,lstm=200,dropout=0.5: Specify the network structure in a simple language. The default network consists of two CNN layers, each followed by a pooling layer, and a subsequent LSTM layer. The network uses the default CTC loss implemented in TensorFlow for training and a dropout rate of 0.5. The corresponding creation string is: cnn=40:3x3,pool=2x2,cnn=60:3x3,pool=2x2,lstm=200,dropout=0.5. To add or remove layers, add or remove the corresponding entries in the comma-separated list. Note that the order is important!
  • --line_height=48: The height of each rescaled input file passed to the network.
  • --num_threads=1: The number of threads used during training and line preprocessing.
  • --batch_size=1: The number of lines processed in parallel.
  • --display=1: (epochs) How often an informative string about the current training process is printed in the shell
  • --output_dir: A path where to store checkpoints
  • --checkpoint_frequency: (epochs) How often a model shall be written as checkpoint to the drive
  • --epochs: The maximum number of training iterations (batches) for training. Note: this is the upper boundary if you use early stopping.
  • --samples_per_epoch: The number of samples to process per epoch (by default the size of the dataset)
  • --validation=None: Provide a second data set (images with corresponding .gt.txt) to enable early stopping.
  • --early_stopping_frequency=checkpoint_frequency: How often to check for early stopping on the validation dataset.
  • --early_stopping_nbest=10: How many successive models must be worse than the current best model to break the training loop
  • --early_stopping_best_model_output_dir=output_dir: Output dir for the current best model
  • --early_stopping_best_model_prefix=best: Prefix for the best model (output name will be {prefix}.ckpt)
  • --n_augmentations=0: Data augmentation on the training set.
  • --weights: Load network weights from a given pretrained model. Note that the codec will probably change its size to match the codec of the provided ground truth files. To ensure that certain characters are not deleted, use a --whitelist.
  • --whitelist=[] --whitelist_files=[]: Specify either individual characters or a text file listing all white list characters stored as string.
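
To make the structure of the --network string more concrete, the following sketch splits it into an ordered list of layers. This is an illustrative parser only, not calamari's own. The function name parse_network is hypothetical, and reading cnn=40:3x3 as "40 filters with a 3x3 kernel" is an interpretation of the default string above.

```python
def parse_network(spec):
    """Split a network creation string into an ordered list of layers."""
    layers = []
    for part in spec.split(","):
        kind, _, value = part.partition("=")
        if kind == "cnn":
            # "40:3x3" is read here as 40 filters with a 3x3 kernel.
            filters, kernel = value.split(":")
            size = tuple(int(k) for k in kernel.split("x"))
            layers.append(("cnn", int(filters), size))
        elif kind == "pool":
            layers.append(("pool", tuple(int(k) for k in value.split("x"))))
        elif kind == "lstm":
            layers.append(("lstm", int(value)))
        elif kind == "dropout":
            layers.append(("dropout", float(value)))
        else:
            raise ValueError(f"unknown layer type: {kind!r}")
    return layers


default = "cnn=40:3x3,pool=2x2,cnn=60:3x3,pool=2x2,lstm=200,dropout=0.5"
for layer in parse_network(default):
    print(layer)
```

Because the result is a list, the order of the entries in the string is preserved, which mirrors the note that the layer order matters.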

Hint: If you want to use early stopping but don't have a separate validation set, you can train a single fold with the calamari-cross-fold-train script (see next section).

Training an n-fold of models

To train n more-or-less individual models given a training set you can use the calamari-cross-fold-train-script. The default call is

calamari-cross-fold-train --files your_images*.*.png --best_models_dir some_dir

By default this will train 5 default models using 80%=(n-1)/n of the provided data for training and 20%=1/n for validation. These independent models can then be used to predict lines using a voting mechanism. There are several important parameters to adjust the training. For a full list type calamari-cross-fold-train --help.

  • Almost all parameters of calamari-train can be used to affect the training
  • --n_folds=5: The number of folds
  • --weights=None: Specify one or n_folds models to use for pretraining.
  • --best_models_dir=REQUIRED: Directory where to store the best model determined on the validation data set
  • --best_model_label={id}: The prefix for the best model of each fold. A string that will be formatted. {id} will be replaced by the number of the fold, i. e. 0, ..., n-1.
  • --temporary_dir=None: A directory where to store temporary files, e. g. checkpoints of the scripts to train an individual model. By default a temporary dir created with Python's tempfile module is used.
  • --max_parallel_models=n_folds: The number of models that shall be run in parallel. By default all models are trained in parallel.
  • --single_fold=[]: Use this parameter to train only a subset, e. g. a single fold out of all n_folds.
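
The (n-1)/n versus 1/n split described above can be sketched as follows. How calamari actually assigns lines to folds is not stated here, so the round-robin assignment and the function name cross_fold_splits are assumptions made for illustration.

```python
def cross_fold_splits(samples, n_folds=5):
    """Yield a (train, validation) pair of lists for each of the n folds."""
    # Assumption: samples are dealt round-robin into n_folds groups.
    folds = [samples[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        validation = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, validation


lines = [f"line_{k:02d}.png" for k in range(10)]  # placeholder file names
for fold_id, (train, validation) in enumerate(cross_fold_splits(lines)):
    print(fold_id, len(train), len(validation))
```

With 10 samples and the default 5 folds, every fold trains on 8 samples and validates on the remaining 2, and each sample is used for validation exactly once.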

To use all models to predict and then vote for a set of lines you can use the calamari-predict script and provide all models as checkpoint:

calamari-predict --checkpoint best_models_dir/*.ckpt.json --files your_images.*.png

Evaluating a model

To compute the performance of a model, you first need to predict your evaluation data set (see calamari-predict). Afterwards run

calamari-eval --gt *.gt.txt

on the ground truth files to compute an evaluation measure including the full confusion matrix. By default the predicted sentences produced by the calamari-predict script end in .pred.txt. You can change the default behavior of the evaluation script with the following parameters:

  • --gt=REQUIRED: The ground truth txt files.
  • --pred=None: The prediction files. If None it is expected that the prediction files have the same base name as the ground truth files but with --pred_ext as suffix.
  • --pred_ext=.pred.txt: The suffix of the prediction files if --pred is not specified
  • --n_confusions=-1: Print only the top n_confusions most common errors.
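
As a rough illustration of what such an evaluation involves, the sketch below pairs a ground truth file name with its prediction file name and computes a character error rate from the edit distance. The exact measure that calamari-eval reports is not spelled out here, so treat this as one common choice rather than its implementation; the names prediction_path, edit_distance and character_error_rate are hypothetical.

```python
def prediction_path(gt_path, pred_ext=".pred.txt", gt_ext=".gt.txt"):
    """Swap the ground truth suffix for the prediction suffix."""
    if not gt_path.endswith(gt_ext):
        raise ValueError(f"not a ground truth file: {gt_path}")
    return gt_path[: -len(gt_ext)] + pred_ext


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row DP)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(pairs):
    """Total edit distance divided by the total ground truth length."""
    errors = sum(edit_distance(gt, pred) for gt, pred in pairs)
    chars = sum(len(gt) for gt, _ in pairs)
    return errors / chars if chars else 0.0
```

For example, character_error_rate([("abcd", "abed")]) counts one substitution in four characters.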

Experimenting with different network hyperparameters (experimental)

To find a good set of hyperparameters (e. g. network structure, learning rate, batch size, ...) you can use the experiment.py script, which both trains models using the cross-fold algorithm and evaluates them on a given evaluation data set. The script directly outputs the performance of each individual fold, the average and its standard deviation, plus the results using the different voting algorithms. If you want to use this experimental script, have a look at its parameters (experiment.py --help).
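
The per-fold summary mentioned above (average and standard deviation) can be reproduced from a list of fold results with Python's statistics module. This is only a sketch: the function name summarize_folds is hypothetical, the numbers are placeholders, and whether experiment.py uses the sample or the population standard deviation is not stated, so the sample variant is an assumption.

```python
from statistics import mean, stdev


def summarize_folds(error_rates):
    """Return (average, sample standard deviation) of per-fold results."""
    return mean(error_rates), stdev(error_rates)


# Placeholder values for five folds, not measured results.
fold_error_rates = [1.0, 2.0, 3.0, 2.0, 2.0]
average, deviation = summarize_folds(fold_error_rates)
print(f"mean={average:.3f} std={deviation:.3f}")
```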

Comments
  • After training, how to create models?

    I ran the train command and it saved checkpoints in the output directory. Now, how can I use these as a model? How do I make models? Which command? Is there any more detailed documentation for this?

    opened by UlasSAYGINIM 27
  • Allow namespace prefixes other than 'None' in PageXML

    Eynollah, e.g., produces PageXML files that use an explicit prefix (xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15").

    calamari_ocr/ocr/dataset/datareader/pagexml/reader.py, however, expects the prefix to be 'None' and throws an error when processing an eynollah pagexml.

    When I change line 120 of reader.py from

    ns = {"ns": root.nsmap[None]} 
    

    to

    ns = {'ns' : 'http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15'}
    

    it works. I'm not sure if you can generalize the namespace dictionary to cover both output styles. Maybe xpath's local-name function (instead of lxml find or findall) is an alternative.
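
One way to cover both output styles is to read the namespace from the root element's qualified tag instead of looking up the None prefix. The sketch below uses the standard library's xml.etree.ElementTree to stay self-contained; the helper name page_namespace is hypothetical and this is not the change made in reader.py. With lxml, etree.QName(root).namespace should give the same URI.

```python
import xml.etree.ElementTree as ET

PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"


def page_namespace(root):
    """Return the namespace URI of the root element, whatever prefix is used."""
    # ElementTree stores qualified tags as "{namespace}localname".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return ""


# Explicit prefix, as in the Eynollah output described above.
prefixed = ET.fromstring(f'<pc:PcGts xmlns:pc="{PAGE_NS}"><pc:Page/></pc:PcGts>')
# Default namespace (no prefix).
default = ET.fromstring(f'<PcGts xmlns="{PAGE_NS}"><Page/></PcGts>')

for root in (prefixed, default):
    ns = {"ns": page_namespace(root)}
    print(root.find("ns:Page", ns) is not None)
```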

    opened by alexander-winkler 17
  • performance degradation for versions > 0.2.5

    It seems that a performance issue was introduced between 0.2.5 and 0.3.0 releases. I tested separately on environments with tensorflow cpu and gpu. Tensorflow version: 1.13.1

    Hardware: GPU: NVIDIA Tesla M60 CPU: intel i7 4710hq (8 threads)

    I've got images already in memory, so I use RawDataSet. Then I wrap it with InputDataset. And finally I use Predictor directly in code. The code is here: https://gist.github.com/wosiu/9fa50de9e47615b5fa08b23637e1f947

    | version | GPU time   | CPU time |
    | ------- | ---------- | -------- |
    | 0.2.5   | 1440 ms    | 2100 ms  |
    | 0.3.0   | not tested | 5700 ms  |
    | 0.3.1   | 5859 ms    | 6000 ms  |

    And some logs I get, not sure if related:

    1. tensorflow-gpu, calamari 0.3.1:
    2019-05-16 15:41:16,329 INFO 21006 140466380875520 calamari_wrapper.py:70 Found 10 files in the dataset
    WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
    2019-05-16 15:41:18,694 INFO 21006 140466380875520 calamari_wrapper.py:70 Found 13 files in the dataset
    WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
    2019-05-16 15:41:20,461 INFO 21006 140466380875520 calamari_wrapper.py:70 Found 2 files in the dataset
    WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
    2019-05-16 15:41:22,187 INFO 21006 140466380875520 metrics2.py:126 ocr_ms took 5859 ms
    
    2. tensorflow cpu, calamari 0.3.1:
    2019-05-16 15:50:48,571 INFO 23732 140378726676224 calamari_wrapper.py:70 Found 10 files in the dataset
    WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
    OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
    OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
    OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
    OMP: Info #156: KMP_AFFINITY: 8 available OS procs
    OMP: Info #157: KMP_AFFINITY: Uniform topology
    OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
    OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
    OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
    OMP: Info #250: KMP_AFFINITY: pid 24599 tid 24599 thread 0 bound to OS proc set 0
    OMP: Info #250: KMP_AFFINITY: pid 24582 tid 24582 thread 0 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24583 tid 24583 thread 0 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24582 tid 24605 thread 1 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24582 tid 24608 thread 2 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24585 tid 24585 thread 0 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24582 tid 24610 thread 3 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24585 tid 24611 thread 1 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24583 tid 24609 thread 3 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24583 tid 24606 thread 1 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24585 tid 24613 thread 3 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24583 tid 24607 thread 2 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24585 tid 24612 thread 2 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24584 tid 24584 thread 0 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24584 tid 24615 thread 1 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24584 tid 24621 thread 3 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24584 tid 24620 thread 2 bound to OS proc set 0-7
    OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
    OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
    OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
    OMP: Info #156: KMP_AFFINITY: 8 available OS procs
    OMP: Info #157: KMP_AFFINITY: Uniform topology
    OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
    OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
    OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
    OMP: Info #250: KMP_AFFINITY: pid 24614 tid 24614 thread 0 bound to OS proc set 0
    2019-05-16 15:50:50.630679: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.630721: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.677730: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.677766: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.717138: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.717175: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.753602: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.753637: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.775119: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.775151: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.793670: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.793716: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.827669: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.827717: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.865883: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.865927: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.912646: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.912676: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.955744: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.955777: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50,960 INFO 23732 140378726676224 calamari_wrapper.py:70 Found 13 files in the dataset
    WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
    OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
    OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
    OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
    OMP: Info #156: KMP_AFFINITY: 8 available OS procs
    OMP: Info #157: KMP_AFFINITY: Uniform topology
    OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
    OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
    OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
    OMP: Info #250: KMP_AFFINITY: pid 24643 tid 24643 thread 0 bound to OS proc set 0
    OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
    OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
    OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
    OMP: Info #156: KMP_AFFINITY: 8 available OS procs
    OMP: Info #157: KMP_AFFINITY: Uniform topology
    OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
    OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
    OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
    OMP: Info #250: KMP_AFFINITY: pid 24655 tid 24655 thread 0 bound to OS proc set 0
    2019-05-16 15:50:52.560072: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.560110: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.599294: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.599324: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.621620: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.621653: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.641955: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.641989: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.663661: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.663695: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.684411: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.684477: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.704166: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.704195: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.720328: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.720356: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.735363: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.735405: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.754581: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.754612: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.764184: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.764352: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.780335: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.780366: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52,783 INFO 23732 140378726676224 calamari_wrapper.py:70 Found 2 files in the dataset
    WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
    OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
    OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
    OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
    OMP: Info #156: KMP_AFFINITY: 8 available OS procs
    OMP: Info #157: KMP_AFFINITY: Uniform topology
    OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
    OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
    OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
    OMP: Info #250: KMP_AFFINITY: pid 24679 tid 24679 thread 0 bound to OS proc set 0
    OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
    OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
    OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
    OMP: Info #156: KMP_AFFINITY: 8 available OS procs
    OMP: Info #157: KMP_AFFINITY: Uniform topology
    OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
    OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
    OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
    OMP: Info #250: KMP_AFFINITY: pid 24684 tid 24684 thread 0 bound to OS proc set 0
    2019-05-16 15:50:54.533429: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:54.533478: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:54.565486: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:54.565553: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:54,569 INFO 23732 140378726676224 metrics2.py:126 ocr_ms took 5999 ms
    
    3. tensorflow cpu, calamari 0.3.0:
    2019-05-16 15:57:36,407 INFO 24999 140185064949504 calamari_wrapper.py:70 Found 10 files in the dataset
    WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
    OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
    OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
    OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
    OMP: Info #156: KMP_AFFINITY: 8 available OS procs
    OMP: Info #157: KMP_AFFINITY: Uniform topology
    OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
    OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
    OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
    OMP: Info #250: KMP_AFFINITY: pid 25755 tid 25755 thread 0 bound to OS proc set 0
    OMP: Info #250: KMP_AFFINITY: pid 25738 tid 25738 thread 0 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25740 tid 25740 thread 0 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25738 tid 25760 thread 1 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25741 tid 25741 thread 0 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25738 tid 25761 thread 2 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25738 tid 25762 thread 3 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25740 tid 25766 thread 3 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25740 tid 25765 thread 2 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25741 tid 25767 thread 1 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25741 tid 25768 thread 2 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25741 tid 25770 thread 3 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25740 tid 25763 thread 1 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25739 tid 25739 thread 0 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25739 tid 25772 thread 1 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25739 tid 25778 thread 3 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25739 tid 25777 thread 2 bound to OS proc set 0-7
    OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
    OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
    OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
    OMP: Info #156: KMP_AFFINITY: 8 available OS procs
    OMP: Info #157: KMP_AFFINITY: Uniform topology
    OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
    OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
    OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
    OMP: Info #250: KMP_AFFINITY: pid 25771 tid 25771 thread 0 bound to OS proc set 0
    2019-05-16 15:57:38.043087: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.043131: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.105437: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.105465: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.158275: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.158300: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.213394: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.213423: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.234949: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.235122: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.263308: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.263334: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.310647: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.310687: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.387153: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.387188: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.428633: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.428661: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.477382: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.477543: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38,481 INFO 24999 140185064949504 calamari_wrapper.py:70 Found 13 files in the dataset
    WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
    OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
    OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
    OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
    OMP: Info #156: KMP_AFFINITY: 8 available OS procs
    OMP: Info #157: KMP_AFFINITY: Uniform topology
    OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
    OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
    OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
    OMP: Info #250: KMP_AFFINITY: pid 25808 tid 25808 thread 0 bound to OS proc set 0
    OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
    OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
    OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
    OMP: Info #156: KMP_AFFINITY: 8 available OS procs
    OMP: Info #157: KMP_AFFINITY: Uniform topology
    OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
    OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
    OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
    OMP: Info #250: KMP_AFFINITY: pid 25814 tid 25814 thread 0 bound to OS proc set 0
    2019-05-16 15:57:40.048510: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.048545: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.091460: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.091488: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.121078: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.121109: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.151877: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.152048: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.173696: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.173745: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.197971: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.198044: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.219307: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.219341: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.238584: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.238613: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.263360: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.263388: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.288734: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.288763: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.308051: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.308091: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.329646: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.329827: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.342140: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.342170: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40,343 INFO 24999 140185064949504 calamari_wrapper.py:70 Found 2 files in the dataset
    WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
    OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
    OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
    OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
    OMP: Info #156: KMP_AFFINITY: 8 available OS procs
    OMP: Info #157: KMP_AFFINITY: Uniform topology
    OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
    OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
    OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
    OMP: Info #250: KMP_AFFINITY: pid 25839 tid 25839 thread 0 bound to OS proc set 0
    OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
    OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
    OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
    OMP: Info #156: KMP_AFFINITY: 8 available OS procs
    OMP: Info #157: KMP_AFFINITY: Uniform topology
    OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
    OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
    OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
    OMP: Info #250: KMP_AFFINITY: pid 25847 tid 25847 thread 0 bound to OS proc set 0
    2019-05-16 15:57:42.054612: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:42.054845: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:42.100764: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:42.100795: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:42,104 INFO 24999 140185064949504 metrics2.py:126 ocr_ms took 5698 ms
    
    1. tensorflow-gpu, calamari 0.2.5: no warnings or errors
    2019-05-16 16:05:10,802 INFO 30495 139657482069760 metrics2.py:126 ocr_ms took 1440 ms
    
    1. tensorflow cpu, calamari 0.2.5:
    2019-05-16 16:00:12,076 INFO 26062 140471660128000 calamari_wrapper.py:70 Found 10 files in the dataset
    OMP: Info #250: KMP_AFFINITY: pid 26569 tid 26569 thread 0 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 26569 tid 26590 thread 1 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 26569 tid 26591 thread 2 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 26569 tid 26592 thread 3 bound to OS proc set 0-7
    2019-05-16 16:00:12.699564: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.699610: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.739422: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.739451: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.774491: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.774525: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.808151: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.808183: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.830500: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.830530: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.863712: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.863750: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.939775: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.939804: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.980605: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.980636: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.015485: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.015515: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.059779: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.059824: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13,064 INFO 26062 140471660128000 calamari_wrapper.py:70 Found 13 files in the dataset
    2019-05-16 16:00:13.477228: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.477268: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.506554: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.506587: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.521381: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.521414: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.534570: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.534615: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.548266: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.548300: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.572625: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.572665: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.595008: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.595052: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.616656: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.616823: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.647358: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.647398: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.667300: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.667423: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.680838: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.680964: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.696618: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.696664: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.716621: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.716766: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13,718 INFO 26062 140471660128000 calamari_wrapper.py:70 Found 2 files in the dataset
    2019-05-16 16:00:14.027123: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:14.027158: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:14.096210: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:14.096237: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:14,099 INFO 26062 140471660128000 metrics2.py:126 ocr_ms took 2023 ms
    
    opened by wosiu 14
  • Issue on CTC loss when training on new data


    Hi,

    When training Calamari on my dataset, I got this error: tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found.

    Can you help me? Thank you.
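TensorFlow typically prints "No valid path found" when CTC cannot align a ground-truth text with the network output because the line has fewer time steps than the text needs (one step per character, plus one extra step between identical neighbouring characters). The following is a minimal sketch for spotting such lines in a dataset before training. The names `min_ctc_frames` and `find_infeasible` and the `downsampling` factor of 4 are assumptions made for illustration; the real factor depends on the pooling layers of the network being trained.

```python
def min_ctc_frames(labels):
    """Smallest number of time steps for which a CTC alignment exists."""
    # Two identical neighbouring labels need a blank between them.
    repeats = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    return len(labels) + repeats


def find_infeasible(samples, downsampling=4):
    """Return (name, frames, needed) for lines too narrow for their text.

    samples: iterable of (name, image_width_in_pixels, ground_truth_text).
    downsampling: hypothetical horizontal reduction factor of the network.
    """
    bad = []
    for name, width, text in samples:
        frames = width // downsampling
        needed = min_ctc_frames(text)
        if frames < needed:
            bad.append((name, frames, needed))
    return bad


samples = [
    ("ok.png", 400, "hello"),
    ("too_narrow.png", 20, "hello"),
]
for name, frames, needed in find_infeasible(samples):
    print(f"{name}: {frames} frames available, at least {needed} needed")
```

Lines reported by such a check are usually very narrow crops or crops paired with the wrong ground-truth file, so they are worth inspecting by hand.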

    opened by realjoenguyen 14
  • Prediction API Error


    I used the CLI to train on the SROIE2019 dataset (the original images were preprocessed into line images) with:

    calamari-train \
    --device.gpus 0 \
    --trainer.gen SplitTrain \
    --trainer.gen.validation_split_ratio=0.2  \
    --trainer.output_dir /data/model_output \
    --trainer.epochs 25 \
    --early_stopping.frequency=1 \
    --early_stopping.n_to_go=3 \
    --train.images /data/*.jpg
    

    Training went smoothly; the logs are in train.log.

    After training, I tried to load the model as mentioned here; however, I get the following error:

    >>> predictor = Predictor.from_checkpoint(params=PredictorParams(), checkpoint='/data/model_output/best.ckpt')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.8/dist-packages/calamari_ocr/ocr/predict/predictor.py", line 31, in from_checkpoint
        keras.models.load_model(
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/saving/save.py", line 206, in load_model
        return hdf5_format.load_model_from_hdf5(filepath, custom_objects,
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/saving/hdf5_format.py", line 182, in load_model_from_hdf5
        model_config = json_utils.decode(model_config.decode('utf-8'))
    AttributeError: 'str' object has no attribute 'decode'
    

    I tried loading the pretrained model from antiqua_historical, and again I got the same error:

    >>> predictor = Predictor.from_checkpoint(params=PredictorParams(), checkpoint='/data/model_output/antiqua_historical/0.ckpt')
    /usr/local/lib/python3.8/dist-packages/paiargparse/dataclass_json_overrides.py:78: RuntimeWarning: `NoneType` object value of non-optional type tfaip_commit_hash detected when decoding CalamariScenarioParams.
      warnings.warn(f"`NoneType` object {warning}.", RuntimeWarning)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.8/dist-packages/calamari_ocr/ocr/predict/predictor.py", line 26, in from_checkpoint
        ckpt = SavedCalamariModel(checkpoint, auto_update=auto_update_checkpoints)
      File "/usr/local/lib/python3.8/dist-packages/calamari_ocr/ocr/savedmodel/saved_model.py", line 31, in __init__
        self.update_checkpoint()
      File "/usr/local/lib/python3.8/dist-packages/calamari_ocr/ocr/savedmodel/saved_model.py", line 56, in update_checkpoint
        self._single_upgrade()
      File "/usr/local/lib/python3.8/dist-packages/calamari_ocr/ocr/savedmodel/saved_model.py", line 88, in _single_upgrade
        update_model(self.dict, self.ckpt_path)
      File "/usr/local/lib/python3.8/dist-packages/calamari_ocr/ocr/savedmodel/migrations/version3_4to5.py", line 22, in update_model
        pred_model.load_weights(path + ".h5")
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training.py", line 2234, in load_weights
        hdf5_format.load_weights_from_hdf5_group(f, self.layers)
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/saving/hdf5_format.py", line 662, in load_weights_from_hdf5_group
        original_keras_version = f.attrs['keras_version'].decode('utf8')
    AttributeError: 'str' object has no attribute 'decode'
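
Both tracebacks end in the same pattern: Keras calls `.decode()` on an HDF5 attribute that is already a `str`. This is commonly reported when h5py 3.x, which returns string attributes as `str`, is installed next to a TensorFlow release that still expects `bytes`. Whether that applies here is an assumption, since the h5py version is not shown in the report. The sketch below only illustrates the type mismatch; `as_text` is a hypothetical helper and not part of Calamari, Keras, or h5py.

```python
def as_text(value, encoding="utf-8"):
    """Return value as str, decoding only when it is actually bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Older h5py releases hand back bytes, newer ones hand back str.
print(as_text(b"2.4.0"))
print(as_text("2.4.0"))

# Calling .decode() unconditionally fails on str, as in the traceback.
try:
    "2.4.0".decode("utf8")
except AttributeError as exc:
    print(exc)
```

If the assumption holds, the workaround usually suggested is to install an h5py 2.x release in the same environment.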
    
    
    
    opened by Mageswaran1989 12
  • Prediction step using very deep neural networks feature of calamari


    Hi, I installed calamari-0.2.4 and tried to test it on this simple example (https://user-images.githubusercontent.com/33478216/46499779-a909b480-c829-11e8-87f2-d4a34d84ab69.png) with:

    calamari-predict --checkpoint calamari_models/default/ModernEnglish.ckpt --files data.png

    It returns this error:

    Found 1 files in the dataset
    Traceback (most recent call last):
      File "/home/pc/my_calamari_env/bin/calamari-predict", line 11, in <module>
        load_entry_point('calamari-ocr==0.2.4', 'console_scripts', 'calamari-predict')()
      File "/home/pc/my_calamari_env/lib/python3.5/site-packages/calamari_ocr-0.2.4-py3.5.egg/calamari_ocr/scripts/predict.py", line 151, in main
        run(args)
      File "/home/pc/my_calamari_env/lib/python3.5/site-packages/calamari_ocr-0.2.4-py3.5.egg/calamari_ocr/scripts/predict.py", line 61, in run
        predictor = MultiPredictor(checkpoints=args.checkpoint, batch_size=args.batch_size, processes=args.processes)
      File "/home/pc/my_calamari_env/lib/python3.5/site-packages/calamari_ocr-0.2.4-py3.5.egg/calamari_ocr/ocr/predictor.py", line 202, in __init__
        self.predictors = [Predictor(cp, batch_size=batch_size, processes=processes) for cp in checkpoints]
      File "/home/pc/my_calamari_env/lib/python3.5/site-packages/calamari_ocr-0.2.4-py3.5.egg/calamari_ocr/ocr/predictor.py", line 202, in <listcomp>
        self.predictors = [Predictor(cp, batch_size=batch_size, processes=processes) for cp in checkpoints]
      File "/home/pc/my_calamari_env/lib/python3.5/site-packages/calamari_ocr-0.2.4-py3.5.egg/calamari_ocr/ocr/predictor.py", line 100, in __init__
        ckpt = Checkpoint(checkpoint, auto_update=self.auto_update_checkpoints)
      File "/home/pc/my_calamari_env/lib/python3.5/site-packages/calamari_ocr-0.2.4-py3.5.egg/calamari_ocr/ocr/checkpoint.py", line 20, in __init__
        self.json = json.load(f)
      File "/usr/lib/python3.5/json/__init__.py", line 268, in load
        parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
      File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
        return _default_decoder.decode(s)
      File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/usr/lib/python3.5/json/decoder.py", line 357, in raw_decode
        raise JSONDecodeError("Expecting value", s, err.value) from None
    json.decoder.JSONDecodeError: Expecting value: line 7 column 1 (char 6)

    Thanks for your help :)
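The traceback shows `json.load` failing while the checkpoint is opened, which suggests that the JSON file read for the checkpoint does not contain valid JSON (for example, because the download produced something other than the raw model file; that cause is an assumption). A minimal sketch for checking such a file before calling calamari-predict follows. `check_checkpoint_json` is a hypothetical helper, and which exact path Calamari reads for a given `--checkpoint` argument is not confirmed here.

```python
import json
import os
import tempfile


def check_checkpoint_json(path):
    """Return (ok, message) describing whether path holds valid JSON."""
    try:
        with open(path, encoding="utf-8") as f:
            json.load(f)
    except json.JSONDecodeError as exc:
        return False, f"not valid JSON (line {exc.lineno}, column {exc.colno}): {exc.msg}"
    except OSError as exc:
        return False, f"cannot read file: {exc}"
    return True, "valid JSON"


# Demonstration on two throwaway files instead of a real checkpoint.
tmp_dir = tempfile.mkdtemp()
good_path = os.path.join(tmp_dir, "good.json")
bad_path = os.path.join(tmp_dir, "bad.json")
with open(good_path, "w", encoding="utf-8") as f:
    f.write('{"version": 1}')
with open(bad_path, "w", encoding="utf-8") as f:
    f.write("\n\n\n\n\n\n<!DOCTYPE html>")

print(check_checkpoint_json(good_path))
print(check_checkpoint_json(bad_path))
```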

    opened by Tailor2019 12
  • Applying data processor Text Normalizer


    @ChWick I have updated calamari-ocr to version 2.0.0, and now training takes ages to start. Previously, calamari computed the codec and then started. Now it takes 2+ days to apply text normalization. I can't afford to wait 3 days for training to start. Can someone help?

    opened by abhikatoldtrafford 11
  • Error: Process finished with code 1 in cross-fold


    It worked with the new code, but after fold 0 finished and found no better model than the one with 99.056858% accuracy, I get an error again:

    FOLD 0 | Storing checkpoint to 'I:\BIQE\CALAMARI\projects\voetius\TRAINING\crosstrainen\fold_0\model_00019470.ckpt'
    FOLD 0 | Checking early stopping model
    Prediction: 100%|██████████████████████████████████████████████████████████████████████████████████████| 1948/1948 [04:06<00:00, 6.38it/s]
    FOLD 0 | No better model found. Currently accuracy of 99.056858% at iter 11682 (remaining nbest = 0)
    FOLD 0 | Early stopping now.
    FOLD 0 | Total time 11274.687343358994s for 19469 iterations.
    multiprocessing.pool.RemoteTraceback:
    """
    Traceback (most recent call last):
      File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\multiprocessing\pool.py", line 119, in worker
        result = (True, func(*args, **kwds))
      File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\multiprocessing\pool.py", line 44, in mapstar
        return list(map(*args))
      File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\site-packages\calamari_ocr\ocr\cross_fold_trainer.py", line 27, in train_individual_model
        ], args.get("run", None), {"threads": args.get('num_threads', -1)}), verbose=args.get("verbose", False)):
      File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\site-packages\calamari_ocr\utils\multiprocessing.py", line 87, in run
        raise Exception("Error: Process finished with code {}".format(process.returncode))
    Exception: Error: Process finished with code 1
    """

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "C:\Users\drsjh\Anaconda3\envs\calamaridev\Scripts\calamari-cross-fold-train.exe\__main__.py", line 9, in <module>
      File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\site-packages\calamari_ocr\scripts\cross_fold_train.py", line 80, in main
        temporary_dir=args.temporary_dir, keep_temporary_files=args.keep_temporary_files,
      File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\site-packages\calamari_ocr\ocr\cross_fold_trainer.py", line 151, in run
        pool.map_async(train_individual_model, run_args).get()
      File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\multiprocessing\pool.py", line 644, in get
        raise self._value
    Exception: Error: Process finished with code 1

    opened by cornerman57 11
  • Use pre-trained Calamari models


    Thanks for the great work!

    I installed Calamari and calamari-models on a new AWS P2 instance and tried to test them on a simple example with:

    calamari-predict --checkpoint calamari_models/default/ModernEnglish.ckpt --files data.png
    

    The detected text is way off. I guess it is related to the loading of the model.

    I got these warnings:

    Found 1 files in the dataset
    2018-08-05 17:12:16.976735: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    Attempting a workaround: New graph and load weights
    Using CUDNN compatible LSTM backend on CPU
    WARNING:tensorflow:From /home/ubuntu/anaconda3/envs/calamari/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py:417: calling reverse_sequence (from tensorflow.python.ops.array_ops) with seq_dim is deprecated and will be removed in a future version.
    Instructions for updating:
    seq_dim is deprecated, use seq_axis instead
    WARNING:tensorflow:From /home/ubuntu/anaconda3/envs/calamari/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py:432: calling reverse_sequence (from tensorflow.python.ops.array_ops) with batch_dim is deprecated and will be removed in a future version.
    Instructions for updating:
    batch_dim is deprecated, use batch_axis instead
    2018-08-05 17:12:20.637472: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key Minimum/ExponentialMovingAverage not found in checkpoint
    Attempting workaround: only loading trainable variables
    Loading Dataset: 100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 109.32it/s]
    Data Preprocessing: 100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 104.47it/s]
    Prediction: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  7.74it/s]
    Prediction of 1 models took 0.14934062957763672s
    

    Is it due to the TensorFlow version that the ExponentialMovingAverage variables are not loaded? Currently, installing calamari installs TensorFlow 1.9. Which TF version do you use in development?

    Thanks!

    opened by zhangxiangnick 10
  • TypeError: metaclass conflict (problem with tfaip?)


    Hello!

    I'm not sure if I'm missing something, but as there has already been a problem with tfaip (#205), I wanted to point out an issue I'm struggling with when installing the latest version of calamari.

    Here is my output. Any hints are welcome; the hack provided in the above-mentioned issue does not work.

     user@user:~/virtualenvs/calamari_2-1-1/calamari(master)$ calamari-train --version
    Traceback (most recent call last):
      File "/home/user/virtualenvs/calamari_2-1-1/bin/calamari-train", line 33, in <module>
        sys.exit(load_entry_point('calamari-ocr==2.1.1', 'console_scripts', 'calamari-train')())
      File "/home/user/virtualenvs/calamari_2-1-1/bin/calamari-train", line 25, in importlib_load_entry_point
        return next(matches).load()
      File "/home/user/virtualenvs/calamari_2-1-1/lib/python3.6/site-packages/importlib_metadata-4.0.1-py3.6.egg/importlib_metadata/__init__.py", line 166, in load
        module = import_module(match.group('module'))
      File "/home/user/virtualenvs/calamari_2-1-1/lib/python3.6/importlib/__init__.py", line 126, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 994, in _gcd_import
      File "<frozen importlib._bootstrap>", line 971, in _find_and_load
      File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 678, in exec_module
      File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
      File "/home/user/virtualenvs/calamari_2-1-1/lib/python3.6/site-packages/calamari_ocr-2.1.1-py3.6.egg/calamari_ocr/scripts/train.py", line 5, in <module>
        from tfaip.util.logging import logger, WriteToLogFile
      File "/home/user/virtualenvs/calamari_2-1-1/lib/python3.6/site-packages/tfaip-1.1.1-py3.6.egg/tfaip/__init__.py", line 37, in <module>
        from tfaip.scenario.scenariobaseparams import ScenarioBaseParams
      File "/home/user/virtualenvs/calamari_2-1-1/lib/python3.6/site-packages/tfaip-1.1.1-py3.6.egg/tfaip/scenario/scenariobaseparams.py", line 48, in <module>
        class ScenarioBaseParams(Generic[TDataParams, TModelParams], ABC, metaclass=ScenarioBaseParamsMeta):
    TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
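
Python raises this error when a class combines a base and an explicit `metaclass=` whose metaclasses are unrelated. One plausible explanation here, stated as an assumption, is the interpreter version: the paths show Python 3.6, where `typing.Generic` still carried its own metaclass, while from Python 3.7 on it does not. The sketch below reproduces the error with hypothetical classes (`MetaA`, `MetaB`, `BaseA`) and shows the generic way to resolve it; it is not tfaip code.

```python
class MetaA(type):
    """Metaclass used by the base class."""


class MetaB(type):
    """Unrelated metaclass requested by the derived class."""


class BaseA(metaclass=MetaA):
    pass


def try_conflicting_class():
    """Return the error text raised when the metaclasses are unrelated."""
    try:
        class Broken(BaseA, metaclass=MetaB):
            pass
    except TypeError as exc:
        return str(exc)
    return None


# A metaclass deriving from both is a subclass of every base's metaclass,
# so Python can use it for the derived class.
class CombinedMeta(MetaB, MetaA):
    pass


class Works(BaseA, metaclass=CombinedMeta):
    pass


print(try_conflicting_class())
print(type(Works).__name__)
```

If the assumption about the interpreter is right, running the same installation under Python 3.7 or newer would be the first thing to try.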
    
    opened by alexander-winkler 9
  • training: shuffle data between epochs


    First of all, thank you for this fantastic framework! I've been using Tesseract for more than a year, but this one is way better for single-line processing :)

    Proposal: From the training logs, it seems that the input images are not shuffled at all. It would be nice if they were shuffled at least once at the very beginning, and ideally also after each epoch, so that different batches are created.
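
The proposed behaviour can be sketched as a small generator that reshuffles the sample order at the start of every epoch, so that batches differ from epoch to epoch. This is only an illustration of the idea; `epoch_batches` and its parameters are hypothetical and do not describe how Calamari loads its data.

```python
import random


def epoch_batches(samples, batch_size, epochs, seed=0):
    """Yield (epoch, batch) pairs, reshuffling the samples before each epoch."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    order = list(samples)
    for epoch in range(epochs):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            yield epoch, order[start:start + batch_size]


for epoch, batch in epoch_batches(["a.png", "b.png", "c.png", "d.png", "e.png"],
                                  batch_size=2, epochs=2):
    print(epoch, batch)
```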

    opened by wosiu 9
  • Add CodeQL workflow for GitHub code scanning


    Hi Calamari-OCR/calamari!

    This is a one-off automatically generated pull request from LGTM.com :robot:. You might have heard that we’ve integrated LGTM’s underlying CodeQL analysis engine natively into GitHub. The result is GitHub code scanning!

    With LGTM fully integrated into code scanning, we are focused on improving CodeQL within the native GitHub code scanning experience. In order to take advantage of current and future improvements to our analysis capabilities, we suggest you enable code scanning on your repository. Please take a look at our blog post for more information.

    This pull request enables code scanning by adding an auto-generated codeql.yml workflow file for GitHub Actions to your repository — take a look! We tested it before opening this pull request, so all should be working :heavy_check_mark:. In fact, you might already have seen some alerts appear on this pull request!

    Where needed and if possible, we’ve adjusted the configuration to the needs of your particular repository. But of course, you should feel free to tweak it further! Check this page for detailed documentation.

    Questions? Check out the FAQ below!

    FAQ

    How often will the code scanning analysis run?

    By default, code scanning will trigger a scan with the CodeQL engine on the following events:

    • On every pull request — to flag up potential security problems for you to investigate before merging a PR.
    • On every push to your default branch and other protected branches — this keeps the analysis results on your repository’s Security tab up to date.
    • Once a week at a fixed time — to make sure you benefit from the latest updated security analysis even when no code was committed or PRs were opened.

    What will this cost?

    Nothing! The CodeQL engine will run inside GitHub Actions, making use of your unlimited free compute minutes for public repositories.

    What types of problems does CodeQL find?

    The CodeQL engine that powers GitHub code scanning is the exact same engine that powers LGTM.com. The exact set of rules has been tweaked slightly, but you should see almost exactly the same types of alerts as you were used to on LGTM.com: we’ve enabled the security-and-quality query suite for you.

    How do I upgrade my CodeQL engine?

    No need! New versions of the CodeQL analysis are constantly deployed on GitHub.com; your repository will automatically benefit from the most recently released version.

    The analysis doesn’t seem to be working

    If you get an error in GitHub Actions that indicates that CodeQL wasn’t able to analyze your code, please follow the instructions here to debug the analysis.

    How do I disable LGTM.com?

    If you have LGTM’s automatic pull request analysis enabled, then you can follow these steps to disable the LGTM pull request analysis. You don’t actually need to remove your repository from LGTM.com; it will automatically be removed in the next few months as part of the deprecation of LGTM.com (more info here).

    Which source code hosting platforms does code scanning support?

    GitHub code scanning is deeply integrated within GitHub itself. If you’d like to scan source code that is hosted elsewhere, we suggest that you create a mirror of that code on GitHub.

    How do I know this PR is legitimate?

    This PR is filed by the official LGTM.com GitHub App, in line with the deprecation timeline that was announced on the official GitHub Blog. The proposed GitHub Action workflow uses the official open source GitHub CodeQL Action. If you have any other questions or concerns, please join the discussion here in the official GitHub community!

    I have another question / how do I get in touch?

    Please join the discussion here to ask further questions and send us suggestions!

    opened by lgtm-com[bot] 0
  • calamari-ocr 2.2.2 on ubuntu 22.04 partial success, difficulty with GPU software

    Hi, I installed calamari-ocr 2.2.2 with TensorFlow 2.6 and Python 3.9 in a venv on Ubuntu 22.04. I had to remove Keras 2.11, which came with TensorFlow 2.6, and replace it with Keras 2.6.0 to get rid of an error. It works great on CPU. So far so good.

    With TensorFlow 2.6, it seems I am forced into a narrow range of CUDA 11.2 and NVIDIA 360 drivers. I have not been able to get either installed successfully. Does anyone have a success story with an NVIDIA GPU, Ubuntu 22.04, and Calamari 2.2.2? Thanks!

    opened by ocrwork 0
  • calamari-eval: unknown arguments

    I am on Calamari 2.2.2, and when I freely combine the arguments listed by --help

    calamari-eval --checkpoint hsbfraktur.cala/best.ckpt.json --gt.preload false --n_worst_lines 10   --gt.texts /dev/shm/hsbfraktur.val/*.gt.txt --evaluator.progress_bar false
    

    …I end up with the following cryptic error message…

                 tfaip.util.logging: Uncaught exception
    Traceback (most recent call last):
      File "/home/h1/rosa992c/my-kernel/powerai-kernel2/bin/calamari-eval", line 8, in <module>
        sys.exit(run())
      File "/home/h1/rosa992c/my-kernel/powerai-kernel2/lib/python3.7/site-packages/calamari_ocr/scripts/eval.py", line 200, in run
        main(parse_args())
      File "/home/h1/rosa992c/my-kernel/powerai-kernel2/lib/python3.7/site-packages/calamari_ocr/scripts/eval.py", line 206, in parse_args
        return parser.parse_args(args=args).root
      File "/home/h1/rosa992c/my-kernel/powerai-kernel2/lib/python3.7/site-packages/paiargparse/main_parser.py", line 93, in parse_args
        raise UnknownArgumentError(f"Unknown Arguments {' '.join(argv)}. Possible alternatives:{''.join(help_str)}")
    paiargparse.dataclass_parser.UnknownArgumentError: Unknown Arguments  . Possible alternatives:
    
    opened by bertsky 6
  • featreq: when warmstart-training, init weights of new chars from existing ones

    I have the following feature request: Often one needs to finetune a model to add diacritics. Luckily, we can finetune with --warmstart ... --codec.keep_loaded False. In such cases the actual witnesses of the diacritics are usually still sparse in the GT. So it would likely be helpful if the weights of the additional characters / codepoints could be initialized from those of characters that are similar looking or similar in function. Perhaps as an option --codec.init_new_from_old '["à": "a", "ś": "s" ...]' ...
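
To make the idea concrete, here is a rough sketch of such an initialization in plain Python. It is an assumption-laden illustration, not Calamari code: the function `init_new_rows`, the list-of-rows weight layout, and the `init_from` mapping are all hypothetical, and the option suggested above does not exist as far as this issue states. Characters already in the old codec keep their weights, new characters copy the weights of their mapped source character, and anything else gets small random values.

```python
import random


def init_new_rows(old_chars, old_rows, new_chars, init_from, rng=None):
    """Build one output-layer weight row per character of the new codec.

    old_chars/old_rows: characters of the loaded codec and their weight rows.
    init_from: hypothetical mapping from a new character to a similar old one.
    """
    rng = rng or random.Random(0)
    row_of = dict(zip(old_chars, old_rows))
    width = len(old_rows[0])
    new_rows = []
    for ch in new_chars:
        # Prefer the character's own old row, then the row of its mapped source.
        source = ch if ch in row_of else init_from.get(ch)
        if source in row_of:
            new_rows.append(list(row_of[source]))  # copy, do not alias
        else:
            # No usable source: fall back to a small random initialization.
            new_rows.append([rng.uniform(-0.01, 0.01) for _ in range(width)])
    return new_rows


if __name__ == "__main__":
    rows = init_new_rows(
        old_chars=["a", "s"],
        old_rows=[[1.0, 2.0], [3.0, 4.0]],
        new_chars=["a", "s", "à", "ś", "ł"],
        init_from={"à": "a", "ś": "s"},
    )
    for row in rows:
        print(row)
```

Copying the rows gives the new characters a starting point close to their look-alikes, which fine-tuning on the sparse ground truth can then pull apart.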

    enhancement 
    opened by bertsky 2
  • HDF5 dataset format: how to convert

    I presume training on HDF5 will be more efficient than any of the other formats. And at least against the line GT file pairs, filesystem performance might be much better, too.

    So my question is: how do I convert existing datasets into HDF5 format?

    opened by bertsky 4
Releases(v2.2.2)
A Tensorflow model for text recognition (CNN + seq2seq with visual attention) available as a Python package and compatible with Google Cloud ML Engine.

Attention-based OCR Visual attention-based OCR model for image recognition with additional tools for creating TFRecords datasets and exporting the tra

Ed Medvedev 933 Dec 29, 2022
OCR engine for all the languages

Description kraken is a turn-key OCR system optimized for historical and non-Latin script material. kraken's main features are: Fully trainable layout

null 431 Jan 4, 2023
Pixel art search engine for opengameart

Pixel Art Reverse Image Search for OpenGameArt What does the final search look like? The final search with an example can be found here. It looks like

Eivind Magnus Hvidevold 92 Nov 6, 2022
An image OCR tool with a GUI, using the Tesseract-OCR engine with the pytesseract package.

OCR-Tool It is an image OCR tool made in Python using the Tesseract-OCR engine with the pytesseract package and has a GUI. This is my second ever pytho

Khant Htet Aung 4 Jul 11, 2022
Convert PDF/Image to TXT using EasyOcr - the best OCR engine available!

PDFImage2TXT - DOWNLOAD INSTALLER HERE What can you do with it? Convert scanned PDFs to TXT. Convert scanned Documents to TXT. No coding required!! In

Hans Alemão 2 Feb 22, 2022
nofacedb/faceprocessor is a face recognition engine for NoFaceDB program complex.

faceprocessor nofacedb/faceprocessor is a face recognition engine for NoFaceDB program complex. Tech faceprocessor uses a number of open source projec

NoFaceDB 3 Sep 6, 2021
This Python script converts a PDF to an image, then uses Tesseract as the OCR engine to convert the image to text

Script_Convertir_PDF_IMG_TXT This Python script converts a PDF to an image, then uses Tesseract as the OCR engine to convert the image to text. p

alebogado 1 Jan 27, 2022
The world's simplest facial recognition api for Python and the command line

Face Recognition You can also read a translated version of this file in Chinese 简体中文版 or in Korean 한국어 or in Japanese 日本語. Recognize and manipulate fa

Adam Geitgey 47k Jan 7, 2023
Deskew is a command line tool for deskewing scanned text documents. It uses Hough transform to detect "text lines" in the image. As an output, you get an image rotated so that the lines are horizontal.

Deskew by Marek Mauder https://galfar.vevb.net/deskew https://github.com/galfar/deskew v1.30 2019-06-07 Overview Deskew is a command line tool for des

Marek Mauder 127 Dec 3, 2022
Detect handwritten words in a text-line (classic image processing method).

Word segmentation Implementation of scale space technique for word segmentation as proposed by R. Manmatha and N. Srimal. Even though the paper is fro

Harald Scheidl 190 Jan 3, 2023
Uses a Convolutional Recurrent Neural Network to recognize handwritten line text images without pre-segmentation into words or characters. Trained with the CTC loss function.

Handwritten Line Text Recognition using Deep Learning with Tensorflow Description Use Convolutional Recurrent Neural Network to recognize the Handwrit

sushant097 224 Jan 7, 2023
Create single line SVG illustrations from your pictures

Create single line SVG illustrations from your pictures

Javier Bórquez 686 Dec 26, 2022
Create single line SVG illustrations from your pictures

Create single line SVG illustrations from your pictures

Javier Bórquez 326 May 23, 2021
Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. This Neural Network (NN) model recognizes the text contained in the images of segmented words.

Handwritten-Text-Recognition Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. T

null 27 Jan 8, 2023
Ackermann Line Follower Robot Simulation.

Ackermann Line Follower Robot This is a simulation of a line follower robot that works with steering control based on Stanley: The Robot That Won the

Lucas Mazzetto 2 Apr 16, 2022
Deep learning based page layout analysis

Deep Learning Based Page Layout Analyze This is a Python implementation of a page layout analysis tool. The goal of page layout analysis is to segment page

null 186 Dec 29, 2022
Handwriting Recognition System based on a deep Convolutional Recurrent Neural Network architecture

Handwriting Recognition System This repository is the Tensorflow implementation of the Handwriting Recognition System described in Handwriting Recogni

Edgard Chammas 346 Jan 7, 2023
Corner-based Region Proposal Network

Corner-based Region Proposal Network CRPN is a two-stage detection framework for multi-oriented scene text. It employs corners to estimate the possibl

xhzdeng 140 Nov 4, 2022