Provides OCR (Optical Character Recognition) services through web applications

Overview

OCR4all

Build Status

As suggested by the name, one of the main goals of OCR4all is to enable virtually any user to perform OCR independently on a wide variety of historical printings and obtain high-quality results with reasonable time expenditure. OCR4all is therefore explicitly geared towards users with no technical background. If you are one of those users (or if you simply want to use the tool and are not interested in the code), please go to the getting started project, where you will find guides and test data.

Please note that OCR4all's current main focus is a semi-automatic workflow that allows users to perform OCR even on the earliest printed books. This is a very challenging task that often requires a significant amount of manual interaction, especially when close to perfect quality is desired. Nevertheless, we are working towards increasing the robustness and degree of automation of the tool. An important cornerstone for this is the recently agreed cooperation with the OCR-D project, which focuses on mass full-text recognition of historical materials.

This repository contains the code for the main interface and server of the OCR4all project, while the repositories OCR4all/docker_image and OCR4all/docker_base_image cover the creation of a preconfigured Docker image.

To install the complete project as a Docker image, please follow the instructions here.

Mailing List

OCR4all is under active development, so frequent releases containing bug fixes and additional functionality can be expected. To stay up to date, we highly recommend subscribing to our mailing list, where we announce all notable enhancements.

Built With

Included Projects

  • OCRopus - Collection of document analysis programs
  • calamari - OCR Engine based on OCRopy and Kraken
  • LAREX - Layout analysis on early printed books

Formerly included / inspired by

  • Kraken - OCR engine for all the languages
  • nashi - Some bits of javascript to transcribe scanned pages using PageXML

Contact, Authors, and Helping Hands

Developers

  • Dr. Herbert Baier Saip (lead)
  • Maximilian Nöth (OCR4all, LAREX, and Calamari)
  • Christoph Wick (Calamari)
  • Andreas Büttner (Calamari and nashi)
  • Kevin Chadbourne (OCR4all and LAREX)
  • Yannik Herbst (OCR4all, LAREX, and distribution via VirtualBox)
  • Björn Eyselein (Artifactory and distribution via Docker)

Miscellaneous

  • Raphaëlle Jung (guides and artwork)
  • Dr. Uwe Springmann (ideas and feedback)
  • Prof. Dr. Frank Puppe (funding)

Former Project Members

  • Dennis Christ (OCR4all)
  • Alexander Hartelt (OCR4all)
  • Nico Balbach (OCR4all and LAREX)
  • Christine Grundig (ideas and feedback)
  • ...

Funding

Comments
  • Recognition claims it is finished but does nothing

    I am trying to use OCR4all 0.5.0 via Docker on two different workstations.

    On one workstation everything is working fine, on the other the Recognition step finishes after several seconds but generates no results. This is reproducible with the example projects from the getting started repository using the default settings.

    • The console output tab shows:
    Found 109 files in the dataset
    Checkpoint version 2 is up-to-date.
    
    • The console error tab stays empty.
    • The browser console log is unsuspicious.
    • Tomcat's catalina.log is unsuspicious.

    Any ideas why this is happening or hints on other log files with more information?


One theory is that the second workstation does not have enough CPUs (only 2 available to Docker) to run Calamari. RAM (12 GB available to Docker) should not be an issue.
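For anyone debugging a similar setup, a quick way to confirm how many CPUs the container actually sees is a short Python check run inside it (a hypothetical diagnostic, not part of OCR4all; note that `os.cpu_count()` reports the kernel's CPU count, while `sched_getaffinity` reflects a limit set via `docker run --cpuset-cpus`):

```python
import os

# CPUs reported by the kernel (can be the host count, even inside a container)
print("cpu_count:", os.cpu_count())

# CPUs this process may actually be scheduled on (Linux only);
# a Docker --cpuset-cpus limit shows up here
print("usable CPUs:", len(os.sched_getaffinity(0)))
```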

    opened by b2m 16
  • Missing Dockerfile

    The README points to a Dockerfile in the master branch to build and launch the project, but the file has been removed in 134169af50a956d43d4c6aba91152fdb2a718c84. What is the status, and how do we use the project?

    opened by raphink 9
  • Segment/region detection comparison

If ocrd-pc-segmenter uses the same segmentation algorithm as OCR4all, this comparison might be interesting to you: https://digi.ub.uni-heidelberg.de/diglitData/v/testset-ls-v3.pdf

    opened by jbarth-ubhd 8
  • Line Segmentation - Console Error Message

5 of 70 files will not go through line segmentation. The following error message appears, and unfortunately I cannot tell from it what I would need to change to make it work:

    • All files were segmented manually with LAREX
    Traceback (most recent call last):
      File "/usr/local/bin/pagelineseg", line 33, in <module>
        sys.exit(load_entry_point('ocr4all-helpers==0.2.2', 'console_scripts', 'pagelineseg')())
      File "/usr/local/lib/python3.6/dist-packages/ocr4all_helpers-0.2.2-py3.6.egg/ocr4all_helpers/pagelineseg.py", line 634, in cli
        pool.map(parallel, dataset)
      File "/usr/lib/python3.6/multiprocessing/pool.py", line 266, in map
        return self._map_async(func, iterable, mapstar, chunksize).get()
      File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
        raise self._value
      File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
        result = (True, func(*args, **kwds))
      File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
        return list(map(*args))
      File "/usr/local/lib/python3.6/dist-packages/ocr4all_helpers-0.2.2-py3.6.egg/ocr4all_helpers/pagelineseg.py", line 625, in parallel
        remove_images=args.remove_images)
      File "/usr/local/lib/python3.6/dist-packages/ocr4all_helpers-0.2.2-py3.6.egg/ocr4all_helpers/pagelineseg.py", line 304, in pagexmllineseg
        root = etree.parse(xmlfile).getroot()
      File "src/lxml/etree.pyx", line 3521, in lxml.etree.parse
      File "src/lxml/parser.pxi", line 1859, in lxml.etree._parseDocument
      File "src/lxml/parser.pxi", line 1885, in lxml.etree._parseDocumentFromURL
      File "src/lxml/parser.pxi", line 1789, in lxml.etree._parseDocFromFile
      File "src/lxml/parser.pxi", line 1177, in lxml.etree._BaseParser._parseDocFromFile
      File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
      File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
      File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
      File "/var/ocr4all/data/testset/processing/0064.xml", line 1
    lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 55
    

    @jbarth-ubhd
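The XMLSyntaxError above means one of the PageXML files under processing/ is not well-formed XML (often a truncated or corrupted file). A hypothetical helper to list the files that fail to parse before re-running line segmentation; it uses Python's standard-library parser rather than lxml, but flags the same malformed files:

```python
import xml.etree.ElementTree as ET
from pathlib import Path

def find_broken_pagexml(folder):
    """Return (filename, error) pairs for XML files that fail to parse."""
    broken = []
    for xml_path in sorted(Path(folder).glob("*.xml")):
        try:
            ET.parse(xml_path)
        except ET.ParseError as err:
            broken.append((xml_path.name, str(err)))
    return broken

if __name__ == "__main__":
    # e.g. the directory from the traceback above
    for name, err in find_broken_pagexml("/var/ocr4all/data/testset/processing"):
        print(name, "->", err)
```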

    opened by lsubhd 7
  • TypeError: ["'", 'e', ':', '?'] has type list, but expected one of: bytes, unicode

    Hello! I'm using calamari from within the new OCR4all tool under Linux Mint 19.1 Tessa. The OCR process stops with the following error:

    Traceback (most recent call last):
      File "/usr/local/bin/calamari-predict", line 11, in <module>
        load_entry_point('calamari-ocr==0.3.1', 'console_scripts', 'calamari-predict')()
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-0.3.1-py3.6.egg/calamari_ocr/scripts/predict.py", line 151, in main
        run(args)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-0.3.1-py3.6.egg/calamari_ocr/scripts/predict.py", line 74, in run
        prediction = voter.vote_prediction_result(result)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-0.3.1-py3.6.egg/calamari_ocr/ocr/voting/voter.py", line 19, in vote_prediction_result
        return self.vote_prediction_result_tuple(tuple(prediction_results))
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-0.3.1-py3.6.egg/calamari_ocr/ocr/voting/voter.py", line 45, in vote_prediction_result_tuple
        p.sentence = [c for c, _ in sv.process_text(sentences)]
    TypeError: ["'", 'e', ':', '?'] has type list, but expected one of: bytes, unicode
    

    What can I do in order to avoid the TypeError?

    Many thanks!

    opened by alexander-winkler 7
  • Error during Line-Segmentation || TypeError: '<' not supported between instances of 'Image' and 'float'

    Dear all,

    once again I have come up with a problem during line segmentation:

    File "/usr/local/lib/python3.6/dist-packages/ocr4all_helpers-0.2.2-py3.6.egg/ocr4all_helpers/pagelineseg.py", line 634, in cli
      pool.map(parallel, dataset)
    File "/usr/lib/python3.6/multiprocessing/pool.py", line 266, in map
      return self._map_async(func, iterable, mapstar, chunksize).get()
    File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
      raise self._value
    File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
      result = (True, func(*args, **kwds))
    File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
      return list(map(*args))
    File "/usr/local/lib/python3.6/dist-packages/ocr4all_helpers-0.2.2-py3.6.egg/ocr4all_helpers/pagelineseg.py", line 625, in parallel
      remove_images=args.remove_images)
    File "/usr/local/lib/python3.6/dist-packages/ocr4all_helpers-0.2.2-py3.6.egg/ocr4all_helpers/pagelineseg.py", line 369, in pagexmllineseg
      cropped = Image.fromarray(nlbin.adaptive_binarize(np.array(cropped)).astype(np.uint8))
    File "/usr/local/lib/python3.6/dist-packages/ocr4all_helpers-0.2.2-py3.6.egg/ocr4all_helpers/lib/nlbin.py", line 47, in adaptive_binarize
      extreme = (np.sum(image<0.05)+np.sum(image>0.95))*1.0/np.prod(image.shape)
    TypeError: '<' not supported between instances of 'Image' and 'float'
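The error comes from comparing a PIL Image object directly to a float inside adaptive_binarize; converting to a NumPy array first makes the comparison element-wise. A minimal sketch of the idea (hypothetical function name, not the actual patch):

```python
import numpy as np

def fraction_extreme(image_like):
    """Fraction of near-black/near-white pixels, mirroring the nlbin check.

    Accepts anything array-like; a PIL Image would be converted by
    np.asarray instead of being compared to a float directly.
    """
    image = np.asarray(image_like, dtype=float)
    return (np.sum(image < 0.05) + np.sum(image > 0.95)) / image.size

# A nested list stands in for image data here:
print(fraction_extreme([[0.0, 0.5], [0.96, 0.5]]))  # 2 of 4 pixels are extreme -> 0.5
```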

    Is there any further information you need?

    Regards, Leonie

    opened by lsubhd 6
  • Using calamari models in OCR4all

Hello! I have trained a calamari model using the calamari-train (v0.3.5) command. Since I'd like to use OCR4all to keep track of the project, I tried to copy the model into the OCR4all models directory:

    project/
    └── 0
        ├── 0.ckpt.data-00000-of-00001
        ├── 0.ckpt.index
        ├── 0.ckpt.json
        ├── 0.ckpt.meta
        ├── checkpoint
    

    During the recognition process I get the following error message:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/google/protobuf/json_format.py", line 547, in _ConvertFieldValuePair
        self.ConvertMessage(value, sub_message)
      File "/usr/local/lib/python3.6/dist-packages/google/protobuf/json_format.py", line 452, in ConvertMessage
        self._ConvertFieldValuePair(value, message)
      File "/usr/local/lib/python3.6/dist-packages/google/protobuf/json_format.py", line 552, in _ConvertFieldValuePair
        raise ParseError('Failed to parse {0} field: {1}'.format(name, e))
    google.protobuf.json_format.ParseError: Failed to parse network field: Failed to parse backend field: Message type "BackendParams" has no field named "shuffleBufferSize".
     Available Fields(except extensions): 
    

    Is there a way to use externally trained models in OCR4all? Thanks in advance!
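This ParseError typically means the model was trained with a newer Calamari release than the one bundled with OCR4all, so the checkpoint JSON contains fields the older schema does not know (here shuffleBufferSize). A quick, hypothetical way to inspect a checkpoint's version; the "version" key is assumed from Calamari's *.ckpt.json format:

```python
import json

def checkpoint_version(ckpt_json_path):
    """Read the version number stored in a Calamari *.ckpt.json file."""
    with open(ckpt_json_path) as f:
        params = json.load(f)
    # very old checkpoints may not carry a version field at all
    return params.get("version", 0)

# e.g. compare your external model against one shipped with OCR4all:
# checkpoint_version("project/0/0.ckpt.json")
```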

    opened by alexander-winkler 6
  • Error after training run

    The following error message was the result of a training run (based on fraktur historical + GT):

    training0_fehlermeldung_2020-12-01.txt

    Our IT department is wondering: what is OCR4all doing in /tmp/ inside Docker?

    Error message (abridged: the same TensorFlow warning repeats dozens of times, and the log interleaves two cross-fold training processes that fail with identical tracebacks; one copy of each is shown):

    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.757118). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (19.359528). Check your callbacks.
    WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (24.881613). Check your callbacks.
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 109, in save_model_to_hdf5
        save_weights_to_hdf5_group(model_weights_group, model_layers)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 636, in save_weights_to_hdf5_group
        param_dset[:] = val
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
      File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/dataset.py", line 708, in __setitem__
        self.id.write(mspace, fspace, val, mtype, dxpl=self._dxpl)
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
      File "h5py/h5d.pyx", line 222, in h5py.h5d.DatasetID.write
      File "h5py/_proxy.pyx", line 132, in h5py._proxy.dset_rw
      File "h5py/_proxy.pyx", line 93, in h5py._proxy.H5PY_H5Dwrite
    OSError: Can't write data (file write failed: time = Tue Dec 1 08:28:02 2020, filename = '/tmp/calamari3umodg4c/fold_2/model_00000453.ckpt.h5', file descriptor = 5, errno = 28, error message = 'No space left on device', buf = 0xbb35e60, total write size = 179216, bytes this sub-write = 179216, bytes actually written = 18446744073709551615, offset = 5828608)

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 753, in on_start
        yield
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 342, in fit
        total_epochs=epochs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 181, in run_one_epoch
        step += 1
      File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
        next(self.gen)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 788, in on_batch
        mode, 'end', step, batch_logs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/callbacks.py", line 239, in _call_batch_hook
        batch_hook(batch, logs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/callbacks.py", line 528, in on_train_batch_end
        self.on_batch_end(batch, logs=logs)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/backends/tensorflow_backend/callbacks/earlystopping.py", line 108, in on_batch_end
        self.last_checkpoint = self.make_checkpoint(self.checkpoint_params.output_dir, self.checkpoint_params.output_model_prefix)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/backends/tensorflow_backend/callbacks/earlystopping.py", line 85, in make_checkpoint
        self.model.save(checkpoint_path + '.h5', overwrite=True)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py", line 1008, in save
        signatures, options)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/save.py", line 112, in save_model
        model, filepath, overwrite, include_optimizer)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 120, in save_model_to_hdf5
        f.close()
      File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 443, in close
        h5i.dec_ref(id)
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
      File "h5py/h5i.pyx", line 150, in h5py.h5i.dec_ref
    RuntimeError: Problems closing file (file write failed: time = Tue Dec 1 08:28:02 2020, filename = '/tmp/calamari3umodg4c/fold_2/model_00000453.ckpt.h5', file descriptor = 5, errno = 28, error message = 'No space left on device', buf = 0xb64dbe0, total write size = 6144, bytes this sub-write = 6144, bytes actually written = 18446744073709551615, offset = 4096)

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/scripts/train.py", line 371, in <module>
        main()
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/scripts/train.py", line 367, in main
        run(args)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/scripts/train.py", line 359, in run
        progress_bar=not args.no_progress_bars
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/trainer.py", line 197, in train
        self._run_train(train_net, train_start_time, progress_bar, self.dataset, self.validation_dataset, training_callback)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/trainer.py", line 213, in _run_train
        train_net.train(train_dataset, val_dataset, checkpoint_params, self.txt_postproc, progress_bar, training_callback)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/backends/tensorflow_backend/tensorflow_model.py", line 332, in train
        v_cb, es_cb
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 819, in fit
        use_multiprocessing=use_multiprocessing)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 397, in fit
        prefix='val')
      File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
        self.gen.throw(type, value, traceback)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 757, in on_start
        self.callbacks._call_end_hook(mode)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/callbacks.py", line 262, in _call_end_hook
        self.on_train_end()
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/callbacks.py", line 379, in on_train_end
        callback.on_train_end(logs)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/backends/tensorflow_backend/callbacks/earlystopping.py", line 74, in on_train_end
        version='last')
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/backends/tensorflow_backend/callbacks/earlystopping.py", line 85, in make_checkpoint
        self.model.save(checkpoint_path + '.h5', overwrite=True)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py", line 1008, in save
        signatures, options)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/save.py", line 112, in save_model
        model, filepath, overwrite, include_optimizer)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 92, in save_model_to_hdf5
        f = h5py.File(filepath, mode='w')
      File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 408, in __init__
        swmr=swmr)
      File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 179, in make_fid
        fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl, fcpl=fcpl)
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
      File "h5py/h5f.pyx", line 108, in h5py.h5f.create
    OSError: Unable to create file (file write failed: time = Tue Dec 1 08:28:02 2020, filename = '/tmp/calamari3umodg4c/fold_2/model_last.ckpt.h5', file descriptor = 5, errno = 28, error message = 'No space left on device', buf = 0x7182fa8, total write size = 96, bytes this sub-write = 96, bytes actually written = 18446744073709551615, offset = 0)

    (The parallel fold_4 process fails with the identical chain: OSError: Can't write data for '/tmp/calamari3umodg4c/fold_4/model_00000438.ckpt.h5' and OSError: Unable to create file for '/tmp/calamari3umodg4c/fold_4/model_last.ckpt.h5', both with errno = 28, 'No space left on device'.)

    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 109, in save_model_to_hdf5
        save_weights_to_hdf5_group(model_weights_group, model_layers)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 636, in save_weights_to_hdf5_group
        param_dset[:] = val
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
      File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/dataset.py", line 708, in __setitem__
        self.id.write(mspace, fspace, val, mtype, dxpl=self._dxpl)
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
      File "h5py/h5d.pyx", line 222, in h5py.h5d.DatasetID.write
      File "h5py/_proxy.pyx", line 132, in h5py._proxy.dset_rw
      File "h5py/_proxy.pyx", line 93, in h5py._proxy.H5PY_H5Dwrite
    OSError: Can't write data (file write failed: time = Tue Dec 1 08:28:09 2020, filename = '/tmp/calamari3umodg4c/fold_3/model_00000481.ckpt.h5', file descriptor = 5, errno = 28, error message = 'No space left on device', buf = 0xb465720, total write size = 49040, bytes this sub-write = 49040, bytes actually written = 18446744073709551615, offset = 2367488)

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 753, in on_start
        yield
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 342, in fit
        total_epochs=epochs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 181, in run_one_epoch
        step += 1
      File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
        next(self.gen)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 788, in on_batch
        mode, 'end', step, batch_logs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/callbacks.py", line 239, in _call_batch_hook
        batch_hook(batch, logs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/callbacks.py", line 528, in on_train_batch_end
        self.on_batch_end(batch, logs=logs)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/backends/tensorflow_backend/callbacks/earlystopping.py", line 108, in on_batch_end
        self.last_checkpoint = self.make_checkpoint(self.checkpoint_params.output_dir, self.checkpoint_params.output_model_prefix)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/backends/tensorflow_backend/callbacks/earlystopping.py", line 85, in make_checkpoint
        self.model.save(checkpoint_path + '.h5', overwrite=True)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py", line 1008, in save
        signatures, options)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/save.py", line 112, in save_model
        model, filepath, overwrite, include_optimizer)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 120, in save_model_to_hdf5
        f.close()
      File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 443, in close
        h5i.dec_ref(id)
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
      File "h5py/h5i.pyx", line 150, in h5py.h5i.dec_ref
    RuntimeError: Problems closing file (file write failed: time = Tue Dec 1 08:28:09 2020, filename = '/tmp/calamari3umodg4c/fold_3/model_00000481.ckpt.h5', file descriptor = 5, errno = 28, error message = 'No space left on device', buf = 0xacbac40, total write size = 6144, bytes this sub-write = 6144, bytes actually written = 18446744073709551615, offset = 4096)

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/scripts/train.py", line 371, in <module>
        main()
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/scripts/train.py", line 367, in main
        run(args)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/scripts/train.py", line 359, in run
        progress_bar=not args.no_progress_bars
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/trainer.py", line 197, in train
        self._run_train(train_net, train_start_time, progress_bar, self.dataset, self.validation_dataset, training_callback)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/trainer.py", line 213, in _run_train
        train_net.train(train_dataset, val_dataset, checkpoint_params, self.txt_postproc, progress_bar, training_callback)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/backends/tensorflow_backend/tensorflow_model.py", line 332, in train
        v_cb, es_cb
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 819, in fit
        use_multiprocessing=use_multiprocessing)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 397, in fit
        prefix='val')
      File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
        self.gen.throw(type, value, traceback)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 757, in on_start
        self.callbacks._call_end_hook(mode)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/callbacks.py", line 262, in _call_end_hook
        self.on_train_end()
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/callbacks.py", line 379, in on_train_end
        callback.on_train_end(logs)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/backends/tensorflow_backend/callbacks/earlystopping.py", line 74, in on_train_end
        version='last')
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/backends/tensorflow_backend/callbacks/earlystopping.py", line 85, in make_checkpoint
        self.model.save(checkpoint_path + '.h5', overwrite=True)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py", line 1008, in save
        signatures, options)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/save.py", line 112, in save_model
        model, filepath, overwrite, include_optimizer)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 92, in save_model_to_hdf5
        f = h5py.File(filepath, mode='w')
      File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 408, in __init__
        swmr=swmr)
      File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 179, in make_fid
        fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl, fcpl=fcpl)
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
      File "h5py/h5f.pyx", line 108, in h5py.h5f.create
    OSError: Unable to create file (file write failed: time = Tue Dec 1 08:28:09 2020, filename = '/tmp/calamari3umodg4c/fold_3/model_last.ckpt.h5', file descriptor = 5, errno = 28, error message = 'No space left on device', buf = 0x5e02c28, total write size = 96, bytes this sub-write = 96, bytes actually written = 18446744073709551615, offset = 0)

    multiprocessing.pool.RemoteTraceback:
    """
    Traceback (most recent call last):
      File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
        result = (True, func(*args, **kwds))
      File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
        return list(map(*args))
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/cross_fold_trainer.py", line 27, in train_individual_model
        ], args.get("run", None), {"threads": args.get('num_threads', -1)}), verbose=args.get("verbose", False)):
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/utils/multiprocessing.py", line 87, in run
        raise Exception("Error: Process finished with code {}".format(process.returncode))
    Exception: Error: Process finished with code -11
    """

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/usr/local/bin/calamari-cross-fold-train", line 33, in <module>
        sys.exit(load_entry_point('calamari-ocr==1.0.5', 'console_scripts', 'calamari-cross-fold-train')())
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/scripts/cross_fold_train.py", line 80, in main
        temporary_dir=args.temporary_dir, keep_temporary_files=args.keep_temporary_files,
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/cross_fold_trainer.py", line 151, in run
        pool.map_async(train_individual_model, run_args).get()
      File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
        raise self._value
    Exception: Error: Process finished with code -11
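
    The common root cause in all of the traces above is errno 28 ("No space left on device") while Calamari writes its per-fold checkpoints under `/tmp`. A minimal pre-flight check (a hypothetical helper using the standard library, not part of Calamari or OCR4all) could rule this out before launching a long cross-fold training:

    ```python
    import shutil

    def has_free_space(path, required_bytes):
        """Return True if the filesystem containing `path` has at least
        `required_bytes` free. Hypothetical pre-flight check: Calamari
        writes fold checkpoints to a temporary directory, so training
        dies with errno 28 once that filesystem fills up."""
        usage = shutil.disk_usage(path)
        return usage.free >= required_bytes

    # Example: require roughly 1 GiB of headroom in /tmp before training.
    if not has_free_space("/tmp", 1 * 1024**3):
        print("Not enough space in /tmp; consider pointing TMPDIR at a larger volume.")
    ```

    If `/tmp` is the bottleneck, moving the temporary directory to a larger volume (e.g. via the `TMPDIR` environment variable) is a plausible workaround.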

    As a result, no recognition process runs anymore - neither for the trained project nor for the other projects loaded in OCR4all - and the following error appears:

    ihxx_recognition_eigenesModell

    opened by lsubhd 5
  • Error in Line Segmentation and GTProduction

    Error in Line Segmentation and GTProduction

    Hi, I am using OCR4all version 0.4.0 with the same version of LAREX, mounted on a server. I have a problem, which I will describe in detail:

    1. I performed the complete Process Flow on all pages.
    2. I did the GT Production through LAREX on all pages.
    3. In Project Overview, the Line Segmentation and GT columns of some pages show a cross mark.
    4. I ran the Process Flow on only those specific pages; they are marked with a check mark up to the Line Segmentation column.
    5. I did the GT of those specific pages.
    6. Back in Project Overview, the Line Segmentation and GT columns of those pages show a cross mark again.

    Another thing: I ran the Process Flow twice, both on all pages and on those specific pages. I also noticed a problem in the XML generation, because I get the following error, for example, when I want to run a training:

    File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
    File "/var/ocr4all/data/Rodrigo/processing/0006.xml", line 1
    lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 55

    I hope someone has had a similar problem.
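
    The lxml message above ("Start tag expected, '<' not found") usually means a PAGE XML file was written empty or truncated. A quick sketch to locate all unparseable files in a processing directory (using the standard library instead of lxml; `find_broken_page_xml` is a hypothetical helper, not part of OCR4all):

    ```python
    import xml.etree.ElementTree as ET
    from pathlib import Path

    def find_broken_page_xml(directory):
        """Return the names of XML files in `directory` that fail to parse.
        An empty or truncated file triggers exactly the error quoted above."""
        broken = []
        for xml_file in sorted(Path(directory).glob("*.xml")):
            try:
                ET.parse(xml_file)
            except ET.ParseError:
                broken.append(xml_file.name)
        return broken
    ```

    Running this over e.g. `/var/ocr4all/data/Rodrigo/processing/` would show whether `0006.xml` is the only damaged file or one of several.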

    Greetings from Argentina!

    opened by emanuel-22 5
  • ValueError: zero-size array to reduction operation maximum which has no identity

    ValueError: zero-size array to reduction operation maximum which has no identity

    Hello! During the recognition process the following error is thrown and recognition effectively stops, while the status still reads "Status: ERROR: The process is still running" (OCR4all 0.3.0, LAREX 0.3.1):

    Process ForkProcess-3:
    Traceback (most recent call last):
      File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
        self._target(*self._args, **self._kwargs)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/datasets/input_dataset.py", line 99, in run
        out = self.apply_single(*data)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/datasets/input_dataset.py", line 119, in apply_single
        line, params = self.params.data_processor.apply([line], 1, False)[0]
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/data_processing/data_preprocessor.py", line 19, in apply
        processes=processes, progress_bar=progress_bar, max_tasks_per_child=max_tasks_per_child)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/utils/multiprocessing.py", line 32, in parallel_map
        out = list(map(f, d))
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/data_processing/data_preprocessor.py", line 50, in _apply_single
        data, params = proc._apply_single(data)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/data_processing/data_preprocessor.py", line 50, in _apply_single
        data, params = proc._apply_single(data)
      File "/usr/local/lib/python3.6/dist-packages/calamari_ocr-1.0.5-py3.6.egg/calamari_ocr/ocr/data_processing/center_normalizer.py", line 15, in _apply_single
        out, params = self.normalize(data, cval=np.amax(data))
      File "<__array_function__ internals>", line 6, in amax
      File "/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py", line 2668, in amax
        keepdims=keepdims, initial=initial, where=where)
      File "/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py", line 90, in _wrapreduction
        return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
    ValueError: zero-size array to reduction operation maximum which has no identity
    

    I have attached the PAGEXML and image files where the error occurs. I couldn't find anything suspicious here.
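
    For context, the traceback shows `np.amax` being called on a line image inside Calamari's center normalizer; a line crop that comes out empty (e.g. a degenerate line polygon) reproduces the error directly. A minimal reproduction plus a guard (the guard is only an illustration of the failure mode, not the project's actual fix):

    ```python
    import numpy as np

    # An empty line image, as produced by a degenerate segmentation result.
    empty_line = np.zeros((0, 0))

    # np.amax on a zero-size array raises exactly the error reported above.
    try:
        np.amax(empty_line)
    except ValueError:
        pass  # "zero-size array to reduction operation maximum which has no identity"

    # Guarded variant: fall back to a default fill value for empty inputs.
    cval = np.amax(empty_line) if empty_line.size else 0
    ```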

    0125 nrm 0125 bin 0125.txt

    opened by alexander-winkler 5
  • Word-based PageXML Output / Wortweise PageXML Ebene in der Ausgabe

    Word-based PageXML Output / Wortweise PageXML Ebene in der Ausgabe

    Dear colleagues,

    It would help us greatly if, in the future, the PageXML output data contained a word level.

    Many thanks in advance for your efforts!

    Best regards from Munich, Florian Landes
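
    For context, the request is for `Word` elements nested inside each `TextLine` of the PAGE XML output, each carrying its own `TextEquiv`. A rough sketch of that structure, built with the standard library (element names follow the PAGE schema; the ids and text here are invented for illustration):

    ```python
    import xml.etree.ElementTree as ET

    # A TextLine that additionally carries Word children, one per token.
    line = ET.Element("TextLine", id="l1")
    for i, token in enumerate(["Beispiel", "Text"]):
        word = ET.SubElement(line, "Word", id=f"l1_w{i}")
        text_equiv = ET.SubElement(word, "TextEquiv")
        ET.SubElement(text_equiv, "Unicode").text = token

    xml_str = ET.tostring(line, encoding="unicode")
    print(xml_str)
    ```

    A real PAGE file would also need `Coords` for each `Word`, which requires word-level geometry from the recognition step.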

    opened by FLE92 5
  • Bump spring-webmvc from 4.3.18.RELEASE to 5.2.20.RELEASE

    Bump spring-webmvc from 4.3.18.RELEASE to 5.2.20.RELEASE

    Bumps spring-webmvc from 4.3.18.RELEASE to 5.2.20.RELEASE.

    Release notes

    Sourced from spring-webmvc's releases.

    v5.2.20.RELEASE

    :star: New Features

    • Restrict access to property paths on Class references #28262
    • Improve diagnostics in SpEL for large array creation #28257

    v5.2.19.RELEASE

    :star: New Features

    • Declare serialVersionUID on DefaultAopProxyFactory #27785
    • Use ByteArrayDecoder in DefaultClientResponse::createException #27667

    :lady_beetle: Bug Fixes

    • ProxyFactoryBean getObject called before setInterceptorNames, silently creating an invalid proxy [SPR-7582] #27817
    • Possible NPE in Spring MVC LogFormatUtils #27783
    • UndertowHeadersAdapter's remove() method violates Map contract #27593
    • Fix assertion failure messages in DefaultDataBuffer.checkIndex() #27577

    :notebook_with_decorative_cover: Documentation

    • Lazy annotation throws exception if non-required bean does not exist #27660
    • Incorrect Javadoc in [NamedParameter]JdbcOperations.queryForObject methods regarding exceptions #27581
    • DefaultResponseErrorHandler update javadoc comment #27571

    :hammer: Dependency Upgrades

    • Upgrade to Reactor Dysprosium-SR25 #27635
    • Upgrade to Log4j2 2.16.0 #27825

    v5.2.18.RELEASE

    :star: New Features

    • Enhance DefaultResponseErrorHandler to allow logging complete error response body #27558
    • DefaultMessageListenerContainer does not log an error/warning when consumer tasks have been rejected #27457

    :lady_beetle: Bug Fixes

    • Performance impact of con.getContentLengthLong() in AbstractFileResolvingResource.isReadable() downloading huge jars to check component length #27549
    • Performance impact of ResourceUrlEncodingFilter on HttpServletResponse#encodeURL #27548
    • Avoid duplicate JCacheOperationSource bean registration in #27547
    • Non-escaped closing curly brace in RegEx results in initialization error on Android #27502
    • Proxy generation with Java 17 fails with "Cannot invoke "Object.getClass()" because "cause" is null" #27498
    • ConcurrentReferenceHashMap's entrySet violates the Map contract #27455

    :hammer: Dependency Upgrades

    • Upgrade to Reactor Dysprosium-SR24 #27526

    v5.2.17.RELEASE

    ... (truncated)

    Commits
    • cfa701b Release v5.2.20.RELEASE
    • 996f701 Refine PropertyDescriptor filtering
    • 90cfde9 Improve diagnostics in SpEL for large array creation
    • 94f52bc Upgrade to Artifactory Resource 0.0.17
    • d4478ba Upgrade Java versions in CI image
    • 136e6db Upgrade Ubuntu version in CI images
    • 8f1f683 Upgrade Java versions in CI image
    • ce2367a Upgrade to Log4j2 2.17.1
    • acf7823 Next development version (v5.2.20.BUILD-SNAPSHOT)
    • 1a03ffe Upgrade to Log4j2 2.16.0
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Bump spring-web from 4.3.18.RELEASE to 6.0.0

    Bump spring-web from 4.3.18.RELEASE to 6.0.0

    Bumps spring-web from 4.3.18.RELEASE to 6.0.0.

    Release notes

    Sourced from spring-web's releases.

    v6.0.0

    See What's New in Spring Framework 6.x and Upgrading to Spring Framework 6.x for upgrade instructions and details of new features.

    :star: New Features

    • Avoid direct URL construction and URL equality checks #29486
    • Simplify creating RFC 7807 responses from functional endpoints #29462
    • Allow test classes to provide runtime hints via declarative mechanisms #29455

    :notebook_with_decorative_cover: Documentation

    • Align javadoc of DefaultParameterNameDiscoverer with its behavior #29494
    • Document AOT support in the TestContext framework #29482
    • Document Ahead of Time processing in the reference guide #29350

    :hammer: Dependency Upgrades

    • Upgrade to Reactor 2022.0.0 #29465

    :heart: Contributors

    Thank you to all the contributors who worked on this release:

    @​ophiuhus and @​wilkinsona

    v6.0.0-RC4

    :star: New Features

    • Introduce DataFieldMaxValueIncrementer for SQL Server sequences #29447
    • Introduce findAllAnnotationsOnBean variant on ListableBeanFactory #29446
    • Introduce support for Jakarta WebSocket 2.1 #29436
    • Allow @ControllerAdvice in WebFlux to handle exceptions before a handler is selected #22991

    :lady_beetle: Bug Fixes

    • Bean with unresolved generics do not use fallback algorithms with AOT #29454
    • TomcatRequestUpgradeStrategy is not compatible with Tomcat 10.1 #29434
    • Autowiring of a generic type produced by a factory bean fails after AOT processing #29385

    :notebook_with_decorative_cover: Documentation

    • Reference PDF containing full docs not available #28451

    :hammer: Dependency Upgrades

    • Revisit Servlet API baseline: Servlet 6.0 in the build, Servlet 5.0 compatibility at runtime #29435
    • Upgrade to Context Propagation 1.0.0 #29442
    • Upgrade to Jackson 2.14.0 #29351
    • Upgrade to Micrometer 1.10.0 #29441

    ... (truncated)

    Commits
    • 5a30a43 Release v6.0.0
    • 42856ba Add milestone repo for optional Netty 5 support
    • 9be6cea Polishing deprecated methods
    • 37b4391 Align javadoc of DefaultParameterNameDiscoverer with its behavior
    • 09a58a5 Polish
    • 10f4ad1 Assert fixed in DefaultErrorResponseBuilder
    • 9457ed3 Document AOT support in the TestContext framework
    • 074ec97 Fix section formatting in the testing chapter
    • 9ede4af Revert "Ignore HttpComponents Javadoc"
    • bfc1251 Merge branch '5.3.x'
    • Additional commits viewable in compare view


    dependencies 
    opened by dependabot[bot] 0
  • Bump jackson-databind from 2.10.0 to 2.12.7.1

    Bump jackson-databind from 2.10.0 to 2.12.7.1

    Bumps jackson-databind from 2.10.0 to 2.12.7.1.

    Commits


    dependencies 
    opened by dependabot[bot] 0
  • Preprocessing appears to run but produces no output

    Preprocessing appears to run but produces no output

    Dear OCR4all team, I have loaded five image files and started preprocessing. OCR4all seems to process the first page but not the other pages. In the next step, even the first page is missing. Does it actually perform the preprocessing? Thanks for any help!

    Bildschirmfoto 2022-06-07 um 23 51 41

    Edit: The console error tab does not output any error message.

    opened by ESLincke 1
  • Kraken for OCR purpose

    Kraken for OCR purpose

    Hi! I would like to use Kraken instead of Calamari for the OCR part, not only for segmentation. How can I do that? Do I just pass the directory with the models when running the Docker image? If I want to use a custom model, developed and trained in Python, what does it need to return? Will I use it the same way I use Kraken?

    Type: Question 
    opened by aliceinland 1
  • Bump spring-core from 4.3.18.RELEASE to 5.2.22.RELEASE

    Bump spring-core from 4.3.18.RELEASE to 5.2.22.RELEASE

    Bumps spring-core from 4.3.18.RELEASE to 5.2.22.RELEASE.

    Release notes

    Sourced from spring-core's releases.

    v5.2.22.RELEASE

    :star: New Features

    • Refine CachedIntrospectionResults property introspection #28446

    :lady_beetle: Bug Fixes

    • Ignore invalid STOMP frame #28444

    v5.2.21.RELEASE

    :star: New Features

    • Remove DNS lookups during websocket connection initiation #28281

    :lady_beetle: Bug Fixes

    • Improve documentation and matching algorithm in data binders #28334
    • CodeGenerationException thrown when using AnnotationMBeanExporter on JDK 17 #28279
    • ResponseEntity objects are accumulated in ConcurrentReferenceHashMap #28273
    • NotWritablePropertyException when attempting to declaratively configure ClassLoader properties #28272

    v5.2.20.RELEASE

    :star: New Features

    • Restrict access to property paths on Class references #28262
    • Improve diagnostics in SpEL for large array creation #28257

    v5.2.19.RELEASE

    :star: New Features

    • Declare serialVersionUID on DefaultAopProxyFactory #27785
    • Use ByteArrayDecoder in DefaultClientResponse::createException #27667

    :lady_beetle: Bug Fixes

    • ProxyFactoryBean getObject called before setInterceptorNames, silently creating an invalid proxy [SPR-7582] #27817
    • Possible NPE in Spring MVC LogFormatUtils #27783
    • UndertowHeadersAdapter's remove() method violates Map contract #27593
    • Fix assertion failure messages in DefaultDataBuffer.checkIndex() #27577

    :notebook_with_decorative_cover: Documentation

    • Lazy annotation throws exception if non-required bean does not exist #27660
    • Incorrect Javadoc in [NamedParameter]JdbcOperations.queryForObject methods regarding exceptions #27581
    • DefaultResponseErrorHandler update javadoc comment #27571

    :hammer: Dependency Upgrades

    • Upgrade to Reactor Dysprosium-SR25 #27635
    • Upgrade to Log4j2 2.16.0 #27825

    ... (truncated)

    Commits
    • 8f4c172 Release v5.2.22.RELEASE
    • 9f238c9 Polishing
    • 50177b1 Refine CachedIntrospectionResults property introspection
    • 159a99b Ignore invalid STOMP frame
    • 41e158c Next development version (v5.2.22.BUILD-SNAPSHOT)
    • 833e750 Improve documentation and matching algorithm in data binders
    • d70054d Upgrade to Log4j2 2.17.2
    • 36e4951 Polishing
    • 87b5080 Consistent use of getLocalAddr() without DNS lookups in request adapters
    • 5cbf85a Avoid return value reference in potentially cached MethodParameter instance
    • Additional commits viewable in compare view


    dependencies 
    opened by dependabot[bot] 0
Releases(0.6.1)
  • 0.6.1(Jan 28, 2022)

    Features

    • adds additional info messages regarding the newly added deep3 models to the UI
    • removes some obsolete information from the UI

    Bugfixes

    • fixes some minor UI bugs
    Source code(tar.gz)
    Source code(zip)
  • 0.6.0(Jan 26, 2022)

    Features

    • adds Kraken for baseline based layout analysis (regions and lines)
    • adds support for Calamari >v2.x
    • includes new and improved models for Calamari
    • upgrades LAREX interface to support latest LAREX releases with many new features
    • adds compatibility with latest ocr4all-helper-scripts version with e.g. improved line segmentation
    • adds DOCX result generation
    • adds support for custom page delimiters for result generation

    Bugfixes

    • removes outdated / duplicated parameters for some processing steps

    Other

    • removes legacy code used for handling GTC_Web
    • includes update to the latest version of prima-core-libs
    Source code(tar.gz)
    Source code(zip)
  • 0.6-RC3(Jan 24, 2022)

    Features

    • Change settings activated by default for recognition

    Bugfixes

    • Hide unused dummy / kraken segmentation settings
    • Sort imagePathList to ensure same image variant order in UI
    Source code(tar.gz)
    Source code(zip)
  • 0.6-RC2(Dec 23, 2021)

    Features

    • Adds selecting between Kraken and Dummy segmentation in Process Flow

    Bugfixes

    • fixes available calamari-predict parameters
    • updates backend pinging to avoid session time outs
    • removes unused files and debug code
    • updates datatables
    • fixes several typos and updates description texts
    • hides unused settings in Process Flow
    • fixes console output to front end during training
  • 0.6-RC1(Nov 23, 2021)

  • 0.5.0(Nov 7, 2020)

    Features

    • Added extensive REST API to control OCR4all workflow without the GUI
    • Added the ability to choose certain Text Result Generation strategies (GT only, Recognition only, Combined)
    • Added the ability to keep or remove empty text lines in the Text Result Generation
    • PAGE XML schema version 2019-07-15 added as default schema version
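    As a rough idea of what driving the workflow over REST looks like, the sketch below builds a JSON POST request that triggers one workflow step. The endpoint layout and parameter names are hypothetical placeholders, not OCR4all's documented API; consult the project's API documentation for the real routes.

    ```python
    import json
    import urllib.request

    def build_step_request(base_url: str, project: str, step: str) -> urllib.request.Request:
        """Build a JSON POST request to trigger one workflow step.

        Endpoint path and body keys are hypothetical placeholders.
        """
        url = f"{base_url}/api/{step}/execute"
        body = json.dumps({"projectId": project}).encode("utf-8")
        return urllib.request.Request(
            url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )

    req = build_step_request("http://localhost:8080/ocr4all", "demo_book", "preprocessing")
    print(req.get_method(), req.full_url)
    ```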

    Bugfixes

    • Fixed a bug which caused certain TIFF images to get converted incorrectly
    • Projects can now be loaded even when no processing directory exists yet
    • Result Generation now only exports the selected PAGE XML files
    • Fixes bug which crashed on Line Segmentation when the underlying PAGE XML contained TextRegion elements without a specific subtype
    • Fixes version number of prima-core-libs
  • 0.5-RC3(Nov 6, 2020)

    Bugfixes

    • temporarily makes the process state collector less lenient again, as the implemented regex patterns caused timeouts on very large projects
  • 0.5-RC2(Oct 21, 2020)

    Bugfixes

    • Fixes bug which crashed on Line Segmentation when the underlying PAGE XML contained TextRegion elements without a specific subtype
    • Fixes version number of prima-core-libs
  • 0.5-RC1(Oct 16, 2020)

    Features

    • Added extensive REST API to control OCR4all workflow without the GUI
    • Added the ability to choose certain Text Result Generation strategies (GT only, Recognition only, Combined)
    • Added the ability to keep or remove empty text lines in the Text Result Generation
    • Process state collector is now more lenient and works better with externally created PAGE XML files
    • PAGE XML schema version 2019-07-15 added as default schema version

    Bugfixes

    • Fixed a bug which caused certain TIFF images to get converted incorrectly
    • Projects can now be loaded even when no processing directory exists yet
    • Result Generation now only exports the selected PAGE XML files
  • 0.4.0(Jul 29, 2020)

    Features

    • Added filter to image list in sidebar to either (de)select all pages or only even/odd pages to ease working with e.g. bilingual editions

    Bugfixes

    • Result generation for text files no longer crashes on lines which contain neither recognized text nor ground truth text
    • Result generation for text files now respects the reading order of PAGE XML files (if a reading order exists)
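    Respecting the reading order means sorting regions by the standard PAGE XML `ReadingOrder`/`RegionRefIndexed` elements before concatenating their text. A minimal sketch of reading that order (the parsing approach is illustrative, not OCR4all's actual code):

    ```python
    import xml.etree.ElementTree as ET

    # Namespace of the PAGE XML 2019-07-15 schema (OCR4all's default since 0.5.0)
    NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"}

    def reading_order(page_xml: str) -> list[str]:
        """Return region ids sorted by the index attribute of RegionRefIndexed;
        empty list when the PAGE XML carries no reading order."""
        root = ET.fromstring(page_xml)
        refs = root.findall(".//pc:RegionRefIndexed", NS)
        return [r.get("regionRef") for r in sorted(refs, key=lambda r: int(r.get("index")))]

    sample = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
      <Page><ReadingOrder><OrderedGroup id="g0">
        <RegionRefIndexed index="1" regionRef="r_paragraph"/>
        <RegionRefIndexed index="0" regionRef="r_heading"/>
      </OrderedGroup></ReadingOrder></Page>
    </PcGts>"""

    print(reading_order(sample))
    ```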

    Other

    • prima-core-libs added to ease working with PAGE XML files
    • removed some obsolete files
    • renamed Artifact from OCR4all_Web to ocr4all
  • 0.4-RC1(Jul 8, 2020)

    • Adds filter to image list in sidebar to either (de)select all pages or only even/odd pages to ease working with e.g. bilingual editions
    • Result generation for text files no longer crashes on lines which contain neither recognized text nor ground truth text
    • Result generation for text files now respects the reading order of PAGE XML files (if a reading order exists)
    • Adds prima-core-libs to ease working with PAGE XML files
    • Removes some obsolete files
    • Renames artifact from OCR4all_Web to ocr4all
  • 0.3.0(May 27, 2020)

    Features

    • Upgrade to Calamari version 1.0.5 with TensorFlow 2 backend
    • Adds automatic project conversion from legacy to latest (entirely PAGE XML based and considerably less memory intensive)
    • Adds Ground Truth export
    • Adds checkbox for word level PAGE XML generation during Recognition

    Bugfixes

    • Greyscale and despeckled images are now correctly sent to LAREX
    • Removed obsolete image type selection for all workflow steps related to Larex/GTP
    • Reduced unnecessarily verbose TF2 logging during Training and Recognition
    • Fixed the Despeckling workflow step so that despeckled images will get saved again
    • Various changes to the UI on the textual level (typos and improvements)
    • Large PDF files are now streamed page by page to avoid memory issues
  • 0.3-RC3(May 27, 2020)

  • 0.3-RC2(May 19, 2020)

    Second release candidate for OCR4all v0.3.

    • Adds possibility to convert legacy to latest projects in the UI
    • Adds checkbox for word level PAGE XML generation during Recognition
    • Removes obsolete image type selection for all workflow steps related to Larex/GTP
    • Reduces unnecessarily verbose TF2 logging during Training and Recognition
    • Fixes the Despeckling workflow step so that despeckled images will get saved again
    • Various changes to the UI on the textual level (typos and improvements)