Detect textlines in document images


Introduction

This tool performs border, region and textline detection on document images and returns the results as PAGE-XML. The goal of the project is to extract the textlines of a document so they can be fed to an OCR model. This is achieved in four successive stages: border detection, layout detection, textline detection, and heuristic post-processing.

The first three stages are based on pixelwise segmentation.

Border detection

For the purpose of text recognition (OCR), and to avoid noise being introduced from text outside the printspace, one first needs to detect the border of the printed frame. This is done by a binary pixelwise-segmentation model trained on a dataset of 2,000 documents, of which about 1,200 come from the dhSegment project (you can download the dataset from here) and the remainder were annotated at the Staatsbibliothek zu Berlin (SBB). For border detection, the model is fed the whole image at once rather than split into patches.

Layout detection

As a next step, text regions need to be identified by means of layout detection. Again, a pixelwise segmentation model was trained, this time on 131 labeled images from the SBB digital collections, with some data augmentation. Since this tool targets historical documents, we consider the main region types to be text regions, separators, images, tables and background, each with their own subclasses, e.g. for text regions, subclasses like header/heading, drop capital, or main body text. While it would be desirable to detect and classify each of these classes in a granular way, there are limits to obtaining a suitably large and balanced training set. Accordingly, the current version of this tool focuses on the main region types background, text region, image and separator.

Textline detection

In a subsequent step, binary pixelwise segmentation is used again, this time to classify the pixels that constitute textlines. The textline model was initially trained on documents with only one column/block of text, using some scaling augmentation. It was then fine-tuned for multi-column documents with additional training data, which resulted in a much more robust textline detection model.
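Patch-wise inference implies tiling the page into fixed-size windows matching the model input. The following is a hypothetical sketch of such a tiler, not the tool's actual tiling code (the 448×896 size only mirrors the pretrained textline model's input shape as seen in error reports):

```python
# Hypothetical sketch of fixed-size patch tiling for pixelwise inference.
# This helper is an illustration, not the tool's actual implementation.

def tile_patches(height, width, patch_h, patch_w):
    """Yield (y, x) top-left corners of patch_h x patch_w windows covering
    an image. Edge windows are shifted inwards so that every window is
    full-size; an image smaller than one patch yields a single (0, 0)
    window, which the caller must pad before prediction."""
    def starts(size, patch):
        if size <= patch:
            return [0]
        positions = list(range(0, size - patch + 1, patch))
        if positions[-1] != size - patch:
            positions.append(size - patch)  # flush against the far edge
        return positions

    for y in starts(height, patch_h):
        for x in starts(width, patch_w):
            yield (y, x)

# Example: a 1000x2000 page covered by 448x896 patches
windows = list(tile_patches(1000, 2000, 448, 896))
```

Shifting the last window inwards, instead of emitting a narrow remainder slice, is one way to avoid feeding a model an undersized edge patch (e.g. width 142 or 4 instead of 896).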

Heuristic methods

Some heuristic methods are also employed to further improve the model predictions:

  • After border detection, the bounding box of the largest contour is determined and the image is cropped to these coordinates.
  • For text region detection, the image is scaled up to make it easier for the model to detect background space between text regions.
  • A minimum area is defined for text regions in relation to the overall image dimensions, so that very small regions that are actually noise can be filtered out.
  • Deskewing is applied on the text region level (due to regions having different degrees of skew) in order to improve the textline segmentation result.
  • After deskewing, a calculation of the pixel distribution on the X-axis allows the separation of textlines (foreground) and background pixels.
  • Finally, using the derived coordinates, bounding boxes are determined for each textline.
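The projection-profile step above can be sketched as follows; this is a minimal illustration under the assumption of a deskewed binary region, not the tool's actual code:

```python
# Minimal illustration of textline separation via a projection profile:
# foreground pixels are counted per row of a deskewed binary region, and
# contiguous runs of non-empty rows become textline bands.

def line_bands(binary, min_foreground=1):
    """binary: 2D list of 0/1 pixels (1 = foreground text).
    Returns (top_row, bottom_row) intervals, one per detected textline."""
    profile = [sum(row) for row in binary]  # foreground count per row
    bands, start = [], None
    for y, count in enumerate(profile):
        if count >= min_foreground and start is None:
            start = y                        # entering a textline band
        elif count < min_foreground and start is not None:
            bands.append((start, y - 1))     # leaving a textline band
            start = None
    if start is not None:
        bands.append((start, len(profile) - 1))
    return bands
```

Each band can then be combined with the horizontal extent of its region to obtain the final textline bounding box.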

Installation

pip install .

Models

In order to run this tool you also need trained models. You can download our pretrained models from here:
https://qurator-data.de/sbb_textline_detector/

Usage

The basic command-line interface can be called like this:

sbb_textline_detector -i <image file name> -o <directory to write output xml> -m <directory of models>

The tool does accept raw (RGB/grayscale) images as input, but results will be much improved when a properly binarized image is used instead. We also provide a tool to perform this binarization step.
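For illustration only, a classic global Otsu threshold is a quick way to obtain a binary input; note that the binarization tool provided with this project is model-based and considerably more robust on stained or degraded scans:

```python
# Pure-Python Otsu thresholding, shown only to illustrate the binarization
# step; this is not the project's binarization method.

def otsu_threshold(gray):
    """gray: iterable of 8-bit pixel values. Returns the threshold that
    maximizes the between-class variance; pixels <= threshold form one
    class, pixels > threshold the other."""
    hist = [0] * 256
    for px in gray:
        hist[px] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg, weight_bg = 0.0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue                       # no background class yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break                          # no foreground class left
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```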

Usage with OCR-D

In addition, there is a CLI for OCR-D:

ocrd-sbb-textline-detector -I OCR-D-IMG -O OCR-D-SEG-LINE-SBB -P model /path/to/the/models/textline_detection

Segmentation works on raw (RGB/grayscale) images, but honours AlternativeImages from earlier preprocessing steps, so it's fine to perform (say) deskewing first, followed by textline detection. Results from previous cropping or binarization steps are tolerated and passed through, but not used by the segmentation itself. (So those steps are only useful if they are themselves needed for deskewing or dewarping prior to segmentation.)

This processor will replace any previously existing Border, ReadingOrder and TextRegion instances (but keep other region types unchanged).
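For orientation, the PAGE-XML output has roughly the following element structure. This is a hand-written minimal sketch with placeholder coordinates; the actual output carries additional attributes and metadata:

```xml
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="example.tif" imageWidth="2000" imageHeight="3000">
    <Border>
      <Coords points="105,80 1900,80 1900,2950 105,2950"/>
    </Border>
    <ReadingOrder>
      <OrderedGroup id="ro1">
        <RegionRefIndexed index="0" regionRef="r1"/>
      </OrderedGroup>
    </ReadingOrder>
    <TextRegion id="r1">
      <Coords points="..."/>
      <TextLine id="r1_l1">
        <Coords points="..."/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
```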

Comments
  • Rename ocrd_sbb_textline_detector

    I suggest renaming ocrd_sbb_textline_detector to ocrd-sbb-textline-detector or ocrd-sbb-segment-line, which fits the OCR-D conventions better.

    CC'ing @kba.

    opened by stweil 18
  • ensure valid coordinates by intersection with parent…

    • Border: intersect with page frame
    • text regions: intersect with (new) Border
    • text lines: intersect with (new) text region (and back-transform at all)

    This fixes the negative coordinates, and also the line-region inconsistency (line bboxes way beyond tight region outlines).

    I'll fix removing the existing derived images with cropped in a PR to core, so this will be transparent and automatic here.
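    The intersection logic described in this comment can be sketched at the bounding-box level. This is a hypothetical simplification; the actual change operates on polygon coordinates:

```python
# Bounding-box simplification of intersecting each child element with its
# parent so that no coordinate can leave the parent (hypothetical helper,
# not the PR's actual code).

def clip_bbox(child, parent):
    """Boxes are (x0, y0, x1, y1). Returns the intersection of child and
    parent, or None if the child lies entirely outside the parent."""
    x0 = max(child[0], parent[0])
    y0 = max(child[1], parent[1])
    x1 = min(child[2], parent[2])
    y1 = min(child[3], parent[3])
    if x0 > x1 or y0 > y1:
        return None  # no overlap at all
    return (x0, y0, x1, y1)
```

    Clipping a line box with negative coordinates to the page frame, for instance, snaps it back to 0.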

    opened by bertsky 10
  • support AlternativeImage input in OCR-D processor

    The README states the following about current OCR-D support:

    Usage with OCR-D

    ocrd-example-binarize -I OCR-D-IMG -O OCR-D-IMG-BIN
    ocrd-sbb-textline-detector -I OCR-D-IMG-BIN -O OCR-D-SEG-LINE-SBB \
            -p '{ "model": "/path/to/the/models/textline_detection" }'
    

    Segmentation works on raw RGB images, but respects and retains AlternativeImages from binarization steps, so it's a good idea to do binarization first, then perform the textline detection. The used binarization processor must produce an AlternativeImage for the binarized image, not replace the original raw RGB image.

    That latter statement shows that there has been some awareness of the problem, but unfortunately it is not consistent with the usage example given above, or with the actual implementation.

    IIUC, all the implementation (and the usage example) does is take a binarized image as if it were the original image and create a segmentation for it. It does not even look at the AlternativeImages of the PAGE input file. This will not work for OCR-D pipelines where that binarized image has also been cropped (due to a prior Border detection), deskewed (due to a prior @orientation detection) or dewarped. The image itself cannot tell you, only the PAGE-XML's AlternativeImage/@comments can.

    So please use workspace.image_from_page instead of bare input_file.local_filename or page.imageFilename. (The latter two also would not work for remote images.)

    bug enhancement 
    opened by bertsky 9
  • No text lines detected - Regression?

    Using https://qurator-data.de/examples/actevedef_718448162.first-page.zip, ocrd-sbb-textline-detector --overwrite -I OCR-D-IMG -O OCR-D-SEG-LINE-SBB-TLD -P model "/var/lib/textline_detection" only gives:

            <pc:Border>
                <pc:Coords points="105,80 2418,80 2418,3952 105,3952"/>
            </pc:Border>
    
    # pip list | egrep -i 'ocrd|sbb'
    ocrd                   2.38.0
    ocrd-modelfactory      2.38.0
    ocrd-models            2.38.0
    ocrd-utils             2.38.0
    ocrd-validators        2.38.0
    qurator-sbb-textline   0.0.1
    

    I'm investigating.

    bug 
    opened by mikegerber 8
  • ValueError thrown due to shape inconsistence (empty page)

    For this empty page, I get a ValueError when applying sbb-textline-detector via my_ocrd_workflow.

    14:38:43.639 INFO processor.OcrdSbbTextlineDetectorRecognize - INPUT FILE 80 / <OcrdFile fileGrp=OCR-D-IMG-BIN, ID=FILE_0081_OCR-D-IMG-BIN, mimetype=application/vnd.prima.page+xml, url=OCR-D-IMG-BIN/FILE_0081_OCR-D-IMG-BIN.xml, local_filename=OCR-D-IMG-BIN/FILE_0081_OCR-D-IMG-BIN.xml]/>
    Traceback (most recent call last):
      File "/usr/local/bin/ocrd-sbb-textline-detector", line 8, in <module>
        sys.exit(ocrd_sbb_textline_detector())
      File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 829, in __call__
        return self.main(*args, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 782, in main
        rv = self.invoke(ctx)
      File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1066, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 610, in invoke
        return callback(*args, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/qurator/sbb_textline_detector/ocrd_cli.py", line 25, in ocrd_sbb_textline_detector
        return ocrd_cli_wrap_processor(OcrdSbbTextlineDetectorRecognize, *args, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/ocrd/decorators.py", line 102, in ocrd_cli_wrap_processor
        run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/ocrd/processor/helpers.py", line 69, in run_processor
        processor.process()
      File "/usr/local/lib/python3.6/dist-packages/qurator/sbb_textline_detector/ocrd_cli.py", line 71, in process
        x.run()
      File "/usr/local/lib/python3.6/dist-packages/qurator/sbb_textline_detector/main.py", line 2091, in run
        textline_mask_tot=self.textline_contours(image_page)
      File "/usr/local/lib/python3.6/dist-packages/qurator/sbb_textline_detector/main.py", line 492, in textline_contours
        prediction_textline=self.do_prediction(patches,img,model_textline)
      File "/usr/local/lib/python3.6/dist-packages/qurator/sbb_textline_detector/main.py", line 284, in do_prediction
        img_patch.reshape(1, img_patch.shape[0], img_patch.shape[1], img_patch.shape[2]))
      File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1441, in predict
        x, _, _ = self._standardize_user_data(x)
      File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 579, in _standardize_user_data
        exception_prefix='input')
      File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_utils.py", line 145, in standardize_input_data
        str(data_shape))
    ValueError: Error when checking input: expected input_1 to have shape (448, 896, 3) but got array with shape (448, 142, 3)
    

    Ideally the output should be a PAGE-XML file with no region/textlines.

    bug 
    opened by cneud 8
  • IndexError: list index out of range

    I have >= 3 images where this error occurs: schiltberger1580/025 schiltberger1580/029 schiltberger1580/033

    I'm using OCR-D, with sbb_textline_detector commit 689822c63aa3d539341dbd50168907c7499976a6 with these model files:

    jb@xxx:~/sbb-models> md5sum *
    348fc636df6d4c323754c6145d80f9e9  model_page_mixed_best.h5
    16ad73f7a6ed08a29e64d2763771ed28  model_strukturerkennung.h5
    cef95b9346922490722c1c2c98d46023  model_textline_new.h5
    

    procedure is:

    ocrd workspace init
    ocrd workspace add -G IMG -i 0001 -m image/tif 0001.tif 
    /usr/bin/time ocrd-olena-binarize -I IMG -O BIN -m mets.xml -p '{"impl":"sauvola"}'
    /usr/bin/time ocrd-sbb-textline-detector -I BIN -O SEG -m mets.xml -p '{"model":"/home/jb/sbb-models"}'
    
    bug 
    opened by jbarth-ubhd 7
  • ValueError: Error when checking input: expected input_1 to have shape (..., ..., ...) but got array with shape (..., ..., ...)

    ValueError: Error when checking input: expected input_1 to have shape (448, 896, 3) but got array with shape (448, 4, 3)

    Image:

    https://digi.ub.uni-heidelberg.de/diglitData/v/ocrd/lichtwark1932bd2_-_h.tif

    Workflow:

    ocrd-sbb-binarize -I OCR-D-IMG -O OCR-D-001 -P model $HOME/ocrd_models/sbb/binarization/models
    ocrd-cis-ocropy-deskew -I OCR-D-001 -O OCR-D-002
    ocrd-sbb-textline-detector -I OCR-D-002 -O OCR-D-003 -P model $HOME/ocrd_models/sbb/textline
    ocrd-calamari-recognize -I OCR-D-003 -O OCR-D-OCR -P checkpoint "$HOME/ocrd_models/calamari/calamari_models/gt4histocr/*.ckpt.json"
    

    Stack trace:

    Using TensorFlow backend.
    11:25:20.867 INFO processor.OcrdSbbTextlineDetectorRecognize - INPUT FILE 0 / <OcrdFile fileGrp=OCR-D-002, ID=OCR-D-002_00001, mimetype=application/vnd.prima.page+xml, url=OCR-D-002/OCR-D-002_00001.xml, local_filename=OCR-D-002/OCR-D-002_00001.xml]/>
    Traceback (most recent call last):
      File "/beegfs/home/hd/hd_hd/hd_wu120/ocrd_all/venv/local/sub-venv/headless-tf1/bin/ocrd-sbb-textline-detector", line 8, in <module>
        sys.exit(ocrd_sbb_textline_detector())
      File "/beegfs/home/hd/hd_hd/hd_wu120/ocrd_all/venv/local/sub-venv/headless-tf1/lib/python3.7/site-packages/click/core.py", line 829, in __call__
        return self.main(*args, **kwargs)
      File "/beegfs/home/hd/hd_hd/hd_wu120/ocrd_all/venv/local/sub-venv/headless-tf1/lib/python3.7/site-packages/click/core.py", line 782, in main
        rv = self.invoke(ctx)
      File "/beegfs/home/hd/hd_hd/hd_wu120/ocrd_all/venv/local/sub-venv/headless-tf1/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/beegfs/home/hd/hd_hd/hd_wu120/ocrd_all/venv/local/sub-venv/headless-tf1/lib/python3.7/site-packages/click/core.py", line 610, in invoke
        return callback(*args, **kwargs)
      File "/beegfs/home/hd/hd_hd/hd_wu120/ocrd_all/venv/local/sub-venv/headless-tf1/lib/python3.7/site-packages/qurator/sbb_textline_detector/ocrd_cli.py", line 32, in ocrd_sbb_textline_detector
        return ocrd_cli_wrap_processor(OcrdSbbTextlineDetectorRecognize, *args, **kwargs)
      File "/beegfs/home/hd/hd_hd/hd_wu120/ocrd_all/venv/local/sub-venv/headless-tf1/lib/python3.7/site-packages/ocrd/decorators/__init__.py", line 81, in ocrd_cli_wrap_processor
        run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
      File "/beegfs/home/hd/hd_hd/hd_wu120/ocrd_all/venv/local/sub-venv/headless-tf1/lib/python3.7/site-packages/ocrd/processor/helpers.py", line 70, in run_processor
        processor.process()
      File "/beegfs/home/hd/hd_hd/hd_wu120/ocrd_all/venv/local/sub-venv/headless-tf1/lib/python3.7/site-packages/qurator/sbb_textline_detector/ocrd_cli.py", line 78, in process
        x.run()
      File "/beegfs/home/hd/hd_hd/hd_wu120/ocrd_all/venv/local/sub-venv/headless-tf1/lib/python3.7/site-packages/qurator/sbb_textline_detector/main.py", line 2102, in run
        textline_mask_tot=self.textline_contours(image_page)
      File "/beegfs/home/hd/hd_hd/hd_wu120/ocrd_all/venv/local/sub-venv/headless-tf1/lib/python3.7/site-packages/qurator/sbb_textline_detector/main.py", line 496, in textline_contours
        prediction_textline=self.do_prediction(patches,img,model_textline)
      File "/beegfs/home/hd/hd_hd/hd_wu120/ocrd_all/venv/local/sub-venv/headless-tf1/lib/python3.7/site-packages/qurator/sbb_textline_detector/main.py", line 288, in do_prediction
        img_patch.reshape(1, img_patch.shape[0], img_patch.shape[1], img_patch.shape[2]))
      File "/beegfs/home/hd/hd_hd/hd_wu120/ocrd_all/venv/local/sub-venv/headless-tf1/lib/python3.7/site-packages/keras/engine/training.py", line 1441, in predict
        x, _, _ = self._standardize_user_data(x)
      File "/beegfs/home/hd/hd_hd/hd_wu120/ocrd_all/venv/local/sub-venv/headless-tf1/lib/python3.7/site-packages/keras/engine/training.py", line 579, in _standardize_user_data
        exception_prefix='input')
      File "/beegfs/home/hd/hd_hd/hd_wu120/ocrd_all/venv/local/sub-venv/headless-tf1/lib/python3.7/site-packages/keras/engine/training_utils.py", line 145, in standardize_input_data
        str(data_shape))
    ValueError: Error when checking input: expected input_1 to have shape (448, 896, 3) but got array with shape (448, 4, 3)
    136.34user 12.35system 1:45.21elapsed 141%CPU (0avgtext+0avgdata 1919624maxresident)k
    1365502inputs+26519outputs (1398major+3820225minor)pagefaults 0swaps
    opened by jbarth-ubhd 5
  • make Tensorflow less gabby

    Running the detector prints loads of unnecessary and (to non-developers) confusing TF messages.

    Please consider doing the following to reduce the log output to actual errors:

    • before importing TF (or Keras, or whatever imports TF): os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
    • after importing TF: tf.get_logger().setLevel('ERROR')

    Getting the TF logger to adhere to OCR-D conventions is another story, but there I cannot offer a solution for now.
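    The two bullets above amount to the following sketch; the TensorFlow import is left commented out here to keep the snippet free of the heavy dependency:

```python
import os

# Must be set before anything in the process imports TensorFlow,
# otherwise it has no effect.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # 0=all, 1=no INFO, 2=no WARNING, 3=errors only

# After TensorFlow has been imported:
# import tensorflow as tf
# tf.get_logger().setLevel('ERROR')
```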

    opened by bertsky 5
  • Detecting Text-Lines only

    Thank you for your hard work,

    If I understand correctly, each level of segmentation is independent of the others, so I would suggest a flag option to detect text lines only, without region or page segmentation. This would help reduce the memory footprint and speed up processing.

    opened by ghost 5
  • TextLine coordinates too coarse

    Would it be possible to get good polygonal outlines from the text line segmentation instead of coarse bounding boxes?

    There is a stark contrast between the precise contours of the text regions (which never overlap) and the coarse rectangles of the text lines inside them (which often protrude beyond their parent and overlap between adjacent lines).

    This makes it risky to apply line-level dewarping afterwards, and requires an OCR engine that can cope with intruders in the line image. In the example given in #29, I get these line images from ocrd-cis-ocropy-dewarp:

    [attached line images OCR-D-IMG-DEW-SBB_0001_r21_l24 through OCR-D-IMG-DEW-SBB_0001_r21_l34]

    opened by bertsky 4
  • replace PrintSpace with Border

    I'm not sure if this is intentional, but currently the textline detector uses the PrintSpace element for the outer hull of all detected regions. Shouldn't that be Border instead?

    See also:

    • https://github.com/OCR-D/core/issues/488
    • https://ocr-d.de/en/gt-guidelines/trans/lySatzspiegel.html

    Here are the two affected places:

    https://github.com/qurator-spk/sbb_textline_detection/blob/d36b01591d2328fc03f2956ff98b66e50a5f81f5/qurator/sbb_textline_detector/main.py#L1941

    https://github.com/qurator-spk/sbb_textline_detection/blob/d36b01591d2328fc03f2956ff98b66e50a5f81f5/qurator/sbb_textline_detector/ocrd_cli.py#L83-L85

    opened by bertsky 4
  • Very good overall performance, but this one fails?

    Here is the original image: https://digi.ub.uni-heidelberg.de/diglitData/v/blaeu1655bd6_-_00_129.tif

    Here is the image fed into sbb-textline (binarized etc.): https://digi.ub.uni-heidelberg.de/diglitData/v/blaeu1655bd6_-_00_129-binarized.png

    And here are the detected segments: [image attachment]

    model used:

    -rw-r--r-- 1 jb users  458969872 Dec 10  2019 /usr/local/ocrd_models/sbb/textline/model_page_mixed_best.h5
    -rw-rw-r-- 1 jb users 1194551551 Feb 13  2020 /usr/local/ocrd_models/sbb/textline/models.tar.gz
    -rw-r--r-- 1 jb users  458970960 Jun 26  2019 /usr/local/ocrd_models/sbb/textline/model_strukturerkennung.h5
    
    opened by jbarth-ubhd 6
  • Good segmentation results but bad OCR when using sbb-textline-detection with OCR-D

    I'm not sure whether this is the right place to ask. sbb-textline-detector itself worked perfectly in our OCR-D workflows and the produced segmentation results look good as well, but running any recognition (calamari-recognize as well as tesserocr-recognize) afterwards yields weird text output that seems worse than it should be, given the good segmentation results.

    I basically used the (formerly) recommended workflow and substituted everything starting from the region segmentation up to the line segmentation with sbb-textline-detector.

    The region segmentation produced by this looks pretty good and this impression is confirmed by the pixel accuracy evaluation we ran for several segmentation workflows (with cis-ocropy-segment, tesserocr-segment-region, …). The line segmentation looks pretty good as well and should probably be a good basis for running OCR on it but as stated above the results are somehow surprisingly bad. I tried to run the recognition directly on the produced segmentation (OCR-D-SEG-LINE) without dewarping first but the results are even worse that way.

    Am I missing something obvious (e.g. adding a certain step after running sbb-textline-detector)?

    Workflow steps
    "olena-binarize -I input -O OCR-D-BIN -P impl sauvola"
    "anybaseocr-crop -I OCR-D-BIN -O OCR-D-CROP"
    "olena-binarize -I OCR-D-CROP -O OCR-D-BIN2 -P impl kim"
    "cis-ocropy-denoise -I OCR-D-BIN2 -O OCR-D-BIN-DENOISE -P level-of-operation page"
    "cis-ocropy-deskew -I OCR-D-BIN-DENOISE -O OCR-D-BIN-DENOISE-DESKEW -P level-of-operation page"
    "sbb-textline-detector -I OCR-D-BIN-DENOISE-DESKEW -O OCR-D-SEG-LINE -P model /home/mn/Desktop/sbbmodels/mixed"
    "cis-ocropy-dewarp -I OCR-D-SEG-LINE -O OCR-D-SEG-LINE-RESEG-DEWARP"
    "calamari-recognize -I OCR-D-SEG-LINE-RESEG-DEWARP -O OCR-D-OCR -P checkpoint /home/mn/Desktop/ocrd_calamari/gt4histocr-calamari/\*.ckpt.json"
        
    Region segmentation output: [image attachment]

    Line segmentation output: [image attachment]

    Text output:
    ) weglaſſen. — Und — — großen, oder bald einen groͤßern, kleinern Raum dazwiſchen laſſen. ..C 2 ..H. A.. .. ſ. A
    rts, ſondern gerade auf das Papier
    rr uue— rſ — ewoͤhnlichſte Schrift iſt die Current— n .nehr von der Rechten zur Linken g me o ene i die unter oder uͤber die Linie hervor— uchſtaben alle gleich weit hervorragen. roßer Fehler, wenn die Buchſtaben zu br ao ſ — ſ o i ui — in Wort ausmachen, einzeln zu ſchrei— ern ſie muͤſſen, ſo viel moͤglich iſt, ſo en vorhergehenden, als mit den fol— — . ſRa o. ſ X2 itt — — —

    The input image for the example page and the produced PAGE XML can be found here in case it helps.

    bug 
    opened by maxnth 11
Owner
QURATOR-SPK
Curation Technologies