ocroseg - This is a deep learning model for page layout analysis / segmentation.

There are many different ways in which you can train and run it, but by default, it will simply return the text lines in a page image.

Segmentation

Segmentation is carried out using the ocroseg.Segmenter class. This needs a model that you can download or train yourself.

%%bash
model=lowskew-000000259-011440.pt
test -f $model || wget --quiet -nd https://storage.googleapis.com/tmb-models/$model

%pylab inline
rc("image", cmap="gray", interpolation="bicubic")
figsize(10, 10)

Populating the interactive namespace from numpy and matplotlib

The Segmenter object loads a trained deep learning model and uses it to segment page images.

import ocroseg
seg = ocroseg.Segmenter("lowskew-000000259-011440.pt")
seg.model
Sequential(
  (0): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True)
  (2): ReLU()
  (3): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
  (4): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (5): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True)
  (6): ReLU()
  (7): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
  (8): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (9): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
  (10): ReLU()
  (11): LSTM2(
    (hlstm): RowwiseLSTM(
      (lstm): LSTM(64, 32, bidirectional=1)
    )
    (vlstm): RowwiseLSTM(
      (lstm): LSTM(64, 32, bidirectional=1)
    )
  )
  (12): Conv2d(64, 32, kernel_size=(1, 1), stride=(1, 1))
  (13): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True)
  (14): ReLU()
  (15): LSTM2(
    (hlstm): RowwiseLSTM(
      (lstm): LSTM(32, 32, bidirectional=1)
    )
    (vlstm): RowwiseLSTM(
      (lstm): LSTM(64, 32, bidirectional=1)
    )
  )
  (16): Conv2d(64, 1, kernel_size=(1, 1), stride=(1, 1))
  (17): Sigmoid()
)

Let's segment a page with this.

image = 1.0 - imread("testdata/W1P0.png")[:2000]
print(image.shape)
imshow(image)
(2000, 2592)

The extract_textlines method returns a list of text line images, bounding boxes, etc.

lines = seg.extract_textlines(image)
imshow(lines[0]['image'])
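
As a rough illustration of working with the returned list, the sketch below drops records whose line image is very small. It relies only on the 'image' entry used above; the function name, the thresholds, and the stand-in data are assumptions made for this example and are not part of ocroseg.

```python
import numpy as np

def filter_lines(lines, min_height=8, min_width=16):
    """Keep only records whose 'image' array is at least min_height x min_width.

    `lines` is assumed to be a list of dicts with an 'image' entry,
    as in lines[0]['image'] above. The thresholds are illustrative.
    """
    kept = []
    for record in lines:
        height, width = record["image"].shape[:2]
        if height >= min_height and width >= min_width:
            kept.append(record)
    return kept

# Stand-in data shaped like the segmenter output (not real results).
fake_lines = [
    {"image": np.zeros((40, 900))},
    {"image": np.zeros((3, 5))},      # too small, probably noise
    {"image": np.zeros((35, 600))},
]
kept = filter_lines(fake_lines)
print(len(kept))
```

With real output, the same call would be filter_lines(lines) on the list returned by seg.extract_textlines(image).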

The segmenter accomplishes this by predicting seeds for each text line. With a bit of mathematical morphology, these seeds are then extended into a text line segmentation.

imshow(seg.lines)
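
The seed-then-extend idea can be sketched with SciPy as follows. This is not the code ocroseg uses: the threshold, the growth radius, and the nearest-seed rule (via a Euclidean distance transform) are assumptions chosen to keep the example short.

```python
import numpy as np
from scipy import ndimage as ndi

def seeds_to_segmentation(seed_prob, threshold=0.5, max_dist=10):
    """Label thresholded seeds, then grow each label by up to max_dist pixels.

    seed_prob: 2D array of per-pixel seed probabilities (hypothetical input).
    Returns (label image, number of seeds); 0 marks unassigned pixels.
    """
    seeds = seed_prob > threshold
    labels, count = ndi.label(seeds)          # one integer label per seed
    if count == 0:
        return labels, 0
    # For every pixel, find the distance to and the position of the nearest seed pixel.
    dist, (iy, ix) = ndi.distance_transform_edt(~seeds, return_indices=True)
    grown = labels[iy, ix]                    # copy the label of the nearest seed
    grown[dist > max_dist] = 0                # do not grow further than max_dist
    return grown, count

# Two synthetic horizontal seeds standing in for two text lines.
seed_prob = np.zeros((40, 60), dtype=np.float32)
seed_prob[10, 5:55] = 0.9
seed_prob[30, 5:55] = 0.9
grown, count = seeds_to_segmentation(seed_prob)
print(count, grown[12, 20], grown[35, 20], grown[0, 0])
```

Pixels close to a seed take that seed's label, and pixels farther than max_dist from every seed stay at 0.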

Training

The text line segmenter is trained using pairs of page images and line images stored in tar files.

%%bash
tar -ztvf testdata/framedlines.tgz | sed 6q
-rw-rw-r-- tmb/tmb      110404 2017-03-19 16:47 A001BIN.framed.png
-rw-rw-r-- tmb/tmb       10985 2017-03-16 16:15 A001BIN.lines.png
-rw-rw-r-- tmb/tmb       74671 2017-03-19 16:47 A002BIN.framed.png
-rw-rw-r-- tmb/tmb        8528 2017-03-16 16:15 A002BIN.lines.png
-rw-rw-r-- tmb/tmb      147716 2017-03-19 16:47 A003BIN.framed.png
-rw-rw-r-- tmb/tmb       12023 2017-03-16 16:15 A003BIN.lines.png
tar: write error

from dlinputs import tarrecords
sample = next(tarrecords.tariterator(open("testdata/framedlines.tgz", "rb")))
subplot(121); imshow(sample["framed.png"])
subplot(122); imshow(sample["lines.png"])
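
To make the pairing convention concrete, here is a minimal sketch that groups tar members by the part of the file name before the first dot, using only the standard library. It is a stand-in for illustration and not the dlinputs implementation; the function names and the in-memory demo archive are assumptions.

```python
import io
import posixpath
import tarfile

def iter_samples(fileobj):
    """Yield one dict per key, grouping consecutive members such as
    A001BIN.framed.png and A001BIN.lines.png under the key 'A001BIN'."""
    current_key, sample = None, {}
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            name = posixpath.basename(member.name)
            key, _, suffix = name.partition(".")
            if key != current_key and sample:
                yield sample
                sample = {}
            current_key = key
            sample["__key__"] = key
            sample[suffix] = tar.extractfile(member).read()  # raw file bytes
    if sample:
        yield sample

def make_demo_archive():
    """Build a tiny gzipped tar in memory with placeholder file contents."""
    buf = io.BytesIO()
    names = ["A001BIN.framed.png", "A001BIN.lines.png",
             "A002BIN.framed.png", "A002BIN.lines.png"]
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in names:
            data = name.encode("ascii")
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf

samples = list(iter_samples(make_demo_archive()))
print([s["__key__"] for s in samples])
```

Each yielded dict holds the undecoded bytes of the page image under "framed.png" and of the line image under "lines.png"; decoding them into arrays is left out here.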

There are also some tools for data augmentation.

Generally, you can train these kinds of segmenters on any kind of image data, though they work best on properly binarized, rotation- and skew-normalized page images. Note that by convention, pages are white on black. You need to make sure that the model you load matches the kinds of pages you are trying to segment.
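
A small sketch of how a scanned page might be brought into the white-on-black convention, in the same spirit as the `1.0 - imread(...)` call used earlier. The function name and the median-based test for a light background are assumptions made for this example, not part of ocroseg.

```python
import numpy as np

def to_white_on_black(page):
    """Return a float32 image in [0, 1] with bright text on a dark background."""
    page = np.asarray(page, dtype=np.float32)
    if page.max() > 1.0:
        page = page / 255.0            # assume 8-bit input
    # Heuristic (an assumption): background pixels dominate a text page,
    # so a median above 0.5 suggests a light background that needs inverting.
    if np.median(page) > 0.5:
        page = 1.0 - page
    return page

# Synthetic 8-bit scan: white paper with one dark stroke.
scan = np.full((100, 80), 255, dtype=np.uint8)
scan[40:45, 10:70] = 0
normalized = to_white_on_black(scan)
print(normalized.dtype, normalized[0, 0], normalized[42, 20])
```

An image that is already white on black passes through unchanged, so the function can be applied without knowing the polarity in advance, as long as the heuristic holds for the pages in question.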

The actual models used are pretty complex and require LSTMs to function well, but for demonstration purposes, let's define and use a tiny layout analysis model. Look in bigmodel.py for a realistic model.

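%%writefile tinymodel.py
from torch import nn

def make_model():
    """Build a tiny convolutional layout model (for demonstration only)."""
    r = 3  # kernel size; padding=r//2 keeps the spatial size unchanged
    model = nn.Sequential(
        nn.Conv2d(1, 8, r, padding=r//2),
        nn.ReLU(),
        nn.MaxPool2d(2, 2),
        nn.Conv2d(8, 1, r, padding=r//2),
        nn.Sigmoid()
    )
    return model

Writing tinymodel.py
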
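
For intuition about what this tiny model produces, its spatial output size can be worked out by hand with the standard output-size formula for convolution and pooling layers (dilation 1). The helper names below are made up for this example, and the input size is that of a 3300 x 2592 page such as those in testdata/framedlines.tgz.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one axis for a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

def tiny_model_output_shape(height, width):
    """Trace the spatial shape through the three size-changing layers."""
    shape = (height, width)
    # Conv2d(1, 8, 3, padding=1): keeps the size
    shape = tuple(conv_out(s, kernel=3, padding=1) for s in shape)
    # MaxPool2d(2, 2): halves each axis
    shape = tuple(conv_out(s, kernel=2, stride=2) for s in shape)
    # Conv2d(8, 1, 3, padding=1): keeps the size
    shape = tuple(conv_out(s, kernel=3, padding=1) for s in shape)
    return shape

print(tiny_model_output_shape(3300, 2592))  # (1650, 1296)
```

The single pooling layer is what halves the resolution; ReLU and Sigmoid act per pixel and leave the shape alone.
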
%%bash
./ocroseg-train -d testdata/framedlines.tgz --maxtrain 10 -M tinymodel.py --display 0
raw sample:
__key__ 'A001BIN'
__source__ 'testdata/framedlines.tgz'
lines.png float32 (3300, 2592)
png float32 (3300, 2592)

preprocessed sample:
__key__ <type 'list'> ['A002BIN']
__source__ <type 'list'> ['testdata/framedlines.tgz']
input float32 (1, 3300, 2592, 1)
mask float32 (1, 3300, 2592, 1)
output float32 (1, 3300, 2592, 1)

ntrain 0
model:
Sequential(
  (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU()
  (2): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
  (3): Conv2d(8, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (4): Sigmoid()
)

0 0 ['A006BIN'] 0.24655306 ['A006BIN'] 0.31490618 0.55315816 lr 0.03
1 1 ['A007BIN'] 0.24404158 ['A007BIN'] 0.30752876 0.54983306 lr 0.03
2 2 ['A004BIN'] 0.24024434 ['A004BIN'] 0.31007746 0.54046077 lr 0.03
3 3 ['A008BIN'] 0.23756175 ['A008BIN'] 0.30573484 0.5392694 lr 0.03
4 4 ['A00LBIN'] 0.22300518 ['A00LBIN'] 0.28594157 0.52989864 lr 0.03
5 5 ['A00MBIN'] 0.22032338 ['A00MBIN'] 0.28086954 0.52204597 lr 0.03
6 6 ['A00DBIN'] 0.22794804 ['A00DBIN'] 0.27466372 0.512208 lr 0.03
7 7 ['A009BIN'] 0.22404794 ['A009BIN'] 0.27621177 0.51116604 lr 0.03
8 8 ['A001BIN'] 0.22008553 ['A001BIN'] 0.27836022 0.5008192 lr 0.03
9 9 ['A00IBIN'] 0.21842314 ['A00IBIN'] 0.26755702 0.4992323 lr 0.03
Comments
  • cannot find module named dlmodels?

    seg = ocroseg.Segmenter("lowskew-000000259-011440.pt")
    Traceback (most recent call last):
      File "", line 1, in <module>
      File "ocroseg/segmentation.py", line 300, in __init__
        self.model = torch.load(mname)
      File "/home/sunday/dingyk/tf2/local/lib/python2.7/site-packages/torch/serialization.py", line 358, in load
        return _load(f, map_location, pickle_module)
      File "/home/sunday/dingyk/tf2/local/lib/python2.7/site-packages/torch/serialization.py", line 542, in _load
        result = unpickler.load()
      File "/home/sunday/dingyk/tf2/local/lib/python2.7/site-packages/dltrainers/__init__.py", line 4, in <module>
        from trainers import *
      File "/home/sunday/dingyk/tf2/local/lib/python2.7/site-packages/dltrainers/trainers.py", line 16, in <module>
        import dlmodels as dlm
    ImportError: No module named dlmodels

    opened by happog 3
  • ocroseg-train [ unrecognized arguments: -M bigmodel.py ]

    @tmbdev thank you for your hard work.

    When trying to train a new model:

    home@home-lnx:~/ocroseg$ ocroseg-train -d testdata/framedlines.tgz --maxtrain 10 -M bigmodel.py --display 0
    usage: train a page segmenter [-h] [-l LR] [-b BATCHSIZE] [-o OUTPUT]
    .
    .
    .
    train a page segmenter: error: unrecognized arguments: -M bigmodel.py
    

    Some help would be appreciated @felarof99 @dbmdz @kba @leeoniya

    opened by ghost 1
  • CPU version of ocroseg [Cannot initialize CUDA without ATen_cuda library]

    @tmbdev When running ocroseg-train -d framedlines.tgz, an error message appears, maybe because I don't have an NVIDIA graphics card. Any help is appreciated:

    (python27) home@home-lnx:~/ocroseg/testdata$ ocroseg-train -d framedlines.tgz
    raw sample:
    __key__ 'A001BIN'
    __source__ 'framedlines.tgz'
    lines.png float32 (3300, 2592)
    png float32 (3300, 2592)
    
    preprocessed sample:
    __key__ <type 'list'> ['A006BIN']
    __source__ <type 'list'> ['framedlines.tgz']
    input float32 (1, 3300, 2592, 1)
    mask float32 (1, 3300, 2592, 1)
    output float32 (1, 3300, 2592, 1)
    
    Traceback (most recent call last):
      File "/home/home/anaconda2/envs/python27/bin/ocroseg-train", line 184, in <module>
        model.cuda()
      File "/home/home/anaconda2/envs/python27/lib/python2.7/site-packages/torch/nn/modules/module.py", line 258, in cuda
        return self._apply(lambda t: t.cuda(device))
      File "/home/home/anaconda2/envs/python27/lib/python2.7/site-packages/torch/nn/modules/module.py", line 185, in _apply
        module._apply(fn)
      File "/home/home/anaconda2/envs/python27/lib/python2.7/site-packages/torch/nn/modules/module.py", line 191, in _apply
        param.data = fn(param.data)
      File "/home/home/anaconda2/envs/python27/lib/python2.7/site-packages/torch/nn/modules/module.py", line 258, in <lambda>
        return self._apply(lambda t: t.cuda(device))
    RuntimeError: Cannot initialize CUDA without ATen_cuda library. PyTorch splits its backend into two shared libraries: a CPU library and a CUDA library; this error has occurred because you are trying to use some CUDA functionality, but the CUDA library has not been loaded by the dynamic linker for some reason.  The CUDA library MUST be loaded, EVEN IF you don't directly use any symbols from the CUDA library! One common culprit is a lack of -Wl,--no-as-needed in your link arguments; many dynamic linkers will delete dynamic library dependencies if you don't depend on any of their symbols.  You can check if this has occurred by using ldd on your binary to see if there is a dependency on *_cuda.so library.
    
    opened by ghost 0
  • ocroseg-train stops

    My machine specification is: 8 CPU / Quadro M4000 / 30 GB RAM / 100 GB. When running ocroseg-train, it gets stopped after a couple of minutes. What's wrong?

    (python2) paperspace@ps7w0ic92:~/ocropus3/ocroseg$ ocroseg-train -d ./testdata/framedlines.tgz --maxtrain 10 --display 0
    raw sample:
    __key__ 'A001BIN'
    __source__ './testdata/framedlines.tgz'
    lines.png float32 (3300, 2592)
    png float32 (3300, 2592)
    
    preprocessed sample:
    __key__ <type 'list'> ['A003BIN']
    __source__ <type 'list'> ['./testdata/framedlines.tgz']
    input float32 (1, 3300, 2592, 1)
    mask float32 (1, 3300, 2592, 1)
    output float32 (1, 3300, 2592, 1)
    
    ntrain 0
    model:
    Sequential(
      (0): Conv2d(1, 10, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
      (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (4): Conv2d(10, 20, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (5): BatchNorm2d(20, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (6): ReLU()
      (7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (8): Conv2d(20, 40, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (9): BatchNorm2d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (10): ReLU()
      (11): LSTM2(
        (hlstm): RowwiseLSTM(
          (lstm): LSTM(40, 20, bidirectional=1)
        )
        (vlstm): RowwiseLSTM(
          (lstm): LSTM(40, 20, bidirectional=1)
        )
      )
      (12): Conv2d(40, 20, kernel_size=(1, 1), stride=(1, 1))
      (13): BatchNorm2d(20, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (14): ReLU()
      (15): LSTM2(
        (hlstm): RowwiseLSTM(
          (lstm): LSTM(20, 20, bidirectional=1)
        )
        (vlstm): RowwiseLSTM(
          (lstm): LSTM(40, 20, bidirectional=1)
        )
      )
      (16): Conv2d(40, 1, kernel_size=(1, 1), stride=(1, 1))
      (17): Sigmoid()
    )
    
    /home/paperspace/anaconda3/envs/python2/lib/python2.7/site-packages/dltrainers/layers.py:421: UserWarning: volatile was removed (Variable.volatile is always False)
      volatile = not isinstance(img, Variable) or img.volatile
    __key__ <type 'list'> ['A002BIN']
    __source__ <type 'list'> ['./testdata/framedlines.tgz']
    input float32 (1, 3300, 2592, 1)
    mask float32 (1, 3300, 2592, 1)
    output float32 (1, 3300, 2592, 1)
    
    __key__ <type 'list'> ['A00FBIN']
    __source__ <type 'list'> ['./testdata/framedlines.tgz']
    input float32 (1, 3300, 2592, 1)
    mask float32 (1, 3300, 2592, 1)
    output float32 (1, 3300, 2592, 1)
    .
    .
    .
    .
    __key__ <type 'list'> ['A00MBIN']
    __source__ <type 'list'> ['./testdata/framedlines.tgz']
    input float32 (1, 3300, 2592, 1)
    mask float32 (1, 3300, 2592, 1)
    output float32 (1, 3300, 2592, 1)
    
    __key__ <type 'list'> ['A004BIN']
    __source__ <type 'list'> ['./testdata/framedlines.tgz']
    input float32 (1, 3300, 2592, 1)
    mask float32 (1, 3300, 2592, 1)
    output float32 (1, 3300, 2592, 1)
    
    Killed
    (python2) paperspace@ps7w0ic92:~/ocropus3/ocroseg$ 
    
    opened by ghost 1
Owner
NVIDIA Research Projects