Page to PAGE Layout Analysis Tool

Overview

P2PaLA

Page to PAGE Layout Analysis (P2PaLA) is a toolkit for Document Layout Analysis based on Neural Networks.

💥 Try our new DEMO for online baseline detection. ❗❗

If you find this toolkit useful in your research, please cite:

@misc{p2pala2017,
  author = {Lorenzo Quirós},
  title = {P2PaLA: Page to PAGE Layout Analysis toolkit},
  year = {2017},
  publisher = {GitHub},
  note = {GitHub repository},
  howpublished = {\url{https://github.com/lquirosd/P2PaLA}},
}

Check this paper on arXiv for more details.

Requirements

  • Linux (OS X may work, but it is untested).
  • Python (2.7 or 3.6; a conda virtual environment is recommended)
  • NumPy
  • PyTorch (1.0). A PyTorch 0.3.1-compatible version is available on this branch.
  • OpenCV (3.4.5.20).
  • NVIDIA GPU + CUDA CuDNN (CPU mode and CUDA without CuDNN work, but are not recommended for training).
  • tensorboard-pytorch (v0.9) [Optional]: pip install tensorboardX. A different conda env is recommended to keep TensorFlow separate from PyTorch.

Install

python setup.py install

To install only the Python dependencies, use the requirements file: conda env create --file conda_requirements.yml
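Before running the tool, it can help to confirm which of the packages listed above are actually importable. The sketch below is a hypothetical helper, not part of P2PaLA; it assumes the import names numpy, torch, cv2 (OpenCV) and tensorboardX, and it only reports what it finds rather than enforcing the exact versions listed under Requirements.

```python
import importlib

# Import names for the packages listed under Requirements (assumed: cv2 is
# the import name of the OpenCV bindings; tensorboardX is optional).
REQUIRED_MODULES = ["numpy", "torch", "cv2"]
OPTIONAL_MODULES = ["tensorboardX"]


def module_version(name):
    """Return the module's version string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except Exception:  # broken installs can fail with more than ImportError
        return None
    return str(getattr(module, "__version__", "unknown"))


def check_environment():
    """Map each module name to its version (None means missing)."""
    report = {}
    for name in REQUIRED_MODULES + OPTIONAL_MODULES:
        report[name] = module_version(name)
    # Ask PyTorch about CUDA only when PyTorch itself imported cleanly.
    if report["torch"] is not None:
        import torch
        report["cuda_available"] = str(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for name, version in sorted(check_environment().items()):
        print("{:<15} {}".format(name, version if version is not None else "MISSING"))
```

A "MISSING" entry for tensorboardX only disables the display logger; missing required modules will stop P2PaLA from running.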

Usage

  1. Input data must follow the folder structure data_tag/page, where images must be placed in the data_tag folder and XML files in page. For example:
mkdir -p data/{train,val,test,prod}/page;
tree data;
data
├── prod
│   ├── page
│   │   ├── prod_0.xml
│   │   └── prod_1.xml
│   ├── prod_0.jpg
│   └── prod_1.jpg
├── test
│   ├── page
│   │   ├── test_0.xml
│   │   └── test_1.xml
│   ├── test_0.jpg
│   └── test_1.jpg
├── train
│   ├── page
│   │   ├── train_0.xml
│   │   └── train_1.xml
│   ├── train_0.jpg
│   └── train_1.jpg
└── val
    ├── page
    │   ├── val_0.xml
    │   └── val_1.xml
    ├── val_0.jpg
    └── val_1.jpg
  2. Run the tool.
python P2PaLA.py --config config.txt --tr_data ./data/train --te_data ./data/test --log_comment "_foo"

❗ Pre-trained models are available here.
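The folder layout in step 1 pairs each image in data_tag/ with a PAGE-XML file of the same base name in data_tag/page/. A quick pre-flight check for that pairing could look like the following sketch; the function name and the set of accepted image extensions are assumptions for illustration, not part of P2PaLA.

```python
import os

# Image extensions treated as page images (an assumption; adjust to your data).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def check_data_folder(data_tag):
    """Compare images in data_tag/ with PAGE-XML files in data_tag/page/.

    Returns (images_without_xml, xml_without_image) as sorted lists of base names.
    """
    images = {
        os.path.splitext(name)[0]
        for name in os.listdir(data_tag)
        if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
    }
    page_dir = os.path.join(data_tag, "page")
    xmls = set()
    if os.path.isdir(page_dir):
        xmls = {
            os.path.splitext(name)[0]
            for name in os.listdir(page_dir)
            if name.lower().endswith(".xml")
        }
    return sorted(images - xmls), sorted(xmls - images)


if __name__ == "__main__":
    # Hypothetical usage for the folders from the example tree above.
    for split in ("./data/train", "./data/val", "./data/test"):
        if not os.path.isdir(split):
            print(split, "does not exist")
            continue
        missing_xml, missing_img = check_data_folder(split)
        print(split, "images without XML:", missing_xml, "XML without image:", missing_img)
```

Both returned lists should be empty for the folders passed to --tr_data and --te_data.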

  3. Use TensorBoard to visualize the training status:
tensorboard --logdir ./work/runs
  4. The resulting PAGE-XML files will be written to "./work/results/test/".

We recommend Transkribus or nw-page-editor to visualize and edit PAGE-XML files.
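To inspect the detected baselines programmatically rather than in an editor, the standard library is enough. The sketch below assumes the output uses the usual PAGE-XML TextLine and Baseline elements, where points is a space-separated list of x,y pairs; it ignores the XML namespace so it does not depend on a particular PAGE schema version. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_points(points):
    """Turn a PAGE points string like '10,20 30,40' into [(10, 20), (30, 40)]."""
    pairs = []
    for pair in points.split():
        x, y = pair.split(",")
        # float() first so values written as '12.0' are tolerated too.
        pairs.append((int(float(x)), int(float(y))))
    return pairs


def read_baselines(xml_file):
    """Return {TextLine id: [(x, y), ...]} for every TextLine that has a Baseline."""
    root = ET.parse(xml_file).getroot()
    baselines = {}
    for line in root.iter():
        if local_name(line.tag) != "TextLine":
            continue
        for child in line:
            if local_name(child.tag) == "Baseline":
                baselines[line.get("id")] = parse_points(child.get("points", ""))
    return baselines
```

Point read_baselines at one of the XML files written under ./work/results/test/ to get the baseline coordinates of each text line.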

  5. For details about the arguments and the config file, see docs or run python P2PaLA.py -h.
  6. For more detailed examples, see egs:
    • Bozen dataset see
    • cBAD complex competition dataset see
    • OHG dataset see

License

GNU General Public License v3.0. See LICENSE for the full text.

Acknowledgments

Code is inspired by pix2pix and pytorch-CycleGAN-and-pix2pix.

Comments
  • RTX cards require minimum PyTorch 1.0 [CUDNN_STATUS_EXECUTION_FAILED]

    On my Linux Mint 19.1 machine with an RTX 2070:

    When trying to recognize using the default installation:

    (p3p) home@home-lnx:~/Desktop/programs/P2PaLA$ python P2PaLA.py --config config_ALAR_min_model_17_12_18.txt --prev_model ALAR_min_model_17_12_18.pth --prod_data ./images/
    2019-01-21 13:42:19,280 - optparse - INFO - Reading configuration from config_ALAR_min_model_17_12_18.txt
    2019-01-21 13:42:19,282 - P2PaLA - INFO - Working on prod inference...
    2019-01-21 13:42:19,283 - P2PaLA - INFO - Results will be saved to ./work/results/prod
    2019-01-21 13:42:19,599 - P2PaLA - INFO - Resumming from model ALAR_min_model_17_12_18.pth
    /home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/cuda/__init__.py:95: UserWarning: 
        Found GPU0 GeForce RTX 2070 which requires CUDA_VERSION >= 9000 for
         optimal performance and fast startup time, but your PyTorch was compiled
         with CUDA_VERSION 8000. Please install the correct PyTorch binary
         using instructions from http://pytorch.org
        
      warnings.warn(incorrect_binary_warn % (d, name, 9000, CUDA_VERSION))
    

    So I installed latest torch and torchvision:

    (p3p) home@home-lnx:~/Desktop/programs/P2PaLA$ pip install --ignore-installed torch torchvision
    

    Then ran recognition:

    (p3p) home@home-lnx:~/Desktop/programs/P2PaLA$ python P2PaLA.py --config config_ALAR_min_model_17_12_18.txt --prev_model ALAR_min_model_17_12_18.pth --prod_data ./images/
    /home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
      warnings.warn(warning.format(ret))
    2019-01-21 13:58:31,771 - optparse - INFO - Reading configuration from config_ALAR_min_model_17_12_18.txt
    2019-01-21 13:58:31,773 - P2PaLA - INFO - Working on prod inference...
    2019-01-21 13:58:31,774 - P2PaLA - INFO - Results will be saved to ./work/results/prod
    2019-01-21 13:58:32,125 - P2PaLA - INFO - Resumming from model ALAR_min_model_17_12_18.pth
    2019-01-21 13:58:34,859 - P2PaLA - INFO - Preprocessing data from ./images/
    P2PaLA.py:1195: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
      pr_x = Variable(sample["image"], volatile=True)
    THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument
    2019-01-21 13:58:35,463 - P2PaLA - INFO - Production stage done. total time taken: 0.604010820388794
    2019-01-21 13:58:35,463 - P2PaLA - INFO - Average time per page: 0.604010820388794
    2019-01-21 13:58:35,463 - P2PaLA - INFO - All Done...
    

    Now the problem occurs when trying to train:

    (p3p) home@home-lnx:~/Desktop/programs/P2PaLA$ python P2PaLA.py --config config_BL_only.txt --tr_data ./data/train --te_data ./data/test --log_comment "_foo"
    /home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
      warnings.warn(warning.format(ret))
    2019-01-21 14:06:09,788 - optparse - INFO - Reading configuration from config_BL_only.txt
    2019-01-21 14:06:09,789 - optparse - DEBUG - Creating output dir: ./work_BL_only
    2019-01-21 14:06:09,790 - optparse - DEBUG - Creating checkpoints dir: ./work_BL_only/checkpoints
    2019-01-21 14:06:09,790 - P2PaLA - INFO - Working on training stage...
    2019-01-21 14:06:09,791 - P2PaLA - WARNING - tensorboardX is not installed, display logger set to OFF.
    2019-01-21 14:06:09,791 - P2PaLA - INFO - Preprocessing data from ./data/train
    /home/home/Desktop/programs/P2PaLA/nn_models/models.py:293: UserWarning: nn.init.uniform is now deprecated in favor of nn.init.uniform_.
      init.uniform(m.weight.data, 0.0, 0.02)
    /home/home/Desktop/programs/P2PaLA/nn_models/models.py:298: UserWarning: nn.init.uniform is now deprecated in favor of nn.init.uniform_.
      init.uniform(m.weight.data, 1.0, 0.02)
    THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument
    Traceback (most recent call last):
      File "P2PaLA.py", line 1262, in <module>
        main()
      File "P2PaLA.py", line 606, in main
        epoch_lossD += d_loss.data[0]
    IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
    
    opened by ghost 14
  • Is it possible to fine-tune the existing model for different regions?

    Hi,

    I want to extend this work to detect regions in other documents that use region types other than those defined in the regions list (regions = ["$tip", "$par", "$not", "$nop", "$pag"]). Is this possible? If so, what changes would be needed? I am new to segmentation problems, so any guidance is welcome.

    @lquirosd

    opened by lordzuko 6
  • How to see output polygon drawn on an input image?

    Please advise on a utility to draw the output baseline polygons on the source image. I see an example on your online page (by the way, it doesn't work at the moment; it skips the Process button), but I can't find a way to view it on my machine. If there is a built-in function, please give the CLI command, e.g. cmd data/source_img.jpg work/results_prod/page/source_img.xml
    -> to draw the contours on the source image. In my run there are zero-size jpg files in work/results/prod.
    This is probably some kind of error, so it would be useful to draw the XML lines over the image(s).

    opened by longwall 4
  • Baseline + polygon detection of handwriting

    Hello. I have run P2PaLA successfully on typewritten and printed documents with the default model, but I am not getting very good results on handwriting. Is it possible to train a model for handwriting, and if so, what kind of ground truth is required, and how much?

    opened by stevethearkiv 4
  • TextLine region

    @lquirosd

    Currently, I am trying to train P2PaLA to recognize "TextLine" regions rather than baselines. How exactly can I do that? How can I select the default TextLine region itself?

    An example PAGE-XML file is attached along with my training config file: sample.zip

    Waiting for your reply

    opened by mrocr 4
  • "No region type defined for r1 at 00001096"

    I've used P2PaLA to train a Document Layout Analysis model for zone segmentation with the PRImA Layout Analysis Dataset. The PRImA dataset uses PAGE XML with the 2010 schema version. When the model loads data from the corpus, I get:

    No region type defined for r1 at 0000001096 Element type "Node" undefined on color dic, set to default=175

    This message appears for every <TextRegion> in the XML file. After the training phase finished, I checked result/test and nothing was predicted. I don't know what it means. Please help me. Thanks.

    opened by vndee 4
  • Error while running the pre-trained model in Google Colab

    The error is shown below:

    2020-04-02 09:33:10,254 - optparse - INFO - Reading configuration from config_ALAR_min_model_17_12_18_inference.txt
    2020-04-02 09:33:10,263 - P2PaLA - INFO - Working on prod inference...
    2020-04-02 09:33:10,268 - P2PaLA - INFO - Results will be saved to ./work/results/prod
    2020-04-02 09:33:10,844 - P2PaLA - INFO - Resumming from model ALAR_min_model_17_12_18.pth
    2020-04-02 09:33:11,408 - P2PaLA - INFO - Preprocessing data from ./images
    Premature end of JPEG file
    Traceback (most recent call last):
      File "P2PaLA.py", line 1268, in <module>
        main()
      File "P2PaLA.py", line 1250, in main
        out_folder=res_path,
      File "/content/gdrive/My Drive/P2PaLA-master/data/imgprocess.py", line 187, in gen_page
        os.path.realpath(self.img_data[img_id]), os.path.join(out_folder, img_name)
      File "/content/gdrive/My Drive/P2PaLA-master/data/imgprocess.py", line 503, in symlink_force
        raise e
      File "/content/gdrive/My Drive/P2PaLA-master/data/imgprocess.py", line 497, in symlink_force
        os.symlink(target, link_name)
    OSError: [Errno 95] Operation not supported: '/content/gdrive/My Drive/P2PaLA-master/images/5.jpeg' -> './work/results/prod/5.jpeg'

    opened by akshay94950 3
  • require opencv-python-headless variant

    The requirements.txt currently lists opencv-python, which drags in libraries for windowing systems like X11 – not ideal for headless servers. The Python bindings for OpenCV do offer other build variants like -contrib and -headless though (which are on PyPI for many platforms, and can also be built by ENABLE_HEADLESS=1 python setup.py bdist_wheel).

    Therefore I suggest switching to opencv-python-headless instead.

    opened by bertsky 2
  • Any chance to see more pre-trained models?

    Hello, I played a little with the provided model weights ALAR_min_model_17_12_18.pth. Given the word "min" in the name, I wonder whether there are other models. Do you plan to publish them?

    I've thought of some tricks to improve how well the HTR text regions are covered, but the results are still rough: in many cases, important parts of letters are cropped out. I have neither the hardware nor labeled datasets for training. Could you share a more powerful model?

    opened by longwall 2
  • [Enhancement] Page-XML extractor

    Adapt the script to extract page information/coordinates from PAGE-XML into the formats best known in the industry, such as YOLO and PASCAL VOC.

    Observation: I could help with this task.

    opened by EvertonTomalok 2
  • JoseRPrietoF version?

    @JoseRPrietoF What is different in your version of P2PaLA?

    Modified version to do table segmentation and act separation on the Passau and Chancery corpora.

    • What do you mean by act separation?
    • Also, when you say table segmentation, do you mean it detects baselines and then extracts text?
    opened by ghost 2
  • Bump opencv-python from 3.4.5.20 to 4.2.0.32

    Bumps opencv-python from 3.4.5.20 to 4.2.0.32.

    Release notes

    Sourced from opencv-python's releases.

    4.2.0.32

    OpenCV version 4.2.0.

    Changes:

    • macOS environment updated from xcode8.3 to xcode 9.4
    • macOS now uses Qt 5 instead of Qt 4
    • Nasm version updated to Docker containers
    • multibuild updated

    Fixes:

    • don't use deprecated brew tap-pin, instead refer to the full package name when installing #267
    • replace get_config_var() with get_config_vars() in setup.py #274
    • add workaround for DLL errors in Windows Server #264

    3.4.9.31

    OpenCV version 3.4.9.

    Changes:

    • macOS environment updated from xcode8.3 to xcode 9.4
    • macOS now uses Qt 5 instead of Qt 4
    • Nasm version updated to Docker containers
    • multibuild updated

    Fixes:

    • don't use deprecated brew tap-pin, instead refer to the full package name when installing #267
    • replace get_config_var() with get_config_vars() in setup.py #274
    • add workaround for DLL errors in Windows Server #264

    4.1.2.30

    OpenCV version 4.1.2.

    Changes:

    ... (truncated)

    opened by dependabot[bot] 0
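The IndexError in the RTX issue above comes from PyTorch 0.4+ returning scalar losses as 0-dim tensors, and the volatile warning comes from the removal of Variable(..., volatile=True). A minimal sketch of the two replacements, assuming PyTorch >= 0.4; the helper names add_loss and run_inference are hypothetical stand-ins for the corresponding lines in P2PaLA.py (line 606, epoch_lossD += d_loss.data[0], and line 1195, pr_x = Variable(sample["image"], volatile=True)).

```python
def add_loss(running_total, loss):
    """Accumulate a scalar loss tensor into a Python float.

    Since PyTorch 0.4 a reduced loss is a 0-dim tensor, so the old
    ``loss.data[0]`` raises IndexError; ``loss.item()`` is the replacement.
    """
    return running_total + loss.item()


def run_inference(model, images):
    """Forward pass without autograd bookkeeping.

    ``Variable(x, volatile=True)`` no longer has any effect; wrapping the
    forward pass in ``torch.no_grad()`` gives the same memory savings.
    """
    import torch  # local import keeps add_loss usable without PyTorch installed

    with torch.no_grad():
        return model(images)


# Usage inside a training loop (d_loss being the discriminator loss):
#     epoch_lossD = add_loss(epoch_lossD, d_loss)  # was: epoch_lossD += d_loss.data[0]
```

The nn.init.uniform deprecation warnings in the same log are handled the same way, by calling the in-place init.uniform_ that the warning names.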
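For the "No region type defined" issue above: this assumes that P2PaLA reads a region's type from a Transkribus-style custom attribute of the form structure {type:...;}, whereas the PRImA dataset stores it in the region's type attribute (e.g. type="paragraph"). Under that assumption, a hypothetical preprocessing step could copy type into custom before training, as sketched below; the region names written this way (e.g. paragraph) would then also need to appear in the regions list of your configuration.

```python
import re
import xml.etree.ElementTree as ET

# Matches an existing "structure {... type:...}" entry in a custom attribute.
STRUCTURE_RE = re.compile(r"structure\s*\{[^}]*type:")


def add_structure_types(source, destination):
    """Copy each region's PRImA-style ``type`` attribute into ``custom``.

    Returns the number of regions updated.
    """
    tree = ET.parse(source)
    root = tree.getroot()
    # Keep the original default namespace (e.g. the 2010 PAGE schema) on output.
    if root.tag.startswith("{"):
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])
    updated = 0
    for elem in root.iter():
        if not elem.tag.endswith("Region"):  # TextRegion, ImageRegion, ...
            continue
        region_type = elem.get("type")
        custom = elem.get("custom", "")
        if region_type and not STRUCTURE_RE.search(custom):
            elem.set("custom", (custom + " structure {type:%s;}" % region_type).strip())
            updated += 1
    tree.write(destination, encoding="utf-8", xml_declaration=True)
    return updated
```

Run it over a copy of the page folder so the original PRImA files stay untouched.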
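The Colab traceback above fails inside symlink_force because the Google Drive mount does not support symbolic links (Errno 95). One possible workaround, sketched below under the assumption that a plain copy of the image is acceptable in the results folder, is to fall back to copying; link_or_copy is a hypothetical stand-in, not P2PaLA's actual function. Running P2PaLA from a local directory instead of the Drive mount may also avoid the error.

```python
import errno
import os
import shutil


def link_or_copy(target, link_name):
    """Symlink target -> link_name, falling back to a copy where symlinks fail.

    An existing link_name is replaced first, mirroring a "force" symlink.
    """
    if os.path.lexists(link_name):
        os.remove(link_name)
    try:
        os.symlink(target, link_name)
    except OSError as e:
        # EOPNOTSUPP is errno 95 on Linux (the error in the traceback);
        # EPERM covers platforms that forbid unprivileged symlinks.
        if e.errno not in (errno.EOPNOTSUPP, errno.EPERM):
            raise
        shutil.copy2(target, link_name)
```

Copying costs disk space for every image in the results folder, which is the trade-off for working on filesystems without symlink support.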
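For the Page-XML extractor enhancement above, a conversion from PAGE-XML region polygons to YOLO-style bounding boxes can be sketched with the standard library alone. Assumptions: each region carries a Coords child with a points attribute, the Page element carries imageWidth and imageHeight, and the caller supplies the class mapping; PASCAL VOC output is not covered, and page_to_yolo is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Drop the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def page_to_yolo(xml_file, class_ids):
    """Convert PAGE-XML region polygons into YOLO label lines.

    class_ids maps a region tag name (e.g. "TextRegion") to an integer class.
    Each line is "class x_center y_center width height", normalized by the
    Page element's imageWidth/imageHeight.
    """
    root = ET.parse(xml_file).getroot()
    page = next(e for e in root.iter() if _local(e.tag) == "Page")
    img_w = float(page.get("imageWidth"))
    img_h = float(page.get("imageHeight"))
    lines = []
    for region in page.iter():
        name = _local(region.tag)
        if name not in class_ids:
            continue
        coords = next((c for c in region if _local(c.tag) == "Coords"), None)
        if coords is None or not coords.get("points"):
            continue
        # Axis-aligned bounding box of the polygon.
        xs, ys = zip(*(map(float, p.split(",")) for p in coords.get("points").split()))
        x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
        lines.append("%d %.6f %.6f %.6f %.6f" % (
            class_ids[name],
            (x0 + x1) / 2 / img_w, (y0 + y1) / 2 / img_h,
            (x1 - x0) / img_w, (y1 - y0) / img_h))
    return lines
```

The bounding box discards the polygon shape, so skewed or curved regions become looser boxes than the original PAGE coordinates.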
Owner
Lorenzo Quirós Díaz
Deep learning based page layout analysis

Deep Learning Based Page Layout Analyze This is a Python implementation of page layout analyze tool. The goal of page layout analyze is to segment page

null 186 Dec 29, 2022
ocroseg - This is a deep learning model for page layout analysis / segmentation.

ocroseg This is a deep learning model for page layout analysis / segmentation. There are many different ways in which you can train and run it, but by

NVIDIA Research Projects 71 Dec 6, 2022
a deep learning model for page layout analysis / segmentation.

OCR Segmentation a deep learning model for page layout analysis / segmentation. dependencies tensorflow1.8 python3 dataset: uw3-framed-lines-degraded-

null 99 Dec 12, 2022
A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books.

LAREX LAREX is a semi-automatic open-source tool for layout analysis on early printed books. It uses a rule based connected components approach which

null 162 Jan 5, 2023
Document Layout Analysis Projects

Layout_Analysis Introduction This is an implementation of RLSA and X-Y Cut with OpenCV Dependencies OpenCV 3.0+ How to use Compile with g++ : g++ -std

null 22 Dec 8, 2022
A simple document layout analysis using Python-OpenCV

Run the application: python main.py *Note: For first time running the application, create a folder named "output". The application is a simple documen

Roinand Aguila 109 Dec 12, 2022
Document Layout Analysis

Eynollah Document Layout Analysis Introduction This tool performs document layout analysis (segmentation) from image data and returns the results as P

QURATOR-SPK 198 Dec 29, 2022
PAGE XML format collection for document image page content and more

PAGE-XML PAGE XML format collection for document image page content and more For an introduction, please see the following publication: http://www.pri

PRImA Research Lab 46 Nov 14, 2022
CVPR 2021 Oral paper "LED2-Net: Monocular 360˚ Layout Estimation via Differentiable Depth Rendering" official PyTorch implementation.

LED2-Net This is PyTorch implementation of our CVPR 2021 Oral paper "LED2-Net: Monocular 360˚ Layout Estimation via Differentiable Depth Rendering". Y

Fu-En Wang 83 Jan 4, 2023
Text page dewarping using a "cubic sheet" model

page_dewarp Page dewarping and thresholding using a "cubic sheet" model - see full writeup at https://mzucker.github.io/2016/08/15/page-dewarping.html

Matt Zucker 1.2k Dec 29, 2022
OCR-D-compliant page segmentation

ocrd_segment This repository aims to provide a number of OCR-D-compliant processors for layout analysis and evaluation. Installation In your virtual e

OCR-D 59 Sep 10, 2022
This repository lets you train neural networks models for performing end-to-end full-page handwriting recognition using the Apache MXNet deep learning frameworks on the IAM Dataset.

Handwritten Text Recognition (OCR) with MXNet Gluon These notebooks have been created by Jonathan Chung, as part of his internship as Applied Scientis

Amazon Web Services - Labs 422 Jan 3, 2023
Simple app for visual editing of Page XML files

Name nw-page-editor - Simple app for visual editing of Page XML files. Version: 2021.02.22 Description nw-page-editor is an application for viewing/ed

Mauricio Villegas 27 Jun 20, 2022
Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)

ocr-fileformat Validate and transform between OCR file formats (hOCR, ALTO, PAGE, FineReader) Installation Docker System-wide Usage CLI GUI API Transf

Universitätsbibliothek Mannheim 152 Dec 20, 2022
~1000 book pages + OpenCV + python = page regions identified as paragraphs, lines, images, captions, etc.

cosc428-structor I had an open-ended Computer Vision assignment to complete, and an out-of-copyright book that I wanted to turn into an ebook. Convent

Chad Oliver 45 Dec 6, 2022
Python-based tools for document analysis and OCR

ocropy OCRopus is a collection of document analysis programs, not a turn-key OCR system. In order to apply it to your documents, you may need to do so

OCRopus 3.2k Dec 31, 2022
CellProfiler is an open-source application for biological image analysis

CellProfiler is a free open-source software designed to enable biologists without training in computer vision or programming to quantitatively measure phenotypes from thousands of images automatically.

CellProfiler 732 Dec 23, 2022
A post-processing tool for scanned sheets of paper.

unpaper Originally written by Jens Gulden; see AUTHORS for more information. Licensed under GNU GPL v2; see COPYING for more information. Overview u

null 27 Dec 7, 2022