docstrum

Overview

Docstrum Algorithm

Getting Started

This repository contains an implementation of the Docstrum algorithm presented by O’Gorman (1993).

Disclaimer

This source code builds on the work of Chadoliver. The original code is available at https://github.com/chadoliver/cosc428-structor.

Objective

This project aims to segment a document image into meaningful components. The target domain is historical machine-printed and handwritten document images.

Dependencies

  • Python 2.7
  • Packages:
    • numpy
    • cv2 (OpenCV)

Process

Evaluation

  • TBD

Citing Docstrum

O'Gorman, L., 1993. The document spectrum for page layout analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(11), pp.1162-1173.

@article{o1993document,
  title={The document spectrum for page layout analysis},
  author={O'Gorman, Lawrence},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  volume={15},
  number={11},
  pages={1162--1173},
  year={1993},
  publisher={IEEE}
}

Notes

How to remove .DS_Store

find . -name '.DS_Store' -type f -delete
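
For anyone without a Unix `find` at hand, the same cleanup can be sketched in Python. This is a minimal sketch, not part of the repository: the function name `remove_ds_store` is hypothetical, and it uses only `os` so that it should run on both Python 2.7 and Python 3.

```python
import os


def remove_ds_store(root="."):
    """Delete regular files named .DS_Store under root and return their paths.

    Hypothetical helper mirroring: find . -name '.DS_Store' -type f -delete
    """
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if ".DS_Store" not in filenames:
            continue
        path = os.path.join(dirpath, ".DS_Store")
        # Like `-type f`, skip symbolic links and only delete regular files.
        if os.path.isfile(path) and not os.path.islink(path):
            os.remove(path)
            removed.append(path)
    return removed
```

Calling `remove_ds_store(".")` from the repository root returns the list of deleted paths, which makes it easy to check what was removed.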
Comments
  • Docstrum.ipynb fails to run

    When I run Docstrum.ipynb in Jupyter Notebook, I get this error:

    NameError Traceback (most recent call last) in ()
    ----> 1 peakind = signal.find_peaks_cwt([10,20,20,50,20,20,30], np.arange(1,10))

    NameError: name 'signal' is not defined

    How can I solve this?

    opened by polynesia 1
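
A likely cause of the error above is a missing import. The sketch below assumes that `signal` in the notebook refers to `scipy.signal`, which provides `find_peaks_cwt`, and that `np` is NumPy. SciPy is not listed under Dependencies, so it would need to be installed separately.

```python
import numpy as np
from scipy import signal  # assumption: the notebook's `signal` is scipy.signal

# Same call as in the failing cell, with the input wrapped in a NumPy array.
data = np.array([10, 20, 20, 50, 20, 20, 30], dtype=float)
peakind = signal.find_peaks_cwt(data, np.arange(1, 10))

# peakind holds the indices into `data` that were detected as peaks.
print(peakind)
```

If the notebook defines these imports in an earlier cell, running that cell first (or using "Run All") should have the same effect.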
Owner
Chulwoo Mike Pack
Ph.D. Student at University of Nebraska - Lincoln. Research Topic: Image Processing & Machine Learning