P2PaLA
Page to PAGE Layout Analysis (P2PaLA) is a toolkit for Document Layout Analysis based on Neural Networks.
If you find this toolkit useful in your research, please cite:
@misc{p2pala2017,
  author = {Lorenzo Quirós},
  title = {P2PaLA: Page to PAGE Layout Analysis toolkit},
  year = {2017},
  publisher = {GitHub},
  note = {GitHub repository},
  howpublished = {\url{https://github.com/lquirosd/P2PaLA}},
}
Check the paper on arXiv for more details.
Requirements
- Linux (OS X may work, but is untested).
- Python (2.7 or 3.6; a conda virtual environment is recommended)
- Numpy
- PyTorch (1.0). PyTorch 0.3.1 compatibility is available on this branch.
- OpenCV (3.4.5.20).
- NVIDIA GPU + CUDA CuDNN (CPU mode and CUDA without CuDNN work, but are not recommended for training).
- tensorboard-pytorch (v0.9) [Optional].
pip install tensorboardX
> A different conda env is recommended to keep TensorFlow separated from PyTorch
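Before training, it can be useful to confirm that PyTorch actually sees CUDA and CuDNN. This is a generic PyTorch sanity check, not part of P2PaLA:
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.backends.cudnn.is_available())"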
Install
python setup.py install
To install the Python dependencies alone, use the requirements file:
conda env create --file conda_requirements.yml
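Putting both steps together, one possible workflow is the following sketch. The environment name is an assumption; use whatever name is set in conda_requirements.yml:
conda env create --file conda_requirements.yml
conda activate P2PaLA   # hypothetical env name, check conda_requirements.yml
python setup.py install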
Usage
- Input data must follow the folder structure data_tag/page, where the images are placed in the data_tag folder and the corresponding PAGE-XML files in the page subfolder (a sketch for copying your own files into this layout follows the tree below). For example:
mkdir -p data/{train,val,test,prod}/page;
tree data;
data
├── prod
│   ├── page
│   │   ├── prod_0.xml
│   │   └── prod_1.xml
│   ├── prod_0.jpg
│   └── prod_1.jpg
├── test
│   ├── page
│   │   ├── test_0.xml
│   │   └── test_1.xml
│   ├── test_0.jpg
│   └── test_1.jpg
├── train
│   ├── page
│   │   ├── train_0.xml
│   │   └── train_1.xml
│   ├── train_0.jpg
│   └── train_1.jpg
└── val
    ├── page
    │   ├── val_0.xml
    │   └── val_1.xml
    ├── val_0.jpg
    └── val_1.jpg
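Note that each image and its PAGE-XML file share the same base name (e.g. train_0.jpg pairs with page/train_0.xml). A hypothetical example of copying your own data into this layout, with placeholder paths and file names:
cp /path/to/my/images/img_0.jpg data/train/
cp /path/to/my/page_xml/img_0.xml data/train/page/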
- Run the tool.
python P2PaLA.py --config config.txt --tr_data ./data/train --te_data ./data/test --log_comment "_foo"
- Pre-trained models are available here.
- Use TensorBoard to visualize train status:
tensorboard --logdir ./work/runs
- The resulting PAGE-XML files are stored at "./work/results/test/".
We recommend Transkribus or nw-page-editor to visualize and edit PAGE-XML files.
- For details about the arguments and the config file, see the docs or run:
python P2PaLA.py -h
- For more detailed examples, see egs.
License
GNU General Public License v3.0. See LICENSE for the full text.
Acknowledgments
Code is inspired by pix2pix and pytorch-CycleGAN-and-pix2pix.