ARU-Net - Deep Learning Chinese Word Segment

Overview

ARU-Net: A Neural Pixel Labeler for Layout Analysis of Historical Documents

Introduction

This is the TensorFlow code corresponding to A Two-Stage Method for Text Line Detection in Historical Documents [1]. This repo contains the neural pixel labeling part described in the paper. It contains the so-called ARU-Net (among others), which is basically an extended version of the well-known U-Net [2]. Besides the model and the basic workflow to train and test models, different data augmentation strategies are implemented to reduce the amount of training data needed. The repo's features are summarized below:

  • Inference Demo
    • Trained and frozen TensorFlow graph included
    • Easy to reuse for your own inference tests
  • Workflow
    • Full training workflow to parametrize and train your own models
    • Contains different models, data augmentation strategies, loss functions
    • Training on a specific GPU, which makes it possible to train several models in parallel on a multi-GPU system
    • Easy validation of trained models using either the classical or the EMA-shadow weights

Please cite [1] if you find this repo useful and/or use this software for your own work.

Installation

  1. Use Python 2.7
  2. Any TensorFlow version > 1.0 should work.
  3. Python packages: matplotlib (>=1.3.1), pillow (>=2.1.0), scipy (>=1.0.0), scikit-image (>=0.13.1), click (>=5.x)
  4. Clone the repo
  5. Done
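
To confirm that an environment meets the requirements listed above, a small check like the following may help. It is only a sketch and not part of the repo: the function names `parse_version` and `check_requirements` are hypothetical, "> 1.0" and "5.x" are approximated as minimum versions 1.0 and 5.0, and some packages (depending on their version) do not expose `__version__`, in which case the version is reported as unknown.

```python
import importlib

# Minimum versions copied from the installation list above.
# Keys are import names (pillow is imported as PIL, scikit-image as skimage).
REQUIRED = {
    "matplotlib": "1.3.1",
    "PIL": "2.1.0",
    "scipy": "1.0.0",
    "skimage": "0.13.1",
    "click": "5.0",
    "tensorflow": "1.0",
}


def parse_version(text):
    """Turn '1.13.2rc1' into (1, 13, 2); non-numeric suffixes are ignored."""
    parts = []
    for chunk in text.split(".")[:3]:
        digits = ""
        for ch in chunk:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:  # pad so that '1.0' compares equal to '1.0.0'
        parts.append(0)
    return tuple(parts)


def check_requirements(required=REQUIRED):
    """Return a list of readable problems; an empty list means all is fine."""
    problems = []
    for name, minimum in sorted(required.items()):
        try:
            module = importlib.import_module(name)
        except ImportError:
            problems.append("%s is not installed" % name)
            continue
        found = getattr(module, "__version__", None)
        if found is None:
            problems.append("%s: version unknown" % name)
        elif parse_version(found) < parse_version(minimum):
            problems.append("%s %s is older than %s" % (name, found, minimum))
    return problems
```

Calling `check_requirements()` inside the environment used for training returns an empty list when every package is importable and recent enough.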

Demo

To run the demo, follow these steps:

  1. Open a shell
  2. Make sure TensorFlow is available, e.g., enter your Docker environment, activate your conda environment, ...
  3. Navigate to the repo folder YOUR_PATH/ARU-Net/
  4. Run:
python run_demo_inference.py 

The demo will load a trained model and perform inference for five sample images of the cBAD test set [3], [4]. The network was trained to predict the position of baselines and separators for the beginning and end of each text line. After running the Python script, a matplotlib window should open. To go to the next image, just close it.

Example

The example images are sampled from the cBAD test set [3], [4]. One image along with its results is shown below.

image_1 image_2 image_3

Training

This section describes step-by-step the procedure to train your own model.

Train data:

The following describes what the training data should look like:

  • Each image and its pixel ground truth have to be in the same folder
  • For each image X.jpg, there have to be images named X_GT0.jpg, X_GT1.jpg, X_GT2.jpg, ... (one GT image for each channel to be predicted)
  • Each ground truth image is binary and contains ones at positions where the corresponding class is present and zeros otherwise (see demo_images/demo_traindata for a sample)
  • Generate a list containing, one per row, the absolute paths to the images (just the document images, not the GT ones)
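
The layout above can be checked, and the image list generated, with a short script. This is a minimal sketch and not code from the repo: `collect_training_images`, `write_image_list` and the parameter `num_gt_channels` (the number of X_GTn.jpg files expected per image) are hypothetical names, and the script assumes all files use the same `.jpg` extension.

```python
import os
import re

# Matches ground-truth file stems such as "page7_GT0" or "page7_GT12".
GT_PATTERN = re.compile(r"_GT\d+$")


def collect_training_images(folder, num_gt_channels, ext=".jpg"):
    """Return (image_paths, missing_gt) for one training folder.

    image_paths holds absolute paths of document images only;
    missing_gt names every expected X_GTn file that was not found.
    """
    image_paths, missing_gt = [], []
    for name in sorted(os.listdir(folder)):
        stem, suffix = os.path.splitext(name)
        if suffix.lower() != ext or GT_PATTERN.search(stem):
            continue  # skip other file types and the GT images themselves
        image_paths.append(os.path.abspath(os.path.join(folder, name)))
        for channel in range(num_gt_channels):
            gt_name = "%s_GT%d%s" % (stem, channel, suffix)
            if not os.path.isfile(os.path.join(folder, gt_name)):
                missing_gt.append(gt_name)
    return image_paths, missing_gt


def write_image_list(image_paths, list_path):
    """Write one absolute image path per line."""
    with open(list_path, "w") as handle:
        for path in image_paths:
            handle.write(path + "\n")
```

A typical use would be to call `collect_training_images` on the data folder, stop if `missing_gt` is non-empty, and otherwise pass the paths to `write_image_list`.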

Val data:

The following describes what the validation data should look like:

Train the model:

The following describes how to train a model:

  • Have a look at the pix_lab/main/train_aru.py script
  • Parametrize it as you wish (have a look at the data_provider, cost and optimizer scripts to see all parameters)
  • Setting the correct paths, adapting the number of output classes and using the default parametrization should work fine for a first training
  • Run:
python -u pix_lab/main/train_aru.py &> info.log 
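
To run several trainings in parallel on a multi-GPU machine, each process can be restricted to one GPU. The sketch below uses the standard CUDA environment variable `CUDA_VISIBLE_DEVICES` and mirrors the command above. It is an assumption about how one might do this from the outside, not the repo's own mechanism; the training script may expose its own parameter for selecting a GPU, so check its options first. `gpu_env` and `launch_training` are hypothetical helper names.

```python
import os
import subprocess


def gpu_env(gpu_index):
    """Copy the current environment and expose only one GPU to the child."""
    env = dict(os.environ)
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"      # match the nvidia-smi ordering
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def launch_training(gpu_index, log_path, script="pix_lab/main/train_aru.py"):
    """Start one training run in the background, logging to log_path."""
    with open(log_path, "w") as log:
        return subprocess.Popen(
            ["python", "-u", script],
            env=gpu_env(gpu_index),
            stdout=log,
            stderr=subprocess.STDOUT,  # same effect as "&> info.log"
        )
```

For example, `launch_training(0, "info_gpu0.log")` and `launch_training(1, "info_gpu1.log")` would start two runs. Each run still needs its own output paths, which would have to be set in the script's parameters.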

Validate the model:

The following describes how to validate a trained model:

  • Train and val losses are printed in info.log
  • To validate the checkpoints using the classical weights as well as their EMA shadows, adapt and run:
pix_lab/main/validate_ckpt.py
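
For readers unfamiliar with EMA-shadow weights: an exponential moving average (EMA) keeps a smoothed copy of each weight that is updated after every training step as `shadow = decay * shadow + (1 - decay) * weight`. The toy example below illustrates the update rule in plain Python. It is not the repo's implementation, and the decay value is chosen only to make the numbers easy to follow.

```python
def ema_update(shadow, weights, decay):
    """One EMA step: decay * shadow + (1 - decay) * weight, element-wise."""
    return [decay * s + (1.0 - decay) * w for s, w in zip(shadow, weights)]


# Toy run: a single weight that the "optimizer" moves from 1.0 to 5.0.
weights = [0.0]
shadow = list(weights)  # the shadow starts as a copy of the weights
for step in range(1, 6):
    weights = [float(step)]                      # pretend training step
    shadow = ema_update(shadow, weights, decay=0.5)

print(weights, shadow)  # prints: [5.0] [4.03125]
```

The shadow lags behind the raw weight and smooths out step-to-step noise, which is why validating with both sets of weights can give different losses.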

Comments

If you are interested in a related problem, this repo may help you as well. The ARU-Net can be used for any pixel labeling task: besides baseline detection, it can easily be used for, e.g., binarization or page segmentation.

References

Please cite [1] if using this code.

A Two-Stage Method for Text Line Detection in Historical Documents

[1] T. Grüning, G. Leifert, T. Strauß, R. Labahn, A Two-Stage Method for Text Line Detection in Historical Documents

@article{Gruning2018,
arxivId = {1802.03345},
author = {Gr{\"{u}}ning, Tobias and Leifert, Gundram and Strau{\ss}, Tobias and Labahn, Roger},
title = {{A Two-Stage Method for Text Line Detection in Historical Documents}},
url = {http://arxiv.org/abs/1802.03345},
year = {2018}
}

U-Net: Convolutional Networks for Biomedical Image Segmentation

[2] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation

@article{Ronneberger2015,
arxivId = {1505.04597},
author = {Ronneberger, Olaf and Fischer, Philipp and Brox, Thomas},
journal = {Miccai},
pages = {234--241},
title = {{U-Net: Convolutional Networks for Biomedical Image Segmentation}},
year = {2015}
}

READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents

[3] T. Grüning, R. Labahn, M. Diem, F. Kleber, S. Fiel, READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents

@article{Gruning2017,
arxivId = {1705.03311},
author = {Gr{\"{u}}ning, Tobias and Labahn, Roger and Diem, Markus and Kleber, Florian and Fiel, Stefan},
title = {{READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents}},
url = {http://arxiv.org/abs/1705.03311},
year = {2017}
}

ScriptNet: ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD)

[4] M. Diem, F. Kleber, S. Fiel, T. Grüning, B. Gatos, ScriptNet: ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD)

@misc{Diem2017,
author = {Diem, Markus and Kleber, Florian and Fiel, Stefan and Gr{\"{u}}ning, Tobias and Gatos, Basilis},
doi = {10.5281/zenodo.257972},
title = {ScriptNet: ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD)},
year = {2017}
}
Comments
  • Train and validation loss are always 0.0000

    Hi,

    I created some ground truth to try your network, and after tweaking some things, I'm able to launch the training and it runs without errors. However, the average loss for training and validation is always 0 ...

    [screenshot]

    Actually, the "raw" loss returned by sess.run(...) is 0.

    Do you have an idea of what could cause that?

    Thanks !

    opened by apirrone 19
  • Questions

    Could you please send me the Bozen dataset with its associated ground truth (ground truth of the separator and baseline classes) that was used in your paper "A Two-Stage Method for Text Line Detection in Historical Documents"?

    Also, how can we show the number of trainable parameters of the ARU-Net architecture in the terminal?

    Thank you in advance for your help.

    opened by olfaa 3
  • How not to use weight on cbad dataset

    I am trying to use your architecture on the cBAD 2017 dataset.

    With a cross-entropy loss I need weights to overcome the class imbalance between foreground and background. Weights are mentioned neither in the paper nor in the code. How did you overcome this problem?

    Also, the cBAD dataset has around 256 images for training, but you trained your model on 1024 samples per epoch. I do not know how to get those 1024 images.

    Thanks.

    opened by elyamanyahmed95 3
  • Can I get coordinates of baselines

    Hello, I got the output channels for a few of my images and they look really cool.
    The next obvious question is about further segmentation. Is it possible to get an array of X-Y coordinates for each detected baseline curve? Ideally, each word or phrase would be bounded by a rectangle or (better) a polygon defined by the coordinates of its nodes. As I see in the referenced article, the baseline curve is vectorized. Is the code for this in this project?

    opened by longwall 3
  • demo model fail with low resolution

    I am trying to run inference on document images using the pretrained model. With high-resolution images it works fine, but for some documents with lower resolution, e.g. 700x800, it does not work well.

    opened by lynx97 2
  • Dataset about 2017cBAD test xml groundtruth

    I can't find the ground truth for the cBAD 2017 test set. Where can I get the XML ground truth of the test dataset, for Track A [Simple Documents] and Track B [Complex Documents]? Could you give me a hand? Thanks a lot.

    opened by lius0-0 2
  • How to input a batch of images during inference rather than just one image

    I want to accelerate inference by feeding a batch of images. I concatenate several images and feed them to the variable 'x' in def inference in inference_pb.py, but the results are wrong, whereas feeding only one image gives the right results.

    opened by jewelc92 2
  • Question

    Could you please send me the Bozen dataset with its associated ground truth (ground truth of the separator and baseline classes) that was used in your paper "A Two-Stage Method for Text Line Detection in Historical Documents"?

    I would also like to know how to get the corresponding pixel-level GT.

    Thank you in advance for your help.

    opened by xinxin664 1
  • Train from the demo model

    Hello,

    I would like to train ARU-Net on printed text while keeping what has been learned on handwriting. Is there anywhere I can find the checkpoints created during the training of the model100_ema.pb demo file?

    Thanks !

    opened by Dragule 1
  • Overfitting issue

    Hello, it's me again!

    I was using ARU-Net to find baselines in my old documents with quite a lot of success. Training over 100 epochs seemed to yield good enough results, with coherent training and validation loss values (validation slightly above training in the end). However, I tried running it for 250 epochs, and I noticed that the validation loss keeps rising after around 100 epochs, as can be seen here:

    [plot]

    I don't have a lot of data (about 200 labeled images), so I thought maybe the split between training and validation data (80% training, 20% validation, randomly distributed) was unfortunate in this example. But here is what I get when running the experiment 13 times (I'm showing the average loss at each epoch), each time for 250 epochs with a new random split:

    [plot: overfit]

    Do you have an idea of what could cause that ?

    Thanks !

    opened by apirrone 1
  • Ground Truth images

    Hello, I want to train the network with another dataset.

    I read the paper but can't follow some parts. Can you please explain how I can generate the pixel ground truth? I need some examples of ground truth files (X_GT1.jpg, X_GT2.jpg, X_GT3.jpg). If I had an example of the ground truth data, I could generate the others.

    Thank you

    opened by alperdemir03 1
  • [Get text]

    Hi sir, I really like your project, and I hope to apply this nice contribution to text detection. I have detected my text lines, but I need the separation within each line. I tried to find contours in your output and then match them with the line boxes, but it's hard to do. Can you give me some ideas?

    opened by cuonghv3 1
  • input image size

    Hello and thanks for your great work. I'm working on scanned Persian documents and I want to extract their lines. For many images ARU-Net gives good results, but for very small and very large images it fails! Are there any restrictions on the size of the input image?

    opened by Mchapariniya 3
  • demo weights and training times

    Do the demo weights come from training only on Track A, only on Track B, or on both tracks?

    Right now, training on Track A using 1024 samples/epoch takes ~20 min/epoch, and training on both training sets of Track A and Track B for the same number of samples takes ~40 min/epoch (possibly because Track B images are larger) on an Nvidia GTX 1080. Given that the paper states that it takes 3h to 24h to train the ARU-Net, do these times make sense or am I doing something wrong? Even at 20 min/epoch it will take much longer than 24h to reach epoch 250.

    opened by xen0f0n 5