Corner-based Region Proposal Network


CRPN is a two-stage detection framework for multi-oriented scene text. It uses corners to estimate the possible locations of text instances, followed by a region-wise subnetwork for further classification and regression. In our experiments, it achieves an F-measure of 0.876 on ICDAR 2013 and 0.845 on ICDAR 2015. The paper is available on arXiv.

Installation

This code is based on Caffe and py-faster-rcnn. It has been tested on Ubuntu 16.04 with CUDA 8.0.

  1. Clone this repository

    git clone https://github.com/xhzdeng/crpn.git
    
  2. Build Caffe and pycaffe

    cd $CRPN_ROOT/caffe-fast-rcnn
    make -j8 && make pycaffe
    
  3. Build the Cython modules

    cd $CRPN_ROOT/lib
    make
    
  4. Prepare your own training data directory. For convenience, it should have this basic structure.

    $VOCdevkit/
    $VOCdevkit/VOC2007                    # image sets, annotations, etc. 
    

    And create symlinks for YOUR dataset

    cd $CRPN_ROOT/data
    ln -s [path] VOCdevkit
    
  5. Download the pretrained ImageNet VGG-16 model. You can find it in the Caffe Model Zoo.

  6. Train with YOUR dataset

    cd $CRPN_ROOT
    ./experiments/scripts/train.sh [NET] [MODEL] [DATASET] [ITER_NUM]
    # NET is the network architecture to use; only {vgg16} is supported in this implementation
    # MODEL is the pre-trained model used to initialize the weights
    # DATASET points to your dataset; please refer to the contents of train.sh
    # ITER_NUM is the number of training iterations
    
  7. Test with YOUR models

    cd $CRPN_ROOT
    ./experiments/scripts/test.sh [NET] [MODEL] [DATASET]
    # NET is the network architecture to use; only {vgg16} is supported in this implementation
    # MODEL is the model to test
    # DATASET points to your dataset; please refer to the contents of test.sh
    

    Test outputs are saved under:

    output/<experiment directory>/<dataset name>/<network snapshot name>/
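
Before launching train.sh, it can help to confirm that the paths from steps 4 to 6 actually exist, since a missing weights file makes Caffe abort with "Couldn't open ..." (as in one of the reports further down). The following is a minimal sketch, not part of this repository: the function name `preflight` and the example weights path are assumptions for illustration, and only the `data/VOCdevkit/VOC2007` layout comes from the steps above.

```python
import os


def preflight(crpn_root, weights_path):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # Step 4 expects the dataset to be reachable at data/VOCdevkit/VOC2007.
    voc_dir = os.path.join(crpn_root, "data", "VOCdevkit", "VOC2007")
    if not os.path.isdir(voc_dir):
        problems.append("missing dataset directory: " + voc_dir)
    # Steps 5 and 6 need a pretrained .caffemodel to pass as [MODEL].
    if not os.path.isfile(weights_path):
        problems.append("missing pretrained weights: " + weights_path)
    return problems


if __name__ == "__main__":
    root = os.environ.get("CRPN_ROOT", ".")
    # Hypothetical example path; substitute the file you pass as [MODEL].
    weights = os.path.join(root, "Model", "pretrain.caffemodel")
    for problem in preflight(root, weights):
        print(problem)
```

If the script prints nothing, both paths were found; otherwise fix the reported path before running train.sh.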
    

Demo

```
cd $CRPN_ROOT
./tools/demo.py --net [NET] --model [MODEL]
# NET is the network architecture to use; only {vgg16} is supported in this implementation
# MODEL is the path to the caffemodel you want to use
```

Models

You can now download the pretrained model from OneDrive or BaiduYun; it was trained for 100k iterations on SynthText. I have also uploaded a recently trained testing model. It achieves an F-measure of 0.8456 at 840p resolution on ICDAR 2015, which is similar to the performance reported in the paper, and it runs slightly faster.
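
For readers comparing scores, the F-measure used in ICDAR evaluation is the harmonic mean of precision and recall. The sketch below is only an illustration of that formula; the function name `f_measure` is an assumption, and the inputs are the CTPN precision and recall quoted in the comments (P 0.93, R 0.83), not results for this model.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Illustrative inputs: CTPN on ICDAR 2013 as quoted in the comments.
print(round(f_measure(0.93, 0.83), 3))  # prints 0.877
```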

Citation

If you find the paper and code useful in your research, please consider citing:

@article{deng2018crpn,
    Title = {Detecting Multi-Oriented Text with Corner-based Region Proposals},
    Author = {Linjie Deng and Yanxiang Gong and Yi Lin and Jingwen Shuai and Xiaoguang Tu and Yufei Zhang and Zheng Ma and Mei Xie},
    Journal = {arXiv preprint arXiv:1804.02690},
    Year = {2018}
}
Comments
  • Demo error

        F0416 21:56:18.036826 12284 concat_layer.cpp:42] Check failed: top_shape[j] == bottom[i]->shape(j) (75 vs. 76) All inputs must have the same shape, except at concat_axis.
        *** Check failure stack trace: ***
        Aborted (core dumped)

    I downloaded your code and ran into this problem when running demo.py. How can I fix it?

    opened by zq130320339 8
  • Minor error on CTPN performance in your paper (page 11, table 2)

    CTPN's performance on the ICDAR 2013 dataset is an F-measure of 0.88 (R 0.83, P 0.93), not the 0.822 given in your paper (Table 2 on page 11). They published a test model at https://github.com/tianzhi0549/CTPN (the downloadable test model has an F-measure of 0.86 on ICDAR 2013 and does not include the side-refinement feature described in the paper).

    CTPN does not handle angled text well, though. Their score on the 2015 dataset is quite low.

    opened by jwnsu 4
  • Nice work & paper, trained model file available for download outside China?

    Thanks for sharing the paper and repository. Where can I download your pretrained model file so that we can try it on the ICDAR 13 and 15 datasets? A pointer to the training dataset would be great too.

    opened by jwnsu 4
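  • Several problems encountered during training; advice would be appreciated

    The paper says the short side is resized to 900 for ICDAR2015 and to 640 for ICDAR2013. Were these values found through repeated experiments, or is there a method for choosing them? Has the author considered multi-scale testing? After I switched to my own data, the loss kept fluctuating between a few tenths and a little over 2 from iteration 1,000 all the way to iteration 100,000, even with the learning rate set to 1e-5, so it does not seem to converge. Did the author run into this in earlier experiments?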

    opened by 741077510 3
  • the third step in searching and grouping corners candidates

    I am confused by the following sentence: "For each diagonal, select any one corner from last two types and rotate the diagonal until three points (two endpoints and the third one) are collinear, then a quadrilateral proposal determined by those two diagonals will be obtained." How can three corners determine a quadrilateral proposal? While rotating the diagonal, which point is fixed?

    opened by SchumannTian 3
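  • Is there a problem with my dataset after converting it to VOC format?

    I looked at the author's dataset-loading function and found that it reads four points, i.e. eight coordinate values. This differs from the online tutorial "Converting ICDAR2015 to VOC2007", where the bndbox annotations are written in x, y, w, h, angle format. Following the author's code, I produced my dataset in the format below. Does it differ from the VOC2007-format dataset the author used for training? [image]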

    opened by 741077510 2
  • how to interpret detection lines

    After running the test model on some test images (experiments/scripts/test.sh), I got the following lines (the first lines are shown here):

        testimage-001 1.000 69.0 624.3 159.9 624.4
        testimage-001 1.000 167.5 624.5 207.0 624.6
        testimage-001 1.000 120.6 564.6 172.2 564.7
        testimage-001 1.000 64.4 564.7 112.8 564.8
        testimage-001 1.000 9.5 624.5 60.9 624.5
        testimage-001 1.000 178.0 563.4 224.6 562.2
        testimage-001 0.999 348.3 222.6 441.9 222.5
        testimage-001 0.999 10.6 443.2 77.4 443.3
        testimage-001 0.999 9.6 564.4 56.9 564.4
        testimage-001 0.999 9.7 222.7 104.0 222.5
        testimage-001 0.999 10.1 504.0 74.1 504.0

    How should these be interpreted? They do not look like bounding boxes in the standard VOC format (i.e. xmin, ymin, xmax, ymax): y1 and y2 are so close to each other that the boxes would not be valid.

    opened by jwnsu 2
  • train.sh

        [723.29144 542.9097 723.29144 552.94037 723.29144 552.94037 723.29144 542.9097 ]
        [270. nan 723.29144 555.44806 270. nan 723.29144 557.9557 ]
        [723.29144 560.4634 723.29144 562.97107 723.29144 562.97107 723.29144 560.4634 ]]
        Traceback (most recent call last):
          File "./tools/train_net.py", line 112, in <module>
            max_iters=args.max_iters)
          File "/home/zq/crpn/tools/../lib/fast_rcnn/train.py", line 162, in train_net
            model_paths = sw.train_model(max_iters)
          File "/home/zq/crpn/tools/../lib/fast_rcnn/train.py", line 103, in train_model
            self.solver.step(1)
          File "/home/zq/crpn/tools/../lib/rpn/labelmap_layer.py", line 66, in forward
            x2 = int(round(bbox[2] * spatial_scale))
        ValueError: cannot convert float NaN to integer

    For an error like this, should I delete the corresponding images, or modify the code in sort_points?

    opened by zq130320339 1
  • train.sh

        I0423 10:43:13.650262 4174 net.cpp:242] This network produces output loss_rpn_bl
        I0423 10:43:13.650265 4174 net.cpp:242] This network produces output loss_rpn_br
        I0423 10:43:13.650269 4174 net.cpp:242] This network produces output loss_rpn_tl
        I0423 10:43:13.650272 4174 net.cpp:242] This network produces output loss_rpn_tr
        I0423 10:43:13.650315 4174 net.cpp:255] Network initialization done.
        I0423 10:43:13.650451 4174 solver.cpp:56] Solver scaffolding done.
        Loading pretrained model weights from Model/pretrain.caffemodel
        HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 139667392706304:
          #000: H5F.c line 408 in H5Fis_hdf5(): unable open file
            major: File accessibilty
            minor: Not an HDF5 file
          #001: H5Fint.c line 532 in H5F__is_hdf5(): unable to open file
            major: Low-level I/O
            minor: Unable to initialize object
          #002: H5FD.c line 809 in H5FD_open(): open failed
            major: Virtual File Layer
            minor: Unable to initialize object
          #003: H5FDsec2.c line 346 in H5FD_sec2_open(): unable to open file: name = 'Model/pretrain.caffemodel', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0
            major: File accessibilty
            minor: Unable to open file
        HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 139667392706304:
          #000: H5F.c line 586 in H5Fopen(): unable to open file
            major: File accessibilty
            minor: Unable to open file
          #001: H5Fint.c line 1236 in H5F_open(): unable to open file: time = Mon Apr 23 10:43:13 2018 , name = 'Model/pretrain.caffemodel', tent_flags = 0
            major: File accessibilty
            minor: Unable to open file
          #002: H5FD.c line 809 in H5FD_open(): open failed
            major: Virtual File Layer
            minor: Unable to initialize object
          #003: H5FDsec2.c line 346 in H5FD_sec2_open(): unable to open file: name = 'Model/pretrain.caffemodel', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0
            major: File accessibilty
            minor: Unable to open file
        F0423 10:43:13.652704 4174 net.cpp:791] Check failed: file_hid >= 0 (-1 vs. 0) Couldn't open Model/pretrain.caffemodel
        *** Check failure stack trace: ***
        ./experiments/scripts/train.sh: line 55: 4174 Aborted (core dumped) ./tools/train_net.py --gpu 0 --solver models/${NET}/solver.pt --weights ${WEIGHTS} --imdb ${TRAIN_IMDB} --iters ${ITERS} --cfg models/${NET}/config.yml ${EXTRA_ARGS}

    Hello author, what is this error, and how can I fix it?

    opened by zq130320339 0
  • Cannot Build Caffe and pycaffe successfully

    Hello @xhzdeng, I am trying to run your code, but when I run

        cd $CRPN_ROOT/caffe-fast-rcnn
        make -j8 && make pycaffe

    I get the following error:

        Makefile:6: *** Makefile.config not found. See Makefile.config.example.. Stop.

    Should I download more files?

    opened by jiansfoggy 0
  • Hi, when I use test.sh to evaluate the ICDAR2015 dataset, AP for text = 0.0459?

    Some information:

        ./tools/test_net.py --gpu 0 --def models/vgg16/test.pt --net /home/deep3/work/crpn/models/vgg16/test.caffemodel --imdb voc_2007_test --cfg models/vgg16/config.yml
        Called with args:
        Namespace(caffemodel='/home/deep3/work/crpn/models/vgg16/test.caffemodel', cfg_file='models/vgg16/config.yml', comp_mode=False, gpu_id=0, imdb_name='voc_2007_test', max_per_image=100, prototxt='models/vgg16/test.pt', set_cfgs=None, vis=False, wait=True)

        AP for text = 0.0459
        Mean AP = 0.0459
        Mean REC = 0.1231
        Mean PREC = 0.1032

    What's wrong? Thanks.

    opened by runauto 1
  • Low F-measure for ICDAR2015

    Hello @xhzdeng, I am trying to get the F-measure for the ICDAR2015 test subset. I changed TEST.SCALES to 840 in config.yml and saved the detections to a txt file from lib/fast_rcnn/test.py, just after line 316, so that I could run the ICDAR evaluation tool. For the provided test.caffemodel I am getting only 0.8357763975155279.

    opened by Wovchena 0
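
One of the train.sh reports above fails with "cannot convert float NaN to integer" after printing ground-truth rows that contain nan or collapse to zero width. A possible workaround, sketched here under assumptions, is to screen annotations before training. The helper name `quad_is_valid`, the 8-value point layout, and the `min_area` threshold are hypothetical and not taken from the repository; whether such boxes should be dropped or sort_points should be changed is a question for the author.

```python
import math


def quad_is_valid(quad, min_area=1.0):
    """Check one box given as 8 floats: x1, y1, x2, y2, x3, y3, x4, y4."""
    if len(quad) != 8:
        return False
    # Reject NaN or infinite coordinates outright.
    if any(math.isnan(v) or math.isinf(v) for v in quad):
        return False
    xs, ys = quad[0::2], quad[1::2]
    # Shoelace formula; a near-zero area means the box is degenerate.
    twice_area = sum(
        xs[i] * ys[(i + 1) % 4] - xs[(i + 1) % 4] * ys[i] for i in range(4)
    )
    return abs(twice_area) / 2.0 >= min_area


# Rows copied from the log above, plus one ordinary box for contrast.
rows = [
    [723.29144, 542.9097, 723.29144, 552.94037, 723.29144, 552.94037, 723.29144, 542.9097],
    [270.0, float("nan"), 723.29144, 555.44806, 270.0, float("nan"), 723.29144, 557.9557],
    [0.0, 0.0, 10.0, 0.0, 10.0, 5.0, 0.0, 5.0],
]
kept = [row for row in rows if quad_is_valid(row)]
print(len(kept))
```

Only the last row passes: the first has all x values equal (zero width) and the second contains nan.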
Owner
xhzdeng