Source code of RRPN ---- Arbitrary-Oriented Scene Text Detection via Rotation Proposals

Paper source

Arbitrary-Oriented Scene Text Detection via Rotation Proposals

https://arxiv.org/abs/1703.01086

News

RRPN has been updated to PyTorch 1.0! See https://github.com/mjq11302010044/RRPN_plusplus for more details. The text spotter reaches an F-measure of 89.5% on IC15 and 92.0% on IC13. Testing speed can reach 13.3 fps on IC13 when the shorter side of the input is 640 px.

License

RRPN is released under the MIT License (refer to the LICENSE file for details). This project is for research purposes only; for any further use of RRPN, please contact the authors.

Citing RRPN

If you find RRPN useful in your research, please consider citing:

@article{Jianqi17RRPN,
    Author = {Jianqi Ma and Weiyuan Shao and Hao Ye and Li Wang and Hong Wang and Yingbin Zheng and Xiangyang Xue},
    Title = {Arbitrary-Oriented Scene Text Detection via Rotation Proposals},
    journal = {IEEE Transactions on Multimedia},
    volume={20}, 
    number={11}, 
    pages={3111-3122}, 
    year={2018}
}

Contents

  1. Requirements: software
  2. Requirements: hardware
  3. Basic installation
  4. Demo
  5. Beyond the demo: training and testing

Requirements: software

  1. Requirements for Caffe and pycaffe (see: Caffe installation instructions)

Note: Caffe must be built with support for Python layers!

# In your Makefile.config, make sure to have this line uncommented
WITH_PYTHON_LAYER := 1
# Unrelatedly, it's also recommended that you use CUDNN
USE_CUDNN := 1

You can download my Makefile.config for reference.

  2. Python packages you might not have: cython, python-opencv, easydict
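A quick way to confirm that the Python packages above are importable is sketched below. The import names (`Cython` for cython, `cv2` for python-opencv, `easydict` for easydict) are the usual ones for these packages but are an assumption here, and `missing_modules` is a hypothetical helper that is not part of RRPN.

```python
import importlib

# Import names assumed for the packages listed above (not taken from the RRPN repo).
REQUIRED_MODULES = ["Cython", "cv2", "easydict"]


def missing_modules(names):
    """Return the names from `names` that fail to import on this interpreter."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python modules: " + ", ".join(missing))
    else:
        print("All required Python modules were found.")
```

Run it with the same interpreter that pycaffe is built against, since a package installed for a different Python will still show up as missing.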

Requirements: hardware

  1. For training the end-to-end version of RRPN with VGG16, 4 to 5 GB of GPU memory is sufficient (using CUDNN)

Installation (sufficient for the demo)

  1. Clone the RRPN repository

    git clone https://github.com/mjq11302010044/RRPN.git

    We'll call the directory that you cloned RRPN into RRPN_ROOT.

  2. Build the Cython modules

    cd $RRPN_ROOT/lib
    make
  3. Build Caffe and pycaffe

    cd $RRPN_ROOT/caffe-fast-rcnn
    # Now follow the Caffe installation instructions here:
    #   http://caffe.berkeleyvision.org/installation.html
    
    # If you're experienced with Caffe and have all of the requirements installed
    # and your Makefile.config in place, then simply do:
    make -j4 && make pycaffe
  4. Download pre-computed RRPN detectors

    Trained VGG16 model download link: https://drive.google.com/open?id=0B5rKZkZodGIsV2RJUjVlMjNOZkE
    

    Then move the model into $RRPN_ROOT/data/faster_rcnn_models.
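Before running the demo, it can help to confirm that the model file is where the code looks for it. The sketch below assumes that RRPN_ROOT is exported as an environment variable and that the downloaded file is named vgg16_faster_rcnn.caffemodel (the name that appears in the error logs quoted in the comments further down); `find_model` is a hypothetical helper, not part of RRPN.

```python
import os


def find_model(rrpn_root, filename="vgg16_faster_rcnn.caffemodel"):
    """Return the expected model path, or None if no file exists there.

    The default file name is an assumption taken from the logs quoted in
    the comments, not from the RRPN source.
    """
    path = os.path.join(rrpn_root, "data", "faster_rcnn_models", filename)
    return path if os.path.isfile(path) else None


if __name__ == "__main__":
    root = os.environ.get("RRPN_ROOT", ".")
    model = find_model(root)
    if model:
        print("Model found at " + model)
    else:
        print("Model not found under " + root)
```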

Demo

After successfully completing basic installation, you'll be ready to run the demo.

To run the demo

cd $RRPN_ROOT
python ./tools/rotation_demo.py

The txt results will be saved in $RRPN_ROOT/result

Beyond the demo: installation for training and testing models

You can use the function get_rroidb() in $RRPN_ROOT/lib/rotation/data_extractor.py to manage your training data:

Each training sample should be stored in a Python dict like:

im_info = {
	'gt_classes': # set to 1 (text is the only class)
	'max_classes': # set to 1 (text is the only class)
	'image': # path to the image file
	'boxes': # ground-truth boxes
	'flipped' : # whether the image is flipped (not implemented)
	'gt_overlaps' : # overlap with the text class
	'seg_areas' : # area of each ground-truth region
	'height': # height of the image
	'width': # width of the image
	'max_overlaps' : # maximum overlap for each ground-truth proposal
	'rotated': # random angle by which to rotate the image
}
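Since a missing key tends to surface only deep inside training, a small check over the samples before handing them to the trainer may save time. The sketch below only tests for the presence of the keys listed above; it does not validate value types or shapes, and `missing_keys` and `check_roidb` are hypothetical helpers, not functions from data_extractor.py.

```python
# Keys taken from the im_info description above.
REQUIRED_KEYS = (
    'gt_classes', 'max_classes', 'image', 'boxes', 'flipped', 'gt_overlaps',
    'seg_areas', 'height', 'width', 'max_overlaps', 'rotated',
)


def missing_keys(sample):
    """Return the listed keys that `sample` lacks, in the order shown above."""
    return [key for key in REQUIRED_KEYS if key not in sample]


def check_roidb(roidb):
    """Raise ValueError for the first sample with missing keys, else return True."""
    for index, sample in enumerate(roidb):
        missing = missing_keys(sample)
        if missing:
            raise ValueError(
                "sample %d is missing keys: %s" % (index, ", ".join(missing))
            )
    return True
```

Calling `check_roidb(roidb)` on the list returned by your own data management function, just before training starts, is one place where such a check could sit.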

Then assign your database to the variable roidb in the main function of $RRPN_ROOT/tools/train_net.py (line 116):

roidb = get_rroidb("train") # change to your own data management function

Download pre-trained ImageNet models

Pre-trained ImageNet models can be downloaded for the networks described in the paper: VGG16.

cd $RRPN_ROOT
./data/scripts/fetch_imagenet_models.sh

VGG16 comes from the Caffe Model Zoo, but is provided here for your convenience. ZF was trained at MSRA.

Then you can train RRPN by typing:

./experiments/scripts/faster_rcnn_end2end.sh [GPU_ID] [NET] rrpn

[NET] is usually VGG16.

Trained RRPN networks are saved under the following directory (set to './' by default):

./

You can change this directory through the variable output_dir in $RRPN_ROOT/tools/train_net.py.

If you have any questions about this project, please send a message to Jianqi Ma ([email protected]). Enjoy!

Comments
  • Required GPU memory?

    @mjq11302010044 I finally managed to run it :) Thank you for your great support. I do not have cuDNN installed; might that be the reason for the memory required by the demo? I am using a fairly new GPU, an NVIDIA Quadro M4000 with 8 GB! How is it possible that I am running out of memory? I've also resized your images, but still:

    libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.

    Loaded network /home/vale/masterarbeit/H_AOSTD/data/faster_rcnn_models/vgg16_faster_rcnn.caffemodel
    F1109 05:23:29.260838 22269 syncedmem.cpp:64] Check failed: error == cudaSuccess (2 vs. 0) out of memory

    I guess I could optimize utilization with cuDNN. But is it necessary? I have had quite bad experiences with cuDNN...

    opened by idefix92 4
  • Rotated IOU greater than 1

    The following example produces an IOU greater than 1.

    import numpy as np

    b1 = np.array([[46.83, 44.03, 3.9, 1.63, 0]], dtype=np.float32)
    b2 = np.array([[46.83, 44.03, 1.63, 3.9, 1.45]], dtype=np.float32)
    rbbox_overlaps(b1, b2)  # returns 1.35

    The expected IOU should be near 1. Is there something I'm missing here? Note that I changed the code to use angles in radians.

    opened by shashanktyagi 2
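For a case like the one in this issue, an independent CPU reference can show what value to expect. The sketch below is not the repository's CUDA kernel: it is a plain-Python check that clips one rectangle against the other with the Sutherland-Hodgman algorithm and measures the result with the shoelace formula. It assumes each box is (x_ctr, y_ctr, w, h, angle) with the angle in radians, as in the arrays above; all function names are hypothetical.

```python
import math


def rbox_corners(cx, cy, w, h, angle_rad):
    """Corners of a rotated rectangle, in counter-clockwise order."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    half = [(-w / 2.0, -h / 2.0), (w / 2.0, -h / 2.0),
            (w / 2.0, h / 2.0), (-w / 2.0, h / 2.0)]
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in half]


def polygon_area(poly):
    """Absolute polygon area via the shoelace formula."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def clip_convex(subject, clipper):
    """Sutherland-Hodgman clipping of `subject` by a convex CCW `clipper`."""
    output = list(subject)
    for i in range(len(clipper)):
        if not output:
            break
        ax, ay = clipper[i]
        bx, by = clipper[(i + 1) % len(clipper)]

        def side(p):
            # Positive when p lies to the left of the directed edge a -> b.
            return (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)

        previous, output = output, []
        for j in range(len(previous)):
            p, q = previous[j], previous[(j + 1) % len(previous)]
            sp, sq = side(p), side(q)
            if sp >= 0:
                output.append(p)
            if (sp > 0 and sq < 0) or (sp < 0 and sq > 0):
                t = sp / (sp - sq)  # where segment p -> q crosses the edge
                output.append((p[0] + t * (q[0] - p[0]),
                               p[1] + t * (q[1] - p[1])))
    return output


def rotated_iou(b1, b2):
    """IoU of two rotated boxes given as (x_ctr, y_ctr, w, h, angle_rad)."""
    inter_poly = clip_convex(rbox_corners(*b1), rbox_corners(*b2))
    inter = polygon_area(inter_poly) if len(inter_poly) >= 3 else 0.0
    union = b1[2] * b1[3] + b2[2] * b2[3] - inter
    return inter / union


if __name__ == "__main__":
    b1 = (46.83, 44.03, 3.9, 1.63, 0.0)
    b2 = (46.83, 44.03, 1.63, 3.9, 1.45)
    print(rotated_iou(b1, b2))
```

Because the intersection can never be larger than either box, a reference of this kind returns a value below 1 for the two boxes above, which makes it a convenient baseline when debugging the GPU result.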
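  • Input of the tilt angle θ for GT boxes

    Hello. When the VOC dataset is read, how is the tilt angle θ read in? I looked at _load_pascal_annotation in datasets/pascal_voc.py: as in Faster R-CNN, it reads only the bndbox fields xmin, xmax, ymin, ymax, and does not read a tilt angle θ. So how is the tilt angle θ of the GT boxes provided as input?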

    opened by mltloveyy 1
  • Check failed: error == cudaSuccess (2 vs. 0)  out of memory *** Check failure stack trace: *** Aborted (core dumped)

    When I run the code ./tools/rotation_demo.py, I get the following error:

    Loaded network /RRPN/data/faster_rcnn_models/vgg16_faster_rcnn.caffemodel
    Memory need is 426752000
    Memory need is 426752000
    Memory need is 106752000
    Memory need is 106752000
    Memory need is 213504000
    Memory need is 213504000
    Memory need is 53376000
    Memory need is 53376000
    Memory need is 106752000
    F1010 11:27:03.680461 8528 syncedmem.cpp:57] Check failed: error == cudaSuccess (2 vs. 0) out of memory
    *** Check failure stack trace: ***
    Aborted (core dumped)

    I have tried every available solution, but none of them resolved this. I am compiling with cuDNN 7 and CUDA 9.0. I have downgraded both of them and the problem was still not solved. I am using a GT 710 with 2 GB. Can anyone help me here? I am not even sure whether this is a bug or genuinely a hardware limitation. So before I go and buy a new GPU, I would appreciate your help.

    @mjq11302010044 @idefix92

    opened by famunir 1
  • Training Speed and Iteration Count

    Hello @mjq11302010044. I'm trying to train a model of my own, like the trained model that you shared.

    But training is very slow. Each iteration takes around 5 seconds, and the iteration count is 490000.

    Do you have any tricks to speed up training? What is the minimum iteration count needed to get the demo to work?

    opened by ghost 1
  • Changes to Caffe compared to the official one?

    Is it possible to list the major changes in the version of Caffe you use? I want to know the potential issues and conflicts when merging it with a newer version of Caffe.

    opened by xrf116 1
  • The pretrained model performance is worse than the paper claimed

    Hi, I tested your pretrained VGG16 model on ICDAR2015 without changing any parameters. The following is the result of the official ICDAR2015 evaluation script. All of these values are lower than what is claimed in the paper.

    Calculated!{"recall": 0.6721232546942706, "precision": 0.7977142857142857, "hmean": 0.7295531748105567, "AP": 0}
    
    opened by Godricly 1
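The hmean in the output above is the harmonic mean of precision and recall, so the three numbers can be cross-checked in a few lines. This is a minimal sketch; the function name `hmean` is only illustrative and is not taken from the evaluation script.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (the F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall as reported in the issue above.
    precision = 0.7977142857142857
    recall = 0.6721232546942706
    print(round(hmean(precision, recall), 4))  # 0.7296
```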
  • Is the angle anti-clockwise?

    The function convert_region in rotate_polygon_nms_kernel.cu and rbbox_overlaps_kernel.cu seems to expect the angle in anti-clockwise degrees. Is my understanding correct?

    I compared the result with OpenCV 3's rotatedRectangleIntersection, and the results match.

    opened by makefile 0
  • How to compile the "lib" for win10

    My computer's motherboard (ASUS Z370) does not seem to be compatible with Linux. Does anyone have tips for compiling the lib so that it can be run on Windows 10? Thank you!

    opened by YanShuang17 0
  • rotate_roi_align_layer.cu(147): error

    rotate_roi_align_layer.cu(147): error: calling a constexpr host function("fmax") from a global function("RotateROIAlignForward") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

    opened by zhao-97 0
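  • Reproducing the paper's accuracy

    Hello, and thank you for your great work. I have a few questions: 1. To reproduce the ICDAR2015 accuracy reported in the paper, should I simply follow the README? I ran it exactly as the README describes, and the result is 20 points below the paper. 2. I do not see data augmentation implemented in the Caffe code, but the paper says data augmentation was used. Did you leave it out? 3. The training schedule in the paper uses lr 1e-3 for the first 200,000 iterations and 1e-4 for the following 100,000, which does not match the training schedule in the code.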

    opened by cjt222 0
  • Cannot train model

    I0107 11:43:32.129838 14961 layer_factory.hpp:77] Creating layer rpn_loss_cls
    I0107 11:43:32.132026 14961 net.cpp:150] Setting up rpn_loss_cls
    I0107 11:43:32.132074 14961 net.cpp:157] Top shape: (1)
    I0107 11:43:32.132081 14961 net.cpp:160] with loss weight 1
    I0107 11:43:32.132100 14961 net.cpp:165] Memory required for data: 298545136
    I0107 11:43:32.132104 14961 layer_factory.hpp:77] Creating layer rpn_loss_bbox
    I0107 11:43:32.132115 14961 net.cpp:106] Creating Layer rpn_loss_bbox
    I0107 11:43:32.132118 14961 net.cpp:454] rpn_loss_bbox <- rpn_bbox_pred_rpn_bbox_pred_0_split_0
    I0107 11:43:32.132123 14961 net.cpp:454] rpn_loss_bbox <- rpn_bbox_targets
    I0107 11:43:32.132127 14961 net.cpp:454] rpn_loss_bbox <- rpn_bbox_inside_weights
    I0107 11:43:32.132129 14961 net.cpp:454] rpn_loss_bbox <- rpn_bbox_outside_weights
    I0107 11:43:32.132133 14961 net.cpp:411] rpn_loss_bbox -> rpn_loss_bbox
    F0107 11:43:32.132158 14961 smooth_L1_loss_layer.cpp:28] Check failed: bottom[0]->channels() == bottom[1]->channels() (225 vs. 270)
    *** Check failure stack trace: ***
    ./experiments/scripts/faster_rcnn_end2end.sh: line 78: 14961 Aborted (core dumped) ./tools/train_net.py --gpu ${GPU_ID} --solver /data/wuxl/RRPN2/models/${PT_DIR}/${NET}/faster_rcnn_end2end/solver.prototxt --weights data/imagenet_models/${NET}.v2.caffemodel --imdb ${TRAIN_IMDB} --iters ${ITERS} --cfg experiments/cfgs/faster_rcnn_end2end.yml ${EXTRA_ARGS}

    real 0m18.619s
    user 0m17.324s
    sys 0m1.984s

    How can I solve this problem?

    opened by wuxiaolianggit 0