Source code of RRPN ---- Arbitrary-Oriented Scene Text Detection via Rotation Proposals

Last update: Nov 22, 2022

Overview

Paper source

Arbitrary-Oriented Scene Text Detection via Rotation Proposals

https://arxiv.org/abs/1703.01086

News

RRPN has been updated to PyTorch 1.0! See https://github.com/mjq11302010044/RRPN_plusplus for details. The text spotter reaches an F-measure of 89.5% on IC15 and 92.0% on IC13. Testing speed reaches 13.3 fps on IC13 with the shorter input side set to 640 px.

License

RRPN is released under the MIT License (refer to the LICENSE file for details). This project is for research purposes only; for any further use of RRPN, please contact the authors.

Citing RRPN

If you find RRPN useful in your research, please consider citing:

@article{Jianqi17RRPN,
    Author = {Jianqi Ma and Weiyuan Shao and Hao Ye and Li Wang and Hong Wang and Yingbin Zheng and Xiangyang Xue},
    Title = {Arbitrary-Oriented Scene Text Detection via Rotation Proposals},
    journal = {IEEE Transactions on Multimedia},
    volume={20}, 
    number={11}, 
    pages={3111-3122}, 
    year={2018}
}

Requirements: software
Requirements: hardware
Basic installation
Demo
Beyond the demo: training and testing

Requirements: software

1. Requirements for Caffe and pycaffe (see: Caffe installation instructions)

Note: Caffe must be built with support for Python layers!

# In your Makefile.config, make sure to have this line uncommented
WITH_PYTHON_LAYER := 1
# Unrelatedly, it's also recommended that you use CUDNN
USE_CUDNN := 1

You can download my Makefile.config for reference.

2. Python packages you might not have: cython, python-opencv, easydict
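
As a quick sanity check before building, the two Makefile.config settings above can be verified programmatically. The sketch below is a hypothetical helper (`enabled_flags` is not part of RRPN or Caffe). It assumes the `NAME := value` assignment style shown above and simply collects the assignments that are not commented out.

```python
import re

# Hypothetical helper, not part of RRPN or Caffe: collect the
# `NAME := value` assignments that are NOT commented out.
ASSIGNMENT = re.compile(r"^\s*([A-Z][A-Z0-9_]*)\s*:=\s*(\S+)")


def enabled_flags(makefile_text):
    """Return {name: value} for every uncommented assignment line."""
    flags = {}
    for line in makefile_text.splitlines():
        match = ASSIGNMENT.match(line)  # lines starting with '#' never match
        if match:
            flags[match.group(1)] = match.group(2)
    return flags


# Sample text for illustration: Python layers enabled, cuDNN still commented out.
sample = """\
# In your Makefile.config, make sure to have this line uncommented
WITH_PYTHON_LAYER := 1
# USE_CUDNN := 1
"""
flags = enabled_flags(sample)
print("Python layers enabled:", flags.get("WITH_PYTHON_LAYER") == "1")
print("cuDNN enabled:", flags.get("USE_CUDNN") == "1")
```

To check a real file, pass its contents instead of the sample string, for example `enabled_flags(open("Makefile.config").read())`.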

Requirements: hardware

For training the end-to-end version of RRPN with VGG16, 4 to 5 GB of GPU memory is sufficient (using cuDNN).

Installation (sufficient for the demo)

Clone the RRPN repository

git clone https://github.com/mjq11302010044/RRPN.git

We'll call the directory that you cloned RRPN into RRPN_ROOT
Build the Cython modules
```
cd $RRPN_ROOT/lib
make
```

Build Caffe and pycaffe

cd $RRPN_ROOT/caffe-fast-rcnn
# Now follow the Caffe installation instructions here:
#   http://caffe.berkeleyvision.org/installation.html

# If you're experienced with Caffe and have all of the requirements installed
# and your Makefile.config in place, then simply do:
make -j4 && make pycaffe

Download pre-computed RRPN detectors

Trained VGG16 model download link: https://drive.google.com/open?id=0B5rKZkZodGIsV2RJUjVlMjNOZkE

Then move the model into $RRPN_ROOT/data/faster_rcnn_models.
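
Before running the demo, it can help to confirm that the model file is where the code expects it. The sketch below is a hypothetical pre-flight check, not part of RRPN. The file name `vgg16_faster_rcnn.caffemodel` is an assumption taken from the log output quoted in the Comments section, so adjust it if your downloaded file is named differently.

```python
import os
from pathlib import Path

# Assumed file name, taken from log output quoted by users of this repository.
MODEL_NAME = "vgg16_faster_rcnn.caffemodel"


def model_path(rrpn_root, name=MODEL_NAME):
    """Return the expected model path if the file exists, else None."""
    path = Path(rrpn_root) / "data" / "faster_rcnn_models" / name
    return path if path.is_file() else None


# Falls back to the current directory when RRPN_ROOT is not set.
found = model_path(os.environ.get("RRPN_ROOT", "."))
print("model found at:", found)
```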

Demo

After successfully completing basic installation, you'll be ready to run the demo.

To run the demo

cd $RRPN_ROOT
python ./tools/rotation_demo.py

The results are saved as txt files in $RRPN_ROOT/result.

Beyond the demo: installation for training and testing models

You can use the function get_rroidb() in $RRPN_ROOT/lib/rotation/data_extractor.py to manage your training data:

Each training sample should be stored in a Python dict like the one below (each ... stands for the value described in the comment):

im_info = {
    'gt_classes': ...,    # set to 1 (text is the only class)
    'max_classes': ...,   # set to 1 (text is the only class)
    'image': ...,         # path to the image file
    'boxes': ...,         # ground-truth boxes
    'flipped': ...,       # whether the image is flipped (not implemented)
    'gt_overlaps': ...,   # overlap with the text class
    'seg_areas': ...,     # area of each ground-truth region
    'height': ...,        # image height
    'width': ...,         # image width
    'max_overlaps': ...,  # maximum overlap for each ground-truth proposal
    'rotated': ...,       # random angle used to rotate the image
}
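
When writing your own data management function, a missing key is an easy mistake to make. The sketch below checks entries against the key names listed above. `missing_keys` and `check_roidb` are hypothetical helpers, not functions in the RRPN code base, and the sketch assumes the database is a list of such dicts. It checks only that the keys are present, not the types or shapes of the values.

```python
# Key names copied from the im_info description above.
REQUIRED_KEYS = (
    'gt_classes', 'max_classes', 'image', 'boxes', 'flipped', 'gt_overlaps',
    'seg_areas', 'height', 'width', 'max_overlaps', 'rotated',
)


def missing_keys(entry):
    """Return the required keys that are absent from one training sample."""
    return [key for key in REQUIRED_KEYS if key not in entry]


def check_roidb(roidb):
    """Raise ValueError for the first incomplete entry; return the entry count."""
    for index, entry in enumerate(roidb):
        missing = missing_keys(entry)
        if missing:
            raise ValueError(
                "roidb[%d] is missing keys: %s" % (index, ", ".join(missing)))
    return len(roidb)


# Usage: an entry with only the image path set is reported as incomplete.
print(missing_keys({'image': 'img_1.jpg'}))
```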

Then assign your database to the variable 'roidb' in the main function of $RRPN_ROOT/tools/train_net.py (line 116):

roidb = get_rroidb("train")  # change to your own data management function

Download pre-trained ImageNet models

Pre-trained ImageNet models can be downloaded for the networks described in the paper: VGG16.

cd $RRPN_ROOT
./data/scripts/fetch_imagenet_models.sh

VGG16 comes from the Caffe Model Zoo, but is provided here for your convenience. ZF was trained at MSRA.

Then you can train RRPN by typing:

./experiments/scripts/faster_rcnn_end2end.sh [GPU_ID] [NET] rrpn

[NET] is usually VGG16; [GPU_ID] is the index of the GPU to train on (for example, 0).

Trained RRPN networks are saved under the following directory (set to './' by default):

./

You can change this directory through the variable output_dir in $RRPN_ROOT/tools/train_net.py.

If you have any questions about this project, please send a message to Jianqi Ma ([email protected]). Enjoy!

Comments

required GPU Memory?

@mjq11302010044 I finally managed to run it :) thank you for your great support. I do not have cuDNN installed ... that might be an issue with the memory required by the demo? I am using a fairly new GPU, NVIDIA QUADRO-M4000 with 8GB! How is it possible I am running out of Memory? Also, I've resized your images, but still...:

libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.

Loaded network /home/vale/masterarbeit/H_AOSTD/data/faster_rcnn_models/vgg16_faster_rcnn.caffemodel
F1109 05:23:29.260838 22269 syncedmem.cpp:64] Check failed: error == cudaSuccess (2 vs. 0) out of memory

I guess I could optimize utilization with cuDNN. But is it necessary? Quite bad experience with cuDNN...

opened by idefix92 4
Rotated IOU greater than 1
The following example produces IOU greater than 1.

b1 = np.array([[46.83, 44.03, 3.9, 1.63, 0]], dtype=np.float32)
b2 = np.array([[46.83, 44.03, 1.63, 3.9, 1.45]], dtype=np.float32)
rbbox_overlaps(b1, b2)  # reported result: 1.35

The expected iou should be near 1. Is there something I'm missing here? Note that I changed the code to use angle in radians.
opened by shashanktyagi 2
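
A rotated-rectangle IoU like the one in this report can be cross-checked on the CPU with plain polygon clipping. The sketch below is not the repository's rbbox_overlaps kernel. It is an independent pure-Python check using Sutherland-Hodgman clipping, and it assumes boxes are given as (x_ctr, y_ctr, w, h, angle) with positive w and h and the angle in radians, as in the modified code the report describes. All function names are hypothetical. Because the intersection is never larger than either rectangle, the value it returns cannot exceed 1.

```python
import math


def rect_corners(box):
    """Corners of (x_ctr, y_ctr, w, h, angle_in_radians) in a consistent winding order."""
    cx, cy, w, h, angle = box
    c, s = math.cos(angle), math.sin(angle)
    half = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in half]


def polygon_area(points):
    """Shoelace formula; evaluates to 0 for fewer than 3 points."""
    total = 0.0
    for i in range(len(points)):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def clip_convex(subject, clipper):
    """Sutherland-Hodgman: clip `subject` against each edge of convex `clipper`."""
    output = list(subject)
    n = len(clipper)
    for i in range(n):
        ax, ay = clipper[i]
        bx, by = clipper[(i + 1) % n]
        points, output = output, []
        for j in range(len(points)):
            p, q = points[j], points[(j + 1) % len(points)]
            # Positive cross product: the point lies on the inner side of edge a->b.
            side_p = (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)
            side_q = (bx - ax) * (q[1] - ay) - (by - ay) * (q[0] - ax)
            if side_p >= 0:
                output.append(p)
            if side_p * side_q < 0:  # segment p->q crosses the clipping line
                t = side_p / (side_p - side_q)
                output.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return output


def rotated_iou(box_a, box_b):
    poly_a, poly_b = rect_corners(box_a), rect_corners(box_b)
    inter = polygon_area(clip_convex(poly_a, poly_b))
    union = polygon_area(poly_a) + polygon_area(poly_b) - inter
    return inter / union if union > 0 else 0.0


# The two boxes from the report above.
b1 = (46.83, 44.03, 3.9, 1.63, 0.0)
b2 = (46.83, 44.03, 1.63, 3.9, 1.45)
print(rotated_iou(b1, b2))
```

Once the swapped width and height are accounted for, the second box is the first one rotated by about 7 degrees (1.45 rad is roughly 83 degrees), so a value somewhat below 1 is expected from this check.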
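Input of the tilt angle θ for GT boxes

Hello. When reading the VOC dataset, how is the tilt angle θ read in? I looked at _load_pascal_annotation in datasets/pascal_voc.py and, just as in Faster R-CNN, it reads only the bndbox fields xmin, xmax, ymin, ymax; the tilt angle θ is not read. So how is the tilt angle θ of the GT boxes supplied?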

opened by mltloveyy 1
Check failed: error == cudaSuccess (2 vs. 0) out of memory *** Check failure stack trace: *** Aborted (core dumped)

When I run the code ./tools/rotation_demo.py, I get the following error:

Loaded network /RRPN/data/faster_rcnn_models/vgg16_faster_rcnn.caffemodel
Memory need is 426752000
Memory need is 426752000
Memory need is 106752000
Memory need is 106752000
Memory need is 213504000
Memory need is 213504000
Memory need is 53376000
Memory need is 53376000
Memory need is 106752000
F1010 11:27:03.680461 8528 syncedmem.cpp:57] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
Aborted (core dumped)

I have tried every available solution I could find, but none of them resolved this. I am compiling with cuDNN 7 and CUDA 9.0. I have downgraded both of them, and the problem was still not solved. I am using a GT 710 with 2 GB of memory. Can anyone help me here? I am not even sure whether this is a bug or a genuine hardware limitation, so before I go and buy a new GPU I would appreciate your help.

@mjq11302010044 @idefix92

opened by famunir 1
Training Speed and Iteration Count

Hello @mjq11302010044. I'm trying to train a model of my own, like the trained model that you shared.

But training is very slow. Each iteration takes around 5 seconds, and the iteration count is 490000.

Do you have any tricks to speed up training? What is the minimum iteration count needed to get the demo working?

opened by ghost 1
Changes to caffe comparing to the official one?

Is it possible to list the major changes in the version of Caffe you use? I want to know the potential issues and conflicts when merging it with a newer version of Caffe.

opened by xrf116 1
The pretrained model performance is worse than the paper claimed
Hi, I tested your pretrained VGG16 model on ICDAR2015 with no parameter changes. The following is the result of the official ICDAR2015 evaluation script. All these values are lower than what is claimed in the paper.

Calculated!{"recall": 0.6721232546942706, "precision": 0.7977142857142857, "hmean": 0.7295531748105567, "AP": 0}
opened by Godricly 1
Is the angle anti-clockwise?

The code of the function convert_region in rotate_polygon_nms_kernel.cu and rbbox_overlaps_kernel.cu seems to expect the angle parameter as an anti-clockwise value in degrees. Is my understanding correct?

I compared the result with OpenCV3's rotatedRectangleIntersection, and the results match.

opened by makefile 0
How to compile the "lib" for win10

My motherboard (ASUS Z370) does not seem to be compatible with Linux. Does anyone have tips for compiling the lib so that it can run on Windows 10? Thank you!

opened by YanShuang17 0
rotate_roi_align_layer.cu(147): error

rotate_roi_align_layer.cu(147): error: calling a constexpr host function("fmax") from a global function("RotateROIAlignForward") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

opened by zhao-97 0
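Reproducing the paper's accuracy

Hello, and thank you for your great work. I have a few questions. 1. To reproduce the ICDAR2015 accuracy reported in the paper, should I simply follow the README? I ran it exactly as the README describes, and the result is about 20 points lower than in the paper. 2. I see that the Caffe code does not implement data augmentation, while the paper says data augmentation was used. Was it left out of the released code? 3. The training schedule in the paper uses a learning rate of 1e-3 for the first 200,000 iterations and 1e-4 for the following 100,000, which does not match the training schedule in the code.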

opened by cjt222 0
can not train model

I0107 11:43:32.129838 14961 layer_factory.hpp:77] Creating layer rpn_loss_cls
I0107 11:43:32.132026 14961 net.cpp:150] Setting up rpn_loss_cls
I0107 11:43:32.132074 14961 net.cpp:157] Top shape: (1)
I0107 11:43:32.132081 14961 net.cpp:160] with loss weight 1
I0107 11:43:32.132100 14961 net.cpp:165] Memory required for data: 298545136
I0107 11:43:32.132104 14961 layer_factory.hpp:77] Creating layer rpn_loss_bbox
I0107 11:43:32.132115 14961 net.cpp:106] Creating Layer rpn_loss_bbox
I0107 11:43:32.132118 14961 net.cpp:454] rpn_loss_bbox <- rpn_bbox_pred_rpn_bbox_pred_0_split_0
I0107 11:43:32.132123 14961 net.cpp:454] rpn_loss_bbox <- rpn_bbox_targets
I0107 11:43:32.132127 14961 net.cpp:454] rpn_loss_bbox <- rpn_bbox_inside_weights
I0107 11:43:32.132129 14961 net.cpp:454] rpn_loss_bbox <- rpn_bbox_outside_weights
I0107 11:43:32.132133 14961 net.cpp:411] rpn_loss_bbox -> rpn_loss_bbox
F0107 11:43:32.132158 14961 smooth_L1_loss_layer.cpp:28] Check failed: bottom[0]->channels() == bottom[1]->channels() (225 vs. 270)
*** Check failure stack trace: ***
./experiments/scripts/faster_rcnn_end2end.sh: line 78: 14961 Aborted (core dumped) ./tools/train_net.py --gpu ${GPU_ID} --solver /data/wuxl/RRPN2/models/${PT_DIR}/${NET}/faster_rcnn_end2end/solver.prototxt --weights data/imagenet_models/${NET}.v2.caffemodel --imdb ${TRAIN_IMDB} --iters ${ITERS} --cfg experiments/cfgs/faster_rcnn_end2end.yml ${EXTRA_ARGS}

real 0m18.619s
user 0m17.324s
sys 0m1.984s

How can I solve this problem?

opened by wuxiaolianggit 0

This is a C++ project deploying a deep scene text reading pipeline with TensorFlow. It reads text from natural scene images. It uses frozen TensorFlow graphs. The detector detects scene text locations. The recognizer reads a word from each detected bounding box.

DeepSceneTextReader This is a C++ project deploying a deep scene text reading pipeline. It reads text from natural scene images. Prerequisites The proj

49 Sep 10, 2022

YOLOv5 in DOTA with CSL_label.(Oriented Object Detection)（Rotation Detection）（Rotated BBox）

YOLOv5_DOTA_OBB YOLOv5 in DOTA_OBB dataset with CSL_label.(Oriented Object Detection) Datasets and pretrained checkpoint Datasets : DOTA Pretrained Ch

1.1k Dec 30, 2022

This project modifies the TensorFlow Object Detection API code to predict oriented bounding boxes. It can be used for scene text detection.

This is an oriented object detector based on tensorflow object detection API. Most of the code is not changed except for those related to the need of

30 Oct 22, 2022

Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation

This is the official implementation of "Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation". For more details, please

309 Dec 6, 2022

An implementation of the algorithm in the paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection

InceptText-Tensorflow An implementation of the algorithm in the paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Orien

115 Dec 12, 2022

Motion detector, Full body detection, Upper body detection, Cat face detection, Smile detection, Face detection (haar cascade), Silverware detection, Face detection (lbp), and Sending email notifications

Security camera running OpenCV for object and motion detection. The camera will send email with image of any objects it detects. It also runs a server that provides web interface with live stream video.

10 Jun 30, 2021

TextBoxes++: A Single-Shot Oriented Scene Text Detector

TextBoxes++: A Single-Shot Oriented Scene Text Detector Introduction This is an application for scene text detection (TextBoxes++) and recognition (CR

930 Jan 4, 2023

A novel region proposal network for more general object detection ( including scene text detection ).

DeRPN: Taking a further step toward more general object detection DeRPN is a novel region proposal network which concentrates on improving the adaptiv

Deep Learning and Vision Computing Lab, SCUT

151 Dec 12, 2022

The code of "Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes"

Mask TextSpotter A PyTorch implementation of Mask TextSpotter along with its extension can be found here Introduction This is the official implementati

261 Nov 21, 2022

Total Text Dataset. It consists of 1555 images with more than 3 different text orientations: Horizontal, Multi-Oriented, and Curved, one of a kind.

Total-Text-Dataset (Official site) Updated on April 29, 2020 (Detection leaderboard is updated - highlighted E2E methods. Thank you shine-lcy.) Update

671 Dec 27, 2022