A novel region proposal network for more general object detection (including scene text detection).

Overview

DeRPN: Taking a further step toward more general object detection

DeRPN is a novel region proposal network that concentrates on improving the adaptivity of current detectors. The paper is available here.

Recent Update

· Mar. 13, 2019: The DeRPN pretrained models are added.

· Jan. 25, 2019: The code is released.

Contact Us

You are welcome to help improve DeRPN. For any questions, please feel free to contact Lele Xie ([email protected]) or Prof. Jin ([email protected]).

Citation

If you find DeRPN useful to your research, please consider citing our paper as follows:

@article{xie2019DeRPN,
  title   = {DeRPN: Taking a further step toward more general object detection},
  author  = {Lele Xie and Yuliang Liu and Lianwen Jin and Zecheng Xie},
  journal = {AAAI},
  year    = {2019}
}

Main Results

Note: The reimplemented results differ slightly from those reported in the paper because of different training settings, but the conclusions remain consistent. For example, this code does not use multi-scale training, which would boost the results for both DeRPN and RPN.

COCO-Text

training data: COCO-Text train

test data: COCO-Text test

| network | backbone | AP@0.5 | Recall@0.5 | AP@0.75 | Recall@0.75 |
| --- | --- | --- | --- | --- | --- |
| RPN+Faster R-CNN | VGG16 | 32.48 | 52.54 | 7.40 | 17.59 |
| DeRPN+Faster R-CNN | VGG16 | 47.39 | 70.46 | 11.05 | 25.12 |
| RPN+R-FCN | ResNet-101 | 37.71 | 54.35 | 13.17 | 22.21 |
| DeRPN+R-FCN | ResNet-101 | 48.62 | 71.30 | 13.37 | 27.57 |

Pascal VOC

training data: VOC 07+12 trainval

test data: VOC 07 test

Inference time is evaluated on one TITAN XP GPU.

| network | backbone | inference time | AP@0.5 | AP@0.75 | AP |
| --- | --- | --- | --- | --- | --- |
| RPN+Faster R-CNN | VGG16 | 64 ms | 75.53 | 42.08 | 42.60 |
| DeRPN+Faster R-CNN | VGG16 | 65 ms | 76.17 | 44.97 | 43.84 |
| RPN+R-FCN | ResNet-101 | 85 ms | 78.87 | 54.30 | 50.04 |
| DeRPN+R-FCN (900)* | ResNet-101 | 84 ms | 79.21 | 54.43 | 50.28 |

( "*": On Pascal VOC dataset, we found that it is more suitable to train the DeRPN+R-FCN model with 900 proposals. For other experiments, we use the default proposal number to train the models, i.e., 2000 proposals fro Faster R-CNN, 300 proposals for R-FCN. )

MS COCO

training data: COCO 2017 train

test data: COCO 2017 test/val

| test set | network | backbone | AP | AP50 | AP75 | APS | APM | APL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| test | RPN+Faster R-CNN | VGG16 | 24.2 | 45.4 | 23.7 | 7.6 | 26.6 | 37.3 |
| test | DeRPN+Faster R-CNN | VGG16 | 25.5 | 47.2 | 25.2 | 10.3 | 27.9 | 36.7 |
| test | RPN+R-FCN | ResNet-101 | 27.7 | 47.9 | 29.0 | 10.1 | 30.2 | 40.1 |
| test | DeRPN+R-FCN | ResNet-101 | 28.4 | 49.0 | 29.5 | 11.1 | 31.7 | 40.5 |
| val | RPN+Faster R-CNN | VGG16 | 24.1 | 45.0 | 23.8 | 7.6 | 27.8 | 37.8 |
| val | DeRPN+Faster R-CNN | VGG16 | 25.5 | 47.3 | 25.0 | 9.9 | 28.8 | 37.8 |
| val | RPN+R-FCN | ResNet-101 | 27.8 | 48.1 | 28.8 | 10.4 | 31.2 | 42.5 |
| val | DeRPN+R-FCN | ResNet-101 | 28.4 | 48.5 | 29.5 | 11.5 | 32.9 | 42.0 |

Getting Started

  1. Requirements
  2. Installation
  3. Preparation for Training & Testing
  4. Usage

Requirements

  1. CUDA 8.0 and cuDNN 5.1.
  2. Some Python packages: cython, opencv-python, easydict, etc. Simply install any that are missing from your system (see the sketch after this list).
  3. Configure Caffe according to your environment (Caffe installation instructions). As the code requires pycaffe, Caffe must be built with Python layer support. In Makefile.config, make sure to uncomment this line:

    WITH_PYTHON_LAYER := 1
  4. An NVIDIA GPU with more than 6 GB of memory is required for ResNet-101.
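
A minimal sketch for items 2 and 3, assuming a fresh environment and that Caffe's Makefile.config.example is used as the starting point (adapt the package list and paths to your system):

    # install the commonly missing Python packages
    pip install cython opencv-python easydict

    # enable Python layers in Caffe (needed for the pycaffe layers used by DeRPN)
    cd $DeRPN_ROOT/caffe
    cp Makefile.config.example Makefile.config
    sed -i 's/^# *WITH_PYTHON_LAYER/WITH_PYTHON_LAYER/' Makefile.config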

Installation

  1. Clone the DeRPN repository

    git clone https://github.com/HCIILAB/DeRPN.git
    
  2. Build the Cython modules

    cd $DeRPN_ROOT/lib
    make
  3. Build caffe and pycaffe

    cd $DeRPN_ROOT/caffe
    make -j8 && make pycaffe

Preparation for Training & Testing

Dataset

  1. Download the datasets of Pascal VOC 2007 & 2012, MS COCO 2017 and COCO-Text.

  2. You need to put these datasets under the $DeRPN_ROOT/data folder (with symlinks; a sketch of the symlink setup follows this list).

  3. For COCO-Text, the folder structure is as follows:

    $DeRPN_ROOT/data/coco_text/images/train2014
    $DeRPN_ROOT/data/coco_text/images/val2014
    $DeRPN_ROOT/data/coco_text/annotations  
    # train2014, val2014, and annotations are symlinks from /pth_to_coco2014/train2014, 
    # /pth_to_coco2014/val2014 and /pth_to_coco2014/annotations2014/, respectively.
  4. For COCO, the folder structure is as follows:

    $DeRPN_ROOT/data/coco/images/train2017
    $DeRPN_ROOT/data/coco/images/val2017
    $DeRPN_ROOT/data/coco/images/test-dev2017
    $DeRPN_ROOT/data/coco/annotations  
    # the symlinks are similar to COCO-Text
  5. For Pascal VOC, the folder structure is as follows:

    $DeRPN_ROOT/data/VOCdevkit2007
    $DeRPN_ROOT/data/VOCdevkit2012
    # VOCdevkit2007 and VOCdevkit2012 are symlinks to $VOCdevkit, which contains VOC2007 and VOC2012.
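
A possible way to create the COCO-Text symlinks described in item 3, assuming the COCO 2014 images and annotations live under /pth_to_coco2014 as in the comments above (adjust the source paths to your setup; the MS COCO and Pascal VOC links are created analogously):

    mkdir -p $DeRPN_ROOT/data/coco_text/images
    ln -s /pth_to_coco2014/train2014 $DeRPN_ROOT/data/coco_text/images/train2014
    ln -s /pth_to_coco2014/val2014 $DeRPN_ROOT/data/coco_text/images/val2014
    ln -s /pth_to_coco2014/annotations2014 $DeRPN_ROOT/data/coco_text/annotations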

Pretrained models

Please download the ImageNet pretrained models (VGG16 and ResNet-101, password: k4z1), and put them under

$DeRPN_ROOT/data/imagenet_models

We also provide the DeRPN pretrained models here (password: fsd8).
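
For example, assuming the default data layout (the exact model file names depend on the downloaded archives):

    mkdir -p $DeRPN_ROOT/data/imagenet_models
    # copy the downloaded VGG16 and ResNet-101 caffemodels into this folder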

Usage

cd $DeRPN_ROOT
./experiments/scripts/faster_rcnn_derpn_end2end.sh [GPU_ID] [NET] [DATASET]

# e.g., ./experiments/scripts/faster_rcnn_derpn_end2end.sh 0 VGG16 coco_text

Copyright

This code is free for the academic community for research purposes only. For commercial use, please contact Dr. Lianwen Jin: [email protected].

Comments
  • about some code in tools/ron/generate_derpn_labels_targets_layer.py/self.eval_obs_region(gt_boxes)

    def eval_obs_region(self, gt_boxes):
        height, width = self.height, self.width
        observe_region = np.zeros((height, width), np.int32)
        # scale the ground-truth boxes from image coordinates to the feature map
        scaled_gt = gt_boxes[:, :4].copy() / float(self._feat_stride)
        gt_w = scaled_gt[:, 2] - scaled_gt[:, 0]
        gt_h = scaled_gt[:, 3] - scaled_gt[:, 1]
        gt_cx = (scaled_gt[:, 2] + scaled_gt[:, 0]) * 0.5
        gt_cy = (scaled_gt[:, 3] + scaled_gt[:, 1]) * 0.5

        # expand each box by extend_ratio around its center and clip to the feature map
        start_x = np.maximum((gt_cx - gt_w * self.extend_ratio * 0.5).astype(np.int32), 0)
        end_x = np.minimum((gt_cx + gt_w * self.extend_ratio * 0.5).astype(np.int32), width - 1)
        start_y = np.maximum((gt_cy - gt_h * self.extend_ratio * 0.5).astype(np.int32), 0)
        end_y = np.minimum((gt_cy + gt_h * self.extend_ratio * 0.5).astype(np.int32), height - 1)
        for ith in range(gt_boxes.shape[0]):
            # mark the expanded region of each ground-truth box as observed
            observe_region[start_y[ith]:end_y[ith] + 1, start_x[ith]:end_x[ith] + 1] = 1
        return observe_region, gt_cx, gt_cy


    I think the function of the above code is to project the ground-truth boxes onto the feature map, and that start_x, end_x, start_y and end_y compute the corresponding coordinates on the feature map. But what does self.extend_ratio mean, and why is it equal to 1.2? Or maybe I am missing the real purpose of the code above. Thanks a lot.

    opened by ztyxd 4
  •  Code optimization

    The paper is very interesting, and thank you very much for sharing the code. I encountered a NaN loss when I used multi-batch training. After checking, the code in generate_derpn_labels_targets_layer.py should be modified, except for the assertion code.

    Old:

        for i in range(cls_num_ofchannel.shape[1]):
            cls_weights[0, i, :, :].fill(cls_num_ofchannel[0, i])
        for i in range(reg_num_ofchannel.shape[1]):
            reg_weights[0, i, :, :].fill(reg_num_ofchannel[0, i])

    New:

        for ith_im in range(num_images):
            for i in range(cls_num_ofchannel.shape[1]):
                cls_weights[ith_im, i, :, :].fill(cls_num_ofchannel[ith_im, i])
            for i in range(reg_num_ofchannel.shape[1]):
                reg_weights[ith_im, i, :, :].fill(reg_num_ofchannel[ith_im, i])

    opened by Li-Lai 0
Owner
Deep Learning and Vision Computing Lab, SCUT