Rotational region detection based on Faster-RCNN.

Overview

R2CNN_Faster_RCNN_Tensorflow

Abstract

This is a TensorFlow re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection.
It should be noted that we did not re-implement the paper exactly; we only adopted its idea.

This project is based on Faster-RCNN and was completed by YangXue and YangJirui.

DOTA test results


Comparison

Some of the results are taken from the DOTA paper. In both tables, mAP is followed by the per-class AP for plane (PL), baseball diamond (BD), bridge (BR), ground track field (GTF), small vehicle (SV), large vehicle (LV), ship (SH), tennis court (TC), basketball court (BC), storage tank (ST), soccer-ball field (SBF), roundabout (RA), harbor (HA), swimming pool (SP), and helicopter (HC).

Task1 - Oriented Leaderboard

Approaches mAP PL BD BR GTF SV LV SH TC BC ST SBF RA HA SP HC
SSD 10.59 39.83 9.09 0.64 13.18 0.26 0.39 1.11 16.24 27.57 9.23 27.16 9.09 3.03 1.05 1.01
YOLOv2 21.39 39.57 20.29 36.58 23.42 8.85 2.09 4.82 44.34 38.35 34.65 16.02 37.62 47.23 25.5 7.45
R-FCN 26.79 37.8 38.21 3.64 37.26 6.74 2.6 5.59 22.85 46.93 66.04 33.37 47.15 10.6 25.19 17.96
FR-H 36.29 47.16 61 9.8 51.74 14.87 12.8 6.88 56.26 59.97 57.32 47.83 48.7 8.23 37.25 23.05
FR-O 52.93 79.09 69.12 17.17 63.49 34.2 37.16 36.2 89.19 69.6 58.96 49.4 52.52 46.69 44.8 46.3
R2CNN 60.67 80.94 65.75 35.34 67.44 59.92 50.91 55.81 90.67 66.92 72.39 55.06 52.23 55.14 53.35 48.22
RRPN 61.01 88.52 71.20 31.66 59.30 51.85 56.19 57.25 90.81 72.84 67.38 56.69 52.84 53.08 51.94 53.58
ICN 68.20 81.40 74.30 47.70 70.30 64.90 67.80 70.00 90.80 79.10 78.20 53.60 62.90 67.00 64.20 50.20
R2CNN++ 71.16 89.66 81.22 45.50 75.10 68.27 60.17 66.83 90.90 80.69 86.15 64.05 63.48 65.34 68.01 62.05

Task2 - Horizontal Leaderboard

Approaches mAP PL BD BR GTF SV LV SH TC BC ST SBF RA HA SP HC
SSD 10.94 44.74 11.21 6.22 6.91 2 10.24 11.34 15.59 12.56 17.94 14.73 4.55 4.55 0.53 1.01
YOLOv2 39.2 76.9 33.87 22.73 34.88 38.73 32.02 52.37 61.65 48.54 33.91 29.27 36.83 36.44 38.26 11.61
R-FCN 47.24 79.33 44.26 36.58 53.53 39.38 34.15 47.29 45.66 47.74 65.84 37.92 44.23 47.23 50.64 34.9
FR-H 60.46 80.32 77.55 32.86 68.13 53.66 52.49 50.04 90.41 75.05 59.59 57 49.81 61.69 56.46 41.85
R2CNN - - - - - - - - - - - - - - - -
FPN 72.00 88.70 75.10 52.60 59.20 69.40 78.80 84.50 90.60 81.30 82.60 52.50 62.10 76.60 66.30 60.10
ICN 72.50 90.00 77.70 53.40 73.30 73.50 65.00 78.20 90.80 79.10 84.80 57.20 62.10 73.50 70.20 58.10
R2CNN++ 75.35 90.18 81.88 55.30 73.29 72.09 77.65 78.06 90.91 82.44 86.39 64.53 63.45 75.77 78.21 60.11

Face Detection

Environment: NVIDIA GeForce GTX 1060
(face detection demo image omitted)

ICDAR2015

(ICDAR2015 demo image omitted)

Requirements

1. tensorflow >= 1.2
2. cuda 8.0
3. python 2.7 (Anaconda2 recommended)
4. opencv (cv2)
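
A quick way to confirm the environment matches these requirements (a minimal sketch, not part of the repository):

# check_env.py -- sanity check for the versions listed above
from __future__ import print_function
import tensorflow as tf
import cv2

print('tensorflow:', tf.__version__)                 # should be >= 1.2
print('opencv:', cv2.__version__)
print('gpu available:', tf.test.is_gpu_available())  # requires a CUDA-enabled build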

Download Model

1. Please download the resnet50_v1 and resnet101_v1 pre-trained models (ImageNet) and put them in data/pretrained_weights.
2. Please download the mobilenet_v2 pre-trained model (ImageNet) and put it in data/pretrained_weights/mobilenet.
3. Please download the model trained by this project and put it in output/trained_weights.

Data Preparation

1. Please download the DOTA dataset.
2. Crop the data; for reference:

cd $PATH_ROOT/data/io/DOTA
python train_crop.py 
python val_crop.py

3. Data format:

├── VOCdevkit
│   ├── VOCdevkit_train
│       ├── Annotation
│       ├── JPEGImages
│    ├── VOCdevkit_test
│       ├── Annotation
│       ├── JPEGImages
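
If you assemble this layout yourself, a quick pairing check such as the following (a sketch; the paths are placeholders) confirms that every image has a matching annotation file:

# pairing_check.py -- verify image/annotation correspondence (paths are placeholders)
from __future__ import print_function
import os

img_dir = 'VOCdevkit/VOCdevkit_train/JPEGImages'
ann_dir = 'VOCdevkit/VOCdevkit_train/Annotation'

imgs = {os.path.splitext(f)[0] for f in os.listdir(img_dir)}
anns = {os.path.splitext(f)[0] for f in os.listdir(ann_dir)}
print('images without annotation:', sorted(imgs - anns))
print('annotations without image:', sorted(anns - imgs))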

Compile

cd $PATH_ROOT/libs/box_utils/
python setup.py build_ext --inplace
cd $PATH_ROOT/libs/box_utils/cython_utils
python setup.py build_ext --inplace

Demo

Select a configuration file in the folder (libs/configs/) and copy its contents into cfgs.py, then download the corresponding weights.

DOTA

python demo_rh.py --src_folder='/PATH/TO/DOTA/IMAGES_ORIGINAL/' \
                  --image_ext='.png' \
                  --des_folder='/PATH/TO/SAVE/RESULTS/' \
                  --save_res=False \
                  --gpu='0'

FDDB

python camera_demo.py --gpu='0'

Eval

python eval.py --img_dir='/PATH/TO/DOTA/IMAGES/' \
               --image_ext='.png' \
               --test_annotation_path='/PATH/TO/TEST/ANNOTATION/' \
               --gpu='0'

Inference

python inference.py --data_dir='/PATH/TO/DOTA/IMAGES_CROP/' \
                    --gpu='0'

Train

1. If you want to train on your own data, please note:

(1) Modify parameters (such as CLASS_NUM, DATASET_NAME, VERSION, etc.) in $PATH_ROOT/libs/configs/cfgs.py
(2) Add category information in $PATH_ROOT/libs/label_name_dict/lable_dict.py (a hypothetical example follows this list)
(3) Add data_name to line 75 of $PATH_ROOT/data/io/read_tfrecord.py
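
A hypothetical example of what a category mapping for a new dataset could look like (the class names are illustrative only; mirror the structure of the existing entries in lable_dict.py):

# Illustrative category mapping for a custom dataset (names are placeholders).
NAME_LABEL_MAP = {
    'back_ground': 0,   # background is conventionally label 0
    'ship': 1,
    'plane': 2,
}
LABEL_NAME_MAP = {v: k for k, v in NAME_LABEL_MAP.items()}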

2. Make tfrecords:

cd $PATH_ROOT/data/io/  
python convert_data_to_tfrecord.py --VOC_dir='/PATH/TO/VOCdevkit/VOCdevkit_train/' \
                                   --xml_dir='Annotation' \
                                   --image_dir='JPEGImages' \
                                   --save_name='train' \
                                   --img_format='.png' \
                                   --dataset='DOTA'
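
To verify the conversion, counting the records in the generated tfrecord is a quick check (a sketch using the TF 1.x API; the path below is an assumption, adjust it to wherever convert_data_to_tfrecord.py writes its output):

# count_records.py -- count entries in a tfrecord (TF 1.x)
from __future__ import print_function
import tensorflow as tf

tfrecord_path = '/PATH/TO/tfrecord/DOTA_train.tfrecord'  # assumed location, adjust as needed
num_records = sum(1 for _ in tf.python_io.tf_record_iterator(tfrecord_path))
print('records:', num_records)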

3. Train:

cd $PATH_ROOT/tools
python train.py

Tensorboard

cd $PATH_ROOT/output/summary
tensorboard --logdir=.

Citation

Some related work based on this code:

@article{yang2018position,
	title={Position Detection and Direction Prediction for Arbitrary-Oriented Ships via Multitask Rotation Region Convolutional Neural Network},
	author={Yang, Xue and Sun, Hao and Sun, Xian and Yan, Menglong and Guo, Zhi and Fu, Kun},
	journal={IEEE Access},
	volume={6},
	pages={50839-50849},
	year={2018},
	publisher={IEEE},
	url={https://ieeexplore.ieee.org/document/8464244}
}

@article{yang2018r-dfpn,
	title={Automatic ship detection in remote sensing images from google earth of complex scenes based on multiscale rotation dense feature pyramid networks},
	author={Yang, Xue and Sun, Hao and Fu, Kun and Yang, Jirui and Sun, Xian and Yan, Menglong and Guo, Zhi},
	journal={Remote Sensing},
	volume={10},
	number={1},
	pages={132},
	year={2018},
	publisher={Multidisciplinary Digital Publishing Institute},
	url={http://www.mdpi.com/2072-4292/10/1/132}
}
Comments
  • How to test & the mAP=0.279 problem

    How to test & the mAP=0.279 problem

    @feufhd @yangxue0827 Sorry to bother you. I split the DOTA dataset with val_crop and ran eval.py for validation, but two problems came up: 1. When I run detection with demo_rh.py, the visual results look quite good, but the mAP is poor. The rotated detection results are as follows: `cls : baseball-diamond|| Recall: 0.3277027027027027 || Precison: 0.6799065420560748|| AP: 0.2857233511210732 mAP is : 0.27839696674514086


    rotation eval:
    Writing roundabout VOC results file Writing tennis-court VOC results file Writing swimming-pool VOC results file Writing storage-tank VOC results file Writing soccer-ball-field VOC results file Writing small-vehicle VOC results file Writing ship VOC results file Writing plane VOC results file Writing large-vehicle VOC results file Writing helicopter VOC results file Writing harbor VOC results file Writing ground-track-field VOC results file Writing bridge VOC results file Writing basketball-court VOC results file Writing baseball-diamond VOC results file
    cls : roundabout|| Recall: 0.05137844611528822 || Precison: 0.3942307692307692|| AP: 0.03896313329141163
    cls : tennis-court|| Recall: 0.7942961165048543 || Precison: 0.9273822174991144|| AP: 0.7841216647894594
    cls : swimming-pool|| Recall: 0.6034278959810875 || Precison: 0.4337298215802889|| AP: 0.4713228312417497
    cls : storage-tank|| Recall: 0.08919098143236075 || Precison: 0.8359229334990678|| AP: 0.08359899690789072
    cls : soccer-ball-field|| Recall: 0.4365482233502538 || Precison: 0.5|| AP: 0.37065461517016957
    cls : small-vehicle|| Recall: 0.26025719534598896 || Precison: 0.44350029815146097|| AP: 0.14424862685392775
    cls : ship|| Recall: 0.24484951836659447 || Precison: 0.7319509506751171|| AP: 0.20502145866938037
    cls : plane|| Recall: 0.4970338303671637 || Precison: 0.8035251425609123|| AP: 0.47595213365825684
    cls : large-vehicle|| Recall: 0.31162768312638367 || Precison: 0.6660152232051019|| AP: 0.23590015709743262
    cls : helicopter|| Recall: 0.2857142857142857 || Precison: 0.42105263157894735|| AP: 0.2161832557262215
    cls : harbor|| Recall: 0.1851032448377581 || Precison: 0.6090385198665453|| AP: 0.13665860486666898
    cls : ground-track-field|| Recall: 0.4742268041237113 || Precison: 0.5302593659942363|| AP: 0.37650906662703715
    cls : bridge|| Recall: 0.019583333333333335 || Precison: 0.27647058823529413|| AP: 0.006289862595004398
    cls : basketball-court|| Recall: 0.37198795180722893 || Precison: 0.674863387978142|| AP: 0.33368834624831495
    cls : baseball-diamond|| Recall: 0.3502252252252252 || Precison: 0.7300469483568075|| AP: 0.31682695380012843
    mAP is : 0.2797293138362036 `
    2. When I run detection with demo.py, the resulting _r.jpg and _h.jpg look very bad; basically no valid detection boxes are drawn.
    One more thing: after running val_crop.py, the images and annotation files obtained by splitting the validation set do not correspond one to one. After filtering by file name, 139 images have no matching annotation file and 139 annotation files have no matching image. So I made the files in val/images and val/annotation correspond one to one, then ran eval.py and got the results above.
    I will keep investigating why the mAP does not match the visual quality, and I would appreciate any help, thanks :)
    One guess: maybe my annotation files are wrong, so when the seemingly good detections are evaluated they are matched to incorrect ground truth and the score drops??

    opened by Smoothing97 25
  • Inconsistency between the code and the paper

    Inconsistency between the code and the paper

    Hello, the code in this repo only takes the last layer of ResNet as the feature, but the paper says a dense FPN structure is used. May I ask why? Is that part not released? The paper is Position Detection and Direction Prediction for Arbitrary-Oriented Ships via Multiscale Rotation Region Convolutional Neural Network @yangxue0827

    opened by destinyzs 11
  • FYI : Both train_crop.py and val_crop.py are generating wrong coordinates!

    FYI : Both train_crop.py and val_crop.py are generating wrong coordinates!

    FYI: after you tag your images using labelme, you run these scripts: train_crop.py and val_crop.py.

    The coordinates generated after cropping the images are wrong, and in some cases totally wrong. Don't rely on this; write your own code for generating cropped images.

    opened by dexception 8
  • Inference error

    Inference error

    I compiled the code and ran inference with TF 1.10.1; it reports

    keep = tf.py_func(rotate_gpu_nms, inp=[det_tensor, iou_threshold, device_id], Tout=tf.int64) *** TypeError: cannot create weak reference to 'builtin_function_or_method' object

    Could you help? Thanks

    opened by shenfalong 8
  • question regarding input image resize

    question regarding input image resize

    hello,

    Could you please shed some light on the input image resize that's performed in read_tfrecord.py -> read_and_prepocess_single_img -> image_preprocess.short_side_resize?

    Firstly, can you please explain why this is done?

    Additionally, can you also please explain why you have provided the default value of IMG_SHORT_SIDE_LEN = 800 in cfgs.py?

    Thanks, Omer
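
    For context, a short-side resize usually rescales the image so that its shorter side equals IMG_SHORT_SIDE_LEN while preserving the aspect ratio, optionally capping the longer side at IMG_MAX_LENGTH. A sketch of the general idea (illustrative only, not the repository's exact code):

    # General idea of a short-side resize (illustrative only).
    def short_side_resize_shape(h, w, short_side=800, max_len=1000):
        scale = short_side / float(min(h, w))   # scale so the shorter side becomes short_side
        if max(h, w) * scale > max_len:         # cap the longer side at max_len
            scale = max_len / float(max(h, w))
        return int(round(h * scale)), int(round(w * scale))

    print(short_side_resize_shape(600, 1200))   # -> (500, 1000): the 1000-pixel cap kicks in

    Resizing to a fixed shorter side keeps object and anchor scales consistent across images; 800 is a common choice for Faster-RCNN-style detectors.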

    opened by omerbrandis 7
  • Question about parameters in cfgs

    Question about parameters in cfgs

    Hello, I would like to ask about two parameters in lib/configs/cfgs: IMG_SHORT_SIDE_LEN = 800 and IMG_MAX_LENGTH = 1000. How are they used? It seems the IMG_MAX_LENGTH parameter is not actually used, so I have a few questions: 1) Can IMG_SHORT_SIDE_LEN be set arbitrarily? For example, if I have a batch of images whose shortest side is 200, must it be set to 200? 2) If I want to train on 512*512 images, how should these two parameters be set? Looking forward to your reply~

    opened by Tangzixia 7
  • Question about training results

    Question about training results

    Hello!

    I would like to ask: I used this R2CNN code with the provided trained weights 108000.ckpt and evaluated on the DOTA val set, and the final mAP is only 0.38. After tuning various parameters and retraining from scratch for 500k iterations, the mAP only reaches 0.48 at most. The same problem occurs with RRPN. I cannot figure out where this accuracy gap comes from. Could it be related to the machine or the configuration, or is there something I have overlooked?

    Hope you can shed some light on this, thanks~~

    opened by dexterod 5
  • why I just get  mAP = 0.48 using the provided trained model on val dataset of DOTA?

    why I just get mAP = 0.48 using the provided trained model on val dataset of DOTA?

    I used 108000.ckpt; however, I only got a total mAP of 0.48 on the DOTA val dataset. I didn't change any of the cfg parameters. I used the official DOTA_devkit to eval, and it seems some classes don't work well; for example, small vehicle only got about 0.2.

    opened by clw5180 5
  • Try to run

    Try to run "python setup.py build_ext --inplace" in Windows 10

    C:\Users\Downloads\Compressed\R2CNN_Faster-RCNN_Tensorflow-master\libs\box_utils>python setup.py build_ext --inplace
    running build_ext
    Traceback (most recent call last):
    File "setup.py", line 132, in cmdclass={'build_ext': custom_build_ext},
    File "C:\Users\Anaconda3\lib\distutils\core.py", line 148, in setup dist.run_commands()
    File "C:\Users\Anaconda3\lib\distutils\dist.py", line 966, in run_commands self.run_command(cmd)
    File "C:\Users\Anaconda3\lib\distutils\dist.py", line 985, in run_command cmd_obj.run()
    File "C:\Users\Anaconda3\lib\site-packages\Cython\Distutils\old_build_ext.py", line 186, in run _build_ext.build_ext.run(self)
    File "C:\Users\slcib\Anaconda3\lib\distutils\command\build_ext.py", line 340, in run self.build_extensions()
    File "setup.py", line 104, in build_extensions customize_compiler_for_nvcc(self.compiler)
    File "setup.py", line 77, in customize_compiler_for_nvcc default_compiler_so = self.compiler_so
    AttributeError: 'MSVCCompiler' object has no attribute 'compiler_so'

    opened by Cibi075 3
  • Bad result for horizontal/vertical object

    Bad result for horizontal/vertical object

    Hi, thanks for sharing your great work! I'm trying to use your model for text object detection. It predicts very well for inclined objects. However, for horizontal/vertical objects the rotated box is bad, as you can see in the images below: Horizontal object: https://drive.google.com/open?id=18yNNqqpmAEg6CAyqmXRGzvthAFpqOeFp Vertical object: https://drive.google.com/open?id=1M2PGxyXaBeYwgIPLv71w9veh0rfbCNrd

    Any suggestion on fixing this?

    Best, Vu

    opened by vudq 3
  • how to generate PB file

    how to generate PB file

    I used ./lib/export_pbs to generate a PB file, but it does not work. The error is as follows: ValueError: Shape must be rank 4 but is rank 5 for 'resnet_v1_101/Pad' (op: 'Pad') with input shapes: [1,1,?,?,3], [4,2]. Can you tell me how I can get the PB file? Thanks @yangxue0827

    opened by pureyangcry 3
  • Cannot use train_crop.py

    Cannot use train_crop.py

    ('class_list', 15)
    ('find image', 7)
    ('find label', 7)
    (0, 'read image', 'P2598.png')
    Traceback (most recent call last):
    File "train_crop.py", line 243, in clip_image(img.strip('.png'), img_data, box, 800, 800)
    File "train_crop.py", line 199, in clip_image save_to_xml(xml, subImage.shape[0], subImage.shape[1], box[idx, :], class_list)
    File "train_crop.py", line 116, in save_to_xml f = open(save_path,'w')
    IOError: [Errno 2] No such file or directory: 'rootdir/r2cnn/R2CNN_Faster-RCNN_Tensorflow/trail/save/labeltxt/P2598_0000_0256.xml'

    opened by prayagpawar 0
  •  ValueError: could not convert string to float: imagesource:GoogleEarth

    ValueError: could not convert string to float: imagesource:GoogleEarth

    File "txt2xml.py", line 124, in load_annoataion x1, y1, x2, y2, x3, y3, x4, y4 = list(map(float, line[:8])) ValueError: could not convert string to float: imagesource:GoogleEarth

    opened by prayagpawar 0
  • The inference result is very different from what you expected. Which part should be modified?.

    The inference result is very different from what you expected. Which part should be modified?.

    The inference result is very different from what was expected. Which part should be modified? No action has been taken other than following the guidelines presented in README.md.

    (attached images: parking02, jpg_h, infer_face01, infer_face02)

    opened by chungnyul 1
  • Error pop outs when adjusting parameter in cfgs.py.

    Error pop outs when adjusting parameter in cfgs.py.

    Hi, recently I want to improve the results on my own dataset. Because my results contain many false-positive predicted bboxes, I tried adjusting the parameters as shown below:

    1. FAST_RCNN_NMS_IOU_THRESHOLD = 0.5
    2. FAST_RCNN_IOU_POSITIVE_THRESHOLD = 0.8
    3. FAST_RCNN_IOU_NEGATIVE_THRESHOLD = 0.5

    When I was training, the console popped up an error message:

    2021-03-05 09:03:25.126176: F tensorflow/stream_executor/cuda/cuda_dnn.cc:430] could not convert BatchDescriptor {count: 0 feature_map_count: 512 spatial: 7 7 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX} to cudnn tensor descriptor: CUDNN_STATUS_BAD_PARAM Aborted (core dumped)

    Can someone tell me why? Thanks!

    opened by chrischu83117 0
  • Update train_crop.py

    Update train_crop.py

    Modify: int(xy) --> int(float(xy))

    Solving this issue:

    Traceback (most recent call last):
    File "data/io/DOTA/train_crop.py", line 244, in box = format_label(txt_data)
    File "data/io/DOTA/train_crop.py", line 136, in format_label [int(xy) for xy in i.split(' ')[:8]] + [class_list.index(i.split(' ')[8])]
    File "data/io/DOTA/train_crop.py", line 136, in [int(xy) for xy in i.split(' ')[:8]] + [class_list.index(i.split(' ')[8])]
    ValueError: invalid literal for int() with base 10: '1786.0'
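
    In other words, the parsing line shown in the traceback can be made tolerant of float-formatted coordinates such as '1786.0' by going through float first (a sketch of the changed expression):

    # Before: int(xy) raises ValueError on strings like '1786.0'
    # After:  parse as float first, then truncate to int
    box = [int(float(xy)) for xy in i.split(' ')[:8]] + [class_list.index(i.split(' ')[8])]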

    opened by mouyuanyap 0