caffe re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

Overview

R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

Abstract

This is a caffe re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection.

This project is modified from py-R-FCN; the inclined NMS and rotated-box generation components are imported from the EAST project. Thanks to the authors (@zxytim, @argman) for their help. Please cite the paper if you find this project useful.

Contents

  1. Abstract
  2. Structure
  3. Installation
  4. Demo
  5. Test
  6. Train
  7. Experiments
  8. Furthermore

Structure

Code structure

.
├── docker-compose.yml
├── docker // docker deps file
├── Dockerfile // docker build file
├── model // model directory
│   ├── caffemodel // trained caffe model
│   ├── icdar15_gt // ICDAR2015 groundtruth
│   ├── prototxt // caffe prototxt file
│   └── imagenet_models // pretrained on imagenet
├── nvidia-docker-compose.yml
├── logs
│   ├── submit // original submit file
│   ├── submit_zip // zip submit file
│   ├── snapshots
│   └── train
│       ├── VGG16.txt.*
│       └── snapshots
├── README.md
├── requirements.txt // Python packages
├── src
│   ├── cfgs // train config yml
│   ├── data // cache file
│   ├── lib
│   ├── _init_path.py
│   ├── demo.py
│   ├── eval_icdar15.py // evaluate F-measure on the ICDAR2015 dataset
│   ├── test_net.py
│   └── train_net.py
├── demo.sh
├── train.sh
├── images // test images
│   ├── img_1.jpg
│   ├── img_2.jpg
│   ├── img_3.jpg
│   ├── img_4.jpg
│   └── img_5.jpg
└── test.sh // test script

Data structure

It should have this basic structure

ICDARdevkit_Root
.
├── ICDAR2013
├── merge_train.txt  // image list covering the ICDAR2013 + ICDAR2015 training sets, with raw data augmentation as in the paper
├── ICDAR2015
│   ├── augmentation // contains all augmented images
│   └── ImageSets/Main/test.txt // ICDAR2015 test images list
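
If you need to assemble merge_train.txt yourself, a minimal sketch is shown below. The per-dataset list filenames are assumptions, not the project's actual layout; adjust them to wherever your ICDAR copies keep their image lists.

    # Hypothetical sketch: concatenate the two training image lists into merge_train.txt.
    # The list filenames below are assumptions; use whatever lists your ICDAR copies provide.
    cd /path/to/ICDARdevkit_Root
    cat ICDAR2013/ImageSets/Main/train.txt \
        ICDAR2015/ImageSets/Main/train.txt > merge_train.txt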

Installation

Install caffe

It is highly recommended to use Docker to build the environment. For details on configuring Docker, see Running with Docker. If you are familiar with Docker, please run:

    1. nvidia-docker-compose run --rm --service-ports rrcnn bash
    2. bash ./demo.sh

If you are not familiar with Docker, please follow py-R-FCN to install Caffe.

Build

    cd src/lib && make
    

Download Model

  1. Please download the VGG16 model pre-trained on ImageNet and place it at model/imagenet_models/VGG16.v2.caffemodel.
  2. Please download the VGG16 model trained by this project and place it at model/caffemodel/TextBoxes-v2_iter_12w.caffemodel.
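
For example, after downloading the two caffemodels to the current directory, they can be placed like this (a minimal sketch; the download URLs are omitted on purpose):

    # create the target directories and move the downloaded weights into place
    mkdir -p model/imagenet_models model/caffemodel
    mv VGG16.v2.caffemodel model/imagenet_models/
    mv TextBoxes-v2_iter_12w.caffemodel model/caffemodel/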

Demo

It is recommended to use a UNIX socket to support GUI display from Docker. Please open another terminal and type:

    xhost +  # you may need this when opening a new terminal
    # docker-compose.yml mounts the host volume /tmp/.X11-unix into the container at /tmp/.X11-unix
    # pass the DISPLAY variable to the container so the host X server can display images from Docker
    docker exec -it -e DISPLAY=$DISPLAY ${CURRENT_CONTAINER_ID} bash
    bash ./demo.sh
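
If you are unsure what to use for ${CURRENT_CONTAINER_ID}, one way to look it up is shown below (a sketch that assumes the compose service is named rrcnn, as in the run command above):

    # grab the ID of the running container started from the rrcnn service
    CURRENT_CONTAINER_ID=$(docker ps --filter "name=rrcnn" --format "{{.ID}}" | head -n 1)
    echo ${CURRENT_CONTAINER_ID}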

Test

Single Test

    bash ./test.sh

Multi-scale Test

    # please uncomment two lines in src/cfgs/faster_rcnn_end2end.yml
    SCALES: [720, 1200]
    MULTI_SCALES_NOC: True
    # modify src/lib/datasets/icdar.py to find ICDAR2015 test data, please refer to commit @bbac1cf
    # then run
    bash ./test.sh
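
If you prefer not to edit the yml by hand, the two options can also be uncommented from the shell. This is only a sketch and assumes they are commented out with a leading # in src/cfgs/faster_rcnn_end2end.yml:

    # strip the comment marker in front of SCALES and MULTI_SCALES_NOC (adjust if your yml differs)
    sed -i -E 's/^([[:space:]]*)#[[:space:]]*(SCALES:|MULTI_SCALES_NOC:)/\1\2/' src/cfgs/faster_rcnn_end2end.yml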

Train

Train data

  • Mine: the ICDAR2013 + ICDAR2015 training sets with raw data augmentation, yielding 15977 images in total.
  • Paper: ICDAR2015 plus 2000 focused scene text images collected by the authors.

Train commands

  1. Go to ./src/lib/datasets/icdar.py and modify the image paths so that training can find the merge_train.txt image list.
  2. Remove the cache files in src/data/*.pkl, or download this project's cached roidb data and place it in src/data/. A combined sketch is shown after the note below.
    # Train for RRCNN4-TextBoxes-v2-OHEM
    bash ./train.sh

Note: if you set USE_FLIPPED=True and USE_FLIPPED_QUAD=True, you will get roughly 31200 roidb entries.
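
Putting the two steps above together, a typical training run looks roughly like this (a sketch; the filename of the project's cached roidb is an assumption):

    # start from a clean cache, or drop in the pre-built roidb cache instead
    rm -f src/data/*.pkl
    # cp /path/to/cached_roidb.pkl src/data/   # optional: reuse this project's cached roidb
    bash ./train.sh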

Experiments

Mine vs. Paper

| Approaches | Anchor scales | Pooled sizes | Inclined NMS | Test scales (short side) | F-measure (Mine vs. Paper) |
|---|---|---|---|---|---|
| R2CNN-2 | (4, 8, 16) | (7, 7) | Y | (720) | 71.12% vs. 68.49% |
| R2CNN-3 | (4, 8, 16) | (7, 7) | Y | (720) | 73.10% vs. 74.29% |
| R2CNN-4 | (4, 8, 16, 32) | (7, 7) | Y | (720) | 74.14% vs. 74.36% |
| R2CNN-4 | (4, 8, 16, 32) | (7, 7) | Y | (720, 1200) | 79.05% vs. 81.80% |
| R2CNN-5 | (4, 8, 16, 32) | (7, 7), (11, 3), (3, 11) | Y | (720) | 74.34% vs. 75.34% |
| R2CNN-5 | (4, 8, 16, 32) | (7, 7), (11, 3), (3, 11) | Y | (720, 1200) | 78.70% vs. 82.54% |

Appendix

| Approaches | Anchor scales | Aspect ratios | Pooled sizes | Inclined NMS | Test scales (short side) | F-measure |
|---|---|---|---|---|---|---|
| R2CNN-4 | (4, 8, 16, 32) | (0.5, 1, 2) | (7, 7) | Y | (720) | 74.36% |
| R2CNN-4 | (4, 8, 16, 32) | (0.5, 1, 2) | (7, 7) | Y | (720, 1200) | vs. 81.80% |
| R2CNN-4-TextBoxes-OHEM | (4, 8, 16, 32) | (0.5, 1, 2, 3, 5, 7, 10) | (7, 7) | Y | (720) | 76.53% |

Furthermore

You can also try other backbones such as ResNet-50 and ResNet-101.
