caffe re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

candler

Last update: Dec 28, 2021


Overview

R²CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

Abstract

This is a Caffe re-implementation of R²CNN: Rotational Region CNN for Orientation Robust Scene Text Detection.

This project is modified from py-R-FCN; the inclined NMS and rotated-box generation components are imported from the EAST project. Thanks to the authors (@zxytim, @argman) for their help. Please cite the paper if you find this useful.
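Inclined NMS suppresses duplicate detections by measuring overlap between rotated quadrilaterals rather than axis-aligned rectangles. The sketch below is a minimal pure-Python illustration, assuming each detection is a convex quadrilateral given as four (x, y) points; it computes the intersection area with Sutherland-Hodgman clipping and then runs greedy NMS. It is not the compiled component this project imports from EAST, and the function names and the 0.3 threshold are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def signed_area(poly: Sequence[Point]) -> float:
    """Shoelace formula; positive for counter-clockwise order in a y-up frame."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def to_ccw(poly: Sequence[Point]) -> List[Point]:
    """Return the polygon with counter-clockwise vertex order."""
    return list(poly) if signed_area(poly) >= 0 else list(reversed(poly))


def _inside(p: Point, a: Point, b: Point) -> bool:
    # p lies left of (or on) the directed edge a->b, i.e. inside a CCW polygon.
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0


def _intersect(p: Point, q: Point, a: Point, b: Point) -> Point:
    # Intersection of the line through p, q with the line through a, b.
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))


def clip(subject: Sequence[Point], clipper: Sequence[Point]) -> List[Point]:
    """Sutherland-Hodgman clipping of subject by a convex CCW clipper."""
    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        candidates, output = output, []
        for j in range(len(candidates)):
            cur, prev = candidates[j], candidates[j - 1]
            if _inside(cur, a, b):
                if not _inside(prev, a, b):
                    output.append(_intersect(prev, cur, a, b))
                output.append(cur)
            elif _inside(prev, a, b):
                output.append(_intersect(prev, cur, a, b))
        if not output:
            break  # no overlap left to clip
    return output


def quad_iou(p: Sequence[Point], q: Sequence[Point]) -> float:
    """IoU of two convex quadrilaterals given in clockwise or counter-clockwise order."""
    p, q = to_ccw(p), to_ccw(q)
    inter = clip(p, q)
    inter_area = abs(signed_area(inter)) if len(inter) >= 3 else 0.0
    union = abs(signed_area(p)) + abs(signed_area(q)) - inter_area
    return inter_area / union if union > 0 else 0.0


def inclined_nms(quads, scores, iou_threshold=0.3):
    """Greedy NMS: keep the best-scoring quad, drop others that overlap it too much."""
    order = sorted(range(len(quads)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [j for j in order if quad_iou(quads[best], quads[j]) <= iou_threshold]
    return keep
```

Greedy NMS like this is quadratic in the number of boxes, so a compiled implementation is preferable when there are many detections per image.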

Contents

Abstract
Structure
Installation
Demo
Test
Train
Experiments
Furthermore

Structure

Code structure

.
├── docker-compose.yml
├── docker // docker deps file
├── Dockerfile // docker build file
├── model // model directory
│   ├── caffemodel // trained caffe model
│   ├── icdar15_gt // ICDAR2015 groundtruth
│   ├── prototxt // caffe prototxt file
│   └── imagenet_models // pretrained on imagenet
├── nvidia-docker-compose.yml
├── logs
│   ├── submit // original submission files
│   ├── submit_zip // zipped submission files
│   ├── snapshots
│   └── train
│       ├── VGG16.txt.*
│       └── snapshots
├── README.md
├── requirements.txt // python package
├── src
│   ├── cfgs // train config yml
│   ├── data // cache file
│   ├── lib
│   ├── _init_path.py
│   ├── demo.py
│   ├── eval_icdar15.py // evaluate F-measure on the ICDAR2015 dataset
│   ├── test_net.py
│   └── train_net.py
├── demo.sh
├── train.sh
├── images // test images
│   ├── img_1.jpg
│   ├── img_2.jpg
│   ├── img_3.jpg
│   ├── img_4.jpg
│   └── img_5.jpg
└── test.sh // test script

Data structure

The dataset root should have the following basic structure:

ICDARdevkit_Root
.
├── ICDAR2013
├── merge_train.txt  // image list of the ICDAR2013 + ICDAR2015 training sets, after raw data augmentation as in the paper
├── ICDAR2015
│   ├── augmentation // contains all augmented images
│   └── ImageSets/Main/test.txt // ICDAR2015 test images list

Installation

Install Caffe

It is highly recommended to use Docker to build the environment. For details on configuring Docker, see Running with Docker. If you are familiar with Docker, run:

    1. nvidia-docker-compose run --rm --service-ports rrcnn bash
    2. bash ./demo.sh

If you are not familiar with Docker, follow the py-R-FCN instructions to install Caffe.

Build

    cd src/lib && make

Download Model

Download the VGG16 model pre-trained on ImageNet and place it at model/imagenet_models/VGG16.v2.caffemodel.
Download the VGG16 model trained by this project and place it at model/caffemodel/TextBoxes-v2_iter_12w.caffemodel.

Demo

A UNIX socket is recommended for GUI support in Docker. Open another terminal and type:

    xhost +  # you may need this when opening a new terminal
    # docker-compose.yml mounts the host volume /tmp/.X11-unix to the container volume /tmp/.X11-unix
    # pass the DISPLAY variable to the container so the host X server can display images from inside Docker
    docker exec -it -e DISPLAY=$DISPLAY ${CURRENT_CONTAINER_ID} bash
    bash ./demo.sh

Test

Single Test

    bash ./test.sh

Multi-scale Test

    # please uncomment two lines in src/cfgs/faster_rcnn_end2end.yml
    SCALES: [720, 1200]
    MULTI_SCALES_NOC: True
    # modify src/lib/datasets/icdar.py so that it finds the ICDAR2015 test data (refer to commit @bbac1cf)
    # then run
    bash ./test.sh
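Test scales in this README refer to the image's short side. As an illustration, the sketch below computes one resize factor per test scale in the way py-faster-rcnn's test code computes its resize factor: scale the short side to the target, unless the long side would exceed a cap. The `max_size` value and the function name are hypothetical, not this project's settings. Detections from each scale would then be divided by that scale's factor to map them back to original image coordinates before merging (for example with inclined NMS).

```python
from typing import List, Sequence


def short_side_scales(height: int, width: int, test_scales: Sequence[int],
                      max_size: int) -> List[float]:
    """Return one resize factor per test scale.

    Each factor scales the image's short side to the target length, unless
    that would make the long side exceed max_size (a hypothetical cap here).
    """
    short_side, long_side = min(height, width), max(height, width)
    factors = []
    for target in test_scales:
        factor = float(target) / short_side
        if round(factor * long_side) > max_size:
            # Fall back to fitting the long side inside the cap.
            factor = float(max_size) / long_side
        factors.append(factor)
    return factors


# A 1280x720 ICDAR2015 image tested at short sides 720 and 1200.
print(short_side_scales(720, 1280, (720, 1200), max_size=2000))  # [1.0, 1.5625]
```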

Train

Train data

Mine: the ICDAR2013 and ICDAR2015 training sets with raw data augmentation, 15,977 images in total.

Paper: ICDAR2015 plus 2,000 focused scene text images collected by the authors.

Train commands

Edit ./src/lib/datasets/icdar.py and modify the image paths so that train.py can find the merge_train.txt image list.
Remove the cache files src/data/*.pkl, or load this project's cached roidb data and place it in src/data/.

    # Train for RRCNN4-TextBoxes-v2-OHEM
    bash ./train.sh

Note: if you set USE_FLIPPED=True and USE_FLIPPED_QUAD=True, you will get almost 31,200 roidb entries.
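Flipping adds a horizontally mirrored copy of each training image, so its quadrilateral ground truth must be mirrored too. The sketch below is one way to do that, assuming boxes are stored as eight values [x1, y1, ..., x4, y4] ordered clockwise from the top-left corner, and using the `width - x - 1` convention of py-faster-rcnn's horizontal box flip. The function name is hypothetical, and this project's USE_FLIPPED_QUAD code may store or order points differently.

```python
from typing import List, Sequence


def flip_quad_horizontally(quad: Sequence[float], image_width: int) -> List[float]:
    """Mirror an 8-value quad about the image's vertical axis.

    Assumes points are ordered TL, TR, BR, BL. Mirroring turns the old TR
    into the new TL (and BL into BR), so points are reordered 2, 1, 4, 3
    to keep a clockwise order starting from the top-left.
    """
    xs = [image_width - 1 - quad[i] for i in range(0, 8, 2)]
    ys = [quad[i] for i in range(1, 8, 2)]
    out: List[float] = []
    for k in (1, 0, 3, 2):  # new TL, TR, BR, BL taken from mirrored old points
        out.extend([xs[k], ys[k]])
    return out


print(flip_quad_horizontally([10, 20, 50, 20, 50, 40, 10, 40], image_width=100))
# [49, 20, 89, 20, 89, 40, 49, 40]
```

Flipping twice returns the original quad, which makes a quick sanity check for any flip implementation.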

Experiments

Mine vs. Paper

Approaches	Anchor scales	Pooled sizes	Inclined NMS	Test scales (short side)	F-measure (mine VS paper)
R²CNN-2	(4, 8, 16)	(7, 7)	Y	(720)	71.12% VS 68.49%
R²CNN-3	(4, 8, 16)	(7, 7)	Y	(720)	73.10% VS 74.29%
R²CNN-4	(4, 8, 16, 32)	(7, 7)	Y	(720)	74.14% VS 74.36%
R²CNN-4	(4, 8, 16, 32)	(7, 7)	Y	(720, 1200)	79.05% VS 81.80%
R²CNN-5	(4, 8, 16, 32)	(7, 7) (11, 3) (3, 11)	Y	(720)	74.34% VS 75.34%
R²CNN-5	(4, 8, 16, 32)	(7, 7) (11, 3) (3, 11)	Y	(720, 1200)	78.70% VS 82.54%
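The F-measure reported above is the harmonic mean of precision and recall. A minimal sketch follows, assuming the one-to-one matching of detections to ground truth has already been done (the ICDAR2015 protocol counts a detection as matched when its IoU with a ground-truth box exceeds 0.5). The function name is hypothetical, and eval_icdar15.py may differ in details such as how "don't care" regions are handled.

```python
from typing import Tuple


def f_measure(num_matched: int, num_gt: int, num_det: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) from matched, ground-truth and detection counts."""
    precision = num_matched / num_det if num_det else 0.0
    recall = num_matched / num_gt if num_gt else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# 80 matched detections out of 120, against 100 ground-truth boxes.
p, r, f = f_measure(80, 100, 120)
print(round(f, 4))  # 0.7273
```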

Appendixes

Approaches	Anchor scales	Aspect ratios	Pooled sizes	Inclined NMS	Test scales (short side)	F-measure
R²CNN-4	(4, 8, 16, 32)	(0.5, 1, 2)	(7, 7)	Y	(720)	74.36%
R²CNN-4	(4, 8, 16, 32)	(0.5, 1, 2)	(7, 7)	Y	(720, 1200)	VS 81.80%
R²CNN-4-TextBoxes-OHEM	(4, 8, 16, 32)	(0.5, 1, 2, 3, 5, 7, 10)	(7, 7)	Y	(720)	76.53%
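The anchor scales and aspect ratios above determine how many anchors sit at each feature-map position: one per (ratio, scale) pair, so 3 × 4 = 12 for R²CNN-4 and 7 × 4 = 28 for the TextBoxes-style variant. The simplified sketch below assumes a base size of 16 pixels (py-faster-rcnn's default for VGG16) and ratio = height / width. The function name is hypothetical, and py-faster-rcnn's `generate_anchors` rounds intermediate sizes, so its exact coordinates differ slightly.

```python
import math
from typing import List, Sequence, Tuple


def make_anchors(base_size: int = 16,
                 ratios: Sequence[float] = (0.5, 1, 2),
                 scales: Sequence[float] = (4, 8, 16, 32)) -> List[Tuple[float, float, float, float]]:
    """Return (x1, y1, x2, y2) anchors centred on one base_size x base_size cell.

    Each anchor has area (base_size * scale) ** 2 and height / width == ratio.
    Widths follow the inclusive-pixel convention: x2 - x1 + 1 == w.
    """
    cx = cy = (base_size - 1) / 2.0
    anchors = []
    for r in ratios:
        for s in scales:
            side = base_size * s
            w = side / math.sqrt(r)
            h = side * math.sqrt(r)
            anchors.append((cx - (w - 1) / 2, cy - (h - 1) / 2,
                            cx + (w - 1) / 2, cy + (h - 1) / 2))
    return anchors


print(len(make_anchors()))  # 12
print(len(make_anchors(ratios=(0.5, 1, 2, 3, 5, 7, 10))))  # 28
```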

Furthermore

You can also try ResNet-50, ResNet-101, and other backbones.

Comments

Message type "caffe.LayerParameter" has no field named "roi_pooling_param"

    make: Entering directory '/home/xuy/code/R2CNN/src/lib/fastnms'
    make: 'adaptor.so' is up to date.
    make: Leaving directory '/home/xuy/code/R2CNN/src/lib/fastnms'
    WARNING: Logging before InitGoogleLogging() is written to STDERR
    W0407 13:56:59.493078 26902 _caffe.cpp:139] DEPRECATION WARNING - deprecated use of Python interface
    W0407 13:56:59.493132 26902 _caffe.cpp:140] Use this instead (with the named "weights" parameter):
    W0407 13:56:59.493141 26902 _caffe.cpp:142] Net('/home/xuy/code/R2CNN/model/prototxt/test/TextBoxes-v3.prototxt', 1, weights='/home/xuy/code/R2CNN/model/caffemodel/TextBoxes-v2_iter_12w.caffemodel')
    [libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 483:21: Message type "caffe.LayerParameter" has no field named "roi_pooling_param".
    F0407 13:56:59.494863 26902 upgrade_proto.cpp:88] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: /home/xuy/code/R2CNN/model/prototxt/test/TextBoxes-v3.prototxt
    *** Check failure stack trace: ***
    Aborted (core dumped)

opened by gittigxuy 9
struct problem

Thank you for sharing your code. I did not find the "logs" file in your project, but your "code structor" contains a "logs" file. Maybe I did something wrong? Looking forward to your reply.

opened by wenmingMeng 2
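Hello, can this code be used for detection in other scenarios, for example detecting several different kinds of vehicles (bus/car/train), i.e. a multi-class setting?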

Hello, when outputting test results I printed predbbox and score, and found that the box values are concentrated in the second bbox:

    -> print(scores[index])
    (Pdb) r
    [9.9988842e-01 6.6253820e-06 5.9292459e-05 4.0550061e-05 5.0738868e-06]
    [ 0. 0. 0. 0. 0. 0. 0. 0. 43.752 96.91591 482.25037 96.84546 482.31024 469.7978 43.811916 469.86823 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]

So I suspect your code was originally written to detect text, i.e. a single class. If I want to use it to detect multiple classes, is there anything I need to change?

opened by ch98road 1
Unable to train with the roidb you provided

Hello, I saw you wrote "or you can load cached roidb data of this project, and place it to src/data/". However, after downloading the code I found (https://github.com/beacandler/R2CNN/blob/19d23828885010de4f4411c4ccd01a3032416c4e/src/lib/roi_data_layer/roidb.py#L27) that your roidb has more than 15,000 entries, whereas with our ICDAR dataset len(imdb.image_index) is only 2,000. So most of your data cannot go through operations such as roidb[i]['image'] = imdb.image_path_at(i), roidb[i]['width'] = sizes[i][0] and roidb[i]['height'] = sizes[i][1], because i only goes up to 2,000. Could you please provide the dataset (its image information is needed for training), or the code you used to generate this merged dataset?

opened by Haiyan-Chris-Wang 0
