TextField: Learning A Deep Direction Field for Irregular Scene Text Detection (TIP 2019)

Yukang Wang

Last update: Dec 12, 2022

Introduction

The code and trained models of:

TextField: Learning A Deep Direction Field for Irregular Scene Text Detection, TIP 2019 [Paper]

Citation

Please cite the related work in your publications if it helps your research:


@article{xu2018textfield,
  title={TextField: Learning A Deep Direction Field for Irregular Scene Text Detection},
  author={Xu, Yongchao and Wang, Yukang and Zhou, Wei and Wang, Yongpan and Yang, Zhibo and Bai, Xiang},
  journal={arXiv preprint arXiv:1812.01393},
  year={2018}
}

Prerequisite

Caffe and SynthText pretrained model [Link]
Datasets: [Total-Text], [ICDAR2015]
OpenCV 3.4.3
MATLAB

Usage

1. Install Caffe

cp Makefile.config.example Makefile.config
# adjust Makefile.config (for example, enable python layer)
make all -j16
# make sure to add $CAFFE_ROOT/python to your PYTHONPATH.
make pycaffe

Please refer to Caffe Installation to make sure the other dependencies are installed.
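
As a quick check that the PYTHONPATH step above took effect, a few lines of Python can test whether the caffe module can be located. This is only a minimal sketch: the helper name module_available is an illustrative assumption and is not part of the repository.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be located on the current sys.path."""
    return importlib.util.find_spec(name) is not None


# "caffe" is found only if $CAFFE_ROOT/python is on PYTHONPATH
# and `make pycaffe` has been run successfully.
caffe_ok = module_available("caffe")
print("pycaffe importable:", caffe_ok)
```

If this prints False, re-check the PYTHONPATH entry before running the training script.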

2. Data and model preparation

# download the datasets and the pretrained model, then
mkdir data && mv [your_dataset_folder] data/
mkdir models && mv [your_pretrained_model] models/
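
The layout produced by the commands above can be verified before training with a short script. This is a sketch under assumptions: the function name missing_items and the dataset folder name my_dataset are hypothetical, and the model file name is taken from the training command in step 3.

```python
from pathlib import Path
import tempfile


def missing_items(root, model_name="synth_iter_800000.caffemodel"):
    """List the pieces of the expected layout that are absent under `root`."""
    root = Path(root)
    missing = []
    data_dir = root / "data"
    # data/ must exist and contain at least one dataset folder.
    if not data_dir.is_dir() or not any(data_dir.iterdir()):
        missing.append("data/<dataset>")
    # The pretrained model is expected directly under models/.
    if not (root / "models" / model_name).is_file():
        missing.append("models/" + model_name)
    return missing


# Demo on a throwaway directory instead of a real checkout.
with tempfile.TemporaryDirectory() as tmp:
    before = missing_items(tmp)
    (Path(tmp) / "data" / "my_dataset").mkdir(parents=True)  # hypothetical folder name
    (Path(tmp) / "models").mkdir()
    (Path(tmp) / "models" / "synth_iter_800000.caffemodel").touch()
    after = missing_items(tmp)

print("before:", before)
print("after:", after)
```

In a real checkout, calling missing_items(".") from the repository root should return an empty list once both mv commands have been run.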

3. Training scripts

# an example on Total-Text dataset
cd examples/TextField/
python train.py --gpu [your_gpu_id] --dataset total --initmodel ../../models/synth_iter_800000.caffemodel

4. Evaluation scripts

# an example on Total-Text dataset
cd evaluation/total/
./eval.sh

Results and Trained Models

Total-Text

Recall	Precision	F-measure	Link
0.816	0.824	0.820	[Google drive]

*lambda=0.50 for post-processing

ICDAR2015

Recall	Precision	F-measure	Link
0.811	0.846	0.828	[Google drive]

*lambda=0.75 for post-processing
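
The F-measure columns above are consistent with the usual definition, the harmonic mean of precision and recall. A minimal sketch that recomputes them from the reported recall and precision (the function name f_measure is illustrative):

```python
def f_measure(recall, precision):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


# (recall, precision) as reported in the tables above.
reported = {
    "Total-Text": (0.816, 0.824),
    "ICDAR2015": (0.811, 0.846),
}

for name, (recall, precision) in reported.items():
    print("%s: F-measure = %.3f" % (name, f_measure(recall, precision)))
```

Rounded to three decimals, this reproduces the 0.820 and 0.828 listed in the tables.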

Comments

How to provide train datasets?

Thank you for sharing the approach. Could you please give an example or commit a demo on how you provide your datasets for training? Your train.py script used the DataLayer and reads data from data_dir='/home/wangyukang/dataset/' I think? But how is it formatted, what is the structure (image and gt in one folder? gt as images or as txt?)

opened by juicebox18 2
Any solution for this error?

CXX tools/caffe.cpp
CXX tools/test_net.cpp
CXX tools/finetune_net.cpp
CXX examples/TextField/inference.cpp
CXX .build_release/src/caffe/proto/caffe.pb.cc
examples/TextField/inference.cpp: In function ‘int main(int, char**)’:
examples/TextField/inference.cpp:43:3: error: reference to ‘shared_ptr’ is ambiguous
   shared_ptr <Net> net_;
   ^~~~~~~~~~

opened by BakingBrains 2
some questions

I managed to test your model, which was already trained on the Total-Text dataset, but I cannot reach your score. What environment did you use, for example which GCC version and which opencv-python version? Thanks!

opened by Yun-960 1
What is num in the loss?

In the source code, top[0].data[...] = np.sum((self.distL1**2)*(self.weightPos + self.weightNeg)) / bottom[0].num / 2. / np.sum(self.weightPos + self.weightNeg), what is num in bottom[0].num? Python does not seem to have this.

opened by wangwangww 0
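
For context on the expression quoted in the question above: in pycaffe, num is a property of a Blob that gives the size of its first dimension, which is the batch size; it is not a built-in of Python or NumPy. The following is a NumPy-only sketch of the same arithmetic. It assumes that distL1 is the element-wise difference between prediction and ground truth, which the quoted line does not show; the function and argument names are illustrative.

```python
import numpy as np


def weighted_l2_loss(pred, gt, weight_pos, weight_neg):
    """NumPy version of the quoted expression, with num taken from the batch axis."""
    num = pred.shape[0]               # plays the role of bottom[0].num
    dist = pred - gt                  # assumed meaning of self.distL1
    weight = weight_pos + weight_neg
    return np.sum((dist ** 2) * weight) / num / 2.0 / np.sum(weight)


# Toy batch: 2 images, 2 channels, 4 x 4 pixels.
shape = (2, 2, 4, 4)
pred = np.ones(shape, dtype=np.float32)
gt = np.zeros(shape, dtype=np.float32)
weight_pos = np.ones(shape, dtype=np.float32)
weight_neg = np.zeros(shape, dtype=np.float32)

loss = weighted_l2_loss(pred, gt, weight_pos, weight_neg)
print(loss)
```

With these toy inputs the weighted squared error sums to 64, so the result is 64 / 2 / 2 / 64 = 0.25.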

Question about the post processing.

Hi, TextField is a great work, but I'm confused about the post processing:

We apply a simple dilation δ (with 3 × 3 structuring element) to group the representatives of the same text instance. This is followed by a connected component labeling that forms candidate text instances. The text superpixel grouping is depicted in line 17- 21 of Algorithm 1.

and I found these in your post processing code:

for (int row = 0; row < rows_; row++)
{
    float* ending_p = ending.ptr<float>(row);
    float* parent_p = parent.ptr<float>(row);
    float* dict_p = dict.ptr<float>(row);
    for (int col = 0; col < cols_; col++)
    {
        if (ending_p[col] == 1)
        {
            // walk away from the ending pixel along its stored direction
            for (int dilDepth = 1; dilDepth <= min((int)(1*dict_p[2*col+1]-16), 12); dilDepth++)
            {
                p.x = row+(int)parent_p[2*col]*dilDepth;
                p.y = col+(int)parent_p[2*col+1]*dilDepth;
                // keep only positions that fall inside the image
                if (p.x >= 0 && p.x <= rows_-1 && p.y >= 0 && p.y <= cols_-1)
                {
                    float* merged_ending_p = merged_ending.ptr<float>(p.x);
                    merged_ending_p[p.y] = 1;
                }
            }
        }
    }
}

I understand that the 3x3 dilation kernel is for grouping the representatives of a text instance, and that the code above is for grouping the remaining child pixels that belong to this instance. What I don't understand is this loop: for (int dilDepth = 1; dilDepth <= min((int)(1*dict_p[2*col+1]-16), 12); dilDepth++). I take dilDepth to be the maximum depth, but what do the magic numbers 16 and 12 represent?

Thanks in advance for your reply!

opened by zeyu-hello 1
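
The control flow of the loop quoted in the question above can be restated in Python to make the role of the two constants easier to see. This is a literal port of the quoted snippet only, not the repository's implementation. The names extend_endings, depth, offset and max_steps are assumptions: depth stands for the second channel of dict, and the source does not explain why the values 16 and 12 were chosen.

```python
import numpy as np


def extend_endings(ending, parent, depth, offset=16, max_steps=12):
    """Mark pixels reached by stepping from each ending pixel along its parent offset.

    ending: (H, W) map, 1 at ending pixels.
    parent: (H, W, 2) map of (row, col) step offsets.
    depth:  (H, W) map; the step count is min(int(depth - offset), max_steps).
    """
    rows, cols = ending.shape
    merged = np.zeros_like(ending)
    for r, c in zip(*np.nonzero(ending == 1)):
        steps = min(int(depth[r, c] - offset), max_steps)
        dr, dc = int(parent[r, c, 0]), int(parent[r, c, 1])
        for k in range(1, steps + 1):
            rr, cc = r + dr * k, c + dc * k
            # Keep only positions inside the image, as in the quoted bounds check.
            if 0 <= rr < rows and 0 <= cc < cols:
                merged[rr, cc] = 1
    return merged


# Toy input: one ending pixel at (5, 5) that steps to the right.
ending = np.zeros((20, 20), dtype=np.float32)
parent = np.zeros((20, 20, 2), dtype=np.float32)
depth = np.zeros((20, 20), dtype=np.float32)
ending[5, 5] = 1
parent[5, 5] = (0, 1)
depth[5, 5] = 20

merged = extend_endings(ending, parent, depth)
print(np.argwhere(merged == 1))
```

Read this way, the first constant acts as a threshold below which no steps are taken, and the second caps the number of steps per ending pixel. In the toy input, a depth of 20 gives min(20 - 16, 12) = 4 steps, so four pixels to the right of (5, 5) are marked.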

Inference speed

I managed to retrain your approach with my own dataset and it performs quite well! However, the runtime/inference speed seems to be slower than that of several other approaches (e.g. EAST), especially when ported to a non-GPU version. Do you have any hints or ideas on how to improve the inference speed? Could the model be retrained differently to better fit smaller inference scales?

opened by juicebox18 2

Owner

Yukang Wang

GitHub

Motion detector, Full body detection, Upper body detection, Cat face detection, Smile detection, Face detection (haar cascade), Silverware detection, Face detection (lbp), and Sending email notifications

Security camera running OpenCV for object and motion detection. The camera will send email with image of any objects it detects. It also runs a server that provides web interface with live stream video.

10 Jun 30, 2021

This project modifies the TensorFlow Object Detection API code to predict oriented bounding boxes. It can be used for scene text detection.

This is an oriented object detector based on tensorflow object detection API. Most of the code is not changed except for those related to the need of

30 Oct 22, 2022

A novel region proposal network for more general object detection ( including scene text detection ).

DeRPN: Taking a further step toward more general object detection DeRPN is a novel region proposal network which concentrates on improving the adaptiv

Deep Learning and Vision Computing Lab, SCUT

151 Dec 12, 2022

An implementation of the algorithm in the paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection

InceptText-Tensorflow An Implementation of the algorithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Orien

115 Dec 12, 2022

Repository for Scene Text Detection with Supervised Pyramid Context Network with tensorflow.

Scene-Text-Detection-with-SPCNET Unofficial repository for [Scene Text Detection with Supervised Pyramid Context Network][https://arxiv.org/abs/1811.0

121 Oct 15, 2021

End-to-end pipeline for real-time scene text detection and recognition.

Real-time-Scene-Text-Detection-and-Recognition-System End-to-end pipeline for real-time scene text detection and recognition. The detection model use

89 Aug 4, 2022

RRD: Rotation-Sensitive Regression for Oriented Scene Text Detection

RRD: Rotation-Sensitive Regression for Oriented Scene Text Detection For more details, please refer to our paper. Citing Please cite the related works

102 Jun 29, 2022

caffe re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection Abstract This is a caffe re-implementation of R2CNN: Rotational Region CNN fo

80 Dec 28, 2021

Source code of RRPN ---- Arbitrary-Oriented Scene Text Detection via Rotation Proposals

Paper source Arbitrary-Oriented Scene Text Detection via Rotation Proposals https://arxiv.org/abs/1703.01086 News We update RRPN in pytorch 1.0! View

428 Nov 22, 2022

Scene text detection and recognition based on Extremal Region(ER)

Scene text recognition A real-time scene text recognition algorithm. Our system is able to recognize text in unconstrain background. This algorithm is

155 Dec 6, 2022

Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation

This is the official implementation of "Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation". For more details, please

309 Dec 6, 2022

A curated list of papers and resources for scene text detection and recognition

Awesome Scene Text A curated list of papers and resources for scene text detection and recognition The year when a paper was first published, includin

43 Mar 15, 2022

Tracking the latest progress in Scene Text Detection and Recognition: Must-read papers well organized

SceneTextPapers Tracking the latest progress in Scene Text Detection and Recognition: must-read papers well organized Information about this repositor

763 Jan 1, 2023

A toolbox of scene text detection and recognition

FudanOCR This toolbox contains the implementations of the following papers: Scene Text Telescope: Text-Focused Scene Image Super-Resolution [Chen et a

170 Dec 26, 2022

AdvancedEAST is an algorithm for scene image text detection. It is primarily based on EAST, with significant improvements that make long text predictions more accurate. https://github.com/huoyijie/raspberrypi-car

AdvancedEAST AdvancedEAST is an algorithm used for Scene image text detect, which is primarily based on EAST:An Efficient and Accurate Scene Text Dete

1.2k Dec 29, 2022
