TextField: Learning A Deep Direction Field for Irregular Scene Text Detection (TIP 2019)

Introduction

The code and trained models of:

TextField: Learning A Deep Direction Field for Irregular Scene Text Detection, TIP 2019 [Paper]

Citation

Please cite the related work in your publications if it helps your research:


@article{xu2018textfield,
  title={TextField: Learning A Deep Direction Field for Irregular Scene Text Detection},
  author={Xu, Yongchao and Wang, Yukang and Zhou, Wei and Wang, Yongpan and Yang, Zhibo and Bai, Xiang},
  journal={arXiv preprint arXiv:1812.01393},
  year={2018}
}

Prerequisite

Usage

1. Install Caffe

cp Makefile.config.example Makefile.config
# adjust Makefile.config (for example, enable python layer)
make all -j16
# make sure $CAFFE_ROOT/python is included in your PYTHONPATH
make pycaffe

Please refer to the Caffe installation guide to make sure the other dependencies are installed.

2. Data and model preparation

# download the datasets and the pretrained model, then:
mkdir data && mv [your_dataset_folder] data/
mkdir models && mv [your_pretrained_model] models/

3. Training scripts

# an example on Total-Text dataset
cd examples/TextField/
python train.py --gpu [your_gpu_id] --dataset total --initmodel ../../models/synth_iter_800000.caffemodel

4. Evaluation scripts

# an example on Total-Text dataset
cd evaluation/total/
./eval.sh
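
The paths in steps 2 and 3 fit together: when train.py is run from examples/TextField/, the relative path ../../models/synth_iter_800000.caffemodel points back at the models/ folder created at the repository root. A minimal Python sketch of a pre-flight check under that assumed layout follows. The helper name `training_command` and its `repo_root` argument are hypothetical and not part of this repository; only the flags are copied from the command above.

```python
import os


def training_command(repo_root, gpu_id, dataset="total",
                     initmodel="../../models/synth_iter_800000.caffemodel"):
    """Return (workdir, argv) for the training command shown in step 3.

    Hypothetical helper: it only checks that the pretrained model is where
    the relative --initmodel path expects it, then builds the argument list.
    """
    workdir = os.path.join(repo_root, "examples", "TextField")
    # Resolve the relative path the same way the shell would from workdir.
    resolved = os.path.normpath(os.path.join(workdir, initmodel))
    if not os.path.isfile(resolved):
        raise FileNotFoundError("pretrained model not found: " + resolved)
    argv = ["python", "train.py",
            "--gpu", str(gpu_id),
            "--dataset", dataset,
            "--initmodel", initmodel]
    return workdir, argv
```

The returned pair can be passed to `subprocess.run(argv, cwd=workdir, check=True)` to launch training from the right directory.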

Results and Trained Models

Total-Text

| Recall | Precision | F-measure | Link |
|--------|-----------|-----------|------|
| 0.816 | 0.824 | 0.820 | [Google drive] |

*lambda=0.50 for post-processing

ICDAR2015

| Recall | Precision | F-measure | Link |
|--------|-----------|-----------|------|
| 0.811 | 0.846 | 0.828 | [Google drive] |

*lambda=0.75 for post-processing
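
The F-measure columns can be cross-checked against the recall and precision columns. The sketch below assumes the usual definition, the harmonic mean of recall and precision; this README does not spell out the formula, so treat that as an assumption. The function name `f_measure` is illustrative only.

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision (assumed definition)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (recall, precision) pairs copied from the two result tables.
results = {
    "Total-Text": (0.816, 0.824),
    "ICDAR2015": (0.811, 0.846),
}

for name, (recall, precision) in results.items():
    print(name, round(f_measure(recall, precision), 3))
```

Rounded to three decimals, both values agree with the tables (0.820 and 0.828).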

Comments
  • How to provide train datasets?

    Thank you for sharing the approach. Could you please give an example or commit a demo on how you provide your datasets for training? Your train.py script used the DataLayer and reads data from data_dir='/home/wangyukang/dataset/' I think? But how is it formatted, what is the structure (image and gt in one folder? gt as images or as txt?)

    opened by juicebox18 2
  • Any solution for this error?

    CXX tools/caffe.cpp
    CXX tools/test_net.cpp
    CXX tools/finetune_net.cpp
    CXX examples/TextField/inference.cpp
    CXX .build_release/src/caffe/proto/caffe.pb.cc
    examples/TextField/inference.cpp: In function ‘int main(int, char**)’:
    examples/TextField/inference.cpp:43:3: error: reference to ‘shared_ptr’ is ambiguous
       shared_ptr <Net> net_;
       ^~~~~~~~~~

    opened by BakingBrains 2
  • some questions

    I managed to test your model that was already trained on the Total-Text dataset, but I cannot reach your score. What environment did you use (for example, the GCC version and the opencv-python version)? Thanks!

    opened by Yun-960 1
  • What is num in the loss?

    Source code: top[0].data[...] = np.sum((self.distL1**2)*(self.weightPos + self.weightNeg)) / bottom[0].num / 2. / np.sum(self.weightPos + self.weightNeg). What is num in bottom[0].num here? Python does not seem to have it.

    opened by wangwangww 0
  • Question about the post processing.

    Hi, TextField is a great work, but I'm confused about the post processing:

    We apply a simple dilation δ (with 3 × 3 structuring element) to group the representatives of the same text instance. This is followed by a connected component labeling that forms candidate text instances. The text superpixel grouping is depicted in line 17-21 of Algorithm 1.

    and I found these in your post processing code:

    for (int row = 0; row < rows_; row++)
    {
        float* ending_p = ending.ptr<float>(row);
        float* parent_p = parent.ptr<float>(row);
        float* dict_p = dict.ptr<float>(row);
        for (int col = 0; col < cols_; col++)
        {
            if (ending_p[col] == 1)
            {
                // step along the parent direction from each ending pixel
                for (int dilDepth = 1; dilDepth <= min((int)(1*dict_p[2*col+1]-16), 12); dilDepth++)
                {
                    p.x = row+(int)parent_p[2*col]*dilDepth;
                    p.y = col+(int)parent_p[2*col+1]*dilDepth;
                    // skip positions that fall outside the image
                    if (p.x >= 0 && p.x <= rows_-1 && p.y >= 0 && p.y <= cols_-1)
                    {
                        float* merged_ending_p = merged_ending.ptr<float>(p.x);
                        merged_ending_p[p.y] = 1;
                    }
                }
            }
        }
    }
    

    I understand that the 3x3 dilation kernel groups the representatives of a text instance, and that the code above groups the remaining child pixels belonging to that instance. What I don't understand is this loop: for (int dilDepth = 1; dilDepth <= min((int)(1*dict_p[2*col+1]-16), 12); dilDepth++). I take dilDepth to be the dilation depth, but what do the magic numbers 16 and 12 mean?

    Thanks in advance for your reply!

    opened by zeyu-hello 1
  • Inference speed

    I managed to retrain your approach on my own dataset and it performs quite well! However, the runtime/inference speed seems to be slower than several other approaches (e.g. EAST), especially when ported to a non-GPU version. Do you have any hints or ideas on how to improve the inference speed? Could the model be retrained differently to better fit smaller inference scales?

    opened by juicebox18 2
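
On the question above about bottom[0].num: in a Caffe Python layer, bottom[0] is a Caffe Blob rather than a NumPy array, and the Blob exposes a legacy `num` property that reports the size of its first axis, which is the batch size for an N x C x H x W blob. The sketch below restates the quoted expression in plain NumPy, with `shape[0]` standing in for `num`. The names `diff`, `weight_pos` and `weight_neg` are stand-ins for self.distL1, self.weightPos and self.weightNeg; their contents are not shown in the quoted line and are assumed here to be arrays of the same shape.

```python
import numpy as np


def weighted_l2_loss(diff, weight_pos, weight_neg):
    """Plain-NumPy restatement of the quoted loss line (a sketch).

    diff, weight_pos, weight_neg: same-shape arrays whose first axis is
    the batch axis (assumption; the original layer is not shown in full).
    """
    num = diff.shape[0]  # plays the role of bottom[0].num (batch size)
    weight = weight_pos + weight_neg
    return np.sum((diff ** 2) * weight) / num / 2.0 / np.sum(weight)


# Tiny example: batch of 2, all differences equal to 1, positive weights only.
diff = np.ones((2, 2, 3, 3))
loss = weighted_l2_loss(diff, np.ones_like(diff), np.zeros_like(diff))
print(loss)
```

Dividing by `num` averages the weighted squared error over the batch, and the final division normalizes by the total weight.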
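
On the post-processing question above: the quoted C++ loop can be restated in Python to make its control flow easier to follow. This is a sketch and not the repository's code. The function name `extend_endings` and the parameters `offset` and `cap` are hypothetical, `depth` stands for the second channel of the `dict` matrix, and the output starts from zeros because the initialization of `merged_ending` is not shown in the excerpt. It shows only what the two numbers do mechanically: 16 is subtracted from the per-pixel value and 12 caps the number of steps. Why those values were chosen is not stated in the source.

```python
import numpy as np


def extend_endings(ending, parent, depth, offset=16, cap=12):
    """Mark pixels reached by stepping from each ending pixel along parent.

    ending: (H, W) array, 1 where a pixel is an "ending" pixel.
    parent: (H, W, 2) array of per-pixel (row, col) step directions.
    depth:  (H, W) array; the step count is min(int(depth - offset), cap).
    """
    rows, cols = ending.shape
    merged = np.zeros_like(ending)
    for r, c in zip(*np.nonzero(ending == 1)):
        steps = min(int(depth[r, c] - offset), cap)
        dr, dc = int(parent[r, c, 0]), int(parent[r, c, 1])
        for k in range(1, steps + 1):
            rr, cc = r + dr * k, c + dc * k
            # Skip positions that fall outside the image.
            if 0 <= rr < rows and 0 <= cc < cols:
                merged[rr, cc] = 1
    return merged
```

With the defaults, an ending pixel whose `depth` value is 19 is extended by 3 pixels, while any value of 28 or more is extended by the capped 12 pixels (fewer if the image border is reached first).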
Owner
Yukang Wang