Single Shot Text Detector with Regional Attention

Pan He

Last update: Dec 7, 2022


Introduction

SSTD was first described in our ICCV 2017 spotlight paper.

A third-party implementation of SSTD + Focal Loss is also available. Thanks, Ho taek Han.

If you find it useful in your research, please consider citing:

@inproceedings{panhe17singleshot,
      Title   = {Single Shot Text Detector with Regional Attention},
      Author  = {He, Pan and Huang, Weilin and He, Tong and Zhu, Qile and Qiao, Yu and Li, Xiaolin},
      Note    = {Proceedings of International Conference on Computer Vision (ICCV)},
      Year    = {2017}
      }
@inproceedings{panhe16readText,
      Title   = {Reading Scene Text in Deep Convolutional Sequences},
      Author  = {He, Pan and Huang, Weilin and Qiao, Yu and Loy, Chen Change and Tang, Xiaoou},
      Note    = {Proceedings of AAAI Conference on Artificial Intelligence (AAAI)},
      Year    = {2016}
      }
@inproceedings{liu16ssd,
      Title   = {{SSD}: Single Shot MultiBox Detector},
      Author  = {Liu, Wei and Anguelov, Dragomir and Erhan, Dumitru and Szegedy, Christian and Reed, Scott and Fu, Cheng-Yang and Berg, Alexander C.},
      Note    = {Proceedings of European Conference on Computer Vision (ECCV)},
      Year    = {2016}
      }

Installation

Get the code. We will call the directory that you cloned SSTD (which contains its own copy of Caffe) into $CAFFE_ROOT.

git clone https://github.com/BestSonny/SSTD.git
cd SSTD

Build the code. Please follow the Caffe installation instructions to install all necessary packages, then build it.

# Modify Makefile.config according to your Caffe installation.
cp Makefile.config.example Makefile.config
make -j8
# Make sure to include $CAFFE_ROOT/python in your PYTHONPATH.
make py
make test -j8
# (Optional)
make runtest -j8
# build nms
cd examples/text
make
cd ..
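The nms module built in the last step suppresses overlapping detections. As a rough illustration only, here is a pure-Python sketch of standard greedy IoU-based non-maximum suppression, assuming axis-aligned boxes given as (x1, y1, x2, y2). The function names are hypothetical, and we have not checked that this matches the repository's cpu_nms/gpu_nms code exactly (some NMS implementations, for example, add 1 to box widths for pixel coordinates).

```python
from typing import List, Sequence


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Sequence[float]], scores: Sequence[float],
               thresh: float) -> List[int]:
    """Hypothetical sketch: indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)  # highest-scoring remaining box
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep
```

This loop is quadratic in the number of boxes, which is one reason compiled CPU/GPU versions are typically preferred in practice.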

Run the demo code. Download the model (from Google Drive or Baidu Yun) and put it in the text/model folder.

cd $CAFFE_ROOT/examples
sh text/download.sh
mkdir text/result
python text/demo_test.py

Comments

Problems when running on more than 2 graphics cards

@BestSonny Hi. When I use more than 2 graphics cards, it fails with errors like the following:
Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered.
Check failed: error == status == CUBLAS_STATUS_SUCCESS (11 vs. 0)CUBLAS_STATUS_MAPPING_ERROR

I suspect something is wrong with "Annotated_mask_data_layer", but I am not sure. Have you met this problem? Please give me some advice. Thanks in advance.

opened by shiyuangogogo
I had a cuDNN version error

My CUDA is 8.0 and cuDNN is 5, but compiling this code fails:

====
CXX .build_release/src/caffe/proto/caffe.pb.cc
CXX src/caffe/blob.cpp
In file included from ./include/caffe/util/device_alternate.hpp:40:0,
                 from ./include/caffe/common.hpp:19,
                 from ./include/caffe/blob.hpp:8,
                 from src/caffe/blob.cpp:4:
./include/caffe/util/cudnn.hpp: In function ‘void caffe::cudnn::setConvolutionDesc(cudnnConvolutionStruct**, cudnnTensorDescriptor_t, cudnnFilterDescriptor_t, int, int, int, int)’:
./include/caffe/util/cudnn.hpp:112:3: error: too few arguments to function ‘cudnnStatus_t cudnnSetConvolution2dDescriptor(cudnnConvolutionDescriptor_t, int, int, int, int, int, int, cudnnConvolutionMode_t, cudnnDataType_t)’
   CUDNN_CHECK(cudnnSetConvolution2dDescriptor(*conv,
   ^
In file included from ./include/caffe/util/cudnn.hpp:5:0,
                 from ./include/caffe/util/device_alternate.hpp:40,
                 from ./include/caffe/common.hpp:19,
                 from ./include/caffe/blob.hpp:8,
                 from src/caffe/blob.cpp:4:
/usr/local/cuda/include/cudnn.h:537:27: note: declared here
 cudnnStatus_t CUDNNWINAPI cudnnSetConvolution2dDescriptor( cudnnConvolutionDescriptor_t convDesc,
                           ^
Makefile:577: recipe for target '.build_release/src/caffe/blob.o' failed
make: *** [.build_release/src/caffe/blob.o] Error 1
====

Please tell me your cuDNN and CUDA versions.

opened by afterimagex
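A note on the cuDNN error above: the compiler's note shows that the cudnn.h being used declares cudnnSetConvolution2dDescriptor with a trailing cudnnDataType_t argument. That argument was added in cuDNN 6, so the header actually picked up from /usr/local/cuda/include is probably newer than the cuDNN 5 mentioned, while this Caffe code still calls the older, shorter signature. One way to check which version a header declares is to read its version macros; the helper below is a hypothetical sketch, not part of this repository.

```python
import re
from typing import Optional, Tuple


def cudnn_version(header_text: str) -> Optional[Tuple[int, int, int]]:
    """Hypothetical helper: (major, minor, patch) from cuDNN #define lines."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+%s\s+(\d+)" % name, header_text)
        if match is None:
            return None  # macro not found in this header
        parts.append(int(match.group(1)))
    return tuple(parts)
```

For example, call cudnn_version(open("/usr/local/cuda/include/cudnn.h").read()). Newer cuDNN releases (8 and later) keep these macros in cudnn_version.h instead of cudnn.h.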
OpenCV 3?

Excuse me, do I need OpenCV 3 for this repository? I am getting errors that appear to be related to OpenCV 3, but I am only using OpenCV 2.4.9. Also, do I have to use the Caffe you ship? I already have a Caffe installation on my Ubuntu machine. Is yours modified somehow?

Running 'make' produces:

~/Desktop/A_M-arbeit/G_Code/G_SSTD$ make -j8
LD -o .build_release/lib/libcaffe.so.1.0.0-rc3
/usr/bin/ld: cannot find -lopencv_imgcodecs
/usr/bin/ld: cannot find -lopencv_videoio

opened by idefix92
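About the link errors above: opencv_imgcodecs and opencv_videoio are modules introduced in OpenCV 3 and do not exist in OpenCV 2.4.9. In Caffe-style Makefiles they are normally linked only when OPENCV_VERSION := 3 is uncommented in Makefile.config (an assumption; we have not checked this repository's Makefile), so with OpenCV 2.4 that line should stay commented out. The hypothetical helper below reports whether a given Makefile.config enables it.

```python
import re


def opencv3_enabled(makefile_config: str) -> bool:
    """Hypothetical check: True if an uncommented OPENCV_VERSION := 3 is set."""
    for line in makefile_config.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # commented-out option, ignored by make
        if re.match(r"OPENCV_VERSION\s*:?=\s*3\b", stripped):
            return True
    return False
```

For example, opencv3_enabled(open("Makefile.config").read()) should return False on a machine with only OpenCV 2.4 installed.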
When I run python text/demo_test.py, an error occurs

warn("The default mode, 'constant', will be changed to 'reflect' in "
F1206 10:15:04.766489 447 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
Aborted (core dumped)

What should I do about this?

opened by kmustriver
Orientation in the CUDA implementation

The SSTD paper says: "we use a softmax function for binary classification of text or non-text, and apply the smooth-l1 loss for regressing 5 parameters for each word bounding box, including a parameter for box orientation"

I can find the box orientation parameter in the .cpp implementation, but I cannot find it in the .cu implementation.

The .cpp implementation (on CPU) cannot run because "mask_resize_layer.cpp:42] Not Implemented Yet"

Is there a way to get the orientation from the .cu implementation?

Have you already implemented mask_resize_layer.cpp?

opened by SHaiHosh
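The paper quote above mentions five regressed parameters per word box, one of them an orientation. A common way to encode such a box, assumed here and not confirmed for SSTD, is centre (cx, cy), width, height and an angle theta in radians. Decoding that into corner points is a plain 2-D rotation of the four corner offsets around the centre; rbox_corners is a hypothetical name.

```python
import math
from typing import List, Tuple


def rbox_corners(cx: float, cy: float, w: float, h: float,
                 theta: float) -> List[Tuple[float, float]]:
    """Hypothetical decoder: corners of a w x h box at (cx, cy) rotated by theta."""
    c, s = math.cos(theta), math.sin(theta)
    corners = []
    # Offsets of the unrotated corners from the centre, in clockwise order
    # when y points down (image coordinates).
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        # Rotate the offset, then translate back to the box centre.
        corners.append((cx + dx * c - dy * s, cy + dx * s + dy * c))
    return corners
```

With theta = 0 this reduces to the axis-aligned box, which is a quick sanity check for whatever angle convention a model actually uses.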
I can't use Python to create a MaskResize layer
I built the Caffe source code with the MaskResize layer successfully, but I can't use Python to create a MaskResize layer. Could you tell me what I should do to fix it?

Here is my Python code:

name = '{}_mask_resize'.format(from_layers[i])
mask_resize_param = {'output_height': 1, 'output_width': 1,
                     'factor_height': factors[i], 'factor_width': factors[i]}
net[name] = L.MaskRisize(net.slice1, mask_resize_param=mask_resize_param)

Running the Python code shows:

I0108 23:21:11.590718 20850 layer_factory.hpp:77] Creating layer conv4_3_mask_resize

F0108 23:21:11.590739 20850 layer_factory.hpp:81] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: MaskRisize (known types: AbsVal, Accuracy, AnnotatedData, AnnotatedDataMask, ArgMax, BNLL, BatchNorm, BatchReindex, Bias, Concat, ContrastiveLoss, Convolution, Crop, Data, Deconvolution, DetectionEvaluate, DetectionOutput, Dropout, DummyData, ELU, Eltwise, Embed, EuclideanLoss, Exp, Filter, Flatten, HDF5Data, HDF5Output, HingeLoss, Im2col, ImageData, InfogainLoss, InnerProduct, Input, LRN, LSTM, LSTMUnit, Log, MVN, MaskPooling, MaskResize, MemoryData, MultiBoxLoss, MultinomialLogisticLoss, Normalize, PReLU, Parameter, Permute, Pooling, Power, PriorBox, Python, RNN, ReLU, Reduction, Reshape, SPP, Scale, Sigmoid, SigmoidCrossEntropyLoss, Silence, Slice, SmoothL1Loss, Softmax, SoftmaxWithLoss, Split, TanH, Threshold, Tile, VideoData, WindowData) *** Check failure stack trace: ***
opened by shiyuangogogo
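In the log above, MaskResize appears in the list of known types, but the snippet asks for MaskRisize, so the layer type looks misspelled. With Caffe's NetSpec, the attribute name after L. becomes the layer type, so the call should read L.MaskResize(...). To catch this kind of typo, a small helper (hypothetical, using only the standard-library difflib) can suggest the closest registered type:

```python
import difflib
from typing import Iterable, Optional


def suggest_layer_type(unknown: str, known: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: closest registered layer type to a misspelled one."""
    matches = difflib.get_close_matches(unknown, list(known), n=1, cutoff=0.8)
    return matches[0] if matches else None


# A few of the registered types copied from the log above.
KNOWN = ["MaskPooling", "MaskResize", "MemoryData", "MultiBoxLoss"]
```

Here suggest_layer_type("MaskRisize", KNOWN) returns "MaskResize"; in practice the full list from the error message can be pasted in.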
During training, how do you produce the segmentation loss and the segmentation result?

During training, how do you produce the segmentation loss and the segmentation result? Do you use one SoftmaxWithLoss layer to calculate the segmentation loss and one Softmax layer to produce the segmentation result? Or one Softmax layer to produce the segmentation result and a MultinomialLogisticLoss layer to produce the segmentation loss?

opened by shiyuangogogo
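On the question above: in Caffe, SoftmaxWithLoss computes the same value as a Softmax layer followed by MultinomialLogisticLoss, and Caffe's documentation recommends the fused layer because its gradient is more numerically stable. Which combination SSTD uses is not stated here. The sketch below checks the equivalence for one pixel's class scores (for segmentation the loss is applied at every pixel over the channel axis); all function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Class probabilities; subtracting the max avoids overflow in exp."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multinomial_logistic_loss(probs: Sequence[float], label: int) -> float:
    """Negative log-likelihood of the true class, given probabilities."""
    return -math.log(probs[label])


def softmax_with_loss(logits: Sequence[float], label: int) -> float:
    """Fused form: -log softmax(logits)[label], computed via log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[label]
```

The fused form never takes the log of a probability that may have underflowed to zero, which is the practical reason to prefer it for the loss while still using a separate Softmax when the probabilities themselves are needed.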
How do I convert data (in the form [image, mask, bbox_label]) into LMDB?

Hi, BestSonny. Thanks for sharing your code. When I use the tool "convert_annoset_mask" to convert my dataset into LMDB, it works. But when I train my net, I get a problem: it seems the tool "convert_annoset_mask" converts the mask (single channel) into 3 channels. I don't know what I did wrong. Could you tell me why, or share your "convert" script? Here is my "convert" command:

./build/tools/convert_annoset_mask --anno_type=detection --label_type=xml --label_map_file=/home/shi/caffe-ssd/data/VOC0712/labelmap_voc.prototxt --check_label=True --min_dim=0 --max_dim=0 --resize_height=0 --resize_width=0 --backend=lmdb --shuffle=False --check_size=False --encode_type=jpg --encoded=True --gray=False /home/shi/data/ ### /home/shi/data/all_train.txt /home/shi/data/VOC12_AUG/lmdb/VOC12_train_lmdb

Here is the training error shown by Caffe:

I0105 14:35:56.477653 16084 net.cpp:100] Creating Layer seg_loss

I0105 14:35:56.477654 16084 net.cpp:434] seg_loss <- upscore
I0105 14:35:56.477658 16084 net.cpp:434] seg_loss <- mask
I0105 14:35:56.477661 16084 net.cpp:408] seg_loss -> seg_loss
I0105 14:35:56.477670 16084 layer_factory.hpp:77] Creating layer seg_loss
I0105 14:35:56.485399 16084 softmax_loss_layer.cpp:47] softmaxwithloss bottom[0] size: 2,21,320,320
I0105 14:35:56.485427 16084 softmax_loss_layer.cpp:50] softmaxwithloss bottom[1] size: 2,3,320,320
F0105 14:35:56.485437 16084 softmax_loss_layer.cpp:53] Check failed: outer_num_ * inner_num_ == bottom[1]->count() (204800 vs. 614400) Number of labels must match number of predictions; e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be NHW, with integer values in {0, 1, ..., C-1}.

opened by shiyuangogogo
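The failed check in this log can be reproduced with simple arithmetic. With softmax over axis 1 of an (N, C, H, W) prediction, SoftmaxWithLoss expects N·H·W labels: 2 × 320 × 320 = 204800. The mask blob is 2 × 3 × 320 × 320 = 614400, exactly three times that, which is consistent with the mask arriving with 3 channels instead of 1. The helper below (hypothetical name) mirrors the shape check:

```python
import math
from typing import Sequence, Tuple


def softmax_loss_counts(pred_shape: Sequence[int], label_shape: Sequence[int],
                        axis: int = 1) -> Tuple[int, int]:
    """Hypothetical mirror of the check: (outer * inner, label count)."""
    outer = math.prod(pred_shape[:axis])      # N for NCHW with axis=1
    inner = math.prod(pred_shape[axis + 1:])  # H * W
    return outer * inner, math.prod(label_shape)
```

softmax_loss_counts((2, 21, 320, 320), (2, 3, 320, 320)) gives (204800, 614400), the two numbers in the log, while a single-channel mask of shape (2, 1, 320, 320) makes both counts equal.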
Error with nms?

I am getting the following error when running demo_test.py:

File "text/demo_test.py", line 21, in <module>
from nms.gpu_nms import gpu_nms
ImportError: No module named gpu_nms

and the nms directory contains:

cpu_nms.pyx gpu_nms.pyx nms_kernel.cu gpu_nms.hpp __init__.py py_cpu_nms.py

I also ran sh text/download.sh before starting demo_test.py. Best,

Valentin

opened by idefix92
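Regarding the ImportError above: the listed nms directory contains only sources (.pyx, .cu, .hpp) and no compiled extension modules, which suggests the "build nms" step (make in examples/text) was not run or did not finish. The hypothetical check below assumes the build places the compiled modules as .so files inside the nms directory.

```python
import glob
import os
from typing import List


def missing_nms_extensions(nms_dir: str) -> List[str]:
    """Hypothetical check: NMS modules with no compiled .so file in nms_dir."""
    missing = []
    for module in ("cpu_nms", "gpu_nms"):
        # Compiled Cython modules may carry a platform tag,
        # e.g. gpu_nms.cpython-36m-x86_64-linux-gnu.so, hence the wildcard.
        if not glob.glob(os.path.join(nms_dir, module + "*.so")):
            missing.append(module)
    return missing
```

Run from the examples directory, missing_nms_extensions("text/nms") (path assumed) should return an empty list once the build has succeeded.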
Model details different from original paper

Hi there,

I read the original paper and this implementation, and they're awesome!

However, I have a question about the details of the SSTD net, and I'm really looking forward to your reply :)

(1) In the deconvolution part, I see that you use groups=64 to upsample. Generally speaking, groups=1 might seem more reasonable, so I guess it's to save computation? Or are there other reasons?

(2) The original paper uses deconv3_3 and conv1_1 to build the attention map, but you're using deconv16_16 and two conv3_3 layers. Does this mean this implementation performs better than the one in the original paper?

It's very nice code, and I would really appreciate your comments!

Thanks

opened by weishuanglong
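On question (1): grouping divides the number of deconvolution weights by the group count. As a worked example with hypothetical sizes (64 input and output channels and a 16×16 kernel, read from the "deconv16_16" name rather than from the prototxt), group=1 needs 64 × 64 × 16 × 16 = 1,048,576 weights while group=64 needs 64 × 1 × 16 × 16 = 16,384. With group equal to the channel count, each channel is upsampled independently, which is also how fixed bilinear upsampling is commonly set up in Caffe; whether either reason motivated this model is not stated.

```python
def deconv_weight_count(c_in: int, c_out: int, kernel: int, group: int) -> int:
    """Hypothetical helper: weights in a grouped (de)convolution, bias excluded."""
    if c_in % group or c_out % group:
        raise ValueError("channel counts must be divisible by group")
    # Each of the `group` groups maps c_in/group inputs to c_out/group outputs
    # with a kernel x kernel filter per input/output pair.
    return group * (c_in // group) * (c_out // group) * kernel * kernel
```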
Oriented text detection

Hi, thank you very much for sharing your code. I'm currently trying to reproduce your ICDAR2015 results from the paper, but I cannot find a prototxt for oriented text (specifically for the ICDAR2015 set). It would be wonderful if you could share the model definition and a pre-trained model for oriented text.

opened by bado-lee
About GPU memory usage

Hello @BestSonny. Thank you for your contribution. I want to port your software to mobile. How many MB of GPU memory does it currently use at test time?

opened by gachiemchiep
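The figure asked about depends on the GPU and the input size, so it is best measured directly, for example by querying nvidia-smi while demo_test.py runs. The sketch below uses nvidia-smi's --query-gpu=memory.used --format=csv,noheader,nounits options, which report MiB per GPU for all processes on that GPU; the function names are hypothetical.

```python
import subprocess
from typing import List


def parse_memory_used(csv_text: str) -> List[int]:
    """Parse MiB values, one line per GPU, from nvidia-smi CSV output."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def gpu_memory_used_mib() -> List[int]:
    """Query current per-GPU memory use (requires nvidia-smi on PATH)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_memory_used(out)
```

Comparing the reading before and during a demo run gives a rough estimate of the detector's own footprint.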

Owner

Pan He

Computer Vision Ph.D. Student @ UF MALT Lab


TextBoxes++: A Single-Shot Oriented Scene Text Detector

TextBoxes++: A Single-Shot Oriented Scene Text Detector Introduction This is an application for scene text detection (TextBoxes++) and recognition (CR

930 Jan 4, 2023

This is a C++ project deploying a deep scene text reading pipeline with TensorFlow. It reads text from natural scene images using frozen TensorFlow graphs. The detector detects scene text locations, and the recognizer reads the word in each detected bounding box.

DeepSceneTextReader This is a c++ project deploying a deep scene text reading pipeline. It reads text from natural scene images. Prerequisites The proj

49 Sep 10, 2022

TextBoxes: A Fast Text Detector with a Single Deep Neural Network https://github.com/MhLiao/TextBoxes An improved SSD-based text detection algorithm; textBoxes_note contains previously compiled notes.

TextBoxes: A Fast Text Detector with a Single Deep Neural Network Introduction This paper presents an end-to-end trainable fast scene text detector, n

24 Apr 28, 2022

Code for the paper STN-OCR: A single Neural Network for Text Detection and Text Recognition

STN-OCR: A single Neural Network for Text Detection and Text Recognition This repository contains the code for the paper: STN-OCR: A single Neural Net

496 Jan 5, 2023

A tensorflow implementation of EAST text detector

EAST: An Efficient and Accurate Scene Text Detector Introduction This is a tensorflow re-implementation of EAST: An Efficient and Accurate Scene Text

2.9k Jan 2, 2023

Implementation of EAST scene text detector in Keras

EAST: An Efficient and Accurate Scene Text Detector This is a Keras implementation of EAST based on a Tensorflow implementation made by argman. The or

208 Nov 15, 2022

This is a pytorch re-implementation of EAST: An Efficient and Accurate Scene Text Detector.

EAST: An Efficient and Accurate Scene Text Detector Description: This version will be updated soon, please pay attention to this work. The motivation

544 Dec 20, 2022

PyTorch Re-Implementation of EAST: An Efficient and Accurate Scene Text Detector

Description This is a PyTorch Re-Implementation of EAST: An Efficient and Accurate Scene Text Detector. Only RBOX part is implemented. Using dice loss

365 Dec 20, 2022

Packaged, Pytorch-based, easy to use, cross-platform version of the CRAFT text detector

CRAFT: Character-Region Awareness For Text detection Packaged, Pytorch-based, easy to use, cross-platform version of the CRAFT text detector | Paper |

188 Dec 28, 2022

python ocr using tesseract/ with EAST opencv detector

pytextractor python ocr using tesseract/ with EAST opencv text detector Uses the EAST opencv detector defined here with pytesseract to extract text(de

38 Dec 5, 2022

Augmenting Anchors by the Detector Itself

Augmenting Anchors by the Detector Itself Introduction It is difficult to determine the scale and aspect ratio of anchors for anchor-based object dete

4 Nov 6, 2022

Motion detector, Full body detection, Upper body detection, Cat face detection, Smile detection, Face detection (haar cascade), Silverware detection, Face detection (lbp), and Sending email notifications

Security camera running OpenCV for object and motion detection. The camera will send email with image of any objects it detects. It also runs a server that provides web interface with live stream video.

10 Jun 30, 2021

Code for the head detector (HeadHunter) proposed in our CVPR 2021 paper Tracking Pedestrian Heads in Dense Crowd.

Head Detector Code for the head detector (HeadHunter) proposed in our CVPR 2021 paper Tracking Pedestrian Heads in Dense Crowd. The head_detection mod

76 Dec 6, 2022

Image Detector and Convertor App created using python's Pillow, OpenCV, cvlib, numpy and streamlit packages.

11 Jan 2, 2022

MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition

MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition Python 2.7 Python 3.6 MORAN is a network with rectification mechanism for

595 Dec 27, 2022

A Tensorflow model for text recognition (CNN + seq2seq with visual attention) available as a Python package and compatible with Google Cloud ML Engine.

Attention-based OCR Visual attention-based OCR model for image recognition with additional tools for creating TFRecords datasets and exporting the tra

933 Dec 29, 2022

Deskew is a command line tool for deskewing scanned text documents. It uses Hough transform to detect "text lines" in the image. As an output, you get an image rotated so that the lines are horizontal.

Deskew by Marek Mauder https://galfar.vevb.net/deskew https://github.com/galfar/deskew v1.30 2019-06-07 Overview Deskew is a command line tool for des

127 Dec 3, 2022

An Implementation of the algorithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection

InceptText-Tensorflow An Implementation of the algorithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Orien

115 Dec 12, 2022

text detection mainly based on ctpn model in tensorflow, id card detect, connectionist text proposal network

text-detection-ctpn Scene text detection based on ctpn (connectionist text proposal network). It is implemented in tensorflow. The origin paper can be

3.3k Dec 30, 2022
