Single Shot Text Detector with Regional Attention

Introduction

SSTD was initially described in our ICCV 2017 spotlight paper.

A third-party implementation of SSTD + Focal Loss is also available. Thanks, Ho taek Han!

If you find it useful in your research, please consider citing:

@inproceedings{panhe17singleshot,
      Title   = {Single Shot Text Detector with Regional Attention},
      Author  = {He, Pan and Huang, Weilin and He, Tong and Zhu, Qile and Qiao, Yu and Li, Xiaolin},
      Note    = {Proceedings of International Conference on Computer Vision (ICCV)},
      Year    = {2017}
      }
@inproceedings{panhe16readText,
      Title   = {Reading Scene Text in Deep Convolutional Sequences},
      Author  = {He, Pan and Huang, Weilin and Qiao, Yu and Loy, Chen Change and Tang, Xiaoou},
      Note    = {Proceedings of AAAI Conference on Artificial Intelligence (AAAI)},
      Year    = {2016}
      }
@inproceedings{liu16ssd,
      Title   = {{SSD}: Single Shot MultiBox Detector},
      Author  = {Liu, Wei and Anguelov, Dragomir and Erhan, Dumitru and Szegedy, Christian and Reed, Scott and Fu, Cheng-Yang and Berg, Alexander C.},
      Note    = {Proceedings of European Conference on Computer Vision (ECCV)},
      Year    = {2016}
      }

Installation

  1. Get the code. We will call the directory you cloned the code into $CAFFE_ROOT.
git clone https://github.com/BestSonny/SSTD.git
cd SSTD
  2. Build the code. Please follow the Caffe installation instructions to install all necessary packages, then build it.
# Modify Makefile.config according to your Caffe installation.
cp Makefile.config.example Makefile.config
make -j8
# Make sure $CAFFE_ROOT/python is in your PYTHONPATH.
make py
make test -j8
# (Optional)
make runtest -j8
# Build the NMS (non-maximum suppression) module used by the demo
cd examples/text
make
# Return to $CAFFE_ROOT
cd ../..
  3. Run the demo code. Download the model (Google Drive or BaiduYun) and put it in the text/model folder.
cd examples
sh text/download.sh
mkdir text/result
python text/demo_test.py
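The make in examples/text compiles the NMS (non-maximum suppression) module that demo_test.py imports as nms.gpu_nms. As a rough illustration of what such a module computes, here is a minimal pure-Python sketch of standard greedy NMS. It is not the repository's implementation; the box format [x1, y1, x2, y2] with inclusive pixel coordinates (hence the +1 in widths and heights) and the name greedy_nms are assumptions.

```python
from typing import List, Sequence


def greedy_nms(boxes: Sequence[Sequence[float]], scores: Sequence[float],
               iou_thresh: float) -> List[int]:
    """Return indices of the boxes kept by greedy NMS, highest score first.

    Assumes boxes are [x1, y1, x2, y2] with inclusive pixel coordinates.
    """
    def area(b: Sequence[float]) -> float:
        return (b[2] - b[0] + 1) * (b[3] - b[1] + 1)

    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        i = order.pop(0)  # best remaining box is always kept
        keep.append(i)
        survivors = []
        for j in order:
            # Intersection rectangle of boxes i and j (may be empty).
            xx1 = max(boxes[i][0], boxes[j][0])
            yy1 = max(boxes[i][1], boxes[j][1])
            xx2 = min(boxes[i][2], boxes[j][2])
            yy2 = min(boxes[i][3], boxes[j][3])
            inter = max(0.0, xx2 - xx1 + 1) * max(0.0, yy2 - yy1 + 1)
            iou = inter / (area(boxes[i]) + area(boxes[j]) - inter)
            # Keep only boxes that do not overlap the kept box too much.
            if iou <= iou_thresh:
                survivors.append(j)
        order = survivors
    return keep
```

For example, with boxes [0, 0, 9, 9], [1, 1, 10, 10] and [20, 20, 29, 29] scored 0.9, 0.8 and 0.7, the first two overlap with an IoU of about 0.68, so greedy_nms(boxes, scores, 0.3) keeps indices [0, 2].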
Comments
  • Running on more than 2 graphics cards causes problems

    @BestSonny, hi. When I use more than 2 graphics cards, it fails like this: Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered. Check failed: error == status == CUBLAS_STATUS_SUCCESS (11 vs. 0) CUBLAS_STATUS_MAPPING_ERROR

    My guess is that something is wrong with "Annotated_mask_data_layer", but I am not sure. Have you met this problem? Please give me some advice. Thanks in advance.

    opened by shiyuangogogo 10
  • I had a cuDNN version error

    My CUDA version is 8.0 and cuDNN is 5, but compiling this code failed:

    CXX .build_release/src/caffe/proto/caffe.pb.cc
    CXX src/caffe/blob.cpp
    In file included from ./include/caffe/util/device_alternate.hpp:40:0,
                     from ./include/caffe/common.hpp:19,
                     from ./include/caffe/blob.hpp:8,
                     from src/caffe/blob.cpp:4:
    ./include/caffe/util/cudnn.hpp: In function ‘void caffe::cudnn::setConvolutionDesc(cudnnConvolutionStruct**, cudnnTensorDescriptor_t, cudnnFilterDescriptor_t, int, int, int, int)’:
    ./include/caffe/util/cudnn.hpp:112:3: error: too few arguments to function ‘cudnnStatus_t cudnnSetConvolution2dDescriptor(cudnnConvolutionDescriptor_t, int, int, int, int, int, int, cudnnConvolutionMode_t, cudnnDataType_t)’
       CUDNN_CHECK(cudnnSetConvolution2dDescriptor(*conv,
       ^
    In file included from ./include/caffe/util/cudnn.hpp:5:0,
                     from ./include/caffe/util/device_alternate.hpp:40,
                     from ./include/caffe/common.hpp:19,
                     from ./include/caffe/blob.hpp:8,
                     from src/caffe/blob.cpp:4:
    /usr/local/cuda/include/cudnn.h:537:27: note: declared here
     cudnnStatus_t CUDNNWINAPI cudnnSetConvolution2dDescriptor( cudnnConvolutionDescriptor_t convDesc,
                               ^
    Makefile:577: recipe for target '.build_release/src/caffe/blob.o' failed
    make: *** [.build_release/src/caffe/blob.o] Error 1

    Please tell me your cuDNN and CUDA versions.

    opened by afterimagex 4
  • opencv3?

    Excuse me, I wanted to ask whether I need OpenCV 3 for this repository. I am getting errors that apparently are related to OpenCV 3, but I am only using OpenCV 2.4.9. Also, do I have to use the Caffe you ship? I already have a Caffe version on my Ubuntu installation. Is yours modified somehow?

    Running 'make' produces:

    ~/Desktop/A_M-arbeit/G_Code/G_SSTD$ make -j8
    LD -o .build_release/lib/libcaffe.so.1.0.0-rc3
    /usr/bin/ld: cannot find -lopencv_imgcodecs
    /usr/bin/ld: cannot find -lopencv_videoio

    opened by idefix92 4
  • When I run python text/demo_test.py, an error occurred

    warn("The default mode, 'constant', will be changed to 'reflect' in "
    F1206 10:15:04.766489 447 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory
    *** Check failure stack trace: ***
    Aborted (core dumped)

    What should I do about this?

    opened by kmustriver 3
  • Orientation in cuda implementation

    SSTD article says: "we use a softmax function for binary classification of text or non-text, and apply the smooth-l1 loss for regressing 5 parameters for each word bounding box, including a parameter for box orientation"

    I can find the box orientation parameter in the .cpp implementation, but I cannot find it in the .cu implementation.

    The .cpp implementation (on CPU) cannot run because "mask_resize_layer.cpp:42] Not Implemented Yet".

    Is there a way to get the orientation from the .cu implementation?

    Have you already implemented the mask_resize_layer.cpp?

    opened by SHaiHosh 2
  • I can't use Python to create a MaskResize layer

    I built the Caffe source code with the MaskResize layer successfully, but I can't use Python to create a MaskResize layer. Could you tell me what I should do to fix it?
    

    Here is my Python code:

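    name = '{}_mask_resize'.format(from_layers[i])
    mask_resize_param = {
        'output_height': 1,
        'output_width': 1,
        'factor_height': factors[i],
        'factor_width': factors[i],
    }
    # The registered layer type is 'MaskResize' (it appears in the list of known
    # types in the log below); the misspelled 'MaskRisize' triggers "Unknown layer type".
    net[name] = L.MaskRisize(net.slice1, mask_resize_param=mask_resize_param)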
    

    Running the Python code, it shows:

    I0108 23:21:11.590718 20850 layer_factory.hpp:77] Creating layer conv4_3_mask_resize

    F0108 23:21:11.590739 20850 layer_factory.hpp:81] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: MaskRisize (known types: AbsVal, Accuracy, AnnotatedData, AnnotatedDataMask, ArgMax, BNLL, BatchNorm, BatchReindex, Bias, Concat, ContrastiveLoss, Convolution, Crop, Data, Deconvolution, DetectionEvaluate, DetectionOutput, Dropout, DummyData, ELU, Eltwise, Embed, EuclideanLoss, Exp, Filter, Flatten, HDF5Data, HDF5Output, HingeLoss, Im2col, ImageData, InfogainLoss, InnerProduct, Input, LRN, LSTM, LSTMUnit, Log, MVN, MaskPooling, MaskResize, MemoryData, MultiBoxLoss, MultinomialLogisticLoss, Normalize, PReLU, Parameter, Permute, Pooling, Power, PriorBox, Python, RNN, ReLU, Reduction, Reshape, SPP, Scale, Sigmoid, SigmoidCrossEntropyLoss, Silence, Slice, SmoothL1Loss, Softmax, SoftmaxWithLoss, Split, TanH, Threshold, Tile, VideoData, WindowData) *** Check failure stack trace: ***

    opened by shiyuangogogo 2
  • During training, how do you produce the segmentation loss and segmentation result?

    During training, how do you produce the segmentation loss and the segmentation result? Do you use one SoftmaxWithLoss layer to calculate the segmentation loss and one Softmax layer to produce the segmentation result? Or one Softmax layer to produce the segmentation result and a MultinomialLogisticLoss layer to produce the segmentation loss?

    opened by shiyuangogogo 2
  • How to convert data (like [image, mask, bbox_label]) into LMDB?

    Hi, BestSonny. Thanks for sharing your code. When I use the tool "convert_annoset_mask" to convert the database into LMDB, it works fine. But when I train my net, I get a problem: it seems that "convert_annoset_mask" converts the mask (single channel) into 3 channels. I don't know what I did wrong. Could you tell me why, or share your "convert" shell script? Here is my "convert" command:

    ./build/tools/convert_annoset_mask --anno_type=detection --label_type=xml --label_map_file=/home/shi/caffe-ssd/data/VOC0712/labelmap_voc.prototxt --check_label=True --min_dim=0 --max_dim=0 --resize_height=0 --resize_width=0 --backend=lmdb --shuffle=False --check_size=False --encode_type=jpg --encoded=True --gray=False /home/shi/data/ ### /home/shi/data/all_train.txt /home/shi/data/VOC12_AUG/lmdb/VOC12_train_lmdb

    Here is the training error shown by Caffe:

    I0105 14:35:56.477653 16084 net.cpp:100] Creating Layer seg_loss
    I0105 14:35:56.477654 16084 net.cpp:434] seg_loss <- upscore
    I0105 14:35:56.477658 16084 net.cpp:434] seg_loss <- mask
    I0105 14:35:56.477661 16084 net.cpp:408] seg_loss -> seg_loss
    I0105 14:35:56.477670 16084 layer_factory.hpp:77] Creating layer seg_loss
    I0105 14:35:56.485399 16084 softmax_loss_layer.cpp:47] softmaxwithloss bottom[0] size: 2,21,320,320
    I0105 14:35:56.485427 16084 softmax_loss_layer.cpp:50] softmaxwithloss bottom[1] size: 2,3,320,320
    F0105 14:35:56.485437 16084 softmax_loss_layer.cpp:53] Check failed: outer_num_ * inner_num_ == bottom[1]->count() (204800 vs. 614400) Number of labels must match number of predictions; e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be NHW, with integer values in {0, 1, ..., C-1}.

    opened by shiyuangogogo 2
  • Error with nms?

    I am getting the following error when running demo_test:

    File "text/demo_test.py", line 21, in <module>
        from nms.gpu_nms import gpu_nms
    ImportError: No module named gpu_nm

    and the files contained in nms are:

    cpu_nms.pyx gpu_nms.pyx nms_kernel.cu gpu_nms.hpp __init__.py py_cpu_nms.py

    I also ran sh text/download.sh before starting demo_test.py.

    Best!

    Valentin

    opened by idefix92 2
  • Model details differ from the original paper

    Hi there,

    I read the original paper and this implementation, and they're awesome!

    However, I have a question about the details of the SSTD net, and I'm really looking forward to your reply :)

    (1) In the deconvolution part, I see that you use groups=64 to upsample. Generally speaking, groups=1 might be more reasonable, so I guess it's to reduce computational complexity? Or are there other reasons?

    (2) The original paper uses deconv3_3 and conv1_1 to build the attention map. I see that you're using deconv16_16 and two conv3_3 layers instead. Does that mean this implementation is better than the one in the original paper?

    It's very nice code, and I would really appreciate your comments!

    Thanks

    opened by weishuanglong 1
  • Oriented Text detection

    Hi, thank you very much for sharing your code. I'm currently trying to reproduce the ICDAR2015 result in your paper, but I cannot find the prototxt for oriented text (specifically for the ICDAR2015 sets). It would be wonderful if you could share the model definition and the pre-trained model for oriented text.

    opened by bado-lee 1
  • About GPU memory usage

    Hello @BestSonny. Thank you for your contribution. I want to port your software to mobile, so: how many MB of GPU memory does it currently use at test time?

    opened by gachiemchiep 0
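The cuDNN build error above shows /usr/local/cuda/include/cudnn.h declaring cudnnSetConvolution2dDescriptor with nine parameters, ending in cudnnDataType_t, while include/caffe/util/cudnn.hpp calls it with one argument fewer. That extra data-type argument arrived with cuDNN v6, so a likely cause is that the header on the include path is newer than the cuDNN 5 the reporter expected. Through cuDNN 7, cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL (cuDNN 8 moved them to cudnn_version.h). The sketch below, built around a hypothetical helper parse_cudnn_version, reads those macros from header text so the version the compiler actually sees can be confirmed.

```python
import re
from typing import Optional, Tuple


def parse_cudnn_version(header_text: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from the #define lines of cudnn.h text.

    Returns None if any macro is missing (e.g. cuDNN 8+, where the
    version macros live in cudnn_version.h instead).
    """
    values = {}
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match e.g. "#define CUDNN_MAJOR      6" at the start of a line.
        match = re.search(rf"^\s*#define\s+{name}\s+(\d+)", header_text,
                          re.MULTILINE)
        if match is None:
            return None
        values[name] = int(match.group(1))
    return (values["CUDNN_MAJOR"], values["CUDNN_MINOR"],
            values["CUDNN_PATCHLEVEL"])
```

For example, parse_cudnn_version(open("/usr/local/cuda/include/cudnn.h").read()) returns a tuple such as (6, 0, 21) when a cuDNN 6 header is installed there.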
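The orientation issue above quotes the paper's regression of five parameters per word box, one of which is the box orientation. The repository does not document the exact parameterization, so the following is only a generic sketch assuming a box given by its center (cx, cy), width w, height h and angle theta in radians; the hypothetical rotated_box_corners helper turns such a box into four corner points, for example to draw or evaluate oriented detections.

```python
import math
from typing import List, Tuple


def rotated_box_corners(cx: float, cy: float, w: float, h: float,
                        theta: float) -> List[Tuple[float, float]]:
    """Corners of a w x h box centred at (cx, cy), rotated by theta radians.

    Applies the standard rotation matrix [[cos, -sin], [sin, cos]] to the
    axis-aligned corner offsets. In image coordinates (y pointing down) a
    positive theta therefore appears clockwise on screen.
    Corner order: top-left, top-right, bottom-right, bottom-left (before rotation).
    """
    c, s = math.cos(theta), math.sin(theta)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]
```

With theta = 0 this reduces to the axis-aligned box; rotation keeps the side lengths w and h unchanged, which is a quick sanity check for any decoded orientation.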
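On the segmentation-loss question above: Caffe's documentation describes SoftmaxWithLoss as conceptually identical to a Softmax layer followed by a MultinomialLogisticLoss layer, but with a more numerically stable gradient. The per-pixel sketch below (plain Python, hypothetical function names, not the repository's code) shows why. Computing the loss directly from the logits with log-sum-exp agrees with "softmax, then negative log of the label probability" for ordinary inputs, but the two-step version breaks down when the label probability underflows to zero and has to be clamped, similar to the small threshold Caffe's MultinomialLogisticLoss applies before taking the log.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Max-shifted softmax over one pixel's class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_then_nll(logits: Sequence[float], label: int,
                     eps: float = 1e-20) -> float:
    """Two-step loss: probabilities first, then -log(p[label]) clamped at eps."""
    p = softmax(logits)[label]
    return -math.log(max(p, eps))


def fused_softmax_loss(logits: Sequence[float], label: int) -> float:
    """Cross-entropy straight from logits: logsumexp(logits) - logits[label]."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]
```

For logits [0.0, -1000.0] and label 1, fused_softmax_loss returns 1000.0, while softmax_then_nll is capped near 46 because the probability underflows and the clamp takes over, so its loss and gradient no longer reflect how wrong the prediction is.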
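Several of the issues above come down to counting blob elements. The seg_loss failure in the LMDB-conversion issue is a count mismatch: SoftmaxWithLoss expects N x H x W integer labels for an (N, C, H, W) prediction, and a 3-channel mask supplies N x 3 x H x W values instead. The same counting gives a rough floor for the GPU-memory question, and it shows how the group setting asked about in the model-details issue scales a (de)convolution layer's weight count. The helpers below are hypothetical, assume float32 storage (4 bytes per value), and measure single blobs only; real memory use also includes every other blob, parameter gradients and cuDNN workspace. On groups=64: Caffe's BilinearFiller documentation shows per-channel upsampling with a Deconvolution layer whose group equals its channel count, which is one common reason for such a setting, though the repository does not say whether that is the intent here.

```python
import math
from typing import Sequence


def blob_count(shape: Sequence[int]) -> int:
    """Number of elements in an N-D blob, e.g. (N, C, H, W)."""
    return math.prod(shape)


def softmax_label_count(pred_shape: Sequence[int], axis: int = 1) -> int:
    """Labels SoftmaxWithLoss expects: one per position outside the softmax axis."""
    return blob_count(pred_shape) // pred_shape[axis]


def blob_mib(shape: Sequence[int], bytes_per_value: int = 4) -> float:
    """Size of one blob in MiB (float32 by default; no gradients or workspace)."""
    return blob_count(shape) * bytes_per_value / 2**20


def deconv_weight_count(c_in: int, c_out: int, kernel_h: int, kernel_w: int,
                        group: int = 1) -> int:
    """Weights of a grouped (de)convolution: each group links c_in/group
    inputs to c_out/group outputs, so the total shrinks by a factor of group."""
    if c_in % group or c_out % group:
        raise ValueError("channel counts must be divisible by group")
    return c_in * (c_out // group) * kernel_h * kernel_w
```

With the shapes from the seg_loss log, softmax_label_count((2, 21, 320, 320)) is 204800 while blob_count((2, 3, 320, 320)) is 614400, matching "204800 vs. 614400"; the prediction blob alone takes about 16.4 MiB. For a hypothetical 64-to-64-channel layer with a 4x4 kernel, group=64 reduces the weights from 65536 to 1024.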
Owner
Pan He
Computer Vision Ph.D. Student @ UF MALT Lab
TextBoxes++: A Single-Shot Oriented Scene Text Detector

TextBoxes++: A Single-Shot Oriented Scene Text Detector Introduction This is an application for scene text detection (TextBoxes++) and recognition (CR

Minghui Liao 930 Jan 4, 2023
TextBoxes: A Fast Text Detector with a Single Deep Neural Network https://github.com/MhLiao/TextBoxes A text detection algorithm improved from SSD; textBoxes_note records previously compiled notes.

TextBoxes: A Fast Text Detector with a Single Deep Neural Network Introduction This paper presents an end-to-end trainable fast scene text detector, n

zhangjing1 24 Apr 28, 2022
Code for the paper STN-OCR: A single Neural Network for Text Detection and Text Recognition

STN-OCR: A single Neural Network for Text Detection and Text Recognition This repository contains the code for the paper: STN-OCR: A single Neural Net

Christian Bartz 496 Jan 5, 2023
A tensorflow implementation of EAST text detector

EAST: An Efficient and Accurate Scene Text Detector Introduction This is a tensorflow re-implementation of EAST: An Efficient and Accurate Scene Text

null 2.9k Jan 2, 2023
Implementation of EAST scene text detector in Keras

EAST: An Efficient and Accurate Scene Text Detector This is a Keras implementation of EAST based on a Tensorflow implementation made by argman. The or

Jan Zdenek 208 Nov 15, 2022
This is a pytorch re-implementation of EAST: An Efficient and Accurate Scene Text Detector.

EAST: An Efficient and Accurate Scene Text Detector Description: This version will be updated soon, please pay attention to this work. The motivation

Dejia Song 544 Dec 20, 2022
PyTorch Re-Implementation of EAST: An Efficient and Accurate Scene Text Detector

Description This is a PyTorch Re-Implementation of EAST: An Efficient and Accurate Scene Text Detector. Only RBOX part is implemented. Using dice loss

null 365 Dec 20, 2022
Packaged, Pytorch-based, easy to use, cross-platform version of the CRAFT text detector

CRAFT: Character-Region Awareness For Text detection Packaged, Pytorch-based, easy to use, cross-platform version of the CRAFT text detector | Paper |

null 188 Dec 28, 2022
python ocr using tesseract/ with EAST opencv detector

pytextractor python ocr using tesseract/ with EAST opencv text detector Uses the EAST opencv detector defined here with pytesseract to extract text(de

Danny Crasto 38 Dec 5, 2022
Augmenting Anchors by the Detector Itself

Augmenting Anchors by the Detector Itself Introduction It is difficult to determine the scale and aspect ratio of anchors for anchor-based object dete

null 4 Nov 6, 2022
Motion detector, Full body detection, Upper body detection, Cat face detection, Smile detection, Face detection (haar cascade), Silverware detection, Face detection (lbp), and Sending email notifications

Security camera running OpenCV for object and motion detection. The camera will send email with image of any objects it detects. It also runs a server that provides web interface with live stream video.

Peace 10 Jun 30, 2021
Code for the head detector (HeadHunter) proposed in our CVPR 2021 paper Tracking Pedestrian Heads in Dense Crowd.

Head Detector Code for the head detector (HeadHunter) proposed in our CVPR 2021 paper Tracking Pedestrian Heads in Dense Crowd. The head_detection mod

Ramana Subramanyam 76 Dec 6, 2022
Image Detector and Convertor App created using python's Pillow, OpenCV, cvlib, numpy and streamlit packages.

Siva Prakash 11 Jan 2, 2022
MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition

MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition Python 2.7 Python 3.6 MORAN is a network with rectification mechanism for

Canjie Luo 595 Dec 27, 2022
A Tensorflow model for text recognition (CNN + seq2seq with visual attention) available as a Python package and compatible with Google Cloud ML Engine.

Attention-based OCR Visual attention-based OCR model for image recognition with additional tools for creating TFRecords datasets and exporting the tra

Ed Medvedev 933 Dec 29, 2022
Deskew is a command line tool for deskewing scanned text documents. It uses Hough transform to detect "text lines" in the image. As an output, you get an image rotated so that the lines are horizontal.

Deskew by Marek Mauder https://galfar.vevb.net/deskew https://github.com/galfar/deskew v1.30 2019-06-07 Overview Deskew is a command line tool for des

Marek Mauder 127 Dec 3, 2022
An Implementation of the algorithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection

InceptText-Tensorflow An Implementation of the algorithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Orien

GeorgeJoe 115 Dec 12, 2022
text detection mainly based on ctpn model in tensorflow, id card detect, connectionist text proposal network

text-detection-ctpn Scene text detection based on ctpn (connectionist text proposal network). It is implemented in tensorflow. The origin paper can be

Shaohui Ruan 3.3k Dec 30, 2022