Implement 'Single Shot Text Detector with Regional Attention, ICCV 2017 Spotlight'

Overview

SSTDNet

A PyTorch implementation of 'Single Shot Text Detector with Regional Attention, ICCV 2017 Spotlight'.

This code works for the general object detection problem, not for the (oriented) text detection problem. I will probably update it to handle oriented bounding boxes as soon as possible :)

[How to use]

  1. You need a dataset.
  • The dataset structure is:

    /train/0.jpg, /train/0.txt, /valid/0.jpg, /valid/0.txt, ....

  • 0.txt contains the position and label of each object, in the format below:

    (xmin, ymin, xmax, ymax, label)

    1273.0 935.0 1407.0 1017.0 v1

    911.0 893.0 979.0 953.0 v1

    984.0 889.0 1053.0 948.0 v1

  • To encode label names as integers, define the labels in 'class_label_map.xlsx':

    v1 1

    v2 2

    ....

    * Start from 1, not from 0. 0 is the background class (in loss.py).
  2. Configure the dataset reader.

    - See train.py. It contains the code that reads the dataset:

      trainset = ListDataset(root="../train", gt_extension=".txt", labelmap_path="class_label_map.xlsx", is_train=True, transform=transform, input_image_size=512, num_crops=n_crops, original_img_size=2048)

    • Set 'input_image_size' and 'original_img_size'. 'input_image_size' is the size of the (cropped) image used for training, and 'original_img_size' is the size of the (original) image. I added this parameter to handle high-resolution images. If you don't need the crop function, set num_crops to -1.
  3. Train with your dataset!

    You should define some parameters, such as the learning rate, which optimizer to use, the batch size, etc.
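
The annotation format described above (one `xmin ymin xmax ymax label` entry per object, with label ids starting at 1) can be read with a few lines of Python. This is a minimal sketch, not the repository's ListDataset: the names `parse_annotation_text` and `encode_labels` are hypothetical, and the label map is passed as a plain dict instead of being read from the .xlsx file.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]


def parse_annotation_text(text: str) -> List[Tuple[Box, str]]:
    """Parse lines of the form 'xmin ymin xmax ymax label'."""
    objects = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 5:
            raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
        xmin, ymin, xmax, ymax = (float(value) for value in parts[:4])
        objects.append(((xmin, ymin, xmax, ymax), parts[4]))
    return objects


def encode_labels(objects: List[Tuple[Box, str]], label_map: Dict[str, int]) -> List[int]:
    """Map label names to integer ids (ids start at 1; 0 is background)."""
    return [label_map[name] for _, name in objects]


# Two lines taken from the example annotation file above.
sample = "1273.0 935.0 1407.0 1017.0 v1\n911.0 893.0 979.0 953.0 v1\n"
objects = parse_annotation_text(sample)
label_ids = encode_labels(objects, {"v1": 1, "v2": 2})
```

A label that is missing from the map raises a KeyError here, which surfaces annotation typos before training starts.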

Comments
  • Error while training

    Traceback (most recent call last):
      File "train.py", line 192, in <module>
        train(epoch)
      File "train.py", line 133, in train
        loss = ((loc_loss + cls_loss) / num_matched_anchors) + mask_loss
    RuntimeError: invalid argument 3: divide by zero at /pytorch/torch/lib/THC/generic/THCTensorMathPairwise.cu:88

    The error occurs while training the model. How should I solve it?

    opened by xsvonjhs 6
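
The division in the traceback fails when `num_matched_anchors` is zero, presumably because no anchor matched a ground-truth box in that batch. One common guard is to clamp the divisor to at least 1. The sketch below shows the idea with plain Python numbers; `combine_losses` is a hypothetical helper, not code from train.py. With tensors, the same guard could be written with `torch.clamp(num_matched_anchors, min=1)`.

```python
def combine_losses(loc_loss: float, cls_loss: float, mask_loss: float,
                   num_matched_anchors: int) -> float:
    """Normalize the box losses by the number of matched anchors.

    The divisor is clamped to 1 so that a batch without any matched
    anchor does not trigger a division by zero.
    """
    divisor = max(num_matched_anchors, 1)
    return (loc_loss + cls_loss) / divisor + mask_loss


loss_without_matches = combine_losses(2.0, 4.0, 0.5, num_matched_anchors=0)
loss_with_matches = combine_losses(2.0, 4.0, 0.5, num_matched_anchors=3)
```

If batches without matches are frequent, it is also worth checking that the annotations and the anchor sizes fit the input image size.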
  • Prepare dataset

    Hi, I've downloaded a public dataset with annotations and followed the instructions in the README, but I'm not sure whether I can just proceed like that. I see there is a resize function in datagen.py. Does that mean I can include images of different sizes, or rectangular images? Also, if there is a resize function, will the annotations be affected? Should I change them to relative values instead?

    Thanks in advance!

    opened by xsvonjhs 4
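
When an image is resized, absolute box coordinates have to be scaled by the same factors. The sketch below shows that generic approach. It is not the resize function from datagen.py, and `resize_boxes` is a hypothetical name.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]


def resize_boxes(boxes: List[Box],
                 original_size: Tuple[int, int],
                 new_size: Tuple[int, int]) -> List[Box]:
    """Scale (xmin, ymin, xmax, ymax) boxes from original_size to new_size.

    Sizes are (width, height). Width and height are scaled independently,
    so rectangular images are handled as well.
    """
    orig_w, orig_h = original_size
    new_w, new_h = new_size
    scale_x = new_w / orig_w
    scale_y = new_h / orig_h
    return [
        (xmin * scale_x, ymin * scale_y, xmax * scale_x, ymax * scale_y)
        for xmin, ymin, xmax, ymax in boxes
    ]


# A box from the README example, resized from 2048x2048 to 512x512.
resized = resize_boxes([(1273.0, 935.0, 1407.0, 1017.0)], (2048, 2048), (512, 512))
```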
  • anchor areas

    I have a question about anchor_areas. The anchor_areas in encoder.py of your code are [16*16., 32*32., ..., 256*256.], and I would like to know why you set them this way. I think they are correlated with the feature maps, but I can't work out the explicit relation.

    opened by ran337287 4
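
In RetinaNet-style encoders, each entry of the anchor-area list is usually paired with one feature-map level, so the base anchor grows as the feature map gets coarser. The sketch below shows how one area is typically expanded into a set of anchor shapes. The aspect ratios and scale ratios are assumed values (3 x 3 = 9 shapes per location), not values read from encoder.py, and `anchor_shapes` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def anchor_shapes(area: float,
                  aspect_ratios: Sequence[float],
                  scale_ratios: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (width, height) pairs derived from one anchor area."""
    shapes = []
    for ratio in aspect_ratios:  # ratio = width / height
        height = math.sqrt(area / ratio)
        width = ratio * height   # width * height == area
        for scale in scale_ratios:
            shapes.append((width * scale, height * scale))
    return shapes


# Assumed ratios and scales; only the 16*16 area comes from the question above.
shapes = anchor_shapes(
    16 * 16.0,
    aspect_ratios=(0.5, 1.0, 2.0),
    scale_ratios=(1.0, 2 ** (1 / 3), 2 ** (2 / 3)),
)
```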
  • SSTD net details problem

    Hi, HotaekHan, thanks for sharing the code.

    I have a question concerning the details of SSTD net, and I'm really looking forward to seeing your reply :)

    (1) In the deconvolution part, I see that you use groups=64 to upsample. But generally groups=1 might be more reasonable, so I guess it's to reduce computational complexity? Or are there any other reasons?

    (2) The original paper uses deconv3*3 and conv1*1 to establish the attention map. I see that you're using deconv16*16 and two conv3*3 to do it. Does that mean this implementation is better than the one in the original paper?

    It's very nice code and I really appreciate your comment!

    Thanks

    opened by weishuanglong 4
  • training label

    Thanks for sharing the code here. I have a question: the text bounding boxes in an image may be inclined, so (xmin, ymin, xmax, ymax) is not enough to determine an inclined bounding box; for example, we may need three points to determine it. Why do you use only (xmin, ymin, xmax, ymax) for the training labels? Thanks!

    opened by mensaochun 4
  • Decoding is very slow

    I tested your code with image size 512, and it takes a lot of time to decode.

    Elapsed time of pred : 91.725ms
    Decoding..
    Elapsed time of decode : 114360.36300000001ms
    Avg. elapsed time of pred : 153.09623809523805ms
    Avg. elapsed time of decode : 65703.0309047619ms

    I learned that the NMS function runs slowly on images with many objects. How can I improve its performance?

    opened by hajaulee 2
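
Greedy NMS compares every candidate against the boxes already kept, so its cost grows quickly with the number of candidates. A common way to reduce decode time is to drop low-scoring boxes and keep only the top-k candidates before running NMS. The sketch below illustrates this in plain Python. It is not the repository's decode function, and `filtered_nms` and its threshold values are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filtered_nms(boxes: Sequence[Box], scores: Sequence[float],
                 score_thresh: float = 0.5, top_k: int = 200,
                 iou_thresh: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    # 1) Sort by score, 2) drop low scores, 3) cap the candidate count.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    order = [i for i in order if scores[i] >= score_thresh][:top_k]
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.1]
kept = filtered_nms(boxes, scores)
```

In this example the second box is suppressed by the first (IoU of about 0.68), and the fourth never reaches NMS because its score is below the threshold.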
  • Type Error

    Epoch: 0
    Traceback (most recent call last):
      File "train.py", line 194, in <module>
        train(epoch)
      File "train.py", line 118, in train
        for batch_idx, (inputs, loc_targets, cls_targets, mask_targets) in enumerate(trainloader):
      File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 336, in __next__
        return self._process_next_batch(batch)
      File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 357, in _process_next_batch
        raise batch.exc_type(batch.exc_msg)
    RuntimeError: Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop
        samples = collate_fn([dataset[i] for i in batch_indices])
      File "/home/xendity/SSTDNet/datagen.py", line 492, in collate_fn
        loc_target, cls_target = self.data_encoder.encode(boxes[i], labels[i], input_size=(max_w,max_h))
      File "/home/xendity/SSTDNet/encoder.py", line 92, in encode
        anchor_boxes = self._get_anchor_boxes(input_size)
      File "/home/xendity/SSTDNet/encoder.py", line 66, in _get_anchor_boxes
        xy = (xy*grid_size).view(fm_h,fm_w,1,2).expand(fm_h,fm_w,9,2)
    RuntimeError: Expected object of type torch.LongTensor but found type torch.FloatTensor for argument #2 'other'

    Hi, I ran train.py and got two or three type errors like this. How should I modify the code?

    opened by xsvonjhs 0
  • about text detection

    The original paper is about text detection, so why does this repo say that the code is for the general object detection problem and not for the (oriented) text detection problem?

    opened by jamesbondzhou 0
  • how to gen train data?

    How do I prepare the training data? After I run python3 datagen.py, the following error occurs:

    Traceback (most recent call last):
      File "datagen.py", line 540, in <module>
        test()
      File "datagen.py", line 531, in test
        for images, loc_targets, cls_targets, mask_targets in dataloader:
      File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 310, in __iter__
        return DataLoaderIter(self)
      File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 180, in __init__
        self._put_indices()
      File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 219, in _put_indices
        indices = next(self.sample_iter, None)
      File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/sampler.py", line 119, in __iter__
        for idx in self.sampler:
      File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/sampler.py", line 50, in __iter__
        return iter(torch.randperm(len(self.data_source)).long())
    RuntimeError: invalid argument 1: must be strictly positive at /pytorch/torch/lib/TH/generic/THTensorMath.c:2184
    

    Thanks!

    opened by FantDing 1
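
The failing call in the traceback is `torch.randperm(len(self.data_source))`, so "must be strictly positive" indicates that the dataset had length 0 when the loader started iterating. A quick check before building the loader is to count the image/annotation pairs in the dataset folder. The sketch below assumes the layout from the README (`0.jpg` next to `0.txt`); `find_samples` and `require_samples` are hypothetical helpers, not part of datagen.py.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def find_samples(root: str, image_ext: str = ".jpg",
                 gt_extension: str = ".txt") -> List[Tuple[Path, Path]]:
    """Return (image, annotation) pairs found directly under root."""
    pairs = []
    for image_path in sorted(Path(root).glob(f"*{image_ext}")):
        gt_path = image_path.with_suffix(gt_extension)
        if gt_path.exists():  # ignore images without an annotation file
            pairs.append((image_path, gt_path))
    return pairs


def require_samples(root: str) -> List[Tuple[Path, Path]]:
    """Fail early with a readable message when the dataset is empty."""
    pairs = find_samples(root)
    if not pairs:
        raise RuntimeError(f"no image/annotation pairs found under {root!r}")
    return pairs


# Demonstration on a temporary folder: one complete pair, one stray image.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "0.jpg").touch()
    (Path(tmp) / "0.txt").write_text("1273.0 935.0 1407.0 1017.0 v1\n")
    (Path(tmp) / "1.jpg").touch()
    found = [(img.name, gt.name) for img, gt in find_samples(tmp)]
```

If the check finds no pairs, the `root` path and the file extensions passed to the dataset are the first things to verify.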
Owner
HotaekHan