Implement 'Single Shot Text Detector with Regional Attention, ICCV 2017 Spotlight'

HotaekHan

Last update: Jan 5, 2022

Related tags

Computer Vision SSTDNet

Overview

SSTDNet

A PyTorch implementation of 'Single Shot Text Detector with Regional Attention, ICCV 2017 Spotlight'.

This code works for the general object detection problem, not for the (oriented) text detection problem. I will probably update it to handle oriented bounding boxes as soon as possible :)

[How to use]

You need a dataset.

The dataset structure is:

/train/0.jpg, /train/0.txt, /valid/0.jpg, /valid/0.txt, ....
Each .txt file (e.g. 0.txt) contains the position and label of every object in the matching image, one object per line, like below:

(xmin, ymin, xmax, ymax, label)

1273.0 935.0 1407.0 1017.0 v1

911.0 893.0 979.0 953.0 v1

984.0 889.0 1053.0 948.0 v1
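
As a rough illustration of the annotation format above, the sketch below parses such a file into typed records. It assumes the fields are separated by whitespace, as in the sample lines; `Annotation` and `parse_annotations` are hypothetical names and are not part of the repository.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    label: str


def parse_annotations(text: str) -> List[Annotation]:
    """Parse 'xmin ymin xmax ymax label' lines; blank lines are skipped."""
    annotations = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # assumption: whitespace-separated fields
        if not fields:
            continue
        if len(fields) != 5:
            raise ValueError(f"line {line_no}: expected 5 fields, got {len(fields)}")
        xmin, ymin, xmax, ymax = map(float, fields[:4])
        if xmin >= xmax or ymin >= ymax:
            raise ValueError(f"line {line_no}: box has non-positive width or height")
        annotations.append(Annotation(xmin, ymin, xmax, ymax, fields[4]))
    return annotations


sample = """1273.0 935.0 1407.0 1017.0 v1
911.0 893.0 979.0 953.0 v1
984.0 889.0 1053.0 948.0 v1
"""
boxes = parse_annotations(sample)
print(len(boxes), boxes[0].label)
```

Validating each line early like this tends to surface malformed annotation files before training starts.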
To encode label names as integers, you should define the labels in 'class_label_map.xlsx':
v1 1

v2 2

....
* Labels start from 1, not from 0. 0 is reserved for background (see loss.py).
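
The numbering rule can be sketched as follows. This is only an illustration: it builds the mapping from a plain list instead of reading the .xlsx file, and `build_label_map`, `encode_labels`, and `BACKGROUND_ID` are hypothetical names.

```python
from typing import Dict, Iterable, List

BACKGROUND_ID = 0  # reserved for background, so real labels start at 1


def build_label_map(names: Iterable[str]) -> Dict[str, int]:
    """Assign ids 1, 2, ... to label names in the order given."""
    label_map: Dict[str, int] = {}
    for name in names:
        if name not in label_map:
            label_map[name] = len(label_map) + 1
    return label_map


def encode_labels(labels: List[str], label_map: Dict[str, int]) -> List[int]:
    """Convert label names to integer ids, failing loudly on unknown names."""
    missing = sorted(set(labels) - set(label_map))
    if missing:
        raise KeyError(f"labels missing from the label map: {missing}")
    return [label_map[name] for name in labels]


label_map = build_label_map(["v1", "v2"])
print(label_map)
print(encode_labels(["v1", "v1", "v2"], label_map))
```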

The dataset reader needs some settings.

- See train.py, which contains the code for reading the dataset:
```
  trainset = ListDataset(root="../train", gt_extension=".txt", labelmap_path="class_label_map.xlsx", is_train=True, transform=transform, input_image_size=512, num_crops=n_crops, original_img_size=2048)
```
- You should set 'input_image_size' and 'original_img_size'. 'input_image_size' is the size of the (cropped) image used for training, and 'original_img_size' is the size of the (original) image. I added this parameter to handle high-resolution images. If you don't need the crop function, set num_crops to -1.
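
To make the relationship between the two sizes concrete, here is a minimal sketch of cropping a square window out of a larger image and moving the boxes into crop coordinates. It is an assumption about how such a crop could work, not the repository's actual datagen.py logic; `random_crop_origin` and `crop_boxes` are hypothetical helpers.

```python
import random
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def random_crop_origin(original_img_size: int, input_image_size: int,
                       rng: random.Random) -> Tuple[int, int]:
    """Pick the top-left corner of a square crop that fits inside the image."""
    if input_image_size > original_img_size:
        raise ValueError("crop is larger than the image")
    limit = original_img_size - input_image_size
    return rng.randint(0, limit), rng.randint(0, limit)


def crop_boxes(boxes: List[Box], origin: Tuple[int, int],
               input_image_size: int) -> List[Box]:
    """Shift boxes into crop coordinates, clip them, and drop empty ones."""
    ox, oy = origin
    kept = []
    for xmin, ymin, xmax, ymax in boxes:
        xmin, xmax = (min(max(v - ox, 0.0), input_image_size) for v in (xmin, xmax))
        ymin, ymax = (min(max(v - oy, 0.0), input_image_size) for v in (ymin, ymax))
        if xmax > xmin and ymax > ymin:  # box still overlaps the crop
            kept.append((xmin, ymin, xmax, ymax))
    return kept


origin = random_crop_origin(2048, 512, random.Random(0))
cropped = crop_boxes([(1273.0, 935.0, 1407.0, 1017.0)], (1000, 800), 512)
print(origin, cropped)
```

A crop that contains no boxes yields an empty list, which the training loop has to tolerate.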
Train with your dataset!
You should define some parameters such as the learning rate, which optimizer to use, the batch size, etc.

Comments

Error while training

Traceback (most recent call last):
  File "train.py", line 192, in <module>
    train(epoch)
  File "train.py", line 133, in train
    loss = ((loc_loss + cls_loss) / num_matched_anchors) + mask_loss
RuntimeError: invalid argument 3: divide by zero at /pytorch/torch/lib/THC/generic/THCTensorMathPairwise.cu:88

The error occurs while training the model. How should I solve it?
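
The traceback suggests that num_matched_anchors was zero for some batch, i.e. no anchor matched a ground-truth box. One common guard, sketched here with plain Python numbers standing in for tensors, is to clamp the divisor to at least 1. `combine_losses` is a hypothetical helper, and whether this is the right fix for the repository's loss code is an assumption.

```python
def combine_losses(loc_loss: float, cls_loss: float, mask_loss: float,
                   num_matched_anchors: int) -> float:
    """Normalise the detection losses by the number of matched anchors.

    Clamping the divisor to 1 avoids a division by zero when a batch
    contains no matched anchors (for example, crops without any object).
    """
    divisor = max(1, num_matched_anchors)
    return (loc_loss + cls_loss) / divisor + mask_loss


print(combine_losses(2.0, 4.0, 0.5, num_matched_anchors=3))
print(combine_losses(0.0, 4.0, 0.5, num_matched_anchors=0))
```

It may also be worth checking why no anchors matched, for instance boxes that are too small for the anchors or crops that contain no objects.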

opened by xsvonjhs 6
Prepare dataset

Hi, I've downloaded a public dataset with annotations and followed the instructions in the README, but I'm not sure whether I can just proceed like that. I see there is a resize function in datagen.py. Does that mean I can include images of different sizes, or rectangular images? Also, if there is a resize function, will the annotations be affected? Should I change them to relative values instead?

Thanks in advance!
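
For reference, when an image is resized, absolute box coordinates have to be scaled by the same factors. The sketch below shows that arithmetic; it does not confirm what the resize function in datagen.py does, and `resize_boxes` is a hypothetical helper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def resize_boxes(boxes: List[Box], original_size: Tuple[int, int],
                 new_size: Tuple[int, int]) -> List[Box]:
    """Scale absolute boxes from original_size to new_size.

    Sizes are (width, height). x and y are scaled independently, so
    rectangular images and aspect-ratio changes are handled.
    """
    ow, oh = original_size
    nw, nh = new_size
    sx, sy = nw / ow, nh / oh
    return [(xmin * sx, ymin * sy, xmax * sx, ymax * sy)
            for xmin, ymin, xmax, ymax in boxes]


print(resize_boxes([(100.0, 50.0, 300.0, 150.0)], (2000, 1000), (500, 500)))
```

If the loader scales boxes along with the image in this way, absolute pixel annotations can stay as they are.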

opened by xsvonjhs 4
anchor areas

I have a question about anchor_areas. The anchor_areas in encoder.py of your code are [16*16., 32*32., ..., 256*256.], and I want to know why you set them this way. I think they are correlated with the feature maps, but I can't work out the explicit relation.
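
As background, in RetinaNet-style encoders each anchor area is the squared side length of the base anchor used on one feature map level, with larger areas assigned to coarser feature maps. The sketch below derives anchor widths and heights from an area. The aspect ratios and scale ratios are assumed values in that style, and the repository's encoder.py may differ.

```python
import math
from typing import List, Tuple

# The three areas quoted in the question; the elided ones are not guessed here.
EXAMPLE_AREAS = [16 * 16.0, 32 * 32.0, 256 * 256.0]
# Assumed values (RetinaNet-style), not taken from the repository.
ASPECT_RATIOS = [1 / 2.0, 1.0, 2.0]                   # width / height
SCALE_RATIOS = [1.0, 2 ** (1 / 3.0), 2 ** (2 / 3.0)]


def anchor_sizes(area: float) -> List[Tuple[float, float]]:
    """Return (width, height) for every aspect-ratio/scale combination."""
    sizes = []
    for ar in ASPECT_RATIOS:
        h = math.sqrt(area / ar)  # chosen so that w * h == area
        w = ar * h
        for sr in SCALE_RATIOS:
            sizes.append((w * sr, h * sr))
    return sizes


for area in EXAMPLE_AREAS:
    sizes = anchor_sizes(area)
    w, h = sizes[3]  # aspect ratio 1, scale 1
    print(f"area {area:.0f}: base anchor {w:.0f}x{h:.0f}, {len(sizes)} anchors per cell")
```

With three aspect ratios and three scales, every feature map cell gets nine anchors per area.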

opened by ran337287 4
SSTD net details problem

Hi, HotaekHan, thanks for sharing the code.

I have a question concerning the details of SSTD net, and I'm really looking forward to your reply :)

(1) In the deconvolution part, I see that you use groups=64 to upsample. Generally groups=1 might be more reasonable, so I guess it's to reduce the computational cost? Or are there other reasons?

(2) The original paper uses deconv3*3 and conv1*1 to establish the attention map. I see that you're using deconv16*16 and two conv3*3 instead. Does that mean this implementation works better than the one in the original paper?

It's very nice code and I would really appreciate your comments!

Thanks

opened by weishuanglong 4
training label

It's so nice of you to share the code here. I have a question: a text bounding box may be inclined in an image, so (xmin, ymin, xmax, ymax) is not enough to determine an inclined bounding box; for example, we may need three points to determine it. Why do you use only (xmin, ymin, xmax, ymax) for the training labels? Thanks!
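
To illustrate what the axis-aligned format keeps and loses, the sketch below converts a four-point quadrilateral into the (xmin, ymin, xmax, ymax) form used by the training labels. The coordinates and the helper name `quad_to_axis_aligned` are made up for the example.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def quad_to_axis_aligned(points: Sequence[Point]) -> Tuple[float, float, float, float]:
    """Return the smallest (xmin, ymin, xmax, ymax) box enclosing the points."""
    if not points:
        raise ValueError("at least one point is required")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# A square rotated by 45 degrees (hypothetical coordinates).
quad = [(10.0, 0.0), (20.0, 10.0), (10.0, 20.0), (0.0, 10.0)]
print(quad_to_axis_aligned(quad))
```

Here the enclosing box covers 400 square pixels while the rotated square covers only 200, and the orientation is lost. This is why the README describes the code as general object detection rather than oriented text detection.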

opened by mensaochun 4
Decoding is very slow
I tested your code with image size 512, and it takes a lot of time to decode.

Elapsed time of pred : 91.725ms
Decoding..
Elapsed time of decode : 114360.36300000001ms
Avg. elapsed time of pred : 153.09623809523805ms
Avg. elapsed time of decode : 65703.0309047619ms

I learned that the NMS function runs slowly on images with many objects. How can I improve its performance?
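
One general way to keep greedy NMS cheap is to discard low-score candidates and keep only the top-k before the pairwise suppression loop. The pure-Python sketch below shows the idea. The thresholds are arbitrary example values, and it is an assumption that the slow decode here comes from running NMS over a very large number of candidates.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def nms(boxes: Sequence[Box], scores: Sequence[float], score_thresh: float = 0.5,
        iou_thresh: float = 0.5, top_k: int = 1000) -> List[int]:
    """Greedy NMS; returns indices of kept boxes, best score first."""
    # Pre-filtering keeps the quadratic suppression loop small.
    order = [i for i in range(len(boxes)) if scores[i] >= score_thresh]
    order.sort(key=lambda i: scores[i], reverse=True)
    order = order[:top_k]
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7, 0.1]
print(nms(boxes, scores))
```

For tensors on the GPU, a vectorised or library implementation such as torchvision.ops.nms is usually much faster than a Python loop.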
opened by hajaulee 2
Type Error

Epoch: 0
Traceback (most recent call last):
  File "train.py", line 194, in <module>
    train(epoch)
  File "train.py", line 118, in train
    for batch_idx, (inputs, loc_targets, cls_targets, mask_targets) in enumerate(trainloader):
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 336, in __next__
    return self._process_next_batch(batch)
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 357, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/xendity/SSTDNet/datagen.py", line 492, in collate_fn
    loc_target, cls_target = self.data_encoder.encode(boxes[i], labels[i], input_size=(max_w,max_h))
  File "/home/xendity/SSTDNet/encoder.py", line 92, in encode
    anchor_boxes = self._get_anchor_boxes(input_size)
  File "/home/xendity/SSTDNet/encoder.py", line 66, in _get_anchor_boxes
    xy = (xy*grid_size).view(fm_h,fm_w,1,2).expand(fm_h,fm_w,9,2)
RuntimeError: Expected object of type torch.LongTensor but found type torch.FloatTensor for argument #2 'other'

Hi, I ran train.py and got two or three type errors like this. How should I modify the code?

opened by xsvonjhs 0
about text detection

The original paper targets text detection, so why does this repo say "This code is work for general object detection problem. not for (oriented) text detection problem"?

opened by jamesbondzhou 0

how to gen train data?

How do I prepare the training data? After I run python3 datagen.py, the following error occurs:

Traceback (most recent call last):
  File "datagen.py", line 540, in <module>
    test()
  File "datagen.py", line 531, in test
    for images, loc_targets, cls_targets, mask_targets in dataloader:
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 310, in __iter__
    return DataLoaderIter(self)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 180, in __init__
    self._put_indices()
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 219, in _put_indices
    indices = next(self.sample_iter, None)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/sampler.py", line 119, in __iter__
    for idx in self.sampler:
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/sampler.py", line 50, in __iter__
    return iter(torch.randperm(len(self.data_source)).long())
RuntimeError: invalid argument 1: must be strictly positive at /pytorch/torch/lib/TH/generic/THTensorMath.c:2184

Thanks!
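
The last frame calls torch.randperm(len(self.data_source)), so this error is consistent with a dataset of length zero, meaning the reader found no samples. A quick check like the sketch below can confirm whether image/annotation pairs exist under the dataset root. `find_samples` is a hypothetical helper, and the flat directory layout is assumed from the README.

```python
from pathlib import Path
from typing import List, Tuple


def find_samples(root: str, img_extension: str = ".jpg",
                 gt_extension: str = ".txt") -> List[Tuple[Path, Path]]:
    """Return (image, annotation) pairs found directly under root."""
    root_path = Path(root)
    if not root_path.is_dir():
        raise FileNotFoundError(f"dataset root does not exist: {root_path}")
    pairs = []
    for img_path in sorted(root_path.glob(f"*{img_extension}")):
        gt_path = img_path.with_suffix(gt_extension)
        if gt_path.is_file():  # keep only images that have an annotation file
            pairs.append((img_path, gt_path))
    if not pairs:
        raise RuntimeError(f"no image/annotation pairs found in {root_path}")
    return pairs
```

Calling find_samples("../train") before building the data loader would fail with a clear message if the path or the file extensions are wrong.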

opened by FantDing 1

Owner

HotaekHan

GitHub

TextBoxes++: A Single-Shot Oriented Scene Text Detector

TextBoxes++: A Single-Shot Oriented Scene Text Detector Introduction This is an application for scene text detection (TextBoxes++) and recognition (CR

930 Jan 4, 2023

This is a C++ project deploying a deep scene text reading pipeline with TensorFlow. It reads text from natural scene images using frozen TensorFlow graphs. The detector detects scene text locations, and the recognizer reads the word in each detected bounding box.

DeepSceneTextReader This is a c++ project deploying a deep scene text reading pipeline. It reads text from natural scene images. Prerequsites The proj

49 Sep 10, 2022

TextBoxes: A Fast Text Detector with a Single Deep Neural Network https://github.com/MhLiao/TextBoxes A text detection algorithm improved from SSD; textBoxes_note contains previously compiled notes.

TextBoxes: A Fast Text Detector with a Single Deep Neural Network Introduction This paper presents an end-to-end trainable fast scene text detector, n

24 Apr 28, 2022

Layout Analysis Evaluator for the ICDAR 2017 competition on Layout Analysis for Challenging Medieval Manuscripts

LayoutAnalysisEvaluator Layout Analysis Evaluator for: ICDAR 2019 Historical Document Reading Challenge on Large Structured Chinese Family Records ICD

17 Dec 8, 2022

Code for the paper STN-OCR: A single Neural Network for Text Detection and Text Recognition

STN-OCR: A single Neural Network for Text Detection and Text Recognition This repository contains the code for the paper: STN-OCR: A single Neural Net

496 Jan 5, 2023

A tensorflow implementation of EAST text detector

EAST: An Efficient and Accurate Scene Text Detector Introduction This is a tensorflow re-implementation of EAST: An Efficient and Accurate Scene Text

2.9k Jan 2, 2023

Implementation of EAST scene text detector in Keras

EAST: An Efficient and Accurate Scene Text Detector This is a Keras implementation of EAST based on a Tensorflow implementation made by argman. The or

208 Nov 15, 2022

This is a pytorch re-implementation of EAST: An Efficient and Accurate Scene Text Detector.

EAST: An Efficient and Accurate Scene Text Detector Description: This version will be updated soon, please pay attention to this work. The motivation

544 Dec 20, 2022

PyTorch Re-Implementation of EAST: An Efficient and Accurate Scene Text Detector

Description This is a PyTorch Re-Implementation of EAST: An Efficient and Accurate Scene Text Detector. Only RBOX part is implemented. Using dice loss

365 Dec 20, 2022

Packaged, Pytorch-based, easy to use, cross-platform version of the CRAFT text detector

CRAFT: Character-Region Awareness For Text detection Packaged, Pytorch-based, easy to use, cross-platform version of the CRAFT text detector | Paper |

188 Dec 28, 2022

Code for the paper "DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks" (ICCV '19)

DewarpNet This repository contains the codes for DewarpNet training. Recent Updates [May, 2020] Added evaluation images and an important note about Ma

354 Jan 1, 2023

TextBoxes re-implement using tensorflow

TextBoxes-TensorFlow TextBoxes re-implementation using tensorflow. This project is greatly inspired by slim project And many functions are modified ba

44 Dec 29, 2022

An unofficial package that helps developers easily implement the ZATCA (Fatoora) QR code required for e-invoicing

ZATCA (Fatoora) QR-Code Implementation An unofficial package help developers to implement ZATCA (Fatoora) QR code easily which required for e-invoicin

28 Nov 3, 2022

python ocr using tesseract/ with EAST opencv detector

pytextractor python ocr using tesseract/ with EAST opencv text detector Uses the EAST opencv detector defined here with pytesseract to extract text(de

38 Dec 5, 2022

Augmenting Anchors by the Detector Itself

Augmenting Anchors by the Detector Itself Introduction It is difficult to determine the scale and aspect ratio of anchors for anchor-based object dete

4 Nov 6, 2022

Motion detector, Full body detection, Upper body detection, Cat face detection, Smile detection, Face detection (haar cascade), Silverware detection, Face detection (lbp), and Sending email notifications

Security camera running OpenCV for object and motion detection. The camera will send email with image of any objects it detects. It also runs a server that provides web interface with live stream video.

10 Jun 30, 2021

Code for the head detector (HeadHunter) proposed in our CVPR 2021 paper Tracking Pedestrian Heads in Dense Crowd.

Image Detector and Convertor App created using python's Pillow, OpenCV, cvlib, numpy and streamlit packages.

MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition