Repository for Scene Text Detection with Supervised Pyramid Context Network, implemented in TensorFlow.

Scene-Text-Detection-with-SPCNET

Unofficial repository for [Scene Text Detection with Supervised Pyramid Context Network](https://arxiv.org/abs/1811.08605), implemented in TensorFlow.

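Reference code

The network implementation is based mainly on the Keras version of Mask-RCNN, and the training data interface follows argman/EAST. The paper's authors have also published an article on Zhihu introducing SPCNet.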

Training

1. Prepare the training data

Training data goes under data/, and the data preparation code is in data/icdar.py:

data
    icdar2017
        Annotaions    // image_1.txt
        JPEGImages    // image_1.jpg
        train.txt     // names of the training images, e.g. image_1
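As a rough illustration of how this layout can be consumed, the sketch below pairs each name in train.txt with its image and annotation file. The helper name `list_training_pairs` is hypothetical and is not taken from data/icdar.py; the annotation file format is not described here, so the sketch only builds paths.

```python
import os


def list_training_pairs(root, list_name="train.txt"):
    """Return (image_path, annotation_path) pairs for the names in `list_name`.

    Hypothetical helper: it assumes one image name per line, without extension.
    """
    pairs = []
    with open(os.path.join(root, list_name)) as f:
        for line in f:
            name = line.strip()
            if not name:
                continue  # skip blank lines
            image_path = os.path.join(root, "JPEGImages", name + ".jpg")
            # "Annotaions" is spelled as in the directory layout above.
            annotation_path = os.path.join(root, "Annotaions", name + ".txt")
            pairs.append((image_path, annotation_path))
    return pairs
```

Calling `list_training_pairs("data/icdar2017")` would then give one pair per line of train.txt, which makes it easy to check for missing files before starting a long training run.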

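2. Adjust the parameters

Edit the learning rate, batch size, model save path and other parameters in ./train.py. To change the network parameters, edit nets/config.py.

3. Run training

python train.py

Environment: Python 2.7, tensorflow-gpu 1.13, a single 1080Ti.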

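Testing

Edit the model directory path and the test image path in demo.py, then run python demo.py

Test results: there are still parts of the paper I am unsure about, so the model has not yet been evaluated on public datasets. Note that, following the training instructions in the paper, it is best to train on multiple GPUs and to increase the batch size.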

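Points worth noting

1. Training the global text segmentation (gts)

To compute the gts training loss, I take the gts produced at each level of the feature pyramid, compute a softmax loss for each one against the global mask ground truth, and average the results to obtain Loss_gts. I could not find a description of this step in the paper, so the authors may compute it differently, for example by preparing a different mask_gt for each level, or by fusing the gts predictions of several levels before computing the loss. If you are interested, ask the authors or try these variants yourself.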
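The averaging described above can be sketched in NumPy as follows. This is an illustration, not the code in this repository: the function names are hypothetical, and the way the global mask is brought to each level's resolution (plain striding, assuming each level is an integer downsampling of the mask with the same factor in both dimensions) is an assumption.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean per-pixel softmax cross-entropy.

    logits: (H, W, 2) float array; labels: (H, W) array of 0/1 class ids.
    """
    labels = labels.astype(np.int64)
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    h, w = labels.shape
    # Pick the log-probability of the ground-truth class at every pixel.
    picked = log_probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return -picked.mean()


def gts_loss(level_logits, mask_gt):
    """Average the per-level losses against one global mask ground truth."""
    losses = []
    for logits in level_logits:
        # Assumption: subsample the mask by striding to match this level.
        stride = mask_gt.shape[0] // logits.shape[0]
        resized = mask_gt[::stride, ::stride]
        losses.append(softmax_cross_entropy(logits, resized))
    return float(np.mean(losses))
```

With all-zero logits every pixel is predicted at probability 0.5, so each level contributes ln 2 and the average is also ln 2, which is a quick sanity check for the implementation.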

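2. Choosing the gts when implementing rescoring

The pyramid level corresponding to each predicted box is computed, and the gts of that level is used for the calculation. An alternative is to fuse the gts of P2, P3, P4 and P5 and then rescore the boxes.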
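A minimal sketch of the first approach is shown below. It assumes an FPN-style assignment rule (level 4 for a box of about 224 x 224 pixels, rounded and clamped to P2..P5) and, as a simplification, averages the gts text probability over the box area rather than over the predicted mask. The function names, the 224 constant and the stride of 2^k for level Pk are assumptions, not values read from nets/models.py.

```python
import math

import numpy as np


def box_to_level(box, min_level=2, max_level=5, canonical_size=224.0):
    """Map a box (y1, x1, y2, x2) in pixels to a pyramid level P2..P5.

    Assumed rule: k = 4 + round(log2(sqrt(h * w) / 224)), clamped to the range.
    """
    y1, x1, y2, x2 = box
    h, w = max(y2 - y1, 1e-6), max(x2 - x1, 1e-6)
    level = 4 + int(round(math.log(math.sqrt(h * w) / canonical_size, 2)))
    return min(max(level, min_level), max_level)


def instance_score(box, gts_by_level):
    """Mean text probability of the selected level's gts inside the box.

    gts_by_level maps a level number to an (H_l, W_l) probability map.
    """
    level = box_to_level(box)
    gts = gts_by_level[level]
    stride = 2 ** level  # assumption: level Pk has stride 2^k
    y1, x1, y2, x2 = [int(v // stride) for v in box]
    y2, x2 = max(y2, y1 + 1), max(x2, x1 + 1)  # keep at least one cell
    return float(gts[y1:y2, x1:x2].mean())
```

How the instance score is then combined with the classification score is left out here, since that formula should be taken from the paper rather than from this sketch.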

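3. Generating the bounding boxes

Mask R-CNN first applies threshold filtering and NMS to the output boxes, then feeds the ROIs of the remaining regressed boxes to the mask branch to compute the masks. The aim is to reduce computation while obtaining more accurate masks. To reduce false positives and false negatives, SPCNet changes the inference flow: the boxes and masks output by the model are first rescored and passed through a threshold filter, bounding boxes are then computed from the remaining masks, and finally Poly NMS is applied to reduce overlap before the remaining boxes are output.

In the current code (nets/models.py utils.py), the boxes and masks output by the model are first rescored, then passed through the threshold filter and NMS; bounding boxes are then computed from the remaining masks and output directly.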
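The order of operations in the current code can be sketched as below. This is a simplified NumPy illustration, not the code in nets/models.py: the function names and threshold values are hypothetical, NMS is a plain greedy box NMS, and the final box is axis-aligned rather than a rotated rectangle.

```python
import numpy as np


def box_iou(box, boxes):
    """IoU between one (y1, x1, y2, x2) box and an (N, 4) array of boxes."""
    y1 = np.maximum(box[0], boxes[:, 0])
    x1 = np.maximum(box[1], boxes[:, 1])
    y2 = np.minimum(box[2], boxes[:, 2])
    x2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(y2 - y1, 0) * np.maximum(x2 - x1, 0)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def postprocess(boxes, masks, scores, score_thresh=0.5, nms_thresh=0.3):
    """boxes: (N, 4); masks: (N, H, W) bool; scores: (N,) rescored values."""
    keep = np.where(scores >= score_thresh)[0]  # threshold filter
    keep = keep[np.argsort(-scores[keep])]      # highest score first
    selected = []
    while keep.size > 0:                        # greedy NMS on the boxes
        best, rest = keep[0], keep[1:]
        selected.append(best)
        keep = rest[box_iou(boxes[best], boxes[rest]) <= nms_thresh]
    results = []
    for i in selected:                          # bounding box from each kept mask
        ys, xs = np.where(masks[i])
        if ys.size == 0:
            continue                            # empty mask, nothing to output
        results.append((int(ys.min()), int(xs.min()),
                        int(ys.max()) + 1, int(xs.max()) + 1))
    return results
```

To follow the order described for the paper instead, the NMS step would move after the mask-derived boxes are computed and operate on those polygons.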

Comments
  • Can't train this model

    I tried to train with this repository, but the program stalls for a long time at the point below, even though I supplied only one image for testing:

        Generator use 10 batches for buffering, this may take a while
        1 training images in data/train_form_new/

    How can I resolve this?

    opened by cong235
  • I have trained a model.ckpt file; do you have any suggestions for evaluating h-mean? I don't know where to start

    I have trained a model.ckpt file. Do you have any suggestions for evaluating h-mean? I don't know where to start yet.

    Originally posted by @weiwei23456789 in https://github.com/AirBernard/Scene-Text-Detection-with-SPCNET/issues/6#issuecomment-513608517

    You can find ICDAR2015 on the ICDAR website and click My methods to evaluate hmean online.

    opened by Dilidulu523
  • The hmean computed from a model trained on IC15 is rather poor

    The training set was IC15 only, with the default 300000 steps from the code. The loss generally drops by several orders of magnitude, but the results are not good. On the IC15 test set the computed hmean is: {'precision': 0.023610427939006393, 'recall': 0.023110255175734232, 'hmean': 0.023357664233576644, 'AP': 0}

    You can use the hmean script from another repo, but note that the bbox code needs to be changed. It is in the unmold_detections function in ./nets/utils.py. Change the bbox code to:

        points[:, [0, 1]] = points[:, [1, 0]]
        rect = cv2.minAreaRect(points)
        # (x1, y1, x2, y2, x3, y3, x4, y4)
        boundbox = cv2.boxPoints(rect)
        bound_boxes.append([boundbox[1], boundbox[2], boundbox[3], boundbox[0]])
        full_masks.append(full_mask)

        full_masks = np.stack(full_masks, axis=-1) if full_masks else np.empty(original_image_shape[:2] + (0,))
        bound_boxes = np.array(bound_boxes) if bound_boxes else np.empty((0, 8))

    Remember to remove gts when running inference. For the evaluation script itself, look for cal_recall_precison_f1 in other repos.

    opened by TuKJet
  • problem of convergence?

    I have trained the model on a single GPU with batch_size=4 for a very long time. However, the total loss stays around 1.0 and the detection performance is very poor.

    opened by FredlinT
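For reference when reading the evaluation numbers quoted in the comments, hmean is the harmonic mean of precision and recall. The small sketch below shows the calculation; the function name is illustrative and is not taken from any evaluation script.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall as quoted in the IC15 comment above.
    print(hmean(0.023610427939006393, 0.023110255175734232))
```

Applied to the precision and recall quoted above, this reproduces the reported hmean to within floating-point rounding.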