PSENet - Shape Robust Text Detection with Progressive Scale Expansion Network.

Overview

News

Python 3 implementations of PSENet [1], PAN [2] and PAN++ [3] are released at https://github.com/whai362/pan_pp.pytorch.

[1] W. Wang, E. Xie, X. Li, W. Hou, T. Lu, G. Yu, and S. Shao. Shape robust text detection with progressive scale expansion network. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pages 9336–9345, 2019.
[2] W. Wang, E. Xie, X. Song, Y. Zang, W. Wang, T. Lu, G. Yu, and C. Shen. Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In Proc. IEEE Int. Conf. Comp. Vis., pages 8440–8449, 2019.
[3] Paper is in preparation.

Shape Robust Text Detection with Progressive Scale Expansion Network

Requirements

  • Python 2.7
  • PyTorch v0.4.1+
  • pyclipper
  • Polygon2
  • OpenCV 3.4 (for the C++ version of pse)
  • opencv-python 3.4

Introduction

Progressive Scale Expansion Network (PSENet) is a text detector that can accurately detect arbitrarily shaped text in natural scenes.
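
The paper [1] describes the post-processing idea as follows: the network predicts several shrunken versions ("kernels") of every text instance, the smallest kernels are labelled as separate instances, and those labels are then grown outward through each larger kernel in turn. The following is a simplified pure-Python sketch of that idea, not the `pse` implementation shipped in this repository; the function names `label_components` and `progressive_scale_expansion` and the toy grids are made up for illustration.

```python
from collections import deque

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity


def label_components(mask):
    """Label 4-connected components of a binary grid (0 = background)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and labels[y][x] == 0:
                next_label += 1
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBOURS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels


def progressive_scale_expansion(kernels):
    """Grow labels from the smallest kernel through the larger ones.

    `kernels` is a list of binary grids ordered from smallest to largest.
    Contested pixels go to whichever label reaches them first.
    """
    labels = label_components(kernels[0])
    h, w = len(labels), len(labels[0])
    for kernel in kernels[1:]:
        # Start a breadth-first search from every pixel labelled so far.
        queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
        while queue:
            cy, cx = queue.popleft()
            for dy, dx in NEIGHBOURS:
                ny, nx = cy + dy, cx + dx
                if (0 <= ny < h and 0 <= nx < w
                        and kernel[ny][nx] and labels[ny][nx] == 0):
                    labels[ny][nx] = labels[cy][cx]
                    queue.append((ny, nx))
    return labels


# Two instances that are separate in the small kernel but touch in the large one.
small = [[0, 0, 0, 0, 0, 0],
         [0, 1, 0, 0, 1, 0],
         [0, 0, 0, 0, 0, 0]]
large = [[0, 0, 0, 0, 0, 0],
         [1, 1, 1, 1, 1, 1],
         [0, 0, 0, 0, 0, 0]]
result = progressive_scale_expansion([small, large])
print(result[1])  # [1, 1, 1, 2, 2, 2]
```

Because the labels come from the smallest kernel, the two touching instances stay separate even though they form a single connected region in the largest kernel.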

Training

CUDA_VISIBLE_DEVICES=0,1,2,3 python train_ic15.py

Testing

CUDA_VISIBLE_DEVICES=0 python test_ic15.py --scale 1 --resume [path of model]

Eval script for ICDAR 2015 and SCUT-CTW1500

cd eval
sh eval_ic15.sh
sh eval_ctw1500.sh

Performance (new version paper)

ICDAR 2015

Method | Extra Data | Precision (%) | Recall (%) | F-measure (%) | FPS (1080Ti) | Model
PSENet-1s (ResNet50) | - | 81.49 | 79.68 | 80.57 | 1.6 | baiduyun (extract code: rxti); OneDrive
PSENet-1s (ResNet50) | pretrain on IC17 MLT | 86.92 | 84.5 | 85.69 | 1.6 | baiduyun (extract code: aieo); OneDrive
PSENet-4s (ResNet50) | pretrain on IC17 MLT | 86.1 | 83.77 | 84.92 | 3.8 | baiduyun (extract code: aieo); OneDrive
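
The F-measure column is the harmonic mean of precision and recall. A minimal Python sketch (the helper name `f_measure` is ours and is not part of the repository) that reproduces the first two ICDAR 2015 rows from their precision and recall values:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


print(round(f_measure(81.49, 79.68), 2))  # 80.57
print(round(f_measure(86.92, 84.5), 2))   # 85.69
```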

SCUT-CTW1500

Method | Extra Data | Precision (%) | Recall (%) | F-measure (%) | FPS (1080Ti) | Model
PSENet-1s (ResNet50) | - | 80.57 | 75.55 | 78.0 | 3.9 | baiduyun (extract code: ksv7); OneDrive
PSENet-1s (ResNet50) | pretrain on IC17 MLT | 84.84 | 79.73 | 82.2 | 3.9 | baiduyun (extract code: z7ac); OneDrive
PSENet-4s (ResNet50) | pretrain on IC17 MLT | 82.09 | 77.84 | 79.9 | 8.4 | baiduyun (extract code: z7ac); OneDrive

Performance (old version paper)

ICDAR 2015 (training with ICDAR 2017 MLT)

Method | Precision (%) | Recall (%) | F-measure (%)
PSENet-4s (ResNet152) | 87.98 | 83.87 | 85.88
PSENet-2s (ResNet152) | 89.30 | 85.22 | 87.21
PSENet-1s (ResNet152) | 88.71 | 85.51 | 87.08

ICDAR 2017 MLT

Method | Precision (%) | Recall (%) | F-measure (%)
PSENet-4s (ResNet152) | 75.98 | 67.56 | 71.52
PSENet-2s (ResNet152) | 76.97 | 68.35 | 72.40
PSENet-1s (ResNet152) | 77.01 | 68.40 | 72.45

SCUT-CTW1500

Method | Precision (%) | Recall (%) | F-measure (%)
PSENet-4s (ResNet152) | 80.49 | 78.13 | 79.29
PSENet-2s (ResNet152) | 81.95 | 79.30 | 80.60
PSENet-1s (ResNet152) | 82.50 | 79.89 | 81.17

ICPR MTWI 2018 Challenge 2

Method | Precision (%) | Recall (%) | F-measure (%)
PSENet-1s (ResNet152) | 78.5 | 72.1 | 75.2

Results

Figure 3: The results on ICDAR 2015, ICDAR 2017 MLT and SCUT-CTW1500

Paper Link

[new version paper] https://arxiv.org/abs/1903.12473

[old version paper] https://arxiv.org/abs/1806.02559

Other Implementations

[tensorflow version (thanks @liuheng92)] https://github.com/liuheng92/tensorflow_PSENet

Citation

@inproceedings{wang2019shape,
  title={Shape Robust Text Detection With Progressive Scale Expansion Network},
  author={Wang, Wenhai and Xie, Enze and Li, Xiang and Hou, Wenbo and Lu, Tong and Yu, Gang and Shao, Shuai},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={9336--9345},
  year={2019}
}
Comments
  • about random_crop()

    Hello, I found that among the training masks produced by random_crop(), many crops contain no valid text instances at all, i.e. pos_num in train_ic15.py/ohem_single() is 0. I ran your training code for 10 epochs; part of the test code and the results are below:

    ...
    total_sample = 0
    zero_sample = 0
    def ohem_single(score, gt_text, training_mask):
        # Count text pixels that are not masked out by training_mask.
        pos_num = int(np.sum(gt_text > 0.5)) - int(np.sum((gt_text > 0.5) & (training_mask <= 0.5)))
        print('pos_num: ', pos_num)
        global total_sample, zero_sample
        total_sample += 1
        if pos_num == 0:
            # This crop contains no valid text pixels.
            zero_sample += 1
        print('black_ratio: {}/{} = {:.4f}%'.format(zero_sample, total_sample, zero_sample*100.0/total_sample))
    ...
    

    Part of the printed output:

    ...
    ('pos_num: ', 0)
    black_ratio: 1899/9999 = 18.9919%
    ('pos_num: ', 17474)
    black_ratio: 1899/10000 = 18.9900%
    
    Epoch: [11 | 600] LR: 0.001000
    ('pos_num: ', 4287)
    black_ratio: 1899/10001 = 18.9881%
    ('pos_num: ', 26144)
    black_ratio: 1899/10002 = 18.9862%
    ...
    

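    In other words, nearly 1/5 of the crops during training are completely invalid. Was this set up deliberately as a form of augmentation? Looking forward to your explanation, @whai362.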

    opened by YanShuang17 10
  • Cannot run with Python 3.6

    The first error I get is:

    include/pybind11/detail/common.h:113:10: fatal error: Python.h: No such file or directory
     #include <Python.h>
     ^~~~~~~~~~
    compilation terminated.
    make: *** [adaptor.so] Error 1
    Makefile:10: recipe for target 'adaptor.so' failed

    opened by HongChow 6
  • Any way to speed up the progressive scale expansion algorithm?

    Thanks for your great work. I found that as the number of text areas increases, the progressive scale expansion part consumes a lot of time. Any advice on speeding up this part?

    opened by yyjabiding 6
  • long_size problem

    To get good performance, I have to set a large long_size (>2200), which lowers efficiency. How can I balance efficiency and performance? Please give me some advice.

    opened by luckydog5 4
  • [Question] Will the source code be available ?

    Hello sir @whai362, this repository is 5 months old and, even without any source code in it, it still gets tons of stars. When will you update the source code?

    opened by gachiemchiep 4
  • In file 'icdar2015_loader.py': r_i = 1 - (1 - m) × (n - i) / (n - 1)

    Hello! First, thank you for your great work and for contributing the code. While reading your code, I found that in icdar2015_loader.py the line rate = 1.0 - (1.0 - self.min_scale) / (self.kernel_num - 1) * i differs from the formula described in the paper.

    Could you explain why they differ? The formula in the paper is r_i = 1 - (1 - m) × (n - i) / (n - 1).

    opened by zgsxwsdxg 3
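
The two expressions quoted in this issue can be compared numerically. The sketch below is an illustration only: the function names are ours, the values m = 0.4 and n = 7 are arbitrary examples, and it assumes that i runs over 1..n in the paper's formula and over 0..n-1 in the loader expression. Under those assumptions both produce the same set of scales, only indexed in the opposite order.

```python
def paper_scales(m, n):
    """r_i = 1 - (1 - m) * (n - i) / (n - 1), assuming i = 1..n."""
    return [1.0 - (1.0 - m) * (n - i) / (n - 1) for i in range(1, n + 1)]


def loader_scales(min_scale, kernel_num):
    """rate = 1.0 - (1.0 - min_scale) / (kernel_num - 1) * i, assuming i = 0..kernel_num-1."""
    return [1.0 - (1.0 - min_scale) / (kernel_num - 1) * i for i in range(kernel_num)]


paper = paper_scales(0.4, 7)    # grows from m up to 1
loader = loader_scales(0.4, 7)  # shrinks from 1 down to m

# Same values, reversed index order (compared with a float tolerance).
same = all(abs(a - b) < 1e-9 for a, b in zip(paper, reversed(loader)))
print(same)  # True
```

Substituting j = n - i into the paper's formula gives 1 - (1 - m) × j / (n - 1), which is the loader expression with index j.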
  • TypeError: 'module' object is not callable

    Traceback (most recent call last):
      File "/home/yan/disk/All_python_Text/PSENet-master/test_ic15.py", line 220, in <module>
        test(args)
      File "/home/yan/disk/All_python_Text/PSENet-master/test_ic15.py", line 154, in test
        pred = pse(kernels, args.min_kernel_area / (args.scale * args.scale))
      File "/home/yan/disk/All_python_Text/PSENet-master/pse/__init__.py", line 14, in pse
        ret = np.array(cpse(polys, min_area), dtype='int32')
    TypeError: 'module' object is not callable

    I hit this problem when testing. Can you help me?

    opened by jun214384468 3
  • Segmentation fault

    Hello! I ran test_ic15.py with the C++ pse version:

    CUDA_VISIBLE_DEVICES='5' python test_ic15.py --scale 1 --resume /home/ltm/PSENet-master/ic15_res50.pth

    Then:

    make: Entering directory `/home/ltm/PSENet-master/PSENet-master/pse'
    make: `adaptor.so' is up to date.
    make: Leaving directory `/home/ltm/PSENet-master/PSENet-master/pse'
    Loading model and optimizer from checkpoint '/home/ltm/PSENet-master/ic15_res50.pth'
    Loaded checkpoint '/home/ltm/PSENet-master/ic15_res50.pth' (epoch 600)
    progress: 0 / 500
    /usr/local/python3/lib/python3.6/site-packages/torch/nn/functional.py:2351: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
      warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
    /usr/local/python3/lib/python3.6/site-packages/torch/nn/functional.py:2423: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
      "See the documentation of nn.Upsample for details.".format(mode))
    Segmentation fault

    Besides, the Python pse version works fine.

    Please help, thanks.

    opened by ltm920716 3
  • Kernels are obtained differently in training and prediction

    Hello, and thank you for your work. I have a question:

    Why does training use kernels = outputs[:, 1:, :, :] (line 122 of train_ic15.py),

    while testing uses kernels = outputs[:, 0:args.kernel_num, :, :] * text (line 132 of test_ic15.py)?

    Doesn't that mean the last channel of outputs is never used at test time?

    opened by KosukeHao 2
  • _pickle.UnpicklingError: invalid load key, '\xef'.

    After setting up the framework, training with train.py reports the error below. Is there something I have not configured correctly?

    File "train.py", line 213, in <module>
      main(args)
    File "train.py", line 164, in main
      model = build_model(cfg.model)
    File "/home/liyanyan/PSE_origial/models/builder.py", line 11, in build_model
      model = models.__dict__[cfg.type](**param)
    File "/home/liyanyan/PSE_origial/models/psenet.py", line 20, in __init__
      self.backbone = build_backbone(backbone)
    File "/home/liyanyan/PSE_origial/models/backbone/builder.py", line 11, in build_backbone
      backbone = models.backbone.__dict__[cfg.type](**param)
    File "/home/liyanyan/PSE_origial/models/backbone/resnet.py", line 210, in resnet50
      model.load_state_dict(load_url(model_urls['resnet50']), strict=False)
    File "/home/liyanyan/PSE_origial/models/backbone/resnet.py", line 234, in load_url
      return torch.load(cached_file, map_location=map_location)
    File "/home/liyanyan/.conda/envs/PSE-env/lib/python3.7/site-packages/torch/serialization.py", line 387, in load
      return _load(f, map_location, pickle_module, **pickle_load_args)
    File "/home/liyanyan/.conda/envs/PSE-env/lib/python3.7/site-packages/torch/serialization.py", line 564, in _load
      magic_number = pickle_module.load(f, **pickle_load_args)
    _pickle.UnpicklingError: invalid load key, '\xef'.

    opened by Captainlululu 0
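  • Evaluation accuracy is always 0

    After training with the commands in the README, I used script.py to evaluate accuracy and found that it is always 0: Calculated!{"precision": 0.0, "recall": 0.0, "hmean": 0, "AP": 0}. Looking at the network output, the res_img_xx.txt files contain only coordinates and no transcription, but the accuracy computation needs the transcription, so the accuracy stays at 0. How can I solve this problem?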

    opened by 1826133674 1
  • Isn't this result wrong? kernels is tested with > 0; why keep the values greater than 0, and how can the kernel threshold be 0?

        score = torch.sigmoid(out[:, 0, :, :])
        # out = (torch.sign(out - 1) + 1) / 2  # 0 1
        #
        # text_mask = out[:, 0, :, :]
        # kernels = out[:, 1:cfg.test_cfg.kernel_num, :, :] * text_mask

        kernels = out[:, :cfg.test_cfg.kernel_num, :, :] > 0
        text_mask = kernels[:, :1, :, :]
        kernels[:, 1:, :, :] = kernels[:, 1:, :, :] * text_mask
    
    opened by cqray1990 1
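
One possible reading of the last issue, offered as an assumption rather than as the author's answer: if `out` holds raw network outputs (logits) before the sigmoid, as the `score = torch.sigmoid(out[:, 0, :, :])` line suggests, then thresholding the logit at 0 is the same as thresholding the sigmoid probability at 0.5. The sketch below checks that equivalence in plain Python; the sample values are arbitrary.

```python
import math


def sigmoid(x):
    """Logistic function, mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


logits = [-3.0, -0.1, 0.0, 0.1, 3.0]

by_logit = [x > 0 for x in logits]           # threshold the raw output at 0
by_prob = [sigmoid(x) > 0.5 for x in logits]  # threshold the probability at 0.5

print(by_logit == by_prob)  # True
```

This holds because the sigmoid is strictly increasing and sigmoid(0) = 0.5.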
Shape Detection - It's a shape detection project with OpenCV and Python.

Shape Detection It's a shape detection project with OpenCV and Python. Setup pip install opencv-python for doing AI things. pip install simpleaudio fo

null 1 Nov 26, 2022
Pytorch implementation of PSEnet with Pyramid Attention Network as feature extractor

Scene Text-Spotting based on PSEnet+CRNN Pytorch implementation of an end to end Text-Spotter with a PSEnet text detector and CRNN text recognizer. We

azhar shaikh 62 Oct 10, 2022
Motion detector, Full body detection, Upper body detection, Cat face detection, Smile detection, Face detection (haar cascade), Silverware detection, Face detection (lbp), and Sending email notifications

Security camera running OpenCV for object and motion detection. The camera will send email with image of any objects it detects. It also runs a server that provides web interface with live stream video.

Peace 10 Jun 30, 2021
caffe re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection Abstract This is a caffe re-implementation of R2CNN: Rotational Region CNN fo

candler 80 Dec 28, 2021
A novel region proposal network for more general object detection ( including scene text detection ).

DeRPN: Taking a further step toward more general object detection DeRPN is a novel region proposal network which concentrates on improving the adaptiv

Deep Learning and Vision Computing Lab, SCUT 151 Dec 12, 2022
Code for the paper STN-OCR: A single Neural Network for Text Detection and Text Recognition

STN-OCR: A single Neural Network for Text Detection and Text Recognition This repository contains the code for the paper: STN-OCR: A single Neural Net

Christian Bartz 496 Jan 5, 2023
text detection mainly based on ctpn model in tensorflow, id card detect, connectionist text proposal network

text-detection-ctpn Scene text detection based on ctpn (connectionist text proposal network). It is implemented in tensorflow. The origin paper can be

Shaohui Ruan 3.3k Dec 30, 2022
Repository for Scene Text Detection with Supervised Pyramid Context Network with tensorflow.

Scene-Text-Detection-with-SPCNET Unofficial repository for [Scene Text Detection with Supervised Pyramid Context Network][https://arxiv.org/abs/1811.0

null 121 Oct 15, 2021
EAST for ICPR MTWI 2018 Challenge II (Text detection of network images)

EAST_ICPR2018: EAST for ICPR MTWI 2018 Challenge II (Text detection of network images) Introduction This is a repository forked from argman/EAST for t

QichaoWu 49 Dec 24, 2022
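Keras reimplementation of the scene text detection network CTPN: "Detecting Text in Natural Image with Connectionist Text Proposal Network"; you are welcome to try it, follow it, and report issues...

keras-ctpn [TOC] Description Prediction Training Examples 4.1 ICDAR2015 4.1.1 With side refinement 4.1.2 Without side refinement 4.1.3 Data augmentation: horizontal flip 4.2 ICDAR2017 4.3 Other datasets toDoList Summary Description This project is a Keras implementation of CTPN: Detecti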

mick.yi 107 Jan 9, 2023
Detecting Text in Natural Image with Connectionist Text Proposal Network (ECCV'16)

Detecting Text in Natural Image with Connectionist Text Proposal Network The codes are used for implementing CTPN for scene text detection, described

Tian Zhi 1.3k Dec 22, 2022
Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. This Neural Network (NN) model recognizes the text contained in the images of segmented words.

Handwritten-Text-Recognition Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. T

null 27 Jan 8, 2023
This project modify tensorflow object detection api code to predict oriented bounding boxes. It can be used for scene text detection.

This is an oriented object detector based on tensorflow object detection API. Most of the code is not changed except for those related to the need of

Dafang He 30 Oct 22, 2022
An Implementation of the algorithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection

InceptText-Tensorflow An Implementation of the algorithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Orien

GeorgeJoe 115 Dec 12, 2022
Scale-aware Automatic Augmentation for Object Detection (CVPR 2021)

SA-AutoAug Scale-aware Automatic Augmentation for Object Detection Yukang Chen, Yanwei Li, Tao Kong, Lu Qi, Ruihang Chu, Lei Li, Jiaya Jia [Paper] [Bi

Jia Research Lab 182 Dec 29, 2022
Code release for our paper, "SimNet: Enabling Robust Unknown Object Manipulation from Pure Synthetic Data via Stereo"

SimNet: Enabling Robust Unknown Object Manipulation from Pure Synthetic Data via Stereo Thomas Kollar, Michael Laskey, Kevin Stone, Brijen Thananjeyan

null 68 Dec 14, 2022
Official code for ROCA: Robust CAD Model Retrieval and Alignment from a Single Image (CVPR 2022)

ROCA: Robust CAD Model Alignment and Retrieval from a Single Image (CVPR 2022) Code release of our paper ROCA. Check out our video, paper, and website

null 123 Dec 25, 2022
MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition

MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition Python 2.7 Python 3.6 MORAN is a network with rectification mechanism for

Canjie Luo 595 Dec 27, 2022