PSENet - Shape Robust Text Detection with Progressive Scale Expansion Network.

Last update: Dec 24, 2022

Overview

News

Python3 implementations of PSENet [1], PAN [2] and PAN++ [3] are released at https://github.com/whai362/pan_pp.pytorch.

[1] W. Wang, E. Xie, X. Li, W. Hou, T. Lu, G. Yu, and S. Shao. Shape robust text detection with progressive scale expansion network. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pages 9336–9345, 2019.
[2] W. Wang, E. Xie, X. Song, Y. Zang, W. Wang, T. Lu, G. Yu, and C. Shen. Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In Proc. IEEE Int. Conf. Comp. Vis., pages 8440–8449, 2019.
[3] Paper is in preparation.

Shape Robust Text Detection with Progressive Scale Expansion Network

Requirements

Python 2.7
PyTorch v0.4.1+
pyclipper
Polygon2
OpenCV 3.4 (for the C++ version of pse)
opencv-python 3.4

Introduction

Progressive Scale Expansion Network (PSENet) is a text detector that detects arbitrarily shaped text in natural scene images.

Training

CUDA_VISIBLE_DEVICES=0,1,2,3 python train_ic15.py

Testing

CUDA_VISIBLE_DEVICES=0 python test_ic15.py --scale 1 --resume [path of model]

Eval script for ICDAR 2015 and SCUT-CTW1500

cd eval
sh eval_ic15.sh
sh eval_ctw1500.sh

Performance (new version paper)

ICDAR 2015

Method	Extra Data	Precision (%)	Recall (%)	F-measure (%)	FPS (1080Ti)	Model
PSENet-1s (ResNet50)	-	81.49	79.68	80.57	1.6	baiduyun(extract code: rxti); OneDrive
PSENet-1s (ResNet50)	pretrain on IC17 MLT	86.92	84.5	85.69	1.6	baiduyun(extract code: aieo); OneDrive
PSENet-4s (ResNet50)	pretrain on IC17 MLT	86.1	83.77	84.92	3.8	baiduyun(extract code: aieo); OneDrive
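
The F-measure column appears to be the harmonic mean of the precision and recall columns. A quick check on the three ICDAR 2015 rows above (the function name and row labels are only for illustration):

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)


# (label, precision, recall) copied from the ICDAR 2015 table
rows = [
    ("PSENet-1s, no extra data", 81.49, 79.68),
    ("PSENet-1s, IC17 MLT pretrain", 86.92, 84.5),
    ("PSENet-4s, IC17 MLT pretrain", 86.1, 83.77),
]

for name, p, r in rows:
    print(f"{name}: {f_measure(p, r):.2f}")
```

The recomputed values agree with the reported F-measures to two decimals.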

SCUT-CTW1500

Method	Extra Data	Precision (%)	Recall (%)	F-measure (%)	FPS (1080Ti)	Model
PSENet-1s (ResNet50)	-	80.57	75.55	78.0	3.9	baiduyun(extract code: ksv7); OneDrive
PSENet-1s (ResNet50)	pretrain on IC17 MLT	84.84	79.73	82.2	3.9	baiduyun(extract code: z7ac); OneDrive
PSENet-4s (ResNet50)	pretrain on IC17 MLT	82.09	77.84	79.9	8.4	baiduyun(extract code: z7ac); OneDrive

Performance (old version paper)

ICDAR 2015 (training with ICDAR 2017 MLT)

Method	Precision (%)	Recall (%)	F-measure (%)
PSENet-4s (ResNet152)	87.98	83.87	85.88
PSENet-2s (ResNet152)	89.30	85.22	87.21
PSENet-1s (ResNet152)	88.71	85.51	87.08

ICDAR 2017 MLT

Method	Precision (%)	Recall (%)	F-measure (%)
PSENet-4s (ResNet152)	75.98	67.56	71.52
PSENet-2s (ResNet152)	76.97	68.35	72.40
PSENet-1s (ResNet152)	77.01	68.40	72.45

SCUT-CTW1500

Method	Precision (%)	Recall (%)	F-measure (%)
PSENet-4s (ResNet152)	80.49	78.13	79.29
PSENet-2s (ResNet152)	81.95	79.30	80.60
PSENet-1s (ResNet152)	82.50	79.89	81.17

ICPR MTWI 2018 Challenge 2

Method	Precision (%)	Recall (%)	F-measure (%)
PSENet-1s (ResNet152)	78.5	72.1	75.2

Results

Figure 3: The results on ICDAR 2015, ICDAR 2017 MLT and SCUT-CTW1500

Paper Link

[new version paper] https://arxiv.org/abs/1903.12473

[old version paper] https://arxiv.org/abs/1806.02559

Other Implementations

[tensorflow version (thanks @liuheng92)] https://github.com/liuheng92/tensorflow_PSENet

Citation

@inproceedings{wang2019shape,
  title={Shape Robust Text Detection With Progressive Scale Expansion Network},
  author={Wang, Wenhai and Xie, Enze and Li, Xiang and Hou, Wenbo and Lu, Tong and Yu, Gang and Shao, Shuai},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={9336--9345},
  year={2019}
}

Comments

about random_crop()

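Hello author, I found that among the training masks produced by random_crop(), many crops contain no valid text instances, i.e. pos_num in train_ic15.py/ohem_single() is 0. I ran your training code for 10 epochs; the test code and results are below: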

...
total_sample = 0
zero_sample = 0
def ohem_single(score, gt_text, training_mask):
    pos_num = (int)(np.sum(gt_text > 0.5)) - (int)(np.sum((gt_text > 0.5) & (training_mask <= 0.5)))
    print('pos_num: ', pos_num)
    global total_sample, zero_sample
    total_sample += 1
    if pos_num==0:
        zero_sample += 1
    print('black_ratio: {}/{} = {:.4f}%'.format(zero_sample, total_sample, zero_sample*100.0/total_sample))
...

Part of the printed output:

...
('pos_num: ', 0)
black_ratio: 1899/9999 = 18.9919%
('pos_num: ', 17474)
black_ratio: 1899/10000 = 18.9900%

Epoch: [11 | 600] LR: 0.001000
('pos_num: ', 4287)
black_ratio: 1899/10001 = 18.9881%
('pos_num: ', 26144)
black_ratio: 1899/10002 = 18.9862%
...

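In other words, nearly 1/5 of the crops during training are completely invalid. Is this setting intentional, as a form of augmentation? Looking forward to your explanation @whai362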

opened by YanShuang17 10

Cannot get it to run on Python 3.6....

First I get this error:

include/pybind11/detail/common.h:113:10: fatal error: Python.h: No such file or directory
 #include <Python.h>
          ^~~~~~~~~~
compilation terminated.
make: *** [adaptor.so] Error 1
Makefile:10: recipe for target 'adaptor.so' failed
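
This compiler error generally means the Python development headers are not on the include path the build uses. The Makefile's include flags are not shown in this thread, so the following is only a starting point: a minimal standard-library check of where the running interpreter expects its headers and whether Python.h is actually there.

```python
import os
import sysconfig


def find_python_header():
    """Return (include_dir, found) for the running interpreter's Python.h."""
    include_dir = sysconfig.get_paths()["include"]
    header = os.path.join(include_dir, "Python.h")
    return include_dir, os.path.isfile(header)


include_dir, found = find_python_header()
print("include dir:", include_dir)
print("Python.h present:", found)
```

If the header is missing, installing the development package for that interpreter (for example python3-dev on Debian-based systems) is the usual remedy.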

opened by HongChow 6

Any way to speed up the progressive scale expansion algorithm?

Thanks for your great work. I found that as the number of text areas increases, the progressive scale expansion part consumes a lot of time. Any advice on speeding up this part?
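
For reference, the paper describes the expansion as a breadth-first search seeded from the connected components of the smallest kernel, with conflicts between neighbouring instances resolved first come, first served. The sketch below follows that description in pure Python on nested lists. It is not the repository's pse implementation, and the function names and toy input are made up for illustration.

```python
from collections import deque

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity


def label_components(mask):
    """Label 4-connected foreground regions of a binary grid with 1, 2, ..."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or labels[y][x]:
                continue
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy, dx in NEIGHBOURS:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels


def expand(kernels):
    """Grow labels from the smallest kernel into each larger one in turn.

    kernels: binary grids ordered from smallest to largest.
    """
    labels = label_components(kernels[0])
    h, w = len(labels), len(labels[0])
    for kernel in kernels[1:]:
        # Every pixel labelled so far is a seed for this round.
        queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
        while queue:
            y, x = queue.popleft()
            for dy, dx in NEIGHBOURS:
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w
                        and kernel[ny][nx] and not labels[ny][nx]):
                    labels[ny][nx] = labels[y][x]  # first come, first served
                    queue.append((ny, nx))
    return labels


kernels = [
    [[0, 1, 0, 0, 0, 1, 0]],  # smallest kernel: two separate seeds
    [[1, 1, 1, 1, 1, 1, 1]],  # largest kernel: the two instances touch
]
result = expand(kernels)
print(result)
```

In this form every labelled pixel is visited once per kernel in interpreted Python, which is presumably why the Requirements list mentions a C++ version of pse.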

opened by yyjabiding 6

long_size problem

To get good performance, I have to set a large long_size (>2200), which lowers efficiency. How can I balance efficiency and performance? Please give me some advice.
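
Assuming long_size rescales the longer image side to that many pixels while keeping the aspect ratio (the resize code itself is not quoted in this thread), the number of pixels the network processes grows with the square of long_size. A small sketch with a hypothetical 1280x720 input makes the cost visible:

```python
def scaled_size(height, width, long_size):
    """Size after resizing so the longer side equals long_size."""
    scale = long_size / max(height, width)
    return round(height * scale), round(width * scale)


base_h, base_w = 720, 1280  # hypothetical input image
base_pixels = base_h * base_w

for long_size in (1280, 1760, 2240):
    h, w = scaled_size(base_h, base_w, long_size)
    print(long_size, (h, w), f"{h * w / base_pixels:.2f}x pixels")
```

For this input, going from 1280 to 2240 roughly triples the pixel count, so intermediate values are worth measuring on a validation set before settling on the largest one.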

opened by luckydog5 4

[Question] Will the source code be available?

Hello sir @whai362. This repository is 5 months old and, even without any source code in it, it still gets tons of stars. When will you upload the source code?

opened by gachiemchiep 4

In file ‘icdar2015_loader.py’: $r_i = 1 - (1 - m) \times (n - i)/(n - 1)$

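Hello author! First, thank you for your great work and for contributing the code. While reading it I found that this line in icdar2015_loader.py, rate = 1.0 - (1.0 - self.min_scale) / (self.kernel_num - 1) * i, differs from the formula described in the paper.

Could you explain why they differ? Thank you! The paper's formula is $r_i = 1 - (1 - m) \times (n - i)/(n - 1)$.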
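
One way to read the two expressions is that they produce the same set of scales with the index reversed: the code's index i plays the role of n - i in the paper. The sketch below checks this numerically. It assumes the paper's i runs over 1..n and the code's i over 0..n-1 (the loop bounds in the loader are not quoted here), and n = 7, m = 0.4 are example values only.

```python
import math


def paper_scale(i, n, m):
    """r_i = 1 - (1 - m) * (n - i) / (n - 1), paper indexing i = 1..n."""
    return 1.0 - (1.0 - m) * (n - i) / (n - 1)


def code_rate(i, n, m):
    """rate = 1.0 - (1.0 - m) / (n - 1) * i, as quoted from the loader."""
    return 1.0 - (1.0 - m) / (n - 1) * i


n, m = 7, 0.4  # example values, chosen for illustration
paper = [paper_scale(i, n, m) for i in range(1, n + 1)]  # smallest scale first
code = [code_rate(i, n, m) for i in range(0, n)]         # largest scale first

same = all(math.isclose(p, c) for p, c in zip(paper, reversed(code)))
print(same)
```

Under these assumptions the smallest scale is m and the largest is 1 in both versions, so only the ordering of the kernels differs.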

opened by zgsxwsdxg 3

TypeError: 'module' object is not callable

Traceback (most recent call last):
  File "/home/yan/disk/All_python_Text/PSENet-master/test_ic15.py", line 220, in <module>
    test(args)
  File "/home/yan/disk/All_python_Text/PSENet-master/test_ic15.py", line 154, in test
    pred = pse(kernels, args.min_kernel_area / (args.scale * args.scale))
  File "/home/yan/disk/All_python_Text/PSENet-master/pse/__init__.py", line 14, in pse
    ret = np.array(cpse(polys, min_area), dtype='int32')
TypeError: 'module' object is not callable

I ran into this problem during testing. Can you help me?
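
Python raises this message when a module object itself is called, for example when a name is bound to a module where a function was expected. Whether that is what happens with cpse inside the pse package cannot be confirmed from the traceback alone. The generic pattern, using math as a stand-in module, looks like this:

```python
import math  # stands in for any module bound to a name

try:
    math(2.0)  # calling the module itself, not a function inside it
except TypeError as exc:
    message = str(exc)

print(message)

from math import sqrt  # binding the function instead makes the call valid

print(sqrt(4.0))
```

Checking what the name refers to at the failing line, for instance with print(type(cpse)), would show whether the same thing is happening here.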

opened by jun214384468 3

Segmentation fault

Hello! I ran test_ic15.py with the C++ pse version:

CUDA_VISIBLE_DEVICES='5' python test_ic15.py --scale 1 --resume /home/ltm/PSENet-master/ic15_res50.pth

Then:

make: Entering directory `/home/ltm/PSENet-master/PSENet-master/pse'
make: `adaptor.so' is up to date.
make: Leaving directory `/home/ltm/PSENet-master/PSENet-master/pse'
Loading model and optimizer from checkpoint '/home/ltm/PSENet-master/ic15_res50.pth'
Loaded checkpoint '/home/ltm/PSENet-master/ic15_res50.pth' (epoch 600)
progress: 0 / 500
/usr/local/python3/lib/python3.6/site-packages/torch/nn/functional.py:2351: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
  warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
/usr/local/python3/lib/python3.6/site-packages/torch/nn/functional.py:2423: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
Segmentation fault

Besides, the Python pse version works fine.

Please help, thanks.

opened by ltm920716 3

About the different ways kernels are obtained during training and prediction

Hello author, thank you for your work. I have a question:

Why does training use kernels = outputs[:, 1:, :, :] (line 122 of train_ic15.py),

while testing uses kernels = outputs[:, 0:args.kernel_num, :, :] * text (line 132 of test_ic15.py)?

Doesn't that mean the last channel of outputs is never used at test time?
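
The two slices can be compared by listing the channel indices each one selects. The sketch below uses plain lists in place of tensors and assumes a network with 7 output channels, which is a hypothetical figure for illustration. The training slice skips channel 0, presumably because that channel is handled separately as the full text map (that line is not quoted in this thread).

```python
def selected_channels(num_channels, kernel_num):
    """Channel indices picked by the training and testing slices."""
    channels = list(range(num_channels))
    train = channels[1:]           # mirrors outputs[:, 1:, :, :]
    test = channels[0:kernel_num]  # mirrors outputs[:, 0:kernel_num, :, :]
    return train, test


for kernel_num in (7, 3):
    train, test = selected_channels(num_channels=7, kernel_num=kernel_num)
    print(f"kernel_num={kernel_num}: train={train} test={test}")
```

Under this assumption the last channel is dropped at test time only when kernel_num is smaller than the number of output channels. When the two are equal, the test slice covers every channel.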

opened by KosukeHao 2

_pickle.UnpicklingError: invalid load key, '\xef'.

_pickle.UnpicklingError: invalid load key, \xef .

After setting up the framework, training with train.py reports this error. Is there something I still have not configured correctly?

  File "train.py", line 213, in <module>
    main(args)
  File "train.py", line 164, in main
    model = build_model(cfg.model)
  File "/home/liyanyan/PSE_origial/models/builder.py", line 11, in build_model
    model = models.dictcfg.type
  File "/home/liyanyan/PSE_origial/models/psenet.py", line 20, in __init__
    self.backbone = build_backbone(backbone)
  File "/home/liyanyan/PSE_origial/models/backbone/builder.py", line 11, in build_backbone
    backbone = models.backbone.dictcfg.type
  File "/home/liyanyan/PSE_origial/models/backbone/resnet.py", line 210, in resnet50
    model.load_state_dict(load_url(model_urls['resnet50']), strict=False)
  File "/home/liyanyan/PSE_origial/models/backbone/resnet.py", line 234, in load_url
    return torch.load(cached_file, map_location=map_location)
  File "/home/liyanyan/.conda/envs/PSE-env/lib/python3.7/site-packages/torch/serialization.py", line 387, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/home/liyanyan/.conda/envs/PSE-env/lib/python3.7/site-packages/torch/serialization.py", line 564, in _load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\xef'.
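
The byte '\xef' is the first byte of a UTF-8 byte-order mark (EF BB BF), which suggests the cached file is text rather than a serialized checkpoint, for instance an incomplete or redirected download. This is a guess from the error message alone. A small sketch for inspecting the first bytes of a cached file follows; the helper name and the demo file are hypothetical.

```python
import os
import tempfile


def sniff_checkpoint(path):
    """Guess what kind of file `path` is from its first few bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(b"PK\x03\x04"):
        return "zip archive (newer torch.save format)"
    if head.startswith(b"\x80"):
        return "pickle stream (legacy torch.save format)"
    if head.startswith(b"\xef\xbb\xbf"):
        return "UTF-8 text with BOM (not a checkpoint)"
    if head.lstrip().startswith(b"<"):
        return "looks like HTML/XML (not a checkpoint)"
    return "unknown"


# Demo on a throwaway file that starts with a UTF-8 BOM.
with tempfile.NamedTemporaryFile(delete=False, suffix=".pth") as tmp:
    tmp.write(b"\xef\xbb\xbf<html>not a model</html>")
    demo_path = tmp.name

verdict = sniff_checkpoint(demo_path)
print(verdict)
os.remove(demo_path)
```

If the real cached file turns out to be text, deleting it and downloading the ResNet weights again is a reasonable next step.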

opened by Captainlululu 0

Validation accuracy is always 0

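After training with the commands in the readme file, I used script.py to evaluate accuracy and found that it is always 0:

Calculated!{"precision": 0.0, "recall": 0.0, "hmean": 0, "AP": 0}

Looking at the network output, the res_img_xx.txt files contain only coordinates and no transcription, but computing the accuracy requires the transcription, so the accuracy stays at 0. How can I solve this?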

opened by 1826133674 1

Isn't this result wrong? The kernels are tested against > 0: why keep the values greater than 0, and how can the kernel threshold be 0?
score = torch.sigmoid(out[:, 0, :, :])
# out = (torch.sign(out - 1) + 1) / 2  # 0 1
#
# text_mask = out[:, 0, :, :]
# kernels = out[:, 1:cfg.test_cfg.kernel_num, :, :] * text_mask

kernels = out[:, :cfg.test_cfg.kernel_num, :, :] > 0
text_mask = kernels[:, :1, :, :]
kernels[:, 1:, :, :] = kernels[:, 1:, :, :] * text_mask
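
If out holds raw logits, as the quoted torch.sigmoid(out[:, 0, :, :]) suggests, then thresholding the logits at 0 is equivalent to thresholding the sigmoid probabilities at 0.5, because sigmoid(0) = 0.5 and the sigmoid is strictly increasing. A minimal check with scalar values (the sample logits are arbitrary):

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


logits = [-3.0, -0.1, 0.0, 0.1, 3.0]  # arbitrary sample values

by_logit = [x > 0 for x in logits]           # threshold on raw outputs
by_prob = [sigmoid(x) > 0.5 for x in logits]  # threshold on probabilities

print(by_logit == by_prob)
```

So a threshold of 0 on the kernels would correspond to the usual 0.5 probability cut-off, provided the assumption about logits holds.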
opened by cqray1990 1
