PSENet - Shape Robust Text Detection with Progressive Scale Expansion Network.



Python3 implementations of PSENet [1], PAN [2], and PAN++ [3] are released at

[1] W. Wang, E. Xie, X. Li, W. Hou, T. Lu, G. Yu, and S. Shao. Shape robust text detection with progressive scale expansion network. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., pages 9336–9345, 2019.
[2] W. Wang, E. Xie, X. Song, Y. Zang, W. Wang, T. Lu, G. Yu, and C. Shen. Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In Proc. IEEE Int. Conf. Comp. Vis., pages 8440–8449, 2019.
[3] Paper is in preparation.

Shape Robust Text Detection with Progressive Scale Expansion Network

Requirements

  • Python 2.7
  • PyTorch v0.4.1+
  • pyclipper
  • Polygon2
  • OpenCV 3.4 (for c++ version pse)
  • opencv-python 3.4


Progressive Scale Expansion Network (PSENet) is a text detector that can accurately detect arbitrary-shaped text in natural scenes.
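The expansion idea behind the name can be sketched as a multi-source breadth-first search: label the connected components of the smallest kernel, then grow each label into the next larger kernel, with contested pixels going to whichever label reaches them first. This is a minimal NumPy sketch, assuming binary kernel maps ordered from largest (index 0, the full text map) to smallest; `label_components` and `progressive_scale_expansion` are hypothetical names, not this repository's `pse` function.

```python
from collections import deque

import numpy as np

NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def label_components(mask):
    """4-connected component labelling of a binary mask (0 = background)."""
    labels = np.zeros(mask.shape, dtype=np.int32)
    next_label = 1
    h, w = mask.shape
    for y in range(h):
        for x in range(w):
            if mask[y, x] and labels[y, x] == 0:
                labels[y, x] = next_label
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = next_label
                            queue.append((ny, nx))
                next_label += 1
    return labels


def progressive_scale_expansion(kernels):
    """kernels: (n, H, W) binary maps, kernels[0] largest, kernels[-1] smallest.

    Returns an (H, W) int32 label map where 0 is background.
    """
    labels = label_components(kernels[-1])
    h, w = labels.shape
    # Grow from the second-smallest kernel up to the largest one.
    for k in range(len(kernels) - 2, -1, -1):
        queue = deque(zip(*np.nonzero(labels)))
        while queue:
            y, x = queue.popleft()
            for dy, dx in NEIGHBORS:
                ny, nx = y + dy, x + dx
                # First-come, first-served: an unlabelled kernel pixel joins the
                # label that reaches it first in BFS order.
                if 0 <= ny < h and 0 <= nx < w and kernels[k][ny, nx] and labels[ny, nx] == 0:
                    labels[ny, nx] = labels[y, x]
                    queue.append((ny, nx))
    return labels
```

For example, two touching words whose full text maps merge into one blob are still separated, because each pixel is assigned to the nearest seed from the smallest kernel.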




CUDA_VISIBLE_DEVICES=0 python --scale 1 --resume [path of model]

Eval script for ICDAR 2015 and SCUT-CTW1500

cd eval

Performance (new version paper)

ICDAR 2015

Method | Extra Data | Precision (%) | Recall (%) | F-measure (%) | FPS (1080Ti) | Model
PSENet-1s (ResNet50) | - | 81.49 | 79.68 | 80.57 | 1.6 | baiduyun (extract code: rxti); OneDrive
PSENet-1s (ResNet50) | pretrain on IC17 MLT | 86.92 | 84.5 | 85.69 | 1.6 | baiduyun (extract code: aieo); OneDrive
PSENet-4s (ResNet50) | pretrain on IC17 MLT | 86.1 | 83.77 | 84.92 | 3.8 | baiduyun (extract code: aieo); OneDrive
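The F-measure column is the harmonic mean of precision and recall, F = 2PR / (P + R). A quick check against the first two ICDAR 2015 rows (`f_measure` is just an illustrative helper):

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f_measure(81.49, 79.68), 2))  # 80.57
print(round(f_measure(86.92, 84.5), 2))   # 85.69
```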

SCUT-CTW1500

Method | Extra Data | Precision (%) | Recall (%) | F-measure (%) | FPS (1080Ti) | Model
PSENet-1s (ResNet50) | - | 80.57 | 75.55 | 78.0 | 3.9 | baiduyun (extract code: ksv7); OneDrive
PSENet-1s (ResNet50) | pretrain on IC17 MLT | 84.84 | 79.73 | 82.2 | 3.9 | baiduyun (extract code: z7ac); OneDrive
PSENet-4s (ResNet50) | pretrain on IC17 MLT | 82.09 | 77.84 | 79.9 | 8.4 | baiduyun (extract code: z7ac); OneDrive

Performance (old version paper)

ICDAR 2015 (training with ICDAR 2017 MLT)

Method | Precision (%) | Recall (%) | F-measure (%)
PSENet-4s (ResNet152) | 87.98 | 83.87 | 85.88
PSENet-2s (ResNet152) | 89.30 | 85.22 | 87.21
PSENet-1s (ResNet152) | 88.71 | 85.51 | 87.08

ICDAR 2017 MLT

Method | Precision (%) | Recall (%) | F-measure (%)
PSENet-4s (ResNet152) | 75.98 | 67.56 | 71.52
PSENet-2s (ResNet152) | 76.97 | 68.35 | 72.40
PSENet-1s (ResNet152) | 77.01 | 68.40 | 72.45

SCUT-CTW1500

Method | Precision (%) | Recall (%) | F-measure (%)
PSENet-4s (ResNet152) | 80.49 | 78.13 | 79.29
PSENet-2s (ResNet152) | 81.95 | 79.30 | 80.60
PSENet-1s (ResNet152) | 82.50 | 79.89 | 81.17

ICPR MTWI 2018 Challenge 2

Method | Precision (%) | Recall (%) | F-measure (%)
PSENet-1s (ResNet152) | 78.5 | 72.1 | 75.2


Figure 3: The results on ICDAR 2015, ICDAR 2017 MLT and SCUT-CTW1500

Paper Link

[new version paper]

[old version paper]

Other Implementations

[TensorFlow version (thanks @liuheng92)]


  title={Shape Robust Text Detection With Progressive Scale Expansion Network},
  author={Wang, Wenhai and Xie, Enze and Li, Xiang and Hou, Wenbo and Lu, Tong and Yu, Gang and Shao, Shuai},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  • About random_crop()

    Hello, I noticed that in the training masks produced by random_crop(), many crops contain no valid text instances at all, i.e. pos_num is 0. I ran your training code for 10 epochs; my test code and its output are below:

    import numpy as np

    total_sample = 0  # crops seen so far
    zero_sample = 0   # crops with no valid (unmasked) text pixels

    def ohem_single(score, gt_text, training_mask):
        global total_sample, zero_sample
        # Text pixels minus the text pixels that training_mask marks as ignored.
        pos_num = int(np.sum(gt_text > 0.5)) - int(np.sum((gt_text > 0.5) & (training_mask <= 0.5)))
        print('pos_num: ', pos_num)
        total_sample += 1
        if pos_num == 0:
            zero_sample += 1
        print('black_ratio: {}/{} = {:.4f}%'.format(zero_sample, total_sample, zero_sample * 100.0 / total_sample))


    ('pos_num: ', 0)
    black_ratio: 1899/9999 = 18.9919%
    ('pos_num: ', 17474)
    black_ratio: 1899/10000 = 18.9900%
    Epoch: [11 | 600] LR: 0.001000
    ('pos_num: ', 4287)
    black_ratio: 1899/10001 = 18.9881%
    ('pos_num: ', 26144)
    black_ratio: 1899/10002 = 18.9862%

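    In other words, nearly 1/5 of the crops during training are completely invalid. Is this setting intentional, as a form of augmentation? Looking forward to your explanation @whai362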
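One way to control how often a crop contains no text is to bias the crop window toward text pixels with some probability. The sketch below is hypothetical (it is not the repository's random_crop; `text_biased_crop` and `p_text` are illustrative names): with probability p_text it picks a random text pixel and a window guaranteed to contain it, otherwise it samples the window uniformly, and that uniform branch is the one that can produce empty crops like those counted above.

```python
import random

import numpy as np


def text_biased_crop(gt_text, crop_size, p_text=0.5, rng=random):
    """Return the top-left corner (i, j) of a crop of size crop_size = (th, tw).

    gt_text: (H, W) array, > 0 on text pixels. Assumes H >= th and W >= tw.
    """
    h, w = gt_text.shape
    th, tw = crop_size
    ys, xs = np.nonzero(gt_text > 0)
    if len(ys) > 0 and rng.random() < p_text:
        # Pick one text pixel, then a window whose row/column ranges contain it.
        k = rng.randrange(len(ys))
        y, x = int(ys[k]), int(xs[k])
        i = rng.randint(max(0, y - th + 1), min(y, h - th))
        j = rng.randint(max(0, x - tw + 1), min(x, w - tw))
    else:
        # Uniform window: may contain no text at all.
        i = rng.randint(0, h - th)
        j = rng.randint(0, w - tw)
    return i, j
```

Raising p_text lowers the share of empty crops at the cost of seeing less pure background.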

    opened by YanShuang17 10
  • Can't get it running with Python 3.6....


    First I get this error:

    include/pybind11/detail/common.h:113:10: fatal error: Python.h: No such file or directory
     #include <Python.h>
              ^~~~~~~~~~
    compilation terminated.
    make: *** [] Error 1

    and then: Makefile:10: recipe for target '' failed
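The compiler error means Python.h is not on the include path used by the build; this typically happens when the development headers for that Python are not installed, or when the Makefile points at a different interpreter. A small check, assuming you run it with the same Python you use for testing (`python_header_status` is a hypothetical helper built on the standard sysconfig module):

```python
import os
import sysconfig


def python_header_status():
    """Return (include_dir, exists) for the running interpreter's Python.h."""
    include_dir = sysconfig.get_paths()["include"]
    header = os.path.join(include_dir, "Python.h")
    return include_dir, os.path.isfile(header)


if __name__ == "__main__":
    include_dir, found = python_header_status()
    print("include dir:", include_dir)
    print("Python.h found" if found else "Python.h missing: install the dev headers for this Python")
```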

    opened by HongChow 6
  • Any way to speed up the progressive scale expansion algorithm?

    Thanks for your great work. I found that as the number of text areas increases, the progressive scale expansion step takes a lot of time. Any advice on speeding it up?
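Because the expansion visits every pixel of every kernel, its cost grows with the kernel map size. One trade-off, similar in spirit to comparing PSENet-1s with PSENet-4s in the tables above, is to run the expansion on smaller maps. The hypothetical helpers below (not part of the repository) subsample binary kernels by an integer stride before expansion and upsample the resulting label map by nearest-neighbour repetition; the pixel count drops by roughly stride squared, at some cost in boundary accuracy.

```python
import numpy as np


def subsample_kernels(kernels, stride):
    """Keep every stride-th pixel of (n, H, W) binary kernel maps."""
    return kernels[:, ::stride, ::stride]


def upsample_labels(labels, stride, out_shape):
    """Nearest-neighbour upsampling of an (h, w) label map, cropped to out_shape."""
    up = np.repeat(np.repeat(labels, stride, axis=0), stride, axis=1)
    return up[: out_shape[0], : out_shape[1]]
```

The expansion itself (whatever pse implementation is used) would run between the two calls, on the subsampled kernels.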

    opened by yyjabiding 6
  • long_size problem

    To get good performance, I have to set a large long_size (> 2200), which lowers efficiency. How can I balance efficiency and performance? Please give me some advice.
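If long_size fixes the longer image side at test time, the number of pixels, and roughly the cost of both the network and the expansion, grows with long_size squared. A minimal sketch of that arithmetic, assuming the aspect ratio is preserved and sizes are rounded to the nearest integer (`resized_shape` and `relative_cost` are hypothetical helpers), using a 1280×720 image (the ICDAR 2015 image size):

```python
def resized_shape(h, w, long_size):
    """Scale (h, w) so that the longer side equals long_size, keeping aspect ratio."""
    scale = long_size / max(h, w)
    return round(h * scale), round(w * scale)


def relative_cost(h, w, long_size_a, long_size_b):
    """Ratio of pixel counts when resizing the same image with two long_size values."""
    ha, wa = resized_shape(h, w, long_size_a)
    hb, wb = resized_shape(h, w, long_size_b)
    return (ha * wa) / (hb * wb)


print(resized_shape(720, 1280, 2240))        # (1260, 2240)
print(relative_cost(720, 1280, 2240, 1280))  # 3.0625
```

So going from the native 1280 to 2240 roughly triples the pixel count, which is why sweeping long_size on a validation set is a practical way to pick the balance point.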

    opened by luckydog5 4
  • [Question] Will the source code be available?

    Hello sir @whai362, this repository is 5 months old and, even without any source code inside, it still gets tons of stars. So when will you upload the source code?

    opened by gachiemchiep 4
  • In file '': r_i = 1 − (1 − m) × (n − i)/(n − 1)

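    Hello! First, thank you for your great work and for releasing the code. While reading it, I noticed that the line rate = 1.0 - (1.0 - self.min_scale) / (self.kernel_num - 1) * i in icdar2015_loader.py differs from the formula described in the paper.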

    Could you give some explanation? Why are they different? Thanks! r_i = 1 − (1 − m) × (n − i)/(n − 1)
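The two expressions give the same set of scale ratios in opposite order. With the paper's index i = 1, …, n and the loader's loop index j = 0, …, n − 1, the loader computes 1 − (1 − m) j / (n − 1), which is the paper's r_i with i = n − j: the loader's j = 0 is the full text region (rate 1) and j = n − 1 the smallest kernel (rate m), while the paper numbers them the other way round. A numerical check with m and n chosen for illustration; the block also computes the shrink offset d = Area × (1 − r²) / Perimeter that the PSENet paper uses to shrink a polygon for a given rate (the helper names are hypothetical):

```python
from fractions import Fraction


def paper_rates(m, n):
    """r_i = 1 - (1 - m) * (n - i) / (n - 1) for i = 1..n (r_1 = m, r_n = 1)."""
    return [1 - (1 - m) * Fraction(n - i, n - 1) for i in range(1, n + 1)]


def loader_rates(m, n):
    """rate = 1 - (1 - m) / (n - 1) * i for i = 0..n-1 (first = 1, last = m)."""
    return [1 - (1 - m) / Fraction(n - 1) * i for i in range(n)]


def shrink_offset(polygon, rate):
    """d = Area * (1 - rate**2) / Perimeter for a closed polygon [(x, y), ...]."""
    area2 = 0.0
    perimeter = 0.0
    for (x0, y0), (x1, y1) in zip(polygon, polygon[1:] + polygon[:1]):
        area2 += x0 * y1 - x1 * y0  # shoelace formula (twice the signed area)
        perimeter += ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    return abs(area2) / 2 * (1 - rate ** 2) / perimeter


m, n = Fraction(1, 2), 6
print(loader_rates(m, n) == paper_rates(m, n)[::-1])             # True
print(shrink_offset([(0, 0), (10, 0), (10, 10), (0, 10)], 0.5))  # 1.875
```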

    opened by zgsxwsdxg 3
  • TypeError: 'module' object is not callable

    Traceback (most recent call last):
      File "/home/yan/disk/All_python_Text/PSENet-master/", line 220, in <module>
        test(args)
      File "/home/yan/disk/All_python_Text/PSENet-master/", line 154, in test
        pred = pse(kernels, args.min_kernel_area / (args.scale * args.scale))
      File "/home/yan/disk/All_python_Text/PSENet-master/pse/", line 14, in pse
        ret = np.array(cpse(polys, min_area), dtype='int32')
    TypeError: 'module' object is not callable
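Python raises this TypeError whenever a name bound to a module object is called as if it were a function, so here `cpse` evidently refers to a module rather than to a function inside it. A minimal reproduction using only the standard library; `cpse` and `cpse.pse` below are hypothetical stand-ins for the compiled extension and its function:

```python
import types

# Hypothetical stand-in for a compiled extension module that exposes a function.
cpse = types.ModuleType("cpse")
cpse.pse = lambda polys, min_area: [[0]]


def call_module_directly():
    """Call the module object itself and return the resulting error message."""
    try:
        cpse([], 0)
    except TypeError as exc:
        return str(exc)
    return None


print(call_module_directly())
print(cpse.pse([], 0))  # calling the function inside the module works
```

The fix is to call the function the extension exports (or import that function under the name `cpse`), not the module.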


    opened by jun214384468 3
  • Segmentation fault

    Hello! I ran it with the C++ version of pse: CUDA_VISIBLE_DEVICES='5' python --scale 1 --resume /home/ltm/PSENet-master/ic15_res50.pth

    Then:

    make: Entering directory `/home/ltm/PSENet-master/PSENet-master/pse'
    '' is up to date.
    make: Leaving directory `/home/ltm/PSENet-master/PSENet-master/pse'
    Loading model and optimizer from checkpoint '/home/ltm/PSENet-master/ic15_res50.pth'
    Loaded checkpoint '/home/ltm/PSENet-master/ic15_res50.pth' (epoch 600)
    progress: 0 / 500
    /usr/local/python3/lib/python3.6/site-packages/torch/nn/ UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
      warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
    /usr/local/python3/lib/python3.6/site-packages/torch/nn/ UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
      "See the documentation of nn.Upsample for details.".format(mode))
    Segmentation fault

    Besides, the Python version of pse works fine.

    Please help, thanks.
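A segmentation fault kills the interpreter without a Python traceback. The standard faulthandler module can dump the Python-level stack at the moment of the crash, which shows whether it happens inside the C++ pse call. A minimal sketch: put this at the top of the test script, or equivalently run the script with `python -X faulthandler`.

```python
import faulthandler
import sys

# On SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL, dump the Python tracebacks
# of all threads to stderr before the process dies.
faulthandler.enable(file=sys.stderr, all_threads=True)
```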

    opened by ltm920716 3
  • Why are kernels obtained differently during training and prediction?



    Why does training use kernels = outputs[:, 1:, :, :] (line 122 of the training script), while testing uses kernels = outputs[:, 0:args.kernel_num, :, :] * text (line 132 of test_ic15.py)?
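Assuming, as the two slices suggest, that channel 0 of outputs is the full text map and channels 1 to n − 1 are the shrunk kernels, the training slice simply leaves out channel 0 (presumably supervised on its own as the text map), while the test code binarises all n channels, keeps channel 0 as the largest kernel, and masks every kernel by the text region. A shape-only NumPy sketch with illustrative sizes; thresholding at 0 stands in for whatever binarisation the test script applies:

```python
import numpy as np

batch, n, h, w = 2, 7, 4, 4  # illustrative sizes
rng = np.random.default_rng(0)
outputs = rng.standard_normal((batch, n, h, w))

# Training view: channel 0 is the text map, channels 1..n-1 are the kernels.
train_texts = outputs[:, 0, :, :]
train_kernels = outputs[:, 1:, :, :]

# Test view: binarise, keep all n channels, and mask each one by the text map.
binary = (outputs > 0).astype(np.float32)
text = binary[:, 0, :, :]
# text[:, None] makes the broadcast explicit; a plain "* text" only lines up
# with the kernel axis when batch == 1.
test_kernels = binary[:, 0:n, :, :] * text[:, None, :, :]

print(train_kernels.shape, test_kernels.shape)
```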


    opened by KosukeHao 2
  • _pickle.UnpicklingError: invalid load key, '\xef'.

    After setting up the framework, training with train.py fails with this error. Is something on my side still not configured correctly?

      File "", line 213, in <module>
        main(args)
      File "", line 164, in main
        model = build_model(cfg.model)
      File "/home/liyanyan/PSE_origial/models/", line 11, in build_model
        model = models.dictcfg.type
      File "/home/liyanyan/PSE_origial/models/", line 20, in __init__
        self.backbone = build_backbone(backbone)
      File "/home/liyanyan/PSE_origial/models/backbone/", line 11, in build_backbone
        backbone = models.backbone.dictcfg.type
      File "/home/liyanyan/PSE_origial/models/backbone/", line 210, in resnet50
        model.load_state_dict(load_url(model_urls['resnet50']), strict=False)
      File "/home/liyanyan/PSE_origial/models/backbone/", line 234, in load_url
        return torch.load(cached_file, map_location=map_location)
      File "/home/liyanyan/.conda/envs/PSE-env/lib/python3.7/site-packages/torch/", line 387, in load
        return _load(f, map_location, pickle_module, **pickle_load_args)
      File "/home/liyanyan/.conda/envs/PSE-env/lib/python3.7/site-packages/torch/", line 564, in _load
        magic_number = pickle_module.load(f, **pickle_load_args)
    _pickle.UnpicklingError: invalid load key, '\xef'.
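torch.load failing on the very first byte suggests the cached file is not a checkpoint at all: 0xEF is the first byte of the UTF-8 byte-order mark (EF BB BF), which points to a text or HTML file, for example a failed or redirected download saved under the checkpoint's name. A hypothetical helper (not a PyTorch API) that inspects the first bytes, assuming the legacy torch.save format starts with the pickle protocol marker 0x80 and the zip-based format with the ZIP signature PK\x03\x04:

```python
def classify_header(head):
    """Guess a file's type from its first bytes (head: bytes)."""
    if head.startswith(b"PK\x03\x04"):
        return "zip"      # zip-based torch.save format (default since PyTorch 1.6)
    if head.startswith(b"\x80"):
        return "pickle"   # legacy torch.save format (pickle protocol >= 2)
    if head.startswith(b"\xef\xbb\xbf") or head.lstrip().startswith(b"<"):
        return "text"     # UTF-8 BOM or HTML: not a checkpoint
    return "unknown"


def sniff_checkpoint(path):
    """Read the first bytes of a file and classify them."""
    with open(path, "rb") as f:
        return classify_header(f.read(8))
```

If the cached backbone weights come back as "text", deleting that cached file and downloading it again is the usual remedy.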

    opened by Captainlululu 0
  • Validation accuracy is always 0


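    After training with the commands in the README, I evaluated with script.py and the accuracy is always 0: Calculated!{"precision": 0.0, "recall": 0.0, "hmean": 0, "AP": 0}. Looking at the network output, each res_img_xx.txt file contains only coordinates and no transcription, but the accuracy computation needs the transcription, so the accuracy stays at 0. How can I solve this problem?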

    opened by 1826133674 1
  • Is this result wrong? Why are the kernels thresholded at > 0? How can the kernel threshold be 0?

    score = torch.sigmoid(out[:, 0, :, :])
    # out = (torch.sign(out - 1) + 1) / 2  # 0 1
    #
    # text_mask = out[:, 0, :, :]
    # kernels = out[:, 1:cfg.test_cfg.kernel_num, :, :] * text_mask

    kernels = out[:, :cfg.test_cfg.kernel_num, :, :] > 0
    text_mask = kernels[:, :1, :, :]
    kernels[:, 1:, :, :] = kernels[:, 1:, :, :] * text_mask

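The threshold is 0 because out holds raw logits (the first line applies torch.sigmoid to it to get score): the sigmoid is monotonic and sigmoid(0) = 0.5, so out > 0 selects the same pixels as sigmoid(out) > 0.5, up to floating-point rounding for logits extremely close to 0. By the same reasoning, the commented-out torch.sign(out - 1) corresponds to a probability threshold of sigmoid(1) ≈ 0.73. A quick NumPy check:

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


logits = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
# Thresholding logits at 0 selects the same entries as thresholding probabilities at 0.5.
print(np.array_equal(logits > 0, sigmoid(logits) > 0.5))  # True
print(round(float(sigmoid(1.0)), 4))                      # 0.7311
```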
    opened by cqray1990 1
