A Keras reimplementation of the scene text detection network CTPN: "Detecting Text in Natural Image with Connectionist Text Proposal Network". Feel free to try it out, follow the project, and report any issues.

Overview

keras-ctpn

[TOC]

  1. Description
  2. Prediction
  3. Training
  4. Examples
    4.1 ICDAR2015
      4.1.1 With side refinement
      4.1.2 Without side refinement
      4.1.3 Data augmentation: horizontal flip
    4.2 ICDAR2017
    4.3 Other datasets
  5. toDoList
  6. Summary

Description

This project is a Keras implementation of CTPN: Detecting Text in Natural Image with Connectionist Text Proposal Network. The implementation mainly draws on keras-faster-rcnn, and the model is trained and tested on the ICDAR2015 and ICDAR2017 datasets.

Project repository: keras-ctpn

A Chinese translation of the CTPN paper: CTPN.md

Results

Trained on the 1,000 ICDAR2015 training images and evaluated on the 500 test images: Recall 37.07%, Precision 42.94%, Hmean 39.79%. The original paper reports an F-measure of 61%, but it uses an additional 3,000 training images.

Key implementation notes:

a. The backbone network is ResNet-50.

b. The training input size is 720*720: the long side of the image is scaled to 720 while keeping the aspect ratio, and the short side is padded (the original paper scales the short side to 600). Prediction uses 1024*1024. A resize-and-pad sketch is given after this list.

c. The batch_size is 4, with 128 anchors sampled per image for training and a 1:1 positive/negative ratio.

d. The loss weights for classification, bounding-box regression, and side refinement are 1:1:1 (the original paper uses 1:1:2).

e. Side refinement uses the same positive anchors as bounding-box regression; in the original paper they are presumably selected separately.

f. Side refinement does improve results (note: many people online claim it has little effect).

g. Because of the bidirectional GRU, horizontal flipping hurts performance (see the example "Data augmentation: horizontal flip").

h. With random cropping as data augmentation, the network does not converge.
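
A minimal sketch of the resize-and-pad scheme in point b (scale the long side to 720, keep the aspect ratio, pad the short side). The function name and the use of OpenCV/NumPy are illustrative assumptions, not the project's actual preprocessing code:

    import numpy as np
    import cv2  # assumed available; any image library would work

    def resize_and_pad(image, target_size=720):
        """Scale the long side to target_size, keep the aspect ratio, pad the short side."""
        h, w = image.shape[:2]
        scale = target_size / max(h, w)
        new_h, new_w = int(round(h * scale)), int(round(w * scale))
        resized = cv2.resize(image, (new_w, new_h))
        # Pad with zeros so the output is always target_size x target_size.
        padded = np.zeros((target_size, target_size, 3), dtype=resized.dtype)
        padded[:new_h, :new_w] = resized
        return padded, scale  # scale is needed to map detections back to the original image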

Prediction

a. Download the project

git clone https://github.com/yizt/keras-ctpn

b. Download the pretrained model

The model trained on the ICDAR2015 training set can be downloaded from Google Drive or Baidu Netdisk (extraction code: wm47).

c. Modify the following attribute in the configuration class config.py:

	WEIGHT_PATH = '/tmp/ctpn.h5'

d. Detect text

python predict.py --image_path image_3.jpg

Evaluation

a. Run the following command, then compress the output txt files into a zip archive (a small sketch of the zip step follows the command):

python evaluate.py --weight_path /tmp/ctpn.100.h5 --image_dir /opt/dataset/OCR/ICDAR_2015/test_images/ --output_dir /tmp/output_2015/
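
A minimal sketch of the zip step, assuming the results are written as per-image .txt files into the output directory; the archive name and glob pattern are illustrative, not fixed by the project:

    import glob
    import os
    import zipfile

    # Compress all result txt files in the output directory into one zip for submission.
    with zipfile.ZipFile('/tmp/output_2015/submit.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
        for txt_path in glob.glob('/tmp/output_2015/*.txt'):
            zf.write(txt_path, arcname=os.path.basename(txt_path))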

b. Submit for online evaluation: upload the zip archive at http://rrc.cvc.uab.es/?ch=4&com=mymethods&task=1

Training

a. Download the training data

#icdar2013
wget http://rrc.cvc.uab.es/downloads/Challenge2_Training_Task12_Images.zip
wget http://rrc.cvc.uab.es/downloads/Challenge2_Training_Task1_GT.zip
wget http://rrc.cvc.uab.es/downloads/Challenge2_Test_Task12_Images.zip
#icdar2015
wget http://rrc.cvc.uab.es/downloads/ch4_training_images.zip
wget http://rrc.cvc.uab.es/downloads/ch4_training_localization_transcription_gt.zip
wget http://rrc.cvc.uab.es/downloads/ch4_test_images.zip
#icdar2017
wget -c -t 0 http://datasets.cvc.uab.es/rrc/ch8_training_images_1~8.zip
wget -c -t 0 http://datasets.cvc.uab.es/rrc/ch8_training_localization_transcription_gt_v2.zip
wget -c -t 0 http://datasets.cvc.uab.es/rrc/ch8_test_images.zip

b. Download the ResNet-50 pretrained model

wget https://github.com/fchollet/deep-learning-models/releases/download/v0.2/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5

c. Modify the following attributes in the configuration class config.py:

    # Pretrained model
    PRE_TRAINED_WEIGHT = '/opt/pretrained_model/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5'

    # Dataset paths
    IMAGE_DIR = '/opt/dataset/OCR/ICDAR_2015/train_images'
    IMAGE_GT_DIR = '/opt/dataset/OCR/ICDAR_2015/train_gt'

d. Train

python train.py --epochs 50

Examples

ICDAR2015

With side refinement

Without side refinement

Data augmentation: horizontal flip

ICDAR2017

Other datasets

toDoList

  1. Side refinement (done)
  2. Training on the ICDAR2017 dataset (done)
  3. Mapping detected text-line coordinates back to the original image (done)
  4. Accuracy evaluation (done)
  5. Side regression constrained within the bounding box (done)
  6. Horizontal flip augmentation (done)
  7. Random cropping augmentation (done)

Summary

  1. CTPN detects horizontal text well.
  2. The network is very sensitive to the dataset: a model trained on ICDAR2017 performs poorly when tested on ICDAR2015, and likewise a model trained on ICDAR2015 performs poorly on ICDAR2013.
  3. Presumably because the bidirectional GRU stores contextual memory, the network does not converge when random cropping is used for augmentation, and with horizontal flipping the predictions also come out horizontally mirrored.
Comments
  • target.py positive-sample question: can different gts select the same anchor?

    Merging the two sets of positive-sample indices:

    positive_bool_matrix = tf.logical_or(gt_iou_max_bool, anchors_iou_max_bool)

    Following the rules earlier in target.py, it seems possible for different gts to select the same anchor.

    In the later logic that picks positive anchors, a random subset of the positive anchors is drawn each time. Could the same anchor then show up multiple times matched to different gts? Or, in different epochs, could the anchor with the same index in the same image be matched to different gts?
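
    An illustrative sketch (not code from the repository) of the two positive-selection rules being OR-ed together; the IoU values and the 0.7 threshold are assumptions, chosen only to show how one anchor can be the argmax for two different gts:

        import tensorflow as tf

        # Toy IoU matrix: [num_gt, num_anchors]
        iou = tf.constant([[0.1, 0.6, 0.3],
                           [0.2, 0.5, 0.4]])

        # Rule 1: for each gt, anchors achieving that gt's maximum IoU are positive.
        gt_iou_max_bool = tf.reduce_any(
            tf.equal(iou, tf.reduce_max(iou, axis=1, keepdims=True)), axis=0)

        # Rule 2: anchors whose best IoU over all gts exceeds a threshold are positive.
        anchors_iou_max_bool = tf.reduce_max(iou, axis=0) >= 0.7

        # Union of both rules, as in the snippet above.
        positive_bool = tf.logical_or(gt_iou_max_bool, anchors_iou_max_bool)
        # Here anchor 1 is the argmax for both gts, so a single positive anchor
        # corresponds to two different gts, which is the situation asked about.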

    opened by chenying99 2
  • Is the side-refinement code in the apply_regress function of text_proposals.py wrong? (line 38)

    Line 38 of apply_regress in text_proposals.py is: cx += dx * w

    But in side_regress_target in target.py (line 83), dx is computed as: dx = (gt_center_x - center_x) * 2 / w

    It doesn't seem right to then apply cx += dx * w, does it?

    Later there is cx + w * 0.5 and cx - w * 0.5; here w is the anchor box width, not the predicted box width. Isn't that a problem?
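
    A worked toy example (not code from the repository) of the mismatch described above, under the commenter's reading: if the target is encoded with a factor of 2 / w, decoding with cx += dx * w overshoots the gt center by a factor of two, while a matched decode would undo the same factor:

        # Toy numbers: anchor center 100, anchor width 16, gt center 104.
        center_x, w, gt_center_x = 100.0, 16.0, 104.0

        # Encoding as quoted from target.py: factor 2 / w.
        dx = (gt_center_x - center_x) * 2 / w       # 0.5

        # Decoding as quoted from apply_regress: cx += dx * w.
        decoded_mismatched = center_x + dx * w      # 108.0, overshoots the gt center

        # A decode matched to the 2 / w encoding would divide by the same factor.
        decoded_matched = center_x + dx * w / 2     # 104.0, recovers the gt center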

    opened by chenying99 2
  • A question while reading the source code

    def smooth_l1_loss(y_true, y_predict, sigma2=9.0):
        # sigma2 = sigma ** 2; with sigma2 = 9 (sigma = 3) the quadratic region shrinks
        # to |diff| < 1/9, the same parameterization as the smooth L1 RPN loss in Faster R-CNN.
        abs_diff = tf.abs(y_true - y_predict, name='abs_diff')
        loss = tf.where(tf.less(abs_diff, 1. / sigma2),
                        0.5 * sigma2 * tf.pow(abs_diff, 2),
                        abs_diff - 0.5 / sigma2)
        return loss

    The source contains this smooth L1 loss function. The definition of smooth L1 I found online (0.5·x² for |x| < 1, and |x| − 0.5 otherwise) matches this function when sigma2 = 1, but the default used here is 9.0. Honestly, at my level my first thought was that it should perhaps be 0.9. Could you explain why sigma2 defaults to 9.0? Is there a particular intent behind it, or did this value simply test better?

    opened by kaixinbaba 2
  • deltas dimension error

    https://github.com/yizt/keras-ctpn/blob/6b962f833bc3abc6e2b19aafa288738130c6b735/ctpn/layers/losses.py#L68

    This should be deltas = tf.gather_nd(deltas[..., :-1], positive_indices); deltas is 3-dimensional.

    opened by RabbearSu 2
  • Flag value question

    https://github.com/yizt/keras-ctpn/blob/6b962f833bc3abc6e2b19aafa288738130c6b735/ctpn/layers/losses.py#L17

    Is the description of the flag values for true_cls_ids and indices wrong? In target.py, the tag in cls_ids is 1 for both positive and negative samples and 0 for padding samples, while the tag in indices is 1 for positives, -1 for negatives, and 0 for padding. Please check, thanks!

    opened by RabbearSu 2
  • A question about multi-class classification

    I have read through the training-related code several times and roughly understand the flow. I'd like to adapt the model: the code only classifies text vs. background, and I want to change it to multi-class text, e.g. 10 classes, so I adjusted the relevant dimensions from 2 to 11. But shortly after training starts, the loss grows without bound, which is presumably related to my changes; I've dug around for a long time without finding the problem. Do you have any suggestions?

    One more question: in the Input below, does the 2 mean two classes, or is the first index the class and the second the padding tag?

    gt_class_ids = Input(shape=(config.MAX_GT_INSTANCES, 2), name='gt_class_ids')
    
    opened by kaixinbaba 1
  • Positive/negative sample index question

    https://github.com/yizt/keras-ctpn/blob/9427448873ec187eb35379ce382a5529ecc5d84f/ctpn/layers/target.py#L161 Why do the positive samples have to be mapped back into the original anchor_indices (tf.gather), while the negative indices are not transformed?

    opened by RabbearSu 1
  • Change config.py file

    Thanks for your sharing! But I'm quite confused about how to set TRAIN_ANCHORS_PER_IMAGE and TEXT_PROPOSALS_MAX_NUM. How can I decide exactly what they should be for my dataset?

    opened by Andiez-Nguyen 0
Owner
mick.yi
Keywords: data mining, deep learning, computer vision