A Keras reimplementation of the scene text detection network CTPN: "Detecting Text in Natural Image with Connectionist Text Proposal Network". Feel free to try it out, follow the project, and report any issues.

Overview

keras-ctpn

[TOC]

  1. Description
  2. Prediction
  3. Training
  4. Examples
    4.1 ICDAR2015
      4.1.1 With side refinement
      4.1.2 Without side refinement
      4.1.3 Data augmentation: horizontal flip
    4.2 ICDAR2017
    4.3 Other datasets
  5. toDoList
  6. Summary

Description

This project is a Keras implementation of CTPN: Detecting Text in Natural Image with Connectionist Text Proposal Network. The implementation mainly draws on keras-faster-rcnn, and the model is trained and tested on the ICDAR2015 and ICDAR2017 datasets.

Project repository: keras-ctpn

A Chinese translation of the CTPN paper: CTPN.md

Results

Trained on the 1,000 ICDAR2015 training images and evaluated on the 500 test images: Recall 37.07%, Precision 42.94%, Hmean 39.79%. The original paper reports an F-measure of 61%, but it uses an additional 3,000 training images.

Key implementation notes:

a. The backbone network is ResNet-50.

b. The training input size is 720*720: the long side of the image is scaled to 720 while keeping the aspect ratio, and the short side is padded (the original paper scales the short side to 600). Prediction uses 1024*1024. A resize-and-pad sketch is given after this list.

c. The batch_size is 4, with 128 anchors sampled per image for training and a 1:1 positive/negative ratio.

d. The loss weights for classification, bounding-box regression, and side refinement are 1:1:1 (the original paper uses 1:1:2).

e. Side refinement uses the same positive anchors as bounding-box regression; in the original paper they are presumably selected separately.

f. Side refinement does improve results (note: many people online claim it has little effect).

g. Because of the bidirectional GRU, horizontal flipping hurts performance (see the example "Data augmentation: horizontal flip").

h. With random cropping as data augmentation, the network does not converge.
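
A minimal sketch of the resize-and-pad scheme in point b (scale the long side to 720, keep the aspect ratio, pad the short side). The function name and the use of OpenCV/NumPy are illustrative assumptions, not the project's actual preprocessing code:

    import numpy as np
    import cv2  # assumed available; any image library would work

    def resize_and_pad(image, target_size=720):
        """Scale the long side to target_size, keep the aspect ratio, pad the short side."""
        h, w = image.shape[:2]
        scale = target_size / max(h, w)
        new_h, new_w = int(round(h * scale)), int(round(w * scale))
        resized = cv2.resize(image, (new_w, new_h))
        # Pad with zeros so the output is always target_size x target_size.
        padded = np.zeros((target_size, target_size, 3), dtype=resized.dtype)
        padded[:new_h, :new_w] = resized
        return padded, scale  # scale is needed to map detections back to the original image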

Prediction

a. Download the project

git clone https://github.com/yizt/keras-ctpn

b. Download the pretrained model

The model trained on the ICDAR2015 training set can be downloaded from Google Drive or Baidu Netdisk (extraction code: wm47).

c. Modify the following attribute in the configuration class config.py:

	WEIGHT_PATH = '/tmp/ctpn.h5'

d. Detect text

python predict.py --image_path image_3.jpg

Evaluation

a. Run the following command, then compress the output txt files into a zip archive (a small sketch of the zip step follows the command):

python evaluate.py --weight_path /tmp/ctpn.100.h5 --image_dir /opt/dataset/OCR/ICDAR_2015/test_images/ --output_dir /tmp/output_2015/
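
A minimal sketch of the zip step, assuming the results are written as per-image .txt files into the output directory; the archive name and glob pattern are illustrative, not fixed by the project:

    import glob
    import os
    import zipfile

    # Compress all result txt files in the output directory into one zip for submission.
    with zipfile.ZipFile('/tmp/output_2015/submit.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
        for txt_path in glob.glob('/tmp/output_2015/*.txt'):
            zf.write(txt_path, arcname=os.path.basename(txt_path))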

b. Submit for online evaluation: upload the zip archive at http://rrc.cvc.uab.es/?ch=4&com=mymethods&task=1

Training

a. Download the training data

#icdar2013
wget http://rrc.cvc.uab.es/downloads/Challenge2_Training_Task12_Images.zip
wget http://rrc.cvc.uab.es/downloads/Challenge2_Training_Task1_GT.zip
wget http://rrc.cvc.uab.es/downloads/Challenge2_Test_Task12_Images.zip
#icdar2015
wget http://rrc.cvc.uab.es/downloads/ch4_training_images.zip
wget http://rrc.cvc.uab.es/downloads/ch4_training_localization_transcription_gt.zip
wget http://rrc.cvc.uab.es/downloads/ch4_test_images.zip
#icdar2017
wget -c -t 0 http://datasets.cvc.uab.es/rrc/ch8_training_images_1~8.zip
wget -c -t 0 http://datasets.cvc.uab.es/rrc/ch8_training_localization_transcription_gt_v2.zip
wget -c -t 0 http://datasets.cvc.uab.es/rrc/ch8_test_images.zip

b. Download the ResNet-50 pretrained model

wget https://github.com/fchollet/deep-learning-models/releases/download/v0.2/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5

c. Modify the following attributes in the configuration class config.py:

    # Pretrained model
    PRE_TRAINED_WEIGHT = '/opt/pretrained_model/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5'

    # Dataset paths
    IMAGE_DIR = '/opt/dataset/OCR/ICDAR_2015/train_images'
    IMAGE_GT_DIR = '/opt/dataset/OCR/ICDAR_2015/train_gt'

d. Train

python train.py --epochs 50

Examples

ICDAR2015

With side refinement

Without side refinement

Data augmentation: horizontal flip

ICDAR2017

Other datasets

toDoList

  1. Side refinement (done)
  2. Training on the ICDAR2017 dataset (done)
  3. Mapping detected text-line coordinates back to the original image (done)
  4. Accuracy evaluation (done)
  5. Side regression constrained within the bounding box (done)
  6. Horizontal flip augmentation (done)
  7. Random cropping augmentation (done)

Summary

  1. CTPN detects horizontal text well.
  2. The network is very sensitive to the dataset: a model trained on ICDAR2017 performs poorly when tested on ICDAR2015, and likewise a model trained on ICDAR2015 performs poorly on ICDAR2013.
  3. Presumably because the bidirectional GRU stores contextual memory, the network does not converge when random cropping is used for augmentation, and with horizontal flipping the predictions also come out horizontally mirrored.
Comments
  • target.py positive-sample question: can different gts select the same anchor?

    Merging the two sets of positive-sample indices:

    positive_bool_matrix = tf.logical_or(gt_iou_max_bool, anchors_iou_max_bool)

    Following the rules earlier in target.py, it seems possible for different gts to select the same anchor.

    In the later logic that picks positive anchors, a random subset of the positive anchors is drawn each time. Could the same anchor then show up multiple times matched to different gts? Or, in different epochs, could the anchor with the same index in the same image be matched to different gts?
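
    An illustrative sketch (not code from the repository) of the two positive-selection rules being OR-ed together; the IoU values and the 0.7 threshold are assumptions, chosen only to show how one anchor can be the argmax for two different gts:

        import tensorflow as tf

        # Toy IoU matrix: [num_gt, num_anchors]
        iou = tf.constant([[0.1, 0.6, 0.3],
                           [0.2, 0.5, 0.4]])

        # Rule 1: for each gt, anchors achieving that gt's maximum IoU are positive.
        gt_iou_max_bool = tf.reduce_any(
            tf.equal(iou, tf.reduce_max(iou, axis=1, keepdims=True)), axis=0)

        # Rule 2: anchors whose best IoU over all gts exceeds a threshold are positive.
        anchors_iou_max_bool = tf.reduce_max(iou, axis=0) >= 0.7

        # Union of both rules, as in the snippet above.
        positive_bool = tf.logical_or(gt_iou_max_bool, anchors_iou_max_bool)
        # Here anchor 1 is the argmax for both gts, so a single positive anchor
        # corresponds to two different gts, which is the situation asked about.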

    opened by chenying99 2
  • Is the side-refinement code in the apply_regress function of text_proposals.py wrong? (line 38)

    Line 38 of apply_regress in text_proposals.py is: cx += dx * w

    But in side_regress_target in target.py (line 83), dx is computed as: dx = (gt_center_x - center_x) * 2 / w

    It doesn't seem right to then apply cx += dx * w, does it?

    Later there is cx + w * 0.5 and cx - w * 0.5; here w is the anchor box width, not the predicted box width. Isn't that a problem?
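
    A worked toy example (not code from the repository) of the mismatch described above, under the commenter's reading: if the target is encoded with a factor of 2 / w, decoding with cx += dx * w overshoots the gt center by a factor of two, while a matched decode would undo the same factor:

        # Toy numbers: anchor center 100, anchor width 16, gt center 104.
        center_x, w, gt_center_x = 100.0, 16.0, 104.0

        # Encoding as quoted from target.py: factor 2 / w.
        dx = (gt_center_x - center_x) * 2 / w       # 0.5

        # Decoding as quoted from apply_regress: cx += dx * w.
        decoded_mismatched = center_x + dx * w      # 108.0, overshoots the gt center

        # A decode matched to the 2 / w encoding would divide by the same factor.
        decoded_matched = center_x + dx * w / 2     # 104.0, recovers the gt center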

    opened by chenying99 2
  • A question while reading the source code

    def smooth_l1_loss(y_true, y_predict, sigma2=9.0):
        # sigma2 = sigma ** 2; with sigma2 = 9 (sigma = 3) the quadratic region shrinks
        # to |diff| < 1/9, the same parameterization as the smooth L1 RPN loss in Faster R-CNN.
        abs_diff = tf.abs(y_true - y_predict, name='abs_diff')
        loss = tf.where(tf.less(abs_diff, 1. / sigma2),
                        0.5 * sigma2 * tf.pow(abs_diff, 2),
                        abs_diff - 0.5 / sigma2)
        return loss

    The source contains this smooth L1 loss function. The definition of smooth L1 I found online (0.5·x² for |x| < 1, and |x| − 0.5 otherwise) matches this function when sigma2 = 1, but the default used here is 9.0. Honestly, at my level my first thought was that it should perhaps be 0.9. Could you explain why sigma2 defaults to 9.0? Is there a particular intent behind it, or did this value simply test better?

    opened by kaixinbaba 2
  • deltas dimension error

    https://github.com/yizt/keras-ctpn/blob/6b962f833bc3abc6e2b19aafa288738130c6b735/ctpn/layers/losses.py#L68

    This should be deltas = tf.gather_nd(deltas[..., :-1], positive_indices); deltas is 3-dimensional.

    opened by RabbearSu 2
  • Flag value question

    https://github.com/yizt/keras-ctpn/blob/6b962f833bc3abc6e2b19aafa288738130c6b735/ctpn/layers/losses.py#L17

    Is the description of the flag values for true_cls_ids and indices wrong? In target.py, the tag in cls_ids is 1 for both positive and negative samples and 0 for padding samples, while the tag in indices is 1 for positives, -1 for negatives, and 0 for padding. Please check, thanks!

    opened by RabbearSu 2
  • A question about multi-class classification

    I have read through the training-related code several times and roughly understand the flow. I'd like to adapt the model: the code only classifies text vs. background, and I want to change it to multi-class text, e.g. 10 classes, so I adjusted the relevant dimensions from 2 to 11. But shortly after training starts, the loss grows without bound, which is presumably related to my changes; I've dug around for a long time without finding the problem. Do you have any suggestions?

    One more question: in the Input below, does the 2 mean two classes, or is the first index the class and the second the padding tag?

    gt_class_ids = Input(shape=(config.MAX_GT_INSTANCES, 2), name='gt_class_ids')
    
    opened by kaixinbaba 1
  • Positive/negative sample index question

    https://github.com/yizt/keras-ctpn/blob/9427448873ec187eb35379ce382a5529ecc5d84f/ctpn/layers/target.py#L161 Why do the positive samples have to be mapped back into the original anchor_indices (tf.gather), while the negative indices are not transformed?

    opened by RabbearSu 1
  • Change config.py file

    Thanks for your sharing! But I'm quite confused about how to set TRAIN_ANCHORS_PER_IMAGE and TEXT_PROPOSALS_MAX_NUM. How can I decide exactly what they should be for my dataset?

    opened by Andiez-Nguyen 0
Owner
mick.yi
Keywords: data mining, deep learning, computer vision