CTPN + DenseNet + CTC based end-to-end Chinese OCR implemented using TensorFlow and Keras

Overview

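Introduction

End-to-end detection and recognition of variable-length Chinese text, implemented with TensorFlow and Keras.

  • Text detection: CTPN
  • Text recognition: DenseNet + CTC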

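Environment setup

sh setup.sh
  • Note: in a CPU-only environment, comment out the "for gpu" section and uncomment the "for cpu" section before running the script.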

Demo

Put the test images in the test_images directory; the detection results are saved to test_result.

python demo.py

Model training

CTPN training

See ctpn/README.md for details.

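DenseNet + CTC training

1. Data preparation

Dataset: https://pan.baidu.com/s/1QkI7kjah8SPHwOQ40rS1Pw (password: lu7m)

  • About 3.64 million images in total, split 99:1 into training and validation sets
  • Generated randomly from a Chinese corpus (news + classical Chinese) with variations in font, size, grayscale, blur, perspective, and stretching
  • 5,990 distinct characters, covering Chinese characters, English letters, digits, and punctuation
  • Each sample contains exactly 10 characters, cut at random from sentences in the corpus
  • All images have a resolution of 280x32

After extracting the images, place them in the train/images directory and put the description files in the train directory.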
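The 99:1 split mentioned above can be reproduced in a few lines of Python. This is a minimal sketch, not the author's script: it assumes the description file lists one sample per line, and the name `split_train_val` and the fixed seed are illustrative choices.

```python
import random


def split_train_val(lines, val_ratio=0.01, seed=0):
    """Shuffle sample lines and split them into (train, val) lists.

    Assumption: each element of `lines` describes one sample (for example
    an image name followed by its label); the real file format may differ.
    """
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_ratio))
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    # Hypothetical sample names, used only to show the split sizes.
    samples = ["img_{:05d}.jpg".format(i) for i in range(1000)]
    train, val = split_train_val(samples)
    print(len(train), len(val))
```

A fixed seed keeps the split reproducible between runs, which makes validation accuracy comparable across experiments.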

2. Training

cd train
python train.py

3. Results

val acc    prediction time    model size
0.983      8ms                18.9MB
  • GPU: GTX TITAN X
  • Keras Backend: Tensorflow

4. Generating your own samples

See SynthText_Chinese_version, TextRecognitionDataGenerator, and text_renderer.

Sample results

References

[1] https://github.com/eragonruan/text-detection-ctpn

[2] https://github.com/senlinuc/caffe_ocr

[3] https://github.com/chineseocr/chinese-ocr

[4] https://github.com/xiaomaxiao/keras_ocr

Comments
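  • A model I trained myself produces garbled recognition results

    I used your code with several datasets I built myself, but the output is garbled every time. What is causing this, and how can I fix it? Recognition Result: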

    蝇俛蝇俛蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇 蝇蝇蝇蝇蝇健蝇蝇蝇蝇蝇拽蝇蝇蝇蝇蝇蝇蝇健蝇蝇蝇健蝇蝇蝇蝇蝇蝇 蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇蝇膜蝇蝇蝇蝇蝇蝇蝇

    opened by daoyijushi 26
  • Using a self-trained model directly raises an error?

    The model I trained includes the ctc and loss parameters, and using it directly under densenet/models raises an error: ValueError: Dimension 1 in both shapes must be equal, but are 5990 and 17765 for 'Assign_159' (op: 'Assign') with input shapes: [768,5990], [768,17765].

    Could someone advise how to modify the model?

    opened by juventi 13
  • Exception when training with SynthText data

    Hi All,

    I tried training on SynthText data and observed that after some epochs it runs into the exception below.

    tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at ctc_loss_op.cc:168 : Invalid argument: All labels must be nonnegative integers, batch: 94 labels: -1
    Traceback (most recent call last):
      File "train.py", line 181, in <module>
        callbacks = [checkpoint, earlystop, changelr, tensorboard])
      File "/home/ubuntu/srijith/dl/lib/python3.5/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
        return func(*args, **kwargs)
      File "/home/ubuntu/srijith/dl/lib/python3.5/site-packages/keras/engine/training.py", line 1418, in fit_generator
        initial_epoch=initial_epoch)
      File "/home/ubuntu/srijith/dl/lib/python3.5/site-packages/keras/engine/training_generator.py", line 217, in fit_generator
        class_weight=class_weight)
      File "/home/ubuntu/srijith/dl/lib/python3.5/site-packages/keras/engine/training.py", line 1217, in train_on_batch
        outputs = self.train_function(ins)
      File "/home/ubuntu/srijith/dl/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
        return self._call(inputs)
      File "/home/ubuntu/srijith/dl/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
        fetched = self._callable_fn(*array_vals)
      File "/home/ubuntu/srijith/dl/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
        run_metadata_ptr)
      File "/home/ubuntu/srijith/dl/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
        c_api.TF_GetCode(self.status.status))
    tensorflow.python.framework.errors_impl.InvalidArgumentError: All labels must be nonnegative integers, batch: 94 labels: -1
      [[{{node ctc/CTCLoss}} = CTCLoss[_class=["loc:@training/Adam/gradients/ctc/CTCLoss_grad/mul"], ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=false, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ctc/Log, ctc/ToInt64, ctc/ToInt32_2, ctc/ToInt32_1)]]

    Thank You, Srijith

    opened by srijiths 12
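The message "All labels must be nonnegative integers ... labels: -1" says that at least one label index handed to the CTC loss was -1. One possible cause, offered here as an assumption and not as a diagnosis of this report, is a character in the training text that is missing from the character dictionary and ends up encoded as -1. A minimal sketch of a check to run before training follows; `build_index` and `find_unknown_chars` are hypothetical helper names, not functions from this repository.

```python
def build_index(charset):
    """Map each character in the dictionary to a nonnegative integer index."""
    return {ch: i for i, ch in enumerate(charset)}


def find_unknown_chars(labels, char_to_idx):
    """Return {label_position: [unknown characters]} for every label that
    contains characters absent from the dictionary."""
    problems = {}
    for pos, text in enumerate(labels):
        missing = [ch for ch in text if ch not in char_to_idx]
        if missing:
            problems[pos] = missing
    return problems


if __name__ == "__main__":
    # Toy dictionary and labels, used only for illustration.
    char_to_idx = build_index("abc123")
    labels = ["abc", "a1b2", "abd", "x3"]
    print(find_unknown_chars(labels, char_to_idx))
```

Samples reported by such a check can then be dropped or the dictionary extended before the labels reach the loss.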
  • What is the purpose of basemodel in the training code?

    What does basemodel do in the training code? As far as I can see, nothing further is done with it after load_weights. Also, I would like to add more training data and continue training from the weights the author provides. How should I do that?

    modelPath = './models/pretrain_model/keras.h5'
    if os.path.exists(modelPath):
        print("Loading model weights...")
        basemodel.load_weights(modelPath)
        print('done!')

    opened by jewelcai 11
  • Problem during training

    Not enough time for target transition sequence (required: 38, available: 35)58You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs

    I am not sure whether my dataset is the cause. My samples have variable text lengths, and I resized the images to 280*32; the error appears when I run train.py.

    opened by Lesley96-11 8
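The error means a label needs more CTC time steps than the network output provides. Another comment in this thread gives the condition as width // 8 >= number of characters + number of consecutive repeated characters; for a 280-pixel width that yields 35 steps, which matches "available: 35" in the message. The sketch below checks that condition before training. The downsampling factor of 8 is taken from that comment, and `ctc_required_steps` and `fits_ctc` are illustrative names.

```python
def ctc_required_steps(label):
    """Minimum number of time steps CTC needs for `label`: one per character,
    plus one blank between each pair of identical adjacent characters."""
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def fits_ctc(label, image_width, downsample=8):
    """True if an image of `image_width` pixels yields enough time steps.

    Assumption: the network reduces the width by a factor of `downsample`.
    """
    return ctc_required_steps(label) <= image_width // downsample


if __name__ == "__main__":
    print(ctc_required_steps("hello"))    # 5 characters plus 1 repeat ("ll")
    print(fits_ctc("a" * 10, 280))        # needs 19 steps, 35 available
    print(fits_ctc("abcdefgh" * 5, 280))  # needs 40 steps, 35 available
```

Labels that fail the check need a wider image (or a shorter text line) rather than a plain resize to 280 pixels.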
  • ctpn installation error

    ++ python setup.py build_ext --inplace
    running build_ext
    skipping 'bbox.c' Cython extension (up-to-date)
    building 'utils.bbox' extension
    creating build
    creating build/temp.linux-x86_64-3.6
    {'gcc': ['-Wno-cpp', '-Wno-unused-function']}
    gcc -pthread -B /home/test/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/test/anaconda3/lib/python3.6/site-packages/numpy/core/include -I/home/test/anaconda3/include/python3.6m -c bbox.c -o build/temp.linux-x86_64-3.6/bbox.o -Wno-cpp -Wno-unused-function
    creating /home/test/orc_gc/chinese_ocr-master/ctpn/lib/utils/utils
    gcc -pthread -shared -B /home/test/anaconda3/compiler_compat -L/home/test/anaconda3/lib -Wl,-rpath=/home/test/anaconda3/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.6/bbox.o -o /home/test/orc_gc/chinese_ocr-master/ctpn/lib/utils/utils/bbox.cpython-36m-x86_64-linux-gnu.so
    /home/test/anaconda3/compiler_compat/ld: cannot find -lpthread
    /home/test/anaconda3/compiler_compat/ld: cannot find -lc
    collect2: error: ld returned 1 exit status
    error: command 'gcc' failed with exit status 1

    Am I missing some required library?

    opened by JanzZhu 7
  • Environment setup problem

    Running setup.sh under Cygwin on Windows produces the following error. How can I fix it?

    Traceback (most recent call last):
      File "setup_cpu.py", line 57, in <module>
        cmdclass={'build_ext': custom_build_ext},
      File "D:\Program Files\Python\lib\distutils\core.py", line 148, in setup
        dist.run_commands()
      File "D:\Program Files\Python\lib\distutils\dist.py", line 955, in run_commands
        self.run_command(cmd)
      File "D:\Program Files\Python\lib\distutils\dist.py", line 974, in run_command
        cmd_obj.run()
      File "D:\Program Files\Python\lib\site-packages\Cython\Distutils\build_ext.py", line 164, in run
        _build_ext.build_ext.run(self)
      File "D:\Program Files\Python\lib\distutils\command\build_ext.py", line 339, in run
        self.build_extensions()
      File "setup_cpu.py", line 37, in build_extensions
        customize_compiler_for_nvcc(self.compiler)
      File "setup_cpu.py", line 23, in customize_compiler_for_nvcc
        default_compiler_so = self.compiler_so
    AttributeError: 'MSVCCompiler' object has no attribute 'compiler_so'
    mv: cannot stat 'utils/*': No such file or directory

    opened by darknessp 6
  • Why can't the author's trained .h5 model be loaded as a pretrained model?

    train.py, lines 147-151:

    modelPath = './models/pretrain_model/keras.h5'
    if os.path.exists(modelPath):
        print("Loading model weights...")
        basemodel.load_weights(modelPath)
        print('done!')

    load_weights does not seem to load anything: "done" is printed, but during training acc still starts from 0.

    opened by JJoohhnnn 5
  • Error when running with a model I trained myself? Could someone take a look

    ValueError: Dimension 1 in both shapes must be equal, but are 5988 and 5987 for 'Assign_159' (op: 'Assign') with input shapes: [768,5988], [768,5987]

    Counting the leading space, my char_std_5990.txt contains 5987 characters in total. How should I change this?

    opened by JJoohhnnn 5
  • Trained model cannot be used

    I trained with 46*120 images, and loading the resulting h5 file in model raises: ValueError: Dimension 0 in both shapes must be equal, but are 768 and 960. Shapes are [768,5990] and [960,5990]. for 'Assign_159' (op: 'Assign') with input shapes: [768,5990], [960,5990].

    How can I fix this? @YCG09

    opened by cellphonef 4
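The two shapes in the error differ only in their first dimension (768 vs. 960), which is the size of the feature vector fed to the final dense layer. One reading consistent with those numbers, stated here as an assumption and not confirmed by the source, is that the network ends with 192 channels and halves the image height three times, so the dimension depends on the input height: 32 gives 4 rows and 46 gives 5 rows. A small arithmetic sketch, in which every name and constant is illustrative:

```python
def dense_input_dim(height, channels=192, halvings=3):
    """Feature size per time step after flattening height and channels.

    Assumptions (inferred from the error message, not from the source code):
    192 channels and three successive halvings of the height, rounding down.
    """
    for _ in range(halvings):
        height //= 2
    return height * channels


if __name__ == "__main__":
    print(dense_input_dim(32))  # image height of the provided dataset
    print(dense_input_dim(46))  # image height used in the report above
```

Under that assumption, weights trained at height 46 only fit a model built for height 46; resizing the training images to height 32 would avoid the mismatch.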
  • Has anyone run into this error: Labels length is zero in batch 8 [[{{node ctc/CTCLoss}}]]

    Training fails with tensorflow.python.framework.errors_impl.InvalidArgumentError: Labels length is zero in batch 8 [[{{node ctc/CTCLoss}}]]

    With a very small batch size (for example 2) training runs normally, but with a batch size of 32, 64, or larger this error is raised. The images and labels all match up, and sometimes a full epoch completes before the error appears in the second epoch. I have been stuck on this for several days. My label_length values are all 10 and never 0. Any pointers would be appreciated.

    opened by Cocoalate 0
  • How to understand the line-nms?

    I find that the line-nms deletes some boxes, and the way it does so is confusing. The left-top and right-top points are used as the left-top and right-bottom points, respectively, and the x-coordinate of the left-bottom point is used as the confidence. This is difficult to understand.

    https://github.com/YCG09/chinese_ocr/blob/9af64905f7e1a226d902feac5c0dfd64399ce6b2/ctpn/lib/text_connector/detectors.py#L42

    opened by lhao0301 0
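  • Some notes from using the recognition part

    First of all, many thanks to the repo owner for sharing this work so generously; the repo is a classic. I ran into quite a few pitfalls while using it, summarized briefly here:

    (1) The BN layers are not switched off at inference time.

    (2) Dropout is not switched off at inference time (although after many tests I found that leaving it on had no visible effect).

    (3) The model shrinks the image width by a factor of 8 before the CTC loss is computed. This is a strong constraint: the width of the training images has to be in a suitable proportion to the length of the text sequence in the image, otherwise the CTC loss cannot be computed and comes out as inf.

    (4) The original training data is very regular: every image is 32*280 and every label has exactly 10 characters. Our own images (or open-source data) have variable widths and text lengths, so to keep the image width consistent within a batch there are two strategies. The first is to resize everything to the same height and width, making sure the width satisfies: width // 8 >= number of characters + total number of consecutive repeated characters. The second is to pad each batch to the width of its widest image.

    I tested both strategies on datasets with large variation in image width and heavy affine distortion, and neither worked well. Strategy 1 distorts the text in the image. Strategy 2 introduces large blank, non-text regions, especially for images much narrower than the widest one in the batch, which makes them hard for the model to recognize.

    In summary: if you want to train on a dataset with inconsistent widths, poor aspect ratios, or strongly skewed images, I suggest giving up on this repo and choosing another one (for example a CRNN-based one).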

    opened by shining-love 5
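The second strategy described in the comment above, padding every image in a batch to the width of the widest one, can be sketched without any dependencies. Images are represented here as lists of rows of grayscale values; the fill value of 255 (white) and the name `pad_batch_to_max_width` are assumptions made for illustration, not part of this repository.

```python
def pad_batch_to_max_width(batch, fill=255):
    """Right-pad each image (a list of equal-length rows) to the maximum
    width found in the batch so that all images share one shape."""
    max_width = max(len(img[0]) for img in batch)
    padded = []
    for img in batch:
        extra = max_width - len(img[0])
        padded.append([list(row) + [fill] * extra for row in img])
    return padded


if __name__ == "__main__":
    narrow = [[0, 0], [0, 0]]              # 2 rows x 2 columns
    wide = [[1, 1, 1, 1], [1, 1, 1, 1]]    # 2 rows x 4 columns
    for img in pad_batch_to_max_width([narrow, wide]):
        print(img)
```

As the comment notes, the padded area grows with the width gap inside a batch, so grouping images of similar width into the same batch is one way to limit it.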
Owner
Yang Chenguang