text_recognition_toolbox: a uniform PyTorch reimplementation of a series of classical scene text recognition papers.

Last update: Dec 24, 2022

Overview

text recognition toolbox

1. Project introduction

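This project is built on the PyTorch deep learning framework and reimplements the following 6 classical text recognition papers in a uniform style. Paper details are listed below. The project will keep being updated, and issues and code contributions are welcome.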

Model	Paper title	Year	Approach
CRNN	"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition"	2017	CNN + BiLSTM + CTC
GRCNN	"Gated recurrent convolution neural network for OCR"	2017	Gated Recurrent Convolution Layer + BiLSTM + CTC
FAN	"Focusing attention: Towards accurate text recognition in natural images"	2017	Focusing network + 1D attention
SAR	"Show, attend and read: A simple and strong baseline for irregular text recognition"	2019	ResNet + 2D attention
DAN	"Decoupled attention network for text recognition"	2020	FCN + convolutional alignment module
SATRN	"On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention"	2020	Transformer

2. Usage

2.1 Requirements

torch==1.3.0
numpy==1.17.3
lmdb==0.98
opencv-python==3.4.5.20

2.2 Training

Data preparation

First prepare the training data. Only lmdb-format data is currently supported; convert it as follows:

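Prepare the image dataset: text-line images cropped according to the detection boxes.
Prepare label.txt; each line of the annotation file must follow this format: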

1.jpg 文字检测
2.jpg 文字识别

Convert the dataset to lmdb format:

python3 tools/create_lmdb_dataset.py --inputPath {image_dataset_path} --gtFile {label_file_path} --outputPath {lmdb_output_path}
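The conversion script itself is not reproduced here. As a hedged sketch of the label format above, assuming each line holds an image filename, whitespace, then the label text (the README does not state the exact separator beyond the example), the annotation lines could be parsed into (image path, label) pairs before being written to lmdb. `parse_label_lines` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def parse_label_lines(lines, image_dir):
    """Hypothetical helper: turn 'filename label' lines into (image_path, label) pairs.

    Splits on the first run of whitespace only, so a label keeps any inner spaces.
    Blank lines are skipped; malformed lines raise ValueError.
    """
    samples = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore empty lines
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected 'filename label', got {line!r}")
        filename, label = parts
        samples.append((Path(image_dir) / filename, label))
    return samples


# Usage (assumed file layout):
# with open("label.txt", encoding="utf-8") as f:
#     samples = parse_label_lines(f, "images/")
```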

Configuration file

Each model currently has its own configuration file. Using CRNN as an example, the main parameters are:

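Top-level key	Sub-key	Meaning	Notes
TrainReader	dataloader	Custom DataLoader class
	select_data	lmdb datasets to use	Defaults to '/', i.e. all lmdb datasets under {lmdb_sets_dir}. To control the share of each dataset within a batch, combine it with {batch_ratio} and separate the dataset names with '-', e.g. 'dataset1-dataset2-dataset3'
	batch_ratio	Share of each lmdb dataset within a batch	Used together with {select_data}; separate the ratios with '-', e.g. '0.3-0.3-0.4', so dataset 1 contributes batch_size * 0.3 samples, and likewise for the others
	total_data_usage_ratio	Fraction of the overall data to use	Defaults to 1.0, i.e. all data
	padding	Whether to pad the images	Defaults to True; set to False to resize instead
Global	highest_acc_save_type	Whether to save only the model with the highest accuracy	Defaults to False
	resumed_optimizer	Whether to load a previously saved optimizer state	Defaults to False
	batch_max_length	Maximum label length	Training samples with longer labels are filtered out
	eval_batch_step	Interval, in steps, at which the model is saved
Architecture	function	Model to use	'CRNN' here
SeqRNN	input_size	LSTM input size	Equal to the number of channels output by the backbone
	hidden_size	LSTM hidden size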
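To make the select_data / batch_ratio interaction concrete, here is a small sketch of how one batch could be divided among datasets. `split_batch` is hypothetical; rounding each share and giving every dataset at least one sample are assumptions, and the default select_data='/' (use every dataset) is not handled.

```python
def split_batch(batch_size, select_data, batch_ratio):
    """Hypothetical helper: number of samples each dataset contributes to one batch.

    select_data: dataset names joined by '-', e.g. 'ArT_train-LSVT_train'
    batch_ratio: ratios joined by '-', e.g. '0.5-0.5'
    Dataset i gets round(batch_size * ratio_i) samples, with a floor of 1 (assumed).
    """
    names = select_data.split("-")
    ratios = [float(r) for r in batch_ratio.split("-")]
    if len(names) != len(ratios):
        raise ValueError("select_data and batch_ratio must list the same number of items")
    return {name: max(round(batch_size * r), 1) for name, r in zip(names, ratios)}


print(split_batch(128, "ArT_train-LSVT_train-ReCTS_train", "0.3-0.3-0.4"))
```

With batch_size 128 and '0.3-0.3-0.4', rounding gives 38 + 38 + 51 = 127 samples, so per-dataset rounding can leave the combined batch slightly smaller than batch_size.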

Model training

Once the configuration above is done, start training with:

python train.py -c configs/CRNN.yml

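2.3 Prediction

Configuration file

Prediction likewise uses a per-model configuration file. Using CRNN as an example, the parameters to modify are:

Top-level key	Sub-key	Meaning	Notes
Global	pretrain_weights	Path to the model weights	Keep the remaining parameters the same as for training
	infer_img	Image(s) to predict: either a folder or an image path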
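Since infer_img may name either a folder or a single image, a prediction script has to expand it into a list of files. A minimal sketch, assuming a small set of common image extensions; both the extension list and the helper name `collect_infer_images` are assumptions, not taken from predict.py.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed set of accepted extensions


def collect_infer_images(infer_img):
    """Hypothetical helper: expand infer_img (a folder or one image path) into a sorted list."""
    path = Path(infer_img)
    if path.is_dir():
        # keep only files with an image extension, in a stable order
        return sorted(p for p in path.iterdir() if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)
    if path.is_file():
        return [path]
    raise FileNotFoundError(f"infer_img does not exist: {infer_img}")
```

Sorting keeps the prediction order deterministic across runs, which makes outputs easier to compare.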

Running prediction

Once configured, run prediction with:

python predict.py -c configs/CRNN.yml

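3. Pretrained models

The following 5 open-source Chinese natural-scene datasets can be used directly for training with the model configurations above:

Dataset	Baidu Netdisk link	Notes
5 natural-scene training sets in total: ArT_train, LSVT_train, MTWI_train, RCTW17_train, ReCTS_train, plus one natural-scene validation set: ReCTS_val	Link: https://pan.baidu.com/s/1fvExHzeojA_Yhj3_wDflwA Extraction code: kzrd	"train" denotes a training set, "val" the validation set

Pretrained models for 5 of the algorithms are listed below; see the experimental setup in Section 4 for training details:

Model	Baidu Netdisk link	Notes
5 pretrained models in total: CRNN.pth, GRCNN.pth, FAN.pth, DAN.pth, SAR.pth, plus a dictionary file: keys.txt	Link: https://pan.baidu.com/s/1IG-1lxytrOqry9c5Nc1GzQ Extraction code: k3ij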

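4. Experimental results

For the 5 algorithms reproduced so far, I ran a comparison using the same datasets and parameter settings. The setup and results are as follows:

Experimental setup

Setting	Details	Notes
Training set	ArT_train: 44663; LSVT_train: 218552; MTWI_train: 79964; RCTW17_train: 33342; ReCTS_train: 83119	All 5 are open-source natural-scene datasets; preprocessing included removing blurry samples
Validation set	ReCTS_val: 9231	The test set is a validation split taken from ReCTS at a 9:1 ratio; note that ReCTS is mostly horizontal text
batch_size	128
img_shape	[1, 32, 256]	Images are scaled keeping their aspect ratio; widths below 256 are padded, widths above 256 are resized to 256
optimizer	function: adam; base_lr: 0.001; momentum: 0.9; weight_decay: 1.0e-4
iter	60000	60000 training steps in total, with validation every 2000 steps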
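The img_shape note can be made concrete with a little arithmetic. The sketch below assumes the height is scaled to 32 first, the scaled width is rounded up, and padding fills the remaining columns; rounding up mirrors a common scene-text data loader but is an assumption here, as is the helper name `resize_and_pad`.

```python
import math


def resize_and_pad(width, height, target_h=32, max_w=256):
    """Return (resized_width, pad_width) for an image of size width x height.

    The image is scaled so its height becomes target_h, keeping the aspect ratio.
    If the scaled width exceeds max_w it is squeezed to max_w (no padding);
    otherwise the remaining max_w - resized_width columns are padded.
    """
    scaled_w = math.ceil(target_h * width / height)
    resized_w = min(scaled_w, max_w)
    return resized_w, max_w - resized_w


print(resize_and_pad(100, 32))   # (100, 156): short line, padded
print(resize_and_pad(64, 16))    # (128, 128): height doubled, so width doubled too
print(resize_and_pad(1000, 32))  # (256, 0): too wide, resized down to 256
```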

Results

Algorithm	Highest accuracy (%)	Highest normalized edit distance score	Model size
CRNN	59.89	0.7959	120M
GRCNN	70.51	0.8597	78M
FAN	75.78	0.8924	764M
SAR	78.13	0.9037	722M
DAN	78.99	0.9064	639M
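The README does not define the normalized edit distance column. Because the values rise together with accuracy, it is presumably a similarity score. One widely used definition (for example in the ICDAR 2019 ArT and LSVT challenges) is 1 - ED(pred, gt) / max(len(pred), len(gt)), averaged over samples. The sketch below implements that definition; whether this repository computes exactly the same quantity is an assumption.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance (insertion, deletion, substitution each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def normalized_edit_score(preds, gts):
    """Mean of 1 - ED / max(len(pred), len(gt)); 1.0 means every prediction is exact."""
    if not gts:
        raise ValueError("need at least one ground-truth string")
    total = 0.0
    for pred, gt in zip(preds, gts):
        longest = max(len(pred), len(gt))
        total += 1.0 if longest == 0 else 1.0 - levenshtein(pred, gt) / longest
    return total / len(gts)


print(normalized_edit_score(["文字识别"], ["文字检测"]))  # 0.5: 2 of 4 characters wrong
```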

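The figure below shows each algorithm's accuracy on the validation set, evaluated every 2000 steps:

Sample predictions

Algorithm	Prediction	Notes
CRNN		All predictions come from the checkpoint with the highest validation accuracy; the left column is the prediction and the right column the ground-truth label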
GRCNN
FAN
SAR
DAN

Comments

run predict.py error

    model_infor networks.DAN,DAN
    Currently processing image: 2.jpg
    Traceback (most recent call last):
      File "predict.py", line 110, in <module>
        preds_str = text_recognizer(image)
      File "predict.py", line 87, in __call__
        preds_str = self.predict(image_tensor)
      File "predict.py", line 79, in predict
        outputs = self.model(image_tensor)
      File "/ntt/Anaconda3/envs/ocr-py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/ntt/Anaconda3/envs/ocr-py38/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
        outputs = self.parallel_apply(replicas, inputs, kwargs)
      File "/ntt/Anaconda3/envs/ocr-py38/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
        return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
      File "/ntt/Anaconda3/envs/ocr-py38/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
        output.reraise()
      File "/ntt/Anaconda3/envs/ocr-py38/lib/python3.8/site-packages/torch/_utils.py", line 395, in reraise
        raise self.exc_type(msg)
    TypeError: Caught TypeError in replica 0 on device 0.
    Original Traceback (most recent call last):
      File "/ntt/Anaconda3/envs/ocr-py38/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
        output = module(*input, **kwargs)
      File "/ntt/Anaconda3/envs/ocr-py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
    TypeError: forward() missing 1 required positional argument: 'text'

opened by yl0911 3
Log printing issue

How can I stop the logs from being printed and saved?

    2022-08-27 16:54:38,947 [DEBUG] STREAM b'IHDR' 16 13
    2022-08-27 16:54:38,947 [DEBUG] STREAM b'IHDR' 16 13
    2022-08-27 16:54:38,947 [DEBUG] STREAM b'IHDR' 16 13
    2022-08-27 16:54:38,947 [DEBUG] STREAM b'IHDR' 16 13
    2022-08-27 16:54:38,946 [DEBUG] STREAM b'IHDR' 16 13
    2022-08-27 16:54:38,947 [DEBUG] STREAM b'IHDR' 16 13
    2022-08-27 16:54:38,947 [DEBUG] STREAM b'zTXt' 41 4435
    2022-08-27 16:54:38,947 [DEBUG] STREAM b'IHDR' 16 13
    2022-08-27 16:54:38,947 [DEBUG] STREAM b'zTXt' 41 4663
    2022-08-27 16:54:38,947 [DEBUG] STREAM b'zTXt' 41 7366
    2022-08-27 16:54:38,947 [DEBUG] STREAM b'zTXt' 41 4360
    2022-08-27 16:54:38,947 [DEBUG] STREAM b'IDAT' 4655 3808
    2022-08-27 16:54:38,947 [DEBUG] STREAM b'zTXt' 41 5427
    2022-08-27 16:54:38,947 [DEBUG] STREAM b'zTXt' 41 3699
    2022-08-27 16:54:38,947 [DEBUG] STREAM b'zTXt' 41 4984
    2022-08-27 16:54:38,948 [DEBUG] STREAM b'zTXt' 41 4005

opened by Fyzjym 2
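The `STREAM b'IHDR'` lines are DEBUG messages emitted by Pillow's PNG plugin; they appear when the root logger is configured at DEBUG level. Assuming the project logs through Python's standard `logging` module (its logger setup is not shown here), one targeted fix is to raise the level of the `PIL` logger, which also covers child loggers such as `PIL.PngImagePlugin`, while leaving the project's own messages alone. The helper name `quiet_third_party_loggers` is hypothetical.

```python
import logging


def quiet_third_party_loggers(names=("PIL",), level=logging.INFO):
    """Raise the level of noisy third-party loggers; child loggers inherit it."""
    for name in names:
        logging.getLogger(name).setLevel(level)


# Call once, after the project's own logging setup.
quiet_third_party_loggers()
```

To switch logging off entirely, including anything written to a log file, `logging.disable(logging.CRITICAL)` suppresses every message at CRITICAL and below; that is blunter and also hides the training logs.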
Error when running python predict.py -c configs/FAN.yml

Hi, running python predict.py -c configs/FAN.yml raises an error. How should I fix it? You replied under another issue that CRNN and FAN had been tested; is there anything else I need to change? Error log:

      File "predict.py", line 87, in __call__
        preds_str = self.predict(image_tensor)
      File "predict.py", line 79, in predict
        outputs = self.model(image_tensor)
      File "D:\anaconda\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "D:\anaconda\lib\site-packages\torch\nn\parallel\data_parallel.py", line 149, in forward
        return self.module(*inputs, **kwargs)
      File "D:\anaconda\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
    TypeError: forward() missing 1 required positional argument: 'text'

opened by ZZHHogan 1
What is the value of 'num_steps' when predicting a new image?

https://github.com/chibohe/text_recognition_toolbox/blob/5ef1261ec436ee564eb977dcad84ca60b19eaa93/networks/DAN.py#L298

Maybe this DAN model has a bug when predicting a new image???

opened by Johnson-yue 1
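Questions like this one, and the `forward() missing 1 required positional argument: 'text'` reports, usually come down to the same point: at inference there is no ground-truth text, so an attention decoder has to run for a fixed number of steps and be cut at an end-of-sequence token. A common choice is `batch_max_length + 1` steps, leaving room for the end token after a maximum-length label. The sketch below shows that loop in plain Python; `greedy_decode`, `step_fn`, and `EOS_IDX` are hypothetical, and the sketch has not been checked against networks/DAN.py.

```python
EOS_IDX = 0  # hypothetical index of the end-of-sequence token in the character dictionary


def greedy_decode(step_fn, batch_max_length):
    """Greedy attention decoding without ground-truth text.

    step_fn(t, prev_token) returns a list of scores over the character set for step t;
    prev_token is None at the first step. Decoding always runs batch_max_length + 1
    steps, then the output is truncated at the first EOS_IDX.
    """
    num_steps = batch_max_length + 1  # one extra step so a max-length label can still end with EOS
    tokens, prev = [], None
    for t in range(num_steps):
        scores = step_fn(t, prev)
        prev = max(range(len(scores)), key=scores.__getitem__)  # argmax over the character set
        tokens.append(prev)
    return tokens[:tokens.index(EOS_IDX)] if EOS_IDX in tokens else tokens


if __name__ == "__main__":
    script = [3, 5, 0, 2, 2, 2]  # fake decoder: emits 3, 5, then EOS, then noise

    def fake_step(t, prev):
        return [1.0 if k == script[t] else 0.0 for k in range(6)]

    print(greedy_decode(fake_step, batch_max_length=5))  # [3, 5]
```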

issue

2022-05-30 21:06:41,349 [INFO ] dataset_root: G:/LRK/text_recognition_toolbox-main/dataset1/train dataset: / sub-directory: /. num samples: 0 num total samples of total dataset is 0

The lmdb files cannot be read at runtime.

The error is raised in:

    def get_batch(self):
        batch = {'img': [], 'label': []}

        for i, data_loader_iter in enumerate(self.dataloader_iter_list):  # enumerate() pairs each iterator with its index
            try:
                image, text = next(data_loader_iter)  # next() returns the iterator's next item
                batch['img'].append(image)
                batch['label'] += text
            except StopIteration:
                # this loader is exhausted: restart it and take its first batch again
                self.dataloader_iter_list[i] = iter(self.data_loader_list[i])
                image, text = next(self.dataloader_iter_list[i])
                batch['img'].append(image)
                batch['label'] += text
            # except ValueError:
            #     pass
        # (rest of the method omitted in the issue)

Could you help explain this? Thanks.

opened by kkrl1111 0
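The log line `num samples: 0` means no samples were found under the configured dataset root, and the handler in `get_batch` then likely fails as well, because a freshly created iterator over an empty loader has nothing to return either. A quick check, sketched as the hypothetical helper `find_lmdb_dirs`, is to list every folder under lmdb_sets_dir that contains `data.mdb`, the data file LMDB creates for each environment. An empty result suggests the path points at the wrong directory or the conversion step has not been run.

```python
import os


def find_lmdb_dirs(root):
    """Hypothetical helper: sorted list of directories under root that contain data.mdb."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "data.mdb" in filenames:
            found.append(dirpath)
    return sorted(found)


# Usage: print(find_lmdb_dirs("G:/LRK/text_recognition_toolbox-main/dataset1/train"))
```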

Training DAN fails with compress_layer: True

My image size is [3, 128, 256] and I set compress_layer to True. When running python train.py -c config/DAN.yml, the error message is:

    assert (scales[i-1][1] / scales[i][1]) % 1 == 0, 'layer scale error from {} to {}'.format(i-1, scales[i-1][1] , i, scales[i][1])
    AssertionError: layer scale error from 1 32 to 2 30

But with compress_layer set to False, python train.py -c config/DAN.yml runs fine.

opened by maxh2010 1
Prediction fails for the DAN model

Hi, I trained the DAN model and got good performance, but when I run prediction with the pretrained model I get an error: forward() missing 1 required positional argument: 'text'

I checked networks/DAN.py, line 42:

Why does prediction need the target text?

opened by Johnson-yue 4
Error when running the demo

RuntimeError: Error(s) in loading state_dict for DAN: Missing key(s) in state_dict: "feature_extractor.conv1.weight", "feature_extractor.bn1.weight", "

DAN settings:

    TrainReader:
      dataloader: dataset,BatchBalancedDataset
      select_data: '/'
      batch_ratio: '1.0'
      total_data_usage_ratio: 1.0
      padding: True
      augment: False
      batch_size: 64
      shuffle: True
      num_workers: 0
      lmdb_sets_dir: train_set  ### training folder downloaded from Baidu Netdisk

    EvalReader:
      dataloader: dataset,evaldataloader
      select_data: '/'
      batch_size: 2
      padding: True
      shuffle: True
      num_workers: 0
      lmdb_sets_dir: test_set  ### test folder downloaded from Baidu Netdisk

    TestReader:
      dataloader: dataset,evaldataloader
      select_data: '/'
      batch_size: 64
      padding: True
      shuffle: True
      num_workers: 0
      lmdb_sets_dir:

    Global:
      algorithm: DAN
      use_gpu: True
      gpu_num: '0'
      device: cuda:0
      num_iters: 800000
      highest_acc_save_type: False
      data_filtering_off: False
      resumed_optimizer: False
      batch_max_length: 50
      print_batch_step: 10
      save_model_dir: output/DAN
      eval_batch_step: 2000
      image_shape: [1, 32, 256]
      character_type: ch
      loss_type: attn
      use_space_char: false
      character_dict_path: keys.txt
      seed: 1234
      pretrain_weights: models/DAN.pth  #### model file downloaded from Baidu Netdisk
      save_inference_dir: results
      infer_img: test_pic  ### folder containing the test images

    Architecture:
      function: networks.DAN,DAN
      compress_layer: False
      layers: [3, 4, 6, 6, 3]

    CAM:
      depth: 8
      num_channel: 512

    Loss:
      function: loss,AttnLoss
      blank_idx: 0

    Optimizer:
      function: adam
      base_lr: 0.001
      momentum: 0.9
      weight_decay: 1.0e-4
      lr_decay_epoch: 10
      max_epoch: 1000

PyTorch version 1.3, Python 3.7

opened by xycim 3
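A frequent cause of `Missing key(s) in state_dict` errors (not confirmed for this report) is a checkpoint saved from a model wrapped in `torch.nn.DataParallel`: its keys carry a `module.` prefix, such as `module.feature_extractor.conv1.weight`, while the bare DAN model expects `feature_extractor.conv1.weight`. Printing a few checkpoint keys shows whether that is the case; if so, stripping the prefix before `load_state_dict` is one workaround. The helper `strip_prefix` is hypothetical and works on any mapping, and the commented usage assumes the .pth file holds a plain state dict.

```python
from collections import OrderedDict


def strip_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with `prefix` removed from every key that starts with it."""
    return OrderedDict(
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in state_dict.items()
    )


# Usage with PyTorch (sketch; assumes DAN.pth stores a plain state dict):
#   import torch
#   checkpoint = torch.load("models/DAN.pth", map_location="cpu")
#   print(list(checkpoint)[:3])  # inspect the key names first
#   model.load_state_dict(strip_prefix(checkpoint))
```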
