Text recognition (optical character recognition) with deep learning methods.

Overview

What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis

paper | training and evaluation data | failure cases and cleansed labels | pretrained model | Baidu version (password: rryk)

Official PyTorch implementation of our four-stage STR framework, into which most existing STR models fit.
Using this framework, we can analyze the module-wise contributions to performance in terms of accuracy, speed, and memory demand under one consistent set of training and evaluation datasets.
Such analyses remove the hindrance of inconsistent comparisons and clarify the performance gain contributed by each existing module.

Honors

Based on this framework, we recorded 1st place in ICDAR2013 focused scene text and ICDAR2019 ArT, and 3rd place in ICDAR2017 COCO-Text and ICDAR2019 ReCTS (task 1).
The differences between our paper and the ICDAR challenge settings are summarized here.

Updates

Aug 3, 2020: added a guideline for using Baidu warpctc, which reproduces the CTC results of our paper.
Dec 27, 2019: added FLOPS to our paper, plus minor updates such as log_dataset.txt and ICDAR2019-NormalizedED.
Oct 22, 2019: added confidence scores and cleaned up the output format of the training logs.
Jul 31, 2019: the paper was accepted at the International Conference on Computer Vision (ICCV), Seoul 2019, as an oral talk.
Jul 25, 2019: added code for floating-point 16 (fp16) calculation; check @YacobBY's pull request.
Jul 16, 2019: added the ST_spe.zip dataset, whose word images contain special characters from the SynthText (ST) dataset; see this issue.
Jun 24, 2019: added gt.txt of failure cases, which contains the path and label of each image; see image_release_190624.zip.
May 17, 2019: uploaded resources to Baidu Netdisk as well, and added Run demo (check @sharavsambuu's Colab demo as well).
May 9, 2019: PyTorch version updated from 1.0.1 to 1.1.0, using torch.nn.CTCLoss instead of torch-baidu-ctc, plus various minor updates.

Getting Started

Dependency

  • This work was tested with PyTorch 1.3.1, CUDA 10.1, Python 3.6 and Ubuntu 16.04.
    You may need pip3 install torch==1.3.1.
    In the paper, experiments were performed with PyTorch 0.4.1 and CUDA 9.0.
  • requirements: lmdb, pillow, torchvision, nltk, natsort
pip3 install lmdb pillow torchvision nltk natsort

Download the lmdb dataset for training and evaluation from here

data_lmdb_release.zip contains the following:
training datasets: MJSynth (MJ)[1] and SynthText (ST)[2]
validation datasets: the union of the training sets of IC13[3], IC15[4], IIIT[5], and SVT[6]
evaluation datasets: the benchmark evaluation datasets, consisting of IIIT[5], SVT[6], IC03[7], IC13[3], IC15[4], SVTP[8], and CUTE[9]
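
The released lmdb folders follow the key layout written by create_lmdb_dataset.py ('num-samples', 'image-%09d', 'label-%09d'); below is a minimal inspection sketch under that assumption. The sub-folder path is only an example — point it at whichever folder inside data_lmdb_release you actually downloaded.

import io
import lmdb
from PIL import Image

# minimal sketch: peek at a few samples in one lmdb folder
# (the path below is an example; adjust it to your local layout)
lmdb_path = 'data_lmdb_release/training/MJ'
env = lmdb.open(lmdb_path, readonly=True, lock=False)
with env.begin() as txn:
    n_samples = int(txn.get('num-samples'.encode()))
    print('num samples:', n_samples)
    for index in range(1, min(n_samples, 5) + 1):  # keys are 1-indexed
        label = txn.get(('label-%09d' % index).encode()).decode('utf-8')
        image_bytes = txn.get(('image-%09d' % index).encode())
        image = Image.open(io.BytesIO(image_bytes)).convert('RGB')
        print(index, image.size, label)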

Run demo with pretrained model

  1. Download the pretrained model from here
  2. Add the image files to test into demo_image/
  3. Run demo.py (add the --sensitive option if you use the case-sensitive model)
CUDA_VISIBLE_DEVICES=0 python3 demo.py \
--Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn \
--image_folder demo_image/ \
--saved_model TPS-ResNet-BiLSTM-Attn.pth

prediction results

(demo image thumbnails omitted)

TRBA (TPS-ResNet-BiLSTM-Attn) | TRBA (case-sensitive version)
available | Available
shakeshack | SHARESHACK
london | Londen
greenstead | Greenstead
toast | TOAST
merry | MERRY
underground | underground
ronaldo | RONALDO
bally | BALLY
university | UNIVERSITY

Training and evaluation

  1. Train the CRNN[10] model
CUDA_VISIBLE_DEVICES=0 python3 train.py \
--train_data data_lmdb_release/training --valid_data data_lmdb_release/validation \
--select_data MJ-ST --batch_ratio 0.5-0.5 \
--Transformation None --FeatureExtraction VGG --SequenceModeling BiLSTM --Prediction CTC
  2. Test the CRNN[10] model. If you want to evaluate IC15-2077, check the data filtering part.
CUDA_VISIBLE_DEVICES=0 python3 test.py \
--eval_data data_lmdb_release/evaluation --benchmark_all_eval \
--Transformation None --FeatureExtraction VGG --SequenceModeling BiLSTM --Prediction CTC \
--saved_model saved_models/None-VGG-BiLSTM-CTC-Seed1111/best_accuracy.pth
  3. Try training and testing our best accuracy model, TRBA (TPS-ResNet-BiLSTM-Attn), as well (download the pretrained model).
CUDA_VISIBLE_DEVICES=0 python3 train.py \
--train_data data_lmdb_release/training --valid_data data_lmdb_release/validation \
--select_data MJ-ST --batch_ratio 0.5-0.5 \
--Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn
CUDA_VISIBLE_DEVICES=0 python3 test.py \
--eval_data data_lmdb_release/evaluation --benchmark_all_eval \
--Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn \
--saved_model saved_models/TPS-ResNet-BiLSTM-Attn-Seed1111/best_accuracy.pth

Arguments

  • --train_data: folder path to training lmdb dataset.
  • --valid_data: folder path to validation lmdb dataset.
  • --eval_data: folder path to evaluation (with test.py) lmdb dataset.
  • --select_data: select the training data. The default is MJ-ST, which means MJ and ST are used as training data.
  • --batch_ratio: assign the ratio of each selected dataset within a batch. The default is 0.5-0.5, which means 50% of the batch is filled with MJ and the other 50% with ST (see the sketch after this list).
  • --data_filtering_off: skip data filtering when creating LmdbDataset.
  • --Transformation: select Transformation module [None | TPS].
  • --FeatureExtraction: select FeatureExtraction module [VGG | RCNN | ResNet].
  • --SequenceModeling: select SequenceModeling module [None | BiLSTM].
  • --Prediction: select Prediction module [CTC | Attn].
  • --saved_model: assign a saved model for evaluation.
  • --benchmark_all_eval: evaluate with the 10 evaluation dataset versions, the same as Table 1 in our paper.
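
To make --select_data and --batch_ratio concrete, here is a small illustrative sketch (not part of the repository) of how a batch is split between the selected sources; with the batch size of 192 that appears in the training logs below and the default 0.5-0.5 ratio, MJ and ST each fill 96 slots of every batch.

# illustrative sketch: how --select_data / --batch_ratio divide one batch
batch_size = 192                                # batch size seen in the training logs
select_data = 'MJ-ST'.split('-')                # ['MJ', 'ST']
batch_ratio = [float(r) for r in '0.5-0.5'.split('-')]

for name, ratio in zip(select_data, batch_ratio):
    per_source = int(batch_size * ratio)
    print('%s: %d samples per batch' % (name, per_source))  # MJ: 96, ST: 96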

Download failure cases and cleansed label from here

image_release.zip contains failure case images and benchmark evaluation images with cleansed labels.

When you need to train on your own dataset or a non-Latin language dataset:

  1. Create your own lmdb dataset.
pip3 install fire
python3 create_lmdb_dataset.py --inputPath data/ --gtFile data/gt.txt --outputPath result/

The structure of the data folder is as below.

data
├── gt.txt
└── test
    ├── word_1.png
    ├── word_2.png
    ├── word_3.png
    └── ...

At this time, each line of gt.txt should be {imagepath}\t{label}\n (tab-separated).
For example (a small sketch for generating such a file appears after this list):

test/word_1.png Tiredness
test/word_2.png kills
test/word_3.png A
...
  2. Modify --select_data, --batch_ratio, and opt.character; see this issue.
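
A minimal sketch, under the folder layout described above, for writing gt.txt in the required tab-separated format; the label mapping here is hypothetical and only for illustration.

import os

# minimal sketch: write data/gt.txt as '{imagepath}\t{label}\n' lines
# 'labels' is a hypothetical mapping from image filename to transcription
labels = {'word_1.png': 'Tiredness', 'word_2.png': 'kills', 'word_3.png': 'A'}

with open('data/gt.txt', 'w', encoding='utf-8') as f:
    for filename, label in labels.items():
        image_path = os.path.join('test', filename)  # path relative to data/
        f.write('%s\t%s\n' % (image_path, label))

After writing gt.txt this way, run create_lmdb_dataset.py as in step 1 with --gtFile pointing at this file.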

Acknowledgements

This implementation is based on the repositories crnn.pytorch and ocr_attention.

Reference

[1] M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman. Synthetic data and artificial neural networks for natural scene text recognition. In Workshop on Deep Learning, NIPS, 2014.
[2] A. Gupta, A. Vedaldi, and A. Zisserman. Synthetic data for text localisation in natural images. In CVPR, 2016.
[3] D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. G. i Bigorda, S. R. Mestre, J. Mas, D. F. Mota, J. A. Almazan, and L. P. De Las Heras. ICDAR 2013 robust reading competition. In ICDAR, pages 1484–1493, 2013.
[4] D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. Ghosh, A. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, et al. ICDAR 2015 competition on robust reading. In ICDAR, pages 1156–1160, 2015.
[5] A. Mishra, K. Alahari, and C. Jawahar. Scene text recognition using higher order language priors. In BMVC, 2012.
[6] K. Wang, B. Babenko, and S. Belongie. End-to-end scene text recognition. In ICCV, pages 1457–1464, 2011.
[7] S. M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, and R. Young. ICDAR 2003 robust reading competitions. In ICDAR, pages 682–687, 2003.
[8] T. Q. Phan, P. Shivakumara, S. Tian, and C. L. Tan. Recognizing text with perspective distortion in natural scenes. In ICCV, pages 569–576, 2013.
[9] A. Risnumawan, P. Shivakumara, C. S. Chan, and C. L. Tan. A robust arbitrary text detection system for natural scene images. In ESWA, volume 41, pages 8027–8048, 2014.
[10] B. Shi, X. Bai, and C. Yao. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. In TPAMI, volume 39, pages 2298–2304, 2017.

Citation

Please consider citing this work in your publications if it helps your research.

@inproceedings{baek2019STRcomparisons,
  title={What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis},
  author={Baek, Jeonghun and Kim, Geewook and Lee, Junyeop and Park, Sungrae and Han, Dongyoon and Yun, Sangdoo and Oh, Seong Joon and Lee, Hwalsuk},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year={2019},
  pubstate={published},
  tppubtype={inproceedings}
}

Contact

Feel free to contact us if there are any questions:
for code/paper: Jeonghun Baek [email protected]; for collaboration: [email protected] (our team leader).

License

Copyright (c) 2019-present NAVER Corp.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Comments
  • To train on my own dataset

    To train on my own dataset

    Hi. I created an lmdb dataset from my own data by running create_lmdb_dataset.py, then ran the train command on it and got the following output:

    CUDA_VISIBLE_DEVICES=0 python3 train.py --train_data result/train --valid_data result/test --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn

    dataset_root: result/train opt.select_data: ['MJ', 'ST'] opt.batch_ratio: ['0.5', '0.5']

    dataset_root: result/train dataset: MJ
    Traceback (most recent call last):
      File "train.py", line 283, in <module>
        train(opt)
      File "train.py", line 26, in train
        train_dataset = Batch_Balanced_Dataset(opt)
      File "/home/mor-ai/Work/deep-text-recognition-benchmark/dataset.py", line 37, in __init__
        _dataset = hierarchical_dataset(root=opt.train_data, opt=opt, select_data=[selected_d])
      File "/home/mor-ai/Work/deep-text-recognition-benchmark/dataset.py", line 106, in hierarchical_dataset
        concatenated_dataset = ConcatDataset(dataset_list)
      File "/home/mor-ai/.local/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 187, in __init__
        assert len(datasets) > 0, 'datasets should not be an empty iterable'
    AssertionError: datasets should not be an empty iterable

    Can you help me resolve this?

    opened by xxxpsyduck 24
  • recog error using TPS-ResNet, VGG-BiLSTM-Attn

    recog error using TPS-ResNet, VGG-BiLSTM-Attn

    The sample images:
    d_autohomecar__wKgHPltYXyWAZHTbAANpvWtj5Hs964_0 → 和兰豪华感,而风上的 (wrong)
    d_autohomecar__wKgHPltYXyWAZHTbAANpvWtj5Hs964_1 → 内饰设局觉觉温馨范儿 (wrong)
    d_autohomecar__wKgHPlstHCiANf7JAAGvh7L-4DU249_1 → 后扭力梁非独立悬架 (correct)
    d_autohomecar__wKgHPlt2pX6AQNrVAANoQlFZZXQ045_0 → 变之水波落务变得更出 (wrong)

    I trained the model with 32x256 inputs and set batch_max_length=64 (for both test and train). Something seems wrong: when a sample contains many characters, the result is wrong.

    The training dataset is normal.

    Thanks

    opened by AnddyWang 16
  • Accuracy difference between local retraining model and pretrained one

    Accuracy difference between local retraining model and pretrained one

    First, thanks for your great work :) ! You've done a good job!

    Here's my question: I've retrained the model with the options "--select_data MJ-ST --batch_ratio 0.5-0.5 --Transformation None --FeatureExtraction VGG --SequenceModeling BiLSTM --Prediction CTC", corresponding to the original version of CRNN. The rest of the parameters are set to their defaults and the model is trained on the MJ and ST datasets.

    However, when testing with my locally retrained best_accuracy model, the resulting accuracy is as below: IC13_857: only 88.45% (91.1% in the paper); IC13_1015: 87.68% (89.2% in the paper); IC15_1811: 66.37% (69.4% in the paper); IC15_2077: 64.07% (64.2% in the paper).

    It seems like there is still something inappropriate in my retraining process. Should I reset the learning rate or expand my training iteration? Do you guys have any idea about improving the performance to align with the public results illustrated in the paper?

    And I've attempted to train only on MJ dataset, whose model seems to have a higher accuracy in IC13_857. When I extend the training on both MJ and ST, is it necessary to add up the iteration number, so that I can get a better accuracy?

    Looking forward to your reply ^_^

    opened by 1LOVESJohnny 11
  • How to train with custom data - AssertionError: datasets should not be an empty iterable

    How to train with custom data - AssertionError: datasets should not be an empty iterable

    I run this command: !python3 '/content/deep-text-recognition-benchmark/create_lmdb_dataset.py' --inputPath '/content/deep-text-recognition-benchmark/train' --gtFile '/content/gt.txt' --outputPath '/content/deep-text-recognition-benchmark/result'. I want to use it to detect license plates.

    inputPath contains the input images, with names like 'CJFY10.jpg'. outputPath is an empty folder. gtFile is a txt file with this format:

    C:/Users/X/Desktop/deep-text-recognition-benchmark-master/train/DVHS56.png DVHS56
    C:/Users/X/Desktop/deep-text-recognition-benchmark-master/train/DYVS72.png DYVS72
    C:/Users/X/Desktop/deep-text-recognition-benchmark-master/val/HDYP18.png HDYP18
    C:/Users/X/Desktop/deep-text-recognition-benchmark-master/val/HKHT72.png HKHT72
    C:/Users/X/Desktop/deep-text-recognition-benchmark-master/val/HPXC69.png HPXC69
    

    And the error when I run the command is:

    Traceback (most recent call last):
      File "/content/deep-text-recognition-benchmark/create_lmdb_dataset.py", line 87, in <module>
        fire.Fire(createDataset)
      File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 138, in Fire
        component_trace = _Fire(component, args, parsed_flag_args, context, name)
      File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 468, in _Fire
        target=component.__name__)
      File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 672, in _CallAndUpdateTrace
        component = fn(*varargs, **kwargs)
      File "/content/deep-text-recognition-benchmark/create_lmdb_dataset.py", line 47, in createDataset
        imagePath, label = datalist[i].strip('\n').split('\t')
    ValueError: not enough values to unpack (expected 2, got 1)
    

    What am I doing wrong? Any help is welcome.

    opened by pendex900x 8
  • validation wrong

    validation wrong

    When the code runs to valid_loss, current_accuracy, current_norm_ED, preds, confidence_score, labels, infer_time, length_of_data = validation(model, criterion, valid_loader, converter, opt), it stops there and raises no errors; the program cannot continue.

    opened by cqray1990 8
  • Can't training model with own lmdb dataset

    Can't training model with own lmdb dataset

    I have a problem training a model with my own lmdb dataset. I used create_lmdb_dataset.py with 1000 Vietnamese samples to create the database. When I train the model: dataset_root: data/training opt.select_data: ['ST'] opt.batch_ratio: ['0.5']

    dataset_root: data/training dataset: ST sub-directory: /ST num samples: 3 num total samples of ST: 3 x 1.0 (total_data_usage_ratio) = 3 num samples of ST per batch: 192 x 0.5 (batch_ratio) = 96

    Total_batch_size: 96 = 96

    Can you please tell me how to train on my own database? Thank you.

    opened by thangtran480 8
  • Difference in performance between online demo website and the offline code

    Difference in performance between online demo website and the offline code

    Hi, I have been trying to run the code on these two images:

    new_img_0

    new_img_2

    I get correct results on the demo website https://demo.ocr.clova.ai/. This is the result I get through the offline code: First Image: he11505973, Second Image: hijo86.

    opened by AyushP123 7
  • Size mismatch

    Size mismatch

    RuntimeError: Error(s) in loading state_dict for DataParallel:
        size mismatch for module.Prediction.attention_cell.rnn.weight_ih: copying a param with shape torch.Size([1024, 352]) from checkpoint, the shape in current model is torch.Size([1024, 294]).
        size mismatch for module.Prediction.generator.bias: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([38]).
        size mismatch for module.Prediction.generator.weight: copying a param with shape torch.Size([96, 256]) from checkpoint, the shape in current model is torch.Size([38, 256]).

    opened by GSATHYANARAYANA 7
  • Case Sensitive Option Not Working while running demo.py

    Case Sensitive Option Not Working while running demo.py

    I tried both the commands py demo.py --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn --image_folder demo_image/ --saved_model TPS-ResNet-BiLSTM-Attn-case-sensitive.pth & py demo.py --sensitive --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn --image_folder demo_image/ --saved_model TPS-ResNet-BiLSTM-Attn.pth

    but getting this Error:
    > RuntimeError: Error(s) in loading state_dict for DataParallel:
    >         size mismatch for module.Prediction.attention_cell.rnn.weight_ih: copying a param with shape torch.Size([1024, 352]) from checkpoint, the shape in current model is torch.Size([1024, 294]).
    >         size mismatch for module.Prediction.generator.weight: copying a param with shape torch.Size([96, 256]) from checkpoint, the shape in current model is torch.Size([38, 256]).
    >         size mismatch for module.Prediction.generator.bias: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([38]).
    
    Please tell me if I'm making a mistake.
    opened by abdur75648 6
  • Size mismatch while loading pretrained model for fine-tuning with additional characters

    Size mismatch while loading pretrained model for fine-tuning with additional characters

    Hi,

    I have followed all the necessary steps for fine-tuning (FT) as given in the other threads, but the size mismatch error keeps recurring because the number of characters in the pre-trained model and in my custom dataset is not the same. Can somebody please guide me on how to load the pre-trained model with a modified prediction layer such that it can be used for fine-tuning?

    Thanks in advance!

    opened by PrithaGanguly 6
  • path does not exist error while creating custom dataset

    path does not exist error while creating custom dataset

    opened by omersert 6
  • Unable to run demo.py (runtime error: size mismatch for module.Prediction.weight)

    Unable to run demo.py (runtime error: size mismatch for module.Prediction.weight)

    After training, I try to test with demo.py using my weights, but this error pops up and it does not run. The --character setting of train.py and demo.py is unified to the same character set, but it still does not run. Please help me.

    Error:
    RuntimeError: Error(s) in loading state_dict for DataParallel:
        size mismatch for module.Prediction.weight: copying a param with shape torch.Size([11520, 256]) from checkpoint, the shape in current model is torch.Size([11522, 256]).
        size mismatch for module.Prediction.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([11522]).

    opened by hic9507 0
  • Poor predictions when deploying a custom model on Arabic

    Poor predictions when deploying a custom model on Arabic

    Following the instructions at this link https://github.com/JaidedAI/EasyOCR/blob/master/custom_model.md, I have trained a custom model on my own dataset.

    Here is the .yml file I used:

    network_params:
      hidden_size: 512
      input_channel: 1
      output_channel: 512
      hidden_size: 512
      
    imgH: 64
    imgW: 600
    
    lang_list:
             - 'en'
    character_list: "0123456789!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~ abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ٠١٢٣٤٥٦٧٨٩«»؟،؛ءآأؤإئااًبةتثجحخدذرزسشصضطظعغفقكلمنهوىيٱٹپچڈڑژکڭگںھۀہۂۃۆۇۈۋیېےۓە"
    number: '1234567890١٢٣٤٥٦٧٨٩٠'
    

    The .py file:

    import torch
    import torch.nn as nn
    import torch.nn.init as init
    import torchvision
    from torchvision import models
    from collections import namedtuple
    from packaging import version
    
    
    def init_weights(modules):
        for m in modules:
            if isinstance(m, nn.Conv2d):
                init.xavier_uniform_(m.weight.data)
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                m.weight.data.normal_(0, 0.01)
                m.bias.data.zero_()
    
    class vgg16_bn(torch.nn.Module):
        def __init__(self, pretrained=True, freeze=True):
            super(vgg16_bn, self).__init__()
            if version.parse(torchvision.__version__) >= version.parse('0.13'):
                vgg_pretrained_features = models.vgg16_bn(
                    weights=models.VGG16_BN_Weights.DEFAULT if pretrained else None
                ).features
            else: #torchvision.__version__ < 0.13
                models.vgg.model_urls['vgg16_bn'] = models.vgg.model_urls['vgg16_bn'].replace('https://', 'http://')
                vgg_pretrained_features = models.vgg16_bn(pretrained=pretrained).features
    
            self.slice1 = torch.nn.Sequential()
            self.slice2 = torch.nn.Sequential()
            self.slice3 = torch.nn.Sequential()
            self.slice4 = torch.nn.Sequential()
            self.slice5 = torch.nn.Sequential()
            for x in range(12):         # conv2_2
                self.slice1.add_module(str(x), vgg_pretrained_features[x])
            for x in range(12, 19):         # conv3_3
                self.slice2.add_module(str(x), vgg_pretrained_features[x])
            for x in range(19, 29):         # conv4_3
                self.slice3.add_module(str(x), vgg_pretrained_features[x])
            for x in range(29, 39):         # conv5_3
                self.slice4.add_module(str(x), vgg_pretrained_features[x])
    
            # fc6, fc7 without atrous conv
            self.slice5 = torch.nn.Sequential(
                    nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
                    nn.Conv2d(512, 1024, kernel_size=3, padding=6, dilation=6),
                    nn.Conv2d(1024, 1024, kernel_size=1)
            )
    
            if not pretrained:
                init_weights(self.slice1.modules())
                init_weights(self.slice2.modules())
                init_weights(self.slice3.modules())
                init_weights(self.slice4.modules())
    
            init_weights(self.slice5.modules())        # no pretrained model for fc6 and fc7
    
            if freeze:
                for param in self.slice1.parameters():      # only first conv
                    param.requires_grad= False
    
        def forward(self, X):
            h = self.slice1(X)
            h_relu2_2 = h
            h = self.slice2(h)
            h_relu3_2 = h
            h = self.slice3(h)
            h_relu4_3 = h
            h = self.slice4(h)
            h_relu5_3 = h
            h = self.slice5(h)
            h_fc7 = h
            vgg_outputs = namedtuple("VggOutputs", ['fc7', 'relu5_3', 'relu4_3', 'relu3_2', 'relu2_2'])
            out = vgg_outputs(h_fc7, h_relu5_3, h_relu4_3, h_relu3_2, h_relu2_2)
            return out
    
    class BidirectionalLSTM(nn.Module):
    
        def __init__(self, input_size, hidden_size, output_size):
            super(BidirectionalLSTM, self).__init__()
            self.rnn = nn.LSTM(input_size, hidden_size, bidirectional=True, batch_first=True)
            self.linear = nn.Linear(hidden_size * 2, output_size)
    
        def forward(self, input):
            """
            input : visual feature [batch_size x T x input_size]
            output : contextual feature [batch_size x T x output_size]
            """
            try: # multi gpu needs this
                self.rnn.flatten_parameters()
            except: # quantization doesn't work with this 
                pass
            recurrent, _ = self.rnn(input)  # batch_size x T x input_size -> batch_size x T x (2*hidden_size)
            output = self.linear(recurrent)  # batch_size x T x output_size
            return output
    
    class VGG_FeatureExtractor(nn.Module):
    
        def __init__(self, input_channel, output_channel=512):
            super(VGG_FeatureExtractor, self).__init__()
            self.output_channel = [int(output_channel / 8), int(output_channel / 4),
                                   int(output_channel / 2), output_channel]
            self.ConvNet = nn.Sequential(
                nn.Conv2d(input_channel, self.output_channel[0], 3, 1, 1), nn.ReLU(True),
                nn.MaxPool2d(2, 2),
                nn.Conv2d(self.output_channel[0], self.output_channel[1], 3, 1, 1), nn.ReLU(True),
                nn.MaxPool2d(2, 2),
                nn.Conv2d(self.output_channel[1], self.output_channel[2], 3, 1, 1), nn.ReLU(True),
                nn.Conv2d(self.output_channel[2], self.output_channel[2], 3, 1, 1), nn.ReLU(True),
                nn.MaxPool2d((2, 1), (2, 1)),
                nn.Conv2d(self.output_channel[2], self.output_channel[3], 3, 1, 1, bias=False),
                nn.BatchNorm2d(self.output_channel[3]), nn.ReLU(True),
                nn.Conv2d(self.output_channel[3], self.output_channel[3], 3, 1, 1, bias=False),
                nn.BatchNorm2d(self.output_channel[3]), nn.ReLU(True),
                nn.MaxPool2d((2, 1), (2, 1)),
                nn.Conv2d(self.output_channel[3], self.output_channel[3], 2, 1, 0), nn.ReLU(True))
    
        def forward(self, input):
            return self.ConvNet(input)
    
    class ResNet_FeatureExtractor(nn.Module):
        """ FeatureExtractor of FAN (http://openaccess.thecvf.com/content_ICCV_2017/papers/Cheng_Focusing_Attention_Towards_ICCV_2017_paper.pdf) """
    
        def __init__(self, input_channel, output_channel=512):
            super(ResNet_FeatureExtractor, self).__init__()
            self.ConvNet = ResNet(input_channel, output_channel, BasicBlock, [1, 2, 5, 3])
    
        def forward(self, input):
            return self.ConvNet(input)
    
    class BasicBlock(nn.Module):
        expansion = 1
    
        def __init__(self, inplanes, planes, stride=1, downsample=None):
            super(BasicBlock, self).__init__()
            self.conv1 = self._conv3x3(inplanes, planes)
            self.bn1 = nn.BatchNorm2d(planes)
            self.conv2 = self._conv3x3(planes, planes)
            self.bn2 = nn.BatchNorm2d(planes)
            self.relu = nn.ReLU(inplace=True)
            self.downsample = downsample
            self.stride = stride
    
        def _conv3x3(self, in_planes, out_planes, stride=1):
            "3x3 convolution with padding"
            return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                             padding=1, bias=False)
    
        def forward(self, x):
            residual = x
    
            out = self.conv1(x)
            out = self.bn1(out)
            out = self.relu(out)
    
            out = self.conv2(out)
            out = self.bn2(out)
    
            if self.downsample is not None:
                residual = self.downsample(x)
            out += residual
            out = self.relu(out)
    
            return out
    
    class ResNet(nn.Module):
    
        def __init__(self, input_channel, output_channel, block, layers):
            super(ResNet, self).__init__()
    
            self.output_channel_block = [int(output_channel / 4), int(output_channel / 2), output_channel, output_channel]
    
            self.inplanes = int(output_channel / 8)
            self.conv0_1 = nn.Conv2d(input_channel, int(output_channel / 16),
                                     kernel_size=3, stride=1, padding=1, bias=False)
            self.bn0_1 = nn.BatchNorm2d(int(output_channel / 16))
            self.conv0_2 = nn.Conv2d(int(output_channel / 16), self.inplanes,
                                     kernel_size=3, stride=1, padding=1, bias=False)
            self.bn0_2 = nn.BatchNorm2d(self.inplanes)
            self.relu = nn.ReLU(inplace=True)
    
            self.maxpool1 = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
            self.layer1 = self._make_layer(block, self.output_channel_block[0], layers[0])
            self.conv1 = nn.Conv2d(self.output_channel_block[0], self.output_channel_block[
                                   0], kernel_size=3, stride=1, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(self.output_channel_block[0])
    
            self.maxpool2 = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
            self.layer2 = self._make_layer(block, self.output_channel_block[1], layers[1], stride=1)
            self.conv2 = nn.Conv2d(self.output_channel_block[1], self.output_channel_block[
                                   1], kernel_size=3, stride=1, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(self.output_channel_block[1])
    
            self.maxpool3 = nn.MaxPool2d(kernel_size=2, stride=(2, 1), padding=(0, 1))
            self.layer3 = self._make_layer(block, self.output_channel_block[2], layers[2], stride=1)
            self.conv3 = nn.Conv2d(self.output_channel_block[2], self.output_channel_block[
                                   2], kernel_size=3, stride=1, padding=1, bias=False)
            self.bn3 = nn.BatchNorm2d(self.output_channel_block[2])
    
            self.layer4 = self._make_layer(block, self.output_channel_block[3], layers[3], stride=1)
            self.conv4_1 = nn.Conv2d(self.output_channel_block[3], self.output_channel_block[
                                     3], kernel_size=2, stride=(2, 1), padding=(0, 1), bias=False)
            self.bn4_1 = nn.BatchNorm2d(self.output_channel_block[3])
            self.conv4_2 = nn.Conv2d(self.output_channel_block[3], self.output_channel_block[
                                     3], kernel_size=2, stride=1, padding=0, bias=False)
            self.bn4_2 = nn.BatchNorm2d(self.output_channel_block[3])
    
        def _make_layer(self, block, planes, blocks, stride=1):
            downsample = None
            if stride != 1 or self.inplanes != planes * block.expansion:
                downsample = nn.Sequential(
                    nn.Conv2d(self.inplanes, planes * block.expansion,
                              kernel_size=1, stride=stride, bias=False),
                    nn.BatchNorm2d(planes * block.expansion),
                )
    
            layers = []
            layers.append(block(self.inplanes, planes, stride, downsample))
            self.inplanes = planes * block.expansion
            for i in range(1, blocks):
                layers.append(block(self.inplanes, planes))
    
            return nn.Sequential(*layers)
    
        def forward(self, x):
            x = self.conv0_1(x)
            x = self.bn0_1(x)
            x = self.relu(x)
            x = self.conv0_2(x)
            x = self.bn0_2(x)
            x = self.relu(x)
    
            x = self.maxpool1(x)
            x = self.layer1(x)
            x = self.conv1(x)
            x = self.bn1(x)
            x = self.relu(x)
    
            x = self.maxpool2(x)
            x = self.layer2(x)
            x = self.conv2(x)
            x = self.bn2(x)
            x = self.relu(x)
    
            x = self.maxpool3(x)
            x = self.layer3(x)
            x = self.conv3(x)
            x = self.bn3(x)
            x = self.relu(x)
    
            x = self.layer4(x)
            x = self.conv4_1(x)
            x = self.bn4_1(x)
            x = self.relu(x)
            x = self.conv4_2(x)
            x = self.bn4_2(x)
            x = self.relu(x)
    
            return x
    
    
    class Model(nn.Module):
    
        def __init__(self, input_channel, output_channel, hidden_size, num_class):
            super(Model, self).__init__()
            """ FeatureExtraction """
            self.FeatureExtraction = ResNet_FeatureExtractor(input_channel, output_channel)
            self.FeatureExtraction_output = output_channel  # int(imgH/16-1) * 512
            self.AdaptiveAvgPool = nn.AdaptiveAvgPool2d((None, 1))  # Transform final (imgH/16-1) -> 1
    
            """ Sequence modeling"""
            self.SequenceModeling = nn.Sequential(
                BidirectionalLSTM(self.FeatureExtraction_output, hidden_size, hidden_size),
                BidirectionalLSTM(hidden_size, hidden_size, hidden_size))
            self.SequenceModeling_output = hidden_size
    
            """ Prediction """
            self.Prediction = nn.Linear(self.SequenceModeling_output, num_class)
    
    
        def forward(self, input, text):
            """ Feature extraction stage """
            visual_feature = self.FeatureExtraction(input)
            visual_feature = self.AdaptiveAvgPool(visual_feature.permute(0, 3, 1, 2))  # [b, c, h, w] -> [b, w, c, h]
            visual_feature = visual_feature.squeeze(3)
    
            """ Sequence modeling stage """
            contextual_feature = self.SequenceModeling(visual_feature)
    
            """ Prediction stage """
            prediction = self.Prediction(contextual_feature.contiguous())
    
            return prediction
    
    

    Then I predict like this:

    ar_reader = easyocr.Reader(['en'], recog_network='arabic')
    ar_reader.readtext(image_path,paragraph=True)
    
    opened by MohieEldinMuhammad 10
  • use --sensitive, characters have been changed

    use --sensitive, characters have been changed

    character: 动力电池总成系统种类物料编码压额定容量重产品号ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstvuwxyzAhkgKV-0123456789 sensitive: False PAD: False data_filtering_off: True Transformation: TPS FeatureExtraction: ResNet SequenceModeling: BiLSTM Prediction: Attn

    ============================================================== character: 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[]^_`{|}~ sensitive: True PAD: False data_filtering_off: True Transformation: TPS FeatureExtraction: ResNet SequenceModeling: BiLSTM Prediction: Attn


    I added --sensitive to the config and the characters changed. What should I do if I want the model to recognize Chinese, letters (case-sensitive), and numbers?

    opened by Carolyn0822 0
  •  How to know that my data set is well done?

    How to know that my data set is well done?

    I am trying to train a model to use with EasyOCR in an ANPR project. As a first test I used a dataset provided in the EasyOCR repository (an example image was attached). The dataset contains approximately 1000 similar images in different sizes; after training for 10,000 iterations I got poor results, but since it was a training test I didn't give it more importance.

    After that I tried to train my own model following the same steps I used to train the previous model: create my .txt file with the labels and run the script to create the lmdb dataset. But after training I don't get any result; it stays on the same prediction and never gets better (example image attached).

    Another thing I noticed is that after the training script runs the dataset filter, I only had about 100 samples left out of the 500 that the dataset has.

    The training seems stuck, since it never achieved a correct prediction; with the first dataset, the training already gave good results after 2000 iterations.

    These are my logs and results of failure training.

    exp_name: None-VGG-BiLSTM-CTC-Seed1111 train_data: data valid_data: data/val manualSeed: 1111 workers: 0 batch_size: 192 num_iter: 10000 valInterval: 100 saved_model: FT: False adam: False lr: 1 beta1: 0.9 rho: 0.95 eps: 1e-08 grad_clip: 5 baiduCTC: False select_data: ['train'] batch_ratio: ['1'] total_data_usage_ratio: 1.0 batch_max_length: 25 imgH: 32 imgW: 100 rgb: False character: 0123456789-abcdefghijklmnopqrstuvwxyz sensitive: False PAD: False data_filtering_off: False Transformation: None FeatureExtraction: VGG SequenceModeling: BiLSTM Prediction: CTC num_fiducial: 20 input_channel: 1 output_channel: 512 hidden_size: 256 num_gpu: 0 num_class: 38

    dataset_root: data opt.select_data: ['train'] opt.batch_ratio: ['1']

    dataset_root: data dataset: train sub-directory: /train num samples: 102 num total samples of train: 102 x 1.0 (total_data_usage_ratio) = 102 num samples of train per batch: 192 x 1.0 (batch_ratio) = 192

    Total_batch_size: 192 = 192

    dataset_root: data/val dataset: / sub-directory: /. num samples: 1

    [8400/10000] Train loss: 0.00001, Valid loss: 9.07022, Elapsed_time: 46794.15422 Current_accuracy : 0.000, Current_norm_ED : 0.14 Best_accuracy : 0.000, Best_norm_ED : 0.14

    Ground Truth | Prediction | Confidence Score & T/F

    4nt5526 | 2103j6 | 0.0474 False

    [8500/10000] Train loss: 0.00001, Valid loss: 9.07421, Elapsed_time: 47353.40529 Current_accuracy : 0.000, Current_norm_ED : 0.14 Best_accuracy : 0.000, Best_norm_ED : 0.14

    Ground Truth | Prediction | Confidence Score & T/F

    4nt5526 | 2103j6 | 0.0477 False

    [8600/10000] Train loss: 0.00001, Valid loss: 9.07817, Elapsed_time: 47907.41048 Current_accuracy : 0.000, Current_norm_ED : 0.14 Best_accuracy : 0.000, Best_norm_ED : 0.14

    Ground Truth | Prediction | Confidence Score & T/F

    4nt5526 | 2103j6 | 0.0479 False

    [8700/10000] Train loss: 0.00001, Valid loss: 9.08219, Elapsed_time: 49519.87970 Current_accuracy : 0.000, Current_norm_ED : 0.14 Best_accuracy : 0.000, Best_norm_ED : 0.14

    Ground Truth | Prediction | Confidence Score & T/F

    4nt5526 | 2103j6 | 0.0481 False

    opened by e-click 3
  • Random results with 0.0 confidence on demo.py

    Random results with 0.0 confidence on demo.py

    I re-trained a TPS-ResNet-BiLSTM-Attn model to predict the results for Persian numbers and dates. In other words, my character set contains 0-9 (Persian characters) plus a slash ("/") character.

    When I use the test.py script to check the model accuracy on unseen data, I get a good result (about 96.3%).

    However, when I use the demo.py script, I get random results for all input images.

    (screenshot omitted)

    I would appreciate it if you could help me with my problem.

    opened by ashkanmradi 5