MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition

Canjie Luo

Last update: Dec 27, 2022

Python 2.7	Python 3.6

MORAN is a network with a rectification mechanism for general scene text recognition. The paper was accepted by Pattern Recognition (2019); the arXiv preprint and the final version are both available.

Here is a brief introduction in Chinese.

Recent Update

2019.03.21 Fixed a bug in Fractional Pickup.
Added Python 3 support.

Improvements of MORAN v2:

A more stable rectification network for one-stage training
A ResNet backbone in place of VGG
A bidirectional decoder (a trick borrowed from ASTER)

Version	IIIT5K	SVT	IC03	IC13	SVT-P	CUTE80	IC15 (1811)	IC15 (2077)
MORAN v1 (curriculum training)*	91.2	88.3	95.0	92.4	76.1	77.4	74.7	68.8
MORAN v2 (one-stage training)	93.4	88.3	94.2	93.2	79.7	81.9	77.8	73.9

*The results of v1 were reported in our paper. If this project is helpful for your research, please cite our Pattern Recognition paper.
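
To make the comparison easier to read, the per-benchmark change from v1 to v2 can be computed directly from the table. The snippet below is only an illustrative sketch: the numbers are copied from the table above, and the helper name `accuracy_deltas` is made up for this example.

```python
# Accuracy figures copied from the table above (v1: curriculum training, v2: one-stage training).
benchmarks = ["IIIT5K", "SVT", "IC03", "IC13", "SVT-P", "CUTE80", "IC15 (1811)", "IC15 (2077)"]
v1 = [91.2, 88.3, 95.0, 92.4, 76.1, 77.4, 74.7, 68.8]
v2 = [93.4, 88.3, 94.2, 93.2, 79.7, 81.9, 77.8, 73.9]


def accuracy_deltas(names, old, new):
    """Return {benchmark: new - old}, rounded to one decimal place."""
    return {name: round(n - o, 1) for name, o, n in zip(names, old, new)}


deltas = accuracy_deltas(benchmarks, v1, v2)
for name, delta in deltas.items():
    # Signed difference between the v2 and v1 table entries.
    print(f"{name:<12} {delta:+.1f}")
```

On these figures, v2 gains most on IC15 (2077) (+5.1) and CUTE80 (+4.5), is unchanged on SVT, and is lower only on IC03 (-0.8).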

Requirements

(Contributions to MORAN are welcome.)

We recommend using Anaconda to manage your libraries.

Python 2.7 or Python 3.6 (Python 3 is faster than Python 2)
PyTorch 0.3.* (higher versions cause slow training; please refer to issue #8)
TorchVision
OpenCV
PIL (Pillow)
Colour
LMDB
matplotlib

Alternatively, install the libraries with pip. (The torch build installed this way may differ from the Anaconda one. Check carefully and fix any warnings during training if necessary.)

    pip install -r requirements.txt

Data Preparation

Please convert your own dataset to LMDB format using the tool (run it in Python 2.7) provided by @Baoguang Shi.
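
For orientation, here is a rough sketch of the kind of key/value layout such a conversion produces. It assumes the convention used by CRNN-style LMDB datasets (`image-%09d`, `label-%09d`, and a `num-samples` counter); check the referenced tool before relying on it. The function names `build_cache` and `write_cache` are hypothetical, and `lmdb` is imported lazily so that the key-building part can be tried without the library installed.

```python
def build_cache(samples):
    """Map (image_bytes, label) pairs to LMDB-style byte keys and values.

    Assumed key layout: image-000000001 / label-000000001 / num-samples.
    """
    cache = {}
    count = 0
    for image_bytes, label in samples:
        count += 1
        cache[("image-%09d" % count).encode()] = image_bytes
        cache[("label-%09d" % count).encode()] = label.encode("utf-8")
    cache[b"num-samples"] = str(count).encode()
    return cache


def write_cache(output_path, cache, map_size=1 << 30):
    """Write the cache into an LMDB environment (requires the third-party lmdb package)."""
    import lmdb  # imported here so build_cache works without lmdb installed

    env = lmdb.open(output_path, map_size=map_size)
    with env.begin(write=True) as txn:
        for key, value in cache.items():
            txn.put(key, value)
    env.close()


# Placeholder bytes stand in for real encoded image files.
cache = build_cache([(b"fake-image-1", "hello"), (b"fake-image-2", "world")])
```

In real use, `image_bytes` would be the raw contents of each image file, and `map_size` would need to be large enough for the whole dataset.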

You can also download the training (NIPS 2014, CVPR 2016) and testing datasets prepared by us.

The raw pictures of testing datasets can be found here.

Training and Testing

Modify the path to dataset folder in train_MORAN.sh:

	--train_nips path_to_dataset \
	--train_cvpr path_to_dataset \
	--valroot path_to_dataset \

Then start training (decrease the learning rate manually as needed for your task):

	sh train_MORAN.sh

Training should take less than 20 s per 100 iterations on a 1080Ti.

Demo

Download the model parameter file demo.pth.

BaiduCloud (password: l8em)
Google Drive
OneDrive

Put it in the repository root folder, then run demo.py for more visualizations.

	python demo.py

Citation

@article{cluo2019moran,
  author    = {Canjie Luo and Lianwen Jin and Zenghui Sun},
  title     = {MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition},
  journal   = {Pattern Recognition}, 
  volume    = {90}, 
  pages     = {109--118},
  year      = {2019},
  publisher = {Elsevier}
}

Acknowledgment

This repo is built on @Jieru Mei's crnn.pytorch and @marvis' ocr_attention. Thanks for your contributions.

Attention

This project is free for academic research purposes only.

Comments

python demo.py raises an error

I set up the environment following the README, and training runs normally. However, running python demo.py fails with the following error:

    gzh@root0-SCW4350-4:~/ocr/MORAN_v2-master$ python demo.py
    loading pretrained model from ./demo.pth
    Traceback (most recent call last):
      File "demo.py", line 55, in <module>
        output = MORAN(image, length, text, text, test=True, debug=True)
      File "/home/gzh/SoftWare/tf1.4/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/gzh/ocr/MORAN_v2-master/models/moran.py", line 15, in forward
        x_rectified, demo = self.MORN(x, test, debug=debug)
      File "/home/gzh/SoftWare/tf1.4/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/gzh/ocr/MORAN_v2-master/models/morn.py", line 105, in forward
        img = to_pil_image(img_small)
      File "/home/gzh/SoftWare/tf1.4/anaconda2/lib/python2.7/site-packages/torchvision/transforms/transforms.py", line 126, in __call__
        return F.to_pil_image(pic, self.mode)
      File "/home/gzh/SoftWare/tf1.4/anaconda2/lib/python2.7/site-packages/torchvision/transforms/functional.py", line 138, in to_pil_image
        'not {}'.format(type(npimg)))
    TypeError: Input pic must be a torch.Tensor or NumPy ndarray, not <class 'torch.FloatTensor'>

Could you take a look at what is causing this?

opened by gzhcv 23
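Character set question about training

Hello, I have a question. If the paper's model is trained on Synth90k, the training data contains only letters and digits, with no punctuation. How do you then evaluate on test sets that do contain punctuation, such as ICDAR15? Do you ignore punctuation when computing accuracy, or compute it as usual? A model trained this way has no punctuation classes, so it cannot predict punctuation at all.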
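
The excerpt does not record the authors' answer. For readers with the same question, one convention often used for alphanumeric-only recognizers is to lowercase both strings and drop characters outside the model's alphabet before comparing. Whether MORAN's reported numbers follow exactly this rule is an assumption here, not something the source states. The helper names below are hypothetical.

```python
import string

# Characters the recognizer is assumed to be able to output: 0-9 and a-z.
ALLOWED = set(string.ascii_lowercase + string.digits)


def normalize(text):
    """Lowercase the text and drop every character outside 0-9 and a-z."""
    return "".join(ch for ch in text.lower() if ch in ALLOWED)


def word_accuracy(predictions, ground_truths):
    """Fraction of samples whose normalized prediction equals the normalized label."""
    pairs = list(zip(predictions, ground_truths))
    if not pairs:
        return 0.0
    correct = sum(normalize(p) == normalize(g) for p, g in pairs)
    return correct / len(pairs)
```

Under this rule a prediction of "hello" counts as correct for the label "Hello!", since the punctuation is removed from the label before matching.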

opened by bjlgcxc 11
Got an error, could you give me some help?

    Traceback (most recent call last):
      File "demo.py", line 17, in <module>
        MORAN = MORAN.cuda()
      File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 258, in cuda
        return self._apply(lambda t: t.cuda(device))
      File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 185, in _apply
        module._apply(fn)
      File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 185, in _apply
        module._apply(fn)
      File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 185, in _apply
        module._apply(fn)
      File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 112, in _apply
        self.flatten_parameters()
      File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 105, in flatten_parameters
        self.batch_first, bool(self.bidirectional))
    RuntimeError: CuDNN error: CUDNN_STATUS_SUCCESS

opened by passion3394 11
Accuracy during training

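Hi @Canjie-Luo, is the 73.9% one-stage accuracy on IC15 (2077) shown in the README the highest validation result obtained while training on the CVPR and NIPS datasets? I am training on the same two datasets, and although training has not finished yet (70,000 iterations remain), the best validation accuracy so far is only 66%, and it oscillates around that value. Do you have any suggestions?

P.S. The server went down once during training, and I resumed from the most recently saved model. Before the interruption the accuracy was also oscillating around 65%.

P.S. 2: Apart from enlarging the alphabet, I changed nothing.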

opened by fsluckymao 10
Someone please upload the training set to google drive

I have been training MORAN on my own dataset and it gives good results, but I could not download the training set from BaiduPan due to some issues.

If anybody has managed to download the set, could you please upload it to Google Drive and share the link?

Thanks in advance.
help wanted good first issue

opened by DecentMakeover 9
Accuracy problem when resuming training with my own dataset

Thanks. My goal is to resume training with my own dataset. My dataset has only one special character plus 0-9 and a-z (38 characters). I have 316,000 training images and 144,000 validation images. During training, this line of main.py, print("correct / total: %d / %d, " % (n_correct, n_total)), prints n_total as 64,000. Why 64,000? Neither my training set nor my test set has 64,000 images. Another note: accuracy during training reaches 99%, but when I load the trained model and test some images with demo.py, the accuracy is 0 and all predictions are wrong. I resumed training from your demo.pth, but the results of demo.pth are actually much better than those of the newly trained models.

Can you help me find what went wrong? Thanks.

opened by johnSmith1990 8
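MORN question

Hello. I would like to ask why, in MORN, the output of the CNN has size 1*3*11 rather than the 2*3*11 described in the paper. Also, in https://github.com/Canjie-Luo/MORAN_v2/blob/1eb698848025900aa4638c5ca6e2605cb4681f58/models/morn.py#L66-L69, why is only the y axis rectified? Or have I misunderstood something?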

opened by JingLiJJ 8
Question about your CNN downsampling

It looks to me like you downsample along the x-axis only twice (so the final feature map is 1/4 of the image width), but downsample along the y-axis five times (so the final feature map is 1/32 of the image height).

I wonder whether we could downsample along the x-axis more times for Chinese characters, because Chinese characters are usually much bigger than English characters. With 1/4 downsampling, training and inference time would be significant for wide input images (we usually get images 500 px wide from the detection stage).

What do you think? Thanks!
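
The arithmetic behind this question can be sketched as follows. This is a simplification that assumes each downsampling step halves the dimension with floor division and that the input height is 32; the actual layers (padding, kernel sizes) may give slightly different sizes. The function name is hypothetical.

```python
def feature_map_size(height, width, y_halvings=5, x_halvings=2):
    """Return (rows, cols) after repeated halving of each axis (floor division)."""
    return height // (2 ** y_halvings), width // (2 ** x_halvings)


# A 500 px wide crop with the downsampling described in the question (1/32 in y, 1/4 in x).
current = feature_map_size(32, 500)
# The same crop with one additional halving along the x-axis (1/8 in x).
proposed = feature_map_size(32, 500, x_halvings=3)
```

Under these assumptions a 32 x 500 input yields 125 columns for the decoder to attend over, and one extra x-axis halving would roughly halve that to 62.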

opened by Jiakui 8
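Request for Chinese recognition and text detection!

Hello, I found this repo through paperswithcode this morning. In my tests the results are extremely impressive. (It is the most robust alphanumeric recognizer I have tried so far; it even recognizes distorted text and text with missing corners.)

Besides the alphanumeric results, have you tested or trained it on Chinese, or do you plan to?

Also, my current understanding is that text regions are first detected and cropped, and MORAN then recognizes the text in each crop. So to recognize a whole document, the text in it has to be detected first. Does that step require a separate tool, something like TextBox..., or have I misunderstood?

Sorry for the scattered questions, and thank you very much.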

opened by kaonick 7
Testing datasets for training?

Hi,

I am currently downloading the dataset using the Baidu disk client. The problem is that the download speed is very slow, around 100 kbps. My first, unrelated question: if I upgrade to a premium account, will the download get faster?

Secondly, can I use the testing datasets for training?

Thanks in advance.

opened by DecentMakeover 7
Problems downloading file from drive

Hi, I'm trying to download the data from Baidu drive on a Linux system, but whenever I hit the download button, it redirects me to install a Baidu disk client, which is an exe file.

Is there some way around this? Do you have any plans to upload it to Google Drive or Dropbox?

Thanks in advance

opened by DecentMakeover 7
Accuracy too low, any suggestions?

Following the README step by step, I created an environment almost the same as the project's.

The environment is listed below: conda 4.5.11, pytorch 1.12, cuda 11.7

After changing the dataset paths in 'train.sh' and training on CVPR2016 and NIPS2014, the highest accuracy I got was 79.43 after 9 epochs. That seems far from the paper's result. Does anyone have any ideas or suggestions? Thanks a lot!

opened by LateTensor 1
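Hello, may I ask you a few questions?

First, about the CVPR2016 dataset: could you provide it in images + gts format? I have searched online for a long time and cannot find it. I collected some Chinese images myself (RCTW2017), but I do not know whether they meet the requirements, because each image contains several text blocks. I would like to compare them against CVPR2016. Second, is there a model for Chinese + English + digits? I would like to see how well it works first. Thank you!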

opened by sssimpleboy 6
Help! Dimension error when training on my own dataset

RuntimeError: While copying the parameter named ASRN.attentionL2R.char_embeddings, whose dimensions in the model are torch.Size([37, 256]) and whose dimensions in the checkpoint are torch.Size([38, 256]).
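
The two shapes in this message differ only in the first dimension (37 vs. 38), which for an embedding table is normally the number of classes. A plausible reading is that the checkpoint was trained with one more symbol than the model being built. This assumes the class count is derived from a ':'-separated alphabet string, which is an assumption about the code rather than something the error message states. A minimal sketch of that check, with hypothetical names and an assumed alphabet layout:

```python
import string


def num_classes(alphabet, sep=":"):
    """Count the symbols in a separator-delimited alphabet string."""
    return len(alphabet.split(sep))


# Assumed layout: 10 digits + 26 lowercase letters + "$" as an end-of-sequence marker.
default_alphabet = ":".join(string.digits + string.ascii_lowercase + "$")
# The same alphabet with one extra symbol ("-" is chosen arbitrarily for illustration).
extended_alphabet = ":".join(string.digits + string.ascii_lowercase + "-" + "$")

default_classes = num_classes(default_alphabet)
extended_classes = num_classes(extended_alphabet)
```

If this reading is right, building the model with the same alphabet that was used when the checkpoint was trained should make the two shapes agree.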

opened by dzyanshan 1
Testing trained model with demo.py

I trained on the IAM words dataset and am testing the result with demo.py. The model gives random results, and even gives different results each time on the same image.

Could this be a decoder issue? I ask because I have not converted my data into LMDB format; I load images from a .txt file.

opened by premtibadiya 1

Owner

Canjie Luo

GitHub

OCR, Scene-Text-Understanding, Text Recognition

Scene-Text-Understanding Survey [2015-PAMI] Text Detection and Recognition in Imagery: A Survey paper [2014-Front.Comput.Sci] Scene Text Detection and

354 Dec 12, 2022

A novel region proposal network for more general object detection ( including scene text detection ).

DeRPN: Taking a further step toward more general object detection DeRPN is a novel region proposal network which concentrates on improving the adaptiv

Deep Learning and Vision Computing Lab, SCUT

151 Dec 12, 2022

An Implementation of the alogrithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection

InceptText-Tensorflow An Implementation of the alogrithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Orien

115 Dec 12, 2022

Scene text recognition

AttentionOCR for Arbitrary-Shaped Scene Text Recognition Introduction This is the ranked No.1 tensorflow based scene text spotting algorithm on ICDAR2

777 Jan 9, 2023

End-to-end pipeline for real-time scene text detection and recognition.

Real-time-Scene-Text-Detection-and-Recognition-System End-to-end pipeline for real-time scene text detection and recognition. The detection model use

89 Aug 4, 2022

Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition"

SEE: Towards Semi-Supervised End-to-End Scene Text Recognition Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text

572 Jan 5, 2023

Scene text detection and recognition based on Extremal Region(ER)

Scene text recognition A real-time scene text recognition algorithm. Our system is able to recognize text in unconstrain background. This algorithm is

155 Dec 6, 2022

A curated list of resources dedicated to scene text localization and recognition

Scene Text Localization & Recognition Resources A curated list of resources dedicated to scene text localization and recognition. Any suggestions and

1.6k Dec 22, 2022

A curated list of papers and resources for scene text detection and recognition

Awesome Scene Text A curated list of papers and resources for scene text detection and recognition The year when a paper was first published, includin

43 Mar 15, 2022

Tracking the latest progress in Scene Text Detection and Recognition: Must-read papers well organized

SceneTextPapers Tracking the latest progress in Scene Text Detection and Recognition: must-read papers well organized Information about this repositor

763 Jan 1, 2023

Convolutional Recurrent Neural Networks(CRNN) for Scene Text Recognition

CRNN_Tensorflow This is a TensorFlow implementation of a Deep Neural Network for scene text recognition. It is mainly based on the paper "An End-to-En

1000 Dec 27, 2022

A toolbox of scene text detection and recognition

FudanOCR This toolbox contains the implementations of the following papers: Scene Text Telescope: Text-Focused Scene Image Super-Resolution [Chen et a

170 Dec 26, 2022

This project modify tensorflow object detection api code to predict oriented bounding boxes. It can be used for scene text detection.

This is an oriented object detector based on tensorflow object detection API. Most of the code is not changed except for those related to the need of

30 Oct 22, 2022

Code for the paper STN-OCR: A single Neural Network for Text Detection and Text Recognition

STN-OCR: A single Neural Network for Text Detection and Text Recognition This repository contains the code for the paper: STN-OCR: A single Neural Net

496 Jan 5, 2023

Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. This Neural Network (NN) model recognizes the text contained in the images of segmented words.

Handwritten-Text-Recognition Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. T

27 Jan 8, 2023

Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation

This is the official implementation of "Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation". For more details, please

309 Dec 6, 2022

Repository for Scene Text Detection with Supervised Pyramid Context Network with tensorflow.

Scene-Text-Detection-with-SPCNET Unofficial repository for [Scene Text Detection with Supervised Pyramid Context Network][https://arxiv.org/abs/1811.0

121 Oct 15, 2021

AdvancedEAST is an algorithm used for Scene image text detect, which is primarily based on EAST, and the significant improvement was also made, which make long text predictions more accurate.https://github.com/huoyijie/raspberrypi-car

AdvancedEAST AdvancedEAST is an algorithm used for Scene image text detect, which is primarily based on EAST:An Efficient and Accurate Scene Text Dete

1.2k Dec 29, 2022

A Tensorflow model for text recognition (CNN + seq2seq with visual attention) available as a Python package and compatible with Google Cloud ML Engine.

Attention-based OCR Visual attention-based OCR model for image recognition with additional tools for creating TFRecords datasets and exporting the tra

933 Dec 29, 2022