CRAFT-Pytorch: Character Region Awareness for Text Detection Reimplementation for PyTorch

Last update: Dec 28, 2022

Overview

CRAFT-Reimplementation

Note: If you run into any problems, please leave a comment, or join our WeChat group. The QR code is kept up to date in issue #49.

Reimplementation: a PyTorch-based reimplementation of Character Region Awareness for Text Detection

Character Region Awareness for Text Detection

Youngmin Baek, Bado Lee, Dongyoon Han, Sangdoo Yun, Hwalsuk Lee (Submitted on 3 Apr 2019)

The full paper is available at: https://arxiv.org/pdf/1904.01941.pdf

Install Requirements:

1. PyTorch>=0.4.1
2. torchvision>=0.2.1
3. opencv-python>=3.4.2
4. See requirements.txt for the remaining packages
5. 4 NVIDIA GPUs (we use 4 NVIDIA Titan X cards)
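A quick way to confirm that an environment satisfies the minimum versions listed above is sketched below. It is only an illustration: the helper names (`version_tuple`, `meets_minimum`, `check_environment`) are made up for this example and are not part of the repository, and the version parsing is deliberately simple (it ignores local suffixes such as `+cu117`).

```python
import re
from importlib import metadata

# Minimum versions copied from the requirements list above.
MINIMUMS = {
    "torch": "0.4.1",
    "torchvision": "0.2.1",
    "opencv-python": "3.4.2",
}


def version_tuple(text):
    """Turn '1.13.1+cu117' into (1, 13, 1); parsing stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, required):
    """Return True when the installed version is at least the required one."""
    return version_tuple(installed) >= version_tuple(required)


def check_environment(minimums=MINIMUMS):
    """Map each package name to True (ok), False (too old), or None (not installed)."""
    report = {}
    for name, required in minimums.items():
        try:
            report[name] = meets_minimum(metadata.version(name), required)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    labels = {True: "ok", False: "too old", None: "missing"}
    for name, status in check_environment().items():
        print(name, labels[status])
```

The GPU requirement is not checked here; with PyTorch installed, `torch.cuda.device_count()` reports how many devices are visible.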

Pre-trained models:

NOTE: These are older pre-trained models. Links to models trained with the new results will be uploaded later.
Syndata: Syndata for baidu drive || Syndata for google drive
Syndata+IC15: Syndata+IC15 for baidu drive || Syndata+IC15 for google drive
Syndata+IC13+IC17: Syndata+IC13+IC17 for baidu drive || Syndata+IC13+IC17 for google drive
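When a checkpoint was saved from a model wrapped in `torch.nn.DataParallel` (plausible for multi-GPU training, but an assumption about these files), every key in its state dict carries a `module.` prefix, and loading it into an unwrapped model fails with missing or unexpected keys. A minimal sketch of stripping that prefix follows; the helper name `strip_module_prefix` is hypothetical and not taken from this repository.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from every key."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(prefix):
            key = key[len(prefix):]
        cleaned[key] = value
    return cleaned
```

With PyTorch available, this would typically be used as `model.load_state_dict(strip_module_prefix(torch.load(path, map_location="cpu")))`, where `model` and `path` are whatever your own script defines.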

Training

Note: Before training on the IC15 or MLT data, please read the comments in data_loader.py at line 92 and lines 108-112.

Train for Syndata

Download the Syndata (a link will be provided).
Change the path in the basenet/vgg16_bn.py file:

(/data/CRAFT-pytorch/vgg16_bn-6c64b313.pth -> /your_path/vgg16_bn-6c64b313.pth). You can download the model here: baidu || google

Change the paths in the trainSyndata.py file:

(1. /data/CRAFT-pytorch/SynthText -> /your_path/SynthText 2. /data/CRAFT-pytorch/synweights/synweights -> /your_path/real_weights)

Run python trainSyndata.py
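The path changes described for the training scripts can also be applied with a small helper instead of editing each file by hand. This is a sketch under one assumption: you keep the same directory names under your own root, so only the `/data/CRAFT-pytorch` prefix changes. The function names are invented for this example.

```python
from pathlib import Path

# Prefix that is hard-coded in the scripts, as shown in the steps above.
OLD_ROOT = "/data/CRAFT-pytorch"


def rewrite_root(source_text, new_root, old_root=OLD_ROOT):
    """Replace every occurrence of the hard-coded data root with new_root."""
    return source_text.replace(old_root, new_root.rstrip("/"))


def rewrite_file(path, new_root):
    """Rewrite one script in place and return how many occurrences were replaced."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    count = text.count(OLD_ROOT)
    if count:
        path.write_text(rewrite_root(text, new_root), encoding="utf-8")
    return count
```

For example, `rewrite_file("trainSyndata.py", "/your_path")` would update that script; paths that do not follow the prefix pattern still need to be edited manually.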

Train for IC15 data based on Syndata pre-trained model

Download the IC15 data and rename the image folder and the gt folder to ch4_training_images and ch4_training_localization_transcription_gt, respectively.
Change the path in the basenet/vgg16_bn.py file:

(/data/CRAFT-pytorch/vgg16_bn-6c64b313.pth -> /your_path/vgg16_bn-6c64b313.pth). You can download the model here: baidu || google

Change the paths in the trainic15data.py file:

(1. /data/CRAFT-pytorch/SynthText -> /your_path/SynthText 2. /data/CRAFT-pytorch/real_weights -> /your_path/real_weights)

Also in trainic15data.py, change the pre-trained model and dataset paths:

(1. /data/CRAFT-pytorch/1-7.pth -> /your_path/your_pre-trained_model_name 2. /data/CRAFT-pytorch/icdar1317 -> /your_ic15data_path/)

Run python trainic15data.py
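For reference when checking the IC15 data, each line of an ICDAR 2015 ground-truth file lists eight comma-separated corner coordinates followed by the transcription, with `###` marking a "do not care" region. The sketch below parses that format; it is an illustration only, and the function names are not taken from data_loader.py.

```python
def parse_ic15_line(line):
    """Parse 'x1,y1,x2,y2,x3,y3,x4,y4,text' into (corner points, text, ignore flag)."""
    # Strip a UTF-8 byte-order mark if present, then split into at most 9 fields
    # so that commas inside the transcription are preserved.
    fields = line.lstrip("\ufeff").strip().split(",", 8)
    if len(fields) < 9:
        raise ValueError(f"expected 8 coordinates and a transcription: {line!r}")
    coords = [int(value) for value in fields[:8]]
    points = list(zip(coords[0::2], coords[1::2]))
    text = fields[8]
    return points, text, text == "###"


def load_ic15_gt(path):
    """Read one ground-truth file and return a list of parsed annotations."""
    with open(path, encoding="utf-8-sig") as handle:
        return [parse_ic15_line(line) for line in handle if line.strip()]
```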

Train for IC13+17 data based on Syndata pre-trained model

Download the MLT data and rename the image folder and the gt folder accordingly.
Change the path in the basenet/vgg16_bn.py file:

(/data/CRAFT-pytorch/vgg16_bn-6c64b313.pth -> /your_path/vgg16_bn-6c64b313.pth). You can download the model here: baidu || google

Change the paths in the trainic-MLT_data.py file:

(1. /data/CRAFT-pytorch/SynthText -> /your_path/SynthText 2. savemodel path -> your savemodel path)

Also in trainic-MLT_data.py, change the pre-trained model and dataset paths:

(1. /data/CRAFT-pytorch/1-7.pth -> /your_path/your_pre-trained_model_name 2. /data/CRAFT-pytorch/icdar1317 -> /your_ic15data_path/)

Run python trainic-MLT_data.py

To train with weak supervision starting from our Syndata pre-trained model:

1. First download the pre-trained model trained on Syndata: baidu || google.
2. Change the data path and the pre-trained model path.
3. Run python trainic15data.py
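In the CRAFT paper, weak supervision weights each pseudo-labelled word by a confidence score computed from the annotated word length l(w) and the number of characters l_c(w) estimated by the interim model: s_conf(w) = (l(w) - min(l(w), |l(w) - l_c(w)|)) / l(w), with scores below 0.5 replaced by 0.5 and the word split into equal-width characters. The sketch below follows that formula from the paper; it is not copied from trainic15data.py, and the function name is hypothetical.

```python
def word_confidence(word_length, estimated_length, floor=0.5):
    """Return (confidence, use_uniform_split) for one word-level annotation.

    word_length      -- number of characters in the ground-truth transcription
    estimated_length -- number of character boxes found in the cropped word
    floor            -- minimum confidence; below it the estimated boxes are
                        discarded and the word is divided evenly instead
    """
    if word_length <= 0:
        raise ValueError("word_length must be positive")
    error = abs(word_length - estimated_length)
    score = (word_length - min(word_length, error)) / word_length
    if score < floor:
        return floor, True
    return score, False
```

For a six-character word where five characters were detected, the score is 5/6; where only one was detected, the raw score of 1/6 falls below the floor and the word is handled with a uniform split at confidence 0.5.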

This code supports Syndata and ICDAR2015; we will release the training code for IC13 and IC17 as soon as possible.

Method	Dataset	Recall	Precision	H-mean
Syndata	ICDAR13	71.93%	81.31%	76.33%
Syndata+IC15	ICDAR15	76.12%	84.55%	80.11%
Syndata+MLT(deteval)	ICDAR13	86.81%	95.28%	90.85%
Syndata+MLT(deteval)(new gaussian map method)	ICDAR13	90.67%	94.56%	92.57%
Syndata+IC15(new gaussian map method)	ICDAR15	80.36%	84.25%	82.26%
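The H-mean column is the harmonic mean of recall and precision, which can be checked against the table rows with a few lines of code. The function name below is chosen for this example.

```python
def h_mean(recall, precision):
    """Harmonic mean of recall and precision (both in the same unit, e.g. percent)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


if __name__ == "__main__":
    # (recall, precision) pairs taken from the first three rows of the table above.
    for recall, precision in [(71.93, 81.31), (76.12, 84.55), (86.81, 95.28)]:
        print(f"{h_mean(recall, precision):.2f}")
```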

We have released the latest code with new gaussian map and random crop algorithm.
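For readers unfamiliar with the region score map: in the CRAFT paper, each character box is labelled by warping a 2D isotropic Gaussian template onto the box with a perspective transform. The sketch below builds only the template, in pure Python, and does not reproduce this repository's new Gaussian map method; `gaussian_template` and the default `sigma_ratio` are assumptions made for illustration.

```python
import math


def gaussian_template(size=64, sigma_ratio=0.25):
    """Return a size x size list of rows holding an isotropic Gaussian.

    The peak sits at the geometric centre of the square, and sigma is
    size * sigma_ratio, so a larger ratio gives a flatter heat map.
    """
    centre = (size - 1) / 2.0
    sigma = size * sigma_ratio
    return [
        [
            math.exp(-((x - centre) ** 2 + (y - centre) ** 2) / (2.0 * sigma ** 2))
            for x in range(size)
        ]
        for y in range(size)
    ]
```

In a full pipeline the template would then be warped to each character quadrilateral, for example with OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective`, and the warped patches combined into the score map.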

Note: With the new Gaussian map method, the Gaussian regions in the inferred region score map can be split apart.

Note: We have solved the problem of detecting large words and are now training the model. Issues and advice are welcome.

Contributing to the project

We will release the training code as soon as possible; we have not yet reached the results reported in the authors' paper. Pull requests and issues are welcome, and we would appreciate any advice on the project.

Acknowledgement

Thanks to Youngmin Baek, Bado Lee, Dongyoon Han, Sangdoo Yun, and Hwalsuk Lee for their excellent work and test code. This repo uses the basenet and test code from the authors' repo.

License

For commercial use, please contact us.

Comments

LinkRefiner

@backtime92 clovaai just released the LinkRefiner code (clovaai/CRAFT-pytorch@3cd65f5). It would be good to implement it here, along with an option to train it, so that we can detect text lines.

opened by ghost 4
question of perspective transform
@backtime92 Thanks for updating the code. I have a few questions.

Why is it necessary to make the character box 1.5 times larger using enlargebox?

https://github.com/backtime92/CRAFT-Reimplementation/blob/fbaa63aebd61c2b290752102cdef4758891b1fb7/gaussianMap/gaussian.py#L104

What is the reason for shifting the box using the variable top_left? https://github.com/backtime92/CRAFT-Reimplementation/blob/fbaa63aebd61c2b290752102cdef4758891b1fb7/gaussianMap/gaussian.py#L113-L115
opened by woans0104 3
Gaussian map has a problem with my model

I ported your code to MXNet, but when I tried to train a model on SynthText, the Gaussian map had a problem: the right side of the page is lost. @backtime92, can you give some advice?

opened by dont32 3
loss nan

I trained with the Synthtext data, and the loss became nan values over 20,000 epochs. Can anyone give me advice on this issue?

3000 0.1077
4000 0.1032
5000 0.1003
6000 0.0984
7000 0.0970
8000 0.0956
9000 0.0948
10000 0.0940
11000 0.0933
12000 0.0925
13000 0.0919
14000 0.0914
15000 0.0909
16000 0.0904
17000 0.0900
18000 0.0897
19000 0.0893
20000 0.0889
21000 0.0884
22000 0.0878
23000 nan
24000 nan
25000 nan
26000 nan
27000 nan
28000 nan
29000 nan
30000 nan
31000 nan
32000 nan
33000 nan
34000 nan
35000 nan
36000 nan
37000 nan
38000 nan
39000 nan
40000 nan
41000 nan
42000 nan
43000 nan
44000 nan
45000 nan
46000 nan
47000 nan
48000 nan
49000 nan
50000 nan

opened by woans0104 2
Difference in eval score of ic15 pretrained model

Thanks for the code and model disclosure.

Test results performed with the IC15 pretrained model are different from those recorded.

I downloaded your ic15 pretrained model from Google Drive. After executing test function, evaluation function was executed.

In 'readme', recall = 76%, precision = 84%, hmean = 80% for ICDAR2015 dataset. But the evaluation result I did with the model is recall = 70.5%, precision = 80.3%, hmean = 75.1%.

Why are the results different? Is the model that measured the score different from the model of google drive?

opened by huntu0042 2
can't load `Syndata.pth`: `load_state_dict Missing Keys`
~~How is Syndata.pth different from vgg16_bn-6c64b313.pth? My guess is vgg16_bn-6c64b313.pth is just for vgg16 and Syndata.pth is for the rest?~~

When I use Syndata.pth in vgg16_bn.py, I get the error: load_state_dict Missing keys ... but when I use vgg16_bn-6c64b313.pth, the model loads properly.

Is Syndata.pth for the same model as vgg16_bn-6c64b313.pth?

Why am I getting this bug?
opened by ThisIsIsaac 2
Have you sorted bounding boxes?

Can you please let me know whether bounding box sorting is done? If not, could you help me with it? I am working on something that needs the bounding boxes sorted. I would really appreciate it.

opened by nbhupendra 2
Address the training speed when train ICDAR2015 dataset
Hi, I am trying to train the ICDAR2015 dataset. Thanks for your work! I found the training speed is quite slow. The problem exists in the data loader (the num of workers is set to 0). However, if I set it larger than 1, it would meet the below problem when generating the pseudo label.

File "/home/craft_reimplementation/data_loader.py", line 125, in inference_pursedo_bboxes
    img_torch = img_torch.type(torch.FloatTensor).cuda()
RuntimeError: CUDA error: initialization error

Do you have any good idea to solve this issue?
opened by lianqing11 2
Operands could not be broadcast together with shapes (512,512) (306,220)

Hi author, thanks for your open source.

When I train the icdar15 dataset, I met this problem:

"operands could not be broadcast together with shapes (512,512) (306,220). "

Have you also run into this problem? And do you know which line of code causes this warning?

opened by lianqing11 2
question on trainSynth.py result

I experimented with the updated trainSynth.py, and there was no significant difference in performance from the code before the update.

Is there any difference between the updated code and the previous version other than the Gaussian map generation, transform, and augmentation (the updated code uses only random crop)?

And is it only in my experiments that the performance is similar?

opened by woans0104 1
question on Weakly-Supervised Learning

Fig. 5 of the paper shows how the results are adjusted at each epoch. In terms of implementation, do we have to generate pseudo-labels at every epoch, or do we create them once with the model trained on the synthetic dataset and then not reflect them in the loss?

opened by woans0104 1
Dimensions of character boxes out of bounds on multiple epochs

Found a small bug: when the same image is visited in multiple epochs, random_scaling is applied every time, so the coordinates from the previous epoch get multiplied by the scale factor again. The character boxes eventually overshoot the image dimensions, and no bounding boxes, region maps, or affinity boxes are produced for that epoch.

Reason: The list indexing and slicing done in the load_image_gt_and_confidencemask function in the Synth80k class doesn't make a deep copy of the charbox coordinates, so the same coordinates get modified.

Solution : using deepcopy for making list copy at line

_charbox = copy.deepcopy(self.charbox[index]).transpose((2, 1, 0))

https://github.com/backtime92/CRAFT-Reimplementation/blob/fbaa63aebd61c2b290752102cdef4758891b1fb7/data/dataset.py#L39

Although I found it working on master branch: https://github.com/backtime92/CRAFT-Reimplementation/blob/d24fd6895fa7153506f1114afc3ced99f07f889a/data_loader.py#L406

As a result in say 10th epoch, all we have is this image as output with no character boxes:

After fixing with deepcopy, then it produces this after 10 epochs and random scaling :

@backtime92 . Pardon me for tagging here but if this is not fixed already(excuse me if that is already fixed), this can certainly increase accuracy since if this is not done, then in the later epochs, all that the network is seeing is a plain image with no input character regions.
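The aliasing described in this report can be reproduced without the repository code. The sketch below uses a hypothetical `TinyDataset` class and plain nested lists in place of the NumPy arrays held by Synth80k; it shows how in-place scaling corrupts the stored annotation when no copy is made, and how `copy.deepcopy` prevents that.

```python
import copy


class TinyDataset:
    """Stand-in for a dataset that stores character boxes per image (hypothetical)."""

    def __init__(self):
        # One image with one character box, given as four [x, y] corners.
        self.charbox = [[[[10.0, 10.0], [20.0, 10.0], [20.0, 20.0], [10.0, 20.0]]]]

    def load_aliased(self, index, scale):
        boxes = self.charbox[index]  # same objects as the stored annotation
        return self._scale_in_place(boxes, scale)

    def load_copied(self, index, scale):
        boxes = copy.deepcopy(self.charbox[index])  # independent copy
        return self._scale_in_place(boxes, scale)

    @staticmethod
    def _scale_in_place(boxes, scale):
        for box in boxes:
            for point in box:
                point[0] *= scale
                point[1] *= scale
        return boxes


if __name__ == "__main__":
    aliased, copied = TinyDataset(), TinyDataset()
    for epoch in range(3):
        aliased.load_aliased(0, 2.0)
        copied.load_copied(0, 2.0)
    print(aliased.charbox[0][0][1])  # stored corner has been scaled three times
    print(copied.charbox[0][0][1])   # stored corner is unchanged
```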

opened by vasusharma7 1
What is interim model

Hello in part 3.2.2 of the main paper related to this CRAFT-Reimplementation, the authors say : " When a real image with word-level annotations is provided, the learned interim model predicts the character region score of the cropped word images to generate character-level bounding boxes" As far as I realized this model is used for generating character-level bounding boxes for word images that do not have character annotation. My question is what is the interim model architecture and how is this model trained? Is the interim model trained with the cropped words in the synth-text images?

opened by Alikavari 0
tips for training icdar2015

Hello. I experimented with the icdar2015 dataset by referring to the code you shared. The results of my experiments are not good(ICDAR2015 f1-score 0.78). Do you have any important tips for improving the performance?

opened by woans0104 0
