CRAFT-Pytorch: Character Region Awareness for Text Detection, Reimplementation for PyTorch

Overview

CRAFT-Reimplementation

Note: If you have any problems, please leave a comment. You can also join our WeChat group; the QR code is kept up to date in issue #49.

Reimplementation: Character Region Awareness for Text Detection, reimplemented in PyTorch

Character Region Awareness for Text Detection

Youngmin Baek, Bado Lee, Dongyoon Han, Sangdoo Yun, Hwalsuk Lee (Submitted on 3 Apr 2019)

The full paper is available at: https://arxiv.org/pdf/1904.01941.pdf

Install Requirements:

1. PyTorch>=0.4.1
2. torchvision>=0.2.1
3. opencv-python>=3.4.2
4. See requirements.txt for the remaining dependencies
5. 4 NVIDIA GPUs (we use 4 NVIDIA Titan X)
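
The version floors above can be checked before training with a small script. This is a minimal sketch, not part of the repository: it assumes the packages are installed under the PyPI distribution names `torch`, `torchvision`, and `opencv-python`, and the helper names (`version_tuple`, `meets_minimum`, `check_requirements`) are hypothetical. It compares only the leading numeric part of each version string, so local suffixes such as `+cu117` are ignored.

```python
import re
from importlib import metadata

# Minimum versions copied from the requirements list above.
# The distribution names are an assumption about how the packages were installed.
MINIMUMS = {
    "torch": "0.4.1",
    "torchvision": "0.2.1",
    "opencv-python": "3.4.2",
}


def version_tuple(text):
    """Return the leading numeric components, e.g. '1.13.0+cu117' -> (1, 13, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_requirements(minimums=MINIMUMS):
    """Map each package name to 'ok', 'missing', or 'too old (<version>)'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = "too old (%s)" % installed
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(name, status)
```

The script needs Python 3.8 or newer for `importlib.metadata`. It does not check the GPU requirement.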

Pre-trained models:

NOTE: These are old pre-trained models. I will upload links to the newly trained models.
Syndata: Syndata for Baidu Drive || Syndata for Google Drive
Syndata+IC15: Syndata+IC15 for Baidu Drive || Syndata+IC15 for Google Drive
Syndata+IC13+IC17: Syndata+IC13+IC17 for Baidu Drive || Syndata+IC13+IC17 for Google Drive

Training

Note: When you train on the IC15 data or the MLT data, please see the annotations in data_loader.py (line 92 and lines 108-112).

Train on Syndata

  • Download the Syndata dataset (I will give the link).
  • Change the path in the basenet/vgg16_bn.py file:

(/data/CRAFT-pytorch/vgg16_bn-6c64b313.pth -> /your_path/vgg16_bn-6c64b313.pth). You can download the model here: baidu || google

  • Change the paths in the trainSyndata.py file:

(1. /data/CRAFT-pytorch/SynthText -> /your_path/SynthText
 2. /data/CRAFT-pytorch/synweights/synweights -> /your_path/real_weights)

  • Run python trainSyndata.py
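
Because the paths are hard-coded in the scripts, a typo usually shows up only once training starts. The following sketch, which is not part of the repository, lists any configured path that does not exist yet. The `/your_path/...` entries are the placeholders from the steps above, and `missing_paths` is a hypothetical helper name.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk, as strings."""
    return [str(p) for p in map(Path, paths) if not p.exists()]


if __name__ == "__main__":
    # Placeholders from the steps above; replace /your_path with a real location.
    required = [
        "/your_path/vgg16_bn-6c64b313.pth",
        "/your_path/SynthText",
        "/your_path/real_weights",
    ]
    for path in missing_paths(required):
        print("missing:", path)
```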

Train on IC15 data starting from the Syndata pre-trained model

  • Download the IC15 data, then rename the image folder and the gt folder to ch4_training_images and ch4_training_localization_transcription_gt, respectively.
  • Change the path in the basenet/vgg16_bn.py file:

(/data/CRAFT-pytorch/vgg16_bn-6c64b313.pth -> /your_path/vgg16_bn-6c64b313.pth). You can download the model here: baidu || google

  • Change the dataset and output paths in the trainic15data.py file:

(1. /data/CRAFT-pytorch/SynthText -> /your_path/SynthText
 2. /data/CRAFT-pytorch/real_weights -> /your_path/real_weights)

  • Change the pre-trained model and IC15 data paths in the trainic15data.py file:

(1. /data/CRAFT-pytorch/1-7.pth -> /your_path/your_pre-trained_model_name
 2. /data/CRAFT-pytorch/icdar1317 -> /your_ic15data_path/)

  • Run python trainic15data.py
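
The renaming step for the IC15 data can be scripted. This sketch is not part of the repository. It assumes the images and ground-truth files were extracted into two directories under a common root. The target names come from the step above, while the source directory names and the `rename_ic15_dirs` helper are hypothetical.

```python
from pathlib import Path

# Target names taken from the IC15 step above.
IC15_NAMES = {
    "images": "ch4_training_images",
    "gt": "ch4_training_localization_transcription_gt",
}


def rename_ic15_dirs(root, image_dir, gt_dir):
    """Rename the extracted image and gt directories under `root`.

    `image_dir` and `gt_dir` are whatever names the archives were extracted to
    (assumed, not prescribed by the README). Returns the new paths.
    """
    root = Path(root)
    renamed = {}
    for source, key in ((image_dir, "images"), (gt_dir, "gt")):
        src = root / source
        dst = root / IC15_NAMES[key]
        if src != dst:
            if dst.exists():
                raise FileExistsError("%s already exists" % dst)
            src.rename(dst)  # raises FileNotFoundError if src is absent
        renamed[key] = dst
    return renamed
```

For example, `rename_ic15_dirs("/your_ic15data_path", "images", "gt")` would leave the two directories under the names the training script expects, assuming they were extracted as `images` and `gt`.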

Train on IC13+17 data starting from the Syndata pre-trained model

  • Download the MLT data, then rename the image folder and the gt folder.
  • Change the path in the basenet/vgg16_bn.py file:

(/data/CRAFT-pytorch/vgg16_bn-6c64b313.pth -> /your_path/vgg16_bn-6c64b313.pth). You can download the model here: baidu || google

  • Change the dataset and output paths in the trainic-MLT_data.py file:

(1. /data/CRAFT-pytorch/SynthText -> /your_path/SynthText
 2. savemodel path -> your savemodel path)

  • Change the pre-trained model and data paths in the trainic-MLT_data.py file:

(1. /data/CRAFT-pytorch/1-7.pth -> /your_path/your_pre-trained_model_name
 2. /data/CRAFT-pytorch/icdar1317 -> /your_ic15data_path/)

  • Run python trainic-MLT_data.py

If you want to train with weak supervision using our Syndata pre-trained model:

1. First download the model pre-trained on Syndata: baidu || google.
2. Change the data path and the pre-trained model path.
3. Run python trainic15data.py

This code supports Syndata and ICDAR2015. We will release the training code for IC13 and IC17 as soon as possible.
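
For context, the CRAFT paper weights each weakly labelled word by a confidence score, s_conf(w) = (l(w) - min(l(w), |l(w) - l_c(w)|)) / l(w), where l(w) is the annotated word length and l_c(w) is the number of characters the interim model found. The sketch below illustrates that formula as stated in the paper. It is not taken from this repository's code, and the function names are hypothetical.

```python
def word_confidence(word_length, predicted_chars):
    """Confidence of a pseudo-labelled word, following the formula in the CRAFT paper.

    word_length:     number of characters in the word annotation, l(w)
    predicted_chars: number of character boxes the interim model produced, l_c(w)
    """
    if word_length <= 0:
        raise ValueError("word_length must be positive")
    error = abs(word_length - predicted_chars)
    return (word_length - min(word_length, error)) / word_length


def effective_confidence(word_length, predicted_chars, threshold=0.5):
    """Confidence actually used for training.

    Per the paper, when the score falls below 0.5 the predicted character boxes
    are discarded, the word box is split evenly into l(w) characters, and the
    confidence is set to 0.5.
    """
    conf = word_confidence(word_length, predicted_chars)
    return conf if conf >= threshold else threshold
```

A word of length 5 for which 3 characters were found gets a confidence of 0.6. A word of length 4 for which 10 characters were found scores 0 and falls back to 0.5.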

| Method | Dataset | Recall | Precision | H-mean |
| --- | --- | --- | --- | --- |
| Syndata | ICDAR13 | 71.93% | 81.31% | 76.33% |
| Syndata+IC15 | ICDAR15 | 76.12% | 84.55% | 80.11% |
| Syndata+MLT (deteval) | ICDAR13 | 86.81% | 95.28% | 90.85% |
| Syndata+MLT (deteval) (new gaussian map method) | ICDAR13 | 90.67% | 94.56% | 92.57% |
| Syndata+IC15 (new gaussian map method) | ICDAR15 | 80.36% | 84.25% | 82.26% |
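
H-mean is the harmonic mean of recall and precision, 2PR / (P + R). The short check below recomputes it for each reported row. It is an illustrative sketch rather than the evaluation script used for these results, and `h_mean` is a hypothetical helper name.

```python
def h_mean(recall, precision):
    """Harmonic mean of recall and precision (both in the same unit)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (recall %, precision %, reported H-mean %) copied from the results above.
ROWS = [
    (71.93, 81.31, 76.33),
    (76.12, 84.55, 80.11),
    (86.81, 95.28, 90.85),
    (90.67, 94.56, 92.57),
    (80.36, 84.25, 82.26),
]

for recall, precision, reported in ROWS:
    computed = h_mean(recall, precision)
    print("%.2f %.2f -> %.2f (reported %.2f)" % (recall, precision, computed, reported))
```

All five reported H-mean values agree with the recomputed ones to two decimal places.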

We have released the latest code with the new Gaussian map and random crop algorithms.

Note: The new Gaussian map method can separate the Gaussian regions in the region score map at inference time.

Note: We have solved the problem with detecting large words and are now training the model. Issues and advice are welcome.


Contributing to the project

We will release training code as soon as possible. We have not yet reached the results reported in the authors' paper. Pull requests and issues are welcome, and we would appreciate any advice on the project.

Acknowledgement

Thanks to Youngmin Baek, Bado Lee, Dongyoon Han, Sangdoo Yun, and Hwalsuk Lee for their excellent work and for the test code. This repo uses the basenet and test code from the authors' repo.

License

For commercial use, please contact us.

Comments
  • LinkRefiner

    @backtime92 clovaai just released the LinkRefiner code (clovaai/CRAFT-pytorch@3cd65f5). It would be good to implement it here, along with an option to train it, so that we can detect text lines.

    opened by ghost 4
  • question of perspective transform

    @backtime92 Thanks for updating the code. I have a few questions.

    1. Why is it necessary to make the character box 1.5 times larger with enlargebox?

    https://github.com/backtime92/CRAFT-Reimplementation/blob/fbaa63aebd61c2b290752102cdef4758891b1fb7/gaussianMap/gaussian.py#L104

    2. What is the reason for shifting the box using the variable top_left? https://github.com/backtime92/CRAFT-Reimplementation/blob/fbaa63aebd61c2b290752102cdef4758891b1fb7/gaussianMap/gaussian.py#L113-L115
    opened by woans0104 3
  • Gaussian map have some problem with my model

    I ported your code to MXNet, but when I try to train a model on SynthText, the Gaussian map has a problem: the right side of the page is lost. @backtime92 can you give some advice?

    opened by dont32 3
  • loss nan

    I trained with the SynthText data, and the loss became NaN after more than 20,000 epochs. Can anyone give me advice on this issue?

    3000 0.1077
    4000 0.1032
    5000 0.1003
    6000 0.0984
    7000 0.0970
    8000 0.0956
    9000 0.0948
    10000 0.0940
    11000 0.0933
    12000 0.0925
    13000 0.0919
    14000 0.0914
    15000 0.0909
    16000 0.0904
    17000 0.0900
    18000 0.0897
    19000 0.0893
    20000 0.0889
    21000 0.0884
    22000 0.0878
    23000 nan
    24000 nan
    25000 nan
    26000 nan
    27000 nan
    28000 nan
    29000 nan
    30000 nan
    31000 nan
    32000 nan
    33000 nan
    34000 nan
    35000 nan
    36000 nan
    37000 nan
    38000 nan
    39000 nan
    40000 nan
    41000 nan
    42000 nan
    43000 nan
    44000 nan
    45000 nan
    46000 nan
    47000 nan
    48000 nan
    49000 nan
    50000 nan
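
A log like this can be scanned for the first non-finite loss, which narrows down where to look. The sketch below assumes the log is a whitespace-separated sequence of step and loss pairs, as in the excerpt above. `first_nonfinite_step` is a hypothetical helper, not part of the repository, and it does not diagnose the cause of the divergence.

```python
import math


def first_nonfinite_step(log_text):
    """Return the first step whose loss is NaN or infinite, or None if all are finite.

    `log_text` is expected to alternate step and loss values separated by whitespace.
    """
    tokens = log_text.split()
    for step, loss in zip(tokens[0::2], tokens[1::2]):
        if not math.isfinite(float(loss)):
            return int(step)
    return None


if __name__ == "__main__":
    excerpt = "21000 0.0884 22000 0.0878 23000 nan 24000 nan"
    print(first_nonfinite_step(excerpt))  # 23000
```

In the log above the first NaN is recorded at step 23000, so the divergence happened somewhere after step 22000.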

    opened by woans0104 2
  • Difference in eval score of ic15 pretrained model

    Thanks for the code and model disclosure.

    Test results performed with the IC15 pretrained model are different from those recorded.

    I downloaded your ic15 pretrained model from Google Drive. After executing test function, evaluation function was executed.

    In 'readme', recall = 76%, precision = 84%, hmean = 80% for ICDAR2015 dataset. But the evaluation result I did with the model is recall = 70.5%, precision = 80.3%, hmean = 75.1%.

    Why are the results different? Is the model that measured the score different from the model of google drive?

    opened by huntu0042 2
  • can't load `Syndata.pth`: `load_state_dict Missing Keys`

    ~~How is Syndata.pth different from vgg16_bn-6c64b313.pth? My guess is vgg16_bn-6c64b313.pth is just for vgg16 and Syndata.pth is for the rest?~~

    When I use Syndata.pth in vgg16_bn.py, I get the error load_state_dict Missing keys ..., but when I use vgg16_bn-6c64b313.pth, the model loads properly.

    1. Is Syndata.pth for the same model as vgg16_bn-6c64b313.pth?
    2. Why am I getting this error?
    opened by ThisIsIsaac 2
  • Have you sorted boundary boxes?

    Can you please let me know whether boundary box sorting is done? If not, can you please help me with it? I am working on something and want to get the boundary boxes sorted. I would really appreciate it.

    opened by nbhupendra 2
  • Address the training speed when train ICDAR2015 dataset

    Hi, I am trying to train on the ICDAR2015 dataset. Thanks for your work! I found that training is quite slow. The problem is in the data loader (the number of workers is set to 0). However, if I increase it, I hit the error below when generating the pseudo labels.

      File "/home/craft_reimplementation/data_loader.py", line 125, in inference_pursedo_bboxes
        img_torch = img_torch.type(torch.FloatTensor).cuda()
    RuntimeError: CUDA error: initialization error

    Do you have any ideas for solving this issue?

    opened by lianqing11 2
  • Operands could not be broadcast together with shapes (512,512) (306,220)

    Hi author, thanks for open-sourcing this.

    When I train on the icdar15 dataset, I hit this problem:

    "operands could not be broadcast together with shapes (512,512) (306,220). "

    Have you also run into this problem? And do you know which line of code causes this warning?

    opened by lianqing11 2
  • question on trainSynth.py result

    I experimented with the updated trainSynth.py, and there was no significant difference in performance from the code before the update.

    Is there any difference between the updated code and the previous version other than the Gaussian map generation, transform, and augmentation (the updated code only uses random crop)?

    And is it only in my experiments that the performance is similar?

    opened by woans0104 1
  • question on Weakly-Supervised Learning

    Fig. 5 of the paper shows the results of adjustment for each epoch. In terms of implementation, do we have to make pseudo-labels at each epoch, or do we create them once with the model trained on the synthetic dataset and then not reflect them in the loss?

    opened by woans0104 1
  • Dimensions of character boxes out of bounds on multiple epochs

    Found a small bug: when the same image is visited in multiple epochs, random_scaling is applied every time, and the coordinates from the previous epoch get multiplied by the scale factor again. The character boxes eventually overshoot the image dimensions, so no bounding boxes, region boxes, or affinity boxes are made for that epoch.

    Reason: the list indexing and slicing done in the load_image_gt_and_confidencemask function of the Synth80k class does not make a deep copy of the charbox coordinates, so the same coordinates get modified.

    Solution: use deepcopy to copy the list at this line:

    _charbox = copy.deepcopy(self.charbox[index]).transpose((2, 1, 0))

    https://github.com/backtime92/CRAFT-Reimplementation/blob/fbaa63aebd61c2b290752102cdef4758891b1fb7/data/dataset.py#L39

    It already works on the master branch, though: https://github.com/backtime92/CRAFT-Reimplementation/blob/d24fd6895fa7153506f1114afc3ced99f07f889a/data_loader.py#L406

    As a result, in say the 10th epoch, all we have is this image as output, with no character boxes: ballet_106_0

    After fixing with deepcopy, it produces this after 10 epochs and random scaling: ballet_106_0

    @backtime92, pardon me for tagging you here (and excuse me if this is already fixed), but fixing this can certainly increase accuracy. Otherwise, in the later epochs, all the network sees is a plain image with no input character regions.
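
The aliasing described in this report can be reproduced without the dataset. The sketch below uses nested Python lists as a stand-in for the charbox arrays. That is an assumption for illustration only, since the repository stores them as NumPy arrays, where basic slicing likewise returns a view that shares memory. `random_scale_in_place` and `make_dataset` are hypothetical names.

```python
import copy


def random_scale_in_place(char_boxes, factor):
    """Multiply every (x, y) coordinate by `factor`, modifying the boxes in place."""
    for box in char_boxes:
        for point in box:
            point[0] *= factor
            point[1] *= factor
    return char_boxes


def make_dataset():
    # One image holding one character box, given as four (x, y) corners.
    return [[[[10.0, 20.0], [30.0, 20.0], [30.0, 40.0], [10.0, 40.0]]]]


# Buggy pattern: slicing copies only the outer list, so the points are shared
# with the stored annotation and each epoch scales the already scaled values.
buggy = make_dataset()
for _epoch in range(3):
    boxes = buggy[0][:]
    random_scale_in_place(boxes, 2.0)
print(buggy[0][0][0])  # [80.0, 160.0]: the stored box has drifted

# Fixed pattern: deepcopy gives each epoch its own coordinates to modify.
fixed = make_dataset()
for _epoch in range(3):
    boxes = copy.deepcopy(fixed[0])
    random_scale_in_place(boxes, 2.0)
print(fixed[0][0][0])  # [10.0, 20.0]: the stored box is unchanged
```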

    opened by vasusharma7 1
  • What is interim model

    Hello. In Section 3.2.2 of the paper that this CRAFT-Reimplementation is based on, the authors say: "When a real image with word-level annotations is provided, the learned interim model predicts the character region score of the cropped word images to generate character-level bounding boxes". As far as I understand, this model is used to generate character-level bounding boxes for word images that have no character annotations. My question is: what is the architecture of the interim model, and how is it trained? Is the interim model trained on cropped words from the SynthText images?

    opened by Alikavari 0
  • tips for training icdar2015

    Hello. I experimented with the icdar2015 dataset by referring to the code you shared. The results of my experiments are not good (ICDAR2015 f1-score 0.78). Do you have any important tips for improving the performance?

    opened by woans0104 0