TensorFlow Implementation of FOTS, Fast Oriented Text Spotting with a Unified Network.

Overview

FOTS: Fast Oriented Text Spotting with a Unified Network

I am still working on this repo. updates and detailed instructions are coming soon!

Table of Contens

TensorFlow Versions

As for now, the pre-training code is tested on TensorFlow 1.12, 1.14 and 1.15. I may try to implement 2.x version in the future.

Other Requirements

GCC >= 6

Trained Models

Datasets

Train

Pre-train with SynthText

  1. Download pre-trained ResNet-50 from TensorFlow-Slim image classification model library page and place it at 'ckpt/resnet_v1_50' dir.
cd ckpt/resnet_v1_50
wget http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz
tar -zxvf resnet_v1_50_2016_08_28.tar.gz
rm resnet_v1_50_2016_08_28.tar.gz
  1. Download Synth800k dataset and place it at data/SynthText/ dir to pre-train the whole net.

  2. Transform(Pre-process) the SynthText data into the ICDAR data format.

python data_provider/SynthText2ICDAR.py
  1. Train with SynthText for 10 epochs(with 1 GPU).
python train.py \
  --max_steps=715625 \
  --gpu_list='0' \
  --checkpoint_path=ckpt/synthText_10eps/ \
  --pretrained_model_path=ckpt/resnet_v1_50/resnet_v1_50.ckpt \
  --training_img_data_dir=data/SynthText/ \
  --training_gt_data_dir=data/SynthText/ \
  --icdar=False \
  1. Visualize pre-pretraining progress with TensorBoard.
tensorboard --logdir=ckpt/synthText_10eps/

Finetune with ICDAR 2015, ICDAR 2017 MLT or ICDAR 2013

(if you are using the pre-trained model, place all of the files in ckpt/synthText_10eps/)

  • Combine ICDAR data before training.

    1. Place ICDAR data under tmp/ foler.
    2. Run the following script to combine the data.
    python combine_ICDAR_data.py --year [year of ICDAR to train(13 or 15 or 17)]
    
  • ICDAR 2017 MLT/pre-finetune for ICDAR 2013 or ICDAR 2015 (text detection task only)

    • Train the pre-trained model with 9,000 images from ICDAR 2017 MLT training and validation datasets(with 1 GPU).
    python train.py \
      --gpu_list='0' \
      --checkpoint_path=ckpt/ICDAR17MLT/ \
      --pretrained_model_path=ckpt/synthText_10eps/ \
      --train_stage=0 \
      --training_img_data_dir=data/ICDAR17MLT/imgs/ \
      --training_gt_data_dir=data/ICDAR17MLT/gts/
    
  • ICDAR 2015

    • Train the model with 1,000 images from ICDAR 2015 training dataset and 229 images from ICDAR 2013 training datasets(with 1 GPU).
    python train.py \
      --gpu_list='0' \
      --checkpoint_path=ckpt/ICDAR15/ \
      --pretrained_model_path=ckpt/ICDAR17MLT/ \
      --training_img_data_dir=data/ICDAR15+13/imgs/ \
      --training_gt_data_dir=data/ICDAR15+13/gts/
    
  • ICDAR 2013(horizontal text only)

    • Train the model with 229 images from ICDAR 2013 training datasets(with 1 GPU).
    python train.py \
      --gpu_list='0' \
      --checkpoint_path=ckpt/ICDAR13/ \
      --pretrained_model_path=ckpt/ICDAR17MLT/ \
      --training_img_data_dir=data/ICDAR13/imgs/ \
      --training_gt_data_dir=data/ICDAR13/gts/
    

Test

Place some images in test_imgs/ dir and specify a trained checkpoint path to see the test result.

python test.py --test_data_path test_imgs/ --checkpoint_path [checkpoint path]

References

Comments
  • how to support it for ICDAR2017

    how to support it for ICDAR2017

    Hi, I like your project and I trained it for ICDAR2017, but I met a problem.

    reading file error: data\ICDAR17MLT\gts\gt_ICDAR17MLT_img_199.txt ثوار substring not found

    ICDAR2017 DATA contains multi languages, such as Arabic, Korean. I know maybe I should modify the config.py, but I‘m not clear about how to do it. Could you please give me any suggestion? Thanks a lot.

    opened by wang-lei-cn 1
  • How to choose lambda correctly?

    How to choose lambda correctly?

    Hi, why did you choose to set lambda to 1 when calculating the total loss? I know that in the FOTS paper they set lambda to 1, but in other FOTS repos the value is often 0,01. Do you know in which interval the values of the recognition loss and the detection loss are?

    opened by daniiki 1
  • Restore error

    Restore error

    When I try to restore training using the previous checkpoint, I get the following error:

    continue training from previous checkpoint ERROR:tensorflow:Couldn't match files for checkpoint [my checkpoint path]/model.ckpt-20041 E1126 14:28:09.314913 139785224988480 checkpoint_management.py:346] Couldn't match files for checkpoint [my checkpoint path]/model.ckpt-20041 ... File "train.py", line 216, in main saver.restore(sess, ckpt) File "[python path]/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1277, in restore raise ValueError("Can't load save_path when it is None.") ValueError: Can't load save_path when it is None.

    How can I fix it? If I check the checkpoint after training is complete, only checkpoint, events.out.tfevents.1606212446, model.ckpt-20041.data-00000-of-00001 , model.ckpt-20041.index and model.ckpt-20041.meta are created. Even if an absolute path is used for the first training, multiple checkpoints are created and are not restored.

    Or am I misusing the following? python train.py --gpu_list='0' --checkpoint_path=[check point path (absolute path)]/ --training_img_data_dir=[images path]/ --training_gt_data_dir=[gts path]/ (I set FLAG.Restore to True in train.py)

    opened by ddongmogi 0
  • Recognition Error

    Recognition Error

    1. We are applying an additional Korean language when studying on config.py in order to do Korean ocr. Does fosts support multi-language features?

    2. Even though we completed the recognition test through the github readme, there seems to be a problem in detection, but we don’t know that is the cause of the error.

    opened by JaeD-Shin 7
  • CTC Loss label length is zero with ICDAR 2013

    CTC Loss label length is zero with ICDAR 2013

    My error is

    2020-11-15 22:28:43.704077: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at ctc_loss_op.cc:168 : Invalid argument: Labels length is zero in batch 25 Traceback (most recent call last): File "/home/nplab6/anaconda3/envs/tf115-cu100/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call return fn(*args) File "/home/nplab6/anaconda3/envs/tf115-cu100/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn target_list, run_metadata) File "/home/nplab6/anaconda3/envs/tf115-cu100/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun run_metadata) tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found. (0) Invalid argument: Labels length is zero in batch 25 [[{{node CTCLoss}}]] (1) Invalid argument: Labels length is zero in batch 25 [[{{node CTCLoss}}]] [[gradients/Mean_3_grad/Shape/_595]] 0 successful operations. 0 derived errors ignored.

    Error("CTC Loss label length is zero with ICDAR 2013") occurs when using ICDAR2013.

    ICDAR 2013 text label content for recognition training is 72,271,382,312,"Greenstead" (four digits).

    but ICDAR 2015 and 2017 label content is eight digits.

    Why is it different from ICDAR2013 and ICDAR015/2017?

    opened by JaeD-Shin 0
  • tf version

    tf version

    using your pretrained model, restore the weights, errorKey Conv/biases/ExponentialMovingAverage not found in checkpoint [[node save/RestoreV2 which version tensorflow are you using?

    opened by ggsonic 1
Owner
Masao Taketani
Deep Learning research engineer, currently working in Tokyo, Japan. An ex-boxer, who is highly motivated to train one's mind and body.
Masao Taketani
FOTS Pytorch Implementation

News!!! Recognition branch now is added into model. The whole project has beed optimized and refactored. ICDAR Dataset SynthText 800K Dataset detectio

Ning Lu 599 Dec 19, 2022
The code of "Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes"

Mask TextSpotter A Pytorch implementation of Mask TextSpotter along with its extension can be find here Introduction This is the official implementati

Pengyuan Lyu 261 Nov 21, 2022
An Implementation of the alogrithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection

InceptText-Tensorflow An Implementation of the alogrithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Orien

GeorgeJoe 115 Dec 12, 2022
This project modify tensorflow object detection api code to predict oriented bounding boxes. It can be used for scene text detection.

This is an oriented object detector based on tensorflow object detection API. Most of the code is not changed except for those related to the need of

Dafang He 30 Oct 22, 2022
Total Text Dataset. It consists of 1555 images with more than 3 different text orientations: Horizontal, Multi-Oriented, and Curved, one of a kind.

Total-Text-Dataset (Official site) Updated on April 29, 2020 (Detection leaderboard is updated - highlighted E2E methods. Thank you shine-lcy.) Update

Chee Seng Chan 671 Dec 27, 2022
text detection mainly based on ctpn model in tensorflow, id card detect, connectionist text proposal network

text-detection-ctpn Scene text detection based on ctpn (connectionist text proposal network). It is implemented in tensorflow. The origin paper can be

Shaohui Ruan 3.3k Dec 30, 2022
Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. This Neural Network (NN) model recognizes the text contained in the images of segmented words.

Handwritten-Text-Recognition Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. T

null 27 Jan 8, 2023
An Implementation of the seglink alogrithm in paper Detecting Oriented Text in Natural Images by Linking Segments

Tips: A more recent scene text detection algorithm: PixelLink, has been implemented here: https://github.com/ZJULearning/pixel_link Contents: Introduc

dengdan 484 Dec 7, 2022
This is a tensorflow re-implementation of PSENet: Shape Robust Text Detection with Progressive Scale Expansion Network.My blog:

PSENet: Shape Robust Text Detection with Progressive Scale Expansion Network Introduction This is a tensorflow re-implementation of PSENet: Shape Robu

Michael liu 498 Dec 30, 2022
RRD: Rotation-Sensitive Regression for Oriented Scene Text Detection

RRD: Rotation-Sensitive Regression for Oriented Scene Text Detection For more details, please refer to our paper. Citing Please cite the related works

Minghui Liao 102 Jun 29, 2022
Source code of RRPN ---- Arbitrary-Oriented Scene Text Detection via Rotation Proposals

Paper source Arbitrary-Oriented Scene Text Detection via Rotation Proposals https://arxiv.org/abs/1703.01086 News We update RRPN in pytorch 1.0! View

null 428 Nov 22, 2022
Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation

This is the official implementation of "Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation". For more details, please

Pengyuan Lyu 309 Dec 6, 2022
TextBoxes++: A Single-Shot Oriented Scene Text Detector

TextBoxes++: A Single-Shot Oriented Scene Text Detector Introduction This is an application for scene text detection (TextBoxes++) and recognition (CR

Minghui Liao 930 Jan 4, 2023
Repository for Scene Text Detection with Supervised Pyramid Context Network with tensorflow.

Scene-Text-Detection-with-SPCNET Unofficial repository for [Scene Text Detection with Supervised Pyramid Context Network][https://arxiv.org/abs/1811.0

null 121 Oct 15, 2021
TextBoxes: A Fast Text Detector with a Single Deep Neural Network https://github.com/MhLiao/TextBoxes 基于SSD改进的文本检测算法,textBoxes_note记录了之前整理的笔记。

TextBoxes: A Fast Text Detector with a Single Deep Neural Network Introduction This paper presents an end-to-end trainable fast scene text detector, n

zhangjing1 24 Apr 28, 2022
Code for the paper STN-OCR: A single Neural Network for Text Detection and Text Recognition

STN-OCR: A single Neural Network for Text Detection and Text Recognition This repository contains the code for the paper: STN-OCR: A single Neural Net

Christian Bartz 496 Jan 5, 2023
keras复现场景文本检测网络CPTN: 《Detecting Text in Natural Image with Connectionist Text Proposal Network》;欢迎试用,关注,并反馈问题...

keras-ctpn [TOC] 说明 预测 训练 例子 4.1 ICDAR2015 4.1.1 带侧边细化 4.1.2 不带带侧边细化 4.1.3 做数据增广-水平翻转 4.2 ICDAR2017 4.3 其它数据集 toDoList 总结 说明 本工程是keras实现的CPTN: Detecti

mick.yi 107 Jan 9, 2023
Detecting Text in Natural Image with Connectionist Text Proposal Network (ECCV'16)

Detecting Text in Natural Image with Connectionist Text Proposal Network The codes are used for implementing CTPN for scene text detection, described

Tian Zhi 1.3k Dec 22, 2022