FOTS: Fast Oriented Text Spotting with a Unified Network
I am still working on this repo. Updates and detailed instructions are coming soon!
Table of Contents
TensorFlow Versions
So far, the pre-training code has been tested on TensorFlow 1.12, 1.14, and 1.15. I may implement a 2.x version in the future.
Other Requirements
GCC >= 6
Trained Models
- tmp pre-trained model
- trained model coming soon
Datasets
- pre-training
Synth800k (the dataset is only available for non-commercial research and educational purposes)
- finetuning
ICDAR 2015, 2017 MLT, 2013
Train
Pre-train with SynthText
- Download the pre-trained ResNet-50 from the TensorFlow-Slim image classification model library page and place it in the ckpt/resnet_v1_50 directory.
cd ckpt/resnet_v1_50
wget http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz
tar -zxvf resnet_v1_50_2016_08_28.tar.gz
rm resnet_v1_50_2016_08_28.tar.gz
- Download the Synth800k dataset and place it in the data/SynthText/ directory to pre-train the whole network.
- Transform (pre-process) the SynthText data into the ICDAR data format.
python data_provider/SynthText2ICDAR.py
- Train with SynthText for 10 epochs (with 1 GPU).
python train.py \
--max_steps=715625 \
--gpu_list='0' \
--checkpoint_path=ckpt/synthText_10eps/ \
--pretrained_model_path=ckpt/resnet_v1_50/resnet_v1_50.ckpt \
--training_img_data_dir=data/SynthText/ \
--training_gt_data_dir=data/SynthText/ \
--icdar=False
- Visualize pre-training progress with TensorBoard.
tensorboard --logdir=ckpt/synthText_10eps/
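The --max_steps=715625 value in the training command above can be related to the 10-epoch target with a little arithmetic. The sketch below assumes the full SynthText set of 858,750 images and a batch size of 12 per step. Neither number is stated in this README, and the batch size in particular is inferred only because it makes the figures agree, so check it against the defaults in train.py before relying on it.

```python
def steps_for_epochs(num_images, batch_size, epochs):
    """Number of training steps needed to see `num_images` samples `epochs` times."""
    total_samples = num_images * epochs
    # Integer ceiling division, so a final partial batch still counts as a step.
    return -(-total_samples // batch_size)

# Assumed values (not stated in this README):
SYNTHTEXT_IMAGES = 858750  # size of the full SynthText release
BATCH_SIZE = 12            # hypothetical; chosen because it reproduces 715625

max_steps = steps_for_epochs(SYNTHTEXT_IMAGES, BATCH_SIZE, epochs=10)
print(max_steps)
```

If you train with a different batch size or on a subset of the data, the same helper gives a starting point for a matching --max_steps value.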
Finetune with ICDAR 2015, ICDAR 2017 MLT or ICDAR 2013
(If you are using the pre-trained model, place all of its files in ckpt/synthText_10eps/.)
- Combine ICDAR data before training.
- Place ICDAR data under the tmp/ folder.
- Run the following script to combine the data.
python combine_ICDAR_data.py --year [year of ICDAR to train(13 or 15 or 17)]
- ICDAR 2017 MLT / pre-finetune for ICDAR 2013 or ICDAR 2015 (text detection task only)
- Train the pre-trained model with 9,000 images from the ICDAR 2017 MLT training and validation datasets (with 1 GPU).
python train.py \
--gpu_list='0' \
--checkpoint_path=ckpt/ICDAR17MLT/ \
--pretrained_model_path=ckpt/synthText_10eps/ \
--train_stage=0 \
--training_img_data_dir=data/ICDAR17MLT/imgs/ \
--training_gt_data_dir=data/ICDAR17MLT/gts/
- ICDAR 2015
- Train the model with 1,000 images from the ICDAR 2015 training dataset and 229 images from the ICDAR 2013 training dataset (with 1 GPU).
python train.py \
--gpu_list='0' \
--checkpoint_path=ckpt/ICDAR15/ \
--pretrained_model_path=ckpt/ICDAR17MLT/ \
--training_img_data_dir=data/ICDAR15+13/imgs/ \
--training_gt_data_dir=data/ICDAR15+13/gts/
- ICDAR 2013 (horizontal text only)
- Train the model with 229 images from the ICDAR 2013 training dataset (with 1 GPU).
python train.py \
--gpu_list='0' \
--checkpoint_path=ckpt/ICDAR13/ \
--pretrained_model_path=ckpt/ICDAR17MLT/ \
--training_img_data_dir=data/ICDAR13/imgs/ \
--training_gt_data_dir=data/ICDAR13/gts/
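The fine-tuning commands above form a chain: each stage initializes from the checkpoint directory written by an earlier one. The sketch below makes that chain explicit by assembling the same command lines from a small table. The paths and flags are copied from this README; the STAGES table and the build_command helper are hypothetical conveniences and are not part of the repo.

```python
# Each stage starts from the checkpoint directory written by an earlier stage.
# Paths and flags mirror the commands in this README; STAGES and build_command
# are illustrative helpers only.
STAGES = {
    "ICDAR17MLT": {
        "init_from": "ckpt/synthText_10eps/",
        "data": "data/ICDAR17MLT",
        "extra": ["--train_stage=0"],  # only the ICDAR 2017 MLT command passes this flag
    },
    "ICDAR15": {
        "init_from": "ckpt/ICDAR17MLT/",
        "data": "data/ICDAR15+13",
        "extra": [],
    },
    "ICDAR13": {
        "init_from": "ckpt/ICDAR17MLT/",
        "data": "data/ICDAR13",
        "extra": [],
    },
}

def build_command(stage, gpu_list="0"):
    """Return the train.py argument list for one fine-tuning stage."""
    cfg = STAGES[stage]
    return (
        ["python", "train.py",
         "--gpu_list=" + gpu_list,
         "--checkpoint_path=ckpt/" + stage + "/",
         "--pretrained_model_path=" + cfg["init_from"]]
        + cfg["extra"]
        + ["--training_img_data_dir=" + cfg["data"] + "/imgs/",
           "--training_gt_data_dir=" + cfg["data"] + "/gts/"]
    )

for name in STAGES:
    print(" ".join(build_command(name)))
```

The returned list can be passed to subprocess.run if you prefer to launch the stages from a script rather than typing each command by hand.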
Test
Place some images in the test_imgs/ directory and specify a trained checkpoint path to see the test results.
python test.py --test_data_path test_imgs/ --checkpoint_path [checkpoint path]
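Before running the test command above, it can help to confirm that both inputs are in place. The following is a minimal sketch under two assumptions that this README does not state: that the checkpoint path is a directory written by a TensorFlow 1.x saver (which normally leaves an index file named "checkpoint" in it), and that test.py reads common image extensions. The preflight helper and the extension list are hypothetical.

```python
import os

# Assumed set of image extensions; test.py may accept a different set.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")

def preflight(test_data_path, checkpoint_path):
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = []
    if not os.path.isdir(test_data_path):
        problems.append("missing image directory: " + test_data_path)
    else:
        images = [name for name in os.listdir(test_data_path)
                  if name.lower().endswith(IMAGE_EXTENSIONS)]
        if not images:
            problems.append("no images found in: " + test_data_path)
    # TensorFlow 1.x savers usually write a small index file named "checkpoint"
    # next to the model files; its absence suggests the path is wrong.
    if not os.path.isfile(os.path.join(checkpoint_path, "checkpoint")):
        problems.append("no 'checkpoint' index file in: " + checkpoint_path)
    return problems

for problem in preflight("test_imgs/", "ckpt/ICDAR15/"):
    print(problem)
```

Here ckpt/ICDAR15/ is only an example taken from the fine-tuning commands; substitute whichever checkpoint directory you trained.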