CORE-Text: Improving Scene Text Detection with Contrastive Relational Reasoning
This repository is the official PyTorch implementation of CORE-Text, and contains demo training and evaluation scripts.
Requirements
- mmdetection == 2.13.0
- mmcv == 1.3.5
- pyclipper == 1.3.0
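The pinned versions above can be checked before training. The following is a minimal sketch, not part of the repository: it assumes Python 3.8+ (for `importlib.metadata`), and the distribution names are assumptions (mmdetection is published on PyPI as `mmdet`, and mmcv may be installed as either `mmcv` or `mmcv-full`). The helpers `installed_version` and `check_requirements` are hypothetical names introduced for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions from the requirements list. The distribution names are
# assumptions: mmdetection is published as "mmdet", and mmcv may be
# installed under either "mmcv" or "mmcv-full".
REQUIRED = {
    "mmdet": "2.13.0",
    "mmcv": "1.3.5",
    "pyclipper": "1.3.0",
}
ALIASES = {"mmcv": ("mmcv", "mmcv-full")}


def installed_version(name):
    """Return the installed version of `name` (or one of its aliases), else None."""
    for dist in ALIASES.get(name, (name,)):
        try:
            return version(dist)
        except PackageNotFoundError:
            continue
    return None


def check_requirements(required=REQUIRED, lookup=installed_version):
    """Return {package: (expected, found)} for each missing or mismatched package."""
    problems = {}
    for name, expected in required.items():
        found = lookup(name)
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_requirements().items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means all three packages match the pinned versions; a build-specific version suffix would be reported as a mismatch and should be reviewed by hand.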
Training Demo
Base (Mask R-CNN)
To train Base (Mask R-CNN) on a single node with 4 GPUs, run:
#!/usr/bin/env bash
GPUS=4
PORT=${PORT:-29500}
PYTHON=${PYTHON:-"python"}
CONFIG=configs/icdar2017mlt/base.py
WORK_DIR=work_dirs/mask_rcnn_r50_fpn_train_base
$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS \
--nnodes=1 --node_rank=0 --master_addr="localhost" \
--master_port=$PORT \
tools/train.py \
$CONFIG \
--no-validate \
--launcher pytorch \
--work-dir ${WORK_DIR} \
--seed 0
VRM
To train VRM on a single node with 4 GPUs, run:
#!/usr/bin/env bash
GPUS=4
PORT=${PORT:-29500}
PYTHON=${PYTHON:-"python"}
CONFIG=configs/icdar2017mlt/vrm.py
WORK_DIR=work_dirs/mask_rcnn_r50_fpn_train_vrm
$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS \
--nnodes=1 --node_rank=0 --master_addr="localhost" \
--master_port=$PORT \
tools/train.py \
$CONFIG \
--no-validate \
--launcher pytorch \
--work-dir ${WORK_DIR} \
--seed 0
CORE
To train CORE (ours) on a single node with 4 GPUs, run the pre-training stage followed by the training stage:
#!/usr/bin/env bash
GPUS=4
PORT=${PORT:-29500}
PYTHON=${PYTHON:-"python"}
# pre-training
CONFIG=configs/icdar2017mlt/core_pretrain.py
WORK_DIR=work_dirs/mask_rcnn_r50_fpn_train_core_pretrain
$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS \
--nnodes=1 --node_rank=0 --master_addr="localhost" \
--master_port=$PORT \
tools/train.py \
$CONFIG \
--no-validate \
--launcher pytorch \
--work-dir ${WORK_DIR} \
--seed 0
# training
CONFIG=configs/icdar2017mlt/core.py
WORK_DIR=work_dirs/mask_rcnn_r50_fpn_train_core
$PYTHON -m torch.distributed.launch --nproc_per_node=$GPUS \
--nnodes=1 --node_rank=0 --master_addr="localhost" \
--master_port=$PORT \
tools/train.py \
$CONFIG \
--no-validate \
--launcher pytorch \
--work-dir ${WORK_DIR} \
--seed 0
Evaluation Demo
To evaluate a checkpoint on a single node with 4 GPUs, set CONFIG and CHECKPOINT to the model under test and run:
GPUS=4
PORT=${PORT:-29500}
CONFIG=path/to/config
CHECKPOINT=path/to/checkpoint
python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT \
./tools/test.py $CONFIG $CHECKPOINT --launcher pytorch \
--eval segm \
--not-encode-mask \
--eval-options "jsonfile_prefix=path/to/work_dir/results/eval" "gt_path=data/icdar2017mlt/icdar2017mlt_gt.zip"
Dataset Format
We provide the COCO-format labels (ICDAR2017_train.json and ICDAR2017_val.json) and the ground-truth zip file (icdar2017mlt_gt.zip) for training and evaluation. The dataset directory is structured as follows:
data
└── icdar2017mlt
    ├── annotations
    │   ├── ICDAR2017_train.json
    │   └── ICDAR2017_val.json
    ├── icdar2017mlt_gt.zip
    └── image
        ├── train
        └── val
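Before launching training or evaluation, it can help to confirm that the layout above is in place. The following is a minimal sketch under the assumption that the dataset lives at `data/icdar2017mlt` relative to the working directory; `missing_entries` is a hypothetical helper, not a function from this repository.

```python
from pathlib import Path

# Relative paths taken from the directory tree in this section.
EXPECTED_FILES = (
    "annotations/ICDAR2017_train.json",
    "annotations/ICDAR2017_val.json",
    "icdar2017mlt_gt.zip",
)
EXPECTED_DIRS = ("image/train", "image/val")


def missing_entries(root="data/icdar2017mlt"):
    """Return the expected files and directories that are absent under `root`."""
    root = Path(root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    missing += [p for p in EXPECTED_DIRS if not (root / p).is_dir()]
    return missing


if __name__ == "__main__":
    absent = missing_entries()
    print("dataset layout OK" if not absent else f"missing: {absent}")
```

The check only covers the presence of files and directories; it does not validate the contents of the annotation files or the zip archive.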
Results
Our model achieves the following performance on the ICDAR 2017 MLT val set. Note that the results differ slightly (~0.1%) from those reported in the paper, because we reimplemented the code on top of the open-source mmdetection.
Method | Backbone | Training set | Test set | Hmean | Precision | Recall | Download |
---|---|---|---|---|---|---|---|
Base (Mask R-CNN) | ResNet50 | ICDAR 2017 MLT Train | ICDAR 2017 MLT Val | 0.800 | 0.828 | 0.773 | model, log |
VRM | ResNet50 | ICDAR 2017 MLT Train | ICDAR 2017 MLT Val | 0.812 | 0.853 | 0.774 | model, log |
CORE (ours) | ResNet50 | ICDAR 2017 MLT Train | ICDAR 2017 MLT Val | 0.821 | 0.872 | 0.777 | model, log |
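The Hmean column is consistent with the harmonic mean of precision and recall, Hmean = 2PR / (P + R). The following is a minimal sketch that recomputes it from the rounded table values; it assumes this standard definition and is not the evaluation code used by the repository. Because precision and recall are rounded to three decimals, the recomputed value can differ from the reported one in the last digit.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported Hmean) copied from the results table.
ROWS = {
    "Base (Mask R-CNN)": (0.828, 0.773, 0.800),
    "VRM": (0.853, 0.774, 0.812),
    "CORE (ours)": (0.872, 0.777, 0.821),
}

for name, (p, r, reported) in ROWS.items():
    # Inputs are rounded to three decimals, so a small gap is expected.
    print(f"{name}: recomputed {hmean(p, r):.4f}, reported {reported:.3f}")
```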
Citation
@inproceedings{9428457,
author={Lin, Jingyang and Pan, Yingwei and Lai, Rongfeng and Yang, Xuehang and Chao, Hongyang and Yao, Ting},
booktitle={2021 IEEE International Conference on Multimedia and Expo (ICME)},
title={Core-Text: Improving Scene Text Detection with Contrastive Relational Reasoning},
year={2021},
pages={1-6},
doi={10.1109/ICME51207.2021.9428457}
}