The repo for the paper "I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection".

Last update: Jan 5, 2023


Overview

I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection

This is the repo for the paper "I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection". Implementations of I3CL with ViTAEv2, ResNet-50, and ResNet-50 w/ RegionCL backbones are included.

Updates

[2022/04/13] Published links to the training datasets.

[2022/04/11] Added SSL training code to this implementation.

[2022/04/09] Uploaded the training code for the ICDAR2019 ArT dataset. The GitHub repo is temporarily private.

Other applications of ViTAE Transformer: Image Classification | Object Detection | Semantic Segmentation | Animal Pose Estimation | Matting | Remote Sensing

Introduction

Existing methods for arbitrary-shaped text detection in natural scenes face two critical issues: 1) fractured detections at the gaps within a text instance, and 2) inaccurate detections of arbitrary-shaped text instances against diverse background context. To address these issues, we propose a novel method named Intra- and Inter-Instance Collaborative Learning (I3CL).

Specifically, to address the first issue, we design an effective convolutional module with multiple receptive fields, which collaboratively learns better character and gap feature representations at local and long ranges inside a text instance. To address the second issue, we devise an instance-based transformer module that exploits the dependencies between different text instances, and a global context module that exploits the semantic context from the shared background; together they learn more discriminative text feature representations. In this way, I3CL exploits intra- and inter-instance dependencies together in a unified, end-to-end trainable framework. In addition, to make full use of unlabeled data, we design an effective semi-supervised learning method that leverages pseudo-labels via an ensemble strategy.

Without bells and whistles, experimental results show that I3CL sets new state-of-the-art results on three challenging public benchmarks: an F-measure of 77.5% on ArT, 86.9% on Total-Text, and 86.4% on CTW-1500. Notably, I3CL with the ResNeSt-101 backbone ranked 1st on the ArT leaderboard.
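The "multiple receptive fields" idea can be made concrete with the standard receptive-field arithmetic for dilated convolutions. This is an illustration only: it assumes branches built from 3x3 convolutions with dilation rates 1, 2 and 3, which are hypothetical values, since this README does not describe the module's actual kernel shapes or rates. A k x k kernel with dilation d spans d(k - 1) + 1 pixels along each axis, and stride-1 layers applied in sequence add their spans minus one.

```python
from typing import Iterable, Tuple


def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Pixels spanned along one axis by a kernel with the given dilation."""
    return dilation * (kernel_size - 1) + 1


def stacked_receptive_field(layers: Iterable[Tuple[int, int]]) -> int:
    """Receptive field of stride-1 convolutions applied one after another.

    `layers` holds (kernel_size, dilation) pairs.
    """
    field = 1
    for kernel_size, dilation in layers:
        field += effective_kernel(kernel_size, dilation) - 1
    return field


# Hypothetical parallel branches: each keeps 3x3 = 9 weights per channel pair,
# but dilation widens the area it looks at.
for rate in (1, 2, 3):
    print(f"3x3 conv, dilation {rate}: spans {effective_kernel(3, rate)} px")

# The same three layers applied in sequence instead of in parallel.
print("stacked:", stacked_receptive_field([(3, 1), (3, 2), (3, 3)]), "px")
```

Parallel branches with dilations 1, 2 and 3 cover 3, 5 and 7 pixels with the same number of weights each, which is why dilation is a cheap way to mix local (character-level) context with longer-range context that can bridge the gaps between characters.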

Results

Example results from the paper.

Evaluation results of I3CL with different backbones on ArT. Notes: (1) In this repo, I3CL with ViTAE uses only one training stage, on the LSVT+MLT19+ArT training datasets. The ResNet series use three training stages, i.e., pre-training on SynthText, mixed training on ReCTS+RCTW+LSVT+MLT19+ArT, and finally fine-tuning on LSVT+MLT19+ArT. (2) The original implementation of the ResNet series is based on Detectron2. The results and model links of ResNet-50 in this implementation will be updated soon.

Backbone	Model Link	Training Data	Recall (%)	Precision (%)	F-measure (%)
ViTAEv2-S [this repo]	OneDrive / Baidu Wangpan (pw: w754)	LSVT,MLT19,ArT	75.4	82.8	78.9
ResNet-50 [paper]	-	SynthText,ReCTS,RCTW,LSVT,MLT19,ArT	71.3	82.7	76.6
ResNet-50 w/ RegionCL(finetuning) [paper]	-	SynthText,ReCTS,RCTW,LSVT,MLT19,ArT	72.6	81.9	77.0
ResNet-50 w/ RegionCL(w/o finetuning) [paper]	-	SynthText,ReCTS,RCTW,LSVT,MLT19,ArT	73.5	81.6	77.3
ResNeXt-101 [paper]	-	SynthText,ReCTS,RCTW,LSVT,MLT19,ArT	74.1	85.5	79.4
ResNeSt-101 [paper]	-	SynthText,ReCTS,RCTW,LSVT,MLT19,ArT	75.1	86.3	80.3
ResNeXt-151 [paper]	-	SynthText,ReCTS,RCTW,LSVT,MLT19,ArT	74.9	86.0	80.1
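The F-measure column is the harmonic mean of precision P and recall R, F = 2PR / (P + R). The short check below recomputes it from the Recall and Precision columns; the numbers are copied from the table above, and the script only confirms that each row is internally consistent after rounding to one decimal place.

```python
def f_measure(recall: float, precision: float) -> float:
    """Harmonic mean of precision and recall (inputs and output in %)."""
    return 2 * precision * recall / (precision + recall)


# (recall, precision, reported F-measure) copied from the table above.
ART_RESULTS = {
    "ViTAEv2-S [this repo]": (75.4, 82.8, 78.9),
    "ResNet-50 [paper]": (71.3, 82.7, 76.6),
    "ResNet-50 w/ RegionCL (finetuning) [paper]": (72.6, 81.9, 77.0),
    "ResNet-50 w/ RegionCL (w/o finetuning) [paper]": (73.5, 81.6, 77.3),
    "ResNeXt-101 [paper]": (74.1, 85.5, 79.4),
    "ResNeSt-101 [paper]": (75.1, 86.3, 80.3),
    "ResNeXt-151 [paper]": (74.9, 86.0, 80.1),
}

for backbone, (recall, precision, reported) in ART_RESULTS.items():
    computed = f_measure(recall, precision)
    status = "ok" if round(computed, 1) == reported else "MISMATCH"
    print(f"{backbone:48s} reported {reported:.1f}  computed {computed:.2f}  {status}")
```

All seven rows agree with the reported F-measure to within rounding.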

Usage

Install

Prerequisites:

Linux (macOS and Windows are not tested)

Python >= 3.6

PyTorch >= 1.8.1 (for the ViTAE implementation). Make sure the CUDA version used for compilation matches the runtime CUDA version.

GCC >= 5

MMCV (we use mmcv-full==1.4.3)

Create a conda virtual environment and activate it. Note that this implementation is based on mmdetection 2.20.0.

Install PyTorch and torchvision following the official instructions.

Install mmcv-full and timm. Refer to the mmcv documentation to install the build that matches your CUDA and PyTorch versions. For example:

pip install mmcv-full==1.4.3 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
pip install timm

Clone this repository and then install it:

git clone https://github.com/ViTAE-Transformer/ViTAE-Transformer-Scene-Text-Detection.git
cd ViTAE-Transformer-Scene-Text-Detection
pip install -r requirements/build.txt
pip install -r requirements/runtime.txt
pip install -v -e .
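After installation, it can help to confirm that the installed versions match the prerequisites above. The script below is a hypothetical helper, not part of the repo: it imports torch, mmcv and mmdet if present, compares their versions with the numbers listed in this README, and prints the CUDA version PyTorch was built with so you can compare it against the CUDA tag of the mmcv-full wheel (cu111 in the pip example). It assumes the editable install exposes the package as mmdet, as upstream mmdetection does.

```python
import importlib
import platform
from typing import List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """'1.9.0+cu111' -> (1, 9, 0); local tags and suffixes like 'rc1' are dropped."""
    numbers = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        numbers.append(int(digits) if digits else 0)
    return tuple(numbers)


def at_least(installed: str, required: str) -> bool:
    return parse_version(installed) >= parse_version(required)


def check_environment() -> List[Tuple[str, str, bool]]:
    """Return (component, version, ok) rows for the prerequisites in this README."""
    py = platform.python_version()
    rows = [("python", py, at_least(py, "3.6"))]
    try:
        torch = importlib.import_module("torch")
        rows.append(("torch", str(torch.__version__), at_least(str(torch.__version__), "1.8.1")))
        # CUDA toolkit PyTorch was compiled against (None for CPU-only builds);
        # "ok" here just means PyTorch can see a GPU.
        rows.append(("torch CUDA build", str(torch.version.cuda), torch.cuda.is_available()))
    except Exception:  # ImportError, or e.g. OSError from a broken CUDA setup
        rows.append(("torch", "not importable", False))
    # Compare major.minor with the versions this implementation was built on.
    for name, expected in (("mmcv", "1.4.3"), ("mmdet", "2.20.0")):
        try:
            version = str(importlib.import_module(name).__version__)
            rows.append((name, version, parse_version(version)[:2] == parse_version(expected)[:2]))
        except Exception:
            rows.append((name, "not importable", False))
    return rows


if __name__ == "__main__":
    for component, version, ok in check_environment():
        print(f"{component:18s} {version:16s} {'OK' if ok else 'CHECK'}")
```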

Preparation

Model:

To train I3CL yourself, download the pretrained backbone weights used in this implementation: ViTAEv2: OneDrive | Baidu Wangpan (pw: petb); ResNet-50 w/ RegionCL (finetuning): OneDrive | Baidu Wangpan (pw: y598); ResNet-50 w/ RegionCL (w/o finetuning): OneDrive | Baidu Wangpan (pw: cybh). For backbone initialization, put them in pretrained_model/ViTAE or pretrained_model/RegionCL, respectively.
The full I3CL model with the ViTAE backbone trained on ArT can be downloaded and placed in pretrained_model/I3CL.
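A small, hypothetical helper for the checkpoint layout described above: it creates the three folders named in this README and lists any checkpoints already placed in them. It assumes the downloads use the .pth extension, as the I3CL checkpoint in the inference command does; the actual backbone file names are not given here.

```python
from pathlib import Path
from typing import Dict, List

# Folders named in this README; which file goes where depends on the download.
MODEL_DIRS = ("ViTAE", "RegionCL", "I3CL")


def prepare_model_dirs(root: str = "pretrained_model") -> Dict[str, List[str]]:
    """Create the expected folders and list the .pth checkpoints found in each."""
    found = {}
    for name in MODEL_DIRS:
        folder = Path(root) / name
        folder.mkdir(parents=True, exist_ok=True)
        found[name] = sorted(p.name for p in folder.glob("*.pth"))
    return found


if __name__ == "__main__":
    for name, checkpoints in prepare_model_dirs().items():
        print(f"pretrained_model/{name}: {checkpoints or 'no .pth files yet'}")
```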

Data

COCO-format training datasets are used, including several offline-augmented ArT training sets. lsvt_test is only used to train the SSL (semi-supervised learning) model in the paper. Files named train_lossweight.json are the provided pseudo-labels for SSL training. Download the datasets referenced in the config files from the links below and put them in data/:

Dataset	Link (OneDrive)	Link (Baidu Wangpan 百度网盘)
art	Link	Link (pw:etif)
art_light	Link	Link (pw:mzrk)
art_noise	Link	Link (pw:scxi)
art_sig	Link	Link (pw:cdk8)
lsvt	Link	Link (pw:wly0)
lsvt_test	Link	Link (pw:8ha3)
icdar2019_mlt	Link	Link (pw:hmnj)
rctw	Link	Link (pw:ngge)
rects	Link	Link (pw:y00o)

The file structure should look like:

|- data
    |- art
    |   |- train_images
    |   |    |- *.jpg
    |   |- test_images
    |   |    |- *.jpg
    |   |- train.json
    |   |- train_lossweight.json
    |- art_light
    |   |- train_images
    |   |    |- *.jpg
    |   |- train.json
    |   |- train_lossweight.json
    ......
    |- lsvt
    |   |- train_images1
    |   |    |- *.jpg
    |   |- train_images2
    |   |    |- *.jpg
    |   |- train1.json
    |   |- train1_lossweight.json
    |   |- train2.json
    |   |- train2_lossweight.json
    |- lsvt_test
    |   |- train_images
    |   |    |- *.jpg
    |   |- train_lossweight.json
    ......
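Before launching training, a quick check that data/ matches the tree above can save a failed run. The hypothetical script below covers only the datasets spelled out in the tree (entries hidden behind "......" are left for you to add from your config files). It verifies that each image folder and annotation file exists and that the train*.json files carry the standard COCO top-level keys; the train*_lossweight.json pseudo-label files are only checked for presence, since their exact schema is not documented here, and pairing them with image folders is an assumption.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple

# (dataset folder, image folder, annotation file) entries copied from the tree above.
EXPECTED: List[Tuple[str, str, str]] = [
    ("art", "train_images", "train.json"),
    ("art", "train_images", "train_lossweight.json"),
    ("art_light", "train_images", "train.json"),
    ("art_light", "train_images", "train_lossweight.json"),
    ("lsvt", "train_images1", "train1.json"),
    ("lsvt", "train_images1", "train1_lossweight.json"),
    ("lsvt", "train_images2", "train2.json"),
    ("lsvt", "train_images2", "train2_lossweight.json"),
    ("lsvt_test", "train_images", "train_lossweight.json"),
]


def summarize_coco(json_path: Path) -> Dict[str, int]:
    """Count images, annotations and categories in a COCO-style file."""
    with open(json_path, encoding="utf-8") as f:
        coco = json.load(f)
    keys = ("images", "annotations", "categories")
    missing = [key for key in keys if key not in coco]
    if missing:
        raise ValueError(f"{json_path}: missing COCO keys {missing}")
    return {key: len(coco[key]) for key in keys}


def check_data_root(data_root: str = "data",
                    expected: List[Tuple[str, str, str]] = EXPECTED) -> List[str]:
    """Return a list of problems; an empty list means the layout looks complete."""
    root = Path(data_root)
    problems: List[str] = []
    for dataset, image_dir, ann_file in expected:
        if not (root / dataset / image_dir).is_dir():
            message = f"missing directory {dataset}/{image_dir}"
            if message not in problems:
                problems.append(message)
        ann_path = root / dataset / ann_file
        if not ann_path.is_file():
            problems.append(f"missing file {dataset}/{ann_file}")
        elif "lossweight" not in ann_file:
            try:
                summarize_coco(ann_path)
            except ValueError as err:  # also covers json.JSONDecodeError
                problems.append(str(err))
    return problems


if __name__ == "__main__":
    issues = check_data_root("data")
    print("\n".join(issues) if issues else "data/ layout looks complete")
```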

Training

Distributed training with 4 GPUs for the ViTAE backbone:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 tools/train.py \
configs/i3cl_vitae_fpn/i3cl_vitae_fpn_ms_train.py --launcher pytorch --work-dir ./out_dir/${your_dir}

Distributed training with 4 GPUs for the ResNet50 backbone:

stage1:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 tools/train.py \
configs/i3cl_r50_fpn/i3cl_r50_fpn_ms_pretrain.py --launcher pytorch --work-dir ./out_dir/art_r50_pretrain/

stage2:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 tools/train.py \
configs/i3cl_r50_fpn/i3cl_r50_fpn_ms_mixtrain.py --launcher pytorch --work-dir ./out_dir/art_r50_mixtrain/

stage3:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 tools/train.py \
configs/i3cl_r50_fpn/i3cl_r50_fpn_ms_finetune.py --launcher pytorch --work-dir ./out_dir/art_r50_finetune/

Distributed training with 4 GPUs for the ResNet50 w/ RegionCL backbone:

stage1:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 tools/train.py \
configs/i3cl_r50_regioncl_fpn/i3cl_r50_fpn_ms_pretrain.py --launcher pytorch --work-dir ./out_dir/art_r50_regioncl_pretrain/

stage2:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 tools/train.py \
configs/i3cl_r50_regioncl_fpn/i3cl_r50_fpn_ms_mixtrain.py --launcher pytorch --work-dir ./out_dir/art_r50_regioncl_mixtrain/

stage3:

python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 tools/train.py \
configs/i3cl_r50_regioncl_fpn/i3cl_r50_fpn_ms_finetune.py --launcher pytorch --work-dir ./out_dir/art_r50_regioncl_finetune/
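The three ResNet stages have to run one after another. The hypothetical driver below (not part of the repo) rebuilds the commands above from a config directory and a run name, prints them, and, when dry_run is False, runs them in order, stopping at the first stage that fails. It keeps the README's work-dir names in case later stage configs expect earlier outputs there; this README does not say how stages hand checkpoints to each other, so check load_from in each config.

```python
import subprocess
import sys
from typing import List

STAGES = ("pretrain", "mixtrain", "finetune")


def stage_command(config_dir: str, stage: str, run_name: str,
                  gpus: int = 4, port: int = 29500) -> List[str]:
    """Rebuild one of the launch commands above as an argument list."""
    return [
        sys.executable, "-m", "torch.distributed.launch",
        f"--nproc_per_node={gpus}", f"--master_port={port}",
        "tools/train.py", f"configs/{config_dir}/i3cl_r50_fpn_ms_{stage}.py",
        "--launcher", "pytorch",
        "--work-dir", f"./out_dir/{run_name}_{stage}/",
    ]


def run_all_stages(config_dir: str, run_name: str, dry_run: bool = True) -> List[List[str]]:
    """Run stage1 -> stage2 -> stage3 in order; stop at the first failure."""
    commands = [stage_command(config_dir, stage, run_name) for stage in STAGES]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands


if __name__ == "__main__":
    # ResNet-50 without RegionCL: run_all_stages("i3cl_r50_fpn", "art_r50")
    run_all_stages("i3cl_r50_regioncl_fpn", "art_r50_regioncl", dry_run=True)
```

Passing check=True is the design choice that matters: a crashed stage raises immediately instead of letting the next stage start without its expected inputs.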

Note:

If GPU memory is limited when training I3CL with the ViTAE backbone, adjust img_scale in the configuration file. A maximum scale of (800, 1333) fits a V100 (16 GB), and in practice the training scale has little effect on performance, so set it according to your hardware.
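One way to lower memory is to shrink every img_scale in the training pipeline's Resize step by a common factor. The helper below is a sketch over plain Python dicts in mmdetection 2.x style; the actual pipeline in configs/i3cl_vitae_fpn/i3cl_vitae_fpn_ms_train.py may use different scales and variable names, so the train_pipeline shown here is hypothetical. With keep_ratio=True, mmdetection treats each scale as long-edge and short-edge bounds, so the order inside a tuple does not matter.

```python
import copy
from typing import Any, Dict, List


def shrink_resize_scales(pipeline: List[Dict[str, Any]], factor: float) -> List[Dict[str, Any]]:
    """Return a copy of an mmdet-style pipeline with every Resize img_scale scaled.

    Handles a single scale tuple or a list of tuples; other steps are copied unchanged.
    """
    new_pipeline = copy.deepcopy(pipeline)
    for step in new_pipeline:
        if step.get("type") != "Resize" or "img_scale" not in step:
            continue
        scales = step["img_scale"]
        single = isinstance(scales, tuple)
        resized = [tuple(int(round(edge * factor)) for edge in scale)
                   for scale in ([scales] if single else scales)]
        step["img_scale"] = resized[0] if single else resized
    return new_pipeline


# Hypothetical training pipeline fragment in mmdetection 2.x style.
train_pipeline = [
    dict(type="LoadImageFromFile"),
    dict(type="Resize", img_scale=[(800, 1333), (640, 1066)],
         multiscale_mode="value", keep_ratio=True),
]
print(shrink_resize_scales(train_pipeline, 0.75)[1]["img_scale"])
```

Copy the printed scales back into the Resize step of your config; the original list is left untouched because the helper works on a deep copy.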

Inference

For example, to run our trained I3CL model on the ICDAR2019 ArT test set and save visualization images, txt-format records, and the json file for test submission, run:

python demo/art_demo.py --checkpoint pretrained_model/I3CL/vitae_epoch_12.pth --score-thr 0.45 --json_file art_submission.json

Note:

Upload the saved json file to the ICDAR2019-ArT evaluation website to obtain Recall, Precision, and F1 results. Change the paths for saving visualizations and txt files if needed.
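A malformed file wastes a submission, so a local sanity check can be worthwhile. The hypothetical checker below assumes the ICDAR2019-ArT detection layout as published on the competition site: a json object mapping "res_<image id>" keys to lists of {"points": [[x, y], ...], "confidence": c} entries. Verify that layout against the official instructions before relying on it. The score_thr default mirrors the --score-thr 0.45 passed to demo/art_demo.py; whether the demo already drops lower-scoring boxes is not stated here.

```python
import json
from typing import Any, Dict, List, Optional


def validate_art_submission(submission: Dict[str, Any],
                            score_thr: Optional[float] = None) -> List[str]:
    """Return problems found in an ArT-style detection submission (empty = looks fine)."""
    problems = []
    for key, detections in submission.items():
        if not key.startswith("res_"):
            problems.append(f"unexpected key {key!r} (expected 'res_<image id>')")
        if not isinstance(detections, list):
            problems.append(f"{key}: value should be a list of detections")
            continue
        for i, det in enumerate(detections):
            det = det if isinstance(det, dict) else {}
            points = det.get("points", [])
            if len(points) < 3 or not all(
                    isinstance(p, (list, tuple)) and len(p) == 2 for p in points):
                problems.append(f"{key}[{i}]: need a polygon of >= 3 [x, y] points")
            score = det.get("confidence")
            if score is None:
                problems.append(f"{key}[{i}]: missing confidence")
            elif score_thr is not None and score < score_thr:
                problems.append(f"{key}[{i}]: confidence {score} below {score_thr}")
    return problems


def validate_file(path: str, score_thr: Optional[float] = 0.45) -> List[str]:
    """Load a saved submission (e.g. art_submission.json) and validate it."""
    with open(path, encoding="utf-8") as f:
        return validate_art_submission(json.load(f), score_thr)
```

For example, validate_file("art_submission.json") returns an empty list when every entry has a well-formed polygon and a confidence at or above 0.45.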

Citation

This project is for research purposes only.

If you are interested in our work, please consider citing it (arXiv).

Please post issues to let us know if you encounter any problems.

Acknowledgement

Thanks to mmdetection.

You might also like...

Code Repo for the ACL21 paper "Common Sense Beyond English: Evaluating and Improving Multilingual LMs for Commonsense Reasoning"

Common Sense Beyond English: Evaluating and Improving Multilingual LMs for Commonsense Reasoning This is the Github repository of our paper, "Common S

19 Nov 30, 2022

This repo is the implementation of EMNLP'21 paper "SPARQLing Database Queries from Intermediate Question Decompositions" by Irina Saparina, Anton Osokin

SPARQLing Database Queries from Intermediate Question Decompositions This repo is the implementation of the following paper: SPARQLing Database Querie

20 Dec 19, 2022

Code repo for EMNLP21 paper "Zero-Shot Information Extraction as a Unified Text-to-Triple Translation"

Zero-Shot Information Extraction as a Unified Text-to-Triple Translation Source code repo for paper Zero-Shot Information Extraction as a Unified Text

88 Dec 31, 2022

Official Repo for ICCV2021 Paper: Learning to Regress Bodies from Images using Differentiable Semantic Rendering

[ICCV2021] Learning to Regress Bodies from Images using Differentiable Semantic Rendering Getting Started DSR has been implemented and tested on Ubunt

83 Nov 27, 2022

Official repo for BMVC2021 paper ASFormer: Transformer for Action Segmentation

ASFormer: Transformer for Action Segmentation This repo provides training & inference code for BMVC 2021 paper: ASFormer: Transformer for Action Segme

42 Dec 23, 2022

This repo is the code release of EMNLP 2021 conference paper "Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories".

Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories This repo is the code release of EMNLP 2021 con

12 Nov 22, 2022

Repo for the paper "DiLBERT: Cheap Embeddings for Disease Related Medical NLP"

DiLBERT Repo for the paper "DiLBERT: Cheap Embeddings for Disease Related Medical NLP" Pretrained Model The pretrained model presented in the paper is

2 Dec 15, 2022

Repo for the paper Extrapolating from a Single Image to a Thousand Classes using Distillation

Extrapolating from a Single Image to a Thousand Classes using Distillation by Yuki M. Asano* and Aaqib Saeed* (*Equal Contribution) Extrapolating from

16 Nov 4, 2022

Repo for WWW 2022 paper: Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval

BiDR Repo for WWW 2022 paper: Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval. Requirements torch==

11 Oct 20, 2022

Comments

How do I train my own model?

If I train on my own data, do I also need to download these datasets? Quoting the README: "Some offline augmented ArT training datasets are used. lsvt-test is only used to train SSL(Semi-Supervised Learning) model in paper. Files named train_lossweight.json are the provided pseudo-label for SSL training. You can download corresponding datasets in config file from here and put them in data/:"

opened by zhaoguoqing12 22
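Reproduced results differ from the table by more than one point?

Hello, I used python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 tools/train.py \ configs/i3cl_vitae_fpn/i3cl_vitae_fpn_ms_train.py --launcher pytorch --work-dir ./out_dir/${your_dir} to reproduce the ViTAE results. The training data were art_light, art_sig, art_noise, lsvt, and mlt19, the same as in the project, and the configuration was exactly the same, but the result differs by more than one point. What could be the reason?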

opened by 1700127 14
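Hello, text is not detected in similar regions; also, papers suited to Chinese text detection?

Hello, I learned a lot from reading your paper and code, and the paper has helped me a great deal, but I ran into a problem when running prediction on images. I converted a video into frames and used your algorithm to detect the text in them. In a few adjacent frames, text is detected in a region at first, then a few frames later suddenly no text is detected (the text region has exactly the same content and size as the one detected before), and later it is detected again. Is this caused by the training dataset, or by something else? I would appreciate a reply when you have time. Thank you very much! One more question: can you recommend other papers on detecting Chinese and English text?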

opened by aibohang 1


Owner

The official repo of the CVPR2021 oral paper: Representative Batch Normalization with Feature Calibration

The repo contains the code of the ACL2020 paper `Dice Loss for Data-imbalanced NLP Tasks`

The repo of the preprinting paper "Labels Are Not Perfect: Inferring Spatial Uncertainty in Object Detection"

This Repo is the official CUDA implementation of ICCV 2019 Oral paper for CARAFE: Content-Aware ReAssembly of FEatures

Repo for the Video Person Clustering dataset, and code for the associated paper

This repo contains the code and data used in the paper "Wizard of Search Engine: Access to Information Through Conversations with Search Engines"

This is the repo for the paper `SumGNN: Multi-typed Drug Interaction Prediction via Efficient Knowledge Graph Summarization'. (published in Bioinformatics'21)

This repo is a PyTorch implementation for Paper "Unsupervised Learning for Cuboid Shape Abstraction via Joint Segmentation from Point Clouds"

Recall Loss for Semantic Segmentation (This repo implements the paper: Recall Loss for Semantic Segmentation)

The official repo of the CVPR 2021 paper Group Collaborative Learning for Co-Salient Object Detection .