1st Solution For ICDAR 2021 Competition on Mathematical Formula Detection

yuxzho

Last update: Dec 25, 2022

Overview

About The Project

This project releases our 1st place solution to the ICDAR 2021 Competition on Mathematical Formula Detection. Our solution is implemented on MMDetection, an open source object detection toolbox based on PyTorch. See the competition page for more details about this competition.

Method Description

We built our approach on FCOS, a simple and strong anchor-free object detector, with ResNeSt as the backbone, to detect embedded and isolated formulas. We employed ATSS as our sampling strategy instead of random sampling to reduce the effects of sample imbalance. We also studied how different FPN levels influence the detection results. Generalized Focal Loss is adopted as our loss function. Finally, with a series of useful tricks and model ensembling, our method ranked 1st in the MFD task.
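As a rough illustration of the loss mentioned above, the two components of Generalized Focal Loss (Quality Focal Loss and Distribution Focal Loss, as defined in the GFL paper cited below) can be written for a single prediction as follows. This is a minimal scalar sketch, not the MMDetection implementation used in this project; the function names and the `eps` clamp are assumptions made for the example.

```python
import math


def quality_focal_loss(sigma, y, beta=2.0, eps=1e-12):
    """QFL for one predicted score `sigma` and a soft quality target `y` in [0, 1].

    QFL = -|y - sigma|^beta * ((1 - y) * log(1 - sigma) + y * log(sigma))
    `eps` only guards against log(0); it is an assumption of this sketch.
    """
    cross_entropy = -(
        (1.0 - y) * math.log(max(1.0 - sigma, eps))
        + y * math.log(max(sigma, eps))
    )
    return abs(y - sigma) ** beta * cross_entropy


def distribution_focal_loss(p_left, p_right, y, y_left, y_right):
    """DFL for a regression target `y` lying between two adjacent bins.

    `p_left` and `p_right` are the predicted probabilities of the bins
    `y_left` and `y_right` (with y_left <= y <= y_right).
    """
    return -(
        (y_right - y) * math.log(p_left)
        + (y - y_left) * math.log(p_right)
    )


if __name__ == "__main__":
    # A prediction that matches its quality target gives zero QFL.
    print(quality_focal_loss(sigma=0.5, y=0.5))
    # Target 2.5 sits halfway between bins 2 and 3.
    print(distribution_focal_loss(0.5, 0.5, y=2.5, y_left=2.0, y_right=3.0))
```

The modulating factor `|y - sigma|^beta` shrinks the loss for predictions that are already close to their quality target, which is the property that lets the classification score double as a localization-quality estimate.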

Figure: random sampling (left) vs. ATSS (right).
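To make the contrast with random sampling concrete, the core ATSS rule from the cited ATSS paper can be sketched as below: for one ground-truth box, the IoU threshold is the mean plus the standard deviation of the IoUs of its candidate anchors, and a candidate is positive only if it passes the threshold and its center lies inside the box. The function name and input layout are hypothetical, the candidate pre-selection (top-k closest anchors per pyramid level) is assumed to have happened already, and the use of the sample standard deviation is an assumption of this sketch.

```python
import statistics


def atss_select(candidate_ious, center_inside_gt):
    """Pick positive candidates for a single ground-truth box.

    candidate_ious:   IoU of each pre-selected candidate with the GT box
                      (must be non-empty).
    center_inside_gt: one bool per candidate, True if the candidate's
                      center falls inside the GT box.
    Returns (threshold, indices_of_positive_candidates).
    """
    mean_iou = statistics.mean(candidate_ious)
    # Sample standard deviation; with a single candidate there is no spread.
    std_iou = statistics.stdev(candidate_ious) if len(candidate_ious) > 1 else 0.0
    threshold = mean_iou + std_iou
    positives = [
        idx
        for idx, (iou, inside) in enumerate(zip(candidate_ious, center_inside_gt))
        if iou >= threshold and inside
    ]
    return threshold, positives


if __name__ == "__main__":
    threshold, positives = atss_select([0.1, 0.2, 0.3, 0.8], [True] * 4)
    print(threshold, positives)
```

Because the threshold adapts to each ground-truth box, small or elongated formulas are not starved of positives the way they can be under a fixed IoU cut-off.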

Getting Started

Prerequisites

Linux or macOS (Windows is in experimental support)
Python 3.6+
PyTorch 1.3+
CUDA 9.2+ (If you build PyTorch from source, CUDA 9.0 is also compatible)
GCC 5+
MMCV

This project is based on MMDetection v2.7.0 and requires mmcv-full>=1.1.5, <1.3. Note: If you already have mmcv installed, run pip uninstall mmcv first. If mmcv and mmcv-full are both installed, a ModuleNotFoundError will be raised.
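The version constraint above can be checked before training with a small helper like the one below. This is only a sketch: `parse_version` and `mmcv_version_ok` are hypothetical names introduced for the example, and the parser handles plain dotted versions only.

```python
import re


def parse_version(text):
    """Turn a dotted version such as '1.2.7' into a tuple like (1, 2, 7)."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as '1.3' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def mmcv_version_ok(version, low="1.1.5", high="1.3"):
    """True if low <= version < high, the range stated in this README."""
    return parse_version(low) <= parse_version(version) < parse_version(high)


if __name__ == "__main__":
    # With mmcv-full installed, one would pass mmcv.__version__ here.
    for candidate in ("1.1.4", "1.1.5", "1.2.7", "1.3.0"):
        print(candidate, mmcv_version_ok(candidate))
```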

Installation

Install PyTorch and torchvision following the official instructions, e.g.,
```
conda install pytorch torchvision -c pytorch
```
Note: Make sure that your compilation CUDA version and runtime CUDA version match. You can check the supported CUDA version for precompiled packages on the PyTorch website.

E.g. 1: If you have CUDA 10.1 installed under /usr/local/cuda and would like to install PyTorch 1.5, you need to install the prebuilt PyTorch with CUDA 10.1.
```
conda install pytorch cudatoolkit=10.1 torchvision -c pytorch
```
E.g. 2: If you have CUDA 9.2 installed under /usr/local/cuda and would like to install PyTorch 1.3.1, you need to install the prebuilt PyTorch with CUDA 9.2.
```
conda install pytorch=1.3.1 cudatoolkit=9.2 torchvision=0.4.2 -c pytorch
```
If you build PyTorch from source instead of installing the prebuilt package, you can use more CUDA versions such as 9.0.

Install mmcv-full. We recommend installing the pre-built package as below.

pip install mmcv-full==latest+torch1.6.0+cu101 -f https://download.openmmlab.com/mmcv/dist/index.html

See here for the MMCV versions compatible with different PyTorch and CUDA versions. Optionally, you can compile mmcv from source with the following commands:

git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
MMCV_WITH_OPS=1 pip install -e .  # package mmcv-full will be installed after this step
cd ..

Or directly run

pip install mmcv-full

Install build requirements and then compile MMDetection.

pip install -r requirements.txt
pip install tensorboard
pip install ensemble-boxes
pip install -v -e .  # or "python setup.py develop"

Usage

Data Preparation

First, put the image files and the GT files into two separate folders as below.

Tr01
├── gt
│   ├── 0001125-color_page02.txt
│   ├── 0001125-color_page05.txt
│   ├── ...
│   └── 0304067-color_page08.txt
└── img
    ├── 0001125-page02.jpg
    ├── 0001125-page05.jpg
    ├── ...
    └── 0304067-page08.jpg

Second, run data_preprocess.py to generate COCO-format labels. Remember to change 'img_path', 'txt_path', 'dst_path' and 'train_path' to your own paths.

python ./tools/data_preprocess.py

The data folder will then have the following structure:

Tr01
├── gt
│   ├── 0001125-color_page02.txt
│   ├── 0001125-color_page05.txt
│   ├── ...
│   └── 0304067-color_page08.txt
│
├── gt_icdar
│   ├── 0001125-color_page02.txt
│   ├── 0001125-color_page05.txt
│   ├── ...
│   └── 0304067-color_page08.txt
│   
├── img
│   ├── 0001125-page02.jpg
│   ├── 0001125-page05.jpg
│   ├── ...
│   └── 0304067-page08.jpg
│
└── train_coco.json
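For readers unfamiliar with the target format, the sketch below shows roughly what a COCO-style label file such as train_coco.json contains. It is not the project's data_preprocess.py: the GT text format is not described here, so the example starts from boxes that are assumed to be already parsed into pixel coordinates, and the function name, the input layout, and the class names ("embedded", "isolated") are assumptions made for illustration.

```python
import json


def build_coco(pages, class_names=("embedded", "isolated")):
    """Build a COCO-style dict from already-parsed page annotations.

    pages: list of dicts with keys 'file_name', 'width', 'height' and
           'boxes', where each box is (x1, y1, x2, y2, class_index)
           in pixels. This layout is hypothetical.
    """
    coco = {
        "images": [],
        "annotations": [],
        # COCO category ids conventionally start at 1.
        "categories": [
            {"id": idx + 1, "name": name} for idx, name in enumerate(class_names)
        ],
    }
    annotation_id = 1
    for image_id, page in enumerate(pages, start=1):
        coco["images"].append(
            {
                "id": image_id,
                "file_name": page["file_name"],
                "width": page["width"],
                "height": page["height"],
            }
        )
        for x1, y1, x2, y2, class_index in page["boxes"]:
            width, height = x2 - x1, y2 - y1
            coco["annotations"].append(
                {
                    "id": annotation_id,
                    "image_id": image_id,
                    "category_id": class_index + 1,
                    # COCO boxes are [x, y, width, height], not corner pairs.
                    "bbox": [x1, y1, width, height],
                    "area": width * height,
                    "iscrowd": 0,
                }
            )
            annotation_id += 1
    return coco


if __name__ == "__main__":
    example = [
        {
            "file_name": "0001125-page02.jpg",
            "width": 100,
            "height": 200,
            "boxes": [(10, 20, 30, 60, 0)],
        }
    ]
    print(json.dumps(build_coco(example), indent=2))
```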

Finally, change 'data_root' in ./configs/_base_/datasets/formula_detection.py to your path.
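MMDetection config files are plain Python, and dataset paths in them are usually built by string concatenation with 'data_root', so the trailing slash matters. A minimal sketch of the idea, with a hypothetical location and hypothetical variable names for the derived paths:

```python
# Hypothetical location of the competition data; replace with your own.
data_root = '/path/to/icdar2021_mfd/'

# Paths are formed by plain concatenation, so data_root must end with '/'.
train_ann_file = data_root + 'Tr01/train_coco.json'
train_img_prefix = data_root + 'Tr01/img/'

print(train_ann_file)
print(train_img_prefix)
```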

Train

Train with a single GPU on ResNeSt50:

python tools/train.py configs/gfl/gfl_s50_fpn_2x_coco.py --gpus 1 --work-dir ${Your Dir}

Train with 8 GPUs on ResNeSt101:

./tools/dist_train.sh configs/gfl/gfl_s101_fpn_2x_coco.py 8 --work-dir ${Your Dir}

Inference

Run tools/test_formula.py

python tools/test_formula.py configs/gfl/gfl_s101_fpn_2x_coco.py ${checkpoint path}

By default, it generates a 'result' file at the same level as work-dir. You can change the output path of the result file at line 231 of tools/test_formula.py.

Model Ensemble

Specify the paths of the results in tools/model_fusion_test.py, and run

python tools/model_fusion_test.py
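The installation step pulls in the ensemble-boxes package and the citations include Weighted Boxes Fusion, so the fusion is presumably done along those lines. The sketch below is a simplified, pure-Python illustration of the idea for a single class, not the contents of tools/model_fusion_test.py and not the ensemble-boxes API: overlapping boxes from different models are clustered by IoU and replaced by their confidence-weighted average. The function names and the threshold value are assumptions, and the score rescaling by number of models used in the full algorithm is left out.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def weighted_mean_box(members):
    """Average the coordinates of (x1, y1, x2, y2, score) boxes, weighted by score."""
    total = sum(m[4] for m in members)
    return [sum(m[i] * m[4] for m in members) / total for i in range(4)]


def fuse_boxes(detections, iou_thr=0.55):
    """Fuse detections pooled from several models (single class).

    detections: list of (x1, y1, x2, y2, score) with score > 0.
    Returns a list of fused (x1, y1, x2, y2, score) tuples.
    """
    clusters = []
    # Visit confident boxes first so they seed the clusters.
    for det in sorted(detections, key=lambda d: d[4], reverse=True):
        for cluster in clusters:
            if iou(cluster["fused"], det[:4]) > iou_thr:
                cluster["members"].append(det)
                cluster["fused"] = weighted_mean_box(cluster["members"])
                break
        else:
            clusters.append({"members": [det], "fused": list(det[:4])})
    fused = []
    for cluster in clusters:
        scores = [m[4] for m in cluster["members"]]
        fused.append((*cluster["fused"], sum(scores) / len(scores)))
    return fused


if __name__ == "__main__":
    model_a = [(0.0, 0.0, 10.0, 10.0, 0.9)]
    model_b = [(0.0, 0.0, 10.0, 12.0, 0.3), (50.0, 50.0, 60.0, 60.0, 0.8)]
    for box in fuse_boxes(model_a + model_b):
        print(box)
```

Unlike non-maximum suppression, which keeps one box and discards the rest, this averaging uses every overlapping prediction, which is why it tends to suit ensembles of similarly strong models.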

Evaluation

evaluate.py is the officially provided evaluation tool. Run

python evaluate.py ${GT_DIR} ${CSV_Pred_File}

Note: GT_DIR is the path of the original data folder which contains both the image and the GT files. CSV_Pred_File is the path of the final prediction csv file.

Result

We train on Tr00, Tr01, Va00 and Va01, and test on Ts01. Some results (F1-score) are as follows:

Method	embedded	isolated	total
ResNeSt50-DCN	95.67	97.67	96.03
ResNeSt101-DCN	96.11	97.75	96.41
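As a reminder of how the metric in the table is defined, F1 is the harmonic mean of precision and recall. The helper below is a generic sketch; how the official evaluate.py matches predictions to ground truth (for example, its IoU threshold) is not described here, so the counts of true positives, false positives and false negatives are simply taken as inputs.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """F1 = 2 * precision * recall / (precision + recall), or 0.0 if undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # 8 correct detections, 2 spurious ones, 2 missed formulas.
    print(f1_score(8, 2, 2))
```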

Our final result, which ranked 1st in the competition, was obtained by fusing two ResNeSt101+GFL models trained with two different random seeds on all labeled data. The final ranking can be seen in our technical report.

License

This project is licensed under the MIT License. See LICENSE for more details.

Citations

@article{zhong20211st,
  title={1st Place Solution for ICDAR 2021 Competition on Mathematical Formula Detection},
  author={Zhong, Yuxiang and Qi, Xianbiao and Li, Shanjun and Gu, Dengyi and Chen, Yihao and Ning, Peiyang and Xiao, Rong},
  journal={arXiv preprint arXiv:2107.05534},
  year={2021}
}
@article{GFLli2020generalized,
  title={Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection},
  author={Li, Xiang and Wang, Wenhai and Wu, Lijun and Chen, Shuo and Hu, Xiaolin and Li, Jun and Tang, Jinhui and Yang, Jian},
  journal={arXiv preprint arXiv:2006.04388},
  year={2020}
}
@inproceedings{ATSSzhang2020bridging,
  title={Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection},
  author={Zhang, Shifeng and Chi, Cheng and Yao, Yongqiang and Lei, Zhen and Li, Stan Z},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={9759--9768},
  year={2020}
}
@inproceedings{FCOStian2019fcos,
  title={Fcos: Fully convolutional one-stage object detection},
  author={Tian, Zhi and Shen, Chunhua and Chen, Hao and He, Tong},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={9627--9636},
  year={2019}
}
@article{solovyev2019weighted,
  title={Weighted boxes fusion: ensembling boxes for object detection models},
  author={Solovyev, Roman and Wang, Weimin and Gabruseva, Tatiana},
  journal={arXiv preprint arXiv:1910.13302},
  year={2019}
}
@article{ResNestzhang2020resnest,
  title={Resnest: Split-attention networks},
  author={Zhang, Hang and Wu, Chongruo and Zhang, Zhongyue and Zhu, Yi and Lin, Haibin and Zhang, Zhi and Sun, Yue and He, Tong and Mueller, Jonas and Manmatha, R and others},
  journal={arXiv preprint arXiv:2004.08955},
  year={2020}
}
@article{MMDetectionchen2019mmdetection,
  title={MMDetection: Open mmlab detection toolbox and benchmark},
  author={Chen, Kai and Wang, Jiaqi and Pang, Jiangmiao and Cao, Yuhang and Xiong, Yu and Li, Xiaoxiao and Sun, Shuyang and Feng, Wansen and Liu, Ziwei and Xu, Jiarui and others},
  journal={arXiv preprint arXiv:1906.07155},
  year={2019}
}

Acknowledgements

Comments

__missing__ raise KeyError(name)

Hello, I have tried both your repo and the official mmdet code, and both have this problem. Does the dataset have any special requirements? Is there a problem with using the normal CocoDataset style?

Traceback (most recent call last):
  File "tools/train.py", line 188, in <module>
    main()
  File "tools/train.py", line 164, in main
    datasets = [build_dataset(cfg.data.train)]
  File "/home/a/.local/lib/python3.7/site-packages/mmdet/datasets/builder.py", line 58, in build_dataset
    elif cfg['type'] == 'ConcatDataset':
  File "/home/a/.local/lib/python3.7/site-packages/mmcv/utils/config.py", line 35, in __missing__
    raise KeyError(name)
KeyError: 'type'

opened by SivenLSP 15

After training for 24 epochs, the loss still does not converge

2022-11-20 01:35:49,834 - mmdet - INFO - Epoch [24][2000/2174] lr: 1.000e-05, eta: 0:05:54, time: 2.101, data_time: 0.014, memory: 15795, loss_cls: 0.2582, loss_bbox: 1.2449, loss_dfl: 0.4970, loss: 2.0000
2022-11-20 01:37:34,752 - mmdet - INFO - Epoch [24][2050/2174] lr: 1.000e-05, eta: 0:04:12, time: 2.098, data_time: 0.014, memory: 15795, loss_cls: 0.2564, loss_bbox: 1.2549, loss_dfl: 0.4957, loss: 2.0070
2022-11-20 01:39:19,524 - mmdet - INFO - Epoch [24][2100/2174] lr: 1.000e-05, eta: 0:02:30, time: 2.095, data_time: 0.014, memory: 15795, loss_cls: 0.2651, loss_bbox: 1.2299, loss_dfl: 0.4924, loss: 1.9874
2022-11-20 01:41:04,317 - mmdet - INFO - Epoch [24][2150/2174] lr: 1.000e-05, eta: 0:00:48, time: 2.096, data_time: 0.013, memory: 15795, loss_cls: 0.2600, loss_bbox: 1.2279, loss_dfl: 0.4954, loss: 1.9832
2022-11-20 01:41:54,811 - mmdet - INFO - Saving checkpoint at 24 epochs
2022-11-20 01:44:58,555 - mmdet - INFO - Evaluating bbox...
2022-11-20 01:45:01,473 - mmdet - INFO -
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.001
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.014
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.014
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.014
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.016
2022-11-20 01:45:01,502 - mmdet - INFO - Exp name: gfl_s50_fpn_2x_coco.py
2022-11-20 01:45:01,502 - mmdet - INFO - Epoch(val) [24][380] bbox_mAP: 0.0000, bbox_mAP_50: 0.0000, bbox_mAP_75: 0.0000, bbox_mAP_s: 0.0000, bbox_mAP_m: 0.0000, bbox_mAP_l: 0.0010, bbox_mAP_copypaste: 0.000 0.000 0.000 0.000 0.000 0.001

opened by GulpFire 6

RuntimeError: CUDA OOM / Bad Training Performance after reducing crop_size

Hello @Yuxiang1995,

I am attempting to train your network on a single GPU, but I get this error:

RuntimeError: CUDA out of memory. Tried to allocate 1.86 GiB (GPU 0; 4.00 GiB total capacity; 1.48 GiB already allocated; 319.91 MiB free; 1.88 GiB reserved in total by PyTorch)

I also tried a 10 GiB GPU and got the same error. I am able to train when I reduce the crop_size in configs/_base_/datasets/formula_detection.py, but the model does not seem to learn anything, since the loss does not decrease. I also saw in your presentation that the large crop_size is a feature of your model.

Can you give me a hint on how to successfully train the model on a single GPU, i.e. get rid of the CUDA OOM error?

opened by AaaJrwp4 2

what is the validation set?

In formula_detection.py, the annotation file is under Ts01, but the images are under Tr01. Also, the training set contains Va00, Va01. Shouldn't they belong to the validation set?

    val=dict(
        type=dataset_type,
        ann_file=data_root + 'Ts01/train_coco_sdk4.json',
        img_prefix=data_root + 'Tr01/img/',
        classes=classes,
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'Ts10/train_coco_sdk4.json',
        img_prefix=data_root + 'Tr10/img/',
        classes=classes,
        pipeline=test_pipeline))

opened by cpwan 1
