Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation (ACM MM 2020)
Official implementation of:
Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation
Jialian Wu, Liangchen Song, Tiancai Wang, Qian Zhang and Junsong Yuan
In ACM International Conference on Multimedia, Seattle WA, October 12-16, 2020.
Many thanks to the mmdetection authors for their great framework!
News
Mar 2, 2021 Update: We tested Forest R-CNN on the LVIS v1.0 val set. Thanks for considering comparing with our method :)
Jan 1, 2021 Update: We propose Forest DetSeg, an extension of the original Forest R-CNN that applies the proposed method to RetinaNet. The new work is currently under review, but the code is already available. More details will follow with the new paper.
Installation
Please refer to INSTALL.md for installation and dataset preparation.
Forest R-CNN
Inference
# Examples
# single-gpu testing
python tools/test.py configs/lvis/forest_rcnn_r50_fpn.py forest_rcnn_res50.pth --out out.pkl --eval bbox segm
# multi-gpu testing
./tools/dist_test.sh configs/lvis/forest_rcnn_r50_fpn.py forest_rcnn_res50.pth ${GPU_NUM} --out out.pkl --eval bbox segm
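The `--out out.pkl` option above saves the raw predictions to a pickle file. The sketch below is one way to take a quick look at that file. It assumes the file holds one entry per test image and that, when both boxes and masks are predicted, each entry is a `(bbox, segm)` pair; neither is stated in this README, so check against your own output. The helper name `summarize_results` is hypothetical.

```python
import pickle
from pathlib import Path


def summarize_results(pkl_path):
    """Load a results file written via --out and report its basic shape."""
    # Only unpickle files you produced yourself; pickle can execute code on load.
    with Path(pkl_path).open("rb") as f:
        results = pickle.load(f)

    # Assumption: the file holds a list with one entry per test image.
    first = results[0] if len(results) > 0 else None
    return {
        "num_images": len(results),
        "entry_type": type(first).__name__,
        # Assumption: with boxes and masks, each entry is a (bbox, segm) pair.
        "has_bbox_and_segm": isinstance(first, tuple) and len(first) == 2,
    }


if __name__ == "__main__":
    # Runs only if the file from the inference example is present.
    if Path("out.pkl").exists():
        print(summarize_results("out.pkl"))
```

If `has_bbox_and_segm` comes back `False`, the entries are probably laid out differently in your mmdetection version, and the file should be inspected by hand.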
Training
# Examples
# single-gpu training
python tools/train.py configs/lvis/forest_rcnn_r50_fpn.py --validate
# multi-gpu training
./tools/dist_train.sh configs/lvis/forest_rcnn_r50_fpn.py ${GPU_NUM} --validate
(Note: in our experiments, the best result appears around the 20th epoch rather than at the end of training.)
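Since the best checkpoint may not be the last one, it can help to evaluate several checkpoints around epoch 20. The sketch below only builds the single-GPU test commands shown earlier, one per checkpoint, and prints them. The work directory `work_dirs/forest_rcnn_r50_fpn`, the `epoch_{N}.pth` naming, and the 18 to 22 epoch window are assumptions for illustration, not values from this repository; `build_eval_commands` is a hypothetical helper.

```python
def build_eval_commands(work_dir, epochs,
                        config="configs/lvis/forest_rcnn_r50_fpn.py"):
    """Return one tools/test.py command (as an argument list) per epoch."""
    commands = []
    for epoch in epochs:
        # Assumption: checkpoints are saved as <work_dir>/epoch_<N>.pth.
        checkpoint = f"{work_dir}/epoch_{epoch}.pth"
        commands.append([
            "python", "tools/test.py", config, checkpoint,
            "--out", f"out_epoch_{epoch}.pkl",
            "--eval", "bbox", "segm",
        ])
    return commands


if __name__ == "__main__":
    # Illustrative window around the 20th epoch; adjust to your schedule.
    for cmd in build_eval_commands("work_dirs/forest_rcnn_r50_fpn",
                                   range(18, 23)):
        print(" ".join(cmd))
```

The printed lines can be pasted into a shell or passed to `subprocess.run` one at a time; comparing the reported AP across epochs then shows which checkpoint to keep.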
Forest RetinaNet
Inference
# Examples
# multi-gpu testing
./tools/dist_test.sh configs/lvis/forest_retinanet_r50_fpn_1x.py forest_retinanet_res50.pth ${GPU_NUM} --out out.pkl --eval bbox segm
Training
# Examples
# multi-gpu training
./tools/dist_train.sh configs/lvis/forest_retinanet_r50_fpn_1x.py ${GPU_NUM} --validate
Main Results
Instance Segmentation on LVIS v0.5 val set
AP and AP.b denote the mask AP and box AP, respectively. r, c, and f denote the rare, common, and frequent categories.
Method | Backbone | AP | AP.r | AP.c | AP.f | AP.b | AP.b.r | AP.b.c | AP.b.f | download |
---|---|---|---|---|---|---|---|---|---|---|
MaskRCNN | R50-FPN | 21.7 | 6.8 | 22.6 | 26.4 | 21.8 | 6.5 | 21.6 | 28.0 | model |
Forest R-CNN | R50-FPN | 25.6 | 18.3 | 26.4 | 27.6 | 25.9 | 16.9 | 26.1 | 29.2 | model |
MaskRCNN | R101-FPN | 23.6 | 10.0 | 24.8 | 27.6 | 23.5 | 8.7 | 23.1 | 29.8 | model |
Forest R-CNN | R101-FPN | 26.9 | 20.1 | 27.9 | 28.3 | 27.5 | 20.0 | 27.5 | 30.4 | model |
MaskRCNN | X-101-32x4d-FPN | 24.8 | 10.0 | 26.4 | 28.6 | 24.8 | 8.6 | 25.0 | 30.9 | model |
Forest R-CNN | X-101-32x4d-FPN | 28.5 | 21.6 | 29.7 | 29.7 | 28.8 | 20.6 | 29.2 | 31.7 | model |
Instance Segmentation on LVIS v1.0 val set
Method | Backbone | AP | AP.r | AP.c | AP.f | AP.b |
---|---|---|---|---|---|---|
MaskRCNN | R50-FPN | 19.2 | 0.0 | 17.2 | 29.5 | 20.0 |
Forest R-CNN | R50-FPN | 23.2 | 14.2 | 22.7 | 27.7 | 24.6 |
Visualized Examples
Citation
If you find it useful in your research, please consider citing our paper as follows:
@inproceedings{wu2020forest,
title={Forest R-CNN: Large-vocabulary long-tailed object detection and instance segmentation},
author={Wu, Jialian and Song, Liangchen and Wang, Tiancai and Zhang, Qian and Yuan, Junsong},
booktitle={Proceedings of the 28th ACM International Conference on Multimedia},
pages={1570--1578},
year={2020}}