Propose-Reduce VIS
This repo contains the official implementation for the paper:
Video Instance Segmentation with a Propose-Reduce Paradigm
Huaijia Lin*, Ruizheng Wu*, Shu Liu, Jiangbo Lu, Jiaya Jia
ICCV 2021 | Paper
Installation
Please refer to INSTALL.md.
Demo
You can compute the VIS results for your own videos.
- Download the pretrained weights.
- Put example videos in 'demo/inputs'. Two input types are supported: directories of frames and .mp4 files (see example for details).
- Run the following script; the results are written to demo/outputs.
sh demo.sh
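Before running the demo, it can help to confirm that every entry in 'demo/inputs' is one of the two supported input types. The following is a minimal sketch of such a check; `classify_input` and `INPUT_DIR` are hypothetical names introduced here for illustration and are not part of the repo, and the check looks only at the file type and the .mp4 extension.

```sh
#!/bin/sh
# classify_input is a hypothetical helper, not part of the repo.
# It prints "frames" for a directory, "video" for a regular .mp4 file,
# and "unsupported" for anything else.
classify_input() {
  if [ -d "$1" ]; then
    echo "frames"
  elif [ -f "$1" ] && [ "${1##*.}" = "mp4" ]; then
    echo "video"
  else
    echo "unsupported"
  fi
}

# INPUT_DIR is an assumed variable; it defaults to the folder named above.
INPUT_DIR="${INPUT_DIR:-demo/inputs}"
for entry in "$INPUT_DIR"/*; do
  [ -e "$entry" ] || continue   # skip the literal pattern when nothing matches
  echo "$(classify_input "$entry"): $entry"
done
```

Any entry reported as unsupported is worth removing or converting before calling demo.sh.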
Data Preparation
(1) Download the videos and jsons of the val set from YouTube-VIS 2019
(2) Download the videos and jsons of the val set from YouTube-VIS 2021
(3) Symlink the corresponding dataset folders and json files into the data folder
mkdir data
data
├── valset_ytv19 --> /path/to/ytv2019/vos/valid/JPEGImages/
├── valid_ytv19.json --> /path/to/ytv2019/vis/valid.json
├── valset_ytv21 --> /path/to/ytv2021/vis/valid/JPEGImages/
├── valid_ytv21.json --> /path/to/ytv2021/vis/valid/instances.json
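The layout above can be created with a few `ln` calls. This is a sketch under stated assumptions: `YTV19_ROOT` and `YTV21_ROOT` are hypothetical variables introduced here, and their defaults simply mirror the placeholder paths in the tree, so they should be pointed at the real download locations.

```sh
#!/bin/sh
# YTV19_ROOT and YTV21_ROOT are hypothetical variables; the defaults
# mirror the placeholder paths shown in the tree above.
YTV19_ROOT="${YTV19_ROOT:-/path/to/ytv2019}"
YTV21_ROOT="${YTV21_ROOT:-/path/to/ytv2021}"

mkdir -p data

# -s: create a symbolic link, -f: replace an existing link,
# -n: do not follow an existing link that points at a directory.
ln -sfn "$YTV19_ROOT/vos/valid/JPEGImages" data/valset_ytv19
ln -sfn "$YTV19_ROOT/vis/valid.json" data/valid_ytv19.json
ln -sfn "$YTV21_ROOT/vis/valid/JPEGImages" data/valset_ytv21
ln -sfn "$YTV21_ROOT/vis/valid/instances.json" data/valid_ytv21.json

# Show where each link points so a wrong path is easy to spot.
ls -l data
```

Using `-sfn` makes the script safe to re-run after a dataset is moved, since existing links are replaced rather than nested.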
Results
We provide results and the corresponding scripts for several pretrained models with different backbones. The results differ slightly from those in the paper because we made minor modifications to the inference code.
Download the pretrained models and put them in the pretrained folder.
mkdir pretrained
Dataset | Method | Backbone | CA Reduce | AP | AR@10 | Model | Scripts |
---|---|---|---|---|---|---|---|
YouTube-VIS 2019 | Seq Mask R-CNN | ResNet-50 | | 40.8 | 49.9 | model | scripts |
YouTube-VIS 2019 | Seq Mask R-CNN | ResNet-50 | ✓ | 42.5 | 56.8 | | scripts |
YouTube-VIS 2019 | Seq Mask R-CNN | ResNet-101 | | 43.8 | 52.7 | model | scripts |
YouTube-VIS 2019 | Seq Mask R-CNN | ResNet-101 | ✓ | 45.2 | 59.0 | | scripts |
YouTube-VIS 2019 | Seq Mask R-CNN | ResNeXt-101 | | 47.6 | 56.7 | model | scripts |
YouTube-VIS 2019 | Seq Mask R-CNN | ResNeXt-101 | ✓ | 48.8 | 62.2 | | scripts |
YouTube-VIS 2021 | Seq Mask R-CNN | ResNet-50 | | 39.6 | 47.5 | model | scripts |
YouTube-VIS 2021 | Seq Mask R-CNN | ResNet-50 | ✓ | 41.7 | 54.9 | | scripts |
YouTube-VIS 2021 | Seq Mask R-CNN | ResNeXt-101 | | 45.6 | 52.9 | model | scripts |
YouTube-VIS 2021 | Seq Mask R-CNN | ResNeXt-101 | ✓ | 47.2 | 57.6 | | scripts |
Evaluation
YouTube-VIS 2019: A json file will be saved in the '../Results_ytv19' folder. Zip it and upload the archive to the CodaLab server.
YouTube-VIS 2021: A json file will be saved in the '../Results_ytv21' folder. Zip it and upload the archive to the CodaLab server.
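The zip step can be sketched as below. `pack_results` is a hypothetical helper introduced here, not part of the repo, and the archive name `results.zip` is an assumption; check the submission instructions on the server for any required file names. The sketch relies on the standard `zip` command-line tool being installed.

```sh
#!/bin/sh
# pack_results is a hypothetical helper, not part of the repo.
# Usage: pack_results <results_dir> [archive_name]
pack_results() {
  results_dir="$1"
  archive="${2:-results.zip}"   # assumed default archive name
  # Run in a subshell so the caller's working directory is unchanged;
  # zipping from inside the folder keeps the json at the archive root.
  ( cd "$results_dir" && rm -f "$archive" && zip -q "$archive" ./*.json )
}

# Pack whichever result folders exist next to the repo.
for d in ../Results_ytv19 ../Results_ytv21; do
  if [ -d "$d" ]; then
    pack_results "$d" && echo "created $d/results.zip"
  fi
done
```

The resulting archive is then uploaded manually through the server's submission page.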
TODOs
- Results on YouTube-VIS 2021
- Results on DAVIS-UVOS
- Category-Aware Sequence Reduction (CA Reduce)
- Training Codes
Citation
If you find this work useful in your research, please cite:
@inproceedings{lin2021video,
title={Video Instance Segmentation with a Propose-Reduce Paradigm},
author={Lin, Huaijia and Wu, Ruizheng and Liu, Shu and Lu, Jiangbo and Jia, Jiaya},
booktitle={IEEE International Conference on Computer Vision (ICCV)},
year={2021}
}
Contact
If you have any questions regarding the repo, please feel free to contact me ([email protected]) or create an issue.
Acknowledgments
This repo is based on MMDetection, MaskTrackRCNN, STM, MMCV and COCOAPI.