Video Instance Segmentation with a Propose-Reduce Paradigm (ICCV 2021)

Overview

Propose-Reduce VIS

This repo contains the official implementation for the paper:

Video Instance Segmentation with a Propose-Reduce Paradigm

Huaijia Lin*, Ruizheng Wu*, Shu Liu, Jiangbo Lu, Jiaya Jia

ICCV 2021 | Paper

TeaserImage

Installation

Please refer to INSTALL.md.

Demo

You can compute VIS results for your own videos.

  1. Download the pretrained weights.
  2. Put your videos in `demo/inputs`. Two input types are supported: a directory of frames or an `.mp4` file (see the provided examples for details).
  3. Run the following script and obtain the results in `demo/outputs` (a sketch of the expected input layout follows the command).
sh demo.sh
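
A minimal sketch of what `demo/inputs` might contain and how the demo is invoked; the input names below are placeholders, not files shipped with the repo:

```bash
# demo/inputs/my_frames_dir/   <- a directory of extracted frames
# demo/inputs/my_clip.mp4      <- or an .mp4 file
sh demo.sh                     # results are written to demo/outputs/
```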

Data Preparation

(1) Download the videos and JSON annotations of the val set from YouTube-VIS 2019.

(2) Download the videos and JSON annotations of the val set from YouTube-VIS 2021.

(3) Symlink the corresponding datasets and JSON files into the `data` folder (example commands are sketched after the tree below):

mkdir data
data
├── valset_ytv19 --> /path/to/ytv2019/vos/valid/JPEGImages/ 
├── valid_ytv19.json --> /path/to/ytv2019/vis/valid.json
├── valset_ytv21 --> /path/to/ytv2021/vis/valid/JPEGImages/ 
├── valid_ytv21.json --> /path/to/ytv2021/vis/valid/instances.json
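
For reference, the symlinks above can be created as follows; this is only a sketch, and `/path/to/...` are placeholders for your local dataset locations:

```bash
mkdir -p data
# YouTube-VIS 2019
ln -s /path/to/ytv2019/vos/valid/JPEGImages/    data/valset_ytv19
ln -s /path/to/ytv2019/vis/valid.json           data/valid_ytv19.json
# YouTube-VIS 2021
ln -s /path/to/ytv2021/vis/valid/JPEGImages/    data/valset_ytv21
ln -s /path/to/ytv2021/vis/valid/instances.json data/valid_ytv21.json
```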

Results

We provide results for several pretrained models, together with the corresponding scripts, on different backbones. The results differ slightly from those in the paper because we made minor modifications to the inference code.

Download the pretrained models and put them in the `pretrained` folder.

mkdir pretrained
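
For example, after downloading a checkpoint from the table below (the filename below is purely a placeholder; use whichever model you downloaded):

```bash
# substitute the checkpoint you actually downloaded
mv /path/to/downloaded_checkpoint.pth pretrained/
```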

| Dataset | Method | Backbone | CA | Reduce | AP | AR@10 | download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| YouTube-VIS 2019 | Seq Mask R-CNN | ResNet-50 | | | 40.8 | 49.9 | model \| scripts |
| YouTube-VIS 2019 | Seq Mask R-CNN | ResNet-50 | | | 42.5 | 56.8 | scripts |
| YouTube-VIS 2019 | Seq Mask R-CNN | ResNet-101 | | | 43.8 | 52.7 | model \| scripts |
| YouTube-VIS 2019 | Seq Mask R-CNN | ResNet-101 | | | 45.2 | 59.0 | scripts |
| YouTube-VIS 2019 | Seq Mask R-CNN | ResNeXt-101 | | | 47.6 | 56.7 | model \| scripts |
| YouTube-VIS 2019 | Seq Mask R-CNN | ResNeXt-101 | | | 48.8 | 62.2 | scripts |
| YouTube-VIS 2021 | Seq Mask R-CNN | ResNet-50 | | | 39.6 | 47.5 | model \| scripts |
| YouTube-VIS 2021 | Seq Mask R-CNN | ResNet-50 | | | 41.7 | 54.9 | scripts |
| YouTube-VIS 2021 | Seq Mask R-CNN | ResNeXt-101 | | | 45.6 | 52.9 | model \| scripts |
| YouTube-VIS 2021 | Seq Mask R-CNN | ResNeXt-101 | | | 47.2 | 57.6 | scripts |

Evaluation

YouTube-VIS 2019: a JSON file will be saved in the `../Results_ytv19` folder. Please zip it and upload it to the CodaLab evaluation server.

YouTube-VIS 2021: a JSON file will be saved in the `../Results_ytv21` folder. Please zip it and upload it to the CodaLab evaluation server.
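
For example, to package the YouTube-VIS 2019 predictions for upload (the output JSON name is whatever the evaluation script writes, so a wildcard is used here; the zip name is arbitrary):

```bash
# -j stores only the file names (no directory paths) in the zip
zip -j submission_ytv19.zip ../Results_ytv19/*.json
```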

TODOs

Citation

If you find this work useful in your research, please cite:

@inproceedings{lin2021video,
  title={Video Instance Segmentation with a Propose-Reduce Paradigm},
  author={Lin, Huaijia and Wu, Ruizheng and Liu, Shu and Lu, Jiangbo and Jia, Jiaya},
  booktitle={IEEE International Conference on Computer Vision (ICCV)},
  year={2021}
}

Contact

If you have any questions regarding the repo, please feel free to contact me ([email protected]) or create an issue.

Acknowledgments

This repo is based on MMDetection, MaskTrackRCNN, STM, MMCV and COCOAPI.
