Code for CMaskTrack R-CNN (proposed in Occluded Video Instance Segmentation)

Overview

CMaskTrack R-CNN for OVIS

This repo serves as the official code release of the CMaskTrack R-CNN model on the Occluded Video Instance Segmentation dataset described in the tech report:

Occluded Video Instance Segmentation

Jiyang Qi1,2*, Yan Gao2*, Yao Hu2, Xinggang Wang1, Xiaoyu Liu2,
Xiang Bai1, Serge Belongie3, Alan Yuille4, Philip Torr5, Song Bai2,5 📧
1Huazhong University of Science and Technology 2Alibaba Group 3University of Copenhagen
4Johns Hopkins University 5University of Oxford

In this work, we collect a large-scale dataset called OVIS for Occluded Video Instance Segmentation. OVIS consists of 296k high-quality instance masks from 25 semantic categories, where object occlusions usually occur. While our human vision systems can understand those occluded instances by contextual reasoning and association, our experiments suggest that current video understanding systems cannot, which reveals that we are still at a nascent stage for understanding objects, instances, and videos in a real-world scenario.

We also present a simple plug-and-play module that performs temporal feature calibration to complement missing object cues caused by occlusion.

Some annotation examples can be seen below:

2592056 2930398 2932104 3021160

For more details about the dataset, please refer to our paper or website.

Model training and evaluation

Installation

This repo is built based on MaskTrackRCNN. A customized COCO API for the OVIS dataset is also provided.

You can use following commands to create conda env with all dependencies.

conda create -n cmtrcnn python=3.6 -y
conda activate cmtrcnn

conda install -c pytorch pytorch=1.3.1 torchvision=0.2.2 cudatoolkit=10.1 -y
pip install -r requirements.txt
pip install git+https://github.com/qjy981010/cocoapi.git#"egg=pycocotools&subdirectory=PythonAPI"

bash compile.sh

Data preparation

  1. Download OVIS from our website.
  2. Symlink the train/validation dataset to data/OVIS/ folder. Put COCO-style annotations under data/annotations.
mmdetection
├── mmdet
├── tools
├── configs
├── data
│   ├── OVIS
│   │   ├── train_images
│   │   ├── valid_images
│   │   ├── annotations
│   │   │   ├── annotations_train.json
│   │   │   ├── annotations_valid.json

Training

Our model is based on MaskRCNN-resnet50-FPN. The model is trained end-to-end on OVIS based on a MSCOCO pretrained checkpoint (mmlab link or google drive).

Run the command below to train the model.

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py configs/cmasktrack_rcnn_r50_fpn_1x_ovis.py --work_dir ./workdir/cmasktrack_rcnn_r50_fpn_1x_ovis --gpus 4

For reference to arguments such as learning rate and model parameters, please refer to configs/cmasktrack_rcnn_r50_fpn_1x_ovis.py.

Evaluation

Our pretrained model is available for download at Google Drive (comming soon). Run the following command to evaluate the model on OVIS.

CUDA_VISIBLE_DEVICES=0 python test_video.py configs/cmasktrack_rcnn_r50_fpn_1x_ovis.py [MODEL_PATH] --out [OUTPUT_PATH.pkl] --eval segm

A json file containing the predicted result will be generated as OUTPUT_PATH.pkl.json. OVIS currently only allows evaluation on the codalab server. Please upload the generated result to codalab server to see actual performances.

License

This project is released under the Apache 2.0 license, while the correlation ops is under MIT license.

Acknowledgement

This project is based on mmdetection (commit hash f3a939f), mmcv, MaskTrackRCNN and Pytorch-Correlation-extension. Thanks for their wonderful works.

Citation

If you find our paper and code useful in your research, please consider giving a star and citation 📝 :

@article{qi2021occluded,
    title={Occluded Video Instance Segmentation},
    author={Jiyang Qi and Yan Gao and Yao Hu and Xinggang Wang and Xiaoyu Liu and Xiang Bai and Serge Belongie and Alan Yuille and Philip Torr and Song Bai},
    journal={arXiv preprint arXiv:2102.01558},
    year={2021},
}
You might also like...
The source code for the Cutoff data augmentation approach proposed in this paper: "A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation".

Cutoff: A Simple Data Augmentation Approach for Natural Language This repository contains source code necessary to reproduce the results presented in

Code and data of the Fine-Grained R2R Dataset proposed in paper Sub-Instruction Aware Vision-and-Language Navigation

Fine-Grained R2R Code and data of the Fine-Grained R2R Dataset proposed in the EMNLP2020 paper Sub-Instruction Aware Vision-and-Language Navigation. C

the code used for the preprint Embedding-based Instance Segmentation of Microscopy Images.
the code used for the preprint Embedding-based Instance Segmentation of Microscopy Images.

EmbedSeg Introduction This repository hosts the version of the code used for the preprint Embedding-based Instance Segmentation of Microscopy Images.

PyTorch code for the paper
PyTorch code for the paper "FIERY: Future Instance Segmentation in Bird's-Eye view from Surround Monocular Cameras"

FIERY This is the PyTorch implementation for inference and training of the future prediction bird's-eye view network as described in: FIERY: Future In

code for CVPR paper Zero-shot Instance Segmentation

Code for CVPR2021 paper Zero-shot Instance Segmentation Code requirements python: python3.7 nvidia GPU pytorch1.1.0 GCC =5.4 NCCL 2 the other python

code for `Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation`
code for `Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation`

Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation (CVPR 2021) Introduction PBR is a conceptually simple yet effective

CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation

CoTr: Efficient 3D Medical Image Segmentation by bridging CNN and Transformer This is the official pytorch implementation of the CoTr: Paper: CoTr: Ef

Segmentation and Identification of Vertebrae in CT Scans using CNN, k-means Clustering and k-NN
Segmentation and Identification of Vertebrae in CT Scans using CNN, k-means Clustering and k-NN

Segmentation and Identification of Vertebrae in CT Scans using CNN, k-means Clustering and k-NN If you use this code for your research, please cite ou

Video-Captioning - A machine Learning project to generate captions for video frames indicating the relationship between the objects in the video
Video-Captioning - A machine Learning project to generate captions for video frames indicating the relationship between the objects in the video

Video-Captioning - A machine Learning project to generate captions for video frames indicating the relationship between the objects in the video

Comments
  • "RuntimeError: CUDA error: invalid device function" during training

    Some problems occur when I launch the training script. The error information is as follow, Traceback (most recent call last):
    File "train.py", line 103, in
    main()
    File "train.py", line 99, in main
    logger=logger)
    File "/data/liangzhiyuan/projects/CMaskTrack/mmdet/apis/train.py", line 60, in train_detector
    _non_dist_train(model, dataset, cfg, validate=validate, logger = logger) File "/data/liangzhiyuan/projects/CMaskTrack/mmdet/apis/train.py", line 124, in _non_dist_train runner.run(data_loaders, cfg.workflow, cfg.total_epochs) File "/data/liangzhiyuan/projects/CMaskTrack/mmcv/runner/runner.py", line 358, in run epoch_runner(data_loaders[i], **kwargs) File "/data/liangzhiyuan/projects/CMaskTrack/mmcv/runner/runner.py", line 264, in train self.model, data_batch, train_mode=True, **kwargs) File "/data/liangzhiyuan/projects/CMaskTrack/mmdet/apis/train.py", line 38, in batch_processor losses = model(**data) File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in call result = self.forward(*input, **kwargs) File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward outputs = self.parallel_apply(replicas, inputs, kwargs) File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)]) File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply output.reraise() File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/_utils.py", line 394, in reraise raise self.exc_type(msg) RuntimeError: Caught RuntimeError in replica 0 on device 0. Original Traceback (most recent call last): File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker output = module(*input, **kwargs) File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in call result = self.forward(*input, **kwargs) File "/data/liangzhiyuan/projects/CMaskTrack/mmdet/models/detectors/base.py", line 85, in forward return self.forward_train(img, img_meta, **kwargs) File "/data/liangzhiyuan/projects/CMaskTrack/mmdet/models/detectors/two_stage.py", line 117, in forward_train proposal_list = self.rpn_head.get_bboxes(*proposal_inputs) File "/data/liangzhiyuan/projects/CMaskTrack/mmdet/models/anchor_heads/anchor_head.py", line 232, in get_bboxes scale_factor, cfg, rescale) File "/data/liangzhiyuan/projects/CMaskTrack/mmdet/models/anchor_heads/rpn_head.py", line 80, in get_bboxes_single proposals, _ = nms(proposals, cfg.nms_thr) File "/data/liangzhiyuan/projects/CMaskTrack/mmdet/ops/nms/nms_wrapper.py", line 35, in nms return dets[inds, :], inds RuntimeError: CUDA error: invalid device function

    The pytorch version is 1.3.1 with cuda10.1, and the torchvision is 0.2.2.

    opened by liangzhiyuanCV 2
  • ModuleNotFoundError: No module named 'mmdet.ops.nms.gpu_nms'

    ModuleNotFoundError: No module named 'mmdet.ops.nms.gpu_nms'

    Following install instructions. Running test_video.py returns:

    File "CMaskTrack-RCNN/mmdet/core/post_processing/init.py", line 1, in from .bbox_nms import multiclass_nms File "CMaskTrack-RCNN/mmdet/core/post_processing/bbox_nms.py", line 3, in from mmdet.ops.nms import nms_wrapper File "CMaskTrack-RCNN/mmdet/ops/init.py", line 1, in from .nms import nms, soft_nms File "CMaskTrack-RCNN/mmdet/ops/nms/init.py", line 1, in from .nms_wrapper import nms, soft_nms File "CMaskTrack-RCNN/mmdet/ops/nms/nms_wrapper.py", line 5, in from mmdet.ops.nms.gpu_nms import gpu_nms ModuleNotFoundError: No module named 'mmdet.ops.nms.gpu_nms'

    opened by emibr12 1
  • Making intermediate frames available

    Making intermediate frames available

    Hi

    I just discovered the OVIS dataset and it looks really useful to us. Great work. I realized that the frames are annotated at 5 fps and since I would like to use this dataset for training I am planning to interpolate the annotations such that I get a higher frame rate of for example 30 fps (6 times more training data). However, I cannot find all the frames. For example TAO makes all frames available but the annotations are only available on every 30th frame. Could you share the intermediate video frames (even without annotations)? It would also be interesting to analyze if running the methods on a higher frame rate helps to achieve a higher accuracy.

    Thanks in advance and best, Christoph

    opened by 2006pmach 0
Owner
Q . J . Y
A coder from hust
Q . J . Y
Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Mask R-CNN for Object Detection and Segmentation This is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. The model generates bound

Matterport, Inc 22.4k Nov 22, 2022
Leveraging Instance-, Image- and Dataset-Level Information for Weakly Supervised Instance Segmentation

Leveraging Instance-, Image- and Dataset-Level Information for Weakly Supervised Instance Segmentation This paper has been accepted and early accessed

Yun Liu 39 Sep 20, 2022
NFT-Price-Prediction-CNN - Using visual feature extraction, prices of NFTs are predicted via CNN (Alexnet and Resnet) architectures.

NFT-Price-Prediction-CNN - Using visual feature extraction, prices of NFTs are predicted via CNN (Alexnet and Resnet) architectures.

null 5 Nov 3, 2022
TorchDistiller - a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and instance segmentation.

This project is a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and instance segmentation.

yifan liu 143 Nov 16, 2022
Implement object segmentation on images using HOG algorithm proposed in CVPR 2005

HOG Algorithm Implementation Description HOG (Histograms of Oriented Gradients) Algorithm is an algorithm aiming to realize object segmentation (edge

Leo Hsieh 2 Mar 12, 2022
[CVPR2021 Oral] End-to-End Video Instance Segmentation with Transformers

VisTR: End-to-End Video Instance Segmentation with Transformers This is the official implementation of the VisTR paper: Installation We provide instru

Yuqing Wang 677 Nov 27, 2022
Video Instance Segmentation with a Propose-Reduce Paradigm (ICCV 2021)

Propose-Reduce VIS This repo contains the official implementation for the paper: Video Instance Segmentation with a Propose-Reduce Paradigm Huaijia Li

DV Lab 39 Nov 23, 2022
This is the official implementation of the paper "Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation".

[CVPRW 2021] - Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation

Anirudh S Chakravarthy 6 May 3, 2022
Temporally Efficient Vision Transformer for Video Instance Segmentation, CVPR 2022, Oral

Temporally Efficient Vision Transformer for Video Instance Segmentation Temporally Efficient Vision Transformer for Video Instance Segmentation (CVPR

Hust Visual Learning Team 200 Nov 17, 2022
[ArXiv 2021] Data-Efficient Instance Generation from Instance Discrimination

InsGen - Data-Efficient Instance Generation from Instance Discrimination Data-Efficient Instance Generation from Instance Discrimination Ceyuan Yang,

GenForce: May Generative Force Be with You 92 Nov 22, 2022