Code for CMaskTrack R-CNN (proposed in Occluded Video Instance Segmentation)

Q . J . Y

Last update: Nov 25, 2022

Overview

CMaskTrack R-CNN for OVIS

This repo serves as the official code release of the CMaskTrack R-CNN model on the Occluded Video Instance Segmentation dataset described in the tech report:

Occluded Video Instance Segmentation

Jiyang Qi¹,²*, Yan Gao²*, Yao Hu², Xinggang Wang¹, Xiaoyu Liu², Xiang Bai¹, Serge Belongie³, Alan Yuille⁴, Philip Torr⁵, Song Bai²,⁵ 📧
¹Huazhong University of Science and Technology ²Alibaba Group ³University of Copenhagen
⁴Johns Hopkins University ⁵University of Oxford

In this work, we collect a large-scale dataset called OVIS for Occluded Video Instance Segmentation. OVIS consists of 296k high-quality instance masks from 25 semantic categories, in videos where object occlusions usually occur. While the human visual system can understand such occluded instances through contextual reasoning and association, our experiments suggest that current video understanding systems cannot, which reveals that we are still at a nascent stage of understanding objects, instances, and videos in real-world scenarios.

We also present a simple plug-and-play module that performs temporal feature calibration to complement missing object cues caused by occlusion.
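The calibration module's code is not reproduced on this page. Because the project credits Pytorch-Correlation-extension for its correlation ops, the following pure-Python sketch illustrates what a local correlation (cost volume) between the feature maps of two frames computes. It is an illustration only, not the repo's CUDA op: the function name, the nested-list tensor layout, and the averaging over channels are assumptions chosen for readability.

```python
from typing import List

# Nested-list stand-in for a [channels][height][width] feature map (illustrative only).
Tensor3D = List[List[List[float]]]


def local_correlation(f1: Tensor3D, f2: Tensor3D, max_disp: int) -> Tensor3D:
    """Cost volume of shape [(2*max_disp+1)**2][H][W].

    Plane k, entry [y][x] is the channel-averaged dot product
    f1[:, y, x] . f2[:, y+dy, x+dx], where (dy, dx) is the k-th displacement
    in row-major order over [-max_disp, max_disp]^2. Displacements that leave
    the second map contribute 0.
    """
    channels, height, width = len(f1), len(f1[0]), len(f1[0][0])
    if (len(f2), len(f2[0]), len(f2[0][0])) != (channels, height, width):
        raise ValueError("feature maps must have the same shape")
    volume: Tensor3D = []
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            plane = [[0.0] * width for _ in range(height)]
            for y in range(height):
                for x in range(width):
                    y2, x2 = y + dy, x + dx
                    if 0 <= y2 < height and 0 <= x2 < width:
                        dot = sum(f1[c][y][x] * f2[c][y2][x2] for c in range(channels))
                        plane[y][x] = dot / channels
            volume.append(plane)
    return volume
```

Each of the (2·max_disp+1)² planes scores how well a position in the first frame matches the position shifted by one displacement in the second frame; how CMaskTrack R-CNN actually consumes such scores is described in the tech report.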

Some annotation examples can be seen below:

For more details about the dataset, please refer to our paper or website.
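The figures quoted above (296k masks from 25 categories) describe the whole dataset, so only the released splits can be counted locally. A minimal sketch that tallies videos, instances, and per-frame masks in one annotation file, assuming a YouTube-VIS-style schema: top-level videos, annotations, and categories keys, with each annotation holding a per-frame segmentations list that uses null for frames where the instance is absent. These key names are assumptions, not confirmed by this README.

```python
import json
from typing import Any, Dict


def load_annotations(path: str) -> Dict[str, Any]:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def summarize_annotations(data: Dict[str, Any]) -> Dict[str, Any]:
    """Count videos, instances and per-frame masks in a VIS-style annotation dict."""
    # Assumed schema: "videos", "annotations", "categories" (id -> name).
    names = {cat["id"]: cat["name"] for cat in data["categories"]}
    masks_per_category = {name: 0 for name in names.values()}
    total_masks = 0
    for ann in data["annotations"]:
        # One entry per frame; None (JSON null) is assumed to mark frames without a mask.
        n_masks = sum(1 for seg in ann["segmentations"] if seg is not None)
        total_masks += n_masks
        masks_per_category[names[ann["category_id"]]] += n_masks
    return {
        "videos": len(data["videos"]),
        "instances": len(data["annotations"]),
        "masks": total_masks,
        "masks_per_category": masks_per_category,
    }
```

For example, summarize_annotations(load_annotations("data/OVIS/annotations/annotations_train.json")) would report the counts for the training split, provided the file follows the assumed schema.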

Model training and evaluation

Installation

This repo is built on MaskTrackRCNN. A customized COCO API for the OVIS dataset is also provided.

You can use the following commands to create a conda environment with all dependencies.

conda create -n cmtrcnn python=3.6 -y
conda activate cmtrcnn

conda install -c pytorch pytorch=1.3.1 torchvision=0.2.2 cudatoolkit=10.1 -y
pip install -r requirements.txt
pip install git+https://github.com/qjy981010/cocoapi.git#"egg=pycocotools&subdirectory=PythonAPI"

bash compile.sh
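After running bash compile.sh, a quick way to confirm that the environment and the compiled extensions load is to import them from the repository root. A minimal sketch, assuming the module list below: mmdet.ops.nms.gpu_nms is the compiled NMS extension named in one of the user reports further down this page, and the other names follow the packages installed above. A clean import does not rule out runtime CUDA errors such as "invalid device function".

```python
import importlib
from typing import Dict, Iterable

# Modules to probe (an assumed list); mmdet.ops.nms.gpu_nms is the compiled
# NMS extension that appears in a user report on this page.
DEFAULT_MODULES = ("torch", "torchvision", "pycocotools", "mmdet", "mmdet.ops.nms.gpu_nms")


def find_import_failures(modules: Iterable[str] = DEFAULT_MODULES) -> Dict[str, str]:
    """Try to import each module; return {module: error} for those that fail."""
    failures: Dict[str, str] = {}
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, or OSError/RuntimeError from a broken .so
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures
```

Run it from the repository root so that mmdet resolves to the repo's package, e.g. print(find_import_failures()); an empty dict means every listed module imported.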

Data preparation

Download OVIS from our website.
Symlink the train/validation images to the data/OVIS/ folder, and put the COCO-style annotations under data/OVIS/annotations, following the layout below.

mmdetection
├── mmdet
├── tools
├── configs
├── data
│   ├── OVIS
│   │   ├── train_images
│   │   ├── valid_images
│   │   ├── annotations
│   │   │   ├── annotations_train.json
│   │   │   ├── annotations_valid.json

Training

Our model is based on MaskRCNN-resnet50-FPN. It is trained end-to-end on OVIS, starting from an MSCOCO-pretrained checkpoint (mmlab link or Google Drive).

Run the command below to train the model.

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py configs/cmasktrack_rcnn_r50_fpn_1x_ovis.py --work_dir ./workdir/cmasktrack_rcnn_r50_fpn_1x_ovis --gpus 4

For arguments such as the learning rate and model parameters, please refer to configs/cmasktrack_rcnn_r50_fpn_1x_ovis.py.

Evaluation

Our pretrained model will be available for download from Google Drive (coming soon). Run the following command to evaluate the model on OVIS.

CUDA_VISIBLE_DEVICES=0 python test_video.py configs/cmasktrack_rcnn_r50_fpn_1x_ovis.py [MODEL_PATH] --out [OUTPUT_PATH.pkl] --eval segm

A JSON file containing the predicted results will be generated as OUTPUT_PATH.pkl.json. OVIS currently only supports evaluation on the CodaLab server; please upload the generated results there to obtain the actual performance.
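Before uploading, it can help to sanity-check the generated OUTPUT_PATH.pkl.json locally. A minimal sketch, assuming the file is a JSON list of video-level predictions with the YouTube-VIS-style keys video_id, category_id, score, and segmentations; these keys are an assumption, so compare them with the CodaLab instructions for OVIS. The check covers structure only and says nothing about accuracy.

```python
import json
from collections import Counter
from typing import Any, Dict, List

# Assumed per-prediction keys (YouTube-VIS-style results); not confirmed by this README.
REQUIRED_KEYS = ("video_id", "category_id", "score", "segmentations")


def check_results(results: List[Dict[str, Any]]) -> Dict[str, int]:
    """Validate a list of video-level predictions and return simple counts."""
    if not isinstance(results, list):
        raise ValueError("expected a JSON list of predictions")
    for i, pred in enumerate(results):
        missing = [key for key in REQUIRED_KEYS if key not in pred]
        if missing:
            raise ValueError(f"prediction {i} is missing keys {missing}")
    per_video = Counter(pred["video_id"] for pred in results)
    return {
        "predictions": len(results),
        "videos": len(per_video),
        "max_per_video": max(per_video.values(), default=0),
    }


def check_results_file(path: str) -> Dict[str, int]:
    with open(path, encoding="utf-8") as fh:
        return check_results(json.load(fh))
```

For example, check_results_file("OUTPUT_PATH.pkl.json") raises a ValueError naming the first malformed prediction, or returns the prediction and video counts.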

License

This project is released under the Apache 2.0 license, while the correlation ops are under the MIT license.

Acknowledgement

This project is based on mmdetection (commit hash f3a939f), mmcv, MaskTrackRCNN and Pytorch-Correlation-extension. Thanks for their wonderful work.

Citation

If you find our paper and code useful in your research, please consider giving a star ⭐ and a citation 📝:

@article{qi2021occluded,
    title={Occluded Video Instance Segmentation},
    author={Jiyang Qi and Yan Gao and Yao Hu and Xinggang Wang and Xiaoyu Liu and Xiang Bai and Serge Belongie and Alan Yuille and Philip Torr and Song Bai},
    journal={arXiv preprint arXiv:2102.01558},
    year={2021},
}

You might also like...

The source code for the Cutoff data augmentation approach proposed in this paper: "A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation".

Cutoff: A Simple Data Augmentation Approach for Natural Language This repository contains source code necessary to reproduce the results presented in

49 Dec 22, 2022

Code and data of the Fine-Grained R2R Dataset proposed in paper Sub-Instruction Aware Vision-and-Language Navigation

Fine-Grained R2R Code and data of the Fine-Grained R2R Dataset proposed in the EMNLP2020 paper Sub-Instruction Aware Vision-and-Language Navigation. C

34 Nov 15, 2022

the code used for the preprint Embedding-based Instance Segmentation of Microscopy Images.

EmbedSeg Introduction This repository hosts the version of the code used for the preprint Embedding-based Instance Segmentation of Microscopy Images.

88 Dec 25, 2022

PyTorch code for the paper "FIERY: Future Instance Segmentation in Bird's-Eye view from Surround Monocular Cameras"

FIERY This is the PyTorch implementation for inference and training of the future prediction bird's-eye view network as described in: FIERY: Future In

406 Dec 24, 2022

code for CVPR paper Zero-shot Instance Segmentation

Code for CVPR2021 paper Zero-shot Instance Segmentation Code requirements python: python3.7 nvidia GPU pytorch1.1.0 GCC =5.4 NCCL 2 the other python

86 Dec 13, 2022

code for `Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation`

Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation (CVPR 2021) Introduction PBR is a conceptually simple yet effective

143 Jan 5, 2023

CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation

CoTr: Efficient 3D Medical Image Segmentation by bridging CNN and Transformer This is the official pytorch implementation of the CoTr: Paper: CoTr: Ef

218 Dec 25, 2022

Segmentation and Identification of Vertebrae in CT Scans using CNN, k-means Clustering and k-NN

Segmentation and Identification of Vertebrae in CT Scans using CNN, k-means Clustering and k-NN If you use this code for your research, please cite ou

41 Dec 8, 2022

Implementation of Geometric Vector Perceptron, a simple circuit for 3d rotation equivariance for learning over large biomolecules, in Pytorch. Idea proposed and accepted at ICLR 2021

Geometric Vector Perceptron Implementation of Geometric Vector Perceptron, a simple circuit with 3d rotation equivariance for learning over large biom

59 Nov 24, 2022

Comments

"RuntimeError: CUDA error: invalid device function" during training

Some problems occur when I launch the training script. The error output is as follows:

Traceback (most recent call last):
  File "train.py", line 103, in <module>
    main()
  File "train.py", line 99, in main
    logger=logger)
  File "/data/liangzhiyuan/projects/CMaskTrack/mmdet/apis/train.py", line 60, in train_detector
    _non_dist_train(model, dataset, cfg, validate=validate, logger = logger)
  File "/data/liangzhiyuan/projects/CMaskTrack/mmdet/apis/train.py", line 124, in _non_dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/data/liangzhiyuan/projects/CMaskTrack/mmcv/runner/runner.py", line 358, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/data/liangzhiyuan/projects/CMaskTrack/mmcv/runner/runner.py", line 264, in train
    self.model, data_batch, train_mode=True, **kwargs)
  File "/data/liangzhiyuan/projects/CMaskTrack/mmdet/apis/train.py", line 38, in batch_processor
    losses = model(**data)
  File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/home/liangzhiyuan/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/liangzhiyuan/projects/CMaskTrack/mmdet/models/detectors/base.py", line 85, in forward
    return self.forward_train(img, img_meta, **kwargs)
  File "/data/liangzhiyuan/projects/CMaskTrack/mmdet/models/detectors/two_stage.py", line 117, in forward_train
    proposal_list = self.rpn_head.get_bboxes(*proposal_inputs)
  File "/data/liangzhiyuan/projects/CMaskTrack/mmdet/models/anchor_heads/anchor_head.py", line 232, in get_bboxes
    scale_factor, cfg, rescale)
  File "/data/liangzhiyuan/projects/CMaskTrack/mmdet/models/anchor_heads/rpn_head.py", line 80, in get_bboxes_single
    proposals, _ = nms(proposals, cfg.nms_thr)
  File "/data/liangzhiyuan/projects/CMaskTrack/mmdet/ops/nms/nms_wrapper.py", line 35, in nms
    return dets[inds, :], inds
RuntimeError: CUDA error: invalid device function

The PyTorch version is 1.3.1 with CUDA 10.1, and torchvision is 0.2.2.

opened by liangzhiyuanCV

ModuleNotFoundError: No module named 'mmdet.ops.nms.gpu_nms'

I followed the install instructions. Running test_video.py returns:

  File "CMaskTrack-RCNN/mmdet/core/post_processing/__init__.py", line 1, in <module>
    from .bbox_nms import multiclass_nms
  File "CMaskTrack-RCNN/mmdet/core/post_processing/bbox_nms.py", line 3, in <module>
    from mmdet.ops.nms import nms_wrapper
  File "CMaskTrack-RCNN/mmdet/ops/__init__.py", line 1, in <module>
    from .nms import nms, soft_nms
  File "CMaskTrack-RCNN/mmdet/ops/nms/__init__.py", line 1, in <module>
    from .nms_wrapper import nms, soft_nms
  File "CMaskTrack-RCNN/mmdet/ops/nms/nms_wrapper.py", line 5, in <module>
    from mmdet.ops.nms.gpu_nms import gpu_nms
ModuleNotFoundError: No module named 'mmdet.ops.nms.gpu_nms'

opened by emibr12

Making intermediate frames available

Hi

I just discovered the OVIS dataset, and it looks really useful to us. Great work! I noticed that the frames are annotated at 5 fps. Since I would like to use this dataset for training, I am planning to interpolate the annotations to a higher frame rate, for example 30 fps (6 times more training data). However, I cannot find all the frames. TAO, for example, makes all frames available, although annotations are only provided for every 30th frame. Could you share the intermediate video frames (even without annotations)? It would also be interesting to analyze whether running the methods at a higher frame rate helps to achieve higher accuracy.

Thanks in advance and best, Christoph

opened by 2006pmach

Owner

Q . J . Y

A coder from hust


Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Mask R-CNN for Object Detection and Segmentation This is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. The model generates bound

22.5k Jan 4, 2023

Leveraging Instance-, Image- and Dataset-Level Information for Weakly Supervised Instance Segmentation

Leveraging Instance-, Image- and Dataset-Level Information for Weakly Supervised Instance Segmentation This paper has been accepted and early accessed

39 Sep 20, 2022

NFT-Price-Prediction-CNN - Using visual feature extraction, prices of NFTs are predicted via CNN (Alexnet and Resnet) architectures.

5 Nov 3, 2022

TorchDistiller - a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and instance segmentation.

This project is a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and instance segmentation.

147 Dec 3, 2022

Implement object segmentation on images using HOG algorithm proposed in CVPR 2005

HOG Algorithm Implementation Description HOG (Histograms of Oriented Gradients) Algorithm is an algorithm aiming to realize object segmentation (edge

2 Mar 12, 2022

[CVPR2021 Oral] End-to-End Video Instance Segmentation with Transformers

VisTR: End-to-End Video Instance Segmentation with Transformers This is the official implementation of the VisTR paper: Installation We provide instru

687 Jan 7, 2023

Video Instance Segmentation with a Propose-Reduce Paradigm (ICCV 2021)

Propose-Reduce VIS This repo contains the official implementation for the paper: Video Instance Segmentation with a Propose-Reduce Paradigm Huaijia Li

39 Nov 23, 2022

This is the official implementation of the paper "Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation".

[CVPRW 2021] - Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation

6 May 3, 2022

Temporally Efficient Vision Transformer for Video Instance Segmentation, CVPR 2022, Oral

Temporally Efficient Vision Transformer for Video Instance Segmentation Temporally Efficient Vision Transformer for Video Instance Segmentation (CVPR

203 Dec 31, 2022

[ArXiv 2021] Data-Efficient Instance Generation from Instance Discrimination

InsGen - Data-Efficient Instance Generation from Instance Discrimination Data-Efficient Instance Generation from Instance Discrimination Ceyuan Yang,

GenForce: May Generative Force Be with You

93 Dec 25, 2022
