MiVOS (CVPR 2021) - Scribble To Mask
Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang
[arXiv] [Paper PDF] [Project Page]
A simple network that turns scribbles into masks. It supports multi-object segmentation using soft aggregation. Don't expect SOTA results from this model!
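Soft aggregation, as it is commonly described in the video object segmentation literature, merges independent per-object probabilities into one distribution over the background and all objects. The snippet below is a minimal single-pixel sketch in plain Python, not the code used in this repo; the function name `soft_aggregate` and the clamping constant `eps` are assumptions made for illustration.

```python
import math


def soft_aggregate(object_probs, eps=1e-7):
    """Merge independent per-object probabilities for one pixel.

    Returns [p_background, p_obj1, p_obj2, ...]; the values sum to 1.
    `soft_aggregate` and `eps` are illustrative names, not part of this repo.
    """
    # Background probability: the pixel belongs to none of the objects.
    background = math.prod(1.0 - p for p in object_probs)
    probs = [background] + list(object_probs)
    # Clamp to avoid division by zero, then use the odds p / (1 - p) as weights.
    probs = [min(max(p, eps), 1.0 - eps) for p in probs]
    odds = [p / (1.0 - p) for p in probs]
    # Normalizing the odds equals a softmax over the logits log(p / (1 - p)).
    total = sum(odds)
    return [o / total for o in odds]


if __name__ == "__main__":
    # Two objects competing for the same pixel.
    print(soft_aggregate([0.9, 0.3]))
```

A real implementation would apply the same arithmetic to whole probability maps (for example as tensors) rather than to one pixel at a time.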
Overall structure and capabilities
| | MiVOS | Mask-Propagation | Scribble-to-Mask |
|---|---|---|---|
| DAVIS/YouTube semi-supervised evaluation | | | |
| DAVIS interactive evaluation | | | |
| User interaction GUI tool | | | |
| Dense Correspondences | | | |
| Train propagation module | | | |
| Train S2M (interaction) module | | | |
| Train fusion module | | | |
| Generate more synthetic data | | | |
Requirements
The package versions shown here are the ones that I used. You might not need the exact versions.
- PyTorch 1.6.0
- torchvision 0.7.0
- opencv-contrib 4.2.0
- davis-interactive (https://github.com/albertomontesg/davis-interactive)
- gitpython for training
- gdown for downloading pretrained models
Refer to the official PyTorch guide for installing PyTorch/torchvision. The rest can be installed by:
pip install opencv-contrib-python gitpython gdown
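To compare an environment against the versions listed above, a small check along these lines may help. It is only a sketch: it reports versions without enforcing them, and it uses pip distribution names (`torch` for PyTorch). davis-interactive is left out because its installed distribution name is not stated here.

```python
from importlib import metadata

# Versions reported in the Requirements list; None means no version was given.
REFERENCE_VERSIONS = {
    "torch": "1.6.0",
    "torchvision": "0.7.0",
    "opencv-contrib-python": "4.2.0",
    "gitpython": None,
    "gdown": None,
}


def report_versions(reference=REFERENCE_VERSIONS):
    """Return {package: (installed_version_or_None, reference_version)}."""
    report = {}
    for name, wanted in reference.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, wanted)
    return report


if __name__ == "__main__":
    for name, (installed, wanted) in report_versions().items():
        print(f"{name}: installed={installed or 'missing'}, reference={wanted or 'any'}")
```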
Pretrained model
Download and put the model in ./saves/. Alternatively, use the provided download_model.py.
Interactive GUI
python interactive.py --image <image>
Controls:
Mouse Left - Draw scribbles
Mouse Middle - Switch between positive/negative scribbles
Key f - Commit changes, clear scribbles
Key r - Clear everything
Key d - Switch between overlay/mask view
Key s - Save masks into a temporary output folder (./output/)
Known issues
The model almost always needs to focus on at least one object. It is very difficult to erase all existing masks from an image using scribbles.
Training
Datasets
- Download and extract the LVIS training set.
- Download and extract a set of static image segmentation datasets. These are already downloaded for you if you used download_datasets.py in Mask-Propagation.
├── lvis
│   ├── lvis_v1_train.json
│   └── train2017
├── Scribble-to-Mask
└── static
    ├── BIG_small
    └── ...
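Before training, it can be useful to confirm that the folders are where the layout above expects them. The following is a minimal sketch, assuming the data root is the parent directory of Scribble-to-Mask; `EXPECTED_PATHS` and `missing_paths` are hypothetical helper names, and only the entries named explicitly in the tree are checked.

```python
from pathlib import Path

# Relative paths taken from the folder tree above (the elided "..." entries are skipped).
EXPECTED_PATHS = [
    "lvis/lvis_v1_train.json",
    "lvis/train2017",
    "Scribble-to-Mask",
    "static/BIG_small",
]


def missing_paths(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    # Assumes this is run from inside Scribble-to-Mask, so the data root is its parent.
    missing = missing_paths(Path.cwd().parent)
    print("All expected paths found." if not missing else f"Missing: {missing}")
```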
Commands
Use the deeplabv3plus_resnet50 pretrained model provided here.
CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 9842 --nproc_per_node=2 train.py --id s2m --load_deeplab <path_to_deeplab.pth>
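When the GPU count changes, `CUDA_VISIBLE_DEVICES` and `--nproc_per_node` need to change together. The sketch below assembles the same command from parameters, assuming one worker process per visible GPU as in the command above; `build_train_command` and `launch` are hypothetical helpers, not part of this repo.

```python
import os
import subprocess


def build_train_command(deeplab_path, gpu_ids=(0, 1), run_id="s2m",
                        master_port=9842, omp_threads=4):
    """Build (argv, env) for the distributed training launch shown above."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    env["OMP_NUM_THREADS"] = str(omp_threads)
    argv = [
        "python", "-m", "torch.distributed.launch",
        "--master_port", str(master_port),
        # Assumption: one worker process per visible GPU.
        f"--nproc_per_node={len(gpu_ids)}",
        "train.py", "--id", run_id, "--load_deeplab", str(deeplab_path),
    ]
    return argv, env


def launch(argv, env):
    """Run the training command and raise if it exits with an error."""
    subprocess.run(argv, env=env, check=True)


if __name__ == "__main__":
    argv, env = build_train_command("<path_to_deeplab.pth>")
    # Only prints the command; call launch(argv, env) to actually start training.
    print(" ".join(argv))
```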
Credit
Deeplab implementation and pretrained model: https://github.com/VainF/DeepLabV3Plus-Pytorch.
Citation
Please cite our paper if you find this repo useful!
@inproceedings{MiVOS_2021,
  title={Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion},
  author={Cheng, Ho Kei and Tai, Yu-Wing and Tang, Chi-Keung},
  booktitle={CVPR},
  year={2021}
}
Contact: [email protected]