Dense Unsupervised Learning for Video Segmentation
This repository contains the official implementation of our paper:
Dense Unsupervised Learning for Video Segmentation
Nikita Araslanov, Simone Schaub-Mayer and Stefan Roth
To appear at NeurIPS*2021. [paper] [supp] [talk] [example results] [arXiv]
We efficiently learn spatio-temporal correspondences |
Contact: Nikita Araslanov fname.lname (at) visinf.tu-darmstadt.de
Installation
Requirements. To reproduce our results, we recommend Python >=3.6, PyTorch >=1.4, CUDA >=10.0. At least one Titan X GPUs (12GB) or equivalent is required. The code was primarily developed under PyTorch 1.8 on a single A100 GPU.
The following steps will set up a local copy of the repository.
- Create conda environment:
conda create --name dense-ulearn-vos
source activate dense-ulearn-vos
- Install PyTorch >=1.4 (see PyTorch instructions). For example on Linux, run:
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
- Install the dependencies:
pip install -r requirements.txt
- Download the data:
Dataset | Website | Target directory with video sequences |
---|---|---|
YouTube-VOS | Link | data/ytvos/train/JPEGImages/ |
OxUvA | Link | data/OxUvA/images/dev/ |
TrackingNet | Link | data/tracking/train/jpegs/ |
Kinetics-400 | Link | data/kinetics400/video_jpeg/train/ |
The last column in this table specifies a path to subdirectories (relative to the project root) containing images of video frames. You can obviously use a different path structure. In this case, you will need to adjust the paths in data/filelists/
for every dataset accordingly.
- Download filelists:
cd data/filelists
bash download.sh
This will download lists of training and validation paths for all datasets.
Training
We following bash script will train a ResNet-18 model from scratch on one of the four supported datasets (see above):
bash ./launch/train.sh [ytvos|oxuva|track|kinetics]
We also provide our final models for download.
Dataset | Mean J&F (DAVIS-2017) | Link | MD5 |
---|---|---|---|
OxUvA | 65.3 | oxuva_e430_res4.pth (132M) | af541[...]d09b3 |
YouTube-VOS | 69.3 | ytvos_e060_res4.pth (132M) | c3ae3[...]55faf |
TrackingNet | 69.4 | trackingnet_e088_res4.pth (88M) | 3e7e9[...]95fa9 |
Kinetics-400 | 68.7 | kinetics_e026_res4.pth (88M) | 086db[...]a7d98 |
Inference and evaluation
Inference
To run the inference use launch/infer_vos.sh
:
bash ./launch/infer_vos.sh [davis|ytvos]
The first argument selects the validation dataset to use (davis
for DAVIS-2017; ytvos
for YouTube-VOS). The bash variables declared in the script further help to set up the paths for reading the data and the pre-trained models as well as the output directory:
EXP
,RUN_ID
andSNAPSHOT
determine the pre-trained model to load.VER
specifies a suffix for the output directory (in case you would like to experiment with different configurations for label propagation). Please, refer tolaunch/infer_vos.sh
for their usage.
The inference script will create two directories with the result: [res3|res4|key]_vos
and [res3|res4|key]_vis
, where the prefix corresponds to the codename of the output CNN layer used in the evaluation (selected in infer_vos.sh
using KEY
variable). The vos
-directory contains the segmentation result ready for evaluation; the vis
-directory produces the results for visualisation purposes. You can optionally disable generating the visualisation by setting VERBOSE=False
in infer_vos.py
.
Evaluation: DAVIS-2017
Please use the official evaluation package. Install the repository, then simply run:
python evaluation_method.py --task semi-supervised --davis_path data/davis2017 --results_path <path-to-vos-directory>
Evaluation: YouTube-VOS 2018
Please use the official CodaLab evaluation server. To create the submission, rename the vos
-directory to Annotations
and compress it to Annotations.zip
for uploading.
Acknowledgements
We thank PyTorch contributors and Allan Jabri for releasing their implementation of the label propagation.
Citation
We hope you find our work useful. If you would like to acknowledge it in your project, please use the following citation:
@inproceedings{Araslanov:2021:DUL,
author = {Araslanov, Nikita and Simone Schaub-Mayer and Roth, Stefan},
title = {Dense Unsupervised Learning for Video Segmentation},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
volume = {34},
year = {2021}
}