Dense Unsupervised Learning for Video Segmentation (NeurIPS*2021)

Visual Inference Lab @TU Darmstadt

Last update: Dec 26, 2022

This repository contains the official implementation of our paper:

Dense Unsupervised Learning for Video Segmentation
Nikita Araslanov, Simone Schaub-Meyer and Stefan Roth
To appear at NeurIPS*2021. [paper] [supp] [talk] [example results] [arXiv]


We efficiently learn spatio-temporal correspondences without any supervision, and achieve state-of-the-art accuracy in video object segmentation.

Contact: Nikita Araslanov fname.lname (at) visinf.tu-darmstadt.de

Installation

Requirements. To reproduce our results, we recommend Python >=3.6, PyTorch >=1.4 and CUDA >=10.0. At least one Titan X GPU (12 GB) or equivalent is required. The code was primarily developed under PyTorch 1.8 on a single A100 GPU.
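As a quick sanity check of the minimum versions listed above, a small comparison helper might look like the following. This is only a sketch: the names `MINIMUMS`, `parse_version`, `meets_minimum` and `check_environment` are hypothetical and not part of this repository.

```python
import re

# Minimum versions quoted from the requirements paragraph above.
MINIMUMS = {"python": "3.6", "torch": "1.4", "cuda": "10.0"}


def parse_version(text):
    """Turn a string such as '1.8.0+cu111' into (1, 8, 0), dropping any suffix."""
    numbers = re.match(r"\d+(?:\.\d+)*", text.strip())
    if numbers is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in numbers.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare two versions component-wise, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(found):
    """Return the names in `found` whose version is below the listed minimum."""
    return [
        name
        for name, minimum in MINIMUMS.items()
        if name in found and not meets_minimum(found[name], minimum)
    ]
```

With PyTorch installed, the input could be assembled from `platform.python_version()`, `torch.__version__` and `torch.version.cuda` (the latter is `None` on CPU-only builds, so it would need to be left out of the dictionary in that case).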

The following steps will set up a local copy of the repository.

Create conda environment:

conda create --name dense-ulearn-vos
source activate dense-ulearn-vos

Install PyTorch >=1.4 (see PyTorch instructions). For example on Linux, run:

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

Install the dependencies:

pip install -r requirements.txt

Download the data:

Dataset	Website	Target directory with video sequences
YouTube-VOS	Link	`data/ytvos/train/JPEGImages/`
OxUvA	Link	`data/OxUvA/images/dev/`
TrackingNet	Link	`data/tracking/train/jpegs/`
Kinetics-400	Link	`data/kinetics400/video_jpeg/train/`

The last column in this table specifies the path to the subdirectories (relative to the project root) containing the images of the video frames. You can use a different path structure; in that case, adjust the paths in data/filelists/ for every dataset accordingly.
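If you do change the directory layout, the path adjustment can be scripted. The sketch below assumes that every line of a filelist holds one or more whitespace-separated relative paths; this format is an assumption, so please inspect the downloaded files before relying on it. The function names and the `my_videos/` prefix are hypothetical.

```python
def rewrite_prefix(line, old_prefix, new_prefix):
    """Replace `old_prefix` with `new_prefix` in every path on one filelist line."""
    fields = line.split()
    rewritten = [
        new_prefix + field[len(old_prefix):] if field.startswith(old_prefix) else field
        for field in fields
    ]
    return " ".join(rewritten)


def rewrite_filelist(src_path, dst_path, old_prefix, new_prefix):
    """Write a copy of the filelist at `src_path` with adjusted path prefixes."""
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(rewrite_prefix(line, old_prefix, new_prefix) + "\n")
```

For example, `rewrite_prefix("ytvos/train/JPEGImages/003234408d/00000.jpg", "ytvos/train/JPEGImages/", "my_videos/")` yields `my_videos/003234408d/00000.jpg`. Writing to a new file rather than editing in place keeps the downloaded lists available for comparison.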

Download filelists:

cd data/filelists
bash download.sh

This will download lists of training and validation paths for all datasets.
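Before starting a long training run, it may help to confirm that the paths in a filelist actually resolve to files on disk. The following is a minimal sketch, assuming each whitespace-separated field in a filelist line is a path relative to the data root; the format and the name `missing_entries` are assumptions, not part of the repository.

```python
from pathlib import Path


def missing_entries(filelist_lines, data_root):
    """Return the listed paths that do not exist as files under `data_root`."""
    root = Path(data_root)
    missing = []
    for line in filelist_lines:
        # Assumption: every whitespace-separated field is a relative path.
        for field in line.split():
            if not (root / field).is_file():
                missing.append(field)
    return missing
```

An empty result suggests the data root and the filelist agree; a long list usually points at a wrong root directory rather than at individual missing frames.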

Training

The following bash script will train a ResNet-18 model from scratch on one of the four supported datasets (see above):

bash ./launch/train.sh [ytvos|oxuva|track|kinetics]

We also provide our final models for download.

Dataset	Mean J&F (DAVIS-2017)	Link	MD5
OxUvA	65.3	oxuva_e430_res4.pth (132M)	`af541[...]d09b3`
YouTube-VOS	69.3	ytvos_e060_res4.pth (132M)	`c3ae3[...]55faf`
TrackingNet	69.4	trackingnet_e088_res4.pth (88M)	`3e7e9[...]95fa9`
Kinetics-400	68.7	kinetics_e026_res4.pth (88M)	`086db[...]a7d98`
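To check a downloaded snapshot against the MD5 column, something along these lines could be used. Since the table only shows the first and last five characters of each digest, the sketch compares just those two ends; the helper names are hypothetical, and the full digest should be preferred where available.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_truncated(digest, head, tail):
    """Check a full 32-character digest against a truncated 'head[...]tail' form."""
    return len(digest) == 32 and digest.startswith(head) and digest.endswith(tail)
```

For instance, `matches_truncated(md5_of_file("ytvos_e060_res4.pth"), "c3ae3", "55faf")` would compare a local copy of the YouTube-VOS snapshot against the table entry. Matching only the two ends is a weaker check than comparing the whole digest.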

Inference and evaluation

Inference

To run inference, use launch/infer_vos.sh:

bash ./launch/infer_vos.sh [davis|ytvos]

The first argument selects the validation dataset to use (davis for DAVIS-2017; ytvos for YouTube-VOS). The bash variables declared in the script further help to set up the paths for reading the data and the pre-trained models as well as the output directory:

EXP, RUN_ID and SNAPSHOT determine the pre-trained model to load.
VER specifies a suffix for the output directory (in case you would like to experiment with different configurations for label propagation).

Please refer to launch/infer_vos.sh for their usage.

The inference script will create two directories with the result: [res3|res4|key]_vos and [res3|res4|key]_vis, where the prefix corresponds to the codename of the output CNN layer used in the evaluation (selected in infer_vos.sh using the KEY variable). The vos-directory contains the segmentation result ready for evaluation; the vis-directory contains the results for visualisation purposes. You can optionally disable generating the visualisation by setting VERBOSE=False in infer_vos.py.

Evaluation: DAVIS-2017

Please use the official evaluation package. Install the repository, then simply run:

python evaluation_method.py --task semi-supervised --davis_path data/davis2017 --results_path <path-to-vos-directory>
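When evaluating several runs, the command above can be assembled programmatically. The sketch below only reuses the flags shown in that command and the `[res3|res4|key]_vos` naming described in the inference section; the helper names and the `output_root` argument are hypothetical, and where the output directory actually lives depends on your settings in launch/infer_vos.sh.

```python
import os

VALID_KEYS = ("res3", "res4", "key")


def vos_dir_name(key):
    """Name of the directory holding segmentation results for a given layer key."""
    if key not in VALID_KEYS:
        raise ValueError(f"unknown key {key!r}, expected one of {VALID_KEYS}")
    return f"{key}_vos"


def davis_eval_command(results_path, davis_path="data/davis2017"):
    """Build the argument list for the DAVIS-2017 evaluation script."""
    return [
        "python", "evaluation_method.py",
        "--task", "semi-supervised",
        "--davis_path", davis_path,
        "--results_path", results_path,
    ]


def command_for_run(output_root, key):
    """Combine an (assumed) output root with the vos-directory name."""
    return davis_eval_command(os.path.join(output_root, vos_dir_name(key)))
```

The resulting list can be passed to `subprocess.run(..., check=True)` from inside the evaluation package's directory.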

Evaluation: YouTube-VOS 2018

Please use the official CodaLab evaluation server. To create the submission, rename the vos-directory to Annotations and compress it to Annotations.zip for uploading.
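The renaming and compression steps can be scripted as sketched below. The function name `make_submission` is hypothetical, and the sketch assumes the server accepts an archive with a single top-level Annotations folder; please check the server's instructions if an upload is rejected. The vos-directory is copied rather than renamed, so the original results stay in place.

```python
import shutil
import tempfile
from pathlib import Path


def make_submission(vos_dir, out_dir):
    """Copy `vos_dir` to a folder named Annotations and zip it into `out_dir`."""
    vos_dir = Path(vos_dir)
    out_dir = Path(out_dir).resolve()
    out_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as staging:
        # Stage a renamed copy so the archive has Annotations/ at its top level.
        shutil.copytree(vos_dir, Path(staging) / "Annotations")
        archive = shutil.make_archive(
            str(out_dir / "Annotations"),  # ".zip" is appended automatically
            "zip",
            root_dir=staging,
            base_dir="Annotations",
        )
    return Path(archive)
```

Calling `make_submission("<path-to-vos-directory>", "upload")` would leave `upload/Annotations.zip` ready for uploading.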

Acknowledgements

We thank PyTorch contributors and Allan Jabri for releasing their implementation of the label propagation.

Citation

We hope you find our work useful. If you would like to acknowledge it in your project, please use the following citation:

@inproceedings{Araslanov:2021:DUL,
  author    = {Araslanov, Nikita and Schaub-Meyer, Simone and Roth, Stefan},
  title     = {Dense Unsupervised Learning for Video Segmentation},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  volume    = {34},
  year      = {2021}
}

Comments

Request for the performance deviation when training with YouTube-VOS

Thanks for your great work! Could you provide the performance deviation when training with YouTube-VOS? By the way, I'd like to know whether the number reported in the paper is produced by the last training checkpoint or not. Thanks!

opened by JerryX1110 3

train on youtube-vos

Great work! Thanks for sharing your code!

I used the default ytvos training configuration to train the network, but the best performance I got was J&F=65.5 at epoch 490.
Is the default configuration supposed to give this performance? How can I reach the performance of your provided checkpoint? I'd appreciate it if you could point out what I did wrong.

opened by colorblank 3

Are you training your own model or a pretrained model (ResNet-18)?

In the README you train a ResNet on the data. What about your own model? Is your contribution at the data preprocessing level, or did you create your own framework? If so, why does it appear that you train a pretrained model (ResNet) rather than your own model (framework)?

Thanks in advance.

opened by anwarghammam 1

inference error

The inference result is black.

Available threads: 12
Loaded 2 sequences
Dataloader: filelists/val_ytvos2018_test # 271
filelists/val_ytvos2018_test: no augmentation
Sequence 00 | 0062f687f1

..........................................................................................< Sequence 01 | 00f88c4f0a ...................................................................................................................................................................................< 984.928 elapsed: Inference completed

opened by hushuai1 1

TypeError: forward() missing 1 required positional argument: 'frames'

In infer_vos.py, lines 211 and 237, I get TypeError: forward() missing 1 required positional argument: 'frames'. After I changed the calls to pass a keyword argument (frames=frames[:1]), the error went away. Is it a bug or something related to my environment? I haven't found the reason yet, since frames is not a keyword argument in framework.py.

By the way, does your evaluation script support multi-GPU inference? It seems that inference on YouTube-VOS takes a very long time.

Best,

opened by pansanity666 1

GPU out of memory when running infer_vos.sh/py

I am trying to run bash ./launch/infer_vos.sh ytvos, but I get "GPU out of memory" errors. I tried reducing batch_size to 8, 4, 2 and 1, but still get the error. I have an NVIDIA K2000 with only 4 GB of GPU memory. Any suggestions on how to get around the issue? Thanks.

opened by chyphen7 1

incomplete training samples when using YoutubeVOS dataset

After downloading the filelist.txt for YouTube-VOS, I found that the images are sampled every 5 frames, which is the fully supervised setting, since only one out of every five frames is annotated. Under the self-supervised setting, however, previous methods (like MAST) use the full version with all training frames. Have you tried the latter?

opened by lingorX 0

all data not found when training

I uploaded the training data to the path given in data/filelists, but every time I get this error (one example): AssertionError: cfg.DATASET.ROOT/ytvos/train/JPEGImages/003234408d/00000.jpg not found. I also tried new data that I wanted to train the model with, and it gives the same error. It seems the data is treated as missing even though it is there, and I am pretty sure of the path. Any suggestions, please?

opened by anwarghammam 0
