Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation.

HYOJINPARK

Last update: Jan 1, 2023


Overview

Training Script for Reuse-VOS

This is the code implementation of the CVPR 2021 paper: Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation.

[Figure: hard case, Ours vs. FRTM]

[Figure: easy case, Ours vs. FRTM]

Requirements

Python packages

torch
opencv-python (imported as cv2)
scikit-image (imported as skimage)
easydict

GPU support

GPU memory >= 11 GB (RN18)
CUDA >= 10.0
PyTorch >= 1.4.0
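
The version floors above can be checked before starting a training run. The following is a minimal sketch and not part of the repository: version_tuple, meets_minimum and report are hypothetical helper names, and the torch import is guarded so the script also runs on a machine where PyTorch is missing.

```python
import re


def version_tuple(version):
    """Turn '1.10.1+cu113' into (1, 10, 1); non-numeric suffixes are ignored."""
    public = version.split("+")[0]  # drop local build tags such as '+cu113'
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:  # pad so that '10' and '10.0' compare as equal
        parts.append(0)
    return tuple(parts)


def meets_minimum(version, minimum):
    """Compare numerically, so that '1.10.0' counts as newer than '1.4.0'."""
    return version_tuple(version) >= version_tuple(minimum)


def report():
    """Print whether the installed PyTorch/CUDA satisfy the floors listed above."""
    try:
        import torch
    except ImportError:
        print("torch is not installed")
        return
    print("torch >= 1.4.0:", meets_minimum(torch.__version__, "1.4.0"))
    if torch.cuda.is_available():
        print("CUDA >= 10.0:", meets_minimum(torch.version.cuda, "10.0"))
        total = torch.cuda.get_device_properties(0).total_memory
        print("GPU memory: %.1f GiB" % (total / 2**30))
    else:
        print("no CUDA device visible")


if __name__ == "__main__":
    report()
```

Comparing integer tuples rather than raw strings matters here, because a plain string comparison would rank "1.10.0" below "1.4.0".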

Datasets

DAVIS

To test on the DAVIS validation split, download and unzip the 2017 480p trainval images and annotations here. The dataset directory should look like this:

/path/DAVIS
|-- Annotations/
|-- ImageSets/
|-- JPEGImages/

YouTubeVOS

To test our validation split and the YouTubeVOS challenge 'valid' split, download YouTubeVOS 2018 and place it in this directory structure:

/path/ytvos2018
|-- train/
|-- train_all_frames/
|-- valid/
`-- valid_all_frames/
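
A quick way to confirm that both datasets were unpacked into the layouts shown above is to check for the expected sub-directories. This is a minimal sketch, not code from the repository: EXPECTED and missing_subdirs are hypothetical names, and the root paths are the placeholders from the directory trees.

```python
from pathlib import Path

# Sub-directories listed in the two directory trees above.
EXPECTED = {
    "davis": ["Annotations", "ImageSets", "JPEGImages"],
    "ytvos2018": ["train", "train_all_frames", "valid", "valid_all_frames"],
}


def missing_subdirs(root, dataset):
    """Return the expected sub-directories that are absent under root."""
    root = Path(root)
    return [name for name in EXPECTED[dataset] if not (root / name).is_dir()]


if __name__ == "__main__":
    # Placeholder roots copied from the trees above; replace with real paths.
    for dataset, root in [("davis", "/path/DAVIS"), ("ytvos2018", "/path/ytvos2018")]:
        missing = missing_subdirs(root, dataset)
        print(dataset, "OK" if not missing else "missing: %s" % ", ".join(missing))
```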

Release

DAVIS

Model	Backbone	Training set	J&F (DAVIS 2017)	J&F (DAVIS 2016)	Weights
G-FRTM (t=1)	ResNet18	YouTube-VOS + DAVIS	71.7	80.9	Google Drive
G-FRTM (t=0.7)	ResNet18	YouTube-VOS + DAVIS	69.9	80.5	same .pth file
G-FRTM (t=1)	ResNet101	YouTube-VOS + DAVIS	76.4	84.3	Google Drive
G-FRTM (t=0.7)	ResNet101	YouTube-VOS + DAVIS	74.3	82.3	same .pth file

YouTube-VOS

Model	Backbone	Training set	G	J-S	J-Us	F-S	F-Us	Weights
G-FRTM (t=1)	ResNet18	YouTube-VOS	63.8	68.3	55.2	70.6	61.0	Google Drive
G-FRTM (t=0.8)	ResNet18	YouTube-VOS	63.4	67.6	55.8	69.3	60.9	same .pth file
G-FRTM (t=0.7)	ResNet18	YouTube-VOS	62.7	67.1	55.2	68.2	60.1	same .pth file

For the YouTube-VOS benchmark, we initialize the original FRTM layers from the official FRTM repository weights. S = Seen, Us = Unseen.

Target model cache

Here is the target model cache file we used for ResNet18.

Run

Train

Open train.py and adjust the paths dict to your dataset locations, checkpoint and tensorboard output directories and the place to cache target model weights.

To train a network, run the following command.

python train.py --name <session-name> --ftext resnet18 --dset all --dev cuda:0

--name is the name of the save directory for the current training session.
--ftext is the name of the feature extractor, either resnet18 or resnet101.
--dset is one of dv2017, ytvos2018 or all ("all" really means "both").
--dev is the name of the device to train on.
--m1 is margin1 for training the reuse gate; we use 1.0 for the DAVIS benchmark and 0.5 for the YouTube-VOS benchmark.
--m2 is margin2 for training the reuse gate; we use 0.

Replace "session-name" with whatever you like. Subdirectories with this name will be created under your checkpoint and tensorboard paths.
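
The flags described above can also be assembled programmatically, for example to launch one session per benchmark with the margins quoted above. This is a minimal sketch that uses only the flags named in this section; build_train_command, MARGIN1 and MARGIN2 are hypothetical names, and because the text does not say which margin1 goes with --dset all, the caller states the target benchmark explicitly.

```python
import shlex

# margin1 values quoted above; margin2 is 0 for both benchmarks.
MARGIN1 = {"davis": 1.0, "ytvos": 0.5}
MARGIN2 = 0


def build_train_command(name, benchmark, ftext="resnet18", dset="all", dev="cuda:0"):
    """Return the train.py argument list for one training session."""
    if ftext not in ("resnet18", "resnet101"):
        raise ValueError("ftext must be resnet18 or resnet101")
    if dset not in ("dv2017", "ytvos2018", "all"):
        raise ValueError("dset must be dv2017, ytvos2018 or all")
    return [
        "python", "train.py",
        "--name", name,
        "--ftext", ftext,
        "--dset", dset,
        "--dev", dev,
        "--m1", str(MARGIN1[benchmark]),
        "--m2", str(MARGIN2),
    ]


cmd = build_train_command("rn18-davis", benchmark="davis")
# Print a shell-safe version of the command for copy and paste.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Returning a list instead of a single string lets the same value be passed directly to subprocess.run without any shell quoting concerns.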

Eval

Open evaluate.py and adjust the paths dict to your dataset locations, checkpoint and tensorboard output directories and the place to cache target model weights.

To evaluate a network, run the following command.

python evaluate.py --ftext resnet18 --dset dv2017val --dev cuda:0

--ftext is the name of the feature extractor, either resnet18 or resnet101.
--dset is one of dv2016val, dv2017val, yt2018jjval, yt2018val or yt2018valAll.
--dev is the name of the device to evaluate on.
--TH is the threshold tau (default: 0.7).

The inference results will be saved at ${ROOT}/${result}. It is worth evaluating several .pth checkpoint files to find the one with the best accuracy.
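
The Release tables report one checkpoint evaluated at several thresholds, and such a sweep over --TH and --dset can be scripted. This is a minimal sketch that uses only the flags listed above; eval_commands and DATASETS are hypothetical names, and how evaluate.py selects a checkpoint file is not described here, so that part is not modelled.

```python
from itertools import product

# Values accepted by --dset according to the flag list above.
DATASETS = ("dv2016val", "dv2017val", "yt2018jjval", "yt2018val", "yt2018valAll")


def eval_commands(dsets, thresholds, ftext="resnet18", dev="cuda:0"):
    """Yield one evaluate.py argument list per (dataset, threshold) pair."""
    for dset, th in product(dsets, thresholds):
        if dset not in DATASETS:
            raise ValueError("unknown --dset value: %s" % dset)
        yield ["python", "evaluate.py", "--ftext", ftext,
               "--dset", dset, "--dev", dev, "--TH", str(th)]


commands = list(eval_commands(["dv2016val", "dv2017val"], [1.0, 0.7]))
for cmd in commands:
    print(" ".join(cmd))
    # To actually run: subprocess.run(cmd, check=True) from the repository root.
```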

Acknowledgement

This codebase borrows code and structure from the official FRTM repository. We are grateful to Facebook Inc. for valuable discussions.

Reference

The codebase is built on the following work:

@misc{park2020learning,
      title={Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation}, 
      author={Hyojin Park and Jayeon Yoo and Seohyeong Jeong and Ganesh Venkatesh and Nojun Kwak},
      year={2020},
      eprint={2012.11655},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Comments

What exactly do templates refer to in template matching

I would like to ask the following questions: 1. What exactly do templates refer to in template matching? 2. Is the refine-translator added after the backbone?

opened by longmalongma 1

How to run inference on my custom datasets?

Hello. Inference on the default dataset you provided works.

But it failed on another copy of the DAVIS dataset that I downloaded elsewhere. The file structure is a little different, so I modified the datasets file. The results are all black except for the first frame. I checked the image and annotation inputs: the annotations are identical and the images are slightly different (no difference to the human eye).

Then I tried other datasets. It only worked well on FBMS (another public video segmentation dataset; the results are normal segmentation masks). For other datasets, such as DAVIS and MCL, the results are almost entirely black, with only some frames having valid masks.

It seems that the performance is sensitive to the choice of dataset. Are there any special dataset settings? It is hard to understand the code details in a short time, so I would appreciate your help.

opened by zwbx 0
