Training code for the Space-Time Memory Network (STM). Semi-supervised video object segmentation.

haochen wang

Last update: Dec 11, 2022


Overview

Training-code-of-STM

This repository fully reproduces Space-Time Memory Networks (STM).

Performance on the DAVIS 2017 val set & weights

implementation	backbone	training stage	training dataset	J&F	J	F	weights
Ours	resnet-50	stage 1	MS-COCO	69.5	67.8	71.2	`link`
Origin	resnet-50	stage 2	MS-COCO -> Davis&Youtube-vos	81.8	79.2	84.3	`link`
Ours	resnet-50	stage 2	MS-COCO -> Davis&Youtube-vos	82.0	79.7	84.4	`link`
Ours	resnest-101	stage 2	MS-COCO -> Davis&Youtube-vos	84.6	82.0	87.2	`link`
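For reference, the J&F column is the mean of the J column (region similarity, i.e. mask intersection-over-union) and the F column (contour accuracy). The sketch below computes J for a single pair of binary masks and the J&F average. The function names are illustrative, and this is not the evaluation code used by eval.py.

```python
import numpy as np

def region_similarity(pred, gt):
    """J for one frame: intersection-over-union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumption: two empty masks count as a perfect match,
        # following the usual DAVIS convention.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

def j_and_f(j_mean, f_mean):
    """Overall score: plain average of the J and F means."""
    return (j_mean + f_mean) / 2.0

# One overlapping pixel out of two in the union.
print(region_similarity([[1, 1], [0, 0]], [[1, 0], [0, 0]]))  # 0.5
# J and F values taken from the stage 1 row of the table.
print(round(j_and_f(67.8, 71.2), 1))
```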

Requirements

Python >= 3.6
PyTorch 1.5
Numpy
Pillow
opencv-python
imgaug
scipy
tqdm
pandas
resnest

Datasets

MS-COCO

We use the instance segmentation annotations of MS-COCO to generate pseudo video sequences. Specifically, we cut the objects out of one image and paste them onto another. We then apply different affine transformations to the foreground objects and to the background image. To visualize some of the processed training sequences:

python dataset/coco.py -Ddavis "path to davis" -Dcoco "path to coco" -o "path to output dir"
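The cut-and-paste idea can be illustrated with a small NumPy-only sketch. It is not the code in dataset/coco.py: the helper names, the nearest-neighbour warp, and the restriction to random translations are assumptions made to keep the example short. The real pipeline presumably uses richer affine transforms (the requirements list imgaug).

```python
import numpy as np

def affine_warp(img, matrix, fill=0):
    """Nearest-neighbour warp of an (H, W) or (H, W, C) array.

    `matrix` is a 2x3 affine matrix mapping output (x, y) pixel
    coordinates back to source coordinates.
    """
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.rint(matrix[0, 0] * xs + matrix[0, 1] * ys + matrix[0, 2]).astype(int)
    src_y = np.rint(matrix[1, 0] * xs + matrix[1, 1] * ys + matrix[1, 2]).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.full_like(img, fill)
    out[valid] = img[src_y[valid], src_x[valid]]
    return out

def random_translation(rng, max_shift):
    """Hypothetical stand-in for a random affine transform (shift only)."""
    dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.array([[1.0, 0.0, dx], [0.0, 1.0, dy]])

def make_pseudo_sequence(background, foreground, mask, num_frames=3, max_shift=4, seed=0):
    """Paste a masked foreground onto a background, moving both per frame."""
    rng = np.random.default_rng(seed)
    frames, masks = [], []
    for _ in range(num_frames):
        bg = affine_warp(background, random_translation(rng, max_shift))
        fg_matrix = random_translation(rng, max_shift)
        fg = affine_warp(foreground, fg_matrix)
        mk = affine_warp(mask, fg_matrix)  # the mask moves with its object
        frames.append(np.where(mk[..., None] > 0, fg, bg))
        masks.append(mk)
    return frames, masks

# Toy data: black background, white 8x8 object in the image centre.
background = np.zeros((32, 32, 3), dtype=np.uint8)
foreground = np.full((32, 32, 3), 255, dtype=np.uint8)
mask = np.zeros((32, 32), dtype=np.uint8)
mask[12:20, 12:20] = 1
frames, masks = make_pseudo_sequence(background, foreground, mask)
print(len(frames), frames[0].shape)
```

Each returned frame comes with its warped mask, so the pair can serve as an image and ground-truth annotation for one time step.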

DAVIS

Youtube-VOS

Structure

 |- data
      |- Davis
          |- JPEGImages
          |- Annotations
          |- ImageSets
      
      |- Youtube-vos
          |- train
          |- valid
          
      |- Ms-COCO
          |- train2017
          |- annotations
              |- instances_train2017.json
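Before starting a long training run it can help to check that the data directory matches the layout above. The following is a small sketch, not part of the repository; the function name and the way the root directory is passed in are assumptions.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED_PATHS = [
    "Davis/JPEGImages",
    "Davis/Annotations",
    "Davis/ImageSets",
    "Youtube-vos/train",
    "Youtube-vos/valid",
    "Ms-COCO/train2017",
    "Ms-COCO/annotations/instances_train2017.json",
]

def missing_paths(data_root):
    """Return the expected entries that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]

if __name__ == "__main__":
    # "../data" mirrors the relative path used in the example commands.
    for rel in missing_paths("../data"):
        print("missing:", rel)
```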

Demo

python demo.py -g "gpu id" -s "set" -y "year" -D "path to davis" -p "path to weights" -backbone "[resnet50,resnet18,resnest101]"
#e.g.
python demo.py -g 0 -s val -y 17 -D ../data/Davis/ -p /smart/haochen/cvpr/0628_resnest_aspp/davis_youtube_resnest101_699999.pth -backbone resnest101
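The flags in the commands above could be declared with argparse roughly as follows. This is only a sketch of how such a command line might be parsed: the flag names come from the commands, while the destination names, types, choices and help strings are assumptions and may not match the parser in demo.py.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the flags used in the demo command."""
    parser = argparse.ArgumentParser(description="STM demo (illustrative sketch)")
    parser.add_argument("-g", dest="gpu", type=str, required=True, help="gpu id")
    parser.add_argument("-s", dest="split", type=str, default="val", help="set, e.g. val")
    parser.add_argument("-y", dest="year", type=int, default=17, help="DAVIS year, e.g. 16 or 17")
    parser.add_argument("-D", dest="davis_root", type=str, required=True, help="path to davis")
    parser.add_argument("-p", dest="weights", type=str, required=True, help="path to weights")
    parser.add_argument("-backbone", dest="backbone", type=str, default="resnet50",
                        choices=["resnet50", "resnet18", "resnest101"])
    return parser

args = build_parser().parse_args(
    ["-g", "0", "-s", "val", "-y", "17", "-D", "../data/Davis/",
     "-p", "weights.pth", "-backbone", "resnest101"]
)
print(args.backbone, args.year, args.davis_root)
```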

Demo video: bmx-trees.mp4

Training

Stage 1

Pretraining on MS-COCO.

python train_coco.py -Ddavis "path to davis" -Dcoco "path to coco" -backbone "[resnet50,resnet18]" -save "path to checkpoints"
#e.g.
python train_coco.py -Ddavis ../data/Davis/ -Dcoco ../data/Ms-COCO/ -backbone resnet50 -save ../coco_weights/

Stage 2

Training on DAVIS and Youtube-VOS, resuming from the stage 1 (MS-COCO) weights.

python train_davis.py -Ddavis "path to davis" -Dyoutube "path to youtube-vos" -backbone "[resnet50,resnet18]" -save "path to checkpoints" -resume "path to coco pretrained weights"
#e.g. 
python train_davis.py -Ddavis ../data/Davis/ -Dyoutube ../data/Youtube-vos/ -backbone resnet50 -save ../davis_weights/ -resume ../coco_weights/coco_pretrained_resnet50_679999.pth

Evaluation

Evaluating on the DAVIS 2016 and 2017 val sets.

python eval.py -g "gpu id" -s "set" -y "year" -D "path to davis" -p "path to weights" -backbone "[resnet50,resnet18,resnest101]"
#e.g.
python eval.py -g 0 -s val -y 17 -D ../data/Davis/ -p ../davis_weights/davis_youtube_resnet50_799999.pth -backbone resnet50
python eval.py -g 0 -s val -y 17 -D ../data/Davis/ -p ../davis_weights/davis_youtube_resnest101_699999.pth -backbone resnest101

Notes

STM is an attention-based implicit matching architecture and needs a large amount of training data. Stage 1 pretraining is necessary if you want better results.
Training takes about three days on a single NVIDIA 2080Ti. Nothing is logged during training; add logging if you need it.
Due to time constraints, the code is a bit messy and needs to be optimized. Questions and suggestions are welcome.
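To make the "attention-based implicit matching" in the first note concrete, here is a minimal NumPy sketch of a space-time memory read as described in the STM paper. Every query location is compared with every memory location by a key dot product, the similarities are normalized with a softmax over the memory, and the resulting weights retrieve memory values, which are concatenated with the query value. The flattened (locations, channels) layout, the function name, and the 1/sqrt(C_k) scaling are assumptions of this sketch, not a copy of the repository's PyTorch code.

```python
import numpy as np

def memory_read(mem_key, mem_val, qry_key, qry_val):
    """Space-time memory read on flattened feature maps.

    mem_key: (N_mem, C_k)  keys of all memory-frame locations (T*H*W rows)
    mem_val: (N_mem, C_v)  values of all memory-frame locations
    qry_key: (N_qry, C_k)  keys of the query-frame locations (H*W rows)
    qry_val: (N_qry, C_v)  values of the query-frame locations
    Returns the (N_qry, 2 * C_v) read-out and the (N_qry, N_mem) weights.
    """
    # Dot-product similarity; the scaling is an assumption of this sketch.
    scores = qry_key @ mem_key.T / np.sqrt(mem_key.shape[1])
    # Softmax over memory locations, shifted for numerical stability.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Weighted sum of memory values, then concatenation with the query value.
    read = weights @ mem_val
    return np.concatenate([qry_val, read], axis=1), weights

rng = np.random.default_rng(0)
out, weights = memory_read(
    rng.normal(size=(10, 3)),  # 10 memory locations, 3-dim keys
    rng.normal(size=(10, 4)),  # 4-dim values
    rng.normal(size=(6, 3)),   # 6 query locations
    rng.normal(size=(6, 4)),
)
print(out.shape, weights.shape)  # (6, 8) (6, 10)
```

Because every query pixel attends to every memory pixel, no explicit correspondence is computed; the matching is learned through the keys, which is consistent with the note that the model needs a lot of training data.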

Acknowledgement

This codebase borrows code and structure from the official STM repository.

Citing STM

@inproceedings{oh2019video,
  title={Video object segmentation using space-time memory networks},
  author={Oh, Seoung Wug and Lee, Joon-Young and Xu, Ning and Kim, Seon Joo},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={9226--9235},
  year={2019}
}

Comments

Some questions about the Youtube-VOS results.

Hello, can you publish your results on Youtube-VOS? I used the 82.0 DAVIS model and measured 51.8 on Youtube-VOS. Is this consistent with the result you measured? Looking forward to your reply!

opened by zy5037 2
some details

What is the difference between resnest101 and resnet101? First, I wonder why you use "from resnest.torch import resnest101" instead of models.resnet101. Also, I didn't find an implementation of resnest101.

opened by leeqiaogithub 1
about training question

Hi, thanks for sharing. I have a question about the Davis and Youtube-VOS training stage. Why do you set the parameter to 0.08 when you load the datasets from Davis and Youtube-VOS? How did you arrive at that value?

opened by mmyjjl1009 0
Fine-tuning on a custom dataset?

Thanks for your effort and for sharing the code. Is it possible to fine-tune the pre-trained models on a custom dataset? How? And does fine-tuning improve performance if you have about 50 videos (~3 min)?

opened by zobeirraisi 0
some questions

Thanks for your good work! I have the following questions: 1. Why is the Davis dataset used when the model is pre-trained on the COCO dataset? 2. On line 90, next is not defined before the call. 3. How do I continue training the model that was pretrained on static images from the COCO dataset?

opened by longmalongma 2

