[AAAI22] Reliable Propagation-Correction Modulation for Video Object Segmentation

Xiaohao Xu

Last update: Dec 4, 2022

Overview

Reliable Propagation-Correction Modulation for Video Object Segmentation (AAAI22)

A preprint of this work is available at: https://arxiv.org/abs/2112.02853

Qualitative results and comparisons with previous SOTAs are available at: https://youtu.be/X6BsS3t3wnc

This repo is a preview version. More details will be added later.

Abstract

Error propagation is a general but crucial problem in online semi-supervised video object segmentation. We aim to suppress error propagation through a correction mechanism with high reliability.

The key insight is to disentangle the correction from the conventional mask propagation process with reliable cues.

We introduce two modulators, propagation and correction modulators, to separately perform channel-wise re-calibration on the target frame embeddings according to local temporal correlations and reliable references respectively. Specifically, we assemble the modulators with a cascaded propagation-correction scheme. This avoids overriding the effects of the reliable correction modulator by the propagation modulator.
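The cascaded data flow described above can be illustrated with a minimal sketch. It assumes each modulator yields one weight per channel (called `w_p` and `w_c` here) and that re-calibration is a plain per-channel multiplication. The function names and toy values are hypothetical and are not taken from this repository's code.

```python
from typing import List

Feature = List[List[float]]  # layout: [channels][positions]


def modulate(feature: Feature, weights: List[float]) -> Feature:
    """Scale every channel of `feature` by its own weight."""
    if len(feature) != len(weights):
        raise ValueError("one weight per channel is required")
    return [[w * x for x in channel] for channel, w in zip(feature, weights)]


def cascade(feature: Feature, w_p: List[float], w_c: List[float]) -> Feature:
    """Apply the propagation weights first, then the correction weights."""
    propagated = modulate(feature, w_p)
    return modulate(propagated, w_c)


# Toy target-frame embedding: 2 channels, 3 spatial positions.
target = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
w_p = [0.5, 1.0]  # hypothetical weights from local temporal correlations
w_c = [2.0, 0.0]  # hypothetical weights from reliable references
result = cascade(target, w_p, w_c)
print(result)
```

With pure multiplication the order of the two steps does not change the result, so this sketch only shows the data flow. In a real network, where other layers sit between the two modulators, the order would matter.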

Although the reference frame with the ground truth label provides reliable cues, it could be very different from the target frame and introduce uncertain or incomplete correlations. We augment the reference cues by supplementing reliable feature patches to a maintained pool, thus offering more comprehensive and expressive object representations to the modulators. In addition, a reliability filter is designed to retrieve reliable patches and pass them in subsequent frames.
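One way to picture the reliability filter and the maintained pool is the sketch below. It assumes that each candidate patch comes with a predicted probability distribution, that reliability is scored by Shannon entropy (lower means more confident), and that a fixed threshold decides what enters the pool. The threshold value, the data layout, and the function names are all illustrative assumptions, not the paper's exact procedure.

```python
import math
from typing import List, Tuple

Patch = Tuple[List[float], List[float]]  # (feature vector, predicted probabilities)


def shannon_entropy(probs: List[float]) -> float:
    """Entropy in nats of a discrete distribution; zero-probability terms add 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def filter_reliable(patches: List[Patch], max_entropy: float) -> List[List[float]]:
    """Return the features of patches whose prediction entropy is at most `max_entropy`."""
    return [feat for feat, probs in patches if shannon_entropy(probs) <= max_entropy]


# The pool starts with a patch from the ground-truth reference frame (toy values).
pool = [[0.9, 0.1]]
candidates = [
    ([0.2, 0.7], [0.98, 0.02]),  # confident prediction, low entropy
    ([0.5, 0.5], [0.5, 0.5]),    # maximally uncertain for two classes
]
# Hypothetical threshold: only low-entropy patches are passed on to later frames.
pool.extend(filter_reliable(candidates, max_entropy=0.3))
print(pool)
```

Here the confident patch joins the pool and the uncertain one is discarded, so later frames would only see the reference patch plus patches that passed the filter.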

Our model achieves state-of-the-art performance on YouTube-VOS18/19 and DAVIS17-Val/Test benchmarks. Extensive experiments demonstrate that the correction mechanism provides considerable performance gain by fully utilizing reliable guidance.

Requirements

This Docker image may contain some redundant packages. A more lightweight one will be generated later.

docker image: xxiaoh/vos:10.1-cudnn7-torch1.4_v3

Citation

If you find this work useful for your research, please consider citing:

@misc{xu2021reliable,
  title={Reliable Propagation-Correction Modulation for Video Object Segmentation}, 
  author={Xiaohao Xu and Jinglu Wang and Xiao Li and Yan Lu},
  year={2021},
  eprint={2112.02853},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

Credit

CFBI: https://github.com/z-x-yang/CFBI

Deeplab: https://github.com/VainF/DeepLabV3Plus-Pytorch

GCT: https://github.com/z-x-yang/GCT

Acknowledgement

First, the author would like to thank Rex for his insightful viewpoints on VOS during e-mail discussions. This work is also largely built upon the codebase of CFBI. Thanks to the authors of CFBI for releasing such a wonderful code repository for further work to build upon!

Related impressive works in VOS

AOT [NeurIPS 2021]: https://github.com/z-x-yang/AOT

STCN [NeurIPS 2021]: https://github.com/hkchengrex/STCN

MiVOS [CVPR 2021]: https://github.com/hkchengrex/MiVOS

SSTVOS [CVPR 2021]: https://github.com/dukebw/SSTVOS

GraphMemVOS [ECCV 2020]: https://github.com/carrierlxk/GraphMemVOS

CFBI [ECCV 2020]: https://github.com/z-x-yang/CFBI

STM [ICCV 2019]: https://github.com/seoungwugoh/STM

FEELVOS [CVPR 2019]: https://github.com/kim-younghan/FEELVOS

Useful websites for VOS

The 1st Large-scale Video Object Segmentation Challenge: https://competitions.codalab.org/competitions/19544#learn_the_details

The 2nd Large-scale Video Object Segmentation Challenge - Track 1: Video Object Segmentation: https://competitions.codalab.org/competitions/20127#learn_the_details

The Semi-Supervised DAVIS Challenge on Video Object Segmentation @ CVPR 2020: https://competitions.codalab.org/competitions/20516#participate-submit_results

DAVIS: https://davischallenge.org/

YouTube-VOS: https://youtube-vos.org/

Papers with code for Semi-VOS: https://paperswithcode.com/task/semi-supervised-video-object-segmentation

Comments and discussions are welcome!

Xiaohao Xu: [email protected]


Comments

How can I set up the dataset and .pth paths?

I cloned this repository and downloaded the datasets and the .pth file successfully. How do I set the dataset and .pth paths? I got a path error.

opened by kkk3449 2

About Shannon entropy

I'm sorry to bother you. I recently read your paper and am very interested in how the information entropy is calculated in the "Prediction reliability" part. How is the probability denoted by P in the formula calculated?

opened by lsy-dot 1

About the reweighting operation in the two modulators

Thanks for your excellent work! I have a detailed question about the channel reweighting in the two modulators. In Fig. 2, both w_p and w_c appear to be sent to both modulators, but the "Modulator block" part states that the reweighting is performed separately by the two vectors (i.e., w_p for propagation, w_c for correction). I am confused about this point.

opened by ruoqi77 1
