Github project for Attention-guided Temporal Coherent Video Object Matting.

Related tags

Deep Learning TCVOM
Overview

Attention-guided Temporal Coherent Video Object Matting

This is the Github project for our paper Attention-guided Temporal Coherent Video Object Matting (arXiv:2105.11427). We provide our code, the supplementary material, trained model and VideoMatting108 dataset here. For the trimap generation module, please see TCVOM-TGM.

The code, the trained model and the dataset are for academic and non-commercial use only.

The supplementary material can be found here.

Table of Contents

VideoMatting108 Dataset

VideoMatting108 is a large video matting dataset that contains 108 video clips with their corresponding groundtruth alpha matte, all in 1080p resolution, 80 clips for training and 28 clips for validation.

You can download the dataset here. The total size of the dataset is 192GB and we've split the archive into 1GB chunks.

The contents of the dataset are the following:

  • FG: contains the foreground RGBA image, where the alpha channel is the groundtruth matte and RGB channel is the groundtruth foreground.
  • BG: contains background RGB image used for composition.
  • flow_png_val: contains quantized optical flow of validation video clips for calculating MESSDdt metric. You can choose not to download this folder if you don't need to calculate this metric. You can refer to the _flow_read() function in calc_metric.py for usage.
  • *_videos*.txt: train / val split.
  • frame_corr.json: FG / BG frame pair used for composition.

After decompressing, the dataset folder should have the structure of the following (please rename flow_png_val to flow_png):

|---dataset
  |-FG_done
  |-BG_done
  |-flow_png
  |-frame_corr.json
  |-train_videos.txt
  |-train_videos_subset.txt
  |-val_videos.txt
  |-val_videos_subset.txt

Models

Currently our method supports four different image matting methods as base.

  • gca (GCA Matting by Li et al., code is from here)
  • dim (DeepImageMatting by Xu et al., we use the reimplementation code from here)
  • index (IndexNet Matting by Lu et al., code is from here)
  • fba (FBA Matting by Forte et al., code is from here)
    • There are some differences in our training and the original FBA paper. We believe that there are still space for further performance gain through hyperparameter fine-tuning.
      • We did not use the foreground extension technique during training. Also we use four GPUs instead of one.
      • We used the conventional adam optimizer instead of radam.
      • We used mean instead of sum during loss computation to keep the loss balanced (especially for L_af).

The trained model can be downloaded here. We provide four different weights for every base method.

  • *_SINGLE_Lim.pth: The trained weight of the base image matting method on the VideoMatting108 dataset without TAM. Only L_im is used during the pretrain. This is the baseline model.
  • *_TAM_Lim_Ltc_Laf.pth: The trained weight of base image matting method with TAM on VideoMatting108 dataset. L_im, L_tc and L_af is used during the training. This is our full model.
  • *_TAM_pretrain.pth: The pretrained weight of base image matting method with TAM on the DIM dataset. Only L_im is used during the training.
  • *_fe.pth: The converted weight from the original model checkpoint, only used for pretraining TAM.

Results

This is the quantitative result on VideoMatting108 validation dataset with medium width trimap. The metric is averaged on all 28 validation video clips.

We use CUDA 10.2 during the inference. Using CUDA 11.1 might result in slightly lower metric. All metrics are calculated with calc_metric.py.

Method Loss SSDA dtSSD MESSDdt MSE*(10^3) mSAD
GCA+F (Baseline) L_im 55.82 31.64 2.15 8.20 40.85
GCA+TAM L_im+L_tc+L_af 50.41 27.28 1.48 7.07 37.65
DIM+F (Baseline) L_im 61.85 34.55 2.82 9.99 44.38
DIM+TAM L_im+L_tc+L_af 58.94 29.89 2.06 9.02 43.28
Index+F (Baseline) L_im 58.53 33.03 2.33 9.37 43.53
Index+TAM L_im+L_tc+L_af 57.91 29.36 1.81 8.78 43.17
FBA+F (Baseline) L_im 57.47 29.60 2.19 9.28 40.57
FBA+TAM L_im+L_tc+L_af 51.57 25.50 1.59 7.61 37.24

Usage

Requirements

Python=3.8
Pytorch=1.6.0
numpy
opencv-python
imgaug
tqdm
yacs

Inference

pred_single.py and pred_vmn.py automatically use all CUDA devices available. pred_test.py uses cuda:0 device as default.

  • Inference on VideoMatting108 validation set using our full model

    • python pred_vmd.py --model {gca,dim,index,fba} --data /path/to/VideoMatting108dataset --load /path/to/weight.pth --trimap {wide,narrow,medium} --save /path/to/outdir
  • Inference on VideoMatting108 validation set using the baseline model

    • python pred_single.py --dataset vmd --model {gca,dim,index,fba} --data /path/to/VideoMatting108dataset --load /path/to/weight.pth --trimap {wide,narrow,medium} --save /path/to/outdir
  • Calculating metrics

    • python calc_metric.py --pred /path/to/prediction/result --data /path/to/VideoMatting108dataset
    • The result will be saved in metric.json inside /path/to/prediction/result. Use tail to see the final averaged result.

  • Inference on test video clips

    • First, prepare the data. Make sure the workspace folder has the structure of the following:

      |---workspace
        |---video1
          |---00000_rgb.png
          |---00000_trimap.png
          |---00001_rgb.png
          |---00001_trimap.png
          |---....
        |---video2
        |---video3
        |---...
      
    • python pred_test.py --gpu CUDA_DEVICES_NUMBER_SPLIT_BY_COMMA --model {gca,vmn_gca,dim,vmn_dim,index,vmn_index,fba,vmn_fba} --data /path/to/workspace --load /path/to/weight.pth --save /path/to/outdir [video1] [video2] ...
      • The model parameter: vmn_BASEMETHOD corresponds to our full model, BASEMETHOD corresponds to the baseline model.
      • Without specifying the name of the video clip folders in the command line, the script will process all video clips under /path/to/workspace.

Training

PY_CMD="python -m torch.distributed.launch --nproc_per_node=NUMBER_OF_CUDA_DEVICES"
  • Pretrain TAM on DIM dataset. Please see cfgs/pretrain_vmn_BASEMETHOD.yaml for configuration and refer to dataset/DIM.py for dataset preparation.

    $PY_CMD pretrain_ddp.py --cfg cfgs/pretrain_vmn_index.yaml
  • Training our full method on VideoMatting108 dataset. This will load the pretrained TAM weight as initialization. Please see cfgs/vmd_vmn_BASEMETHOD_pretrained_30ep.yaml for configuration.

    $PY_CMD train_ddp.py --cfg /path/to/config.yaml
  • Training the baseline method on VideoMatting108 dataset without TAM. Please see cfgs/vmd_vmn_BASEMETHOD_pretrained_30ep_single.yaml for configuration.

    $PY_CMD train_single_ddp.py --cfg /path/to/config.yaml

Contact

If you have any questions, please feel free to contact [email protected].

You might also like...
Code for 'Self-Guided and Cross-Guided Learning for Few-shot segmentation. (CVPR' 2021)'

SCL Introduction Code for 'Self-Guided and Cross-Guided Learning for Few-shot segmentation. (CVPR' 2021)' We evaluated our approach using two baseline

[NeurIPS 2020] Blind Video Temporal Consistency via Deep Video Prior
[NeurIPS 2020] Blind Video Temporal Consistency via Deep Video Prior

pytorch-deep-video-prior (DVP) Official PyTorch implementation for NeurIPS 2020 paper: Blind Video Temporal Consistency via Deep Video Prior TensorFlo

Video-Captioning - A machine Learning project to generate captions for video frames indicating the relationship between the objects in the video
Video-Captioning - A machine Learning project to generate captions for video frames indicating the relationship between the objects in the video

Video-Captioning - A machine Learning project to generate captions for video frames indicating the relationship between the objects in the video

CVPR2021: Temporal Context Aggregation Network for Temporal Action Proposal Refinement
CVPR2021: Temporal Context Aggregation Network for Temporal Action Proposal Refinement

Temporal Context Aggregation Network - Pytorch This repo holds the pytorch-version codes of paper: "Temporal Context Aggregation Network for Temporal

Implementation of STAM (Space Time Attention Model), a pure and simple attention model that reaches SOTA for video classification
Implementation of STAM (Space Time Attention Model), a pure and simple attention model that reaches SOTA for video classification

STAM - Pytorch Implementation of STAM (Space Time Attention Model), yet another pure and simple SOTA attention model that bests all previous models in

Official Pytorch Implementation of Relational Self-Attention: What's Missing in Attention for Video Understanding
Official Pytorch Implementation of Relational Self-Attention: What's Missing in Attention for Video Understanding

Relational Self-Attention: What's Missing in Attention for Video Understanding This repository is the official implementation of "Relational Self-Atte

 PyMatting: A Python Library for Alpha Matting
PyMatting: A Python Library for Alpha Matting

Given an input image and a hand-drawn trimap (top row), alpha matting estimates the alpha channel of a foreground object which can then be composed onto a different background (bottom row).

[IJCAI'21] Deep Automatic Natural Image Matting
[IJCAI'21] Deep Automatic Natural Image Matting

Deep Automatic Natural Image Matting [IJCAI-21] This is the official repository of the paper Deep Automatic Natural Image Matting. Introduction | Netw

 MODNet: Trimap-Free Portrait Matting in Real Time
MODNet: Trimap-Free Portrait Matting in Real Time

MODNet is a model for real-time portrait matting with only RGB image input.

Comments
  • TCVOM - Blurry Edges, Am I doing something wrong?

    TCVOM - Blurry Edges, Am I doing something wrong?

    Hey there, trying to run this on my own individual dataset, but I can't seem to reproduce the sharpe edges found in your results in the paper (even for single frame of 1280x720)

    Using EvalModel VMN-FBA

    Alpha Matte Result

    Image 11-23-22 at 9 47 PM

    Input Frame

    Screen Shot 2022-11-25 at 11 32 48 AM

    Trimap

    Screen Shot 2022-11-25 at 11 30 28 AM

    Alpha Matte

    Screen Shot 2022-11-25 at 11 29 09 AM

    opened by bfan1256 5
  • How to unzip dataset

    How to unzip dataset

    Thanks for your excellent work. I download the dataset, but when I decompress files, I get some errors. There are three types of data: BG , FG and flow_png_val. I download all the files in the BG folder and all the files end with '.7z.00*'. I try to decompress one by one , I get errors: The archive file is incomplete. And I try to merge all the files into one file, then I use this command:cat BG_done.7z.00* > BG_done.7z. But I also got the same error message. Could you please give me some suggestion about this problem. Looking forward to any reply. I decompressed on MacOS. Tools: Keka , Unarchiver.

    opened by pinguo-huxiaohe 1
  • 您好,我刚才看了您的补充材料的视频,效果很惊艳

    您好,我刚才看了您的补充材料的视频,效果很惊艳

    我有几个问题想问一下: 1.我如果测试自己的图像或者视频该如何制作自己的测试数据集 2.对于推理时间达到什么速度,能否实现实时效果 3.相比于Robust High-Resolution Video Matting with Temporal Guidance最近开源的这个项目,效果如何:抠像质量上(头发丝、半透明物体);速度上4k实时(我们项目未来能达到吗)

    opened by zhanghongyong123456 1
  • artifacts near the boundary of the object

    artifacts near the boundary of the object

    Thank you for sharing your great work!

    When I ran pred_vmn.py using the provided dataset (VideoMatting108) and pretrained model (FBA_TAM_Lim_Ltc_Laf.pth), the result of alpha matte seems to be wrong.

    00000_pred

    The above result is animal_still/dove/00000_pred.png, and artifacts appear very much near the boundary of the object. The running script is python pred_vmn.py --model fba --data ./data/VideoMatting108 --load ./weights/FBA_TAM_Lim_Ltc_Laf.pth --trimap narrow --save ./results/fba_vmn

    I got some warnings below and it looks like this could be the problem.

    Missing keys: ['decoder.fam.key_conv.bias', 'decoder.fam.key_conv.weight', 'decoder.fam.query_conv.bias', 'decoder.fam.query_conv.weight', 'decoder.fam.value_conv.bias', 'decoder.fam.value_conv.weight']
    Unexpected keys: []
    
    opened by Hongje 1
Owner
null
Code for "NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video", CVPR 2021 oral

NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video Project Page | Paper NeuralRecon: Real-Time Coherent 3D Reconstruction from Mon

ZJU3DV 1.4k Dec 30, 2022
A PyTorch Reimplementation of TecoGAN: Temporally Coherent GAN for Video Super-Resolution

TecoGAN-PyTorch Introduction This is a PyTorch reimplementation of TecoGAN: Temporally Coherent GAN for Video Super-Resolution (VSR). Please refer to

null 165 Dec 17, 2022
Temporally Coherent GAN SIGGRAPH project.

TecoGAN This repository contains source code and materials for the TecoGAN project, i.e. code for a TEmporally COherent GAN for video super-resolution

Duc Linh Nguyen 2 Jan 18, 2022
git git《Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking》(CVPR 2021) GitHub:git2] 《Masksembles for Uncertainty Estimation》(CVPR 2021) GitHub:git3]

Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking Ning Wang, Wengang Zhou, Jie Wang, and Houqiang Li Accepted by CVPR

NingWang 236 Dec 22, 2022
Implementation of temporal pooling methods studied in [ICIP'20] A Comparative Evaluation Of Temporal Pooling Methods For Blind Video Quality Assessment

Implementation of temporal pooling methods studied in [ICIP'20] A Comparative Evaluation Of Temporal Pooling Methods For Blind Video Quality Assessment

Zhengzhong Tu 5 Sep 16, 2022
Cascaded Deep Video Deblurring Using Temporal Sharpness Prior and Non-local Spatial-Temporal Similarity

This repository is the official PyTorch implementation of Cascaded Deep Video Deblurring Using Temporal Sharpness Prior and Non-local Spatial-Temporal Similarity

hippopmonkey 4 Dec 11, 2022
Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!

Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!

Peter Lin 6.5k Jan 4, 2023
Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!

Robust Video Matting (RVM) English | 中文 Official repository for the paper Robust High-Resolution Video Matting with Temporal Guidance. RVM is specific

flow-dev 2 Aug 21, 2022
Video Matting Refinement For Python

Video-matting refinement Library (use pip to install) scikit-image numpy av matplotlib Run Static background python path_to_video.mp4 Moving backgroun

null 3 Jan 11, 2022
The Official Implementation of the ICCV-2021 Paper: Semantically Coherent Out-of-Distribution Detection.

SCOOD-UDG (ICCV 2021) This repository is the official implementation of the paper: Semantically Coherent Out-of-Distribution Detection Jingkang Yang,

Jake YANG 62 Nov 21, 2022