GitHub project for Attention-guided Temporal Coherent Video Object Matting.

Last update: Dec 19, 2022

Attention-guided Temporal Coherent Video Object Matting

This is the GitHub project for our paper Attention-guided Temporal Coherent Video Object Matting (arXiv:2105.11427). We provide our code, the supplementary material, the trained model, and the VideoMatting108 dataset here. For the trimap generation module, please see TCVOM-TGM.

The code, the trained model and the dataset are for academic and non-commercial use only.

The supplementary material can be found here.

VideoMatting108 Dataset
Models
Results
Usage
Contact

VideoMatting108 Dataset

VideoMatting108 is a large video matting dataset that contains 108 video clips with their corresponding ground-truth alpha mattes, all in 1080p resolution. 80 clips are used for training and 28 clips for validation.

You can download the dataset here. The total size of the dataset is 192GB and we've split the archive into 1GB chunks.

The contents of the dataset are the following:

- FG: contains the foreground RGBA images, where the alpha channel is the ground-truth matte and the RGB channels are the ground-truth foreground.
- BG: contains the background RGB images used for composition.
- flow_png_val: contains quantized optical flow of the validation video clips, used for calculating the MESSDdt metric. You can skip this folder if you don't need to calculate this metric. Refer to the _flow_read() function in calc_metric.py for usage.
- *_videos*.txt: train / val split.
- frame_corr.json: FG / BG frame pairs used for composition.
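The FG and BG folders are meant to be combined by alpha compositing. A minimal sketch of that step is shown below, using the standard compositing equation `I = alpha * F + (1 - alpha) * B`. The function name `composite` is hypothetical and is not part of this repository, and the sketch assumes both images are already loaded as uint8 arrays of the same size; how frames are paired via frame_corr.json is not covered here.

```python
import numpy as np


def composite(fg_rgba: np.ndarray, bg_rgb: np.ndarray) -> np.ndarray:
    """Composite an RGBA foreground over an RGB background.

    fg_rgba: uint8 array of shape (H, W, 4); the last channel is alpha.
    bg_rgb:  uint8 array of shape (H, W, 3), same height and width.
    Hypothetical helper for illustration, not the repository's own code.
    """
    fg = fg_rgba[..., :3].astype(np.float32)
    # Keep a trailing axis so alpha broadcasts over the colour channels.
    alpha = fg_rgba[..., 3:4].astype(np.float32) / 255.0
    bg = bg_rgb.astype(np.float32)
    out = alpha * fg + (1.0 - alpha) * bg
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


# Tiny example: one opaque pixel and one fully transparent pixel.
fg_demo = np.array([[[200, 200, 200, 255], [200, 200, 200, 0]]], dtype=np.uint8)
bg_demo = np.full((1, 2, 3), 100, dtype=np.uint8)
print(composite(fg_demo, bg_demo))
```

If the images are read with OpenCV using `cv2.IMREAD_UNCHANGED`, the colour channels come back in BGRA order; the alpha channel is still last, so the function works unchanged as long as the background is also in BGR order.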

After decompressing, the dataset folder should have the following structure (please rename flow_png_val to flow_png):

|---dataset
  |-FG_done
  |-BG_done
  |-flow_png
  |-frame_corr.json
  |-train_videos.txt
  |-train_videos_subset.txt
  |-val_videos.txt
  |-val_videos_subset.txt
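Before running inference or training, it may be worth confirming that the decompressed folder matches this layout. The following is a small sketch of such a check; `missing_entries` is a hypothetical helper name, and the list of expected entries is copied from the tree above.

```python
from pathlib import Path

# Entries taken from the directory tree above.
# flow_png is only needed when calculating the MESSDdt metric.
EXPECTED_ENTRIES = [
    "FG_done",
    "BG_done",
    "flow_png",
    "frame_corr.json",
    "train_videos.txt",
    "train_videos_subset.txt",
    "val_videos.txt",
    "val_videos_subset.txt",
]


def missing_entries(dataset_root):
    """Return the expected files or folders that are absent under dataset_root."""
    root = Path(dataset_root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

Calling `missing_entries("/path/to/VideoMatting108dataset")` returns an empty list when everything is in place. If only `flow_png` is reported, the folder may still carry its original name flow_png_val.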

Models

Currently our method supports four different image matting methods as the base.

- gca (GCA Matting by Li et al., code is from here)
- dim (DeepImageMatting by Xu et al., we use the reimplementation code from here)
- index (IndexNet Matting by Lu et al., code is from here)
- fba (FBA Matting by Forte et al., code is from here)
  - There are some differences between our training and the original FBA paper. We believe there is still room for further performance gains through hyperparameter fine-tuning.
    - We did not use the foreground extension technique during training. We also used four GPUs instead of one.
    - We used the conventional Adam optimizer instead of RAdam.
    - We used mean instead of sum during loss computation to keep the loss balanced (especially for L_af).
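The effect of choosing mean over sum can be seen in a toy example. This is only an illustrative sketch: the L1 distance and the element counts below are assumptions chosen for the demonstration and are not the actual definitions of L_im, L_tc or L_af.

```python
def l1_loss(pred, target, reduction="mean"):
    """Toy L1 loss over two equal-length sequences (illustration only)."""
    diffs = [abs(p - t) for p, t in zip(pred, target)]
    total = sum(diffs)
    return total / len(diffs) if reduction == "mean" else total


# Same per-element error (0.5), but very different element counts,
# e.g. a small tensor versus a much larger one.
for n in (100, 10_000):
    pred, target = [0.5] * n, [0.0] * n
    print(n, l1_loss(pred, target, "sum"), l1_loss(pred, target, "mean"))
```

With sum reduction the loss value grows with the number of elements, so a term computed over a larger tensor can dominate the total. With mean reduction the value depends only on the average error, which keeps terms of different sizes on a comparable scale.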

The trained model can be downloaded here. We provide four different weights for every base method.

- *_SINGLE_Lim.pth: The trained weight of the base image matting method on the VideoMatting108 dataset without TAM. Only L_im is used during training. This is the baseline model.
- *_TAM_Lim_Ltc_Laf.pth: The trained weight of the base image matting method with TAM on the VideoMatting108 dataset. L_im, L_tc and L_af are used during training. This is our full model.
- *_TAM_pretrain.pth: The pretrained weight of the base image matting method with TAM on the DIM dataset. Only L_im is used during training.
- *_fe.pth: The weight converted from the original model checkpoint, only used for pretraining TAM.

Results

These are the quantitative results on the VideoMatting108 validation set with medium-width trimaps. The metrics are averaged over all 28 validation video clips.

We used CUDA 10.2 for inference. Using CUDA 11.1 might result in slightly lower metrics. All metrics are calculated with calc_metric.py.

| Method | Loss | SSDA | dtSSD | MESSDdt | MSE*(10^3) | mSAD |
|---|---|---|---|---|---|---|
| GCA+F (Baseline) | L_im | 55.82 | 31.64 | 2.15 | 8.20 | 40.85 |
| GCA+TAM | L_im+L_tc+L_af | 50.41 | 27.28 | 1.48 | 7.07 | 37.65 |
| DIM+F (Baseline) | L_im | 61.85 | 34.55 | 2.82 | 9.99 | 44.38 |
| DIM+TAM | L_im+L_tc+L_af | 58.94 | 29.89 | 2.06 | 9.02 | 43.28 |
| Index+F (Baseline) | L_im | 58.53 | 33.03 | 2.33 | 9.37 | 43.53 |
| Index+TAM | L_im+L_tc+L_af | 57.91 | 29.36 | 1.81 | 8.78 | 43.17 |
| FBA+F (Baseline) | L_im | 57.47 | 29.60 | 2.19 | 9.28 | 40.57 |
| FBA+TAM | L_im+L_tc+L_af | 51.57 | 25.50 | 1.59 | 7.61 | 37.24 |
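All five metrics are error measures, so lower values are better. To make the gap between each baseline and its TAM variant easier to read, the relative reduction can be computed directly from the table. The sketch below only re-uses the numbers above; the helper name `relative_reduction` is hypothetical.

```python
METRICS = ("SSDA", "dtSSD", "MESSDdt", "MSE*(10^3)", "mSAD")

# (baseline row, TAM row) for each base method, copied from the table above.
RESULTS = {
    "GCA": ((55.82, 31.64, 2.15, 8.20, 40.85), (50.41, 27.28, 1.48, 7.07, 37.65)),
    "DIM": ((61.85, 34.55, 2.82, 9.99, 44.38), (58.94, 29.89, 2.06, 9.02, 43.28)),
    "Index": ((58.53, 33.03, 2.33, 9.37, 43.53), (57.91, 29.36, 1.81, 8.78, 43.17)),
    "FBA": ((57.47, 29.60, 2.19, 9.28, 40.57), (51.57, 25.50, 1.59, 7.61, 37.24)),
}


def relative_reduction(baseline: float, tam: float) -> float:
    """Percentage by which the TAM value is lower than the baseline value."""
    return 100.0 * (baseline - tam) / baseline


for method, (baseline_row, tam_row) in RESULTS.items():
    cells = [
        f"{name}: {relative_reduction(b, t):.1f}%"
        for name, b, t in zip(METRICS, baseline_row, tam_row)
    ]
    print(f"{method:5s} " + ", ".join(cells))
```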

Usage

Requirements

- Python=3.8
- PyTorch=1.6.0
- numpy
- opencv-python
- imgaug
- tqdm
- yacs

Inference

pred_single.py and pred_vmn.py automatically use all available CUDA devices. pred_test.py uses the cuda:0 device by default.

Inference on VideoMatting108 validation set using our full model

python pred_vmn.py --model {gca,dim,index,fba} --data /path/to/VideoMatting108dataset --load /path/to/weight.pth --trimap {wide,narrow,medium} --save /path/to/outdir

Inference on VideoMatting108 validation set using the baseline model

python pred_single.py --dataset vmd --model {gca,dim,index,fba} --data /path/to/VideoMatting108dataset --load /path/to/weight.pth --trimap {wide,narrow,medium} --save /path/to/outdir

Calculating metrics
```
python calc_metric.py --pred /path/to/prediction/result --data /path/to/VideoMatting108dataset
```
- The result will be saved in metric.json inside /path/to/prediction/result. Use tail to see the final averaged result.
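On systems without the `tail` command, the same inspection can be done from Python. The sketch below simply returns the last lines of a text file; it makes no assumption about the internal structure of metric.json, and the helper name `tail` and the default of 20 lines are arbitrary choices.

```python
from collections import deque


def tail(path, n=20):
    """Return the last n lines of a text file, like the Unix tail command."""
    with open(path, encoding="utf-8") as f:
        # A bounded deque keeps only the most recent n lines while reading.
        return list(deque(f, maxlen=n))
```

For example, `print("".join(tail("/path/to/prediction/result/metric.json")))` prints the end of the file, where the averaged result is expected to appear.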

Inference on test video clips

First, prepare the data. Make sure the workspace folder has the following structure:

|---workspace
  |---video1
    |---00000_rgb.png
    |---00000_trimap.png
    |---00001_rgb.png
    |---00001_trimap.png
    |---....
  |---video2
  |---video3
  |---...

```
python pred_test.py --gpu CUDA_DEVICES_NUMBER_SPLIT_BY_COMMA --model {gca,vmn_gca,dim,vmn_dim,index,vmn_index,fba,vmn_fba} --data /path/to/workspace --load /path/to/weight.pth --save /path/to/outdir [video1] [video2] ...
```
- The model parameter: vmn_BASEMETHOD corresponds to our full model, BASEMETHOD corresponds to the baseline model.
- Without specifying the name of the video clip folders in the command line, the script will process all video clips under /path/to/workspace.
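Since pred_test.py reads frames and trimaps by file name, a quick consistency check of the workspace can catch missing files before a long run. The following is a sketch under the assumption, taken from the tree above, that every frame NNNNN_rgb.png should have a matching NNNNN_trimap.png; `check_workspace` is a hypothetical helper and not part of the repository.

```python
import re
from pathlib import Path

# Matches names such as 00000_rgb.png and 00000_trimap.png.
FRAME_RE = re.compile(r"^(\d+)_(rgb|trimap)\.png$")


def check_workspace(workspace):
    """Return {video_name: [frame ids lacking either the rgb or the trimap file]}.

    A video mapped to an empty list contains no recognised frames at all.
    Videos whose frames are all correctly paired are not reported.
    """
    problems = {}
    video_dirs = sorted(p for p in Path(workspace).iterdir() if p.is_dir())
    for video_dir in video_dirs:
        seen = {"rgb": set(), "trimap": set()}
        for f in video_dir.iterdir():
            m = FRAME_RE.match(f.name)
            if m:
                seen[m.group(2)].add(m.group(1))
        # Symmetric difference: ids present in one set but not the other.
        unpaired = sorted(seen["rgb"] ^ seen["trimap"])
        if unpaired or not seen["rgb"]:
            problems[video_dir.name] = unpaired
    return problems
```

Calling `check_workspace("/path/to/workspace")` returns an empty dict when every video folder is complete.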

Training

PY_CMD="python -m torch.distributed.launch --nproc_per_node=NUMBER_OF_CUDA_DEVICES"

Pretrain TAM on DIM dataset. Please see cfgs/pretrain_vmn_BASEMETHOD.yaml for configuration and refer to dataset/DIM.py for dataset preparation.
```
$PY_CMD pretrain_ddp.py --cfg cfgs/pretrain_vmn_index.yaml
```
Training our full method on VideoMatting108 dataset. This will load the pretrained TAM weight as initialization. Please see cfgs/vmd_vmn_BASEMETHOD_pretrained_30ep.yaml for configuration.
```
$PY_CMD train_ddp.py --cfg /path/to/config.yaml
```
Training the baseline method on VideoMatting108 dataset without TAM. Please see cfgs/vmd_vmn_BASEMETHOD_pretrained_30ep_single.yaml for configuration.
```
$PY_CMD train_single_ddp.py --cfg /path/to/config.yaml
```
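The three training stages differ only in the script and the config file name, so the command line can be assembled programmatically. The sketch below fills the BASEMETHOD placeholder in the config paths quoted above; the helper `training_command` and the stage names "pretrain", "full" and "baseline" are hypothetical labels introduced for illustration.

```python
# Config path patterns and scripts as named in the Training section above.
CONFIG_TEMPLATES = {
    "pretrain": "cfgs/pretrain_vmn_{base}.yaml",
    "full": "cfgs/vmd_vmn_{base}_pretrained_30ep.yaml",
    "baseline": "cfgs/vmd_vmn_{base}_pretrained_30ep_single.yaml",
}
SCRIPTS = {
    "pretrain": "pretrain_ddp.py",
    "full": "train_ddp.py",
    "baseline": "train_single_ddp.py",
}


def training_command(stage, base, num_gpus):
    """Build the argument list for one training stage.

    stage: "pretrain", "full" or "baseline" (hypothetical labels).
    base: one of gca, dim, index, fba.
    num_gpus: number of CUDA devices to launch on.
    """
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}",
        SCRIPTS[stage],
        "--cfg", CONFIG_TEMPLATES[stage].format(base=base),
    ]


print(" ".join(training_command("pretrain", "index", 4)))
```

The returned list can be passed to `subprocess.run` as is, or joined into a string for a shell script.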

Contact

If you have any questions, please feel free to contact [email protected].

Comments

TCVOM - Blurry Edges, Am I doing something wrong?

Hey there, I'm trying to run this on my own dataset, but I can't seem to reproduce the sharp edges shown in the results in your paper (even for a single 1280x720 frame).

Using EvalModel VMN-FBA.

[Images: input frame, trimap, and resulting alpha matte]

opened by bfan1256 5

How to unzip dataset

Thanks for your excellent work. I downloaded the dataset, but I get errors when I decompress the files. There are three types of data: BG, FG and flow_png_val. I downloaded all the files in the BG folder; they all end with '.7z.00*'. When I try to decompress them one by one, I get the error: The archive file is incomplete. I also tried merging all the files into one with this command: cat BG_done.7z.00* > BG_done.7z. But I got the same error message. Could you please give me some suggestions about this problem? Looking forward to any reply. I decompressed on macOS. Tools: Keka, Unarchiver.

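opened by pinguo-huxiaohe 1

Hello, I just watched the video in your supplementary material, and the results are stunning

I have a few questions: 1. If I want to test on my own images or videos, how should I prepare my own test dataset? 2. What inference speed does the method reach, and can it run in real time? 3. Compared with the recently open-sourced project Robust High-Resolution Video Matting with Temporal Guidance, how does it perform in terms of matting quality (hair strands, semi-transparent objects) and speed (4K in real time; will this project be able to reach that in the future)?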

opened by zhanghongyong123456 1

artifacts near the boundary of the object

Thank you for sharing your great work!

When I ran pred_vmn.py using the provided dataset (VideoMatting108) and pretrained model (FBA_TAM_Lim_Ltc_Laf.pth), the resulting alpha matte seems to be wrong.

[Image: animal_still/dove/00000_pred.png]

The result above is animal_still/dove/00000_pred.png; many artifacts appear near the boundary of the object. The script I ran is python pred_vmn.py --model fba --data ./data/VideoMatting108 --load ./weights/FBA_TAM_Lim_Ltc_Laf.pth --trimap narrow --save ./results/fba_vmn

I got the warnings below, and it looks like they could be the problem.

Missing keys: ['decoder.fam.key_conv.bias', 'decoder.fam.key_conv.weight', 'decoder.fam.query_conv.bias', 'decoder.fam.query_conv.weight', 'decoder.fam.value_conv.bias', 'decoder.fam.value_conv.weight'] Unexpected keys: []

opened by Hongje 1
