Pytorch implementation of Decoupled Spatial-Temporal Transformer for Video Inpainting

Related tags

Deep Learning DSTT
Overview

Decoupled Spatial-Temporal Transformer for Video Inpainting

By Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li.

This repo is the official Pytorch implementation of Decoupled Spatial-Temporal Transformer for Video Inpainting.

Introduction

Usage

Prerequisites

Install

  • Clone this repo:
git clone https://github.com/ruiliu-ai/DSTT.git
  • Install other packages:
cd DSTT
pip install -r requirements.txt

Training

Dataset preparation

Download datasets (YouTube-VOS and DAVIS) into the data folder.

mkdir data

Training script

python train.py -c configs/youtube-vos.json

Test

Download pre-trained model into checkpoints folder.

mkdir checkpoints

Test script

python test.py -c checkpoints/dstt.pth -v data/DAVIS/JPEGImages/blackswan -m data/DAVIS/Annotations/blackswan

Citing DSTT

If you find DSTT useful in your research, please consider citing:

@article{Liu_2021_DSTT,
  title={Decoupled Spatial-Temporal Transformer for Video Inpainting},
  author={Liu, Rui and Deng, Hanming and Huang, Yangyi and Shi, Xiaoyu and Lu, Lewei and Sun, Wenxiu and Wang, Xiaogang and Li Hongsheng},
  journal={arXiv preprint arXiv:2104.06637},
  year={2021}
}

Acknowledement

This code relies heavily on the video inpainting framework from spatial-temporal transformer net.

Comments
  • Some question about the paper!

    Some question about the paper!

    In your paper Section 3.2, split the F in the s^2 zones, Then total number is t * s^2 * n, why this number need to * n , should the number is t * s^2? Looking forward your reply

    opened by jiaaihhy 3
  • Generate Mask for custom video

    Generate Mask for custom video

    Hi, thanks for the code!

    Any code, technique or framework recommendation to perform the necessary segmentation/mask (like --mask examples/schoolgirls) as input for your algorithm? Thanks in advance!

    opened by italosalgado14 2
  • how does this network realize self-training?

    how does this network realize self-training?

    the paper uses most of the parts to show the generator network. but because of lacking ground-truth, we need self-training to realize objective moving? so I wonder how to realize self-training? GAN is supervised learning as I know. tks for answer!

    opened by JialinKang 1
  • question about the dataset

    question about the dataset

    sorry for bother u, but i can't download the dataset from google driver. And I tried download dataset from Onedrive then meet the issue "CSC error". So i wonder that if you have any other way for us to download the YouTube-VOS dataset. thank u.

    opened by OnlyFlashEobard 0
  • About GPU issues

    About GPU issues

    Hi Thanks for sharing your work, I have the following problem:

    We have 6 2080ti graphics cards, but the following error will be reported when running: RuntimeError: CUDA error: out of memory

    image image

    opened by caimaomao315 2
  • Questions about pos_embedding

    Questions about pos_embedding

    Hello,

    This is such a great work and thanks for sharing the codes! I have a question about why there is no pos_embedding in the codes while transformer is not aware of the temporal orders of inputs. Please give me any hints, thanks!

    Best, Kejie

    opened by kejie7243 0
  • Can you please point out the download links of the dataset you use?

    Can you please point out the download links of the dataset you use?

    Hi author, Seems no YouTube-VOS download link from https://competitions.codalab.org/competitions/19544 and too many DAVIS download links from https://davischallenge.org/davis2017/code.html Can you please give us a concrete hint about it?

    opened by napohou 1
  • Assertion `index >= -sizes[i] && index < sizes[i] &&

    Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed

    when i run test.py,it return an error , just like this. python3 test.py -c *** -v *** -m *** then it return error: /pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: lambda ->auto::operator()(int)->auto: block: [222,0,0], thread: [95,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. Traceback (most recent call last): File "test.py", line 162, in main_worker() File "test.py", line 135, in main_worker pred_img = model(masked_imgs) File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in call result = self.forward(input, **kwargs) File "/ProjectRoot/test/Video_inpainting/DSTT-master/model/DSTT.py", line 144, in forward enc_feat = self.encoder(masked_frames.view(bt, c, h, w)) File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in call result = self.forward(*input, **kwargs) File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward input = module(input) File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in call result = self.forward(*input, **kwargs) File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 345, in forward return self.conv2d_forward(input, self.weight) File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward self.padding, self.dilation, self.groups) RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

    opened by shanyun123456 0
Owner
null
My implementation of Image Inpainting - A deep learning Inpainting model

Image Inpainting What is Image Inpainting Image inpainting is a restorative process that allows for the fixing or removal of unwanted parts within ima

Joshua V Evans 1 Dec 12, 2021
Implementation of the master's thesis "Temporal copying and local hallucination for video inpainting".

Temporal copying and local hallucination for video inpainting This repository contains the implementation of my master's thesis "Temporal copying and

David Álvarez de la Torre 1 Dec 2, 2022
AOT-GAN for High-Resolution Image Inpainting (codebase for image inpainting)

AOT-GAN for High-Resolution Image Inpainting Arxiv Paper | AOT-GAN: Aggregated Contextual Transformations for High-Resolution Image Inpainting Yanhong

Multimedia Research 214 Jan 3, 2023
Implementation of the paper NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series Forecasting.

Non-AR Spatial-Temporal Transformer Introduction Implementation of the paper NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series For

Chen Kai 66 Nov 28, 2022
Unofficial implementation of "TTNet: Real-time temporal and spatial video analysis of table tennis" (CVPR 2020)

TTNet-Pytorch The implementation for the paper "TTNet: Real-time temporal and spatial video analysis of table tennis" An introduction of the project c

Nguyen Mau Dung 438 Dec 29, 2022
Group Activity Recognition with Clustered Spatial Temporal Transformer

GroupFormer Group Activity Recognition with Clustered Spatial-TemporalTransformer Backbone Style Action Acc Activity Acc Config Download Inv3+flow+pos

null 28 Dec 12, 2022
Implementation of temporal pooling methods studied in [ICIP'20] A Comparative Evaluation Of Temporal Pooling Methods For Blind Video Quality Assessment

Implementation of temporal pooling methods studied in [ICIP'20] A Comparative Evaluation Of Temporal Pooling Methods For Blind Video Quality Assessment

Zhengzhong Tu 5 Sep 16, 2022
This is the official Pytorch implementation of the paper "Diverse Motion Stylization for Multiple Style Domains via Spatial-Temporal Graph-Based Generative Model"

Diverse Motion Stylization (Official) This is the official Pytorch implementation of this paper. Diverse Motion Stylization for Multiple Style Domains

Soomin Park 28 Dec 16, 2022
official Pytorch implementation of ICCV 2021 paper FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting.

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting By Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu

null 77 Dec 27, 2022
The implementation of "Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer"

Shuffle Transformer The implementation of "Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer" Introduction Very recently, window-

null 87 Nov 29, 2022
Implement Decoupled Neural Interfaces using Synthetic Gradients in Pytorch

disclaimer: this code is modified from pytorch-tutorial Image classification with synthetic gradient in Pytorch I implement the Decoupled Neural Inter

Andrew 114 Dec 22, 2022
The official implementation of the CVPR2021 paper: Decoupled Dynamic Filter Networks

Decoupled Dynamic Filter Networks This repo is the official implementation of CVPR2021 paper: "Decoupled Dynamic Filter Networks". Introduction DDF is

F.S.Fire 180 Dec 30, 2022
An implementation for the loss function proposed in Decoupled Contrastive Loss paper.

Decoupled-Contrastive-Learning This repository is an implementation for the loss function proposed in Decoupled Contrastive Loss paper. Requirements P

Ramin Nakhli 71 Dec 4, 2022
Spatial Temporal Graph Convolutional Networks (ST-GCN) for Skeleton-Based Action Recognition in PyTorch

Reminder ST-GCN has transferred to MMSkeleton, and keep on developing as an flexible open source toolbox for skeleton-based human understanding. You a

sijie yan 1.1k Dec 25, 2022
The project is an official implementation of our paper "3D Human Pose Estimation with Spatial and Temporal Transformers".

3D Human Pose Estimation with Spatial and Temporal Transformers This repo is the official implementation for 3D Human Pose Estimation with Spatial and

Ce Zheng 363 Dec 28, 2022
Implementation for our ICCV2021 paper: Internal Video Inpainting by Implicit Long-range Propagation

Implicit Internal Video Inpainting Implementation for our ICCV2021 paper: Internal Video Inpainting by Implicit Long-range Propagation paper | project

null 202 Dec 30, 2022
VSR-Transformer - This paper proposes a new Transformer for video super-resolution (called VSR-Transformer).

VSR-Transformer By Jiezhang Cao, Yawei Li, Kai Zhang, Luc Van Gool This paper proposes a new Transformer for video super-resolution (called VSR-Transf

Jiezhang Cao 225 Nov 13, 2022
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding (CVPR2022)

Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding by Qiaole Dong*, Chenjie Cao*, Yanwei Fu Paper and Supple

Qiaole Dong 190 Dec 27, 2022