Official PyTorch implementation of the ICCV 2021 paper FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting.

Last update: Dec 27, 2022

Overview

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting

By Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li.

This repo is the official PyTorch implementation of FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting.

Introduction

Usage

Prerequisites

Python >= 3.6
PyTorch >= 1.0 and the corresponding torchvision (https://pytorch.org/)

Install

Clone this repo:

git clone https://github.com/ruiliu-ai/FuseFormer.git

Install other packages:

cd FuseFormer
pip install -r requirements.txt

Training

Dataset preparation

Download datasets (YouTube-VOS and DAVIS) into the data folder.

mkdir data

Training script

python train.py -c configs/youtube-vos.json

Test

Download the pre-trained model into the checkpoints folder.

mkdir checkpoints

Test script

python test.py -c checkpoints/fuseformer.pth -v data/DAVIS/JPEGImages/blackswan -m data/DAVIS/Annotations/blackswan
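
To run the test script over several DAVIS sequences, the command above can be generated once per sequence. The following is a minimal sketch, assuming every sequence has matching subfolders under data/DAVIS/JPEGImages and data/DAVIS/Annotations as in the blackswan example. The helper names build_test_command and run_sequences are hypothetical and not part of the repository; only the -c, -v and -m flags shown above are used.

```python
import os
import subprocess
import sys


def build_test_command(ckpt, davis_root, sequence):
    """Build the test.py argument list for one DAVIS sequence.

    Hypothetical helper: it only fills in the -c, -v and -m flags
    from the README command.
    """
    return [
        sys.executable, "test.py",
        "-c", ckpt,
        "-v", os.path.join(davis_root, "JPEGImages", sequence),
        "-m", os.path.join(davis_root, "Annotations", sequence),
    ]


def run_sequences(ckpt, davis_root, sequences):
    """Run test.py once per sequence (assumes the repo root is the working directory)."""
    for name in sequences:
        # check=True stops the loop at the first sequence that fails.
        subprocess.run(build_test_command(ckpt, davis_root, name), check=True)
```

For example, run_sequences("checkpoints/fuseformer.pth", "data/DAVIS", ["blackswan"]) would issue the same command as the test script line above.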

Citing FuseFormer

If you find FuseFormer useful in your research, please consider citing:

@InProceedings{Liu_2021_FuseFormer,
  title={FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting},
  author={Liu, Rui and Deng, Hanming and Huang, Yangyi and Shi, Xiaoyu and Lu, Lewei and Sun, Wenxiu and Wang, Xiaogang and Dai, Jifeng and Li, Hongsheng},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year={2021}
}

Acknowledgement

This code borrows heavily from the video inpainting framework Spatial-Temporal Transformer Network (STTN).

Comments

Error when running evaluate.py

When running evaluate.py, the error "ModuleNotFoundError: No module named 'model.i3d'" occurs. Could the author please provide information about model.i3d? I look forward to your reply. Thank you very much.

opened by Drangonliao123 7
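About the number of samples in your prepared DAVIS stationary masks

Hi, thanks for sharing. Regarding the prepared stationary masks you provide, I noticed that STTN uses samples from all 90 scenes of the DAVIS training set for evaluation, whereas you selected only 50 of those scenes to make masks. What is the reason for this? Is it for efficiency, on the assumption that the additional scenes would not affect the evaluation results?

In addition, I noticed that in the comparison experiments you directly used the evaluation results of the other methods as reported in the STTN work. Did you evaluate them on the same validation set and obtain the same results?

Thank you for your answer.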

opened by unclebuff 4
About the stationary masks for the YouTube-VOS and DAVIS test datasets

How can I find the stationary test masks for the two datasets (for PSNR)? It would be even better if you could release the masks and the results your model generates!

opened by Feynman1999 4
The trained model cannot be tested

I trained the model on the YouTube-VOS dataset following the settings in the paper, but the trained model always reports errors when tested. Looking forward to your reply, thank you!

opened by Drangonliao123 2
Evaluation: Davis specification

Hey,

I am having trouble running the evaluate.py script.

Could you please specify which DAVIS dataset was used and should be downloaded?

Thanks in advance

opened by ewwnage 2
Why do I run out of memory when the batch size is over 5?

I used your dataset and training code on a single Tesla V100 with 32 GB of memory, but with a batch size over 5 I run out of memory, and I don't know why, since you state that your batch size is 8 and you also use a V100.

Thank you very much.

Best wishes.

opened by HJC2020 1
Evaluation Script

Hey,

Another issue with the evaluation script is that in evaluate.py line 260 the model's state_dict is loaded with model.load_state_dict(data['netG']), but there is no netG key in the dictionary.

I've downloaded the pretrained fuseformer.pth as well as i3d_rgb_imagenet.pt.

what could be the reason for the missing dict key?

Thanks in advance :)

opened by ewwnage 1
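
One possible workaround for the missing netG key described above is to accept both checkpoint layouts. This is a minimal sketch under an assumption the issue does not confirm: that the downloaded file holds either a wrapper dictionary with a netG entry or the bare state_dict itself. The helper name extract_generator_state is hypothetical.

```python
def extract_generator_state(checkpoint, key="netG"):
    """Return the generator weights from an already loaded checkpoint object.

    Assumption: the checkpoint is either a wrapper dict such as
    {"netG": state_dict} or the bare state_dict itself.
    """
    if isinstance(checkpoint, dict) and key in checkpoint:
        return checkpoint[key]
    # No wrapper key found: treat the object as the state_dict.
    return checkpoint
```

With PyTorch, the call in evaluate.py could then read model.load_state_dict(extract_generator_state(torch.load(path, map_location="cpu"))), where path is the checkpoint file.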
Is there a pretrained weight for the discriminator?

In trainer.py, netD needs to be loaded from pretrained weights, but I cannot find the relevant file in your repository. Could you provide a way to access the weight file? Thank you.

opened by HJC2020 1

about the detail of ref_ids

Hi, thank you for your work!

in the test.py:

ref_ids = get_ref_index(f, neighbor_ids, video_length)

# select the neighbor frames plus the reference frames along the time axis
selected_imgs = imgs[:1, neighbor_ids+ref_ids, :, :, :]

for i in range(len(neighbor_ids)):
    idx = neighbor_ids[i]
    # keep the prediction inside the mask and the original frame outside it
    img = np.array(pred_img[i]).astype(
        np.uint8)*binary_masks[idx] + frames[idx] * (1-binary_masks[idx])
    if comp_frames[idx] is None:
        comp_frames[idx] = img
    else:
        # average with the earlier result for the same frame
        comp_frames[idx] = comp_frames[idx].astype(
            np.float32)*0.5 + img.astype(np.float32)*0.5

What is the purpose of the reference frames?

opened by WEIZHIHONG720 0
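
On the question above: in the quoted excerpt, the frames in ref_ids are passed to the model together with the neighbor frames, but only the frames in neighbor_ids are written to comp_frames. One plausible reading is that the reference frames supply additional context from other parts of the video. The sketch below shows one way such indices could be sampled, assuming a fixed stride over the whole video. The function name sample_ref_index and the ref_stride value are hypothetical; this is not the repository's get_ref_index.

```python
def sample_ref_index(neighbor_ids, video_length, ref_stride=10):
    """Pick every ref_stride-th frame that is not already a neighbor frame."""
    return [i for i in range(0, video_length, ref_stride) if i not in neighbor_ids]


# Frames 8..12 form the local window; distant frames are added as references.
neighbor_ids = list(range(8, 13))
ref_ids = sample_ref_index(neighbor_ids, video_length=40)
selected = neighbor_ids + ref_ids  # indices passed to the model together
```

Here frame 10 falls on the stride but is skipped because it is already a neighbor, so ref_ids is [0, 20, 30].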

Question about learning rate.

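Hello, thank you for your work. I have a question about the learning rate. I noticed that your paper states the initial learning rate is 0.01, reduced by a factor of 10 at 200k, 400k, and 450k. What is the reasoning behind this design? I also noticed that the learning rate setting in your code is the same as STTN's: the initial learning rate is 0.0001, reduced by a factor of 10 at 400k. Have you tested the difference between the two? I hope to get your answer!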

opened by unclebuff 0
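
The two schedules mentioned in the question above can be compared with a small step-decay helper. This is only a sketch: the function name step_lr is hypothetical, the 200k/400k/450k figures are assumed to count training steps, and the decay is assumed to apply from the milestone step onward. The rates and milestones are the ones quoted in the question.

```python
def step_lr(step, base_lr, milestones, gamma=0.1):
    """Learning rate after multiplying by gamma at each milestone already reached."""
    passed = sum(1 for m in milestones if step >= m)
    return base_lr * gamma ** passed


# Values as quoted in the question (paper vs. code).
paper = dict(base_lr=0.01, milestones=[200_000, 400_000, 450_000])
code = dict(base_lr=0.0001, milestones=[400_000])

for step in (0, 250_000, 420_000, 480_000):
    print(step, step_lr(step, **paper), step_lr(step, **code))
```

Under these assumptions the two schedules coincide at roughly 1e-5 after 450k steps, and before that the paper's schedule is 100 times higher up to 200k and 10 times higher afterwards.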
question about the random masks

Thanks for the excellent work! In the paper, DAVIS is split into 90 video clips for training and 60 clips for testing. However, there are only 50 video clips for DAVIS evaluation in the prepared stationary masks. Is there something missing in this folder? I would be very grateful if you could answer my question. Best wishes.

opened by wener-yung 0
Input dimension limitation

The code seems to work only at the specific input resolution the model was trained on. If I want to test it at a resolution other than 432x240, I have to train the model at that resolution, right?

opened by tawsinDOTuddin 5

Official PyTorch code for SSC-GAN: Semi-Supervised Single-Stage Controllable GANs for Conditional Fine-Grained Image Generation (ICCV 2021)

SSC-GAN_repo Pytorch implementation for 'Semi-Supervised Single-Stage Controllable GANs for Conditional Fine-Grained Image Generation'.PDF SSC-GAN:Sem

4 Aug 28, 2022

This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).

TransFG: A Transformer Architecture for Fine-grained Recognition Official PyTorch code for the paper: TransFG: A Transformer Architecture for Fine-gra

307 Jan 3, 2023

[ICCV 2021] Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification

Counterfactual Attention Learning Created by Yongming Rao*, Guangyi Chen*, Jiwen Lu, Jie Zhou This repository contains PyTorch implementation for ICCV

90 Dec 31, 2022

The implementation of CVPR2021 paper Temporal Query Networks for Fine-grained Video Understanding, by Chuhan Zhang, Ankush Gupta and Andrew Zisserman.

Temporal Query Networks for Fine-grained Video Understanding This repository contains the implementation of CVPR2021 paper Temporal_Query_Networks

55 Dec 21, 2022

My implementation of Image Inpainting - A deep learning Inpainting model

Image Inpainting What is Image Inpainting Image inpainting is a restorative process that allows for the fixing or removal of unwanted parts within ima

1 Dec 12, 2021

Official PyTorch Implementation of Unsupervised Learning of Scene Flow Estimation Fusing with Local Rigidity

UnRigidFlow This is the official PyTorch implementation of UnRigidFlow (IJCAI2019). Here are two sample results (~10MB gif for each) of our unsupervis

28 Nov 16, 2022

AOT-GAN for High-Resolution Image Inpainting (codebase for image inpainting)

AOT-GAN for High-Resolution Image Inpainting Arxiv Paper | AOT-GAN: Aggregated Contextual Transformations for High-Resolution Image Inpainting Yanhong

214 Jan 3, 2023

The implementation of Video Depth Estimation by Fusing Flow-to-Depth Proposals

Flow-to-depth (FDNet) video-depth-estimation This is the implementation of paper Video Depth Estimation by Fusing Flow-to-Depth Proposals Jiaxin Xie,

32 Jun 14, 2022

PyTorch implementation for Stochastic Fine-grained Labeling of Multi-state Sign Glosses for Continuous Sign Language Recognition.

Stochastic CSLR This is the PyTorch implementation for the ECCV 2020 paper: Stochastic Fine-grained Labeling of Multi-state Sign Glosses for Continuou

28 Dec 19, 2022

PyTorch implementation of Weak-shot Fine-grained Classification via Similarity Transfer

SimTrans-Weak-Shot-Classification This repository contains the official PyTorch implementation of the following paper: Weak-shot Fine-grained Classifi

60 Dec 2, 2022

This is the official pytorch implementation for our ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering" on VQA Task

ERASOR (RA-L'21 with ICRA Option) Official page of "ERASOR: Egocentric Ratio of Pseudo Occupancy-based Dynamic Object Removal for Static 3D Point C

225 Dec 29, 2022