Official PyTorch implementation of the ICCV 2021 paper FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting.

Overview

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting

By Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li.

This repo is the official PyTorch implementation of FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting.

Introduction

Usage

Prerequisites

Install

  • Clone this repo:
git clone https://github.com/ruiliu-ai/FuseFormer.git
  • Install other packages:
cd FuseFormer
pip install -r requirements.txt

Training

Dataset preparation

Download the datasets (YouTube-VOS and DAVIS) into the data folder.

mkdir data

Training script

python train.py -c configs/youtube-vos.json

Test

Download the pre-trained model into the checkpoints folder.

mkdir checkpoints

Test script

python test.py -c checkpoints/fuseformer.pth -v data/DAVIS/JPEGImages/blackswan -m data/DAVIS/Annotations/blackswan
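Before running the test command, it can help to confirm that the checkpoint and the two input folders are where the command expects them. The following is a minimal sketch, not part of the repository: the helper name `missing_test_inputs` is hypothetical, and the paths are simply the ones used in the command above.

```python
from pathlib import Path


def missing_test_inputs(checkpoint, video_dir, mask_dir):
    """Return the paths that do not exist yet, in the order given."""
    expected = [Path(checkpoint), Path(video_dir), Path(mask_dir)]
    return [str(p) for p in expected if not p.exists()]


if __name__ == "__main__":
    # Paths copied from the test command above.
    missing = missing_test_inputs(
        "checkpoints/fuseformer.pth",
        "data/DAVIS/JPEGImages/blackswan",
        "data/DAVIS/Annotations/blackswan",
    )
    for path in missing:
        print(f"missing: {path}")
    if not missing:
        print("all test inputs found")
```

Run it from the repository root so that the relative paths resolve the same way they do for test.py.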

Citing FuseFormer

If you find FuseFormer useful in your research, please consider citing:

@InProceedings{Liu_2021_FuseFormer,
  title={FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting},
  author={Liu, Rui and Deng, Hanming and Huang, Yangyi and Shi, Xiaoyu and Lu, Lewei and Sun, Wenxiu and Wang, Xiaogang and Dai, Jifeng and Li, Hongsheng},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year={2021}
}

Acknowledgement

This code borrows heavily from the video inpainting framework STTN (spatial-temporal transformer network).

Comments
  • Error when running evaluate.py

    When running evaluate.py, the error "ModuleNotFoundError: No module named 'model.i3d'" occurred. Could the author please provide information about model.i3d? I look forward to your reply. Thank you very much.

    opened by Drangonliao123 7
  • About your prepared Davis stationary masks sample numbers

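    Hi, thanks for sharing. Regarding the prepared stationary masks you provide, I noticed that STTN uses the samples from all 90 scenes of the DAVIS training set for evaluation, whereas you selected only 50 of those scenes to create masks. What was the reason for this? Was it for efficiency, on the assumption that the additional scenes would not affect the evaluation results?

    In addition, I noticed that in the comparison experiments you directly used the evaluation results reported in the STTN work for the other methods. Did you evaluate them on the same validation set and obtain the same results?

    Thank you for your answer.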

    opened by unclebuff 4
  • about the stationary mask for youtube-vos and davis test dataset

    How can I find the stationary test masks for the two datasets (for PSNR)? It would be even better if you could release the masks and the results that your model generates.

    opened by Feynman1999 4
  • The trained model cannot be tested

    I trained the model on the YouTube-VOS dataset following the settings in the paper, but the trained model always reports errors when tested. Looking forward to your reply, thank you!

    opened by Drangonliao123 2
  • Evaluation: Davis specification

    Hey,

    I am having trouble running the evaluate.py script.

    Could you please specify which DAVIS dataset was used and should be downloaded?

    Thanks in advance

    opened by ewwnage 2
  • Why do I run out of memory when I set the batch size above 5?

    I used your dataset and your training code on a single Tesla V100 with 32 GB of memory. With a batch size above 5, I run out of memory, and I don't know why, since you state that you use a batch size of 8 on a V100 as well.

    Thank you very much.

    Best wishes.

    opened by HJC2020 1
  • Evaluation Script

    Hey,

    Another issue with the evaluation script is that in evaluate.py, line 260, the model's state_dict is loaded with model.load_state_dict(data['netG']), but there is no netG key in the dictionary.

    I've downloaded the pretrained fuseformer.pth as well as i3d_rgb_imagenet.pt.

    What could be the reason for the missing dict key?

    Thanks in advance :)

    opened by ewwnage 1
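A missing key such as the one described in the evaluation-script issue above can often be narrowed down by printing the top-level keys of the loaded checkpoint before calling load_state_dict. The sketch below is one defensive pattern, not a confirmed diagnosis: it assumes the checkpoint is either a dictionary wrapped under a key like netG or already a bare state_dict, and the helper name `pick_state_dict` is hypothetical. Plain dictionaries stand in for the object that torch.load would return.

```python
def pick_state_dict(checkpoint, preferred_keys=("netG",)):
    """Return the sub-dictionary holding model weights.

    If none of the preferred keys is present, assume the checkpoint
    is already a bare state_dict and return it unchanged.
    """
    for key in preferred_keys:
        if key in checkpoint:
            return checkpoint[key]
    return checkpoint


# Stand-in dictionaries; a real checkpoint would come from
# torch.load(path, map_location="cpu").
wrapped = {"netG": {"layer.weight": [0.0]}}
bare = {"layer.weight": [0.0]}

print(sorted(wrapped.keys()))  # inspect the top-level keys first
print(pick_state_dict(wrapped) == pick_state_dict(bare))
```

With this pattern the call would become model.load_state_dict(pick_state_dict(data)), which works for both layouts under the stated assumption.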
  • Is there a pretrained weight for the discriminator?

    In trainer.py, I see that netD needs to be loaded from pretrained weights, but I cannot find the relevant file in your repository. Could you provide a way to access the weight file? Thank you.

    opened by HJC2020 1
  • About the details of ref_ids

    Hi, thank you for your work!

    In test.py:

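    ref_ids = get_ref_index(f, neighbor_ids, video_length)

    # Select the neighbor frames followed by the reference frames.
    selected_imgs = imgs[:1, neighbor_ids + ref_ids, :, :, :]

    for i in range(len(neighbor_ids)):
        idx = neighbor_ids[i]
        # Keep the prediction inside the mask and the original frame outside it.
        img = np.array(pred_img[i]).astype(
            np.uint8) * binary_masks[idx] + frames[idx] * (1 - binary_masks[idx])
        if comp_frames[idx] is None:
            comp_frames[idx] = img
        else:
            # Average with the result from an earlier overlapping window.
            comp_frames[idx] = comp_frames[idx].astype(
                np.float32) * 0.5 + img.astype(np.float32) * 0.5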

    What is the purpose of the reference frames?

    opened by WEIZHIHONG720 0
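The loop quoted in the ref_ids issue above does two things for each neighbor frame: it composites the prediction into the masked region, and it averages the result with any earlier result for the same frame. The scalar sketch below illustrates only that arithmetic for a single pixel; the function names and pixel values are made up for illustration, and it does not model the reference frames or answer the question about their purpose.

```python
def composite(pred, frame, mask):
    """Keep the prediction where mask == 1 and the original pixel where mask == 0."""
    return pred * mask + frame * (1 - mask)


def blend(previous, new):
    """Average a new composite into an existing one with equal 0.5 weights."""
    if previous is None:
        return float(new)
    return previous * 0.5 + new * 0.5


# One pixel of one frame, predicted in two overlapping windows (made-up values).
frame_pixel, mask_pixel = 100, 1
result = None
for pred_pixel in (120, 140):
    result = blend(result, composite(pred_pixel, frame_pixel, mask_pixel))
print(result)  # 130.0
```

Because each new result gets weight 0.5, a frame visited more than twice weights later windows more heavily than earlier ones; this is a plain consequence of the running average, not a statement about the authors' intent.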
  • Question about learning rate.

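    Hello, and thank you for your work. I have a question about the learning rate. I noticed that the paper states that the initial learning rate is 0.01 and is reduced by a factor of 10 at 200k, 400k, and 450k. What was the reasoning behind this design? I also noticed that the learning rate setting in your code matches STTN: the initial learning rate is 0.0001 and is reduced by a factor of 10 at 400k. Have you tested the difference between the two? I hope you can answer!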

    opened by unclebuff 0
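The two schedules described in the learning-rate question above can be compared numerically. This is a minimal sketch using only the numbers quoted in that question; the function name `step_lr` is hypothetical, and treating the reduction as taking effect exactly at each milestone is an assumption.

```python
def step_lr(iteration, base_lr, milestones, gamma=0.1):
    """Learning rate after multiplying by gamma at each milestone already reached."""
    reached = sum(1 for m in milestones if iteration >= m)
    return base_lr * gamma ** reached


# Values as quoted in the question, not verified against the paper or the code.
paper = dict(base_lr=0.01, milestones=(200_000, 400_000, 450_000))
code = dict(base_lr=0.0001, milestones=(400_000,))

for it in (0, 200_000, 400_000, 450_000):
    print(it, step_lr(it, **paper), step_lr(it, **code))
```

Under these assumptions the two schedules start a factor of 100 apart and both end at roughly 1e-5 after their last reduction.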
  • question about the random masks

    Thanks for the excellent work! In the paper, DAVIS is split into 90 video clips for training and 60 clips for testing. However, there are only 50 video clips for DAVIS evaluation in the prepared stationary masks. Is something missing from this folder? I would be very grateful if you could answer my question. Best wishes.

    opened by wener-yung 0
  • Input dimension limitation

    The code seems to work only at the specific input resolution the model was trained on. If I want to test at a resolution other than 432x240, I have to train the model at that resolution, right?

    opened by tawsinDOTuddin 5