Video Frame Interpolation with Transformer (CVPR2022)

Overview

VFIformer

Official PyTorch implementation of our CVPR2022 paper Video Frame Interpolation with Transformer

Dependencies

  • python >= 3.8
  • pytorch >= 1.8.0
  • torchvision >= 0.9.0

Prepare Dataset

  1. Vimeo90K Triplet dataset
  2. MiddleBury Other dataset
  3. UCF101 dataset
  4. SNU-FILM dataset

To train on the Vimeo90K, we have to first compute the ground-truth flows between frames using Lite-flownet, you can clone the Lite-flownet repo and put compute_flow_vimeo.py we provide under its main directory and run (remember to change the data path):

python compute_flow_vimeo.py

Get Started

  1. Clone this repo.
    git clone https://github.com/Jia-Research-Lab/VFIformer.git
    cd VFIformer
    
  2. Modify the argument --data_root in train.py according to your Vimeo90K path.

Evaluation

  1. Download the pre-trained models and place them into the pretrained_models/ folder.

    • Pre-trained models can be downloaded from Google Drive
      • pretrained_VFIformer: the final model in the main paper
      • pretrained_VFIformerSmall: the smaller version of the model mentioned in the supplementary file
  2. Test on the Vimeo90K testing set.

    Modify the argument --data_root according to your data path, run:

    python test.py --data_root [your Vimeo90K path] --testset VimeoDataset --net_name VFIformer --resume ./pretrained_models/pretrained_VFIformer/net_220.pth --save_result
    

    If you want to test with the smaller model, please change the --net_name and --resume accordingly:

    python test.py --data_root [your Vimeo90K path] --testset VimeoDataset --net_name VFIformerSmall --resume ./pretrained_models/pretrained_VFIformerSmall/net_220.pth --save_result
    

    The testing results are saved in the test_results/ folder. If you do not want to save the image results, you can remove the --save_result argument in the commands optionally.

  3. Test on the MiddleBury dataset.

    Modify the argument --data_root according to your data path, run:

    python test.py --data_root [your MiddleBury path] --testset MiddleburyDataset --net_name VFIformer --resume ./pretrained_models/pretrained_VFIformer/net_220.pth --save_result
    
  4. Test on the UCF101 dataset.

    Modify the argument --data_root according to your data path, run:

    python test.py --data_root [your UCF101 path] --testset UFC101Dataset --net_name VFIformer --resume ./pretrained_models/pretrained_VFIformer/net_220.pth --save_result
    
  5. Test on the SNU-FILM dataset.

    Modify the argument --data_root according to your data path. Choose the motion level and modify the argument --test_level accordingly, run:

    python FILM_test.py --data_root [your SNU-FILM path] --test_level [easy/medium/hard/extreme] --net_name VFIformer --resume ./pretrained_models/pretrained_VFIformer/net_220.pth
    

Training

  1. First train the flow estimator. (Note that skipping this step will not cause a significant impact on performance. We keep this step here only to be consistent with our paper.)
    python -m torch.distributed.launch --nproc_per_node=4 --master_port=4174 train.py --launcher pytorch --gpu_ids 0,1,2,3 \
            --loss_flow --use_tb_logger --batch_size 48 --net_name IFNet --name train_IFNet --max_iter 300 --crop_size 192 --save_epoch_freq 5
    
  2. Then train the whole framework.
    python -m torch.distributed.launch --nproc_per_node=8 --master_port=4175 train.py --launcher pytorch --gpu_ids 0,1,2,3,4,5,6,7 \
            --loss_l1 --loss_ter --loss_flow --use_tb_logger --batch_size 24 --net_name VFIformer --name train_VFIformer --max_iter 300 \
            --crop_size 192 --save_epoch_freq 5 --resume_flownet ./weights/train_IFNet/snapshot/net_final.pth
    
  3. To train the smaller version, run:
    python -m torch.distributed.launch --nproc_per_node=8 --master_port=4175 train.py --launcher pytorch --gpu_ids 0,1,2,3,4,5,6,7 \
            --loss_l1 --loss_ter --loss_flow --use_tb_logger --batch_size 24 --net_name VFIformerSmall --name train_VFIformerSmall --max_iter 300 \
            --crop_size 192 --save_epoch_freq 5 --resume_flownet ./weights/train_IFNet/snapshot/net_final.pth
    

Test on your own data

  1. Modify the arguments --img0_path and --img1_path according to your data path, run:
    python demo.py --img0_path [your img0 path] --img1_path [your img1 path] --save_folder [your save path] --net_name VFIformer --resume ./pretrained_models/pretrained_VFIformer/net_220.pth
    

Acknowledgement

We borrow some codes from RIFE and SwinIR. We thank the authors for their great work.

Citation

Please consider citing our paper in your publications if it is useful for your research.

@inproceedings{lu2022vfiformer,
    title={Video Frame Interpolation with Transformer},
    author={Liying Lu, Ruizheng Wu, Huaijia Lin, Jiangbo Lu, and Jiaya Jia},
    booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2022},
}

Contact

[email protected]

Issues
  • issue of code

    issue of code

    Hi, there is an issue in the code, "from Timm. models. layers import DropPath, to_2tuple, trunc_normal_, "No module named 'Timm'"?"

    opened by Zhang-Hu927 1
  • Question on ablation study

    Question on ablation study

    Hi authors, thank you for your awesome work.

    I was going through the VFIformer paper and I got curious of something from the ablation study. It would have been great to attend CVPR and ask in person, but unfortunately I cannot do so, so I leave my question here.

    In short, to my understanding, the main contribution of the paper is: use of transformer layers in VFI, with a novel cross-scale window attention, reaching a state-of-the-art performance.

    So I assume the 'Model 1' configuration of Table 2 consists of Convolutional layers only, yet it still outperforms (36.27 in Vimeo90k) the best baseline (36.18). I came to wonder the reason for this. To me, the 'Model 1' configuration did not seem to have anything special (no offense) since it did not contain the proposed modules.

    Can you give an explanation on this? What was the difference that lead to a strong base model (Model 1)? Or did I miss anything on the 'Model 1' configuration...?

    opened by JHLew 1
  • Missing flow .npy file

    Missing flow .npy file

    Hi, I came across your paper recently and thought it was really impressive. I am running the test script using the command provided, but I get an error

    flow0 = np.load(flow[0])
    IndexError: list index out of range
    
    

    This line in datasets.py indicates that there should be a flows folder with the contents being .npy. Is it possible to provide this folder? flow = sorted(glob.glob(os.path.join(self.data_root.replace('vimeo_triplet', ''), 'flows', name, '*.npy'))) Thanks

    opened by repers 1
Owner
DV Lab
Deep Vision Lab
DV Lab
This is the official repository of XVFI (eXtreme Video Frame Interpolation)

XVFI This is the official repository of XVFI (eXtreme Video Frame Interpolation), https://arxiv.org/abs/2103.16206 Last Update: 20210607 We provide th

Jihyong Oh 177 Jun 29, 2022
Repository relating to the CVPR21 paper TimeLens: Event-based Video Frame Interpolation

TimeLens: Event-based Video Frame Interpolation This repository is about the High Speed Event and RGB (HS-ERGB) dataset, used in the 2021 CVPR paper T

Robotics and Perception Group 513 Jun 25, 2022
RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation

RIFE RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation Ported from https://github.com/hzwer/arXiv2020-RIFE Dependencies NumPy

null 36 Jun 19, 2022
Asymmetric Bilateral Motion Estimation for Video Frame Interpolation, ICCV2021

ABME (ICCV2021) Junheum Park, Chul Lee, and Chang-Su Kim Official PyTorch Code for "Asymmetric Bilateral Motion Estimation for Video Frame Interpolati

Junheum Park 78 Jun 5, 2022
an implementation of Revisiting Adaptive Convolutions for Video Frame Interpolation using PyTorch

revisiting-sepconv This is a reference implementation of Revisiting Adaptive Convolutions for Video Frame Interpolation [1] using PyTorch. Given two f

Simon Niklaus 49 Jun 7, 2022
RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation

RIFE - Real Time Video Interpolation arXiv | YouTube | Colab | Tutorial | Demo Table of Contents Introduction Collection Usage Evaluation Training and

hzwer 2.5k Jul 4, 2022
An implementation of Video Frame Interpolation via Adaptive Separable Convolution using PyTorch

This work has now been superseded by: https://github.com/sniklaus/revisiting-sepconv sepconv-slomo This is a reference implementation of Video Frame I

Simon Niklaus 975 Jun 28, 2022
RIFE - Real-Time Intermediate Flow Estimation for Video Frame Interpolation

RIFE - Real-Time Intermediate Flow Estimation for Video Frame Interpolation YouTube | BiliBili 16X interpolation results from two input images: Introd

旷视天元 MegEngine 23 Jun 28, 2022
SE3 Pose Interp - Interpolate camera pose or trajectory in SE3, pose interpolation, trajectory interpolation

SE3 Pose Interpolation Pose estimated from SLAM system are always discrete, and

Ran Cheng 3 May 27, 2022
Code of paper "CDFI: Compression-Driven Network Design for Frame Interpolation", CVPR 2021

CDFI (Compression-Driven-Frame-Interpolation) [Paper] (Coming soon...) | [arXiv] Tianyu Ding*, Luming Liang*, Zhihui Zhu, Ilya Zharkov IEEE Conference

Tianyu Ding 91 Jun 27, 2022
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding (CVPR2022)

Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding by Qiaole Dong*, Chenjie Cao*, Yanwei Fu Paper and Supple

Qiaole Dong 98 Jun 26, 2022
TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation, CVPR2022

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation Paper Links: TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentati

Hust Visual Learning Team 196 Jun 27, 2022
Official code for "Towards An End-to-End Framework for Flow-Guided Video Inpainting" (CVPR2022)

E2FGVI (CVPR 2022) English | 简体中文 This repository contains the official implementation of the following paper: Towards An End-to-End Framework for Flo

Media Computing Group @ Nankai University 382 Jul 4, 2022
Official code for CVPR2022 paper: Depth-Aware Generative Adversarial Network for Talking Head Video Generation

?? Depth-Aware Generative Adversarial Network for Talking Head Video Generation (CVPR 2022) ?? If DaGAN is helpful in your photos/projects, please hel

Fa-Ting Hong 229 Jun 29, 2022
VSR-Transformer - This paper proposes a new Transformer for video super-resolution (called VSR-Transformer).

VSR-Transformer By Jiezhang Cao, Yawei Li, Kai Zhang, Luc Van Gool This paper proposes a new Transformer for video super-resolution (called VSR-Transf

Jiezhang Cao 210 Jun 19, 2022
Unsupervised Video Interpolation using Cycle Consistency

Unsupervised Video Interpolation using Cycle Consistency Project | Paper | YouTube Unsupervised Video Interpolation using Cycle Consistency Fitsum A.

NVIDIA Corporation 96 Jun 26, 2022
NeRViS: Neural Re-rendering for Full-frame Video Stabilization

Neural Re-rendering for Full-frame Video Stabilization

Yu-Lun Liu 9 Jun 17, 2022
Neural Re-rendering for Full-frame Video Stabilization

NeRViS: Neural Re-rendering for Full-frame Video Stabilization Project Page | Video | Paper | Google Colab Setup Setup environment for [Yu and Ramamoo

Yu-Lun Liu 9 Jun 17, 2022
Hybrid Neural Fusion for Full-frame Video Stabilization

FuSta: Hybrid Neural Fusion for Full-frame Video Stabilization Project Page | Video | Paper | Google Colab Setup Setup environment for [Yu and Ramamoo

Yu-Lun Liu 391 Jun 23, 2022