Video Frame Interpolation with Transformer (CVPR2022)

Overview

VFIformer

Official PyTorch implementation of our CVPR 2022 paper "Video Frame Interpolation with Transformer".

Dependencies

  • python >= 3.8
  • pytorch >= 1.8.0
  • torchvision >= 0.9.0
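
A quick way to confirm that an environment meets these minimums is sketched below. It assumes PyTorch is installed under the package name `torch`; the helper names `parse_version`, `meets_minimum`, and `check_environment` are illustrative and not part of this repo.

```python
import re
import sys
from importlib import metadata

# Minimum versions taken from the dependency list above.
MINIMUMS = {"torch": "1.8.0", "torchvision": "0.9.0"}


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    core = text.split("+", 1)[0]  # drop local build tags such as "+cu117"
    parts = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, required):
    """Compare two version strings, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Map each dependency to True if it is installed and recent enough."""
    report = {"python": sys.version_info[:2] >= (3, 8)}
    for package, minimum in MINIMUMS.items():
        try:
            report[package] = meets_minimum(metadata.version(package), minimum)
        except metadata.PackageNotFoundError:
            report[package] = False
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```

The check reads installed package metadata only, so it does not import PyTorch or touch the GPU.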

Prepare Dataset

  1. Vimeo90K Triplet dataset
  2. MiddleBury Other dataset
  3. UCF101 dataset
  4. SNU-FILM dataset

To train on Vimeo90K, first compute the ground-truth flows between frames using Lite-flownet. Clone the Lite-flownet repo, put the compute_flow_vimeo.py script we provide under its main directory, and run it (remember to change the data path):

python compute_flow_vimeo.py

Get Started

  1. Clone this repo.
    git clone https://github.com/Jia-Research-Lab/VFIformer.git
    cd VFIformer
    
  2. Modify the argument --data_root in train.py according to your Vimeo90K path.

Evaluation

  1. Download the pre-trained models and place them into the pretrained_models/ folder.

    • Pre-trained models can be downloaded from Google Drive
      • pretrained_VFIformer: the final model in the main paper
      • pretrained_VFIformerSmall: the smaller version of the model mentioned in the supplementary file
  2. Test on the Vimeo90K testing set.

    Set the argument --data_root to your data path, then run:

    python test.py --data_root [your Vimeo90K path] --testset VimeoDataset --net_name VFIformer --resume ./pretrained_models/pretrained_VFIformer/net_220.pth --save_result
    

    If you want to test with the smaller model, please change the --net_name and --resume accordingly:

    python test.py --data_root [your Vimeo90K path] --testset VimeoDataset --net_name VFIformerSmall --resume ./pretrained_models/pretrained_VFIformerSmall/net_220.pth --save_result
    

    The test results are saved in the test_results/ folder. If you do not want to save the output images, remove the --save_result argument from the command.

  3. Test on the MiddleBury dataset.

    Set the argument --data_root to your data path, then run:

    python test.py --data_root [your MiddleBury path] --testset MiddleburyDataset --net_name VFIformer --resume ./pretrained_models/pretrained_VFIformer/net_220.pth --save_result
    
  4. Test on the UCF101 dataset.

    Set the argument --data_root to your data path, then run:

    python test.py --data_root [your UCF101 path] --testset UFC101Dataset --net_name VFIformer --resume ./pretrained_models/pretrained_VFIformer/net_220.pth --save_result
    
  5. Test on the SNU-FILM dataset.

    Set the argument --data_root to your data path, choose the motion level with the argument --test_level, then run:

    python FILM_test.py --data_root [your SNU-FILM path] --test_level [easy/medium/hard/extreme] --net_name VFIformer --resume ./pretrained_models/pretrained_VFIformer/net_220.pth
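
The evaluation commands above differ only in the script, the test set, the model name, and the matching checkpoint. A minimal sketch of assembling them programmatically follows; it uses only the flags shown above, the data paths are placeholders, and the helper names are hypothetical and not part of this repo.

```python
import shlex

# Motion levels accepted by --test_level, as listed above.
FILM_LEVELS = ("easy", "medium", "hard", "extreme")


def checkpoint_path(net_name):
    """Checkpoint location following the pattern used in the commands above."""
    return f"./pretrained_models/pretrained_{net_name}/net_220.pth"


def build_test_command(data_root, testset, net_name="VFIformer", save_result=True):
    """Build the argument list for test.py."""
    argv = [
        "python", "test.py",
        "--data_root", data_root,
        "--testset", testset,
        "--net_name", net_name,
        "--resume", checkpoint_path(net_name),
    ]
    if save_result:
        argv.append("--save_result")
    return argv


def build_film_command(data_root, test_level, net_name="VFIformer"):
    """Build the argument list for FILM_test.py at one motion level."""
    if test_level not in FILM_LEVELS:
        raise ValueError(f"test_level must be one of {FILM_LEVELS}")
    return [
        "python", "FILM_test.py",
        "--data_root", data_root,
        "--test_level", test_level,
        "--net_name", net_name,
        "--resume", checkpoint_path(net_name),
    ]


if __name__ == "__main__":
    # Placeholder paths; replace them with your own dataset locations.
    print(shlex.join(build_test_command("/data/vimeo_triplet", "VimeoDataset")))
    for level in FILM_LEVELS:
        print(shlex.join(build_film_command("/data/SNU-FILM", level)))
```

Each returned list can be passed to `subprocess.run(argv, check=True)` from the repo root to execute the command.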
    

Training

  1. First train the flow estimator. (Note that skipping this step will not cause a significant impact on performance. We keep this step here only to be consistent with our paper.)
    python -m torch.distributed.launch --nproc_per_node=4 --master_port=4174 train.py --launcher pytorch --gpu_ids 0,1,2,3 \
            --loss_flow --use_tb_logger --batch_size 48 --net_name IFNet --name train_IFNet --max_iter 300 --crop_size 192 --save_epoch_freq 5
    
  2. Then train the whole framework.
    python -m torch.distributed.launch --nproc_per_node=8 --master_port=4175 train.py --launcher pytorch --gpu_ids 0,1,2,3,4,5,6,7 \
            --loss_l1 --loss_ter --loss_flow --use_tb_logger --batch_size 24 --net_name VFIformer --name train_VFIformer --max_iter 300 \
            --crop_size 192 --save_epoch_freq 5 --resume_flownet ./weights/train_IFNet/snapshot/net_final.pth
    
  3. To train the smaller version, run:
    python -m torch.distributed.launch --nproc_per_node=8 --master_port=4175 train.py --launcher pytorch --gpu_ids 0,1,2,3,4,5,6,7 \
            --loss_l1 --loss_ter --loss_flow --use_tb_logger --batch_size 24 --net_name VFIformerSmall --name train_VFIformerSmall --max_iter 300 \
            --crop_size 192 --save_epoch_freq 5 --resume_flownet ./weights/train_IFNet/snapshot/net_final.pth
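
In each training command above, --nproc_per_node equals the number of ids passed to --gpu_ids. When adapting the commands to a different GPU count, a small helper can keep the two in sync. This is a minimal sketch: `build_launch_command` is a hypothetical helper, not part of this repo, and it makes no claim about how --batch_size should change with the GPU count.

```python
def build_launch_command(gpu_ids, master_port, train_args):
    """Build a torch.distributed.launch command with one process per GPU id."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    if len(set(gpu_ids)) != len(gpu_ids):
        raise ValueError("GPU ids must be unique")
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={len(gpu_ids)}",
        f"--master_port={master_port}",
        "train.py", "--launcher", "pytorch",
        "--gpu_ids", ",".join(str(i) for i in gpu_ids),
        *train_args,
    ]


if __name__ == "__main__":
    # Arguments copied from the flow-estimator command in step 1.
    flow_args = [
        "--loss_flow", "--use_tb_logger", "--batch_size", "48",
        "--net_name", "IFNet", "--name", "train_IFNet",
        "--max_iter", "300", "--crop_size", "192", "--save_epoch_freq", "5",
    ]
    print(" ".join(build_launch_command([0, 1, 2, 3], 4174, flow_args)))
```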
    

Test on your own data

  1. Set the arguments --img0_path and --img1_path to your image paths, then run:
    python demo.py --img0_path [your img0 path] --img1_path [your img1 path] --save_folder [your save path] --net_name VFIformer --resume ./pretrained_models/pretrained_VFIformer/net_220.pth
    

Acknowledgement

We borrow some code from RIFE and SwinIR. We thank the authors for their great work.

Citation

Please consider citing our paper in your publications if it is useful for your research.

@inproceedings{lu2022vfiformer,
    title={Video Frame Interpolation with Transformer},
    author={Liying Lu and Ruizheng Wu and Huaijia Lin and Jiangbo Lu and Jiaya Jia},
    booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2022},
}

Contact

[email protected]

Comments
  • How long does the main training take?

    Hello. Thank you for sharing the code of a good paper.

    How long does the main training take? Which GPUs did you use for training, and how long does it take to run all 300 epochs specified in the code?

    opened by junsang7777 2
  • Issue with compute_flow_vimeo.py

    Hi, I came across your paper recently and thought it was really impressive. When I run compute_flow_vimeo.py, I get this error: AttributeError: module 'correlation' has no attribute 'FunctionCorrelation'.

    How can I solve this problem? Thank you.

    opened by houlaine1 1
  • SNU-FILM Dataset

    The data downloaded from the provided SNU-FILM download link is damaged, and the dataset is still incomplete after repair. Can you provide a new download link?

    opened by Zhang-Hu927 0
  • Question about use_crossattn

    Hi, I wonder why the elements of use_crossattn are not all True. In my understanding, if every TFL uses the CSWA, use_crossattn should be all True. Could you help answer this question? Thanks.

    opened by SunInHeart 0
  • Missing checkpoints/finetuned-liteflownet-epoch1.pkl

    Hi, I came across your paper recently and thought it was really impressive. I am following the Prepare Dataset part to create the .npy files. As instructed, I cloned the Lite-flownet repo and put compute_flow_vimeo.py into the folder. When I run it, I get this error: FileNotFoundError: [Errno 2] No such file or directory: 'checkpoints/finetuned-liteflownet-epoch1.pkl'

    How can I get the finetuned-liteflownet-epoch1.pkl file? One more question: when you say to remember to change the data path, is that the path of the Vimeo90K Triplet dataset?

    Thank you.

    opened by chautuankien 1
  • Issue with the code

    Hi, there is an issue in the code: "from timm.models.layers import DropPath, to_2tuple, trunc_normal_" fails with "No module named 'timm'".

    opened by Zhang-Hu927 1
Owner
DV Lab
Deep Vision Lab