FLAVR is a fast, flow-free frame interpolation method capable of single shot multi-frame prediction

Overview

FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation (CVPR 2021)

[project page] [paper] [Project Video]

FLAVR is a fast, flow-free frame interpolation method capable of single-shot multi-frame prediction. It uses a customized encoder-decoder architecture with spatio-temporal convolutions and channel gating to capture and interpolate complex motion trajectories between frames, generating realistic high-frame-rate videos. This repository contains the original source code for the paper, accepted to CVPR 2021.

Dependencies

We used the following to train and test the model.

  • Ubuntu 18.04
  • Python==3.7.4
  • numpy==1.19.2
  • PyTorch==1.5.0, torchvision==0.6.0, cudatoolkit==10.1

Model

Training model on Vimeo-90K septuplets

To train your own model on the Vimeo-90K dataset, use the following command. You can download the dataset from this link. The results reported in the paper were obtained with models trained on 8 GPUs.

python main.py --batch_size 32 --test_batch_size 32 --dataset vimeo90K_septuplet --loss 1*L1 --max_epoch 200 --lr 0.0002 --data_root <dataset_path> --n_outputs 1

Training on the GoPro dataset is similar; change n_outputs to 7 for 8x interpolation.
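The two settings given here (n_outputs 1 for 2x and n_outputs 7 for 8x) are consistent with predicting one fewer frame than the interpolation factor. A minimal sketch of that relationship follows; the helper name n_outputs_for_factor is hypothetical, and the value for 4x is inferred from the pattern rather than stated in this README.

```python
def n_outputs_for_factor(factor: int) -> int:
    """Number of in-between frames to predict for k-x interpolation.

    Assumption: n_outputs = factor - 1, which matches the two values
    given in the README (2x -> 1, 8x -> 7). The 4x value is inferred.
    """
    if factor < 2:
        raise ValueError("interpolation factor must be at least 2")
    return factor - 1


for factor in (2, 4, 8):
    print(f"{factor}x interpolation -> --n_outputs {n_outputs_for_factor(factor)}")
```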

Testing using trained model

Trained Models

You can download the pretrained FLAVR models from the following links.

Method   Trained Model
2x       Link
4x       Link
8x       Link

2x Interpolation

To test a pretrained model on the Vimeo-90K septuplet validation set, run the following command:

python test.py --dataset vimeo90K_septuplet --data_root <data_path> --load_from <saved_model> --n_outputs 1

8x Interpolation

To test a multi-frame interpolation model, use the same command as above with a multi-frame FLAVR model and change n_outputs accordingly (for example, 7 for 8x interpolation).

Time Benchmarking

The testing script, in addition to computing PSNR and SSIM values, will also output the inference time and speed for interpolation.
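For readers unfamiliar with the metric, PSNR can be sketched as below. This is the generic textbook definition, not the repository's own implementation; the function name psnr and the default max_val of 1.0 (images scaled to [0, 1]) are assumptions made for illustration.

```python
import numpy as np


def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    Assumption: pixel values lie in [0, max_val].
    """
    diff = pred.astype(np.float64) - target.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))


# A uniform error of 0.1 on a [0, 1] image gives MSE = 0.01.
ground_truth = np.zeros((4, 4, 3))
prediction = np.full((4, 4, 3), 0.1)
print(psnr(prediction, ground_truth))
```

Higher values mean the interpolated frame is closer to the ground truth, which is how the PSNR column in the baseline table should be read.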

Evaluation on Middlebury

To evaluate on the public Middlebury benchmark, run the following.

python Middleburry_Test.py --data_root <data_path> --load_from <model_path> 

The interpolated images will be saved to the folder Middleburry in a format that can be readily uploaded to the leaderboard.

SloMo-Filter on custom video

You can apply the slomo filter to your own video using our trained models (requires OpenCV 4.2.0). For example, to convert a 30 FPS video to 240 FPS, run the following command with our pretrained 8x model:

python interpolate.py --input_video <input_video> --factor 8 --load_model <model_path>

To convert a 30 FPS video to 60 FPS, use a 2x model with factor 2.
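The frame-rate arithmetic behind these examples is a single multiplication. A small sketch, with a hypothetical helper name output_fps that is not part of the repository:

```python
def output_fps(input_fps: float, factor: int) -> float:
    """Frame rate after k-x interpolation, keeping the clip duration unchanged."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return input_fps * factor


# The two cases mentioned in the README.
print(output_fps(30, 8))  # 30 FPS with the 8x model
print(output_fps(30, 2))  # 30 FPS with the 2x model
```

Played back at the original 30 FPS instead, the same 8x output would run eight times slower, which is the slow-motion effect the filter is named after.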

Baseline Models

We also trained models for several previous methods in our setting and provide the trained models for all of them. Complete benchmarking scripts will also be released soon.

Method       PSNR on Vimeo (dB)   Trained Model
FLAVR        36.3                 Model
AdaCoF       35.3                 Model
QVI          35.15                Model
DAIN         34.19                Model
SuperSloMo*  32.90                Model

* SuperSloMo is implemented using the code repository from here. Other baselines are implemented using the official codebases.

Google Colab

Coming soon!

Acknowledgement

The code is heavily borrowed from Facebook's official PyTorch video repository and CAIN.

Cite

If this code helps in your work, please consider citing us.

@inproceedings{kalluri2021flavr,
  title={FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation},
  author={Kalluri, Tarun and Pathak, Deepak and Chandraker, Manmohan and Tran, Du},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2021}
}
Comments
  • UCF101 testing dataset

    Hi, I found the original UCF101 dataset in AVI format and the UCF101 triplet dataset in PNG format, but no 5-frame dataset is available. Could you provide the method for generating the UCF101 testing dataset for FLAVR?

    opened by tkkcc 11
  • about the training tricks

    Your work is impressive! I'd like to ask a few questions. I downloaded your code and trained on Vimeo with the batch size set to 6 and the data volume reduced to around 20,000 samples. After 70-odd epochs, the loss, PSNR and SSIM on the training and test sets have not converged. The learning rate has already dropped to a low value by this point, so I think there is little point in training further. The final PSNR is below 20, far from the numbers in your paper. Why is my training result so poor? Can you give some advice?

    opened by mmSir 9
  • Finetune problem

    Hello, thanks for your brilliant work, but I have a problem with fine-tuning. When I fine-tuned your model on my own dataset, the fine-tuned model produced flickering videos. When I output the predicted frames, I found that each predicted frame was darker than its adjacent frames. I then tried training the model from scratch using Unet34, but got similarly darker results. The PSNR and training loss were improving, but the inference results were worse. Could you please explain this a little? These are the training details:

        python main.py --batch_size 8 --test_batch_size 8 --dataset vimeo90K_septuplet --loss 1L1 -max_epoch 200 --lr 0.00001 --n_outputs 1

        Namespace(batch_size=8, beta1=0.9, beta2=0.99, checkpoint_dir='.', cuda=True, data_root='/vimeo_septuplet', dataset='vimeo90K_septuplet', exp_name='exp', joinType='concat', load_from=None, log_iter=60, loss='1L1', lr=1e-05, max_epoch=200, model='unet_18', n_outputs=1, nbr_frame=4, nbr_width=1, num_gpu=1, num_workers=16, pretrained='FLAVR_2x.pth', random_seed=12345, resume=False, resume_exp=None, start_epoch=0, test_batch_size=8, upmode='transpose', use_tensorboard=False, val_freq=1

    opened by ECEMACHINE 6
  • about QVI model

    Hi. Thx for your efforts on benchmarking existing models. I wonder which repo you are using for the quadratic video interpolation (QVI) model? Could you please share the link?

    opened by btwbtm 5
  • Vimeo90K triplet test dataset performance issue

    Hi,

    I am impressed with your new video frame interpolation paper.

    When I tested, I got 32.59 dB on the Vimeo-90K triplet test set.

    Following your Middleburry.py in the dataset directory, I changed the VimeoSepTuplet class to a VimeoTriplet class as shown below.

    What is the problem in my fixed code?

    I am wondering if I could get custom triplet interpolation code which takes two input frames and yields an intermediate frame.

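        import os

        from PIL import Image
        from torch.utils.data import Dataset
        # Assumption: to_tensor is torchvision's helper; the original post does not show where it comes from.
        from torchvision.transforms.functional import to_tensor


        class VimeoTriplet(Dataset):
            def __init__(self, data_root):
                self.data_root = data_root
                self.image_root = os.path.join(self.data_root, 'sequences')

                # Each line of tri_testlist.txt names one test sequence.
                test_fn = os.path.join(self.data_root, 'tri_testlist.txt')

                with open(test_fn, 'r') as txt:
                    self.seq_list = [line.strip() for line in txt]

            def __getitem__(self, index):
                im1 = Image.open('%s/%s/im1.png'%(self.image_root,self.seq_list[index])).convert('RGB')
                gt = Image.open('%s/%s/im2.png'%(self.image_root,self.seq_list[index])).convert('RGB')
                im3 = Image.open('%s/%s/im3.png'%(self.image_root,self.seq_list[index])).convert('RGB')

                im1, gt, im3 = map(to_tensor, (im1,gt,im3))

                # The two available frames are duplicated to fill the four input slots.
                return [im1, im1, im3, im3], [gt]

            def __len__(self):
                return len(self.seq_list)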
    
    opened by JunHeum 5
  • Can't run FLAVR

    At first I tried to use Flowframes, but since it gave an error I tried following your instructions on GitHub. When I tried to run python interpolate.py --input_video input.mp4 --factor 8 --load_model FLAVR8X.pth I got a very similar, if not identical, error message:

    13.000209881905063
    Traceback (most recent call last):
      File "interpolate.py", line 133, in <module>
        videoTensor , resizes = video_transform(videoTensor , args.downscale)
      File "interpolate.py", line 121, in video_transform
        videoTensor = transforms(videoTensor)
      File "C:\Users\frangamer1892roblox\MiniConda3\lib\site-packages\torchvision\transforms\transforms.py", line 60, in __call__
        img = t(img)
      File "D:\FLAVR\dataset\transforms.py", line 333, in __call__
        return to_tensor(clip)
      File "D:\FLAVR\dataset\transforms.py", line 107, in to_tensor
        return clip.float().permute(3, 0, 1, 2) / 255.0
    RuntimeError: [enforce fail at ..\c10\core\CPUAllocator.cpp:79] data. DefaultCPUAllocator: not enough memory: you tried to allocate 25944883200 bytes.

    I am not really sure what to do now. These are my specs, if they help in any way:

    DxDiag.txt

    opened by FranGamer1892 4
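The allocation size in the error above can be reproduced with simple arithmetic. The traceback shows the whole clip being converted to float (4 bytes per value) in one step, so the request grows with frames × height × width × channels. In the sketch below, the helper name video_tensor_bytes is hypothetical, and the 2346-frame, 1280x720 clip is only one combination that happens to yield exactly the reported number; the actual video dimensions are not given in the issue.

```python
def video_tensor_bytes(n_frames: int, height: int, width: int,
                       channels: int = 3, bytes_per_value: int = 4) -> int:
    """Memory needed to hold an entire clip as one float32 tensor."""
    return n_frames * height * width * channels * bytes_per_value


requested = 25944883200  # bytes, from the RuntimeError above

# Hypothetical clip: 2346 RGB frames at 1280x720.
example = video_tensor_bytes(n_frames=2346, height=720, width=1280)
print(example, example == requested)
print(f"{example / 1024 ** 3:.1f} GiB")
```

Under this reading, a shorter clip or a lower resolution would reduce the request proportionally, since every factor in the product scales the allocation linearly.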
  • Blur output

    Input -

    https://user-images.githubusercontent.com/55460763/121847306-fd1d0480-cd05-11eb-8c8b-7343bdc5718c.mp4

    Output -

    https://user-images.githubusercontent.com/55460763/121847470-3c4b5580-cd06-11eb-9bc5-d9a95657b20c.mp4

    Hey @tarun005 , I used the 8x pretrained model on this video. The output seems blurry mostly at the edges. Can this be improved?

    opened by around-star 3
  • periodic pause of interpolated video

    https://user-images.githubusercontent.com/25840016/121840789-8c7ae580-cd0f-11eb-87f6-ab40ab5b5125.mp4

    https://user-images.githubusercontent.com/25840016/121840821-a3213c80-cd0f-11eb-967b-8265f38789bf.mp4

    Hi,

    I am using the pretrained 8x model to interpolate the demo sprite video shown on the project homepage, but the output seems to "pause" once every second. Do you know why? Thx!

    opened by btwbtm 3
  • Questions about the inference time

    Hi, thanks for your interesting work! I tested the inference time on vimeo90K_septuplet using your script and got 0.004 s, which seems too fast. After modifying the code and testing again, I got 0.195 s. So I wonder how the time in your paper was measured?

    bug 
    opened by GreyZzzzzzXh 3
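One possible explanation for such a gap, not confirmed anywhere in this thread, is that CUDA work in PyTorch is launched asynchronously, so a timer stopped without waiting for the GPU measures only the launch overhead. The sketch below shows the general pattern of synchronizing before reading the clock. The helper timed and its sync parameter are hypothetical; with PyTorch one would typically pass torch.cuda.synchronize as sync.

```python
import time
from typing import Callable, Optional


def timed(fn: Callable[[], object],
          sync: Optional[Callable[[], None]] = None,
          repeats: int = 10) -> float:
    """Average wall-clock seconds per call of fn.

    sync, if given, is called before starting and before stopping the
    timer so that pending asynchronous work is included in the result.
    """
    if sync is not None:
        sync()  # make sure earlier work does not leak into the measurement
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if sync is not None:
        sync()  # wait for the work launched by fn to actually finish
    return (time.perf_counter() - start) / repeats


# CPU-only usage example; no synchronization is needed here.
print(timed(lambda: sum(range(10000)), repeats=5))
```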
  • "idxs" is missing frames that videoTensor still had when using "--is_folder"

    I have 49 video frames, and if I check the length of videoTensor, it matches (49).

    However, idxs ends up being only 46 long, resulting in the first and last frame not being interpolated.

    opened by n00mkrad 3
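The counts reported above (49 frames, 46 index groups) are what a sliding window of four consecutive frames would produce, and nbr_frame=4 appears in the training configurations quoted elsewhere on this page. The sketch below illustrates that reading; it is an assumption about how idxs is built, not a copy of the code in interpolate.py, and the helper name sliding_windows is hypothetical.

```python
def sliding_windows(n_frames: int, window: int = 4):
    """All runs of `window` consecutive frame indices."""
    return [list(range(start, start + window))
            for start in range(n_frames - window + 1)]


idxs = sliding_windows(49)
print(len(idxs))   # 49 - 4 + 1 = 46
print(idxs[0])     # [0, 1, 2, 3]
print(idxs[-1])    # [45, 46, 47, 48]
```

If each window interpolates between its two middle frames, the gaps between frames 0 and 1 and between frames 47 and 48 never sit in the middle of any window, which would match the observation that the first and last frames are not interpolated.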
  • Unreliable FPS readout causes error

    When I try to interpolate a video, this error pops up:

    File "interpolate.py", line 120, in <module>
        videoTensor = video_to_tensor(input_video)
      File "interpolate.py", line 101, in video_to_tensor
        fps = md["video_fps"]
    KeyError: 'video_fps'
    

    I suspect it fails to read the frame rate for some reason.

    This is one of the reasons I am asking for a manual input: https://github.com/tarun005/FLAVR/issues/4

    enhancement 
    opened by n00mkrad 3
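A defensive lookup along the lines requested here could look like the sketch below. It assumes md is a plain dictionary of video metadata, as the traceback suggests; the helper name resolve_fps and the manual_fps override are hypothetical and are not options of interpolate.py.

```python
from typing import Optional


def resolve_fps(md: dict, manual_fps: Optional[float] = None) -> float:
    """Pick the frame rate to use, preferring an explicit user value.

    Falls back to md["video_fps"] when present and raises a readable
    error instead of a bare KeyError when neither source is available.
    """
    if manual_fps is not None:
        return float(manual_fps)
    fps = md.get("video_fps")
    if fps is None:
        raise ValueError(
            "frame rate missing from video metadata; please pass it manually"
        )
    return float(fps)


print(resolve_fps({"video_fps": 29.97}))
print(resolve_fps({}, manual_fps=30))
```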
  • Unable to write out results

    Hey,

    I've managed to get up and running with flavr, right up until the final stage. I'm using a directory with a png sequence in it, which successfully runs through the network. But when it comes to writing it out I simply get:

    Writing to  in_2xmp4.mp4
    in_2xmp4: No such file or directory
    Traceback (most recent call last):
      File "interpolate.py", line 164, in <module>
        os.remove(output_video)
    FileNotFoundError: [WinError 2] The system cannot find the file specified: 'in_2xmp4'
    

    I'm reading a sequence of pngs from a directory, using is_folder which is great; is there a way to write out a sequence of pngs rather than a video?

    opened by frostedbrain 0
  • How to cascade different speed models?

    Hi Tarun,

    From issue #32 I know we can cascade different models to reach higher interpolation factors, such as cascading the 2x and 8x models for 16x interpolation. But how is the cascade done? Do I first use the 2x model to generate a 2x slow sequence, and then apply the 8x model to that sequence?

    opened by pango99 0
  • DAVIS training or testing set for single frame interpolation

    In https://github.com/tarun005/FLAVR/blob/main/dataset/Davis_test.py, do you use DAVIS's training or testing set? The paper says 2847 quintuples are generated in total, but I found the training set can generate 2849 quintuples, while the testing set can generate 963 quintuples.

    opened by JingyunLiang 0
  • training issue about PSNR

    Hi Tarun, excellent work on video interpolation! I tried to run your code, but I ran into some trouble. I set my config as:

        batch_size=2, beta1=0.9, beta2=0.99, checkpoint_dir='.', cuda=True, data_root='/home/Changchen/dataset./vimeo_septuplet', dataset='vimeo90K', exp_name='exp', joinType='concat', load_from=None, log_iter=60, loss='1*L1', lr=0.0002, max_epoch=50, model='unet_18', n_outputs=1, nbr_frame=4, nbr_width=1, num_gpu=1, num_workers=16, pretrained=None, random_seed=12345, resume=False, resume_exp=None, start_epoch=0, test_batch_size=1, upmode='transpose', use_tensorboard=False, val_freq=1

    At the beginning, the PSNR was normal at about 20, but it gradually decreased to about 14. I wonder why training seems to diverge. Thank you for any help!

    opened by ss00atbupt 4
  • Training issue

    Hi, author, thank you for sharing the code on GitHub. The code performed well in testing, but the PSNR stayed at about 17 dB throughout training. What is the reason?

    opened by weiMytian 3
  • Training issue

    Hi, I've been trying to train this network on an A100 GPU. However, as torch 1.5.0 doesn't support this GPU, I am forced to use torch 1.9.0. Training is broken for torch versions above 1.5.0, but I cannot find the reason why. I have looked at the differences between the torch versions, but nothing makes it clear why this happens. Basically, the model stays stuck at around 20 dB for the duration of training. I previously tested this code on a 1080Ti with torch 1.5.0 and that worked fine, but given memory constraints and training time, the A100 would be the better option. Do you have any idea why this occurs, and any possible solutions?

    Thanks

    opened by issakh 4
Owner
Tarun K
Deep Learning. Mostly Python, PyTorch and Tensorflow.