FLAVR is a fast, flow-free frame interpolation method capable of single shot multi-frame prediction

Tarun K

Last update: Dec 23, 2022

Related tags

Deep Learning machine-learning video deep-learning artificial-intelligence slomo-filter 8x-interpolation

Overview

FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation (CVPR 2021)

FLAVR is a fast, flow-free frame interpolation method capable of single-shot multi-frame prediction. It uses a customized encoder-decoder architecture with spatio-temporal convolutions and channel gating to capture and interpolate complex motion trajectories between frames, generating realistic high-frame-rate videos. This repository contains the original source code for the paper accepted to CVPR 2021.

Dependencies

We used the following to train and test the model.

Ubuntu 18.04
Python==3.7.4
numpy==1.19.2
PyTorch==1.5.0, torchvision==0.6.0, cudatoolkit==10.1

Model

Training model on Vimeo-90K septuplets

To train your own model on the Vimeo-90K dataset, use the following command. You can download the dataset from this link. The results reported in the paper were obtained by training on 8 GPUs.

python main.py --batch_size 32 --test_batch_size 32 --dataset vimeo90K_septuplet --loss 1*L1 --max_epoch 200 --lr 0.0002 --data_root <dataset_path> --n_outputs 1

Training on the GoPro dataset is similar; set n_outputs to 7 for 8x interpolation.
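The relation between the interpolation factor and n_outputs can be sketched as follows. This is a minimal illustration, assuming a kx model predicts the k - 1 frames evenly spaced between two consecutive input frames (consistent with n_outputs 1 for the Vimeo-90K command and 7 for 8x above); the helper names are hypothetical and not part of the FLAVR code, and the value for 4x follows from that assumption rather than from the README.

```python
from fractions import Fraction
from typing import List


def n_outputs_for_factor(factor: int) -> int:
    """Number of intermediate frames a factor-x model predicts per input pair."""
    if factor < 2:
        raise ValueError("interpolation factor must be at least 2")
    return factor - 1


def intermediate_timestamps(factor: int) -> List[Fraction]:
    """Evenly spaced timestamps in (0, 1) between two consecutive input frames."""
    return [Fraction(i, factor) for i in range(1, factor)]


for factor in (2, 4, 8):
    ts = ", ".join(str(t) for t in intermediate_timestamps(factor))
    print(f"{factor}x: --n_outputs {n_outputs_for_factor(factor)} -> t = {ts}")
```

Exact fractions are used so the timestamps print as 1/8, 1/4, ... rather than rounded floats.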

Testing using a trained model.

Trained Models.

You can download the pretrained FLAVR models from the following links.

Interpolation factor	Trained Model
2x	Link
4x	Link
8x	Link

2x Interpolation

To test a pretrained model on the Vimeo-90K septuplet validation set, run the following command:

python test.py --dataset vimeo90K_septuplet --data_root <data_path> --load_from <saved_model> --n_outputs 1

8x Interpolation

To test a multi-frame interpolation model, use the same command as above with a multi-frame FLAVR model and set n_outputs accordingly (for example, 7 for 8x interpolation).

Time Benchmarking

The testing script, in addition to computing PSNR and SSIM values, will also output the inference time and speed for interpolation.
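The README does not spell out how the testing script measures time. As a hedged sketch of a common approach: CUDA kernels run asynchronously, so a timer stopped right after a forward call can report launch overhead rather than compute time, which may be one reason for the very different numbers in the inference-time question among the comments below. The hypothetical benchmark helper warms up, synchronizes through a caller-supplied function such as torch.cuda.synchronize, and reports the median; a cheap CPU workload stands in for the model so the sketch runs without a GPU.

```python
import statistics
import time
from typing import Callable, Optional


def benchmark(run: Callable[[], object],
              warmup: int = 5,
              iters: int = 20,
              sync: Optional[Callable[[], None]] = None) -> float:
    """Return the median wall-clock time (seconds) of one call to `run`.

    `sync` should block until queued device work has finished; on a GPU
    you would pass torch.cuda.synchronize. Without it, asynchronous CUDA
    kernels can make the measured time look far too small.
    """
    for _ in range(warmup):          # warm-up: autotuning, allocator caches, lazy init
        run()
    if sync is not None:
        sync()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        run()
        if sync is not None:
            sync()                   # wait until the work has actually finished
        times.append(time.perf_counter() - start)
    return statistics.median(times)


if __name__ == "__main__":
    # Stand-in workload; for a model this would be e.g. lambda: model(inputs)
    # inside torch.no_grad().
    t = benchmark(lambda: sum(i * i for i in range(100_000)))
    print(f"median time per call: {t * 1000:.2f} ms")
```

Reporting the median rather than the mean keeps a single slow iteration (for example, a garbage-collection pause) from skewing the result.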

Evaluation on Middlebury

To evaluate on the public Middlebury benchmark, run the following.

python Middleburry_Test.py --data_root <data_path> --load_from <model_path>

The interpolated images will be saved to the folder Middleburry in a format that can be readily uploaded to the leaderboard.

SloMo-Filter on custom video

You can use our trained models to apply the slomo filter to your own video (requires OpenCV 4.2.0). For example, to convert a 30 FPS video to 240 FPS, use our pretrained 8x interpolation model with the following command:

python interpolate.py --input_video <input_video> --factor 8 --load_model <model_path>

To convert a 30 FPS video to 60 FPS, use a 2x model with --factor 2.
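The frame-rate arithmetic behind these commands can be sketched as below. This assumes every adjacent pair of input frames is interpolated; the actual script may treat the first and last frames differently (see the "idxs" comment further down), so the frame count is an idealized upper bound. The helper names are hypothetical.

```python
def output_fps(input_fps: float, factor: int) -> float:
    """Frame rate after inserting factor - 1 frames between every input pair."""
    return input_fps * factor


def output_frame_count(num_input_frames: int, factor: int) -> int:
    """Frames in the output if every adjacent pair of the N inputs is interpolated.

    Each of the N - 1 gaps contributes `factor` frames (its start frame plus
    factor - 1 new ones), and the final input frame is appended once.
    """
    if num_input_frames < 2:
        return num_input_frames
    return (num_input_frames - 1) * factor + 1


print(output_fps(30, 8), output_fps(30, 2))   # 240 60
print(output_frame_count(300, 8))             # 2393
```

Keeping the playback duration unchanged is what turns the extra frames into a higher frame rate; playing the same frames back at the original 30 FPS instead yields the slow-motion effect.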

Baseline Models

We also trained models for several previous methods in our setting and provide the trained models for all of them. Complete benchmarking scripts will be released soon.

Method	PSNR on Vimeo (dB)	Trained Model
FLAVR	36.3	Model
AdaCoF	35.3	Model
QVI	35.15	Model
DAIN	34.19	Model
SuperSloMo*	32.90	Model

*SuperSloMo is implemented using the code repository from here. The other baselines are implemented using their official codebases.
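For reference, PSNR is conventionally computed from the mean squared error between the prediction and the ground truth. The following is a minimal pure-Python sketch, assuming pixel values scaled to [0, 1] and flattened into sequences; the repository's evaluation code may differ in details such as color space, border cropping, or how per-frame scores are averaged.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are given as flat sequences of pixel values in [0, max_val].
    PSNR = 10 * log10(max_val**2 / MSE).
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf          # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


# A uniform error of 0.01 on [0, 1] images gives MSE = 1e-4, i.e. 40 dB.
print(round(psnr([0.51, 0.21, 0.91], [0.50, 0.20, 0.90]), 2))
```

Because PSNR is logarithmic in the MSE, the roughly 1 dB gap between FLAVR and AdaCoF in the table corresponds to about a 21% lower mean squared error.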

Google Colab

Coming soon!

Acknowledgement

The code borrows heavily from Facebook's official PyTorch video repository and CAIN.

Cite

If this code helps in your work, please consider citing us.

@inproceedings{kalluri2021flavr,
  title={FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation},
  author={Kalluri, Tarun and Pathak, Deepak and Chandraker, Manmohan and Tran, Du},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2021}
}

Comments

UCF101 testing dataset

Hi, I found the original UCF101 dataset in AVI format and the UCF101 triplet dataset in PNG format, but no 5-frame version is available. Could you describe how to generate the UCF101 test set used for FLAVR?

opened by tkkcc 11
about the training tricks

Your work is impressive! I'd like to ask a few questions. I downloaded your code and trained it on Vimeo with a batch size of 6 and about 20,000 samples. After more than 70 epochs, the loss, PSNR, and SSIM on both the training and test sets still have not converged, even though the learning rate has dropped to a low value, so continuing to train does not seem worthwhile. The final PSNR is below 20, far from the numbers in your paper. Why is my training result so poor? Can you give some advice?

opened by mmSir 9
Finetune problem

Hello, thanks for your brilliant work, but I have a problem with fine-tuning. When I fine-tune your model on my own dataset, the fine-tuned model produces flickering videos. When I saved the predicted frames, I found that they were darker than the adjacent frames. I then tried training the model from scratch using Unet34 but got similarly dark results. The PSNR and training loss kept improving, but the inference results got worse. Could you please explain this? These are the training details:

python main.py --batch_size 8 --test_batch_size 8 --dataset vimeo90K_septuplet --loss 1*L1 --max_epoch 200 --lr 0.00001 --n_outputs 1

Namespace(batch_size=8, beta1=0.9, beta2=0.99, checkpoint_dir='.', cuda=True, data_root='/vimeo_septuplet', dataset='vimeo90K_septuplet', exp_name='exp', joinType='concat', load_from=None, log_iter=60, loss='1*L1', lr=1e-05, max_epoch=200, model='unet_18', n_outputs=1, nbr_frame=4, nbr_width=1, num_gpu=1, num_workers=16, pretrained='FLAVR_2x.pth', random_seed=12345, resume=False, resume_exp=None, start_epoch=0, test_batch_size=8, upmode='transpose', use_tensorboard=False, val_freq=1)

opened by ECEMACHINE 6
about QVI model

Hi, thanks for your efforts on benchmarking existing models. Which repository did you use for the quadratic video interpolation (QVI) model? Could you please share the link?

opened by btwbtm 5

Vimeo90K triplet test dataset performance issue

Hi,

I am impressed with your new video frame interpolation paper.

When I tested, I got 32.59 dB on the Vimeo-90K triplet test set.

Following your Middleburry.py in the dataset directory, I modified the VimeoSepTuplet class into a VimeoTriplet class as shown below.

What is wrong with my modified code?

I am also wondering whether you could share triplet interpolation code that takes two input frames and yields an intermediate frame.

    import os

    from PIL import Image
    from torch.utils.data import Dataset
    from torchvision.transforms.functional import to_tensor


    class VimeoTriplet(Dataset):
        def __init__(self, data_root):
            self.data_root = data_root
            self.image_root = os.path.join(self.data_root, 'sequences')

            test_fn = os.path.join(self.data_root, 'tri_testlist.txt')

            with open(test_fn, 'r') as txt:
                self.seq_list = [line.strip() for line in txt]

        def __getitem__(self, index):
            # im1 and im3 are the two inputs; im2 is the ground-truth middle frame
            im1 = Image.open('%s/%s/im1.png'%(self.image_root,self.seq_list[index])).convert('RGB')
            gt = Image.open('%s/%s/im2.png'%(self.image_root,self.seq_list[index])).convert('RGB')
            im3 = Image.open('%s/%s/im3.png'%(self.image_root,self.seq_list[index])).convert('RGB')

            im1, gt, im3 = map(to_tensor, (im1,gt,im3))

            # Repeat each input frame to fill a 4-frame input window
            return [im1, im1, im3, im3], [gt]

        def __len__(self):
            return len(self.seq_list)

opened by JunHeum 5

Can't run FLAVR

At first I tried to use Flowframes, but since it gave an error, I followed your instructions on GitHub. When I ran python interpolate.py --input_video input.mp4 --factor 8 --load_model FLAVR8X.pth, I got a very similar, if not identical, error message:

13.000209881905063
Traceback (most recent call last):
  File "interpolate.py", line 133, in <module>
    videoTensor , resizes = video_transform(videoTensor , args.downscale)
  File "interpolate.py", line 121, in video_transform
    videoTensor = transforms(videoTensor)
  File "C:\Users\frangamer1892roblox\MiniConda3\lib\site-packages\torchvision\transforms\transforms.py", line 60, in __call__
    img = t(img)
  File "D:\FLAVR\dataset\transforms.py", line 333, in __call__
    return to_tensor(clip)
  File "D:\FLAVR\dataset\transforms.py", line 107, in to_tensor
    return clip.float().permute(3, 0, 1, 2) / 255.0
RuntimeError: [enforce fail at ..\c10\core\CPUAllocator.cpp:79] data. DefaultCPUAllocator: not enough memory: you tried to allocate 25944883200 bytes.

I am not really sure what to do now. These are my specs, in case they help:

DxDiag.txt

opened by FranGamer1892 4
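In the "Can't run FLAVR" report above, the failing line converts the entire decoded clip to float32 at once, so memory grows as frames x height x width x channels x 4 bytes. The back-of-the-envelope sketch below uses hypothetical helpers. The reported 25,944,883,200 bytes happens to equal, for example, 2,346 RGB frames at 1280x720, but the actual length and resolution of that video are not known. Lowering the resolution (the traceback shows interpolate.py passing an args.downscale value) or processing the video in shorter chunks reduces the footprint.

```python
def float32_clip_bytes(num_frames: int, height: int, width: int, channels: int = 3) -> int:
    """Bytes needed to hold a whole clip as a float32 tensor (4 bytes per value)."""
    return num_frames * height * width * channels * 4


def max_frames_in_budget(budget_bytes: int, height: int, width: int, channels: int = 3) -> int:
    """How many frames of this size fit in a given memory budget as float32."""
    return budget_bytes // float32_clip_bytes(1, height, width, channels)


print(float32_clip_bytes(2346, 720, 1280))           # 25944883200
print(max_frames_in_budget(8 * 1024**3, 720, 1280))  # 776 frames per 8 GiB chunk
```

The uint8 frames returned by a video decoder take a quarter of this space, which is why the allocation fails only at the .float() conversion.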
Blur output

Input -

https://user-images.githubusercontent.com/55460763/121847306-fd1d0480-cd05-11eb-8c8b-7343bdc5718c.mp4

Output -

https://user-images.githubusercontent.com/55460763/121847470-3c4b5580-cd06-11eb-9bc5-d9a95657b20c.mp4

Hey @tarun005, I used the 8x pretrained model on this video. The output seems blurry, mostly at the edges. Can this be improved?

opened by around-star 3
periodic pause of interpolated video

https://user-images.githubusercontent.com/25840016/121840789-8c7ae580-cd0f-11eb-87f6-ab40ab5b5125.mp4

https://user-images.githubusercontent.com/25840016/121840821-a3213c80-cd0f-11eb-967b-8265f38789bf.mp4

Hi,

I am using the pretrained 8x model to interpolate the demo sprite video shown on the project homepage, but the output seems to "pause" once per second. Do you know why? Thanks!

opened by btwbtm 3
Questions about the inference time

Hi, thanks for your interesting work! I measured the inference time on vimeo90K_septuplet using your script and got 0.004 s, which seems too fast. After modifying the code and testing again, I got 0.195 s. How was the time in your paper measured?
bug

opened by GreyZzzzzzXh 3
"idxs" is missing frames that videoTensor still had when using "--is_folder"

I have 49 video frames, and if I check the length of videoTensor, it matches (49).

However, idxs ends up being only 46 long, so the first and last frames are not interpolated.

opened by n00mkrad 3
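One way 49 frames can yield exactly 46 entries is sketched below. It assumes, as the nbr_frame=4 setting in the Namespace dumps above suggests, that each prediction uses a window of 4 consecutive input frames and fills the gap between the two middle ones; this is an assumption about interpolate.py, not a reading of its code. The function names and the boundary-padding remedy are illustrative only (the padding mirrors the frame repetition in the VimeoTriplet snippet above).

```python
from typing import List, Tuple


def sliding_windows(num_frames: int, window: int = 4) -> List[Tuple[int, ...]]:
    """All windows of `window` consecutive frame indices (no padding)."""
    return [tuple(range(s, s + window)) for s in range(num_frames - window + 1)]


def padded_windows(num_frames: int, window: int = 4) -> List[Tuple[int, ...]]:
    """One window per gap (i, i + 1), clamping indices at the clip boundaries.

    Assumes an even window. Repeating the first/last frame lets the first
    and last gaps be interpolated too, with weaker motion context there.
    """
    half = window // 2
    windows = []
    for gap_start in range(num_frames - 1):
        idx = range(gap_start - half + 1, gap_start + half + 1)
        windows.append(tuple(min(max(i, 0), num_frames - 1) for i in idx))
    return windows


print(len(sliding_windows(49)))   # 46 windows: gaps (1, 2) ... (46, 47) only
print(len(padded_windows(49)))    # 48 windows: every gap, including (0, 1) and (47, 48)
print(padded_windows(49)[0])      # (0, 0, 1, 2)
```

With N frames and a 4-frame window there are N - 3 unpadded windows but N - 1 gaps, so exactly the first and last gaps are left without a prediction.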
Unreliable FPS readout causes error
When I try to interpolate a video, this error pops up:

  File "interpolate.py", line 120, in <module>
    videoTensor = video_to_tensor(input_video)
  File "interpolate.py", line 101, in video_to_tensor
    fps = md["video_fps"]
KeyError: 'video_fps'

I suspect it fails to read the frame rate for some reason.

This is one of the reasons I am asking for a manual input: https://github.com/tarun005/FLAVR/issues/4
enhancement
opened by n00mkrad 3
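A defensive version of the failing lookup from the "Unreliable FPS readout" report above is sketched below. It assumes md is a plain dict of decoder metadata, as the md["video_fps"] lookup in the traceback suggests; resolve_fps and its manual_fps argument are hypothetical and not part of FLAVR. A user-supplied value takes priority, and a missing readout produces a readable error instead of a bare KeyError.

```python
from typing import Mapping, Optional


def resolve_fps(metadata: Mapping[str, float], manual_fps: Optional[float] = None) -> float:
    """Pick the frame rate for the output video.

    Priority: an explicit user value, then the decoder's "video_fps"
    metadata entry, otherwise a clear error instead of a bare KeyError.
    """
    if manual_fps is not None:
        if manual_fps <= 0:
            raise ValueError("manual FPS must be positive")
        return float(manual_fps)
    fps = metadata.get("video_fps")
    if fps is None or fps <= 0:   # missing or unusable readout
        raise ValueError("could not read the input frame rate; please pass it explicitly")
    return float(fps)


print(resolve_fps({"video_fps": 29.97}))      # 29.97
print(resolve_fps({}, manual_fps=30))         # 30.0
```

Using dict.get instead of indexing is what turns the missing key into a branch that can fall back or explain itself.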
Unable to write out results
Hey,

I've managed to get FLAVR up and running, right up until the final stage. I'm using a directory containing a PNG sequence, which runs through the network successfully. But when it comes to writing the result out, I simply get:

Writing to in_2xmp4.mp4
in_2xmp4: No such file or directory
Traceback (most recent call last):
  File "interpolate.py", line 164, in <module>
    os.remove(output_video)
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'in_2xmp4'

I'm reading a sequence of PNGs from a directory using --is_folder, which is great. Is there a way to write out a sequence of PNGs rather than a video?
opened by frostedbrain 0
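Writing PNGs instead of a video, as asked in the "Unable to write out results" report above, is not something the README documents. A minimal sketch of the output side follows, assuming the interpolated frames are already available in memory (for example as HxWx3 uint8 arrays). write_frame_sequence is a hypothetical helper, and the actual encoder is passed in as a function; Pillow's Image.fromarray(frame).save(path) is one option.

```python
import os
from typing import Callable, Iterable, List, TypeVar

Frame = TypeVar("Frame")


def write_frame_sequence(frames: Iterable[Frame],
                         out_dir: str,
                         save: Callable[[Frame, str], None],
                         prefix: str = "frame_",
                         ext: str = ".png",
                         digits: int = 6) -> List[str]:
    """Save frames as out_dir/frame_000000.png, frame_000001.png, ...

    `save(frame, path)` does the actual encoding, e.g. with Pillow:
    lambda f, p: Image.fromarray(f).save(p) for an HxWx3 uint8 array.
    Zero-padded names keep the files in order when sorted by name.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, frame in enumerate(frames):
        path = os.path.join(out_dir, f"{prefix}{i:0{digits}d}{ext}")
        save(frame, path)
        paths.append(path)
    return paths
```

Keeping the encoder pluggable means the same loop works with OpenCV, Pillow, or any other image writer already installed.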
How to cascade different speed models?

Hi Tarun,

From issue #32 I know that different models can be cascaded for higher interpolation factors, for example cascading the 2x and 8x models for 16x interpolation. But how should the cascade be done? Do I first use the 2x model to generate a 2x slow-motion sequence and then apply the 8x model to that sequence?

opened by pango99 0
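The bookkeeping behind the cascade asked about above can be sketched as follows, assuming each pass interpolates every adjacent pair with its own factor: a 2x pass followed by an 8x pass over the 2x output places frames at the same evenly spaced timestamps as a single 16x pass. Here frames are represented by their timestamps and a linear blend stands in for the model, so this illustrates only the frame placement, not output quality (errors from the first pass can propagate into the second). All names are hypothetical.

```python
from typing import Callable, List, Sequence

Interp = Callable[[float, float, float], float]


def linear(a: float, b: float, t: float) -> float:
    """Stand-in for a model: blend two 'frames' at relative time t in (0, 1)."""
    return a + (b - a) * t


def interpolate_pass(frames: Sequence[float], factor: int, interp: Interp = linear) -> List[float]:
    """Insert factor - 1 frames between every adjacent pair."""
    if not frames:
        return []
    out: List[float] = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        out.extend(interp(a, b, k / factor) for k in range(1, factor))
    out.append(frames[-1])
    return out


def cascade(frames: Sequence[float], factors: Sequence[int]) -> List[float]:
    """Apply several interpolation passes one after another."""
    result = list(frames)
    for factor in factors:
        result = interpolate_pass(result, factor)
    return result


two_then_eight = cascade([0.0, 1.0], [2, 8])
direct_16x = cascade([0.0, 1.0], [16])
print(len(two_then_eight), len(direct_16x))   # 17 17
print(all(abs(x - y) < 1e-12 for x, y in zip(two_then_eight, direct_16x)))  # True
```

The factors multiply because the second pass treats every frame of the first pass's output, original or interpolated, as an input.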
DAVIS training or testing set for single frame interpolation

In https://github.com/tarun005/FLAVR/blob/main/dataset/Davis_test.py, do you use DAVIS's training or testing set? The paper says 2847 quintuples are generated in total, but I found the training set can generate 2849 quintuples, while the testing set can generate 963 quintuples.

opened by JingyunLiang 0
training issue about PSNR

Hi Tarun, excellent work on video interpolation! I tried running your code but ran into some trouble. My config is:

batch_size=2, beta1=0.9, beta2=0.99, checkpoint_dir='.', cuda=True, data_root='/home/Changchen/dataset./vimeo_septuplet', dataset='vimeo90K', exp_name='exp', joinType='concat', load_from=None, log_iter=60, loss='1*L1', lr=0.0002, max_epoch=50, model='unet_18', n_outputs=1, nbr_frame=4, nbr_width=1, num_gpu=1, num_workers=16, pretrained=None, random_seed=12345, resume=False, resume_exp=None, start_epoch=0, test_batch_size=1, upmode='transpose', use_tensorboard=False, val_freq=1

At the beginning, PSNR was normal at about 20, but it gradually decreased to about 14. Why does training seem to diverge? Thank you for any help!

opened by ss00atbupt 4
Training issue

Hi, thank you for sharing the code on GitHub. The code performs well in testing, but the PSNR stays at about 17 dB throughout training. What could be the reason?

opened by weiMytian 3
Training issue

Hi, I've been trying to train this network on an A100 GPU. However, since torch 1.5.0 doesn't support this GPU, I am forced to use torch 1.9.0. Training breaks with torch versions above 1.5.0, and I cannot find the reason. I have looked at the differences between the torch versions, but nothing explains why this happens. Basically, the model stays stuck at around 20 dB for the whole training run. I previously tested this code on a 1080 Ti with torch 1.5.0, and that worked fine, but because of memory constraints and training time the A100 would be the better option. Do you have any idea why this occurs, and any possible solutions?

Thanks

opened by issakh 4
