This is the official repository of XVFI (eXtreme Video Frame Interpolation)

Jihyong Oh

Last update: Dec 29, 2022


Overview

XVFI

This is the official repository of XVFI (eXtreme Video Frame Interpolation), https://arxiv.org/abs/2103.16206

Last Update: 20210607

We provide the training and test code along with the trained weights and the dataset (train+test) used for XVFI. If you find this repository useful, please consider citing our paper.

Examples of the VFI (x8 Multi-Frame Interpolation) results on X-TEST

The 4K@30fps input frames are interpolated to 4K@240fps. All results are encoded at 30fps so that they play back as x8 slow motion, and are spatially downscaled to keep file sizes small. All methods are trained on X-TRAIN.

X4K1000FPS
Requirements
Test
Test_Custom
Training
Reference
Contact

X4K1000FPS

Dataset of high-resolution (4096×2160), high-fps (1000fps) video frames with extreme motion.

Some examples from the X4K1000FPS dataset, whose frames are captured at 1000 fps in 4K resolution. The dataset contains various scenes with extreme motion. (Displayed as spatiotemporally subsampled .gif files)

We provide our X4K1000FPS dataset, which consists of X-TEST and X-TRAIN. Please refer to our main paper and supplementary material for the details of the dataset. You can download the dataset from this dropbox link.

X-TEST consists of 15 video clips, each 33 frames long, captured in 4K at 1000 fps. It follows the directory format below:

├──── YOUR_DIR/
    ├──── test/
       ├──── Type1/
          ├──── TEST01/
             ├──── 0000.png
             ├──── ...
             └──── 0032.png
          ├──── TEST02/
             ├──── 0000.png
             ├──── ...
             └──── 0032.png
          ├──── ...
       ├──── ...
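After decoding, the layout above can be sanity-checked with a short script. This is a minimal sketch, not part of the repository: the function name `find_incomplete_clips` is hypothetical, and it assumes the `test/<Type>/<Clip>/*.png` hierarchy shown above with 33 frames per clip.

```python
from pathlib import Path


def find_incomplete_clips(root, frames_per_clip=33):
    """Return {clip_dir: png_count} for every clip whose frame count is unexpected.

    Assumes the layout <root>/test/<Type>/<Clip>/<frame>.png shown above.
    """
    problems = {}
    test_dir = Path(root) / "test"
    for type_dir in sorted(p for p in test_dir.iterdir() if p.is_dir()):
        for clip_dir in sorted(p for p in type_dir.iterdir() if p.is_dir()):
            count = len(list(clip_dir.glob("*.png")))
            if count != frames_per_clip:
                problems[str(clip_dir)] = count
    return problems
```

Calling `find_incomplete_clips("YOUR_DIR")` returns an empty dict when every clip holds the expected number of frames.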

X-TRAIN consists of 4,408 clips from 110 scenes of various types. Each clip is 65 frames long at 1000 fps, and each frame is a 768x768 crop from a 4K frame. It follows the directory format below:

├──── YOUR_DIR/
    ├──── train/
       ├──── 002/
          ├──── occ008.320/
             ├──── 0000.png
             ├──── ...
             └──── 0064.png
          ├──── occ008.322/
             ├──── 0000.png
             ├──── ...
             └──── 0064.png
          ├──── ...
       ├──── ...

After downloading the files from the link, decompress encoded_test.tar.gz and encoded_train.tar.gz. The resulting .mp4 files can be decoded into .png files by running mp4_decoding.py. Please follow the instructions written in mp4_decoding.py.
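The repository's mp4_decoding.py is the supported way to decode the clips. Purely to illustrate what such a decoding step involves, the sketch below builds an ffmpeg command that writes zero-padded, zero-based .png frames. It is an assumption-laden example and not the repository's script: the helper name `build_decode_command` is hypothetical, ffmpeg must be installed separately, and the one-folder-per-video output layout is only a guess at the structure shown above.

```python
from pathlib import Path


def build_decode_command(mp4_path, out_root):
    """Build an ffmpeg command that decodes one .mp4 into numbered .png frames.

    Frames are written to <out_root>/<video name>/0000.png, 0001.png, ...
    (hypothetical layout; check mp4_decoding.py for the real one).
    """
    mp4_path = Path(mp4_path)
    out_dir = Path(out_root) / mp4_path.stem
    return [
        "ffmpeg",
        "-i", str(mp4_path),
        "-start_number", "0",      # first output frame is 0000.png
        str(out_dir / "%04d.png"),  # zero-padded four-digit frame index
    ]


# Usage (the output directory must exist before ffmpeg runs):
#   import subprocess
#   cmd = build_decode_command("clip.mp4", "decoded")
#   Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
#   subprocess.run(cmd, check=True)
```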

Requirements

Our code is implemented using PyTorch 1.7 and was tested under the following settings:

Python 3.7
PyTorch 1.7.1
CUDA 10.2
cuDNN 7.6.5
NVIDIA TITAN RTX GPU
Ubuntu 16.04 LTS

Caution: "nn.functional.interpolate" and "nn.functional.grid_sample" in PyTorch 1.7 take an "align_corners" option, so we recommend following our settings. Using other PyTorch versions may yield different performance.
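To see why the "align_corners" convention matters, the following pure-Python sketch shows how an output pixel index maps to an input coordinate under the two conventions for 1D linear resizing. It is illustrative only: `source_coord` is a hypothetical helper, not a PyTorch function, and real implementations additionally clamp coordinates at the borders.

```python
def source_coord(dst_index, in_size, out_size, align_corners):
    """Map an output pixel index to a fractional input coordinate (1D, linear)."""
    if align_corners:
        # The first and last pixels of input and output coincide exactly.
        if out_size == 1:
            return 0.0
        return dst_index * (in_size - 1) / (out_size - 1)
    # Pixels are treated as unit areas, so pixel centers are aligned instead.
    return (dst_index + 0.5) * in_size / out_size - 0.5


# Upsampling 2 pixels to 4: the two conventions sample different positions.
for i in range(4):
    print(i, source_coord(i, 2, 4, True), source_coord(i, 2, 4, False))
```

Because the sampled positions differ, a network trained under one convention sees slightly shifted features under the other, which is consistent with the recommendation to keep the tested settings.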

Test

Quick Start for X-TEST (x8 Multi-Frame Interpolation as in Table 2)

Download the source code to a directory of your choice.
First download our X-TEST test dataset by following the above section 'X4K1000FPS'.
Download the pre-trained weights, which were trained on X-TRAIN, from this link and place them in /checkpoint_dir/XVFInet_X4K1000FPS_exp1.

XVFI
└── checkpoint_dir
   └── XVFInet_X4K1000FPS_exp1
       ├── XVFInet_X4K1000FPS_exp1_latest.pt

Run main.py with the following options in parse_args:

python main.py --gpu 0 --phase 'test' --exp_num 1 --dataset 'X4K1000FPS' --module_scale_factor 4 --S_tst 5 --multiple 8

==> This should yield (PSNR/SSIM/tOF) = (30.12/0.870/2.15).

python main.py --gpu 0 --phase 'test' --exp_num 1 --dataset 'X4K1000FPS' --module_scale_factor 4 --S_tst 3 --multiple 8

==> This should yield (PSNR/SSIM/tOF) = (28.86/0.858/2.67).

Description

After running with the above test options, the result images are saved in /test_img_dir/XVFInet_X4K1000FPS_exp1, and the PSNR/SSIM/tOF results for each test clip are written to "total_metrics.csv" in the same folder.
Our proposed XVFI-Net can start from any downscaled input upward by adjusting '--S_tst', which sets the number of scales used for inference and can be chosen according to the input resolution or the motion magnitude.
You can get any Multi-Frame Interpolation (x M) result by adjusting '--multiple'.
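As a rough sketch of what '--multiple' M implies, assume the two input frames sit at t=0 and t=1 and the M-1 intermediate frames are evenly spaced in time. The helper names below are hypothetical and not taken from main.py. The mapping to ground-truth indices further assumes that the inputs are frames 0000 and 0032 of a 33-frame X-TEST clip.

```python
from fractions import Fraction


def intermediate_times(multiple):
    """Return the M-1 evenly spaced time steps strictly between 0 and 1."""
    return [Fraction(k, multiple) for k in range(1, multiple)]


def ground_truth_indices(multiple, last_index=32):
    """Map each time step to a frame index in a clip spanning 0..last_index.

    Raises ValueError when a time step falls between two captured frames.
    """
    indices = []
    for t in intermediate_times(multiple):
        position = t * last_index
        if position.denominator != 1:
            raise ValueError(f"t={t} has no exact frame in 0..{last_index}")
        indices.append(int(position))
    return indices


print(ground_truth_indices(8))  # [4, 8, 12, 16, 20, 24, 28]
```

Under these assumptions, x8 interpolation compares the seven predicted frames against every fourth captured frame between the two inputs.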

Quick Start for Vimeo90K (as in Fig. 8)

Download the source code to a directory of your choice.
First download the Vimeo90K dataset from this link (including 'tri_trainlist.txt') and place it in /vimeo_triplet.

XVFI
└── vimeo_triplet
       ├── sequences
       ├── readme.txt
       ├── tri_testlist.txt
       └── tri_trainlist.txt

Download the pre-trained weights (XVFI-Net_v), which were trained on Vimeo90K, from this link and place them in /checkpoint_dir/XVFInet_Vimeo_exp1.

XVFI
└── checkpoint_dir
   └── XVFInet_Vimeo_exp1
       ├── XVFInet_Vimeo_exp1_latest.pt

Run main.py with the following options in parse_args:

python main.py --gpu 0 --phase 'test' --exp_num 1 --dataset 'Vimeo' --module_scale_factor 2 --S_tst 1 --multiple 2

==> This should yield PSNR = 35.07 on Vimeo90K.

Description

After running with the above test options, the result images are saved in /test_img_dir/XVFInet_Vimeo_exp1.
A few lines of code placed just before 'def main()' are provided for convenience when running with the Vimeo option.
The SSIM result of 0.9760 in Fig. 8 was measured with the MATLAB ssim function after following the above guide, for a fair comparison, because other SOTA methods did so. We also provide the MATLAB file "compare_psnr_ssim.m" to obtain it.
Note that the current version of the XVFI paper contains a typo, "S_trn and S_tst are set to 2"; the correct value is 1 (not 2). We apologize for the inconvenience.
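For reference, PSNR is conventionally defined as 10·log10(MAX² / MSE). The pure-Python sketch below computes it on flat lists of pixel values. The function name `psnr` and the 8-bit peak value of 255 are assumptions for illustration, and this is not necessarily how main.py or compare_psnr_ssim.m compute the reported numbers.

```python
import math


def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Details such as color space, data range, and border handling can shift the result, which is one reason the SSIM above was measured with the same MATLAB function used by the compared methods.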

Test_Custom

Quick Start for your own video data ('--custom_path') for any Multi-Frame Interpolation (x M)

Download the source code to a directory of your choice.
First prepare your own video datasets in /custom_path using the following hierarchy:

XVFI
└── custom_path
   ├── scene1
       ├── 'xxx.png'
       ├── ...
       └── 'xxx.png'
   ...
   
   ├── sceneN
       ├── 'xxxxx.png'
       ├── ...
       └── 'xxxxx.png'

Download the pre-trained weights trained on X-TRAIN or Vimeo90K as described above.
Run main.py with the following options in parse_args (e.g., x8 Multi-Frame Interpolation):

# For the model trained on X-TRAIN
python main.py --gpu 0 --phase 'test_custom' --exp_num 1 --dataset 'X4K1000FPS' --module_scale_factor 4 --S_tst 5 --multiple 8 --custom_path './custom_path'

# For the model trained on Vimeo90K
python main.py --gpu 0 --phase 'test_custom' --exp_num 1 --dataset 'Vimeo' --module_scale_factor 2 --S_tst 1 --multiple 8 --custom_path './custom_path'

Description

Our proposed XVFI-Net can start from any downscaled input upward by adjusting '--S_tst', which sets the number of scales used for inference and can be chosen according to the input resolution or the motion magnitude.
You can get any Multi-Frame Interpolation (x M) result by adjusting '--multiple'.
Only the '.png' format is supported.
Since we cannot cover every possible naming rule for custom frames, please make sure your own frames are sorted properly.
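One common pitfall is that plain string sorting puts 'frame10.png' before 'frame2.png'. The sketch below orders frame names by their embedded numbers instead. It is a hypothetical helper, not part of the repository, and how main.py itself orders the files is not assumed here. Renaming frames to zero-padded names such as 0000.png, 0001.png following this order is one way to make the ordering unambiguous.

```python
import re


def natural_key(name):
    """Split a name into text and integer parts so that 2 sorts before 10."""
    # re.split with a capturing group alternates text, digits, text, ...
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def sorted_frames(names):
    """Return only the .png names, in natural (numeric-aware) order."""
    return sorted((n for n in names if n.lower().endswith(".png")),
                  key=natural_key)


print(sorted_frames(["frame10.png", "frame2.png", "frame1.png", "notes.txt"]))
# ['frame1.png', 'frame2.png', 'frame10.png']
```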

Training

Quick Start for X-TRAIN

Download the source code to a directory of your choice.
First download our X-TRAIN train/val/test datasets by following the above section 'X4K1000FPS' and place them as below:

XVFI
└── X4K1000FPS
      ├──  train
          ├── 002
          ├── ...
          └── 172
      ├──  val
          ├── Type1
          ├── Type2
          ├── Type3
      ├──  test
          ├── Type1
          ├── Type2
          ├── Type3

Run main.py with the following options in parse_args:

python main.py --phase 'train' --exp_num 1 --dataset 'X4K1000FPS' --module_scale_factor 4 --S_trn 3 --S_tst 5

Quick Start for Vimeo90K

Download the source code to a directory of your choice.
First download the Vimeo90K dataset from this link (including 'tri_trainlist.txt') and place it in /vimeo_triplet.

XVFI
└── vimeo_triplet
       ├── sequences
       ├── readme.txt
       ├── tri_testlist.txt
       └── tri_trainlist.txt

Run main.py with the following options in parse_args:

python main.py --phase 'train' --exp_num 1 --dataset 'Vimeo' --module_scale_factor 2 --S_trn 1 --S_tst 1

Description

You can freely adjust the other arguments in the parser of main.py.

Reference

Hyeonjun Sim*, Jihyong Oh*, and Munchurl Kim "XVFI: eXtreme Video Frame Interpolation", https://arxiv.org/abs/2103.16206, 2021. (* equal contribution)

BibTeX

@article{sim2021xvfi,
  title={XVFI: eXtreme Video Frame Interpolation},
  author={Sim, Hyeonjun and Oh, Jihyong and Kim, Munchurl},
  journal={arXiv preprint arXiv:2103.16206},
  year={2021}
}

Contact

If you have any questions, please send an email to either flhy5836@kaist.ac.kr or jhoh94@kaist.ac.kr.

License

The source code and datasets can be freely used for research and education only. Any commercial use requires formal permission first.

Comments

Longer sequences for validation / testing?

Hi, thanks to the authors for their impressive work.

I have a question on the data for validation / testing.

The currently available version for the public seems to be aimed for testing environments which take two frames as the input, namely the 0-th and 32-th frame of each scene.

However in that case, I'm afraid methods such as QVI (NeurIPS '19) which require more than two input frames cannot be compared fairly.

We could use more intermediate frames as the input (e.g. 0-th, 16-th, 32-th frame for a framework which requires 3 input frames), but this may lead to a slightly different scenario (different fps settings), considering that the intended testing environment is interpolating from 30fps to 240fps.

More importantly, I've tried experimenting on interpolating 120fps to 960(or 1000)fps, using the intermediate frames as input, but seems like the task gets too easy and all methods that I've tried perform very well, making it hard to compare which is better. For these reasons I think it would rather be better with a longer sequence...

According to the example videos on the very first figure of this repository, it seems like the original video sequence for validation / testing seems to be longer than the public version.

Would it be possible for you to share a version of a longer sequence to the public?

opened by JHLew 6
A dataset question

I have downloaded the X4K1000FPS dataset from your link. However, the data in the subfolders is in .mp4 format, while your readme.md shows it in .png format. How can I convert the dataset from .mp4 to .png? The error is as follows: I found that the module that builds the train set doesn't work. When printing sample_paths, it tells me 'sample_paths=[]', which means sample_paths is empty.

opened by syd1997 3
Equations (1) and (2) in paper

Hi,

Thank you for sharing the dataset and I look forward to see the code as well.

Regarding the Equations (1) and (2) in the paper: since the linear motion approximation combines the bidirectional flow, shouldn't the division in the equation be separated into two terms, such that the normalization is only applied to the appropriate flow. Specifically, -F_0t should only be normalized with w_0 in Equation (1) and not w_1. Analogous conditions apply for Equation (2).

opened by ambitionforcomputervision 3
can't download dataset successfully
Thank you for sharing your code! When I try to download your training dataset from https://www.dropbox.com/sh/duisote638etlv2/AABJw5Vygk94AWjGM4Se0Goza?dl=0&preview=encoded_train.tar.gz, I cannot get it successfully.

I tried to download it in Chrome directly. After it finished, I ran "tar zxvf encoded_train.tar.gz" and it said "gzip: stdin: unexpected end of file. tar: Unexpected EOF in archive. tar: Unexpected EOF in archive tar: Error is not recoverable: exiting now". Then I checked the size of the tar file: only 6.2GB, not 14GB, so this method does not work.

Then I tried to use wget to download the dataset, but it always stopped in the middle of the progress bar. I think the Dropbox server is not stable.

So, could you tell me how I can download the dataset successfully? Or would you mind making a copy and putting it on Google Drive or Baidu cloud disk? Thank you very much!
opened by zhouzhengguang 2
Cannot download the pretrained models from Dropbox~

Hi, thank you for open-sourcing this. I found that the pretrained model cannot be downloaded from Dropbox, which may be caused by an internet firewall. Could you please upload the pretrained models to another service (like OneDrive or Baidu Netdisk)? Looking forward to your reply~

opened by liangyang-mt 1
Questions about Shared Parameters

Hi! Congratulations! I have a question: why do you share parameters between those sub-networks? Is there any motivation other than reducing the number of parameters, or perhaps some theory, explanation, or experiment on it? I would appreciate it if you could reply as soon as you can.

opened by nemoHy 3
Very inefficient inference

Hello, the inference code seems to have rather severe bottlenecks - The CUDA usage is only around 25%.

RIFE and other interpolation networks usually have a usage of 80-95%.

Are any optimizations planned to reduce this overhead?

opened by n00mkrad 1
some questions
Thanks for your wonderful work!

I have some questions:

Have you compared it with RIFE?

Does the method use explicit optical flow supervision?

some bad cases "python main.py --gpu 0 --phase 'test_custom' --exp_num 1 --dataset 'X4K1000FPS' --module_scale_factor 4 --S_tst 5 --multiple 2 --custom_path ./test_img_dir/test4" input image1 input image2 result
opened by KevenLee 6