Unsupervised Video Interpolation using Cycle Consistency

Project | Paper | YouTube

Unsupervised Video Interpolation using Cycle Consistency
Fitsum A. Reda, Deqing Sun^*, Aysegul Dundar, Mohammad Shoeybi, Guilin Liu, Kevin J. Shih, Andrew Tao, Jan Kautz, Bryan Catanzaro
NVIDIA Corporation
In International Conference on Computer Vision (ICCV) 2019.
( * Currently affiliated with Google. )

Installation

# Get unsupervised video interpolation source codes
git clone https://github.com/NVIDIA/unsupervised-video-interpolation.git
cd unsupervised-video-interpolation
mkdir pretrained_models

# Build Docker Image
docker build -t unsupervised-video-interpolation -f Dockerfile .

If you prefer not to use docker, you can manually install the following requirements:

An NVIDIA GPU and CUDA 9.0 or higher. Some operations have GPU-only implementations.
PyTorch (>= 1.0)
Python 3
numpy
scikit-image
imageio
pillow
tqdm
tensorboardX
natsort
ffmpeg
torchvision
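
For a manual install, a quick check of the Python requirements listed above can save a failed run later. The following is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the import names are assumed from the packages' usual conventions (scikit-image is imported as `skimage`, pillow as `PIL`). ffmpeg is a command-line tool rather than a Python module, so it is looked up on the PATH instead.

```python
import importlib.util
import shutil

# Import names for the Python requirements listed above (assumed mapping:
# scikit-image -> skimage, pillow -> PIL).
REQUIRED_MODULES = [
    "torch", "torchvision", "numpy", "skimage", "imageio",
    "PIL", "tqdm", "tensorboardX", "natsort",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names in `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing Python packages:", ", ".join(missing))
    else:
        print("All listed Python packages were found.")
    # ffmpeg is an executable, so check the PATH rather than importing it.
    if shutil.which("ffmpeg") is None:
        print("ffmpeg was not found on PATH.")
```

This only confirms that the packages can be located; it does not verify versions, CUDA availability, or GPU support.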

To propose a model or change for inclusion, please submit a pull request.

Multiple GPU training and mixed precision training are supported, and the code provides examples for training and inference. For more help, type

python3 train.py --help

Network Architectures

This repo currently supports Super SloMo. Other video interpolation architectures, for instance DVF or SepConv, can be integrated with minimal changes.

Pre-trained Models

We've included pre-trained models trained with cycle consistency (CC) alone, or with cycle consistency plus pseudo-supervised (CC + PS) losses.
Download the checkpoints to a folder named pretrained_models.

Supervised Baseline Weights

pretrained_models/baseline_superslomo_adobe.pth (Losses with Paired Ground-Truth)
pretrained_models/baseline_superslomo_adobe+youtube.pth (Losses with Paired Ground-Truth)

Unsupervised Finetuned Weights

pretrained_models/unsupervised_random2slowflow.pth (CC only)
pretrained_models/unsupervised_adobe2slowflow.pth (CC+PS)
pretrained_models/unsupervised_adobe+youtube2slowflow.pth (CC+PS)
pretrained_models/unsupervised_random2sintel.pth (CC only)
pretrained_models/unsupervised_adobe2sintel.pth (CC+PS)
pretrained_models/unsupervised_adobe+youtube2sintel.pth (CC+PS)

Fully Unsupervised Weights for UCF101 evaluation

pretrained_models/fully_unsupervised_adobe30fps.pth (CC only)
pretrained_models/fully_unsupervised_battlefield30fps.pth (CC only)
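
Before running the evaluation commands, it can help to confirm which of the checkpoints listed above have actually been downloaded. This is a small sketch under the assumption that the files keep the names shown in this section; the helper `missing_checkpoints` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Checkpoint file names as listed in this section.
CHECKPOINTS = [
    "baseline_superslomo_adobe.pth",
    "baseline_superslomo_adobe+youtube.pth",
    "unsupervised_random2slowflow.pth",
    "unsupervised_adobe2slowflow.pth",
    "unsupervised_adobe+youtube2slowflow.pth",
    "unsupervised_random2sintel.pth",
    "unsupervised_adobe2sintel.pth",
    "unsupervised_adobe+youtube2sintel.pth",
    "fully_unsupervised_adobe30fps.pth",
    "fully_unsupervised_battlefield30fps.pth",
]


def missing_checkpoints(folder="pretrained_models"):
    """Return the listed checkpoint names that are not files in `folder`."""
    folder = Path(folder)
    return [name for name in CHECKPOINTS if not (folder / name).is_file()]


if __name__ == "__main__":
    for name in missing_checkpoints():
        print("not downloaded yet:", name)
```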

Data Loaders

We use VideoInterp and CycleVideoInterp (in datasets) dataloaders for all frame sequences, i.e. Adobe, YouTube, SlowFlow, Sintel, and UCF101.

We split the SlowFlow dataset into two disjoint subsets: a low-FPS training subset (3.4K frames) and a high-FPS test subset (414 frames). We form the test set by selecting the first nine frames of each of the 46 clips, and the training set by temporally sub-sampling the remaining frames from 240 fps to 30 fps. During evaluation, our models take the first and ninth frames of each test clip as input and interpolate the seven intermediate frames. We follow a similar procedure for Sintel-1008fps, but interpolate 41 intermediate frames, i.e., a frame-rate conversion from 24 fps to 1008 fps. Note that since SlowFlow and Sintel are high-resolution, we downsample all frames isotropically by a factor of 2.
All training and evaluation presented in the paper are done on the spatially downsampled sequences.
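
The frame counts in the split described above follow from simple arithmetic, which the sketch below spells out. It is illustrative only: `num_intermediate_frames` and `split_clip` are hypothetical helpers, and the assumption that training sub-sampling starts at the frame right after the nine test frames is ours, not something the text specifies.

```python
def num_intermediate_frames(low_fps, high_fps):
    """Frames to interpolate between two consecutive low-FPS frames."""
    if high_fps % low_fps != 0:
        raise ValueError("high_fps must be a multiple of low_fps")
    return high_fps // low_fps - 1


def split_clip(frames, test_len=9, stride=8):
    """Split one high-FPS clip into a test part and a sub-sampled train part.

    The first `test_len` frames form the test sample; the remaining frames
    are kept every `stride` frames (240 fps / 30 fps = 8). Starting the
    sub-sampling at index `test_len` is an assumption.
    """
    return frames[:test_len], frames[test_len::stride]


if __name__ == "__main__":
    print(num_intermediate_frames(30, 240))   # SlowFlow: 30 -> 240 fps
    print(num_intermediate_frames(24, 1008))  # Sintel: 24 -> 1008 fps
    print(46 * 9)                             # test frames over 46 clips
```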

For UCF101, we simply use the test set provided here.

Generating Interpolated Frames or Videos

--write_video and --write_images, if enabled, will create an interpolated video and interpolated frame sequences, respectively.

# Example creation of interpolated videos, where we interleave low FPS input frames with one or more interpolated intermediate frames.
python3 eval.py --model CycleHJSuperSloMo --num_interp 7 --flow_scale 2 --val_file ${/path/to/input/sequences} \
    --name ${video_name} --save ${/path/to/output/folder} --post_fix ${output_image_tag} \
    --resume ${/path/to/pre-trained/model} --write_video

If input sequences for interpolation do not contain ground-truth intermediate frames, add --val_sample_rate 0 and --val_step_size 1 to the example script above.
For a simple test on two input frames, set --val_file to the folder containing both frames, and set --val_sample_rate 0, --val_step_size 1.
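
The interleaving mentioned in the example above, where low-FPS input frames alternate with the interpolated intermediate frames, can be sketched as follows. This is a conceptual illustration with a hypothetical `interleave` helper; it is not the code that eval.py uses to write videos.

```python
def interleave(inputs, interpolated):
    """Merge N input frames with N-1 groups of intermediate frames.

    `inputs` is a list of N frames; `interpolated[i]` holds the frames
    generated between inputs[i] and inputs[i + 1].
    """
    if len(interpolated) != len(inputs) - 1:
        raise ValueError("need exactly one group of frames per input pair")
    output = []
    for frame, between in zip(inputs, interpolated):
        output.append(frame)
        output.extend(between)
    output.append(inputs[-1])
    return output


if __name__ == "__main__":
    frames = interleave(["A", "B", "C"], [["a1", "a2"], ["b1", "b2"]])
    print(frames)
```

With N input frames and k intermediate frames per pair, the output sequence has N + (N - 1) * k frames.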

Images : Results and Comparisons

Inference for Unsupervised Models

UCF101: A total of 379 folders, each with three frames, with the middle frame being the ground-truth for a single frame interpolation.

# Evaluation of model trained with CC alone on Adobe-30fps dataset
# PSNR: 34.47, SSIM: 0.946, IE: 5.50
python3 eval.py --model CycleHJSuperSloMo --num_interp 1 --flow_scale 1 --val_file /path/to/ucf/root \
    --resume ./pretrained_models/fully_unsupervised_adobe30fps.pth

# Evaluation of model trained with CC alone on Battlefield-30fps dataset
# PSNR: 34.55, SSIM: 0.947, IE: 5.38
python3 eval.py --model CycleHJSuperSloMo --num_interp 1 --flow_scale 1 --val_file /path/to/ucf/root \
    --resume ./pretrained_models/fully_unsupervised_battlefield30fps.pth
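
For readers unfamiliar with the metrics quoted in the comments above, here is a minimal sketch of PSNR and IE on flat lists of pixel values. It assumes 8-bit intensities (maximum 255) and assumes IE denotes the root-mean-square pixel difference; eval.py may compute these differently (for example in value range or in how frames are averaged), and SSIM is omitted because it needs windowed statistics.

```python
import math


def mse(pred, target):
    """Mean squared error between two equally sized pixel sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    error = mse(pred, target)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)


def interpolation_error(pred, target):
    """Root-mean-square pixel difference (assumed definition of IE)."""
    return math.sqrt(mse(pred, target))


if __name__ == "__main__":
    print(psnr([10, 20, 30], [12, 18, 33]))
    print(interpolation_error([10, 20, 30], [12, 18, 33]))
```

Higher PSNR and lower IE both indicate a closer match to the ground-truth frame.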

SlowFlow: A total of 46 folders, each with nine frames, with the intermediate seven frames being ground-truths for a 30->240FPS multi-frame interpolation.

# Evaluation of model trained with CC alone on SlowFlow-30fps train split
# PSNR: 32.35, SSIM: 0.886, IE: 6.78
python3 eval.py --model CycleHJSuperSloMo --num_interp 7 --flow_scale 2 --val_file /path/to/SlowFlow/val \
    --resume ./pretrained_models/unsupervised_random2slowflow.pth

# Evaluation of model finetuned with CC+PS losses on SlowFlow-30fps train split.
# Model pre-trained with supervision on Adobe-240fps.
# PSNR: 33.05, SSIM: 0.890, IE: 6.62
python3 eval.py --model CycleHJSuperSloMo --num_interp 7 --flow_scale 2 --val_file /path/to/SlowFlow/val \
    --resume ./pretrained_models/unsupervised_adobe2slowflow.pth

# Evaluation of model finetuned with CC+PS losses on SlowFlow-30fps train split.
# Model pre-trained with supervision on Adobe+YouTube-240fps.
# PSNR: 33.20, SSIM: 0.891, IE: 6.56
python3 eval.py --model CycleHJSuperSloMo --num_interp 7 --flow_scale 2 --val_file /path/to/SlowFlow/val \
    --resume ./pretrained_models/unsupervised_adobe+youtube2slowflow.pth

Sintel: A total of 13 folders, each with 43 frames, with the intermediate 41 frames being ground-truths for a 24->1008FPS multi-frame interpolation.

We use the same commands as for SlowFlow, but with `--num_interp 41`
and the corresponding `--resume *2sintel.pth` pre-trained models. This should reproduce the numbers presented in our paper.
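
The Sintel evaluation described above can be spelled out by generating one command per Sintel checkpoint. This is a sketch only: the helper `sintel_eval_command` is hypothetical, the validation path `/path/to/Sintel/val` is a placeholder, and `--flow_scale 2` is carried over from the SlowFlow commands on the assumption that it applies unchanged.

```python
import shlex

# Sintel checkpoints from the pre-trained model list.
SINTEL_CHECKPOINTS = [
    "unsupervised_random2sintel.pth",
    "unsupervised_adobe2sintel.pth",
    "unsupervised_adobe+youtube2sintel.pth",
]


def sintel_eval_command(checkpoint, val_file="/path/to/Sintel/val"):
    """Build the eval.py command line for one Sintel checkpoint."""
    args = [
        "python3", "eval.py",
        "--model", "CycleHJSuperSloMo",
        "--num_interp", "41",
        "--flow_scale", "2",  # assumed to match the SlowFlow commands
        "--val_file", val_file,
        "--resume", f"./pretrained_models/{checkpoint}",
    ]
    return shlex.join(args)


if __name__ == "__main__":
    for name in SINTEL_CHECKPOINTS:
        print(sintel_eval_command(name))
```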

Inference for Supervised Baseline Models

UCF101: A total of 379 folders, each with three frames, with the middle frame being the ground-truth for a single frame interpolation.

# Evaluation of model trained with paired-GT on Adobe-240fps dataset
# PSNR: 34.63, SSIM: 0.946, IE: 5.48
python3 eval.py --model HJSuperSloMo --num_interp 1 --flow_scale 1 --val_file /path/to/ucf/root \
    --resume ./pretrained_models/baseline_superslomo_adobe.pth

SlowFlow: A total of 46 folders, each with nine frames, with the intermediate seven frames being ground-truths for a 30->240FPS multi-frame interpolation.

# Evaluation of model trained with paired-GT on Adobe-240fps dataset
# PSNR: 32.84, SSIM: 0.887, IE: 6.67
python3 eval.py --model HJSuperSloMo --num_interp 7 --flow_scale 2 --val_file /path/to/SlowFlow/val \
    --resume ./pretrained_models/baseline_superslomo_adobe.pth

# Evaluation of model trained with paired-GT on Adobe+YouTube-240fps dataset
# PSNR: 33.13, SSIM: 0.889, IE: 6.63
python3 eval.py --model HJSuperSloMo --num_interp 7 --flow_scale 2 --val_file /path/to/SlowFlow/val \
    --resume ./pretrained_models/baseline_superslomo_adobe+youtube.pth

Sintel: We use the same commands as for SlowFlow, but with --num_interp 41.

Training and Reproducing Our Results

# CC alone: Fully unsupervised training on SlowFlow and evaluation on SlowFlow
# SlowFlow/val target PSNR: 32.35, SSIM: 0.886, IE: 6.78
python3 -m torch.distributed.launch --nproc_per_node=16 train.py --model CycleHJSuperSloMo \
    --flow_scale 2.0 --batch_size 2 --crop_size 384 384 --print_freq 1 --dataset CycleVideoInterp \
    --step_size 1 --sample_rate 0 --num_interp 7 --val_num_interp 7 --skip_aug --save_freq 20 --start_epoch 0 \
    --train_file /path/to/SlowFlow/train --val_file /path/to/SlowFlow/val --name unsupervised_slowflow --save /path/to/output

# --nproc_per_node=16, we use a total of 16 V100 GPUs over two nodes.

# CC + PS: Unsupervised fine-tuning on SlowFlow with a baseline model pre-trained on Adobe+YouTube-240fps.
# SlowFlow/val target PSNR: 33.20, SSIM: 0.891, IE: 6.56
python3 -m torch.distributed.launch --nproc_per_node=16 train.py --model CycleHJSuperSloMo \
    --flow_scale 2.0 --batch_size 2 --crop_size 384 384 --print_freq 1 --dataset CycleVideoInterp \
    --step_size 1 --sample_rate 0 --num_interp 7 --val_num_interp 7 --skip_aug --save_freq 20 --start_epoch 0 \
    --train_file /path/to/SlowFlow/train --val_file /path/to/SlowFlow/val --name finetune_slowflow \
    --save /path/to/output --resume ./pretrained_models/baseline_superslomo_adobe+youtube.pth

# Supervised baseline training on Adobe240-fps and evaluation on SlowFlow
# SlowFlow/val target PSNR: 32.84, SSIM: 0.887, IE: 6.67
python3 -m torch.distributed.launch --nproc_per_node=16 train.py --model HJSuperSloMo \
    --flow_scale 2.0 --batch_size 2 --crop_size 352 352 --print_freq 1 --dataset VideoInterp \
    --num_interp 7 --val_num_interp 7 --skip_aug --save_freq 20 --start_epoch 0 --stride 32 \
    --train_file /path/to/Adobe-240fps/train --val_file /path/to/SlowFlow/val --name supervised_adobe \
    --save /path/to/output
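
When fewer GPUs are available than in the commands above, the launch line needs a different --nproc_per_node value. The sketch below assembles the CC-only training command for a given GPU count. The helper names are hypothetical, and the effective batch size calculation assumes that --batch_size is applied per process, which the text does not confirm; results obtained with a different GPU count may not match the reported numbers.

```python
import shlex


def cc_train_command(nproc_per_node, train_file, val_file, save_dir,
                     name="unsupervised_slowflow", batch_size=2):
    """Build the CC-only SlowFlow training command for a given GPU count."""
    args = [
        "python3", "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc_per_node}", "train.py",
        "--model", "CycleHJSuperSloMo",
        "--flow_scale", "2.0", "--batch_size", str(batch_size),
        "--crop_size", "384", "384", "--print_freq", "1",
        "--dataset", "CycleVideoInterp",
        "--step_size", "1", "--sample_rate", "0",
        "--num_interp", "7", "--val_num_interp", "7",
        "--skip_aug", "--save_freq", "20", "--start_epoch", "0",
        "--train_file", train_file, "--val_file", val_file,
        "--name", name, "--save", save_dir,
    ]
    return shlex.join(args)


def effective_batch_size(nproc_per_node, batch_size=2):
    """Total samples per step, assuming --batch_size is per process."""
    return nproc_per_node * batch_size


if __name__ == "__main__":
    print(cc_train_command(4, "/path/to/SlowFlow/train",
                           "/path/to/SlowFlow/val", "/path/to/output"))
    print(effective_batch_size(16), effective_batch_size(4))
```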

Reference

If you find this implementation useful in your work, please acknowledge it appropriately and cite the paper or code accordingly:

@InProceedings{Reda_2019_ICCV,
author = {Fitsum A Reda and Deqing Sun and Aysegul Dundar and Mohammad Shoeybi and Guilin Liu and Kevin J Shih and Andrew Tao and Jan Kautz and Bryan Catanzaro},
title = {Unsupervised Video Interpolation Using Cycle Consistency},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019},
url={https://nv-adlr.github.io/publication/2019-UnsupervisedVideoInterpolation}
}

We encourage people to contribute to our code base, provide suggestions, and report issues or propose solutions via merge requests. We hope this repo is useful.

Acknowledgments

Parts of the code were inspired by NVIDIA/flownet2-pytorch, ClementPinard/FlowNetPytorch, and avinashpaliwal/Super-SloMo.

We would also like to thank Huaizu Jiang.

Coding style

4 spaces for indentation rather than tabs
80 character line length
PEP8 formatting

The performance in UCF101 is bad..

I use this command to evaluate the pre-trained model on UCF101, but the PSNR and SSIM are lower than the values in the paper.

python3 eval.py --model CycleHJSuperSloMo --num_interp 1 --flow_scale 1 --dataset CycleVideoInterp --val_batch_size 1 --val_file /home/data/VFI/ucf101_interp_for_paper --name Pretrain_FullUnsupervised_Adobe_ucf101_3.31 --save /home/wpk/VFI/code/result_folder --post_fix wpk --resume /home/wpk/VFI/code/pretrained_models/fully_unsupervised_adobe30fps.pth --val_sample_rate 1 --val_step_size 1 --write_images

The PSNR and SSIM in the paper are 34.47 and 0.946, and I get,

opened by pongkun 2

Help for the dataset

Thanks for the release of the code. However, I cannot find the Slowflow-240fps and Sintel-1008fps datasets on this website: http://www.cvlibs.net/projects/slow_flow/. The "Benchmark Dataset" and "Teaser Dataset" provided there don't seem to be Slowflow-240fps or Sintel-1008fps.

Looking forward to your reply. Thanks.

opened by hityzy1122 2

HTTP 401 when building the docker container

[1/15] FROM nvcr.io/nvidia/pytorch:19.04-py3@sha256:1d192a6c619bca23d9886d8eece127943ae0372711c052dc51eddbc740f805da:

failed to solve with frontend dockerfile.v0: failed to build LLB: failed to copy: httpReaderSeeker: failed open: failed to fetch anonymous token: unexpected status: 401 Unauthorized

opened by Jansza 1

TypeError: 'numpy.float64' object cannot be interpreted as an integer

If the input data is a video, the code reports " 'numpy.float64' object cannot be interpreted as an integer". Does this code only process images?

opened by lwdoubles 1

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 10 and 11 in dimension 2

I am trying to reproduce the results by training the supervised baseline model (training on Adobe240-fps and evaluation on SlowFlow).

I use this parameter

I run into this problem,

Does anyone have the same problem?

opened by s91005tw 0

How to make such a comparison video demo?

This is wonderful work! I like the comparison video demo on YouTube very much, and I was wondering how to make such a comparison demo. From two videos? With ffmpeg?

opened by chen-san 0
