
CDFI (Compression-Driven-Frame-Interpolation)

[Paper] (Coming soon...) | [arXiv]

Tianyu Ding*, Luming Liang*, Zhihui Zhu, Ilya Zharkov

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021

Introduction

We propose a Compression-Driven network design for Frame Interpolation (CDFI) that leverages model compression to significantly reduce the model size (which also allows a better understanding of the current architecture) while making room for further improvements that ultimately achieve superior performance. Concretely, we first compress AdaCoF and show that a 10x compressed AdaCoF performs similarly to its original counterpart; we then improve upon this compressed model with simple modifications. Note that it is typically prohibitive to implement the same improvements on the original heavy model.

  • We achieve a significant performance gain with only about a quarter of the size of the original AdaCoF

    Model                 Vimeo-90K              Middlebury             UCF101-DVF             #Params
                          PSNR / SSIM / LPIPS    PSNR / SSIM / LPIPS    PSNR / SSIM / LPIPS
    AdaCoF                34.38 / 0.974 / 0.019  35.74 / 0.979 / 0.019  35.20 / 0.967 / 0.019  21.8M
    Compressed AdaCoF     34.15 / 0.973 / 0.020  35.46 / 0.978 / 0.019  35.14 / 0.967 / 0.019  2.45M
    AdaCoF+               34.58 / 0.975 / 0.018  36.12 / 0.981 / 0.017  35.19 / 0.967 / 0.019  22.9M
    Compressed AdaCoF+    34.46 / 0.975 / 0.019  35.76 / 0.979 / 0.019  35.16 / 0.967 / 0.019  2.56M
    Our Final Model       35.19 / 0.978 / 0.010  37.17 / 0.983 / 0.008  35.24 / 0.967 / 0.015  4.98M
  • Our final model also performs favorably against other state-of-the-art methods (see our paper for details)

  • The proposed framework is generic and can easily be transferred to other DNN-based frame interpolation methods

The above GIF is a demo of using our method to generate slow-motion video, increasing the FPS from 5 to 160. We also provide a long video demonstration here (redirects to YouTube).

Environment

  • CUDA 11.0

  • python 3.8.3

  • torch 1.6.0

  • torchvision 0.7.0

  • cupy 7.7.0

  • scipy 1.5.2

  • numpy 1.19.1

  • Pillow 7.2.0

  • scikit-image 0.17.2
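
As a hedged example (not part of the original instructions), the environment above could be set up with pip as follows; the exact cupy package depends on your local CUDA toolkit, as discussed in the cupy compatibility comment further below:

$ pip install torch==1.6.0 torchvision==0.7.0 scipy==1.5.2 numpy==1.19.1 Pillow==7.2.0 scikit-image==0.17.2
$ pip install cupy==7.7.0  # builds against the local CUDA toolkit; prebuilt cupy-cuda110 wheels only start at cupy 8.0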

Test Pre-trained Models

Download the repository:

$ git clone https://github.com/tding1/CDFI.git
$ cd CDFI/

Testing data

For user convenience, we already provide the Middlebury and UCF101-DVF test datasets in our repository, under the directory test_data/.

Evaluation metrics

We use the built-in functions in skimage.metrics to compute PSNR and SSIM, for which higher is better. We also use LPIPS, a recently proposed metric that measures perceptual similarity, for which smaller is better. For user convenience, we include an implementation of LPIPS in our repo under lpips_pytorch/, which is a slightly modified version of the implementation here (with an updated squeezenet backbone).
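
For reference, here is a minimal sketch of how PSNR and SSIM can be computed with skimage, assuming gt and pred are H x W x C float arrays in [0, 1]; this is illustrative only, not the exact code in test.py:

import numpy as np
import skimage.metrics

def evaluate(gt: np.ndarray, pred: np.ndarray):
    # Higher is better for both metrics.
    psnr = skimage.metrics.peak_signal_noise_ratio(image_true=gt, image_test=pred)
    ssim = skimage.metrics.structural_similarity(gt, pred, data_range=1.0, multichannel=True)
    return psnr, ssim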

Test our pre-trained CDFI model

$ python test.py --gpu_id 0

By default, it will load our pre-trained model checkpoints/CDFI_adacof.pth. It will print the quantitative results on both Middlebury and UCF101-DVF, and the interpolated images will be saved under test_output/cdfi_adacof/.

Test the compressed AdaCoF

$ python test_compressed_adacof.py --gpu_id 0 --kernel_size 5 --dilation 1

By default, it will load the compressed AdaCoF model checkpoints/compressed_adacof_F_5_D_1.pth. It will print the quantitative results on both Middlebury and UCF101-DVF, and the interpolated images will be saved under test_output/compressed_adacof_F_5_D_1/.

Test the compressed AdaCoF+

$ python test_compressed_adacof.py --gpu_id 0 --kernel_size 11 --dilation 2

By default, it will load the compressed AdaCoF+ model checkpoints/compressed_adacof_F_11_D_2.pth. It will print the quantitative results on both Middlebury and UCF101-DVF, and the interpolated images will be saved under test_output/compressed_adacof_F_11_D_2/.

Interpolate two frames

$ python interpolate_twoframe.py --gpu_id 0 --first_frame figs/0.png --second_frame figs/1.png --output_frame output.png

By default, it will load our pre-trained model checkpoints/CDFI_adacof.pth, and generate the intermediate frame output.png given two consecutive frames in a sequence.

Train Our Model

Training data

We use the Vimeo-90K triplet dataset for the video frame interpolation task; it is relatively large (>32 GB).

$ wget http://data.csail.mit.edu/tofu/dataset/vimeo_triplet.zip
$ unzip vimeo_triplet.zip
$ rm vimeo_triplet.zip

Start training

$ python train.py --gpu_id 0 --data_dir path/to/vimeo_triplet/ --batch_size 8

It will generate a unique ID for each training run, and all the intermediate results/records will be saved under model_weights/<training id>/. For a GPU device with around 10 GB of memory, --batch_size can take a value as large as 3; otherwise CUDA may run out of memory. Many other training options, e.g., --lr, --epochs, --loss and so on, can be found in train.py; see the example below.
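
For example, a run on a ~10 GB GPU might look like the following (the --lr and --epochs values here are purely illustrative; check train.py for the actual defaults):

$ python train.py --gpu_id 0 --data_dir path/to/vimeo_triplet/ --batch_size 3 --lr 0.001 --epochs 100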

Apply CDFI to New Models

One nice thing about CDFI is that the framework can easily be applied to other (heavy) DNN models and potentially boost their performance. The key to CDFI is the optimization-based compression, which compresses a model via fine-grained pruning. In particular, we use the efficient and easy-to-use sparsity-inducing optimizer OBPROXSG (see also the paper), and summarize the compression procedure for any other model below.

  • Copy the OBPROXSG optimizer, which is already implemented as a torch.optim optimizer, to your working directory
  • Starting from a pre-trained model, finetune its weights using the OBPROXSG optimizer, just like any standard PyTorch built-in optimizer such as SGD or Adam
    • It is not necessary to use the full dataset for this finetuning process
  • The parameters for the OBPROXSG optimizer:
    • lr: learning rate
    • lambda_: coefficient of the L1 regularization term
    • epochSize: number of batches in an epoch
    • Np: number of proximal steps, which is set to 2 for pruning AdaCoF
    • No: number of orthant steps (the key step to promote sparsity), for which we recommend the default setting
    • eps: threshold for trimming zeros, which is set to 0.0001 for pruning AdaCoF
  • After the optimization is done (either by reaching a maximum number of epochs or achieving a high sparsity), use the layer density as the compression ratio for that layer, as described in the paper (see the sketch after this list)
  • As an example, compare the architectures in models/adacof.py and models/compressed_adacof.py for compressing AdaCoF with the above procedure
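
A minimal sketch of this procedure is shown below, assuming the OBPROXSG constructor takes the parameters listed above; the import path, model, train_loader, and criterion are hypothetical placeholders, not the repo's actual code:

import torch
from obproxsg import OBPROXSG  # hypothetical import; copy the optimizer file into your working directory

# Placeholders: substitute your own pre-trained model, data loader and loss.
model = ...
train_loader = ...
criterion = torch.nn.L1Loss()
num_epochs = 10  # or stop early once sparsity is high enough

optimizer = OBPROXSG(model.parameters(), lr=1e-3, lambda_=1e-4,
                     epochSize=len(train_loader), Np=2)

# Finetune the pre-trained weights with the sparsity-inducing optimizer.
for epoch in range(num_epochs):
    for frame0, gt, frame1 in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(frame0, frame1), gt)
        loss.backward()
        optimizer.step()

# Layer density = fraction of weights whose magnitude exceeds eps;
# use it as the compression ratio for that layer (see the paper).
eps = 1e-4
for name, param in model.named_parameters():
    density = (param.detach().abs() > eps).float().mean().item()
    print(f'{name}: density = {density:.4f}')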

Now you are ready to make further improvements/modifications to the compressed model, based on an understanding of its flaws/drawbacks.

Citation

Coming soon...

Acknowledgements

The code is largely based on HyeongminLEE/AdaCoF-pytorch and baowenbo/DAIN.

Comments
  • Train problem

    Hi, my friend. I used your pretrained CDFI_adacof.pth and trained on my own dataset (5 fps video, larger object motion, about 10,000 triplets, same format as vimeo_triplet).

    Training from epoch 88 (your pretrained model) to epoch 150 took a few days on my 1080Ti.

    Your pretrained model is not too bad on my dataset, but after training I got very little improvement. Here is my log: log.txt

    Can you tell me why?

    opened by qhdqhd 9
  • quantization in evaluation

    Thanks for sharing your code! I just looked into it a little bit, and it seems there is no quantization in the evaluation?

    https://github.com/tding1/CDFI/blob/d7f79e502674187b7a7b645a7812fd9fa30a6608/test.py#L36-L47

    However, it is common practice to quantize your interpolation estimate before computing any metrics, as shown in the examples below. If you submit results to a benchmark, like the one from Middlebury, you have to quantize the interpolation estimates to save them as images, so it has been the norm to quantize all results throughout the evaluation.

    https://github.com/sniklaus/sepconv-slomo/blob/46041adec601a4051b86741664bb2cdc80fe4919/benchmark.py#L28
    https://github.com/hzwer/arXiv2020-RIFE/blob/15cb7f2389ccd93e8b8946546d4665c9b41541a3/benchmark/Vimeo90K.py#L36
    https://github.com/baowenbo/DAIN/blob/9d9c0d7b3718dfcda9061c85efec472478a3aa86/demo_MiddleBury.py#L162-L166
    https://github.com/laomao0/BIN/blob/b3ec2a27d62df966cc70880bb3d13dcf147f7c39/test.py#L406-L410

    The reason this is important is that the quantization step has a negative impact on the metrics. If one does not quantize the results of their method before computing the metrics, while the results from other methods had the quantization step in place, then the evaluation is slightly biased. Would you hence be able to share the evaluation metrics for CDFI with the quantization? This would greatly benefit future work that compares to CDFI by avoiding this bias. And thanks again for sharing your code!

    opened by sniklaus 6
  • Different results on the same test data with different resolutions? Why?

    Hello, my friend. I trained a model on my own data (same format as Vimeo-90K, 448x256 resolution). When I test my model, I find different results on the same test data at different resolutions. Data at 1280x720 resolution is a little worse, while the same data at 448x256 is better. Here is the result.

    opened by qhdqhd 5
  • class Loss in loss.py returns a loss_sum of size 2, which should be a scalar?

    Hello, thanks for your wonderful code! I wonder why your loss function gives a loss that is a tensor of 2 elements instead of a scalar? I encountered an ERROR when running loss.backward(), which is due to the non-scalar loss. I suppose the code should be fixed as:

        for r in self.regularize:
            effective_loss = r['weight'] * output[r['type']]
            effective_loss = sum(effective_loss)  # I added this
            losses.append(effective_loss)

    Is that correct?

    opened by Lianghy15 4
  • About pruning layer problems

    Congratulations! This work is so helpful to me, but I am still confused about how to prune these channels in detail. Could you release the code for pruning?

    opened by cocowy1 3
  • How to determine the number of channels in a compressed model?

    Hi, bro. Thank you for your open-source code. I got the density according to the steps in "Apply CDFI to New Models". But how do I determine the number of channels of the compressed model according to the density? Can you share how to do it? Thank you!

    opened by zdyshine 3
  • LPIPS computation issue

    https://github.com/tding1/CDFI/blob/0de1f7eaa79e6f3ad7c0e9d6ad4a9f7fba891e9e/test.py#L123

    You seem to use squeezenet when computing the LPIPS metric, which may cause some problems with your Table 3 comparison experiment.

    The results reported by SoftSplat are consistent with EDSC, but the EDSC results you report are inconsistent with the original EDSC paper (see Table 4 of https://arxiv.org/pdf/2006.08070.pdf). In your paper, the LPIPS of EDSC is much better than SoftSplat's, and CAIN and EDSC are better than DAIN, which is counterintuitive. This makes this part of the data look very strange.

    I suggest revising this part of the data so that future researchers can follow your work well. Thank you very much.

    From EDSC: [image]

    From CDFI: [image]

    opened by hzwer 3
  • inconsistent SSIM computation

    I was surprised to see that the ratio between PSNR and SSIM deviates by a large margin between the methods with a dagger and the ones without a dagger in Table 3. I noticed that the provided test.py uses the following.

    https://github.com/tding1/CDFI/blob/d7f79e502674187b7a7b645a7812fd9fa30a6608/test.py#L46-L47

    In doing so, it does not provide a data_range argument, and skimage.metrics.structural_similarity has to guess it; as a fallback it just uses the difference between the smallest and the largest element. This significantly alters the results and puts the methods with a dagger in Table 3 at a substantial disadvantage (and half of the methods have a dagger).

    I just updated test.py as follows (which also addresses the quantization issue from #1).

    ...
    
    gt = (gt * 255).round() / 255
    frame_out = (frame_out * 255).round() / 255
    
    psnr = skimage.metrics.peak_signal_noise_ratio(image_true=gt, image_test=frame_out)
    ssim = skimage.metrics.structural_similarity(np.transpose(gt, (1, 2, 0)),
                                                 np.transpose(frame_out, (1, 2, 0)), data_range=1.0, multichannel=True)
    
    ...
    

    With this fix, the SSIM of CDFI on the Middlebury test drops from 0.983 to 0.966, which is quite significant. It would hence be great if Table 3 could be revised so that future work referencing it is not subject to the same inconsistencies. Thanks!

    opened by sniklaus 3
  • cupy and CUDA compatibility

    What is the pip3 command for installing cupy?

    For CUDA 11.0 the command should be pip3 install cupy-cuda110,

    but cupy version 7.7.0 (from the README) is not available as cupy-cuda110:

    pip3 install cupy-cuda110==7.7.0 gives errors.

    The only command that works is pip3 install cupy-cuda100==7.7.0, but that targets CUDA 10.0. Very confusing. Can you help clarify?

    opened by sramakintel 2
  • CDFI inference speed

    What is the CDFI inference speed, and what FPS does it achieve on 1080p video? Why is it slower than AdaCoF in some evaluations I have seen? Isn't the compressed CDFI model smaller and faster?

    opened by qhdqhd 2
  • Testing result on vimeo90k_septuplet

    Hello, my friend! I tested the model with the pretrained model 'FLAVR_4x.pth' (yours) and the 'vimeo90k_septuplet' dataset, and the PSNR I got was 28.376122. I don't know why this occurs. [image]

    opened by silence-moon 2