Official code for "Towards An End-to-End Framework for Flow-Guided Video Inpainting" (CVPR2022)

Media Computing Group @ Nankai University

Last update: Jan 7, 2023

E²FGVI (CVPR 2022)

This repository contains the official implementation of the following paper:

Towards An End-to-End Framework for Flow-Guided Video Inpainting
Zhen Li^#, Cheng-Ze Lu^#, Jianhua Qin, Chun-Le Guo^*, Ming-Ming Cheng
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022

[Paper] [Demo Video (Youtube)] [演示视频 (B站)] [Project Page (TBD)] [Poster (TBD)]

You can try our colab demo here:

⭐ News

2022.05.15: We release E²FGVI-HQ, which can handle videos with arbitrary resolution. The model generalizes well to much higher resolutions even though it was trained only on 432x240 videos. It also outperforms our original model on both PSNR and SSIM. 🔗 Download links: [Google Drive] [Baidu Disk] 🎥 Demo video: [Youtube] [B站]
2022.04.06: Our code is publicly available.

Demo

More examples (click for details):

Coco (click me)	Tennis
Space	Motocross

Overview

🚀 Highlights:

SOTA performance: The proposed E²FGVI achieves significant improvements on all quantitative metrics in comparison with SOTA methods.
High efficiency: Our method processes 432 × 240 videos at 0.12 seconds per frame on a Titan XP GPU, which is nearly 15× faster than previous flow-based methods. Our method also has the lowest FLOPs among all compared SOTA methods.

Work in Progress

Update website page
Hugging Face demo
Efficient inference

Dependencies and Installation

Clone Repo

git clone https://github.com/MCG-NKU/E2FGVI.git

Create Conda Environment and Install Dependencies
```
conda env create -f environment.yml
conda activate e2fgvi
```
- Python >= 3.7
- PyTorch >= 1.5
- CUDA >= 9.2
- mmcv-full (following the pipeline to install)
If the environment.yml file does not work for you, please follow this issue to solve the problem.

Get Started

Prepare pretrained models

Before performing the following steps, please download our pretrained model first.

Model	🔗 Download Links	Support Arbitrary Resolution ?	PSNR / SSIM / VFID (DAVIS)
E²FGVI	[Google Drive] [Baidu Disk]	❌	33.01 / 0.9721 / 0.116
E²FGVI-HQ	[Google Drive] [Baidu Disk]	⭕	33.06 / 0.9722 / 0.117

Then, unzip the file and place the models in the release_model directory.

The directory structure will be arranged as:

release_model
   |- E2FGVI-CVPR22.pth
   |- E2FGVI-HQ-CVPR22.pth
   |- i3d_rgb_imagenet.pt (for evaluating VFID metric)
   |- README.md
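
Before running the commands in the next section, it can help to confirm that the files landed where the tree above says they should. The following is a minimal sketch and not part of the repository: the helper name `missing_checkpoints` is hypothetical, and the file list simply mirrors the tree above.

```python
from pathlib import Path

# File names copied from the directory tree above.
EXPECTED_FILES = (
    "E2FGVI-CVPR22.pth",
    "E2FGVI-HQ-CVPR22.pth",
    "i3d_rgb_imagenet.pt",  # only needed for evaluating the VFID metric
)


def missing_checkpoints(model_dir="release_model", expected=EXPECTED_FILES):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing from release_model:", ", ".join(missing))
    else:
        print("All expected model files are present.")
```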

Quick test

We provide two examples in the examples directory.

Run the following command to enjoy them:

# The first example (using split video frames)
python test.py --model e2fgvi --video examples/tennis --mask examples/tennis_mask --ckpt release_model/E2FGVI-CVPR22.pth
# The second example (using mp4 format video)
python test.py --model e2fgvi --video examples/schoolgirls.mp4 --mask examples/schoolgirls_mask --ckpt release_model/E2FGVI-CVPR22.pth
# To use E2FGVI-HQ instead, pass --model e2fgvi_hq and --ckpt release_model/E2FGVI-HQ-CVPR22.pth

The inpainting video will be saved in the results directory. Please prepare your own mp4 video (or split frames) and frame-wise masks if you want to test more cases.

Note: E²FGVI always rescales the input video to a fixed resolution (432x240), while E²FGVI-HQ does not change the resolution of the input video. If you want to customize the output resolution, please use the --set_size flag and set the values of --width and --height.

Example:

# Using this command to output a 720p video
python test.py --model e2fgvi_hq --video <video_path> --mask <mask_path>  --ckpt release_model/E2FGVI-HQ-CVPR22.pth --set_size --width 1280 --height 720
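
The flags above can also be assembled programmatically, for instance when processing several videos in a loop. This is a sketch under stated assumptions: `build_test_command` is a hypothetical helper that only composes the flags documented in this section (`--model`, `--video`, `--mask`, `--ckpt`, `--set_size`, `--width`, `--height`) and does not inspect `test.py` itself.

```python
import shlex

# Checkpoint paths as listed in the release_model directory above.
CHECKPOINTS = {
    "e2fgvi": "release_model/E2FGVI-CVPR22.pth",
    "e2fgvi_hq": "release_model/E2FGVI-HQ-CVPR22.pth",
}


def build_test_command(model, video, mask, width=None, height=None):
    """Compose the test.py argument list for one video (hypothetical helper)."""
    if model not in CHECKPOINTS:
        raise ValueError(f"unknown model: {model!r}")
    if (width is None) != (height is None):
        raise ValueError("width and height must be given together")
    cmd = ["python", "test.py", "--model", model, "--video", video,
           "--mask", mask, "--ckpt", CHECKPOINTS[model]]
    if width is not None:
        # --set_size enables the custom output resolution.
        cmd += ["--set_size", "--width", str(width), "--height", str(height)]
    return cmd


if __name__ == "__main__":
    # Print a 720p command for the tennis example.
    args = build_test_command("e2fgvi_hq", "examples/tennis",
                              "examples/tennis_mask", width=1280, height=720)
    print(" ".join(shlex.quote(a) for a in args))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.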

Prepare dataset for training and evaluation

Dataset	YouTube-VOS	DAVIS
Details	For training (3,471) and evaluation (508)	For evaluation (50 in 90)
Images	[Official Link] (Download train and test all frames)	[Official Link] (2017, 480p, TrainVal)
Masks	[Google Drive] [Baidu Disk] (For reproducing paper results)

The training and test split files are provided in datasets/<dataset_name>.

For each dataset, you should place JPEGImages to datasets/<dataset_name>.

Then, run sh datasets/zip_dir.sh (Note: please edit the folder path accordingly) to compress each video in datasets/<dataset_name>/JPEGImages.
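
For readers who cannot run the shell script (for example on Windows), the compression step can be approximated in Python. This is a sketch, not a port of the script: it assumes JPEGImages contains one sub-folder of frames per video and that each archive should hold the frame files at its top level, both of which should be checked against the script and the data loader before use. The function name `zip_video_folders` is hypothetical.

```python
import zipfile
from pathlib import Path


def zip_video_folders(jpeg_root):
    """Write <video_name>.zip next to each <video_name>/ folder in jpeg_root.

    Assumption: frames are stored flat inside each archive (no sub-folder).
    Returns the list of archives that were written.
    """
    jpeg_root = Path(jpeg_root)
    written = []
    # Materialize the folder list first, since archives are created in place.
    for video_dir in sorted(p for p in jpeg_root.iterdir() if p.is_dir()):
        archive = jpeg_root / f"{video_dir.name}.zip"
        # ZIP_STORED: JPEG frames are already compressed.
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_STORED) as zf:
            for frame in sorted(video_dir.iterdir()):
                if frame.is_file():
                    zf.write(frame, arcname=frame.name)
        written.append(archive)
    return written
```

The sketch leaves the original frame folders in place; remove them separately once the archives have been verified.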

Unzip downloaded mask files to datasets.

The datasets directory structure will be arranged as: (Note: please check it carefully)

datasets
   |- davis
      |- JPEGImages
         |- <video_name>.zip
         |- <video_name>.zip
      |- test_masks
         |- <video_name>
            |- 00000.png
            |- 00001.png   
      |- train.json
      |- test.json
   |- youtube-vos
      |- JPEGImages
         |- <video_id>.zip
         |- <video_id>.zip
      |- test_masks
         |- <video_id>
            |- 00000.png
            |- 00001.png
      |- train.json
      |- test.json   
   |- zip_dir.sh
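
A mismatch between the split files and the files on disk is an easy mistake to make, so a quick consistency check can save time. The sketch below assumes that `train.json` and `test.json` are JSON objects keyed by video name (the values are not used here); that format is an assumption, not something this README specifies. `check_dataset_layout` is a hypothetical helper.

```python
import json
from pathlib import Path


def check_dataset_layout(dataset_dir):
    """Return a list of human-readable problems found under dataset_dir."""
    root = Path(dataset_dir)
    problems = []
    for split in ("train.json", "test.json"):
        split_file = root / split
        if not split_file.is_file():
            problems.append(f"missing split file: {split}")
            continue
        # Assumed format: {"<video_name>": <anything>, ...}
        videos = json.loads(split_file.read_text())
        for name in videos:
            if not (root / "JPEGImages" / f"{name}.zip").is_file():
                problems.append(f"{split}: no JPEGImages/{name}.zip")
            # The tree above only lists masks for the test split.
            if split == "test.json" and not (root / "test_masks" / name).is_dir():
                problems.append(f"{split}: no test_masks/{name}/")
    return problems
```

For example, `check_dataset_layout("datasets/davis")` returns an empty list when every listed video has its archive and mask folder.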

Evaluation

Run one of the following commands for evaluation:

 # For evaluating E2FGVI model
 python evaluate.py --model e2fgvi --dataset <dataset_name> --data_root datasets/ --ckpt release_model/E2FGVI-CVPR22.pth
 # For evaluating E2FGVI-HQ model
 python evaluate.py --model e2fgvi_hq --dataset <dataset_name> --data_root datasets/ --ckpt release_model/E2FGVI-HQ-CVPR22.pth

Evaluating E²FGVI should reproduce the scores reported in the paper. The scores of E²FGVI-HQ can be found in [Prepare pretrained models].

The scores will also be saved in the results/<model_name>_<dataset_name> directory.

Add the --save_results flag to save the results for further evaluation of the temporal warping error.

Training

Our training configurations are provided in configs/train_e2fgvi.json (for E²FGVI) and configs/train_e2fgvi_hq.json (for E²FGVI-HQ).

Run one of the following commands for training:

 # For training E2FGVI
 python train.py -c configs/train_e2fgvi.json
 # For training E2FGVI-HQ
 python train.py -c configs/train_e2fgvi_hq.json

You could run the same command if you want to resume your training.

The training loss can be monitored by running:

tensorboard --logdir release_model

You could follow this pipeline to evaluate your model.

Results

Quantitative results

Citation

If you find our repo useful for your research, please consider citing our paper:

@inproceedings{liCvpr22vInpainting,
   title={Towards An End-to-End Framework for Flow-Guided Video Inpainting},
   author={Li, Zhen and Lu, Cheng-Ze and Qin, Jianhua and Guo, Chun-Le and Cheng, Ming-Ming},
   booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
   year={2022}
}

Contact

If you have any questions, please feel free to contact us via zhenli1031ATgmail.com or czlu919AToutlook.com.

License

Licensed under a Creative Commons Attribution-NonCommercial 4.0 International license, for non-commercial use only. Any commercial use requires formal permission first.

Acknowledgement

This repository is maintained by Zhen Li and Cheng-Ze Lu.

This code is based on STTN, FuseFormer, Focal-Transformer, and MMEditing.

Comments

About custom datasets

Hello, I was very lucky to come across your model. I was able to train successfully on the davis dataset, but I ran into problems with a custom dataset. JPEGImages contains 320 zip files with ten frames each, and test_masks contains 320 mask folders with ten mask images each. test.json is the same as train.json.

However, when we train on our own data, the following error occurs: shape '[2, 8, 6, 5, 4, 9, 512]' is invalid for input of size 11272192

Custom dataset directory:

dataset
   |- ballet
      |- JPEGImages
         |- xxx.zip
         |- .........
      |- test_masks
         |- xxx
      |- train.json
      |- test.json

Specific error:

Traceback (most recent call last):
  File "/home/u202080087/data/E2FGVI/train.py", line 84, in <module>
    mp.spawn(main_worker, nprocs=config['world_size'], args=(config, ))
  File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/u202080087/data/E2FGVI/train.py", line 64, in main_worker
    trainer.train()
  File "/home/u202080087/data/E2FGVI/core/trainer.py", line 288, in train
    self._train_epoch(pbar)
  File "/home/u202080087/data/E2FGVI/core/trainer.py", line 307, in _train_epoch
    pred_imgs, pred_flows = self.netG(masked_frames, l_t)
  File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/u202080087/data/E2FGVI/model/e2fgvi_hq.py", line 255, in forward
    trans_feat = self.transformer([trans_feat, fold_output_size])
  File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/u202080087/data/E2FGVI/model/modules/tfocal_transformer_hq.py", line 551, in forward
    attn_windows = self.attn(x_windows_all, mask_all=x_window_masks_all)
  File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/u202080087/data/E2FGVI/model/modules/tfocal_transformer_hq.py", line 252, in forward
    0] * self.window_size[1], C // self.num_heads), (q, k, v))
  File "/home/u202080087/data/E2FGVI/model/modules/tfocal_transformer_hq.py", line 248, in <lambda>
    lambda t: window_partition(t, self.window_size).view(
  File "/home/u202080087/data/E2FGVI/model/modules/tfocal_transformer_hq.py", line 132, in window_partition
    window_size[1], C)
RuntimeError: shape '[2, 8, 6, 5, 4, 9, 512]' is invalid for input of size 11272192

Looking forward to your reply
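
The numbers in the error message can be checked by hand. The sketch below only does arithmetic on the values in the RuntimeError; reading the seven dimensions as (batch, frames, windows along height, window height, windows along width, window width, channels) is an assumption based on the `window_partition` call in the traceback, not a confirmed diagnosis.

```python
from functools import reduce
from operator import mul

target_shape = [2, 8, 6, 5, 4, 9, 512]  # from the RuntimeError message
actual_numel = 11272192                  # "input of size" in the same message

# view() requires the element counts to match; here they do not.
expected_numel = reduce(mul, target_shape, 1)

# Assumed meaning of each dimension (see the paragraph above).
batch, frames, n_h, win_h, n_w, win_w, channels = target_shape
tokens_expected = n_h * win_h * n_w * win_w
tokens_actual = actual_numel // (batch * frames * channels)

# Feature-map sizes (H, W) that would produce exactly this shape request:
# H * W == tokens_actual, H // win_h == n_h and W // win_w == n_w.
candidates = [
    (h, tokens_actual // h)
    for h in range(n_h * win_h, (n_h + 1) * win_h)
    if tokens_actual % h == 0 and (tokens_actual // h) // win_w == n_w
]

print(expected_numel, tokens_expected, tokens_actual, candidates)
# prints: 8847360 1080 1376 [(32, 43)]
```

Under that assumption the feature map would hold 32 × 43 tokens per frame, which is not a multiple of the 5 × 9 window, so the resolution of the custom frames is a plausible place to look.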

opened by caimaomao315 10
Solving environment: failed
I tried installing on both Windows and Linux, but I get "Solving environment: failed".

conda env create -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:

libffi==3.3=he6710b0_2

lcms2==2.12=h3be6417_0

matplotlib-base==3.4.2=py37hab158f2_0

tornado==6.1=py37h27cfd23_0

brotli==1.0.9=he6710b0_2

scipy==1.6.2=py37had2a1c9_1

bzip2==1.0.8=h7b6447c_0

locket==0.2.1=py37h06a4308_1

libpng==1.6.37=hbc83047_0

ffmpeg==4.2.2=h20bf706_0

freetype==2.10.4=h5ab3b9f_0

expat==2.4.1=h2531618_2

xz==5.2.5=h7b6447c_0

ncurses==6.2=he6710b0_1

openh264==2.1.0=hd408876_0

qt==5.9.7=h5867ecd_1pt150_0

pywavelets==1.1.1=py37h7b6447c_2

libgfortran-ng==7.5.0=ha8ba4b0_17

libwebp-base==1.2.0=h27cfd23_0

pcre==8.45=h295c915_0

jpeg==9d=h7f8727e_0

ca-certificates==2022.2.1=h06a4308_0

certifi==2021.10.8=py37h06a4308_2

gstreamer==1.14.0=h28cd5cc_2

lame==3.100=h7b6447c_0

libtiff==4.2.0=h85742a9_0

tk==8.6.11=h1ccaba5_0

glib==2.69.1=h5202010_0

pillow==8.3.1=py37h2c7a002_0

libgcc-ng==9.3.0=h5101ec6_17

openssl==1.1.1m=h7f8727e_0

libstdcxx-ng==9.3.0=hd4cf53a_17

fontconfig==2.13.1=h6c09931_0

zstd==1.4.9=haebb681_0

zlib==1.2.11=h7b6447c_3

_openmp_mutex==4.5=1_gnu

pyqt==5.9.2=py37h05f1152_2

libvpx==1.7.0=h439df22_0

libgomp==9.3.0=h5101ec6_17

python==3.7.11=h12debd9_0

dbus==1.13.18=hb2f20db_0

x264==1!157.20191217=h7b6447c_0

openjpeg==2.4.0=h3ad879b_0

libtasn1==4.16.0=h27cfd23_0

lz4-c==1.9.3=h295c915_1

cytoolz==0.11.0=py37h7b6447c_0

mkl_fft==1.3.0=py37h42c9631_2

sqlite==3.36.0=hc218d9a_0

gnutls==3.6.15=he1e5248_0

icu==58.2=he6710b0_3

pytorch==1.5.1=py3.7_cuda9.2.148_cudnn7.6.3_0

libgfortran4==7.5.0=ha8ba4b0_17

yaml==0.2.5=h7b6447c_0

ninja==1.10.2=hff7bd54_1

nettle==3.7.3=hbbd107a_1

kiwisolver==1.3.1=py37h2531618_0

setuptools==58.0.4=py37h06a4308_0

libopus==1.3.1=h7b6447c_0

libunistring==0.9.10=h27cfd23_0

matplotlib==3.4.2=py37h06a4308_0

sip==4.19.8=py37hf484d3e_0

gmp==6.2.1=h2531618_2

pip==21.2.2=py37h06a4308_0

numpy-base==1.20.3=py37h74d4b33_0

libidn2==2.3.2=h7f8727e_0

pyyaml==5.4.1=py37h27cfd23_1

libxcb==1.14=h7b6447c_0

gst-plugins-base==1.14.0=h8213a91_2

ld_impl_linux-64==2.35.1=h7274673_9

mkl-service==2.4.0=py37h7f8727e_0

libuuid==1.0.3=h7f8727e_2

mkl_random==1.2.2=py37h51133e4_0

mkl==2021.3.0=h06a4308_520

libxml2==2.9.12=h03d6c58_0

intel-openmp==2021.3.0=h06a4308_3350

numpy==1.20.3=py37hf144106_0

good first issue
opened by Tobe2d 8
Not able to reproduce the results listed in the paper with my trained model

I ran into mode collapse when the step count exceeded 300K, and with the final model I obtained, I am not able to reproduce the results shown in the paper. Could you share your loss curve? @Paper99

opened by LigZhong 7
Question of Focal Transformer

Hey, thanks for your wonderful work. I think it may be a bug: https://github.com/MCG-NKU/E2FGVI/blob/924b56c133fffe37327f9c9b90290fc3d0538581/model/modules/tfocal_transformer.py#L342,

should we first transpose and then do view operation?

Thanks in advance!

opened by sydney0zq 6
Error reported in training

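Hello authors! I have recently been reading your paper and trying to debug the code, and I have a question. When training the e2fgvi model on the youtube-vos dataset, the following problem occurred.

An index went out of range. Looking at datasets/youtube-vos/train.json, I guess each entry means "video ID: number of frames"; for example, "003234408d": 180 would mean that the video with ID 003234408d in the youtube-vos dataset has 180 frames in total. However, the datasets/youtube-vos/JPEGImages folder contains no data with ID 003234408d, so I suspect that the dataset I downloaded is wrong. But I followed the instructions in "Prepare dataset for training and evaluation": I downloaded train.zip and test_all_frames.zip from youtube-vos2018 (or Google Drive) and unzipped them, and I used the masks provided in the instructions. Did I get the wrong dataset?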

opened by Lynchrocket 5
Question about learning rate

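Hello, and thank you for your work. I have a question about the learning rate. I noticed that your paper states that the initial learning rate is 0.0001 and is reduced at 400k by a factor of 10, whereas the compared work FuseFormer uses an initial learning rate of 0.01 and then reduces it by a factor of 10 at 200k, 400k, and 450k. Have you tested the difference between the two? What made you choose not to follow FuseFormer's configuration? I hope to get your answer!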
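
To make the comparison concrete, the two step schedules described in the question can be written out as follows. This is a sketch: the values are taken from the question as stated (not verified against either code base), and `step_lr` is a hypothetical helper rather than either project's scheduler.

```python
def step_lr(step, base_lr, milestones, gamma=0.1):
    """Learning rate at `step` for a step schedule that multiplies by gamma
    each time a milestone has been reached."""
    passed = sum(1 for m in milestones if step >= m)
    return base_lr * gamma ** passed


# Values as quoted in the question above.
E2FGVI = dict(base_lr=1e-4, milestones=[400_000])
FUSEFORMER = dict(base_lr=1e-2, milestones=[200_000, 400_000, 450_000])

if __name__ == "__main__":
    for step in (0, 200_000, 400_000, 450_000):
        print(step, step_lr(step, **E2FGVI), step_lr(step, **FUSEFORMER))
```

With these numbers the two schedules coincide at 1e-4 between 400k and 450k steps and differ everywhere else.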

opened by unclebuff 5
Demo videos to contribute

Hi,

Thanks for this great repo and project.

Not really an issue, more a question: I see the demo video section is TBD, would you be interested by some inferenced test videos in the wild for the read me? I am planning to run some anyway, hopefully in the next week or so, let me know and I ll share.

Would be great to have higher res trained model to produce better quality demo videos too, but I see it is on the book of work.

opened by Tetsujinfr 5
About the pretrained model of discriminator and opt.pth

Hello, I'm very lucky and happy to know your work. What a fantastic work! I am doing some research which also contains video inpainting. I'd like to finetune your pretrained model on my new dataset. However, I could only find the generator model in the link given in README.md. Could you please upload the discriminator model as well (also the opt.pth)? Or could you please tell me how to get access to it in case I missed the downloading link? Thank you very much!

opened by nlx0021 4
Output encoding settings

Hello. After a long while of trial and error, I managed to get this software running. It still doesn't run well, giving me OOM with more than 250 frames of 120x144 video. I have an 8GB 3060ti, which should be fine for this, in my opinion. Needing to split tasks many times is a pain, but might be manageable.

What isn't manageable are the output settings. H.263 is outdated and with tiny input sizes and lengths, lossy is a baffling pick. Maybe I missed a customizing option somewhere? I would like to have lossless h264 or FFV1. In addition, I would like to decide the video's framerate (very important for syncing) and not have the video resized. That causes distortions that look bad.

Thank you. Looking forward to the high-resolution model.

opened by Troceleng 4
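frame_idx and flow_idx

[syujung] asked the following question on July 20 (Issue #25): "Hello authors, this work of yours is excellent and the results are great! I have a question about the code. In the file models/modules/feat_prop.py that you uploaded, I compared it with the basicvsr++ source code, and I suspect there is a problem with the optical flow used to obtain cond_n1 during backward_propagation. Shouldn't frame_idx and flow_idx in your for loop keep a consistent order? That is what I see in basicvsr++, so I wanted to ask."

Could you explain specifically how the current code should be modified? Many thanks!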

opened by Roowenliang 3
How to generate object-like masks

Hi authors,

Thanks for your awesome work!

I'm wondering if you used the same 'create_random_shape_with_random_motion' function for both video completion and object removal, if so, can I say this model has only been trained once for both tasks?

Besides, does this moving mask (https://github.com/MCG-NKU/E2FGVI/blob/master/core/utils.py#L209) refer to the object-like masks mentioned in your paper (experiment settings)?

opened by sczhou 2
Windows environment

Hey there, Just a FYI, this works for me in Windows with Cuda 11.1 and a 3090.

conda create -n e2fgvi python=3.7
conda activate e2fgvi
python -m pip install --upgrade pip
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts matplotlib==3.4.1
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts opencv-python==4.5.5.62
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts vispy==0.9.3
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts transforms3d==0.3.1
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts networkx==2.3
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts scikit-image==0.19.2
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts pyaml==21.10.1
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts moviepy==1.0.3
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts pyqt6==6.3.0
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9/index.html
pip install tensorboard matplotlib

The original environment solve bug thread resulted in getting to the inference portion after an abnormally long time with no feedback.. and then it crashing saying that there was no kernel on the device.

After setting up the environment this way, python test.py --model e2fgvi_hq --video examples/tennis --mask examples/tennis_mask --ckpt release_model/E2FGVI-HQ-CVPR22.pth processed the test video in about 10 seconds.

opened by Teravus 0
How to convert the pytorch model to the onnx model?

How can I convert the PyTorch model to an ONNX model? I tried the conversion, but it reported an error, and I don't know what the problem is. I'm a beginner, so thank you for your advice. My script is as follows:

import torch
import importlib

device = torch.device("cpu")
model = "e2fgvi_hq"

ckpt = 'release_model/E2FGVI-HQ-CVPR22.pth'

net = importlib.import_module('model.' + model)
model = net.InpaintGenerator().to(device)
data = torch.load(ckpt, map_location=device)
model.load_state_dict(data)
print(f'Loading model from: {ckpt}')
model.eval()
x = torch.randn(1, 1, 3, 240, 864, requires_grad=True)
torch.onnx.export(model,                     # model being run
                  (x, 2),                    # model input (or a tuple for multiple inputs)
                  "E2FGVI-HQ-CVPR22.onnx",   # where to save the model (can be a file or file-like object)
                  export_params=True,        # store the trained parameter weights inside the model file
                  opset_version=16,          # the ONNX version to export the model to
                  do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names=['input'],     # the model's input names
                  output_names=['output'],   # the model's output names
                  dynamic_axes={'input': {1: 'batch_size'}})

The error is as follows:

torch.onnx.symbolic_registry.UnsupportedOperatorError: Exporting the operator ::col2im to ONNX opset version 16 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub.

opened by DidaDidaDidaD 0
GPU is not working for prediction

Hi, I ran into a problem when predicting with E2FGVI-HQ. My CPU and memory are busy the whole time, but the GPU is not used at all. I have made sure that CUDA is installed correctly, and the device returned by the following line is cuda: device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

opened by tchen0623 0
Request for the visualization codes

Hi, thanks for your wonderful work. I notice you visualize the feature maps of local frames' features, could you please provide it?

In supp material: To further investigate the effectiveness of the feature propagation module, we visualize averaged local neighboring features with the temporal size of 5 before conducting content hallucination in Fig. 10.

opened by sydney0zq 0

Owner

Media Computing Group @ Nankai University

Media Computing Group at Nankai University, led by Prof. Ming-Ming Cheng.
