Implicit Internal Video Inpainting

Implementation of our ICCV 2021 paper: Internal Video Inpainting by Implicit Long-range Propagation

paper | project website | 4K data | demo video

Introduction

Want to remove objects from a video without days of training and thousands of training videos? Try our simple but effective internal video inpainting method. The inpainting process is zero-shot and implicit: it requires no pretraining on large datasets and no optical-flow estimation. We further extend the method to more challenging tasks: video object removal with limited mask annotations, and inpainting of ultra-high-resolution videos (e.g., 4K).

TO DO

  • Release code for 4K video inpainting

Setup

Installation

git clone https://github.com/Tengfei-Wang/Implicit-Internal-Video-Inpainting.git
cd Implicit-Internal-Video-Inpainting

Environment

This code is based on TensorFlow 2.x (tested with TensorFlow 2.2 and 2.4).

The environment can be set up with Anaconda:

conda create -n IIVI python=3.7
conda activate IIVI
conda install tensorflow-gpu tensorboard
pip install pyaml 
pip install opencv-python
pip install tensorflow-addons

Alternatively, you can set up the environment from the provided environment.yml:

conda env create -f environment.yml
conda activate IIVI
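
Either way, you can optionally sanity-check the installation before training. This small snippet (not part of the repo) should print a 2.x version and at least one GPU:

import tensorflow as tf

# Sanity check: TensorFlow version and visible GPUs.
print(tf.__version__)                          # expect 2.2 or 2.4
print(tf.config.list_physical_devices('GPU'))  # should list your GPU(s)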

Usage

Quick Start

We provide an example sequence, 'bmx-trees', in ./inputs/. To try our method:

python train.py

The default number of iterations is set to 50,000 in config/train.yml, and the internal learning takes ~4 hours on a single GPU. During the learning process, you can check the inpainting results with TensorBoard:

tensorboard --logdir ./exp/logs

After training, the final results can be saved to ./exp/results/ by running:

python test.py

You can also modify 'model_restore' in config/test.yml to save results from different checkpoints.

Try Your Own Data

Data preprocessing

Before training, we advise dilating the object masks to exclude edge pixels; otherwise, imperfectly annotated masks can cause artifacts in the object removal task.

You can generate and preprocess the masks with this script:

python scripts/preprocess_mask.py --annotation_path inputs/annotations/bmx-trees
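
For reference, the core of this preprocessing is a morphological dilation. Below is a minimal sketch with OpenCV, where the file paths and the 15-pixel kernel size are illustrative assumptions; scripts/preprocess_mask.py remains the authoritative implementation:

import cv2
import numpy as np

# Grow a binary object mask outward so imperfect edge annotations are covered.
# Paths and kernel size are illustrative; see scripts/preprocess_mask.py
# for the repo's actual preprocessing.
mask = cv2.imread('inputs/annotations/bmx-trees/00000.png', cv2.IMREAD_GRAYSCALE)
kernel = np.ones((15, 15), np.uint8)  # dilation footprint (assumed size)
dilated = cv2.dilate(mask, kernel, iterations=1)
cv2.imwrite('inputs/masks/bmx-trees/00000.png', dilated)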

Basic training

Modify config/train.yml, which specifies the video path, log path, number of training iterations, etc. The required number of iterations depends on the video length; 100-frame videos typically need 30,000~80,000 iterations to converge. By default, we use only the reconstruction loss for training, which works well in most cases.

python train.py

Improve the sharpness and consistency

For some hard videos, the basic training may not produce a pleasing result. You can fine-tune the trained model with additional losses. To this end, set 'model_restore' in config/train.yml to the checkpoint path from the basic training, and set ambiguity_loss or stabilization_loss to True. Then fine-tune the basic checkpoint for 20,000-40,000 iterations.

python train.py
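
If you toggle these flags often, the edit can also be scripted. Here is a minimal sketch using PyYAML (pulled in by the `pip install pyaml` step above); the key names follow the text above, but verify them against the actual config file, and the checkpoint path is a placeholder you should replace:

import yaml  # PyYAML, installed via `pip install pyaml`

# Hypothetical convenience script: enable the fine-tuning losses and point
# 'model_restore' at the basic-training checkpoint in config/train.yml.
# Key names are taken from the README text; check the real file before use.
with open('config/train.yml') as f:
    cfg = yaml.safe_load(f)

cfg['model_restore'] = 'exp/logs/bmx-trees'  # placeholder checkpoint path
cfg['ambiguity_loss'] = True
cfg['stabilization_loss'] = True

with open('config/train.yml', 'w') as f:
    yaml.safe_dump(cfg, f)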

Inference

Modify ./config/test.yml, which specifies the video path, log path, and save path.

python test.py

Mask Propagation from A Single Frame

When you annotate the object mask in only one frame (or a few frames), our method can propagate it to the other frames automatically.

Modify ./config/train_mask.yml. We typically set the training iterations to 4,000~20,000 and the learning rate to 1e-5~1e-4.

python train_mask.py

After training, modify ./config/test_mask.yml, and then:

python test_mask.py

High-resolution Video Inpainting

Our 4K videos and mask annotations can be downloaded from 4K data.

More Results

Our results on 70 DAVIS videos (including failure cases) can be found here for your reference :)
If you need the PNG version of our uncompressed results, please contact the authors.

Citation

If you find this work useful for your research, please cite:

@inproceedings{ouyang2021video,
  title={Internal Video Inpainting by Implicit Long-range Propagation},
  author={Ouyang, Hao and Wang, Tengfei and Chen, Qifeng},
  booktitle={International Conference on Computer Vision (ICCV)},
  year={2021}
} 

If you are also interested in image inpainting or internal learning, this paper may also be helpful :)

@inproceedings{wang2021image,
  title={Image Inpainting with External-internal Learning and Monochromic Bottleneck},
  author={Wang, Tengfei and Ouyang, Hao and Chen, Qifeng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5120--5129},
  year={2021}
}

Contact

Please send an email to Hao Ouyang or Tengfei Wang if you have any questions.

Comments
  • About the pipeline

    Hi, thanks for your released code.

    It takes too long to train and test a video; would it be possible to test an input quickly? The example "bmx-trees" takes several hours to finish.

    Thanks.

    opened by sydney0zq 6
  • The resolution problem of saving the result pictures

    Hello, thank you for the code. I can train and test on the bmx-trees dataset, but the resolution (320, 600) of the result pictures is different from the input (480, 854). I found that the train.yml and test.yml files have the parameter img_shapes: [320, 600], but when I change it to img_shapes: [480, 854] and retrain, the following error occurs: tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [5,480,856,3] vs. [5,480,854,1] [Op:Mul]. If I want to save the inpainted images at the same resolution as the input images, what should I do?

    opened by weiningwei 1
  • Reduce time for inference

    Since I must retrain for each video I want to inpaint, what parameters do you recommend changing to decrease training time? (4 hours is too much; I would like to get under 1 hour.) Thanks a lot in advance for the help and the code!

    opened by italosalgado14 1
  • Virtualenv users support

    Hi there. Thank you very much for the code! Could you generate a "requirements.txt" file for venv users? This would give us more performance and control over the CUDA versions for users with newer RTX 30XX cards. Thank you very much in advance!

    opened by italosalgado14 1
  • GPU out of memory when setting ambiguity_loss or stabilization_loss to True

    Hi, I am trying to run your code with the ambiguity loss and stabilization loss, but I ran into a GPU out-of-memory problem.

    May I ask what batch size you set in the experiments with the ambiguity and stabilization losses, and what kind of GPU (and how many) you used to train the model?

    Many thanks!

    opened by Huihui1002 1
  • PNG version of our uncompressed results and segmentation results

    Hi,

    Many thanks for publishing the code for this nice work. I am very interested in it.

    Could you please share

    1. the PNG version of the results,
    2. the segmentation results, and
    3. the inpainting results obtained with only the first-frame segmentation mask

    for all the videos in the DAVIS dataset?

    Many thanks!

    opened by Huihui1002 1
  • Error when using train_dist.py

    With tensorflow 2.4: AttributeError: 'MirroredStrategy' object has no attribute 'experimental_run_v2', so I need to downgrade to tensorflow 2.0.

    With tensorflow 2.0: ImportError: cannot import name 'keras_tensor' from 'tensorflow.python.keras.engine' (/home/ivdai/anaconda3/envs/IIVI/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/__init__.py), so I need to upgrade to tensorflow 2.4.

    funny

    opened by Altheim 1
  • About test

    Hi, I'm a little confused about this method. If I want to test 10 different videos, should I train 10 different models, each taking about 4 hours? Or can I train one model that can be applied to the other videos?

    opened by lixixin 1
  • What is the "Mask Propagation from A Single Frame" usage?

    1. How many annotation files must I provide?
    2. Must I provide a corresponding mask for each of them?
    3. Why are the annotation pictures red and green, and what is the difference?
    4. How many frames must I provide for a video?
    5. Must the video be free of fast object movement?

    opened by Vadim2S 1
  • Multi-GPUs - only using VRAM, not processing

    Hi Tengfei Wang, such amazing research, and many thanks for sharing the code. Very interesting results...

    I was able to reproduce some results and really liked the workflow you created with a CNN instead of optical flow; it seems to handle perspective shifts and backgrounds better (still playing with it). The dilated mask makes total sense...

    My question is about using multiple GPUs to speed up training. I am doing the following:

    In train.py, I uncommented the mirrored_strategy = tf.distribute.MirroredStrategy() line and commented out os.environ["CUDA_VISIBLE_DEVICES"] = FLAGS.GPU_ID.

    With that, training seems to use both GPUs, but GPU_0 is using CUDA and processing while GPU_1 only allocates VRAM and does not seem to be processing. Is that correct?

    I also saw @tf.function further down, but I'm not sure whether I should uncomment those lines. I also found #dist_full_ds = mirrored_strategy and tried it, but the second GPU behaves the same way: only using VRAM, not processing.

    Is that the correct behavior?

    Thank you Tengfei Wang and, once again, amazing research.

    opened by optfx 1
  • Dataset directory for training

    @ken-ouyang @Tengfei-Wang Thank you for releasing the codebase. Could you please share what the exact data directory should look like for training your model on a video? I am training on a random YouTube video. Do I need to create a separate frames directory and mask directory, where each file in the mask folder corresponds to an image file in the frames directory?

    opened by SURABHI-GUPTA 0
  • I don't understand your network

    Hi, thanks for providing the code. Looking at it, I find that you train on one video and then run inference on the same video. I think that is tricky: the CNN should be trained on multiple videos and then tested on a different video. Training and inferring on the same video will of course produce a good result.

    Let me try to understand your model. The input video has a foreground object and its mask, and you provide another mask for augmentation. Then, after training, the output video has no foreground and the background is redrawn. Am I right?

    Can I train your model with multiple videos and then run inference on a different video? For example, if I train on 10 different videos and then infer on another video, what will happen at inference? How can I train the model with multiple videos?

    opened by ztrobertyang 1
  • Single-GPU inference after multi-GPU training

    Hello, thank you for the code. I can now train the model using multiple GPUs, but when I run inference on a single GPU, the following error occurs: tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for exp/logs/city_day/0_3/checkpoint_200000. If I train with multiple GPUs, do I have to use multiple GPUs for inference?

    opened by weiningwei 0
  • 4K pipeline and performance

    Hello,

    Great work. I would like to test the 4K pipeline. Could you provide a sample or some hints on how long it took to train? Thank you in advance. Barnabas

    opened by BarnabasTakacs 0
Related Projects

  • My implementation of Image Inpainting - A deep learning inpainting model (Joshua V Evans, 1 star, Dec 12, 2021)

  • AOT-GAN: Aggregated Contextual Transformations for High-Resolution Image Inpainting (Multimedia Research, 214 stars, Jan 3, 2023)

  • POPPY (Physical Optics Propagation in Python): a Python package that simulates physical optical propagation, including diffraction (Space Telescope Science Institute, 132 stars, Dec 15, 2022)

  • Spacetimeformer: multivariate time series forecasting with efficient Transformers; code for the paper "Long-Range Transformers for Dynamic Spatiotemporal Forecasting" (QData, 440 stars, Jan 2, 2023)

  • Official PyTorch implementation of the ICCV 2021 paper FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting (77 stars, Dec 27, 2022)

  • Official implementation of "Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation", CVPRW 2021 (Anirudh S Chakravarthy, 6 stars, May 3, 2022)

  • StructDepth: Leveraging the structural regularities for self-supervised indoor depth estimation, ICCV 2021 (SJTU-ViSYS, 112 stars, Nov 28, 2022)

  • Code for the AAAI 2022 paper "Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification" (16 stars, Dec 14, 2022)

  • Implementation of the master's thesis "Temporal copying and local hallucination for video inpainting" (David Álvarez de la Torre, 1 star, Dec 2, 2022)

  • Exploring Classification Equilibrium in Long-Tailed Object Detection (LOCE), ICCV 2021 (52 stars, Nov 21, 2022)

  • E2FGVI: official code for "Towards An End-to-End Framework for Flow-Guided Video Inpainting", CVPR 2022 (Media Computing Group @ Nankai University, 537 stars, Jan 7, 2023)

  • PyTorch implementation of the NeurIPS 2021 Spotlight paper "Long Short-Term Transformer for Online Action Detection" (77 stars, Dec 16, 2022)

  • TDANet: PyTorch implementation of "Text-Guided Neural Image Inpainting", MM 2020 Oral (LisaiZhang, 75 stars, Dec 22, 2022)

  • MiVOS: Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion, CVPR 2021 (Rex Cheng, 364 stars, Jan 3, 2023)

  • MiVOS Mask Propagation module, with training code and semi-supervised video object segmentation evaluation, CVPR 2021 (Rex Cheng, 106 stars, Jan 3, 2023)

  • BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment (Holy Wu, 35 stars, Jan 1, 2023)

  • Reliable Propagation-Correction Modulation for Video Object Segmentation, AAAI 2022 (Xiaohao Xu, 2 stars, Dec 7, 2021)

  • Official repository of "BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment", CVPR 2022 (Kelvin C.K. Chan, 227 stars, Jan 1, 2023)

  • Unofficial PyTorch implementation of "Zero-Shot" Super-Resolution using Deep Internal Learning, arXiv 1712.06087 (Jacob Gildenblat, 196 stars, Nov 27, 2022)