Official code for "Towards An End-to-End Framework for Flow-Guided Video Inpainting" (CVPR2022)


E2FGVI (CVPR 2022)


English | 简体中文

This repository contains the official implementation of the following paper:

Towards An End-to-End Framework for Flow-Guided Video Inpainting
Zhen Li#, Cheng-Ze Lu#, Jianhua Qin, Chun-Le Guo*, Ming-Ming Cheng
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022

[Paper] [Demo Video (YouTube)] [Demo Video (Bilibili)] [Project Page (TBD)] [Poster (TBD)]

You can try our Colab demo here: [Open In Colab]

News

  • 2022.05.15: We release E2FGVI-HQ, which can handle videos with arbitrary resolution. This model generalizes well to much higher resolutions, even though it was trained only on 432x240 videos. It also outperforms our original model on both the PSNR and SSIM metrics. 🔗 Download links: [Google Drive] [Baidu Disk] 🎥 Demo video: [YouTube] [Bilibili]

  • 2022.04.06: Our code is publicly available.

Demo

[teaser figure]

More examples (click for details):

Coco
Tennis
Space
Motocross

Overview

[overall structure figure]

🚀 Highlights:

  • SOTA performance: The proposed E2FGVI achieves significant improvements on all quantitative metrics compared with state-of-the-art methods.
  • High efficiency: Our method processes 432 × 240 videos at 0.12 seconds per frame on a Titan Xp GPU, which is nearly 15× faster than previous flow-based methods. In addition, our method has the lowest FLOPs among all compared state-of-the-art methods.

Work in Progress

  • Update website page
  • Hugging Face demo
  • Efficient inference

Dependencies and Installation

  1. Clone Repo

    git clone https://github.com/MCG-NKU/E2FGVI.git
  2. Create Conda Environment and Install Dependencies

    conda env create -f environment.yml
    conda activate e2fgvi
    • Python >= 3.7
    • PyTorch >= 1.5
    • CUDA >= 9.2
    • mmcv-full (follow the official installation instructions)

    If the environment.yml file does not work for you, please follow this issue to solve the problem.
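To double-check that the environment meets the requirements above, you can run a short version check (a minimal, illustrative sketch; check_env.py is not part of this repository):

    # check_env.py -- illustrative environment check (not part of this repository)
    import sys
    import torch
    import mmcv

    print("Python :", sys.version.split()[0])        # expected >= 3.7
    print("PyTorch:", torch.__version__)             # expected >= 1.5
    print("mmcv   :", mmcv.__version__)              # mmcv-full build
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("CUDA version:", torch.version.cuda)   # expected >= 9.2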

Get Started

Prepare pretrained models

Before performing the following steps, please download our pretrained model first.

Model     | 🔗 Download Links           | Supports Arbitrary Resolution? | PSNR / SSIM / VFID (DAVIS)
E2FGVI    | [Google Drive] [Baidu Disk] | No                             | 33.01 / 0.9721 / 0.116
E2FGVI-HQ | [Google Drive] [Baidu Disk] | Yes                            | 33.06 / 0.9722 / 0.117

Then, unzip the downloaded file and place the models in the release_model directory.

The directory structure will be arranged as:

release_model
   |- E2FGVI-CVPR22.pth
   |- E2FGVI-HQ-CVPR22.pth
   |- i3d_rgb_imagenet.pt (for evaluating VFID metric)
   |- README.md
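As a quick sanity check that a checkpoint is in place and readable, you can try loading it on the CPU (an illustrative sketch; the exact structure of the checkpoint dictionary is an assumption and may differ):

    # Illustrative checkpoint check (not part of this repository).
    import torch

    ckpt_path = "release_model/E2FGVI-CVPR22.pth"     # or E2FGVI-HQ-CVPR22.pth
    ckpt = torch.load(ckpt_path, map_location="cpu")  # load on CPU; no GPU needed
    print(type(ckpt))
    if isinstance(ckpt, dict):
        print(list(ckpt.keys())[:5])                  # peek at the first few entries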

Quick test

We provide two examples in the examples directory.

Run the following command to enjoy them:

# The first example (using split video frames)
python test.py --model e2fgvi (or e2fgvi_hq) --video examples/tennis --mask examples/tennis_mask  --ckpt release_model/E2FGVI-CVPR22.pth (or release_model/E2FGVI-HQ-CVPR22.pth)
# The second example (using mp4 format video)
python test.py --model e2fgvi (or e2fgvi_hq) --video examples/schoolgirls.mp4 --mask examples/schoolgirls_mask  --ckpt release_model/E2FGVI-CVPR22.pth (or release_model/E2FGVI-HQ-CVPR22.pth)

The inpainted video will be saved in the results directory. Please prepare your own mp4 video (or split frames) and frame-wise masks if you want to test more cases.
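If you need to create frame-wise masks for your own video, a sketch like the one below writes one PNG mask per frame, mirroring the layout of the provided examples (an illustrative sketch; the directory names and the rectangle are placeholders, and white pixels are assumed to mark the region to inpaint, as in the example masks):

    # Illustrative mask generation (not part of this repository).
    import os
    from PIL import Image, ImageDraw

    frame_dir = "examples/my_video"        # split frames: 00000.jpg, 00001.jpg, ...
    mask_dir = "examples/my_video_mask"    # output masks: 00000.png, 00001.png, ...
    os.makedirs(mask_dir, exist_ok=True)

    for name in sorted(os.listdir(frame_dir)):
        w, h = Image.open(os.path.join(frame_dir, name)).size
        mask = Image.new("L", (w, h), 0)   # black background = keep
        ImageDraw.Draw(mask).rectangle([100, 60, 220, 180], fill=255)  # white = inpaint
        mask.save(os.path.join(mask_dir, os.path.splitext(name)[0] + ".png"))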

Note: E2FGVI always rescales the input video to a fixed resolution (432x240), while E2FGVI-HQ does not change the resolution of the input video. If you want to customize the output resolution, please use the --set_size flag and set the values of --width and --height.

Example:

# Use this command to output a 720p video
python test.py --model e2fgvi_hq --video <video_path> --mask <mask_path>  --ckpt release_model/E2FGVI-HQ-CVPR22.pth --set_size --width 1280 --height 720

Prepare dataset for training and evaluation

Dataset | YouTube-VOS                                           | DAVIS
Details | For training (3,471) and evaluation (508)             | For evaluation (50 of 90 videos)
Images  | [Official Link] (download train and test all frames)  | [Official Link] (2017, 480p, TrainVal)
Masks   | [Google Drive] [Baidu Disk] (for reproducing paper results)

The training and test split files are provided in datasets/<dataset_name>.

For each dataset, place JPEGImages in datasets/<dataset_name>.

Then, run sh datasets/zip_dir.sh (note: please edit the folder path accordingly) to compress each video in datasets/<dataset_name>/JPEGImages into its own zip file.
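If you prefer Python over editing the shell script, a rough equivalent is sketched below (an illustrative sketch, not part of this repository; it assumes the data loader expects each video's frames at the root of <video_name>.zip, so please check datasets/zip_dir.sh for the exact convention):

    # Illustrative alternative to datasets/zip_dir.sh (not part of this repository).
    import os
    import shutil

    jpeg_root = "datasets/davis/JPEGImages"   # or datasets/youtube-vos/JPEGImages
    for video_name in sorted(os.listdir(jpeg_root)):
        video_dir = os.path.join(jpeg_root, video_name)
        if not os.path.isdir(video_dir):
            continue
        # creates datasets/<dataset_name>/JPEGImages/<video_name>.zip
        shutil.make_archive(video_dir, "zip", root_dir=video_dir)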

Unzip the downloaded mask files into the datasets directory.

The datasets directory structure should be arranged as follows (note: please check it carefully; a sanity-check sketch is given after the tree):

datasets
   |- davis
      |- JPEGImages
         |- <video_name>.zip
         |- <video_name>.zip
      |- test_masks
         |- <video_name>
            |- 00000.png
            |- 00001.png   
      |- train.json
      |- test.json
   |- youtube-vos
      |- JPEGImages
         |- <video_id>.zip
         |- <video_id>.zip
      |- test_masks
         |- <video_id>
            |- 00000.png
            |- 00001.png
      |- train.json
      |- test.json   
   |- zip_dir.sh
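A small script like the following can help verify the layout before evaluation or training (an illustrative sketch; it only checks that the expected files and folders exist):

    # Illustrative layout check (not part of this repository); paths follow the tree above.
    import os

    for name in ("davis", "youtube-vos"):
        root = os.path.join("datasets", name)
        zips = [f for f in os.listdir(os.path.join(root, "JPEGImages")) if f.endswith(".zip")]
        mask_dirs = os.listdir(os.path.join(root, "test_masks"))
        print(f"{name}: {len(zips)} video zips, {len(mask_dirs)} mask folders")
        for split in ("train.json", "test.json"):
            assert os.path.isfile(os.path.join(root, split)), f"missing {split} in {root}"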

Evaluation

Run one of the following commands for evaluation:

 # For evaluating E2FGVI model
 python evaluate.py --model e2fgvi --dataset <dataset_name> --data_root datasets/ --ckpt release_model/E2FGVI-CVPR22.pth
 # For evaluating E2FGVI-HQ model
 python evaluate.py --model e2fgvi_hq --dataset <dataset_name> --data_root datasets/ --ckpt release_model/E2FGVI-HQ-CVPR22.pth

You should obtain the scores reported in the paper if you evaluate E2FGVI. The scores of E2FGVI-HQ can be found in [Prepare pretrained models].

The scores will also be saved in the results/<model_name>_<dataset_name> directory.

Please add the --save_results flag if you want to further evaluate the temporal warping error.

Training

Our training configurations are provided in configs/train_e2fgvi.json (for E2FGVI) and configs/train_e2fgvi_hq.json (for E2FGVI-HQ).

Run one of the following commands for training:

 # For training E2FGVI
 python train.py -c configs/train_e2fgvi.json
 # For training E2FGVI-HQ
 python train.py -c configs/train_e2fgvi_hq.json

You can run the same command to resume training.

The training loss can be monitored by running:

tensorboard --logdir release_model                                                   

You can follow the evaluation pipeline above to evaluate your model.

Results

Quantitative results

[quantitative results figure]

Citation

If you find our repo useful for your research, please consider citing our paper:

@inproceedings{liCvpr22vInpainting,
   title={Towards An End-to-End Framework for Flow-Guided Video Inpainting},
   author={Li, Zhen and Lu, Cheng-Ze and Qin, Jianhua and Guo, Chun-Le and Cheng, Ming-Ming},
   booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
   year={2022}
}

Contact

If you have any questions, please feel free to contact us via zhenli1031ATgmail.com or czlu919AToutlook.com.

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International license for non-commercial use only. Any commercial use requires formal permission in advance.

Acknowledgement

This repository is maintained by Zhen Li and Cheng-Ze Lu.

This code is based on STTN, FuseFormer, Focal-Transformer, and MMEditing.

Comments
  • About custom datasets

    Hello, I feel very lucky to have learned about your model. I was able to train on the DAVIS dataset successfully, but I have some issues when defining my own dataset. There are 320 zip files in JPEGImages, and each zip contains ten photos. There are 320 corresponding mask folders in test_masks, each with ten mask photos. test.json is the same as train.json.

    But when we train on our own dataset, the following error occurs (full traceback below): is invalid for input of size 11272192


    Custom dataset directory:

    dataset
       |- ballet
          |- JPEGImages
             |- xxx.zip
             |- .........
          |- test_masks
             |- xxx
          |- train.json
          |- test.json


    Specific error:

    Traceback (most recent call last):
      File "/home/u202080087/data/E2FGVI/train.py", line 84, in mp.spawn(main_worker, nprocs=config['world_size'], args=(config, ))
      File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
      File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes while not context.join():
      File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join raise Exception(msg)
    Exception:

    -- Process 0 terminated with the following error:

    Traceback (most recent call last):
      File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap fn(i, *args)
      File "/home/u202080087/data/E2FGVI/train.py", line 64, in main_worker trainer.train()
      File "/home/u202080087/data/E2FGVI/core/trainer.py", line 288, in train self._train_epoch(pbar)
      File "/home/u202080087/data/E2FGVI/core/trainer.py", line 307, in _train_epoch pred_imgs, pred_flows = self.netG(masked_frames, l_t)
      File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl result = self.forward(*input, **kwargs)
      File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 619, in forward output = self.module(*inputs[0], **kwargs[0])
      File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl result = self.forward(*input, **kwargs)
      File "/home/u202080087/data/E2FGVI/model/e2fgvi_hq.py", line 255, in forward trans_feat = self.transformer([trans_feat, fold_output_size])
      File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl result = self.forward(*input, **kwargs)
      File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward input = module(input)
      File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl result = self.forward(*input, **kwargs)
      File "/home/u202080087/data/E2FGVI/model/modules/tfocal_transformer_hq.py", line 551, in forward attn_windows = self.attn(x_windows_all, mask_all=x_window_masks_all)
      File "/home/u202080087/.conda/envs/e2f/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl result = self.forward(*input, **kwargs)
      File "/home/u202080087/data/E2FGVI/model/modules/tfocal_transformer_hq.py", line 252, in forward 0] * self.window_size[1], C // self.num_heads), (q, k, v))
      File "/home/u202080087/data/E2FGVI/model/modules/tfocal_transformer_hq.py", line 248, in lambda t: window_partition(t, self.window_size).view(
      File "/home/u202080087/data/E2FGVI/model/modules/tfocal_transformer_hq.py", line 132, in window_partition window_size[1], C)
    RuntimeError: shape '[2, 8, 6, 5, 4, 9, 512]' is invalid for input of size 11272192


    Looking forward to your reply

    opened by caimaomao315 10
  • Solving environment: failed

    I tried installing on both Windows and Linux, but I get "Solving environment: failed".

    conda env create -f environment.yml
    Collecting package metadata (repodata.json): done
    Solving environment: failed

    ResolvePackageNotFound:

    • libffi==3.3=he6710b0_2
    • lcms2==2.12=h3be6417_0
    • matplotlib-base==3.4.2=py37hab158f2_0
    • tornado==6.1=py37h27cfd23_0
    • brotli==1.0.9=he6710b0_2
    • scipy==1.6.2=py37had2a1c9_1
    • bzip2==1.0.8=h7b6447c_0
    • locket==0.2.1=py37h06a4308_1
    • libpng==1.6.37=hbc83047_0
    • ffmpeg==4.2.2=h20bf706_0
    • freetype==2.10.4=h5ab3b9f_0
    • expat==2.4.1=h2531618_2
    • xz==5.2.5=h7b6447c_0
    • ncurses==6.2=he6710b0_1
    • openh264==2.1.0=hd408876_0
    • qt==5.9.7=h5867ecd_1pt150_0
    • pywavelets==1.1.1=py37h7b6447c_2
    • libgfortran-ng==7.5.0=ha8ba4b0_17
    • libwebp-base==1.2.0=h27cfd23_0
    • pcre==8.45=h295c915_0
    • jpeg==9d=h7f8727e_0
    • ca-certificates==2022.2.1=h06a4308_0
    • certifi==2021.10.8=py37h06a4308_2
    • gstreamer==1.14.0=h28cd5cc_2
    • lame==3.100=h7b6447c_0
    • libtiff==4.2.0=h85742a9_0
    • tk==8.6.11=h1ccaba5_0
    • glib==2.69.1=h5202010_0
    • pillow==8.3.1=py37h2c7a002_0
    • libgcc-ng==9.3.0=h5101ec6_17
    • openssl==1.1.1m=h7f8727e_0
    • libstdcxx-ng==9.3.0=hd4cf53a_17
    • fontconfig==2.13.1=h6c09931_0
    • zstd==1.4.9=haebb681_0
    • zlib==1.2.11=h7b6447c_3
    • _openmp_mutex==4.5=1_gnu
    • pyqt==5.9.2=py37h05f1152_2
    • libvpx==1.7.0=h439df22_0
    • libgomp==9.3.0=h5101ec6_17
    • python==3.7.11=h12debd9_0
    • dbus==1.13.18=hb2f20db_0
    • x264==1!157.20191217=h7b6447c_0
    • openjpeg==2.4.0=h3ad879b_0
    • libtasn1==4.16.0=h27cfd23_0
    • lz4-c==1.9.3=h295c915_1
    • cytoolz==0.11.0=py37h7b6447c_0
    • mkl_fft==1.3.0=py37h42c9631_2
    • sqlite==3.36.0=hc218d9a_0
    • gnutls==3.6.15=he1e5248_0
    • icu==58.2=he6710b0_3
    • pytorch==1.5.1=py3.7_cuda9.2.148_cudnn7.6.3_0
    • libgfortran4==7.5.0=ha8ba4b0_17
    • yaml==0.2.5=h7b6447c_0
    • ninja==1.10.2=hff7bd54_1
    • nettle==3.7.3=hbbd107a_1
    • kiwisolver==1.3.1=py37h2531618_0
    • setuptools==58.0.4=py37h06a4308_0
    • libopus==1.3.1=h7b6447c_0
    • libunistring==0.9.10=h27cfd23_0
    • matplotlib==3.4.2=py37h06a4308_0
    • sip==4.19.8=py37hf484d3e_0
    • gmp==6.2.1=h2531618_2
    • pip==21.2.2=py37h06a4308_0
    • numpy-base==1.20.3=py37h74d4b33_0
    • libidn2==2.3.2=h7f8727e_0
    • pyyaml==5.4.1=py37h27cfd23_1
    • libxcb==1.14=h7b6447c_0
    • gst-plugins-base==1.14.0=h8213a91_2
    • ld_impl_linux-64==2.35.1=h7274673_9
    • mkl-service==2.4.0=py37h7f8727e_0
    • libuuid==1.0.3=h7f8727e_2
    • mkl_random==1.2.2=py37h51133e4_0
    • mkl==2021.3.0=h06a4308_520
    • libxml2==2.9.12=h03d6c58_0
    • intel-openmp==2021.3.0=h06a4308_3350
    • numpy==1.20.3=py37hf144106_0
    good first issue 
    opened by Tobe2d 8
  • Not able to reproduce the results listed in the paper with my trained model

    I ran into mode collapse when the step count exceeded 300K, and with the final model I got, I am not able to reproduce the results reported in the paper. Could you share your loss curve? @Paper99

    opened by LigZhong 6
  • Question about learning rate

    Hello, and thank you for your work. I have a question about the learning rate. I noticed that your paper states the initial learning rate is 0.0001 and is reduced by a factor of 10 at 400k iterations, whereas in the compared work FuseFormer the initial learning rate is 0.01 and is reduced by a factor of 10 at 200k, 400k, and 450k. Have you tested the difference between these two settings? What made you decide not to follow FuseFormer's configuration? Looking forward to your answer!

    opened by unclebuff 5
  • Demo videos to contribute

    Hi,

    Thanks for this great repo and project.

    Not really an issue, more a question: I see the demo video section is TBD; would you be interested in some in-the-wild inference test videos for the README? I am planning to run some anyway, hopefully in the next week or so, so let me know and I'll share them.

    It would also be great to have a higher-resolution trained model to produce better-quality demo videos, but I see that is already planned.

    opened by Tetsujinfr 5
  • Output encoding settings

    Hello. After a long while of trial and error, I managed to get this software running. It still doesn't run well, giving me OOM with more than 250 frames of 120x144 video. I have an 8GB 3060ti, which should be fine for this, in my opinion. Needing to split tasks many times is a pain, but might be manageable.

    What isn't manageable are the output settings. H.263 is outdated, and with such tiny input sizes and lengths, a lossy codec is a baffling pick. Maybe I missed a customization option somewhere? I would like lossless H.264 or FFV1. In addition, I would like to be able to set the video's frame rate (very important for syncing) and not have the video resized, which causes distortions that look bad.

    Thank you. Looking forward to the high-resolution model.

    opened by Troceleng 4
  • About the pretrained model of discriminator and opt.pth

    Hello, I feel very lucky and happy to have found your work. What fantastic work! I am doing research that also involves video inpainting, and I'd like to fine-tune your pretrained model on my new dataset. However, I could only find the generator model at the link given in README.md. Could you please upload the discriminator model as well (and also opt.pth)? Or could you tell me how to access them in case I missed the download link? Thank you very much!

    opened by nlx0021 3
  • How to generate object-like masks

    Hi authors,

    Thanks for your awesome work!

    I'm wondering if you used the same 'create_random_shape_with_random_motion' function for both video completion and object removal. If so, can I say this model has only been trained once for both tasks?

    Besides, does this moving mask (https://github.com/MCG-NKU/E2FGVI/blob/master/core/utils.py#L209) refer to the object-like masks mentioned in your paper (experiment settings)?

    opened by sczhou 2
  • Request a suggestion for model distillation

    This model is great, but inference is a bit slow. I want to try distilling this model; could you give some advice, such as which layers could be reduced or removed?

    opened by 980202006 2
  • frame_idx and flow_idx

    [syujung] asked the following question on July 20 (Issue #25): "Hello, your work is excellent and the results are great! I have a question about the code. In models/modules/feat_prop.py, I compared your code with the BasicVSR++ source, and it seems that during backward propagation the optical flow used to obtain cond_n1 might be wrong: shouldn't frame_idx and flow_idx keep the same order in your for loop? That is how BasicVSR++ does it, so I wanted to ask."

    Could you explain specifically how the current code should be modified? Thanks!

    opened by Roowenliang 2
  • How to optimize GPU memory?

    Hello, this is an excellent algorithm! When trying to train I ran into: RuntimeError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 2; 11.93 GiB total capacity; 11.07 GiB already allocated; 56.12 MiB free; 11.43 GiB reserved in total by PyTorch). My machine has four GPUs; it only worked after changing "batch_size" from 8 to 4, which means a batch size of 2 per GPU. If I want to increase the batch size, is it possible to optimize memory usage? Do you have any suggestions? Thanks!

    opened by Roowenliang 2
  • about the loss

    Hello, when I train your e2fgvi_hq model on the YouTube-VOS dataset, the model always collapses after 400k iterations and the flow loss becomes NaN. Have you ever faced this problem? Looking forward to your answer!

    opened by Davidcoach 1
  • How to tune the parameters for high resolution video and low memory GPU

    First of all, this work is awesome. I'm trying it on a 640x480 video on a V100 GPU and ran into an OOM issue. I successfully got a reasonable result after resizing the video to a very small resolution, but that's not what I want. Is there any way to fit the 640x480 video on my V100 GPU by tuning the parameters? Thanks.

    opened by zhangyuting 1
  • Hi about the memory error

    When I was trying to run my own video, I ran into a memory problem.

    RuntimeError: CUDA out of memory. Tried to allocate 1.62 GiB (GPU 0; 8.00 GiB total capacity; 5.05 GiB already allocated; 0 bytes free; 7.01 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

    The frame count and video size are smaller than the tennis or schoolgirls demo. I am able to run those two demos, but not my own.

    opened by tchen0623 3
  • About the

    Hello, your work is excellent and the results are great! I have a question about the code. In models/modules/feat_prop.py, I compared your code with the BasicVSR++ source, and it seems that during backward propagation the optical flow used to obtain cond_n1 might be wrong: shouldn't frame_idx and flow_idx keep the same order in your for loop? That is how BasicVSR++ does it, so I wanted to ask.

    opened by syujung 4