CVPR 2021: "Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE"

Overview

Diverse Structure Inpainting

arXiv | Paper | Supplementary Material | BibTeX

This repository is for the CVPR 2021 paper, "Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE".

If our method is useful for your research, please consider citing it.

Introduction

(Top) Input incomplete image, where the missing region is depicted in gray. (Middle) Visualization of the generated diverse structures. (Bottom) Output images of our method.

Places2 Results

Results on the Places2 validation set using the center-mask Places2 model.

CelebA-HQ Results

Results on one CelebA-HQ test image with different holes using the random-mask CelebA-HQ model.

Installation

This code was tested with TensorFlow 1.12.0 (later 1.x versions may work, but not 2.x), CUDA 9.0, Python 3.6 and Ubuntu 16.04.

Clone this repository:

git clone https://github.com/USTC-JialunPeng/Diverse-Structure-Inpainting.git

Datasets

  • CelebA-HQ: the high-resolution face images from Growing GANs. 24,183 images for training, 2,993 images for validation and 2,824 images for testing.
  • Places2: the challenge data from 365 scene categories. 8 million images for training, 36K images for validation and 328K images for testing.
  • ImageNet: the data from 1,000 natural categories. 1 million images for training and 50K images for validation.

Training

  • Collect the dataset. For CelebA-HQ, we collect the 1024x1024 version. For Places2 and ImageNet, we collect the original version.
  • Prepare the file list. Collect the path of each image into a text file with one path per line (every line except the last ends with a newline); see the first sketch after this list.
  • Modify checkpoints_dir, dataset, train_flist and valid_flist arguments in train_vqvae.py, train_structure_generator.py and train_texture_generator.py.
  • Modify data/data_loader.py according to the dataset. For CelebA-HQ, we resize each image to 266x266 and randomly crop a 256x256 patch. For Places2 and ImageNet, we randomly crop a 256x256 patch; see the second sketch after this list.
  • Run python train_vqvae.py to train VQ-VAE.
  • Modify vqvae_network_dir argument in train_structure_generator.py and train_texture_generator.py based on the path of pre-trained VQ-VAE.
  • Modify the mask setting arguments in train_structure_generator.py and train_texture_generator.py to choose center mask or random mask.
  • Run python train_structure_generator.py to train the structure generator.
  • Run python train_texture_generator.py to train the texture generator.
  • Modify structure_generator_dir and texture_generator_dir arguments in save_full_model.py based on the paths of pre-trained structure generator and texture generator.
  • Run python save_full_model.py to save the whole model.
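
A minimal sketch (an assumed helper, not part of the repository; the directory path and .jpg pattern are placeholders) of a script that writes a file list in the format described above:

```python
# Build a file list: one absolute image path per line, newline-separated,
# with no trailing newline after the last entry.
import glob
import os

image_dir = "/path/to/dataset"  # placeholder: point this at your images
paths = sorted(glob.glob(os.path.join(image_dir, "**", "*.jpg"), recursive=True))
with open("train.flist", "w") as f:
    f.write("\n".join(os.path.abspath(p) for p in paths))
```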
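
And a minimal sketch of the training-time preprocessing described above, written in the TensorFlow 1.x style of the tested setup; the [-1, 1] scaling is an assumption inferred from the (x + 1.) * 127.5 de-normalization that appears in net/nn.py (see the IndexError issue below):

```python
import tensorflow as tf  # tested setup is TensorFlow 1.12.0

def preprocess_train(image, dataset):
    """image: uint8 tensor of shape [H, W, 3]; returns a 256x256 training crop."""
    if dataset == 'celeba_hq':
        image = tf.image.resize_images(image, [266, 266])  # resize to 266x266 first
    image = tf.random_crop(image, [256, 256, 3])           # random 256x256 patch
    return tf.cast(image, tf.float32) / 127.5 - 1.0        # assumed [-1, 1] scaling
```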

Testing

  • Collect the testing set. For CelebA-HQ, we resize each image to 256x256. For Places2 and ImageNet, we crop a center 256x256 patch; see the first sketch after this list.
  • Collect the corresponding mask set (2D grayscale, where 0 indicates the known region and 255 indicates the missing region); the second sketch after this list generates such a mask.
  • Prepare the image file list and the mask file list as in training.
  • Modify checkpoints_dir, dataset, img_flist and mask_flist arguments in test.py.
  • Download the pre-trained model and put model.ckpt.meta, model.ckpt.index, model.ckpt.data-00000-of-00001 and checkpoint under model_logs/ directory.
  • Run python test.py.
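
A minimal sketch (an assumed helper, not part of the repository) of the test-set preparation described above:

```python
from PIL import Image

def prepare_test_image(path, dataset):
    """CelebA-HQ: resize to 256x256; Places2/ImageNet: center 256x256 crop."""
    img = Image.open(path).convert('RGB')
    if dataset == 'celeba_hq':
        return img.resize((256, 256), Image.BICUBIC)
    w, h = img.size
    left, top = (w - 256) // 2, (h - 256) // 2
    return img.crop((left, top, left + 256, top + 256))
```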
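
And a minimal sketch (again an assumed helper) that writes a mask in the required format; the 128x128 center hole matches the setting the center_mask models were trained with (see Pre-trained Models below):

```python
import numpy as np
from PIL import Image

mask = np.zeros((256, 256), dtype=np.uint8)  # 0 = known region
mask[64:192, 64:192] = 255                   # 255 = missing 128x128 center hole
Image.fromarray(mask, mode="L").save("center_mask.png")
```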

Pre-trained Models

Download the pre-trained models using the following links and put them under model_logs/ directory.

The center_mask models are trained on 256x256 images with center 128x128 holes. The random_mask models are trained with random regular and irregular holes.

Inference Time

One advantage of GAN-based and VAE-based methods is their fast inference speed. We measured that Mutual Encoder-Decoder with Feature Equalizations runs at 0.2 seconds per image on a single NVIDIA 1080 Ti GPU for 256×256 images, whereas our model takes 45 seconds per image. Naive sampling of our autoregressive network is the main source of computation time. Fortunately, this time can be reduced by an order of magnitude with an incremental sampling technique that caches and reuses intermediate states of the network; consider using it for faster inference.
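
The repository itself does not implement this caching. As a toy, self-contained illustration of the idea (plain NumPy; a stack of causal 1-D convolutions stands in for the real autoregressive network, and every name and size here is illustrative), the sketch below verifies that cached incremental sampling reproduces naive full recomputation:

```python
import numpy as np

rng = np.random.default_rng(0)
H, K, LAYERS, STEPS = 8, 3, 4, 32        # hidden size, kernel, depth, sample steps
W = [rng.standard_normal((K, H, H)) * 0.3 for _ in range(LAYERS)]

def forward_all(x):
    """Naive pass: recompute every position of every layer, O(T) work per step."""
    for w in W:
        pad = np.concatenate([np.zeros((K - 1, H)), x])   # causal zero-padding
        x = np.tanh(sum(pad[k:k + len(x)] @ w[k] for k in range(K)))
    return x

def incremental(caches, x_t):
    """Cached pass: compute only the newest position of every layer, O(1) work."""
    h = x_t
    for i, w in enumerate(W):
        window = np.concatenate([caches[i], h[None]])     # last K inputs to layer i
        caches[i] = window[1:]                            # slide the cache forward
        h = np.tanh(sum(window[k] @ w[k] for k in range(K)))
    return h

caches = [np.zeros((K - 1, H)) for _ in range(LAYERS)]
seq = np.zeros((0, H))
x_t = rng.standard_normal(H)
for _ in range(STEPS):
    seq = np.concatenate([seq, x_t[None]])
    y_naive = forward_all(seq)[-1]        # recomputes the entire prefix each step
    y_cached = incremental(caches, x_t)   # reuses cached intermediate states
    assert np.allclose(y_naive, y_cached)
    x_t = y_cached                        # feed the output back as the next input
print("incremental sampling matches naive recomputation")
```

Each incremental step touches only the newest position of each layer, so the per-step cost is independent of the sequence length; the same principle carries over to 2-D autoregressive models over feature maps.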

Comments
  • Mask images and file lists

    Thank you for sharing your great work!

    I want to run test.py using your pretrained models. Do I need to prepare the mask images and file lists by myself? If you have the code to generate them, would you mind sharing it?

    opened by naoki7090624 14
  • train_vqvae.py and train_texture_generator.py cannot be accelerated by GPU?

    Hi, thank you for your great work! When training the VQ-VAE network, the training time is very long. Could you tell me how to speed it up? Thanks for replying!

    opened by wwlCape 3
  • What's the effect of discrete textural features?

    Hi! Thanks for your great work.

    I wonder about the effect of the discrete textural features. It seems that these features do not take part in the image generation. Is the benefit of splitting structural and textural features better disentanglement?

    opened by 07hyx06 3
  • filelist

    I am a newcomer to this field and am very interested in your project. Can you tell me how the file list is generated? Could you publish your code? Thank you.

    opened by xiumin123 2
  • Assign requires shapes of both tensors to match. lhs shape= [256] rhs shape= [512]

    I use a dataset with 256x256 images and the pre-trained model you provided for testing. The program reports this error. Do you know what I did wrong? Thank you.

    opened by BigbigVegetable 2
  • image processing

    Could you please elaborate a little on the input image size and the mask size for the respective datasets? For example, the CelebA-HQ input image is 266x266 while the mask is written as 256x256, which seems to be wrong.

    Specifically, could you share the input image size for Places2, and the size of the mask?

    In training: https://github.com/USTC-JialunPeng/Diverse-Structure-Inpainting/blob/main/README.md#:~:text=Modify%20data/data_loader.py%20according%20to%20the%20dataset.%20For%20CelebA-HQ%2C%20we%20resize%20each%20image%20to%20266x266%20and%20randomly%20crop%20a%20256x256.%20For%20Places2%20and%20ImageNet%2C%20we%20randomly%20crop%20a%20256x256

    In testing: https://github.com/USTC-JialunPeng/Diverse-Structure-Inpainting/blob/main/README.md#:~:text=Collect%20the%20testing%20set.%20For%20CelebA-HQ%2C%20we%20resize%20each%20image%20to%20256x256.%20For%20Places2%20and%20ImageNet%2C%20we%20crop%20a%20center%20256x256.

    I have tested the pre-trained model and found the results to be very off; I attached two sample generated images.

    opened by jawadMansoor 2
  • Why concatenate a matrix of ones?

    Hi and first of all thanks for the great code!

    Can I ask why you stack a matrix of ones in the penultimate channel here? https://github.com/USTC-JialunPeng/Diverse-Structure-Inpainting/blob/467b7a9ea3b783c09ec5beb941085bd8e75de2d3/net/structure_generator.py#L30

    Wouldn't it make more sense to have x = tf.concat([x, ones_x*(1-mask), ones_x*mask], 3)? (i.e., you concatenate the mask as its one-hot encoding).

    Best, P.

    opened by pmorerio 2
  • Use of VQ-VAE in test.py

    I am wondering if there is a discrepancy between how inference is explained in the paper and how it is implemented in test.py. Specifically, the paper says "During inference, only Gs and Gt are used.", but the VQ-VAE model is loaded in test.py and used during inference (see here and here for example).

    I am new to TensorFlow and not super familiar with VQ-VAE, so I might be missing something, but since the VQ-VAE encoder gets the full image as input (see here), there might be information leakage from the full image to the inpainting module at inference time. Please correct me if I'm wrong.

    Thank you.

    opened by saeidnp 2
  • Crop or Resize

    Hi, you mention in the README: "For Places2 and ImageNet, we crop a center 256x256." It seems that in your test.py code, you resize all images to 256x256 and don't do any cropping. What did you do for the results in the paper?

    opened by samuro95 1
  • Colab Demo? Inference on Custom Images?

    Hello, thank you for your amazing implementation. Can you provide a demo file or a Colab notebook for custom images? It would be much appreciated and would help everyone.

    Thank you

    opened by Adeel-Intizar 1
  • Generating unmasked faces from masked faces

    Hi! I am using the model to generate new faces from masked faces. I have run test.py and modified the code a little to remove the face mask.

    The model is not performing well on CelebA masked faces. Kindly provide guidance on training for masked faces; do I need to pass masks?

    opened by chandniagarwal 23
  • IndexError during training

    Hello, when I use your code for training, an IndexError occurs in this part:

        for i in range(4):
            gt_i = ((gt[i] + 1.) * 127.5).astype(np.uint8)
            masked_i = ((masked[i] + 1.) * 127.5).astype(np.uint8)
            complete_i = ((complete[i] + 1.) * 127.5).astype(np.uint8)
            recons_gt_i = ((recons_gt[i] + 1.) * 127.5).astype(np.uint8)

    train_structure_generator.py:

        Traceback (most recent call last):
          File "train_structure_generator.py", line 366, in <module>
            nn.structure_visual(gt_np, masked_np, recons_gen_np, recons_gt_np, (i + 1), args.image_size, folder_path)
          File "D:\pythonProject\5_19\VQ_VAE\net\nn.py", line 173, in structure_visual
            gt_i = ((gt[i] + 1.) * 127.5).astype(np.uint8)
        IndexError: index 1 is out of bounds for axis 0 with size 1

    How many dimensions does the parameter gt have, and what data does it hold? Thank you.

    opened by CodeMadUser 2
  • Insufficient video memory

    While training the structure generator, I reduced batch_size to work around insufficient video memory, but this leads to an error when saving the training results: index 1 is out of bounds for axis 0 with size 1. I hope I can get your help. Thank you!

    opened by userLx888 5
  • GPU (A4000) does not work with the TF 1.12 version

    The paper is exciting! Thanks to the authors. The program runs perfectly on two 2080 GPUs, but the same versions (tensorflow-gpu=1.12.0, cuda=9.0, cudnn=7.6.5) do not work on two A4000s. Does anyone have a solution to this version problem?

    opened by plusleft 3