CVPR 2021: "Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE"

Overview

Diverse Structure Inpainting

arXiv | Paper | Supplementary Material | BibTeX

This repository is for the CVPR 2021 paper, "Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE".

If our method is useful for your research, please consider citing it.

Introduction

(Top) Input incomplete image, where the missing region is depicted in gray. (Middle) Visualization of the generated diverse structures. (Bottom) Output images of our method.

Places2 Results

Results on the Places2 validation set using the center-mask Places2 model.

CelebA-HQ Results

Results on one CelebA-HQ test image with different holes using the random-mask CelebA-HQ model.

Installation

This code was tested with TensorFlow 1.12.0 (later 1.x versions may work, but not 2.x), CUDA 9.0, Python 3.6 and Ubuntu 16.04.

Clone this repository:

git clone https://github.com/USTC-JialunPeng/Diverse-Structure-Inpainting.git

Datasets

  • CelebA-HQ: the high-resolution face images from Growing GANs. 24,183 images for training, 2,993 images for validation and 2,824 images for testing.
  • Places2: the challenge data from 365 scene categories. 8 million images for training, 36K images for validation and 328K images for testing.
  • ImageNet: the data from 1,000 natural categories. 1 million images for training and 50K images for validation.

Training

  • Collect the dataset. For CelebA-HQ, we collect the 1024x1024 version. For Places2 and ImageNet, we collect the original version.
  • Prepare the file list. Collect the path of each image into a text file with one path per line (every line except the last ends with a newline); see the first sketch after this list.
  • Modify checkpoints_dir, dataset, train_flist and valid_flist arguments in train_vqvae.py, train_structure_generator.py and train_texture_generator.py.
  • Modify data/data_loader.py according to the dataset. For CelebA-HQ, we resize each image to 266x266 and randomly crop a 256x256 patch. For Places2 and ImageNet, we randomly crop a 256x256 patch; see the second sketch after this list.
  • Run python train_vqvae.py to train VQ-VAE.
  • Modify vqvae_network_dir argument in train_structure_generator.py and train_texture_generator.py based on the path of pre-trained VQ-VAE.
  • Modify the mask setting arguments in train_structure_generator.py and train_texture_generator.py to choose center mask or random mask.
  • Run python train_structure_generator.py to train the structure generator.
  • Run python train_texture_generator.py to train the texture generator.
  • Modify structure_generator_dir and texture_generator_dir arguments in save_full_model.py based on the paths of pre-trained structure generator and texture generator.
  • Run python save_full_model.py to save the whole model.
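
A minimal sketch (an assumed helper, not part of the repository; the directory path and .jpg pattern are placeholders) of a script that writes a file list in the format described above:

```python
# Build a file list: one absolute image path per line, newline-separated,
# with no trailing newline after the last entry.
import glob
import os

image_dir = "/path/to/dataset"  # placeholder: point this at your images
paths = sorted(glob.glob(os.path.join(image_dir, "**", "*.jpg"), recursive=True))
with open("train.flist", "w") as f:
    f.write("\n".join(os.path.abspath(p) for p in paths))
```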
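
And a minimal sketch of the training-time preprocessing described above, written in the TensorFlow 1.x style of the tested setup; the [-1, 1] scaling is an assumption inferred from the (x + 1.) * 127.5 de-normalization that appears in net/nn.py (see the IndexError issue below):

```python
import tensorflow as tf  # tested setup is TensorFlow 1.12.0

def preprocess_train(image, dataset):
    """image: uint8 tensor of shape [H, W, 3]; returns a 256x256 training crop."""
    if dataset == 'celeba_hq':
        image = tf.image.resize_images(image, [266, 266])  # resize to 266x266 first
    image = tf.random_crop(image, [256, 256, 3])           # random 256x256 patch
    return tf.cast(image, tf.float32) / 127.5 - 1.0        # assumed [-1, 1] scaling
```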

Testing

  • Collect the testing set. For CelebA-HQ, we resize each image to 256x256. For Places2 and ImageNet, we crop a center 256x256 patch; see the first sketch after this list.
  • Collect the corresponding mask set (2D grayscale, where 0 indicates the known region and 255 indicates the missing region); the second sketch after this list generates such a mask.
  • Prepare the image file list and the mask file list as in training.
  • Modify checkpoints_dir, dataset, img_flist and mask_flist arguments in test.py.
  • Download the pre-trained model and put model.ckpt.meta, model.ckpt.index, model.ckpt.data-00000-of-00001 and checkpoint under model_logs/ directory.
  • Run python test.py.
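
A minimal sketch (an assumed helper, not part of the repository) of the test-set preparation described above:

```python
from PIL import Image

def prepare_test_image(path, dataset):
    """CelebA-HQ: resize to 256x256; Places2/ImageNet: center 256x256 crop."""
    img = Image.open(path).convert('RGB')
    if dataset == 'celeba_hq':
        return img.resize((256, 256), Image.BICUBIC)
    w, h = img.size
    left, top = (w - 256) // 2, (h - 256) // 2
    return img.crop((left, top, left + 256, top + 256))
```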
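
And a minimal sketch (again an assumed helper) that writes a mask in the required format; the 128x128 center hole matches the setting the center_mask models were trained with (see Pre-trained Models below):

```python
import numpy as np
from PIL import Image

mask = np.zeros((256, 256), dtype=np.uint8)  # 0 = known region
mask[64:192, 64:192] = 255                   # 255 = missing 128x128 center hole
Image.fromarray(mask, mode="L").save("center_mask.png")
```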

Pre-trained Models

Download the pre-trained models using the following links and put them under model_logs/ directory.

The center_mask models are trained on 256x256 images with center 128x128 holes. The random_mask models are trained with random regular and irregular holes.

Inference Time

One advantage of GAN-based and VAE-based methods is their fast inference speed. We measured that Mutual Encoder-Decoder with Feature Equalizations runs at 0.2 seconds per image on a single NVIDIA 1080 Ti GPU for 256×256 images, whereas our model takes 45 seconds per image. Naive sampling of our autoregressive network is the main source of computation time. Fortunately, this time can be reduced by an order of magnitude with an incremental sampling technique that caches and reuses intermediate states of the network; consider using it for faster inference.
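
The repository itself does not implement this caching. As a toy, self-contained illustration of the idea (plain NumPy; a stack of causal 1-D convolutions stands in for the real autoregressive network, and every name and size here is illustrative), the sketch below verifies that cached incremental sampling reproduces naive full recomputation:

```python
import numpy as np

rng = np.random.default_rng(0)
H, K, LAYERS, STEPS = 8, 3, 4, 32        # hidden size, kernel, depth, sample steps
W = [rng.standard_normal((K, H, H)) * 0.3 for _ in range(LAYERS)]

def forward_all(x):
    """Naive pass: recompute every position of every layer, O(T) work per step."""
    for w in W:
        pad = np.concatenate([np.zeros((K - 1, H)), x])   # causal zero-padding
        x = np.tanh(sum(pad[k:k + len(x)] @ w[k] for k in range(K)))
    return x

def incremental(caches, x_t):
    """Cached pass: compute only the newest position of every layer, O(1) work."""
    h = x_t
    for i, w in enumerate(W):
        window = np.concatenate([caches[i], h[None]])     # last K inputs to layer i
        caches[i] = window[1:]                            # slide the cache forward
        h = np.tanh(sum(window[k] @ w[k] for k in range(K)))
    return h

caches = [np.zeros((K - 1, H)) for _ in range(LAYERS)]
seq = np.zeros((0, H))
x_t = rng.standard_normal(H)
for _ in range(STEPS):
    seq = np.concatenate([seq, x_t[None]])
    y_naive = forward_all(seq)[-1]        # recomputes the entire prefix each step
    y_cached = incremental(caches, x_t)   # reuses cached intermediate states
    assert np.allclose(y_naive, y_cached)
    x_t = y_cached                        # feed the output back as the next input
print("incremental sampling matches naive recomputation")
```

Each incremental step touches only the newest position of each layer, so the per-step cost is independent of the sequence length; the same principle carries over to 2-D autoregressive models over feature maps.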

Comments
  • Mask images and file lists

    Thank you for sharing your great work!

    I want to run test.py using your pretrained models. Do I need to prepare the mask images and file lists by myself? If you have the code to generate them, would you mind sharing it?

    opened by naoki7090624 14
  • train_vqvae.py and train_texture_generator.py cannot be accelerated by GPU?

    Hi, thank you for your great work! When training the VQ-VAE network, the training time is very long. Could you tell me how to speed it up? Thanks for replying!

    opened by wwlCape 3
  • What's the effect of discrete textural features?

    Hi! Thanks for your great work.

    I wonder about the effect of the discrete textural features. It seems that these features do not take part in the image generation. Is the benefit of splitting structural and textural features better disentanglement?

    opened by 07hyx06 3
  • filelist

    I am a newcomer to this field and am very interested in your project. Can you tell me how the file list is generated? Could you publish your code? Thank you.

    opened by xiumin123 2
  • Assign requires shapes of both tensors to match. lhs shape= [256] rhs shape= [512]

    I use a dataset with 256x256 images and the pre-trained model you provided for testing. The program reports this error. Do you know what I did wrong? Thank you.

    opened by BigbigVegetable 2
  • image processing

    Could you please elaborate a little on the input image size and the mask size for the respective datasets? For example, the CelebA-HQ input image is 266x266 while the mask is written as 256x256, which seems to be wrong.

    Specifically, could you share the input image size for Places2, and the size of the mask?

    In training: https://github.com/USTC-JialunPeng/Diverse-Structure-Inpainting/blob/main/README.md#:~:text=Modify%20data/data_loader.py%20according%20to%20the%20dataset.%20For%20CelebA-HQ%2C%20we%20resize%20each%20image%20to%20266x266%20and%20randomly%20crop%20a%20256x256.%20For%20Places2%20and%20ImageNet%2C%20we%20randomly%20crop%20a%20256x256

    In testing: https://github.com/USTC-JialunPeng/Diverse-Structure-Inpainting/blob/main/README.md#:~:text=Collect%20the%20testing%20set.%20For%20CelebA-HQ%2C%20we%20resize%20each%20image%20to%20256x256.%20For%20Places2%20and%20ImageNet%2C%20we%20crop%20a%20center%20256x256.

    I have tested the pre-trained model and found the results to be very off; I attached two sample generated images.

    opened by jawadMansoor 2
  • Why concatenate a matrix of ones?

    Hi and first of all thanks for the great code!

    Can I ask why you stack a matrix of ones in the penultimate channel here? https://github.com/USTC-JialunPeng/Diverse-Structure-Inpainting/blob/467b7a9ea3b783c09ec5beb941085bd8e75de2d3/net/structure_generator.py#L30

    Wouldn't it make more sense to have x = tf.concat([x, ones_x*(1-mask), ones_x*mask], 3)? (i.e., you concatenate the mask as its one-hot encoding).

    Best, P.

    opened by pmorerio 2
  • Use of VQ-VAE in test.py

    I am wondering if there is a discrepancy between how inference is explained in the paper and how it is implemented in test.py. Specifically, the paper says "During inference, only Gs and Gt are used.", but the VQ-VAE model is loaded in test.py and used during inference (see here and here for example).

    I am new to TensorFlow and not super familiar with VQ-VAE, so I might be missing something, but since the VQ-VAE encoder gets the full image as input (see here), there might be information leakage from the full image to the inpainting module at inference time. Please correct me if I'm wrong.

    Thank you.

    opened by saeidnp 2
  • Crop or Resize

    Hi, you mention in the README: "For Places2 and ImageNet, we crop a center 256x256." It seems that in your test.py code, you resize all images to 256x256 and don't do any cropping. What did you do for the results in the paper?

    opened by samuro95 1
  • Colab Demo? Inference on Custom Images?

    Hello, thank you for your amazing implementation. Can you provide a demo file or a Colab notebook for custom images? It would be much appreciated and would help everyone.

    Thank you

    opened by Adeel-Intizar 1
  • Generating unmasked faces from masked faces

    Hi! I am using the model to generate new faces from masked faces. I have run test.py and modified the code a little to remove the face mask.

    The model is not performing well on CelebA masked faces. Kindly provide guidance on training for masked faces; do I need to pass masks?

    opened by chandniagarwal 23
  • IndexError during training

    Hello, when I use your code for training, an IndexError occurs in this part:

        for i in range(4):
            gt_i = ((gt[i] + 1.) * 127.5).astype(np.uint8)
            masked_i = ((masked[i] + 1.) * 127.5).astype(np.uint8)
            complete_i = ((complete[i] + 1.) * 127.5).astype(np.uint8)
            recons_gt_i = ((recons_gt[i] + 1.) * 127.5).astype(np.uint8)

    train_structure_generator.py:

        Traceback (most recent call last):
          File "train_structure_generator.py", line 366, in <module>
            nn.structure_visual(gt_np, masked_np, recons_gen_np, recons_gt_np, (i + 1), args.image_size, folder_path)
          File "D:\pythonProject\5_19\VQ_VAE\net\nn.py", line 173, in structure_visual
            gt_i = ((gt[i] + 1.) * 127.5).astype(np.uint8)
        IndexError: index 1 is out of bounds for axis 0 with size 1

    How many dimensions does the parameter gt have, and what data does it hold? Thank you.

    opened by CodeMadUser 2
  • Insufficient video memory

    While training the structure generator, I reduced batch_size to work around insufficient video memory, but this leads to an error when saving the training results: index 1 is out of bounds for axis 0 with size 1. I hope I can get your help. Thank you!

    opened by userLx888 5
  • GPU (A4000) does not work with the TF 1.12 version

    The paper is exciting! Thanks to the authors. The program runs perfectly on two 2080 GPUs, but the same versions (tensorflow-gpu=1.12.0, cuda=9.0, cudnn=7.6.5) do not work on two A4000s. Does anyone have a solution to this version problem?

    opened by plusleft 3