Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding (CVPR2022)

Overview

Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding

by Qiaole Dong*, Chenjie Cao*, Yanwei Fu

Paper and Supplemental Material (arXiv)

LICENSE

Pipeline


The overview of our ZITS. First, the TSR model is used to restore structures at low resolution. Then a simple CNN-based upsampler is leveraged to upsample the edge and line maps. Finally, the upsampled sketch space is encoded and added to the FTR through ZeroRA to restore the textures.
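
In code terms, the pipeline reads roughly as follows (an editor's sketch of the description above; the names are illustrative stand-ins, not the repository's actual API):

# conceptual sketch only: tsr/ssu/ftr stand for the three ZITS stages
def zits_inpaint(image, mask, tsr, ssu, ftr, encode_sketch):
    edge_lr, line_lr = tsr(image, mask)    # 1) restore structures at low resolution
    edge, line = ssu(edge_lr, line_lr)     # 2) CNN upsampler for the edge/line maps
    sketch = encode_sketch(edge, line)     # 3) encode the upsampled sketch space
    return ftr(image, mask, sketch)        # 4) FTR restores textures; sketch added via ZeroRA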

TO DO

We have updated the weights of TSR!

Our project page is available at https://dqiaole.github.io/ZITS_inpainting/.

  • Releasing inference code.
  • Releasing pre-trained models.
  • Releasing training code.

Preparation

  1. Preparing the environment:

    As there are some bugs when using GP loss with DDP (link), we strongly recommend installing Apex without CUDA extensions on top of torch 1.9.0 for multi-GPU training:

    conda create -n train_env python=3.6
    conda activate train_env
    pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
    pip install -r requirement.txt
    git clone https://github.com/NVIDIA/apex
    cd apex
    pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" ./
    
  2. For training, MST provides irregular and segmentation masks (download) with different masking rates, and you should define the mask file lists before training, as in MST; a minimal sketch follows.
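
    A minimal sketch of building such a list file (an editor's example, assuming one mask path per line, matching the 'irregular_mask_list.txt' / 'coco_mask_list.txt' names used in the training commands below):

    import glob
    import os

    mask_dir = '/path/to/irregular_masks'  # hypothetical mask folder
    paths = sorted(glob.glob(os.path.join(mask_dir, '*.png')))
    with open('irregular_mask_list.txt', 'w') as f:
        f.write('\n'.join(paths))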

  3. Download the pretrained masked wireframe detection model to the './ckpt' folder: LSM-HAWP (MST ICCV2021, retrained from HAWP CVPR2020).

  4. Prepare the wireframes:

    As MST trained LSM-HAWP with PyTorch 1.3.1, and this causes problems (link) when tested in PyTorch 1.9, we recommend inferring the lines (wireframes) with torch==1.3.1. If the line detection is not based on torch 1.3.1, the performance may drop slightly.

    conda create -n wireframes_inference_env python=3.6
    conda activate wireframes_inference_env
    pip install torch==1.3.1 torchvision==0.4.2
    pip install -r requirement.txt
    

    Then extract wireframes with the following command:

    python lsm_hawp_inference.py --ckpt_path <best_lsm_hawp.pth> --input_path <input image path> --output_path <output image path> --gpu_ids '0'
    
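    The wireframes are saved one file per input image. A minimal sketch for inspecting an output file (an editor's example, assuming lsm_hawp_inference.py stores pickle dicts of line endpoints and confidence scores; check the script for the exact format):

    import pickle

    # 'example.pkl' is a hypothetical output file under <output image path>
    with open('example.pkl', 'rb') as f:
        wireframes = pickle.load(f)
    print(wireframes.keys())  # expected: line endpoints and their scores
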
  5. If you need to train the model, please download the pretrained models for perceptual loss, provided by LaMa:

    mkdir -p ade20k/ade20k-resnet50dilated-ppm_deepsup/
    wget -P ade20k/ade20k-resnet50dilated-ppm_deepsup/ http://sceneparsing.csail.mit.edu/model/pytorch/ade20k-resnet50dilated-ppm_deepsup/encoder_epoch_20.pth
    
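    A quick sanity check that the download succeeded (an editor's sketch; the file is assumed to be a plain PyTorch state dict):

    import torch

    state = torch.load(
        'ade20k/ade20k-resnet50dilated-ppm_deepsup/encoder_epoch_20.pth',
        map_location='cpu')
    print(f'loaded {len(state)} entries')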

Eval


Download pretrained models on Places2 here.

Link for BaiduDrive, password: qnm5

Batch Test

For batch test, you need to complete steps 3 and 4 above.

Put the pretrained models into the './ckpt' folder, then modify the config file according to your image, mask, and wireframe paths, for example:
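
The checkpoint entries below appear in config_ZITS_places2.yml (they also surface in the issues at the end of this page); the data-path key names are illustrative placeholders, so use whatever keys your config file actually defines:

transformer_ckpt_path: './ckpt/best_transformer_places2.pth'
structure_upsample_path: './ckpt/StructureUpsampling.pth'
# placeholders below: substitute your config's real keys for these paths
test_image_path: '/path/to/images'
test_mask_path: '/path/to/masks'
test_wireframe_path: '/path/to/wireframes'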

Test on 256 images:

conda activate train_env
python FTR_inference.py --path ./ckpt/zits_places2 --config_file ./config_list/config_ZITS_places2.yml --GPU_ids '0'

Test on 512 images:

conda activate train_env
python FTR_inference.py --path ./ckpt/zits_places2_hr --config_file ./config_list/config_ZITS_HR_places2.yml --GPU_ids '0'

Single Image Test

Note: For single image test, the 'wireframes_inference_env' environment from step 4 is recommended for better line detection. This code only supports square images (non-square inputs will be center cropped).

conda activate wireframes_inference_env
python single_image_test.py --path <ckpt_path> --config_file <config_path> \
 --GPU_ids '0' --img_path ./image.png --mask_path ./mask.png --save_path ./
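
If your inputs are not square, you can center-crop them yourself first. A minimal sketch with PIL, mirroring the center-crop behavior noted above:

from PIL import Image

def center_crop_square(in_path, out_path):
    img = Image.open(in_path)
    side = min(img.size)                 # img.size is (width, height)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img.crop((left, top, left + side, top + side)).save(out_path)

center_crop_square('./image.png', './image_square.png')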

Training


⚠️ Warning: The training code has not been fully tested yet after refactoring.

Training TSR

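# first pass: train TSR for 12 epochs without masking positional encoding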
python TSR_train.py --name places2_continous_edgeline --data_path [training_data_path] \
 --train_line_path [training_wireframes_path] \
 --mask_path ['irregular_mask_list.txt', 'coco_mask_list.txt'] \
 --train_epoch 12 --validation_path [validation_data_path] \
 --val_line_path [validation_wireframes_path] \
 --valid_mask_path [validation_mask] --nodes 1 --gpus 1 --GPU_ids '0' --AMP
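# second pass: continue training to epoch 15 with --MaP (masking positional encoding) enabled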
python TSR_train.py --name places2_continous_edgeline --data_path [training_data_path] \
 --train_line_path [training_wireframes_path] \
 --mask_path ['irregular_mask_list.txt', 'coco_mask_list.txt'] \
 --train_epoch 15 --validation_path [validation_data_path] \
 --val_line_path [validation_wireframes_path] \
 --valid_mask_path [validation_mask] --nodes 1 --gpus 1 --GPU_ids '0' --AMP --MaP

Train SSU

We recommend using the pretrained SSU. You can also train your own SSU by referring to https://github.com/ewrfcas/StructureUpsampling.

Training LaMa First

python FTR_train.py --nodes 1 --gpus 1 --GPU_ids '0' --path ./ckpt/lama_places2 \
--config_file ./config_list/config_LAMA.yml --lama

Training FTR

256:

python FTR_train.py --nodes 1 --gpus 2 --GPU_ids '0,1' --path ./ckpt/places2 \
--config_file ./config_list/config_ZITS_places2.yml --DDP

256~512:

python FTR_train.py --nodes 1 --gpus 2 --GPU_ids '0,1' --path ./ckpt/places2_HR \
--config_file ./config_list/config_ZITS_HR_places2.yml --DDP

More 1K Results


Acknowledgments

Cite

If you found our program helpful, please consider citing:

@inproceedings{dong2022incremental,
      title={Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding}, 
      author={Qiaole Dong and Chenjie Cao and Yanwei Fu},
      booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
      year={2022}
}
Issues
  • Could you pre-upload a sample .pth file for direct debugging?

    config_ZITS_places2.yml

    transformer_ckpt_path: './ckpt/best_transformer_places2.pth'
    gen_weights_path0: './ckpt/lama_places2/InpaintingModel_gen.pth' # Not required at the time of eval
    dis_weights_path0: './ckpt/lama_places2/InpaintingModel_dis.pth' # Not required at the time of eval
    structure_upsample_path: './ckpt/StructureUpsampling.pth'

    D:\pm\python\inpaint\ZITS_inpainting-main\ZITS_inpainting-main\src\models\FTR_model.py

    data = torch.load(config.structure_upsample_path, map_location='cpu')
    

    Exception raised: AttributeError: 'NoneType' object has no attribute 'seek'. You can only torch.load from a file that is seekable. Please pre-load the data into a buffer like io.BytesIO and try to load from it instead.

    During handling of the above exception, another exception occurred:

    File "D:\pm\python\inpaint\ZITS_inpainting-main\ZITS_inpainting-main\src\models\FTR_model.py", line 165, in init data = torch.load(config.structure_upsample_path, map_location='cpu') File "D:\pm\python\inpaint\ZITS_inpainting-main\ZITS_inpainting-main\src\models\FTR_model.py", line 427, in init super().init(*args, gpu=gpu, name='InpaintingModel', rank=rank, test=test, **kwargs) File "D:\pm\python\inpaint\ZITS_inpainting-main\ZITS_inpainting-main\src\FTR_trainer.py", line 256, in init self.inpaint_model = DefaultInpaintingTrainingModule(config, gpu=gpu, rank=rank, test=test, **kwargs).to(gpu) File "D:\pm\python\inpaint\ZITS_inpainting-main\ZITS_inpainting-main\single_image_test.py", line 323, in model = ZITS(config, 0, 0, True)


    PS D:\pm\python\inpaint\ZITS_inpainting-main\ZITS_inpainting-main> & 'D:\pm\python\python38\python.exe' 'c:\Users\Administrator.vscode\extensions\ms-python.python-2022.4.1\pythonFiles\lib\python\debugpy\launcher' '40191' '--' 'd:\pm\python\inpaint\ZITS_inpainting-main\ZITS_inpainting-main\single_image_test.py' '--path=D:\pm\python\lama\LaMa_models\lama-places\lama-fourier\models' '--config_file=D:\pm\python\inpaint\ZITS_inpainting-main\ZITS_inpainting-main\config_list\config_ZITS_places2.yml' '--GPU_ids=-1' '--img_path=D:\pm\python\inpaint\ZITS_inpainting-main\ZITS_inpainting-main\imgs\y\i1.png' '--mask_path=D:\pm\python\inpaint\ZITS_inpainting-main\ZITS_inpainting-main\imgs\mask\i1.png' '--save_path=D:\pm\python\inpaint\ZITS_inpainting-main\ZITS_inpainting-main\imgs'
    Backend TkAgg is interactive backend. Turning interactive mode on.
    BaseInpaintingTrainingModule init called
    Loading InpaintingModel StructureUpsampling...

    opened by time888 4
  • ERROR: Could not find a version that satisfies the requirement torch==1.3.1

    Hi, you recommend inferring the wireframes with torch 1.3.1 in the README, but pip could not find that version.

    ERROR: Could not find a version that satisfies the requirement torch==1.3.1 (from versions: 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0)
    ERROR: No matching distribution found for torch==1.3.1
    

    How did you install the old version?

    opened by naoki7090624 4
  • Question about loss and activation function

    Hi, I have questions about activation function and loss.

    1. Why do you calculate the loss before the activation function? According to your code, the cross-entropy loss is calculated before the sigmoid function. In a typical CNN, I think the loss is calculated after the activation. Could you tell me why?

    2. Why do you use only the cross-entropy loss? According to your code, only the cross-entropy loss is calculated in the TSR. I wonder if you could use other losses (L1 loss, feature matching loss) after upsampling, since there are convolution layers after the transformer blocks.
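
    For context: computing the cross-entropy directly on logits is the numerically stable idiom in PyTorch, since the sigmoid is fused into the loss. A minimal sketch of the equivalence (an editor's example, independent of this repository's code):

    import torch
    import torch.nn.functional as F

    logits = torch.randn(2, 1, 8, 8)                     # raw outputs, no sigmoid
    targets = torch.randint(0, 2, (2, 1, 8, 8)).float()  # binary edge/line maps

    # stable: the sigmoid is fused into the loss internally
    stable = F.binary_cross_entropy_with_logits(logits, targets)
    # equivalent, but can lose precision for large |logits|
    naive = F.binary_cross_entropy(torch.sigmoid(logits), targets)
    assert torch.allclose(stable, naive, atol=1e-6)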

    opened by naoki7090624 4
  • Access to the pre-trained model

    Loved the paper! The results compared to LaMa are amazing. Can I have access to the lightest pre-trained model? (Benchmarking on mobile devices.)

    Best regards, Roi

    opened by roimulia2 4
  • How to make the Upsampling model?

    Thank you for sharing your great work!!

    I tried to train the network from scratch. TSR_train.py created best.pth, latest.pth and log.txt in the ckpt directory. After that, I tried to run FTR_train.py, but there was an error: FileNotFoundError: [Errno 2] No such file or directory: './ckpt/StructureUpsampling.pth'

    How to make the Upsampling model?

    opened by naoki7090624 3
  • Question about image size

    Thank you for sharing your great work.

    I am planning to integrate your pre-trained model into lama-cleaner. However, I noticed that the single_image_test.py script only supports square images. Is this a limitation of the network structure, or just a limitation of this test script?

    opened by Sanster 3
  • FileNotFoundError

    Hello,

    thanks a lot for your great work!

    When I run the single image test:

    conda activate wireframes_inference_env
    python single_image_test.py --path <ckpt_path> --config_file <config_path> \
     --GPU_ids '0' --img_path ./image.png --mask_path ./mask.png --save_path ./
    

    I get the error FileNotFoundError: [Errno 2] No such file or directory: '/home/wmlce/places365_standard/places2_all/train_list.txt'

    It is triggered when single_image_test.py is executing model = ZITS(config, 0, 0, True). This path is defined in the config files. Can the script be run without this file?

    Thank you!

    opened by yatsurama 2
  • About Transformer Block

    Thank you for sharing your great works!!

    I have two questions.

    1. Could you explain your transformer block in Fig. 2 of your paper? According to your code, the transformer block consists of the following layers: AxialAttention -> my_Block_2 (CausalSelfAttention + MLP). I think the first feed-forward refers to CausalSelfAttention and Vanilla Attention refers to the MLP, but what is the last feed-forward?

    2. What are the differences between your transformer block and the ICT transformer block?

    Thank you in advance.

    opened by naoki7090624 2
  • Question about Wireframe extraction difference Single vs Batch mode

    It seems that for Single Image Test, the wireframe extraction is done on masked images:
    https://github.com/DQiaole/ZITS_inpainting/blob/main/single_image_test.py#L173 (masking)
    https://github.com/DQiaole/ZITS_inpainting/blob/main/single_image_test.py#L194 (wireframe inference)
    https://github.com/DQiaole/ZITS_inpainting/blob/main/single_image_test.py#L219 (obj_remove is False, so lines_masked and scores_masked are used)

    Side question: when should obj_remove be used? (The code also calculates wireframes for the original image, but they are not used if obj_remove is False.)

    But for Batch Test, the wireframe extraction is done on the original images (it is recommended to precompute the wireframes): https://github.com/DQiaole/ZITS_inpainting#batch-test. Then the image, edge, and line maps are masked before being passed through the transformer: https://github.com/DQiaole/ZITS_inpainting/blob/main/src/utils.py#L273

    Q. I am not an expert in wireframe extraction, but wouldn't passing a masked image for wireframe extraction give different results from passing the full image and then masking? Or is it the same, so it doesn't matter? Ideally, for inpainting, we wouldn't have access to the original unmasked images and could not extract wireframes from them. Why this difference in implementation?

    opened by ani0075 1
  • Questions/Issues about batch test evaluation.

    Mask-image correspondence: if I am batch evaluating on a custom dataset, do I need the same number of masks as evaluation images? How does the code decide the correspondence between an image and a mask? Is there any naming convention? I have generated wireframes for all 1200 evaluation images, but the provided test masks number only 1000, so I am wondering whether the batch evaluation will work.

    Possible bug: at https://github.com/DQiaole/ZITS_inpainting/blob/main/src/FTR_trainer.py#L271, if test is True (batch eval), self.val_dataset is never created.

    Wouldn't https://github.com/DQiaole/ZITS_inpainting/blob/main/src/FTR_trainer.py#L443 then throw an error?

    opened by ani0075 1
  • single_image_test

    I am debugging single_image_test.py with Visual Studio Code. At config_path = os.path.join(args.path, 'config.yml'), the following exception is raised:

    TypeError: expected str, bytes or os.PathLike object, not NoneType
    File "D:\pm\python\inpaint\ZITS_inpainting-main\ZITS_inpainting-main\single_image_test.py", line 289, in <module>
        config_path = os.path.join(args.path, 'config.yml')

    opened by time888 1
  • Provide discriminator weights

    The config file contains this line: dis_weights_path0: './ckpt/lama_places2/InpaintingModel_dis.pth'

    Can you please provide these weights? Or point me to where I can get them?

    opened by ani0075 1
  • Finetuning on custom dataset

    Hi,

    I used your pre-trained models on my custom dataset and the inpainting results were not great. I was thinking of finetuning the pretrained weights on my custom dataset to improve inpainting quality.

    What should be the steps/commands? Which models do I need to finetune? TSR and FTR?

    It would be great if you could provide some suggestions.

    opened by ani0075 1