MAT: Mask-Aware Transformer for Large Hole Image Inpainting

Overview

MAT: Mask-Aware Transformer for Large Hole Image Inpainting (CVPR 2022, Oral)

Wenbo Li, Zhe Lin, Kun Zhou, Lu Qi, Yi Wang, Jiaya Jia

[Paper]


News

This is the official implementation of MAT. The training and testing code is released. We also provide our masks for CelebA-HQ-val and Places-val here.


Visualization

We present a transformer-based model (MAT) for large hole inpainting with high fidelity and diversity.

[Figure: large hole inpainting with pluralistic generation]

Compared to other methods, the proposed MAT restores more photo-realistic images with fewer artifacts.

[Figure: comparison with state-of-the-art methods]

Usage

  1. Clone the repository.
    git clone https://github.com/fenglinglwb/MAT.git 
  2. Install the dependencies.
    • Python 3.7
    • PyTorch 1.7.1
    • CUDA 11.0
    • Other packages
    pip install -r requirements.txt

Quick Test

  1. We provide models trained on CelebA-HQ and Places365-Standard at 512x512 resolution. Download the models from OneDrive and put them into the 'pretrained' directory. The released models were retrained, so the visualization results may differ slightly from the paper.

  2. Obtain inpainted results by running

    python generate_image.py --network model_path --dpath data_path --outdir out_path [--mpath mask_path]

    where the mask path is optional. If it is not given, random 512x512 masks will be generated. Note that 0 and 1 values in a mask refer to masked and retained pixels, respectively.

    For example, run

    python generate_image.py --network pretrained/CelebA-HQ.pkl --dpath test_sets/CelebA-HQ/images --mpath test_sets/CelebA-HQ/masks --outdir samples

    Note. Our implementation only supports generating images whose size is a multiple of 512. You need to pad or resize the image so that each side is a multiple of 512, and pad the mask with 0 values.
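The mask convention and padding rule above can be sketched as follows. This is a minimal numpy sketch under the stated assumptions, not a helper shipped with this repo: `pad_to_multiple` is a hypothetical name, and it follows the readme's convention that 0 marks masked (hole) pixels and 1 marks retained pixels.

```python
import numpy as np

def pad_to_multiple(image, mask, base=512):
    """Pad an (H, W, C) image and its (H, W) mask so that H and W become
    multiples of `base`. The image border is edge-replicated; the mask is
    padded with 0 values, as the readme asks, so the padded border is
    treated as masked."""
    h, w = image.shape[:2]
    ph, pw = (-h) % base, (-w) % base          # amount to reach the next multiple
    image = np.pad(image, ((0, ph), (0, pw), (0, 0)), mode="edge")
    mask = np.pad(mask, ((0, ph), (0, pw)), mode="constant", constant_values=0)
    return image, mask

# A 600x700 input becomes 1024x1024, the next multiple of 512 per side.
image = np.zeros((600, 700, 3), dtype=np.uint8)
mask = np.ones((600, 700), dtype=np.uint8)     # 1 = retained, 0 = hole
padded_image, padded_mask = pad_to_multiple(image, mask)
```

The padded mask region is all zeros, i.e. the model is asked to fill the border as well; crop the output back to the original size afterwards.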

Train

For example, to train a model on Places, run a script such as

python train.py \
    --outdir=output_path \
    --gpus=8 \
    --batch=32 \
    --metrics=fid36k5_full \
    --data=training_data_path \
    --data_val=val_data_path \
    --dataloader=datasets.dataset_512.ImageFolderMaskDataset \
    --mirror=True \
    --cond=False \
    --cfg=places512 \
    --aug=noaug \
    --generator=networks.mat.Generator \
    --discriminator=networks.mat.Discriminator \
    --loss=losses.loss.TwoStageLoss \
    --pr=0.1 \
    --pl=False \
    --truncation=0.5 \
    --style_mix=0.5 \
    --ema=10 \
    --lr=0.001

Description of arguments:

  • outdir: output path for saving logs and models
  • gpus: number of GPUs to use
  • batch: total number of images per batch across all GPUs
  • metrics: find more metrics in 'metrics/metric_main.py'
  • data: training data path
  • data_val: validation data path
  • dataloader: you can define your own dataloader
  • mirror: whether to use flip augmentation
  • cond: whether to use class info; default: false
  • cfg: configuration; find more details in 'train.py'
  • aug: whether to use the augmentation of StyleGAN2-ADA; default: noaug
  • generator: you can define your own generator
  • discriminator: you can define your own discriminator
  • loss: you can define your own loss
  • pr: ratio (weight) of the perceptual loss
  • pl: whether to use path length regularization; default: false
  • truncation: truncation ratio proposed in StyleGAN
  • style_mix: style mixing ratio proposed in StyleGAN
  • ema: exponential moving average of the weights, measured in ~K (thousands of) samples
  • lr: learning rate
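As a rough illustration of the ema argument, the sketch below shows how StyleGAN-style weight averaging is typically computed: the per-step decay is chosen so the running average spans about `ema` thousand samples. This is a hedged sketch of the usual StyleGAN2 heuristic, not code from this repo; `ema_beta` and `update_ema` are hypothetical names.

```python
def ema_beta(batch_size, ema_kimg=10.0):
    """Per-step decay so the running average spans roughly `ema_kimg`
    thousand images (matching --ema=10 above)."""
    return 0.5 ** (batch_size / (ema_kimg * 1000.0))

def update_ema(ema_params, params, beta):
    """One exponential-moving-average step over a dict of scalar weights."""
    for name, value in params.items():
        ema_params[name] = beta * ema_params[name] + (1.0 - beta) * value

# With batch=32 and ema=10, beta is very close to 1, so the averaged
# weights drift slowly and give a smoother generator for evaluation.
beta = ema_beta(batch_size=32, ema_kimg=10.0)
```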

Evaluation

We provide evaluation scripts for the FID/U-IDS/P-IDS/LPIPS/PSNR/SSIM/L1 metrics in the 'evaluation' directory. You only need to provide the paths to your results and the ground truths.
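For reference, the two simplest of these metrics can be computed with plain numpy. This is a sketch of the metric definitions only, not the repo's evaluation scripts (which also cover FID, U-IDS, P-IDS, LPIPS, and SSIM):

```python
import numpy as np

def l1_error(pred, gt):
    """Mean absolute error between a result and its ground truth (8-bit images)."""
    return np.mean(np.abs(pred.astype(np.float64) - gt.astype(np.float64)))

def psnr(pred, gt, max_val=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```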

Citation

@inproceedings{li2022mat,
    title={MAT: Mask-Aware Transformer for Large Hole Image Inpainting},
    author={Li, Wenbo and Lin, Zhe and Zhou, Kun and Qi, Lu and Wang, Yi and Jia, Jiaya},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    year={2022}
}

License and Acknowledgement

The code and models in this repo are for research purposes only. Our code is built upon StyleGAN2-ADA.

Issues
  • check_ddp_consistency error

    check_ddp_consistency error

    Traceback (most recent call last):
      File "/home/dmsheng/anaconda3/envs/lama/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
        fn(i, *args)
      File "/home/dmsheng/demo/image_inpainting/MAT/my_train.py", line 405, in subprocess_fn
        my_training_loop.training_loop(rank=rank, **args)
      File "/home/dmsheng/demo/image_inpainting/MAT/training/my_training_loop.py", line 404, in training_loop
        misc.check_ddp_consistency(module, ignore_regex=[r'.*\.w_avg', r'.*\.relative_position_index', r'.*\.avg_weight', r'.*\.attn_mask', r'.*\.resample_filter'])
      File "/home/dmsheng/demo/image_inpainting/MAT/torch_utils/misc.py", line 195, in check_ddp_consistency
        assert (nan_to_num(tensor) == nan_to_num(other)).all(), fullname
    AssertionError: Generator.synthesis.first_stage.conv_first.conv.weight

    Thanks for your great work! I have no idea what the 'check_ddp_consistency' function is for. Any ideas on how to solve this problem?

    opened by ImmortalSdm 8
  • Quick Test have some questions

    Quick Test have some questions

    Setting up PyTorch plugin "bias_act_plugin"... Failed!
    ..\torch_utils\ops\bias_act.py:50: UserWarning: Failed to build CUDA kernels for bias_act. Falling back to slow reference implementation. Details:

    Traceback (most recent call last):
      File "..\torch_utils\ops\bias_act.py", line 48, in _init
        _plugin = custom_ops.get_plugin('bias_act_plugin', sources=sources, extra_cuda_cflags=['--use_fast_math'])
      File "..\torch_utils\custom_ops.py", line 64, in get_plugin
        raise RuntimeError(f'Could not find MSVC/GCC/CLANG installation on this computer. Check _find_compiler_bindir() in "{file}".')
    RuntimeError: Could not find MSVC/GCC/CLANG installation on this computer. Check _find_compiler_bindir() in "..\torch_utils\custom_ops.py".

    Hello, I have not installed Visual Studio. What is the specific solution to this problem?

    opened by liuxingyu123 6
  • Test set for comparison

    Test set for comparison

    Hello, you reported results on CelebA-HQ at 256 × 256 size in Table F.3. what is your test set and how we can access it for comparison? How did you use Places (512 × 512) to train and test the model? @fenglinglwb

    opened by givkashi 5
  • error: assert (name in src_tensors) or (not require_all)

    error: assert (name in src_tensors) or (not require_all)

    Traceback (most recent call last):
      File "generate_image.py", line 155, in <module>
        generate_images() # pylint: disable=no-value-for-parameter
      File "/root/anaconda3/envs/yyy/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
        return self.main(*args, **kwargs)
      File "/root/anaconda3/envs/yyy/lib/python3.7/site-packages/click/core.py", line 1055, in main
        rv = self.invoke(ctx)
      File "/root/anaconda3/envs/yyy/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/root/anaconda3/envs/yyy/lib/python3.7/site-packages/click/core.py", line 760, in invoke
        return __callback(*args, **kwargs)
      File "/root/anaconda3/envs/yyy/lib/python3.7/site-packages/click/decorators.py", line 26, in new_func
        return f(get_current_context(), *args, **kwargs)
      File "generate_image.py", line 102, in generate_images
        copy_params_and_buffers(G_saved, G, require_all=True)
      File "generate_image.py", line 46, in copy_params_and_buffers
        assert (name in src_tensors) or (not require_all)
    AssertionError

    I fine-tuned the model on my own dataset and want to test it, but I get this error. Do you have any solutions?

    opened by yumengWang112 2
  • How can I use 512x512 pretrained model to inpaint 1024x1024 images?

    How can I use 512x512 pretrained model to inpaint 1024x1024 images?

    Hi! Thanks for the great work. Though I have some small questions, as you said in the readme:

    "Our implementation only supports generating an image whose size is a multiple of 512. You need to pad or resize the image to make its size a multiple of 512. Please pad the mask with 0 values."

    So I ran generate_image.py and set --resolution to 1024, with the 512x512 pretrained model you offered, but it does not seem to work. The error is below:

    File "MAT-main/networks/mat.py", line 20, in nf
        return NF[2 ** stage]
    KeyError: 1024

    It seems the model lacks parameters for 1024x1024 resolution. How can I solve this? Or rather, how can I use the 512x512 pretrained model to inpaint 1024x1024 images, as you said?

    opened by Lifedecoder 2
  • About evaluation

    About evaluation

    Hi, @fenglinglwb!

    Thank you for your sharing your nice code. Congratulations on CVPR22!

    I wonder how to quantitatively evaluate the generated images. Which images did you use: the raw generated images or the blended images?

    I succeeded in generating samples, but the generated images include some artifacts caused by input mask pixels.

    Left: a generated image. Right: a blended image using the input and generated images.

    opened by UdonDa 2
  • Questions about error and mask path

    Questions about error and mask path

    Thank you for sharing great works!!

    I am interested in your works, and I have two questions.

    1. I tried to train the network on a custom dataset, but I got the following index error. Do you have a solution?
    Setting up augmentation...
    Distributing across 1 GPUs...
    Setting up training phases...
    Downloading: "https://download.pytorch.org/models/vgg19-dcbb9e9d.pth" to /home/naoki/.cache/torch/hub/checkpoints/vgg19-dcbb9e9d.pth
    100%|##########| 548M/548M [00:26<00:00, 22.0MB/s]
    Exporting sample images...
    Initializing logs...
    Skipping tfevents export: No module named 'tensorboard'
    Training for 50000 kimg...
    
    tick 0     kimg 0.0      time 1m 47s       sec/tick 14.1    sec/kimg 1767.80 maintenance 92.5   cpumem 5.40   gpumem 37.61  augment 0.000
    Evaluating metrics...
    Traceback (most recent call last):
      File "train.py", line 648, in <module>
        main() # pylint: disable=no-value-for-parameter
      File "/home/naoki/.pyenv/versions/3.7.6/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
        return self.main(*args, **kwargs)
      File "/home/naoki/.pyenv/versions/3.7.6/lib/python3.7/site-packages/click/core.py", line 1055, in main
        rv = self.invoke(ctx)
      File "/home/naoki/.pyenv/versions/3.7.6/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/home/naoki/.pyenv/versions/3.7.6/lib/python3.7/site-packages/click/core.py", line 760, in invoke
        return __callback(*args, **kwargs)
      File "/home/naoki/.pyenv/versions/3.7.6/lib/python3.7/site-packages/click/decorators.py", line 26, in new_func
        return f(get_current_context(), *args, **kwargs)
      File "train.py", line 641, in main
        subprocess_fn(rank=0, args=args, temp_dir=temp_dir)
      File "train.py", line 471, in subprocess_fn
        training_loop.training_loop(rank=rank, **args)
      File "/home/naoki/MAT/training/training_loop.py", line 418, in training_loop
        dataset_kwargs=val_set_kwargs, num_gpus=num_gpus, rank=rank, device=device)
      File "/home/naoki/MAT/metrics/metric_main.py", line 47, in calc_metric
        results = _metric_dict[metric](opts)
      File "/home/naoki/MAT/metrics/metric_main.py", line 93, in fid36k5_full
        fid = frechet_inception_distance.compute_fid(opts, max_real=36500, num_gen=36500)
      File "/home/naoki/MAT/metrics/frechet_inception_distance.py", line 31, in compute_fid
        rel_lo=0, rel_hi=1, capture_mean_cov=True, max_items=num_gen).get_mean_cov()
      File "/home/naoki/MAT/metrics/metric_utils.py", line 273, in compute_feature_stats_for_generator
        **data_loader_kwargs):
      File "/home/naoki/.pyenv/versions/3.7.6/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
        data = self._next_data()
      File "/home/naoki/.pyenv/versions/3.7.6/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
        return self._process_data(data)
      File "/home/naoki/.pyenv/versions/3.7.6/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
        data.reraise()
      File "/home/naoki/.pyenv/versions/3.7.6/lib/python3.7/site-packages/torch/_utils.py", line 428, in reraise
        raise self.exc_type(msg)
    IndexError: Caught IndexError in DataLoader worker process 1.
    Original Traceback (most recent call last):
      File "/home/naoki/.pyenv/versions/3.7.6/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
        data = fetcher.fetch(index)
      File "/home/naoki/.pyenv/versions/3.7.6/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/naoki/.pyenv/versions/3.7.6/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/naoki/MAT/datasets/dataset_512.py", line 265, in __getitem__
        image = self._load_raw_image(self._raw_idx[idx])
    IndexError: index 2032 is out of bounds for axis 0 with size 2032
    
    2. How can I specify the mask path for training?
    opened by naoki7090624 2
  • module 'torch' has no attribute 'Assert'

    module 'torch' has no attribute 'Assert'

    Hello, I ran into the following error while running the code. Do you know what is going on?

    File "MAT-main\torch_utils\misc.py", line 64, in <module>
        symbolic_assert = torch.Assert # 1.7.0
    AttributeError: module 'torch' has no attribute 'Assert'

    opened by pangmaoran 1
  • CUDA issue

    CUDA issue

    Hi, I tried to run the pretrained model on my own dataset. However, I got this error:

    Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

    when running : python generate_image.py --network pretrained/CelebA-HQ.pkl --dpath test_sets/CelebA-HQ/images --mpath test_sets/CelebA-HQ/masks --outdir samples

    What should I do?

    opened by Iraanol 1
  • About the mirror

    About the mirror

    Thank you for your great work, but I still have a question: when training, what should I do if I don't want to use mirrored images? I've set mirror to False.

    opened by song201216 1
  • Setting up PyTorch plugin

    Setting up PyTorch plugin "upfirdn2d_plugin"... Failed!

    I ran the test code on Linux and on Windows and hit the same problem:

    Traceback (most recent call last):
      File "/data1/mingqi/MAT-main/torch_utils/ops/upfirdn2d.py", line 32, in _init
        _plugin = custom_ops.get_plugin('upfirdn2d_plugin', sources=sources, extra_cuda_cflags=['--use_fast_math'])
      File "/data1/mingqi/MAT-main/torch_utils/custom_ops.py", line 110, in get_plugin
        torch.utils.cpp_extension.load(name=module_name, verbose=verbose_build, sources=sources, **build_kwargs)
      File "/home/mingqi/.conda/envs/Mat/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 997, in load
        keep_intermediates=keep_intermediates)
      File "/home/mingqi/.conda/envs/Mat/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1213, in _jit_compile
        return _import_module_from_library(name, build_directory, is_python_module)
      File "/home/mingqi/.conda/envs/Mat/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1560, in _import_module_from_library
        file, path, description = imp.find_module(module_name, [path])
      File "/home/mingqi/.conda/envs/Mat/lib/python3.7/imp.py", line 296, in find_module
        raise ImportError(_ERR_MSG.format(name), name=name)
    ImportError: No module named 'upfirdn2d_plugin'

    warnings.warn('Failed to build CUDA kernels for upfirdn2d. Falling back to slow reference implementation. Details:\n\n' + traceback.format_exc())

    opened by mingqizhang 1
  • Issues with the network

    Issues with the network

    Thanks for sharing your great work!

    I'm researching image inpainting solutions and find MAT the best I've seen so far, though there are some issues I'd like to point out:

    1. The network only supports square 512px input images, while other solutions like LaMa support any image aspect ratio.
    2. The face model doesn't give any output; only the Places model works.
    opened by ofirkris 1
  • Could you provide a Colab demo?

    Could you provide a Colab demo?

    My local environment throws a pile of errors, so I gave up. I hope a minimal runnable demo can be provided.

    ImportError: DLL load failed while importing upfirdn2d_plugin: The specified module could not be found.
    warnings.warn('Failed to build CUDA
    python generate_image.py --network pretrained/CelebA-HQ_512.pkl --dpath test_sets/CelebA-HQ/images --mpath test_sets/CelebA-HQ/masks --outdir
    
    opened by Baiyuetribe 1