The official repository for the paper "Uformer: A General U-Shaped Transformer for Image Restoration".

Zhendong Wang

Last update: Dec 22, 2022

Overview

Uformer: A General U-Shaped Transformer for Image Restoration

Zhendong Wang, Xiaodong Cun, Jianmin Bao and Jianzhuang Liu

Paper: https://arxiv.org/abs/2106.03106

Update:

2021.08.19 Released a pre-trained model (Uformer32). Added a script for FLOP/GMAC calculation.
2021.07.29 Added a script for testing the pre-trained model on arbitrary-resolution images.

In this paper, we present Uformer, an effective and efficient Transformer-based architecture, in which we build a hierarchical encoder-decoder network using the Transformer block for image restoration. Uformer has two core designs to make it suitable for this task. The first key element is a local-enhanced window Transformer block, where we use non-overlapping window-based self-attention to reduce the computational requirement and employ the depth-wise convolution in the feed-forward network to further improve its potential for capturing local context. The second key element is that we explore three skip-connection schemes to effectively deliver information from the encoder to the decoder. Powered by these two designs, Uformer enjoys a high capability for capturing useful dependencies for image restoration. Extensive experiments on several image restoration tasks demonstrate the superiority of Uformer, including image denoising, deraining, deblurring and demoireing. We expect that our work will encourage further research to explore Transformer-based architectures for low-level vision tasks.
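To make the cost argument for non-overlapping windows concrete, the sketch below counts query-key pairs for global self-attention versus window self-attention on the same feature map. The function names and the 8x8 window size are illustrative assumptions, not values taken from this README.

```python
def global_attention_pairs(height: int, width: int) -> int:
    """Query-key pairs when every token attends to every other token."""
    tokens = height * width
    return tokens * tokens


def window_attention_pairs(height: int, width: int, window: int) -> int:
    """Query-key pairs when attention stays inside non-overlapping windows.

    Assumes height and width are divisible by the window size.
    """
    if height % window or width % window:
        raise ValueError("feature map must be divisible by the window size")
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    return num_windows * tokens_per_window * tokens_per_window


if __name__ == "__main__":
    h = w = 128  # same size as the 128x128 training patches (--train_ps 128)
    win = 8      # assumed window size, for illustration only
    g = global_attention_pairs(h, w)
    l = window_attention_pairs(h, w, win)
    print(f"global: {g}, windowed: {l}, ratio: {g // l}")  # ratio: 256
```

The pair count for global attention grows quadratically with the number of pixels, while the windowed count grows linearly for a fixed window size.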

Package dependencies

The project is built with PyTorch 1.7.1, Python 3.7, and CUDA 10.1. You can install the package dependencies with:

pip3 install -r requirements.txt

Pretrained model

uformer32_denoising_sidd.pth [Google Drive]: PSNR 39.77 dB.

Data preparation

Denoising

For the SIDD training data, download the SIDD-Medium dataset from the official url. Then generate training patches by:

python3 generate_patches_SIDD.py --src_dir ../SIDD_Medium_Srgb/Data --tar_dir ../datasets/denoising/sidd/train
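As a rough idea of what patch generation involves, the sketch below samples random square crop boxes that can be applied to a noisy image and its clean counterpart so the pair stays aligned. This is not the code in generate_patches_SIDD.py; the function name, the seed handling, and the box format are assumptions made for illustration.

```python
import random


def sample_patch_boxes(height, width, patch_size, num_patches, seed=0):
    """Return (top, left, bottom, right) boxes for random square crops.

    The same box should be used for the noisy and the clean image of a pair.
    """
    if patch_size > height or patch_size > width:
        raise ValueError("patch_size must fit inside the image")
    rng = random.Random(seed)  # fixed seed keeps the sampling reproducible
    boxes = []
    for _ in range(num_patches):
        top = rng.randint(0, height - patch_size)
        left = rng.randint(0, width - patch_size)
        boxes.append((top, left, top + patch_size, left + patch_size))
    return boxes


if __name__ == "__main__":
    # Hypothetical image size and patch count, chosen only for the demo.
    for box in sample_patch_boxes(3000, 5328, 256, num_patches=3):
        print(box)
```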

For evaluation, we use the same evaluation data as here, and put it into the dir ../datasets/denoising/sidd/val.

Training

Denoising

To train Uformer32 (embed_dim=32) on SIDD, we use 2 V100 GPUs and run for 250 epochs:

python3 ./train.py --arch Uformer --batch_size 32 --gpu '0,1' \
    --train_ps 128 --train_dir ../datasets/denoising/sidd/train --env 32_0705_1 \
    --val_dir ../datasets/denoising/sidd/val --embed_dim 32 --warmup
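The README does not describe what the --warmup flag does internally. One common pattern, shown here purely as an assumption, is a few epochs of linear learning-rate warmup followed by cosine decay. Only the 250-epoch total comes from the text above; the warmup length and learning rates are hypothetical.

```python
import math


def lr_at_epoch(epoch, total_epochs=250, warmup_epochs=3,
                base_lr=2e-4, min_lr=1e-6):
    """Linear warmup followed by cosine decay (one common choice).

    warmup_epochs, base_lr and min_lr are illustrative values, not the
    settings used by train.py.
    """
    if epoch < warmup_epochs:
        # Ramp up linearly so the last warmup epoch reaches base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for e in (0, 1, 2, 3, 125, 250):
        print(e, lr_at_epoch(e))
```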

More configurations can be found in train.sh.

Evaluation

Denoising

To evaluate Uformer32 on SIDD, you can run:

python3 ./test.py --arch Uformer --batch_size 1 --gpu '0' \
    --input_dir ../datasets/denoising/sidd/val --result_dir YOUR_RESULT_DIR \
    --weights YOUR_PRETRAINED_MODEL_PATH --embed_dim 32
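Results on SIDD are reported as PSNR in dB. The sketch below shows the standard PSNR definition on flat sequences of pixel values. It is a minimal illustration, assuming pixel values scaled to [0, 1]; how test.py aggregates the metric over images is not stated here, and the function name is hypothetical.

```python
import math


def psnr(restored, target, max_value=1.0):
    """PSNR in dB between two equally sized flat sequences of pixel values."""
    if len(restored) != len(target) or not restored:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - t) ** 2 for r, t in zip(restored, target)) / len(restored)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    clean = [0.2, 0.4, 0.6, 0.8]
    noisy = [0.21, 0.38, 0.61, 0.79]
    print(f"{psnr(noisy, clean):.2f} dB")
```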

Computational Cost

We provide a simple script in model.py that calculates the FLOPs manually. You can change the configuration and run it via:

python3 model.py

The manual calculation of GMACs in this repo differs slightly from the numbers in the main paper, but the difference does not affect the conclusion. We will correct the paper later.
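For readers unfamiliar with manual MAC counting, the sketch below gives the usual per-layer formulas for a 2D convolution (including the depth-wise case) and a linear layer. It is not the script in model.py; the helper names and the example layer shapes are assumptions.

```python
def conv2d_macs(in_ch, out_ch, kernel, out_h, out_w, groups=1):
    """Multiply-accumulates for a 2D convolution (bias ignored)."""
    return (in_ch // groups) * out_ch * kernel * kernel * out_h * out_w


def linear_macs(in_features, out_features, tokens):
    """Multiply-accumulates for a linear layer applied to `tokens` vectors."""
    return in_features * out_features * tokens


def to_gmacs(macs):
    """Convert a raw MAC count to GMACs."""
    return macs / 1e9


if __name__ == "__main__":
    # Hypothetical layers on a 256x256 input with 32 channels.
    standard = conv2d_macs(3, 32, 3, 256, 256)
    depthwise = conv2d_macs(32, 32, 3, 256, 256, groups=32)
    projection = linear_macs(32, 32, tokens=256 * 256)
    print(to_gmacs(standard), to_gmacs(depthwise), to_gmacs(projection))
```

A depth-wise convolution sets groups equal to the channel count, so each output channel reads a single input channel and the cost drops by that factor.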

Citation

If you find this project useful in your research, please consider citing:

@article{wang2021uformer,
	title={Uformer: A General U-Shaped Transformer for Image Restoration},
	author={Wang, Zhendong and Cun, Xiaodong and Bao, Jianmin and Liu, Jianzhuang},
	journal={arXiv preprint 2106.03106},
	year={2021}
}

Acknowledgement

This code borrows heavily from MIRNet and SwinTransformer.

Contact

Please contact us if you have any questions or suggestions (Zhendong Wang [email protected], Xiaodong Cun [email protected]).

Comments

SIDD Benchmark Issue: I get a PSNR of 39.49 dB rather than 39.89 dB

Thank you for the nice code! I use the Uformer-32 model and match the 39.77dB on the validation srgb dataset for SIDD. However, I get a result of 39.49dB from the SIDD server on the benchmark srgb dataset. Would you mind releasing the script you use to create your submission file from the benchmarking data?

opened by gauenk 13

TypeError: forward() takes 2 positional arguments but 3 were given

test_in_any_resolution.py raises an error. Here is the full stack trace:

Traceback (most recent call last):
  File "/content//Uformer/test_in_any_resolution.py", line 104, in <module>
    rgb_restored = model_restoration(rgb_noisy, 1 - mask)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
TypeError: forward() takes 2 positional arguments but 3 were given

I was trying this with my custom data, but its shape is the same as the SIDD data, so I can't understand why it fails.

opened by parkhy0106 8
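The TypeError in the issue above is about the call signature rather than the data shape: self counts as the first positional argument, so a forward(self, x) method receives three arguments when called with an image and a mask. The sketch below reproduces this in plain Python. TinyModule, ImageOnlyModel and MaskAwareModel are hypothetical stand-ins, and whether the repository's model accepts a mask depends on the checked-out version, which is not confirmed here.

```python
class TinyModule:
    """Stand-in for torch.nn.Module: calling the object dispatches to forward()."""

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)


class ImageOnlyModel(TinyModule):
    def forward(self, x):  # accepts only the image
        return x


class MaskAwareModel(TinyModule):
    def forward(self, x, mask=None):  # the mask is optional
        if mask is None:
            return x
        return [value * m for value, m in zip(x, mask)]


if __name__ == "__main__":
    image, mask = [1.0, 2.0], [1, 0]
    try:
        ImageOnlyModel()(image, mask)  # self + image + mask = 3 arguments
    except TypeError as err:
        print("TypeError:", err)
    print(MaskAwareModel()(image, mask))
```

If the model's forward takes only the image, either call it without the mask or use a model definition whose forward accepts one.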

How do you compute Flops of uformer16 and 32?

I used the same package as your code. (from ptflops import get_model_complexity_info)

And got 2.51 GMac, 5.15 M for Uformer16, 9.98 GMac 20.47 M for Uformer32.

It seems that I need to define some ops for this model. Can you provide a solution or the relevant code for computing FLOPs and Params?

Thanks!

opened by leonmakise 4

Which SIDD data do you use exactly?

Hi, you use the SIDD-Medium sRGB data, right? There are mirror 1 and mirror 2, so which mirror do you use? And what is the difference between mirror 1 and mirror 2?

opened by JZPeterPan 3

Using pretrained weights raises a RuntimeError

Hello! Thanks for your dedication. I trained Uformer on SIDD with 2 V100 GPUs as you suggested. I stopped after nearly 69 epochs and got a weight file. I validated it and it performs well. But when I tried to fine-tune on 2 V100 GPUs, I added the --resume command-line flag.

At train.py line 169, loss_scaler(loss, optimizer, parameters=model_restoration.parameters()), I get a RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu

I don't know how to solve this. Have you met this problem? Thanks!

opened by shiguangliuguo 3

How to test an image whose resolution is not (256,256)?

Hi there, thanks for providing the script test_in_any_resolution.py. In this script, an image of arbitrary size is processed by the expand2square function, but the resulting size still cannot be fed into the Uformer model.

So I wonder whether this network cannot process sizes other than (256,256). If I want to denoise an image of arbitrary size, do I have to resize it to (256,256)? Thanks!
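One way to handle arbitrary sizes without resizing is to pad the image onto a square canvas whose side is a multiple of some factor, run the model, then crop the original region back out. The sketch below only computes the geometry. It is not the script's expand2square function, and the factor of 128 is an assumed value for illustration.

```python
import math


def square_padding(height, width, factor=128):
    """Return (side, top, left) for centering an image on a padded square.

    `side` is the smallest multiple of `factor` that covers the longer edge.
    """
    side = int(math.ceil(max(height, width) / factor) * factor)
    top = (side - height) // 2
    left = (side - width) // 2
    return side, top, left


def crop_box(height, width, factor=128):
    """Box (top, left, bottom, right) that recovers the original region."""
    _, top, left = square_padding(height, width, factor)
    return top, left, top + height, left + width


if __name__ == "__main__":
    print(square_padding(300, 500))  # (512, 106, 6)
    print(crop_box(300, 500))        # (106, 6, 406, 506)
```

A binary mask that is 1 inside the crop box and 0 in the padded border can be built from the same offsets if the model expects one.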

opened by shiguangliuguo 2

About the number of training epochs

Hi friend, Uformer is interesting. The paper reports that you train Uformer_16 for 250 epochs with batch size 32 to reach 39.66 PSNR on SIDD. How many iterations are there in the training phase? What is the PSNR after 40 epochs? I just want to reproduce this result in a short time.
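The iteration count follows from the dataset size, the batch size, and the number of epochs. The sketch below is a rough calculation, assuming about 90,000 training patches (an estimate quoted in another comment on this page, not an official figure) and that the last partial batch of each epoch is kept.

```python
import math


def total_iterations(num_patches, batch_size, epochs):
    """Return (iterations per epoch, total iterations)."""
    iters_per_epoch = math.ceil(num_patches / batch_size)
    return iters_per_epoch, iters_per_epoch * epochs


if __name__ == "__main__":
    # 90,000 patches is an assumed dataset size; 32 and 250 come from the
    # training command in this README.
    print(total_iterations(90_000, 32, 250))  # (2813, 703250)
    print(total_iterations(90_000, 32, 40))   # (2813, 112520)
```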

opened by Rookielike 2

Typo? token_projection is not referenced there.

For example, in this line, token_embed affects the choice of linear, linear_concat and conv. https://github.com/ZhendongWang6/Uformer/blob/f96302885bb1734857c6f09032f8ddde073b103a/utils/model_utils.py#L64

opened by leonmakise 1

Why is the performance SOTA when the dataset is small?

Hi, thanks for your meaningful work. Previous work on Transformers in vision shares a common view that a Transformer needs a huge dataset to perform well.

In this work, you train the network only on SIDD patches, about 90,000 patches, while other works train their Transformers on nearly 1,000,000.

So, can you explain the reason? Or can I say that Transformers actually do not need that much data?

opened by shiguangliuguo 1

Support arbitrary input resolution?

Hi, your work is very inspiring!

I couldn't find in your paper how you apply Uformer during inference. For example, on SIDD, the training patches are 128x128 and the evaluation patches are 256x256. Did you apply your network directly to the whole 256x256 patch, or in a sliding-window fashion? In other words, does Uformer support arbitrary input resolution?

opened by vztu 1

I would like to ask, have you encountered this kind of error? AttributeError: partially initialized module 'torch' has no attribute 'cuda' (most likely due to a circular import)

opened by Smile-QT 0

Question about the defocus deblurring results

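Hello, Table 3 of your paper shows that your method achieves 26.28 dB PSNR on DPDD. Did you use the combined image or the dual-pixel images as input? The paper does not say. Also, why does Table 3 of the Restormer paper show Uformer reaching only 25.66 dB PSNR with dual-pixel input?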

opened by TPZZZ 0

Asking for the code of SPAIR

Hi. I tried to understand SPAIR's algorithm through its source code, but I found that its code is currently not available. I saw that SPAIR was compared in your paper. If you have the source code, would you please share it with me? Looking forward to your reply.

opened by wdhudiekou 0

Log Files from Training

Hello,

Thank you for your awesome code!

I am hoping you might open-source the log files you have from training. Maybe the training and validation loss as a function of epoch (and/or batch) with an estimate of the runtime?

opened by gauenk 0

Why are the results of training motion deblurring from scratch poor?

I followed the parameter settings in train_motiondeblur.sh and used the GoPro dataset, but the test results are far worse than those of your released model (https://mailustceducn-my.sharepoint.com/:u:/g/personal/zhendongwang_mail_ustc_edu_cn/EfCPoTSEKJRAshoE6EAC_3YB7oNkbLUX6AUgWSCwoJe0oA?e=jai90x). Is there a problem with the training parameter settings, or is there some other reason? As shown below, the left is your model's result and the right is my model's result. I hope you can reply. Thanks a lot! @vinthony @ZhendongWang6

opened by aiaini66 0
