The official repository for the paper "Uformer: A General U-Shaped Transformer for Image Restoration".

Overview

Uformer: A General U-Shaped Transformer for Image Restoration

Zhendong Wang, Xiaodong Cun, Jianmin Bao and Jianzhuang Liu

Paper: https://arxiv.org/abs/2106.03106

Update:

  • 2021.08.19 Released a pre-trained model (Uformer32) and added a script for FLOP/GMAC calculation.
  • 2021.07.29 Added a script for testing the pre-trained model on arbitrary-resolution images.

In this paper, we present Uformer, an effective and efficient Transformer-based architecture, in which we build a hierarchical encoder-decoder network using the Transformer block for image restoration. Uformer has two core designs to make it suitable for this task. The first key element is a local-enhanced window Transformer block, where we use non-overlapping window-based self-attention to reduce the computational requirement and employ depth-wise convolution in the feed-forward network to further improve its potential for capturing local context. The second key element is that we explore three skip-connection schemes to effectively deliver information from the encoder to the decoder. Powered by these two designs, Uformer enjoys a high capability for capturing useful dependencies for image restoration. Extensive experiments on several image restoration tasks, including image denoising, deraining, deblurring and demoireing, demonstrate the superiority of Uformer. We expect that our work will encourage further research to explore Transformer-based architectures for low-level vision tasks.
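To make the idea of non-overlapping window-based self-attention concrete, here is a minimal pure-Python sketch of how a feature map can be split into windows, so that attention is computed only among the tokens inside each window. The function name `window_partition`, the nested-list representation, and the toy 4 x 4 map are assumptions for illustration; this is not the repository's implementation.

```python
from typing import List

Grid = List[List[int]]


def window_partition(feature_map: Grid, window_size: int) -> List[Grid]:
    """Split an H x W grid into non-overlapping window_size x window_size tiles.

    Assumes H and W are multiples of window_size (a simplification made here).
    """
    height = len(feature_map)
    width = len(feature_map[0])
    if height % window_size or width % window_size:
        raise ValueError("height and width must be multiples of window_size")

    windows = []
    for top in range(0, height, window_size):
        for left in range(0, width, window_size):
            # Take window_size rows, then window_size columns from each row.
            window = [row[left:left + window_size]
                      for row in feature_map[top:top + window_size]]
            windows.append(window)
    return windows


# A toy 4 x 4 map of token indices split into 2 x 2 windows.
tokens = [[r * 4 + c for c in range(4)] for r in range(4)]
windows = window_partition(tokens, window_size=2)
print(len(windows))  # 4
print(windows[0])    # [[0, 1], [4, 5]]
```

Each token attends only to the other tokens in its own window, so the cost of attention grows with the window area instead of the full image area.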

Uformer

Details

Package dependencies

The project is built with PyTorch 1.7.1, Python 3.7 and CUDA 10.1. You can install the package dependencies with:

pip3 install -r requirements.txt

Pretrained model

Data preparation

Denoising

For the SIDD training data, download the SIDD-Medium dataset from the official url. Then generate training patches with:

python3 generate_patches_SIDD.py --src_dir ../SIDD_Medium_Srgb/Data --tar_dir ../datasets/denoising/sidd/train
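As a rough illustration of what patch generation for a paired denoising dataset involves, the sketch below samples random crop positions and applies the same position to a noisy image and its clean counterpart so that each pair stays aligned. The helper names (`sample_patch_origins`, `crop`), the patch size, the patch count, and the toy data are all hypothetical; the actual behavior of generate_patches_SIDD.py may differ.

```python
import random
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[int]]


def sample_patch_origins(height: int, width: int, patch_size: int,
                         num_patches: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Return (top, left) corners of random patch_size x patch_size crops."""
    if patch_size > height or patch_size > width:
        raise ValueError("patch_size must not exceed the image size")
    rng = random.Random(seed)
    return [(rng.randint(0, height - patch_size),
             rng.randint(0, width - patch_size))
            for _ in range(num_patches)]


def crop(image: Image, top: int, left: int, patch_size: int) -> List[List[int]]:
    """Cut a square patch out of a nested-list image."""
    return [list(row[left:left + patch_size])
            for row in image[top:top + patch_size]]


# Toy 6 x 8 "clean" image and a "noisy" copy offset by +100 (made-up data).
clean = [[r * 8 + c for c in range(8)] for r in range(6)]
noisy = [[value + 100 for value in row] for row in clean]

for top, left in sample_patch_origins(6, 8, patch_size=4, num_patches=3):
    noisy_patch = crop(noisy, top, left, 4)
    clean_patch = crop(clean, top, left, 4)
    # The same origin is used for both images, so the pair stays aligned.
    assert all(n - c == 100
               for noisy_row, clean_row in zip(noisy_patch, clean_patch)
               for n, c in zip(noisy_row, clean_row))
```

Fixing the seed makes the sampled positions reproducible, which helps when comparing training runs.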

For evaluation, we use the same evaluation data as here. Place it in the directory ../datasets/denoising/sidd/val.

Training

Denoising

To train Uformer32 (embed_dim=32) on SIDD, we use 2 V100 GPUs and run for 250 epochs:

python3 ./train.py --arch Uformer --batch_size 32 --gpu '0,1' \
    --train_ps 128 --train_dir ../datasets/denoising/sidd/train --env 32_0705_1 \
    --val_dir ../datasets/denoising/sidd/val --embed_dim 32 --warmup

More configurations can be found in train.sh.
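The --warmup flag in the command above suggests that the learning rate is warmed up at the start of training. The exact schedule is not described here, so the following is only a generic sketch of one common choice: linear warmup followed by cosine decay. The function name, the 3 warmup epochs and the base learning rate of 2e-4 are assumptions for illustration; only the 250 epochs come from the command above.

```python
import math


def warmup_cosine_lr(epoch: int, total_epochs: int, warmup_epochs: int,
                     base_lr: float, min_lr: float = 0.0) -> float:
    """Learning rate for a 0-based epoch: linear warmup, then cosine decay."""
    if epoch < warmup_epochs:
        # Ramp linearly up to base_lr over the warmup epochs.
        return base_lr * (epoch + 1) / warmup_epochs
    # Fraction of the post-warmup phase that has elapsed, in [0, 1).
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


base_lr = 2e-4  # hypothetical value, not taken from this README
schedule = [warmup_cosine_lr(epoch, total_epochs=250, warmup_epochs=3, base_lr=base_lr)
            for epoch in range(250)]
print(schedule[0], schedule[2], schedule[-1])
```

Warmup is often used with Transformer-style models because large early updates can destabilize training; check train.py and train.sh for the schedule this project actually uses.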

Evaluation

Denoising

To evaluate Uformer32 on SIDD, you can run:

python3 ./test.py --arch Uformer --batch_size 1 --gpu '0' \
    --input_dir ../datasets/denoising/sidd/val --result_dir YOUR_RESULT_DIR \
    --weights YOUR_PRETRAINED_MODEL_PATH --embed_dim 32 
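Denoising results on SIDD are usually reported as PSNR in dB. As a reference for interpreting the numbers, here is a minimal sketch of the standard PSNR definition on flattened pixel values. It assumes intensities scaled to [0, 1]; the function name and the toy values are illustrative, and test.py may compute the metric differently (for example per image or with a different data range).

```python
import math
from typing import Sequence


def psnr(restored: Sequence[float], target: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized signals."""
    if len(restored) != len(target) or len(restored) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(restored, target)) / len(restored)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


target = [0.0, 0.5, 1.0]
restored = [0.1, 0.5, 0.9]
print(round(psnr(restored, target), 2))  # 21.76
```

Because PSNR is logarithmic in the mean squared error, a gap of a few tenths of a dB corresponds to a small but consistent difference in reconstruction error.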

Computational Cost

We provide a simple script in model.py that calculates the FLOPs manually. You can change the configuration and run it via:

python3 model.py

The manual calculation of GMACs in this repo differs slightly from the numbers in the main paper, but the difference does not affect the conclusions. We will correct the paper later.
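To give a feel for why window-based attention is cheaper, the sketch below evaluates the commonly cited multiply-accumulate estimates for global self-attention, 4hwC² + 2(hw)²C, and for window self-attention with window size M, 4hwC² + 2M²hwC. It is not the calculation in model.py: it covers only the attention layers, ignores softmax, biases and the feed-forward network, and the 128 x 128 map, 32 channels and window size 8 are illustrative assumptions.

```python
def global_attention_macs(height: int, width: int, channels: int) -> int:
    """Approximate MACs of one global self-attention layer on an h x w map."""
    tokens = height * width
    # 4*N*C^2 for the Q, K, V and output projections,
    # 2*N^2*C for the attention scores and the weighted sum.
    return 4 * tokens * channels ** 2 + 2 * tokens ** 2 * channels


def window_attention_macs(height: int, width: int, channels: int,
                          window_size: int) -> int:
    """Approximate MACs when attention is restricted to M x M windows."""
    tokens = height * width
    # Each token attends to only M*M tokens instead of all N tokens.
    return 4 * tokens * channels ** 2 + 2 * window_size ** 2 * tokens * channels


full = global_attention_macs(128, 128, 32)
windowed = window_attention_macs(128, 128, 32, window_size=8)
print(f"global:   {full / 1e9:.2f} GMACs")
print(f"windowed: {windowed / 1e9:.2f} GMACs")
```

The quadratic term in the number of tokens disappears in the windowed version, so the cost scales linearly with image area for a fixed window size.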

Citation

If you find this project useful in your research, please consider citing:

@article{wang2021uformer,
	title={Uformer: A General U-Shaped Transformer for Image Restoration},
	author={Wang, Zhendong and Cun, Xiaodong and Bao, Jianmin and Liu, Jianzhuang},
	journal={arXiv preprint 2106.03106},
	year={2021}
}

Acknowledgement

This code borrows heavily from MIRNet and SwinTransformer.

Contact

Please contact us if you have any questions or suggestions (Zhendong Wang [email protected], Xiaodong Cun [email protected]).

Comments
  • SIDD Benchmark Issue: I get a PSNR 39.49db rather than 39.89db

    Thank you for the nice code! I use the Uformer-32 model and match the 39.77dB on the validation srgb dataset for SIDD. However, I get a result of 39.49dB from the SIDD server on the benchmark srgb dataset. Would you mind releasing the script you use to create your submission file from the benchmarking data?

    opened by gauenk 13
  • TypeError: forward() takes 2 positional arguments but 3 were given

    test_in_any_resolution.py raises an error. Here is the full stack trace:

     `Traceback (most recent call last):
      File "/content//Uformer/test_in_any_resolution.py", line 104, in <module>
        rgb_restored = model_restoration(rgb_noisy, 1 - mask)
      
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
        return self.module(*inputs[0], **kwargs[0])
      
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
    
    TypeError: forward() takes 2 positional arguments but 3 were given`
    

    I was actually trying this with my own custom data, but its shape is the same as the SIDD data, so I can't understand why this happens.

    opened by parkhy0106 8
  • How do you compute Flops of uformer16 and 32?

    I used the same package as your code. (from ptflops import get_model_complexity_info)

    And got 2.51 GMac, 5.15 M for Uformer16, 9.98 GMac 20.47 M for Uformer32.

    It seems that I need to define some ops for this model. Can you provide a solution or the relevant code for computing FLOPs and parameters?

    Thanks!

    opened by leonmakise 4
  • Which SIDD data do you use exactly?

    Hi, you use the SIDD-Medium sRGB data, right? There are two downloads, mirror 1 and mirror 2. Which mirror do you use, and what is the difference between them?

    opened by JZPeterPan 3
  • Using pretrained weights raises a RuntimeError

    Hello, and thanks for your dedication. I trained Uformer on SIDD with 2 V100 GPUs as you suggested, stopped after roughly 69 epochs, and obtained a weight file. I validated it and it performs well. However, when I try to fine-tune on 2 V100 GPUs by adding the --resume option, training fails.

    At train.py line 169, loss_scaler(loss, optimizer, parameters=model_restoration.parameters()), a RuntimeError is raised: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu

    I don't know how to solve this. Have you encountered this problem? Thanks!

    opened by shiguangliuguo 3
  • How to test an image whose resolution is not (256,256)?

    Hi there, thanks for providing the script test_in_any_resolution.py. In this script, an image of arbitrary size is processed by the expand2square function, but an input of that size still cannot be fed into the Uformer model.

    So I wonder whether this network can only process inputs of size (256,256). If I want to denoise an image of arbitrary size, do I have to resize it to (256,256)? Thanks!

    opened by shiguangliuguo 2
  • about epochs in training time

    Hi friend, Uformer is interesting. The paper reports that you train Uformer_16 for 250 epochs with batch size 32 to reach 39.66 PSNR on SIDD. How many iterations is that in the training phase? What is the PSNR after 40 epochs of training? I just want to reproduce this result in a short time.

    opened by Rookielike 2
  • Typos? token_projection is not referenced there.

    Such as this line, token_embed will affect the choice of linear, linear_concat and conv. https://github.com/ZhendongWang6/Uformer/blob/f96302885bb1734857c6f09032f8ddde073b103a/utils/model_utils.py#L64

    opened by leonmakise 1
  • Why is the performance SOTA when the dataset is small?

    Hi, thanks for your meaningful work. Previous work on Transformers in vision shares a common view: a Transformer needs a huge dataset to perform well.

    In this work, you train the network only on SIDD patches, about 90,000 of them, whereas other works train their Transformers on nearly 1,000,000 samples.

    Can you explain why this works? Or can we say that a Transformer does not actually need that much data?

    opened by shiguangliuguo 1
  • Support arbitrary input resolution?

    Hi your work is very inspiring!

    I couldn't find in your paper how you apply Uformer during inference. For example, on SIDD, the training patches are 128x128 and the evaluation patches are 256x256. Did you apply your network directly to the whole 256x256 patch, or in a sliding-window fashion? In other words, does Uformer support arbitrary input resolution?

    opened by vztu 1
  • I would like to ask, have you encountered this kind of error? AttributeError: partially initialized module 'torch' has no attribute 'cuda' (most likely due to a circular import)

    opened by Smile-QT 0
  • A question about the defocus deblurring results

    Hello. Table 3 of your paper reports that Uformer achieves 26.28 dB PSNR on DPDD. Did you use the combined image or the dual-pixel images as input? The paper does not say. Also, why does Table 3 of the Restormer paper show Uformer reaching only 25.66 dB PSNR with dual-pixel input?

    opened by TPZZZ 0
  • Asking for the code of SPAIR

    Hi. I tried to understand SPAIR's algorithm through its source code, but I found that its code is currently not available. I saw that SPAIR was compared in your paper. If you have the source code, would you please share it with me? Looking forward to your reply.

    opened by wdhudiekou 0
  • Log Files from Training

    Hello,

    Thank you for your awesome code!

    I am hoping you might open-source the log files you have from training. Maybe the training and validation loss as a function of epoch (and/or batch) with an estimate of the runtime?

    opened by gauenk 0
  • Why are the results poor when training motion deblurring from scratch?

    I trained with the parameter settings in train_motiondeblur.sh on the GoPro dataset, but the test results are far worse than those of your released model (https://mailustceducn-my.sharepoint.com/:u:/g/personal/zhendongwang_mail_ustc_edu_cn/EfCPoTSEKJRAshoE6EAC_3YB7oNkbLUX6AUgWSCwoJe0oA?e=jai90x). Is there a problem with the training parameter settings, or is there some other reason? In the comparison below, the left is your model's result and the right is mine. I hope you can reply. Thanks a lot! @vinthony @ZhendongWang6 (Screenshot, 2022-09-05 1:51 PM)

    opened by aiaini66 0
Owner
Zhendong Wang
Deep learning, Computer Vision, Low-level Vision, Image Generation.