A-ESRGAN: Training Real-World Blind Super-Resolution with Attention-based U-net Discriminators
Author names are withheld for double-blind review.
Main idea
We introduce the attention U-net into blind real-world image super resolution, aiming to provide a super-resolution method that produces sharper results with less distortion.
Sharper:
Less distortion:
Network Architecture
The overall architecture of A-ESRGAN, where the generator is adopted from ESRGAN:
The architecture of a single attention U-net discriminator:
The attention block is modified from 3D attention U-net's attention gate:
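For reference, the sketch below shows the kind of additive attention gate such a block is typically built on, written in PyTorch. It is a minimal illustration only: the channel sizes, normalization, upsampling, and module names in the released discriminator may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGate(nn.Module):
    """Additive attention gate in the style of Attention U-Net.
    A sketch for illustration; the released A-ESRGAN block may differ."""
    def __init__(self, x_channels, g_channels, inter_channels):
        super().__init__()
        # Project the skip feature x and the gating signal g into a common space.
        self.theta_x = nn.Conv2d(x_channels, inter_channels, kernel_size=1, bias=False)
        self.phi_g = nn.Conv2d(g_channels, inter_channels, kernel_size=1, bias=True)
        # Map the joint feature to a single-channel attention coefficient.
        self.psi = nn.Conv2d(inter_channels, 1, kernel_size=1, bias=True)

    def forward(self, x, g):
        # g comes from a deeper (coarser) layer; resize it to x's spatial size.
        g_resized = F.interpolate(self.phi_g(g), size=x.shape[2:],
                                  mode='bilinear', align_corners=False)
        f = F.relu(self.theta_x(x) + g_resized)
        alpha = torch.sigmoid(self.psi(f))   # attention coefficients in [0, 1]
        return x * alpha, alpha              # gated skip feature and the map for visualization
```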
Attention Map
We argue that the attention map plays the main role in improving the quality of super-resolution images. To support this, we visualize how the attention coefficients change across time and space.
We observe that during training the attention gradually focuses on regions where the color changes abruptly, i.e. edges, and that attention layers at different depths capture edges of different granularity.
Attention coefficients change across time.
Attention coefficients change across space.
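One way to collect such attention coefficients for visualization is to register forward hooks on the attention gates and dump the coefficient maps at different checkpoints. The sketch below assumes gate modules shaped like the AttentionGate example above, returning (gated_feature, alpha); the released visualization scripts may extract the maps differently.

```python
import torch

attention_maps = {}

def make_hook(name):
    def hook(module, inputs, outputs):
        # outputs = (gated_feature, alpha); keep the coefficient map on CPU.
        attention_maps[name] = outputs[1].detach().cpu()
    return hook

def register_attention_hooks(discriminator):
    """Attach hooks to every attention gate so a forward pass fills attention_maps."""
    handles = []
    for name, module in discriminator.named_modules():
        # Match by class name so this works with any gate shaped like the sketch above.
        if module.__class__.__name__ == 'AttentionGate':
            handles.append(module.register_forward_hook(make_hook(name)))
    return handles
```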
Multi Scale
The multi-scale discriminator has to learn whether parts of the image are clear enough from different receptive fields. From this perspective, the different discriminators learn complementary knowledge. As the figure below shows, the normal-scale discriminator learns to focus on edges, while the down-sampled discriminator learns patch-like patterns such as textures.
Thus, compared with a single attention U-net discriminator, the multi-scale U-net discriminators help the generator produce more realistic and detailed images.
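A minimal sketch of the multi-scale setup: each scale gets its own U-net discriminator, and scale i sees the input downsampled by a factor of 2^i, so deeper scales judge realism over larger effective receptive fields. The number of scales and how the per-scale losses are weighted in the released code may differ.

```python
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleDiscriminator(nn.Module):
    """Wrap several U-net discriminators that see the input at different scales.
    A sketch; the released model's scale count and loss weighting may differ."""
    def __init__(self, make_discriminator, num_scales=2):
        super().__init__()
        self.discriminators = nn.ModuleList(
            [make_discriminator() for _ in range(num_scales)]
        )

    def forward(self, x):
        outputs = []
        for i, d in enumerate(self.discriminators):
            # Scale 0 sees the full-resolution image; scale i sees it downsampled by 2**i.
            scaled = F.avg_pool2d(x, kernel_size=2 ** i) if i > 0 else x
            outputs.append(d(scaled))
        return outputs
```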
Better Texture:
Test Sets
The test datasets for our A-ESRGAN model are the standard benchmarks Set5, Set14, BSD100, Sun-Hays80, and Urban100. Note that we directly apply 4X super resolution to the original real-world images and use NIQE to measure the perceptual quality of the results. As shown in the figure below, these 5 datasets cover a large variety of images.
A combined dataset can be found in DatasetsForSR.zip.
We compare with ESRGAN, RealSR, BSRGAN, and Real-ESRGAN on the above 5 datasets, using NIQE as our metric. The results are shown in the table below:
Note that a lower NIQE score indicates better perceptual quality.
Quick Use
Inference Script
! Currently only 4X super resolution is provided.
Download the pre-trained model A_ESRGAN_Single.pth to experiments/pretrained_models:
wget https://github.com/aergan/A-ESRGAN/releases/download/v1.0.0/A_ESRGAN_Single.pth
Inference:
python inference_aesrgan.py --model_path=experiments/pretrained_models/A_ESRGAN_Single.pth --input=inputs
Results are saved in the results folder.
NIQE Script
The NIQE script reports the mean NIQE score for a directory of images.
Calculate the NIQE score:
cd NIQE_Script
python niqe.py --path=../results
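Roughly, the script does something like the following, assuming NIQE_Script/niqe.py exposes a niqe() function that takes a grayscale image array (the actual interface may differ):

```python
# Sketch of a mean-NIQE computation over a directory; the real niqe.py may differ.
import os
import numpy as np
from PIL import Image
from niqe import niqe  # assumed helper from NIQE_Script/niqe.py

def mean_niqe(directory):
    scores = []
    for name in sorted(os.listdir(directory)):
        if not name.lower().endswith(('.png', '.jpg', '.jpeg')):
            continue
        gray = np.array(Image.open(os.path.join(directory, name)).convert('L'))
        scores.append(niqe(gray))
    return float(np.mean(scores))

print(mean_niqe('../results'))
```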
Visualization Script
The visualization scripts visualize the attention coefficients of each attention layer in the attention-based U-net discriminator. There are two scripts: discriminator_attention_visual(Single).py visualizes how the attention of each layer is updated during training on a given image, and combine.py overlays the heat maps onto the original image.
Generate heat maps:
First download single.zip and unzip it to experiments/pretrained_models/single
cd Visualization_Script
python discriminator_attention_visual(Single).py --img_path=../inputs/img_015_SRF_4_HR.png
The heat maps will be saved in Visualization_Script/Visual
If you want to see how the heat map looks when combined with the original image, run:
python combine.py --img_path=../inputs/img_015_SRF_4_HR.png
The combined images will be saved in Visualization_Script/Combined
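For context, combining a heat map with the original image is typically a colormap-plus-alpha-blend step. The sketch below shows one way to do it with OpenCV; it is only an approximation of what combine.py does, and the function name, colormap, and weights are illustrative.

```python
import cv2

def overlay_heatmap(image_path, heatmap_path, out_path, alpha=0.5):
    """Blend a single-channel attention heat map onto the original image.
    A sketch of a typical combine step; the repo's combine.py may differ."""
    image = cv2.imread(image_path)
    heat = cv2.imread(heatmap_path, cv2.IMREAD_GRAYSCALE)
    heat = cv2.resize(heat, (image.shape[1], image.shape[0]))
    colored = cv2.applyColorMap(heat, cv2.COLORMAP_JET)
    blended = cv2.addWeighted(colored, alpha, image, 1 - alpha, 0)
    cv2.imwrite(out_path, blended)
```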
! Multi-scale discriminator attention map visualization:
Download multi.zip and unzip it to experiments/pretrained_models/multi
Run discriminator_attention_visual(Mulit).py in the same way as discriminator_attention_visual(Single).py.
! See what the multi-scale discriminators output:
Run Multi_discriminator_Output.py to see the visualization of the pixel-wise loss from the discriminators.
! Note: a combine script for the multi-scale attention maps is not provided yet.
Model_Zoo
The following models are the generators used in A-ESRGAN:
- A_ESRGAN_Multi.pth: X4 model trained with multi-scale U-net based discriminators.
- A_ESRGAN_Single.pth: X4 model trained with a single U-net based discriminator.
- RealESRNet_x4plus.pth: official Real-ESRNet model (X4), from which A-ESRGAN is fine-tuned.
The following models are discriminators, which are usually used for fine-tuning.
The following models are the checkpoints of the discriminators during the A-ESRGAN training process, which are provided for visualizing attention.
Training and Finetuning on your own dataset
We follow the same setting as RealESRGAN, and a detailed guide can be found in Training.md.
Acknowledgement
Our implementation of A-ESRGAN is based on BasicSR and Real-ESRGAN.