Image super-resolution (SR) is a fast-moving field with novel architectures attracting the spotlight

Overview

Revisiting RCAN: Improved Training for Image Super-Resolution

Introduction

Image super-resolution (SR) is a fast-moving field with novel architectures attracting the spotlight. However, most SR models were optimized with dated training strategies. In this work, we revisit the popular RCAN model and examine the effect of different training options in SR. Surprisingly (or perhaps as expected), we show that RCAN can outperform or match nearly all the CNN-based SR architectures published after RCAN on standard benchmarks with a proper training strategy and minimal architecture change. Besides, although RCAN is a very large SR architecture with more than four hundred convolutional layers, we draw a notable conclusion that underfitting is still the main problem restricting the model capability instead of overfitting. We observe supportive evidence that increasing training iterations clearly improves the model performance while applying regularization techniques generally degrades the predictions. We denote our simply revised RCAN as RCAN-it and recommend practitioners to use it as baselines for future research. Please check our pre-print for more information.

Environment Setup

Create a new conda environment and install PyTorch:

conda create -n ptsr python=3.8 numpy
conda activate ptsr
conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=11.1 -c pytorch -c nvidia

Install the required packages:

git clone https://github.com/zudi-lin/rcan-it.git
cd rcan-it
pip install --editable .

Our package is called ptsr, abbreviating A PyTorch Framework for Image Super-Resolution. Then run tests to validate the installation:

python -m unittest discover -b tests

Multi-processing Distributed Data Parallel Training

For different hardware conditions, please first update the config files accordingly. Even for single-node single-GPU training, we use distributed data parallel (DDP) for consistency.

Single Node

Single GPU training:

CUDA_VISIBLE_DEVICES=0 python -u -m torch.distributed.run --nproc_per_node=1 \
--master_port=9988 main.py --distributed --config-base configs/RCAN/RCAN_Improved.yaml \
--config-file configs/RCAN/RCAN_x2.yaml

Single node with multiple (e.g., 4) GPUs:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -u -m torch.distributed.run --nproc_per_node=4 \
--master_port=9977 main.py --distributed --config-base configs/RCAN/RCAN_Improved.yaml \
--config-file configs/RCAN/RCAN_x2.yaml

By default the configuration file, model checkpoints and validation curve will be saved under outputs/, which is added to .gitignore and will be untracked by Git.

Multiple Nodes

After activating the virtual environment with PyTorch>=1.9.0, run hostname -I | awk '{print $1}' to get the ip address of the master node. Suppose the master ip address is 10.31.133.85, and we want to train the model on two nodes with multiple GPUs, then the commands are:

Node 0 (master node):

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --nnodes=2 \ 
--node_rank=0 --master_addr="10.31.133.85" --master_port=9922 main.py --distributed \
--config-base configs/RCAN/RCAN_Improved.yaml --config-file configs/RCAN/RCAN_x2.yaml

Node 1:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --nnodes=2 \ 
--node_rank=1 --master_addr="10.31.133.85" --master_port=9922 main.py --distributed \
--config-base configs/RCAN/RCAN_Improved.yaml --config-file configs/RCAN/RCAN_x2.yaml

Description of the options:

  • --nproc_per_node: number of processes on each node. Set this to the number of GPUs on the node to maximize the training efficiency.
  • --nnodes: total number of nodes for training.
  • --node_rank: rank of the current node within all nodes.
  • --master_addr: the ip address of the master (rank 0) node.
  • --master_port: a free port to communicate with the master node.
  • --distributed: multi-processing Distributed Data Parallel (DDP) training.
  • --local_world_size: number of GPUs on the current node.

For a system with Slurm Workload Manager, please load required modules: module load cuda cudnn.

Data Parallel Training

Data Parallel training only works on single node with one or multiple GPUs. Different from the DDP scheme, it will create only one process. Single GPU training:

CUDA_VISIBLE_DEVICES=0 python main.py --config-base configs/RCAN/RCAN_Base.yaml \
--config-file configs/RCAN/RCAN_x2.yaml

Single node with multiple (e.g., 4) GPUs:

CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config-base configs/RCAN/RCAN_Base.yaml \
--config-file configs/RCAN/RCAN_x2.yaml

Citation

Please check this pre-print for details. If you find this work useful for your research, please cite:

@misc{lin2022revisiting,
      title={Revisiting RCAN: Improved Training for Image Super-Resolution}, 
      author={Zudi Lin and Prateek Garg and Atmadeep Banerjee and Salma Abdel Magid and Deqing Sun and Yulun Zhang and Luc Van Gool and Donglai Wei and Hanspeter Pfister},
      year={2022},
      eprint={2201.11279},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Comments
  • Very interesting work! But there is a small question

    Very interesting work! But there is a small question

    WX20220213-163454@2x

    It seems that the Baseline is obtained w/o self-ensemble. However, the PSNR obtained with "Larger Patches" seems to be consistent with the RCAN-it+ (a self-ensemble version).

    WX20220213-164957@2x

    Is this a reasonable comparison? Or is there something wrong with my understanding, looking forward to your reply : )

    opened by blackcow 2
  • Pre-trained model weights release?

    Pre-trained model weights release?

    Hi! Thanks for an interesting work. Do you plan to release pre-trained model weights so that future researches could benchmark image quality independently?

    opened by zakajd 1
  • Fixed a bug when saving the best model

    Fixed a bug when saving the best model

    The bug saves the latest model as the best model during training.

    For example, when the code runs to line 194,

    iteration 1000: PSNR: 30 best[0][idx_data, idx_scale]: 30 self.best_val_score: 0 is_best = True (30 >=0)

    iteration 2000: PSNR: 29 (poor) best[0][idx_data, idx_scale]: 30 self.best_val_score: 30 is_best = True (30 >= 30. BUG appears, the code will save the model with PSNR=29 as the best model.)

    opened by jinpeng-s 0
  • Some issues on MyOwn data

    Some issues on MyOwn data

    Hi, I have collected about 20000 images. And the HR-resolution is around 600*600. When the training starts, everything looks fine. After about 10000 iterations, the loss become larger and larger. Finally, the loss become nan. Any suggestions on this kind of issue?

    Best regards, Thank you.

    opened by iamweiweishi 1
  • About Download the pretrained model

    About Download the pretrained model

    Hi,I have some trouble about downloading the pretrained model from google drive, because we can't search it unless use vpn. Can u place the pretrained model file in BaiDu driver or aliYun driver? Thanks for a lot!!

    opened by Young-Zzz 0
  • New Super-Resolution Benchmarks

    New Super-Resolution Benchmarks

    Hello,

    MSU Graphics & Media Lab Video Group has recently launched two new Super-Resolution Benchmarks.

    If you are interested in participating, you can add your algorithm following the submission steps:

    We would be grateful for your feedback on our work!

    opened by EvgeneyBogatyrev 0
  • Questions regarding the quant. results (Table 3-5)

    Questions regarding the quant. results (Table 3-5)

    Dear authors,

    This work shines a light on many interesting things from the right training strategy to the regularizations that hinder network performance. However, it would be wonderful if you find time to address the following questions:

    • How many times experiments were run for Tables 3-5? Can you provide statistics of experimental results (e.g. mean, variance)?
    • What was the reason for using self-ensemble?
    • Do you have any plans to extend the results to other image restoration methods (e.g. deblur, denoise)?

    Thank you!

    opened by Magauiya 0
  • > DATASET.DATA_EXT =

    > DATASET.DATA_EXT = "bin"

    DATASET.DATA_EXT = "bin"

    Not able to create the binary file. I have kept 18 HR images in 'rcan-it/ptsr/datasets/SR/BIX2X3X4/DF2K/DF2K_train_HR/' folder and their corresponding LR images in 'rcan-it/ptsr/datasets/SR/BIX2X3X4/DF2K/DF2K_train_LR_bicubic/X2' folder. But the error is shown below: `Total number of parameters: 15444667 Using Dataset(s): DF2K for training

    /content/drive/MyDrive/rcan-it/ptsr/datasets/SR/BIX2X3X4/DF2K/bin/train_bin_HR.pt does not exist. Now making binary... Bin pt file with name and image Traceback (most recent call last): File "main.py", line 135, in main() File "main.py", line 98, in main loader = Data(cfg) File "/content/drive/MyDrive/rcan-it/ptsr/data/init.py", line 35, in init datasets.append(getattr(m, module_name)(cfg, name=d)) File "/content/drive/MyDrive/rcan-it/ptsr/data/df2k.py", line 17, in init cfg, name=name, train=train, benchmark=benchmark File "/content/drive/MyDrive/rcan-it/ptsr/data/srdata.py", line 39, in init cfg.DATASET.DATA_EXT, list_hr, self._name_hrbin() File "/content/drive/MyDrive/rcan-it/ptsr/data/srdata.py", line 148, in _check_and_load } for _l in l] File "/content/drive/MyDrive/rcan-it/ptsr/data/srdata.py", line 148, in } for _l in l] File "/usr/local/lib/python3.7/dist-packages/imageio/core/functions.py", line 221, in imread reader = read(uri, format, "i", **kwargs) File "/usr/local/lib/python3.7/dist-packages/imageio/core/functions.py", line 130, in get_reader request = Request(uri, "r" + mode, **kwargs) File "/usr/local/lib/python3.7/dist-packages/imageio/core/request.py", line 125, in init self._parse_uri(uri) File "/usr/local/lib/python3.7/dist-packages/imageio/core/request.py", line 273, in _parse_uri raise FileNotFoundError("No such file: '%s'" % fn) FileNotFoundError: No such file: '/content/drive/MyDrive/rcan-it/ptsr/datasets/SR/BIX2X3X4/DF2K/DF2K_train_HR/0019.png' ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1678) of binary: /usr/bin/python3 Traceback (most recent call last): File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main "main", mod_spec) File "/usr/lib/python3.7/runpy.py", line 85, in _run_code exec(code, run_globals) File "/usr/local/lib/python3.7/dist-packages/torch/distributed/run.py", line 723, in main() File "/usr/local/lib/python3.7/dist-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 345, in wrapper return f(*args, **kwargs) File "/usr/local/lib/python3.7/dist-packages/torch/distributed/run.py", line 719, in main run(args) File "/usr/local/lib/python3.7/dist-packages/torch/distributed/run.py", line 713, in run )(*cmd_args) File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launcher/api.py", line 131, in call return launch_agent(self._config, self._entrypoint, list(args)) File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launcher/api.py", line 261, in launch_agent failures=result.failures, torch.distributed.elastic.multiprocessing.errors.ChildFailedError:`

    Please help me to solve this issue @zudi-lin.

    Originally posted by @TriB90 in https://github.com/zudi-lin/rcan-it/issues/3#issuecomment-1086562126

    opened by Techzist16 1
Owner
Zudi Lin
CS Ph.D. student at Harvard
Zudi Lin
Super-Fast-Adversarial-Training - A PyTorch Implementation code for developing super fast adversarial training

Super-Fast-Adversarial-Training This is a PyTorch Implementation code for develo

LBK 26 Dec 2, 2022
Official PyTorch code for Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling (HCFlow, ICCV2021)

Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling (HCFlow, ICCV2021) This repository is the official P

Jingyun Liang 159 Dec 30, 2022
Official PyTorch code for Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling (HCFlow, ICCV2021)

Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling (HCFlow, ICCV2021) This repository is the official P

Jingyun Liang 159 Dec 30, 2022
Fast and Context-Aware Framework for Space-Time Video Super-Resolution (VCIP 2021)

Fast and Context-Aware Framework for Space-Time Video Super-Resolution Preparation Dependencies PyTorch 1.2.0 CUDA 10.0 DCNv2 cd model/DCNv2 bash make

Xueheng Zhang 1 Mar 29, 2022
DeFMO: Deblurring and Shape Recovery of Fast Moving Objects (CVPR 2021)

Evaluation, Training, Demo, and Inference of DeFMO DeFMO: Deblurring and Shape Recovery of Fast Moving Objects (CVPR 2021) Denys Rozumnyi, Martin R. O

Denys Rozumnyi 139 Dec 26, 2022
A framework for joint super-resolution and image synthesis, without requiring real training data

SynthSR This repository contains code to train a Convolutional Neural Network (CNN) for Super-resolution (SR), or joint SR and data synthesis. The met

null 83 Jan 1, 2023
Repository for "Exploring Sparsity in Image Super-Resolution for Efficient Inference", CVPR 2021

SMSR Reposity for "Exploring Sparsity in Image Super-Resolution for Efficient Inference" [arXiv] Highlights Locate and skip redundant computation in S

Longguang Wang 225 Dec 26, 2022
MASA-SR: Matching Acceleration and Spatial Adaptation for Reference-Based Image Super-Resolution (CVPR2021)

MASA-SR Official PyTorch implementation of our CVPR2021 paper MASA-SR: Matching Acceleration and Spatial Adaptation for Reference-Based Image Super-Re

DV Lab 126 Dec 20, 2022
PyTorch code for our paper "Attention in Attention Network for Image Super-Resolution"

Under construction... Attention in Attention Network for Image Super-Resolution (A2N) This repository is an PyTorch implementation of the paper "Atten

Haoyu Chen 71 Dec 30, 2022
PyTorch implementation of Graph Convolutional Networks in Feature Space for Image Deblurring and Super-resolution, IJCNN 2021.

GCResNet PyTorch implementation of Graph Convolutional Networks in Feature Space for Image Deblurring and Super-resolution, IJCNN 2021. The code will

null 11 May 19, 2022
PyTorch code for our paper "Image Super-Resolution with Non-Local Sparse Attention" (CVPR2021).

Image Super-Resolution with Non-Local Sparse Attention This repository is for NLSN introduced in the following paper "Image Super-Resolution with Non-

null 143 Dec 28, 2022
PyTorch code for our ECCV 2020 paper "Single Image Super-Resolution via a Holistic Attention Network"

HAN PyTorch code for our ECCV 2020 paper "Single Image Super-Resolution via a Holistic Attention Network" This repository is for HAN introduced in the

五维空间 140 Nov 23, 2022
Implementation of paper: "Image Super-Resolution Using Dense Skip Connections" in PyTorch

SRDenseNet-pytorch Implementation of paper: "Image Super-Resolution Using Dense Skip Connections" in PyTorch (http://openaccess.thecvf.com/content_ICC

wxy 114 Nov 26, 2022
[ACM MM 2021] Joint Implicit Image Function for Guided Depth Super-Resolution

Joint Implicit Image Function for Guided Depth Super-Resolution This repository contains the code for: Joint Implicit Image Function for Guided Depth

hawkey 78 Dec 27, 2022
Practical Single-Image Super-Resolution Using Look-Up Table

Practical Single-Image Super-Resolution Using Look-Up Table [Paper] Dependency Python 3.6 PyTorch glob numpy pillow tqdm tensorboardx 1. Training deep

Younghyun Jo 116 Dec 23, 2022
PyTorch code for our ECCV 2018 paper "Image Super-Resolution Using Very Deep Residual Channel Attention Networks"

PyTorch code for our ECCV 2018 paper "Image Super-Resolution Using Very Deep Residual Channel Attention Networks"

Yulun Zhang 1.2k Dec 26, 2022
PyTorch version of the paper 'Enhanced Deep Residual Networks for Single Image Super-Resolution' (CVPRW 2017)

About PyTorch 1.2.0 Now the master branch supports PyTorch 1.2.0 by default. Due to the serious version problem (especially torch.utils.data.dataloade

Sanghyun Son 2.1k Jan 1, 2023
pytorch implementation for Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network arXiv:1609.04802

PyTorch SRResNet Implementation of Paper: "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network"(https://arxiv.org/abs

Jiu XU 436 Jan 9, 2023
Unoffical implementation about Image Super-Resolution via Iterative Refinement by Pytorch

Image Super-Resolution via Iterative Refinement Paper | Project Brief This is a unoffical implementation about Image Super-Resolution via Iterative Re

LiangWei Jiang 2.5k Jan 2, 2023