ResViT
Official PyTorch implementation of Residual Vision Transformers (ResViT), described in the following paper:
Onat Dalmaz, Mahmut Yurt, and Tolga Çukur. ResViT: Residual vision transformers for multi-modal medical image synthesis. arXiv:2106.16031, 2021.
Dependencies
python>=3.6.9
torch>=1.7.1
torchvision>=0.8.2
visdom
dominate
cuda>=11.2
Installation
- Clone this repo:
git clone https://github.com/icon-lab/ResViT
cd ResViT
Download the pre-trained ViT model (R50-ViT-B_16) from Google:
wget https://storage.googleapis.com/vit_models/imagenet21k/R50-ViT-B_16.npz &&
mkdir -p ../model/vit_checkpoint/imagenet21k &&
mv R50-ViT-B_16.npz ../model/vit_checkpoint/imagenet21k/R50-ViT-B_16.npz
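As a quick sanity check, the downloaded checkpoint can be opened and its contents listed. This is a minimal sketch, assuming NumPy is installed; the path matches the mv command above.

```python
# Sanity-check the downloaded ViT checkpoint: list a few of its weight arrays.
# Minimal sketch, assuming NumPy is available.
import numpy as np

ckpt = np.load("../model/vit_checkpoint/imagenet21k/R50-ViT-B_16.npz")
print(f"{len(ckpt.files)} arrays in checkpoint")
print(ckpt.files[:5])  # e.g. embedding and transformer block weights
```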
Dataset
You should structure your aligned dataset in the following way:
/Datasets/BRATS/
├── T1_T2
├── T2_FLAIR
.
.
├── T1_FLAIR_T2
/Datasets/BRATS/T2_FLAIR/
├── train
├── val
├── test
Note that for many-to-one tasks with two input modalities, the source modalities should be placed in the red and green channels of the input image; see the sketch below.
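One way to assemble such a sample is sketched below. It assumes the pix2pix-style aligned layout (source and target concatenated side by side in a single image) implied by --dataset_mode aligned; the file names and save path are hypothetical placeholders.

```python
# Assemble one aligned training sample for a many-to-one task (e.g. T1+T2 -> PD).
# Minimal sketch: t1.png, t2.png, pd.png and the save path are hypothetical,
# and the side-by-side source|target layout is an assumption based on the
# pix2pix-style "aligned" dataset convention (--dataset_mode aligned).
import numpy as np
from PIL import Image

t1 = np.array(Image.open("t1.png").convert("L"))  # source modality 1
t2 = np.array(Image.open("t2.png").convert("L"))  # source modality 2
pd = np.array(Image.open("pd.png").convert("L"))  # target modality

source = np.zeros((*t1.shape, 3), dtype=np.uint8)
source[..., 0] = t1  # red channel   <- first source modality
source[..., 1] = t2  # green channel <- second source modality

target = np.stack([pd] * 3, axis=-1)  # replicate target across all 3 channels

pair = np.concatenate([source, target], axis=1)  # [source | target]
Image.fromarray(pair).save("Datasets/BRATS/T1_T2__PD/train/0001.png")
```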
Pre-training of ART blocks without the transformer modules
For many-to-one tasks:
python3 train.py --dataroot Datasets/IXI/T1_T2__PD/ --name T1_T2_PD_IXI_pre_trained --gpu_ids 0 --model resvit_many --which_model_netG res_cnn --which_direction AtoB --lambda_A 100 --dataset_mode aligned --norm batch --pool_size 0 --output_nc 1 --input_nc 3 --loadSize 256 --fineSize 256 --niter 50 --niter_decay 50 --save_epoch_freq 5 --checkpoints_dir checkpoints/ --display_id 0
For one-to-one tasks:
python3 train.py --dataroot Datasets/IXI/T1_T2/ --name T1_T2_IXI_pre_trained --gpu_ids 0 --model resvit_one --which_model_netG res_cnn --which_direction AtoB --lambda_A 100 --dataset_mode aligned --norm batch --pool_size 0 --output_nc 1 --input_nc 1 --loadSize 256 --fineSize 256 --niter 50 --niter_decay 50 --save_epoch_freq 5 --checkpoints_dir checkpoints/ --display_id 0
Fine-tune ResViT
For many-to-one tasks:
python3 train.py --dataroot Datasets/IXI/T1_T2__PD/ --name T1_T2_PD_IXI_resvit --gpu_ids 0 --model resvit_many --which_model_netG resvit --which_direction AtoB --lambda_A 100 --dataset_mode aligned --norm batch --pool_size 0 --output_nc 1 --input_nc 3 --loadSize 256 --fineSize 256 --niter 25 --niter_decay 25 --save_epoch_freq 5 --checkpoints_dir checkpoints/ --display_id 0 --pre_trained_transformer 1 --pre_trained_resnet 1 --pre_trained_path checkpoints/T1_T2_PD_IXI_pre_trained/latest_net_G.pth --lr 0.001
For one-to-one tasks:
python3 train.py --dataroot Datasets/IXI/T1_T2/ --name T1_T2_IXI_resvit --gpu_ids 0 --model resvit_one --which_model_netG resvit --which_direction AtoB --lambda_A 100 --dataset_mode aligned --norm batch --pool_size 0 --output_nc 1 --input_nc 1 --loadSize 256 --fineSize 256 --niter 25 --niter_decay 25 --save_epoch_freq 5 --checkpoints_dir checkpoints/ --display_id 0 --pre_trained_transformer 1 --pre_trained_resnet 1 --pre_trained_path checkpoints/T1_T2_IXI_pre_trained/latest_net_G.pth --lr 0.001
Testing
For many-to-one tasks:
python3 test.py --dataroot Datasets/IXI/T1_T2__PD/ --name T1_T2_PD_IXI_resvit --gpu_ids 0 --model resvit_many --which_model_netG resvit --dataset_mode aligned --norm batch --phase test --output_nc 1 --input_nc 3 --how_many 10000 --serial_batches --fineSize 256 --loadSize 256 --results_dir results/ --checkpoints_dir checkpoints/ --which_epoch latest
For one-to-one tasks:
python3 test.py --dataroot Datasets/IXI/T1_T2/ --name T1_T2_IXI_resvit --gpu_ids 0 --model resvit_one --which_model_netG resvit --dataset_mode aligned --norm batch --phase test --output_nc 1 --input_nc 1 --how_many 10000 --serial_batches --fineSize 256 --loadSize 256 --results_dir results/ --checkpoints_dir checkpoints/ --which_epoch latest
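To quantify synthesis quality, PSNR can be computed between each synthesized image and its ground truth, as the paper does. A minimal sketch; the result file names below follow the pix2pix-style naming and are hypothetical stand-ins, so adjust them to the files that actually appear under results/.

```python
# Compute PSNR between a synthesized slice and its ground truth.
# Minimal sketch: the file names below are hypothetical; adapt them to the
# images the test script writes under results/<experiment_name>/.
import numpy as np
from PIL import Image

fake = np.array(Image.open("results/T1_T2_IXI_resvit/test_latest/images/0001_fake_B.png"), dtype=np.float64)
real = np.array(Image.open("results/T1_T2_IXI_resvit/test_latest/images/0001_real_B.png"), dtype=np.float64)

mse = np.mean((fake - real) ** 2)
psnr = 10 * np.log10(255.0 ** 2 / mse)  # 8-bit images, peak value 255
print(f"PSNR: {psnr:.2f} dB")
```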
Citation
You are encouraged to modify/distribute this code. However, please acknowledge this code and cite the paper appropriately.
@misc{dalmaz2021resvit,
title={ResViT: Residual vision transformers for multi-modal medical image synthesis},
author={Onat Dalmaz and Mahmut Yurt and Tolga Çukur},
year={2021},
eprint={2106.16031},
archivePrefix={arXiv},
primaryClass={eess.IV}
}
For any questions, comments and contributions, please contact Onat Dalmaz (onat[at]ee.bilkent.edu.tr)
(c) ICON Lab 2021