PyTorch Implementation of Residual Vision Transformers (ResViT)

ResViT

Official PyTorch implementation of Residual Vision Transformers (ResViT), described in the following paper:

Onat Dalmaz, Mahmut Yurt, and Tolga Çukur. ResViT: Residual vision transformers for multi-modal medical image synthesis. arXiv, 2021.

Dependencies

python>=3.6.9
torch>=1.7.1
torchvision>=0.8.2
visdom
dominate
cuda>=11.2

Installation

  • Clone this repo:
git clone https://github.com/icon-lab/ResViT
cd ResViT

Download the pre-trained ViT model from Google:

wget https://storage.googleapis.com/vit_models/imagenet21k/R50+ViT-B_16.npz &&
mkdir -p ../model/vit_checkpoint/imagenet21k &&
mv R50+ViT-B_16.npz ../model/vit_checkpoint/imagenet21k/R50+ViT-B_16.npz
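
To verify the download, here is a minimal Python check (assumes numpy is installed); a corrupted or truncated file is the usual cause of the "Failed to interpret file ... as a pickle" error reported in the comments below:

import numpy as np

# A valid checkpoint loads as an .npz archive of named weight arrays;
# a corrupted or truncated download raises an error here instead.
weights = np.load('../model/vit_checkpoint/imagenet21k/R50+ViT-B_16.npz')
print(len(weights.files), 'weight arrays found')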

Dataset

You should structure your aligned dataset in the following way:

/Datasets/BRATS/
  ├── T1_T2
  ├── T2_FLAIR
  ├── ...
  └── T1_FLAIR_T2
/Datasets/BRATS/T2_FLAIR/
  ├── train
  ├── val
  └── test

Note that for many-to-one tasks with two input modalities, the source modalities should be placed in the red and green channels of the input image.
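
As an illustration, here is a minimal sketch of this channel packing (file names are hypothetical; assumes two co-registered grayscale slices of equal size, with numpy and Pillow installed):

import numpy as np
from PIL import Image

# Load two co-registered source slices as 8-bit grayscale arrays (hypothetical paths).
t1 = np.array(Image.open('T1_slice.png').convert('L'))
t2 = np.array(Image.open('T2_slice.png').convert('L'))

# Pack the first modality into the red channel and the second into green;
# the blue channel is left empty.
rgb = np.zeros((*t1.shape, 3), dtype=np.uint8)
rgb[..., 0] = t1
rgb[..., 1] = t2

Image.fromarray(rgb).save('source_rgb.png')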

Pre-training of ART blocks without transformers

For many-to-one tasks:
python3 train.py --dataroot Datasets/IXI/T1_T2__PD/ --name T1_T2_PD_IXI_pre_trained --gpu_ids 0 --model resvit_many --which_model_netG res_cnn --which_direction AtoB --lambda_A 100 --dataset_mode aligned --norm batch --pool_size 0 --output_nc 1 --input_nc 3 --loadSize 256 --fineSize 256 --niter 50 --niter_decay 50 --save_epoch_freq 5 --checkpoints_dir checkpoints/ --display_id 0

For one-to-one tasks:
python3 train.py --dataroot Datasets/IXI/T1_T2/ --name T1_T2_IXI_pre_trained --gpu_ids 0 --model resvit_one --which_model_netG res_cnn --which_direction AtoB --lambda_A 100 --dataset_mode aligned --norm batch --pool_size 0 --output_nc 1 --input_nc 1 --loadSize 256 --fineSize 256 --niter 50 --niter_decay 50 --save_epoch_freq 5 --checkpoints_dir checkpoints/ --display_id 0

Fine-tuning ResViT

For many-to-one tasks:
python3 train.py --dataroot Datasets/IXI/T1_T2__PD/ --name T1_T2_PD_IXI_resvit --gpu_ids 0 --model resvit_many --which_model_netG resvit --which_direction AtoB --lambda_A 100 --dataset_mode aligned --norm batch --pool_size 0 --output_nc 1 --input_nc 3 --loadSize 256 --fineSize 256 --niter 25 --niter_decay 25 --save_epoch_freq 5 --checkpoints_dir checkpoints/ --display_id 0 --pre_trained_transformer 1 --pre_trained_resnet 1 --pre_trained_path checkpoints/T1_T2_PD_IXI_pre_trained/latest_net_G.pth --lr 0.001

For one-to-one tasks:
python3 train.py --dataroot Datasets/IXI/T1_T2/ --name T1_T2_IXI_resvit --gpu_ids 0 --model resvit_one --which_model_netG resvit --which_direction AtoB --lambda_A 100 --dataset_mode aligned --norm batch --pool_size 0 --output_nc 1 --input_nc 1 --loadSize 256 --fineSize 256 --niter 25 --niter_decay 25 --save_epoch_freq 5 --checkpoints_dir checkpoints/ --display_id 0 --pre_trained_transformer 1 --pre_trained_resnet 1 --pre_trained_path checkpoints/T1_T2_IXI_pre_trained/latest_net_G.pth --lr 0.001

Testing

For many-to-one tasks:
python3 test.py --dataroot Datasets/IXI/T1_T2__PD/ --name T1_T2_PD_IXI_resvit --gpu_ids 0 --model resvit_many --which_model_netG resvit --dataset_mode aligned --norm batch --phase test --output_nc 1 --input_nc 3 --how_many 10000 --serial_batches --fineSize 256 --loadSize 256 --results_dir results/ --checkpoints_dir checkpoints/ --which_epoch latest

For one-to-one tasks:
python3 test.py --dataroot Datasets/IXI/T1_T2/ --name T1_T2_IXI_resvit --gpu_ids 0 --model resvit_one --which_model_netG resvit --dataset_mode aligned --norm batch --phase test --output_nc 1 --input_nc 1 --how_many 10000 --serial_batches --fineSize 256 --loadSize 256 --results_dir results/ --checkpoints_dir checkpoints/ --which_epoch latest
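
Synthesized images are then written under results/. Here is a minimal sketch for inspecting one output pair, assuming the pix2pix-style layout (results/<name>/test_latest/images/ with _fake_B and _real_B suffixes) that this codebase inherits from pix2pix; the sample file name is hypothetical:

from PIL import Image
import numpy as np

# Assumed pix2pix-style output layout; adjust the experiment name and file names.
base = 'results/T1_T2_IXI_resvit/test_latest/images/'
fake = np.array(Image.open(base + 'sample_fake_B.png'), dtype=np.float32)  # synthesized target
real = np.array(Image.open(base + 'sample_real_B.png'), dtype=np.float32)  # ground-truth target

# Mean absolute error between synthesis and ground truth.
print('MAE:', np.abs(fake - real).mean())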

Citation

You are encouraged to modify/distribute this code. However, please acknowledge this code and cite the paper appropriately.

@misc{dalmaz2021resvit,
      title={ResViT: Residual vision transformers for multi-modal medical image synthesis}, 
      author={Onat Dalmaz and Mahmut Yurt and Tolga Çukur},
      year={2021},
      eprint={2106.16031},
      archivePrefix={arXiv},
      primaryClass={eess.IV}
}

For any questions, comments, and contributions, please contact Onat Dalmaz (onat[at]ee.bilkent.edu.tr).

(c) ICON Lab 2021

Acknowledgments

This code uses libraries from the pGAN and pix2pix repositories.

Comments
  • Hello, I have a question

    Dear professor, hello. I am new to ViT. When I run your code, I get some errors. What do they mean, and why do they occur? Maybe a stupid question from a student. [screenshot]

    opened by dontknow123456 5
  • A question about visdom in this code

    After I run the test code, the visdom tool generates nothing. As the screenshots show, there is no visdom-related bug in the code, but nothing appears at http://localhost:8097/. [screenshots]

    opened by dontknow123456 3
  • Hi professor, I have a question about data size

    As the screenshot shows: #training images = 2086, #validation images = 2086. In fact, my training images do not equal my validation images in number. Could you please tell me whether the training data and validation data are the same in your code?

    opened by dontknow123456 3
  • Hello, I have a question

    In the first issue of icon-lab/ResViT, you shared an image; I know the left is the source and the right is the target. Could you please tell me which one is T1, T2, or FLAIR?

    opened by dontknow123456 3
  • Datasets

    Hello, I don't understand very well how your dataset is organized. Can you describe it in detail? I can see aligned, unaligned, and single dataset modes here. In the aligned dataset, is the data of different modalities, such as T1 and T2, placed in the same folder? And is the data format JPG images? I hope you can describe the placement of the dataset images in detail. Thank you!

    opened by 2805413893 2
  • Download pre-trained ViT models from Google Problem

    The following command is wrong: wget https://storage.googleapis.com/vit_models/imagenet21k/R50-ViT-B_16.npz

    It should be: wget https://storage.googleapis.com/vit_models/imagenet21k/R50+ViT-B_16.npz

    opened by lewisj34 1
  • OSError: Failed to interpret file './model/vit_checkpoint/imagenet21k/R50+ViT-B_16.npz' as a pickle

    Hi, when I try to fine-tune ResViT with the provided commands, a pickle error is raised:

    OSError: Failed to interpret file './model/vit_checkpoint/imagenet21k/R50+ViT-B_16.npz' as a pickle

    How can I solve it?

    opened by jiyoonshincml 1
  • Datasets

    Excuse me, I have some problems structuring my aligned dataset. My copy of the BRATS dataset is not organized as described ("T1_T2, T2_FLAIR, ..., T1_FLAIR_T2"), so how can I deal with it? [image]

    opened by D-YL 1
  • Please explain many-to-one synthesis: how should the data be organized?

    Can you please explain the data organization for multi-modal synthesis? How should the data be arranged for T1,T2 -> FLAIR in aligned data mode? In the train folder there are two subfolders, A and B. Do we keep the T1 and T2 images all in folder A, or split them between A and B? And what about the target images (FLAIRs)? Secondly, you mentioned that multimodal inputs should be in the green and red channels, so do we need to convert them to green and red? For example, I have two CT images at different time points and want to synthesize a target image: should I stack the red and green channels together in folder A and put the targets in folder B? [images]

    opened by jamalgardezi 1
  • Data preprocessing

    I have a lot of questions about data preprocessing: How do you normalize the BRATS and IXI image volumes? Are the intensity values of each volume normalized into the range [-1,1] (max-min normalization)? Since the BRATS dataset contains images acquired under different clinical protocols and scanners, how does its normalization differ from IXI's?

    Could you share your Python script for data preprocessing? Thanks!

    opened by huaibovip 1
  • Datasets used in this paper

    Hello Mr. Dalmaz, thank you for your nice work. To better reproduce your experiments, could you provide the processed versions of the standard datasets used in the paper?

    opened by ElegantLee 0
  • Pixel-wise consistency loss between acquired and reconstructed source modalities based on an L1 distance

    Dear Mr. Dalmaz and the rest of the team,

    First, I would like to thank you for your work and for making it available to the public.

    I am trying to use it for the task of sCT generation with my dataset. Unfortunately, I cannot find where in the code the second term of the loss described in your paper is implemented: the pixel-wise consistency loss, Lrec. Based on my understanding, this loss computes the L1 distance between the source image (MR in this case) and the MR image reconstructed by the generator from the sCT?

    Would you mind pointing out where that happens in the code? I can only locate the pixel-wise L1 loss between the CT and the sCT, and the adversarial loss.

    Thanks in advance!

    Best regards

    opened by AgustinaLaGreca 0
Owner
ICON Lab