Public repo for the ICCV2021-CVAMD paper "Is it Time to Replace CNNs with Transformers for Medical Images?"

Christos Matsoukas

Last update: Dec 27, 2022

Overview

Is it Time to Replace CNNs with Transformers for Medical Images?

Accepted at ICCV-2021: Workshop on Computer Vision for Automated Medical Diagnosis (CVAMD)

Convolutional Neural Networks (CNNs) have reigned for a decade as the de facto approach to automated medical image diagnosis. Recently, vision transformers (ViTs) have appeared as a competitive alternative to CNNs, yielding similar levels of performance while possessing several interesting properties that could prove beneficial for medical imaging tasks. In this work, we explore whether it is time to move to transformer-based models or whether we should keep working with CNNs: can we trivially switch to transformers? If so, what are the advantages and drawbacks of switching to ViTs for medical image diagnosis? We consider these questions in a series of experiments on three mainstream medical image datasets. Our findings show that, while CNNs perform better when trained from scratch, off-the-shelf vision transformers using default hyperparameters are on par with CNNs when pretrained on ImageNet, and outperform their CNN counterparts when pretrained using self-supervision.

Environment setup

To build the Docker image from the Dockerfile, use the following command:
docker build -f Dockerfile -t med_trans \
--build-arg UID=$(id -u) \
--build-arg GID=$(id -g) \
--build-arg USER=$(whoami) \
--build-arg GROUP=$(id -g -n) .

Usage:

Training: python classification.py
Training with DINO: python classification.py --dino
Testing (using json file): python classification.py --test
Testing (using saved checkpoint): python classification.py --checkpoint CheckpointName --test
Tune the learning rate (grid search): python classification.py --lr_finder
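The flags above can be mirrored with Python's standard argparse module. The sketch below is a hypothetical reconstruction for illustration only: the flag names come from the commands listed above, but the parser, the helper `describe_mode`, and the precedence between flags are assumptions, not the repository's actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags listed in the Usage section."""
    parser = argparse.ArgumentParser(description="Sketch of the classification.py command line")
    parser.add_argument("--dino", action="store_true", help="train with DINO self-supervision")
    parser.add_argument("--test", action="store_true", help="evaluate instead of training")
    parser.add_argument("--checkpoint", type=str, default=None,
                        help="name of a saved checkpoint to evaluate")
    parser.add_argument("--lr_finder", action="store_true", help="run the learning-rate search")
    return parser


def describe_mode(args: argparse.Namespace) -> str:
    """Map parsed flags to the run modes listed above (the precedence is an assumption)."""
    if args.lr_finder:
        return "lr_finder"
    if args.test:
        # With --checkpoint the named checkpoint is evaluated; otherwise the json config drives testing.
        return "test_checkpoint" if args.checkpoint else "test_json"
    return "train_dino" if args.dino else "train"


example = build_parser().parse_args(["--checkpoint", "CheckpointName", "--test"])
print(describe_mode(example))  # test_checkpoint
```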

Configuration (json file)

dataset_params
- dataset: Name of the dataset (ISIC2019, APTOS2019, DDSM)
- data_location: Directory where the datasets are stored
- train_transforms: Defines the augmentations for the training set
- val_transforms: Defines the augmentations for the validation set
- test_transforms: Defines the augmentations for the test set
dataloader_params: Defines the dataloader parameters (batch size, num_workers, etc.)
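A minimal sketch of reading these two sections from the JSON configuration with the standard json module. The dataset names come from the list above; the example values (the path, `batch_size`, `num_workers`) and the helper `load_dataset_params` are hypothetical, and the transform entries are left empty because their format is not described here.

```python
import json
from typing import Tuple

# Dataset names listed in the dataset_params description.
SUPPORTED_DATASETS = {"ISIC2019", "APTOS2019", "DDSM"}

# Hypothetical example; the real values live in the repository's JSON config.
EXAMPLE_CONFIG = """
{
  "dataset_params": {
    "dataset": "ISIC2019",
    "data_location": "/path/to/datasets",
    "train_transforms": {},
    "val_transforms": {},
    "test_transforms": {}
  },
  "dataloader_params": {"batch_size": 64, "num_workers": 4}
}
"""


def load_dataset_params(config_text: str) -> Tuple[dict, dict]:
    """Parse the config text and return (dataset_params, dataloader_params)."""
    params = json.loads(config_text)
    dataset_params = params["dataset_params"]
    name = dataset_params["dataset"]
    if name not in SUPPORTED_DATASETS:
        raise ValueError(f"Unsupported dataset {name!r}; expected one of {sorted(SUPPORTED_DATASETS)}")
    return dataset_params, params["dataloader_params"]


dataset_params, loader_params = load_dataset_params(EXAMPLE_CONFIG)
print(dataset_params["dataset"], loader_params["batch_size"])  # ISIC2019 64
```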
model_params
- backbone_type: Type of the backbone model (e.g. resnet50, deit_small)
- transformers_params: Additional hyperparameters for the transformers
  - img_size: The size of the input images
  - patch_size: The patch size to use for patching the input
  - pretrained_type: If supervised, it loads ImageNet weights obtained with supervised learning. If dino, it loads ImageNet weights obtained with self-supervised learning (DINO).
- pretrained: If True, it uses ImageNet pretrained weights
- freeze_backbone: If True, it freezes the backbone network
- DINO: Controls the hyperparameters used when training with DINO
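As a worked example of how `img_size` and `patch_size` interact: a ViT-style backbone such as DeiT splits a square image into non-overlapping patches, so the token count grows quadratically as the patch size shrinks. The helper names below are illustrative; the sketch assumes square inputs and a single [CLS] token (distilled DeiT variants add a second token).

```python
def num_patches(img_size: int, patch_size: int) -> int:
    """Number of non-overlapping patch_size x patch_size patches in a square image."""
    if img_size % patch_size != 0:
        raise ValueError(f"img_size={img_size} is not divisible by patch_size={patch_size}")
    per_side = img_size // patch_size
    return per_side * per_side


def token_count(img_size: int, patch_size: int, extra_tokens: int = 1) -> int:
    """Sequence length seen by the transformer: patches plus [CLS] (and any other extra tokens)."""
    return num_patches(img_size, patch_size) + extra_tokens


print(num_patches(224, 16))  # 196 patches (14 x 14)
print(token_count(224, 16))  # 197 tokens including [CLS]
print(num_patches(224, 8))   # 784 patches: halving the patch size quadruples the count
```

Because self-attention cost grows quadratically with the token count, a smaller patch_size substantially increases compute and memory at the same img_size.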
optimization_params: Defines learning rate, weight decay, learning rate schedule etc.
- optimizer: The default optimizer's parameters
  - type: The optimizer's type
  - autoscale_rl: If True, it scales the learning rate based on the batch size
  - params: Defines the learning rate and the weight decay value
- LARS_params: If use=True and the batch size >= batch_act_thresh, LARS is used as the optimizer
- scheduler: Defines the learning rate schedule
  - type: A list of schedulers to use
  - params: Sets the hyperparameters of the schedulers
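The interplay of `autoscale_rl` and `LARS_params` can be sketched as follows. This assumes the common linear scaling rule with a reference batch size of 256, which is a guess rather than something stated here; the `use` and `batch_act_thresh` keys follow the description above, while the function names, the `lr` key, and the example values are hypothetical.

```python
def scaled_lr(base_lr: float, batch_size: int, reference_batch_size: int = 256) -> float:
    """Linear scaling rule (assumed): the learning rate grows proportionally with the batch size."""
    return base_lr * batch_size / reference_batch_size


def select_optimizer(default_type: str, batch_size: int, lars_params: dict) -> str:
    """Use LARS only when it is enabled and the batch size reaches batch_act_thresh."""
    if lars_params.get("use", False) and batch_size >= lars_params["batch_act_thresh"]:
        return "LARS"
    return default_type


def effective_settings(optimizer_cfg: dict, lars_params: dict, batch_size: int) -> tuple:
    """Return (optimizer type, learning rate) after applying autoscale_rl and the LARS rule."""
    lr = optimizer_cfg["params"]["lr"]  # hypothetical key name for the base learning rate
    if optimizer_cfg.get("autoscale_rl", False):
        lr = scaled_lr(lr, batch_size)
    return select_optimizer(optimizer_cfg["type"], batch_size, lars_params), lr


# Hypothetical values for illustration only.
optimizer_cfg = {"type": "AdamW", "autoscale_rl": True, "params": {"lr": 5e-4, "weight_decay": 0.05}}
lars_params = {"use": True, "batch_act_thresh": 256}
print(effective_settings(optimizer_cfg, lars_params, batch_size=64))
print(effective_settings(optimizer_cfg, lars_params, batch_size=512))
```

Scaling the learning rate with the batch size keeps the update magnitude per sample roughly constant; LARS (layer-wise adaptive rate scaling) is typically reserved for large batches, where plain linear scaling tends to become unstable.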
training_params: Defines the training parameters
- model_name: The model's name
- val_every: Sets the frequency of the validation step (in epochs, float)
- log_every: Sets the frequency of the logging (iterations - int)
- save_best_model: If True, it saves the best model based on the validation metrics
- log_embeddings: If True, it creates UMAP plots of the embeddings at each validation step
- knn_eval: If True, validation also reports scores based on k-NN evaluation
- grad_clipping: If > 0, it clips the gradients
- use_tensorboard: If True, it will use tensorboard for logging instead of wandb
- use_mixed_precision: If True, it will use mixed precision
- save_dir: The directory where model checkpoints and other outputs are saved
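With `knn_eval` enabled, validation additionally reports a k-nearest-neighbour score computed on the backbone's embeddings. The sketch below shows one common variant (cosine similarity, majority vote among the k closest training embeddings) in plain Python for clarity; the choice of k, the voting rule, and the function names are assumptions, and the repository may instead use a similarity-weighted vote on GPU tensors as DINO does.

```python
import math
from collections import Counter
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_predict(train_feats: List[Vector], train_labels: List[int], query: Vector, k: int) -> int:
    """Majority vote among the k training embeddings most similar to the query."""
    sims = [cosine_similarity(f, query) for f in train_feats]
    # Indices of training samples sorted by decreasing similarity, truncated to k.
    nearest = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


def knn_accuracy(train_feats, train_labels, val_feats, val_labels, k: int = 20) -> float:
    """Fraction of validation embeddings whose k-NN prediction matches the label."""
    correct = sum(
        knn_predict(train_feats, train_labels, q, k) == y
        for q, y in zip(val_feats, val_labels)
    )
    return correct / len(val_labels)


# Toy 2-D "embeddings" for two classes (illustrative data only).
train_feats = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
train_labels = [0, 0, 0, 1, 1, 1]
val_feats = [[1.0, 0.05], [0.05, 1.0]]
val_labels = [0, 1]
print(knn_accuracy(train_feats, train_labels, val_feats, val_labels, k=3))  # 1.0
```

Since k-NN needs no trained classifier head, it gives a quick read on embedding quality during validation, which is especially useful when the backbone is trained with self-supervision.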
system_params: Defines if GPUs are used, which GPUs etc.
log_params: Project and run name for the logger (we are using Weights & Biases by default)
lr_finder: Defines the parameters of the learning rate search
- grid_search_params
  - min_pow, max_pow: The minimum and maximum powers of 10 for the search
  - resolution: How many different learning rates to try
  - n_epochs: Maximum number of epochs per training session
  - random_lr: If True, it samples random learning rates within the accepted range
  - keep_schedule: If True, it keeps the learning rate schedule
  - report_intermediate_steps: If True, it validates and logs throughout the training sessions
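The grid search configured by `grid_search_params` can be pictured as generating candidate learning rates on a log scale between 10^min_pow and 10^max_pow. This sketch assumes the upper bound is stored under a key named `max_pow`, spaces the grid evenly in the exponent, and draws log-uniform samples when `random_lr` is set; the helper name and the `seed` argument are hypothetical, and `keep_schedule` is not modelled.

```python
import random
from typing import List, Optional


def lr_candidates(min_pow: float, max_pow: float, resolution: int,
                  random_lr: bool = False, seed: Optional[int] = None) -> List[float]:
    """Candidate learning rates between 10**min_pow and 10**max_pow on a log scale."""
    if resolution < 1:
        raise ValueError("resolution must be at least 1")
    if min_pow > max_pow:
        raise ValueError("min_pow must not exceed max_pow")
    if random_lr:
        # Log-uniform sampling: draw the exponent uniformly, then exponentiate.
        rng = random.Random(seed)
        return sorted(10 ** rng.uniform(min_pow, max_pow) for _ in range(resolution))
    if resolution == 1:
        return [10 ** min_pow]
    # Evenly spaced exponents, so neighbouring candidates differ by a constant factor.
    step = (max_pow - min_pow) / (resolution - 1)
    return [10 ** (min_pow + i * step) for i in range(resolution)]


print(lr_candidates(-5, -3, 3))  # roughly [1e-05, 1e-04, 1e-03]
print(lr_candidates(-5, -3, 4, random_lr=True, seed=0))
```

Searching in the exponent rather than linearly keeps the candidates spread across orders of magnitude, which matters because a good learning rate is usually only known to within a factor of a few.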
transfer_learning_params: Turns on or off transfer learning from pretrained models
- use_pretrained: If True, it will use a pretrained model as a backbone
- pretrained_model_name: The pretrained model's name
- pretrained_path: The pretrained model's directory
