Multimodal Co-Attention Transformer (MCAT) for Survival Prediction in Gigapixel Whole Slide Images

[ICCV 2021]

If you find our work useful in your research or if you use parts of this code please consider citing our paper:

@inproceedings{chen2021multimodal,
  title={Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images},
  author={Chen, Richard J and Lu, Ming Y and Weng, Wei-Hung and Chen, Tiffany Y and Williamson, Drew FK and Manz, Trevor and Shady, Maha and Mahmood, Faisal},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={4015--4025},
  year={2021}
}

Updates:

11/12/2021: Several users have raised concerns about the low c-Index for GBMLGG in SNN (Genomic Only). The gene families from MSigDB that we used as gene signatures do not include the IDH1 mutation, a key biomarker for distinguishing GBM from LGG.
06/18/2021: Updated data preprocessing section for reproducibility.
06/17/2021: Uploaded predicted risk scores on the validation folds for each model, along with the evaluation script for computing the c-Index and Integrated AUC (I-AUC) validation metrics, both found in the following Jupyter Notebook. Model checkpoints for MCAT are uploaded in the results directory.
06/17/2021: Uploaded a notebook detailing the MCAT network architecture with sample input (see the following Jupyter Notebook), in which we print the shape of the tensors at each stage of MCAT.
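
The c-Index named in the 06/17/2021 update can be illustrated with a minimal sketch of Harrell's concordance index for right-censored data. This is not the repository's evaluation script: the function name, the argument names, and the handling of ties (tied times skipped, tied risks counted as half) are assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's c-index for right-censored survival data (simplified sketch).

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if the case was censored
    risks  : predicted risk scores (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this sketch
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # the shorter time is censored, so the pair is not comparable
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ranked correctly, and 0.5 corresponds to random ranking.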

Pre-requisites:

Linux (Tested on Ubuntu 18.04)
NVIDIA GPU (Tested on Nvidia GeForce RTX 2080 Ti x 16) with CUDA 11.0 and cuDNN 7.5
Python (3.7.7), h5py (2.10.0), matplotlib (3.1.1), numpy (1.18.1), opencv-python (4.1.1), openslide-python (1.1.1), openslide (3.4.1), pandas (1.1.3), pillow (7.0.0), PyTorch (1.6.0), scikit-learn (0.22.1), scipy (1.4.1), tensorflow (1.13.1), tensorboardx (1.9), torchvision (0.7.0), captum (0.2.0), shap (0.35.0)

Installation Guide for Linux (using anaconda)

1. Downloading TCGA Data

To download diagnostic WSIs (formatted as .svs files), molecular feature data and other clinical metadata, please refer to the NIH Genomic Data Commons Data Portal and the cBioPortal. WSIs for each cancer type can be downloaded using the GDC Data Transfer Tool.

2. Processing Whole Slide Images

To process WSIs, the tissue regions in each biopsy slide are first segmented using Otsu's segmentation on a downsampled WSI read with OpenSlide. Non-overlapping 256 x 256 patches are then extracted from the segmented tissue regions at the desired magnification. Subsequently, a pretrained truncated ResNet50 encodes the raw image patches into 1024-dim feature vectors, which we save as a .pt file for each WSI. These extracted features serve as input to the network. The following folder structure is assumed for the extracted feature vectors:

DATA_ROOT_DIR/
    └──TCGA_BLCA/
        ├── slide_1.pt
        ├── slide_2.pt
        └── ...
    └──TCGA_BRCA/
        ├── slide_1.pt
        ├── slide_2.pt
        └── ...
    └──TCGA_GBMLGG/
        ├── slide_1.pt
        ├── slide_2.pt
        └── ...
    └──TCGA_LUAD/
        ├── slide_1.pt
        ├── slide_2.pt
        └── ...
    └──TCGA_UCEC/
        ├── slide_1.pt
        ├── slide_2.pt
        └── ...
    ...

DATA_ROOT_DIR is the base directory of all datasets / cancer types (e.g. the directory on your SSD). Within DATA_ROOT_DIR, each folder contains the .pt files for that dataset / cancer type.
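
As a quick check that a local copy matches the folder structure above, the feature files can be indexed per dataset. This is a minimal sketch using only the standard library; `index_feature_files` is a hypothetical helper and not part of the repository.

```python
from pathlib import Path


def index_feature_files(data_root_dir):
    """Map each dataset folder (e.g. TCGA_BLCA) to its sorted list of .pt files."""
    root = Path(data_root_dir)
    index = {}
    # Only sub-directories are treated as datasets; stray files are ignored.
    for dataset_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        index[dataset_dir.name] = sorted(dataset_dir.glob("*.pt"))
    return index
```

Calling `index_feature_files("DATA_ROOT_DIR")` returns a dictionary keyed by folder name, so a dataset with an empty list points to a folder whose features have not been extracted yet. Each path could then be loaded with `torch.load(path)`.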

3. Molecular Features and Genomic Signatures

Processed molecular profile features containing mutation status, copy number variation, and RNA-Seq abundance can be downloaded from the cBioPortal; we include them as CSV files in the following directory. For ordering gene features into gene embeddings, we used the following categorization of gene families (categorized via common features such as homology or biochemical activity) from MSigDB. Gene sets for homeodomain proteins and translocated cancer genes were not used due to overlap with transcription factors and oncogenes, respectively. The curation of "genomic signatures" can be modified to produce genomic embeddings that reflect unique biological functions.

4. Training-Validation Splits

To evaluate the algorithm's performance, we randomly partitioned each dataset using 5-fold cross-validation. Splits for each cancer type are found in the splits/5foldcv folder, with each cancer type's folder containing splits_{k}.csv for k = 1 to 5. In each splits_{k}.csv, the first column corresponds to the TCGA Case IDs used for training, and the second column corresponds to the TCGA Case IDs used for validation. Alternatively, you can define your own splits, but the files need to follow this format. The dataset loader for these train-val splits is defined in the get_split_from_df function in the Generic_WSI_Survival_Dataset class (inherited from the PyTorch Dataset class).
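
For anyone preparing custom splits, the format described above can be read back with a short sketch like the following. The header names `train` and `val` are assumptions (check the header of your own splits_{k}.csv), and `read_split` is a hypothetical helper, not the repository's get_split_from_df.

```python
import csv


def read_split(csv_path, train_col="train", val_col="val"):
    """Return (train_ids, val_ids) from one splits_{k}.csv file.

    train_col and val_col are assumed header names; adjust them to the file.
    """
    train_ids, val_ids = [], []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            # The two columns usually differ in length, so empty cells are skipped.
            if row.get(train_col):
                train_ids.append(row[train_col])
            if row.get(val_col):
                val_ids.append(row[val_col])
    return train_ids, val_ids
```

A useful sanity check on a custom split is that `set(train_ids) & set(val_ids)` is empty, so that no case appears in both training and validation.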

5. Running Experiments

Experiments with the SNN, AMIL, and MMF networks defined in this repository can be run using the following generic command line:

CUDA_VISIBLE_DEVICES=<DEVICE ID> python main.py --which_splits <SPLIT FOLDER PATH> --split_dir <SPLITS FOR CANCER TYPE> --mode <WHICH MODALITY> --model_type <WHICH MODEL>

Commands for all experiments / models can be found in the Commands.md file.
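
When several runs need to be launched in sequence, the generic command line above can be assembled programmatically. The sketch below uses only the flags shown in that command; `build_command` and `build_env` are hypothetical helpers, and the argument values are placeholders to be replaced with the ones listed in Commands.md.

```python
import os


def build_command(which_splits, split_dir, mode, model_type):
    """Return the argument list for one main.py run."""
    return [
        "python", "main.py",
        "--which_splits", which_splits,
        "--split_dir", split_dir,
        "--mode", mode,
        "--model_type", model_type,
    ]


def build_env(device_id):
    """Copy the current environment and pin the run to one GPU."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(device_id)
    return env
```

The result can be passed to `subprocess.run(build_command(...), env=build_env(0), check=True)` from the repository root.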

Comments

Question about 'fast_cluster_ids.pkl'

Thank you for sharing your code! I'm interested in your research; it gives me a lot of inspiration. While trying to run the code, I'm confused about the file 'fast_cluster_ids.pkl' in dataset_survival.py. I can't find a description of it. Could you please tell me what this file contains? Thank you very much!

opened by Houjiaxin123 4

Codes about "Processing Whole Slide Images"

Hello, thank you very much for your paper and project.

Could you please provide the exact code used for the "Processing Whole Slide Images" step?

Looking forward to your reply.

opened by SuooL 1

Questions about the computation of the survival layer in `MCAT_Surv`

It seems that the computation of the survival layer in MCAT_Surv (link) is wrong: logits = self.classifier(h).unsqueeze(0) should be logits = self.classifier(h). With the old version, supposing batch_size=6 and n_classes=4, the logits will have size (1,6,4), the hazards will have size (1,6,4), and Y_hat will have size (1,1,4), which certainly does not contain the Y_hat for the 6 samples of the batch. Besides, S will be the cumulative product of the survival (i.e. 1-hazards) along the batch dimension; what does this mean? This S has size (1,6,4), so len(S) in CoxSurvLoss (link) will be 1, which is certainly not the batch size as expected.

Finally, could you provide the reference for the equations used to write this Cox loss?

opened by huangmozhilv 1
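
The shape argument in the question above can be reproduced with a small NumPy sketch. It does not use the repository's code: the random logits stand in for the output of self.classifier(h), the sigmoid used to obtain hazards is an assumption, and taking the cumulative product along axis 1 is inferred from the sizes quoted in the question.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


batch_size, n_classes = 6, 4
rng = np.random.default_rng(0)
logits = rng.normal(size=(batch_size, n_classes))  # stand-in for classifier(h)

# Without the extra leading axis: one row per sample.
hazards = sigmoid(logits)                          # shape (6, 4)
survival = np.cumprod(1.0 - hazards, axis=1)       # product over the 4 bins

# With an extra leading axis, mimicking unsqueeze(0).
logits_u = logits[np.newaxis]                      # shape (1, 6, 4)
hazards_u = sigmoid(logits_u)
survival_u = np.cumprod(1.0 - hazards_u, axis=1)   # product now runs over the 6 samples

print(survival.shape, len(survival))
print(survival_u.shape, len(survival_u))
```

In the first case each row of `survival` is a non-increasing curve for one sample. In the second case axis 1 is the batch axis, so the product mixes different samples and `len(survival_u)` is 1 rather than the batch size.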

Please help check this line

Dear authors, Please help check the following line: https://github.com/mahmoodlab/MCAT/blob/b9cca63be83c67de7f95308d54a58f80b78b0da1/datasets/dataset_survival.py#L63

I have tested the code as follows:

import pandas as pd
import numpy as np

csv_path = 'MCAT_master/datasets_sig_csv/tcga_brca_all_clean.csv.zip'
slide_data = pd.read_csv(csv_path, low_memory=False)

if "IDC" in slide_data['oncotree_code']:  # must be BRCA (and if so, use only IDCs)
    print('Yes, IDC is in there')
else:
    print('No, IDC is not in there')

if "IDC" in slide_data['oncotree_code'].values:  # must be BRCA (and if so, use only IDCs)
    print('Yes, IDC is in there')
else:
    print('No, IDC is not in there')

And the output is as follows:

No, IDC is not in there
Yes, IDC is in there

Is this a bug? My pandas version is 1.4.1. Could you please help check it? Thanks a lot!

opened by hachikong 0
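
The two outputs reported above follow from how pandas defines membership: the `in` operator on a Series tests the index labels, not the stored values. The sketch below reproduces the behaviour on a small made-up Series, so it does not depend on the repository's CSV file; the sample codes are illustrative only.

```python
import pandas as pd

codes = pd.Series(["IDC", "ILC", "IDC"], name="oncotree_code")

in_index = "IDC" in codes             # checks the index labels 0, 1, 2
in_values = "IDC" in codes.values     # checks the stored values
any_match = bool(codes.eq("IDC").any())  # explicit value test on the Series

print(in_index, in_values, any_match)
```

Here `in_index` is False while `in_values` and `any_match` are True, so a condition that is meant to test the column's contents needs `.values` or an explicit comparison such as `.eq(...).any()`.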

Co-attention visualizations?

Hi,

Thanks for publishing the nice code. I can't find code in this repo that produces slide attention visualizations similar to those in Figures 2 and 3 of the paper. Is this available somewhere?

Thanks! Ben

opened by benlansdell 0

Owner

Mahmood Lab @ Harvard/BWH
