Clairvoyance: a Unified, End-to-End AutoML Pipeline for Medical Time Series

van_der_Schaar \LAB

Last update: Dec 7, 2022

Overview

Clairvoyance: A Pipeline Toolkit for Medical Time Series

This repository contains implementations of Clairvoyance: A Pipeline Toolkit for Medical Time Series for the following applications.

Time-series prediction (one-shot and online)
Transfer learning
Individualized time-series treatment effects (ITE) estimation
Active sensing on time-series data
AutoML

All API files for these applications can be found in the /api folder, and all tutorials in the /tutorial folder.

Installation

There are currently two ways of installing the required dependencies: using Docker or using Conda.

Note on Requirements

Clairvoyance has been tested on Ubuntu 20.04, but should be broadly compatible with common Linux systems.
The Docker installation method is additionally compatible with Mac and Windows systems that support Docker.
Hardware requirements depend on the underlying ML models used, but a machine that can handle ML research tasks is recommended.
For faster computation, a CUDA-capable Nvidia card is recommended (follow the CUDA-enabled installation steps below).

Docker installation

If you are not familiar with Docker, the links below are a good starting point:

Install Docker on your system: https://docs.docker.com/get-docker/.
[Required for CUDA-enabled installation only] Install Nvidia container runtime: https://github.com/NVIDIA/nvidia-container-runtime/.
- Assumes Nvidia drivers are correctly installed on your system.

Get the latest Clairvoyance Docker image:

$ docker pull clairvoyancedocker/clv:latest

To run the Docker container as a terminal, execute the below from the Clairvoyance repository root:
```
$ docker run -i -t --gpus all --network host -v $(pwd)/datasets/data:/home/clvusr/clairvoyance/datasets/data clairvoyancedocker/clv
```
- Explanation of the docker run arguments:
  - -i -t: Run a terminal session.
  - --gpus all: [Required for CUDA-enabled installation only], passes your GPU(s) to the Docker container, otherwise skip this option.
  - --network host: Use your machine's network and forward ports. Could alternatively publish ports, e.g. -p 8888:8888.
  - -v $(pwd)/datasets/data:/home/clvusr/clairvoyance/datasets/data: Share directory/ies with the Docker container as volumes, e.g. data.
  - clairvoyancedocker/clv: Specifies Clairvoyance Docker image.
- If using Windows:
  - Use PowerShell and first run the command $pwdwin = $(pwd).Path. Then use $pwdwin instead of $(pwd) in the docker run command.
- If using Windows or Mac:
  - Due to how Docker networking works, replace --network host with -p 8888:8888.
Run all following Clairvoyance API commands, jupyter notebooks etc. from within this Docker container.
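
The platform-specific variations of the docker run command described above can be summarised in a short Python sketch. This is only an illustration of how the arguments combine: the function name build_docker_run_command and its parameters are hypothetical and not part of Clairvoyance. The resulting string is meant to be pasted into a shell, since $(pwd) and $pwdwin are expanded by the shell rather than by Python.

```python
def build_docker_run_command(host_os="linux", use_gpu=False,
                             image="clairvoyancedocker/clv"):
    """Assemble the docker run arguments described in this section.

    host_os is one of "linux", "mac" or "windows" (hypothetical parameter).
    """
    container_data = "/home/clvusr/clairvoyance/datasets/data"
    # On Windows (PowerShell), $pwdwin replaces $(pwd), as noted above.
    cwd = "$pwdwin" if host_os == "windows" else "$(pwd)"

    args = ["docker", "run", "-i", "-t"]
    if use_gpu:
        # Only for the CUDA-enabled installation.
        args += ["--gpus", "all"]
    if host_os == "linux":
        args += ["--network", "host"]
    else:
        # Windows and Mac: publish the Jupyter port instead of host networking.
        args += ["-p", "8888:8888"]
    args += ["-v", f"{cwd}/datasets/data:{container_data}", image]
    return args


print(" ".join(build_docker_run_command("linux", use_gpu=True)))
print(" ".join(build_docker_run_command("windows")))
```

With host_os="linux" and use_gpu=True, the printed line reproduces the command shown at the start of this section.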

Conda installation

Conda installation has been tested on Ubuntu 20.04 only.

From the Clairvoyance repo root, execute:

$ conda env create --name clvenv -f ./environment.yml
$ conda activate clvenv

Run all following Clairvoyance API commands, jupyter notebooks etc. in the clvenv environment.

Data

Clairvoyance expects your dataset files to be defined as follows:

Four CSV files (may be compressed), as illustrated below:

static_test_data.csv
static_train_data.csv
temporal_test_data.csv
temporal_train_data.csv

Static data file content format:

id,my_feature,my_other_feature,my_third_feature_etc
3wOSm2,11.00,4,-1.0
82HJss,3.40,2,2.1
iX3fiP,7.01,3,-0.4
...

Temporal data file content format:

id,time,variable,value
3wOSm2,0.0,my_first_temporal_feature,0.45
3wOSm2,0.5,my_first_temporal_feature,0.47
3wOSm2,1.2,my_first_temporal_feature,0.49
3wOSm2,0.0,my_second_temporal_feature,10.0
3wOSm2,0.1,my_second_temporal_feature,12.4
3wOSm2,0.3,my_second_temporal_feature,9.3
82HJss,0.0,my_first_temporal_feature,0.22
82HJss,1.0,my_first_temporal_feature,0.44
...

The id column is required in the static data files. The id, time, variable and value columns are required in the temporal data files. The IDs of samples must match between the static and temporal files.
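
As a rough illustration of the format rules above, the following standard-library sketch checks the required columns, groups the long-format temporal rows into one series per id and variable, and verifies that the ids agree. The helper names (read_static, read_temporal, check_ids_match) are hypothetical and not part of the Clairvoyance API, and "must match" is interpreted here as both files containing the same set of ids, which is an assumption.

```python
import csv
import io

STATIC_EXAMPLE = """id,my_feature,my_other_feature
3wOSm2,11.00,4
82HJss,3.40,2
"""

TEMPORAL_EXAMPLE = """id,time,variable,value
3wOSm2,0.0,my_first_temporal_feature,0.45
3wOSm2,0.5,my_first_temporal_feature,0.47
3wOSm2,0.0,my_second_temporal_feature,10.0
82HJss,0.0,my_first_temporal_feature,0.22
82HJss,1.0,my_first_temporal_feature,0.44
"""


def read_static(text):
    """Return {id: row} and fail if the required id column is absent."""
    reader = csv.DictReader(io.StringIO(text))
    if "id" not in (reader.fieldnames or []):
        raise ValueError("static data needs an 'id' column")
    return {row["id"]: row for row in reader}


def read_temporal(text):
    """Return {id: {variable: [(time, value), ...]}} sorted by time."""
    reader = csv.DictReader(io.StringIO(text))
    missing = {"id", "time", "variable", "value"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"temporal data is missing columns: {sorted(missing)}")
    series = {}
    for row in reader:
        points = series.setdefault(row["id"], {}).setdefault(row["variable"], [])
        points.append((float(row["time"]), float(row["value"])))
    for variables in series.values():
        for points in variables.values():
            points.sort()
    return series


def check_ids_match(static, temporal):
    """Raise if an id appears in only one of the two files."""
    unmatched = set(static) ^ set(temporal)
    if unmatched:
        raise ValueError(f"ids present in only one file: {sorted(unmatched)}")


static = read_static(STATIC_EXAMPLE)
temporal = read_temporal(TEMPORAL_EXAMPLE)
check_ids_match(static, temporal)
print(temporal["3wOSm2"]["my_first_temporal_feature"])
```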

Your data files are expected to be under:

<clairvoyance_repo_root>/datasets/data/<your_dataset_name>/

See tutorials for how to define your dataset(s) in code.
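
Before running a tutorial, it can help to confirm that the four files are where Clairvoyance expects them. The sketch below is a hypothetical helper, not part of the toolkit. It also assumes that a compressed file keeps the same name with a .gz suffix appended, which this README does not specify.

```python
import tempfile
from pathlib import Path

FILE_NAMES = (
    "static_test_data.csv",
    "static_train_data.csv",
    "temporal_test_data.csv",
    "temporal_train_data.csv",
)


def missing_dataset_files(repo_root, dataset_name):
    """List the expected files that are absent from datasets/data/<dataset_name>/."""
    data_dir = Path(repo_root) / "datasets" / "data" / dataset_name
    missing = []
    for name in FILE_NAMES:
        # Assumption: a compressed file is named "<name>.gz".
        candidates = (data_dir / name, data_dir / (name + ".gz"))
        if not any(path.is_file() for path in candidates):
            missing.append(name)
    return missing


# Demonstration on a throwaway directory that holds only one of the four files.
with tempfile.TemporaryDirectory() as repo_root:
    data_dir = Path(repo_root) / "datasets" / "data" / "my_dataset"
    data_dir.mkdir(parents=True)
    (data_dir / "static_train_data.csv").write_text("id,my_feature\n")
    demo_missing = missing_dataset_files(repo_root, "my_dataset")

print(demo_missing)
```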
Clairvoyance examples refer to some existing datasets, e.g. mimic and ward. These datasets are confidential (MIMIC-III additionally requires a training course and an access request) and are not provided here. Contact nm736@cam.ac.uk for more details.

Extract data from MIMIC-III

To use MIMIC-III with Clairvoyance, you need to get access to MIMIC-III and follow the instructions for installing it in a Postgres database: https://mimic.physionet.org/tutorials/install-mimic-locally-ubuntu/

Once the database is set up, run the extraction script from the repository root:

$ cd datasets/mimic_data_extraction && python extract_antibiotics_dataset.py

Usage

To run tutorials:
- Launch jupyter lab: $ jupyter-lab.
  - If using Windows or Mac and following the Docker installation method, run jupyter-lab --ip="0.0.0.0".
- Open jupyter lab in the browser by following the URL with the token.
- Navigate to tutorial/ and run a tutorial of your choice.
To run the Clairvoyance API from the command line, execute the appropriate command from within the Docker terminal or the clvenv environment (see example command below).

Example: Time-series prediction

To run the full training and evaluation pipeline for time-series prediction, run $ python main_api_prediction.py from the api directory (see the example command below), or take a look at the jupyter notebook tutorial/tutorial_prediction.ipynb.

Note that any model architecture can be used as the predictor model, such as an RNN, temporal convolutions, or a transformer. The only requirement is that the predictor model provides fit and predict functions.
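
To make the fit and predict requirement concrete, here is a minimal toy predictor written in plain Python. It is a sketch only: the class name MeanPredictor, the helper has_predictor_interface, and the argument lists (features, labels) are assumptions for illustration. The signatures Clairvoyance actually expects are defined in the repository and may take dataset objects instead.

```python
class MeanPredictor:
    """Toy model that always predicts the mean of the training labels."""

    def __init__(self):
        self.mean_ = None

    def fit(self, features, labels):
        # features is ignored by this baseline; only the labels are used.
        self.mean_ = sum(labels) / len(labels)
        return self

    def predict(self, features):
        if self.mean_ is None:
            raise RuntimeError("call fit before predict")
        return [self.mean_ for _ in features]


def has_predictor_interface(model):
    """Duck-typing check for the two required functions."""
    return callable(getattr(model, "fit", None)) and callable(
        getattr(model, "predict", None)
    )


model = MeanPredictor().fit([[1.0], [2.0]], [0.0, 1.0])
print(has_predictor_interface(model), model.predict([[3.0], [4.0]]))
```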

Stages of the time-series prediction:
- Import dataset
- Preprocess data
- Define the problem (feature, label, etc.)
- Impute missing components
- Select the relevant features
- Train time-series predictive model
- Estimate the uncertainty of the predictions
- Interpret the predictions
- Evaluate the time-series prediction performance on the testing set
- Visualize the outputs (performance, predictions, uncertainties, and interpretations)
Command inputs:
- data_name: mimic, ward, cf
- normalization: minmax, standard, None
- one_hot_encoding: input features that need to be one-hot encoded
- problem: one-shot or online
- max_seq_len: maximum sequence length after padding
- label_name: the column name for the label(s)
- treatment: the column name for treatments
- static_imputation_model: mean, median, mice, missforest, knn, gain
- temporal_imputation_model: mean, median, linear, quadratic, cubic, spline, mrnn, tgain
- feature_selection_model: greedy-addition, greedy-deletion, recursive-addition, recursive-deletion, None
- feature_number: selected feature number
- model_name: rnn, gru, lstm, attention, tcn, transformer
- h_dim: hidden dimensions
- n_layer: layer number
- n_head: head number (only for transformer model)
- batch_size: number of samples in mini-batch
- epochs: number of epochs
- learning_rate: learning rate
- static_mode: how to utilize static features (concatenate or None)
- time_mode: how to utilize time information (concatenate or None)
- task: classification or regression
- uncertainty_model_name: uncertainty estimation model name (ensemble)
- interpretation_model_name: interpretation model name (tinvase)
- metric_name: auc, apr, mae, mse
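
Two of the inputs above, normalization and max_seq_len, can be illustrated with a small sketch. This is not Clairvoyance's implementation: the function names are hypothetical, and the choices made here (zero as the padding value, padding at the end, truncating longer sequences) are assumptions that may differ from what the toolkit does.

```python
def minmax_normalize(values):
    """Scale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no range; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pad_or_truncate(sequence, max_seq_len, pad_value=0.0):
    """Return a list of exactly max_seq_len items (assumed post-padding)."""
    clipped = list(sequence[:max_seq_len])
    return clipped + [pad_value] * (max_seq_len - len(clipped))


print(minmax_normalize([2.0, 4.0, 6.0]))
print(pad_or_truncate([0.45, 0.47, 0.49], max_seq_len=5))
```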

Example command:

$ cd api
$ python main_api_prediction.py \
    --data_name cf --normalization minmax --one_hot_encoding admission_type \
    --problem one-shot --max_seq_len 24 --label_name death \
    --static_imputation_model median --temporal_imputation_model median \
    --model_name lstm --h_dim 100 --n_layer 2 --n_head 2 --batch_size 400 \
    --epochs 20 --learning_rate 0.001 \
    --static_mode concatenate --time_mode concatenate \
    --task classification --uncertainty_model_name ensemble \
    --interpretation_model_name tinvase --metric_name auc
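
For readers who want to see how such a command line maps to typed options, the sketch below declares a subset of the command inputs with argparse and parses part of the example command. It is not the parser used by main_api_prediction.py. The defaults are simply copied from the example command and are assumptions, as is the function name build_parser.

```python
import argparse


def build_parser():
    """Declare a subset of the documented command inputs."""
    parser = argparse.ArgumentParser(description="Time-series prediction (sketch)")
    parser.add_argument("--data_name", choices=["mimic", "ward", "cf"], default="cf")
    parser.add_argument("--normalization", choices=["minmax", "standard", "None"],
                        default="minmax")
    parser.add_argument("--problem", choices=["one-shot", "online"], default="one-shot")
    parser.add_argument("--max_seq_len", type=int, default=24)
    parser.add_argument("--label_name", default="death")
    parser.add_argument("--model_name", default="lstm",
                        choices=["rnn", "gru", "lstm", "attention", "tcn", "transformer"])
    parser.add_argument("--h_dim", type=int, default=100)
    parser.add_argument("--n_layer", type=int, default=2)
    parser.add_argument("--batch_size", type=int, default=400)
    parser.add_argument("--epochs", type=int, default=20)
    parser.add_argument("--learning_rate", type=float, default=0.001)
    parser.add_argument("--task", choices=["classification", "regression"],
                        default="classification")
    parser.add_argument("--metric_name", choices=["auc", "apr", "mae", "mse"],
                        default="auc")
    return parser


example = (
    "--data_name cf --normalization minmax --problem one-shot --max_seq_len 24 "
    "--label_name death --model_name lstm --h_dim 100 --n_layer 2 "
    "--batch_size 400 --epochs 20 --learning_rate 0.001 "
    "--task classification --metric_name auc"
)
args = build_parser().parse_args(example.split())
print(args.model_name, args.max_seq_len, args.learning_rate)
```

Declaring choices makes argparse reject an unsupported value (for example an unknown model_name) before any training starts.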

Outputs:
- Model prediction
- Model performance
- Prediction uncertainty
- Prediction interpretation

Citation

To cite Clairvoyance in your publications, please use the following reference.

Daniel Jarrett, Jinsung Yoon, Ioana Bica, Zhaozhi Qian, Ari Ercole, and Mihaela van der Schaar (2021). Clairvoyance: A Pipeline Toolkit for Medical Time Series. In International Conference on Learning Representations. Available at: https://openreview.net/forum?id=xnC8YwKUE3k.

You can also use the following Bibtex entry.

@inproceedings{
  jarrett2021clairvoyance,
  title={Clairvoyance: A Pipeline Toolkit for Medical Time Series},
  author={Daniel Jarrett and Jinsung Yoon and Ioana Bica and Zhaozhi Qian and Ari Ercole and Mihaela van der Schaar},
  booktitle={International Conference on Learning Representations},
  year={2021},
  url={https://openreview.net/forum?id=xnC8YwKUE3k}
}

To cite the Clairvoyance alpha blog post, please use:

van Der Schaar, M., Yoon, J., Qian, Z., Jarrett, D., & Bica, I. (2020). clairvoyance alpha: the first pipeline toolkit for medical time series. [Webpages]. https://doi.org/10.17863/CAM.70020

@misc{https://doi.org/10.17863/cam.70020,
  doi = {10.17863/CAM.70020},
  url = {https://www.repository.cam.ac.uk/handle/1810/322563},
  author = {Van Der Schaar,  Mihaela and Yoon,  Jinsung and Qian,  Zhaozhi and Jarrett,  Dan and Bica,  Ioana},
  title = {clairvoyance alpha: the first pipeline toolkit for medical time series},
  publisher = {Apollo - University of Cambridge Repository},
  year = {2020}
}

Comments

Dead link in prediction tutorial

Thanks so much for your work on this!

I'm trying to run the tutorial here, but the link to https://github.com/jsyoon0823/time-series-automl.git is dead. Any updates on that? :-)

Thanks!

opened by MartinBernstorff 2
Tutorial fail because using mimic dataset and not mimic_antibiotics

Would it be possible to share the scripts to get the mimic dataset needed for the tutorials about prediction and active sensing? They use the "normal" mimic dataset, but the data extraction script is given only for the mimic_antibiotics dataset.

It's a wonderful package and very well written and documented ;) Congrats to that.

Best wishes, Elly

opened by IceQueen1996 2
Extension of models to more complex structures

Hi, Just wanted to ask if there are any plans to include other models into the package that your lab has developed. Specifically Dynamic-Deephit and DCN-PD would be great to have.

opened by Mvdnheuv 1
net_helpers.py calls deprecated Adam optimiser

In treatments/RMSN/libs/net_helpers.py the line

def get_optimization_graph(loss, learning_rate, max_global_norm, global_step, optimisation_function=tf.train.AdamOptimizer):

tf.train.AdamOptimizer is deprecated; it should now be tf.optimizers.Adam

i.e.

def get_optimization_graph(loss, learning_rate, max_global_norm, global_step, optimisation_function=tf.optimizers.Adam()):

opened by ariercole 1
CRN_Base.py LSTMCell / DropoutWrapper import deprecated

In CRN_Base.py,

from tensorflow.contrib.rnn import LSTMCell, DropoutWrapper does not work because it has been moved.

Instead call (at least for now)

from tensorflow.compat.v1.nn.rnn_cell import LSTMCell

opened by ariercole 1
gain_imputation.py code tries to import deprecated tensorflow class

The gain_imputation.py code tries to import deprecated tensorflow.saved_model.tag_constants.

Fortunately the tags don't seem to be required as far as I can see so

tf.saved_model.loader.load(sess, [tag_constants.SERVING], self.save_file_directory)

can be replaced by

tf.saved_model.loader.load(sess, self.save_file_directory)

opened by ariercole 1
missingpy problem

Not directly a clairvoyance problem, but some missingpy 0.2.0 components try to import sklearn.neighbors.base, which no longer works because it has been renamed sklearn.neighbors._base since sklearn 0.22.1, I believe. I have reported the issue to missingpy, but the workaround is to modify the missingpy components directly.
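
An alternative that avoids editing missingpy, offered here only as a sketch and not tested against Clairvoyance, is to register the renamed module under its old name in sys.modules before missingpy is imported. The helper alias_module is hypothetical. The demonstration uses standard-library modules so that it runs without scikit-learn, and the intended call is left as a comment.

```python
import importlib
import sys


def alias_module(old_name, new_name):
    """Make imports of old_name resolve to the module found at new_name."""
    module = importlib.import_module(new_name)
    sys.modules[old_name] = module
    return module


# Intended use (assumes scikit-learn >= 0.22 is installed), run before
# importing missingpy:
#     alias_module("sklearn.neighbors.base", "sklearn.neighbors._base")

# Standard-library demonstration of the same mechanism with a made-up name.
alias_module("json.old_decoder", "json.decoder")
aliased = importlib.import_module("json.old_decoder")
print(aliased is importlib.import_module("json.decoder"))
```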

opened by ariercole 1
Missing empty ~/clairvoyance/datasets/data/mimic_antibiotics directory

Small issue but it would be good to have an empty '~/clairvoyance/datasets/data/mimic_antibiotics' directory set up in the repo as a placeholder (extract_antibiotics_dataset.py script fails otherwise).

:-)

opened by ariercole 0
