HiRID-ICU-Benchmark

This repository contains the resources needed to build the HiRID-ICU-Benchmark dataset; the accompanying manuscript can be found here.

Overview

We first introduce key resources that help in understanding the structure and particularities of the data. We then detail the different features of our pipeline and how to use them, as illustrated in the figure below.

[Figure: overview of the benchmark pipeline]

Key Resources

We build our work on previously released data, models, and metrics. To help users who might be unfamiliar with them, this section provides pointers to the related documentation.

HiRID data

We base our benchmark on HiRID, a recent intensive care dataset. It is a freely accessible critical care dataset containing data from more than 33,000 ICU admissions to the Department of Intensive Care Medicine of Bern University Hospital, Switzerland, from January 2008 to June 2016. It was first released as part of the circulatory Early Warning Score project.

First, you can find more details about the demographics of the patients in Appendix A: HiRID Dataset Details. For more details about the original data, however, it is best to refer to its latest documentation. In particular, the documentation contains the following sections of interest:

  • Getting started: This first section points to a Jupyter notebook to familiarize yourself with the data.
  • Data details: This second section contains a description of the variables existing in the dataset. To complement it, you can refer to our varref.tsv, which we use to build the common version of the data (see the loading sketch after this list).
  • Structure of the published data: This final section contains details about the structure of the raw data, which you will have to download and place in the hirid-data-root folder (see "Run Preprocessing").
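For a quick look at the variable reference table shipped with this repository, you can load it with pandas. This is a minimal sketch; the exact column set is whatever the checked-in varref.tsv defines:

import pandas as pd

# Inspect the variable reference table used to build the common version of the data.
varref = pd.read_csv("preprocessing/resources/varref.tsv", sep="\t")
print(varref.columns.tolist())
print(varref.head())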

Models

As with the data, in this benchmark we compare existing machine learning models that are commonly used for multivariate time-series data. To implement these models we use pytorch for the deep learning models, lightgbm for the boosted-tree approaches, and sklearn for the logistic regression model and the metrics. The full list of deep learning models we used can be found in the manuscript and in the scripts under ./run_scripts/baselines/.
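As a purely illustrative sketch (these are not the repository's actual wrapper classes, and the hyperparameters are placeholders), the three model families map onto the libraries as follows:

import torch.nn as nn
import lightgbm as lgb
from sklearn.linear_model import LogisticRegression

# One illustrative instance per model family.
gru = nn.GRU(input_size=32, hidden_size=64, batch_first=True)  # deep learning (pytorch)
gbm = lgb.LGBMClassifier(n_estimators=100)                     # boosted trees (lightgbm)
logreg = LogisticRegression(max_iter=1000)                     # linear baseline (sklearn)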

Metrics

In our benchmark we use different metrics depending on the task; however, all implementations come from sklearn, which documents their usage well.
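For instance, typical classification and regression metrics can be computed as follows. This is a toy sketch with made-up inputs; the exact metric used for each task is specified in the manuscript:

from sklearn.metrics import roc_auc_score, average_precision_score, mean_absolute_error

# Toy binary classification scores and a toy regression example.
y_true, y_score = [0, 1, 1, 0], [0.1, 0.8, 0.6, 0.3]
print(roc_auc_score(y_true, y_score))               # AUROC
print(average_precision_score(y_true, y_score))     # AUPRC
print(mean_absolute_error([3.0, 5.0], [2.5, 5.5]))  # MAE (regression)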

Setup

In the following we assume a Linux installation; however, other platforms may also work.

  1. Install Conda; see the official installation instructions
  2. Clone this repository and change into the directory of the repository
  3. conda env update (creates an environment icu-benchmark)
  4. pip install -e .

Download Data

  1. Get access to the HiRID 1.1.1 dataset on physionet. This entails
    1. getting a credentialed physionet account
    2. submitting a usage request to the data depositor
  2. Once access is granted, download the following files
    1. reference_data.tar.gz
    2. observation_tables_parquet.tar.gz
    3. pharma_records_parquet.tar.gz
  3. Unpack the files into the same directory, e.g. using cat *.tar.gz | tar zxvf - -i (a quick sanity check of the result is sketched below)
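After unpacking, you can confirm that the expected layout is in place with something like the following. This is only a sketch; the folder names are assumed from the tarball names above, and the data root path is a placeholder:

from pathlib import Path

# Check that the three unpacked top-level folders exist under the chosen data root.
root = Path("/path/to/hirid-data-root")  # placeholder
for name in ["reference_data", "observation_tables", "pharma_records"]:
    print(name, "found" if (root / name).exists() else "MISSING")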

How to Run

Run Preprocessing

Activate the conda environment using conda activate icu-benchmark. Then

icu-benchmarks preprocess --hirid-data-root [path to unpacked parquet files as downloaded from physionet] \
                          --work-dir [output directory] \
                          --var-ref-path ./preprocessing/resources/varref.tsv \
                          --split-path ./preprocessing/resources/split.tsv \
                          --nr-workers 8

The above command requires about 6GB of RAM per core and in total approximately 30GB of disk space.

Run Training

Custom training

To run a custom training, first activate the conda environment using conda activate icu-benchmark. Then

icu-benchmarks train -c [path to gin config] \
                     -l [path to logdir] \
                     -t [task name] \
                     -sd [seed number] 

The task name should be one of the following: Mortality_At24Hours, Dynamic_CircFailure_12Hours, Dynamic_RespFailure_12Hours, Dynamic_UrineOutput_2Hours_Reg, Phenotyping_APACHEGroup, or Remaining_LOS_Reg. To see an example of a gin config file, please refer to ./configs/. You can also check the gin-config documentation directly. This command will create a new directory [path to logdir]/[task name]/[seed number]/ containing:

  • val_metrics.pkl and test_metrics.pkl: Pickle files with the model's performance on the validation and test sets, respectively (see the loading sketch after this list).
  • train_config.gin: The so-called "operative" config, which saves the configuration used at training time.
  • model.(torch/txt/joblib): The weights of the trained model. The extension depends on the model type.
  • tensorboard/: (Optional) Directory with tensorboard logs. You can run tensorboard --logdir ./tensorboard to visualize them.
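To inspect the saved metrics afterwards, something like the following works. The path components are placeholders (the task and seed depend on your run), and the keys stored in the pickle depend on the task:

import pickle

# Load the validation metrics written at the end of training; "1111" is a placeholder seed.
with open("logdir/Mortality_At24Hours/1111/val_metrics.pkl", "rb") as f:
    metrics = pickle.load(f)
print(metrics)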

Reproduce experiments from the paper

If you are interested in reproducing the experiments from the paper, you can directly use the pre-built scripts in ./run_scripts/. For instance, you can run the following command to reproduce the GRU baseline on the Mortality task:

sh ./run_scripts/baselines/Mortality_At24Hours/GRU.sh

As with custom training, this will create a directory with the files mentioned above. The pre-built scripts are divided into four categories as follows:

  • baselines: This folder contains scripts to reproduce the main benchmark experiment. Each of them runs a model with the best parameters we found using a random search, repeated for the same 10 seeds.
  • ablations: This folder contains the scripts to reproduce the ablation studies on the horizon, sequence length, and weighting.
  • random-search: Each of these scripts runs one instance of a random search. This means that if you want a k-run search, you need to run the script k times.
  • pretrained: This last type of script allows you to evaluate pretrained models from our experiments. We discuss them in more detail in the next section.

Run Evaluation of Pretrained Models

Custom Evaluation

As for training, you can evaluate any previously trained model using the evaluate command as follows:

icu-benchmarks evaluate -c [path to gin config] \
                        -l [path to logdir] \
                        -t [task name]

This command will evaluate the model stored at [path to logdir]/[task name]/model.(torch/txt/joblib) on the test set of the dataset provided in the config. Results are saved to the test_metrics.pkl file.

Evaluate Manuscript models

To either check the pre-processing pipeline outcome or simply reproduce the paper results, we provide weights for all models of the benchmark experiment in files/pretrained_weights. Please note that the data items in this repository use the git-lfs framework; you need to install git-lfs on your system to be able to download and access the pretrained weights.

Once this is done, you can evaluate any network by running:

sh ./run_scripts/pretrained/[task name]/[model name].sh

Note that we provide only one set of weights for each model, corresponding to the run with median performance among the 10 runs reported in the manuscript.

Run Pipeline on Simulated Data

We provide a small toy dataset to test the processing pipeline and to give a rough impression of what the original data looks like. Since access to the HiRID dataset is restricted, instead of publishing a small subset of the data, we generated a very simple simulated dataset based on statistics aggregated from the full HiRID dataset. It is, however, not useful for data exploration or training: for example, the values are sampled independently of each other, so any structure between variables in the original dataset is not represented.

The example dataset is provided in files/fake_data. As with the original data, the preprocessing pipeline can be run using

icu-benchmarks preprocess --hirid-data-root files/fake_data --work-dir fake_data_wdir --var-ref-path preprocessing/resources/varref.tsv

Note that for this fake dataset some models cannot be successfully trained, as the training instances are degenerate. In case you'd like to explore the training part of our pipeline, you can work with the pretrained models as described above.

Dataset Generation

The dataset was generated using the following command:

python -m icu_benchmarks.synthetic_data.generate_simple_fake_data files/dataset_stats/ files/fake_data/ --var-ref-path preprocessing/resources/varref.tsv

The script generate_simple_fake_data.py generates fake observation and pharma records in the following way: it first generates a series of timestamps where the difference between consecutive timestamps is sampled from the distribution of timestamp differences in the original dataset. Then, for every timestamp, a variableid/pharmaid is selected at random, also according to its distribution in the original dataset. Finally, we sample the value of each variable from a Gaussian with the mean and standard deviation observed in the original data, and clip the values to the lower and upper bounds given in the varref table.
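In pseudo-Python, the sampling scheme looks roughly like this. This is a sketch of the idea only, not the repository's actual implementation, and every name below is hypothetical:

import numpy as np

rng = np.random.default_rng(0)

def sample_fake_records(deltas, var_ids, var_probs, stats, bounds, n):
    # 1. Timestamps: cumulative sum of inter-observation gaps drawn from the
    #    empirical distribution of timestamp differences.
    t = np.cumsum(rng.choice(deltas, size=n))
    # 2. Variable ids drawn according to their empirical frequency.
    ids = rng.choice(var_ids, size=n, p=var_probs)
    # 3. Values from a per-variable Gaussian, clipped to the varref bounds.
    mu = np.array([stats[i][0] for i in ids])
    sd = np.array([stats[i][1] for i in ids])
    lo = np.array([bounds[i][0] for i in ids])
    hi = np.array([bounds[i][1] for i in ids])
    values = np.clip(rng.normal(mu, sd), lo, hi)
    return t, ids, values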

The necessary statistics for sampling can be found in files/dataset_stats. They were generated using

python -m icu_benchmarks.synthetic_data.collect_stats [Path to the decompressed parquet data directory as published on physionet] files/dataset_stats/

License

You can find the license for the original HiRID data here. Our code is licensed under an MIT License.

Comments
  • Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.

    I was trying to get the HIRID benchmark working, but when attempting the icu-benchmarks preprocess command, I get the following warning: Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process. This warning repeats every few minutes. The data processing does seem to work, but very slowly (on i9-12900h). I used the following command to run the preprocessing:

    icu-benchmarks preprocess --hirid-data-root "/mnt/c/Users/Robin van de Water/Documents/Datasets/hirid/hirid_icu_benchmark" --work-dir "/mnt/c/Users/Robin van de Water/Documents/Datasets/hirid_benchmark_preprocessed" --var-ref-path ./preprocessing/resources/varref.tsv --split-path ./preprocessing/resources/split.tsv --nr-workers 8

    Your work seems great, and I would like to get it running. Thank you in advance for your help.

    opened by rvandewater 9
  • HDF5 issue when reading HDF5 file

    Hi,

    I was trying to run the script for GRU and I face the following error:

    self._g_read_slice(startl, stopl, stepl, nparr)
      File "tables/hdf5extension.pyx", line 1585, in tables.hdf5extension.Array._g_read_slice
    tables.exceptions.HDF5ExtError: HDF5 error back trace

      File "H5Dio.c", line 199, in H5Dread
        can't read data
      File "H5Dio.c", line 601, in H5D__read
        can't read data
      File "H5Dchunk.c", line 2229, in H5D__chunk_read
        unable to read raw data chunk
      File "H5Dchunk.c", line 3609, in H5D__chunk_lock
        data pipeline read failed
      File "H5Z.c", line 1326, in H5Z_pipeline
        filter returned failure during read
      File "hdf5-blosc/src/blosc_filter.c", line 254, in blosc_filter
        Blosc decompression error

    End of HDF5 error back trace

    Problems reading the array data.

    In call to configurable 'train' (<function DLWrapper.train at 0x7fcbb45d11f0>)
    In call to configurable 'train_common' (<function train_common at 0x7fcbb4c64e50>)
    Closing remaining open files: ../output_dir/ml_stage/ml_stage.h5...done

    Process finished with exit code 1

    Process finished with exit code 1

    I found that this issue is due to thread safety in HDF5. I am trying to recompile HDF5 manually, but I am not sure whether that is the source of the issue. I wonder if you have faced a similar problem when setting the on_RAM flag to False and running the code? Also, it would be great if the required version of HDF5 were specified.

    Thank you.

    opened by minasmz 9
  • Correct value of --hirid-data-root?

    Hi,

    Thank you for providing this public resource! We are attempting to run the data preprocessing and are running into parquet errors. Specifically, after we follow steps 2 + 3 in the downloading data instructions in the readme, and run

    python icu_benchmarks/run.py preprocess --hirid-data-root BASE_DIRECTORY_WE_DOWNLOADED_TAR_GZ_FILES_INTO \
                                            --work-dir MY_OUTPUT_DIR \
                                            --var-ref-path ./preprocessing/resources/varref.tsv \
                                            --split-path ./preprocessing/resources/split.tsv \
                                            --nr-workers 8

    we run into the following error

    OSError: Could not open parquet input source 'BASE_DIRECTORY_WE_DOWNLOADED_TAR_GZ_FILES_TO/observation_tables/observation_tables_index.csv': Invalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.

    I assume this is because we are somehow passing in the wrong value of hirid-data-root, since these are indeed CSVs, not parquet files. But we are not sure what path to use instead. Using BASE_DIRECTORY_WE_DOWNLOADED_TAR_GZ_FILES_INTO/observation_tables/ or BASE_DIRECTORY_WE_DOWNLOADED_TAR_GZ_FILES_INTO/observation_tables/parquet/ also yields errors.

    opened by epierson9 8
  • Confirming arterial readings

    Hello, and thanks for all your help with the data set! We just wanted to confirm the arterial readings we are working with. We found a high fraction of patients (~86%) have a_SO2 and a_Lac readings. Is this what you would expect? Best, Michela

    opened by michela-meister 3
  • AttributeError: Can't pickle local object

    Hello,

    I was trying to run a model (GRU) on the preprocessed HiRID dataset, but I get the error mentioned below (only the last few lines of the error are shown). I guess it is related to the DLWrapper, but I could not figure it out. I would appreciate it if you could solve this issue.

    File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/reduction.py", line 60, in dump ForkingPickler(file, protocol).dump(obj) AttributeError: Can't pickle local object 'WeakValueDictionary.init..remove' In call to configurable 'train' (<function DLWrapper.train at 0x14480b550>) In call to configurable 'train_common' (<function train_common at 0x1440bd160>)

    Thank you!

    opened by minasmz 3
  • How to save a simple logistic regression model?

    Hi,

    I was trying to build a logistic regression model on the preprocessed data. The model is built, and the test and validation .pkl files are created in the log folder. I also tried to save the logistic regression model itself as a .pkl file. A variable (save_weights) is defined for this in the train function of the wrappers.py file, and I set it to True to save the model as follows:

    @gin.configurable(module='MLWrapper')
    def train(self, train_dataset, val_dataset, weight, patience=gin.REQUIRED, save_weights=True):

    But I get the following error:

    _pickle.PicklingError: Can't pickle <class 'sklearn.linear_model._logistic.LogisticRegression'>: it's not the same object as sklearn.linear_model._logistic.LogisticRegression
    In call to configurable 'train' (<function MLWrapper.train at 0x141d17670>)
    In call to configurable 'train_common' (<function train_common at 0x1414d1670>)

    I cannot find the issue; it seems the LogisticRegression classes are different. I would appreciate your help in finding the problem and solving this issue.

    opened by minasmz 3
  • common_stage: definitions of column names

    Hello,

    Thank you for creating this dataset! We would really like to work with the common_stage matrices, however many of the column names are abbreviations or are in German. Do you have definitions for the column names in the common_stage matrices? I tried using the hirid variable reference spreadsheet to understand the column names, but I could not find a mapping.

    opened by michela-meister 2
  • common_stage: arterial measurement labels

    Hello, We are trying to work with data on oxygen saturation in arterial blood (concept ID 20000800 in the hirid variable reference spreadsheet). In the appendix of the original HiRID paper, the authors explain that the raw data includes venous measurements that were incorrectly labeled as arterial, and that they had to correct these measurements during preprocessing (which they do here with the function change_arterial_to_venous). Do the common_stage matrices include the corrected labels for arterial measurements?

    opened by michela-meister 1
  • Inference output prediction

    Hi, if we want to evaluate accuracy on a classification task, how can we convert the predicted tensor to a predicted label? For example, in the Mortality task (classification setup), if the predicted output is [-2.4, 1.1], how can we compute the final predicted class? Can we simply take the argmax, or do we need to multiply the classes by some defined weights?

    opened by N-NEJATISHAHIDIN 1
  • How to screen patients admitted due to trauma and query blood transfusion information?

    I am using the HiRID database to screen patients admitted for trauma, but the database does not seem to contain information about the admission diagnosis or the type of treatment the patient receives. Is there another way to screen for patients admitted to the hospital for trauma?

    In addition, by searching keywords, I retrieved the following IDs related to blood transfusion in the table: 1000050 and 1000744 (transfusion of plasma (FFP)); 1000100 and 1000743 (intravenous blood transfusion of packed cells); 1000245 and 1000201 (platelet transfusion).

    But I am not sure if I have missed other fluids related to blood transfusion.

    opened by xujiameng 1
  • Feature/additional unittest

    This PR contains two main components:

    • The refactoring of the preprocessing scripts so that they no longer depend on gin config. (The training script still does.)
    • The addition of unit tests for all preprocessing steps of the pipeline.
    opened by hugoych 0
Owner
Biomedical Informatics at ETH Zurich