Describe the bug
Trying to use the Machine Learning step fails, both in the self-hosted web app and in example_WORC.ipynb.
Steps/Code to Reproduce
import pandas as pd
from pathlib import Path
from autorad.external.download_WORC import download_WORCDatabase
# Set where we will save our data and results
base_dir = Path.cwd() / "autorad_tutorial"
data_dir = base_dir / "data"
result_dir = base_dir / "results"
data_dir.mkdir(exist_ok=True, parents=True)
result_dir.mkdir(exist_ok=True, parents=True)
%load_ext autoreload
%autoreload 2
# download data (it may take a few minutes)
download_WORCDatabase(
    dataset="Desmoid",
    data_folder=data_dir,
    n_subjects=100,
)
from autorad.utils.preprocessing import get_paths_with_separate_folder_per_case
# create a table with all the paths
paths_df = get_paths_with_separate_folder_per_case(data_dir, relative=True)
paths_df.sample(5)
from autorad.data.dataset import ImageDataset
from autorad.feature_extraction.extractor import FeatureExtractor
import logging
logging.getLogger().setLevel(logging.CRITICAL)
image_dataset = ImageDataset(
    paths_df,
    ID_colname="ID",
    root_dir=data_dir,
)
# Let's take a look at the data, plotting random 10 cases
image_dataset.plot_examples(n=10, window=None)
extractor = FeatureExtractor(image_dataset, extraction_params="MR_default.yaml")
feature_df = extractor.run()
feature_df.head()
label_df = pd.read_csv(data_dir / "labels.csv")
label_df.sample(5)
from autorad.data.dataset import FeatureDataset
merged_feature_df = feature_df.merge(label_df, left_on="ID", right_on="patient_ID", how="left")
feature_dataset = FeatureDataset(
    merged_feature_df,
    target="diagnosis",
    ID_colname="ID",
)
splits_path = result_dir / "splits.json"
feature_dataset.split(method="train_val_test", save_path=splits_path)
from autorad.models.classifier import MLClassifier
from autorad.training.trainer import Trainer
models = MLClassifier.initialize_default_sklearn_models()
print(models)
trainer = Trainer(
    dataset=feature_dataset,
    models=models,
    result_dir=result_dir,
    experiment_name="Fibromatosis_vs_sarcoma_classification",
)
trainer.run_auto_preprocessing(
    selection_methods=["boruta"],
    oversampling=False,
)
Expected Results
Initialising the trainer and running the auto-preprocessing on the extracted features should complete without errors.
Actual Results
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [15], in <cell line: 7>()
1 trainer = Trainer(
2 dataset=feature_dataset,
3 models=models,
4 result_dir=result_dir,
5 experiment_name="Fibromatosis_vs_sarcoma_classification",
6 )
----> 7 trainer.run_auto_preprocessing(
8 selection_methods=["boruta"],
9 oversampling=False,
10 )
File ~/AutoRadiomics/autorad/training/trainer.py:78, in Trainer.run_auto_preprocessing(self, oversampling, selection_methods)
70 preprocessor = Preprocessor(
71 normalize=True,
72 feature_selection_method=selection_method,
73 oversampling_method=oversampling_method,
74 )
75 try:
76 preprocessed[selection_method][
77 oversampling_method
---> 78 ] = preprocessor.fit_transform(self.dataset.data)
79 except AssertionError:
80 log.error(
81 f"Preprocessing with {selection_method} and {oversampling_method} failed."
82 )
File ~/AutoRadiomics/autorad/preprocessing/preprocessor.py:66, in Preprocessor.fit_transform(self, data)
64 result_y = {}
65 all_features = X.train.columns.tolist()
---> 66 X_train_trans, y_train_trans = self.pipeline.fit_transform(
67 X.train, y.train
68 )
69 self.selected_features = self.pipeline["select"].selected_features(
70 column_names=all_features
71 )
72 result_X["train"] = pd.DataFrame(
73 X_train_trans, columns=self.selected_features
74 )
File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/pipeline.py:434, in Pipeline.fit_transform(self, X, y, **fit_params)
432 fit_params_last_step = fit_params_steps[self.steps[-1][0]]
433 if hasattr(last_step, "fit_transform"):
--> 434 return last_step.fit_transform(Xt, y, **fit_params_last_step)
435 else:
436 return last_step.fit(Xt, y, **fit_params_last_step).transform(Xt)
File ~/AutoRadiomics/autorad/feature_selection/selector.py:47, in CoreSelector.fit_transform(self, X, y)
44 def fit_transform(
45 self, X: np.ndarray, y: np.ndarray
46 ) -> tuple[np.ndarray, np.ndarray]:
---> 47 self.fit(X, y)
48 return X[:, self.selected_columns], y
File ~/AutoRadiomics/autorad/feature_selection/selector.py:124, in BorutaSelector.fit(self, X, y, verbose)
122 with warnings.catch_warnings():
123 warnings.simplefilter("ignore")
--> 124 model.fit(X, y)
125 self.selected_columns = np.where(model.support_)[0].tolist()
126 if not self.selected_columns:
File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:201, in BorutaPy.fit(self, X, y)
188 def fit(self, X, y):
189 """
190 Fits the Boruta feature selection with the provided estimator.
191
(...)
198 The target values.
199 """
--> 201 return self._fit(X, y)
File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:251, in BorutaPy._fit(self, X, y)
249 def _fit(self, X, y):
250 # check input params
--> 251 self._check_params(X, y)
252 self.random_state = check_random_state(self.random_state)
253 # setup variables for Boruta
File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:517, in BorutaPy._check_params(self, X, y)
513 """
514 Check hyperparameters as well as X and y before proceeding with fit.
515 """
516 # check X and y are consistent len, X is Array and y is column
--> 517 X, y = check_X_y(X, y)
518 if self.perc <= 0 or self.perc > 100:
519 raise ValueError('The percentile should be between 0 and 100.')
File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/utils/validation.py:964, in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
961 if y is None:
962 raise ValueError("y cannot be None")
--> 964 X = check_array(
965 X,
966 accept_sparse=accept_sparse,
967 accept_large_sparse=accept_large_sparse,
968 dtype=dtype,
969 order=order,
970 copy=copy,
971 force_all_finite=force_all_finite,
972 ensure_2d=ensure_2d,
973 allow_nd=allow_nd,
974 ensure_min_samples=ensure_min_samples,
975 ensure_min_features=ensure_min_features,
976 estimator=estimator,
977 )
979 y = _check_y(y, multi_output=multi_output, y_numeric=y_numeric)
981 check_consistent_length(X, y)
File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/utils/validation.py:746, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
744 array = array.astype(dtype, casting="unsafe", copy=False)
745 else:
--> 746 array = np.asarray(array, order=order, dtype=dtype)
747 except ComplexWarning as complex_warning:
748 raise ValueError(
749 "Complex data not supported\n{}\n".format(array)
750 ) from complex_warning
ValueError: could not broadcast input array from shape (60,1015) into shape (60,)
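
For context, the final ValueError suggests that the feature matrix handed to BorutaPy (60 training cases, 1015 feature columns) could not be coerced into a plain 2-D numeric array. A minimal, hypothetical check, assuming the merged_feature_df from the steps above is still in scope and that non-numeric columns (for example leftover string ID columns) are what breaks the conversion:

# Hypothetical sanity check (not part of the failing run above): list the
# columns of the merged table that pandas does not treat as numeric, since
# sklearn's check_X_y has to convert X into a homogeneous 2-D float array.
candidate_features = merged_feature_df.drop(
    columns=["ID", "patient_ID", "diagnosis"], errors="ignore"
)
non_numeric_cols = candidate_features.select_dtypes(exclude="number").columns.tolist()
print(non_numeric_cols)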