fMRIprep Pipeline To Machine Learning

Overview

fMRIprep Pipeline To Machine Learning(Demo)

所有配置均在config.py文件下定义

前置环境(lilab)

  • 各个节点均安装docker,并有fmripre的镜像
  • 可以使用conda中的base环境(相应的第三份包之后更新)

1. fmriprep script on single machine(docker)

config.py中的fMRI_Prep_Job类中配置相应变量,注意在修改cmd时,不能修改{}中的关键字。在执行此步骤时,将自动在bids同级目录下建立processed文件夹,用来存放后处理数据。其中处理后的fmriprep数据存放在processed/frmriprepprceossed/fressurfer中。

class fMRI_Prep_Job:
    # input data path
    bids_data_path  = "/share/data2/dataset/ds002748/depression"
    # 一个容器中处理多少个被试 
    step = 8
    # fmriprep opm thread
    thread = 9
    # max work contianers
    max_work_nums = 10

    # 在bids同级目录下创建processed文件夹
    bids_output_path = os.path.join("/".join(bids_data_path.split('/')[:-1]),'processed')
    if not os.path.exists(bids_output_path):
        os.mkdir(bids_output_path)
    # fmri work path 
    fmri_work="/share/fmri_work"
    # freesurfer_license
    freesurfer_license = "/share/user_data/public/fanq_ocd/license.txt"
    # contianer id fmriprep
    contianer_id = "d7235efbbd3c"
    # fmriprep cmd 
    cmd ="docker run -it --rm -v {bids_data_path}:/data -v {freesurfer_license}:/opt/freesurfer/license.txt -v {bids_output_path}:/out -v {fmri_work}:/work {contianer_id} /data /out --skip_bids_validation --ignore slicetiming fieldmaps  -w /work --omp-nthreads {thread} --fs-no-reconall --resource-monitor participant --participant-label {subject_ids}"

2. fmriprep post preocess

这一步的操作主要依赖于fmribrant,主要作用是回归掉白质信号、脑脊液信号、全脑信号、头动信息、并进行滤波(可选),将其处理后的文件放存在prcoessed/post-precoss/ fliter/clean_imgs 中, 可选表示是否进行滤波。该配置中不建议修改dataset_path,store_path

class PostProcess:
    """
    fmriprep 后处理数据
    """
    # 类型的名字
    task_type = "rest"

    dataset_path = os.path.join(fMRI_Prep_Job.bids_output_path,'fmriprep')

    store_path = os.path.join(fMRI_Prep_Job.bids_output_path,'post-process')

    t_r = 2.5

    low_pass = 0.08

    high_pass = 0.01

    n_process = 40

    if t_r != None:
        store_path = os.path.join(store_path,'filter','clean_imgs')
    else:
        store_path = os.path.join(store_path,'unfilter','clean_imgs')

    os.makedirs(store_path,exist_ok=True)

3.获取ROI级别的时间序列

atlas由271个roi组成,分别是Schaefer_200(皮上),Tianye_54(皮下),Buckner_17(小脑)。由于在fmribrant中实现提取时间序列的功能,简单封装一下。

class RoiTs:
    """
    ROI 级别时间序列
    处理271个全脑roi
    """
    n_process = 40

    # 如果在第二步fmri post process已经滤波之后,不建议再次使用滤波操作
    t_r = None
    
    low_pass = None

    high_pass = None
    
    flag_gs = False #  回归全脑均值为 True 否则为False
    # 以下内容不建议修改

    if flag_gs:
        file_name = "*with_gs.nii.gz"
        ts_file = "GS"
    else:
        file_name = "*without_gs.nii.gz"
        ts_file = "NO_GS"
    
    reg_path = os.path.join(PostProcess.store_path,"*",PostProcess.task_type,file_name)
    
    subject_id_index = -3

    save_path = os.path.join("/".join(PostProcess.store_path.split('/')[:-1]),'timeseries',ts_file)

    os.makedirs(save_path,exist_ok=True)

4. Machine Learning(Baseline)

这一步是可选的,一般先用来看看FC做性别分类、年龄回归的效果如何。只保留粗略结果,详细结果可以使用baseline这个包。

class ML:
    # 选择的subject id 默认是全部
    sub_ids = [i.split('.')[0] for i in os.listdir(RoiTs.save_path)]
    # 量表位置
    csv = pd.read_csv('/share/data2/dataset/ds002748/depression/participants.tsv',sep='\t')
    #取交集
    csv = pd.DataFrame({"participant_id":sub_ids}).merge(csv)
    # 分类的任务
    classifies = ["gender"]
    # 回归的任务
    regressions = ["age"]
    # 分类模型
    classify_models = [SVC(),SVC(C=100),SVC(kernel='linear'),SVC(kernel='linear',C=100)]
    # 回归模型
    regress_models = [SVR(),SVR(C=100),SVR(kernel='linear'),SVR(kernel='linear',C=100)]
    kfold = 3
    # 多少个roi
    rois = 200

5. run

修改script/run.py

from fmriprep_job import run_fmri_prep
from fmriprep_pprocess import  run as pp_run
from roi2ts import run as roi_ts_run
from fast_fc_ml import run as ml_run


if __name__ =='__main__':
    run_fmri_prep() # fmriprep
    pp_run() # fmriprep post process
    roi_ts_run() # get roi time series
    ml_run() # machine learning

然后执行

python run.py

6. To Do

  • 质量控制
You might also like...
MLOps pipeline project using Amazon SageMaker Pipelines
MLOps pipeline project using Amazon SageMaker Pipelines

This project shows steps to build an end to end MLOps architecture that covers data prep, model training, realtime and batch inference, build model registry, track lineage of artifacts and model drift detection. It utilizes SageMaker Pipelines that offers machine learning (ML) to orchestrate SageMaker jobs and author reproducible ML pipelines.

Microsoft contributing libraries, tools, recipes, sample codes and workshop contents for machine learning & deep learning.

Microsoft contributing libraries, tools, recipes, sample codes and workshop contents for machine learning & deep learning.

A data preprocessing package for time series data. Design for machine learning and deep learning.

A data preprocessing package for time series data. Design for machine learning and deep learning.

A mindmap summarising Machine Learning concepts, from Data Analysis to Deep Learning.
A mindmap summarising Machine Learning concepts, from Data Analysis to Deep Learning.

A mindmap summarising Machine Learning concepts, from Data Analysis to Deep Learning.

A comprehensive repository containing 30+ notebooks on learning machine learning!
A comprehensive repository containing 30+ notebooks on learning machine learning!

A comprehensive repository containing 30+ notebooks on learning machine learning!

MIT-Machine Learning with Python–From Linear Models to Deep Learning

MIT-Machine Learning with Python–From Linear Models to Deep Learning | One of the 5 courses in MIT MicroMasters in Statistics & Data Science Welcome t

Implemented four supervised learning Machine Learning algorithms

Implemented four supervised learning Machine Learning algorithms from an algorithmic family called Classification and Regression Trees (CARTs), details see README_Report.

High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.
High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.

What is xLearn? xLearn is a high performance, easy-to-use, and scalable machine learning package that contains linear model (LR), factorization machin

cuML - RAPIDS Machine Learning Library
cuML - RAPIDS Machine Learning Library

cuML - GPU Machine Learning Algorithms cuML is a suite of libraries that implement machine learning algorithms and mathematical primitives functions t

Owner
Alien
A student
Alien
The easy way to combine mlflow, hydra and optuna into one machine learning pipeline.

mlflow_hydra_optuna_the_easy_way The easy way to combine mlflow, hydra and optuna into one machine learning pipeline. Objective TODO Usage 1. build do

shibuiwilliam 9 Sep 9, 2022
Required for a machine learning pipeline data preprocessing and variable engineering script needs to be prepared

Feature-Engineering Required for a machine learning pipeline data preprocessing and variable engineering script needs to be prepared. When the dataset

kemalgunay 5 Apr 21, 2022
A data preprocessing and feature engineering script for a machine learning pipeline is prepared.

FEATURE ENGINEERING Business Problem: A data preprocessing and feature engineering script for a machine learning pipeline needs to be prepared. It is

Pinar Oner 7 Dec 18, 2021
Automated Machine Learning Pipeline for tabular data. Designed for predictive maintenance applications, failure identification, failure prediction, condition monitoring, etc.

Automated Machine Learning Pipeline for tabular data. Designed for predictive maintenance applications, failure identification, failure prediction, condition monitoring, etc.

Amplo 10 May 15, 2022
This repository contains full machine learning pipeline of the Zillow Houses competition on Kaggle platform.

Zillow-Houses This repository contains full machine learning pipeline of the Zillow Houses competition on Kaggle platform. Pipeline is consists of 10

null 2 Jan 9, 2022
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

Master status: Development status: Package information: TPOT stands for Tree-based Pipeline Optimization Tool. Consider TPOT your Data Science Assista

Epistasis Lab at UPenn 8.9k Jan 9, 2023
Python Extreme Learning Machine (ELM) is a machine learning technique used for classification/regression tasks.

Python Extreme Learning Machine (ELM) Python Extreme Learning Machine (ELM) is a machine learning technique used for classification/regression tasks.

Augusto Almeida 84 Nov 25, 2022
Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques

Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.

Vowpal Wabbit 8.1k Dec 30, 2022
CD) in machine learning projectsImplementing continuous integration & delivery (CI/CD) in machine learning projects

CML with cloud compute This repository contains a sample project using CML with Terraform (via the cml-runner function) to launch an AWS EC2 instance

Iterative 19 Oct 3, 2022
Python ML pipeline that showcases mltrace functionality.

mltrace tutorial Date: October 2021 This tutorial builds a training and testing pipeline for a toy ML prediction problem: to predict whether a passeng

Log Labs 28 Nov 9, 2022