Overview

ml4h

ml4h is a toolkit for machine learning on clinical data of all kinds including genetics, labs, imaging, clinical notes, and more. The diverse data modalities of biomedicine offer different perspectives on the underlying challenge of understanding human health. For this reason, ml4h is built on a foundation of multimodal multitask modeling, hoping to leverage all available data to help power research and inform clinical care. Our tools help apply clinical research standards to ML models by carefully considering bias and longitudinal outcomes. Our project grew out of efforts at the Broad Institute to make it easy to work with the UK Biobank on the Google Cloud Platform and has since expanded to include proprietary data from academic medical centers. To put cutting-edge AI and ML to use making the world healthier, we're fostering interdisciplinary collaborations across industry and academia. We'd love to work with you too!

ml4h is best described with Five Verbs: Ingest, Tensorize, TensorMap, Model, Evaluate

  • Ingest: collect files onto one system
  • Tensorize: write raw files (XML, DICOM, NIFTI, PNG) into HD5 files
  • TensorMap: tag data (typically from an HD5) with an interpretation and a method for generation (see the sketch below)
  • ModelFactory: connect TensorMaps with trainable architectures
  • Evaluate: generate plots that enable domain-driven inspection of models and results
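
To make the TensorMap verb concrete, here is a minimal sketch of the idea. This is a hedged stand-in, not the actual ml4h class; the real constructor signature and module layout may differ.

    from dataclasses import dataclass
    from typing import Callable, Tuple

    import h5py
    import numpy as np

    # Minimal stand-in for ml4h's TensorMap: a name, a shape, an interpretation,
    # and a function that knows how to generate the tensor from an HD5 file.
    @dataclass
    class TensorMap:
        name: str
        shape: Tuple[int, ...]
        interpretation: str
        tensor_from_file: Callable

    def ecg_from_hd5(tm: TensorMap, hd5: h5py.File) -> np.ndarray:
        # Read the raw waveform written during tensorization and enforce the shape.
        return np.array(hd5[tm.name], dtype=np.float32).reshape(tm.shape)

    ecg_rest_raw = TensorMap('ecg_rest_raw', (5000, 12), 'continuous', ecg_from_hd5)

Real TensorMaps ship with the ml4h package and plug directly into the ModelFactory and the evaluation plots.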

Getting Started

Advanced Topics:

  • Tensorizing Data (going from raw data to arrays suitable for modeling; see ml4h/tensorize/README.md and TENSORIZE.md)

Setting up your local environment

Clone the repo

git clone [email protected]:broadinstitute/ml.git

Setting up your cloud environment (optional; currently only GCP is supported)

Make sure you have installed the Google Cloud SDK (gcloud). With Homebrew, you can use

brew install --cask google-cloud-sdk

Then set your default GCP project:

gcloud config set project your-gcp-project

Conda (Python package manager)

  • Download onto your laptop the Miniconda bash or .pkg installer for Python 3.7 and Mac OS X from the Miniconda download page, and run it. If you installed Python via a package manager such as Homebrew, you may want to uninstall that first to avoid potential conflicts.

  • On your laptop, at the root directory of your ml4h GitHub clone, load the ml4h environment via

    conda env create -f env/ml4h_osx64.yml
    

    If you get an error, try updating your Conda via

    sudo conda update -n base -c defaults conda
    

    If you get an error while installing gmpy, try installing gmp:

    brew install gmp
    

    The Conda version used at the time of this writing was 4.6.1.

    If you plan to run Jupyter locally, you should also (after you have run conda activate ml4h) run pip install ~/ml (or wherever you have stored the repo).

  • Activate the environment:

    source activate ml4h
    

You may now run code in your Terminal, like so:

python recipes.py --mode ...

Note that recipes require the right input files to be in place; running them without proper inputs will not yield meaningful results.
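
For orientation, a representative invocation might look like the following; the paths and TensorMap names here are placeholders, and the examples further down this page show real ones:

    python recipes.py \
        --mode train \
        --tensors /path/to/hd5s \
        --input_tensors ecg_rest_raw \
        --output_tensors sts_death \
        --output_folder $HOME/outputs \
        --id my-first-run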

PyCharm (Python IDE if interested)

  • Install PyCharm either directly from the JetBrains website, or download the Toolbox App and have the app install PyCharm. The latter makes PyCharm upgrades easier. It also allows you to manage your JetBrains IDEs from a single place if you have multiple (e.g. IntelliJ for Java/Scala).
  • Launch PyCharm.
  • (Optional) Import the custom settings as described here.
  • Open the project on PyCharm from the File menu by pointing to where you have your GitHub repo.
  • Next, configure your Python interpreter to use the Conda environment you set up previously:
    • Open Preferences from PyCharm -> Preferences....
    • On the upcoming Preferences window's left-hand side, expand Project: ml4h if it isn't already.
    • Highlight Project Interpreter.
    • On the right-hand side of the window, where it says Project Interpreter, find and select your python binary installed by Conda. It should be a path like ~/conda/miniconda3/envs/ml4h/bin/python where conda is the directory you may have selected when installing Conda.
    • For a test run:
      • Open recipes.py (shortcut Shift+Cmd+N if you imported the custom settings).
      • Right-click on if __name__=='__main__' and select Run recipes.
      • You can specify input arguments by expanding the Parameters text box on the window that can be opened using the menu Run -> Edit Configurations....

Setting up a remote VM

To create a VM without a GPU run:

./scripts/vm_launch/launch_instance.sh ${USER}-cpu

With GPU (not recommended unless you need something beefy and expensive):

./scripts/vm_launch/launch_dl_instance.sh ${USER}-gpu

This will take a few moments to run, after which you will have a VM in the cloud. Remember to shut it off from the command line or console when you are not using it!

Now ssh onto your instance (replace with the proper machine name; note that you can also use regular old ssh if you have the external IP provided by the script, or if you log in from the GCP console):

gcloud --project your-gcp-project compute ssh ${USER}-gpu --zone us-central1-a

Next, clone this repo on your instance (you should copy your GitHub key over to the VM; if you have two-factor authentication set up, you need to generate an SSH key on your VM and add it to your GitHub settings as described in GitHub's SSH documentation):

git clone [email protected]:broadinstitute/ml.git

Because we don't know everyone's username, you need to run one more script to make sure that you are added as a docker user and that you have permission to pull down our docker images from GCP's gcr.io. Run this while you're logged into your VM:

./ml/scripts/vm_launch/run_once.sh

Note that you may see warnings like the ones below, but these are expected:

WARNING: Unable to execute `docker version`: exit status 1
This is expected if `docker` is not installed, or if `dockerd` cannot be reached...
Configuring docker-credential-gcr as a registry-specific credential helper. This is only supported by Docker client versions 1.13+
/home/username/.docker/config.json configured to use this credential helper for GCR registries

You need to log out after that (exit), then ssh back in so everything takes effect.

Finish setting up docker, test out a jupyter notebook

Now let's run a Jupyter notebook. On your VM run:

${HOME}/ml/scripts/jupyter.sh -p 8889

Add a -c if you want a CPU version.

This will start a notebook server on your VM. If you get a Docker error like

docker: Error response from daemon: driver failed programming external connectivity on endpoint agitated_joliot (1fa914cb1fe9530f6599092c655b7036c2f9c5b362aa0438711cb2c405f3f354): Bind for 0.0.0.0:8888 failed: port is already allocated.

override the default port (8888) like so:

${HOME}/ml/scripts/dl-jupyter.sh 8889

The command also outputs two command lines in red. Copy the line that looks like this:

ssh -i ~/.ssh/google_compute_engine -nNT -L 8888:localhost:8888 

Open a terminal on your local machine and paste that command.

If you get a public key error, run: gcloud compute config-ssh

Now open a browser on your laptop and go to the URL http://localhost:8888
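
If you started the notebook on a non-default port (for example with -p 8889 as above), adjust both the tunnel and the URL to match; the hostname below is a placeholder for whatever the script printed:

    ssh -i ~/.ssh/google_compute_engine -nNT -L 8889:localhost:8889 <vm-external-ip>

Then browse to http://localhost:8889 instead.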

Contributing code

Want to contribute code to this project? Please see CONTRIBUTING for developer setup and other details.

Command line interface

The ml4h package is designed to be accessible through the command line using "recipes". To get started, please see RECIPE_EXAMPLES.

Comments
  • cross reference multiple time windows

    resolves #283

    Functionality Changes

    • Multiple time windows for cross referencing
      • set a minimum number of ECGs to use across all time windows
      • filter to exactly the number of ECGs in a time window and specify which ECGs to use in a time series
      • name windows for readability
      • output dump of all ECGs with indicator for which time window
    • Multilabel counts
      • report count/fraction of all label combinations (see the sketch after this list)
    • TODO:
      • had to remove plots for multiple time windows to reduce the number of files created and improve readability; figure out how to put these back, or whether they're even needed
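
    A hedged illustration of the multilabel counts idea; the file and column names are hypothetical, borrowed from the examples below:

    import pandas as pd

    # Count and fraction of every combination of binary outcome labels.
    df = pd.read_csv('mgh-all-features-labels.csv')  # placeholder path
    labels = ['mtopd', 'cnstrokp', 'crenfail']       # subset of the labels used below
    combos = df.groupby(labels).size().rename('count').reset_index()
    combos['fraction'] = combos['count'] / len(df)
    print(combos.sort_values('count', ascending=False))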

    Examples:

    Get the most recent pre-op ECG and all their outcomes (https://github.com/broadinstitute/ml/pull/307#issuecomment-638260303)

    ./scripts/tf.sh -c -t \
        $HOME/repos/ml/ml4cvd/recipes.py \
        --logging_level INFO \
        --mode cross_reference \
        --tensors_source /data/partners_ecg/mgh/explore/tensors_all_union.csv \
        --tensors_name ecg \
        --join_tensors partners_ecg_patientid_clean \
        --time_tensor partners_ecg_datetime \
        --reference_tensors /data/sts-data/mgh-all-features-labels.csv \
        --reference_name sts \
        --reference_join_tensors medrecn \
        --reference_label mtopd cnstrokp crenfail cpvntlng deepsterninf reop anymorbidity llos \
        \
        --reference_start_time_tensor surgdt -30 \
        --reference_end_time_tensor   surgdt \
        --order_in_window             newest \
        --window_name                 preop \
        \
        --output_folder $HOME/recipes_output/xref/ \
        --id sts-hd5-multilabel
    

    Get results for patients who have paired ECGs (at least 1 ECG pre-op and at least 1 ECG post-op) (https://github.com/broadinstitute/ml/pull/307#issuecomment-638263210)

    ./scripts/tf.sh -c -t \
        $HOME/repos/ml/ml4cvd/recipes.py \
        --logging_level INFO \
        --mode cross_reference \
        --tensors_source /data/partners_ecg/mgh/explore/tensors_all_union.csv \
        --tensors_name ecg \
        --join_tensors partners_ecg_patientid_clean \
        --time_tensor partners_ecg_datetime \
        --reference_tensors /data/sts-data/mgh-afib-after-avr-metadata.csv \
        --reference_name sts-afib-after-avr \
        --reference_join_tensors medrecn \
        --number_per_window            1 \
        \
        --reference_start_time_tensor surgdt -180 \
        --reference_end_time_tensor   surgdt \
        --window_name                 preop \
        \
        --reference_start_time_tensor surgdt \
        --reference_end_time_tensor   surgdt 180 \
        --window_name                 postop \
        \
        --output_folder $HOME/recipes_output/xref/ \
        --id test-xref-min-any
    
    enhancement 
    opened by StevenSong 21
  • initial refactoring of tensor_from_file

    This is an incomplete pull request to start a conversation regarding the refactoring of the TMAP dictionary into separate submodules (issue #143 ). For example, TMAPS['ecg_rest_raw'] is now defined as ml4cvd.tensormap.ukbb.ecg.ecg.ecg_rest_raw. This introduces more meaningful semantics and imports libraries only required for the target application. In this initial commit, only TMAP definitions and their associated functions in tensor_from_file are covered.

    The current proposed structure looks like this:

    .
    ├── partners
    └── ukbb
        ├── __init__.py
        ├── accelerometer
        ├── demographics
        │   └── demographics.py
        ├── ecg
        │   └── ecg.py
        ├── general.py
        ├── genetics
        ├── mri
        │   ├── mri.py
        │   └── vtk.py
        └── survival.py
    
    opened by mklarqvist 13
  • generated output is owned by ${USER} instead of root

    1. scripts/tf.sh runs as ${USER} instead of root by default. This is accomplished by setting up the correct user and group in bash in docker, then calling the Python script as ${USER}.

      The end result is that output from calling ml4cvd is now owned by ${USER} instead of root!

      To preserve prior behavior, the user can toggle between ${USER} and root using the -r flag, although I am unaware of situations where this is desired.

    2. The jupyter directory is now created by calling tf.sh with the -j flag, instead of created by default.

    enhancement 
    opened by erikr 10
  • tensorize Partners ECG XMLs → HD5, and relevant documentation

    This PR adds scripts for working with Partners ECG XMLs, and tensorizing.

    Also adds documentation for ECG extraction from Muse Editor.

    It does not have any dependencies on ML4CVD, and will not interfere with any existing code.

    Resolves #142

    enhancement 
    opened by erikr 9
  • Improve clarity of logfile contents

    What

    1. Display training, validation, and test set size at the end of the log file for train mode (and potentially other modes).

    2. Clearly portray how many epochs are actually completed (due to patience).

    Why: It is helpful to know the number of tensors used for training, validation, and test, as well as the label count within each set.

    Label count makes sense for categorical. Less clear how we best handle this for regression models.

    It is also important to know when early stopping occurred.

    Currently this information is not consolidated in one place in the log file. It also is spread out over workers.

    How: Aggregate over workers.

    Acceptance Criteria: After running recipes in train mode, the number of tensors used for the training, validation, and test sets, as well as the label counts in each set and the number of epochs actually run before early stopping, are summarized at the end of the log file.
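
    A hedged sketch of the "aggregate over workers" idea; the per-worker numbers are illustrative, not real output:

    from collections import Counter

    # Merge per-worker label counts into one summary for the end of the log file.
    worker_counts = [
        Counter({'sts_death': 12, 'no_event': 488}),  # worker 0, illustrative
        Counter({'sts_death': 9, 'no_event': 491}),   # worker 1, illustrative
    ]
    total = sum(worker_counts, Counter())
    print(f'train tensors: {sum(total.values())}, labels: {dict(total)}')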

    enhancement 
    opened by erikr 7
  • model factory refactor

    • parents and u_connect work
    • No variational mode yet
    • model tests are much more thorough. They are very slow though, because tf is slow at loading
    • Looks like it adds a bunch of lines, but many are to temporarily preserve make_variational..., which will soon be deprecated

    Here is a model with u_connect, multiple inputs, multiple outputs, and parented outputs.

    opened by ndiamant 7
  • quick debasing -> ignoring partners tmaps initially

    [Debasing of #224] This is an incomplete pull request to start a conversation regarding the refactoring of the TMAP dictionary into separate submodules (issue #143 ). For example, TMAPS['ecg_rest_raw'] is now defined as ml4cvd.tensormap.ukb.ecg.ecg_rest_raw. This introduces more meaningful semantics and imports libraries only required for the target application. In this initial commit, only TMAP definitions and their associated functions in tensor_from_file are covered.

    The current proposed structure looks like this:

    ml4cvd/tensormap
    ├── general.py
    ├── partners
    └── ukb
        ├── demographics.py
        ├── ecg.py
        ├── genetics.py
        ├── mri.py
        ├── mri_vtk.py
        ├── scripted.py
        └── survival.py
    

    Some quick sanity tests

    import ml4cvd.arguments
    ml4cvd.arguments.tensormap_lookup("ukb.genetics.genetic_pca_2")
    
    >>> import ml4cvd.arguments
    2020-07-13 20:55:03.373053: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2020-07-13 20:55:03.397221: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fc6e34f63e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
    2020-07-13 20:55:03.397240: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
    >>> ml4cvd.arguments.tensormap_lookup("ukb.genetics.genetic_pca_2")
    TensorMap(22009_Genetic-principal-components_0_2, (1,), continuous)
    

    and

    import ml4cvd.tensormap.ukb.mri
    import ml4cvd.arguments
    assert(ml4cvd.tensormap.ukb.mri.filtered_phase == ml4cvd.arguments.tensormap_lookup("ukb.mri.filtered_phase"))
    
    >>> import ml4cvd.tensormap.ukb.mri
    2020-07-13 20:57:19.394531: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2020-07-13 20:57:19.409760: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7ff3fb1eb800 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
    2020-07-13 20:57:19.409774: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
    >>> import ml4cvd.arguments
    >>> assert(ml4cvd.tensormap.ukb.mri.filtered_phase == ml4cvd.arguments.tensormap_lookup("ukb.mri.filtered_phase"))
    

    In this updated PR, I haven't yet run any other more comprehensive tests. Partners-specific TensorMaps are simply commented out to facilitate expediency.

    opened by mklarqvist 6
  • synchronize with signals and fix tensors presented

    resolves #323 resolves #346

    this PR aims to use more reliable and portable python functions (no longer using qsize), to fix the number of tensors presented ((i + 1) % ...), and to fix 2 possible concurrency edge cases:

    1. consumer dequeues more items than originally produced at time of decision to consume. current code only blocks producers when qsize == num_workers and consumer consumes until the queue is empty (see appendix for current code). this opens the door for this scenario:
    let's say workers = 4
    1. 4 workers put things into queue and block while the queue is full
    2. consumer begins to dequeue and pops 1 item off the queue
    3. workers begin putting more items into the queue because queue is no longer "full"
    4. consumer pops from queue until queue is empty
    at this point, consumer has consumed more than the 4 items it originally wanted.
    

    the fix is to have producers wait until the consumer gives the "all clear" to begin producing again. the consumer waits for all the producers to have enqueued, processes all the items, and then gives the all clear. concurrently, the producers are waiting to produce the next item.

    2. the other edge case is for a single producer to enqueue more than once before the consumer dequeues. this example can happen in the current code:
    let's say workers = 2
    1. worker 0 enqueues an item
    2. worker 1 is lazy
    3. worker 0 enqueues another item
    4. consumer detects 2 items in queue, start to dequeue
    worker 0 has just reported stats for the same set of paths twice, as each worker gets a distinct set of paths
    

    the fix is again to have the producers wait until the "all clear" is given before enqueuing the next set of items.
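
    A hedged sketch of this handshake (a simplified stand-in, not the actual ml4cvd tensor_generators code): a barrier makes every producer wait until the consumer has drained exactly one item per worker before any producer may enqueue again.

    import multiprocessing as mp

    N_WORKERS, N_ROUNDS = 4, 3

    def producer(worker_id, q, barrier):
        for round_ in range(N_ROUNDS):
            q.put((worker_id, round_))  # enqueue exactly one item per round
            barrier.wait()              # block until the consumer gives the all clear

    def consume(q, barrier):
        for _ in range(N_ROUNDS):
            items = [q.get() for _ in range(N_WORKERS)]  # exactly one item per worker
            print('consumed:', sorted(items))
            barrier.wait()              # the all clear: producers may enqueue again

    if __name__ == '__main__':
        q = mp.Queue()
        barrier = mp.Barrier(N_WORKERS + 1)  # all producers plus the consumer
        workers = [mp.Process(target=producer, args=(i, q, barrier)) for i in range(N_WORKERS)]
        for w in workers:
            w.start()
        consume(q, barrier)
        for w in workers:
            w.join()

    Neither edge case can occur here: the consumer takes exactly N_WORKERS items per round, and no producer can enqueue its next item until the barrier trips.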

    Appendix:

    current consumer code: https://github.com/broadinstitute/ml/blob/dd13b4518b3547ebdcad13698f0c71a4abaaafb8/ml4cvd/tensor_generators.py#L171-L186

    current producer code: https://github.com/broadinstitute/ml/blob/dd13b4518b3547ebdcad13698f0c71a4abaaafb8/ml4cvd/tensor_generators.py#L426-L437

    opened by StevenSong 6
  • STS: reproduce earlier AUC = 0.85 predicting mortality from pre-op ECG

    Goal is to learn post-operative mortality for cardiac surgical patients at MGH from the most recent pre-operative ECG.

    Recent results are significantly worse than our initial results: https://github.com/broadinstitute/ml/issues/202#issuecomment-610528628

    ROC for mortality: ECGs resampled to 2500 samples

    ./scripts/tf.sh \
        /$HOME/ml/ml4cvd/recipes.py \
        --mode train \
        --logging_level INFO \
        --tensors /data/partners_ecg/mgh/hd5 \
        --input_tensors \
            ecg_2500_sts \
        --output_tensors \
            sts_death \
        --sample_csv /data/sts/mgh-all-features-labels.csv \
        --inspect_model \
        --optimizer adam \
        --epochs 100 \
        --batch_size 32 \
        --learning_rate 0.0002 \
        --conv_x 71 \
        --output_folder $HOME \
        --id sts-ecg-death-all-resampled-to-2500
    

    [Images: per-class ROC for sts_death and metric history for the 2500-sample run]

    ROC for mortality: ECGs resampled to 5000 samples

    ./scripts/tf.sh \
        /$HOME/ml/ml4cvd/recipes.py \
        --mode train \
        --logging_level INFO \
        --tensors /data/partners_ecg/mgh/hd5 \
        --input_tensors \
            ecg_5000_sts \
        --output_tensors \
            sts_death \
        --sample_csv /data/sts/mgh-all-features-labels.csv \
        --inspect_model \
        --optimizer adam \
        --epochs 100 \
        --batch_size 32 \
        --learning_rate 0.0002 \
        --conv_x 71 \
        --output_folder $HOME \
        --id sts-ecg-death-all-resampled-to-5000
    

    [Images: per-class ROC for sts_death and metric history for the 5000-sample run]

    However, we do not know some vital information:

    1. label prevalence in 2500- vs 5000-sample ECGs (would be addressed by issue https://github.com/broadinstitute/ml/issues/270)
    2. label prevalence of any given experiment; technically info is in the log file aggregated stats but requires some digging (would be addressed by issue https://github.com/broadinstitute/ml/issues/266).
    data 
    opened by erikr 6
  • cross reference mode

    Create a new mode that performs cross referencing on two datasets and saves summary information, distribution of labels, and plots the occurrence of cross referenced data points relative to some time window

    Specify the dataset to cross reference with the tensors arguments; specify the dataset to use as a reference with the reference arguments.

    closes #158 closes #186 closes #188 closes #200

    enhancement 
    opened by StevenSong 6
  • update partners tmaps to use new hd5 structure

    resolves #208 resolves #199 resolves #160

    • default behavior gets all tensors from a given hd5; tensors are returned in a python list (easier to distinguish multiple tensors from single tensors, since single tensors are returned as numpy arrays)

    • implemented wrapper to enable getting tensors for existing tensor_from_file functions (see the sketch after this list)

    • add path prefix to all partners tmaps

    • update explore (_tensor_to_df) to use new tmaps

    • update cardiac surgery partners tmaps

    • add arguments to control how many tensors to get
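
    A hedged sketch of the wrapper idea; the names and signatures are hypothetical, not the actual ml4cvd code:

    from typing import Callable, List

    import h5py
    import numpy as np

    def multi_tensor(tff: Callable) -> Callable:
        # Adapt a single-tensor tensor_from_file so it returns every matching
        # tensor under the tmap's path prefix as a python list.
        def wrapped(tm, hd5: h5py.File) -> List[np.ndarray]:
            keys = sorted(hd5[tm.path_prefix])
            return [tff(tm, hd5, key) for key in keys]
        return wrapped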

    enhancement 
    opened by StevenSong 6
  • Bump certifi from 2020.12.5 to 2022.12.7 in /model_zoo/PCLR

    Bumps certifi from 2020.12.5 to 2022.12.7.

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Add work-around for cuda issue on app.terra.bio.

    Also add a note about the Facets visualization not working on app.terra.bio.

    @amygdala I already made these changes to https://app.terra.bio/#workspaces/uk-biobank-sek/ml4h-toolkit-for-machine-learning-on-clinical-data but let me know if you have any better ideas about this cuda issue.

    opened by deflaux 2
  • Channels in mri_silhouettes

    The model expects 3 channels but you provide 1 channel.

    import tensorflow as tf

    input = tf.keras.Input(
        shape=(256, 237, 1),
        name="input_mdrk_silhouette_continuous",
    )
    

    https://github.com/broadinstitute/ml4h/blob/master/model_zoo/silhouette_mri/train_models.py#L261

    How do you account for the other 2 channels? @mklarqvist

    opened by Pulkit-Khandelwal 0
  • Bump tensorflow from 2.7.2 to 2.9.3 in /model_zoo/PCLR

    Bumps tensorflow from 2.7.2 to 2.9.3.

    Release notes

    Sourced from tensorflow's releases.

    TensorFlow 2.9.3

    Release 2.9.3

    This release introduces several vulnerability fixes:

    TensorFlow 2.9.2

    Release 2.9.2

    This release introduces several vulnerability fixes:

    ... (truncated)

    Changelog

    Sourced from tensorflow's changelog.

    Release 2.9.3

    This release introduces several vulnerability fixes:

    Release 2.8.4

    This release introduces several vulnerability fixes:

    ... (truncated)

    Commits
    • a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2
    • 258f9a1 Update py_func.cc
    • cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474
    • 3e75385 Update version numbers to 2.9.3
    • bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695
    • 3506c90 Update RELEASE.md
    • 8dcb48e Update RELEASE.md
    • 4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...
    • 6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple
    • 5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4
    • Additional commits viewable in compare view


    dependencies 
    opened by dependabot[bot] 0
  • Bump tensorflow from 2.9.1 to 2.9.3 in /docker/vm_boot_images/config

    Bumps tensorflow from 2.9.1 to 2.9.3.


    dependencies 
    opened by dependabot[bot] 0
  • Bump protobuf from 3.15.7 to 3.18.3 in /model_zoo/PCLR

    Bumps protobuf from 3.15.7 to 3.18.3.

    Release notes

    Sourced from protobuf's releases.

    Protocol Buffers v3.18.3

    C++

    Protocol Buffers v3.16.1

    Java

    • Improve performance characteristics of UnknownFieldSet parsing (#9371)

    Protocol Buffers v3.18.2

    Java

    • Improve performance characteristics of UnknownFieldSet parsing (#9371)

    Protocol Buffers v3.18.1

    Python

    • Update setup.py to reflect that we now require at least Python 3.5 (#8989)
    • Performance fix for DynamicMessage: force GetRaw() to be inlined (#9023)

    Ruby

    • Update ruby_generator.cc to allow proto2 imports in proto3 (#9003)

    Protocol Buffers v3.18.0

    C++

    • Fix warnings raised by clang 11 (#8664)
    • Make StringPiece constructible from std::string_view (#8707)
    • Add missing capability attributes for LLVM 12 (#8714)
    • Stop using std::iterator (deprecated in C++17). (#8741)
    • Move field_access_listener from libprotobuf-lite to libprotobuf (#8775)
    • Fix #7047 Safely handle setlocale (#8735)
    • Remove deprecated version of SetTotalBytesLimit() (#8794)
    • Support arena allocation of google::protobuf::AnyMetadata (#8758)
    • Fix undefined symbol error around SharedCtor() (#8827)
    • Fix default value of enum(int) in json_util with proto2 (#8835)
    • Better Smaller ByteSizeLong
    • Introduce event filters for inject_field_listener_events
    • Reduce memory usage of DescriptorPool
    • For lazy fields copy serialized form when allowed.
    • Re-introduce the InlinedStringField class
    • v2 access listener
    • Reduce padding in the proto's ExtensionRegistry map.
    • GetExtension performance optimizations
    • Make tracker a static variable rather than call static functions
    • Support extensions in field access listener
    • Annotate MergeFrom for field access listener
    • Fix incomplete types for field access listener
    • Add map_entry/new_map_entry to SpecificField in MessageDifferencer. They record the map items which are different in MessageDifferencer's reporter.
    • Reduce binary size due to fieldless proto messages
    • TextFormat: ParseInfoTree supports getting field end location in addition to start.

    ... (truncated)


    dependencies 
    opened by dependabot[bot] 0