The starter repository for submissions to the GeneDisco challenge for optimized experimental design in genetic perturbation experiments

Overview

GeneDisco ICLR-22 Challenge Starter Repository

Python version Library version

The starter repository for submissions to the GeneDisco challenge for optimized experimental design in genetic perturbation experiments.

GeneDisco (to be published at ICLR-22) is a benchmark suite for evaluating active learning algorithms for experimental design in drug discovery. GeneDisco contains a curated set of multiple publicly available experimental data sets as well as open-source implementations of state-of-the-art active learning policies for experimental design and exploration.

Install

pip install -r requirements.txt

Use

Setup

  • Create a cache directory. This will hold any preprocessed and downloaded datasets for faster future invocation.
    • $ mkdir /path/to/genedisco_cache
    • Replace the above with your desired cache directory location.
  • Create an output directory. This will hold all program outputs and results.
    • $ mkdir /path/to/genedisco_output
    • Replace the above with your desired output directory location.

How to Run the Full Benchmark Suite?

Experiments (all baselines, acquisition functions, input and target datasets, multiple seeds) included in GeneDisco can be executed sequentially for e.g. acquired batch size 64, 8 cycles and a bayesian_mlp model using:

run_experiments \
  --cache_directory=/path/to/genedisco_cache  \
  --output_directory=/path/to/genedisco_output  \
  --acquisition_batch_size=64  \
  --num_active_learning_cycles=8  \
  --max_num_jobs=1

Results are written to the folder at /path/to/genedisco_cache, and processed datasets will be cached at /path/to/genedisco_cache (please replace both with your desired paths) for faster startup in future invocations.

Note that due to the number of experiments being run by the above command, we recommend execution on a compute cluster.
The GeneDisco codebase also supports execution on slurm compute clusters (the slurm command must be available on the executing node) using the following command and using dependencies in a Python virtualenv available at /path/to/your/virtualenv (please replace with your own virtualenv path):

run_experiments \
  --cache_directory=/path/to/genedisco_cache  \
  --output_directory=/path/to/genedisco_output  \
  --acquisition_batch_size=64  \
  --num_active_learning_cycles=8  \
  --schedule_on_slurm \
  --schedule_children_on_slurm \
  --remote_execution_virtualenv_path=/path/to/your/virtualenv

Other scheduling systems are currently not supported by default.

How to Run A Single Isolated Experiment (One Learning Cycle)?

To run one active learning loop cycle, for example, with the "topuncertain" acquisition function, the "achilles" feature set and the "schmidt_2021_ifng" task, execute the following command:

active_learning_loop  \
    --cache_directory=/path/to/genedisco/genedisco_cache \
    --output_directory=/path/to/genedisco/genedisco_output \
    --model_name="bayesian_mlp" \
    --acquisition_function_name="topuncertain" \
    --acquisition_batch_size=64 \
    --num_active_learning_cycles=8 \
    --feature_set_name="achilles" \
    --dataset_name="schmidt_2021_ifng" 

How to Evaluate a Custom Acquisition Function?

To run a custom acquisition function, set --acquisition_function_name="custom" and --acquisition_function_path to the file path that contains your custom acquisition function (e.g. main.py in this repo).

active_learning_loop  \
    --cache_directory=/path/to/genedisco/genedisco_cache \
    --output_directory=/path/to/genedisco/genedisco_output \
    --model_name="bayesian_mlp" \
    --acquisition_function_name="custom" \
    --acquisition_function_path=/path/to/src/main.py \
    --acquisition_batch_size=64 \
    --num_active_learning_cycles=8 \
    --feature_set_name="achilles" \
    --dataset_name="schmidt_2021_ifng" 

...where "/path/to/custom_acquisition_function.py" contains code for your custom acquisition function corresponding to the BaseBatchAcquisitionFunction interface, e.g.:

import numpy as np
from typing import AnyStr, List
from slingpy import AbstractDataSource
from slingpy.models.abstract_base_model import AbstractBaseModel
from genedisco.active_learning_methods.acquisition_functions.base_acquisition_function import \
    BaseBatchAcquisitionFunction

class RandomBatchAcquisitionFunction(BaseBatchAcquisitionFunction):
    def __call__(self,
                 dataset_x: AbstractDataSource,
                 batch_size: int,
                 available_indices: List[AnyStr], 
                 last_selected_indices: List[AnyStr] = None, 
                 model: AbstractBaseModel = None,
                 temperature: float = 0.9,
                 ) -> List:
        selected = np.random.choice(available_indices, size=batch_size, replace=False)
        return selected

Note that the last class implementing BaseBatchAcquisitionFunction is loaded by GeneDisco if there are multiple valid acquisition functions present in the loaded file.

Submission instructions

For submission, you will need two things:

Please note that all your submitted code must either be loaded via a dependency in requirements.txt or be present in the src/ directory in this starter repository for the submission to succeed.

Once you have set up your submission environment, you will need to create a lightweight container image that contains your acquisition function.

Submission steps

  • Navigate to the directory to which you have cloned this repo to.
    • $ cd /path/to/genedisco-starter
  • Ensure you have ONE acquisition function (inheriting from BaseBatchAcquisitionFunction) in main.py
    • This is your pre-defined program entry point.
  • Build your container image
    • $ docker build -t submission:latest .
  • Save your image name to a shell variable
    • $ IMAGE="submission:latest"
  • Use the EvalAI-CLI command to submit your image
    • Run the following command to submit your container image:
      • $ evalai push $IMAGE --phase gsk-genedisco-challenge-1528
      • Please note that you have a maximum number of submissions that any submission will be counted against.

That’s it! Our pipeline will take your image and test your function.

If you have any questions or concerns, please reach out to us at [email protected]

Citation

Please consider citing, if you reference or use our methodology, code or results in your work:

@inproceedings{mehrjou2022genedisco,
    title={{GeneDisco: A Benchmark for Experimental Design in Drug Discovery}},
    author={Mehrjou, Arash and Soleymani, Ashkan and Jesson, Andrew and Notin, Pascal and Gal, Yarin and Bauer, Stefan and Schwab, Patrick},
    booktitle={{International Conference on Learning Representations (ICLR)}},
    year={2022}
}

License

License

Authors

Arash Mehrjou, GlaxoSmithKline plc
Jacob A. Sackett-Sanders, GlaxoSmithKline plc
Patrick Schwab, GlaxoSmithKline plc

Acknowledgements

PS, JSS and AM are employees and shareholders of GlaxoSmithKline plc.

You might also like...
A template repository implementing HTML5 Boilerplate 8.0 in Sanic using the Domonic framework.

sanic-domonic-h5bp A template repository implementing HTML5 Boilerplate 8.0 in Sanic using the Domonic framework. If you need frontend interactivity,

GeneDisco is a benchmark suite for evaluating active learning algorithms for experimental design in drug discovery.

GeneDisco is a benchmark suite for evaluating active learning algorithms for experimental design in drug discovery.

Flask-Starter is a boilerplate starter template designed to help you quickstart your Flask web application development.
Flask-Starter is a boilerplate starter template designed to help you quickstart your Flask web application development.

Flask-Starter Flask-Starter is a boilerplate starter template designed to help you quickstart your Flask web application development. It has all the r

Contains an implementation (sklearn API) of the algorithm proposed in
Contains an implementation (sklearn API) of the algorithm proposed in "GENDIS: GEnetic DIscovery of Shapelets" and code to reproduce all experiments.

GENDIS GENetic DIscovery of Shapelets In the time series classification domain, shapelets are small subseries that are discriminative for a certain cl

Code for the paper
Code for the paper "JANUS: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design"

JANUS: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design This repository contains code for the paper: JA

Contextualized Perturbation for Textual Adversarial Attack, NAACL 2021

Contextualized Perturbation for Textual Adversarial Attack Introduction This is a PyTorch implementation of Contextualized Perturbation for Textual Ad

PyTorch implementation of Interpretable Explanations of Black Boxes by Meaningful Perturbation
PyTorch implementation of Interpretable Explanations of Black Boxes by Meaningful Perturbation

PyTorch implementation of Interpretable Explanations of Black Boxes by Meaningful Perturbation The paper: https://arxiv.org/abs/1704.03296 What makes

A Python adaption of Augur to prioritize cell types in perturbation analysis.

A Python adaption of Augur to prioritize cell types in perturbation analysis.

Maximum Spatial Perturbation for Image-to-Image Translation (Official Implementation)
Maximum Spatial Perturbation for Image-to-Image Translation (Official Implementation)

MSPC for I2I This repository is by Yanwu Xu and contains the PyTorch source code to reproduce the experiments in our CVPR2022 paper Maximum Spatial Pe

Starter kit for getting started in the Music Demixing Challenge.
Starter kit for getting started in the Music Demixing Challenge.

Music Demixing Challenge - Starter Kit 👉 Challenge page This repository is the Music Demixing Challenge Submission template and Starter kit! Clone th

This is a python based web scraping bot for windows to download all ACCEPTED submissions of any user on Codeforces
This is a python based web scraping bot for windows to download all ACCEPTED submissions of any user on Codeforces

CODEFORCES DOWNLOADER This is a python based web scraping bot for windows to download all ACCEPTED submissions of any user on Codeforces Requirements

working repo for my xumx-sliCQ submissions to the ISMIR 2021 MDX
working repo for my xumx-sliCQ submissions to the ISMIR 2021 MDX

Music Demixing Challenge - xumx-sliCQ This repository is the GitHub mirror of my working submission repository for the AICrowd ISMIR 2021 Music Demixi

Sky attention heatmap of submissions to astrometry.net
Sky attention heatmap of submissions to astrometry.net

astroheat Installation Requires Python 3.6+, Tested with Python 3.9.5 Install library dependencies pip install -r requirements.txt The program require

Stream comments, submissions from subreddits and users across reddit right in your terminal

reddit_from_terminal stream comments, submissions from subreddits and users across reddit right in your terminal Alert! : Can't watch media contents(p

A CLI application that downloads your AC submissions from OJ's like Atcoder,Codeforces,CodeChef and distil it into beautiful Submission HeatMap.
A CLI application that downloads your AC submissions from OJ's like Atcoder,Codeforces,CodeChef and distil it into beautiful Submission HeatMap.

Yoda A CLI that takes away the hassle of managing your submission files on different online-judges by automating the entire process of collecting and organizing your code submissions in one single Directory on your Machine also it distils User Submissions into beautiful Submission HeatMap.

System Design Assignments as part of Arpit's System Design Masterclass
System Design Assignments as part of Arpit's System Design Masterclass

System Design Assignments The repository contains a set of problem statements around Software Architecture and System Design as conducted by Arpit's S

An awesome list of AI for art and design - resources, and popular datasets and how we may apply computer vision tasks to art and design.
An awesome list of AI for art and design - resources, and popular datasets and how we may apply computer vision tasks to art and design.

Awesome AI for Art & Design An awesome list of AI for art and design - resources, and popular datasets and how we may apply computer vision tasks to a

A whale detector design for the Kaggle whale-detector challenge!
A whale detector design for the Kaggle whale-detector challenge!

CNN (InceptionV1) + STFT based Whale Detection Algorithm So, this repository is my PyTorch solution for the Kaggle whale-detection challenge. The obje

Repository for tutorials, examples and starter scripts for using the MTU HPC cluster

MTU-HPC-Starter Repository for tutorials, examples and starter scripts for using the MTU HPC cluster Connecting to the MTU HPC cluster Within the coll

Owner
null
Django starter project with 🔋

A batteries-included Django starter project. For a production-ready version see the book Django for Professionals. ?? Features Django 3.1 & Python 3.8

William Vincent 1.5k Jan 8, 2023
Django project/application starter for lazybones :)

Django Project Starter Template My custom project starter for Django! I’ll try to support every upcoming Django releases as much as I can! Requirement

Uğur Özyılmazel 40 Jul 16, 2022
Django Webpack starter template for using Webpack 5 with Django 3.1 & Bootstrap 4. Yes, it can hot-reload.

Django Webpack Starter Hello fellow human. The repo uses Python 3.9.* Django 3.1.* Webpack 5.4.* Bootstrap 4.5.* Pipenv If you have any questions twe

Ganesh Khade 56 Nov 28, 2022
A Django starter template with a sound foundation.

SOS Django Template SOS Django Tempalate is a Django starter template that has opinionated and good solutions while starting your new Django project.

Eray Erdin 19 Oct 30, 2022
Launchr is an open source SaaS starter kit, based on Django.

Launchr Launchr is an open source SaaS starter kit. About Launchr is a fully-equipped starter template, ready to start a SaaS web app. It implements t

Jannis Gebauer 183 Jan 6, 2023
This is the starter for the Flask React project.

Flask React Project This is the starter for the Flask React project. Getting started Clone this repository (only this branch) git clone https://github

Jami Travers 5 May 25, 2022
The starter for the Flask React project

Flask React Project This is the starter for the Flask React project. Getting started Clone this repository (only this branch) git clone https://github

Parker Bolick 2 May 14, 2022
A python starter package to be used as a template for creating your own python packages.

Python Starter Package This is a basic python starter package to be used as a template for creating your own python packages. Github repo: https://git

Mystic 1 Apr 4, 2022
Starter project for python based lambda project.

Serverless Python Starter Starter project for python based lambda project. Features FastAPI - Frontend dev with Hot Reload API Gateway Integration (+r

null 4 Feb 22, 2022
Template repository to build PyTorch projects from source on any version of PyTorch/CUDA/cuDNN.

Template repository to build PyTorch projects from source on any version of PyTorch/CUDA/cuDNN.

Joonhyung Lee/이준형 651 Dec 12, 2022