DockStream: A Docking Wrapper to Enhance De Novo Molecular Design

Overview

DockStream

Description

DockStream is a docking wrapper providing access to a collection of ligand embedders and docking backends. Docking execution and post hoc analysis can be automated via the benchmarking and analysis workflow. The flexibility to specify a large variety of docking configurations allows tailored protocols for diverse end applications. DockStream can also parallelize docking across CPU cores, increasing throughput. DockStream is integrated with the de novo design platform REINVENT, allowing one to incorporate docking into the generative process and thus provide the agent with 3D structural information.

Supported Backends

Ligand Embedders

Docking Backends

Note: The CCDC package, the OpenEye toolkit and Schrodinger's tools require you to obtain the respective software from those vendors.

Tutorials and Usage

Detailed Jupyter Notebook tutorials for all DockStream functionalities and workflows are provided in DockStreamCommunity. The DockStream repository here contains input JSON templates located in examples. The templates are organized as follows:

  • target_preparation: Preparing targets for docking
  • ligand_preparation: Generating 3D coordinates for ligands
  • docking: Docking ligands
  • integration: Combining different ligand embedders and docking backends into a single input JSON to run successively

Requirements

Two Conda environments are provided: DockStream via environment.yml and DockStreamFull via environment_full.yml. DockStream suffices for all use cases except when CCDC GOLD software is used, in which case DockStreamFull is required.

git clone <DockStream repository>
cd <DockStream directory>
conda env create -f environment.yml
conda activate DockStream

Enable use of OpenEye software (from REINVENT README)

You will need to set the environment variable OE_LICENSE to activate the oechem license. One way to do this while keeping it specific to the Conda environment is to first run the following on the command line:

cd $CONDA_PREFIX
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
touch ./etc/conda/activate.d/env_vars.sh
touch ./etc/conda/deactivate.d/env_vars.sh

Then edit ./etc/conda/activate.d/env_vars.sh as follows:

#!/bin/sh
export OE_LICENSE='/opt/scp/software/oelicense/1.0/oe_license.seq1'

and finally, edit ./etc/conda/deactivate.d/env_vars.sh :

#!/bin/sh
unset OE_LICENSE
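
Before running anything that depends on OpenEye, it can help to confirm that the variable is actually visible from inside the activated environment. The following is a minimal sketch, not part of DockStream: the helper name check_oe_license is hypothetical, and it only checks that OE_LICENSE is set and points to an existing file, not that the license itself is valid.

```python
import os
from pathlib import Path


def check_oe_license(env=None):
    """Return (ok, message) describing the state of the OE_LICENSE variable.

    `env` defaults to the process environment; passing a dict makes the
    helper easy to try out without touching the real environment.
    """
    env = os.environ if env is None else env
    value = env.get("OE_LICENSE")
    if not value:
        return False, "OE_LICENSE is not set"
    if not Path(value).is_file():
        return False, f"OE_LICENSE points to a missing file: {value}"
    return True, f"OE_LICENSE found: {value}"


if __name__ == "__main__":
    ok, message = check_oe_license()
    print(message)
```

Running this once after conda activate and once after conda deactivate should show whether the two env_vars.sh scripts are being picked up.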

Unit Tests

After cloning the DockStream repository, enable licenses, if applicable (OpenEye, CCDC, Schrodinger). Then execute the following:

python unit_tests.py

Contributors

  • Christian Margreitter
  • Jeff Guo
  • Alexey Voronov

Comments
  • Glide dockings using local machine

    Hi, I am trying out DockStream with Schrodinger. I am wondering whether it is possible to use it on a local machine by specifying $SCHRODINGER/glide instead of the tokens procedure.

    opened by Oulfin 6
  • Bug in Glide backend parallelization

    First, thanks for contributing this nice toolbox.

    This is to report a bug in the following module:

    https://github.com/MolecularAI/DockStream/blob/7bdfd4a67f5c938e3222db59387e5a95e8a59e56/dockstream/core/Schrodinger/Glide_docker.py#L404

    A while loop is used to process all sublists in batches. However, the number of processed sublists recorded in jobs_submitted can be off, because this variable is the cumulative sum of len(tmp_output_dirs), which can be smaller than len(cur_slice_sublists) if any of the sublists has no valid molecules to write out.

    The bug may cause some of the sublists to be processed repeatedly and, in extreme cases, may result in an infinite loop.

    I didn't check if any other backend uses similar logic to parallelize the run and may suffer from the same problem.

    opened by hshany 3
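
The counting problem described in the issue above can be reproduced in a few lines. This is a toy model, not the code in Glide_docker.py: the function run_batches, its count_outputs_only flag and the max_rounds guard are hypothetical, and a sublist is treated as "valid" simply when it is non-empty. Only the name jobs_submitted is borrowed from the issue.

```python
def run_batches(sublists, batch_size, count_outputs_only, max_rounds=50):
    """Process sublists in slices of `batch_size` and return what was handled.

    count_outputs_only=True mimics the reported behaviour: the counter grows
    by the number of sublists that produced output. count_outputs_only=False
    advances by the number of sublists consumed from the slice.
    """
    jobs_submitted = 0
    handled = []
    rounds = 0
    while jobs_submitted < len(sublists):
        rounds += 1
        if rounds > max_rounds:
            # Guard so the demonstration stops instead of looping forever.
            raise RuntimeError("no progress: loop would not terminate")
        cur_slice = sublists[jobs_submitted:jobs_submitted + batch_size]
        # Only sublists with at least one valid molecule produce output.
        valid = [s for s in cur_slice if s]
        handled.extend(valid)
        jobs_submitted += len(valid) if count_outputs_only else len(cur_slice)
    return handled


data = [["a"], [], ["b"], ["c"]]
print(run_batches(data, 2, count_outputs_only=False))
print(run_batches(data, 2, count_outputs_only=True))
```

With the empty sublist in the first slice, the output-based counter falls one behind and ["b"] is handled twice. If every sublist in a slice is empty, the counter never moves and only the guard stops the loop.
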
  • Question: Is it possible to feed an sdf file of prepared ligands straight into docking?

    I'm trying to work out whether it's possible to put an sdf file of prepared ligands straight into a Glide run? i.e. not specifying an input_pool to the docking_runs list? (especially when using docker.py)

    opened by reskyner 2
  • Raise LigandPreparationFailed error

    For OpenEye Hybrid, LigandPreparationFailed errors were reported for both the CORINA and OMEGA backends. One example is shown below:

      File "/DockStream/dockstream/core/OpenEyeHybrid/Omega_ligand_preparator.py", line 66, in __init__
        raise LigandPreparationFailed("Cannot initialize OMEGA backend - abort.")
    dockstream.utils.dockstream_exceptions.LigandPreparationFailed: Cannot initialize OMEGA backend - abort.

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/DockStream/docker.py", line 132, in <module>
        raise LigandPreparationFailed
    dockstream.utils.dockstream_exceptions.LigandPreparationFailed

    Could you please help me with this problem? I tried both the provided receptor-ligand data files from DockStreamCommunity and my own dataset. Both reported the same LigandPreparationFailed error. Thank you in advance!

    opened by fangffRS 1
  • ADV 1.2.0 support

    For DockStream to work with the new AutoDock-Vina 1.2.0 (https://pubs.acs.org/doi/10.1021/acs.jcim.1c00203), the "log-file" specification has to go:

    https://github.com/MolecularAI/DockStream/blob/efefbe52d3cecb8b6d1b72ab719aad1e4702833b/dockstream/core/AutodockVina/AutodockVina_docker.py#L275

    Should be backwards-compatible.

    opened by CMargreitter 1
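
The issue above proposes simply dropping the log-file specification. An alternative is to keep it for older versions and omit it for newer ones. The sketch below illustrates that idea under stated assumptions: the executable is assumed to be called vina, the option is assumed to be --log (the issue names it only as "log-file"), and build_vina_args and all file names are hypothetical. Search-box and other options are left out for brevity, and this is not the code used in AutodockVina_docker.py.

```python
def build_vina_args(version, receptor, ligand, out, log_path=None):
    """Assemble a partial AutoDock Vina argument list.

    `version` is a tuple such as (1, 1, 2) or (1, 2, 0). The log option is
    added only for versions before 1.2 (assumption: later versions reject it).
    """
    args = [
        "vina",
        "--receptor", receptor,
        "--ligand", ligand,
        "--out", out,
        "--cpu", "1",
    ]
    if log_path is not None and version < (1, 2):
        args += ["--log", log_path]
    return args


print(build_vina_args((1, 1, 2), "rec.pdbqt", "lig.pdbqt", "out.pdbqt", "run.log"))
print(build_vina_args((1, 2, 0), "rec.pdbqt", "lig.pdbqt", "out.pdbqt", "run.log"))
```

Comparing version tuples avoids string comparison pitfalls such as "1.10" sorting before "1.2".
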
  • Input file of the function "parse_maestro"

    First of all, thank you for your wonderful work on AI in drug development. I am using Glide through DockStream to get results. I think the function parse_maestro in Glide_docker.py can be used to extract the docking settings (in DockStream, these settings are written in a JSON file). Is this right? If so, could you tell me the input file type for parse_maestro (e.g. maegz, mae, sdf)? I tried the function with a maegz file (output from Glide docking), but I couldn't get a result. I want to use the parse_maestro function to reproduce the settings that were applied to a previous docking simulation. I would be very grateful for an answer. Thanks!

    opened by SejeongPark8354 0
  • Openbabel integration failed

    I am trying to use DockStream with the Vina backend, but an exception is raised for the OpenBabel executable.

    Traceback (most recent call last):
      File "DockStream/target_preparator.py", line 130, in <module>
        prep = AutodockVinaTargetPreparator(conf=config, target=input_pdb_path, run_number=run_number)
      File "C:\Users\Y-8874903-E.ESTUDIANT\OneDrive - URV\Escritorio\PLIP interaction\DockStream\dockstream\core\AutodockVina\AutodockVina_target_preparator.py", line 56, in __init__
        raise TargetPreparationFailed("Cannot initialize OpenBabel external library, which should be part of the environment - abort.")
    dockstream.utils.dockstream_exceptions.TargetPreparationFailed: Cannot initialize OpenBabel external library, which should be part of the environment - abort.

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "DockStream/target_preparator.py", line 139, in <module>
        raise TargetPreparationFailed() from e
    dockstream.utils.dockstream_exceptions.TargetPreparationFailed

    I followed all necessary steps mentioned in the docs.

    opened by Crispae 1
  • Parallelization of ADV for docking

    Hello,

    I am trying to run my first docking experiments together with REINVENT. I am observing many ADV jobs getting started with -cpu 1 (hardcoded), but a few (1 or 2) take quite long and leave all other CPUs idle until the batch has finished and a new batch has started.

    This leaves quite some capacity of, e.g., a 16-core machine unused; at least that is my impression when observing the run via top or ps. In the dockstream.config, parallelization.number_cores is set to 16.

    Are there better practical settings to better exploit larger machines with 16-64 CPUs ?

    Lars

    opened by LarsAC 3
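
The idle-core effect described in the issue above can be quantified with a toy model. The sketch below compares two scheduling schemes on hypothetical job durations: one waits for a whole batch to finish before starting the next, the other hands the next job to whichever core becomes free first. It models only the behaviour as reported in the issue and makes no claim about how DockStream schedules jobs internally; both function names and the durations are made up for illustration.

```python
import heapq


def batch_makespan(durations, n_cores):
    """Total time when jobs run in fixed batches of n_cores.

    Each batch lasts as long as its slowest job, because the next batch
    only starts once the whole batch has finished.
    """
    total = 0
    for i in range(0, len(durations), n_cores):
        total += max(durations[i:i + n_cores])
    return total


def queue_makespan(durations, n_cores):
    """Total time when each job goes to the core that frees up first."""
    cores = [0] * n_cores  # time at which each core becomes free
    heapq.heapify(cores)
    for d in durations:
        free_at = heapq.heappop(cores)
        heapq.heappush(cores, free_at + d)
    return max(cores)


# Hypothetical durations: two slow ligands among many fast ones.
durations = [10, 1, 1, 1, 10, 1, 1, 1]
print(batch_makespan(durations, 4))  # 20
print(queue_makespan(durations, 4))  # 11
```

In this example the batch scheme takes almost twice as long, because three of the four cores sit idle while each slow job finishes.
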
  • No module named 'ccdc'

    I believe I successfully installed the normal (not Full) DockStream package as per your instructions on the github site, and then tried to run the unit test, but this fails with a complaint regarding the ccdc module missing (see below). But I want to use Glide so wouldn’t need (nor have) ccdc. I am doing this on Ubuntu 18.04.

    Dockstream/python ./unit_tests.py
    Traceback (most recent call last):
      File "./unit_tests.py", line 10, in <module>
        from tests.Gold import *
      File "/media/data/evehom/Projects/CompChem/DockStream/tests/Gold/__init__.py", line 1, in <module>
        from tests.Gold.test_Gold_target_preparation import *
      File "/media/data/evehom/Projects/CompChem/DockStream/tests/Gold/test_Gold_target_preparation.py", line 11, in <module>
        from dockstream.core.Gold.Gold_target_preparator import GoldTargetPreparator
      File "/media/data/evehom/Projects/CompChem/DockStream/dockstream/core/Gold/Gold_target_preparator.py", line 3, in <module>
        import ccdc
    ModuleNotFoundError: No module named 'ccdc'

    opened by Evert-Homan 4
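
The failure in the last issue above comes from an unconditional import of a vendor module that is only present in the DockStreamFull environment. One common way to handle optional dependencies is to check whether a module can be located before importing the tests that need it. The following is a minimal sketch of that pattern, not a change that exists in DockStream: module_available and select_test_packages are hypothetical helpers, tests.Gold is taken from the traceback, and tests.Core is an invented placeholder.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def select_test_packages(candidates):
    """Keep only the test packages whose required module is installed.

    `candidates` maps a test package name to the top-level module it needs.
    """
    return [pkg for pkg, required in candidates.items() if module_available(required)]


# "tests.Core" and its "json" requirement are placeholders for illustration.
packages = select_test_packages({"tests.Core": "json", "tests.Gold": "ccdc"})
print(packages)
```

On a machine without the CCDC package, tests.Gold is left out of the list, so the remaining tests could still be imported and run.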
Releases(v1.0.0)
Owner
AstraZeneca - Molecular AI
Software from the Molecular AI department at AstraZeneca R&D