pyscreener

A pythonic interface to high-throughput virtual screening software

Overview

This repository contains the source of pyscreener, both a library and a program for conducting high-throughput virtual screening (HTVS) via python calls.

Installation

General requirements

  • python >= 3.7
  • numpy, openbabel, openmm, pdbfixer, ray, rdkit, scikit-learn, scipy, and tqdm
  • all corresponding software downloaded and located on your PATH or under the path of a specific environment variable (see external software below for more details)

environment setup with conda

  1. (if necessary) install conda
  2. conda create -n NAME -c conda-forge python=3.8 pip openbabel openmm rdkit
  3. conda activate NAME
  4. pip install pyscreener (or if installing from source, pip install .)
  5. pip install git+https://github.com/openmm/pdbfixer.git
  6. follow the corresponding directions below for the intended software

Before running pyscreener, be sure to first activate the environment: conda activate NAME (where NAME is whatever you've named your environment)

external software

To test whether your environment is set up correctly with respect to your PATH and environment variables, run pyscreener-check with two positional arguments, the screen type (vina or dock) and a JSON-format metadata template (see Metadata Templates below), like so:

pyscreener-check SCREEN_TYPE METADATA_TEMPLATE

If the checks pass, then your environment is set up correctly.
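Both arguments are positional, and the check script passes the second one to json.loads, so it must be a valid, shell-quoted JSON string; typing the literal placeholders SCREEN_TYPE METADATA_TEMPLATE produces a JSONDecodeError. The following is a minimal sketch, assuming a POSIX shell, that builds a correctly quoted command line from a Python dict. The helper name check_command is hypothetical (not part of pyscreener), and the software key is taken from the Metadata Templates section below.

```python
import json
import shlex


def check_command(screen_type: str, metadata: dict) -> str:
    """Build a pyscreener-check invocation (hypothetical helper, not part of pyscreener).

    screen_type is "vina" or "dock"; metadata is serialized to JSON and
    shell-quoted so that it reaches the script as a single argument.
    """
    if screen_type not in ("vina", "dock"):
        raise ValueError(f"unknown screen type: {screen_type!r}")
    return f"pyscreener-check {shlex.quote(screen_type)} {shlex.quote(json.dumps(metadata))}"


print(check_command("vina", {"software": "qvina"}))
# pyscreener-check vina '{"software": "qvina"}'
```

Paste the printed line into your shell; pyscreener-check itself remains the authoritative test of your environment.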

  • vina-type software

    1. install ADFR Suite and add prepare_receptor to your PATH via either of the following options. If this step was successful, the command which prepare_receptor should print path/to/prepare_receptor.

      1. adding the entire bin directory to your PATH (you should see a command at the end of the installation process) or

      2. adding only prepare_receptor in the bin directory to your PATH as detailed below

    2. install any of the following docking software: vina 1.1.2 (note: pyscreener does not work with vina 1.2), qvina2, smina, or psovina, and ensure the desired executable is located in a directory on your PATH

  • DOCK6

    1. obtain a license for DOCK6
    2. install DOCK6 from the download link and follow the installation directions
    3. after ensuring that DOCK6 was installed properly, set the DOCK6 environment variable to the path of the DOCK6 parent directory as detailed below. This is the directory that was unzipped from the tarball and is usually named dock6; it contains the bin, install, etc. subdirectories.
    4. install sphgen_cpp. On Linux systems, this can be done as follows:
      1. wget http://dock.compbio.ucsf.edu/Contributed_Code/code/sphgen_cpp.1.2.tar.gz
      2. tar -xzvf sphgen_cpp.1.2.tar.gz
      3. cd sphgen_cpp.1.2
      4. make
    5. place the sphgen_cpp executable (it should be named sphgen_cpp) inside the bin subdirectory of the DOCK6 parent directory. If you've already set the DOCK6 environment variable, you can run (on Linux): mv sphgen_cpp $DOCK6/bin
    6. install UCSF Chimera and place the chimera executable on your PATH as detailed below
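The prerequisites above can be spot-checked with the Python standard library once your PATH and the DOCK6 variable are configured as described in the sections below. This is a sketch, not part of pyscreener: missing_prerequisites is a hypothetical name, and it assumes the executables are named prepare_receptor, chimera, and sphgen_cpp and that the Vina-type program is invoked by the name you pass in (e.g., qvina2 or smina).

```python
import os
import shutil
from typing import List


def missing_prerequisites(screen_type: str, vina_software: str = "vina") -> List[str]:
    """Describe missing prerequisites (hypothetical helper; executable names are assumptions)."""
    missing = []
    if screen_type == "vina":
        # prepare_receptor (ADFR Suite) and the docking executable must be on PATH
        for exe in ("prepare_receptor", vina_software):
            if shutil.which(exe) is None:
                missing.append(f"{exe} not found on PATH")
    elif screen_type == "dock":
        # DOCK6 must point at the dock6 parent directory, with sphgen_cpp in its bin/
        dock6 = os.environ.get("DOCK6")
        if not dock6 or not os.path.isdir(dock6):
            missing.append("DOCK6 environment variable unset or not a directory")
        elif not os.path.isfile(os.path.join(dock6, "bin", "sphgen_cpp")):
            missing.append("sphgen_cpp not found in $DOCK6/bin")
        if shutil.which("chimera") is None:
            missing.append("chimera not found on PATH")
    else:
        raise ValueError(f"unknown screen type: {screen_type!r}")
    return missing


for problem in missing_prerequisites("vina"):
    print(problem)
```

An empty list only means the files were found; pyscreener-check additionally exercises them with your metadata template.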

adding an executable to your PATH

To add an executable to your PATH, you have three options:

  1. create a symbolic link to the executable inside a directory that is already on your PATH: ln -s FILE -t DIR. Typically, ~/bin or ~/.local/bin are good target directories (i.e., DIR). To see which directories are currently on your PATH, type echo $PATH. There will typically be many directories on your PATH, and it is best to avoid creating files in any directory outside your home directory ($HOME on most *nix-based systems)
  2. copy the software to a directory that is already on your PATH. This is similar to, though less preferred than, the above: cp FILE DIR
  3. append the directory containing the file to your PATH: export PATH=$PATH:DIR, where DIR is the directory containing the file in question. As your PATH must be configured every time you run pyscreener, place this command inside your ~/.bashrc or ~/.bash_profile (if using a bash shell) to avoid needing to run it every time you log in. Note: if using a non-bash shell, the specific file will be different.
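For option 1, it helps to know which of the suggested target directories (~/bin, ~/.local/bin) are already on your PATH. A small standard-library sketch; the helper name user_bin_dirs_on_path is hypothetical, and it simply splits PATH on os.pathsep and compares normalized entries:

```python
import os
from typing import List, Sequence


def user_bin_dirs_on_path(candidates: Sequence[str] = ("~/bin", "~/.local/bin")) -> List[str]:
    """Return the candidate directories (expanded) that already appear on PATH."""
    # Normalize PATH entries so that, e.g., a trailing slash does not hide a match
    on_path = {
        os.path.normpath(entry)
        for entry in os.environ.get("PATH", "").split(os.pathsep)
        if entry
    }
    expanded = [os.path.normpath(os.path.expanduser(c)) for c in candidates]
    return [c for c in expanded if c in on_path]


print(user_bin_dirs_on_path())
```

If the list is empty, either pick option 3 or add one of these directories to your PATH first.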

specifying an environment variable

To set the DOCK6 environment variable, run the following command: export DOCK6=path/to/dock6, where path/to/dock6 is the full path of the DOCK6 parent directory mentioned above. As this environment variable must always be set before running pyscreener, place the command inside your ~/.bashrc or ~/.bash_profile (if using a bash shell) to avoid needing to run it every time you log in. Note: if using a non-bash shell, the specific file will be different.

Ray Setup

pyscreener uses ray as its parallel backend. If you plan to parallelize the software only across your local machine, you don't need to do anything. However, if you wish to either (a) limit the number of cores pyscreener will run on or (b) run it over a distributed setup (e.g., an HPC cluster with many distinct nodes), you must manually start a ray cluster before running pyscreener.

Limiting the number of cores

To do this, simply type ray start --head --num-cpus N before starting pyscreener (where N is the total number of cores you wish to allow pyscreener to utilize). Not performing this step will give pyscreener access to all of the cores on your local machine, potentially slowing down other applications.
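If you would rather reserve a few cores for other applications than pick N by hand, N can be derived from the machine's core count. A minimal sketch; reserve and ray_start_args are hypothetical names, and only the ray start --head --num-cpus flags come from the text above. Note that os.cpu_count() reports the machine's logical CPUs, not an allocation granted by a scheduler such as SLURM.

```python
import os
from typing import List, Optional


def ray_start_args(reserve: int = 2, total: Optional[int] = None) -> List[str]:
    """Arguments for `ray start --head --num-cpus N`, leaving `reserve` cores free."""
    if total is None:
        total = os.cpu_count() or 1  # os.cpu_count() may return None
    n = max(1, total - reserve)  # always give ray at least one core
    return ["ray", "start", "--head", "--num-cpus", str(n)]


print(" ".join(ray_start_args()))
```

The list can be passed to subprocess.run(..., check=True) from a launcher script, or the printed line can be pasted into the shell.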

Distributing across many nodes

While the precise instructions for this will vary with HPC cluster architecture, the general idea is to establish a ray cluster between the nodes allocated to your job. We have provided a sample SLURM submission script (run_pyscreener_distributed_example.batch) to achieve this, but you may have to alter some commands depending on your system. For more information on this see here. To allow pyscreener to connect to your ray cluster, you must set the ip_head and redis_password environment variables appropriately, where ip_head is the address of the head of your ray cluster, i.e., IP:PORT where IP is the IP address of the head node and PORT is the port that is running ray.
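Because ip_head has the form IP:PORT, a job script can sanity-check both variables before pyscreener tries to connect. A minimal sketch; the helper name read_ray_head is hypothetical, only the variable names ip_head and redis_password come from the text above, and how pyscreener itself consumes them is not shown here.

```python
import os
from typing import Tuple


def read_ray_head() -> Tuple[str, int, str]:
    """Parse ip_head ("IP:PORT") and read redis_password from the environment."""
    try:
        ip_head = os.environ["ip_head"]
        password = os.environ["redis_password"]
    except KeyError as err:
        raise RuntimeError(f"environment variable {err.args[0]} is not set") from err
    # split on the last colon so the port is always the final field
    ip, sep, port = ip_head.rpartition(":")
    if not sep or not ip or not port.isdigit():
        raise ValueError(f"ip_head must look like IP:PORT, got {ip_head!r}")
    return ip, int(port), password
```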

pyscreener writes many intermediate input and output files (a consequence of how the underlying docking software operates). Given that the primary endpoint of pyscreener is a list of ligands and associated scores (rather than the specific binding poses), these files are written to each node's temporary directory (determined by tempfile.gettempdir()) and discarded at the end. If you wish to collect these files, pass the --collect-all flag in the program arguments or run the collect_files() method of your VirtualScreen object when your screen is complete.

Note: the VirtualScreen.collect_files() method is slow because it may need to send many files over the network. This method should only be run once over the lifetime of a VirtualScreen object, as several intermediate calls will yield the same result as a single, final call.

Note: tempfile.gettempdir() returns a path that depends on the values of specific environment variables (see here). It is possible that the value returned on your system is not actually a valid path for you! In this case you will likely get file permission errors and must ask your system administrator where this value should point and set your environment variables accordingly before running pyscreener!
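Before a long screen, you can check on each node what tempfile.gettempdir() resolves to and whether you can write there. Python consults the TMPDIR, TEMP, and TMP environment variables, in that order, before falling back to platform defaults, so TMPDIR is usually the variable to set. A minimal sketch (temp_dir_status is a hypothetical helper name):

```python
import os
import tempfile


def temp_dir_status() -> str:
    """Report the directory tempfile will use and whether it is writable here."""
    path = tempfile.gettempdir()  # cached in tempfile.tempdir after the first call
    writable = os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)
    return f"{path} ({'writable' if writable else 'NOT writable'})"


print(temp_dir_status())
```

Because the value is cached per process, set TMPDIR (e.g., in ~/.bashrc) before starting ray and pyscreener rather than from inside a running session.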

Running pyscreener as a software

!!please read the entire section before running pyscreener!!

pyscreener was designed to have a minimal interface under the principle that a high-throughput virtual screen is intended to be a broad-strokes technique to gauge ligand favorability. With that in mind, all one really needs to get going are the following:

  • the type of screen (screen-type) you would like to run: vina or dock for Vina-type or DOCK6 screens, respectively
  • the PDB id(s) of your receptor(s) of interest or PDB file(s) of the specific structure(s)
  • a file containing the ligands you would like to dock, in SDF, SMI, or CSV format
  • the coordinates of your docking box (center + size) or a PDB format file containing the coordinates of a previously bound ligand
  • a metadata template containing screen-specific options in a JSON-format string. See the metadata section below for more details.
  • the number of CPUs you would like to parallelize each docking simulation over. This is 1 by default, but Vina-type software can leverage multiple CPUs for faster docking. A generally good value for this is between 2 and 8 depending on your compute setup. If you're docking molecule-by-molecule (e.g., inside a reinforcement learning loop), then you will likely want this to be as many CPUs as are on your machine.
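As a rough rule of thumb, and assuming each docking task reserves ncpu cores from the ray cluster (an assumption about how pyscreener schedules tasks, not something stated above), the number of simulations that can run at once is the total core count divided by ncpu, rounded down:

```python
def concurrent_dockings(total_cpus: int, ncpu: int) -> int:
    """Docking simulations that fit at once if each one reserves ncpu cores."""
    if ncpu < 1 or total_cpus < 1:
        raise ValueError("total_cpus and ncpu must be positive")
    return total_cpus // ncpu


# a 32-core node: 32 single-core runs, 8 four-core runs, or 1 run using every core
for ncpu in (1, 4, 32):
    print(ncpu, concurrent_dockings(32, ncpu))
```

A larger ncpu speeds up each individual Vina-type run but lowers how many molecules are docked in parallel, which is why docking one molecule at a time favors setting ncpu to the machine's core count.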

There are a variety of other options you can specify as well (including how to score a ligand when multiple scored conformations are output, how many times to repeatedly dock a given ligand, etc.). To see all of these options and what they do, use the following command: pyscreener --help

All of these options may be specified on the command line or in a configuration file that accepts YAML, INI, and argparse syntaxes. Example configuration files are located in integration-tests/configs. Assuming everything is working and installed properly, you can run any of these files via the following command: pyscreener --config integration-tests/configs/

Metadata Templates

Vina-type and DOCK6 docking simulations have a number of options unique to their preparation and simulation pipeline, and these options are termed simulation "metadata" in pyscreener. At present, only a few of these options are supported for both families of docking software, but future updates will add support for more of them. These options may be specified as a JSON object passed to the --metadata-template argument. Below is a list of the supported options for both types of docking screen (default values are given in parentheses next to each parameter).

  • Vina-type

    • software (="vina"): which Vina-type docking software you would like to use. Currently supported values: "vina", "qvina", "smina", and "psovina"
    • extra (=""): all the extra command line options to pass to a Vina-type docking software. E.g. for a run of Smina, extra="--force_cap ARG" or for PSOVina, extra="-w ARG"
  • DOCK6

    • probe_radius (=1.4): the size of the probe to use for calculating the molecular surface (see here for more details)
    • steric_clash_dist (=0.0): larger values prevent the generation of large spheres with close surface contacts
    • min_radius (=1.4): the minimum radius of sphere to use for sphere generation
    • max_radius (=4.0): the maximum radius of sphere to use for sphere generation
    • sphere_mode (="box"): the method by which to select spheres for docking box construction. Accepted values: "largest", select the largest cluster of spheres; "box", select all spheres within a predefined docking box; "ligand", use the coordinates of a previously docked/bound ligand to select spheres
    • docked_ligand_file (=""): a MOL2 file containing the coordinates of a previously docked/bound ligand
    • buffer (=10.0): the amount of extra space (in Angstroms) to be added around the ligand when selecting spheres
    • enclose_spheres (=True): whether to construct the docking box by enclosing all of the selected spheres or use only spheres within a predefined docking box
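Because --metadata-template expects a JSON string, it can be convenient to build it from a Python dict and catch misspelled keys or values early. The sketch below hard-codes the documented defaults purely for validation. The helper name metadata_template, the choice to reject unknown keys, and the rule that sphere_mode "ligand" needs a docked_ligand_file (inferred from the descriptions above) are assumptions, not pyscreener behavior; from Python, ps.build_metadata (used in the library example below) is the library's own entry point.

```python
import json
from typing import Any

# Defaults as listed above; pyscreener may change them between versions
DEFAULTS = {
    "vina": {"software": "vina", "extra": ""},
    "dock": {
        "probe_radius": 1.4,
        "steric_clash_dist": 0.0,
        "min_radius": 1.4,
        "max_radius": 4.0,
        "sphere_mode": "box",
        "docked_ligand_file": "",
        "buffer": 10.0,
        "enclose_spheres": True,
    },
}
CHOICES = {
    "software": {"vina", "qvina", "smina", "psovina"},
    "sphere_mode": {"largest", "box", "ligand"},
}


def metadata_template(screen_type: str, **options: Any) -> str:
    """Validate overrides against the documented options and return them as JSON."""
    defaults = DEFAULTS[screen_type]  # KeyError for anything but "vina"/"dock"
    unknown = sorted(set(options) - set(defaults))
    if unknown:
        raise KeyError(f"unknown {screen_type} option(s): {unknown}")
    for key, choices in CHOICES.items():
        if key in options and options[key] not in choices:
            raise ValueError(f"{key} must be one of {sorted(choices)}")
    merged = {**defaults, **options}
    if merged.get("sphere_mode") == "ligand" and not merged.get("docked_ligand_file"):
        raise ValueError('sphere_mode "ligand" needs a docked_ligand_file')
    # Only overrides are serialized; omitted options are assumed to use the defaults
    return json.dumps(options)


print(metadata_template("vina", software="smina", extra="--force_cap ARG"))
# {"software": "smina", "extra": "--force_cap ARG"}
```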

Using pyscreener as a library

The object model of pyscreener relies on four classes:

  • CalculationData (renamed Simulation in v1.2.0): a simple object containing the broad-strokes specifications of a docking calculation common to all types of docking calculations (e.g., Vina, DOCK6, etc.): the SMILES string, the target receptor, the center/size of a docking box, the metadata, and the result.
  • CalculationMetadata: a generic object that contains software-specific fields. For example, a Vina-type calculation requires a software parameter, whereas a DOCK6 calculation requires a number of different parameters for receptor preparation. Most importantly, the metadata will always contain two fields of abstract type: prepared_ligand and prepared_receptor.
  • DockingRunner: a static object that defines an interface to prepare and run docking calculations. Each calculation type defines its own DockingRunner implementation.
  • DockingVirtualScreen: an object that organizes a virtual screen. At a high level, a virtual screen is a series of docking calculations with some template set of parameters performed for a collection of molecules and distributed over some set of computational resources. A DockingVirtualScreen takes as arguments a DockingRunner, a list of receptors (for possible ensemble docking), and a set of template values for a CalculationData template. It defines a __call__() method that takes an unpacked list of SMILES strings, builds the CalculationData objects for each molecule, and submits these objects for preparation and calculation to various resources in the ray cluster (see Ray Setup).
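To make the division of labor concrete, here is a toy, single-process sketch of the same pattern: a template calculation is copied once per SMILES string and handed to a runner. None of this is pyscreener's code. The field names are paraphrased from the descriptions above, ToyRunner returns a made-up number instead of docking, and the real DockingVirtualScreen distributes these calls over ray.

```python
from dataclasses import dataclass, field, replace
from typing import Any, List, Optional, Tuple


@dataclass
class CalculationMetadata:
    # software-specific fields would live here; these two are always present
    prepared_ligand: Optional[Any] = None
    prepared_receptor: Optional[Any] = None


@dataclass
class CalculationData:
    smi: str
    receptor: str
    center: Tuple[float, float, float]
    size: Tuple[float, float, float]
    metadata: CalculationMetadata = field(default_factory=CalculationMetadata)
    result: Optional[float] = None


class ToyRunner:
    """Stands in for a DockingRunner: 'prepare' inputs, then produce a fake score."""

    @staticmethod
    def prepare_and_run(data: CalculationData) -> CalculationData:
        data.metadata.prepared_ligand = f"{data.smi}.pdbqt"
        data.metadata.prepared_receptor = f"{data.receptor}.pdbqt"
        data.result = -0.5 * len(data.smi)  # placeholder, not a docking score
        return data


def toy_screen(runner, template: CalculationData, smis: List[str]) -> List[Optional[float]]:
    # copy the metadata too, so the calculations never share mutable state
    calcs = [replace(template, smi=smi, metadata=replace(template.metadata)) for smi in smis]
    return [runner.prepare_and_run(c).result for c in calcs]


template = CalculationData("", "5WIU.pdb", (-18.2, 14.4, -16.1), (15.4, 13.9, 14.5))
print(toy_screen(ToyRunner, template, ["c1ccccc1", "CCO"]))
# [-4.0, -1.5]
```

Copying the template per molecule is what allows each calculation to be shipped to an independent worker without side effects on the others.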

To perform docking calls inside your python code using pyscreener, you must first initialize a DockingVirtualScreen object, either through the factory function pyscreener.virtual_screen or by initializing one manually. The following section shows an example of how to perform docking calculations from inside a python interpreter.

Example

The following code snippet docks benzene (SMILES string "c1ccccc1") against the D4 dopamine receptor (PDB ID 5WIU) using a predefined docking box and AutoDock Vina:

>>> import ray
>>> ray.init()
[...]
>>> import pyscreener as ps
>>> metadata = ps.build_metadata("vina")
>>> virtual_screen = ps.virtual_screen("vina", ["integration-tests/inputs/5WIU.pdb"], (-18.2, 14.4, -16.1), (15.4, 13.9, 14.5), metadata, ncpu=8)
{...}
>>> scores = virtual_screen("c1ccccc1")
>>> scores
array([-4.4])

A few notes from the above example:

  • the input PDB file must be clean prior to use. You can alternatively pass in a PDB ID (e.g., pdbids=["5WIU"]) but you must know the coordinates of the docking box for the corresponding PDB file. This usually means downloading the PDB file and manually inspecting it for more reliable results, but it's there if you want it.
  • you can construct a docking box from the coordinates of a previously bound ligand by providing these coordinates in a PDB file, e.g.,
    vs = ps.virtual_screen("vina", ["integration-tests/inputs/5WIU.pdb"], None, None, metadata, ncpu=8, docked_ligand_file="path/to/DOCKED_LIGAND.pdb")
  • ray handles task distribution in the backend of the library. You don't need to start it manually if you would only call ray.init() with no arguments, as we did above; this was done only to highlight the ability to initialize ray according to your own needs (e.g., a distributed setup).
  • to use an input file containing ligands, you must use the LigandSupply class and access the .ligands attribute, e.g.,
    supply = ps.LigandSupply(['integration-tests/inputs/ligands.csv'])
    virtual_screen(supply.ligands)
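Once you have the scores, pairing them with their SMILES strings and sorting is usually the next step. Vina-type scores are estimated binding energies in kcal/mol, so more negative is better, and, as the v1.2.0 release notes below describe, invalid or failed molecules come back as nan, which is best set aside rather than sorted. A minimal sketch (rank_hits is a hypothetical helper name):

```python
import math
from typing import Iterable, List, Optional, Tuple


def rank_hits(
    smis: Iterable[str], scores: Iterable[Optional[float]]
) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Split results into (smi, score) pairs sorted best-first and a list of failures."""
    ranked, failed = [], []
    for smi, score in zip(smis, scores):
        if score is None or math.isnan(score):
            failed.append(smi)
        else:
            ranked.append((smi, float(score)))
    ranked.sort(key=lambda pair: pair[1])  # lowest (most negative) score first
    return ranked, failed


# illustrative scores only
hits, failed = rank_hits(["c1ccccc1", "CCO", "not-a-smiles"], [-4.4, -2.1, float("nan")])
print(hits[0], failed)
# ('c1ccccc1', -4.4) ['not-a-smiles']
```

math.isnan also accepts the numpy floats found in the array returned by a virtual screen, so the scores can be passed in directly.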

Testing

  1. pip install pytest
  2. pytest

Copyright

Copyright (c) 2021, david graff

Acknowledgements

Project based on the Computational Molecular Science Python Cookiecutter version 1.5.

Comments
  • Problem with ray (This function was not imported properly.)

    Hello, after updating to the most recent version I got the following issue:

    [screenshot of the error not preserved]

    Can you give me a hand here? Is the user supposed to have a certain version of ray? Thank you in advance.

    Kind regards, Rafał Bachorz

    opened by rafalbachorz 13
  • Dockerfile

    Description

    This PR adds a Dockerfile which allows for easy installation of all python dependencies, the pyscreener library, ADFR suite tools, and a choice of vina-based docking software into a self-contained docker image. This allows for reproducible and straightforward installation / testing / usage of the pyscreener code. All pytest checks pass, as do smoke tests for all vina versions.

    Example / Current workflow

    Creating a docker image containing the pyscreener library, its dependencies, ADFR suite tools, and the vina docking software can be accomplished with docker build -t pyscreener:vina --target vina . ; however, any of the vina-based docking programs can be specified instead.

    Bugfix / Desired workflow

    n/a

    Questions

    Installing qvina / psovina requires building from source, which raises some pragma issues during compilation. Any light that can be shed on solving or clarifying these issues would be invaluable feedback.

    Relevant issues

    n/a

    checklist

    No new tests added but all docker builds pass the associated python and docking smoke tests

    Status

    Not 100% ready, as I am seeking feedback on compiling the qvina and psovina docking tools from source

    opened by Bundaberg-Joey 6
  • handle errors during virtual screen?

    I am using ps to run a virtual screen (scores = vs(virtual_lib)).

    However, when vs encounters a molecule that can't be processed correctly (e.g., rdkit can't generate a conformer), the whole process crashes rather than skipping or handling that molecule.

    I am wondering: is there any error-handling procedure in ps?

    thanks

    question 
    opened by likun1212 5
  • bug in pyscreener-1.2.0

    Below is the error message. I have no idea what it means; any ideas?

    Traceback (most recent call last):
      File "/work/home/aixplorerbio_wz/ylk/molpal-main/run.py", line 71, in <module>
        main()
      File "/work/home/aixplorerbio_wz/ylk/molpal-main/run.py", line 51, in main
        explorer = Explorer(path, **params)
      File "/work/home/aixplorerbio_wz/ylk/molpal-main/molpal/explorer.py", line 148, in __init__
        self.objective = objectives.objective(**kwargs)
      File "/work/home/aixplorerbio_wz/ylk/molpal-main/molpal/objectives/__init__.py", line 9, in objective
        return DockingObjective(objective_config, **kwargs)
      File "/work/home/aixplorerbio_wz/ylk/molpal-main/molpal/objectives/docking.py", line 51, in __init__
        self.virtual_screen = ps.virtual_screen(
      File "/work/home/aixplorerbio_wz/miniconda3/envs/molpal/lib/python3.8/site-packages/pyscreener/docking/__init__.py", line 88, in virtual_screen
        return DockingVirtualScreen(get_runner(software), *args, **kwargs)
    TypeError: __init__() takes from 6 to 16 positional arguments but 18 were given

    bug 
    opened by likun1212 4
  • [JOSS review] Installation instructions

    Companion of openjournals/joss-reviews/issues/3950

    The install instructions are not very clear to me, but after following them step-by-step I tried to check my setup with:

    $ pyscreener-check SCREEN_TYPE METADATA_TEMPLATE
    

    This seems like a nice feature, but it's not clear what SCREEN_TYPE and METADATA_TEMPLATE are, and there is also no usage message:

    $ pyscreener-check -h    
    Traceback (most recent call last):
      File "/Users/rodrigo/software/anaconda3/envs/pyscreener_env/bin/pyscreener-check", line 8, in <module>
        sys.exit(check())
      File "/Users/rodrigo/repos/pyscreener/pyscreener/main.py", line 13, in check
        ps.check_env(sys.argv[1], json.loads(sys.argv[2]))
    IndexError: list index out of range
    

    Could you please clarify this step?

    An additional note is that to install the packages you need to first add the conda-forge channel with $ conda config --append channels conda-forge

    opened by rvhonorato 4
  • Question about running error

    Dear Authors,

    I installed pyscreener in conda env.

    When I run the command "python run.py --config test_configs/test_vina.ini", I get the following error message:

    ERROR: failed to convert "testing_inputs/5WIU.pdb"
    Traceback (most recent call last):
      File "", line 1, in <module>
      File "/home/njgoo/Data1/program/pyscreener/pyscreener/docking/__init__.py", line 8, in screener
        return Vina(software=software, **kwargs)
      File "/home/njgoo/Data1/program/pyscreener/pyscreener/docking/vina.py", line 102, in __init__
        super().__init__(receptors=receptors, pdbids=pdbids,
      File "/home/njgoo/Data1/program/pyscreener/pyscreener/docking/base.py", line 102, in __init__
        self.receptors = receptors
      File "/home/njgoo/Data1/program/pyscreener/pyscreener/docking/base.py", line 203, in receptors
        raise RuntimeError('Preparation failed for all receptors!')
    RuntimeError: Preparation failed for all receptors!

    Could you provide information on how to solve this error? I also added the external software paths to PATH in .bash_profile.

    Thank you!

    opened by rebirthjin 4
  • running time

    Dear authors,

    Thanks for your great project! However, when I use pyscreener, the running time for each molecule is 30 seconds on average. Do you have any suggestions for accelerating the process?

    opened by futianfan 4
  • potential bugs in run_pyscreener_distributed_example.batch

    Hi

    I think there are two potential bugs in run_pyscreener_distributed_example.batch (and in run_molpal.batch as well): with this script, ray cannot leverage all the resources on a cluster node.

    1. in line 7, "#SBATCH --ntasks-per-node 4": I think "--ntasks-per-node" should always be 1. Quote: "this will be used to guarantee that each Ray worker runtime will obtain the proper resources"; see https://docs.ray.io/en/latest/cluster/slurm.html.

    2. in lines 31 and 41, you start the ray cluster with "--num-cpus $SLURM_CPUS_ON_NODE". However, this only lets ray use some of the CPUs in a node. For instance, say you request 1 node and set "-c = 4" and "--ntasks-per-node 4": ray can only use 4*4=16 CPUs even though you have 32 CPUs in the node ($SLURM_CPUS_ON_NODE will return 16 instead of 32).

    Suggested config:

    #!/bin/bash
    
    #SBATCH --partition=???
    #SBATCH --job-name=???
    #SBATCH -o %x_%j.out
    #SBATCH -e %x_%j.err
    
    ### This script works for any number of nodes, Ray will find and manage all resources
    #SBATCH --nodes=10
    #SBATCH --exclusive
    ### Give all resources to a single Ray task, ray can manage the resources internally
    #SBATCH --ntasks-per-node=1
    
    # Load modules or your own conda environment here
    # module load pytorch/v1.4.0-gpu
    # conda activate ${CONDA_ENV}
    source activate pyscreener
    
    # ===== DO NOT CHANGE THINGS HERE UNLESS YOU KNOW WHAT YOU ARE DOING =====
    # This script is a modification to the implementation suggested by gregSchwartz18 here:
    # https://github.com/ray-project/ray/issues/826#issuecomment-522116599
    redis_password=$(uuidgen)
    export redis_password
    
    nodes=$(scontrol show hostnames "$SLURM_JOB_NODELIST") # Getting the node names
    nodes_array=($nodes)
    
    node_1=${nodes_array[0]}
    ip=$(srun --nodes=1 --ntasks=1 -w "$node_1" hostname --ip-address) # making redis-address
    
    # if we detect a space character in the head node IP, we'll
    # convert it to an ipv4 address. This step is optional.
    if [[ "$ip" == *" "* ]]; then
      IFS=' ' read -ra ADDR <<< "$ip"
      if [[ ${#ADDR[0]} -gt 16 ]]; then
        ip=${ADDR[1]}
      else
        ip=${ADDR[0]}
      fi
      echo "IPV6 address detected. We split the IPV4 address as $ip"
    fi
    
    port=6379
    ip_head=$ip:$port
    export ip_head
    echo "IP Head: $ip_head"
    
    echo "STARTING HEAD at $node_1"
    srun --nodes=1 --ntasks=1 -w "$node_1" \
      ray start --head --node-ip-address="$ip" --port=$port --redis-password="$redis_password" --block &
    sleep 30
    
    worker_num=$((SLURM_JOB_NUM_NODES - 1)) #number of nodes other than the head node
    for ((i = 1; i <= worker_num; i++)); do
      node_i=${nodes_array[$i]}
      echo "STARTING WORKER $i at $node_i"
      srun --nodes=1 --ntasks=1 -w "$node_i" ray start --address "$ip_head" --redis-password="$redis_password" --block &
      sleep 5
    done
    
    # ===== Call your code below =====
    pyscreener --config vina_dock.ini --ncpu 2
    

    bug 
    opened by likun1212 3
  • [JOSS review] Installation Instruction Simplification

    I was able to follow the instructions and run the integration tests 🎉 The details on installing the different docking backends were 👌
    I do think the installation instructions could be simplified by installing everything from conda-forge:

    conda create -n NAME -c conda-forge python=3.8 pip openbabel openmm rdkit pdbfixer

    Then the extra step to install pdbfixer from source isn't necessary.

    This is a secondary (but related) question:

    Are you interested in getting this package on conda-forge? Then end users would only need to do the following (in addition to installing the backend of their choice):

    conda install -c conda-forge pyscreener

    cc https://github.com/openjournals/joss-reviews/issues/3950

    opened by mikemhenry 3
  • [JOSS review] Cryptic error running example

    When running the provided example:

    # example/benzene_5wiu.py
    import pyscreener as ps
    import ray
    ray.init()
    
    metadata = ps.build_metadata("vina")
    virtual_screen = ps.virtual_screen("vina", ["integration-tests/inputs/5WIU.pdb"], (-18.2, 14.4, -16.1), (15.4, 13.9, 14.5), metadata, ncpu=8)
    
    scores = virtual_screen("c1ccccc1")
    scores
    
    $ python example/benzene_5wiu.py 
    2022-01-12 14:20:06,884 INFO services.py:1338 -- View the Ray dashboard at http://127.0.0.1:8265
    Traceback (most recent call last):
      File "example/benzene_5wiu.py", line 9, in <module>
        scores = virtual_screen("c1ccccc1")
      File "/Users/rodrigo/repos/pyscreener/pyscreener/docking/screen.py", line 169, in __call__
        planned_simulationsss = self.plan(sources, smiles)
      File "/Users/rodrigo/repos/pyscreener/pyscreener/docking/screen.py", line 243, in plan
        planned_simulationsss = [
      File "/Users/rodrigo/repos/pyscreener/pyscreener/docking/screen.py", line 244, in <listcomp>
        [
      File "/Users/rodrigo/repos/pyscreener/pyscreener/docking/screen.py", line 245, in <listcomp>
        [
      File "/Users/rodrigo/repos/pyscreener/pyscreener/docking/screen.py", line 246, in <listcomp>
        replace(
      File "/Users/rodrigo/software/anaconda3/envs/pyscreener_env/lib/python3.8/dataclasses.py", line 1264, in replace
        raise TypeError("replace() should be called on dataclass instances")
    TypeError: replace() should be called on dataclass instances
    (prepare_receptors pid=47498) Traceback (most recent call last):
    (prepare_receptors pid=47498)   File "/Users/rodrigo/ADFRsuite-1.0/CCSBpckgs/AutoDockTools/Utilities24/prepare_receptor4.py", line 9, in <module>
    (prepare_receptors pid=47498)     from MolKit import Read
    (prepare_receptors pid=47498) ImportError: No module named MolKit
    (prepare_receptors pid=47498) 
    (prepare_receptors pid=47498) ERROR: failed to convert "integration-tests/inputs/5WIU.pdb"
    
    opened by rvhonorato 3
  • json.decoder.JSONDecodeError

    After installing pyscreener, I tried the pyscreener-check SCREEN_TYPE METADATA_TEMPLATE command but encountered the following problem:

    Traceback (most recent call last):
      File "/data/yangziyi/anaconda3/envs/Relation/bin/pyscreener-check", line 8, in <module>
        sys.exit(check())
      File "/data/yangziyi/anaconda3/envs/Relation/lib/python3.8/site-packages/pyscreener/main.py", line 13, in check
        ps.check_env(sys.argv[1], json.loads(sys.argv[2]))
      File "/data/yangziyi/anaconda3/envs/Relation/lib/python3.8/json/__init__.py", line 357, in loads
        return _default_decoder.decode(s)
      File "/data/yangziyi/anaconda3/envs/Relation/lib/python3.8/json/decoder.py", line 337, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/data/yangziyi/anaconda3/envs/Relation/lib/python3.8/json/decoder.py", line 355, in raw_decode
        raise JSONDecodeError("Expecting value", s, err.value) from None
    json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

    opened by yangziyi1990 2
  • [FEATURE]: retrieve files for given molecules

    Is your feature request related to a problem? Please describe. Some users have noted that they would like a method by which to programmatically retrieve the in/out files corresponding to a specific compound. They can currently do this themselves by calling VirtualScreen.collect_all() and manually mapping the names of each compound to the collected files and unzipping only those. This is cumbersome.

    Desired solution/workflow

    import numpy as np

    # vs: VirtualScreen
    # smis: List[str]
    y = vs(smis)
    best_cpd = smis[np.argmax(y)]
    pose_file = vs.get_poses(best_cpd)
    

    Thoughts

    • could return the poses as an array of shape p x n x 3, but that would require additional, custom PDBQT parsing code (MDAnalysis doesn't parse multiple models from a single pdbqt, to my understanding)
    • logically a function of the VirtualScreen?
    • should support single or batch retrieval, batch-specific code to come later
    enhancement 
    opened by davidegraff 0
  • Data transfer

    opened by mburlage 2
  • Joss paper

    opened by davidegraff 1
Releases (latest: v1.2.0)
  • v1.2.0 (Mar 17, 2022)

    The main thrust of this release is new support for DOCK6 receptor and docking parameters. The remainder of the changes are mostly logical improvements in the codebase that will (unfortunately) break some client code. The CalculationData class has been renamed to Simulation, and the CalculationRunner.prepare_and_run() methods now return the Result object rather than the full Simulation (as was done previously). This will speed up task distribution for virtual screening purposes, but it loses the added information of the prepared_ligand and prepared_receptor filenames (which will be lost forever if utilizing the VirtualScreen interface). We'll look into adding this information back, but we don't think any users were actually taking advantage of it.

    Additionally, a full VirtualScreen used to crash when handed an invalid molecule. Users could always prevent this by pre-filtering the molecules they pass in, but now the returned score array handles them silently and places nan values at their positions. Note that nan could also indicate that the molecule failed during the simulation itself, which means it could be attempted again and possibly succeed (about 1% of molecules randomly fail). Thus, a nan can now mean either of these two things. If this distinction is important, you can do the following:

    import numpy as np
    from rdkit import Chem
    import pyscreener as ps
    
    # smis: List[str] (must be indexable, since smis[i] is used below)
    vs = ps.virtual_screen(...)
    s = vs(smis)
    failed_idxs = np.arange(len(s))[np.isnan(s)]
    invalid_mol_idxs = set(i for i in failed_idxs if Chem.MolFromSmiles(smis[i]) is None)
    failed_sim_idxs = set(failed_idxs) - invalid_mol_idxs
    

    Happy screening!

  • v1.1.1 (Mar 4, 2022)

    What's Changed

    • Iss/entry points hotfix by @davidegraff in https://github.com/coleygroup/pyscreener/pull/14
    • Update CI Badge by @mikemhenry in https://github.com/coleygroup/pyscreener/pull/27
    • Fix codecov badge by @mikemhenry in https://github.com/coleygroup/pyscreener/pull/28
    • Feat/versioning by @davidegraff in https://github.com/coleygroup/pyscreener/pull/30
    • Iss/linting by @davidegraff in https://github.com/coleygroup/pyscreener/pull/31
    • Feat/gh templates by @davidegraff in https://github.com/coleygroup/pyscreener/pull/32
    • add examples by @davidegraff in https://github.com/coleygroup/pyscreener/pull/33
    • fix coverage by @davidegraff in https://github.com/coleygroup/pyscreener/pull/34

    New Contributors

    • @mikemhenry made their first contribution in https://github.com/coleygroup/pyscreener/pull/27

    Full Changelog: https://github.com/coleygroup/pyscreener/compare/v1.1.0...v1.1.1

  • v1.1.0 (Oct 22, 2021)

    A few minor improvements from the initial release:

    • All *DockingRunner.prepare_from_smi() methods now use RDKit for geometry optimization instead of OpenBabel (#12)
    • SMILES strings can now be supplied to a LigandSupply, encapsulating all of your molecules in one object. Previously, SMILES strings had to be passed to a VirtualScreen separately from LigandSupply.ligands; now you can pass them to LigandSupply.__init__() via the smis keyword argument. NOTE: this change breaks backwards compatibility for positional arguments (#12)
    • The addition of entry point scripts. Pyscreener can now be invoked directly on the command line via pyscreener, rather than by cloning this repo and running python run.py. There is also an additional pyscreener-check script that checks whether your environment is configured properly for your desired screen type and input metadata (#13). You can still use the --smoke-test argument, but pyscreener-check takes positional arguments on the command line rather than flags.
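    As an illustration of RDKit-based geometry generation from SMILES, the minimal sketch below embeds a molecule in 3D and optimizes it with the MMFF force field. smi_to_3d is a hypothetical helper, and these notes do not say which embedding or force-field settings pyscreener actually uses, so treat MMFF and the fixed random seed as assumptions.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def smi_to_3d(smi: str, seed: int = 42) -> Chem.Mol:
    """Hypothetical helper: one MMFF-optimized 3D conformer from a SMILES string."""
    mol = Chem.MolFromSmiles(smi)
    if mol is None:
        raise ValueError(f"invalid SMILES: {smi!r}")
    mol = Chem.AddHs(mol)  # explicit hydrogens are needed for a sensible 3D geometry
    # distance-geometry embedding; returns the conformer id (0) on success, -1 on failure
    if AllChem.EmbedMolecule(mol, randomSeed=seed) != 0:
        raise RuntimeError(f"3D embedding failed for {smi!r}")
    AllChem.MMFFOptimizeMolecule(mol)  # force-field geometry optimization in place
    return mol


mol = smi_to_3d("CCO")
pdb_block = Chem.MolToPDBBlock(mol)  # a ligand-preparation step could write this to disk
```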

    What's Changed

    • Feature/smiles input lig supply by @davidegraff in https://github.com/coleygroup/pyscreener/pull/12
    • Feature/entry point scripts by @davidegraff in https://github.com/coleygroup/pyscreener/pull/13
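    The entry point scripts expose pyscreener as ordinary console commands. As a rough sketch of how a command taking positional arguments, like pyscreener-check SCREEN_TYPE METADATA_TEMPLATE, can be built with argparse: build_check_parser and main are hypothetical names, not pyscreener's actual code. In packaging metadata, a console-script entry point would map the command name to a function such as main (the module path is package-specific).

```python
import argparse
from typing import List, Optional


def build_check_parser() -> argparse.ArgumentParser:
    # hypothetical parser modeled on `pyscreener-check SCREEN_TYPE METADATA_TEMPLATE`
    parser = argparse.ArgumentParser(prog="pyscreener-check")
    parser.add_argument("screen_type", help="docking/simulation software to check, e.g. vina")
    parser.add_argument("metadata_template", help="name of the metadata template to check")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    # argv=None makes argparse read sys.argv, which is what an entry point would do
    args = build_check_parser().parse_args(argv)
    print(f"checking screen type {args.screen_type!r} with template {args.metadata_template!r}")
    return 0
```

    Calling main(["vina", "basic"]) parses the two positionals exactly as the shell command would; the argument values are placeholders.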

    Full Changelog: https://github.com/coleygroup/pyscreener/compare/v1.0.0...v1.1.0

  • 1.0.0 (Oct 20, 2021)

    We're happy to release the first official version of pyscreener!

    Pyscreener was initially developed as a small research tool for active learning, but over the last few months it has been completely revamped into a more fully featured and maintainable Python package for pythonic docking/simulation calls! The high-level usage of pyscreener remains the same (the core functionality defines an object that maps a SMILES string to a scalar result from a simulation), but the entire underlying object model has changed to fundamentally separate the data (i.e., simulation parameters/results) from the objects that actually conduct simulations (e.g., Runners). The parts of v1.0 that remain incomplete are mostly bookkeeping, such as testing and a robust CI.
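    The data/runner separation can be illustrated with a minimal plain-Python sketch: a frozen dataclass holds only simulation parameters, a runner is any function from (SMILES, parameters) to a score, and a screen object maps SMILES strings to scalars. SimulationParams, ToyVirtualScreen, and toy_runner are hypothetical names; a thread pool stands in for a distributed backend such as ray, and the nan-on-failure behavior mirrors the v1.2.0 notes rather than any code shown there.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class SimulationParams:
    # hypothetical "data" object: everything needed to run a simulation, no behavior
    receptor: str
    center: Tuple[float, float, float]
    size: Tuple[float, float, float]


# hypothetical "runner": a function from (SMILES, params) to a score, or None on failure
Runner = Callable[[str, SimulationParams], Optional[float]]


class ToyVirtualScreen:
    """Hypothetical screen: maps each SMILES string to a scalar (nan on failure)."""

    def __init__(self, runner: Runner, params: SimulationParams, max_workers: int = 4):
        self.runner = runner
        self.params = params
        self.max_workers = max_workers

    def _score(self, smi: str) -> float:
        try:
            score = self.runner(smi, self.params)
        except Exception:
            return math.nan  # a failed run becomes nan instead of crashing the screen
        return math.nan if score is None else score

    def __call__(self, smis: Sequence[str]) -> List[float]:
        # pool.map preserves input order, so scores line up with smis
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            return list(pool.map(self._score, smis))


def toy_runner(smi: str, params: SimulationParams) -> Optional[float]:
    # hypothetical runner: fails on empty input, otherwise returns a fake score
    if not smi:
        raise ValueError("empty SMILES")
    return -float(len(smi))


vs = ToyVirtualScreen(
    toy_runner, SimulationParams("receptor.pdb", (0.0, 0.0, 0.0), (20.0, 20.0, 20.0))
)
scores = vs(["CCO", "", "c1ccccc1"])
```

    Because the parameters are plain data, the same SimulationParams can be reused with a different runner without touching the screen.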

    No major feature updates are planned for the foreseeable future, so we hope you find this useful as-is!

  • 0.1.0 (Oct 20, 2021)

    We never formally "released" the alpha version of pyscreener that was open-sourced at the same time as MolPAL. While we believe the completely refactored version 1 is far superior to this original alpha version, we are adding a release tag for this version for the sake of posterity so that those who have developed using the old version of the codebase can still access it.
