E2EDNA2 - An automated pipeline for simulation of DNA aptamers complexed with small molecules and short peptides

E2EDNA 2.0 - OpenMM Implementation of E2EDNA !

An automated pipeline for simulation of DNA aptamers complexed with small molecules and short peptides.

Michael Kilgour, Tao Liu, Ilya S. Dementyev, Lena Simine

mjakilgour gmail com

For original version of E2EDNA: J. Chem. Inf. Model. 2021, 61, 9, 4139–4144 (https://doi.org/10.1021/acs.jcim.1c00696) https://github.com/InfluenceFunctional/E2EDNA

Installation

This installation path has been tested on macOS and it relies on conda and pip package managers.

  1. Download the E2EDNA 2.0 package from this repository.
  2. Register and download NUPACK from http://www.nupack.org/downloads. You will need the path to the ~/nupack-###/source/package directory.
  3. In the E2EDNA2 directory, edit the macos_installation.sh script to update the path to NUPACK (see step 2).
  4. From the E2EDNA2 folder, run macos_installation.sh.
    • Caveat: if the conda activate e2edna command gives an error, or if the e2edna environment has not been activated after the script finishes, either replace the activation command with source activate /path-to-env/e2edna,
    • or copy the commands from the script, unmodified, to the command line and run them one by one. This works around the unconfigured-shell issue.
  5. Register and download MMB from https://simtk.org/projects/rnatoolbox . Place the Installer### folder into the e2edna folder. NB: Do not set DYLD_LIBRARY_PATH, contrary to the recommendation of the MMB installation guide. This avoids interference with the OpenMM module.
  6. Update 3 paths in main.py:
 params['workdir'] = '/path-to-e2edna/localruns'                         # working directory   
       
 params['mmb dir'] = '/path-to-e2edna/e2edna/Installer.###/lib'          # path to MMB dylib files
      
 params['mmb']     = '/path-to-e2edna/Installer.###/bin/MMB-executable'  # path to MMB executable    

Running a job

Quickstart

  • Set 'params' in main.py, as indicated in "Installation".
  • Run the bash script automate_tests.sh to test all 8 modes automatically.
  • Alternatively, a single run can be carried out by specifying run_num, mode, aptamer sequence, and the ligand's structure file. For example,
python main.py --run_num=1 --mode='free aptamer' --aptamerSeq='TAATGTTAATTG' --ligand='False' --ligandType='' --ligandSeq=''
python main.py --run_num=2 --mode='full dock' --aptamerSeq='TAATGTTAATTG' --ligand='YQTQ.pdb' --ligandType='peptide' --ligandSeq='YQTQTNSPRRAR'
    
# --ligand='False'        # if no ligand. --ligandType and --ligandSeq will be ignored.
# --ligandType='peptide'  # or 'DNA' or 'RNA' or 'other'. Assuming 'other' ligand can be described by Amber14 force field.
# --ligandSeq=''          # if no sequence. For instance, when ligandType is 'other'

Functionality: Eight different modes of operation

E2EDNA 2.0 takes in a DNA aptamer sequence in FASTA format, and optionally a short peptide or other small molecule, and returns details of the aptamer structure and binding behaviour. This code implements several distinct analysis modes so users may customize the level of computational cost and accuracy.

  • 2d structure → returns NUPACK or seqfold analysis of aptamer secondary structure. Very fast, O(<1s). If using NUPACK, includes probability of observing a certain fold and of suboptimal folds within kT of the minimum.
  • 3d coarse → returns MMB fold of the best secondary structure. Fast O(5-30 mins). Results in a strained 3D structure which obeys base pairing rules and certain stacking interactions.
  • 3d smooth → identical to '3d coarse', followed by a short MD relaxation in solvent. Less than double the cost of '3d coarse', depending on relaxation time.
  • coarse dock → uses the 3D structure from '3d coarse' as the initial condition for a LightDock simulation, and returns best docking configurations and scores. Depending on docking parameters, adds O(5-30mins) to '3d coarse'.
  • smooth dock → identical to 'coarse dock', instead using the relaxed structure from '3d smooth'. Similar cost.
  • free aptamer → fold the aptamer in MMB and run extended MD sampling to identify a representative, equilibrated 2D and 3D structure. Slow O(hours).
  • full dock → returns best docking configurations and scores from a LightDock run using the fully equilibrated aptamer structure from 'free aptamer'. Similar cost (LightDock is relatively cheap).
  • full binding → same steps as 'full dock', with a follow-up extended MD simulation of the best binding configuration. Slowest, O(hours).
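
As a rough illustration of how the eight modes build on one another, the list above can be written as a lookup table from mode name to pipeline stages. This is only a sketch: the stage names and the helper functions are hypothetical labels taken from the descriptions above, not identifiers from the E2EDNA code base.

```python
# Hypothetical stage labels derived from the mode descriptions above;
# the real pipeline may name or split these steps differently.
PIPELINE_STAGES = {
    "2d structure": ["2d analysis"],
    "3d coarse": ["2d analysis", "mmb fold"],
    "3d smooth": ["2d analysis", "mmb fold", "short md relaxation"],
    "coarse dock": ["2d analysis", "mmb fold", "lightdock"],
    "smooth dock": ["2d analysis", "mmb fold", "short md relaxation", "lightdock"],
    "free aptamer": ["2d analysis", "mmb fold", "extended md sampling"],
    "full dock": ["2d analysis", "mmb fold", "extended md sampling", "lightdock"],
    "full binding": ["2d analysis", "mmb fold", "extended md sampling",
                     "lightdock", "complex md sampling"],
}


def stages_for(mode):
    """Return the ordered list of stages for a mode, or raise a clear error."""
    try:
        return PIPELINE_STAGES[mode]
    except KeyError:
        valid = ", ".join(repr(name) for name in PIPELINE_STAGES)
        raise ValueError(f"Unknown mode {mode!r}; choose from {valid}") from None


def needs_ligand(mode):
    """A mode needs a ligand structure only if it includes a docking stage."""
    return "lightdock" in stages_for(mode)
```

Reading the table row by row makes the cost ordering visible: each docking or binding mode is a strict extension of one of the structure-only modes.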

Test run: inputs and outcomes

Running the script automate_tests.sh automatically runs simple, very light simulations of all 8 modes. Here we explain what outputs to look for and what success looks like.

  • Mode 1: 2d structure

Input: fasta sequence, e.g., CGCGCGCGCGCGC

Outputs:

Success evaluation: observe the dot-parenthesis representation for 2d structure, e.g., ..(...)..

  • Mode 2: 3d coarse

Input: ‘3d unrefined’, fasta sequence, e.g, CGCGCGCGCGCGC

Outputs:

Success evaluation: Visualize foldedAptamer_0.pdb in VMD or PyMOL

  • Mode 3: 3d smooth

Input: fasta sequence, e.g, CGCGCGCGCGCGC

Outputs:

Success evaluation:

  • Mode 4: coarse dock

Input: fasta sequence, e.g, CGCGCGCGCGCGC; PDB of target ligand, e.g., ‘target.pdb’

Outputs:

Success evaluation:

  • Mode 5: smooth dock

Input: fasta sequence, e.g., CGCGCGCGCGCGC; PDB of target ligand, e.g., ‘target.pdb’

Outputs:

Success evaluation:

  • Mode 6: free aptamer

Given a DNA sequence, its secondary structure is predicted and represented as a contact map in dot-parenthesis notation. Guided by the predicted secondary structure, the sequence is then folded into an initial three-dimensional conformation. The last step runs a molecular dynamics simulation to sample the conformational space and find the representative conformation from the MD trajectory. (if we ask contact predictor for >1 ssStructure)

Input: fasta sequence, e.g., CGCGCGCGCGCGC

Modifications to the code: set
 params['mode'] = 'free aptamer'
 params['sequence'] = 'CGCGCGCGCGCGC'

Outputs:
    • Secondary structure prediction: such as ((....))....((.(...).)).. in “record.txt”
    • MMB folded structure: “foldedAptamer_0.pdb”
    • MD simulation:
        Binary trajectory: “clean_foldedAptamer_0_processed_complete_trajectory.dcd”
        Topology: “clean_foldedAptamer_0_processed.pdb”
        Representative conformation: “repStructure_0.pdb”

Success evaluation: The DCD trajectory file is generated, and “log.txt” shows that MD sampling of the free aptamer is 100% complete. Visualize the MD trajectory of the free aptamer using the topology and binary trajectory files. Visualize the representative conformation of the DNA aptamer.

  • Mode 7: full dock

Given a DNA sequence, its secondary structure is predicted and represented as a contact map in dot-parenthesis notation. Guided by the predicted secondary structure, the sequence is then folded into an initial three-dimensional conformation. Next, a molecular dynamics simulation samples the conformational space and finds the representative conformation from the MD trajectory. Finally, the target ligand of interest is docked to the representative structure (the ligand structure must be provided as a PDB file).

Input: fasta sequence, e.g., CGCGCGCGCGCGC; PDB of target ligand, e.g., ‘target.pdb’

Modifications to the code: set
 params['mode'] = 'full dock'
 params['sequence'] = 'CGCGCGCGCGCGC'
 params['target'] = 'target.pdb' # need to update the code for this.

Outputs:
    • Secondary structure prediction: such as ((....))....((.(...).)).. in “record.txt”
    • MMB folded structure: “foldedAptamer_0.pdb”
    • MD simulation:
        Binary trajectory: “clean_foldedAptamer_0_processed_complete_trajectory.dcd”
        Topology: “clean_foldedAptamer_0_processed.pdb”
        Representative conformation: “repStructure_0.pdb”
    • Docking: aptamer-ligand complex structure: “top_1.pdb”. The docking score is in “record.txt”.

Success evaluation: The DCD trajectory file is generated, and “log.txt” shows that MD sampling of the free aptamer is 100% complete. Visualize the MD trajectory of the free aptamer using the topology and binary trajectory files. Visualize the representative conformation of the DNA aptamer. Visualize the aptamer-ligand complex structure.

  • Mode 8: full binding

Given a DNA sequence, its secondary structure is predicted and represented as a contact map in dot-parenthesis notation. Guided by the predicted secondary structure, the sequence is then folded into an initial three-dimensional conformation. Next, a molecular dynamics simulation samples the conformational space and finds the representative conformation from the MD trajectory. The target ligand of interest is then docked to the representative structure (the ligand structure must be provided as a PDB file). Finally, the aptamer-ligand complex is sampled by MD simulation to investigate its dynamics.

Input: fasta sequence, e.g., CGCGCGCGCGCGC; PDB of target ligand, e.g., ‘target.pdb’

Modifications to the code: set
 params['mode'] = 'full binding'
 params['sequence'] = 'CGCGCGCGCGCGC'
 params['target'] = 'target.pdb' # need to update the code for this.

Outputs:
    • Secondary structure prediction: such as ((....))....((.(...).)).. in “record.txt”
    • MMB folded structure: “foldedAptamer_0.pdb”
    • MD simulation of free aptamer:
        Binary trajectory: “clean_foldedAptamer_0_processed_complete_trajectory.dcd”
        Topology: “clean_foldedAptamer_0_processed.pdb”
        Representative conformation: “repStructure_0.pdb”
    • Docking: aptamer-ligand complex structure: “top_1.pdb”. The docking score is in “record.txt”.
    • MD simulation of aptamer-ligand complex:
        Binary trajectory: “clean_complex_0_0_processed_trajectory.dcd”
        Topology: “clean_complex_0_0_processed.pdb”

Success evaluation: “log.txt” shows that MD sampling of the free aptamer is 100% complete, and the DCD trajectory file is generated. Visualize the MD trajectory of the free aptamer using the topology and binary trajectory files. Visualize the representative conformation of the DNA aptamer. Visualize the aptamer-ligand complex structure. The DCD trajectory file for the complex is generated, and “log_complex.txt” shows that MD sampling of the aptamer-ligand complex is 100% complete. Visualize the MD trajectory of the aptamer-ligand complex using its binary trajectory and topology files. Note that the aptamer may appear far from the target ligand, which can be an artifact of the periodic boundary conditions. Open question: should the pipeline correct this, or leave it to the user?
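
The periodic-boundary remark can be checked by hand with the minimum-image convention: a separation that looks large in raw coordinates may be small once the box is wrapped. The sketch below assumes an orthorhombic box and uses made-up coordinates; the function name is hypothetical and is not part of E2EDNA.

```python
def minimum_image_displacement(pos_a, pos_b, box_lengths):
    """Displacement from pos_a to pos_b using the nearest periodic image.

    Assumes an orthorhombic (rectangular) box; all values share one length unit.
    """
    displacement = []
    for a, b, length in zip(pos_a, pos_b, box_lengths):
        delta = b - a
        # Shift by a whole number of box lengths so that |delta| <= length / 2.
        delta -= length * round(delta / length)
        displacement.append(delta)
    return displacement


# Illustrative values only: two sites near opposite faces of a 5 nm box.
aptamer_site = (0.2, 2.5, 2.5)
ligand_site = (4.8, 2.5, 2.5)
box = (5.0, 5.0, 5.0)
dx, dy, dz = minimum_image_displacement(aptamer_site, ligand_site, box)
print(f"raw separation along x: {ligand_site[0] - aptamer_site[0]:.1f} nm")
print(f"minimum-image separation along x: {abs(dx):.1f} nm")
```

In this made-up case the two sites are 4.6 nm apart in raw coordinates but only 0.4 nm apart across the periodic boundary, which is the situation the note above describes.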

The MD simulation might stop at the onset with a “Particle coordinate is nan” error. This could be due to the energy minimization being too aggressive, so that coordinates move out of bounds and the integrator cannot work with those nonsensical values. In this case, re-run the pipeline.

MMB folding can take a while if a tricky sequence requires multiple refolding attempts.
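
The dot-parenthesis strings written to “record.txt” can be turned into explicit base-pair indices with a small stack-based parser, which helps when checking a predicted fold by eye. This is a minimal sketch, not code from E2EDNA; the function name is hypothetical, and it handles only plain '(', ')' and '.' symbols (no pseudoknot brackets).

```python
def parse_dot_bracket(structure):
    """Return a sorted list of 0-based (i, j) index pairs for paired bases."""
    open_positions = []  # stack of indices of unmatched '('
    pairs = []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(index)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"Unmatched ')' at position {index}")
            pairs.append((open_positions.pop(), index))
        elif symbol != ".":
            raise ValueError(f"Unexpected symbol {symbol!r} at position {index}")
    if open_positions:
        raise ValueError(f"Unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


print(parse_dot_bracket(".(((....)))."))  # [(1, 10), (2, 9), (3, 8)]
```

For a 12-base hairpin such as .(((....)))., the three pairs form a stem closing a four-base loop, with one unpaired base at each end.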

__ work in progress__

Physical Parameters

The default force field is AMBER 14. Other AMBER force fields and explicit water models are trivial to implement. Implicit water requires building systems from AMBER prmtop files. CHARMM may also be easily implemented, but hasn't been tested. AMOEBA 2013 parameters do not include nucleic acids, and AMOEBABIO18 parameters are not implemented in OpenMM.

* params['force field'] = 'AMBER'
* params['water model'] = 'tip3p'

Default parameters here - for guidance on adjustments start here.

params['box offset'] = 1.0 # nanometers
params['barostat interval'] = 25
params['friction'] = 1.0 # 1/picosecond
params['nonbonded method'] = PME
params['nonbonded cutoff'] = 1.0 # nanometers
params['ewald error tolerance'] = 5e-4
params['constraints'] = HBonds
params['rigid water'] = True
params['constraint tolerance'] = 1e-6
params['pressure'] = 1 

Increasing the hydrogen mass, e.g., to 4 amu, enables longer time steps of up to ~3-4 fs. See documentation for details.

params['hydrogen mass'] = 1.0 # in amu
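
For intuition, hydrogen mass repartitioning usually moves mass from a heavy atom onto its bonded hydrogens so that the total mass of the group is unchanged. The arithmetic is sketched below under that assumption; the function is hypothetical. OpenMM's ForceField.createSystem accepts a hydrogenMass argument that performs this kind of repartitioning, but whether E2EDNA forwards params['hydrogen mass'] to it is an assumption here.

```python
def repartition_hydrogen_mass(heavy_mass, n_hydrogens,
                              hydrogen_mass=1.008, target_hydrogen_mass=4.0):
    """Return (new_heavy_mass, new_hydrogen_mass) in amu with total mass conserved."""
    transferred = (target_hydrogen_mass - hydrogen_mass) * n_hydrogens
    new_heavy_mass = heavy_mass - transferred
    if new_heavy_mass <= 0:
        raise ValueError("Target hydrogen mass is too large for this heavy atom")
    return new_heavy_mass, target_hydrogen_mass


# Example: a methyl carbon (12.011 amu) bonded to three hydrogens.
carbon, hydrogen = repartition_hydrogen_mass(12.011, 3)
print(f"carbon: {carbon:.3f} amu, each hydrogen: {hydrogen:.3f} amu")
print(f"group total: {carbon + 3 * hydrogen:.3f} amu")
```

The example also shows why the target mass cannot be raised arbitrarily: a heavy atom with several hydrogens loses mass quickly.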

Temperature, pH and ionic strength are taken into account for 2D folding in NUPACK, ion concentration in MD simulation, and protonation of molecules for MD (safest near 7-7.4).

params['temperature'] = 310 # Kelvin - used to predict secondary structure and for MD thermostatting
params['ionic strength'] = .163 # mol/L (163 mM) - used to predict secondary structure and add ions to simulation box
params['pH'] = 7.4 # simulation will automatically protonate the peptide up to this pH
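
To get a feel for what the ionic strength setting means in the simulation box, the expected number of salt ion pairs can be estimated from concentration times volume. The sketch assumes the value is a molar concentration of a monovalent salt and that the box is cubic; the function name and the 6 nm box edge are illustrative, and the pipeline's own ion placement (including any neutralizing counterions) is not reproduced here.

```python
AVOGADRO = 6.02214076e23       # particles per mole
LITERS_PER_CUBIC_NM = 1e-24    # 1 nm^3 expressed in liters


def ion_pairs_for_box(ionic_strength_molar, box_edge_nm):
    """Estimate the number of monovalent salt ion pairs in a cubic box."""
    volume_liters = box_edge_nm ** 3 * LITERS_PER_CUBIC_NM
    return round(ionic_strength_molar * volume_liters * AVOGADRO)


# Illustrative box size only: 0.163 mol/L in a 6 nm cube gives about 21 ion pairs.
print(ion_pairs_for_box(0.163, 6.0))
```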

The peptide backbone constraint constant is the constant used to constrain backbone dihedrals. A minimum of 10000, as it is currently set, is recommended for good constraints (deviations < 5° were always seen with this value). For more info, please read README_CONSTRAINTS.md.

params['peptide backbone constraint constant'] = 10000

Implicit Solvent

params['implicit solvent'] = True
if params['implicit solvent']:
    params['implicit solvent model'] = OBC1  # only meaningful if implicit solvent is True
    params['leap template'] = 'leap_template.in'
    # TODO add more options to params: implicitSolventSaltConc, soluteDielectric, solventDielectric, implicitSolventKappa

Starting with a folded DNA aptamer structure (instead of just a FASTA sequence)

params['skip MMB'] = True  # it will skip '2d analysis' and 'do MMB'
if params['skip MMB'] is True:
    params['folded initial structure'] = 'foldedSequence_0.pdb'  # if wishing to skip MMB, must provide a folded structure

Comments
  • JOSS Review

    Hi all,

    Thanks for the invitation to review and congrats on the submission.

    The general idea behind this submission is sound, and follows up on a 2021 publication from the same authors on E2EDNA v1.0, published in JCIM. From my understanding, the code is essentially a re-write to use OpenMM instead of Tinker as the MD engine. While this is valuable - makes it simpler to install/run - the authors do not realize, in my opinion, this change to its fullest potential. The authors' repository is not so much a "package" in the traditional sense, but more of a collection of scripts that automate a certain rigid protocol. I would rather see, for instance, NUPACK being an optional dependency - as a user, I could simply provide my own DNA molecules instead of being forced to use NUPACK. In this sense, I think this repository could use more work to stand out on its own compared to last year's publication.

    In addition to these comments, I have a general comment on the repository itself. The authors should take some time to clean up files that are no longer useful for the protocol or that are simply part of the development workflow. Folders named old, or IDE config folders (.idea) should not be part of a published version of the repository, especially when they are even marked to be ignored in the .gitignore file. Same with the existence of both a requirements.txt file and an environment.yml file, whereas only the latter is used. As such, I believe that the authors should spend some time cleaning up the repository and setting up a more "traditional" structure to help potential users navigate through their code base more easily.

    Further, I have a few starter questions about the manuscript, code, and licenses that I think should be clarified. Hopefully these will help the authors improve their work and repository/code.

    Licenses

    • You're licensing the tool under the Apache license but you are including data (parameter sets) that falls under a different license. In particular, I see the parameter files for the Amoeba forcefield taken from Tinker/OpenMM almost verbatim. Did you check with the appropriate developers if this sharing of the forcefield parameter files is allowed under their license, without any attribution?

    Installation

    • The installation process is quite complex. As a user, I'd have to register and download NUPACK and MMB, as well as edit a series of files in order to get a functional installation. This is simply a suggestion for the developers to keep in mind.

    • Related to the point above, have the authors considered using conda directly to install their software, instead of a custom shell script? pdbfixer is available as a conda package, and you could specify pip packages there too, e.g. lightdock. The installation could be reduced to a simple: 1) install nupack 2) install mmb 3) run conda env create -f e2edna-env.yml.

    • On this last point, the authors should strip the granular version of the env yaml file otherwise conda will struggle with versions on anything but the authors' hardware.

    • According to the README, the code is only tested on MacOS, although I'd imagine the most use would be on a compute cluster running Linux. Have the authors tried running their code on Linux?

    Misc

    • In several sections of their documentation, the authors mention "OpenDNA". Was this the previous name of this package?
    • It would be greatly beneficial for a user to have config files with installation paths, simulation settings etc, instead of having to edit source code. Would the authors be open to this change?

    Comments on the Manuscript

    • In the "Statement of Need", the authors mention an "all-python" package several times. Being pedantic, this is not entirely true as their code relies on quite some compiled code in their dependencies (lightdock, openmm).
    opened by JoaoRodrigues 12
  • Feature Request: Argument parsing

    Hello,

    Would you be interested in more fully utilizing command-line argument parsing (e.g. using argparse)? I always feel a bit uncomfortable having to edit source code to use a program. It would be great if you could set the parameters strictly from the CL at runtime, such as workdir, mmb dir, and mmb, instead of editing main.py which is tracked by git.

    Additionally, using argparse would give the opportunity to provide a very helpful user interface. For instance, the user could run: python main.py --help to get a help message explaining what their options are.

    enhancement 
    opened by schackartk 10
  • 7 feature request argument parsing

    Overview

    This pull request implements argparse so that the user is less likely to need to edit source code in main.py. However, more work will need to be done to include parameters related to environmental conditions like pH, etc.

    Other than implementing argparse, functionality is the same. Some things are still a bit awkward because I didn't want to change too much beyond that.

    Affected files

    The following files have changes:

    • main.py: add shebang line, add argument parsing and validation
    • automate_tests.sh: update arguments to align with argparse
    • README.md: describe current functionality and arguments

    Notes

    main.py

    There were a few things that may need to be changed to work most efficiently and predictably.

    The relationship between --ligand, --ligand_type, and --ligand_seq is a bit complex and can probably be improved. Ideally, I think --ligand would be optional, yielding a default of None. This makes more sense than having to use --ligand False. Then --ligand_type and --ligand_seq could also be optional with a default of None (instead of an empty string). Only when --ligand is present would you validate that the others are there, and if not, call parser.error(). I also think the authors should consider whether --ligand_seq is truly required if --ligand_type is either 'peptide', 'DNA' or 'RNA'. Currently this is enforced (by parser.error()), but if it is actually optional, that should be updated.
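
    A minimal sketch of the optional-ligand idea described above, using only argparse from the standard library. The function names and help strings are hypothetical, and the sketch deliberately leaves --ligand_seq optional, since whether it should be required is the open question raised here.

```python
import argparse


def build_parser():
    """Parser in which all ligand arguments default to None."""
    parser = argparse.ArgumentParser(description="Sketch of optional ligand arguments")
    parser.add_argument("--ligand", default=None, metavar="PDB",
                        help="ligand structure file; omit for no ligand")
    parser.add_argument("--ligand_type", default=None,
                        choices=["peptide", "DNA", "RNA", "other"],
                        help="required only when --ligand is given")
    parser.add_argument("--ligand_seq", default=None, metavar="SEQ",
                        help="ligand sequence, if any")
    return parser


def get_args(argv=None):
    """Parse arguments and validate the ligand options together."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Validate dependent options only when a ligand was actually supplied.
    if args.ligand is not None and args.ligand_type is None:
        parser.error("--ligand_type is required when --ligand is given")
    return args
```

    With this layout, running without any ligand flags yields ligand=None, and supplying --ligand without --ligand_type exits with a usage message rather than failing later in the pipeline.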

    I left the code that uses different params based on whether it is run as local or cluster, but I am not sure if it is necessary. I especially think that the hard-coded paths used when it is cluster should be removed, and turned into arguments. In which case, it is the same as the usual arguments, and may make --device obsolete if there is no difference between local and cluster.

    I implemented wildcards to help the user find their MMB paths (lib and executable) within the --mmb_dir and --mmb. I am hoping the defaults will make it so users don't have to change this argument.

    I removed the operating system argument and instead used platform to detect it. This new implementation has only been tested on my WSL system, so please check this works. One issue is if the result of platform.system().lower() doesn't match an expected value on mac. Initially mine returned Linux, which is why I ran lower() to make it 'linux' which is compatible with the previous implementation.

    Lots of argument validation now happens in get_args(), so hopefully more helpful error messages are produced.

    I added a feature so that both --aptamer and --ligand_seq can be names of files. In that case, the file contents are read in and used as the sequences. Literal strings can still be used instead of file names.
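
    The file-or-literal behavior can be sketched roughly as follows. This is an illustration rather than the code in the pull request: the function name is hypothetical, and skipping lines that start with ">" (FASTA headers) is an added assumption.

```python
import os


def read_sequence(value):
    """Return a sequence from a file if `value` is an existing path, else `value` itself."""
    if os.path.isfile(value):
        with open(value) as handle:
            # Assumption: drop FASTA header lines and join the remaining lines.
            lines = [line.strip() for line in handle if not line.startswith(">")]
        return "".join(lines)
    return value.strip()
```

    One consequence of this design is that a literal sequence which happens to match a file name in the current directory would be read as a file, so distinct names are advisable.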

    Readme.md

    I hope my additions are helpful in describing the current functionality.

    One thing I was uncertain about is the description of ligand type saying "(default: Amber14)". I didn't see this anywhere that params were set. It is not the default for any arguments I set up. If this needs to be a default, please take note of this.

    Conclusions

    Currently, all modes in automate_tests.sh run for me, so it seems that these changes are compatible. It would be great to have unit and integration tests with pytest to confirm.

    Please check that it works on MacOS still, as I have only tested on WSL.

    No additional dependencies have been added, only core libraries were used.

    Please feel free to make any changes you see fit or discuss!

    enhancement 
    opened by schackartk 7
  • Question: GPL-3.0 license required for this repo because of lightdock?

    Hello @brianjimenez - Hope this message finds you well.

    I am trying to figure out what license is the best choice for our E2EDNA 2.0 software and am aware that LightDock is licensed under GNU GPLv3. According to the license guide website (link) provided by GitHub, the GNU GPLv3 seems to require "larger works using a licensed work" to be under the same license. Currently our E2EDNA 2.0 is under Apache-2 license which does not include the condition of "same license". In my opinion, Apache-2 license could give some flexibility because a future version of the E2EDNA software may provide multiple options of different auto-docker package.

    A little summary of how LightDock is used in E2EDNA 2.0 now: lightdock-0.9.2 is installed by pip and the python scripts such as lightdock3.py are directly called without modification. Does our way of using LightDock fall into the category where we can only choose GNU GPLv3 for our E2EDNA 2.0? I am not sure of this question therefore would like to hear the LightDock developer's opinions.

    Thank you very much!

    question 
    opened by taoliu032 4
  • Lightdock Rust nucleic support

    Dear E2EDNA2 developers,

    Since you are using LightDock in parts of your pipeline, the 0.2.0 release of the Rust implementation of the framework may be of interest. This new release adds support for protein-nucleic complex prediction, typically runs 5x-6x faster than the Python+C implementation of LightDock, and uses two orders of magnitude less memory. There is more information on how to compile and use the Rust version here.

    Hope it helps!

    enhancement 
    opened by brianjimenez 3
  • Enhancement: Avoid runtime exception when "run" folder exists

    If the output directory for the current run already exists, right now an exception is produced:

    Start automating tests one by one...
    ====================================
    TESTING MODE #1: '2d structure'
    Traceback (most recent call last):
      File "main.py", line 229, in <module>
        opendna = opendna(params)  # instantiate the class
      File "/home/ken/personal/E2EDNA2/opendna.py", line 53, in __init__
        self.setup()  # if we don't need a workdir & MMB files (eg, give a 3D structure), don't make one.
      File "/home/ken/personal/E2EDNA2/opendna.py", line 147, in setup
        os.mkdir(self.workDir)
    FileExistsError: [Errno 17] File exists: '/home/ken/personal/E2EDNA2/localruns/run1'
    
    END OF TEST #1. Results are saved to folder "run1", where:
            2d structure: in record.txt
    

    An exception could be avoided by validating that the output directory does not exist, and providing a useful message such as "The output directory for this run already exists at './localrun/run1'", and an optional -f/--force flag could be provided to overwrite the output directory.
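
    One way this could look is sketched below. The function name and the force parameter are hypothetical; the run-folder naming follows the {workdir}/run{run_num} layout visible in the traceback.

```python
import os
import shutil


def prepare_run_dir(workdir, run_num, force=False):
    """Create {workdir}/run{run_num}, refusing to overwrite unless force is set."""
    run_dir = os.path.join(workdir, f"run{run_num}")
    if os.path.isdir(run_dir):
        if not force:
            raise FileExistsError(
                f"The output directory for this run already exists at '{run_dir}'. "
                "Use -f/--force to overwrite it."
            )
        shutil.rmtree(run_dir)  # destructive: removes the previous run's results
    os.makedirs(run_dir)  # also creates workdir itself if it is missing
    return run_dir
```

    Because os.makedirs creates missing parent directories, the same helper would also cover the case where the working directory itself does not exist yet.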

    opened by schackartk 2
  • Bug: Runtime exception when params['workdir'] does not exist

    When the directory in the variable params['workdir'] does not exist, the program fails at runtime:

    Traceback (most recent call last):
      File "main.py", line 229, in <module>
        opendna = opendna(params)  # instantiate the class
      File "/home/ken/personal/E2EDNA2/opendna.py", line 53, in __init__
        self.setup()  # if we don't need a workdir & MMB files (eg, give a 3D structure), don't make one.
      File "/home/ken/personal/E2EDNA2/opendna.py", line 147, in setup
        os.mkdir(self.workDir)
    FileNotFoundError: [Errno 2] No such file or directory: '/home/ken/personal/E2EDNA2/localruns/run1'
    

    This could be fixed by checking for the directory, and creating it if it does not exist:

    if not os.path.isdir(params['workdir']):
        os.mkdir(params['workdir'])
    
    opened by schackartk 2
  • Error: 'str' object is not callable; in opendna.py, line 535

    Hello,

    I am excited to try out this tool!

    I have installed all dependencies successfully (I believe), and I am running the script automate_tests.sh. Most tests are passing, but tests 4, 5, 7, and 8 are failing during the docking step with the same exception.

    TESTING MODE #4: 'coarse dock'
    Starting Fresh Run 4
    Simulation mode: coarse dock
    Simulating TAATGTTAATTG with YQTQ.pdb
    Getting Secondary Structure(s)
    Running over 1 possible 2D structures.
    2D structure #0 is                              : .(((....))).
    
    Folding Aptamer from Sequence. Fold speed = quick.
    Warning: importing 'simtk.openmm' is deprecated.  Import 'openmm' instead.
    2D structure after MMB folding (from MDAnalysis): .(((....))).
    Initial fold fidelity = 1.000
    Initial fold fidelity = 1.000 (from MDAnalysis)
    Folded the aptamer and generated the folded structure: foldedAptamer_0.pdb
    
    No relaxation (smoothing) of the folded aptamer.
    
    Docking
    Traceback (most recent call last):
      File "main.py", line 230, in <module>
        opendnaOutput = opendna.run()  # retrieve binding information (eventually this should become a normalized c-number)    
      File "/home/ken/personal/E2EDNA2/opendna.py", line 297, in run
        outputDict['dock scores {}'.format(self.i)] = self.dock(self.pdbDict['representative aptamer {}'.format(self.i)], self.targetPDB)  # eg, "peptide.pdb" which can be created given peptide sequence by buildPeptide in function dock
      File "/home/ken/personal/E2EDNA2/opendna.py", line 535, in dock
        ld.run()
    TypeError: 'str' object is not callable
    

    I am unsure what the underlying problem is, but maybe it has to do with a mistake between:

    • The instance variable run on line 487 of instances.py: self.run = params['ld run']
    • The method run() on line 504 of instances.py: def run(self):

    Because the instance variable from line 487 is the string value set on line 220 in main.py: params['ld run'] = 'lightdock3.py'. Maybe this variable is somehow shadowing the method run(), and so it is failing to "call" str() (i.e. 'lightdock3.py'())?
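
    The suspected shadowing can be reproduced in isolation. In Python, an instance attribute takes precedence over a method of the same name defined on the class, so calling it invokes the string. The class names below are hypothetical stand-ins, not the actual classes in instances.py, and renaming the attribute is only one possible fix.

```python
class Shadowed:
    def __init__(self, params):
        # The instance attribute has the same name as the method below,
        # so attribute lookup finds this string first.
        self.run = params["ld run"]

    def run(self):
        return "docking"


class Fixed:
    def __init__(self, params):
        # A distinct attribute name leaves the run() method reachable.
        self.run_script = params["ld run"]

    def run(self):
        return f"would call {self.run_script}"


params = {"ld run": "lightdock3.py"}

try:
    Shadowed(params).run()
except TypeError as error:
    print(error)  # 'str' object is not callable

print(Fixed(params).run())  # would call lightdock3.py
```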

    I would appreciate any help with resolving this.

    Thank you!

    bug 
    opened by schackartk 2
  • Bug: Mysterious error when using invalid mode

    If the mode is misspelled or an invalid choice, an exception occurs:

    $ python main.py --run_num=1 --mode='fulldock' --aptamerSeq='GCGCGCGCGATATATAT' --ligand='my_ligand.pdb' --ligandType='other' --ligandSeq=''
    Traceback (most recent call last):
      File "main.py", line 229, in <module>
        opendna = opendna(params)  # instantiate the class
      File "/home/ken/personal/E2EDNA2/opendna.py", line 52, in __init__
        if self.actionDict['make workdir']:
    KeyError: 'make workdir'
    

    The exception doesn't seem to mention the invalid --mode, so the user may be confused as to what happened.

    I have confirmed that this runs fine once the mode name is corrected.

    This issue is resolved in #8 by using argparse and specifying the valid choices. Here is what is displayed from the code in that pull request:

    $ ./main.py -r 1 -m 'fulldock' -a aptamers/my_aptamer.txt -l my_ligand.pdb -t other -f
    usage: main.py [-h] [-f] -r INT -m MODE -a SEQ -l PDB [-t TYPE] [-s SEQ]
                   [-d RUN] [-p DEV] [-w DIR] [-md DIR] [-mb MMB]
    main.py: error: argument -m/--mode: invalid choice: 'fulldock' (choose from '2d structure', '3d coarse', '3d smooth', 'coarse dock', 'smooth dock', 'free aptamer', 'full dock', 'full binding')
    
    opened by schackartk 1
  • Enhancement: More control over output location

    It seems a bit restrictive to enforce that the output directory be structured as {workdir}/run{runnum}/. Most tools allow you to specify the output directory yourself.

    This could be useful to the user (myself included) for organizing runs, and automating using a workflow manager. For instance, if I am running several combinations of aptamer, ligands, and modes, I may want my output directories to be {aptamer}/{ligand}/{mode}/. This structure is meaningful to me unlike the folder name "run1".

    While this is not resolved in #8 , it would reduce the number of arguments. Instead of having both --workdir and --run_num, you could just have a single --outdir argument.

    enhancement 
    opened by schackartk 1
  • Bug: Ligand file in a folder causes exception

    If the ligand pdb file is in a folder instead of the root of the repo, an exception occurs:

    $ ls ligands/
    my_ligand.pdb
    
    $ python main.py --run_num=1 --mode='full dock' --aptamerSeq='GCGCGCGCGATATATAT' --ligand='ligands/my_ligand.pdb' --ligandType='other' --ligandSeq=''
    Starting Fresh Run 1
    Traceback (most recent call last):
      File "main.py", line 229, in <module>
        opendna = opendna(params)  # instantiate the class
      File "/home/ken/personal/E2EDNA2/opendna.py", line 53, in __init__
        self.setup()  # if we don't need a workdir & MMB files (eg, give a 3D structure), don't make one.
      File "/home/ken/personal/E2EDNA2/opendna.py", line 179, in setup
        copyfile(self.targetPDB, self.workDir + '/' + self.targetPDB)
      File "/home/ken/personal/E2EDNA2/env/lib/python3.7/shutil.py", line 121, in copyfile
        with open(dst, 'wb') as fdst:
    FileNotFoundError: [Errno 2] No such file or directory: '/home/ken/personal/E2EDNA2/localruns/run1/ligands/my_ligand.pdb'
    

    I don't see any reason that the ligand file should not be in a folder, so this should not fail.
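
    The traceback suggests that the destination path is built by appending the ligand's full relative path to the run folder. A possible fix, sketched here with a hypothetical helper name, is to keep only the file name when building the destination:

```python
import os
from shutil import copyfile


def copy_ligand_to_workdir(ligand_path, workdir):
    """Copy the ligand file into workdir, dropping any leading folders from its path."""
    destination = os.path.join(workdir, os.path.basename(ligand_path))
    copyfile(ligand_path, destination)
    return destination
```

    With this change, 'ligands/my_ligand.pdb' would be copied to '{workdir}/my_ligand.pdb'. Any later step that refers to the ligand would then need to use the returned path rather than the original argument.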

    bug 
    opened by schackartk 2
Releases(v2.0.0)
  • v2.0.0(May 16, 2022)

    This release is associated with the JOSS publication: https://doi.org/10.21105/joss.04182 The release has also been archived on Zenodo: https://doi.org/10.5281/zenodo.6546661

    Clarification: once downloaded, the archive folder is named "E2EDNA2-2.0.0". It corresponds to version v2.0.0 of E2EDNA; the name "E2EDNA2" is inherited from the repository name.

    To view the repository: https://github.com/siminegroup/E2EDNA2/tree/v2.0.0 Full Changelog: https://github.com/siminegroup/E2EDNA2/commits/v2.0.0

Owner
computational chemistry group at McGill University