A geometric deep learning pipeline for predicting protein interface contacts.

Last update: Dec 30, 2022

Overview

DeepInteract

Citing this work

If you use the code or data associated with this package, please cite:

@article{morehead2021deepinteract,
  title = {Geometric Transformers for Protein Interface Contact Prediction},
  author = {Alex Morehead and Chen Chen and Jianlin Cheng},
  year = {2021},
  eprint = {N/A},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG}
}

First time setup

The following step is required in order to run DeepInteract:

Genetic databases

This step requires aria2c to be installed on your machine.

DeepInteract needs only one of the following genetic (sequence) databases compatible with HH-suite3 to run:

Install the BFD for HH-suite3

# Following script originally from AlphaFold2 (https://github.com/deepmind/alphafold):
# Use $HOME here: "~" is not expanded inside double quotes.
DOWNLOAD_DIR="$HOME/Data/Databases"
ROOT_DIR="${DOWNLOAD_DIR}/bfd"
# Mirror of:
# https://bfd.mmseqs.com/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz.
SOURCE_URL="https://storage.googleapis.com/alphafold-databases/casp14_versions/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz"
BASENAME=$(basename "${SOURCE_URL}")

mkdir --parents "${ROOT_DIR}"
aria2c "${SOURCE_URL}" --dir="${ROOT_DIR}"
tar --extract --verbose --file="${ROOT_DIR}/${BASENAME}" \
  --directory="${ROOT_DIR}"
rm "${ROOT_DIR}/${BASENAME}"

# The CLI argument --hhsuite_db for lit_model_predict.py
# should then become '~/Data/Databases/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt'

(Smaller Alternative) Install the Small BFD for HH-suite3

# Following script originally from AlphaFold2 (https://github.com/deepmind/alphafold):
# Use $HOME here: "~" is not expanded inside double quotes.
DOWNLOAD_DIR="$HOME/Data/Databases"
ROOT_DIR="${DOWNLOAD_DIR}/small_bfd"
SOURCE_URL="https://storage.googleapis.com/alphafold-databases/reduced_dbs/bfd-first_non_consensus_sequences.fasta.gz"
BASENAME=$(basename "${SOURCE_URL}")

mkdir --parents "${ROOT_DIR}"
aria2c "${SOURCE_URL}" --dir="${ROOT_DIR}"
pushd "${ROOT_DIR}"
gunzip "${ROOT_DIR}/${BASENAME}"
popd

# The CLI argument --hhsuite_db for lit_model_predict.py
# should then become '~/Data/Databases/small_bfd/bfd-first_non_consensus_sequences.fasta'

(Smaller Alternative) Install Uniclust30 for HH-suite3

# Following script originally from AlphaFold2 (https://github.com/deepmind/alphafold):
# Use $HOME here: "~" is not expanded inside double quotes.
DOWNLOAD_DIR="$HOME/Data/Databases"
ROOT_DIR="${DOWNLOAD_DIR}/uniclust30"
# Mirror of:
# http://wwwuser.gwdg.de/~compbiol/uniclust/2018_08/uniclust30_2018_08_hhsuite.tar.gz
SOURCE_URL="https://storage.googleapis.com/alphafold-databases/casp14_versions/uniclust30_2018_08_hhsuite.tar.gz"
BASENAME=$(basename "${SOURCE_URL}")

mkdir --parents "${ROOT_DIR}"
aria2c "${SOURCE_URL}" --dir="${ROOT_DIR}"
tar --extract --verbose --file="${ROOT_DIR}/${BASENAME}" \
  --directory="${ROOT_DIR}"
rm "${ROOT_DIR}/${BASENAME}"

# The CLI argument --hhsuite_db for lit_model_predict.py
# should then become '~/Data/Databases/uniclust30/uniclust30_2018_08/uniclust30_2018_08'
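
Because --hhsuite_db takes a database prefix rather than a single file, it can help to confirm that the extracted files are in place before launching a prediction. The following is a minimal sketch, assuming the usual HH-suite3 layout in which a prefix is accompanied by _a3m, _hhm, and _cs219 files with .ffdata and .ffindex extensions. The helper name missing_hhsuite_files is hypothetical and not part of DeepInteract, and the check applies to the BFD and Uniclust30 archives rather than to the Small BFD FASTA file.

```python
from pathlib import Path

# Assumed HH-suite3 naming convention: <prefix>_{a3m,hhm,cs219}.{ffdata,ffindex}
HHSUITE_SUFFIXES = [
    f"_{kind}.{ext}"
    for kind in ("a3m", "hhm", "cs219")
    for ext in ("ffdata", "ffindex")
]


def missing_hhsuite_files(db_prefix):
    """Return the expected database files that do not exist for this prefix."""
    prefix = Path(db_prefix).expanduser()
    expected = [prefix.parent / (prefix.name + suffix) for suffix in HHSUITE_SUFFIXES]
    return [str(path) for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Prefix taken from the Uniclust30 instructions above.
    db = "~/Data/Databases/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
    missing = missing_hhsuite_files(db)
    print("Database looks complete." if not missing else f"Missing files: {missing}")
```

An empty result only means the files exist; it does not verify that the download was complete or uncorrupted.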

Repository Directory Structure

DeepInteract
│
└───docker
│
└───img
│
└───project
     │
     └───checkpoints
     │
     └───datasets
     │   │
     │   └───builder
     │   │
     │   └───CASP_CAPRI
     │   │   │
     │   │   └───final
     │   │   │   │
     │   │   │   └───processed
     │   │   │   │
     │   │   │   └───raw
     │   │   │
     │   │   casp_capri_dgl_data_module.py
     │   │   casp_capri_dgl_dataset.py
     │   │
     │   └───DIPS
     │   │   │
     │   │   └───final
     │   │   │   │
     │   │   │   └───processed
     │   │   │   │
     │   │   │   └───raw
     │   │   │
     │   │   dips_dgl_data_module.py
     │   │   dips_dgl_dataset.py
     │   │
     │   └───Input
     │   │   │
     │   │   └───final
     │   │   │   │
     │   │   │   └───processed
     │   │   │   │
     │   │   │   └───raw
     │   │   │
     │   │   └───interim
     │   │   │   │
     │   │   │   └───complexes
     │   │   │   │
     │   │   │   └───external_feats
     │   │   │   │   │
     │   │   │   │   └───PSAIA
     │   │   │   │       │
     │   │   │   │       └───INPUT
     │   │   │   │
     │   │   │   └───pairs
     │   │   │   │
     │   │   │   └───parsed
     │   │   │
     │   │   └───raw
     │   │
     │   └───PICP
     │       picp_dgl_data_module.py
     │
     └───test_data
     │
     └───utils
     │   deepinteract_constants.py
     │   deepinteract_modules.py
     │   deepinteract_utils.py
     │   dips_plus_utils.py
     │   graph_utils.py
     │   protein_feature_utils.py
     │   vision_modules.py
     │
     lit_model_predict.py
     lit_model_predict_docker.py
     lit_model_train.py
.gitignore
CONTRIBUTING.md
environment.yml
LICENSE
README.md
requirements.txt
setup.cfg
setup.py

Running DeepInteract via Docker

The simplest way to run DeepInteract is using the provided Docker script.

The following steps are required in order to ensure Docker is installed and working correctly:

- Install Docker.
- Install NVIDIA Container Toolkit for GPU support.
- Set up running Docker as a non-root user.
- Check that DeepInteract will be able to use a GPU by running:
```
docker run --rm --gpus all nvidia/cuda:11.2.2-cudnn8-runtime-ubuntu20.04 nvidia-smi
```
The output of this command should show a list of your GPUs. If it doesn't, check if you followed all steps correctly when setting up the NVIDIA Container Toolkit or take a look at the following NVIDIA Docker issue.

Now that we know Docker is functioning properly, we can begin building our Docker image for DeepInteract:

Clone this repository and cd into it.

git clone https://github.com/BioinfoMachineLearning/DeepInteract
cd DeepInteract/
DI_DIR=$(pwd)

Download the trained model checkpoint.

mkdir -p project/checkpoints
wget -P project/checkpoints https://zenodo.org/record/5546775/files/LitGINI-GeoTran-DilResNet.ckpt

Build the Docker image (Warning: Requires ~13GB of Space):
```
docker build -f docker/Dockerfile -t deepinteract .
```
Install the run_docker.py dependencies. Note: You may optionally wish to create a Python Virtual Environment to prevent conflicts with your system's Python environment.
```
pip3 install -r docker/requirements.txt
```
Create the directory in which input features and outputs will be generated:
```
mkdir -p project/datasets/Input
```
Run run_docker.py pointing to two input PDB files containing the first and second chains of a complex for which you wish to predict the contact probability map. For example, for the DIPS-Plus test target with the PDB ID 4HEQ:
```
python3 docker/run_docker.py --left_pdb_filepath "$DI_DIR"/project/test_data/4heq_l_u.pdb --right_pdb_filepath "$DI_DIR"/project/test_data/4heq_r_u.pdb --input_dataset_dir "$DI_DIR"/project/datasets/Input --ckpt_name "$DI_DIR"/project/checkpoints/LitGINI-GeoTran-DilResNet.ckpt --hhsuite_db ~/Data/Databases/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt --num_gpus 0
```
This script will generate the predicted interface contact map, as well as the Geometric Transformer's learned node and edge representations for both chain graphs, and save them to the given input directory as NumPy array files (e.g., test_data/4heq_contact_prob_map.npy).
Note that with the default --num_gpus 0 flag, run_docker.py makes the Docker container use only the system's available CPU(s) for prediction. Specifying --num_gpus 1 instead makes the container employ the first available GPU for prediction.
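
When predictions are needed for several chain pairs, the long command line above can be assembled programmatically. The sketch below is one way to do that using only the flags shown in this section; the function name build_run_docker_command is hypothetical and not part of the repository.

```python
import shlex


def build_run_docker_command(left_pdb, right_pdb, input_dir, ckpt_path,
                             hhsuite_db, num_gpus=0):
    """Assemble the argument list for docker/run_docker.py (flags as documented above)."""
    return [
        "python3", "docker/run_docker.py",
        "--left_pdb_filepath", str(left_pdb),
        "--right_pdb_filepath", str(right_pdb),
        "--input_dataset_dir", str(input_dir),
        "--ckpt_name", str(ckpt_path),
        "--hhsuite_db", str(hhsuite_db),
        "--num_gpus", str(num_gpus),
    ]


if __name__ == "__main__":
    cmd = build_run_docker_command(
        "project/test_data/4heq_l_u.pdb",
        "project/test_data/4heq_r_u.pdb",
        "project/datasets/Input",
        "project/checkpoints/LitGINI-GeoTran-DilResNet.ckpt",
        "/path/to/hhsuite_db_prefix",  # placeholder, replace with your database prefix
        num_gpus=1,
    )
    # Print the command; pass the list to subprocess.run(cmd, check=True) to launch it.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell-quoting problems when paths contain spaces.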

Running DeepInteract via a Traditional Installation (for Linux-Based Operating Systems)

First, install and configure the Conda environment:

# Clone this repository:
git clone https://github.com/BioinfoMachineLearning/DeepInteract

# Change to project directory:
cd DeepInteract
DI_DIR=$(pwd)

# Set up Conda environment locally
conda env create --name DeepInteract -f environment.yml

# Activate the Conda environment by name:
conda activate DeepInteract

# (Optional) Perform a full install of the pip dependencies described in 'requirements.txt':
pip3 install -r requirements.txt

# (Optional) To remove the long Conda environment prefix in your shell prompt, modify the env_prompt setting in your .condarc file with:
conda config --set env_prompt '({name})'

Installing PSAIA

Install GCC 10 for PSAIA:

# Install GCC 10 for Ubuntu 20.04
sudo apt install software-properties-common
sudo add-apt-repository ppa:ubuntu-toolchain-r/ppa
sudo apt update
sudo apt install gcc-10 g++-10

# Or install GCC 10 for Arch Linux/Manjaro
yay -S gcc10

Install QT4 for PSAIA:

# Install QT4 for Ubuntu 20.04:
sudo add-apt-repository ppa:rock-core/qt4
sudo apt update
sudo apt install libqt4* libqtcore4 libqtgui4 libqtwebkit4 qt4* libxext-dev

# Or install QT4 for Arch Linux/Manjaro
yay -S qt4

Compile PSAIA from source:

# Select the location to install the software:
MY_LOCAL=~/Programs

# Download and extract PSAIA's source code:
mkdir "$MY_LOCAL"
cd "$MY_LOCAL"
wget http://complex.zesoi.fer.hr/data/PSAIA-1.0-source.tar.gz
tar -xvzf PSAIA-1.0-source.tar.gz

# Compile PSAIA (i.e., a GUI for PSA):
cd PSAIA_1.0_source/make/linux/psaia/
qmake-qt4 psaia.pro
make

# Compile PSA (i.e., the protein structure analysis (PSA) program):
cd ../psa/
qmake-qt4 psa.pro
make

# Compile PIA (i.e., the protein interaction analysis (PIA) program):
cd ../pia/
qmake-qt4 pia.pro
make

# Test run any of the above-compiled programs:
cd "$MY_LOCAL"/PSAIA_1.0_source/bin/linux
# Test run PSA inside a GUI:
./psaia/psaia
# Test run PIA through a terminal:
./pia/pia
# Test run PSA through a terminal:
./psa/psa

Finally, substitute your absolute filepath for DeepInteract (i.e., where on your local storage device you downloaded the repository to) anywhere DeepInteract's local repository is referenced in project/datasets/builder/psaia_config_file_input.txt.
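
The path substitution can also be scripted. The sketch below assumes the config file refers to the repository through one literal root path that can be swapped for your own; the function names are hypothetical, and the old root in the usage comment is a placeholder, since the actual contents of psaia_config_file_input.txt are not reproduced here.

```python
from pathlib import Path


def replace_repo_root(config_text, old_root, new_root):
    """Replace every occurrence of old_root with new_root, ignoring trailing slashes."""
    old = old_root.rstrip("/")
    new = new_root.rstrip("/")
    return config_text.replace(old, new)


def update_config_file(config_path, old_root, new_root):
    """Rewrite the config file in place with the new repository root."""
    path = Path(config_path)
    path.write_text(replace_repo_root(path.read_text(), old_root, new_root))


# Usage (hypothetical old root; inspect the file to find the real one):
# update_config_file(
#     "project/datasets/builder/psaia_config_file_input.txt",
#     "/home/someone/DeepInteract",
#     str(Path.cwd()),
# )
```

Review the file after the rewrite, since a plain string replacement cannot tell repository paths apart from unrelated paths that share the same prefix.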

Training

Download training and cross-validation DGLGraphs

To train, retrain, or cross-validate DeepInteract models using DIPS-Plus and/or CASP-CAPRI targets, we first need to download the preprocessed DGLGraphs from Zenodo:

# Download and extract preprocessed DGLGraphs for DIPS-Plus and CASP-CAPRI
# Requires ~55GB of free space
mkdir -p project/datasets/DIPS/final
cd project/datasets/DIPS/final

# Download DIPS-Plus
wget https://zenodo.org/record/5546775/files/final_raw_dips.tar.gz
wget https://zenodo.org/record/5546775/files/final_processed_dips.tar.gz.partaa
wget https://zenodo.org/record/5546775/files/final_processed_dips.tar.gz.partab

# First, reassemble all processed DGLGraphs
# We split the (tar.gz) archive into two separate parts with
# 'split -b 4096M final_processed_dips.tar.gz "final_processed_dips.tar.gz.part"'
# to upload it to Zenodo, so to recover the original archive:
cat final_processed_dips.tar.gz.parta* >final_processed_dips.tar.gz

# Extract DIPS-Plus
tar -xzf final_raw_dips.tar.gz
tar -xzf final_processed_dips.tar.gz
rm final_processed_dips.tar.gz.parta* final_raw_dips.tar.gz final_processed_dips.tar.gz

# Download CASP-CAPRI
mkdir -p ../../CASP_CAPRI/final
cd ../../CASP_CAPRI/final
wget https://zenodo.org/record/5546775/files/final_raw_casp_capri.tar.gz
wget https://zenodo.org/record/5546775/files/final_processed_casp_capri.tar.gz

# Extract CASP-CAPRI
tar -xzf final_raw_casp_capri.tar.gz
tar -xzf final_processed_casp_capri.tar.gz
rm final_raw_casp_capri.tar.gz final_processed_casp_capri.tar.gz
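
On systems without cat, or when the download step is scripted in Python, the split archive can be rejoined in the same way. This is a minimal sketch of the reassembly step only: it concatenates the part files in sorted name order, which matches the order in which the shell expands the parta* glob. The function name reassemble_archive is hypothetical.

```python
import shutil
from pathlib import Path


def reassemble_archive(directory, archive_name):
    """Concatenate <archive_name>.part* files (sorted by name) into <archive_name>."""
    directory = Path(directory)
    parts = sorted(directory.glob(archive_name + ".part*"))
    if not parts:
        raise FileNotFoundError(f"No parts found for {archive_name} in {directory}")
    target = directory / archive_name
    with open(target, "wb") as out:
        for part in parts:
            # Stream each part so multi-gigabyte files are never held in memory.
            with open(part, "rb") as chunk:
                shutil.copyfileobj(chunk, out)
    return target


# Usage, from the directory that holds the downloaded parts:
# reassemble_archive(".", "final_processed_dips.tar.gz")
```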

Navigate to the project directory and run the training script with the desired parameters:

# Hint: Run `python3 lit_model_train.py --help` to see all available CLI arguments
cd project
python3 lit_model_train.py --lr 1e-3 --weight_decay 1e-2
cd ..

Inference

Download trained model checkpoint

# Return to root directory of DeepInteract repository
cd "$DI_DIR"

# Download the trained model checkpoint
mkdir -p project/checkpoints
wget -P project/checkpoints https://zenodo.org/record/5546775/files/LitGINI-GeoTran-DilResNet.ckpt

Predict interface contact probability maps

Navigate to the project directory and run the prediction script with the filenames of the left and right PDB chains.

# Hint: Run `python3 lit_model_predict.py --help` to see all available CLI arguments
cd project
python3 lit_model_predict.py --left_pdb_filepath "$DI_DIR"/project/test_data/4heq_l_u.pdb --right_pdb_filepath "$DI_DIR"/project/test_data/4heq_r_u.pdb --ckpt_dir "$DI_DIR"/project/checkpoints --ckpt_name LitGINI-GeoTran-DilResNet.ckpt --hhsuite_db ~/Data/Databases/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt
cd ..

This script will generate the predicted interface contact map, as well as the Geometric Transformer's learned node and edge representations for both chain graphs, and save them to the given input directory as NumPy array files (e.g., test_data/4heq_contact_prob_map.npy).
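
Once a prediction has been written, the contact map can be inspected with NumPy. The sketch below assumes the saved array is a two-dimensional matrix of probabilities with one row per residue of the left chain and one column per residue of the right chain; that layout is an assumption, so check the array's shape before relying on it. The helper top_k_contacts is hypothetical, and the demonstration uses a small synthetic array.

```python
import numpy as np


def top_k_contacts(prob_map, k=10):
    """Return the k highest-probability (row, column, probability) entries."""
    prob_map = np.asarray(prob_map)
    k = min(k, prob_map.size)
    # Sort all entries by probability, highest first, and keep the first k.
    flat_order = np.argsort(prob_map, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat_order, prob_map.shape)
    return [(int(r), int(c), float(prob_map[r, c])) for r, c in zip(rows, cols)]


if __name__ == "__main__":
    # Synthetic stand-in for a real prediction. With real output, load it with
    # np.load("<input directory>/4heq_contact_prob_map.npy") instead.
    demo = np.array([[0.1, 0.9, 0.3],
                     [0.5, 0.2, 0.7]])
    for row, col, prob in top_k_contacts(demo, k=3):
        print(f"left residue index {row}, right residue index {col}: {prob:.2f}")
```

The indices are zero-based positions in the array, not PDB residue numbers.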

Acknowledgements

DeepInteract communicates with and/or references the following separate libraries and packages:

We thank all their contributors and maintainers!

License and Disclaimer

DeepInteract Code License

Licensed under the GNU General Public License, Version 3.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.gnu.org/licenses/gpl-3.0.en.html.

Third-party software

Use of the third-party software, libraries or code referred to in the Acknowledgements section above may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code is subject to any such terms and you should check that you can comply with any applicable restrictions or terms and conditions before use.

Comments

[Doc] About pdb files

Hi, @amorehead, can you provide the original 32 pdb files for the DIPS-Plus dataset and 55 pdb files for the DB5 dataset? Also, how can the original pdb files be processed into pdb.dill files for this project?

Thanks!

opened by terry-r123 29
[BUG?] List index out of range

When I run the line:

python3 docker/run_docker.py --left_pdb_filepath /storage/DeepInteract/project/test_data/4heq_l_u.pdb --right_pdb_filepath /storage/DeepInteract/project/test_data/4heq_r_u.pdb --input_dataset_dir /storage/DeepInteract/project/datasets/Input --ckpt_name /storage/DeepInteract/project/checkpoints/LitGINI-GeoTran-DilResNet.ckpt --hhsuite_db /storage/databases/Uniclust30/UniRef30_2021_06 --num_gpus 1

I get the following, terminating in a "list index out of range" and no output:

I1026 17:17:24.445479 140490422077248 run_docker.py:59] Mounting /storage/DeepInteract/project/test_data -> /mnt/input_pdbs
I1026 17:17:24.445564 140490422077248 run_docker.py:59] Mounting /storage/DeepInteract/project/test_data -> /mnt/input_pdbs
I1026 17:17:24.445607 140490422077248 run_docker.py:59] Mounting /storage/DeepInteract/project/datasets/Input -> /mnt/Input
I1026 17:17:24.445646 140490422077248 run_docker.py:59] Mounting /storage/DeepInteract/project/checkpoints -> /mnt/checkpoints
I1026 17:17:24.445684 140490422077248 run_docker.py:59] Mounting /storage/databases/Uniclust30 -> /mnt/hhsuite_db
I1026 17:17:26.138480 140490422077248 run_docker.py:135] DGL backend not selected or invalid. Assuming PyTorch for now.
I1026 17:17:26.138590 140490422077248 run_docker.py:135] Using backend: pytorch
I1026 17:17:26.141283 140490422077248 run_docker.py:135] I1026 15:17:26.141029 140696250648384 deepinteract_utils.py:1030] Seeding everything with random seed 42
I1026 17:17:26.141357 140490422077248 run_docker.py:135] Global seed set to 42
I1026 17:17:26.177383 140490422077248 run_docker.py:135] I1026 15:17:26.177001 140696250648384 deepinteract_utils.py:587] Making interim data set from raw data
I1026 17:17:26.178824 140490422077248 run_docker.py:135] I1026 15:17:26.178652 140696250648384 parse.py:43] 4 requested keys, 4 produced keys, 0 work keys
I1026 17:17:26.178916 140490422077248 run_docker.py:135] W1026 15:17:26.178736 140696250648384 complex.py:36] Complex file /mnt/Input/interim/complexes/complexes.dill already exists!
I1026 17:17:26.179392 140490422077248 run_docker.py:135] I1026 15:17:26.179221 140696250648384 pair.py:79] 0 requested keys, 0 produced keys, 0 work keys
I1026 17:17:26.179549 140490422077248 run_docker.py:135] I1026 15:17:26.179284 140696250648384 deepinteract_utils.py:608] Generating PSAIA features from PDB files in /mnt/Input/interim/parsed
I1026 17:17:26.179922 140490422077248 run_docker.py:135] I1026 15:17:26.179797 140696250648384 conservation.py:361] 0 PDB files to process with PSAIA
I1026 17:17:26.181284 140490422077248 run_docker.py:135] I1026 15:17:26.179910 140696250648384 parallel.py:46] Processing 1 inputs.
I1026 17:17:26.181358 140490422077248 run_docker.py:135] I1026 15:17:26.181147 140696250648384 parallel.py:62] Sequential Mode.
I1026 17:17:26.181491 140490422077248 run_docker.py:135] I1026 15:17:26.181194 140696250648384 conservation.py:43] PSAIA'ing /mnt/Input/interim/external_feats/PSAIA/INPUT/pdb_list.fls
I1026 17:17:26.199129 140490422077248 run_docker.py:135] I1026 15:17:26.198776 140696250648384 conservation.py:200] For generating protrusion indices, spent 00.02 PSAIA'ing, 00.00 writing, and 00.02 overall.
I1026 17:17:26.199361 140490422077248 run_docker.py:135] I1026 15:17:26.198991 140696250648384 deepinteract_utils.py:625] Generating profile HMM features from PDB files in /mnt/Input/interim/parsed
I1026 17:17:26.199785 140490422077248 run_docker.py:135] I1026 15:17:26.199542 140696250648384 conservation.py:458] 4 requested keys, 4 produced keys, 0 work filenames
I1026 17:17:26.199849 140490422077248 run_docker.py:135] I1026 15:17:26.199590 140696250648384 conservation.py:464] 0 work filenames
I1026 17:17:26.199900 140490422077248 run_docker.py:135] I1026 15:17:26.199645 140696250648384 deepinteract_utils.py:640] Starting postprocessing for all unprocessed pairs in /mnt/Input/interim/pairs
I1026 17:17:26.199948 140490422077248 run_docker.py:135] I1026 15:17:26.199685 140696250648384 deepinteract_utils.py:647] Looking for all pairs in /mnt/Input/interim/pairs
I1026 17:17:26.200107 140490422077248 run_docker.py:135] Setting the default backend to "pytorch". You can change it in the ~/.dgl/config.json file or export the DGLBACKEND environment variable. Valid options are: pytorch, mxnet, tensorflow (all lowercase)
I1026 17:17:26.200161 140490422077248 run_docker.py:135] I1026 15:17:26.199843 140696250648384 deepinteract_utils.py:660] Found 0 work pair(s) in /mnt/Input/interim/pairs
I1026 17:17:26.200797 140490422077248 run_docker.py:135] Traceback (most recent call last):
I1026 17:17:26.200864 140490422077248 run_docker.py:135] File "/app/DeepInteract/project/lit_model_predict_docker.py", line 326, in <module>
I1026 17:17:26.200918 140490422077248 run_docker.py:135] app.run(main)
I1026 17:17:26.200968 140490422077248 run_docker.py:135] File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 312, in run
I1026 17:17:26.201017 140490422077248 run_docker.py:135] _run_main(main, args)
I1026 17:17:26.201066 140490422077248 run_docker.py:135] File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
I1026 17:17:26.201114 140490422077248 run_docker.py:135] sys.exit(main(argv))
I1026 17:17:26.201161 140490422077248 run_docker.py:135] File "/app/DeepInteract/project/lit_model_predict_docker.py", line 199, in main
I1026 17:17:26.201208 140490422077248 run_docker.py:135] input_dataset = InputDataset(left_pdb_filepath=FLAGS.left_pdb_filepath,
I1026 17:17:26.201254 140490422077248 run_docker.py:135] File "/app/DeepInteract/project/lit_model_predict_docker.py", line 95, in __init__
I1026 17:17:26.201300 140490422077248 run_docker.py:135] super(InputDataset, self).__init__(name='InputDataset',
I1026 17:17:26.201347 140490422077248 run_docker.py:135] File "/opt/conda/lib/python3.8/site-packages/dgl/data/dgl_dataset.py", line 94, in __init__
I1026 17:17:26.201393 140490422077248 run_docker.py:135] self._load()
I1026 17:17:26.201438 140490422077248 run_docker.py:135] File "/opt/conda/lib/python3.8/site-packages/dgl/data/dgl_dataset.py", line 179, in _load
I1026 17:17:26.201483 140490422077248 run_docker.py:135] self.process()
I1026 17:17:26.201529 140490422077248 run_docker.py:135] File "/app/DeepInteract/project/lit_model_predict_docker.py", line 109, in process
I1026 17:17:26.201575 140490422077248 run_docker.py:135] left_complex_graph, right_complex_graph = process_pdb_into_graph(self.left_pdb_filepath,
I1026 17:17:26.201622 140490422077248 run_docker.py:135] File "/app/DeepInteract/project/utils/deepinteract_utils.py", line 741, in process_pdb_into_graph
I1026 17:17:26.201667 140490422077248 run_docker.py:135] input_pair = convert_input_pdb_files_to_pair(left_pdb_filepath, right_pdb_filepath,
I1026 17:17:26.201713 140490422077248 run_docker.py:135] File "/app/DeepInteract/project/utils/deepinteract_utils.py", line 725, in convert_input_pdb_files_to_pair
I1026 17:17:26.201758 140490422077248 run_docker.py:135] pair_filepath = launch_postprocessing_of_pruned_pairs(
I1026 17:17:26.201883 140490422077248 run_docker.py:135] IndexError: list index out of range

opened by gabrielepozzati 6
About the training dataset

Hi, @amorehead

While reproducing this work, I have some questions about the dataset. I used the ndata["x"] of complex["graph1"] and complex["graph2"] to check the positive labels and distances, but I got some confusing results. The positive labels created from the distance map (< 6 Angstrom) are fewer than those in complex["examples"]. So I want to know: is ndata["x"] the bound complex coordinates?

Thanks!

opened by peter5842 4
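
For readers who want to repeat the check described in the question above, the following is a minimal sketch of thresholding a pairwise distance map at 6 Å. It assumes one 3D coordinate per residue for each chain, such as the per-node coordinates mentioned in the question. The function name contact_labels is hypothetical, and this is not presented as the procedure used to build the released labels; labels derived from a single coordinate per residue can differ from labels defined over all atoms of each residue.

```python
import numpy as np


def contact_labels(coords_a, coords_b, cutoff=6.0):
    """Boolean matrix: True where residue i of chain A is within cutoff of residue j of chain B."""
    a = np.asarray(coords_a, dtype=float)  # shape (N_a, 3)
    b = np.asarray(coords_b, dtype=float)  # shape (N_b, 3)
    # Broadcast to all pairs, then take the Euclidean norm over the xyz axis.
    diff = a[:, None, :] - b[None, :, :]
    distances = np.sqrt((diff ** 2).sum(axis=-1))
    return distances < cutoff


if __name__ == "__main__":
    # Toy coordinates in Angstroms, for illustration only.
    chain_a = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
    chain_b = [[3.0, 0.0, 0.0], [20.0, 0.0, 0.0]]
    labels = contact_labels(chain_a, chain_b)
    print(labels)
    print("positive pairs:", int(labels.sum()))
```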
what genetic databases did you use?

Hi, what genetic databases did you use when you generated the DIPS-Plus dataset from the raw PDB files, and which did you use for the CASP and DB5 datasets?

opened by onlyonewater 3
What does this code mean?

What does this code mean? It seems to be used during data loading.

https://github.com/BioinfoMachineLearning/DeepInteract/blob/c78d205465f02ee4ef751dbbafb7ec8f30c75c9a/project/datasets/DIPS/dips_dgl_dataset.py#L137-L141

opened by onlyonewater 2
About positive labels for each inter-chain residue pair

Hi @amorehead,

I was wondering if the positive labels for the CASP and DB5 datasets are also assigned to each inter-chain residue pair found within 6 Å?

Thanks!

opened by terry-r123 2
[BUG?] Invalid key "graph1". Must be one of the edge types.

Thanks for the great DeepInteract! When I run the line:

python3 lit_model_train.py --lr 1e-3 --weight_decay 1e-2

I get the following:

Traceback (most recent call last):
  File "lit_model_train.py", line 223, in <module>
    main(args)
  File "lit_model_train.py", line 174, in main
    trainer.fit(model=model, datamodule=picp_data_module)
  File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 552, in fit
    self._run(model)
  File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 917, in _run
    self._dispatch()
  File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 985, in _dispatch
    self.accelerator.start_training(self)
  File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 92, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in start_training
    self._results = trainer.run_stage()
  File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 995, in run_stage
    return self._run_train()
  File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1030, in _run_train
    self._run_sanity_check(self.lightning_module)
  File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1114, in _run_sanity_check
    self._evaluation_loop.run()
  File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 109, in advance
    dl_outputs = self.epoch_loop.run(
  File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 111, in advance
    output = self.evaluation_step(batch, batch_idx, dataloader_idx)
  File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 158, in evaluation_step
    output = self.trainer.accelerator.validation_step(step_kwargs)
  File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 211, in validation_step
    return self.training_type_plugin.validation_step(*step_kwargs.values())
  File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp.py", line 392, in validation_step
    return self.model(*args, **kwargs)
  File "/home/user/miniconda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/user/miniconda/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 624, in forward
    output = self.module(*inputs, **kwargs)
  File "/home/user/miniconda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/user/miniconda/lib/python3.8/site-packages/pytorch_lightning/overrides/base.py", line 93, in forward
    output = self.module.validation_step(*inputs, **kwargs)
  File "/ryc/DeepInteract/project/utils/deepinteract_modules.py", line 1923, in validation_step
    graph1, graph2 = batch['graph1'], batch['graph2']
  File "/home/user/miniconda/lib/python3.8/site-packages/dgl/heterograph.py", line 2152, in __getitem__
    raise DGLError('Invalid key "{}". Must be one of the edge types.'.format(orig_key))
dgl._ffi.base.DGLError: Invalid key "graph1". Must be one of the edge types.
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/home/user/miniconda/lib/python3.8/threading.py", line 932, in _bootstrap_inner

opened by terry-r123 2
[Doc] Documentation Confusion

In project/utils/deepinteract_utils.py line 378, the function description about convert_df_to_dgl_graph says that it will return edata['w'] and edata['a'] and in line 860 edata['w'] and edata['a'] are used, but the function convert_df_to_dgl_graph doesn't generate the two parameters.

opened by XuBlack 2
A major bug???

It seems that only these few values are printed at the end, https://github.com/BioinfoMachineLearning/DeepInteract/blob/c78d205465f02ee4ef751dbbafb7ec8f30c75c9a/project/utils/deepinteract_modules.py#L2160-L2165

And the printed precision and recall seem to be the values of the last sample only. https://github.com/BioinfoMachineLearning/DeepInteract/blob/c78d205465f02ee4ef751dbbafb7ec8f30c75c9a/project/utils/deepinteract_modules.py#L2075-L2081

opened by onlyonewater 0
[Bugfix] Fix issue where re-running inference pipeline could result in a file-not-found error
Fix an issue where re-running the Docker inference pipeline could result in a file-not-found error

Resolve the issue by manually constructing the input complex's file path after postprocessing, in the case that the complex has already been postprocessed
opened by amorehead 0
[BUG?] RuntimeWarning: invalid value encountered in double_scalars & Normal vector missing

When I try to run: python3 docker/run_docker.py --left_pdb_filepath project/test_data/4heq_l_u.pdb --right_pdb_filepath project/test_data/4heq_r_u.pdb --input_dataset_dir project/datasets/CASP_CAPRI --ckpt_name project/checkpoints/LitGINI-GeoTran-DilResNet.ckpt --hhsuite_db ~/Data/Databases/uniclust30/uniclust30_2018_08/uniclust30_2018_08

I get these logs:

I0621 12:54:27.512626 139977373710144 run_docker.py:59] Mounting /home/ryc/pro/DeepInteract/project/test_data -> /mnt/input_pdbs
I0621 12:54:27.512762 139977373710144 run_docker.py:59] Mounting /home/ryc/pro/DeepInteract/project/test_data -> /mnt/input_pdbs
I0621 12:54:27.512836 139977373710144 run_docker.py:59] Mounting /home/ryc/pro/DeepInteract/project/datasets/CASP_CAPRI -> /mnt/Input
I0621 12:54:27.512908 139977373710144 run_docker.py:59] Mounting /home/ryc/pro/DeepInteract/project/checkpoints -> /mnt/checkpoints
I0621 12:54:27.512977 139977373710144 run_docker.py:59] Mounting /home/ryc/Data/Databases/uniclust30/uniclust30_2018_08 -> /mnt/hhsuite_db
I0621 12:54:30.589913 139977373710144 run_docker.py:135] DGL backend not selected or invalid. Assuming PyTorch for now.
I0621 12:54:30.590292 139977373710144 run_docker.py:135] Using backend: pytorch
I0621 12:54:30.594311 139977373710144 run_docker.py:135] I0621 12:54:30.593440 140113106646848 deepinteract_utils.py:1098] Seeding everything with random seed 42
I0621 12:54:30.594596 139977373710144 run_docker.py:135] Global seed set to 42
I0621 12:54:30.643066 139977373710144 run_docker.py:135] cp: cannot stat '/mnt/input_pdbs/4heq_l_u.pdb': No such file or directory
I0621 12:54:30.654789 139977373710144 run_docker.py:135] cp: cannot stat '/mnt/input_pdbs/4heq_r_u.pdb': No such file or directory
I0621 12:54:30.655230 139977373710144 run_docker.py:135] I0621 12:54:30.654651 140113106646848 deepinteract_utils.py:608] Making interim data set from raw data
I0621 12:54:30.675874 139977373710144 run_docker.py:135] I0621 12:54:30.675035 140113106646848 parse.py:43] 62 requested keys, 60 produced keys, 2 work keys
I0621 12:54:30.676792 139977373710144 run_docker.py:135] I0621 12:54:30.675550 140113106646848 parallel.py:46] Processing 2 inputs.
I0621 12:54:30.676914 139977373710144 run_docker.py:135] I0621 12:54:30.676569 140113106646848 parallel.py:62] Sequential Mode.
I0621 12:54:30.677030 139977373710144 run_docker.py:135] I0621 12:54:30.676633 140113106646848 parse.py:63] Reading /mnt/Input/raw/he/4heq_r_u.pdb
I0621 12:54:30.711622 139977373710144 run_docker.py:135] I0621 12:54:30.710961 140113106646848 parse.py:65] Writing /mnt/Input/raw/he/4heq_r_u.pdb to /mnt/Input/interim/parsed/he/4heq_r_u.pdb.pkl
I0621 12:54:30.713438 139977373710144 run_docker.py:135] I0621 12:54:30.712913 140113106646848 parse.py:67] Done writing /mnt/Input/raw/he/4heq_r_u.pdb to /mnt/Input/interim/parsed/he/4heq_r_u.pdb.pkl
I0621 12:54:30.713546 139977373710144 run_docker.py:135] I0621 12:54:30.713084 140113106646848 parse.py:63] Reading /mnt/Input/raw/he/4heq_l_u.pdb
I0621 12:54:30.744368 139977373710144 run_docker.py:135] I0621 12:54:30.743873 140113106646848 parse.py:65] Writing /mnt/Input/raw/he/4heq_l_u.pdb to /mnt/Input/interim/parsed/he/4heq_l_u.pdb.pkl
I0621 12:54:30.745597 139977373710144 run_docker.py:135] I0621 12:54:30.745240 140113106646848 parse.py:67] Done writing /mnt/Input/raw/he/4heq_l_u.pdb to /mnt/Input/interim/parsed/he/4heq_l_u.pdb.pkl
I0621 12:54:30.745825 139977373710144 run_docker.py:135] I0621 12:54:30.745505 140113106646848 complex.py:38] Getting filenames...
I0621 12:54:30.749119 139977373710144 run_docker.py:135] I0621 12:54:30.748700 140113106646848 complex.py:40] Getting complexes...
I0621 12:54:30.770302 139977373710144 run_docker.py:135] I0621 12:54:30.769680 140113106646848 pair.py:79] 31 requested keys, 30 produced keys, 1 work keys
I0621 12:54:30.770423 139977373710144 run_docker.py:135] I0621 12:54:30.769779 140113106646848 parallel.py:46] Processing 1 inputs.
I0621 12:54:30.770527 139977373710144 run_docker.py:135] I0621 12:54:30.769842 140113106646848 parallel.py:62] Sequential Mode.
I0621 12:54:30.770629 139977373710144 run_docker.py:135] I0621 12:54:30.769901 140113106646848 pair.py:97] Working on 4heq
I0621 12:54:30.773638 139977373710144 run_docker.py:135] I0621 12:54:30.773111 140113106646848 pair.py:102] For complex 4heq found 1 pairs out of 2 chains
I0621 12:54:31.086785 139977373710144 run_docker.py:135] I0621 12:54:31.085926 140113106646848 deepinteract_utils.py:689] Generating PSAIA features from PDB files in /mnt/Input/interim/parsed
I0621 12:54:31.090075 139977373710144 run_docker.py:135] I0621 12:54:31.089508 140113106646848 conservation.py:361] 0 PDB files to process with PSAIA
I0621 12:54:31.090215 139977373710144 run_docker.py:135] I0621 12:54:31.089650 140113106646848 parallel.py:46] Processing 1 inputs.
I0621 12:54:31.090428 139977373710144 run_docker.py:135] I0621 12:54:31.089698 140113106646848 parallel.py:62] Sequential Mode.
I0621 12:54:31.090618 139977373710144 run_docker.py:135] I0621 12:54:31.089743 140113106646848 conservation.py:43] PSAIA'ing /mnt/Input/interim/external_feats/PSAIA/INPUT/pdb_list.fls
I0621 12:54:31.114144 139977373710144 run_docker.py:135] I0621 12:54:31.113151 140113106646848 conservation.py:200] For generating protrusion indices, spent 00.02 PSAIA'ing, 00.00 writing, and 00.02 overall.
I0621 12:54:31.114319 139977373710144 run_docker.py:135] I0621 12:54:31.113927 140113106646848 deepinteract_utils.py:706] Generating profile HMM features from PDB files in /mnt/Input/interim/parsed
I0621 12:54:31.125687 139977373710144 run_docker.py:135] I0621 12:54:31.125225 140113106646848 conservation.py:458] 62 requested keys, 60 produced keys, 2 work filenames
I0621 12:54:31.125820 139977373710144 run_docker.py:135] I0621 12:54:31.125341 140113106646848 conservation.py:464] 2 work filenames
I0621 12:54:31.126219 139977373710144 run_docker.py:135] I0621 12:54:31.125793 140113106646848 parallel.py:46] Processing 2 inputs.
I0621 12:54:31.126399 139977373710144 run_docker.py:135] I0621 12:54:31.125915 140113106646848 parallel.py:62] Sequential Mode.
I0621 12:54:31.160443 139977373710144 run_docker.py:135] I0621 12:54:31.159958 140113106646848 conservation.py:152] HHblits'ing /mnt/Input/interim/external_feats/he/work/4heq_l_u.pdb-1-A.fa
I0621 12:55:03.191800 139977373710144 run_docker.py:135] I0621 12:55:03.190688 140113106646848 conservation.py:238] For 1 profile HMMs generated from 4heq_l_u.pdb, spent 32.06 blitsing, 00.00 writing, and 32.06 overall.
I0621 12:55:03.224250 139977373710144 run_docker.py:135] I0621 12:55:03.223448 140113106646848 conservation.py:152] HHblits'ing /mnt/Input/interim/external_feats/he/work/4heq_r_u.pdb-1-B.fa
I0621 12:55:37.966540 139977373710144 run_docker.py:135] I0621 12:55:37.965222 140113106646848 conservation.py:238] For 1 profile HMMs generated from 4heq_r_u.pdb, spent 34.77 blitsing, 00.00 writing, and 34.77 overall.
I0621 12:55:37.966913 139977373710144 run_docker.py:135] I0621 12:55:37.965721 140113106646848 deepinteract_utils.py:722] Starting postprocessing for all unprocessed pairs in /mnt/Input/interim/pairs
I0621 12:55:37.967144 139977373710144 run_docker.py:135] I0621 12:55:37.965833 140113106646848 deepinteract_utils.py:729] Looking for all pairs in /mnt/Input/interim/pairs
I0621 12:55:37.972153 139977373710144 run_docker.py:135] I0621 12:55:37.971457 140113106646848 deepinteract_utils.py:743] Found 1 work pair(s) in /mnt/Input/interim/pairs
I0621 12:55:37.972460 139977373710144 run_docker.py:135] I0621 12:55:37.971827 140113106646848 parallel.py:46] Processing 1 inputs.
I0621 12:55:37.972671 139977373710144 run_docker.py:135] I0621 12:55:37.971918 140113106646848 parallel.py:62] Sequential Mode.
I0621 12:55:41.316108 139977373710144 run_docker.py:135] /opt/conda/lib/python3.8/site-packages/Bio/PDB/vectors.py:357: RuntimeWarning: invalid value encountered in double_scalars I0621 12:55:41.316489 139977373710144 run_docker.py:135] c = (self * other) / (n1 * n2) I0621 12:55:41.316720 139977373710144 run_docker.py:135] /opt/conda/lib/python3.8/site-packages/Bio/PDB/vectors.py:357: RuntimeWarning: invalid value encountered in double_scalars I0621 12:55:41.316918 139977373710144 run_docker.py:135] c = (self * other) / (n1 * n2) I0621 12:55:41.317103 139977373710144 run_docker.py:135] I0621 12:55:41.314721 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 9 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb I0621 12:55:41.336150 139977373710144 run_docker.py:135] I0621 12:55:41.335281 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 13 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb I0621 12:55:41.356255 139977373710144 run_docker.py:135] I0621 12:55:41.355384 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 17 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb I0621 12:55:41.427843 139977373710144 run_docker.py:135] I0621 12:55:41.426913 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 30 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb I0621 12:55:41.506287 139977373710144 run_docker.py:135] I0621 12:55:41.505459 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 45 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb I0621 12:55:41.526054 139977373710144 run_docker.py:135] I0621 12:55:41.525439 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 49 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb I0621 12:55:41.560204 139977373710144 run_docker.py:135] I0621 12:55:41.559483 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 56 in chain A in file 
/mnt/Input/raw/he/4heq_l_u.pdb I0621 12:55:41.585425 139977373710144 run_docker.py:135] I0621 12:55:41.584762 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 61 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb I0621 12:55:41.686717 139977373710144 run_docker.py:135] I0621 12:55:41.686072 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 82 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb I0621 12:55:41.720789 139977373710144 run_docker.py:135] I0621 12:55:41.720090 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 89 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb I0621 12:55:41.735666 139977373710144 run_docker.py:135] I0621 12:55:41.735008 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 92 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb I0621 12:55:41.745175 139977373710144 run_docker.py:135] I0621 12:55:41.744497 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 94 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb I0621 12:55:41.788900 139977373710144 run_docker.py:135] I0621 12:55:41.788190 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 103 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb I0621 12:55:41.852835 139977373710144 run_docker.py:135] I0621 12:55:41.852125 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 116 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb I0621 12:55:41.909060 139977373710144 run_docker.py:135] I0621 12:55:41.908362 140113106646848 dips_plus_utils.py:536] Normal vector missing for df0 residue 128 in chain A in file /mnt/Input/raw/he/4heq_l_u.pdb I0621 12:55:42.041210 139977373710144 run_docker.py:135] I0621 12:55:42.040462 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 9 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb I0621 12:55:42.059900 139977373710144 run_docker.py:135] I0621 
12:55:42.059195 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 13 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb I0621 12:55:42.079119 139977373710144 run_docker.py:135] I0621 12:55:42.078433 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 17 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb I0621 12:55:42.143947 139977373710144 run_docker.py:135] I0621 12:55:42.143125 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 30 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb I0621 12:55:42.215802 139977373710144 run_docker.py:135] I0621 12:55:42.215100 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 45 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb I0621 12:55:42.234920 139977373710144 run_docker.py:135] I0621 12:55:42.234243 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 49 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb I0621 12:55:42.267936 139977373710144 run_docker.py:135] I0621 12:55:42.267218 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 56 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb I0621 12:55:42.292104 139977373710144 run_docker.py:135] I0621 12:55:42.291366 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 61 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb I0621 12:55:42.392189 139977373710144 run_docker.py:135] I0621 12:55:42.391432 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 82 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb I0621 12:55:42.426449 139977373710144 run_docker.py:135] I0621 12:55:42.425676 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 89 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb I0621 12:55:42.440944 139977373710144 run_docker.py:135] I0621 12:55:42.440204 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 92 in chain B 
in file /mnt/Input/raw/he/4heq_r_u.pdb I0621 12:55:42.450548 139977373710144 run_docker.py:135] I0621 12:55:42.449811 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 94 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb I0621 12:55:42.494798 139977373710144 run_docker.py:135] I0621 12:55:42.493852 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 103 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb I0621 12:55:42.557198 139977373710144 run_docker.py:135] I0621 12:55:42.556405 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 116 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb I0621 12:55:42.614418 139977373710144 run_docker.py:135] I0621 12:55:42.613722 140113106646848 dips_plus_utils.py:630] Normal vector missing for df1 residue 128 in chain B in file /mnt/Input/raw/he/4heq_r_u.pdb I0621 12:55:42.718224 139977373710144 run_docker.py:135] I0621 12:55:42.717459 140113106646848 deepinteract_utils.py:773] Imputing missing feature values for given inputs I0621 12:55:42.719177 139977373710144 run_docker.py:135] I0621 12:55:42.718771 140113106646848 parallel.py:46] Processing 31 inputs. I0621 12:55:42.719329 139977373710144 run_docker.py:135] I0621 12:55:42.718858 140113106646848 parallel.py:62] Sequential Mode. I0621 12:55:48.405303 139977373710144 run_docker.py:135] I0621 12:55:48.404334 140113106646848 lit_model_predict_docker.py:99] Loading complex for prediction, l_chain: /mnt/input_pdbs/4heq_l_u.pdb, r_chain: /mnt/input_pdbs/4heq_r_u.pdb I0621 12:55:49.322944 139977373710144 run_docker.py:135] /opt/conda/lib/python3.8/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Metric AUROC will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint. 
I0621 12:55:49.323281 139977373710144 run_docker.py:135] warnings.warn(*args, **kwargs) I0621 12:55:49.323480 139977373710144 run_docker.py:135] /opt/conda/lib/python3.8/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Metric AveragePrecision will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint. I0621 12:55:49.323687 139977373710144 run_docker.py:135] warnings.warn(*args, **kwargs) I0621 12:55:49.323897 139977373710144 run_docker.py:135] /opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:792: UserWarning: You are running on single node with no parallelization, so distributed has no effect. I0621 12:55:49.324064 139977373710144 run_docker.py:135] rank_zero_warn("You are running on single node with no parallelization, so distributed has no effect.") I0621 12:55:49.324226 139977373710144 run_docker.py:135] GPU available: False, used: False I0621 12:55:49.324383 139977373710144 run_docker.py:135] TPU available: False, using: 0 TPU cores I0621 12:55:49.324540 139977373710144 run_docker.py:135] IPU available: False, using: 0 IPUs I0621 12:55:49.377713 139977373710144 run_docker.py:135] Setting the default backend to "pytorch". You can change it in the ~/.dgl/config.json file or export the DGLBACKEND environment variable. Valid options are: pytorch, mxnet, tensorflow (all lowercase) I0621 12:55:49.378072 139977373710144 run_docker.py:135] /opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py:105: UserWarning: The dataloader, predict dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the num_workers argument(try 64 which is the number of cpus on this machine) in theDataLoader` init to improve performance. 
I0621 12:55:49.378295 139977373710144 run_docker.py:135] rank_zero_warn( I0621 12:55:50.089681 139977373710144 run_docker.py:135] I0621 12:55:50.088823 140113106646848 lit_model_predict_docker.py:298] Saved predicted contact probability map for 4heq as /mnt/input_pdbs/4heq_contact_prob_map.npy I0621 12:55:50.092932 139977373710144 run_docker.py:135] I0621 12:55:50.092391 140113106646848 lit_model_predict_docker.py:307] Saved learned node representations for the first chain graph of 4heq as /mnt/input_pdbs/4heq_graph1_node_feats.npy I0621 12:55:50.093075 139977373710144 run_docker.py:135] I0621 12:55:50.092499 140113106646848 lit_model_predict_docker.py:308] Saved learned edge representations for the first chain graph of 4heq as /mnt/input_pdbs/4heq_graph1_edge_feats.npy I0621 12:55:50.093154 139977373710144 run_docker.py:135] I0621 12:55:50.092547 140113106646848 lit_model_predict_docker.py:309] Saved learned node representations for the second chain graph of 4heq as /mnt/input_pdbs/4heq_graph2_node_feats.npy I0621 12:55:50.093222 139977373710144 run_docker.py:135] I0621 12:55:50.092598 140113106646848 lit_model_predict_docker.py:310] Saved learned edge representations for the second chain graph of 4heq as /mnt/input_pdbs/4heq_graph2_edge_feats.npy Predicting: 100%|██████████| 1/1 [00:00<00:00, 1.41it/s]Predicting: 0it [00:00, ?it/s]

As a result, the final generated dill file does not work properly.

opened by terry-r123

Releases(1.1.0)

1.1.0(Feb 26, 2022)
This release provides full support for the Docking Benchmark 5 (DB5) dataset within the DeepInteract training, fine-tuning, and testing pipeline.

This release also adds DB5 download links to reference the preprocessed version of the DB5 dataset available on Zenodo.

1.0.9(Jan 20, 2022)

This release includes a small bug fix in graph construction.
1.0.8(Dec 21, 2021)
This release includes a few additions and updates:

Inclusion of top-k recall metric in our training pipeline

Addition of DB5 fine-tuned PyTorch LightningModule checkpoint

Updates to Zenodo links to accommodate new Zenodo dataset version
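
The top-k recall metric listed in this release can be illustrated with a minimal NumPy sketch. It assumes one common definition (the fraction of true interface contacts that appear among the k highest-scoring residue pairs); the function name `top_k_recall` and the toy arrays in the usage note are hypothetical, and DeepInteract's own implementation may differ.

```python
import numpy as np


def top_k_recall(prob_map: np.ndarray, contact_map: np.ndarray, k: int) -> float:
    """Fraction of true contacts recovered among the k highest-scoring pairs.

    Assumed definition for illustration; not taken from the DeepInteract code.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    probs = prob_map.ravel()
    labels = contact_map.ravel().astype(bool)
    num_positives = int(labels.sum())
    if num_positives == 0:
        # No true contacts: recall is undefined, so report 0.0 by convention here.
        return 0.0
    k = min(k, probs.size)
    # Indices of the k largest probabilities (order within the top k is irrelevant).
    top_idx = np.argpartition(-probs, k - 1)[:k]
    return float(labels[top_idx].sum()) / num_positives
```

For example, with a 2x3 probability map `[[0.9, 0.1, 0.8], [0.2, 0.7, 0.3]]` and labels `[[1, 0, 0], [0, 1, 1]]`, the two highest-scoring pairs contain one of the three true contacts, so `top_k_recall(..., k=2)` is 1/3.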

1.0.7(Nov 14, 2021)
This release includes a small feature enhancement:

Allow features for multiple chains in a single ligand/receptor to be postprocessed (i.e., add multimer feature postprocessing support)

1.0.5-1.0.6(Nov 2, 2021)
This release of DeepInteract includes 4 enhancements/bug fixes inside the Docker inference pipeline:

Ensure PDB codes match for input pairs

Restore original chain IDs to EVCoupling input chain DataFrames

Install missing chain IDs into input PDB files

Halt execution of the inference pipeline if either input PDB is not found
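
The input checks described in this release (matching PDB codes and halting when an input PDB is missing) can be sketched as a small preflight function. This is a hypothetical illustration, not the pipeline's actual code: the name `check_input_pair` is invented, and it assumes the 4-character PDB code is the filename prefix, as in `4heq_l_u.pdb`.

```python
from pathlib import Path


def check_input_pair(left_pdb: str, right_pdb: str) -> None:
    """Raise early if an input PDB file is missing or the pair's PDB codes differ.

    Assumes the 4-character PDB code is the filename prefix (e.g. '4heq_l_u.pdb');
    that convention is an illustration, not a requirement of DeepInteract.
    """
    left, right = Path(left_pdb), Path(right_pdb)
    # Collect every missing path so the error names all of them at once.
    missing = [str(p) for p in (left, right) if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"Input PDB file(s) not found: {', '.join(missing)}")
    left_code, right_code = left.name[:4].lower(), right.name[:4].lower()
    if left_code != right_code:
        raise ValueError(f"PDB codes differ: {left_code!r} vs {right_code!r}")
```

Calling such a check before preprocessing starts would surface a missing file as an immediate error instead of a `cp: cannot stat` message buried in the log.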

1.0.4(Nov 2, 2021)
This release of DeepInteract includes 1 enhancement/bug fix inside the Docker inference pipeline:

Install sorting and hard-indexing logic to ensure PSAIA feature DataFrames are collected and used in the correct order during inference

1.0.3(Nov 1, 2021)
This release of DeepInteract includes 1 enhancement inside the Docker inference pipeline:

Recover missing PDB chain IDs in the data preprocessing pipeline (prior to inference)

1.0.2(Nov 1, 2021)
This release of DeepInteract fixes 3 bugs and removes deprecated logic inside the Docker inference pipeline:

Allow input PDB filenames to use any naming convention

Ensure that feature imputation occurs prior to each prediction by making the feature imputation function name unique

Fix file not found error for complex.dill

Remove deprecated use_dgl argument and its associated logic

1.0.1(Oct 8, 2021)

This release marks the public unveiling of DeepInteract.
