NeoDTI: Neural integration of neighbor information from a heterogeneous network for discovering new drug-target interactions

Last update: Nov 26, 2022

Overview

NeoDTI

NeoDTI: Neural integration of neighbor information from a heterogeneous network for discovering new drug-target interactions (Bioinformatics).

Recent Update 09/06/2018

L2 regularization has been added.

Requirements

TensorFlow (tested on versions 1.0.1 and 1.2.0)
tflearn
numpy (tested on versions 1.13.3 and 1.14.0)
scikit-learn (sklearn; tested on versions 0.18.1 and 0.19.0)

Quick start

To reproduce our results:

Unzip data.zip in ./data.
Run NeoDTI_cv.py to reproduce the cross-validation results of NeoDTI. Options are:
-d: The embedding dimension d. Default: 1024.
-n: The global norm threshold used for clipping. Default: 1.
-k: The dimension of the projection matrices. Default: 512.
-r: The positive-to-negative sample ratio. Two choices: ten sets positive:negative = 1:10; all treats all unknown DTIs as negative examples. Default: ten.
-t: The test scenario, i.e., the DTI matrix to be tested. Default: o. Choices are:
  o: mat_drug_protein.txt will be tested.
  homo: mat_drug_protein_homo_protein_drug.txt will be tested.
  drug: mat_drug_protein_drug.txt will be tested.
  disease: mat_drug_protein_disease.txt will be tested.
  sideeffect: mat_drug_protein_sideeffect.txt will be tested.
  unique: mat_drug_protein_drug_unique.txt will be tested.
Run NeoDTI_cv_with_aff.py to reproduce the cross-validation results of NeoDTI with additional compound-protein binding affinity data. Options are:
-d: The embedding dimension d. Default: 1024.
-n: The global norm threshold used for clipping. Default: 1.
-k: The dimension of the projection matrices. Default: 512.
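
The two choices for -r can be illustrated with a small, dependency-free sketch. The function name sample_pairs, the seed argument, and the toy matrix are hypothetical and are not taken from NeoDTI_cv.py; the script's own sampling may differ in detail. Under these assumptions, ten draws at most ten unknown pairs at random per known DTI, while all keeps every unknown pair as a negative example.

```python
import random


def sample_pairs(dti, mode="ten", seed=0):
    """Return (row, col, label) triples from a 0/1 drug x protein matrix.

    Hypothetical helper for illustration only.
    mode="ten": keep up to 10 randomly chosen unknown pairs per known pair.
    mode="all": keep every unknown pair as a negative example.
    """
    positives = [(i, j, 1) for i, row in enumerate(dti)
                 for j, v in enumerate(row) if v == 1]
    unknowns = [(i, j, 0) for i, row in enumerate(dti)
                for j, v in enumerate(row) if v == 0]

    if mode == "all":
        negatives = unknowns
    elif mode == "ten":
        rng = random.Random(seed)  # fixed seed so the draw is repeatable
        k = min(len(unknowns), 10 * len(positives))
        negatives = rng.sample(unknowns, k)
    else:
        raise ValueError("mode must be 'ten' or 'all'")
    return positives + negatives


# Toy 4 x 6 matrix with 2 known interactions and 22 unknown pairs.
toy_dti = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

print(len(sample_pairs(toy_dti, mode="ten")))  # 2 positives + 20 negatives = 22
print(len(sample_pairs(toy_dti, mode="all")))  # 2 positives + 22 negatives = 24
```

With the real DTI matrix, all yields a far more imbalanced evaluation set than ten, which is why the two settings are reported separately.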

Data description

drug.txt: list of drug names.
protein.txt: list of protein names.
disease.txt: list of disease names.
se.txt: list of side effect names.
drug_dict_map: a complete ID mapping between drug names and DrugBank ID.
protein_dict_map: a complete ID mapping between protein names and UniProt ID.
mat_drug_se.txt: Drug-SideEffect association matrix.
mat_protein_protein.txt: Protein-Protein interaction matrix.
mat_drug_drug.txt: Drug-Drug interaction matrix.
mat_protein_disease.txt: Protein-Disease association matrix.
mat_drug_disease.txt: Drug-Disease association matrix.
mat_protein_drug.txt: Protein-Drug interaction matrix.
mat_drug_protein.txt: Drug-Protein interaction matrix.
Similarity_Matrix_Drugs.txt: Drug and compound similarity scores based on the chemical structures of drugs (indices [0, 708) are drugs; the rest are compounds).
Similarity_Matrix_Proteins.txt: Protein similarity scores based on the primary sequences of proteins.
mat_drug_protein_homo_protein_drug.txt: Drug-Protein interaction matrix, in which DTIs with similar drugs (i.e., drug chemical structure similarities > 0.6) or similar proteins (i.e., protein sequence similarities > 40%) were removed (see the paper).
mat_drug_protein_drug.txt: Drug-Protein interaction matrix, in which DTIs with drugs sharing similar drug interactions (i.e., Jaccard similarities > 0.6) were removed (see the paper).
mat_drug_protein_sideeffect.txt: Drug-Protein interaction matrix, in which DTIs with drugs sharing similar side effects (i.e., Jaccard similarities > 0.6) were removed (see the paper).
mat_drug_protein_disease.txt: Drug-Protein interaction matrix, in which DTIs with drugs or proteins sharing similar diseases (i.e., Jaccard similarities > 0.6) were removed (see the paper).
mat_drug_protein_unique: Drug-Protein interaction matrix, in which known unique and non-unique DTIs were labelled as 3 and 1, respectively, and the corresponding unknown ones were labelled as 2 and 0 (see the paper for the definition of unique).
mat_compound_protein_bindingaffinity.txt: Compound-Protein binding affinity matrix (measured by negative logarithm of Ki).
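
Several of the filtered matrices above are defined through a Jaccard similarity threshold of 0.6. The sketch below shows how such a similarity could be computed between two binary profile rows (for example, two rows of a drug-side-effect matrix). The function names, the toy data, and the convention that two empty profiles have similarity 0 are assumptions made for illustration; the exact filtering procedure is described in the paper and is not reproduced here.

```python
def jaccard(row_a, row_b):
    """Jaccard similarity |A & B| / |A | B| between two 0/1 rows.

    Assumption: two all-zero rows are given similarity 0.0.
    """
    a = {i for i, v in enumerate(row_a) if v}
    b = {i for i, v in enumerate(row_b) if v}
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def similar_rows(mat, threshold=0.6):
    """Return index pairs (i, j), i < j, whose rows exceed the threshold."""
    n = len(mat)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if jaccard(mat[i], mat[j]) > threshold]


# Toy association matrix: 3 entities x 4 features.
toy = [
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
]

print(jaccard(toy[0], toy[1]))  # 3 shared / 4 in union = 0.75
print(similar_rows(toy))        # [(0, 1)]
```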

All entities (i.e., drugs, compounds, proteins, diseases and side effects) are organized in the same order across all files. The following files were extracted from https://github.com/luoyunan/DTINet: drug.txt, protein.txt, disease.txt, se.txt, drug_dict_map, protein_dict_map, mat_drug_se.txt, mat_protein_protein.txt, mat_drug_drug.txt, mat_protein_disease.txt, mat_drug_disease.txt, mat_protein_drug.txt, mat_drug_protein.txt, and Similarity_Matrix_Proteins.txt.
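
Because the ordering is shared, row i of a drug matrix corresponds to entry i of the drug name list, and column j of a drug-protein matrix to entry j of the protein name list. A minimal sketch of using that alignment is given below. The helper known_interactions and the placeholder names are hypothetical, and the names and matrix are passed in as plain Python lists rather than read from the files.

```python
def known_interactions(drug_names, protein_names, dti):
    """List (drug, protein) name pairs for every 1 in a 0/1 matrix.

    Hypothetical helper: relies on rows following the drug name order
    and columns following the protein name order.
    """
    if len(dti) != len(drug_names):
        raise ValueError("row count does not match number of drug names")
    if any(len(row) != len(protein_names) for row in dti):
        raise ValueError("column count does not match number of protein names")
    return [(drug_names[i], protein_names[j])
            for i, row in enumerate(dti)
            for j, v in enumerate(row) if v == 1]


# Placeholder names and a toy 2 x 3 matrix, not real entries from the data.
drugs = ["drug_A", "drug_B"]
proteins = ["protein_X", "protein_Y", "protein_Z"]
toy_matrix = [
    [0, 1, 0],
    [1, 0, 1],
]

for drug, protein in known_interactions(drugs, proteins, toy_matrix):
    print(drug, protein)
```

The shape checks are there because a mismatch between a name list and a matrix would otherwise silently pair names with the wrong rows or columns.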

Contacts

If you have any questions or comments, please feel free to email Fangping Wan (wfp15[at]tsinghua[dot]org[dot]cn) and/or Jianyang Zeng (zengjy321[at]tsinghua[dot]edu[dot]cn).


Comments

Covid-19 KG

Hi, is the dataset for your Covid-19 KG paper: Knowledge-Graph-Based Drug Repositioning against COVID-19 by Graph Convolutional Network with Attention Mechanism?

opened by SRL94

The code runs in an infinite loop

When I run NeoDTI_cv.py, each time the training gets to around step 2975 it starts again from step 0 and keeps cycling, so the final TXT result file is never written. How can I solve this problem? Thank you for your help. The corresponding output is as follows:

step 2975 total and dtiloss 5052731.0 119.068855
valid auc aupr, 0.9600577877248452 0.8524570307967804 test auc aupr 0.9484424972853837 0.849887829154117
WARNING:tensorflow:From D:\miniconda3\envs\python_env_3.6\lib\site-packages\tensorflow\python\util\tf_should_use.py:170: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use tf.global_variables_initializer instead.
step 0 total and dtiloss 235253940.0 1625.3303
valid auc aupr, 0.40999680272833844 0.085343755629563 test auc aupr 0.4201471685810438 0.07260654527827701
step 25 total and dtiloss 12311560.0 1506.531
valid auc aupr, 0.504126848792734 0.10535242762988771 test auc aupr 0.4953750481625483 0.08972111425033276

opened by Bella165

Statistics of the knowledge graph

Hi, thanks for releasing the source code. Could you please give the statistics of the knowledge graph? How many entities and edges are in the knowledge graph? Thank you.

opened by SRL94
