MolRep: A Deep Representation Learning Library for Molecular Property Prediction

AI-Health @NSCC-gz

Last update: Dec 24, 2022

Summary

MolRep is a Python package for fairly measuring algorithmic progress on chemical property datasets. It currently provides a complete re-evaluation of 16 state-of-the-art deep representation models over 16 benchmark property datasets.

If you find this package useful, please cite our bioRxiv preprint for now.

Install & Usage

We provide a script to install the environment. You will need the conda package manager, which can be installed from here.

To install the required packages, follow these instructions (tested on a Linux terminal):

clone the repository

git clone https://github.com/Jh-SYSU/MolRep

cd into the cloned directory

cd MolRep

run the install script

source install.sh [<your_cuda_version>]

Here <your_cuda_version> is an optional argument that can be one of cpu, cu92, cu100, or cu101. If you do not provide a CUDA version, the script defaults to cpu. The script creates a virtual environment named MolRep with all the packages required to run our code. Important: do NOT run this command using bash instead of source!
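
As a rough aid for choosing the argument, the sketch below maps a CUDA toolkit version string to one of the four accepted values and falls back to cpu otherwise. The helper name cuda_tag is hypothetical; it is not part of MolRep or install.sh.

```python
# Hypothetical helper, not part of MolRep: choose the install.sh argument.
SUPPORTED_TAGS = ("cpu", "cu92", "cu100", "cu101")  # values accepted by install.sh


def cuda_tag(cuda_version=None):
    """Map a CUDA version such as "10.1" to a tag such as "cu101".

    Returns "cpu" when no version is given or when the resulting tag is
    not one of the supported values.
    """
    if not cuda_version:
        return "cpu"
    parts = cuda_version.strip().split(".")
    # Keep only major and minor, e.g. "10.1.243" -> "cu101".
    tag = "cu" + "".join(parts[:2])
    return tag if tag in SUPPORTED_TAGS else "cpu"


if __name__ == "__main__":
    print("source install.sh", cuda_tag("10.1"))
```

The fallback to cpu mirrors the default described above; a newer CUDA version that is not in the list is treated the same way rather than guessed at.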

Data

Data can be downloaded from Google Drive.

Current Dataset

Dataset	Tasks	Task type	#Molecules	Split	Metric	Reference
QM7	1	Regression	7160	Stratified	MAE	Wu et al.
QM8	12	Regression	21786	Random	MAE	Wu et al.
QM9	12	Regression	133885	Random	MAE	Wu et al.
ESOL	1	Regression	1128	Random	RMSE	Wu et al.
FreeSolv	1	Regression	642	Random	RMSE	Wu et al.
Lipophilicity	1	Regression	4200	Random	RMSE	Wu et al.
BBBP	1	Classification	2039	Scaffold	ROC-AUC	Wu et al.
Tox21	12	Classification	7831	Random	ROC-AUC	Wu et al.
SIDER	27	Classification	1427	Random	ROC-AUC	Wu et al.
ClinTox	2	Classification	1478	Random	ROC-AUC	Wu et al.
Liver injury	1	Classification	2788	Random	ROC-AUC	Xu et al.
Mutagenesis	1	Classification	6511	Random	ROC-AUC	Hansen et al.
hERG	1	Classification	4813	Random	ROC-AUC	Li et al.
MUV	17	Classification	93087	Random	PRC-AUC	Wu et al.
HIV	1	Classification	41127	Random	ROC-AUC	Wu et al.
BACE	1	Classification	1513	Random	ROC-AUC	Wu et al.
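
The two regression metrics in the table are the usual mean absolute error and root mean squared error. The plain-Python sketch below, using made-up numbers rather than any dataset above, shows how a single large error affects RMSE more than MAE. It is an illustration only and not MolRep's own metric code.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Illustrative values only, not taken from any dataset above.
y_true = [0.0, 1.0, 2.0, 3.0]
y_pred = [0.0, 1.0, 2.0, 7.0]  # one large error of 4.0

print(mae(y_true, y_pred))   # 1.0
print(rmse(y_true, y_pred))  # 2.0
```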

Methods

Current Methods

Self-/unsupervised Models

Methods	Descriptions	Reference
Mol2Vec	Mol2Vec is an unsupervised approach that learns vector representations of molecular substructures that point in similar directions for chemically related substructures.	Jaeger et al.
N-Gram graph	N-gram graph is a simple unsupervised representation for molecules that first embeds the vertices in the molecule graph and then constructs a compact representation for the graph by assembling the vertex embeddings in short walks in the graph.	Liu et al.
FP2Vec	FP2Vec is a molecular featurizer that represents a chemical compound as a set of trainable embedding vectors and combines them with a CNN model.	Jeon et al.
VAE	VAE is a framework for training two neural networks (encoder and decoder) to learn a mapping from high-dimensional molecular representation into a lower-dimensional space.	Kingma et al.

Sequence Models

Methods	Descriptions	Reference
BiLSTM	BiLSTM is an artificial recurrent neural network (RNN) architecture for encoding sequences from compound SMILES strings.	Hochreiter et al.
SALSTM	SALSTM combines a self-attention mechanism with an improved BiLSTM for molecule representation.	Zheng et al.
Transformer	Transformer is a network based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, to encode compound SMILES strings.	Vaswani et al.
MAT	MAT is a molecule attention transformer that uses inter-atomic distances and the molecular graph structure to augment the attention mechanism.	Maziarka et al.
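
All of the sequence models above consume SMILES strings, which first have to be split into tokens and mapped to integer ids. The sketch below shows one common way to do that with a regular expression. The token pattern and the function names tokenize and encode are assumptions made for illustration; MolRep's own preprocessing may differ.

```python
import re

# Assumed token pattern: bracket atoms, two-letter halogens, two-digit ring
# closures (%NN), then any other single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def encode(tokens, vocab):
    """Map tokens to integer ids, adding unseen tokens to vocab (0 is left for padding)."""
    return [vocab.setdefault(tok, len(vocab) + 1) for tok in tokens]


vocab = {}
tokens = tokenize("CC(=O)Cl")   # acetyl chloride
print(tokens)                   # ['C', 'C', '(', '=', 'O', ')', 'Cl']
print(encode(tokens, vocab))    # [1, 1, 2, 3, 4, 5, 6]
```

Matching Cl and Br before the single-character fallback keeps chlorine from being read as carbon followed by a stray l.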

Graph Models

Methods	Descriptions	Reference
DGCNN	DGCNN is a deep graph convolutional neural network that proposes a graph convolution model with a SortPooling layer, which sorts graph vertices in a consistent order, to learn the embedding of the molecular graph.	Zhang et al.
GraphSAGE	GraphSAGE is a framework for inductive representation learning on molecular graphs that generates low-dimensional representations for atoms and performs sum, mean, or max-pooling neighborhood aggregation to update the atom representation and molecular representation.	Hamilton et al.
GIN	GIN is the Graph Isomorphism Network that builds upon the limitations of GraphSAGE to capture different graph structures with the Weisfeiler-Lehman graph isomorphism test.	Xu et al.
ECC	ECC is an Edge-Conditioned Convolution Network that learns a different parameter for each edge label (bond type) on the molecular graph, and neighbor aggregation is weighted according to specific edge parameters.	Simonovsky et al.
DiffPool	DiffPool combines a differentiable graph encoder with an adaptive pooling mechanism that collapses nodes on the basis of a supervised criterion to learn the representation of molecular graphs.	Ying et al.
MPNN	MPNN is a message-passing graph neural network that learns the representation of a compound's molecular graph. It mainly focuses on obtaining effective vertex (atom) embeddings.	Gilmer et al.
D-MPNN	D-MPNN is another message-passing graph neural network that uses messages associated with directed edges (bonds) rather than those associated with vertices. It can make use of the bond attributes.	Yang et al.
CMPNN	CMPNN is a graph neural network that improves the molecular graph embedding by strengthening the message interactions between edges (bonds) and nodes (atoms).	Song et al.
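
Several of the graph models above share one core step: each atom updates its vector from an aggregate (sum, mean, or max) of its neighbors' vectors, and the atom vectors are then pooled into a molecule-level vector. The sketch below shows a single such step on a toy three-atom graph. It is deliberately simplified (no learned weights, no nonlinearity, made-up features) and is not the implementation used by any of the listed models.

```python
# Toy molecular graph for the heavy atoms of ethanol: C0 - C1 - O2.
# Feature vectors are made up for illustration.
features = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
neighbors = {0: [1], 1: [0, 2], 2: [1]}


def aggregate(vectors, how="sum"):
    """Combine vectors element-wise with sum, mean, or max."""
    columns = list(zip(*vectors))
    if how == "sum":
        return [sum(c) for c in columns]
    if how == "mean":
        return [sum(c) / len(c) for c in columns]
    if how == "max":
        return [max(c) for c in columns]
    raise ValueError(f"unknown aggregator: {how}")


def message_passing_step(features, neighbors, how="sum"):
    """Update each atom as its own vector plus the aggregated neighbor vectors.

    Real models apply learned weights and a nonlinearity here; both are
    omitted to keep the sketch short.
    """
    updated = {}
    for atom, vec in features.items():
        agg = aggregate([features[n] for n in neighbors[atom]], how)
        updated[atom] = [v + a for v, a in zip(vec, agg)]
    return updated


def readout(features):
    """Sum atom vectors into one molecule-level vector."""
    return aggregate(list(features.values()), "sum")


h = message_passing_step(features, neighbors, how="sum")
print(h)           # {0: [2.0, 0.0], 1: [2.0, 1.0], 2: [1.0, 1.0]}
print(readout(h))  # [5.0, 2.0]
```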

Training

To train a model with K-fold cross-validation, run 5-fold-training_example.ipynb.
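
For readers unfamiliar with the procedure, the sketch below shows what a shuffled K-fold split and a mean±std summary of per-fold scores might look like in plain Python. The function names are hypothetical, the scores are placeholders, and the notebook's actual splitting code may differ. Reading the ± values in the results table as a standard deviation over folds is likewise an assumption.

```python
import random
import statistics


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled K-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def summarize(scores):
    """Format per-fold scores as mean±std (sample standard deviation)."""
    return f"{statistics.mean(scores):.4f}±{statistics.stdev(scores):.4f}"


# 642 is the FreeSolv size from the dataset table; the scores are placeholders.
for train_idx, test_idx in k_fold_indices(642, k=5):
    assert not set(train_idx) & set(test_idx)  # folds never overlap

print(summarize([0.90, 0.92, 0.91, 0.93, 0.89]))  # 0.9100±0.0158
```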

Testing

To test a pretrained model, run testing-example.ipynb.

Results

Results on Classification Tasks.

Method	BBBP	Tox21	SIDER	ClinTox	MUV	HIV	BACE
Mol2Vec	0.9213±0.0052	0.8139±0.0081	0.6043±0.0061	0.8572±0.0054	0.1178±0.0032	0.8413±0.0047	0.8284±0.0023
N-Gram graph	0.9012±0.0385	0.8371±0.0421	0.6482±0.0437	0.8753±0.0077	0.1011±0.0000	0.8378±0.0034	0.8472±0.0057
FP2Vec	0.8076±0.0032	0.8578±0.0076	0.6678±0.0068	0.8834±0.0432	0.0856±0.0031	0.7894±0.0052	0.8129±0.0492
VAE	0.8378±0.0031	0.8315±0.0382	0.6493±0.0762	0.8674±0.0124	0.0794±0.0001	0.8109±0.0381	0.8368±0.0762
BiLSTM	0.8391±0.0032	0.8279±0.0098	0.6092±0.0303	0.8319±0.0120	0.0382±0.0000	0.7962±0.0098	0.8263±0.0031
SALSTM	0.8482±0.0329	0.8253±0.0031	0.6308±0.0036	0.8317±0.0003	0.0409±0.0000	0.8034±0.0128	0.8348±0.0019
Transformer	0.9610±0.0119	0.8129±0.0013	0.6017±0.0012	0.8572±0.0032	0.0716±0.0017	0.8372±0.0314	0.8407±0.0738
MAT	0.9620±0.0392	0.8393±0.0039	0.6276±0.0029	0.8777±0.0149	0.0913±0.0001	0.8653±0.0054	0.8519±0.0504
DGCNN	0.9311±0.0434	0.7992±0.0057	0.6007±0.0053	0.8302±0.0126	0.0438±0.0000	0.8297±0.0038	0.8361±0.0034
GraphSAGE	0.9630±0.0474	0.8166±0.0041	0.6403±0.0045	0.9116±0.0146	0.1145±0.0000	0.8705±0.0724	0.9316±0.0360
GIN	0.8746±0.0359	0.8178±0.0031	0.5904±0.0000	0.8842±0.0004	0.0832±0.0000	0.8015±0.0328	0.8275±0.0034
ECC	0.9620±0.0003	0.8677±0.0090	0.6750±0.0092	0.8862±0.0831	0.1308±0.0013	0.8733±0.0025	0.8419±0.0092
DiffPool	0.8732±0.0391	0.8012±0.0130	0.6087±0.0130	0.8345±0.0233	0.0934±0.0001	0.8452±0.0042	0.8592±0.0391
MPNN	0.9321±0.0312	0.8440±0.014	0.6313±0.0121	0.8414±0.0294	0.0572±0.0001	0.8032±0.0092	0.8493±0.0013
DMPNN	0.9562±0.0070	0.8429±0.0391	0.6378±0.0329	0.8692±0.0051	0.0867±0.0032	0.8137±0.0072	0.8678±0.0372
CMPNN	0.9854±0.0215	0.8593±0.0088	0.6581±0.0020	0.9169±0.0065	0.1435±0.0002	0.8687±0.0003	0.8932±0.0019
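
Note that, per the dataset table, MUV is scored with PRC-AUC while the other columns use ROC-AUC, which likely explains why the MUV values look so much lower. The sketch below computes both metrics on a toy, heavily imbalanced example. Average precision is used here as the PRC-AUC estimate, which is an assumption; MolRep may integrate the precision-recall curve differently.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive.

    Assumes at least one positive label.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy data: 1 active among 10 compounds, ranked third by the model.
labels = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]

print(roc_auc(labels, scores))            # 7/9, about 0.78
print(average_precision(labels, scores))  # 1/3, about 0.33
```

The same ranking scores about 0.78 on one metric and 0.33 on the other, so values in the MUV column should not be compared directly with the ROC-AUC columns.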

More results will be updated soon.

Comments

Feature extraction

Hi, could you please tell me how I can extract the embeddings learned by the model? How do I pass a .fasta file to the model and extract the representations?

Thanks

opened by xixinhy 0
Doesn't install with install.sh in the CPU version

Hello, I tried to install by following the instructions, but the environment from the yaml file cannot be installed and many errors are reported. There are also errors like "PyTorch is not compiled with the specified version of g++".

opened by mahimanzum 0
hyper-training_with_grid_search.ipynb fails because the MorganFP model is not in the config dict

I tried to run the notebook hyper-training_with_grid_search.ipynb, but it complains that MorganFP is not in the dictionary. In MolRep/util/config_from_dict.py, the model "MorganFP" is not mentioned at all.

opened by kopwei 0

