3D-Transformer: Molecular Representation with Transformer in 3D Space

Overview

Introduction

This is the repository for 3D-Transformer.

Dataset

Quantum Chemistry

QM7 Dataset
Download (Official Website): http://quantum-machine.org/datasets/
Description (DeepChem): https://deepchem.readthedocs.io/en/latest/api_reference/moleculenet.html#qm7-datasets

QM8 Dataset
Download (DeepChem): https://github.com/deepchem/deepchem/blob/master/deepchem/molnet/load_function/qm8_datasets.py
Description (DeepChem): https://deepchem.readthedocs.io/en/latest/api_reference/moleculenet.html?highlight=qm7#qm8-datasets

QM9 Dataset
Download (Atom3D): https://www.atom3d.ai/smp.html
Download (Official Website): https://ndownloader.figshare.com/files/3195389
Download (MPNN Supplement): https://drive.google.com/file/d/0Bzn36Iqm8hZscHFJcVh5aC1mZFU/view?resourcekey=0-86oyPL3e3l2ZTiRpwtPDBg
Download (Schnet): https://schnetpack.readthedocs.io/en/stable/tutorials/tutorial_02_qm9.html#Loading-the-data

GEOM-QM9 Dataset
Download (Official Website): https://doi.org/10.7910/DVN/JNGTDF
Tutorial of usage: https://github.com/learningmatter-mit/geom/blob/master/tutorials/01_loading_data.ipynb

Material Science

COREMOF
Download (Google Drive): https://drive.google.com/drive/folders/1DMmjL-JNgUWQDU-52_DT_cX-XWNEEi-W?usp=sharing
Reproduction of PointNet++: python coremof/reproduce/main_pn_coremof.py
Reproduction of MPNN: python coremof/reproduce/main_mpnn_coremof.py
Reproduction of SchNet:
(1) load COREMOF: python coremof/reproduce/main_sch_coremof.py
(2) run SchNet: spk_run.py train schnet custom ../../coremof.db ./coremof --split 900 100 --property LCD --features 16 --batch_size 20 --cuda
(Note: the official SchNet script cannot be reproduced successfully because of memory limitations.)

Protein

PDBbind
Atom3d: https://github.com/drorlab/atom3d
(1) download 'split-by-sequence-identity-30' dataset from https://www.atom3d.ai/
(2) install Atom3D: pip install atom3d
(3) preprocess the data by running python pdbbind/dataloader_pdb.py

Models

models/tr_spe: 3D-Transformer with Sinusoidal Position Encoding (SPE)
models/tr_cpe: 3D-Transformer with Convolutional Position Encoding (CPE)
models/tr_msa: 3D-Transformer with Multi-scale Self-attention (MSA)
models/tr_afps: 3D-Transformer with Attentive Farthest Point Sampling (AFPS)
models/tr_full: 3D-Transformer with CPE + MSA + AFPS
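
For readers unfamiliar with farthest point sampling, the following is a minimal sketch of the plain (non-attentive) greedy algorithm in NumPy. It is not taken from this repository: the function name `farthest_point_sampling`, its parameters, and the toy coordinates are illustrative assumptions, and the attentive variant in models/tr_afps may differ in how it scores candidate points.

```python
import numpy as np


def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: return k indices of points that are mutually far apart.

    Hypothetical helper for illustration only; not the repository's AFPS.
    """
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of points")

    selected = [start]
    # Distance from every point to its nearest already-selected point.
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(k - 1):
        # Pick the point that is farthest from the current selection.
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        new_dist = np.linalg.norm(points - points[nxt], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return selected


# Toy coordinates (assumed values, four atoms in 3D).
coords = np.array([[0, 0, 0], [1, 0, 0], [0, 5, 0], [10, 0, 0]], dtype=float)
picked = farthest_point_sampling(coords, k=3)
```

Starting from index 0, each step adds the point whose distance to the already-selected set is largest, so the chosen subset spreads over the structure instead of clustering.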

Comments
  • type of dist_bar in tr_msa

    Hi,

    I am having difficulty calling the build_model function from the tr_msa module. Can you explain what the dist_bar parameter means and how it should be constructed?

    Assuming it is a list of distances between the atoms in pos from your example, I do the following:

    def dist(l, r):
      return ((l[0]-r[0])**2 + (l[1]-r[1])**2 + (l[2]-r[2])**2)**0.5
    
    dst = [[dist(l, r) for r in data] for l in data]
    

    , where data is a 4x3 List[List[float]] of coordinates from the pos tensor.

    Then:

    from model.tr_msa import build_model
    
    model = build_model(N, n, dst).cuda()
    

    The returned model accepts the third argument dist as a tensor, not a list, so I do the following:

    tens_dist = torch.tensor(dst).cuda()
    out = model(x, mask, tens_dist)
    

    , where x and mask are as in the example code. The model computation fails with the error:

    RuntimeError: Expected 4-dimensional input for 4-dimensional weight [8, 1, 1, 1], but got 3-dimensional input of size [4, 1, 4] instead
    
    opened by SimonTsirikov 5
  • incompatible matrix sizes when using tr_all

    Hi,

    With the updated version I can successfully run tr_spe, tr_msa and tr_afps separately, but not tr_all:

    Error log
    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    [<ipython-input-115-397097d7905a>](https://localhost:8080/#) in <module>()
          3 
          4
    ----> 5 out = model(x.long(), mask, dist)
          6 
          7 print(out)
    
    8 frames
    [/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *input, **kwargs)
       1100         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1101                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1102             return forward_call(*input, **kwargs)
       1103         # Do not call functions when jit is used
       1104         full_backward_hooks, non_full_backward_hooks = [], []
    
    [/content/model/tr_all.py](https://localhost:8080/#) in forward(self, src, src_mask, dist)
         29 
         30     def forward(self, src, src_mask, dist):
    ---> 31         return self.generator(self.encoder(self.src_embed(src), dist, src_mask)[:, 0])
    
    [/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *input, **kwargs)
       1100         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1101                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1102             return forward_call(*input, **kwargs)
       1103         # Do not call functions when jit is used
       1104         full_backward_hooks, non_full_backward_hooks = [], []
    
    [/content/model/tr_spe.py](https://localhost:8080/#) in forward(self, x)
        269 
        270     def forward(self, x):
    --> 271         return self.proj(x)
    
    [/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *input, **kwargs)
       1100         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1101                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1102             return forward_call(*input, **kwargs)
       1103         # Do not call functions when jit is used
       1104         full_backward_hooks, non_full_backward_hooks = [], []
    
    [/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py](https://localhost:8080/#) in forward(self, input)
        139     def forward(self, input):
        140         for module in self:
    --> 141             input = module(input)
        142         return input
        143 
    
    [/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *input, **kwargs)
       1100         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1101                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1102             return forward_call(*input, **kwargs)
       1103         # Do not call functions when jit is used
       1104         full_backward_hooks, non_full_backward_hooks = [], []
    
    [/usr/local/lib/python3.7/dist-packages/torch/nn/modules/linear.py](https://localhost:8080/#) in forward(self, input)
        101 
        102     def forward(self, input: Tensor) -> Tensor:
    --> 103         return F.linear(input, self.weight, self.bias)
        104 
        105     def extra_repr(self) -> str:
    
    [/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py](https://localhost:8080/#) in linear(input, weight, bias)
       1846     if has_torch_function_variadic(input, weight, bias):
       1847         return handle_torch_function(linear, (input, weight, bias), input, weight, bias=bias)
    -> 1848     return torch._C._nn.linear(input, weight, bias)
       1849 
       1850 
    
    RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x1 and 512x512)
    

    I tried to investigate and found that tr_afps, and consequently tr_all, have a slightly different final part of the encoder layers than the others (obtained with print(model)):

    AFPS
    (norm): LayerNorm()
    (dropout): Dropout(p=0.1, inplace=False)
    (sublayer): SublayerConnection(
      (norm): LayerNorm()
      (dropout): Dropout(p=0.1, inplace=False)
    )
    
    others
    (sublayer): ModuleList(
      (0): SublayerConnection(
        (norm): LayerNorm()
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (1): SublayerConnection(
        (norm): LayerNorm()
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
    

    I don't know whether this really matters or how to fix it. The problem could be elsewhere, but I suspect there is an issue in the connection between modules.

    opened by SimonTsirikov 1
  • about Molformer

    Hello, thanks for your hard work. I am reading your new work on the Molformer project, but I cannot access the project at https://github.com/smiles724/Molformer.

    opened by JWSunny 1
  • replicated script request

    Do you plan to share a replication script, as PA-Graph-Transformer did? For example, a train.py script for the QM7 dataset that achieves a score close to your best of 43.9.

    opened by veya2ztn 1
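
Regarding the first comment (dist_bar in tr_msa): the pairwise distance matrix built there with nested loops can also be computed in vectorized form. The sketch below is not taken from the repository; the function name `pairwise_distances` and the example coordinates are illustrative assumptions. The shape reported in that error message ([4, 1, 4] where a 4-dimensional input was expected) is consistent with a missing leading batch dimension, so the sketch also shows how to add one. Whether the model actually expects a tensor of shape (batch, N, N) is an assumption that should be checked against the model code.

```python
import numpy as np


def pairwise_distances(pos):
    """Euclidean distance matrix for N points given as an (N, 3) array.

    Hypothetical helper for illustration; not part of the repository.
    """
    pos = np.asarray(pos, dtype=float)          # (N, 3)
    diff = pos[:, None, :] - pos[None, :, :]    # (N, N, 3) via broadcasting
    return np.sqrt((diff ** 2).sum(axis=-1))    # (N, N)


# Example coordinates for four atoms (assumed values).
pos = [[0, 0, 0], [3, 4, 0], [0, 0, 1], [1, 1, 1]]
dist = pairwise_distances(pos)

# Add a leading batch dimension: (N, N) -> (1, N, N).
batched = dist[None, ...]
```

If the model takes PyTorch tensors, `torch.from_numpy(batched).float()` converts the array while keeping the (1, N, N) shape.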
Owner
NLP learner and researcher, a master's student at Columbia University