Molecular AutoEncoder in PyTorch

Overview

MolEncoder

Install

$ git clone https://github.com/cxhernandez/molencoder.git && cd molencoder
$ python setup.py install

Download Dataset

$ molencoder download --dataset chembl22

Train

$ molencoder train --dataset data/chembl22.h5

Add the --cuda flag to train on a GPU via CUDA. Add --cont to continue training a model from a checkpoint file.
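The README does not say what a checkpoint file contains. As a rough sketch of the general PyTorch pattern behind an option like --cont (not molencoder's actual file format), a checkpoint typically bundles the model's state_dict, the optimizer's state_dict and the epoch counter. The helper names and dictionary keys below are hypothetical.

```python
import torch
import torch.nn as nn


def save_checkpoint(path, model, optimizer, epoch):
    # Hypothetical layout: weights, optimizer state and the last finished epoch.
    torch.save(
        {
            "epoch": epoch,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
        },
        path,
    )


def load_checkpoint(path, model, optimizer):
    # Restore weights and optimizer state in place; return the epoch to resume from.
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model_state"])
    optimizer.load_state_dict(checkpoint["optimizer_state"])
    return checkpoint["epoch"] + 1


if __name__ == "__main__":
    import os
    import tempfile

    model = nn.Linear(4, 2)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
    save_checkpoint(path, model, optimizer, epoch=9)
    print(load_checkpoint(path, model, optimizer))  # 10
```

Saving the optimizer state alongside the weights matters for optimizers such as Adam, whose running moment estimates would otherwise restart from zero when training resumes.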

Pre-Trained Model

A pre-trained reference model is available in the ref/ directory. It currently reaches ~98% accuracy on the validation set after 100 epochs of training. If you manage to train a better model, feel free to submit a pull request!

TODO

  • Implement encoder
  • Implement decoder
  • Add download command
  • Add train command
  • Add encode command
  • Add decode command
  • Add pre-trained model

Shoutouts

Comments
  • GRU object has no attribute weight

    Hello,

    I tried to train the model, but I got this error:

    Traceback (most recent call last):
      File "/opt/anaconda3/bin/molencoder", line 9, in <module>
        load_entry_point('molencoder==0.1a0', 'console_scripts', 'molencoder')()
      File "/opt/anaconda3/lib/python3.5/site-packages/molencoder-0.1a0-py3.5.egg/molencoder/cli/main.py", line 32, in main
      File "/opt/anaconda3/lib/python3.5/site-packages/molencoder-0.1a0-py3.5.egg/molencoder/cli/main.py", line 37, in args_func
      File "/opt/anaconda3/lib/python3.5/site-packages/molencoder-0.1a0-py3.5.egg/molencoder/cli/parser_train.py", line 31, in func
      File "/opt/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 136, in apply
        module.apply(fn)
      File "/opt/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 137, in apply
        fn(self)
      File "/opt/anaconda3/lib/python3.5/site-packages/molencoder-0.1a0-py3.5.egg/molencoder/utils.py", line 180, in initialize_weights
      File "/opt/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 238, in __getattr__
        type(self).__name__, name))
    AttributeError: 'GRU' object has no attribute 'weight'

    I had to install PyTorch via conda. The version is pytorch-0.1.12, and I am using Python 3.5.2.

    Thanks

    opened by fvanclef 3
  • An unexpected error has occurred with osprey (version 0.1a)

    (dl3) dendisuhubdy@grok-machine:~/data$ molencoder train --dataset molecules/chembl22.h5 --cuda
    Epoch 0:
    t = 100, loss = 0.0532
    t = 200, loss = 0.0467
    t = 300, loss = 0.0473
    t = 400, loss = 0.0473
    t = 500, loss = 0.0468
    t = 600, loss = 0.0469
    t = 700, loss = 0.0467
    t = 800, loss = 0.0474
    t = 900, loss = 0.0470
    t = 1000, loss = 0.0456
    t = 1100, loss = 0.0481
    t = 1200, loss = 0.0443
    t = 1300, loss = 0.0485
    t = 1400, loss = 0.0474
    t = 1500, loss = 0.0465
    t = 1600, loss = 0.0469
    t = 1700, loss = 0.0460
    t = 1800, loss = 0.0453
    t = 1900, loss = 0.0476
    t = 2000, loss = 0.0466
    t = 2100, loss = 0.0454
    t = 2200, loss = 0.0471
    t = 2300, loss = 0.0462
    t = 2400, loss = 0.0458
    t = 2500, loss = 0.0464
    t = 2600, loss = 0.0475
    t = 2700, loss = 0.0474
    t = 2800, loss = 0.0442
    t = 2900, loss = 0.0463
    t = 3000, loss = 0.0455
    t = 3100, loss = 0.0458
    t = 3200, loss = 0.0457
    t = 3300, loss = 0.0470
    t = 3400, loss = 0.0467
    t = 3500, loss = 0.0450
    t = 3600, loss = 0.0473
    t = 3700, loss = 0.0471
    t = 3800, loss = 0.0471
    t = 3900, loss = 0.0461
    t = 4000, loss = 0.0497
    An unexpected error has occurred with osprey (version 0.1a), please
    consider sending the following traceback to the osprey GitHub issue tracker at:
            https://github.com/cxhernandez/molencoder/issues
    
    Traceback (most recent call last):
      File "/u/suhubdyd/.conda/envs/dl3/bin/molencoder", line 11, in <module>
        load_entry_point('molencoder==0.1a0', 'console_scripts', 'molencoder')()
      File "/u/suhubdyd/.conda/envs/dl3/lib/python3.6/site-packages/molencoder-0.1a0-py3.6.egg/molencoder/cli/main.py", line 33, in main
      File "/u/suhubdyd/.conda/envs/dl3/lib/python3.6/site-packages/molencoder-0.1a0-py3.6.egg/molencoder/cli/main.py", line 38, in args_func
      File "/u/suhubdyd/.conda/envs/dl3/lib/python3.6/site-packages/molencoder-0.1a0-py3.6.egg/molencoder/cli/parser_train.py", line 55, in func
      File "/u/suhubdyd/.conda/envs/dl3/lib/python3.6/site-packages/molencoder-0.1a0-py3.6.egg/molencoder/utils.py", line 89, in validate_model
    ValueError: too many values to unpack (expected 2)
    
    opened by dendisuhubdy 2
  • Use our own dataset?

    Hi,

    Thanks for providing this great code.

    I am trying to use my own dataset of SMILES strings, but I couldn't find the script for preprocessing these raw strings into input features (I notice the data you provided is already preprocessed). If you have one, could you point me to it? Thanks!

    opened by kexinhuang12345 1
  • Cannot find distribution for yaml in requirements.txt

    Hi,

    I just ran into a problem installing on Anaconda Python 3.6 on Ubuntu. After running python setup.py install, I got the following error:

    Installed /home/scott/anaconda3/lib/python3.6/site-packages/molencoder-0.1a0-py3.6.egg
    Processing dependencies for molencoder==0.1a0
    Searching for yaml
    Reading https://pypi.python.org/simple/yaml/
    No local packages or working download links found for yaml
    error: Could not find suitable distribution for Requirement.parse('yaml')
    

    Changing yaml to pyyaml in the requirements.txt file seems to have fixed it, though.

    Not sure if this is due to my python installation or a typo, but thought it might be useful to know.

    opened by Swarchal 1
  • misc + selu

    This PR includes miscellaneous changes to the code base. Of note:

    • [x] Replace ReLUs with SELU for improved performance
    • [x] Add Xavier initialization
    • [x] Add reference model with ~98% accuracy on the validation set
    opened by cxhernandez 0
  • 404 error downloading

    Hi,

    When executing

    molencoder download --dataset zinc12
    

    I am getting a 404 error:

      File "/home/sanchezg/oxford/tools/molencoder/molencoder/cli/main.py", line 37, in args_func
        args.func(args, p)
      File "/home/sanchezg/oxford/tools/molencoder/molencoder/cli/parser_download.py", line 62, in func
        urllib.request.urlretrieve(uri, fd.name, reporthook=update)
      File "/home/sanchezg/app/anaconda3/envs/deepMols/lib/python3.7/urllib/request.py", line 247, in urlretrieve
        with contextlib.closing(urlopen(url, data)) as fp:
      File "/home/sanchezg/app/anaconda3/envs/deepMols/lib/python3.7/urllib/request.py", line 222, in urlopen
        return opener.open(url, data, timeout)
      File "/home/sanchezg/app/anaconda3/envs/deepMols/lib/python3.7/urllib/request.py", line 531, in open
        response = meth(req, response)
      File "/home/sanchezg/app/anaconda3/envs/deepMols/lib/python3.7/urllib/request.py", line 641, in http_response
        'http', request, response, code, msg, hdrs)
      File "/home/sanchezg/app/anaconda3/envs/deepMols/lib/python3.7/urllib/request.py", line 569, in error
        return self._call_chain(*args)
      File "/home/sanchezg/app/anaconda3/envs/deepMols/lib/python3.7/urllib/request.py", line 503, in _call_chain
        result = func(*args)
      File "/home/sanchezg/app/anaconda3/envs/deepMols/lib/python3.7/urllib/request.py", line 649, in http_error_default
        raise HTTPError(req.full_url, code, msg, hdrs, fp)
    urllib.error.HTTPError: HTTP Error 404: NOT FOUND
    
    opened by rsanchezgarc 0
  • Filter dimension for the first conv1d operation

    Hi! Thanks for the implementation.

    I think there might be a problem with how the conv1d layer is implemented here:

    https://github.com/cxhernandez/molencoder/blob/1d7e208713d8e97683650b1bbc37a9fa298b4ce0/molencoder/models.py#L56

    I think the definitions of `o` and `i` are swapped. The input filter dimension should be the number of possible characters (35), and `c` should be the padded SMILES sequence length.

    Just want to confirm this detail. I can submit a PR if that is correct.

    opened by wwang2 0
  • Dimension error using pre-trained model

    RuntimeError: Error(s) in loading state_dict for MolEncoder: size mismatch for dense_1.0.weight: copying a param with shape torch.Size([435, 290]) from checkpoint, the shape in current model is torch.Size([435, 270]).

    This was confirmed by another user in the comments of this article, who suspects the charset might be different: https://iwatobipen.wordpress.com/2018/02/18/mol-encoder-with-pytorch/#comments

    opened by cclough 0
  • Dataset processing issue `Error: generator raised StopIteration`

    I was trying to download and preprocess the chembl22 dataset but got the following error:

    $ molencoder download --dataset chembl22
    molencoder/.venv/lib/python3.7/site-packages/pandas-1.0.3-py3.7-linux-x86_64.egg/pandas/compat/__init__.py:117: UserWarning: Could not import the lzma module. Your installed Python is incomplete. Attempting to use lzma compression will result in a RuntimeError.
      warnings.warn(msg)
    Downloading Dataset...
     98% |###########################################################################################################################################  | ETA:   0:00:00  18.0 MiB/s
    Loading Dataset...
    Processing Dataset...
    Saving Dataset...
    Error: generator raised StopIteration
    100% |#############################################################################################################################################| Time:  0:11:34 193.1 KiB/s
    

    Any idea how to fix it?

    opened by hassanmohsin 1
  • Issue with data_test

    When I try to run the code, I get the following error. It relates to the absence of a test_data group in the dataset and could possibly have something to do with the updated database. Could you help me fix it? Many thanks.

    (screenshot: Screen Shot 2019-08-12 at 19 38 26)

    opened by zhangsushen1992 5
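The AttributeError in the "GRU object has no attribute weight" report is consistent with a weight-initialization function, applied to every submodule through Module.apply, that reads m.weight unconditionally: nn.GRU has no single weight attribute and instead registers weight_ih_l{k}, weight_hh_l{k}, bias_ih_l{k} and bias_hh_l{k} for each layer k. Below is a minimal sketch of an initializer that handles both cases, assuming a recent PyTorch (the in-place nn.init functions with a trailing underscore did not exist in 0.1.12) and Xavier initialization as mentioned in the misc + selu PR. init_weights is a hypothetical name, not the repository's utils.initialize_weights.

```python
import torch.nn as nn


def init_weights(m):
    """Hypothetical Xavier initializer that also copes with nn.GRU."""
    if isinstance(m, (nn.Linear, nn.Conv1d)):
        # Single-matrix layers expose .weight and .bias directly.
        nn.init.xavier_uniform_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)
    elif isinstance(m, nn.GRU):
        # GRU has no .weight; it registers weight_ih_l{k}, weight_hh_l{k},
        # bias_ih_l{k} and bias_hh_l{k} for every layer k instead.
        for name, param in m.named_parameters():
            if name.startswith("weight"):
                nn.init.xavier_uniform_(param)
            elif name.startswith("bias"):
                nn.init.zeros_(param)


if __name__ == "__main__":
    model = nn.Sequential(nn.Linear(8, 16), nn.GRU(16, 32, num_layers=2))
    model.apply(init_weights)  # no AttributeError for the GRU submodule
```

Matching on name prefixes rather than exact names keeps the sketch working for stacked and bidirectional GRUs, whose extra parameters follow the same weight_/bias_ naming.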
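The "Use our own dataset?" question asks how raw SMILES strings become input features. The repository's own preprocessing is not shown on this page; a common approach for character-level molecular autoencoders is to pad each string to a fixed length and one-hot encode it against a character set. The sketch below assumes a space as the padding character and a maximum length of 120, both hypothetical choices, and uses only the standard library.

```python
PAD = " "  # hypothetical padding character; must not appear in any SMILES


def build_charset(smiles_list):
    # Every character seen in the corpus, sorted, with the pad symbol at index 0.
    chars = set("".join(smiles_list))
    chars.discard(PAD)
    return [PAD] + sorted(chars)


def one_hot_encode(smiles, charset, max_len=120):
    # Returns max_len rows of len(charset) 0/1 values (one row per position).
    if len(smiles) > max_len:
        raise ValueError(f"SMILES longer than max_len={max_len}: {smiles!r}")
    index = {ch: i for i, ch in enumerate(charset)}
    padded = smiles.ljust(max_len, PAD)
    return [[int(index[ch] == j) for j in range(len(charset))] for ch in padded]


def decode(matrix, charset):
    # Pick the most likely character per row and drop the trailing padding.
    return "".join(charset[row.index(max(row))] for row in matrix).rstrip(PAD)


if __name__ == "__main__":
    charset = build_charset(["CCO", "c1ccccc1"])
    encoded = one_hot_encode("CCO", charset, max_len=5)
    print(charset)                   # [' ', '1', 'C', 'O', 'c']
    print(decode(encoded, charset))  # CCO
```

The charset, including its order, has to match the one a model was trained with: it fixes both the number of columns and the meaning of each column, and a character missing from it raises a KeyError here.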
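On the filter-dimension question: nn.Conv1d takes input shaped (batch, in_channels, length), and its in_channels argument must equal the size of that channel axis. For a one-hot batch stored as (batch, sequence length, charset size), making each character a channel means setting in_channels to the charset size and transposing the last two axes. A small sketch using the 35-character charset from the report; the padded length of 120, the 9 output channels and the kernel size of 9 are illustrative assumptions, not values read from models.py.

```python
import torch
import torch.nn as nn

CHARSET_SIZE = 35  # number of distinct characters, as stated in the report
MAX_LEN = 120      # assumed padded SMILES length


def conv_over_characters(one_hot, conv):
    # one_hot: (batch, MAX_LEN, CHARSET_SIZE); Conv1d wants (batch, C_in, L).
    return conv(one_hot.transpose(1, 2))


conv = nn.Conv1d(in_channels=CHARSET_SIZE, out_channels=9, kernel_size=9)
batch = torch.zeros(4, MAX_LEN, CHARSET_SIZE)
out = conv_over_characters(batch, conv)
print(out.shape)  # torch.Size([4, 9, 112])
```

With no padding and stride 1, the output length is 120 - 9 + 1 = 112. Feeding the untransposed (4, 120, 35) tensor instead fails with a RuntimeError, because the layer then sees 120 channels where it expects 35.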
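For the pre-trained model's size-mismatch report, it can help to list every parameter whose shape differs between the checkpoint and the freshly built model before calling load_state_dict. The helper below is a hypothetical, standard-library-only sketch that works on plain {name: shape} mappings; with PyTorch such a mapping can be built as {k: tuple(v.shape) for k, v in state_dict.items()}, assuming the checkpoint file holds a bare state_dict.

```python
def shape_mismatches(checkpoint_shapes, model_shapes):
    """Compare two {parameter name: shape} mappings (hypothetical helper).

    Returns (mismatched, missing, unexpected):
      mismatched: names present in both but with different shapes
      missing:    names the model expects but the checkpoint lacks
      unexpected: names in the checkpoint that the model does not have
    """
    shared = checkpoint_shapes.keys() & model_shapes.keys()
    mismatched = {
        name: (checkpoint_shapes[name], model_shapes[name])
        for name in sorted(shared)
        if tuple(checkpoint_shapes[name]) != tuple(model_shapes[name])
    }
    missing = sorted(model_shapes.keys() - checkpoint_shapes.keys())
    unexpected = sorted(checkpoint_shapes.keys() - model_shapes.keys())
    return mismatched, missing, unexpected


if __name__ == "__main__":
    # Shapes taken from the error message in the report.
    mismatched, _, _ = shape_mismatches(
        {"dense_1.0.weight": (435, 290)},
        {"dense_1.0.weight": (435, 270)},
    )
    print(mismatched)  # {'dense_1.0.weight': ((435, 290), (435, 270))}
```

If dense_1.0 is an nn.Linear, as its name suggests, its weight is stored as (out_features, in_features). The reported mismatch then means the layer takes 290 input features in the checkpoint but 270 in the current model, so whatever feeds dense_1 produces a different flattened size. That is consistent with, though it does not prove, a different charset or padded length.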
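The `generator raised StopIteration` message in the chembl22 processing report is what Python 3.7 and later produce (PEP 479) when a StopIteration escapes from inside a generator body, for example from a bare next() call on an exhausted iterator; before 3.7 this silently ended the generator. Whether that is exactly what happens in molencoder's dataset code is an assumption, but the pattern and its usual fix look like this (function names are hypothetical):

```python
def take_buggy(rows, n):
    it = iter(rows)
    for _ in range(n):
        # If rows is exhausted early, next() raises StopIteration inside the
        # generator; since Python 3.7 that surfaces as a RuntimeError.
        yield next(it)


def take_fixed(rows, n):
    it = iter(rows)
    for _ in range(n):
        try:
            item = next(it)
        except StopIteration:
            return  # end the generator normally instead
        yield item


if __name__ == "__main__":
    print(list(take_fixed([1, 2], 5)))  # [1, 2]
    try:
        list(take_buggy([1, 2], 5))
    except RuntimeError as err:
        print(err)  # generator raised StopIteration
```

Code written for Python 3.5 or 3.6 that relied on the old behavior typically needs exactly this change, replacing the escaping StopIteration with a plain return, to run on 3.7.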
Owner
Carlos Hernández
Research Scientist @facebookresearch
Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models

Molecular Sets (MOSES): A benchmarking platform for molecular generation models Deep generative models are rapidly becoming popular for the discovery

MOSES 656 Dec 29, 2022
CoSMA: Convolutional Semi-Regular Mesh Autoencoder. From Paper "Mesh Convolutional Autoencoder for Semi-Regular Meshes of Different Sizes"

Mesh Convolutional Autoencoder for Semi-Regular Meshes of Different Sizes Implementation of CoSMA: Convolutional Semi-Regular Mesh Autoencoder arXiv p

Fraunhofer SCAI 10 Oct 11, 2022
PyTorch Autoencoders - Implementing a Variational Autoencoder (VAE) Series in Pytorch.

PyTorch Autoencoders Implementing a Variational Autoencoder (VAE) Series in Pytorch. Inspired by this repository Model List check model paper conferen

Subin An 8 Nov 21, 2022
This is a Pytorch implementation of the paper: Self-Supervised Graph Transformer on Large-Scale Molecular Data.

null 212 Dec 25, 2022
MADE (Masked Autoencoder Density Estimation) implementation in PyTorch

pytorch-made This code is an implementation of "Masked AutoEncoder for Density Estimation" by Germain et al., 2015. The core idea is that you can turn

Andrej 498 Dec 30, 2022
Recurrent Variational Autoencoder that generates sequential data implemented with pytorch

Pytorch Recurrent Variational Autoencoder Model: This is the implementation of Samuel Bowman's Generating Sentences from a Continuous Space with Kim's

Daniil Gavrilov 347 Nov 14, 2022
Differentiable molecular simulation of proteins with a coarse-grained potential

Differentiable molecular simulation of proteins with a coarse-grained potential This repository contains the learned potential, simulation scripts and

UCL Bioinformatics Group 44 Dec 10, 2022
Few-Shot Graph Learning for Molecular Property Prediction

Few-shot Graph Learning for Molecular Property Prediction Introduction This is the source code and dataset for the following paper: Few-shot Graph Lea

Zhichun Guo 94 Dec 12, 2022
SkipGNN: Predicting Molecular Interactions with Skip-Graph Networks (Scientific Reports)

SkipGNN: Predicting Molecular Interactions with Skip-Graph Networks Molecular interaction networks are powerful resources for the discovery. While dee

Kexin Huang 49 Oct 15, 2022
MolRep: A Deep Representation Learning Library for Molecular Property Prediction

MolRep: A Deep Representation Learning Library for Molecular Property Prediction Summary MolRep is a Python package for fairly measuring algorithmic p

AI-Health @NSCC-gz 83 Dec 24, 2022
Implementation of Learning Gradient Fields for Molecular Conformation Generation (ICML 2021).

[PDF] | [Slides] The official implementation of Learning Gradient Fields for Molecular Conformation Generation (ICML 2021 Long talk) Installation Inst

MilaGraph 117 Dec 9, 2022
Kaggle | 9th place (part of) solution for the Bristol-Myers Squibb – Molecular Translation challenge

Part of the 9th place solution for the Bristol-Myers Squibb – Molecular Translation challenge translating images containing chemical structures into I

Erdene-Ochir Tuguldur 22 Nov 30, 2022
source code for https://arxiv.org/abs/2005.11248 "Accelerating Antimicrobial Discovery with Controllable Deep Generative Models and Molecular Dynamics"

Accelerating Antimicrobial Discovery with Controllable Deep Generative Models and Molecular Dynamics This work will be published in Nature Biomedical

International Business Machines 71 Nov 15, 2022
Fast and scalable uncertainty quantification for neural molecular property prediction, accelerated optimization, and guided virtual screening.

Evidential Deep Learning for Guided Molecular Property Prediction and Discovery Ava Soleimany*, Alexander Amini*, Samuel Goldman*, Daniela Rus, Sangee

Alexander Amini 75 Dec 15, 2022
Code for the paper "JANUS: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design"

JANUS: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design This repository contains code for the paper: JA

Aspuru-Guzik group repo 55 Nov 29, 2022
DockStream: A Docking Wrapper to Enhance De Novo Molecular Design

DockStream Description DockStream is a docking wrapper providing access to a collection of ligand embedders and docking backends. Docking execution an

AstraZeneca - Molecular AI 72 Jan 2, 2023
Automatic Differentiation Multipole Moment Molecular Forcefield

Automatic Differentiation Multipole Moment Molecular Forcefield Performance notes On a single gpu, using waterbox_31ang.pdb example from MPIDplugin wh

null 4 Jan 7, 2022
3D-Transformer: Molecular Representation with Transformer in 3D Space

null 55 Dec 19, 2022
Python Rapid Artificial Intelligence Ab Initio Molecular Dynamics

null 14 Nov 6, 2022