Generative Models for Graph-Based Protein Design

Graph-Based Protein Design

This repo contains code for Generative Models for Graph-Based Protein Design by John Ingraham, Vikas Garg, Regina Barzilay and Tommi Jaakkola, NeurIPS 2019.

Our approach 'designs' protein sequences for target 3D structures via a graph-conditioned, autoregressive language model.
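Below is a minimal sketch of the two ingredients named above. It assumes each residue is represented by its C-alpha coordinate. It builds a k-nearest-neighbor graph over residues, then applies the autoregressive split used when decoding residue i: neighbors earlier in the decoding order (j < i) can expose their already-chosen amino acids, while later neighbors (j ≥ i) contribute only structural information. The function names `knn_graph` and `autoregressive_split`, the toy coordinates, and the value of k are illustrative assumptions, not code from struct2seq/.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def knn_graph(coords: Sequence[Coord], k: int) -> List[List[int]]:
    """For each residue i, return the indices of its k nearest residues (excluding i).

    Ties in distance are broken by index so the result is deterministic.
    """
    neighbors = []
    for i, ci in enumerate(coords):
        dists = sorted((math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def autoregressive_split(neighbors: List[List[int]], i: int) -> Tuple[List[int], List[int]]:
    """Split residue i's neighbors into already-decoded (j < i) and not-yet-decoded (j >= i)."""
    decoded = [j for j in neighbors[i] if j < i]
    pending = [j for j in neighbors[i] if j >= i]
    return decoded, pending


if __name__ == "__main__":
    # Hypothetical toy chain: five residues on a line with increasing gaps.
    coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (7.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
    graph = knn_graph(coords, k=2)
    print(graph)                           # [[1, 2], [0, 2], [1, 0], [2, 1], [3, 2]]
    print(autoregressive_split(graph, 1))  # ([0], [2])
```

With k = 2 on this toy chain, residue 1 sees residue 0 as already decoded and residue 2 as not yet decoded. A real model would use a larger k and learned node and edge features. This sketch only shows the graph and masking logic.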

Overview

  • struct2seq/ contains model code
  • experiments/ contains scripts for training and evaluating the model
  • data/ contains scripts for building and processing datasets in the paper

Requirements

  • Python >= 3.0
  • PyTorch >= 1.0
  • Numpy

Citation

@inproceedings{ingraham2019generative,
author = {Ingraham, John and Garg, Vikas K and Barzilay, Regina and Jaakkola, Tommi},
title = {Generative Models for Graph-Based Protein Design},
booktitle = {Advances in Neural Information Processing Systems},
year = {2019}
}
Comments
  • #Bug

    Dear author,

    In the neurips19-graph-protein-design repository on GitHub, the data directory contains a build_chain_dataset.py file and a file named mmtf_utils.py, but build_chain_dataset.py contains

    from util_mmtf import *

    which does not match the file name mmtf_utils.py.

    Is this a bug, or am I misunderstanding something?

    Many thanks, Jingru Gao

    opened by SharplessSword
  • Why invariance instead of equivariance?

    Dear author,

    In your paper, the local reference frame was used for invariant edge information. Could you explain why you used invariance instead of equivariance?

    Many thanks, Lixin Yang

    opened by colormeblue1013
  • Bugs in StructureLoader

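    There might be a minor bug in data.StructureLoader: some samples are never included in the loader, so the dataset size is not exactly the same as the one reported in the paper. In the loop below, when adding sample ix would push a batch past batch_size, the current batch is closed but ix itself is never added to the new batch, so it is silently dropped.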

    for ix in sorted_ix:
        size = self.lengths[ix]
        if size * (len(batch) + 1) <= self.batch_size:
            batch.append(ix)
            batch_max = size
        else:
            clusters.append(batch)
            batch, batch_max = [], 0
    
    opened by veghen
  • no 'GraphAttention' function implementation

    import torch.nn as nn

    class SequenceModel(nn.Module):
        def __init__(self, num_letters, hidden_dim, num_layers=3, vocab=20, top_k=30, num_positional_embeddings=16):
            """ Graph labeling network """
            super(SequenceModel, self).__init__()

            # Hyperparameters
            self.top_k = top_k
            self.hidden_dim = hidden_dim
            self.positional_embeddings = PositionalEmbeddings(num_positional_embeddings)

            # Embedding layers
            self.W_e = nn.Linear(num_positional_embeddings, hidden_dim, bias=True)
            self.W_s = nn.Embedding(vocab, hidden_dim)

            # Decoder
            self.decoder_layers = nn.ModuleList([
                GraphAttention(hidden_dim, hidden_dim*3)  # <- no implementation of GraphAttention found
                for _ in range(2 * num_layers)
            ])
            self.W_out = nn.Linear(hidden_dim, num_letters, bias=True)

            # Initialization
            for p in self.parameters():
                if p.dim() > 1:
                    nn.init.xavier_uniform_(p)
    
    opened by jiyanbio
  • no data/cath

    runfile('/Users/hmh2017/Downloads/neurips19-graph-protein-design-master/experiments/train_s2s.py', wdir='/Users/hmh2017/Downloads/neurips19-graph-protein-design-master/experiments')
    Number of parameters: 1526164
    Traceback (most recent call last):
      File "/Users/hmh2017/Downloads/neurips19-graph-protein-design-master/experiments/train_s2s.py", line 21, in <module>
        dataset = data.StructureDataset(args.file_data, truncate=None, max_length=500)
      File "../struct2seq/data.py", line 16, in __init__
        with open(jsonl_file) as f:
    FileNotFoundError: [Errno 2] No such file or directory: '../data/cath/chain_set.jsonl'

    opened by minghuihuang
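Regarding the util_mmtf / mmtf_utils mismatch raised in the first comment: if the module name is indeed the problem, the simplest change is to make the import in build_chain_dataset.py match the file that exists. When either name might be present, a small fallback is one option. The helper `import_first_available` below is hypothetical and not part of the repository. Unlike `from ... import *`, it returns a module object, so callers would write `mmtf.some_function` instead of using bare names.

```python
import importlib
from types import ModuleType
from typing import Iterable


def import_first_available(names: Iterable[str]) -> ModuleType:
    """Import and return the first module in `names` that exists.

    Only a missing module with exactly that name is skipped; if a module exists but
    fails because one of *its* dependencies is missing, that error is re-raised.
    """
    tried = []
    for name in names:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError as err:
            if err.name != name:
                raise
            tried.append(name)
    raise ImportError(f"none of these modules could be imported: {tried}")


if __name__ == "__main__":
    # Hypothetical usage in data/build_chain_dataset.py:
    #   mmtf = import_first_available(["mmtf_utils", "util_mmtf"])
    mod = import_first_available(["util_mmtf", "json"])  # falls back to a stdlib module here
    print(mod.__name__)  # json
```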
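For the invariance question, a small experiment shows what a local reference frame provides. It builds a frame at residue i from the unit vectors between consecutive C-alpha atoms, expresses a neighbor's displacement x_j - x_i in that frame, and checks that the result is unchanged when the whole structure is rotated and translated. The construction follows the general recipe of a backbone-derived orthonormal frame. The exact details, the function names (`local_frame`, `relative_position`, `rigid_transform`), and the coordinates are assumptions for illustration, not the repository's code. The sketch also does not answer why the authors preferred invariance.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a: Vec) -> Vec:
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def local_frame(x: Sequence[Vec], i: int) -> Tuple[Vec, Vec, Vec]:
    """Right-handed orthonormal frame at interior residue i from C-alpha atoms i-1, i, i+1.

    Assumes the three atoms are not collinear (otherwise the frame is undefined).
    """
    u_i = normalize(sub(x[i], x[i - 1]))
    u_next = normalize(sub(x[i + 1], x[i]))
    b = normalize(sub(u_i, u_next))     # bisector-like direction
    n = normalize(cross(u_i, u_next))   # normal to the local bend, orthogonal to b
    return b, n, cross(b, n)


def relative_position(x: Sequence[Vec], i: int, j: int) -> Tuple[float, ...]:
    """Displacement x_j - x_i expressed in residue i's local frame (rotation-invariant)."""
    d = sub(x[j], x[i])
    return tuple(dot(axis, d) for axis in local_frame(x, i))


def rigid_transform(x: Sequence[Vec], theta: float, phi: float, t: Vec) -> List[Vec]:
    """Rotate every point about z by theta, then about x by phi, then translate by t."""
    out = []
    for px, py, pz in x:
        px, py = px * math.cos(theta) - py * math.sin(theta), px * math.sin(theta) + py * math.cos(theta)
        py, pz = py * math.cos(phi) - pz * math.sin(phi), py * math.sin(phi) + pz * math.cos(phi)
        out.append((px + t[0], py + t[1], pz + t[2]))
    return out


if __name__ == "__main__":
    # Hypothetical C-alpha coordinates for a short, kinked fragment.
    ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (8.0, 4.0, 2.0)]
    moved = rigid_transform(ca, 0.7, -1.2, (10.0, -5.0, 2.5))
    before, after = relative_position(ca, 1, 3), relative_position(moved, 1, 3)
    print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(before, after)))  # True
```

By contrast, the raw displacement `sub(ca[3], ca[1])` rotates along with the structure. It is equivariant rather than invariant, which is the distinction the comment asks about.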
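In the StructureLoader comment, the batching loop drops the sample that overflows a batch, because the else branch never places ix in the next batch. A minimal corrected version is sketched below as a standalone function. The name `make_batches` and its arguments are hypothetical stand-ins for the loader's `self.lengths` and `self.batch_size`. The final flush is included because the excerpt does not show what happens after the loop.

```python
from typing import List, Sequence


def make_batches(lengths: Sequence[int], batch_size: int) -> List[List[int]]:
    """Group sample indices into batches whose padded cost stays within batch_size.

    Indices are visited in ascending length order, so the current `size` is always the
    longest sample in the batch and the padded cost is size * (number of samples).
    A sample longer than batch_size ends up alone in its own batch.
    """
    sorted_ix = sorted(range(len(lengths)), key=lambda ix: lengths[ix])
    clusters: List[List[int]] = []
    batch: List[int] = []
    for ix in sorted_ix:
        size = lengths[ix]
        if size * (len(batch) + 1) <= batch_size:
            batch.append(ix)
        else:
            if batch:
                clusters.append(batch)
            batch = [ix]  # start the next batch with this sample instead of dropping it
    if batch:
        clusters.append(batch)  # flush the last, partially filled batch
    return clusters


if __name__ == "__main__":
    print(make_batches([5, 3, 8, 2, 7], batch_size=12))  # [[3, 1], [0], [4], [2]]
```

On the same input, the excerpt's loop would silently drop indices 0 and 2. Here every index appears in exactly one batch.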
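The GraphAttention comment concerns a layer that SequenceModel instantiates but that the commenter could not find. Its actual definition is not shown here. To illustrate what a graph attention update over each residue's neighbors typically computes, the following hypothetical, parameter-free sketch in plain Python uses scaled dot-product scores between a node and its neighbors, a softmax over the neighborhood, and a weighted sum of neighbor features. The names `softmax` and `graph_attention_step` are illustrative. Learned query/key/value projections, multiple heads, and the `hidden_dim*3` input width seen in the excerpt are deliberately omitted.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def graph_attention_step(h: Sequence[Vector], neighbors: Sequence[Sequence[int]]) -> List[Vector]:
    """One attention update: node i attends over h[j] for j in neighbors[i].

    Identity query/key/value maps are an illustrative simplification; a trained layer
    would learn these projections. Every node is assumed to have at least one neighbor,
    as in a k-nearest-neighbor graph.
    """
    d = len(h[0])
    out = []
    for i, nbrs in enumerate(neighbors):
        scores = [sum(a * b for a, b in zip(h[i], h[j])) / math.sqrt(d) for j in nbrs]
        weights = softmax(scores)
        out.append([sum(w * h[j][c] for w, j in zip(weights, nbrs)) for c in range(d)])
    return out


if __name__ == "__main__":
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    nbrs = [[1, 2], [0, 2], [0, 1]]
    print(graph_attention_step(h, nbrs))
```

Each output row is a convex combination of the node's neighbor features. Neighbors more similar to the node (higher dot product) receive more weight.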
Owner
John Ingraham
Postdoc
Implementation of the GVP-Transformer, which was used in the paper "Learning inverse folding from millions of predicted structures" for de novo protein design alongside Alphafold2

Phil Wang 19 May 6, 2022
Codes and models for the paper "Learning Unknown from Correlations: Graph Neural Network for Inter-novel-protein Interaction Prediction".

Ursa Zrimsek 2 Dec 14, 2022
Graph-based community clustering approach to extract protein domains from a predicted aligned error matrix

Given a predicted aligned error matrix corresponding to an AlphaFold2 model, it returns a series of lists of residue indices, where each list corresponds to a set of residues that cluster together into a pseudo-rigid domain.

Tristan Croll 24 Nov 23, 2022
Predicting lncRNA–protein interactions based on graph autoencoders and collaborative training

zhanglabNKU 1 Nov 29, 2022
DeepCAD: A Deep Generative Network for Computer-Aided Design Models

DeepCAD: this repository provides source code for the paper "DeepCAD: A Deep Generative Network for Computer-Aided Design Models" by Rundi Wu, Chang Xiao, et al.

Rundi Wu 85 Dec 31, 2022
Unofficial TensorFlow implementation of Protein Interface Prediction using Graph Convolutional Networks.

YeongHyeon Park 9 Oct 25, 2022
This is the repository for the AAAI 21 paper [Contrastive and Generative Graph Convolutional Networks for Graph-based Semi-Supervised Learning].

null 12 Oct 28, 2022
Uni-Fold: Training your own deep protein-folding models

DP Technology 187 Jan 4, 2023
RITA is a family of autoregressive protein models, developed by LightOn in collaboration with the OATML group at Oxford and the Debora Marks Lab at Harvard.

RITA: a Study on Scaling Up Generative Protein Sequence Models.

LightOn 69 Dec 22, 2022
Single Red Blood Cell Hydrodynamic Traps Via the Generative Design

Rbc-traps-generative-design: generative design of single red blood cell hydrodynamic traps using the GEFEST framework.

Natural Systems Simulation Lab 4 Jun 16, 2022
A Protein-RNA Interface Predictor Based on Semantics of Sequences

PRIP: A Protein-RNA Interface Predictor Based on Semantics of Sequences. Installation requirements include gensim==3.8.3, matplotlib==3.1.3 and xgboost==1.3.3.

李优 0 Mar 25, 2022
This project is based on RIFE and aims to make RIFE more practical for users by adding various features and designing new models.

Because improving the PSNR index does not necessarily improve subjective visual quality, the authors keep this work independent of their academic research.

hzwer 190 Jan 8, 2023
RefineGNN - Iterative refinement graph neural network for antibody sequence-structure co-design (RefineGNN)

Wengong Jin 83 Dec 31, 2022
A denoising diffusion probabilistic model (DDPM) tailored for conditional generation of protein distograms

Implementation of Denoising Diffusion Probabilistic Model in Pytorch.

Phil Wang 108 Nov 23, 2022
7th place solution of Human Protein Atlas - Single Cell Classification on Kaggle

null 8 Jul 9, 2021
Implementation and replication of ProGen, Language Modeling for Protein Generation, in Jax

ProGen (WIP): implementation and replication of ProGen in Pytorch and Jax.

Phil Wang 71 Dec 1, 2022
Replication attempt for the Protein Folding Model

RGN2-Replica (WIP): intended to eventually become an unofficial, working Pytorch implementation of RGN2, a state-of-the-art model for MSA-less protein folding.

Eric Alcaide 36 Nov 29, 2022
A geometric deep learning pipeline for predicting protein interface contacts.

null 44 Dec 30, 2022
A package to predict protein inter-residue geometries from sequence data

trRosetta: this package is part of the trRosetta protein structure prediction protocol.

Ivan Anishchenko 185 Jan 7, 2023