ProGen - (wip)
Implementation and replication of ProGen, "Language Modeling for Protein Generation", in PyTorch and JAX (the weights will be made easily transferable between the two)
Install
```bash
$ pip install progen-transformer
```
Usage
```python
from jax import random
from haiku import PRNGSequence
from progen_transformer import ProGen

model = ProGen(
    num_tokens = 256,
    dim = 512,
    seq_len = 1024,
    window_size = 256,      # local attention window size
    depth = 12,             # depth
    heads = 8,              # attention heads
    dim_head = 64,          # dimension per head
    ff_glu = True,          # use GLU in feedforward, from Noam's paper
    global_mlp_depth = 2    # last N global gmlp layers
)

rng = PRNGSequence(42)
seq = random.randint(next(rng), (1024,), 0, 256)

params = model.init(next(rng), seq)
logits = model.apply(params, next(rng), seq) # (1024, 256)
```
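The example above runs a single sequence. If you want a batch, one straightforward option (a minimal sketch built only on the `apply` signature shown above, not a repository feature) is to vectorize the forward pass with `jax.vmap`:

```python
import jax
from jax import random

# minimal sketch: map the single-sequence forward pass over a leading batch dimension,
# broadcasting the parameters and the rng key across the batch
batched_apply = jax.vmap(model.apply, in_axes = (None, None, 0))

seqs = random.randint(next(rng), (8, 1024), 0, 256)      # batch of 8 token sequences
batched_logits = batched_apply(params, next(rng), seqs)  # (8, 1024, 256)
```

The resulting callable can also be wrapped in `jax.jit` for compiled training or inference steps.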
Training from Uniref
Download UniRef50 from UniProt and place `uniref50.fasta` in the root directory, then run
```bash
$ python gen_train_data.py
```
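For reference, this step boils down to streaming protein records out of the FASTA file and writing them into training data. Below is a rough, hypothetical sketch of just the reading half (the actual script also handles shuffling, annotations, and the tfrecord output referenced in the todo list):

```python
# rough, hypothetical sketch of reading uniref50.fasta - not the actual gen_train_data.py
def read_fasta(path):
    header, seq = None, []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith('>'):
                if header is not None:
                    yield header, ''.join(seq)
                header, seq = line[1:], []
            else:
                seq.append(line)
    if header is not None:
        yield header, ''.join(seq)

for header, protein in read_fasta('./uniref50.fasta'):
    pass  # tokenize the sequence and write it out to the training data here
```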
You should see a lot of green in the terminal output if everything succeeds. Then start training with
```bash
$ python train.py
```
By default, the script will checkpoint and resume automatically, but if you wish to clear your progress and restart, just add the `--new` flag:
```bash
$ python train.py --new
```
Model checkpoints will be saved periodically to `./ckpts`.
Todo
- train tfrecords from google cloud storage path
- generate validation tfrecords
- add panda integration with GO annotations
- resume from correct place in tfrecord even if batch size is changed in between runs, display number of sequences processed (aiming for 1 billion)
- model parallelism with pjit
- bfloat16 on xla
- checkpoint and resume from a google cloud storage path
- config to annotation to template string with jinja2 - use jinja2 for wandb html logging as well
- manage experimental tracker state, and also allow ability to turn it off by piping to noop
- add a confirmation before clearing a folder for a `--new` run
- engineer mask in cross entropy loss so that padding can be reused as end-of-string token (see the sketch after this list)
- flip seq # annotation order with prob set in config
- keep N last checkpoints
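To illustrate the padding-as-end-of-string item above, here is a minimal, hypothetical sketch of such a masked cross entropy in JAX; the function name, the choice of `pad_id = 0`, and the exact masking rule are assumptions for illustration, not the repository's actual loss:

```python
import jax.numpy as jnp
from jax import nn

# hypothetical sketch (not the repository's loss): padding (token 0) doubles as the
# end-of-string token, so the first pad position after a sequence is scored and every
# pad position after it is masked out of the loss
def masked_cross_entropy(logits, labels, pad_id = 0):
    # logits: (seq_len, num_tokens), labels: (seq_len,)
    is_pad = (labels == pad_id)
    # positions strictly after the first pad are excluded from the loss
    after_first_pad = jnp.cumsum(is_pad.astype(jnp.int32)) > 1
    mask = (~after_first_pad).astype(logits.dtype)

    log_probs = nn.log_softmax(logits)
    nll = -jnp.take_along_axis(log_probs, labels[:, None], axis = -1)[:, 0]
    return (nll * mask).sum() / mask.sum()
```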
Citations
```bibtex
@misc{madani2020progen,
    title   = {ProGen: Language Modeling for Protein Generation},
    author  = {Ali Madani and Bryan McCann and Nikhil Naik and Nitish Shirish Keskar and Namrata Anand and Raphael R. Eguchi and Po-Ssu Huang and Richard Socher},
    year    = {2020},
    eprint  = {2004.03497},
    archivePrefix = {arXiv},
    primaryClass = {q-bio.BM}
}
```

```bibtex
@misc{su2021roformer,
    title   = {RoFormer: Enhanced Transformer with Rotary Position Embedding},
    author  = {Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu},
    year    = {2021},
    eprint  = {2104.09864},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}
```

```bibtex
@misc{shazeer2020glu,
    title   = {GLU Variants Improve Transformer},
    author  = {Noam Shazeer},
    year    = {2020},
    url     = {https://arxiv.org/abs/2002.05202}
}
```