Code for the Fold2Seq paper from ICML 2021

Overview

[ICML2021] Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design

Figure: Fold2Seq architecture

Environment file: environment.yml
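According to the comments below, running conda create --name fold2seq --file environment.yml fails with PackagesNotFoundError. The command quoted from the top of that file and its name=version=build entries suggest it is a conda list --export spec, in which the build string pypi_0 marks packages that were installed with pip, so no conda channel can provide them. A possible workaround, sketched here under that assumption, is to split the file into a conda spec and a pip requirements list. The function names and the output file names (conda-spec.txt, requirements.txt) are illustrative, not part of the repository.

```python
from pathlib import Path


def split_spec(lines):
    """Split conda spec lines into (conda_specs, pip_requirements).

    ASSUMPTION: entries look like name=version=build (conda list --export)
    or name==version=build (as echoed by the solver error), optionally
    prefixed with "- ". A build string of pypi_0 marks a pip-installed package.
    """
    conda_specs, pip_reqs = [], []
    for raw in lines:
        line = raw.strip()
        if line.startswith("- "):
            line = line[2:].strip()
        # Skip blank lines and header comments such as "# platform: linux-64".
        if not line or line.startswith("#"):
            continue
        # Dropping empty fields makes "a==1=b" and "a=1=b" parse identically.
        parts = [p for p in line.split("=") if p]
        name = parts[0]
        version = parts[1] if len(parts) > 1 else None
        build = parts[2] if len(parts) > 2 else None
        if build == "pypi_0":
            pip_reqs.append(f"{name}=={version}" if version else name)
        else:
            # Build strings are platform/channel specific; keep name=version only.
            conda_specs.append(f"{name}={version}" if version else name)
    return conda_specs, pip_reqs


def write_split(spec_path, out_dir="."):
    """Write conda-spec.txt and requirements.txt into out_dir."""
    conda_specs, pip_reqs = split_spec(Path(spec_path).read_text().splitlines())
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "conda-spec.txt").write_text("\n".join(conda_specs) + "\n")
    (out / "requirements.txt").write_text("\n".join(pip_reqs) + "\n")
    return conda_specs, pip_reqs
```

The two files could then be used with something like conda create --name fold2seq -c pytorch -c conda-forge -c bioconda --file conda-spec.txt (mmseqs2 is distributed through bioconda), followed by pip install -r requirements.txt inside the activated environment. Dropping build strings keeps the spec portable but loses the CUDA variant encoded in entries such as pytorch==1.7.1=py3.8_cuda10.2.89_cudnn7.6.5_0, so the CUDA build may need to be requested explicitly. The PyTorch Geometric extensions (torch-scatter, torch-sparse, torch-cluster, torch-spline-conv) generally also need wheels built for the installed PyTorch and CUDA versions.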

Data and Feature Generation:

  • Go to data/ and check the README there.
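The comments below indicate that the training data (for example ../data/domain_dict_full.pkl) is expected to carry fields named emb, padding and foldclass in addition to the coords and seq produced by the feature-generation code. The exact layout is not documented here, so this sketch assumes a pickled top-level dict that maps each domain ID to a dict of fields; the field list and the function names are hypothetical and should be adjusted once the real layout is known. It only reports which fields are missing, which can help tell whether a custom dataset is complete before training.

```python
import pickle
from pathlib import Path

# ASSUMPTION: field names taken from a user report in the comments,
# not from the repository's own documentation.
EXPECTED_FIELDS = ("seq", "coords", "emb", "padding", "foldclass")


def missing_fields(domain_dict, expected=EXPECTED_FIELDS):
    """Return {domain_id: [missing field names]} for incomplete entries.

    ASSUMPTION: domain_dict maps domain IDs to per-domain dicts.
    """
    report = {}
    for domain_id, entry in domain_dict.items():
        if not isinstance(entry, dict):
            # Entry does not match the assumed layout: flag every field.
            report[domain_id] = list(expected)
            continue
        absent = [field for field in expected if field not in entry]
        if absent:
            report[domain_id] = absent
    return report


def check_pickle(path, expected=EXPECTED_FIELDS):
    """Load a pickled domain dictionary and report missing fields.

    pickle can execute arbitrary code while loading; only open trusted files.
    """
    with Path(path).open("rb") as fh:
        data = pickle.load(fh)
    if not isinstance(data, dict):
        raise TypeError(f"expected a dict at the top level, got {type(data).__name__}")
    return missing_fields(data, expected)
```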

How to train the model:

  • Go to src/ and run:

python train.py --data_path $path_to_the_data_dictionary --lr $learning_rate --model_save $path_to_the_saved_model
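The training call can also be scripted, for example to compare a few learning rates. This sketch only assembles and runs the command above with its three documented flags (--data_path, --lr, --model_save); the helper names, the dry_run switch and the use of sys.executable (so train.py runs under the interpreter of the active environment) are illustrative choices. Relative paths are interpreted from src/, since the command is run from there.

```python
import shlex
import subprocess
import sys


def build_train_command(data_path, lr, model_save, script="train.py"):
    """Assemble the documented train.py invocation as an argument list."""
    return [sys.executable, script,
            "--data_path", str(data_path),
            "--lr", str(lr),
            "--model_save", str(model_save)]


def run_train(data_path, lr, model_save, src_dir="src", dry_run=True):
    """Print the command (dry run) or execute it from src_dir."""
    cmd = build_train_command(data_path, lr, model_save)
    if dry_run:
        print(shlex.join(cmd))  # shlex.join needs Python 3.8+
        return None
    # check=True raises CalledProcessError if training exits with an error.
    return subprocess.run(cmd, cwd=src_dir, check=True)


if __name__ == "__main__":
    # Hypothetical sweep: placeholder paths and learning rates, not values from the paper.
    for lr in (1e-3, 1e-4):
        run_train("path/to/data", lr, f"path/to/model_lr{lr}", dry_run=True)
```

Giving each run its own --model_save path keeps one sweep entry from overwriting another's checkpoint; setting dry_run=False actually launches training.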

How to generate sequences:

  • Go to src/ and run:

python inference.py --trained_model $path_to_the_trained_model --output $path_to_the_output_file --data_path $path_to_the_data_dictionary
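Before launching inference it can save time to check that the inputs exist. The following sketch resolves paths relative to src/ (where the command above is run), fails early if the trained model or the data file is missing, creates the parent directory of the output file, and then runs inference.py with the documented flags (--trained_model, --output, --data_path). The helper names are hypothetical, and nothing here assumes a particular model or output file format.

```python
import subprocess
import sys
from pathlib import Path


def build_inference_command(trained_model, output, data_path):
    """Assemble the documented inference.py invocation as an argument list."""
    return [sys.executable, "inference.py",
            "--trained_model", str(trained_model),
            "--output", str(output),
            "--data_path", str(data_path)]


def run_inference(trained_model, output, data_path, src_dir="src"):
    """Check inputs, prepare the output directory, then run inference from src_dir."""
    src = Path(src_dir)
    # Relative paths are resolved against src/; absolute paths are kept as-is.
    for label, path in (("trained model", trained_model), ("data", data_path)):
        if not (src / path).exists():
            raise FileNotFoundError(f"{label} not found: {src / path}")
    (src / output).parent.mkdir(parents=True, exist_ok=True)
    cmd = build_inference_command(trained_model, output, data_path)
    # check=True turns a non-zero exit status into CalledProcessError.
    return subprocess.run(cmd, cwd=src, check=True)
```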

Fold2Seq-generated structures compared with natural structures:

Figure: Fold2Seq structures

Comments
  • Error creating from environment.yml

    It seems the environment file is broken when following the command at the top of the environment.yml file:

    $ conda create --name fold2seq --file environment.yml
    Collecting package metadata (current_repodata.json): done
    Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
    Collecting package metadata (repodata.json): done
    Solving environment: failed
    
    PackagesNotFoundError: The following packages are not available from current channels:
    
      - rdflib==5.0.0=pypi_0
      - decorator==4.4.2=pypi_0
      - pandas==1.2.3=pypi_0
      - libidn2==2.3.0=h516909a_0
      - mmseqs2==13.45111=h95f258a_1
      - jinja2==2.11.3=pypi_0
      - libgcc-ng==9.3.0=h2828fa1_18
      - pytorch==1.7.1=py3.8_cuda10.2.89_cudnn7.6.5_0
      - torchaudio==0.7.2=py38
      - llvmlite==0.35.0=pypi_0
      - torch-sparse==0.6.9=pypi_0
      - idna==2.10=pypi_0
      - esm==0.2.0=pypi_0
      - tokenizers==0.8.0rc4=pypi_0
      - scipy==1.5.4=pypi_0
      - networkx==2.5=pypi_0
      - sacremoses==0.0.43=pypi_0
      - markupsafe==1.1.1=pypi_0
      - packaging==20.9=pypi_0
      - torch-geometric==1.6.3=pypi_0
      - biopython==1.77=pypi_0
      - h5py==3.2.0=pypi_0
      - matplotlib==3.3.1=pypi_0
      - cycler==0.10.0=pypi_0
      - isodate==0.6.0=pypi_0
      - regex==2020.11.13=pypi_0
      - pyparsing==2.4.7=pypi_0
      - torchvision==0.8.2=py38_cu102
      - kiwisolver==1.2.0=pypi_0
      - bzip2==1.0.8=h7f98852_4
      - tqdm==4.58.0=pypi_0
      - pytz==2021.1=pypi_0
      - libstdcxx-ng==9.3.0=h6de172a_18
      - googledrivedownloader==0.4=pypi_0
      - transformers==3.0.0=pypi_0
      - torch-scatter==2.0.6=pypi_0
      - python-dateutil==2.8.1=pypi_0
      - sentencepiece==0.1.95=pypi_0
      - gawk==5.1.0=h7f98852_0
      - requests==2.25.1=pypi_0
      - torch-cluster==1.5.9=pypi_0
      - filelock==3.0.12=pypi_0
      - torch-spline-conv==1.2.1=pypi_0
      - libunistring==0.9.10=h14c3975_0
      - scikit-learn==0.24.0=pypi_0
      - sklearn==0.0=pypi_0
      - numba==0.52.0=pypi_0
      - gettext==0.19.8.1=hf34092f_1004
      - python-louvain==0.15=pypi_0
      - chardet==4.0.0=pypi_0
      - libgomp==9.3.0=h2828fa1_18
      - openssl==1.1.1k=h7f98852_0
      - ase==3.21.1=pypi_0
      - joblib==1.0.0=pypi_0
      - threadpoolctl==2.1.0=pypi_0
      - certifi==2020.6.20=pypi_0
      - ca-certificates==2020.12.5=ha878542_0
      - urllib3==1.26.3=pypi_0
      - _libgcc_mutex==0.1=conda_forge
      - python_abi==3.8=1_cp38
      - click==7.1.2=pypi_0
      - wget==1.20.1=h22169c7_0
    
    Current channels:
    
      - https://repo.anaconda.com/pkgs/main/linux-64
      - https://repo.anaconda.com/pkgs/main/noarch
      - https://repo.anaconda.com/pkgs/r/linux-64
      - https://repo.anaconda.com/pkgs/r/noarch
    
    To search for alternate channels that may provide the conda package you're
    looking for, navigate to
    
        https://anaconda.org
    
    and use the search bar at the top of the page.
    

    The =pypi_0 build strings look suspicious, and the channel information was not included, so I tried removing =pypi_0 and adding the channels:

    $ conda create --name fold2seq --file new-environment.yml -c pytorch -c huggingface -c conda-forge
    Collecting package metadata (current_repodata.json): done
    Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
    Collecting package metadata (repodata.json): done
    Solving environment: failed
    
    PackagesNotFoundError: The following packages are not available from current channels:
    
      - mmseqs2==13.45111=h95f258a_1
      - torch-sparse==0.6.9
      - sklearn==0.0
      - esm==0.2.0
      - torch-geometric==1.6.3
      - scipy==1.5.4
      - torch-cluster==1.5.9
      - torch-spline-conv==1.2.1
      - transformers==3.0.0
      - torch-scatter==2.0.6
      - tokenizers==0.8.0rc4
      - h5py==3.2.0
    
    Current channels:
    
      - https://conda.anaconda.org/pytorch/linux-64
      - https://conda.anaconda.org/pytorch/noarch
      - https://conda.anaconda.org/anaconda/linux-64
      - https://conda.anaconda.org/anaconda/noarch
      - https://conda.anaconda.org/huggingface/linux-64
      - https://conda.anaconda.org/huggingface/noarch
      - https://conda.anaconda.org/conda-forge/linux-64
      - https://conda.anaconda.org/conda-forge/noarch
      - https://repo.anaconda.com/pkgs/main/linux-64
      - https://repo.anaconda.com/pkgs/main/noarch
      - https://repo.anaconda.com/pkgs/r/linux-64
      - https://repo.anaconda.com/pkgs/r/noarch
    

    Help would be appreciated.

    Opened by bhoov
  • Error running fold_feat_gen.py

    I'm getting errors running the preprocessing script:

    total number of seqs: 470679
    removed # seqs in ss.txt: 9337
    Traceback (most recent call last):
      File "fold_feat_gen.py", line 154, in <module>
        x1,x2 = selection(line[0] ,  line[2], int(line[3]), int(line[4]), seq_ss)  
      File "fold_feat_gen.py", line 44, in selection
        start = start.replace(')','')
    AttributeError: 'int' object has no attribute 'replace'
    

    start and end are integers when I print them, so it doesn't make sense that a string replace is applied to them. Any idea what's going on here?

    Opened by aiXander
  • Regarding Data Source and Data Structure

    Hi, I am facing a data source problem. I would like to apply my own data to your amazing model, but I cannot figure out the data structure your model expects as input.

    Could you describe the expected data structure, or share the file "../data/domain_dict_full.pkl", so I can figure this problem out?

    By the way, there is a bug in fold_feat_gen.py at lines 42 and 48: the variables 'start' and 'end' must be strings for 'replace' to work. Similarly, at lines 83, 87, 89, and 91 of the same file, information cannot simply be extracted from the nested dictionary by indexing as ss['seq'].

    Looking forward to your reply. Many thanks!

    Opened by chq1155
  • How to generate domain data?

    I can get a domain_seq.pkl containing coords and seq info from the code, but the dataset requires emb, padding and foldclass, which the code can't generate. How can I generate them? Should I write my own code, or are there some details I missed?

    Opened by XZK9
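Several of the comments above describe the same crash in fold_feat_gen.py: the caller already converts the boundaries with int(line[3]) and int(line[4]), but selection() then calls start.replace(')', ''), which exists only on strings. One option is to drop the int() calls at the call site; another is to make selection() accept both types. The sketch below shows the second option as a standalone helper. Its name is hypothetical, the rest of selection() is not reproduced, and it assumes selection() uses the boundaries as integers afterwards and that they are plain residue numbers, possibly followed by ')'. Anything else (for example a residue insertion code) raises ValueError.

```python
def to_residue_index(value):
    """Normalize a domain boundary given either as an int or as a string.

    Hypothetical helper for selection() in fold_feat_gen.py: str() makes
    .replace() safe for non-string input, and the trailing ')' handled by
    the original code is stripped before converting to int.
    """
    if isinstance(value, int):
        return value
    return int(str(value).replace(")", "").strip())


# Possible use inside selection(), replacing start.replace(')', ''):
#     start = to_residue_index(start)
#     end = to_residue_index(end)
```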
Owner: International Business Machines