RFDesign: Protein hallucination and inpainting with RoseTTAFold
Jue Wang ([email protected])
Doug Tischer ([email protected])
Sidney Lisanza ([email protected])
David Juergens ([email protected])
Joe Watson ([email protected])
This repository contains code for protein hallucination and inpainting, as described in our preprint. Postprocessing and analysis scripts are included in scripts/.
License
All code is released under the MIT license.
All weights for neural networks are released for non-commercial use only under the Rosetta-DL license.
Installation
- Clone the repository:
git clone https://git.ipd.uw.edu/jue/rfdesign.git
cd rfdesign
- Create environment and install dependencies:
cd envs
conda env create -f SE3.yml
- Download model weights (see license info above).
wget https://files.ipd.uw.edu/pub/rfdesign/weights.tar.gz
tar xzf weights.tar.gz
- Configure the path to the weights. Put a file called config.json in both hallucination/ and inpainting/ containing the path to the weights directory. Each folder includes an example file to copy from.
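To confirm the last step worked, a small check like the one below can help. This is a minimal sketch, not part of the repository: the function names are hypothetical, and because the key names used in config.json are not stated here, the script makes no assumption about them and only looks for any string value that names an existing directory.

```python
import json
from pathlib import Path


def find_weight_dirs(config_path):
    """Return the entries of a JSON config whose string values name existing directories."""
    config = json.loads(Path(config_path).read_text())
    if not isinstance(config, dict):
        raise ValueError(f"{config_path}: expected a JSON object at the top level")
    return {
        key: value
        for key, value in config.items()
        if isinstance(value, str) and Path(value).expanduser().is_dir()
    }


def check_configs(repo_root="."):
    """Report, per subfolder, whether config.json exists and references a real directory."""
    report = {}
    for folder in ("hallucination", "inpainting"):
        config_path = Path(repo_root) / folder / "config.json"
        if not config_path.is_file():
            report[folder] = "missing config.json"
            continue
        found = find_weight_dirs(config_path)
        report[folder] = f"ok: {found}" if found else "no existing directory referenced"
    return report


# Example: run from the repository root.
# for folder, status in check_configs(".").items():
#     print(f"{folder}: {status}")
```

A "no existing directory referenced" result usually means the path in config.json does not match where weights.tar.gz was unpacked.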
Dependencies
If you want/need to configure your environment manually, here are the packages in our environment:
- python 3.8
- pytorch 1.10.1
- cudatoolkit 11.3.1
- numpy
- scipy
- requests
- packaging
- pytorch-geometric (installation instructions)
- dgl (installation instructions)
- se3-transformer (install from github)
- lie_learn
- icecream (for inpainting.py)
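If you set up the environment by hand, a quick importability check can catch a missing package before a long run. The sketch below is an illustration only: the import names in the mapping are assumptions (for example, that pytorch is imported as torch, pytorch-geometric as torch_geometric, and se3-transformer as se3_transformer), so adjust them to match your installation.

```python
import importlib.util

# Package name from the list above -> assumed top-level import name.
# These import names are assumptions, not taken from the repository.
MODULES = {
    "pytorch": "torch",
    "numpy": "numpy",
    "scipy": "scipy",
    "requests": "requests",
    "packaging": "packaging",
    "pytorch-geometric": "torch_geometric",
    "dgl": "dgl",
    "se3-transformer": "se3_transformer",
    "lie_learn": "lie_learn",
    "icecream": "icecream",
}


def missing_modules(modules):
    """Return the package names whose import name cannot be found in this environment."""
    missing = []
    for package, module in modules.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None:
            missing.append(package)
    return missing


# Example:
# print("Missing packages:", missing_modules(MODULES) or "none")
```

This only checks that each module can be located; it does not verify versions or that CUDA is usable.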
Notes
- If you are running this on digs at the IPD, you can skip steps 3-4 (downloading the weights and configuring the path to them).
- If your output PDBs are a ball of disconnected segments (as viewed in PyMOL), the cause may be a problem with the spherical harmonics cached by SE3-transformer. A workaround is to copy the hallucination/cache/ folder (a correct, clean copy of the cache) to your working directory before running hallucinate.py or inpaint.py.
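The cache workaround in the last note can be scripted as follows. This is a minimal sketch under two assumptions: that the copied folder should keep the name cache inside the working directory, and that Python 3.8 or newer is available (needed for dirs_exist_ok). The function name copy_cache is hypothetical.

```python
import shutil
from pathlib import Path


def copy_cache(repo_root, workdir="."):
    """Copy <repo_root>/hallucination/cache into <workdir>/cache and return the new path."""
    src = Path(repo_root) / "hallucination" / "cache"
    dst = Path(workdir) / "cache"  # assumed destination name
    if not src.is_dir():
        raise FileNotFoundError(f"clean cache not found at {src}")
    # dirs_exist_ok=True overwrites files in an existing cache folder instead of failing.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst


# Example: run from the directory where you will launch the design script.
# copy_cache("/path/to/rfdesign")
```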
Usage
See the READMEs in the hallucination/ and inpainting/ subfolders.
References
J. Wang, S. Lisanza, D. Juergens, D. Tischer, et al. Deep learning methods for designing proteins scaffolding functional sites. bioRxiv (2021). link
M. Baek, et al., Accurate prediction of protein structures and interactions using a three-track neural network, Science (2021). link
An earlier version of our hallucination method can be found at the trdesign-motif repo and published at:
D. Tischer, S. Lisanza, J. Wang, R. Dong, I. Anishchenko, L. F. Milles, S. Ovchinnikov, D. Baker. Design of proteins presenting discontinuous functional sites using deep learning. (2020) bioRxiv link
Our work is based on previous hallucination methods for unconstrained protein generation and fixed-backbone sequence design (trDesign repo):
I Anishchenko, SJ Pellock, TM Chidyausiku, ..., S Ovchinnikov, D Baker. De novo protein design by deep network hallucination. (2021) Nature link
C Norn, B Wicky, D Juergens, S Liu, D Kim, B Koepnick, I Anishchenko, Foldit Players, D Baker, S Ovchinnikov. Protein sequence design by conformational landscape optimization. (2021) PNAS link