This package contains deep learning models and related scripts for RoseTTAFold

Last update: Jan 3, 2023

Overview

RoseTTAFold

This package contains deep learning models and related scripts to run RoseTTAFold. This repository is the official implementation of RoseTTAFold: Accurate prediction of protein structures and interactions using a 3-track network.

Installation

Clone the package.

git clone https://github.com/RosettaCommons/RoseTTAFold
cd RoseTTAFold

Create conda environments using the RoseTTAFold-linux.yml and folding-linux.yml files. The latter is required only to run the PyRosetta version (run_pyrosetta_ver.sh).

conda env create -f RoseTTAFold-linux.yml
conda env create -f folding-linux.yml

Download the network weights (under the Rosetta-DL Software license). While the code is licensed under the MIT License, the trained weights and data for RoseTTAFold are made available for non-commercial use only under the terms of the Rosetta-DL Software license. You can find details at https://files.ipd.uw.edu/pub/RoseTTAFold/Rosetta-DL_LICENSE.txt

wget https://files.ipd.uw.edu/pub/RoseTTAFold/weights.tar.gz
tar xfz weights.tar.gz

Download and install third-party software if you want to run the entire modeling script (run_pyrosetta_ver.sh).

./install_dependencies.sh

Download the sequence and structure databases.

# uniref30 [46G]
wget http://wwwuser.gwdg.de/~compbiol/uniclust/2020_06/UniRef30_2020_06_hhsuite.tar.gz
mkdir -p UniRef30_2020_06
tar xfz UniRef30_2020_06_hhsuite.tar.gz -C ./UniRef30_2020_06

# BFD [272G]
wget https://bfd.mmseqs.com/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz
mkdir -p bfd
tar xfz bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz -C ./bfd

# structure templates [10G]
wget https://files.ipd.uw.edu/pub/RoseTTAFold/pdb100_2021Mar03.tar.gz
tar xfz pdb100_2021Mar03.tar.gz
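
The bracketed sizes in the comments above add up to a few hundred gigabytes, so it can help to check free disk space before starting the downloads. The following is a minimal sketch, assuming the bracketed figures are approximate sizes in GB (the source does not say whether they refer to the archives or to the extracted data). The names `DATABASE_SIZES_GB`, `required_gb`, and `has_enough_space` are hypothetical and not part of RoseTTAFold.

```python
import shutil

# Approximate sizes in GB, copied from the bracketed comments above (assumption:
# they are rough figures, not exact byte counts).
DATABASE_SIZES_GB = {
    "UniRef30_2020_06": 46,
    "bfd": 272,
    "pdb100_2021Mar03": 10,
}


def required_gb(sizes=DATABASE_SIZES_GB):
    """Return the total space, in GB, listed for all databases."""
    return sum(sizes.values())


def has_enough_space(path=".", sizes=DATABASE_SIZES_GB, margin=1.0):
    """Return True if `path` has at least `margin` times the listed space free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= margin * required_gb(sizes)


if __name__ == "__main__":
    print(f"Listed total: about {required_gb()} GB")
    print(f"Enough free space here: {has_enough_space('.')}")
```

Because the compressed archive and its extracted contents may coexist on disk for a while, a `margin` above 1.0 is probably the safer choice.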

Obtain a PyRosetta licence and install the package in the newly created folding conda environment.

Usage

cd example
../run_[pyrosetta, e2e]_ver.sh input.fa .
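
The bracket notation in the command above stands for a choice between two scripts, run_pyrosetta_ver.sh and run_e2e_ver.sh. A minimal sketch of expanding that choice into a concrete command is shown below. The helper name `build_command` is hypothetical, and the argument order (input FASTA, then working directory) is taken from the example command.

```python
VALID_VERSIONS = ("pyrosetta", "e2e")


def build_command(version, fasta="input.fa", workdir="."):
    """Return the argument list for one of the two run scripts."""
    if version not in VALID_VERSIONS:
        raise ValueError(f"version must be one of {VALID_VERSIONS}, got {version!r}")
    # The scripts live one level above the example directory.
    return [f"../run_{version}_ver.sh", fasta, workdir]


if __name__ == "__main__":
    print(" ".join(build_command("e2e")))
    # To actually launch it from the example directory, one could use:
    #   import subprocess
    #   subprocess.run(build_command("e2e"), check=True)
```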

Expected outputs

For the PyRosetta version, the user gets five final models with the estimated CA RMS error in the B-factor column (model/model_[1-5].crderr.pdb). For the end-to-end version, there is a single PDB output with the estimated residue-wise CA-lDDT in the B-factor column (t000_.e2e.pdb).
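
Since both versions store a per-residue estimate in the B-factor column, the values can be pulled out with a few lines of Python. This is a minimal sketch, assuming the output files use standard fixed-column PDB ATOM records (atom name in columns 13-16, residue number in 23-26, B-factor in 61-66). `read_ca_bfactors` is a hypothetical helper, not part of RoseTTAFold.

```python
def read_ca_bfactors(pdb_path):
    """Return a list of (residue_number, b_factor) for every CA atom in a PDB file."""
    values = []
    with open(pdb_path) as handle:
        for line in handle:
            # Fixed-column slices (0-based): name 12:16, resSeq 22:26, B-factor 60:66.
            if line.startswith("ATOM") and line[12:16].strip() == "CA":
                values.append((int(line[22:26]), float(line[60:66])))
    return values


if __name__ == "__main__":
    import sys

    for resnum, value in read_ca_bfactors(sys.argv[1]):
        print(resnum, value)
```

How to read the numbers depends on which script wrote the file: for the PyRosetta models the column holds an estimated error (lower is better), while for the end-to-end model it holds an estimated lDDT (higher is better).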

Credit to performer-pytorch and SE(3)-Transformer codes

The code in network/performer_pytorch.py is strongly based on the performer-pytorch repo, which is a PyTorch implementation of the Performer architecture. The code in network/equivariant_attention is from the original SE(3)-Transformer repo, which accompanies the paper 'SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks' by Fuchs et al.

References

M Baek, et al., Accurate prediction of protein structures and interactions using a 3-track network, bioRxiv (2021).

Comments

"out of memory" on multi-GPU platform

Hi,

First of all, thanks for sharing such great work.

I ran this toolkit on the GPU node (consisting of 8 * V100) of HPC. Unfortunately, when executing the predict_e2e.py script, the program can only call one GPU, which generates an out of memory error. I consulted with HPC maintenance, and they analyzed that it might be a problem with the script's configuration when executing. I wonder if there is a way to solve this problem? Or can I ask to set the parameters for multiple GPUs in the future update?

Here is the command,

python RoseTTAFold/network/predict_e2e.py \
-m RoseTTAFold/weights \
-i RoseTTAFold/example/t000_.msa0.a3m \
-o RoseTTAFold/example/t000_.e2e \
--hhr RoseTTAFold/example/t000_.hhr \
--atab RoseTTAFold/example/t000_.atab \
--db RoseTTAFold/pdb100_2021Mar03/pdb100_2021Mar03 1>RoseTTAFold/example/log/network.stdout 2>RoseTTAFold/example/log/network.stderr

And log/network.stderr:

DGL backend not selected or invalid.  Assuming PyTorch for now.
Using backend: pytorch
...
...
...
  File "/share/home/wangq/wyf/RoseTTAFold/network/equivariant_attention/modules.py", line 303, in forward
    kernel = torch.sum(R * basis[f'{self.degree_in},{self.degree_out}'], -1)
RuntimeError: CUDA out of memory. Tried to allocate 3.43 GiB (GPU 0; 31.72 GiB total capacity; 23.57 GiB already allocated; 2.78 GiB free; 27.29 GiB reserved in total by PyTorch)

I appreciate any advice and suggestions.

Thanks to all~

opened by IvanWoo22 16
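
The last line of the traceback in the report above already contains the figures needed to see why the allocation failed. A minimal sketch for extracting them is shown below. It assumes the message wording and GiB units quoted in that report (other PyTorch versions may format the message differently), and `parse_cuda_oom` is a hypothetical helper. It only reads the message and does not address multi-GPU execution.

```python
import re

# Pattern matching the wording of the error quoted in the report above.
OOM_PATTERN = re.compile(
    r"Tried to allocate (?P<requested>[\d.]+) GiB \(GPU (?P<gpu>\d+); "
    r"(?P<total>[\d.]+) GiB total capacity; "
    r"(?P<allocated>[\d.]+) GiB already allocated; "
    r"(?P<free>[\d.]+) GiB free"
)


def parse_cuda_oom(message):
    """Return the sizes (in GiB) from a CUDA out-of-memory message, or None."""
    match = OOM_PATTERN.search(message)
    if match is None:
        return None
    info = {key: float(value) for key, value in match.groupdict().items()}
    info["gpu"] = int(info["gpu"])
    # How much more memory the failed request needed than was free.
    info["shortfall"] = round(info["requested"] - info["free"], 2)
    return info


if __name__ == "__main__":
    example = (
        "RuntimeError: CUDA out of memory. Tried to allocate 3.43 GiB "
        "(GPU 0; 31.72 GiB total capacity; 23.57 GiB already allocated; "
        "2.78 GiB free; 27.29 GiB reserved in total by PyTorch)"
    )
    print(parse_cuda_oom(example))
```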

Issues on shared system - file permission

I am setting up RoseTTAFold on a shared system where the users do not have write access to the installation folder. I have installed it successfully, but when I run the test I see that RoseTTAFold tries to write to the central installation.

run_pyrosetta_ver.sh input.fa .
(network.stderr file)
PermissionError: [Errno 13] Permission denied: '/cluster/software/RoseTTAFold/1.0.0.1/RoseTTAFold/network/equivariant_attention/from_se3cnn/cache/trans_Q/mutex'

Is there a way to tell RoseTTAFold to use the user home directory instead ?

cat network.stderr 
DGL backend not selected or invalid.  Assuming PyTorch for now.
Using backend: pytorch
Traceback (most recent call last):
  File "/cluster/software/RoseTTAFold/1.0.0.1/RoseTTAFold/network/predict_pyRosetta.py", line 200, in <module>
    pred.predict(args.a3m_fn, args.out_prefix, args.hhr, args.atab)
  File "/cluster/software/RoseTTAFold/1.0.0.1/RoseTTAFold/network/predict_pyRosetta.py", line 158, in predict
    logit_s, init_crds, pred_lddt = self.model(msa, seq, idx_pdb, t1d=t1d, t2d=t2d)
  File "/cluster/software/RoseTTAFold/1.0.0.1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/cluster/software/RoseTTAFold/1.0.0.1/RoseTTAFold/network/RoseTTAFoldModel.py", line 53, in forward
    msa, pair, xyz, lddt = self.feat_extractor(msa, pair, seq1hot, idx)
  File "/cluster/software/RoseTTAFold/1.0.0.1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/cluster/software/RoseTTAFold/1.0.0.1/RoseTTAFold/network/Attention_module_w_str.py", line 476, in forward
    msa, pair, xyz = self.iter_block_2[i_m](msa, pair, xyz, seq1hot, idx, top_k=top_ks[i_m])
  File "/cluster/software/RoseTTAFold/1.0.0.1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/cluster/software/RoseTTAFold/1.0.0.1/RoseTTAFold/network/Attention_module_w_str.py", line 363, in forward
    xyz, state = self.str2str(msa.float(), pair.float(), xyz.float(), seq1hot, idx, top_k=top_k)
  File "/cluster/software/RoseTTAFold/1.0.0.1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/cluster/software/RoseTTAFold/1.0.0.1/lib/python3.8/site-packages/torch/cuda/amp/autocast_mode.py", line 141, in decorate_autocast
    return func(*args, **kwargs)
  File "/cluster/software/RoseTTAFold/1.0.0.1/RoseTTAFold/network/Attention_module_w_str.py", line 241, in forward
    shift = self.se3(G, msa.reshape(B*L, -1, 1), l1_feats)
  File "/cluster/software/RoseTTAFold/1.0.0.1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/cluster/software/RoseTTAFold/1.0.0.1/lib/python3.8/site-packages/torch/cuda/amp/autocast_mode.py", line 141, in decorate_autocast
    return func(*args, **kwargs)
  File "/cluster/software/RoseTTAFold/1.0.0.1/RoseTTAFold/network/SE3_network.py", line 104, in forward
    basis, r = get_basis_and_r(G, self.num_degrees-1)
  File "/cluster/software/RoseTTAFold/1.0.0.1/RoseTTAFold/network/equivariant_attention/modules.py", line 102, in get_basis_and_r
    basis = get_basis(G, max_degree, compute_gradients)
  File "/cluster/software/RoseTTAFold/1.0.0.1/RoseTTAFold/network/equivariant_attention/modules.py", line 61, in get_basis
    Q_J = utils_steerable._basis_transformation_Q_J(J, d_in, d_out)
  File "/cluster/software/RoseTTAFold/1.0.0.1/RoseTTAFold/network/equivariant_attention/from_se3cnn/cache_file.py", line 75, in wrapper
    with FileSystemMutex(mutexfile):
  File "/cluster/software/RoseTTAFold/1.0.0.1/RoseTTAFold/network/equivariant_attention/from_se3cnn/cache_file.py", line 42, in __enter__
    self.acquire()
  File "/cluster/software/RoseTTAFold/1.0.0.1/RoseTTAFold/network/equivariant_attention/from_se3cnn/cache_file.py", line 26, in acquire
    self.handle = open(self.filename, 'w')
PermissionError: [Errno 13] Permission denied: '/cluster/software/RoseTTAFold/1.0.0.1/RoseTTAFold/network/equivariant_attention/from_se3cnn/cache/trans_Q/mutex'

Regards, Sabry

opened by Sabryr 8

How to use 2 track network?

As the usage instructions show, the output of the 2-track network for eukaryotes is a complex.npz file. This appears to contain only distogram probabilities (maybe?). But the paper shows that a model can be predicted using the 2-track network. How can we predict a structure using the 2-track network?

opened by what-is-what 7

Example/pyrosetta/log/folding.stderr - No module named 'pyrosetta'

I ran run_pyrosetta.sh using input.fa in the example folder. However, I cannot understand why folding.stderr in the example folder (from GitHub) states that pyrosetta could not be imported. When I ran it locally, the file was empty. Also, the score in modelQ.dat is completely different. I cannot work out which part is going wrong.

opened by what-is-what 7

failed hhsearch

Hello everyone, I encountered issues while running run_e2e.sh script.

These are the error messages from log files:

make_ss.stderr

[makemat] FATAL ERROR: Unable to recover checkpoint from t000_.msa0.tmp.chk

Bad psipred pass1 file format!
rm: cannot remove '/home/xx/RoseTTAFold-main/example/aa474/t000_.msa0.a3m.csb.hhblits.ss2': No such file or directory

hhsearch.stderr

14:28:52.127 ERROR: In /opt/conda/conda-bld/hhsuite_1616660820288/work/src/hhalignment.cpp:223: Read:

14:28:52.127 ERROR: sequence ss_pred contains no residues.

Has anyone encountered the same issues and found a solution? Thank you in advance.

opened by jessu10 6

"sequence ss_pred contains no residues." when running hhsearch

Detail:

09:14:24.642 ERROR: In /opt/conda/conda-bld/hhsuite_1616660820288/work/src/hhalignment.cpp:223: Read:

09:14:24.642 ERROR: sequence ss_pred contains no residues.
opened by Meowooo 4

PDB100 database compatibility issues

Multiple users are having issues using the PDB100 database with HHblits (v3.3) (see https://github.com/soedinglab/hh-suite/issues/275). I noticed that PDB100 does not contain the _a3m database. This causes the pre-merge to crash. Is there any reason not to include the a3m file?

opened by martin-steinegger 4

casp14 result

This is a wonderful job. But I have two questions.

This picture is the result of the evaluation of casp14.

Do you use only the UniRef30_2020_06 and bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt databases to search for MSAs?

Do you use only the pdb100_2021Mar03 database to search for templates (with the above MSAs)?

opened by TianjieY 4

python3 $PIPEDIR/DAN-msa/ErrorPredictorMSA.py --roll -p $CPU $WDIR/t000_.3track.npz $WDIR/pdb-3track $WDIR/pdb-3track failure

Hello,

I am just clueless about the failure of the last step of run_pyrosetta_ver.sh:

Singularity> /app/RoseTTAFold/run_pyrosetta_ver.sh Sulfolobus_S_Layer.fasta Sulfolobus_S_Layer.d2
Running HHblits
Running PSIPRED
Running hhsearch
Predicting distance and orientations
Running parallel RosettaTR.py
Running DeepAccNet-msa

The log reads:

PyRosetta-4 2021 [Rosetta PyRosetta4.Release.python37.ubuntu 2021.34+release.5eb89ef1fc1a9146e2c7aa29194bc6267733596c 2021-08-23T13:12:24] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
/app/RoseTTAFold/DAN-msa/models/smTr
Traceback (most recent call last):
  File "/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[{{node 3d_conv/conv3d_1/Conv3D}}]]
         [[2d_conv/lddt/truediv/_1161]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[{{node 3d_conv/conv3d_1/Conv3D}}]]
0 successful operations.
0 derived errors ignored.

opened by truatpasteurdotfr 3

Update predict_complex README

When I first ran the complex structure prediction (step 4), it was outputting a model where the break between the two subunits was in the totally wrong place.

After a lot of digging through the predict_complex.py file, I realized that the numbers after the "-Ls" argument needed to be changed to represent the lengths of the individual protein subunits being modeled. I finally figured this out when I saw that the lengths of the two example subunits are 218 aa and 310 aa. I imagine other people will run into this same issue until it's clarified in the README.
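
The point above, that the numbers after "-Ls" must match the lengths of the individual subunits, can be checked by computing those lengths directly. This is a minimal sketch, assuming the subunit sequences are available as separate records in a FASTA file. `chain_lengths` and `ls_argument` are hypothetical helpers, not part of predict_complex.py.

```python
def chain_lengths(fasta_text):
    """Return the length of each sequence in a FASTA string, in file order."""
    lengths = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            lengths.append(0)  # a header starts a new chain
        elif lengths:
            lengths[-1] += len(line)
    return lengths


def ls_argument(lengths):
    """Format chain lengths as the '-Ls' argument list."""
    return ["-Ls"] + [str(n) for n in lengths]


if __name__ == "__main__":
    example = ">subunit_A\nMKTAYIAK\nQR\n>subunit_B\nGSHM\n"
    print(" ".join(ls_argument(chain_lengths(example))))
```

For the example complex mentioned above, the two lengths would be 218 and 310.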

opened by neilfleckSCRI 3

Issue with running hhsearch to completion

I'm working through getting RoseTTAFold operational using the example sequence.

HHblits and PSIPRED run to completion, however I encounter an error during hhsearch. In the hhsearch.stderr logfile, it says that the sequence ss_pred contains no residues. I checked t000_.msa0.ss2.a3m and the fastas for >ss_pred and >ss_conf are indeed empty, however the other alignment sequences are still present.

I tried to fix by recompiling hhsuite using the source binaries as suggested in your FAQ, however that doesn't seem to fix the issue. Does anyone have suggestions on how to move forward?
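
The manual check described above (looking for empty >ss_pred and >ss_conf entries) can be automated. This is a minimal sketch, assuming the file is a FASTA-style a3m in which each record starts with a ">" header line. `read_a3m_records` and `empty_ss_records` are hypothetical helpers.

```python
def read_a3m_records(path):
    """Yield (header, sequence) pairs from an a3m/FASTA-style file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:].strip(), []
            elif header is not None:
                chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def empty_ss_records(path, names=("ss_pred", "ss_conf")):
    """Return the names in `names` whose record is missing or has no residues."""
    found = {h.split()[0]: seq for h, seq in read_a3m_records(path) if h}
    return [name for name in names if not found.get(name)]


if __name__ == "__main__":
    import sys

    print(empty_ss_records(sys.argv[1]))
```

A non-empty result for t000_.msa0.ss2.a3m would suggest that the secondary-structure step, rather than hhsearch itself, is where the pipeline first went wrong.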

opened by joshsimp 3

[Question] Is Linux ARM64 supported?

Hello Rosetta community!

I'd like to ask whether it is safe to run RoseTTAFold on Linux ARM64 machines. Does anyone have positive experience with it?

I am talking about cloud-friendly environments like AWS Graviton, Oracle/Azure/Google Ampere and Huawei's TaiShan. I am not asking about Raspberry Pi!

Thank you! Gancho

opened by gancho-ivanov 2

[make_msa] ERROR

Dear RoseTTAFold Team,

Hi, I'm a user!

I tried to predict a large protein (>2500 amino acids) using RoseTTAFold in Linux server.

But, I got this message.

[log/make_msa.stderr]

...skipping...
- 18:32:30.435 INFO: Output file = /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db0.1e-10.id90cov75.a3m

- 18:32:31.159 WARNING: Maximum number 65535 of sequences exceeded in file /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db0.1e-10.a3m

- 18:32:32.804 INFO: Input file = /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db0.1e-10.a3m

- 18:32:32.804 INFO: Output file = /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db0.1e-10.id90cov50.a3m

- 18:32:33.536 WARNING: Maximum number 65535 of sequences exceeded in file /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db0.1e-10.a3m

- 19:21:05.155 INFO: Input file = /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db0.1e-6.a3m

- 19:21:05.156 INFO: Output file = /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db0.1e-6.id90cov75.a3m

- 19:21:05.882 WARNING: Maximum number 65535 of sequences exceeded in file /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db0.1e-6.a3m

- 19:21:07.572 INFO: Input file = /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db0.1e-6.a3m

- 19:21:07.572 INFO: Output file = /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db0.1e-6.id90cov50.a3m

- 19:21:08.304 WARNING: Maximum number 65535 of sequences exceeded in file /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db0.1e-6.a3m

- 20:06:41.142 INFO: Input file = /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db0.1e-3.a3m

- 20:06:41.143 INFO: Output file = /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db0.1e-3.id90cov75.a3m

- 20:06:41.944 WARNING: Maximum number 65535 of sequences exceeded in file /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db0.1e-3.a3m

- 20:06:43.603 INFO: Input file = /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db0.1e-3.a3m

- 20:06:43.603 INFO: Output file = /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db0.1e-3.id90cov50.a3m

- 20:06:44.375 WARNING: Maximum number 65535 of sequences exceeded in file /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db0.1e-3.a3m

- 21:33:28.849 INFO: Input file = /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db1.1e-30.a3m

- 21:33:28.862 INFO: Output file = /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db1.1e-30.id90cov75.a3m

- 21:33:29.606 WARNING: Maximum number 65535 of sequences exceeded in file /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db1.1e-30.a3m

- 21:33:31.181 INFO: Input file = /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db1.1e-30.a3m

- 21:33:31.181 INFO: Output file = /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db1.1e-30.id90cov50.a3m

- 21:33:31.899 WARNING: Maximum number 65535 of sequences exceeded in file /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db1.1e-30.a3m

- 01:45:15.929 INFO: Input file = /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db1.1e-10.a3m

- 01:45:15.929 INFO: Output file = /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db1.1e-10.id90cov75.a3m

- 01:45:16.768 WARNING: Maximum number 65535 of sequences exceeded in file /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db1.1e-10.a3m

- 01:45:18.302 INFO: Input file = /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db1.1e-10.a3m

- 01:45:18.302 INFO: Output file = /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db1.1e-10.id90cov50.a3m

- 01:45:19.051 WARNING: Maximum number 65535 of sequences exceeded in file /data7/Programs/RoseTTAFold/results/polq/polq_wild/hhblits/t000_.db1.1e-10.a3m

/data7/Programs/RoseTTAFold/input_prep/make_msa.sh: line 31: 31368 Killed                  $HHBLITS -i $prev_a3m -oa3m $tmp_dir/t000_.db$i.$e.a3m -e $e -v 0 -d ${DATABASES[$i]}

How can I solve this problem?

Many thanks,

Oh.

opened by phdstudent-oje 0

fix permission errors on shared file systems

Changes the cache directory path from the installation directory to the user's home directory to avoid a crash due to insufficient permissions.

Should fix #120
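
The idea of keeping the cache out of a read-only installation can be sketched as follows. This is not the code from this change: it is a minimal illustration, assuming a fallback to a directory under the user's home when the installation cache is not writable. `pick_cache_dir` and the `~/.cache/RoseTTAFold` location are hypothetical.

```python
import os


def pick_cache_dir(install_dir, app_name="RoseTTAFold"):
    """Return a writable cache directory, preferring the installation's own."""
    candidate = os.path.join(install_dir, "cache")
    if os.path.isdir(candidate) and os.access(candidate, os.W_OK):
        return candidate
    # Fall back to a per-user location (hypothetical path) and create it if needed.
    fallback = os.path.join(os.path.expanduser("~"), ".cache", app_name)
    os.makedirs(fallback, exist_ok=True)
    return fallback


if __name__ == "__main__":
    print(pick_cache_dir("/cluster/software/RoseTTAFold"))
```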

opened by zemu-unile 0

"ERROR: failed to load model" for complex-modeling

Hi. While running the following command for complex modeling:

python network/predict_complex.py -i filtered.a3m -o complex -Ls 218 310

the following message appears:

Using backend: pytorch
ERROR: failed to load model

Please suggest how we can solve this error. Thanks in advance!

opened by munmunbhasin 1

Releases(v1.1.0)

v1.1.0 (Nov 2, 2021)

Minor updates to include the simpler two-track version of RoseTTAFold used for PPI screening.

v1.0.0 (Jul 3, 2021)

This is the first release of RoseTTAFold.
