Open source code for AlphaFold.

Overview

AlphaFold

This package provides an implementation of the inference pipeline of AlphaFold v2.0. This is a completely new model that was entered in CASP14 and published in Nature. For simplicity, we refer to this model as AlphaFold throughout the rest of this document.

Any publication that discloses findings arising from using this source code or the model parameters should cite the AlphaFold paper.

CASP14 predictions

First time setup

The following steps are required in order to run AlphaFold:

  1. Install Docker.

  2. Download genetic databases (see below).

  3. Download model parameters (see below).

  4. Check that AlphaFold will be able to use a GPU by running:

    docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

    The output of this command should show a list of your GPUs. If it doesn't, check if you followed all steps correctly when setting up the NVIDIA Container Toolkit or take a look at the following NVIDIA Docker issue.

Genetic databases

This step requires rsync and aria2c to be installed on your machine.

AlphaFold needs multiple genetic (sequence) databases to run:

  • BFD
  • MGnify
  • PDB70
  • PDB (structures in the mmCIF format)
  • Uniclust30
  • UniRef90

We provide a script scripts/download_all_data.sh that can be used to download and set up all of these databases. This should take 8–12 hours.

📒 Note: The total download size is around 428 GB and the total size when unzipped is 2.2 TB. Please make sure you have enough free disk space, bandwidth, and time to complete the download.

This script will also download the model parameter files. Once the script has finished, you should have the following directory structure:

$DOWNLOAD_DIR/                             # Total: ~ 2.2 TB (download: 428 GB)
    bfd/                                   # ~ 1.8 TB (download: 271.6 GB)
        # 6 files.
    mgnify/                                # ~ 64 GB (download: 32.9 GB)
        mgy_clusters.fa
    params/                                # ~ 3.5 GB (download: 3.5 GB)
        # 5 CASP14 models,
        # 5 pTM models,
        # LICENSE,
        # = 11 files.
    pdb70/                                 # ~ 56 GB (download: 19.5 GB)
        # 9 files.
    pdb_mmcif/                             # ~ 206 GB (download: 46 GB)
        mmcif_files/
            # About 180,000 .cif files.
        obsolete.dat
    uniclust30/                            # ~ 87 GB (download: 24.9 GB)
        uniclust30_2018_08/
            # 13 files.
    uniref90/                              # ~ 59 GB (download: 29.7 GB)
        uniref90.fasta

Model parameters

While the AlphaFold code is licensed under the Apache 2.0 License, the AlphaFold parameters are made available for non-commercial use only under the terms of the CC BY-NC 4.0 license. Please see the Disclaimer below for more detail.

The AlphaFold parameters are available from https://storage.googleapis.com/alphafold/alphafold_params_2021-07-14.tar, and are downloaded as part of the scripts/download_all_data.sh script. This script will download parameters for:

  • 5 models which were used during CASP14, and were extensively validated for structure prediction quality (see Jumper et al. 2021, Suppl. Methods 1.12 for details).
  • 5 pTM models, which were fine-tuned to produce pTM (predicted TM-score) and predicted aligned error values alongside their structure predictions (see Jumper et al. 2021, Suppl. Methods 1.9.7 for details).
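
Both the pTM score and the predicted aligned error produced by the pTM models end up in the raw model output pickles described under "AlphaFold output" below. A minimal sketch for reading them, assuming a pTM model was run and wrote a file named result_model_1_ptm.pkl (hypothetical path) containing 'ptm' and 'predicted_aligned_error' entries:

    import pickle

    import numpy as np

    # Hypothetical output path; such a file is only produced when a pTM model is run.
    with open('output_dir/result_model_1_ptm.pkl', 'rb') as f:
        result = pickle.load(f)

    print('pTM score:', float(result['ptm']))
    pae = np.asarray(result['predicted_aligned_error'])  # (num_res, num_res) matrix
    print('PAE shape:', pae.shape, 'max PAE:', float(pae.max()))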

Running AlphaFold

The simplest way to run AlphaFold is using the provided Docker script. This was tested on Google Cloud with a machine using the nvidia-gpu-cloud-image with 12 vCPUs, 85 GB of RAM, a 100 GB boot disk, the databases on an additional 3 TB disk, and an A100 GPU.

  1. Clone this repository and cd into it.

    git clone https://github.com/deepmind/alphafold.git
  2. Modify DOWNLOAD_DIR in docker/run_docker.py to be the path to the directory containing the downloaded databases.

  3. Build the Docker image:

    docker build -f docker/Dockerfile -t alphafold .
  4. Install the run_docker.py dependencies. Note: You may optionally wish to create a Python Virtual Environment to prevent conflicts with your system's Python environment.

    pip3 install -r docker/requirements.txt
  5. Run run_docker.py pointing to a FASTA file containing the protein sequence for which you wish to predict the structure. If you are predicting the structure of a protein that is already in PDB and you wish to avoid using it as a template, then max_template_date must be set to be before the release date of the structure. For example, for the T1050 CASP14 target:

    python3 docker/run_docker.py --fasta_paths=T1050.fasta --max_template_date=2020-05-14

    By default, AlphaFold will attempt to use all visible GPU devices. To use a subset, specify a comma-separated list of GPU UUID(s) or index(es) using the --gpu_devices flag. See GPU enumeration for more details.

  6. You can control the AlphaFold speed/quality tradeoff by adding either --preset=full_dbs or --preset=casp14 to the run command. We provide the following presets:

    • casp14: This preset uses the same settings as were used in CASP14. It runs with all genetic databases and with 8 ensemblings.
    • full_dbs: The model in this preset is 8 times faster than the casp14 preset with a very minor quality drop (-0.1 average GDT drop on CASP14 domains). It runs with all genetic databases and with no ensembling.

    Running the command above with the casp14 preset would look like this:

    python3 docker/run_docker.py --fasta_paths=T1050.fasta --max_template_date=2020-05-14 --preset=casp14

AlphaFold output

The outputs will be in a subfolder of output_dir in run_docker.py. They include the computed MSAs, unrelaxed structures, relaxed structures, ranked structures, raw model outputs, prediction metadata, and section timings. The output_dir directory will have the following structure:

output_dir/
    features.pkl
    ranked_{0,1,2,3,4}.pdb
    ranking_debug.json
    relaxed_model_{1,2,3,4,5}.pdb
    result_model_{1,2,3,4,5}.pkl
    timings.json
    unrelaxed_model_{1,2,3,4,5}.pdb
    msas/
        bfd_uniclust_hits.a3m
        mgnify_hits.sto
        uniref90_hits.sto

The contents of each output file are as follows:

  • features.pkl – A pickle file containing the input feature NumPy arrays used by the models to produce the structures.
  • unrelaxed_model_*.pdb – A PDB format text file containing the predicted structure, exactly as output by the model.
  • relaxed_model_*.pdb – A PDB format text file containing the predicted structure, after performing an Amber relaxation procedure on the unrelaxed structure prediction, see Jumper et al. 2021, Suppl. Methods 1.8.6 for details.
  • ranked_*.pdb – A PDB format text file containing the relaxed predicted structures, after reordering by model confidence. Here ranked_0.pdb should contain the prediction with the highest confidence, and ranked_4.pdb the prediction with the lowest confidence. To rank model confidence, we use predicted LDDT (pLDDT), see Jumper et al. 2021, Suppl. Methods 1.9.6 for details.
  • ranking_debug.json – A JSON format text file containing the pLDDT values used to perform the model ranking, and a mapping back to the original model names.
  • timings.json – A JSON format text file containing the times taken to run each section of the AlphaFold pipeline.
  • msas/ – A directory containing the files describing the various genetic tool hits that were used to construct the input MSA.
  • result_model_*.pkl – A pickle file containing a nested dictionary of the various NumPy arrays directly produced by the model. In addition to the output of the structure module, this includes auxiliary outputs such as distograms and pLDDT scores. If using the pTM models then the pTM logits will also be contained in this file.
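
All of these outputs can be inspected with standard Python. A minimal sketch, assuming a completed run in output_dir and the file names listed above (the exact dictionary keys, such as 'order' and 'plddt', are assumptions based on a typical run and may differ between AlphaFold versions):

    import json
    import pickle

    import numpy as np

    # Ranking summary: per-model confidence and the order used for ranked_*.pdb.
    with open('output_dir/ranking_debug.json') as f:
        ranking = json.load(f)
    print(ranking.get('order'))  # model names, most confident first (assumed key)

    # Raw outputs of one model.
    with open('output_dir/result_model_1.pkl', 'rb') as f:
        result = pickle.load(f)
    plddt = np.asarray(result['plddt'])  # per-residue confidence in [0, 100]
    print('mean pLDDT:', float(plddt.mean()))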

This code has been tested to match mean top-1 accuracy on a CASP14 test set with pLDDT ranking over 5 model predictions (some CASP targets were run with earlier versions of AlphaFold and some had manual interventions; see our forthcoming publication for details). Some targets such as T1064 may also have high individual run variance over random seeds.

Inferencing many proteins

The provided inference script is optimized for predicting the structure of a single protein, and it will compile the neural network to be specialized to exactly the size of the sequence, MSA, and templates. For large proteins, the compile time is a negligible fraction of the runtime, but it may become more significant for small proteins or if the multi-sequence alignments are already precomputed. In the bulk inference case, it may make sense to use our make_fixed_size function to pad the inputs to a uniform size, thereby reducing the number of compilations required.

We do not provide a bulk inference script, but it should be straightforward to develop on top of the RunModel.predict method with a parallel system for precomputing multi-sequence alignments. Alternatively, this script can be run repeatedly with only moderate overhead.
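
As a rough illustration of that suggestion, the sketch below loops RunModel.predict over feature dictionaries that are assumed to have been precomputed by the data pipeline and pickled into a precomputed_features/ directory (a hypothetical layout). The constructor helpers and the predict signature shown here may differ between AlphaFold versions, and the make_fixed_size padding mentioned above is omitted for brevity:

    import glob
    import pickle
    import random
    import sys

    from alphafold.model import config, data, model

    MODEL_NAME = 'model_1'
    DOWNLOAD_DIR = '/path/to/download_dir'  # directory containing the params/ folder

    model_runner = model.RunModel(
        config.model_config(MODEL_NAME),
        data.get_model_haiku_params(model_name=MODEL_NAME, data_dir=DOWNLOAD_DIR))

    for features_path in sorted(glob.glob('precomputed_features/*.pkl')):
        with open(features_path, 'rb') as f:
            raw_features = pickle.load(f)  # features.pkl-style dict from the data pipeline
        processed = model_runner.process_features(raw_features, random_seed=0)
        prediction = model_runner.predict(
            processed, random_seed=random.randrange(sys.maxsize))
        print(features_path, 'mean pLDDT:', float(prediction['plddt'].mean()))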

Note on reproducibility

AlphaFold's output for a small number of proteins has high inter-run variance, and may be affected by changes in the input data. The CASP14 target T1064 is a notable example; the large number of SARS-CoV-2-related sequences recently deposited changes its MSA significantly. This variability is somewhat mitigated by the model selection process: we run 5 models and take the most confident prediction.

To reproduce the results of our CASP14 system as closely as possible, you must use the same database versions we used in CASP. These may not match the default versions downloaded by our scripts.

For genetics:

For templates:

  • PDB: (downloaded 2020-05-14)
  • PDB70: (downloaded 2020-05-13)

An alternative for templates is to use the latest PDB and PDB70, but pass the flag --max_template_date=2020-05-14, which restricts templates only to structures that were available at the start of CASP14.

Citing this work

If you use the code or data in this package, please cite:

@Article{AlphaFold2021,
  author  = {Jumper, John and Evans, Richard and Pritzel, Alexander and Green, Tim and Figurnov, Michael and Ronneberger, Olaf and Tunyasuvunakool, Kathryn and Bates, Russ and {\v{Z}}{\'\i}dek, Augustin and Potapenko, Anna and Bridgland, Alex and Meyer, Clemens and Kohl, Simon A A and Ballard, Andrew J and Cowie, Andrew and Romera-Paredes, Bernardino and Nikolov, Stanislav and Jain, Rishub and Adler, Jonas and Back, Trevor and Petersen, Stig and Reiman, David and Clancy, Ellen and Zielinski, Michal and Steinegger, Martin and Pacholska, Michalina and Berghammer, Tamas and Bodenstein, Sebastian and Silver, David and Vinyals, Oriol and Senior, Andrew W and Kavukcuoglu, Koray and Kohli, Pushmeet and Hassabis, Demis},
  journal = {Nature},
  title   = {Highly accurate protein structure prediction with {AlphaFold}},
  year    = {2021},
  doi     = {10.1038/s41586-021-03819-2},
  note    = {(Accelerated article preview)},
}

Acknowledgements

AlphaFold communicates with and/or references the following separate libraries and packages:

We thank all their contributors and maintainers!

License and Disclaimer

This is not an officially supported Google product.

Copyright 2021 DeepMind Technologies Limited.

AlphaFold Code License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Model Parameters License

The AlphaFold parameters are made available for non-commercial use only, under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. You can find details at: https://creativecommons.org/licenses/by-nc/4.0/legalcode

Third-party software

Use of the third-party software, libraries or code referred to in the Acknowledgements section above may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code is subject to any such terms and you should check that you can comply with any applicable restrictions or terms and conditions before use.

Mirrored Databases

The following databases have been mirrored by DeepMind, and are available with reference to the following:

Comments
  • Getting "ValueError: Cannot create a tensor proto whose content is larger than 2GB." for smaller sequences

    Hello! I'm getting the error "ValueError: Cannot create a tensor proto whose content is larger than 2GB." when running AlphaFold jobs for proteins longer than ~650 residues. I have tried using --max_template_data=1900-01-01 in order to limit MSA size, but this has not helped. I am using 1 GPU, 8 CPU cores, and 150 GB of memory on my university's supercomputer. I am not interested in using --preset=reduced_dbs. Thanks!!

    error report 
    opened by DavidB256 26
  • AttributeError: 'Config' object has no attribute 'jax_experimental_name_stack'

    Hello,

    I am predicting protein multimers using AlphaFold2; however, it always fails and shows something like the following:

    UnfilteredStackTrace                      Traceback (most recent call last)
    in
         44 processed_feature_dict = model_runner.process_features(np_example, random_seed=0)
    ---> 45 prediction = model_runner.predict(processed_feature_dict, random_seed=random.randrange(sys.maxsize))
         46

    23 frames
    UnfilteredStackTrace: AttributeError: 'Config' object has no attribute 'jax_experimental_name_stack'

    The stack trace below excludes JAX-internal frames.
    The preceding is the original exception that occurred, unmodified.

    The above exception was the direct cause of the following exception:

    AttributeError                            Traceback (most recent call last)
    /opt/conda/lib/python3.7/site-packages/haiku/_src/module.py in wrapped(self, *args, **kwargs)
        406     f = functools.partial(unbound_method, self)
        407     f = functools.partial(run_interceptors, f, method_name, self)
    --> 408     if jax.config.jax_experimental_name_stack and module_name:
        409       local_module_name = module_name.split("/")[-1]
        410       f = jax.named_call(f, name=local_module_name)

    AttributeError: 'Config' object has no attribute 'jax_experimental_name_stack'

    It shows "Attribute error" but I have no idea about coding to solve this issue. I would really appreciate if someone knows what is going on.

    Tons of thanks! Kaka

    duplicate 
    opened by zhangkaka1123 19
  • ValueError: The number of positions must match the number of atoms

    Hi. Alphafold is a fantastic tool! Thank you so much for this.

    I encountered this error (ValueError: The number of positions must match the number of atoms) that is associated with a specific protein run in multimer mode. No problem was encountered when this specific protein was analyzed in monomer mode. No problem as well when other proteins were analyzed in multimer mode. Only when this specific protein is included in a multimer run do I get the error. Have you encountered this type of error? If so, what can you recommend as possible solutions? I can share the details of my analysis if needed. Thank you.

    error report 
    opened by aldrinlugena 18
  • Update to current OpenMM

    AlphaFold pins OpenMM to 7.5.1, an old version that is no longer supported. It also requires applying a patch related to disulfide bonds. The current version of OpenMM includes that fix, so the patch isn't needed anymore.

    I suggest removing the version pin so it will use the latest version, and removing the patch that is no longer necessary. I can submit a PR to make the changes if you want.

    feature request 
    opened by peastman 17
  •  Could not find CIFs in /mmcif_files

    I have been working to get AlphaFold running with a Singularity image on my school's cluster. There was an issue using rsync to get the whole PDB downloaded in CIF format, so we used FTP and manually put the files in the right folder. All of the files exist in .cif format in the proper directory and the directory is specified correctly in my run script, but every time it fails with this error:

    Could not find CIFs in Directory that has all the cif files

    Any clue what may be wrong or how I could go about fixing it?

    error report 
    opened by jcs0902 17
  • RuntimeError: HHblits failed

    I tried to predict a protein's structure, but it failed. Can you help me?

    /data/apps/alphafold/alphafold200_1/lib/python3.7/site-packages/absl/flags/_validators.py:206: UserWarning: Flag --preset has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line! 'command line!' % flag_name) I0803 23:01:13.789970 139723086509888 templates.py:837] Using precomputed obsolete pdbs /data/public/pdb_mmcif/obsolete.dat. I0803 23:01:14.698018 139723086509888 tpu_client.py:54] Starting the local TPU driver. I0803 23:01:14.719603 139723086509888 xla_bridge.py:214] Unable to initialize backend 'tpu_driver': Not found: Unable to find driver in registry given worker: local:// I0803 23:01:14.966662 139723086509888 xla_bridge.py:214] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available. I0803 23:01:23.010689 139723086509888 run_alphafold.py:261] Have 5 models: ['model_1', 'model_2', 'model_3', 'model_4', 'model_5'] I0803 23:01:23.010993 139723086509888 run_alphafold.py:273] Using random seed 1376016366686354143 for the data pipeline I0803 23:01:23.035536 139723086509888 jackhmmer.py:130] Launching subprocess "/data/apps/alphafold/alphafold200/bin/jackhmmer -o /dev/null -A /tmp/tmp1o0pnq_t/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 ./data/22007.fasta /data/public/uniref90/uniref90.fasta" I0803 23:01:23.066604 139723086509888 utils.py:36] Started Jackhmmer (uniref90.fasta) query I0803 23:15:18.661020 139723086509888 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 835.594 seconds I0803 23:15:21.842122 139723086509888 jackhmmer.py:130] Launching subprocess "/data/apps/alphafold/alphafold200/bin/jackhmmer -o /dev/null -A /tmp/tmpegmh9ytz/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 ./data/22007.fasta /data/public/mgnify/mgy_clusters.fa" I0803 23:15:21.911606 139723086509888 utils.py:36] Started Jackhmmer (mgy_clusters.fa) query I0803 23:28:52.221145 139723086509888 utils.py:40] Finished Jackhmmer (mgy_clusters.fa) query in 810.309 seconds I0803 23:29:15.419130 139723086509888 hhsearch.py:76] Launching subprocess "/data/apps/alphafold/alphafold200/bin/hhsearch -i /tmp/tmpd2kes4s2/query.a3m -o /tmp/tmpd2kes4s2/output.hhr -maxseq 1000000 -d /data/public/pdb70/pdb70" I0803 23:29:15.580710 139723086509888 utils.py:36] Started HHsearch query I0803 23:33:59.937626 139723086509888 utils.py:40] Finished HHsearch query in 284.356 seconds I0803 23:37:12.944531 139723086509888 hhblits.py:128] Launching subprocess "/data/apps/alphafold/alphafold200/bin/hhblits -i ./data/22007.fasta -cpu 4 -oa3m /tmp/tmpg957yehn/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /data/public/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /data/public/uniclust30/uniclust30_2018_08/uniclust30_2018_08" I0803 23:37:13.247214 139723086509888 utils.py:36] Started HHblits query I0804 00:46:09.004908 139723086509888 utils.py:40] Finished HHblits query in 4135.755 seconds E0804 00:46:09.019532 139723086509888 hhblits.py:138] HHblits failed. HHblits stderr begin: E0804 00:46:09.021161 139723086509888 hhblits.py:141] - 23:37:48.447 INFO: Searching 65983866 column state sequences. E0804 00:46:09.021250 139723086509888 hhblits.py:141] - 23:37:49.637 INFO: Searching 15161831 column state sequences. 
E0804 00:46:09.021305 139723086509888 hhblits.py:141] - 23:37:49.720 INFO: ./data/22007.fasta is in A2M, A3M or FASTA format E0804 00:46:09.021353 139723086509888 hhblits.py:141] - 23:37:49.721 INFO: Iteration 1 E0804 00:46:09.021399 139723086509888 hhblits.py:141] - 23:37:50.452 INFO: Prefiltering database E0804 00:46:09.021443 139723086509888 hhblits.py:141] - 23:55:52.099 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment) : 6905944 E0804 00:46:09.021486 139723086509888 hhblits.py:141] - 00:03:15.552 WARNING: Number of hits passing 2nd prefilter (reduced from 319529 to allowed maximum of 100000). E0804 00:46:09.021530 139723086509888 hhblits.py:141] You can increase the allowed maximum using the -maxfilt option. E0804 00:46:09.021573 139723086509888 hhblits.py:141] - 00:07:13.586 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment) : 1127018 E0804 00:46:09.021616 139723086509888 hhblits.py:141] - 00:07:44.644 WARNING: database contains sequences that exceed maximum allowed size (maxres = 20001). Max sequence length can be increased with parameter -maxres. E0804 00:46:09.021663 139723086509888 hhblits.py:141] - 00:07:45.136 INFO: HMMs passed 2nd prefilter (gapped profile-profile alignment) : 178190 E0804 00:46:09.021706 139723086509888 hhblits.py:141] - 00:07:45.137 INFO: HMMs passed 2nd prefilter and not found in previous iterations : 178190 E0804 00:46:09.021749 139723086509888 hhblits.py:141] - 00:07:45.137 INFO: Scoring 178190 HMMs using HMM-HMM Viterbi alignment E0804 00:46:09.021792 139723086509888 hhblits.py:141] - 00:07:45.366 INFO: Alternative alignment: 0 E0804 00:46:09.021835 139723086509888 hhblits.py:141] - 00:07:58.684 INFO: 2000 alignments done E0804 00:46:09.021913 139723086509888 hhblits.py:141] - 00:08:14.142 INFO: 4000 alignments done E0804 00:46:09.021959 139723086509888 hhblits.py:141] - 00:08:28.631 INFO: 6000 alignments done E0804 00:46:09.022015 139723086509888 hhblits.py:141] - 00:08:42.732 INFO: 8000 alignments done E0804 00:46:09.022057 139723086509888 hhblits.py:141] - 00:08:55.653 INFO: 10000 alignments done E0804 00:46:09.022100 139723086509888 hhblits.py:141] - 00:09:06.817 INFO: 12000 alignments done E0804 00:46:09.022142 139723086509888 hhblits.py:141] - 00:09:18.051 INFO: 14000 alignments done E0804 00:46:09.022186 139723086509888 hhblits.py:141] - 00:09:33.523 INFO: 16000 alignments done E0804 00:46:09.022228 139723086509888 hhblits.py:141] - 00:09:43.505 INFO: 18000 alignments done E0804 00:46:09.022271 139723086509888 hhblits.py:141] - 00:09:53.848 INFO: 20000 alignments done E0804 00:46:09.022314 139723086509888 hhblits.py:141] - 00:10:11.309 INFO: 22000 alignments done E0804 00:46:09.022356 139723086509888 hhblits.py:141] - 00:10:21.097 INFO: 24000 alignments done E0804 00:46:09.022399 139723086509888 hhblits.py:141] - 00:10:30.242 INFO: 26000 alignments done E0804 00:46:09.022442 139723086509888 hhblits.py:141] - 00:10:39.877 INFO: 28000 alignments done E0804 00:46:09.022485 139723086509888 hhblits.py:141] - 00:10:49.706 INFO: 30000 alignments done E0804 00:46:09.022527 139723086509888 hhblits.py:141] - 00:11:02.306 INFO: 32000 alignments done E0804 00:46:09.022569 139723086509888 hhblits.py:141] - 00:11:14.206 INFO: 34000 alignments done E0804 00:46:09.022611 139723086509888 hhblits.py:141] - 00:11:24.107 INFO: 36000 alignments done E0804 00:46:09.022654 139723086509888 hhblits.py:141] - 00:11:34.196 INFO: 38000 alignments done E0804 00:46:09.022696 139723086509888 hhblits.py:141] - 00:11:43.886 INFO: 40000 
alignments done E0804 00:46:09.022738 139723086509888 hhblits.py:141] - 00:11:54.462 INFO: 42000 alignments done E0804 00:46:09.022781 139723086509888 hhblits.py:141] - 00:12:06.316 INFO: 44000 alignments done E0804 00:46:09.022823 139723086509888 hhblits.py:141] - 00:12:16.029 INFO: 46000 alignments done E0804 00:46:09.022878 139723086509888 hhblits.py:141] - 00:12:25.527 INFO: 48000 alignments done E0804 00:46:09.022922 139723086509888 hhblits.py:141] - 00:12:34.905 INFO: 50000 alignments done E0804 00:46:09.022972 139723086509888 hhblits.py:141] - 00:12:44.296 INFO: 52000 alignments done E0804 00:46:09.023016 139723086509888 hhblits.py:141] - 00:12:54.272 INFO: 54000 alignments done E0804 00:46:09.023058 139723086509888 hhblits.py:141] - 00:13:05.223 INFO: 56000 alignments done E0804 00:46:09.023101 139723086509888 hhblits.py:141] - 00:13:23.589 INFO: 58000 alignments done E0804 00:46:09.023143 139723086509888 hhblits.py:141] - 00:13:34.849 INFO: 60000 alignments done E0804 00:46:09.023186 139723086509888 hhblits.py:141] - 00:13:46.660 INFO: 62000 alignments done E0804 00:46:09.023228 139723086509888 hhblits.py:141] - 00:13:59.061 INFO: 64000 alignments done E0804 00:46:09.023270 139723086509888 hhblits.py:141] - 00:14:11.361 INFO: 66000 alignments done E0804 00:46:09.023312 139723086509888 hhblits.py:141] - 00:14:23.280 INFO: 68000 alignments done E0804 00:46:09.023355 139723086509888 hhblits.py:141] - 00:14:35.515 INFO: 70000 alignments done E0804 00:46:09.023397 139723086509888 hhblits.py:141] - 00:14:47.450 INFO: 72000 alignments done E0804 00:46:09.023440 139723086509888 hhblits.py:141] - 00:15:00.765 INFO: 74000 alignments done E0804 00:46:09.023482 139723086509888 hhblits.py:141] - 00:15:14.500 INFO: 76000 alignments done E0804 00:46:09.023525 139723086509888 hhblits.py:141] - 00:15:28.118 INFO: 78000 alignments done E0804 00:46:09.023567 139723086509888 hhblits.py:141] - 00:15:41.805 INFO: 80000 alignments done E0804 00:46:09.023609 139723086509888 hhblits.py:141] - 00:15:55.681 INFO: 82000 alignments done E0804 00:46:09.023651 139723086509888 hhblits.py:141] - 00:16:09.212 INFO: 84000 alignments done E0804 00:46:09.023693 139723086509888 hhblits.py:141] - 00:16:22.941 INFO: 86000 alignments done E0804 00:46:09.023748 139723086509888 hhblits.py:141] - 00:16:34.996 INFO: 88000 alignments done E0804 00:46:09.023792 139723086509888 hhblits.py:141] - 00:16:47.443 INFO: 90000 alignments done E0804 00:46:09.023835 139723086509888 hhblits.py:141] - 00:16:58.603 INFO: 92000 alignments done E0804 00:46:09.023892 139723086509888 hhblits.py:141] - 00:17:10.273 INFO: 94000 alignments done E0804 00:46:09.023935 139723086509888 hhblits.py:141] - 00:17:21.990 INFO: 96000 alignments done E0804 00:46:09.023984 139723086509888 hhblits.py:141] - 00:17:36.342 INFO: 98000 alignments done E0804 00:46:09.024028 139723086509888 hhblits.py:141] - 00:17:51.060 INFO: 100000 alignments done E0804 00:46:09.024070 139723086509888 hhblits.py:141] - 00:17:59.710 INFO: 102000 alignments done E0804 00:46:09.024112 139723086509888 hhblits.py:141] - 00:18:07.873 INFO: 104000 alignments done E0804 00:46:09.024154 139723086509888 hhblits.py:141] - 00:18:16.130 INFO: 106000 alignments done E0804 00:46:09.024196 139723086509888 hhblits.py:141] - 00:18:24.375 INFO: 108000 alignments done E0804 00:46:09.024238 139723086509888 hhblits.py:141] - 00:18:32.424 INFO: 110000 alignments done E0804 00:46:09.024280 139723086509888 hhblits.py:141] - 00:18:42.069 INFO: 112000 alignments done E0804 00:46:09.024322 139723086509888 
hhblits.py:141] - 00:18:49.615 INFO: 114000 alignments done E0804 00:46:09.024364 139723086509888 hhblits.py:141] - 00:18:58.052 INFO: 116000 alignments done E0804 00:46:09.024406 139723086509888 hhblits.py:141] - 00:19:06.386 INFO: 118000 alignments done E0804 00:46:09.024449 139723086509888 hhblits.py:141] - 00:19:14.232 INFO: 120000 alignments done E0804 00:46:09.024490 139723086509888 hhblits.py:141] - 00:19:22.616 INFO: 122000 alignments done E0804 00:46:09.024533 139723086509888 hhblits.py:141] - 00:19:30.370 INFO: 124000 alignments done E0804 00:46:09.024574 139723086509888 hhblits.py:141] - 00:19:38.019 INFO: 126000 alignments done E0804 00:46:09.024617 139723086509888 hhblits.py:141] - 00:19:45.593 INFO: 128000 alignments done E0804 00:46:09.024659 139723086509888 hhblits.py:141] - 00:19:45.601 INFO: Stop after DB-HHM: 128000 because early stop 19.5091 < filter cutoff 20 E0804 00:46:09.024701 139723086509888 hhblits.py:141] - 00:19:45.610 INFO: Alternative alignment: 1 E0804 00:46:09.024743 139723086509888 hhblits.py:141] - 00:25:51.603 INFO: 106040 alignments done E0804 00:46:09.024785 139723086509888 hhblits.py:141] - 00:25:52.034 INFO: Alternative alignment: 2 E0804 00:46:09.024827 139723086509888 hhblits.py:141] - 00:30:34.459 INFO: 81231 alignments done E0804 00:46:09.024880 139723086509888 hhblits.py:141] - 00:30:35.078 INFO: Alternative alignment: 3 E0804 00:46:09.024923 139723086509888 hhblits.py:141] - 00:34:38.509 INFO: 68285 alignments done E0804 00:46:09.024972 139723086509888 hhblits.py:141] - 00:34:44.972 INFO: Premerge done E0804 00:46:09.025016 139723086509888 hhblits.py:141] - 00:34:45.002 INFO: Realigning 89456 HMM-HMM alignments using Maximum Accuracy algorithm E0804 00:46:09.025061 139723086509888 hhblits.py:142] HHblits stderr end Traceback (most recent call last): File "/data/public/alphafold2/alphafold/run_alphafold.py", line 303, in app.run(main) File "/data/apps/alphafold/alphafold200_1/lib/python3.7/site-packages/absl/app.py", line 312, in run _run_main(main, args) File "/data/apps/alphafold/alphafold200_1/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main sys.exit(main(argv)) File "/data/public/alphafold2/alphafold/run_alphafold.py", line 285, in main random_seed=random_seed) File "/data/public/alphafold2/alphafold/run_alphafold.py", line 129, in predict_structure msa_output_dir=msa_output_dir) File "/data/public/alphafold2/alphafold/alphafold/data/pipeline.py", line 171, in process input_fasta_path) File "/data/public/alphafold2/alphafold/alphafold/data/tools/hhblits.py", line 144, in query stdout.decode('utf-8'), stderr[:500_000].decode('utf-8'))) RuntimeError: HHblits failed stdout:

    stderr:

    • 23:37:48.447 INFO: Searching 65983866 column state sequences.

    • 23:37:49.637 INFO: Searching 15161831 column state sequences.

    • 23:37:49.720 INFO: ./data/22007.fasta is in A2M, A3M or FASTA format

    • 23:37:49.721 INFO: Iteration 1

    • 23:37:50.452 INFO: Prefiltering database

    • 23:55:52.099 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment) : 6905944

    • 00:03:15.552 WARNING: Number of hits passing 2nd prefilter (reduced from 319529 to allowed maximum of 100000). You can increase the allowed maximum using the -maxfilt option.

    • 00:07:13.586 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment) : 1127018

    • 00:07:44.644 WARNING: database contains sequences that exceed maximum allowed size (maxres = 20001). Max sequence length can be increased with parameter -maxres.

    • 00:07:45.136 INFO: HMMs passed 2nd prefilter (gapped profile-profile alignment) : 178190

    • 00:07:45.137 INFO: HMMs passed 2nd prefilter and not found in previous iterations : 178190

    • 00:07:45.137 INFO: Scoring 178190 HMMs using HMM-HMM Viterbi alignment

    • 00:07:45.366 INFO: Alternative alignment: 0

    • 00:07:58.684 INFO: 2000 alignments done

    • 00:08:14.142 INFO: 4000 alignments done

    • 00:08:28.631 INFO: 6000 alignments done

    • 00:08:42.732 INFO: 8000 alignments done

    • 00:08:55.653 INFO: 10000 alignments done

    • 00:09:06.817 INFO: 12000 alignments done

    • 00:09:18.051 INFO: 14000 alignments done

    • 00:09:33.523 INFO: 16000 alignments done

    • 00:09:43.505 INFO: 18000 alignments done

    • 00:09:53.848 INFO: 20000 alignments done

    • 00:10:11.309 INFO: 22000 alignments done

    • 00:10:21.097 INFO: 24000 alignments done

    • 00:10:30.242 INFO: 26000 alignments done

    • 00:10:39.877 INFO: 28000 alignments done

    • 00:10:49.706 INFO: 30000 alignments done

    • 00:11:02.306 INFO: 32000 alignments done

    • 00:11:14.206 INFO: 34000 alignments done

    • 00:11:24.107 INFO: 36000 alignments done

    • 00:11:34.196 INFO: 38000 alignments done

    • 00:11:43.886 INFO: 40000 alignments done

    • 00:11:54.462 INFO: 42000 alignments done

    • 00:12:06.316 INFO: 44000 alignments done

    • 00:12:16.029 INFO: 46000 alignments done

    • 00:12:25.527 INFO: 48000 alignments done

    • 00:12:34.905 INFO: 50000 alignments done

    • 00:12:44.296 INFO: 52000 alignments done

    • 00:12:54.272 INFO: 54000 alignments done

    • 00:13:05.223 INFO: 56000 alignments done

    • 00:13:23.589 INFO: 58000 alignments done

    • 00:13:34.849 INFO: 60000 alignments done

    • 00:13:46.660 INFO: 62000 alignments done

    • 00:13:59.061 INFO: 64000 alignments done

    • 00:14:11.361 INFO: 66000 alignments done

    • 00:14:23.280 INFO: 68000 alignments done

    • 00:14:35.515 INFO: 70000 alignments done

    • 00:14:47.450 INFO: 72000 alignments done

    • 00:15:00.765 INFO: 74000 alignments done

    • 00:15:14.500 INFO: 76000 alignments done

    • 00:15:28.118 INFO: 78000 alignments done

    • 00:15:41.805 INFO: 80000 alignments done

    • 00:15:55.681 INFO: 82000 alignments done

    • 00:16:09.212 INFO: 84000 alignments done

    • 00:16:22.941 INFO: 86000 alignments done

    • 00:16:34.996 INFO: 88000 alignments done

    • 00:16:47.443 INFO: 90000 alignments done

    • 00:16:58.603 INFO: 92000 alignments done

    • 00:17:10.273 INFO: 94000 alignments done

    • 00:17:21.990 INFO: 96000 alignments done

    • 00:17:36.342 INFO: 98000 alignments done

    • 00:17:51.060 INFO: 100000 alignments done

    • 00:17:59.710 INFO: 102000 alignments done

    • 00:18:07.873 INFO: 104000 alignments done

    • 00:18:16.130 INFO: 106000 alignments done

    • 00:18:24.375 INFO: 108000 alignments done

    • 00:18:32.424 INFO: 110000 alignments done

    • 00:18:42.069 INFO: 112000 alignments done

    • 00:18:49.615 INFO: 114000 alignments done

    • 00:18:58.052 INFO: 116000 alignments done

    • 00:19:06.386 INFO: 118000 alignments done

    • 00:19:14.232 INFO: 120000 alignments done

    • 00:19:22.616 INFO: 122000 alignments done

    • 00:19:30.370 INFO: 124000 alignments done

    • 00:19:38.019 INFO: 126000 alignments done

    • 00:19:45.593 INFO: 128000 alignments done

    • 00:19:45.601 INFO: Stop after DB-HHM: 128000 because early stop 19.5091 < filter cutoff 20

    • 00:19:45.610 INFO: Alternative alignment: 1

    • 00:25:51.603 INFO: 106040 alignments done

    • 00:25:52.034 INFO: Alternative alignment: 2

    • 00:30:34.459 INFO: 81231 alignments done

    • 00:30:35.078 INFO: Alternative alignment: 3

    • 00:34:38.509 INFO: 68285 alignments done

    • 00:34:44.972 INFO: Premerge done

    • 00:34:45.002 INFO: Realigning 89456 HMM-HMM alignments using Maximum Accuracy algorithm


    error report 
    opened by huangjk1103 17
  • jax.tree_util.tree_multimap() is deprecated

    I got the following message in cell 5 (run alphafold & relax with AMBER):

    /usr/local/lib/python3.7/dist-packages/jax/_src/tree_util.py:189: FutureWarning: jax.tree_util.tree_multimap() is deprecated. Please use jax.tree_util.tree_map() instead as a drop-in replacement. 'instead as a drop-in replacement.', FutureWarning)

    Any idea what exactly the issue is? -Michael.

    duplicate 
    opened by MWNautilus 16
  • Error w multimer modeling

    Trying a sequence with two FASTA files, one after the other; getting this error after ~80 minutes, during MSA creation. Any thoughts?

    I1103 20:24:47.638477 140257921226560 run_docker.py:222] _run_main(main, args)
    I1103 20:24:47.638627 140257921226560 run_docker.py:222] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    I1103 20:24:47.638826 140257921226560 run_docker.py:222] sys.exit(main(argv))
    I1103 20:24:47.638977 140257921226560 run_docker.py:222] File "/app/alphafold/run_alphafold.py", line 412, in main
    I1103 20:24:47.639122 140257921226560 run_docker.py:222] is_prokaryote=is_prokaryote)
    I1103 20:24:47.639266 140257921226560 run_docker.py:222] File "/app/alphafold/run_alphafold.py", line 169, in predict_structure
    I1103 20:24:47.639411 140257921226560 run_docker.py:222] is_prokaryote=is_prokaryote)
    I1103 20:24:47.639565 140257921226560 run_docker.py:222] File "/app/alphafold/alphafold/data/pipeline_multimer.py", line 282, in process
    I1103 20:24:47.639721 140257921226560 run_docker.py:222] is_prokaryote=is_prokaryote,
    I1103 20:24:47.639865 140257921226560 run_docker.py:222] File "/app/alphafold/alphafold/data/feature_processing.py", line 70, in pair_and_merge
    I1103 20:24:47.640007 140257921226560 run_docker.py:222] chains=np_chains_list, prokaryotic=is_prokaryote)
    I1103 20:24:47.640149 140257921226560 run_docker.py:222] File "/app/alphafold/alphafold/data/msa_pairing.py", line 92, in create_paired_features
    I1103 20:24:47.640291 140257921226560 run_docker.py:222] chains, prokaryotic)
    I1103 20:24:47.640434 140257921226560 run_docker.py:222] File "/app/alphafold/alphafold/data/msa_pairing.py", line 349, in pair_sequences
    I1103 20:24:47.640576 140257921226560 run_docker.py:222] common_species.remove(b'') # Remove target sequence species.
    I1103 20:24:47.640726 140257921226560 run_docker.py:222] ValueError: list.remove(x): x not in list

    error report 
    opened by arashnh11 16
  • failed to build docker

    Wonderful repo!!! Thanks so much to the AlphaFold2 team. I have a problem building the AlphaFold Docker image; the error information is shown below. It seems that the related CUDA resource is not found. Thanks all.

    $ docker build -f docker/Dockerfile -t alphafold .
    
    Sending build context to Docker daemon  12.69MB
    Step 1/19 : ARG CUDA=11.0
    Step 2/19 : FROM nvidia/cuda:${CUDA}-base
     ---> 2ec708416bb8
    Step 3/19 : ARG CUDA
     ---> Using cache
     ---> 076eace7d488
    Step 4/19 : SHELL ["/bin/bash", "-c"]
     ---> Using cache
     ---> b57b88dc2b9a
    Step 5/19 : RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y       build-essential       cmake       cuda-command-line-tools-${CUDA/./-}       git       hmmer       kalign       tzdata       wget     && rm -rf /var/lib/apt/lists/*
     ---> Running in 52e953f25c2b
    ...
    ...
    
    
    E: Failed to fetch https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64/by-hash/SHA256/27b2dbb347d54776dec155d98e1eefbde6c10a3fd1295007c3e836cfd9b98522  404  Not Found [IP: 58.205.210.80 443]
    E: Some index files failed to download. They have been ignored, or old ones used instead.
    
    opened by kuixu 16
  • UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd1 in position 117: invalid continuation byte

    run_docker.py:193] I0810 12:44:02.365815 140477161113408 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 35.796 seconds
    run_docker.py:193] Traceback (most recent call last):
    run_docker.py:193] File "/app/alphafold/run_alphafold.py", line 302, in
    run_docker.py:193] app.run(main)
    run_docker.py:193] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 312, in run
    run_docker.py:193] _run_main(main, args)
    run_docker.py:193] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    run_docker.py:193] sys.exit(main(argv))
    run_docker.py:193] File "/app/alphafold/run_alphafold.py", line 284, in main
    run_docker.py:193] random_seed=random_seed)
    run_docker.py:193] File "/app/alphafold/run_alphafold.py", line 128, in predict_structure
    run_docker.py:193] msa_output_dir=msa_output_dir)
    run_docker.py:193] File "/app/alphafold/alphafold/data/pipeline.py", line 134, in process
    run_docker.py:193] input_fasta_path)[0]
    run_docker.py:193] File "/app/alphafold/alphafold/data/tools/jackhmmer.py", line 163, in query
    run_docker.py:193] return [self._query_chunk(input_fasta_path, self.database_path)]
    run_docker.py:193] File "/app/alphafold/alphafold/data/tools/jackhmmer.py", line 140, in _query_chunk
    run_docker.py:193] 'Jackhmmer failed\nstderr:\n%s\n' % stderr.decode('utf-8'))
    run_docker.py:193] UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd1 in position 117: invalid continuation byte

    error report 
    opened by hopanoid 15
  • Non docker setup

    Dear AlphaFold authors,

    Since most IT departments do not allow Docker on their HPC or production servers, we have made a small attempt to create a how-to for a non-Docker AlphaFold setup.

    Can be found here: https://github.com/kalininalab/alphafold_non_docker

    Thank you for the tool!

    Kind regards, Sanjay

    setup 
    opened by sanjaysrikakulam 15
  • Fixing zip command issue in Google Colab.

    When running cell 5 (Run AlphaFold and download prediction) of the Google Colab notebook implementation, the following error is thrown:

    NotImplementedError                       Traceback (most recent call last)
    [<ipython-input-1-bc0091fa34e2>](https://localhost:8080/#) in <module>()
        577 sequence = 'AKIGLFYGTQTGVTQTIAESIQQEFGGESIVDLNDIANADASDLNAYDYLIIGCPTWNVGELQSDWEGIYDDLDSVNFQGKKVAYFGAGDQVGYSDNFQDAMGILEEKISSLGSQTVGYWPIEGYDFNESKAVRNNQFVGLAIDEDNQPDLTKNRIKTWVSQLKSEFGL'  #@param {type:"string"}
        578 
    --> 579 run_prediction(sequence)
    
    3 frames
    [/usr/local/lib/python3.7/dist-packages/google/colab/_system_commands.py](https://localhost:8080/#) in _run_command(cmd, clear_streamed_output)
        166   if locale_encoding != _ENCODING:
        167     raise NotImplementedError(
    --> 168         'A UTF-8 locale is required. Got {}'.format(locale_encoding))
        169 
        170   parent_pty, child_pty = pty.openpty()
    
    NotImplementedError: A UTF-8 locale is required. Got ANSI_X3.4-1968
    

    As discussed in the issue: https://github.com/deepmind/alphafold/issues/483

    This is more of a Google Colab issue when trying to create the output_dir zip file from the output_dir folder. I solved this by using shutil to create the zip file.

    I simply replaced the line of code !zip -q -r {output_dir}.zip {output_dir} in cell 5 (Run AlphaFold and download prediction) with shutil.make_archive(output_dir, 'zip', output_dir). I also made sure to add import shutil in the first cell.

    This will create the zip file without using any terminal commands. This fix should not cause any issues in the future since we don't rely on the Google Colab terminal commands anymore.
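
    In code form, the proposed replacement for that line is simply (output_dir being the variable already defined in the notebook):

    import shutil

    # Replaces: !zip -q -r {output_dir}.zip {output_dir}
    shutil.make_archive(output_dir, 'zip', output_dir)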

    opened by gmihaila 2
  • Implement case-insensitive mmcif parsing

    AlphaFold database CIFs are annotated with a capitalized PEPTIDE. Parsing these files with the existing case-sensitive function results in an empty mmcif_object. I suggest switching to case-insensitive parsing.
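
    As an illustrative (hypothetical) helper showing the suggested behaviour, not the repository's actual parser code:

    def is_peptide_type(chem_comp_type: str) -> bool:
        # Case-insensitive check, so 'PEPTIDE', 'Peptide' and 'peptide' all match.
        return 'peptide' in chem_comp_type.lower()

    print(is_peptide_type('L-PEPTIDE LINKING'))  # True despite the capitalized annotation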

    opened by SimonKitSangChu 1
  • CUDA_ERROR_ILLEGAL_ADDRESS error with AlphaFold multimer 2.3.0

    Hi!

    I am trying to run AlphaFold 2.3.0 multimer and encountered this error: Execution of replica 0 failed: INTERNAL: Failed to load in-memory CUBIN: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered (see details below). I was wondering if you could help me resolve it? Thank you very much!!

    Machine spec etc.:

    • Ubuntu 20.04 LTS
    • GPU: nvidia RTX 3090
    • cuda version: tried both 11.1.1 and 11.4.0 (both gave the same error)
    • total length of the protein complex: ~2200 aa

    When I searched related errors online, it seems there are generally two solutions proposed: (1) change to a newer cuda version https://github.com/deepmind/dm-haiku/issues/204, or (2) disable unified memory https://github.com/deepmind/alphafold/issues/406.

    I tried using cuda 11.4.0 instead of 11.1.1 by changing the following lines in Dockerfile, but the same error persists.

    ARG CUDA=11.1.1 --->>> ARG CUDA=11.4.0
    FROM nvidia/cuda:${CUDA}-cudnn8-runtime-ubuntu18.04 --->>> FROM nvidia/cuda:${CUDA}-cudnn8-runtime-ubuntu20.04
    conda install -y -c conda-forge cudatoolkit==${CUDA_VERSION} --->>> conda install -y -c "nvidia/label/cuda-11.4.0" cuda-toolkit
    

    As for (2) disable unified memory, I am worried that this would give me out of memory error given the size of the protein.

    Not sure if this is relevant, but this is a recent problem and prediction for this and other similarly-sized complexes worked fine before (was using v2.2.0 before, and I wonder if this is an issue with e.g. version of jax or jaxlib).

    Thank you very much!

    Error message:

    I1226 15:42:37.167795 140027647279936 run_docker.py:255] I1226 06:42:37.167222 140635718940480 amber_minimize.py:407] Minimizing protein, attempt 1 of 100.
    I1226 15:42:39.806318 140027647279936 run_docker.py:255] I1226 06:42:39.805861 140635718940480 amber_minimize.py:68] Restraining 17790 / 35357 particles.
    I1226 15:45:15.867727 140027647279936 run_docker.py:255] I1226 06:45:15.866998 140635718940480 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
    I1226 15:45:42.889597 140027647279936 run_docker.py:255] 2022-12-26 06:45:42.889173: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: INTERNAL: Failed to load in-memory CUBIN: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
    I1226 15:45:42.899475 140027647279936 run_docker.py:255] Traceback (most recent call last):
    I1226 15:45:42.899577 140027647279936 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 432, in <module>
    I1226 15:45:42.899646 140027647279936 run_docker.py:255] app.run(main)
    I1226 15:45:42.899709 140027647279936 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 312, in run
    I1226 15:45:42.899771 140027647279936 run_docker.py:255] _run_main(main, args)
    I1226 15:45:42.899834 140027647279936 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    I1226 15:45:42.899896 140027647279936 run_docker.py:255] sys.exit(main(argv))
    I1226 15:45:42.899957 140027647279936 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 408, in main
    I1226 15:45:42.900018 140027647279936 run_docker.py:255] predict_structure(
    I1226 15:45:42.900109 140027647279936 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 243, in predict_structure
    I1226 15:45:42.900174 140027647279936 run_docker.py:255] relaxed_pdb_str, _, violations = amber_relaxer.process(
    I1226 15:45:42.900234 140027647279936 run_docker.py:255] File "/app/alphafold/alphafold/relax/relax.py", line 62, in process
    I1226 15:45:42.900292 140027647279936 run_docker.py:255] out = amber_minimize.run_pipeline(
    I1226 15:45:42.900353 140027647279936 run_docker.py:255] File "/app/alphafold/alphafold/relax/amber_minimize.py", line 489, in run_pipeline
    I1226 15:45:42.900412 140027647279936 run_docker.py:255] ret.update(get_violation_metrics(prot))
    I1226 15:45:42.900472 140027647279936 run_docker.py:255] File "/app/alphafold/alphafold/relax/amber_minimize.py", line 357, in get_violation_metrics
    I1226 15:45:42.900531 140027647279936 run_docker.py:255] structural_violations, struct_metrics = find_violations(prot)
    I1226 15:45:42.900591 140027647279936 run_docker.py:255] File "/app/alphafold/alphafold/relax/amber_minimize.py", line 339, in find_violations
    I1226 15:45:42.900651 140027647279936 run_docker.py:255] violations = folding.find_structural_violations(
    I1226 15:45:42.900712 140027647279936 run_docker.py:255] File "/app/alphafold/alphafold/model/folding.py", line 761, in find_structural_violations
    I1226 15:45:42.900774 140027647279936 run_docker.py:255] between_residue_clashes = all_atom.between_residue_clash_loss(
    I1226 15:45:42.900835 140027647279936 run_docker.py:255] File "/app/alphafold/alphafold/model/all_atom.py", line 783, in between_residue_clash_loss
    I1226 15:45:42.900898 140027647279936 run_docker.py:255] dists = jnp.sqrt(1e-10 + jnp.sum(
    I1226 15:45:42.900959 140027647279936 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/numpy/reductions.py", line 216, in sum
    I1226 15:45:42.901019 140027647279936 run_docker.py:255] return _reduce_sum(a, axis=_ensure_optional_axes(axis), dtype=dtype, out=out,
    I1226 15:45:42.901078 140027647279936 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
    I1226 15:45:42.901138 140027647279936 run_docker.py:255] return fun(*args, **kwargs)
    I1226 15:45:42.901199 140027647279936 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/api.py", line 623, in cache_miss
    I1226 15:45:42.901261 140027647279936 run_docker.py:255] out_flat = call_bind_continuation(execute(*args_flat))
    I1226 15:45:42.901322 140027647279936 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/dispatch.py", line 895, in _execute_compiled
    I1226 15:45:42.901383 140027647279936 run_docker.py:255] out_flat = compiled.execute(in_flat)
    I1226 15:45:42.901446 140027647279936 run_docker.py:255] jax._src.traceback_util.UnfilteredStackTrace: jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Failed to load in-memory CUBIN: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
    I1226 15:45:42.901507 140027647279936 run_docker.py:255]
    I1226 15:45:42.901568 140027647279936 run_docker.py:255] The stack trace below excludes JAX-internal frames.
    I1226 15:45:42.901637 140027647279936 run_docker.py:255] The preceding is the original exception that occurred, unmodified.
    
    opened by sz-1002 3
  • Colab pro + failing to predict >2000 residues

    Hi, I am using AlphaFold 2 (Colab Pro+) to predict some proteins with long sequences. While it successfully predicted the structures of shorter sequences, it keeps failing with longer ones even though I upgraded to Colab Pro+ to have more memory. Here is what I am getting:

    Is there a way to fix this issue, or is AlphaFold 2 not currently able to predict sequences longer than ~2000 residues? Thanks in advance!

    opened by HBCH1 0
Releases
  • v2.3.0(Dec 13, 2022)

    Version v2.3.0 updates the AlphaFold-Multimer model parameters. These new models are expected to be more accurate on large protein complexes but use the same model architecture and training methodology as described in our previously released AlphaFold-Multimer paper. See the v2.3.0 release note for more details.

    Thanks to various memory optimisations, AlphaFold-Multimer now uses less GPU memory and it can therefore handle longer proteins.

    A number of other bug fixes and small improvements have been made.

    Change log

    • Added new AlphaFold-Multimer models with better accuracy on large protein complexes.
    • Added early stopping to recycling.
    • Added filtering for non-protein sequences in the pdb_seqres download script to prevent template search errors.
    • Fixed a bug where histidine residues had sometimes swapped atom coordinates after relaxation (thanks @avwillems).
    • Updated MGnify to 2022_05, UniRef90 to 2022_01, UniClust30 to 2021_03, UniProt in Colab notebook to 2021_04.
    • Used bf16 in multimer inference – reduces GPU memory usage.
    • Upcast to fp32 when using bf16 in LayerNorm and replace hk.LayerNorm with common_modules.LayerNorm.
    • Updated Jax to 0.3.25 and Haiku to 0.0.9 for consistency with the AlphaFold Colab notebook.
    • Changed TriangleMultiplication to use fused projections and various other memory optimisations.
    • Upgraded Python version in the AlphaFold Colab notebook to 3.8.
    • AlphaFold Colab notebook usability improvements – multimers with up to 20 chains are now supported, higher sequence length limits, number of recycling iterations can now be controlled, and added an option to run single chains on the multimer model.
    • Relaxation metrics are now saved in relax_metrics.json.
    • Some Jax deprecation errors were addressed (thanks @jinmingyi1998).
    • Various documentation and code improvements (thanks @mathe42).
  • v2.2.4(Sep 21, 2022)

    Version v2.2.4 is a bug fix release.

    Change log

    • Bump versions of third party libraries: jax 0.3.17, absl-py 1.0.0, haiku 0.0.7, numpy 1.21.6, tensorflow 2.9.0
    • Adapt jnp.take to account for behaviour with the new jax version, see https://github.com/deepmind/alphafold/issues/513 (thanks @sokrypton).
    • Reduce size of docker image by removing package caches, see https://github.com/deepmind/alphafold/pull/526 (thanks @TheDen).
    • Fix incorrect argument in backbone_loss, see https://github.com/deepmind/alphafold/issues/570 (thanks @sokrypton).
  • v2.2.3(Aug 25, 2022)

    Version v2.2.3 is a bug fix release.

    Change log

    • Pin Conda version to 4.13.0 to prevent Docker/Colab setup issues (thanks @Meghpal, @michaelkeith18).
    • Change the Colab PAE json output to new format that matches the format used in the new release of the AlphaFold Protein Structure Database (AFDB). See the AFDB FAQ for a description of the new format.
    • Add a readme file for AFDB.
    • Type hint improvements.
    • Fix tests and improve internal testing infrastructure.
    • Fix Dockerfile breakage due to https://github.com/google/jax/issues/11142.
  • v2.2.2(Jun 13, 2022)

  • v2.2.1(Jun 13, 2022)

    Version v2.2.1 is a bug fix release.

    Change log

    • Update from CUDA 11.1 to 11.1.1, which addresses the public key issue.
    • Pin protobuf version to 3.20.1 (thanks @britnyblu, @ShoufaChen, @aputron).
    • Clarify in the README that AlphaFold works only under Linux.
    • Fix the jax.tree_multimap deprecation warning.
    • Do not reuse the temporary output directory in run_alphafold_test (thanks @branfosj).
    • Fix the version in setup.py (thanks @cmeesters).
  • v2.2.0(Mar 10, 2022)

    Version v2.2.0 updates the AlphaFold-Multimer model parameters. These new models have greatly reduced numbers of clashes on average and are slightly more accurate. Read the updated AlphaFold-Multimer paper for more details.

    A number of other bug fixes and small improvements have been made.

    Change log

    • Added new AlphaFold-Multimer models with greatly reduced numbers of clashes on average and slightly increased accuracy.
    • Use DeviceRequest rather than runtime=nvidia to expose GPUs to the container (thanks @aburger).
    • Simplified mounting of files in Docker.
    • Removed unused bias argument in GlobalAttention (thanks @breadbread1984).
    • Removed prokaryotic MSA pairing algorithm as it didn’t improve accuracy on average.
    • Added the ability to run with multiple seeds per model to match the AlphaFold-Multimer paper.
    • Fixed degraded performance when using num_recycle=0 with models trained with recycling due to incorrect skipping of layers (thanks @sokrypton).
    • Added split_rng=False (current default) to sharded_map to support new Haiku release.
    • Removed unused code in amber_minimize.py.
  • v2.1.2(Jan 28, 2022)

    Version v2.1.2 is a bug fix release that also includes the earlier license change.

    Change log

    • Update the license of the AlphaFold parameters from CC BY-NC 4.0 to CC BY 4.0. There are no changes to the actual model parameters.
    • The relaxation stage now runs on GPU by default and should be roughly 3x faster thanks to that. You can control this behaviour using the enable_gpu_relax flag (thanks @ojcharles).
    • The relaxation stage can now be disabled using the run_relax flag (thanks @bkpoon).
    • AlphaFold in Docker is now run as the current user not as the root, you can control that using the docker_user flag (thanks @akors).
    • Truncate the MSA when reading the raw Stockholm file to prevent out of memory issues. This should help in cases where the MSA found by Jackhmmer was massive (thanks @hegelab).
    • Update Dockerfile CUDA version to 11.1 and fix JAX version (thanks @chrisroat).
    • Small README, Colab, and flag documentation improvements.
  • v2.1.1 (Nov 5, 2021)

    Version v2.1.1 is a bug fix release for the AlphaFold-Multimer release (v2.1.0).

    Change log:

    • Fixed a bug which caused a crash if the multimer input fasta file contained SwissProt identifiers in the sequence descriptions (thanks @arashnh11, @RodenLuo).
    • Fixed a bug in the Colab notebook with single-chain PAE visualisation (thanks @Alleko).
    • A few README clarifications and additions.
  • v2.1.0 (Nov 2, 2021)

    Version 2.1.0 adds the AlphaFold-Multimer model and fixes a number of issues reported in the last few months.

    Change log:

    • [new feature] AlphaFold-Multimer data pipeline, model and metrics have been added. Use model_preset=multimer to run with AlphaFold-Multimer.
    • [change] AlphaFold-Multimer no longer pre-processes the features via TensorFlow but instead does it in the JAX module code.
    • Added a note and a check that the directory with data is outside the AlphaFold repository for faster Docker builds (thanks @jamespeapen).
    • Advertise Python 3.7, 3.8, 3.9, 3.10 in setup.py (thanks @anukaal).
    • Added an FAQ explaining that the Colab on free tier can time out (thanks @mooingcat).
    • Stop using a hardcoded /tmp; use $TMPDIR instead (thanks @meson800, @EricDeveaud).
    • Make run_docker fully configurable via flags: data_dir, docker_image_name, output_dir (thanks @akors, @chrisroat).
    • Look for stereo_chemical_props.txt relative to the residue_constants module (thanks @rjgildea).
    • Crop UniRef90 MSA to 10,000 sequences to prevent hitting the 2 GB proto field limit and use less memory (thanks @biohegedus and @chrisroat).
    • Finding third party tool binaries is now more robust and gives you better errors if any are missing (thanks @FanchTheSystem).
    • Refactor and a few fixes and usability improvements in the AlphaFold Colab.
  • v2.0.1 (Sep 30, 2021)

    Version 2.0.1 is mainly a bug fix release. We thank everyone who reported issues and proposed solutions.

    Change log:

    • [new feature] Added AlphaFold Colab notebook that enables convenient folding from your browser.
    • [new feature] The reduced_dbs preset was added together with small BFD.
    • Some of the genetic databases are now mirrored on GCP.
    • Added missing data/__init__.py and model/tf/__init__.py files.
    • README fixes and additions.
    • Switched to using a cudnn base image based on Ubuntu 18.04.
    • Switched to tensorflow-cpu since we don't need a GPU when running the data pipeline.
    • Improved logging in the AlphaFold pipeline.
    • Fixed a few typos and added and fixed a few comments.
    • Added pLDDT values in the B-factor column of the output PDBs (see the sketch after this change log).
    • Skip obsolete PDB templates that don't have a replacement.
    • Small test improvements.
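    Since pLDDT is stored in the B-factor column, per-residue confidence can be read back from an output PDB roughly as follows. This is a minimal sketch assuming Biopython is installed; the file name is a placeholder.

        from Bio.PDB import PDBParser

        # Minimal sketch: read per-residue pLDDT from the B-factor column of an
        # AlphaFold output PDB (file name is a placeholder).
        structure = PDBParser(QUIET=True).get_structure('prediction', 'ranked_0.pdb')
        plddts = [
            next(iter(residue)).get_bfactor()  # all atoms of a residue share the same pLDDT
            for residue in structure.get_residues()
        ]
        print(f'{len(plddts)} residues, mean pLDDT = {sum(plddts) / len(plddts):.1f}')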