Install AlphaFold on the local machine and get out of Docker.


AlphaFold

This package provides an implementation of the inference pipeline of AlphaFold v2.0. This is a completely new model that was entered in CASP14 and published in Nature. For simplicity, we refer to this model as AlphaFold throughout the rest of this document.

Any publication that discloses findings arising from using this source code or the model parameters should cite the AlphaFold paper.

[Figure: CASP14 predictions]

First time setup

The following steps are required in order to run AlphaFold:

Install on Ubuntu

  1. Requirements

  2. Install software

    git clone https://github.com/kuixu/alphafold.git
    cd alphafold
    
    ./install_on_local.sh

    or, step by step:

        
    conda create -n af2 python=3.8 -y
    conda activate af2
    
    conda install -y -c nvidia cudnn==8.0.4
    conda install -y -c bioconda hmmer hhsuite==3.3.0 kalign2
    
    conda install -y -c conda-forge \
        openmm=7.5.1 \
        pdbfixer \
        pip
    
    # python pkgs
    pip3 install --upgrade pip \
        && pip3 install -r ./requirements.txt \
        && pip3 install --upgrade "jax[cuda111]" -f \
        https://storage.googleapis.com/jax-releases/jax_releases.html
    
    # work_path=/path/to/alphafold-code
    work_path="$PWD"
    
    # update openmm
    a="$(which python)"
    cd "$(dirname "$(dirname "$a")")/lib/python3.8/site-packages"
    patch -p0 < "$work_path/docker/openmm.patch"
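
    After the patch is applied, a quick sanity check (a sketch): the MSA tools should be on PATH, and the CUDA build of JAX should see the GPU.

        which jackhmmer hhblits hhsearch kalign
        python3 -c "import jax; print(jax.local_devices())"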
    
    
    
  3. Download genetic databases (see below).

  4. Download model parameters (see below).

  5. Set paths. In this fork these variables live in run_alphafold.py:

    # Set to the target directory of scripts/download_all_data.sh
    DOWNLOAD_DIR = '/path/to/database'
    
    # Path to a directory that will store the results.
    output_dir = '/path/to/output_dir'
    
    

Genetic databases

This step requires rsync and aria2c to be installed on your machine.

AlphaFold needs multiple genetic (sequence) databases to run: BFD, MGnify, PDB70, PDB (structures in the mmCIF format), Uniclust30, and UniRef90 (see the directory layout below).

We provide a script scripts/download_all_data.sh that can be used to download and set up all of these databases. This should take 8–12 hours.

📒 Note: The total download size is around 428 GB and the total size when unzipped is 2.2 TB. Please make sure you have a large enough hard drive space, bandwidth and time to download.
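
For example, to download everything into a single target directory (the script takes the download directory as its first argument):

    scripts/download_all_data.sh /path/to/database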

This script will also download the model parameter files. Once the script has finished, you should have the following directory structure:

$DOWNLOAD_DIR/                             # Total: ~ 2.2 TB (download: 428 GB)
    bfd/                                   # ~ 1.8 TB (download: 271.6 GB)
        # 6 files.
    mgnify/                                # ~ 64 GB (download: 32.9 GB)
        mgy_clusters.fa
    params/                                # ~ 3.5 GB (download: 3.5 GB)
        # 5 CASP14 models,
        # 5 pTM models,
        # LICENSE,
        # = 11 files.
    pdb70/                                 # ~ 56 GB (download: 19.5 GB)
        # 9 files.
    pdb_mmcif/                             # ~ 206 GB (download: 46 GB)
        mmcif_files/
            # About 180,000 .cif files.
        obsolete.dat
    uniclust30/                            # ~ 87 GB (download: 24.9 GB)
        uniclust30_2018_08/
            # 13 files.
    uniref90/                              # ~ 59 GB (download: 29.7 GB)
        uniref90.fasta
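
A quick way to compare what landed on disk against the sizes listed above (a sketch):

    du -sh "$DOWNLOAD_DIR"/*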

Model parameters

While the AlphaFold code is licensed under the Apache 2.0 License, the AlphaFold parameters are made available for non-commercial use only under the terms of the CC BY-NC 4.0 license. Please see the Disclaimer below for more detail.

The AlphaFold parameters are available from https://storage.googleapis.com/alphafold/alphafold_params_2021-07-14.tar, and are downloaded as part of the scripts/download_all_data.sh script. This script will download parameters for:

  • 5 models which were used during CASP14, and were extensively validated for structure prediction quality (see Jumper et al. 2021, Suppl. Methods 1.12 for details).
  • 5 pTM models, which were fine-tuned to produce pTM (predicted TM-score) and predicted aligned error values alongside their structure predictions (see Jumper et al. 2021, Suppl. Methods 1.9.7 for details).
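
If you ever need to fetch the parameters by hand, the following is a sketch of what the download script does for this step, using the URL above (the exact flags here are an assumption):

    mkdir -p "$DOWNLOAD_DIR/params"
    wget -P "$DOWNLOAD_DIR/params" https://storage.googleapis.com/alphafold/alphafold_params_2021-07-14.tar
    tar --extract --file "$DOWNLOAD_DIR/params/alphafold_params_2021-07-14.tar" --directory "$DOWNLOAD_DIR/params"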

Running AlphaFold locally

  1. Clone this repository and cd into it.

  2. Run run_alphafold.py pointing to a FASTA file containing the protein sequence for which you wish to predict the structure. If you are predicting the structure of a protein that is already in PDB and you wish to avoid using it as a template, then max_template_date must be set to be before the release date of the structure. For example, for the T1050 CASP14 target:

    python3 run_alphafold.py --fasta_paths=T1050.fasta --max_template_date=2020-05-14
    # or simply
    exp/run_local.sh T1050.fasta

    By default, AlphaFold will attempt to use all visible GPU devices. To use a subset, set the CUDA_VISIBLE_DEVICES environment variable to a comma-separated list of GPU UUIDs or indices, as in the sketch below.
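
    For example (a sketch), to restrict the run above to the first GPU:

        CUDA_VISIBLE_DEVICES=0 python3 run_alphafold.py --fasta_paths=T1050.fasta --max_template_date=2020-05-14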

  3. You can control the AlphaFold speed/quality tradeoff by adding either --preset=full_dbs or --preset=casp14 to the run command. We provide the following presets:

    • casp14: This preset uses the same settings as were used in CASP14. It runs with all genetic databases and with 8 ensemblings.
    • full_dbs: The model in this preset is 8 times faster than the casp14 preset with a very minor quality drop (-0.1 average GDT drop on CASP14 domains). It runs with all genetic databases and with no ensembling.

    Running the command above with the casp14 preset would look like this:

    python3 run_alphafold.py --fasta_paths=T1050.fasta --max_template_date=2020-05-14 --preset=casp14

AlphaFold output

The outputs will be saved in a subfolder of output_dir. They include the computed MSAs, unrelaxed structures, relaxed structures, ranked structures, raw model outputs, prediction metadata, and section timings. The output_dir directory will have the following structure:

output_dir/
    features.pkl
    ranked_{0,1,2,3,4}.pdb
    ranking_debug.json
    relaxed_model_{1,2,3,4,5}.pdb
    result_model_{1,2,3,4,5}.pkl
    timings.json
    unrelaxed_model_{1,2,3,4,5}.pdb
    msas/
        bfd_uniclust_hits.a3m
        mgnify_hits.sto
        uniref90_hits.sto

The contents of each output file are as follows:

  • features.pkl – A pickle file containing the input feature Numpy arrays used by the models to produce the structures.
  • unrelaxed_model_*.pdb – A PDB format text file containing the predicted structure, exactly as outputted by the model.
  • relaxed_model_*.pdb – A PDB format text file containing the predicted structure, after performing an Amber relaxation procedure on the unrelaxed structure prediction, see Jumper et al. 2021, Suppl. Methods 1.8.6 for details.
  • ranked_*.pdb – A PDB format text file containing the relaxed predicted structures, after reordering by model confidence. Here ranked_0.pdb should contain the prediction with the highest confidence, and ranked_4.pdb the prediction with the lowest confidence. To rank model confidence, we use predicted LDDT (pLDDT), see Jumper et al. 2021, Suppl. Methods 1.9.6 for details.
  • ranking_debug.json – A JSON format text file containing the pLDDT values used to perform the model ranking, and a mapping back to the original model names.
  • timings.json – A JSON format text file containing the times taken to run each section of the AlphaFold pipeline.
  • msas/ – A directory containing the files describing the various genetic tool hits that were used to construct the input MSA.
  • result_model_*.pkl – A pickle file containing a nested dictionary of the various Numpy arrays directly produced by the model. In addition to the output of the structure module, this includes auxiliary outputs such as distograms and pLDDT scores. If using the pTM models then the pTM logits will also be contained in this file.
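
A quick way to inspect these outputs from the shell (a sketch; it assumes the result pickle carries the 'plddt' array described above):

    python3 -m json.tool output_dir/ranking_debug.json
    python3 -c "import pickle; r = pickle.load(open('output_dir/result_model_1.pkl', 'rb')); print(sorted(r)); print('mean pLDDT:', r['plddt'].mean())"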

This code has been tested to match mean top-1 accuracy on a CASP14 test set with pLDDT ranking over 5 model predictions (some CASP targets were run with earlier versions of AlphaFold and some had manual interventions; see our forthcoming publication for details). Some targets such as T1064 may also have high individual run variance over random seeds.

Inferencing many proteins

The provided inference script is optimized for predicting the structure of a single protein, and it will compile the neural network to be specialized to exactly the size of the sequence, MSA, and templates. For large proteins, the compile time is a negligible fraction of the runtime, but it may become more significant for small proteins or if the multiple sequence alignments are already precomputed. In the bulk inference case, it may make sense to use our make_fixed_size function to pad the inputs to a uniform size, thereby reducing the number of compilations required.

We do not provide a bulk inference script, but it should be straightforward to develop one on top of the RunModel.predict method with a parallel system for precomputing multiple sequence alignments. Alternatively, run_alphafold.py can simply be run repeatedly with only moderate overhead, as in the sketch below.
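
For example, a minimal bulk driver (a sketch; it assumes one target per FASTA file under fastas/ and reuses the flags from the example above):

    for fasta in fastas/*.fasta; do
        python3 run_alphafold.py --fasta_paths="$fasta" --max_template_date=2020-05-14
    done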

Note on reproducibility

AlphaFold's output for a small number of proteins has high inter-run variance, and may be affected by changes in the input data. The CASP14 target T1064 is a notable example; the large number of SARS-CoV-2-related sequences recently deposited changes its MSA significantly. This variability is somewhat mitigated by the model selection process of running 5 models and taking the most confident prediction.

To reproduce the results of our CASP14 system as closely as possible you must use the same database versions we used in CASP. These may not match the default versions downloaded by our scripts.

For genetics:

  • UniRef90: v2020_01
  • MGnify: v2018_12
  • Uniclust30: v2018_08
  • BFD: only one version available

For templates:

  • PDB: (downloaded 2020-05-14)
  • PDB70: (downloaded 2020-05-13)

An alternative for templates is to use the latest PDB and PDB70, but pass the flag --max_template_date=2020-05-14, which restricts templates only to structures that were available at the start of CASP14.

Citing this work

If you use the code or data in this package, please cite:

@Article{AlphaFold2021,
  author  = {Jumper, John and Evans, Richard and Pritzel, Alexander and Green, Tim and Figurnov, Michael and Ronneberger, Olaf and Tunyasuvunakool, Kathryn and Bates, Russ and {\v{Z}}{\'\i}dek, Augustin and Potapenko, Anna and Bridgland, Alex and Meyer, Clemens and Kohl, Simon A A and Ballard, Andrew J and Cowie, Andrew and Romera-Paredes, Bernardino and Nikolov, Stanislav and Jain, Rishub and Adler, Jonas and Back, Trevor and Petersen, Stig and Reiman, David and Clancy, Ellen and Zielinski, Michal and Steinegger, Martin and Pacholska, Michalina and Berghammer, Tamas and Bodenstein, Sebastian and Silver, David and Vinyals, Oriol and Senior, Andrew W and Kavukcuoglu, Koray and Kohli, Pushmeet and Hassabis, Demis},
  journal = {Nature},
  title   = {Highly accurate protein structure prediction with {AlphaFold}},
  year    = {2021},
  doi     = {10.1038/s41586-021-03819-2},
  note    = {(Accelerated article preview)},
}

Acknowledgements

AlphaFold communicates with and/or references a number of separate libraries and packages, including HMMER, HH-suite, Kalign, OpenMM, JAX, and TensorFlow.

We thank all their contributors and maintainers!

License and Disclaimer

This is not an officially supported Google product.

Copyright 2021 DeepMind Technologies Limited.

AlphaFold Code License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Model Parameters License

The AlphaFold parameters are made available for non-commercial use only, under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. You can find details at: https://creativecommons.org/licenses/by-nc/4.0/legalcode

Third-party software

Use of the third-party software, libraries or code referred to in the Acknowledgements section above may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code is subject to any such terms and you should check that you can comply with any applicable restrictions or terms and conditions before use.

Comments
  • Could not find HHBlits database

    Could not find HHBlits database

    Traceback (most recent call last):
      File "run_alphafold.py", line 338, in <module>
        app.run(main)
      File "/home/yulab/anaconda3/envs/af2/lib/python3.8/site-packages/absl/app.py", line 312, in run
        _run_main(main, args)
      File "/home/yulab/anaconda3/envs/af2/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
        sys.exit(main(argv))
      File "run_alphafold.py", line 273, in main
        data_pipeline = pipeline.DataPipeline(
      File "/home/yulab/software/alphafold/alphafold/data/pipeline.py", line 101, in __init__
        self.hhblits_bfd_uniclust_runner = hhblits.HHBlits(
      File "/home/yulab/software/alphafold/alphafold/data/tools/hhblits.py", line 83, in __init__
        raise ValueError(f'Could not find HHBlits database {database_path}')
    ValueError: Could not find HHBlits database /data1/AF2_data/uniclust30/UniRef30_2020_02
    

    In run_alphafold.py, changing line 77 to the following will fix it:

        DOWNLOAD_DIR, 'uniclust30', 'UniRef30_2020_02', 'UniRef30_2020_02')
    
    opened by Neutrino0532 3
  • Wrong order of conda packages?

    Wrong order of conda packages?

    Hi, thanks for your docker-less solution!

    I always got UnsatisfiableError while installing the bioconda packages. After installing the conda-forge packages (and the others) first, it worked!

    Again, thanks!

    opened by muthoff 2
  • ValueError: Could not find CIFs in /data01/xukui/alphafold/pdb_mmcif/mmcif_files

    ValueError: Could not find CIFs in /data01/xukui/alphafold/pdb_mmcif/mmcif_files

    Hello, I'm getting this error and wondering whether there is a step I missed in setup. Can I download these mmCIF files from somewhere? I cannot see them in your repository structure!

    Many thanks, Linda

    E0930 14:56:15.892669 139744909522752 templates.py:860] Could not find CIFs in /data01/xukui/alphafold/pdb_mmcif/mmcif_files
    Traceback (most recent call last):
      File "run_alphafold.py", line 338, in <module>
        app.run(main)
      File "/home/linda/programs/anaconda3/envs/af2/lib/python3.8/site-packages/absl/app.py", line 312, in run
        _run_main(main, args)
      File "/home/linda/programs/anaconda3/envs/af2/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
        sys.exit(main(argv))
      File "run_alphafold.py", line 265, in main
        template_featurizer = templates.TemplateHitFeaturizer(
      File "/home/linda/alphafold/alphafold/data/templates.py", line 861, in __init__
        raise ValueError(f'Could not find CIFs in {self._mmcif_dir}')
    ValueError: Could not find CIFs in /data01/xukui/alphafold/pdb_mmcif/mmcif_files
    
    opened by linda5mith 1
  • Setting working directory in install_on_local.sh

    Setting working directory in install_on_local.sh

    The script install_on_local.sh sets the variable work_path to $(PWD). In bash, this tries to run the command PWD, which usually doesn't exist. I believe it should be $PWD, ${PWD} or $(pwd).

    https://github.com/kuixu/alphafold/blob/21530fa4783d3e876df83f2596d1306cfbe0c98f/install_on_local.sh#L24

    Also, ideally, some quotes should probably be added to support people running this in paths with spaces, eg:

    # work_path=/path/to/alphafold-code
    work_path="$PWD"
    # update openmm 
    a="$(which python)"
    cd "$(dirname "$(dirname "$a")")/lib/python3.8/site-packages"
    patch -p0 < "$work_path/docker/openmm.patch"
    
    opened by phusen 1
  • Bump tensorflow from 2.5.0 to 2.5.1

    Bump tensorflow from 2.5.0 to 2.5.1

    Bumps tensorflow from 2.5.0 to 2.5.1.

    Release notes

    Sourced from tensorflow's releases.

    TensorFlow 2.5.1

    Release 2.5.1

    This release introduces several vulnerability fixes:

    • Fixes a heap out of bounds access in sparse reduction operations (CVE-2021-37635)
    • Fixes a floating point exception in SparseDenseCwiseDiv (CVE-2021-37636)
    • Fixes a null pointer dereference in CompressElement (CVE-2021-37637)
    • Fixes a null pointer dereference in RaggedTensorToTensor (CVE-2021-37638)
    • Fixes a null pointer dereference and a heap OOB read arising from operations restoring tensors (CVE-2021-37639)
    • Fixes an integer division by 0 in sparse reshaping (CVE-2021-37640)
    • Fixes a division by 0 in ResourceScatterDiv (CVE-2021-37642)
    • Fixes a heap OOB in RaggedGather (CVE-2021-37641)
    • Fixes a std::abort raised from TensorListReserve (CVE-2021-37644)
    • Fixes a null pointer dereference in MatrixDiagPartOp (CVE-2021-37643)
    • Fixes an integer overflow due to conversion to unsigned (CVE-2021-37645)
    • Fixes a bad allocation error in StringNGrams caused by integer conversion (CVE-2021-37646)
    • Fixes a null pointer dereference in SparseTensorSliceDataset (CVE-2021-37647)
    • Fixes an incorrect validation of SaveV2 inputs (CVE-2021-37648)
    • Fixes a null pointer dereference in UncompressElement (CVE-2021-37649)
    • Fixes a segfault and a heap buffer overflow in {Experimental,}DatasetToTFRecord (CVE-2021-37650)
    • Fixes a heap buffer overflow in FractionalAvgPoolGrad (CVE-2021-37651)
    • Fixes a use after free in boosted trees creation (CVE-2021-37652)
    • Fixes a division by 0 in ResourceGather (CVE-2021-37653)
    • Fixes a heap OOB and a CHECK fail in ResourceGather (CVE-2021-37654)
    • Fixes a heap OOB in ResourceScatterUpdate (CVE-2021-37655)
    • Fixes an undefined behavior arising from reference binding to nullptr in RaggedTensorToSparse (CVE-2021-37656)
    • Fixes an undefined behavior arising from reference binding to nullptr in MatrixDiagV* ops (CVE-2021-37657)
    • Fixes an undefined behavior arising from reference binding to nullptr in MatrixSetDiagV* ops (CVE-2021-37658)
    • Fixes an undefined behavior arising from reference binding to nullptr and heap OOB in binary cwise ops (CVE-2021-37659)
    • Fixes a division by 0 in inplace operations (CVE-2021-37660)
    • Fixes a crash caused by integer conversion to unsigned (CVE-2021-37661)
    • Fixes an undefined behavior arising from reference binding to nullptr in boosted trees (CVE-2021-37662)
    • Fixes a heap OOB in boosted trees (CVE-2021-37664)
    • Fixes vulnerabilities arising from incomplete validation in QuantizeV2 (CVE-2021-37663)
    • Fixes vulnerabilities arising from incomplete validation in MKL requantization (CVE-2021-37665)
    • Fixes an undefined behavior arising from reference binding to nullptr in RaggedTensorToVariant (CVE-2021-37666)
    • Fixes an undefined behavior arising from reference binding to nullptr in unicode encoding (CVE-2021-37667)
    • Fixes an FPE in tf.raw_ops.UnravelIndex (CVE-2021-37668)
    • Fixes a crash in NMS ops caused by integer conversion to unsigned (CVE-2021-37669)
    • Fixes a heap OOB in UpperBound and LowerBound (CVE-2021-37670)
    • Fixes an undefined behavior arising from reference binding to nullptr in map operations (CVE-2021-37671)
    • Fixes a heap OOB in SdcaOptimizerV2 (CVE-2021-37672)
    • Fixes a CHECK-fail in MapStage (CVE-2021-37673)
    • Fixes a vulnerability arising from incomplete validation in MaxPoolGrad (CVE-2021-37674)
    • Fixes an undefined behavior arising from reference binding to nullptr in shape inference (CVE-2021-37676)
    • Fixes a division by 0 in most convolution operators (CVE-2021-37675)
    • Fixes vulnerabilities arising from missing validation in shape inference for Dequantize (CVE-2021-37677)
    • Fixes an arbitrary code execution due to YAML deserialization (CVE-2021-37678)
    • Fixes a heap OOB in nested tf.map_fn with RaggedTensors (CVE-2021-37679)

    ... (truncated)


    Commits
    • 8222c1c Merge pull request #51381 from tensorflow/mm-fix-r2.5-build
    • d584260 Disable broken/flaky test
    • f6c6ce3 Merge pull request #51367 from tensorflow-jenkins/version-numbers-2.5.1-17468
    • 3ca7812 Update version numbers to 2.5.1
    • 4fdf683 Merge pull request #51361 from tensorflow/mm-update-relnotes-on-r2.5
    • 05fc01a Put CVE numbers for fixes in parentheses
    • bee1dc4 Update release notes for the new patch release
    • 47beb4c Merge pull request #50597 from kruglov-dmitry/v2.5.0-sync-abseil-cmake-bazel
    • 6f39597 Merge pull request #49383 from ashahab/abin-load-segfault-r2.5
    • 0539b34 Merge pull request #48979 from liufengdb/r2.5-cherrypick
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • can't find file to patch at input line 5

    can't find file to patch at input line 5

    patch -p0 < /mnt/e/total_scripts/ref_pkgs/bioinfor/protein/conda_alphafold/alphafold/docker/openmm.patch

    can't find file to patch at input line 5
    Perhaps you used the wrong -p or --strip option?
    The text leading up to this was:

    |Index: simtk/openmm/app/topology.py
    |===================================================================
    |--- simtk.orig/openmm/app/topology.py
    |+++ simtk/openmm/app/topology.py

    File to patch:

    opened by ZihaoXingUP 0
  • database/pdb_mmcif/raw/*/*.cif': No such file or directory

    database/pdb_mmcif/raw/*/*.cif': No such file or directory

    Hi there,

    I started downloading all data into my designated folder, which has sufficient room. While I am able to download params, bfd, mgnify, pdb70, and uniclust30, I got an error message when downloading pdb_mmcif.

    $ ./download_pdb70.sh /database
    02/02 09:21:07 [NOTICE] Downloading 1 item(s)
    [#b9fa7b 1.8GiB/19GiB(9%) CN:1 DL:2.7MiB ETA:1h47m45s]
    

    but

    $ ./download_pdb_mmcif.sh /database
    Running rsync to fetch all mmCIF files (note that the rsync progress estimate might be inaccurate)...
    Unzipping all mmCIF files...
    Flattening all mmCIF files...
    mv: cannot stat '/home5/alphafold_database/pdb_mmcif/raw/*/*.cif': No such file or directory
    

    I also noticed that I am unable to download uniref90. Could this be a firewall issue on my end? My university does put a firewall on any FTP source.

    Please help.

    Thank you!

    David

    opened by CFDavidHou 3
  • Issue in installing the software

    Issue in installing the software

    Dear all, I am trying to install the software on my system (Ubuntu), and when I execute the install script (install_on_local.sh) I get the following error:

        Solving environment: failed with initial frozen solve. Retrying with flexible solve.
        Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.

        UnsatisfiableError: The following specifications were found to be incompatible with each other:

        Output in format: Requested package -> Available versions

        Package _openmp_mutex conflicts for:
          hhsuite==3.3.0 -> _openmp_mutex[version='>=4.5']
          python=3.7 -> libgcc-ng[version='>=7.5.0'] -> _openmp_mutex[version='>=4.5']
          kalign2 -> libgcc-ng[version='>=9.3.0'] -> _openmp_mutex[version='>=4.5']
          hmmer -> libgcc-ng[version='>=9.3.0'] -> _openmp_mutex[version='>=4.5']

        Package libstdcxx-ng conflicts for:
          python=3.7 -> libstdcxx-ng[version='>=7.2.0|>=7.3.0']
          hmmer -> libstdcxx-ng[version='>=4.9|>=7.3.0|>=7.5.0|>=9.3.0']
          hhsuite==3.3.0 -> libstdcxx-ng[version='>=7.5.0|>=9.3.0']
          hhsuite==3.3.0 -> python[version='>=3.9,<3.10.0a0'] -> libstdcxx-ng[version='>=7.2.0|>=7.3.0']

        Package libgcc-ng conflicts for:
          hhsuite==3.3.0 -> perl[version='>=5.26.2,<5.26.3.0a0'] -> libgcc-ng[version='>=7.2.0|>=7.3.0']
          hhsuite==3.3.0 -> libgcc-ng[version='>=7.5.0|>=9.3.0']

        The following specifications were found to be incompatible with your system:

    • feature:/linux-64::__glibc==2.31=0
    • hhsuite==3.3.0 -> libgcc-ng[version='>=9.3.0'] -> __glibc[version='>=2.17']
    • hmmer -> libgcc-ng[version='>=9.3.0'] -> __glibc[version='>=2.17']
    • kalign2 -> libgcc-ng[version='>=9.3.0'] -> __glibc[version='>=2.17']
    • python=3.7 -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']

    Your installed version is: 2.31

        Collecting package metadata (current_repodata.json): done
        Solving environment: failed with initial frozen solve. Retrying with flexible solve.
        Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
        Collecting package metadata (repodata.json): done
        Solving environment: failed with initial frozen solve. Retrying with flexible solve.
        Solving environment: - Found conflicts! Looking for incompatible packages.

    I would most appreciate any suggestion as to where the issue comes from and how I can solve it. Thanks so much, Maryam

    opened by Maryamtarazkar 1
  • Core dumped

    Core dumped

    I am getting the following error:

    I0826 14:47:31.789251 139693760276288 run_alphafold.py:185] Running model model_1
    WARNING:tensorflow:From /home/zapata/alphafold/alphafold/model/tf/input_pipeline.py:151: calling map_fn (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
    Instructions for updating:
    Use fn_output_signature instead
    W0826 14:47:32.193996 139693760276288 deprecation.py:528] From /home/zapata/alphafold/alphafold/model/tf/input_pipeline.py:151: calling map_fn (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
    Instructions for updating:
    Use fn_output_signature instead
    I0826 14:47:37.243475 139693760276288 model.py:132] Running predict with shape(feat) = {'aatype': (4, 485), 'residue_index': (4, 485), 'seq_length': (4,), 'template_aatype': (4, 4, 485), 'template_all_atom_masks': (4, 4, 485, 37), 'template_all_atom_positions': (4, 4, 485, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 485), 'msa_mask': (4, 508, 485), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 485, 3), 'template_pseudo_beta_mask': (4, 4, 485), 'atom14_atom_exists': (4, 485, 14), 'residx_atom14_to_atom37': (4, 485, 14), 'residx_atom37_to_atom14': (4, 485, 37), 'atom37_atom_exists': (4, 485, 37), 'extra_msa': (4, 5120, 485), 'extra_msa_mask': (4, 5120, 485), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 485), 'true_msa': (4, 508, 485), 'extra_has_deletion': (4, 5120, 485), 'extra_deletion_value': (4, 5120, 485), 'msa_feat': (4, 508, 485, 49), 'target_feat': (4, 485, 22)}
    2021-08-26 14:48:24.665668: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
    2021-08-26 14:48:24.665708: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gemm_algorithm_picker.cc:113] Check failed: stream->parent()->GetBlasGemmAlgorithms(&algorithms)
    Fatal Python error: Aborted

    Current thread 0x00007f0cfcf56740 (most recent call first):
      File "/home/zapata/anaconda3/envs/af2/lib/python3.8/site-packages/jax/interpreters/xla.py", line 385 in backend_compile
      File "/home/zapata/anaconda3/envs/af2/lib/python3.8/site-packages/jax/interpreters/xla.py", line 81 in compile_or_get_cached
      File "/home/zapata/anaconda3/envs/af2/lib/python3.8/site-packages/jax/interpreters/xla.py", line 772 in _xla_callable
      File "/home/zapata/anaconda3/envs/af2/lib/python3.8/site-packages/jax/linear_util.py", line 262 in memoized_fun
      File "/home/zapata/anaconda3/envs/af2/lib/python3.8/site-packages/jax/interpreters/xla.py", line 619 in _xla_call_impl
      File "/home/zapata/anaconda3/envs/af2/lib/python3.8/site-packages/jax/core.py", line 613 in process_call
      File "/home/zapata/anaconda3/envs/af2/lib/python3.8/site-packages/jax/core.py", line 1617 in process
      File "/home/zapata/anaconda3/envs/af2/lib/python3.8/site-packages/jax/core.py", line 1605 in call_bind
      File "/home/zapata/anaconda3/envs/af2/lib/python3.8/site-packages/jax/core.py", line 1614 in bind
      File "/home/zapata/anaconda3/envs/af2/lib/python3.8/site-packages/jax/_src/api.py", line 405 in cache_miss
      File "/home/zapata/anaconda3/envs/af2/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 162 in reraise_with_filtered_traceback
      File "/home/zapata/alphafold/alphafold/model/model.py", line 134 in predict
      File "run_alphafold.py", line 192 in predict_structure
      File "run_alphafold.py", line 310 in main
      File "/home/zapata/anaconda3/envs/af2/lib/python3.8/site-packages/absl/app.py", line 258 in _run_main
      File "/home/zapata/anaconda3/envs/af2/lib/python3.8/site-packages/absl/app.py", line 312 in run
      File "run_alphafold.py", line 338 in <module>
    Aborted (core dumped)

    I have NVIDIA driver 455 and CUDA 11.1. I created an environment as indicated but it does not run.

    opened by tavolivos 0
  • Unable to initialize backend 'gpu'

    Unable to initialize backend 'gpu'

    Hello, I have followed your README to install the Anaconda environment. It can run on the CPU, but not on the GPU, even though my machine has an NVIDIA RTX TITAN GPU with 24 GB of memory. Whenever I run

        python3 run_alphafold.py --fasta_paths=T1050.fasta --max_template_date=2020-05-14
        # or simply
        exp/run_local.sh T1050.fasta

    it warns with

        I0820 16:00:20.270564 140257858221888 xla_bridge.py:212] Unable to initialize backend 'tpu_driver': Not found: Unable to find driver in registry given worker: local://
        I0820 16:00:20.281177 140257858221888 xla_bridge.py:212] Unable to initialize backend 'gpu': Not found: Could not find registered platform with name: "cuda". Available platform names are: Interpreter Host
        I0820 16:00:20.281787 140257858221888 xla_bridge.py:212] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available.
        W0820 16:00:20.282057 140257858221888 xla_bridge.py:215] No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)

    It seems to run only on the CPU; nvidia-smi shows it using only 315 MiB of GPU memory.

    opened by HGX-001 7