A modified version of DeepMind's AlphaFold2 that separates the CPU part (MSA and template search) from the GPU part (the prediction model)

Overview

ParallelFold

Author: Bozitao Zhong

This is a modified version of DeepMind's AlphaFold2 that splits a local AlphaFold2 run into a CPU part (MSA and template search) and a GPU part (the prediction model).

How to install

First, you should install AlphaFold2. You can choose one of the following ways to install AlphaFold locally:

  • Use the official Docker version from DeepMind.
  • Use one of the community versions that install AlphaFold without Docker.
  • Use my installation guide, which is based on the non-Docker version and can be adapted to different CUDA versions (CUDA driver >= 10.1).

Then put these 4 files into your AlphaFold folder. This folder should already contain the original run_alphafold.py, and I use a run_alphafold.sh script to run AlphaFold easily (an idea borrowed from the non-Docker version).

The 4 files are:

  • run_alphafold.py: a modified version of the original run_alphafold.py that skips the featuring step when feature.pkl already exists in the output folder (see the sketch after this list)
  • run_alphafold.sh: a bash script that runs run_alphafold.py
  • run_feature.py: a modified version of the original run_alphafold.py that exits the Python process after it finishes writing feature.pkl
  • run_feature.sh: a bash script that runs run_feature.py
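
The reuse logic is the core of the CPU/GPU split. Below is a minimal sketch of the idea in Python, not the exact code from this repo: it assumes the feature dict is pickled to ./output/JOBNAME/feature.pkl as described later, and data_pipeline stands in for DeepMind's data pipeline object (the process() call signature here is an assumption).

import os
import pickle

def get_features(fasta_path, output_dir, jobname, data_pipeline):
    # Reuse feature.pkl if the CPU stage (run_feature) already produced it.
    feature_path = os.path.join(output_dir, jobname, "feature.pkl")
    if os.path.exists(feature_path):
        # run_alphafold.py path: skip the MSA and template search entirely.
        with open(feature_path, "rb") as f:
            return pickle.load(f)
    # run_feature.py path: run the slow CPU searches, then cache the result.
    msa_output_dir = os.path.join(output_dir, jobname, "msas")
    os.makedirs(msa_output_dir, exist_ok=True)
    features = data_pipeline.process(fasta_path, msa_output_dir)  # assumed signature
    with open(feature_path, "wb") as f:
        pickle.dump(features, f, protocol=4)
    return features

Per the file descriptions above, run_feature.py then exits right after the pickle is written, so the GPU stage is never reached in that run.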

How to run

First, run run_feature.sh; it only needs CPUs:

./run_feature.sh -d data -o output -m model_1 -f input/test3.fasta -t 2021-07-27

8 CPUs are enough; in my tests, more CPUs did not improve speed.

A GPU can accelerate the HHblits step (but I assume you chose this repo precisely because GPU time is expensive).

The featuring step writes feature.pkl and the MSA folder into your output folder: ./output/JOBNAME/

PS: I keep my input files in an input folder to better organize my files; you can skip this.
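
If you want to sanity-check the featuring output before spending GPU time, you can peek into the pickle. A quick sketch, assuming a job named JOBNAME (the file is a pickled dict of numpy feature arrays):

import pickle

with open("output/JOBNAME/feature.pkl", "rb") as f:
    features = pickle.load(f)

# Print each feature name and its array shape, e.g. 'aatype', 'msa', ...
for name, value in features.items():
    print(name, getattr(value, "shape", type(value)))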

Second, run run_alphafold.sh using the GPU:

./run_alphafold.sh -d data -o output -m model_1,model_2,model_3,model_4,model_5 -f input/test.fasta -t 2021-07-27

If feature.pkl was written successfully, the featuring step will now be very fast.

I have also uploaded the scripts I use on the SJTU HPC cluster (with Slurm): sub_alphafold.slurm and sub_feature.slurm.

Other Files

In the ./alphafold folder I modified some Python files (hhblits.py, hmmsearch.py, jackhmmer.py) to give these steps more CPUs for acceleration. In my tests, however, these steps did not run faster when given more CPUs, probably because DeepMind wraps them in a subprocess. I'm trying to improve this (work in progress).
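
For reference, the kind of change involved: each tool wrapper accepts a CPU count that it forwards as the -cpu/--cpu flag of the subprocess it launches, so raising it is a one-line edit. A sketch, assuming the upstream wrapper's n_cpu keyword and using example paths:

# Hand HHblits more CPUs via the wrapper's n_cpu argument; the wrapper
# only forwards this value as "-cpu <n_cpu>" on the hhblits command line.
from alphafold.data.tools import hhblits

hhblits_runner = hhblits.HHBlits(
    binary_path="/usr/bin/hhblits",  # example path, adjust to your install
    databases=["/data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt"],
    n_cpu=16,  # upstream default is 4
)

As noted above, in practice this did not actually speed the searches up.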

If you have any questions, please open an issue.

Comments
  • Still having problems after running the scripts

    Hello Dr. Zhong! I found that after running the scripts, the CPU part works fine, but the GPU part fails for both short sequences (200+ aa) and long sequences (1800+ aa). My script is as follows:

    #!/bin/bash
    module load anaconda/2020.11
    source activate /data/home/zhoujy/run/alphafold2
    ./run_feature.sh -d /data/public/alphafold2 -o /data/home/zhoujy/run/output -m model_1 -f /data/home/zhoujy/run/input/Q9NYP9.fasta -t 2021-07-27
    ./run_alphafold.sh -d /data/public/alphafold2 -o /data/home/zhoujy/run/output -m model_1,model_2,model_3,model_4,model_5 -f /data/home/zhoujy/run/input/Q9NYP9.fasta -t 2021-07-27

    Submitted with one GPU card.

    The error output is as follows:

    87 I0927 17:05:14.162350 139818804778816 xla_bridge.py:226] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available.
    88 I0927 17:05:23.883118 139818804778816 run_alphafold.py:272] Have 5 models: ['model_1', 'model_2', 'model_3', 'model_4', 'model_5']
    89 I0927 17:05:23.883379 139818804778816 run_alphafold.py:285] Using random seed 491376288278862761 for the data pipeline
    90 I0927 17:05:23.892619 139818804778816 run_alphafold.py:151] Running model model_1
    91 I0927 17:05:34.480318 139818804778816 model.py:131] Running predict with shape(feat) = {'aatype': (4, 233), 'residue_index': (4, 233), 'seq_length': (4,), 'template_aatype': (4, 4, 233), 'template_all_atom_masks': (4, 4, 233, 37), 'template_all_atom_positions': (4, 4, 233, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 233), 'msa_mask': (4, 508, 233), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 233, 3), 'template_pseudo_beta_mask': (4, 4, 233), 'atom14_atom_exists': (4, 233, 14), 'residx_atom14_to_atom37': (4, 233, 14), 'residx_atom37_to_atom14': (4, 233, 37), 'atom37_atom_exists': (4, 233, 37), 'extra_msa': (4, 5120, 233), 'extra_msa_mask': (4, 5120, 233), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 233), 'true_msa': (4, 508, 233), 'extra_has_deletion': (4, 5120, 233), 'extra_deletion_value': (4, 5120, 233), 'msa_feat': (4, 508, 233, 49), 'target_feat': (4, 233, 22)}
    92 2021-09-27 17:05:35.143686: W external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:81] Couldn't get ptxas version string: Internal: Running ptxas --version returned 32512
    93 2021-09-27 17:05:35.324896: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:479] ptxas returned an error during compilation of ptx to sass: 'Internal: ptxas exited with non-zero error code 32512, output: ' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
    94 Fatal Python error: Aborted
    95
    96 Thread 0x00007f2a1a311740 (most recent call first):
    97 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/interpreters/xla.py", line 387 in backend_compile
    98 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/interpreters/xla.py", line 324 in xla_primitive_callable
    99 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/_src/util.py", line 188 in cached
    100 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/_src/util.py", line 195 in wrapper
    101 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/interpreters/xla.py", line 275 in apply_primitive
    102 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/core.py", line 612 in process_primitive
    103 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/core.py", line 267 in bind
    104 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 388 in shift_right_logical
    105 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/_src/prng.py", line 229 in threefry_seed
    106 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/_src/prng.py", line 191 in seed_with_impl
    107 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/_src/random.py", line 105 in PRNGKey
    108 File "/data/run01/zhoujy/ParallelFold-main/alphafold/model/model.py", line 133 in predict
    109 File "/data/run01/zhoujy/ParallelFold-main/run_alphafold.py", line 158 in predict_structure
    110 File "/data/run01/zhoujy/ParallelFold-main/run_alphafold.py", line 289 in main
    111 File "/data/home/zhoujy/.local/lib/python3.8/site-packages/absl/app.py", line 258 in _run_main
    112 File "/data/home/zhoujy/.local/lib/python3.8/site-packages/absl/app.py", line 312 in run
    113 File "/data/run01/zhoujy/ParallelFold-main/run_alphafold.py", line 316 in <module>

    Lines 92-113 show the error that appears regardless of sequence length. What could be causing this? @Zuricho

    opened by zhoujingyu13687306871 16
  • Where can I find the protein sequence?

    After reading the article "AlphaFold Deployment and Optimization on HPC Platform", I want to run some experiments following the article, but I cannot find the protein sequence online. Can you tell me how to download the FASTA file used in the article?

    opened by yanchenmochen 4
  • How to run the GPU part?

    How do I run the model-inference (GPU) part of the process after the featurization step? Does the model-inference step automatically find feature.pkl in some folder?

    opened by hrzolix 4
  • How to accelerate the HHblits step with a GPU

    Hello! Thanks for your great work! I have some questions:

    Q1: Do you know how to accelerate the HHblits step with a GPU?

    Q2: I run jackhmmer with --cpu 8, but it always uses only 2 CPUs and I don't know why.

    opened by Licko0909 4
  • 2022-01-11 09:19:03.536275: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:479] ptxas returned an error during compilation of ptx to sass: 'Internal: ptxas exited with non-zero error code 32512, output: ' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided. Fatal Python error: Aborted

    2022-01-11 09:19:02.638037: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 28422 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:41:00.0, compute capability: 7.0)
    I0111 09:19:03.171788 47078973446272 model.py:165] Running predict with shape(feat) = {'aatype': (4, 45), 'residue_index': (4, 45), 'seq_length': (4,), 'template_aatype': (4, 4, 45), 'template_all_atom_masks': (4, 4, 45, 37), 'template_all_atom_positions': (4, 4, 45, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 45), 'msa_mask': (4, 508, 45), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 45, 3), 'template_pseudo_beta_mask': (4, 4, 45), 'atom14_atom_exists': (4, 45, 14), 'residx_atom14_to_atom37': (4, 45, 14), 'residx_atom37_to_atom14': (4, 45, 37), 'atom37_atom_exists': (4, 45, 37), 'extra_msa': (4, 5120, 45), 'extra_msa_mask': (4, 5120, 45), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 45), 'true_msa': (4, 508, 45), 'extra_has_deletion': (4, 5120, 45), 'extra_deletion_value': (4, 5120, 45), 'msa_feat': (4, 508, 45, 49), 'target_feat': (4, 45, 22)}
    2022-01-11 09:19:03.503247: W external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:81] Couldn't get ptxas version string: Internal: Running ptxas --version returned 32512
    2022-01-11 09:19:03.536275: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:479] ptxas returned an error during compilation of ptx to sass: 'Internal: ptxas exited with non-zero error code 32512, output: ' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
    Fatal Python error: Aborted

    Thread 0x00002ad16d7d1880 (most recent call first):
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 360 in backend_compile
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 297 in xla_primitive_callable
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/_src/util.py", line 179 in cached
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/_src/util.py", line 186 in wrapper
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 248 in apply_primitive
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 603 in process_primitive
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 264 in bind
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 382 in shift_right_logical
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/_src/random.py", line 75 in PRNGKey
    File "/public/software/.local/easybuild/software/ParallelFold/ParallelFold/alphafold/model/model.py", line 167 in predict
    File "/public/software/.local/easybuild/software/ParallelFold/ParallelFold/run_alphafold.py", line 210 in predict_structure
    File "/public/software/.local/easybuild/software/ParallelFold/ParallelFold/run_alphafold.py", line 429 in main
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258 in _run_main
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312 in run
    File "/public/software/.local/easybuild/software/ParallelFold/ParallelFold/run_alphafold.py", line 455 in <module>
    ./run_alphafold.sh: line 233: 7015 Aborted python $alphafold_script --fasta_paths=$fasta_path --model_names=$model_selection --data_dir=$data_dir --output_dir=$output_dir --jackhmmer_binary_path=$jackhmmer_binary_path --hhblits_binary_path=$hhblits_binary_path --hhsearch_binary_path=$hhsearch_binary_path --hmmsearch_binary_path=$hmmsearch_binary_path --hmmbuild_binary_path=$hmmbuild_binary_path --kalign_binary_path=$kalign_binary_path --uniref90_database_path=$uniref90_database_path --mgnify_database_path=$mgnify_database_path --bfd_database_path=$bfd_database_path --small_bfd_database_path=$small_bfd_database_path --uniclust30_database_path=$uniclust30_database_path --uniprot_database_path=$uniprot_database_path --pdb70_database_path=$pdb70_database_path --pdb_seqres_database_path=$pdb_seqres_database_path --template_mmcif_dir=$template_mmcif_dir --max_template_date=$max_template_date --obsolete_pdbs_path=$obsolete_pdbs_path --db_preset=$db_preset --model_preset=$model_preset --benchmark=$benchmark --amber_relaxation=$amber_relaxation --recycling=$recycling --run_feature=$run_feature --logtostderr

    opened by chenshixinnb 3
  • ValueError: jaxlib is version 0.1.69, but this version of jax requires version 0.1.74.

    I installed the conda environment following your steps. Running import jax; print(jax.devices()) inside the environment raises: ValueError: jaxlib is version 0.1.69, but this version of jax requires version 0.1.74. How can I solve this? Thanks!

    opened by chenshixinnb 3
  • Something went wrong when I ran the job

    Hi, dear author. I installed the required modules according to the linked requirements, but the following error occurred when I ran the script. Can you help me find out what is causing it? My installation steps are as follows:

    1. conda create --prefix=/data/home/zhoujy/run/alphafold2 python=3.8
    2. conda activate /data/home/zhoujy/run/alphafold2
    3. conda install cudatoolkit=10.1 cudnn
    4. pip install tensorflow==2.3.0
    5. pip install biopython==1.79 chex==0.0.7 dm-haiku==0.0.4 dm-tree==0.1.6 immutabledict==2.0.0 jax==0.2.14 ml-collections==0.1.0
    6. pip install --upgrade jax jaxlib==0.1.69+cuda101 -f https://storage.googleapis.com/jax-releases/jax_releases.html

    Then I ran the script:

    #!/bin/bash
    module load anaconda/2020.11
    source activate /data/home/zhoujy/run/alphafold2
    ./run_feature.sh -d /data/public/alphafold2 -o /data/home/zhoujy/run/output -m model_1 -f /data/home/zhoujy/run/input/Tb927.10.2950.fasta -t 2021-07-27

    The result is as follows:

    Traceback (most recent call last):
    File "/data/run01/zhoujy/ParallelFold-main/run_feature.py", line 33, in <module>
      from alphafold.model import data
    File "/data/run01/zhoujy/ParallelFold-main/alphafold/model/data.py", line 20, in <module>
      from alphafold.model import utils
    File "/data/run01/zhoujy/ParallelFold-main/alphafold/model/utils.py", line 21, in <module>
      import haiku as hk
    File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/haiku/__init__.py", line 17, in <module>
      from haiku import data_structures
    File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/haiku/data_structures.py", line 17, in <module>
      from haiku._src.data_structures import to_immutable_dict
    File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/haiku/_src/data_structures.py", line 30, in <module>
      from haiku._src import utils
    File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/haiku/_src/utils.py", line 24, in <module>
      import jax
    File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/__init__.py", line 16, in <module>
      from .api import (
    File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/api.py", line 38, in <module>
      from . import core
    File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/core.py", line 31, in <module>
      from . import dtypes
    File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/dtypes.py", line 31, in <module>
      from .lib import xla_client
    File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/lib/__init__.py", line 51, in <module>
      from jaxlib import pytree
    ImportError: cannot import name 'pytree' from 'jaxlib' (/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jaxlib/__init__.py)

    Why does this happen? I need your help.

    opened by zhoujingyu13687306871 2
  • Limit RAM usage

    I'm trying to run a FASTA file 3643 residues long. The MSA part finished, but the inference part tried to allocate 80 GB of VRAM on the GPU, which I don't have access to; the graphics cards are NVIDIA Tesla V100 16 GB. Now I'm trying to run inference on the CPU, which is a very slow process, and the job keeps using more and more RAM as time passes. Can I limit the RAM usage somehow? Or can I run inference on more graphics cards, maybe as a parallel process?

    opened by hrzolix 1
  • GPU utilization issue

    Hello Dr. Zhong! After several attempts yesterday, it now runs, but I noticed that when running the run_alphafold.sh script, the GPU-compute part spends a long stretch running on the CPU, with GPU utilization at 0 for a long time. I tried computing a protein with a sequence length of 2000 using four V100 cards, and it took 9 days. Is this speed normal? Also, for the earlier TensorFlow installation step, is it necessary to install the GPU version of TensorFlow?

    @Zuricho

    opened by zhoujingyu13687306871 1
  • Error after GPU part

    Hi, after installation the "CPU part" (jackhmmer and HHblits) works well. But when I start the GPU part, I get this error message: TypeError: take requires ndarray or scalar arguments, got <class 'list'> at position 0.

    1st part: ./run_feature.sh -d data -o ./tmp -m model_1,model_2,model_3,model_4,model_5 -f ./query/1crn.fasta -t 2021-07-27
    2nd part: ./run_alphafold.sh -d data -o ./tmp -m model_1,model_2,model_3,model_4,model_5 -f ./query/1crn.fasta -t 2021-07-27

    Full error message:

    File "/softwares/alphafold/run_alphafold.py", line 316, in <module>
      app.run(main)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
      _run_main(main, args)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
      sys.exit(main(argv))
    File "/softwares/alphafold/run_alphafold.py", line 289, in main
      predict_structure(
    File "/softwares/alphafold/run_alphafold.py", line 188, in predict_structure
      relaxed_pdb_str, _, _ = amber_relaxer.process(prot=unrelaxed_protein)
    File "/softwares/alphafold/alphafold/relax/relax.py", line 58, in process
      out = amber_minimize.run_pipeline(
    File "/softwares/alphafold/alphafold/relax/amber_minimize.py", line 482, in run_pipeline
      ret.update(get_violation_metrics(prot))
    File "/softwares/alphafold/alphafold/relax/amber_minimize.py", line 356, in get_violation_metrics
      structural_violations, struct_metrics = find_violations(prot)
    File "/softwares/alphafold/alphafold/relax/amber_minimize.py", line 338, in find_violations
      violations = folding.find_structural_violations(
    File "/softwares/alphafold/alphafold/model/folding.py", line 757, in find_structural_violations
      atom14_atom_radius = batch['atom14_atom_exists'] * utils.batched_gather(
    File "/softwares/alphafold/alphafold/model/utils.py", line 39, in batched_gather
      return take_fn(params, indices)
    File "/softwares/alphafold/alphafold/model/utils.py", line 36, in <lambda>
      take_fn = lambda p, i: jnp.take(p, i, axis=axis)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 5383, in take
      return _take(a, indices, None if axis is None else operator.index(axis), out,
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
      return fun(*args, **kwargs)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/api.py", line 411, in cache_miss
      out_flat = xla.xla_call(
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 1618, in bind
      return call_bind(self, fun, *args, **params)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 1609, in call_bind
      outs = primitive.process(top_trace, fun, tracers, params)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 1621, in process
      return trace.process_call(self, fun, tracers, params)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 615, in process_call
      return primitive.impl(f, *tracers, **params)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 622, in _xla_call_impl
      compiled_fun = _xla_callable(fun, device, backend, name, donated_invars,
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/linear_util.py", line 262, in memoized_fun
      ans = call(fun, *args)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 694, in _xla_callable
      return lower_xla_callable(fun, device, backend, name, donated_invars, *arg_specs).compile().unsafe_call
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 702, in lower_xla_callable
      jaxpr, out_avals, consts = pe.trace_to_jaxpr_final(
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 1522, in trace_to_jaxpr_final
      jaxpr, out_avals, consts = trace_to_subjaxpr_dynamic(fun, main, in_avals)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 1500, in trace_to_subjaxpr_dynamic
      ans = fun.call_wrapped(*in_tracers)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/linear_util.py", line 166, in call_wrapped
      ans = self.f(*args, **dict(self.params, **kwargs))
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 5390, in _take
      _check_arraylike("take", a)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 559, in _check_arraylike
      raise TypeError(msg.format(fun_name, type(arg), pos))
    jax._src.traceback_util.UnfilteredStackTrace: TypeError: take requires ndarray or scalar arguments, got <class 'list'> at position 0.

    The stack trace below excludes JAX-internal frames. The preceding is the original exception that occurred, unmodified.


    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
    File "/softwares/alphafold/run_alphafold.py", line 316, in <module>
      app.run(main)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
      _run_main(main, args)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
      sys.exit(main(argv))
    File "/softwares/alphafold/run_alphafold.py", line 289, in main
      predict_structure(
    File "/softwares/alphafold/run_alphafold.py", line 188, in predict_structure
      relaxed_pdb_str, _, _ = amber_relaxer.process(prot=unrelaxed_protein)
    File "/softwares/alphafold/alphafold/relax/relax.py", line 58, in process
      out = amber_minimize.run_pipeline(
    File "/softwares/alphafold/alphafold/relax/amber_minimize.py", line 482, in run_pipeline
      ret.update(get_violation_metrics(prot))
    File "/softwares/alphafold/alphafold/relax/amber_minimize.py", line 356, in get_violation_metrics
      structural_violations, struct_metrics = find_violations(prot)
    File "/softwares/alphafold/alphafold/relax/amber_minimize.py", line 338, in find_violations
      violations = folding.find_structural_violations(
    File "/softwares/alphafold/alphafold/model/folding.py", line 757, in find_structural_violations
      atom14_atom_radius = batch['atom14_atom_exists'] * utils.batched_gather(
    File "/softwares/alphafold/alphafold/model/utils.py", line 39, in batched_gather
      return take_fn(params, indices)
    File "/softwares/alphafold/alphafold/model/utils.py", line 36, in <lambda>
      take_fn = lambda p, i: jnp.take(p, i, axis=axis)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 5383, in take
      return _take(a, indices, None if axis is None else operator.index(axis), out,
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 5390, in _take
      _check_arraylike("take", a)
    File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 559, in _check_arraylike
      raise TypeError(msg.format(fun_name, type(arg), pos))
    TypeError: take requires ndarray or scalar arguments, got <class 'list'> at position 0.

    opened by ebettler 1
  • Running ParallelFold on a reduced database?

    Is it possible to run ParallelFold on reduced_dbs, or is that not yet supported? I tried using -c reduced_dbs but it did not work. Then I tried modifying the bfd_path set in run_alphafold.sh, but somehow it threw a directory/file-not-found error (I'm pretty sure the file is there, because I can run AlphaFold with it). Thank you in advance for your help!

    opened by xinyu-g 0
  • Did CPU acceleration fail?

    Yesterday I ran some experiments on a server with ./run_alphafold.sh -d /dataset/ -o result -p monomer -m model_2 -i input/T1061.fasta and read the log, which confused me; T1061 is 949 aa.

    I0822 07:33:00.806264 140553952322112 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmpxbrk9wt6/output.sE 0.0001 -E 0.0001 --cpu 8 -N 1 input/T1061.fasta /dataset//uniref90/uniref90.fasta"
    I0822 07:33:01.157015 140553952322112 utils.py:36] Started Jackhmmer (uniref90.fasta) query
    I0822 07:37:27.058227 140553952322112 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 265.901 seconds
    I0822 07:37:27.072012 140553952322112 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmpnn6am537/output.sE 0.0001 -E 0.0001 --cpu 8 -N 1 input/T1061.fasta /dataset//mgnify/mgy_clusters_2018_12.fa"
    I0822 07:37:27.439405 140553952322112 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
    I0822 07:42:42.192071 140553952322112 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 314.752 seconds
    I0822 07:42:42.364272 140553952322112 hhsearch.py:85] Launching subprocess "/opt/conda/bin/hhsearch -i /tmp/tmpog4q4684/query.a3m -o /tmp/tmpog40/pdb70"
    I0822 07:42:42.712445 140553952322112 utils.py:36] Started HHsearch query
    I0822 07:44:18.199999 140553952322112 utils.py:40] Finished HHsearch query in 95.487 seconds
    I0822 07:44:18.555797 140553952322112 hhblits.py:128] Launching subprocess "/opt/conda/bin/hhblits -i input/T1061.fasta -cpu 4 -oa3m /tmp/tmpz9oq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /dataset//bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_optst30_2018_08"
    I0822 07:44:19.050110 140553952322112 utils.py:36] Started HHblits query
    I0822 09:01:02.278290 140553952322112 utils.py:40] Finished HHblits query in 4603.228 seconds

    feature extraction spend time: 5305.185729026794
    feature extraction Completed succesfully

    I printed the feature-extraction time and found that 5305 s is almost equal to the sum of the individual database search times. But according to the article, I expected the feature-extraction time to be roughly equal to the HHblits search alone, so can you explain this confusing result?

    opened by yanchenmochen 3
  • failed to alloc 2147483648 bytes on host: CUDA_ERROR_OPERATING_SYSTEM: OS call failed or operation not supported on this OS

    When I use the code to compute T1050.fasta, which is composed of 700 residues, the command line outputs this error. The environment is an A100 GPU on Ubuntu, but I am using newer versions of jax and jaxlib; could that be the cause?

    (parafold) root@node33-a100:~# pip list | grep jax
    jax 0.3.15
    jaxlib 0.3.15+cuda11.cudnn82

    opened by yanchenmochen 3
  • Too many command-line arguments

    Hi,

    First of all, thanks for developing this tool, I'm looking forward to playing with it!

    I installed ParallelFold on an Ubuntu 18 machine, and the full AlphaFold database on an external drive.

    When running the command: $ ./run_alphafold.sh -d /media/qhr/"My Passport"/alphafold/AlphaFold_DB -o output -p monomer_ptm -i input/GA98.fasta -m model_1 -f

    I get the error: Too many command-line arguments.

    I also get the same error when calling run_alphafold.py directly: $ python3 run_alphafold.py --fasta_paths=input/GA98.fasta --model_preset=monomer --data_dir=/media/qhr/"My Passport"/alphafold/AlphaFold_DB --output_dir=output --uniref90_database_path=/media/qhr/"My Passport"/alphafold/AlphaFold_DB/uniref90 --mgnify_database_path=/media/qhr/"My Passport"/alphafold/AlphaFold_DB/mgnify --template_mmcif_dir=/media/qhr/"My Passport"/alphafold/AlphaFold_DB/pdb_mmcif --obsolete_pdbs_path=/media/qhr/"My Passport"/alphafold/AlphaFold_DB/pdb_mmcif/obsolete.dat --use_gpu_relax=True bfd_database_path=/media/qhr/"My Passport"/alphafold/AlphaFold_DB/bfd --max_template_date=2020-05-14

    Is it possible that the space in the name of the external drive "My Passport" is causing such error?

    Thanks! Ana

    opened by AnaValero 1
  • AlphaFold2 vs. ParaFold timings

    I have a fundamental question about the difference between the AlphaFold2 and ParaFold running procedures: how can I determine whether ParaFold runs the first step (the two Jackhmmer searches and the HHblits search) in parallel, unlike the sequential searches performed by AlphaFold2?

    Snippets of log files obtained from running Alphafold2 and Parafold

    Alphafold2 log:

    I0409 14:04:28.020900 139865793787712 run_alphafold.py:376] Have 5 models: ['model_1_pred_0', 'model_2_pred_0', 'model_3_pred_0', 'model_4_pred_0', 'model_5_pred_0']
    I0409 14:04:28.021180 139865793787712 run_alphafold.py:393] Using random seed 1420247507508611084 for the data pipeline
    I0409 14:04:28.021463 139865793787712 run_alphafold.py:161] Predicting seq1
    I0409 14:04:28.037414 139865793787712 jackhmmer.py:133] Launching subprocess "/.conda/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmpm1u84thu/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /fasta_files/seq1.fasta /alphafold_data//uniref90/uniref90.fasta"
    I0409 14:04:28.111756 139865793787712 utils.py:36] Started Jackhmmer (uniref90.fasta) query
    I0409 14:10:17.276236 139865793787712 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 349.164 seconds
    I0409 14:10:17.462168 139865793787712 jackhmmer.py:133] Launching subprocess "/.conda/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmpub1qi595/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /fasta_files/seq1.fasta /alphafold_data//mgnify/mgy_clusters_2018_12.fa"
    I0409 14:10:17.513182 139865793787712 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
    I0409 14:16:32.112656 139865793787712 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 374.599 seconds
    I0409 14:16:33.369129 139865793787712 hhsearch.py:85] Launching subprocess "/.conda/envs/alphafold/bin/hhsearch -i /tmp/tmpyot74k7r/query.a3m -o /tmp/tmpyot74k7r/output.hhr -maxseq 1000000 -d /alphafold_data//pdb70/pdb70"
    I0409 14:16:33.466009 139865793787712 utils.py:36] Started HHsearch query
    I0409 14:22:32.148045 139865793787712 utils.py:40] Finished HHsearch query in 358.682 seconds
    I0409 14:22:32.838686 139865793787712 hhblits.py:128] Launching subprocess "/.conda/envs/alphafold/bin/hhblits -i /fasta_files/seq1.fasta -cpu 4 -oa3m /tmp/tmpedyoxta1/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /alphafold_data//bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /alphafold_data//uniclust30/uniclust30_2018_08/uniclust30_2018_08"
    I0409 14:22:32.926801 139865793787712 utils.py:36] Started HHblits query
    I0409 18:56:30.223437 139865793787712 utils.py:40] Finished HHblits query in 16437.296 seconds
    

    Parafold log:

    I0427 21:17:27.915049 140305630689088 run_alphafold.py:397] Have 5 models: ['model_1_pred_0', 'model_2_pred_0', 'model_3_pred_0', 'model_4_pred_0', 'model_5_pred_0']
    I0427 21:17:27.915312 140305630689088 run_alphafold.py:414] Using random seed 1534697036303804749 for the data pipeline
    I0427 21:17:27.915629 140305630689088 run_alphafold.py:165] Predicting seq2
    I0427 21:17:27.925500 140305630689088 jackhmmer.py:133] Launching subprocess "/.conda/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmp5fo28348/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /fasta_files/seq2.fasta /alphafold_data//uniref90/uniref90.fasta"
    I0427 21:17:27.996705 140305630689088 utils.py:36] Started Jackhmmer (uniref90.fasta) query
    I0427 21:23:54.643056 140305630689088 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 386.646 seconds
    I0427 21:23:54.829476 140305630689088 jackhmmer.py:133] Launching subprocess "/.conda/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmprs3za6w_/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /fasta_files/seq2.fasta /alphafold_data//mgnify/mgy_clusters_2018_12.fa"
    I0427 21:23:54.875119 140305630689088 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
    I0427 21:31:38.409492 140305630689088 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 463.534 seconds
    I0427 21:31:39.768360 140305630689088 hhsearch.py:85] Launching subprocess "/.conda/envs/alphafold/bin/hhsearch -i /tmp/tmpjgr58ebb/query.a3m -o /tmp/tmpjgr58ebb/output.hhr -maxseq 1000000 -d /alphafold_data//pdb70/pdb70"
    I0427 21:31:39.850885 140305630689088 utils.py:36] Started HHsearch query
    I0427 21:39:23.420352 140305630689088 utils.py:40] Finished HHsearch query in 463.569 seconds
    I0427 21:39:24.173583 140305630689088 hhblits.py:128] Launching subprocess "/.conda/envs/alphafold/bin/hhblits -i /fasta_files/seq2.fasta -cpu 4 -oa3m /tmp/tmpmzl5arhr/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /alphafold_data//bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /alphafold_data//uniclust30/uniclust30_2018_08/uniclust30_2018_08"
    I0427 21:39:24.259592 140305630689088 utils.py:36] Started HHblits query
    I0428 01:34:31.302148 140305630689088 utils.py:40] Finished HHblits query in 14107.042 seconds
    

    They look similar to me, and both use 8 CPUs, 8 CPUs, and 4 CPUs for the three searches, respectively. Please clarify this for me.

    Thank you Aditi

    opened by adi1bioinfo 0
  • An error in feature generation

    Hi, when I used your new version to make feature.pkl, this error occurred; could you give any advice on how to solve it?

    FATAL Flags parsing error: Unknown command line flag 'model_names'. Did you mean: model_preset ? Pass --helpshort or --helpfull to see help on flags.

    opened by YiningWang2 1
Releases: v1.1