Annotates sequences with Eggnog-mapper and hhblits against PDB70

Last update: Apr 5, 2022

Related tags

Miscellaneous Annotating_from_PDB

Overview

Annotating "hypothetical" proteins with the PDB

See config/ for configuration information.

This workflow takes a set of protein sequences as input. It clusters them, functionally annotates the cluster representatives using the Eggnog DB, and selects the representatives that received no KO annotation to continue the process. These "hypothetical proteins" are then aligned with hhblits, first against Uniclust30 and then against PDB70.
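The selection step described above can be pictured with a minimal Python sketch. It assumes the annotation output has already been reduced to a mapping from representative ID to a KO string, with "-" or an empty string meaning that no KO was assigned. The function name, the placeholder values and the sample IDs are hypothetical and are not taken from the workflow itself.

```python
def select_hypothetical(ko_by_representative):
    """Return the IDs of cluster representatives that have no KO annotation."""
    missing = {"", "-"}  # assumed placeholders for "no KO assigned"
    return sorted(
        rep
        for rep, kos in ko_by_representative.items()
        if kos is None or kos.strip() in missing
    )


# Hypothetical example data, for illustration only
annotations = {
    "rep_1": "ko:K00001",
    "rep_2": "-",
    "rep_3": "",
    "rep_4": "ko:K02030,ko:K02031",
}

hypothetical = select_hypothetical(annotations)
print(hypothetical)
```

Only the representatives returned by such a filter would go on to the hhblits steps.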

Testing

Decompress selected_seqs_by_size.tar.gz and use that path in the config file (already set).

To print the commands that would be run (-p) without actually executing the workflow, use -n (dry run). -r prints the "reason" each rule is executed.

snakemake --cores 16 -r -p -n

--cores N specifies the maximum number of cores used by the whole workflow, so a rule that requests more cores will use no more than N.
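The effect of the cap on a single rule can be written as a one-line calculation. The sketch below is only an illustration of the arithmetic. The function name is hypothetical and is not part of Snakemake or of this workflow.

```python
def effective_threads(rule_threads, cores):
    """Threads a rule can actually use when the workflow is capped at `cores`."""
    return min(rule_threads, cores)


# A rule that asks for 32 threads under --cores 16 is scaled down to 16.
print(effective_threads(32, 16))
# A rule that asks for 4 threads keeps its 4 threads.
print(effective_threads(4, 16))
```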

Without -n, the workflow is executed.

Results

All results are placed inside /results. The file all_genes_kos.tsv lists every gene that has one or more KO terms assigned (the rule propagate_annotations propagates the annotations from the cluster representatives to their members). That file is then used to build a new table, compatible with ko_mapper.py, which will produce 3 files:

{prefix}_module_completeness.tab
{prefix}_heatmap.pdf
{prefix}_barplot.pdf
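The idea behind propagating annotations from cluster representatives to their members can be sketched as follows. This is not the code of the propagate_annotations rule. The data layout (one dict mapping each representative to its members, another mapping representatives to KO terms), the function name and the sample IDs are assumptions made for illustration.

```python
def propagate_kos(clusters, rep_kos):
    """Give every cluster member the KO terms of its representative.

    clusters: dict mapping representative ID -> list of member gene IDs
    rep_kos:  dict mapping representative ID -> list of KO terms
    Returns (gene, comma-joined KOs) rows for annotated clusters only.
    """
    rows = []
    for rep, members in clusters.items():
        kos = rep_kos.get(rep)
        if not kos:
            continue  # representative has no KO, so its members get none
        for gene in members:
            rows.append((gene, ",".join(kos)))
    return rows


# Hypothetical example data, for illustration only
clusters = {
    "rep_1": ["rep_1", "gene_a", "gene_b"],
    "rep_2": ["rep_2", "gene_c"],
}
rep_kos = {"rep_1": ["K00001", "K00002"]}

for gene, kos in propagate_kos(clusters, rep_kos):
    print(f"{gene}\t{kos}")
```

Genes in clusters whose representative has no KO are left out, which matches the description of all_genes_kos.tsv as a list of genes with one or more KO terms.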

Others

The file rules.pdf shows the DAG of this workflow, but it does not include the rules related to hhblits, because those rules depend on a checkpoint rule. This is not a bug; it is a consequence of how Snakemake works: the rules downstream of a checkpoint are only determined once the checkpoint has run.
