Annotates sequences with Eggnog-mapper and hhblits against PDB70

Overview

Annotating "hypothetical" proteins with the PDB

See config/ for configuration information.

This workflow takes as input a set of protein sequences. It clusters them and then functionally annotates the clusters' representatives using Eggnog DB, and picks those without KO annotations to continue the process. These "hypothetical proteins" get aligned by hhblits against Uniclust30 and then against PDB70.

Testing

Decompress selected_seqs_by_size.tar.gz and use that path in the config file (already set).

To see the commands being executed (-p) without an actual execution of the workflow, use -n. -r prints the "reason" for execution of each rule.

snakemake  --cores 16 -r -p -n

--cores N specify the max. number of cores used by the whole workflow, so if a rule has set more cores, it will use no more than N.

Without the -n the workflow will be executed.

Results

All the results will be placed inside /results. The file all_genes_kos.tsv presents a list of all the genes which have one or more KO terms assigned (the rule propagate_annotations propagates the annotations from the cluster representatives to their members). That file then is used to build a new table, compatible with ko_mapper.py, which will produce 3 files:

  • {prefix}_module_completeness.tab
  • {prefix}_heatmap.pdf
  • {prefix}_barplot.pdf

Others

The rules.pdf represents the DAG of this workflow, but it doesn't include the rules related to hhblits, because those rules depend on a checkpoint rule. This isn't a bug, but it's related to how snakemake works.

You might also like...
Built with Python programming language and QT library and Guess the number in three easy, medium and hard rolls
Built with Python programming language and QT library and Guess the number in three easy, medium and hard rolls

guess-the-numbers Built with Python programming language and QT library and Guess the number in three easy, medium and hard rolls Number guessing game

Built with Python programming language and QT library and Guess the number in three easy, medium and hard rolls
Built with Python programming language and QT library and Guess the number in three easy, medium and hard rolls

password-generator Built with Python programming language and QT library and Guess the number in three easy, medium and hard rolls Password generator

A simple script written using symbolic python that takes as input a desired metric and automatically calculates and outputs the Christoffel Pseudo-Tensor, Riemann Curvature Tensor, Ricci Tensor, Scalar Curvature and the Kretschmann Scalar

A simple script written using symbolic python that takes as input a desired metric and automatically calculates and outputs the Christoffel Pseudo-Tensor, Riemann Curvature Tensor, Ricci Tensor, Scalar Curvature and the Kretschmann Scalar

🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.

Boltons boltons should be builtins. Boltons is a set of over 230 BSD-licensed, pure-Python utilities in the same spirit as — and yet conspicuously mis

A free and open-source chess improvement app that combines the power of Lichess and Anki.
A free and open-source chess improvement app that combines the power of Lichess and Anki.

A free and open-source chess improvement app that combines the power of Lichess and Anki. Chessli Project Activity & Issue Tracking PyPI Build & Healt

HPomb Is Socail Engineering Tool , Used For Bombing , Spoofing and Anonymity Available For Linux And Android(Termux)
HPomb Is Socail Engineering Tool , Used For Bombing , Spoofing and Anonymity Available For Linux And Android(Termux)

HPomb v2020.02 Coming Soon Created By Secanonm HPomb Is Socail Engineering Tool , Used For Bombing , Spoofing and Anonymity Available For Linux And An

TickerRain is an open-source web app that stores and analysis Reddit posts in a transparent and semi-interactive manner.
TickerRain is an open-source web app that stores and analysis Reddit posts in a transparent and semi-interactive manner.

TickerRain is an open-source web app that stores and analysis Reddit posts in a transparent and semi-interactive manner

Code and yara rules to detect and analyze Cobalt Strike

Cobalt Strike Resources This repository contains: analyze.py: a script to analyze a Cobalt Strike beacon (python analyze.py BEACON) extract.py; extrac

Hook and simulate global keyboard events on Windows and Linux.

keyboard Take full control of your keyboard with this small Python library. Hook global events, register hotkeys, simulate key presses and much more.

Owner
null
BloodCheck enables Red and Blue Teams to manage multiple Neo4j databases and run Cypher queries against a BloodHound dataset.

BloodCheck BloodCheck enables Red and Blue Teams to manage multiple Neo4j databases and run Cypher queries against a BloodHound dataset. Installation

Mr B0b 16 Nov 5, 2021
Ingestinator is my personal VFX pipeline tool for ingesting folders containing frame sequences that have been pulled and downloaded to a local folder

Ingestinator Ingestinator is my personal VFX pipeline tool for ingesting folders containing frame sequences that have been pulled and downloaded to a

Henry Wilkinson 2 Nov 18, 2022
Dot Browser is a privacy-conscious web browser with smarts built-in for protection against trackers and advertisments online.

?? Take back your privacy with Dot Browser, the privacy-conscious web browser that protects you from being tracked and monitored online.

Dot HQ 1k Jan 7, 2023
Arcpy Tool developed for ArcMap 10.x that checks DVOF points against TDS data and creates an output feature class as well as a check database.

DVOF_check_tool Arcpy Tool developed for ArcMap 10.x that checks DVOF points against TDS data and creates an output feature class as well as a check d

null 3 Apr 18, 2022
Explore related sequences in the OEIS

OEIS explorer This is a tool for exploring two different kinds of relationships between sequences in the OEIS: mentions (links) of other sequences on

Alex Hall 6 Mar 15, 2022
Linux GUI app to codon optimize many single-fasta files with coding sequences , using many taxonomy ids

codon_optimize_cds_with_many_taxids_singlefasta Linux GUI app to codon optimize many single-fasta files with coding sequences, using many taxonomy ids

Olga Tsiouri 1 Jan 23, 2022
Laurence Billingham 1 Feb 16, 2022
Check COVID locations of interest against Google location history

Location of Interest Checker Script to compare COVID locations of interest to Google location history. The script produces a map plot (as shown below)

null 9 Mar 30, 2022
A calculator to test numbers against the collatz conjecture

The Collatz Calculator This is an algorithm custom built by Kyle Dickey, used to test numbers against the simple rules of the Collatz Conjecture.

Kyle Dickey 2 Jun 14, 2022
An awesome list of AI for art and design - resources, and popular datasets and how we may apply computer vision tasks to art and design.

Awesome AI for Art & Design An awesome list of AI for art and design - resources, and popular datasets and how we may apply computer vision tasks to a

Margaret Maynard-Reid 20 Dec 21, 2022