polyprotein_stats
A small tool for testing and visualizing protein embeddings and amino acid proportions.
Currently deployed on streamlit.io.
Given a set of proteins, this tool seeks to answer these questions:
- Is there an enrichment of one or two amino acids in the input proteins?
- Can amino acid proportions, plus sequence length, discriminate the input proteins from the rest of the proteome?
- Can the per-residue embeddings from a deep neural network, averaged over each protein, discriminate the input proteins from the rest of the proteome?
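As a rough illustration of the first two questions, the sketch below computes pooled amino acid proportions for a set of sequences, a per-protein feature vector (the 20 proportions plus sequence length), and a simple fold enrichment of the input set over a background set. It is a minimal Python sketch under stated assumptions, not the tool's own code: the function names are hypothetical, only the 20 standard residues are counted, and a tiny pseudocount keeps residues absent from the background from causing a division by zero.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes); other symbols are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_proportions(sequences):
    """Pooled proportion of each standard amino acid across a set of sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(res for res in seq.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def composition_features(seq):
    """Hypothetical per-protein features: 20 proportions plus sequence length."""
    props = aa_proportions([seq])
    # len(seq) counts every character, including any non-standard residues
    return [props[aa] for aa in AMINO_ACIDS] + [len(seq)]


def enrichment(input_seqs, background_seqs, pseudo=1e-9):
    """Fold enrichment of each amino acid in the input set relative to the background."""
    fg = aa_proportions(input_seqs)
    bg = aa_proportions(background_seqs)
    # Residues absent from both sets come out at 1.0 (no enrichment)
    return {aa: (fg[aa] + pseudo) / (bg[aa] + pseudo) for aa in AMINO_ACIDS}
```

Fold enrichment is only descriptive. Deciding whether an enrichment is significant would need a statistical test against the background proportions (for example a binomial or chi-square test), which this sketch does not perform.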
The embeddings for the human proteins are from the ProtT5 embedder at full precision (prottrans_t5_xl_u50 model) and were generated by Christian Dallago and Burkhard Rost (Zenodo). See also 'ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Learning' by Elnaggar et al. (bioRxiv preprint).
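The third question relies on turning a variable-length matrix of per-residue embeddings (L residues by D dimensions, with D = 1024 for ProtT5-XL) into one fixed-length vector per protein by averaging over residues. The sketch below shows that mean pooling, plus a nearest-centroid classifier using cosine similarity as one simple, hypothetical way to test discrimination; the tool itself may use a different classifier. It assumes the embeddings are already loaded as plain Python lists of floats, and all function names are illustrative.

```python
import math


def mean_pool(per_residue):
    """Average an L x D list of per-residue vectors into one D-dimensional vector."""
    if not per_residue:
        raise ValueError("need at least one residue vector")
    n = len(per_residue)
    # zip(*rows) iterates column-wise, i.e. over the D embedding dimensions
    return [sum(column) / n for column in zip(*per_residue)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def fit_centroids(input_vecs, background_vecs):
    """Class centroids: the mean per-protein vector of each set."""
    return mean_pool(input_vecs), mean_pool(background_vecs)


def predict_is_input(vec, input_centroid, background_centroid):
    """Hypothetical rule: call a protein 'input' if it is closer to that centroid."""
    return cosine(vec, input_centroid) >= cosine(vec, background_centroid)


def accuracy(input_vecs, background_vecs, input_centroid, background_centroid):
    """Fraction of held-out proteins assigned to their true set."""
    correct = sum(predict_is_input(v, input_centroid, background_centroid)
                  for v in input_vecs)
    correct += sum(not predict_is_input(v, input_centroid, background_centroid)
                   for v in background_vecs)
    return correct / (len(input_vecs) + len(background_vecs))
```

Accuracy should be measured on proteins held out from `fit_centroids` (for example with cross-validation); scoring the same proteins used to build the centroids would overstate how well the embeddings discriminate the input set.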