SARS-CoV-2 Recombinant Finder for FASTA sequences

Overview

Sc2rf - SARS-CoV-2 Recombinant Finder

Pronounced: Scarf

What's this?

Sc2rf can search genome sequences of SARS-CoV-2 for potential recombinants - new virus lineages that have (partial) genes from more than one parent lineage.

Is it already usable?

This is a very young project, started on March 5th, 2022. As such, proceed with care. Results may be wrong or misleading, and with every update, anything can still change a lot.

Anyway, I'm happy that scientists are already seeing benefits from Sc2rf and using it to prepare lineage proposals for cov-lineages/pango-designation.

Though I already have a lot of ideas and plans for Sc2rf (see the bottom of this document), I'm very open to suggestions and feature requests. Please open an issue, start a discussion, or get in touch via email or Twitter!

Example output

Screenshot of the terminal output of Sc2rf

Requirements and Installation

You need at least Python 3.6, and you need to install the requirements first, for example with python3 -m pip install -r requirements.txt. There is also a setup.py, which you should probably ignore for now: it is a work in progress and does not yet work as intended.

Also, you need a terminal that supports ANSI control sequences to display colored text. On Linux, macOS, etc., this should usually work.

On Windows, color support is tricky. On a recent version of Windows 10 it should work; if it doesn't, install Windows Terminal from GitHub or the Microsoft Store and run Sc2rf from there.

Basic Usage

Start with a .fasta file containing one or more sequences that might be recombinants. Your sequences have to be aligned to reference.fasta. If they are not, you will get an error message like:

Sequence hCoV-19/Phantasialand/EFWEFWD not properly aligned, length is 29718 instead of 29903.
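Since the check behind that message is a simple length comparison, sequences can be screened before running Sc2rf. A minimal sketch, assuming the 29903-nucleotide reference length from the message above; read_fasta_records and check_alignment are hypothetical helper names, not part of Sc2rf:

```python
from typing import Dict, Iterable, List, Tuple

REFERENCE_LENGTH = 29903  # reference length quoted in the error message


def read_fasta_records(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {name: sequence}, joining wrapped sequence lines."""
    records: Dict[str, str] = {}
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].strip()
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def check_alignment(records: Dict[str, str],
                    expected: int = REFERENCE_LENGTH) -> List[Tuple[str, int]]:
    """Return (name, length) for every sequence whose length differs from the reference."""
    return [(name, len(seq)) for name, seq in records.items() if len(seq) != expected]
```

For example, opening your input file and passing the handle to read_fasta_records, then the result to check_alignment, lists the sequences that would trigger the error. A matching length is necessary but not sufficient: it does not prove that a sequence was aligned correctly.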

(For historical reasons, I have always used Nextclade to get aligned sequences, but you can also use Nextalign or any other tool. Installing them is easy on Linux or macOS, but not on Windows. You can also use a web-based tool like MAFFT.)

Then call:

sc2rf.py <your_filename.fasta>

If you just need some FASTA files for testing, you can search the pango-designation proposals for recombinant issues that include FASTA files, or take some files from my shared-sequences repository. Those might not contain any actual recombinants, but they do contain hundreds of sequences that look as if they were!

No output / some sequences not shown

By default, a lot of filters are active so that only likely recombinants are shown; this way you can input tens of thousands of sequences and get output only for the interesting ones. If you want, you can disable all filters like this, which is only recommended for small input files with fewer than 100 sequences:

sc2rf.py --parents 1-35 --breakpoints 0-100 \
--unique 1 --max-ambiguous 10000 <your_filename.fasta>

or even

sc2rf.py --parents 1-35 --breakpoints 0-100 \
--unique 1 --max-ambiguous 10000 --force-all-parents \
--clades all <your_filename.fasta>

The meaning of these parameters is described below.

Advanced Usage

You can execute sc2rf.py -h to get exactly this help message:

usage: sc2rf.py [-h] [--primers [PRIMER ...]]
                [--primer-intervals [INTERVAL ...]]
                [--parents INTERVAL] [--breakpoints INTERVAL]
                [--clades [CLADES ...]] [--unique NUM]
                [--max-intermission-length NUM]
                [--max-intermission-count NUM]
                [--max-name-length NUM] [--max-ambiguous NUM]
                [--force-all-parents]
                [--select-sequences INTERVAL]
                [--enable-deletions] [--show-private-mutations]
                [--rebuild-examples] [--mutation-threshold NUM]
                [--add-spaces [NUM]] [--sort-by-id [NUM]]
                [--verbose] [--ansi] [--hide-progress]
                [--csvfile CSVFILE]
                [input ...]

Analyse SARS-CoV-2 sequences for potential, unknown recombinant
variants.

positional arguments:
  input                 input sequence(s) to test, as aligned
                        .fasta file(s) (default: None)

optional arguments:
  -h, --help            show this help message and exit

  --primers [PRIMER ...]
                        Filenames of primer set(s) to visualize.
                        The .bed formats for ARTIC and EasySeq
                        are recognized and supported. (default:
                        None)

  --primer-intervals [INTERVAL ...]
                        Coordinate intervals in which to
                        visualize primers. (default: None)

  --parents INTERVAL, -p INTERVAL
                        Allowed number of potential parents of a
                        recombinant. (default: 2-4)

  --breakpoints INTERVAL, -b INTERVAL
                        Allowed number of breakpoints in a
                        recombinant. (default: 1-4)

  --clades [CLADES ...], -c [CLADES ...]
                        List of variants which are considered as
                        potential parents. Use Nextstrain clades
                        (like "21B"), or Pango Lineages (like
                        "B.1.617.1") or both. Also accepts "all".
                        (default: ['20I', '20H', '20J', '21I',
                        '21J', 'BA.1', 'BA.2', 'BA.3'])

  --unique NUM, -u NUM  Minimum of substitutions in a sample
                        which are unique to a potential parent
                        clade, so that the clade will be
                        considered. (default: 2)

  --max-intermission-length NUM, -l NUM
                        The maximum length of an intermission in
                        consecutive substitutions. Intermissions
                        are stretches to be ignored when counting
                        breakpoints. (default: 2)

  --max-intermission-count NUM, -i NUM
                        The maximum number of intermissions which
                        will be ignored. Surplus intermissions
                        count towards the number of breakpoints.
                        (default: 8)

  --max-name-length NUM, -n NUM
                        Only show up to NUM characters of sample
                        names. (default: 30)

  --max-ambiguous NUM, -a NUM
                        Maximum number of ambiguous nucs in a
                        sample before it gets ignored. (default:
                        50)

  --force-all-parents, -f
                        Force to consider all clades as potential
                        parents for all sequences. Only useful
                        for debugging.

  --select-sequences INTERVAL, -s INTERVAL
                        Use only a specific range of input
                        sequences. DOES NOT YET WORK WITH
                        MULTIPLE INPUT FILES. (default: 0-999999)

  --enable-deletions, -d
                        Include deletions in lineage comparision.

  --show-private-mutations
                        Display mutations which are not in any of
                        the potential parental clades.

  --rebuild-examples, -r
                        Rebuild the mutations in examples by
                        querying cov-spectrum.org.

  --mutation-threshold NUM, -t NUM
                        Consider mutations with a prevalence of
                        at least NUM as mandatory for a clade
                        (range 0.05 - 1.0, default: 0.75).

  --add-spaces [NUM]    Add spaces between every N colums, which
                        makes it easier to keep your eye at a
                        fixed place. (default without flag: 0,
                        default with flag: 5)

  --sort-by-id [NUM]    Sort the input sequences by the ID. If
                        you provide NUM, only the first NUM
                        characters are considered. Useful if this
                        correlates with meaning full meta
                        information, e.g. the sequencing lab.
                        (default without flag: 0, default with
                        flag: 999)

  --verbose, -v         Print some more information, mostly
                        useful for debugging.

  --ansi                Use only ASCII characters to be
                        compatible with ansilove.

  --hide-progress       Don't show progress bars during long
                        task.

  --csvfile CSVFILE     Path to write results in CSV format.
                        (default: None)

An Interval can be a single number ("3"), a closed interval
("2-5" ) or an open one ("4-" or "-7"). The limits are inclusive.
Only positive numbers are supported.
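The interval syntax above can be handled by a small parser. This is a hypothetical sketch (parse_interval and in_interval are not Sc2rf's own functions), assuming that an omitted limit means the interval is unbounded on that side:

```python
from typing import Optional, Tuple

Interval = Tuple[Optional[int], Optional[int]]  # None means "no limit"


def parse_interval(text: str) -> Interval:
    """Parse "3", "2-5", "4-" or "-7" into inclusive (low, high) limits."""
    text = text.strip()
    if "-" not in text:
        value = int(text)  # a single number is the interval [value, value]
        return value, value
    low_text, high_text = text.split("-", 1)
    low = int(low_text) if low_text.strip() else None
    high = int(high_text) if high_text.strip() else None
    if low is not None and high is not None and low > high:
        raise ValueError("empty interval: %r" % text)
    return low, high


def in_interval(value: int, interval: Interval) -> bool:
    """Check membership; both limits are inclusive."""
    low, high = interval
    return (low is None or value >= low) and (high is None or value <= high)
```

For example, --parents 2-4 would correspond to in_interval(n, parse_interval("2-4")), which accepts 2, 3 and 4 parents. Because "-" doubles as the range separator, negative numbers cannot be expressed, which matches the note that only positive numbers are supported.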

Interpreting the output

To be written...

There already is a short Twitter thread which explains the basics.

Source material attribution

  • virus_properties.json contains data from LAPIS / cov-spectrum which uses data from NCBI GenBank, prepared and hosted by Nextstrain, see blog post.
  • reference.fasta is taken from Nextstrain's nextclade_data, see NCBI for attribution.
  • mapping.csv is a modified version of the table on the covariants homepage by Nextstrain.
  • Example output / screenshot based on sequences published by the German Robert-Koch-Institut.
  • Primers:
    • ARTIC primers CC-BY-4.0 by the ARTICnetwork project
    • EasySeq primers by Coolen, J. P., Wolters, F., Tostmann, A., van Groningen, L. F., Bleeker-Rovers, C. P., Tan, E. C., ... & Melchers, W. J. Removed until I understand the format of the .bed file. There will be an issue soon.
    • midnight primers CC-BY-4.0 by Silander, Olin K, Massey University

The initial version of this program was written in cooperation with @flauschzelle.

TODO / IDEAS / PLANS

  • Move these TODOs into actual issues
  • add disclaimer and link to pango-designation
  • provide a sample file (maybe both .fasta and .csv, as long as the csv step is still needed)
  • accept aligned fasta
    • as input file
    • as piped stream
  • If we still accept csv/ssv input, autodetect the delimiter either by file name or by analysing the first line
  • find a way to handle already designated recombinant lineages
  • Output structured results
    • csv
    • html?
    • fasta of all sequences that match the criteria, which enables efficient multi-pass strategies
  • filter sequences
    • by ID
    • by metadata
  • take metadata csv
  • document the output in README
  • check / fix --enable-deletions
  • adjustable threshold for mutation prevalence
  • new color mode (with background color and monochrome text on top)
  • new bar mode (with colored lines beneath each sequence, one for each example sequence, and "intermissions" shown in the color of the "surrounding" lineage, but not as bright)
  • interactive mode, for filtering, reordering, etc.
  • sort sequences within each block
  • re-think this whole "intermission" concept
  • select a single sequence and let the tool refine the choice of parental sequences, not just focusing on commonly known lineages (going up and down in the tree)
  • use more common terms to describe things (needs feedback from people with actual experience in the field)
Comments
  • ENH: provide output optionally as csv/tsv for automated analysis/sharing

    Right now the output is good for interactive human analysis, but there's a lack of csv/tsv machine readable output for sharing/further analysis.

    From my experience with Nextclade, the main difficulty here is designing the file specification: which columns to include, which separator to use if you need an intra-column separator, etc.

    It's probably best to discuss this in the issue before implementing anything, as one will more or less get locked into the format.
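One possible layout, sketched under assumptions: one row per sample, with list-valued fields (parents, breakpoints) joined by an intra-column separator that differs from the column delimiter. The column names, the start:end breakpoint notation and the write_results helper are hypothetical; they are not the format that the --csvfile option actually writes.

```python
import csv
from typing import Dict, List, TextIO


def write_results(rows: List[Dict], out: TextIO, list_sep: str = ";") -> None:
    """Write one row per sample; list-valued fields are joined with list_sep."""
    writer = csv.writer(out)  # columns separated by the default ","
    writer.writerow(["sample", "parents", "breakpoints"])
    for row in rows:
        # each breakpoint is assumed to be a (start, end) pair of genome positions
        breakpoints = list_sep.join(
            "%d:%d" % (start, end) for start, end in row["breakpoints"]
        )
        writer.writerow([row["sample"], list_sep.join(row["parents"]), breakpoints])
```

Using ";" inside a comma-delimited file means consumers only need a plain split(";") on those fields, without relying on CSV quoting rules.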

    opened by corneliusroemer 28
  • Find and use better source for typical mutations of lineages

    See this comment by @AngieHinrichs which even contains an alternative.

    Thanks a lot for your detailed explanation! I'm trying to move this over here so it's easier to find for me.

    (Also, if the comment thread over at pango-designation gets locked after too many "off-topic" comments, I won't be able to comment there at all. That has already happened in other issues.)

    opened by lenaschimmel 14
  • BUG: Problem using covSpectrum mutation share - Ns are treated as reference

    There's a bit of a problem with covSpectrum's current mutation API implementation: Ns in any sample are treated as reference.

    This can cause confusion. For example, I thought that this intermission within Spike was a bad sign (see https://github.com/cov-lineages/pango-designation/issues/498).

    But it isn't! Both 22813 and 22882 are defining for both BA.1 and BA.2. However, both are apparently N in 40% of BA.1 sequences. This causes sc2rf to conclude that they are not defining mutations of BA.1, which makes spurious intermissions appear.

    I'm not sure how to work around this best. Really, this should be fixed in covSpectrum: Ns should be left out of mutation proportion calculations - and not be treated as reference (implicitly).

    @chaoran-chen can you think of a workaround? How can one get the share of Ns for a query? Could that maybe be supplied by a new API endpoint?

    Usually, Ns don't make up 40% of a site, but sometimes they do and that can cause problems like here, where one falsely thinks there's a non-clean breakpoint.
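The effect can be illustrated with a small calculation. The sketch below contrasts a mutation share computed with Ns counted as reference against one that leaves Ns out of the denominator. The counts are made up for illustration, the function names are hypothetical, and 0.75 is the default of --mutation-threshold.

```python
def share_ns_as_reference(alt: int, ref: int, n: int) -> float:
    """Share of the mutation if Ns are (implicitly) counted as reference."""
    return alt / (alt + ref + n)


def share_excluding_ns(alt: int, ref: int, n: int) -> float:
    """Share among sequences that actually cover the site (n is ignored)."""
    covered = alt + ref
    return alt / covered if covered else 0.0


THRESHOLD = 0.75  # default of --mutation-threshold

# Illustrative counts: every covered sequence carries the mutation, but 40% are N.
alt, ref, n = 60, 0, 40
print(share_ns_as_reference(alt, ref, n) >= THRESHOLD)  # False: 0.6 is below the cutoff
print(share_excluding_ns(alt, ref, n) >= THRESHOLD)     # True: 1.0
```

With Ns counted as reference, a mutation carried by every covered sequence drops below the cutoff and is no longer treated as defining, which is exactly the kind of spurious intermission described above.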

    opened by corneliusroemer 10
  • Way to pipe results to png, txt files

    This is a fantastic tool, and I've already put it to good use in Arkansas to research some strange lineages. Great work!

    I do have to share the visuals, and I am wondering whether there is a way to pipe the results to an external file, such as a PNG or TXT file. I am more of an applied researcher, so if I missed something, I would appreciate any pointers.

    Again, great tool already!

    Thanks,

    opened by bdelavan 7
  • Q: Why show all donors not just the relevant ones?

    I'm analyzing one sequence and am wondering why you output all potential donors/parents, not just the two that seem most relevant here: BA.1/21J?

    Are my arguments wrong? When I reduce parents to 0-5, I get no output, which is weird. I don't quite understand what's going on here.

    opened by corneliusroemer 6
  • Crash related to tqdm

    Originally posted by @Vjimenez-vasquez in https://github.com/lenaschimmel/sc2rf/issues/25#issuecomment-1089053922:

    Hi there,

    I ran the following command :

    python3 sc2rf.py test2.fasta --unique 1
    

    And got the following message :

    Traceback (most recent call last):
      File "sc2rf.py", line 987, in <module>
        main()
      File "sc2rf.py", line 132, in main
        reference = read_fasta('reference.fasta', None)['MN908947 (Wuhan-Hu-1/2019)']
      File "sc2rf.py", line 476, in read_fasta
        with my_tqdm(total=os.stat(path).st_size, desc="Read " + path, unit_scale=True) as pbar:
      File "sc2rf.py", line 199, in my_tqdm
        return tqdm(*margs, delay=0.1, colour="green", disable=bool(args.hide_progress), **kwargs)
      File "/home/hp/anaconda3/lib/python3.7/site-packages/tqdm/std.py", line 922, in __init__
        TqdmKeyError("Unknown argument(s): " + str(kwargs)))
    tqdm.std.TqdmKeyError: "Unknown argument(s): {'delay': 0.1, 'colour': 'green'}"
    

    Do you have any suggestions?
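The TqdmKeyError shows that the installed tqdm does not accept the delay and colour keyword arguments; upgrading tqdm (for example with python3 -m pip install --upgrade tqdm) is the straightforward fix. As a defensive alternative, a wrapper could drop keyword arguments that the installed version does not declare. The sketch below is hypothetical (supported_kwargs is not part of Sc2rf or tqdm) and uses a stand-in function instead of tqdm so that it runs without it; with tqdm installed one would pass tqdm.__init__ instead.

```python
import inspect
from typing import Any, Callable, Dict


def supported_kwargs(func: Callable, kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the keyword arguments that func declares by name.

    A catch-all **kwargs parameter is deliberately not treated as "accepts
    everything", because tqdm validates extra keywords itself and raises
    TqdmKeyError for unknown ones.
    """
    params = inspect.signature(func).parameters
    named = {
        name for name, p in params.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    }
    return {k: v for k, v in kwargs.items() if k in named}


def old_progress_bar(iterable=None, total=None, desc=None, disable=False, **extra):
    """Stand-in for an older tqdm that knows neither delay nor colour."""
    return None


options = {"total": 100, "desc": "Read reference.fasta", "delay": 0.1, "colour": "green"}
print(supported_kwargs(old_progress_bar, options))
# {'total': 100, 'desc': 'Read reference.fasta'}
```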

    opened by lenaschimmel 4
  • Make tool pip-installable

    It shouldn't be difficult: you need a setup.py and a PyPI account.

    You can have a look at this repo of mine that can be installed via Pypi as a command line tool (if you install it, the command becomes automatically available in Path!) https://github.com/corneliusroemer/fasta_zstd_sqlite/blob/master/setup.py

    opened by corneliusroemer 4
  • --csvfile option does not work

    Hey!

    First of all, great tool for finding potential recombinants. It made my life easier. I only need the potential recombinant sequences and their breakpoints from the sc2rf output. I see the --csvfile option in the README, but it seems not to be included in the sc2rf Python executable. I get this error:

    sc2rf.py: error: unrecognized arguments: --csvfile output.csv

    Any idea how I could get the output in the way I need?

    opened by think-o 2
  • ENH: show progress bar, say how many files were read in, how processing is going

    It would be nice to see how things are going.

    tqdm makes this very easy in Python.

    A bit more logging while the analysis is running would be cool too, just so that I know what's going on instead of seeing nothing for a minute.

    opened by corneliusroemer 2
  • Python version requirement 3.9

    Thanks for the tool.

    Just a quick note: I think Python 3.9 is required due to the | operator on dicts.

    I was getting an error before I tried it with 3.9.

    opened by benkraj 2
  • TypeError: unsupported operand type(s) for |: 'dict' and 'dict'

    Getting this error while trying to run the program:

    Reading reference genome, lineage definitions... Done.
    Reading actual input.
    Traceback (most recent call last):
      File "search_recombinants.py", line 539, in <module>
        main()
      File "search_recombinants.py", line 96, in main
        all_samples = all_samples | read_samples
    TypeError: unsupported operand type(s) for |: 'dict' and 'dict'
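The | operator for dicts was introduced in Python 3.9 (PEP 584), which explains this error on older interpreters. If older versions need to be supported, the same merge can be written with dict unpacking or dict.update. The variable names mirror the traceback, but the snippet is a standalone sketch with placeholder values:

```python
all_samples = {"sample_a": "ACGT"}
read_samples = {"sample_b": "ACGA"}

# Python 3.9+ only (PEP 584): merged = all_samples | read_samples

# Python 3.5+: dict unpacking; on duplicate keys the right-hand dict wins, as with |
merged = {**all_samples, **read_samples}

# In-place alternative, equivalent to all_samples |= read_samples
all_samples.update(read_samples)
```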

    opened by arodzh-sudo 1
  • Bug/question with --force-all-parents --clades all

    Hi there,

    I was just wondering why I get no output, so I tried the second example from here: https://github.com/lenaschimmel/sc2rf#no-output--some-sequences-not-shown

    So I added --clades all --force-all-parents to my call, but it seems that they can't be used together:

    The number of allowed parents, the number of selected clades, and the --force-all-parents conflict so that the results must be empty.

    Also, --clades all can't be used as the last argument (before the input), because then the input won't be recognized:

    Input sequences must be provided, except when rebuilding the examples. Use --help for more info. Program exits.

    I'm not sure if this is only my setup/input problem.


    Would you suggest using -c all or -f? My full command is below (the --clades all --force-all-parents line is the part I added):

      python3 sc2rf.py --csvfile ../${name}_sc2rf.csv --parents 1-35 --breakpoints 1-2 \
                          --max-intermission-count 3 --max-intermission-length 1 \
                          --unique 1 --max-ambiguous 10000 --max-name-length 55 \
                          --clades all --force-all-parents \
                          ../${fasta}

    Best,
    Marie

    opened by MarieLataretu 3
  • Bridging the gap between sc2rf result and Pangolin X* lineages

    First, thanks to the authors for bringing the useful tool for us.

    We have been using sc2rf to scan for recombinant sequences and determine breakpoints, but I found that there is a gap between its results and the Pangolin X* lineage calls. I was wondering whether it is possible to bridge the gap by: 1. taking in the lineage designations of the Pangolin X* lineages, then scanning and storing the profile of each recombinant lineage; 2. for a new query sequence whose breakpoint profile matches an existing Pangolin X* lineage, reporting not just the parent lineages and breakpoints but also a possible X* lineage call. This would work more or less like Scorpio constellations.

    I expect this would be a more accurate way of assigning recombinant lineages than the current UShER calls, where the breakpoint positions may not match.
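Step 2 of this proposal could be sketched as a lookup against a table of known profiles. Everything here is an assumption for illustration: match_profile is a hypothetical name, a profile is taken to be the ordered parents plus breakpoint positions, the table would have to be built from the Pangolin X* designations, and the 500-nucleotide tolerance is arbitrary.

```python
from typing import Dict, List, Sequence, Tuple

Profile = Tuple[Tuple[str, ...], Tuple[int, ...]]  # (ordered parents, breakpoints)


def match_profile(
    parents: Sequence[str],
    breakpoints: Sequence[int],
    known: Dict[str, Profile],
    tolerance: int = 500,
) -> List[str]:
    """Return names of known profiles with the same parents and nearby breakpoints."""
    hits = []
    for name, (known_parents, known_breaks) in known.items():
        # parents must match in order, and the number of breakpoints must agree
        if tuple(parents) != known_parents or len(breakpoints) != len(known_breaks):
            continue
        if all(abs(a - b) <= tolerance for a, b in zip(breakpoints, known_breaks)):
            hits.append(name)
    return hits
```

Because breakpoints are only known up to the interval between informative sites, some tolerance is unavoidable; how large it should be is a judgment call.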

    Thanks for considering the suggestion.

    opened by bioinforME 0
  • GISAID XT recombinant not detected by sc2rf

    Hi, I've noticed that sc2rf.py (version sc2rf-7427d2f94b69c965362034c2597b643c5dfaa1cf) could not find any recombination in the XT samples available on GISAID:

    python sc2rf.py nextclade.aligned_XT_Gisaid.fasta

    Here are the available aligned sequences: nextclade.aligned_XT_Gisaid.txt

    Screenshots: Nextclade output, sc2rf output

    Thanks for looking into this and other lineages that might be in the same situation.

    opened by BenjaminDelisle 4
  • Option to ignore shared substitutions

    • I've been experimenting with a flag --ignore-shared that ignores positions that are shared (have the exact same nucleotide) across all parents/examples.
    • I like this option because it makes the breakpoints visually clearer, as there's a direct color change (red -> green) rather than having the intermediate shared positions (red -> white -> green)
    • For testing, a nextclade fasta alignment of XM-like recombinants (public on genbank): XM.txt
    1. Do you think this is scientifically sound for reporting? And if so,
    2. Would you be interested in a PR if I tidy up the code?
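The proposed --ignore-shared idea can be sketched as a filter over the informative positions: a position is dropped when every parent/example carries the same nucleotide there. The data layout (a mapping from position to per-parent nucleotide) and the function name are assumptions for illustration, not Sc2rf's internal structures; --ignore-shared itself is the flag proposed in this issue, not an existing option.

```python
from typing import Dict

PositionBases = Dict[int, Dict[str, str]]  # position -> {parent name: nucleotide}


def drop_shared_positions(parent_bases: PositionBases) -> PositionBases:
    """Keep only positions where the parents do not all carry the same nucleotide."""
    return {
        pos: bases
        for pos, bases in parent_bases.items()
        if len(set(bases.values())) > 1
    }
```

A position shared by all parents says nothing about which parent a segment came from, which is why dropping it leaves a direct color change at each breakpoint.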

    Default Output:

    python3 sc2rf.py XM.fasta --ansi --unique 1
    

    Proposed Option:

    python3 sc2rf.py XM.fasta --ansi --unique 1 --ignore-shared
    

    opened by ktmeaton 0
  • Terminal Ns not recognized as missing

    While investigating https://github.com/cov-lineages/pango-designation/issues/590, I noticed that samples with the BA.2 S2M deletion (29734:29759) were being incorrectly visualized as having reference bases in sc2rf:

    Screenshots: consensus view, sc2rf view

    I think this could be for a couple of reasons:

    1. When --enable-deletions is used, perhaps deletions should not be considered missing data?

      missings_matches = ["N"]
      if not args.enable_deletions:
          missings_matches.append("-")
      
    2. I think there is missing logic when detecting a run of Ns, to catch a run that proceeds to the end of the genome?

      # inside the loop over each base s at index i:
      if s in missings_matches:
          if start_n == -1:
              start_n = i  # mark the start of a possible run of N's
      elif start_n >= 0:
          # we've been tracking a run of N's; this base marks the end
          missings.append((start_n, i-1))  # (first, last) index of the run
          start_n = -1

      # Missing logic: after the loop, a run that is still open
      # extends to the end of the genome
      if start_n >= 0:
          missings.append((start_n, i))  # i still holds the last index
      

    With these changes, the sc2rf output more closely matches the consensus sequence/my expectation:

    I think this is a bug, but if it's the intended behaviour for deletions, please let me know. Thanks!
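A self-contained version of the run detection discussed above, including the end-of-genome case, could look like this. It is a sketch, not Sc2rf's code: find_missing_runs is a hypothetical name, and runs are returned as (first, last) index pairs, matching the (start_n, i-1) values appended in the snippet.

```python
from typing import Iterable, List, Tuple


def find_missing_runs(seq: str, missing: Iterable[str] = ("N", "-")) -> List[Tuple[int, int]]:
    """Return (first, last) index pairs, both inclusive, for runs of missing characters."""
    missing_set = set(missing)
    runs: List[Tuple[int, int]] = []
    start = -1
    for i, base in enumerate(seq):
        if base in missing_set:
            if start == -1:
                start = i  # start of a new run
        elif start >= 0:
            runs.append((start, i - 1))  # this base ends the run
            start = -1
    if start >= 0:
        # The run extends to the end of the genome: close it at the last index.
        runs.append((start, len(seq) - 1))
    return runs
```

Passing missing=("N",) mirrors the --enable-deletions case from the first snippet, where "-" is not treated as missing data.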

    opened by ktmeaton 1
Owner
Lena Schimmel