Data and analysis code for an MS on SK VOC genomes phenotyping/neutralisation assays

Finlay Maguire

Last update: Jan 6, 2022

Overview

Description

Summary of phylogenomic methods and analyses used in "Immunogenicity of convalescent and vaccinated sera against clinical isolates of ancestral SARS-CoV-2, Beta, Delta, and Omicron variants"

Methods

Raw reads underwent adapter/quality trimming (trim-galore v0.6.5 [citation: https://github.com/FelixKrueger/TrimGalore]), host filtering and read mapping to the reference (bwa v0.7.17 [citation: arXiv:1303.3997v2], samtools v1.7 [citation: 10.1093/bioinformatics/btp352]), trimming of primers (iVar v1.3 [citation: 10.1186/s13059-018-1618-7]), and variant/consensus calling (freebayes v1.3.2 [citation: arXiv:1207.3907]) using the SIGNAL workflow (https://github.com/jaleezyy/covid-19-signal) v1.4.4dev (#60dd466) [citation: doi.org/10.3390/v12080895] with the ARTICv4 amplicon scheme (from https://github.com/artic-network/artic-ncov2019) and the MN908947.3 SARS-CoV-2 reference genome and annotations. Additional quality control and variant effect annotation (SnpEff v5.0-0 [citation: 10.4161/fly.19695]) were performed using ncov-tools v1.8.0 (https://github.com/jts/ncov-tools/). Finally, PANGO lineages were assigned to consensus sequences using pangolin v3.1.17 (with the PangoLEARN v2021-12-06 models) [citation: 10.1093/ve/veab064], scorpio v0.3.16 (with constellations v0.1.1) [citation: https://github.com/cov-lineages/scorpio], and PANGO-designations v1.2.117 [citation: 10.1038/s41564-020-0770-5]. Variants were summarised using PyVCF v0.6.8 [citation: https://github.com/jamescasbon/PyVCF] and pandas v1.2.4 [citation: 10.25080/Majora-92bf1922-00a]. Phylogenetic analysis was performed using augur v13.1.0 [citation: 10.21105/joss.02906] with IQTree (v2.2.0beta) [citation: 10.1093/molbev/msaa015], and the resulting phylogenetic figure was generated using ETE v3.1.2 [citation: 10.1093/molbev/msw046]. Contextual sequences were incorporated into the phylogenetic analysis by using Nextstrain's ingested GISAID metadata and pandas to randomly sample a representative subset of sequences (jointly deposited in NCBI and GISAID) that belonged to lineages observed in Canada (see sequences_used_in_tree_with_acknowledgements.tsv for metadata and acknowledgements).
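The contextual-sequence sampling described above can be sketched with pandas roughly as follows. This is a minimal illustration, not the code from sk_variant_summary.ipynb: the column names (`strain`, `country`, `pango_lineage`, `genbank_accession`), the use of `?` as a missing-value marker, the `per_lineage` cap, and the helper name `sample_context` are all assumptions, and "lineages observed in Canada" is read here as lineages with at least one Canadian record in the metadata.

```python
import pandas as pd


def sample_context(metadata, country="Canada", per_lineage=2, seed=0):
    """Randomly sample up to `per_lineage` open genomes per lineage seen in `country`.

    Hypothetical helper: column names and the "?" placeholder are assumptions.
    """
    # Lineages observed at least once in the country of interest.
    in_country = metadata["country"] == country
    lineages = metadata.loc[in_country, "pango_lineage"].dropna().unique()
    # Keep genomes that also carry a GenBank accession (deposited in NCBI as well).
    accession = metadata["genbank_accession"]
    has_accession = accession.notna() & (accession != "?")
    candidates = metadata[has_accession & metadata["pango_lineage"].isin(lineages)]
    # Shuffle once, then keep the first rows of each lineage group.
    shuffled = candidates.sample(frac=1, random_state=seed)
    return shuffled.groupby("pango_lineage").head(per_lineage)


# Toy metadata, invented purely for illustration.
toy = pd.DataFrame({
    "strain": ["a", "b", "c", "d", "e", "f"],
    "country": ["Canada", "USA", "USA", "Canada", "UK", "UK"],
    "pango_lineage": ["AY.25", "AY.25", "AY.25", "BA.1", "B.1.351", "BA.1"],
    "genbank_accession": ["X1", "X2", "X3", None, "X5", "X6"],
})
context = sample_context(toy, per_lineage=2)
print(context[["strain", "pango_lineage"]])
```

Shuffling with a fixed `random_state` and then taking the head of each group keeps the draw reproducible while capping every lineage at the same size, so abundant lineages do not dominate the tree.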

File Description

20220101_MN01513_WGS114_DEC31SRI_CK_summary_valid_negative_pass_only.tsv: QC summary generated by ncov-tools
sk_variant_summary.ipynb: notebook containing code to summarise variants (table tables/variant_percentage_read_support_protein_nonsynonymous_only.tsv and graphic figures/intermediate/spike_mutation_table_styled.png) and to subsample representative genomes (phylogeny/seqs/open_context_genomes.fasta) from GISAID (Nextstrain-ingested FASTA and metadata from 2021-12-31: metadata_2021-12-31_17-29.tsv.gz and sequences_fasta_2022_01_03.tar.xz)
genomes/: consensus sequences generated by FreeBayes via SIGNAL
variants/: SIGNAL FreeBayes VCFs annotated with SnpEff by ncov-tools
phylogeny/: data used to generate the annotated phylogeny with augur
phylogeny/tree.sh: script used to generate the phylogeny
phylogeny/seqs: sequences used for the phylogeny
phylogeny/data: reference data for the phylogeny
phylogeny/augur: phylogeny and intermediate files
phylogeny/viz_tree.py: ete3-based script to generate the phylogeny figure (tree.svg)
figure/: files for generating the result plot
figure/phylo_variant_figure.*: final figure combining tree.svg and spike_mutation_table_styled.png
figure/intermediate/tree.svg: rendered SVG of the phylogeny
figure/intermediate/spike_mutation_table_styled.png: rendered summary of variants
tables/: set of tables for the manuscript
tables/sequences_used_in_tree_with_acknowledgements.tsv: ncov-ingest metadata with acknowledgements
tables/variant_percentage_read_support_protein_nonsynonymous_only.tsv: summary of variants
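As a rough illustration of what a table such as tables/variant_percentage_read_support_protein_nonsynonymous_only.tsv might contain, the sketch below computes percentage read support per sample for non-synonymous variants using pandas. It assumes the VCF records have already been parsed into plain dictionaries; the field names (`sample`, `mutation`, `effect`, `alt_depth`, `total_depth`), the helper name `read_support_table`, and the toy values are hypothetical, and "non-synonymous" is simplified to "not annotated as `synonymous_variant`".

```python
import pandas as pd


def read_support_table(records):
    """Pivot per-sample variant records into a mutation x sample table of % read support.

    Hypothetical helper: the record layout is an assumption, not the notebook's format.
    """
    df = pd.DataFrame(records)
    # Drop synonymous changes; everything else is treated as non-synonymous here.
    nonsyn = df[df["effect"] != "synonymous_variant"].copy()
    # Percentage of reads at the site that support the alternate allele.
    nonsyn["pct_support"] = (100 * nonsyn["alt_depth"] / nonsyn["total_depth"]).round(1)
    # One row per mutation, one column per sample; absent variants become NaN.
    return nonsyn.pivot(index="mutation", columns="sample", values="pct_support")


# Toy records, invented purely for illustration.
records = [
    {"sample": "S1", "mutation": "S:D614G", "effect": "missense_variant",
     "alt_depth": 990, "total_depth": 1000},
    {"sample": "S2", "mutation": "S:D614G", "effect": "missense_variant",
     "alt_depth": 480, "total_depth": 500},
    {"sample": "S1", "mutation": "S:N501Y", "effect": "missense_variant",
     "alt_depth": 750, "total_depth": 1000},
    {"sample": "S2", "mutation": "ORF1a:F924F", "effect": "synonymous_variant",
     "alt_depth": 400, "total_depth": 400},
]
table = read_support_table(records)
print(table)
```

A wide mutation-by-sample layout like this makes it easy to see which isolates share a mutation and where support is partial, which is the kind of view a styled mutation table would be rendered from.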

Releases(v0.1.1)

v0.1.1(Jan 6, 2022)

v0.1.0(Jan 4, 2022)
