Description
Summary of phylogenomic methods and analyses used in "Immunogenicity of convalescent and vaccinated sera against clinical isolates of ancestral SARS-CoV-2, Beta, Delta, and Omicron variants"
Methods
Raw reads underwent adapter/quality trimming (trim-galore v0.6.5 [citation: https://github.com/FelixKrueger/TrimGalore]), host filtering and read mapping to the reference (bwa v0.7.17 [citation: arXiv:1303.3997v2], samtools v1.7 [citation: 10.1093/bioinformatics/btp352]), primer trimming (iVar v1.3 [citation: 10.1186/s13059-018-1618-7]), and variant/consensus calling (freebayes v1.3.2 [citation: arXiv:1207.3907]) using the SIGNAL workflow (https://github.com/jaleezyy/covid-19-signal) v1.4.4dev (#60dd466) [citation: doi.org/10.3390/v12080895] with the ARTICv4 amplicon scheme (from https://github.com/artic-network/artic-ncov2019) and the MN908947.3 SARS-CoV-2 reference genome and annotations. Additional quality control and variant effect annotation (SnpEff v5.0-0 [citation: 10.4161/fly.19695]) were performed using ncov-tools v1.8.0 (https://github.com/jts/ncov-tools/). Finally, PANGO lineages were assigned to consensus sequences using pangolin v3.1.17 (with the PangoLEARN v2021-12-06 models) [citation: 10.1093/ve/veab064], scorpio v0.3.16 (with constellations v0.1.1) [citation: https://github.com/cov-lineages/scorpio], and PANGO-designations v1.2.117 [citation: 10.1038/s41564-020-0770-5]. Variants were summarised using PyVCF v0.6.8 [citation: https://github.com/jamescasbon/PyVCF] and pandas v1.2.4 [citation: 10.25080/Majora-92bf1922-00a]. Phylogenetic analysis was performed using augur v13.1.0 [citation: 10.21105/joss.02906] with IQTree (v2.2.0beta) [citation: 10.1093/molbev/msaa015], and the resulting phylogenetic figure was generated using ETE v3.1.2 [citation: 10.1093/molbev/msw046]. Contextual sequences were incorporated into the phylogenetic analysis by using Nextstrain's ingested GISAID metadata and pandas to randomly sample a representative subset of sequences (jointly deposited in NCBI and GISAID) that belonged to lineages observed in Canada (see sequences_used_in_tree_with_acknowledgements.tsv for metadata and acknowledgements).
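The contextual subsampling step described above can be illustrated with a minimal sketch. It is not the code from sk_variant_summary.ipynb: the record states that pandas was used, whereas this sketch uses only the Python standard library so it runs without extra dependencies. The column names (strain, pango_lineage), the toy records, the per-lineage cap, and the function name subsample_by_lineage are all assumptions made for illustration.

```python
import csv
import io
import random
from collections import defaultdict


def subsample_by_lineage(rows, lineages, per_lineage, seed=0):
    """Randomly pick up to `per_lineage` records for each lineage of interest."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_lineage = defaultdict(list)
    for row in rows:
        # "pango_lineage" is an assumed column name for this sketch.
        if row["pango_lineage"] in lineages:
            by_lineage[row["pango_lineage"]].append(row)
    picked = []
    for lineage in sorted(by_lineage):
        group = by_lineage[lineage]
        picked.extend(rng.sample(group, min(per_lineage, len(group))))
    return picked


# Toy stand-in for a metadata TSV (hypothetical strain names).
toy_metadata = (
    "strain\tpango_lineage\n"
    "seq1\tAY.4\n"
    "seq2\tAY.4\n"
    "seq3\tAY.4\n"
    "seq4\tBA.1\n"
    "seq5\tB.1.351\n"
    "seq6\tBA.1\n"
)
rows = list(csv.DictReader(io.StringIO(toy_metadata), delimiter="\t"))

# Keep at most two randomly chosen sequences per lineage of interest.
subset = subsample_by_lineage(rows, {"AY.4", "BA.1"}, per_lineage=2)
for row in subset:
    print(row["strain"], row["pango_lineage"])
```

Capping the draw per lineage keeps heavily sequenced lineages from dominating the context set; whether the original analysis used a per-lineage cap or a different sampling scheme is not stated in this record.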
File Description
- 20220101_MN01513_WGS114_DEC31SRI_CK_summary_valid_negative_pass_only.tsv: QC summary generated by ncov-tools
- sk_variant_summary.ipynb: notebook containing code to summarise variants (tables/variant_percentage_read_support_protein_nonsynonymous_only.tsv and graphic figures/intermediate/spike_mutation_table_styled.png) and to subsample representative genomes (phylogeny/seqs/open_context_genomes.fasta) from GISAID (Nextstrain-ingested FASTA and metadata from 2021-12-31: metadata_2021-12-31_17-29.tsv.gz and sequences_fasta_2022_01_03.tar.xz)
- genomes/: consensus sequences generated by FreeBayes via SIGNAL
- variants/: SIGNAL FreeBayes VCFs annotated with SnpEff by ncov-tools
- phylogeny: data used to generate the annotated phylogeny with augur
- phylogeny/tree.sh: script used to generate the phylogeny
- phylogeny/seqs: sequences used for the phylogeny
- phylogeny/data: reference data for the phylogeny
- phylogeny/augur: phylogeny and intermediate files
- phylogeny/viz_tree.py: ete3-based script to generate the phylogeny figure (tree.svg)
- figure: files for generating the result plot
- figure/phylo_variant_figure.*: final figure combining tree.svg and spike_mutation_table_styled.png
- figure/intermediate/tree.svg: rendered SVG of the phylogeny
- figure/intermediate/spike_mutation_table_styled.png: rendered summary of variants
- tables: set of tables for the manuscript
- tables/sequences_used_in_tree_with_acknowledgements.tsv: ncov-ingest metadata with acknowledgements
- tables/variant_percentage_read_support_protein_nonsynonymous_only.tsv: summary of variants
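
As a rough illustration of how a table like tables/variant_percentage_read_support_protein_nonsynonymous_only.tsv could be derived from the annotated VCFs in variants/, the sketch below computes percentage read support for protein-changing variants. It makes several assumptions that this record does not confirm: read support is taken as AO/DP from the FreeBayes INFO field, only the first SnpEff ANN entry and the first alternate allele are considered, and the set of effects counted as nonsynonymous is chosen for illustration. The notebook itself uses PyVCF and pandas; this sketch parses the lines with the standard library only, and the toy records are hypothetical.

```python
# Effects treated as protein-changing here (an assumption for this sketch).
NONSYNONYMOUS = {"missense_variant", "stop_gained", "stop_lost", "start_lost"}


def parse_info(info):
    """Turn a VCF INFO string into a dict (flag-only keys map to '')."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def summarise_nonsynonymous(vcf_lines):
    """Return one dict per protein-changing variant with its % read support."""
    rows = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF headers
        cols = line.rstrip("\n").split("\t")
        pos, info = int(cols[1]), parse_info(cols[7])
        # SnpEff ANN entries are comma-separated; sub-fields are pipe-separated.
        ann = info.get("ANN", "").split(",")[0].split("|")
        if len(ann) < 11 or not NONSYNONYMOUS & set(ann[1].split("&")):
            continue
        depth = int(info["DP"])
        if depth == 0:
            continue
        alt_reads = int(info["AO"].split(",")[0])  # first ALT allele only
        rows.append({
            "pos": pos,
            "gene": ann[3],             # Gene_Name sub-field
            "protein_change": ann[10],  # HGVS.p sub-field
            "percent_support": round(100 * alt_reads / depth, 1),
        })
    return rows


# Two toy records: one missense spike change and one synonymous change.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "MN908947.3\t23403\t.\tA\tG\t1000\tPASS\tDP=200;AO=198;"
    "ANN=G|missense_variant|MODERATE|S|S|transcript|S|protein_coding|1/1|"
    "c.1841A>G|p.Asp614Gly|1841/3822|1841/3822|614/1273||",
    "MN908947.3\t3037\t.\tC\tT\t1000\tPASS\tDP=150;AO=150;"
    "ANN=T|synonymous_variant|LOW|ORF1ab|ORF1ab|transcript|ORF1ab|"
    "protein_coding|1/1|c.2772C>T|p.Phe924Phe|2772/21291|2772/21291|924/7096||",
]
summary = summarise_nonsynonymous(toy_vcf)
print(summary)
```

Only the missense record survives the filter in this toy input; the synonymous change is dropped, matching the "nonsynonymous only" naming of the table.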