A variant caller for the GBA gene using WGS data

Illumina

Last update: Oct 13, 2022

Gauchian: WGS-based GBA variant caller

Gauchian is a targeted variant caller for the GBA gene that works from a whole-genome sequencing (WGS) BAM file. It uses a novel method to resolve the high sequence similarity between GBA and its pseudogene paralog GBAP1, and can accurately detect variants in the exons 9-11 homology region. These include large deletions or duplications between GBA and GBAP1, and GBAP1-like variants in GBA, including p.A495P, p.L483P, p.D448H, c.1263del, RecNciI, RecTL and c.1263del+RecTL. In addition to these challenging variants, Gauchian also calls known pathogenic or likely pathogenic GBA variants classified in ClinVar. Please refer to our preprint for more details about the method.

Running the program

This Python3 program can be run as follows:

python -m gauchian --manifest MANIFEST_FILE \
                   --genome [19/37/38] \
                   --prefix OUTPUT_FILE_PREFIX \
                   --outDir OUTPUT_DIRECTORY \
                   --threads NUMBER_THREADS

The manifest is a text file in which each line lists the absolute path to one input BAM/CRAM file. For CRAM input, we suggest also providing the path to the reference FASTA file with --reference in the command.
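As a minimal sketch of preparing these inputs, the following Python snippet writes a manifest with one absolute path per line and assembles the command shown above as an argument list. The helper names (write_manifest, build_command) and any paths passed to them are hypothetical; only the command-line flags come from this README.

```python
from pathlib import Path


def write_manifest(input_dir, manifest_path, suffixes=(".bam", ".cram")):
    """Write one absolute BAM/CRAM path per line; return the paths written."""
    paths = sorted(
        p.resolve()
        for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )
    Path(manifest_path).write_text("".join(f"{p}\n" for p in paths))
    return paths


def build_command(manifest_path, genome, prefix, out_dir, threads=1, reference=None):
    """Assemble the documented command line as a list (does not run it)."""
    if str(genome) not in {"19", "37", "38"}:
        raise ValueError("genome must be 19, 37 or 38")
    cmd = [
        "python", "-m", "gauchian",
        "--manifest", str(manifest_path),
        "--genome", str(genome),
        "--prefix", prefix,
        "--outDir", str(out_dir),
        "--threads", str(threads),
    ]
    if reference is not None:
        # Suggested for CRAM input: path to the reference FASTA file.
        cmd += ["--reference", str(reference)]
    return cmd
```

Once Gauchian is installed, the returned list could be handed to subprocess.run(cmd, check=True). Index files such as .bai are skipped because only the final file suffix is compared.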

Interpreting the output

The program produces a .tsv file in the directory specified by --outDir. The fields are explained below:

Fields in tsv	Explanation
Sample	Sample name
is_biallelic_GBAP1-like_variant_exon9-11	Whether the sample is called as biallelic for GBAP1-like variants in exon9-11
is_carrier_GBAP1-like_variant_exon9-11	Whether the sample is called as a carrier for GBAP1-like variants in exon9-11
total_CN	Total copy number of GBA+GBAP1
deletion_breakpoint_in_GBA_gene	Whether the deletion breakpoint is in the GBA gene, if a deletion exists
GBAP1-like_variant_exon9-11	GBAP1-like variants called in exon9-11, two alleles separated by /
other_variants	Other variants called (non-GBAP1-like variants or variants outside of exon9-11)
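
The table above can be read with Python's standard csv module. The sketch below is illustrative only: the column names come from the table, but the two rows are made up, and the encoding of values (for example "True"/"False" text and an empty string for "no variant") is an assumption that should be checked against a real output file.

```python
import csv
import io

# Made-up rows for illustration; only the column names come from the README.
EXAMPLE_TSV = (
    "Sample\tis_biallelic_GBAP1-like_variant_exon9-11\t"
    "is_carrier_GBAP1-like_variant_exon9-11\ttotal_CN\t"
    "deletion_breakpoint_in_GBA_gene\tGBAP1-like_variant_exon9-11\tother_variants\n"
    "sampleA\tFalse\tTrue\t4\tN/A\tp.L483P/\t\n"
    "sampleB\tFalse\tFalse\t4\tN/A\t/\t\n"
)


def read_calls(handle):
    """Parse the tab-separated output into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def split_alleles(row):
    """Split the exon 9-11 call into its two '/'-separated alleles."""
    return row["GBAP1-like_variant_exon9-11"].split("/")


def is_true(value):
    # Assumption: booleans are written as the text "True" or "False".
    return value.strip().lower() == "true"


rows = read_calls(io.StringIO(EXAMPLE_TSV))
carriers = [
    r["Sample"] for r in rows
    if is_true(r["is_carrier_GBAP1-like_variant_exon9-11"])
]
print(carriers)
```

For a real run, io.StringIO(EXAMPLE_TSV) would be replaced by an open file handle on the .tsv written to --outDir.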

A .json file is also produced that contains more information about each sample.

Fields in json	Explanation
Coverage_MAD	Median absolute deviation of depth, a measure of sample quality
Median_depth	Sample median depth
deletion_CN	CN of the unique region between GBA and GBAP1. This value plus 2 is the total CN
deletion_CN_raw	Raw normalized depth of the unique region between GBA and GBAP1
variant_raw_count	Supporting reads for each variant
snp_call	GBA copy number call at GBA/GBAP1 differentiating sites
snp_raw	Raw GBA copy number at GBA/GBAP1 differentiating sites
haplotypes	Summary of haplotypes assembled across GBA/GBAP1 differentiating sites in Exon9-11
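
Two of the fields above have a simple arithmetic meaning, sketched here in Python: total CN is deletion_CN plus 2, and a median absolute deviation is the median of absolute deviations from the median. This is a hedged illustration: the JSON layout (a top-level object mapping sample names to field dictionaries) and the example values are assumptions, and Gauchian's own Coverage_MAD computation may normalize depth differently from this textbook definition.

```python
import json
from statistics import median


def total_cn(deletion_cn):
    """Total GBA+GBAP1 copy number implied by deletion_CN (None = no call)."""
    return None if deletion_cn is None else deletion_cn + 2


def median_absolute_deviation(values):
    """Textbook MAD: median of absolute deviations from the median."""
    values = list(values)
    center = median(values)
    return median(abs(v - center) for v in values)


# Made-up JSON text; the per-sample layout is an assumption.
EXAMPLE_JSON = (
    '{"sampleA": {"deletion_CN": 2, "Median_depth": 30.0},'
    ' "sampleB": {"deletion_CN": 1, "Median_depth": 35.0}}'
)

report = json.loads(EXAMPLE_JSON)
totals = {name: total_cn(fields["deletion_CN"]) for name, fields in report.items()}
print(totals)  # {'sampleA': 4, 'sampleB': 3}

# Hypothetical per-region depths for one sample.
print(median_absolute_deviation([28, 30, 31, 29, 45]))  # 1
```

In the made-up sampleB, a deletion_CN of 1 corresponds to a total CN of 3, which is one copy fewer than the 4 implied by a deletion_CN of 2.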

Comments

UserWarning: multiple_iterators not implemented for CRAM

When running with .cram file, got the following warnings:

/gauchian/depth_calling/snp_count.py:131: UserWarning: multiple_iterators not implemented for CRAM
  ignore_orphan=False
/gauchian/depth_calling/haplotype.py:189: UserWarning: multiple_iterators not implemented for CRAM
  min_base_quality=13

Will these warnings affect the quality of calls?

opened by LNGDingj

Releases (v1.0.2)

v1.0.2 (Jan 6, 2022)

Updated README, demo, setup.py and renamed some program output fields. No algorithm change.

v1.0.1 (Dec 13, 2021)

Updated setup.py and README. No code change.

v1.0 (Dec 5, 2021)

Version 1.0
