WASP2 (currently in pre-development): Allele-specific pipeline for unbiased read mapping (WIP), QTL discovery (WIP), and allelic-imbalance analysis
Requirements
- Python >= 3.7
- numpy
- pandas
- scipy
- pysam
- pybedtools
Installation
Installation through conda, using the provided environment file, is recommended:
conda env create -f environment.yml
Allelic Imbalance Analysis
The analysis pipeline currently consists of two tools: Count and Analysis.
Count Tool
Counts alleles in ATAC peaks that overlap heterozygous SNPs.
Usage
python run_analysis.py count -a [BAM] -g [VCF] -s [VCF Sample] -r [Peaks] {OPTIONS}
Required Arguments
- -a/--alignment: BAM file containing alignments.
- -g/--genotypes: VCF file with genotypes.
- -s/--sample: Sample name in VCF file.
- -r/--regions: Regions of interest in narrowPeak, GTF, or BED format. (Only narrowPeak is currently supported.)
Single-Cell Additional Requirements
- -sc/--singlecell: Flag that denotes data is single-cell.
- -b/--barcodes: Two-column TSV that contains barcodes and their group/cell mapping.
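
As a rough illustration of the barcode file described above, the sketch below parses a two-column TSV into a barcode-to-group mapping. The column order (barcode first, group second), the absence of a header row, and the helper name `read_barcode_map` are assumptions made for illustration, not confirmed WASP2 behavior.

```python
import csv
import io


def read_barcode_map(handle):
    """Parse a two-column TSV (barcode, group) into a dict.

    Assumes no header row and the barcode in the first column; both are
    illustrative assumptions, not documented WASP2 behavior.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row!r}")
        barcode, group = row
        mapping[barcode] = group
    return mapping


# Hypothetical example content; real barcodes and group names will differ.
example = io.StringIO("AAACGAAAGTCG-1\tcluster_1\nAAACGAATCAGT-1\tcluster_2\n")
barcode_to_group = read_barcode_map(example)
```

Raising on rows with the wrong number of columns is a design choice in this sketch so that a malformed file fails early rather than silently dropping cells.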
Optional Arguments
- -o/--output: Directory to output counts. (Default: CWD)
- --nofilt: Skip the step that pre-filters reads overlapping the regions of interest.
- --keeptemps: Keep intermediate files from the preprocessing step. Files are written to the directory given with the flag, or to CWD if no directory is given.
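
To make the idea of counting alleles at a heterozygous SNP inside a peak more concrete, here is a minimal pure-Python sketch. It does not use the tool's actual code or file formats: reads are simplified to (start, sequence) pairs with ungapped alignment, and `count_alleles`, the 0-based coordinates, the half-open peak interval, and the toy data are all hypothetical.

```python
def count_alleles(reads, snp_pos, ref, alt, peak_start, peak_end):
    """Count ref/alt/other bases at one SNP if it lies inside the peak.

    reads: iterable of (start, sequence) with a 0-based start and an
    ungapped alignment (a simplification; real BAM records have CIGARs).
    The peak interval is half-open: [peak_start, peak_end).
    """
    counts = {"ref": 0, "alt": 0, "other": 0}
    if not (peak_start <= snp_pos < peak_end):
        return counts  # SNP is outside the peak, so nothing is counted
    for start, seq in reads:
        offset = snp_pos - start
        if 0 <= offset < len(seq):  # read covers the SNP position
            base = seq[offset]
            if base == ref:
                counts["ref"] += 1
            elif base == alt:
                counts["alt"] += 1
            else:
                counts["other"] += 1
    return counts


# Toy data: three reads cover position 103, one read does not.
toy_reads = [(100, "ACGTA"), (102, "GTAAC"), (103, "CAAC"), (90, "TTTT")]
toy_counts = count_alleles(toy_reads, snp_pos=103, ref="T", alt="C",
                           peak_start=95, peak_end=110)
```

With real data, the read and genotype information would come from the BAM and VCF files passed to the Count tool rather than from in-memory tuples.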
Analysis Tool
Analyzes allelic imbalance per ATAC peak from allelic count data.
Usage
python run_analysis.py analysis [COUNTS] {OPTIONS}
Required Arguments
- COUNTS: First positional argument; the output data from the Count tool.
Single-Cell Additional Requirements
- -sc/--singlecell: Flag that denotes data is single-cell
Optional Arguments
- --min: Minimum allele count needed for analysis. (Default: 10)
- -o/--output: Directory to output counts. (Default: CWD)
- -m/--model: Model used for measuring imbalance. Choice of "single", "linear", or "binomial". (Default: "single")
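
As a hedged illustration of what testing a peak for imbalance can look like, the sketch below applies a minimum-count filter and an exact two-sided binomial test against a 50/50 expectation. This is a generic textbook test written with the standard library only; it is not claimed to match the tool's "single", "linear", or "binomial" models. The names `binomial_imbalance_pvalue` and `min_count` are hypothetical, and applying the threshold to the combined ref + alt count is an assumption.

```python
from math import comb


def binomial_imbalance_pvalue(ref_count, alt_count, min_count=10):
    """Exact two-sided binomial test of ref vs. alt counts under p = 0.5.

    Returns None when the total count is below min_count (assumed here
    to apply to ref + alt; the tool's own rule may differ).
    """
    n = ref_count + alt_count
    if n < min_count:
        return None
    # Under p = 0.5 every outcome k has probability comb(n, k) / 2**n,
    # so outcomes can be compared using the integer coefficients alone.
    observed = comb(n, ref_count)
    # Sum the outcomes that are as likely as, or less likely than, the
    # observed one (the usual definition of an exact two-sided p-value).
    tail = sum(c for c in (comb(n, k) for k in range(n + 1)) if c <= observed)
    return tail / 2 ** n


p_skewed = binomial_imbalance_pvalue(15, 5)     # 20 reads, 15 ref / 5 alt
p_balanced = binomial_imbalance_pvalue(10, 10)  # perfectly balanced
p_too_few = binomial_imbalance_pvalue(3, 2)     # below the minimum count
```

A plain binomial test ignores overdispersion in read counts, so it is best read as a baseline for intuition rather than a substitute for the models listed above.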
TODO
- Unbiased Read Mapping: currently in development
Allelic Imbalance Pipeline
- Counts
  - Need to implement RNA-Seq and Gene support
  - More robust for different inputs for bulk and single-cell data
- Analysis
  - More specific implementations for single-cell data