Allele-specific pipeline for unbiased read mapping (WIP), QTL discovery (WIP), and allelic-imbalance analysis

Overview

WASP2 (currently in pre-development): an allele-specific pipeline for unbiased read mapping (WIP), QTL discovery (WIP), and allelic-imbalance analysis.

 

Requirements

  • Python >= 3.7
  • numpy
  • pandas
  • scipy
  • pysam
  • pybedtools

 

Installation

Installation through conda is recommended, using the provided environment file:

conda env create -f environment.yml

 

Allelic Imbalance Analysis

The analysis pipeline currently consists of two tools (Count and Analysis).

 

Count Tool

Counts alleles in ATAC peaks that overlap heterozygous SNPs.

Usage

python run_analysis.py count -a [BAM] -g [VCF] -s [VCF Sample] -r [Peaks] {OPTIONS}

Required Arguments

  • -a/--alignment: BAM file containing alignments.
  • -g/--genotypes: VCF file with genotypes.
  • -s/--sample: Sample name in VCF file.
  • -r/--regions: Regions of interest in narrowPeak, GTF, or BED format. (Only narrowPeak support is currently implemented.)

Single-Cell Additional Requirements

  • -sc/--singlecell: Flag that denotes data is single-cell.
  • -b/--barcodes: Two-column TSV that maps each barcode to its group/cell.
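
As an illustration of the two-column barcode file described above, the following sketch writes a barcode-to-group mapping as a TSV. The column order (barcode first, group second), the absence of a header row, the example barcodes, and the helper name `write_barcode_map` are all assumptions for illustration; none of them is confirmed by the tool's documentation.

```python
import csv
import io
from typing import Dict, TextIO


def write_barcode_map(mapping: Dict[str, str], handle: TextIO) -> None:
    """Write one 'barcode<TAB>group' row per cell barcode.

    Assumed layout (not confirmed by WASP2): barcode in column 1,
    group/cell label in column 2, no header row.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for barcode, group in mapping.items():
        writer.writerow([barcode, group])


# Hypothetical barcodes and group labels, for illustration only.
example = {
    "AAACCTGAGAAACCAT-1": "cluster_1",
    "AAACCTGAGAAACCGC-1": "cluster_2",
}
buffer = io.StringIO()
write_barcode_map(example, buffer)
barcode_tsv = buffer.getvalue()
```

To produce a file for `-b/--barcodes`, the same function could be called with an open file handle, e.g. `open("barcodes.tsv", "w", newline="")`.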

Optional Arguments

  • -o/--output: Directory to output counts. (Default: CWD)
  • --nofilt: Skip the step that pre-filters reads overlapping regions of interest.
  • --keeptemps: Keep intermediate files from the preprocessing step. Files are written to the directory given with the flag, or to CWD if none is given.
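
When the count tool is driven from a script, the documented flags can be assembled programmatically. The sketch below builds the argument list only from the options listed above; the helper name `build_count_command` and its parameter names are hypothetical, and it assumes `run_analysis.py` is in the working directory.

```python
from typing import List, Optional


def build_count_command(
    bam: str,
    vcf: str,
    sample: str,
    regions: str,
    output: Optional[str] = None,
    barcodes: Optional[str] = None,
    nofilt: bool = False,
) -> List[str]:
    """Assemble the count-tool command line from the documented flags."""
    cmd = [
        "python", "run_analysis.py", "count",
        "-a", bam, "-g", vcf, "-s", sample, "-r", regions,
    ]
    if barcodes is not None:
        # Single-cell data needs both the flag and the barcode TSV.
        cmd += ["-sc", "-b", barcodes]
    if output is not None:
        cmd += ["-o", output]
    if nofilt:
        cmd.append("--nofilt")
    return cmd


# Placeholder file names, for illustration only.
count_cmd = build_count_command(
    "sample.bam", "genotypes.vcf", "sample", "peaks.narrowPeak", output="counts"
)
```

The resulting list can be passed to `subprocess.run(count_cmd, check=True)`.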

 

Analysis Tool

Analyzes allelic imbalance per ATAC peak, given allelic count data.

Usage

python run_analysis.py analysis [COUNTS] {OPTIONS}

Required Arguments

  • COUNTS: First positional argument; the output data from the count tool.

Single-Cell Additional Requirements

  • -sc/--singlecell: Flag that denotes data is single-cell

Optional Arguments

  • --min: Minimum allele count needed for analysis. (Default: 10)
  • -o/--output: Directory to output counts. (Default: CWD)
  • -m/--model: Model used for measuring imbalance. Choice of "single", "linear", or "binomial". (Default: "single")
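
To give a sense of what testing for allelic imbalance involves, the sketch below runs a two-sided exact binomial test of reference versus alternate counts against an expected 50:50 ratio. It is purely illustrative and is not WASP2's implementation: how the "single", "linear", and "binomial" models are defined is not described here. The function name `binomial_imbalance_pvalue` is hypothetical, and applying the minimum-count threshold to the total (ref + alt) count is an assumption.

```python
from typing import Optional


def binomial_imbalance_pvalue(
    ref_count: int, alt_count: int, min_count: int = 10
) -> Optional[float]:
    """Two-sided exact binomial test against a 50:50 allelic ratio.

    Returns None when the total count is below min_count
    (assumed here to apply to ref + alt combined).
    """
    n = ref_count + alt_count
    if n < min_count:
        return None
    k = min(ref_count, alt_count)
    # Sum C(n, 0) .. C(n, k), updating the binomial coefficient in place.
    coeff = 1  # C(n, 0)
    tail = 0
    for i in range(k + 1):
        tail += coeff
        coeff = coeff * (n - i) // (i + 1)  # becomes C(n, i + 1)
    lower_tail = tail / 2 ** n  # P(X <= k) for X ~ Binomial(n, 0.5)
    # The null distribution is symmetric, so double the smaller tail.
    return min(1.0, 2.0 * lower_tail)


balanced = binomial_imbalance_pvalue(10, 10)
skewed = binomial_imbalance_pvalue(20, 0)
too_few = binomial_imbalance_pvalue(3, 2)
```

A perfectly balanced site yields a p-value of 1.0, a strongly skewed site yields a very small p-value, and a site below the threshold is skipped. Real allelic count data is often overdispersed relative to a plain binomial, so this simple test tends to be anti-conservative.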

 

TODO

  • Unbiased Read Mapping: currently in development

Allelic Imbalance Pipeline

  • Counts

    • Implement RNA-Seq and gene support
    • Make input handling more robust for bulk and single-cell data
  • Analysis

    • More specific implementations for single-cell data
Comments
  • Testing allelic analysis

    Aaron,

    Thanks for sharing the new version for testing allelic imbalance. I'm starting to try to port over our WASP AI testing to WASP2. I thought I would start from .merge.bam files and recount those.

    As background, I have VCF files which represent a simulated diploid genome. Previously the chr-separated VCFs worked fine with the WASP pipeline all the way to CHT, but there may be some issues with those VCFs, where I need to add additional fields or tags.

    As a side question, can I use the AI test in WASP2 with counts from WASP? E.g. these:

    wasp_cht/alt_as_counts.sample_A_1.h5
    wasp_cht/hap_read_counts.sample_A_1.adj
    wasp_cht/hap_read_counts.sample_A_1.hetp
    wasp_cht/hap_read_counts.sample_A_1.txt
    wasp_cht/other_as_counts.sample_A_1.h5
    wasp_cht/read_counts.sample_A_1.h5
    wasp_cht/ref_as_counts.sample_A_1.h5
    

    If not, with the WASP2 counting script, I am now getting this error:

    python run_analysis.py count --rna -ft data/Drosophila_melanogaster.BDGP6.28.100.chr.gtf.gz \
      -a wasp_mapping/sample_A_1.merge.bam -g data/drosophila_wg.vcf -s sample \
      -r data/Drosophila_melanogaster.BDGP6.28.100.chr.gtf.gz -o testai
    Namespace(command='count', stype='rna', singlecell=False, 
    features=['data/Drosophila_melanogaster.BDGP6.28.100.chr.gtf.gz'], 
    alignment='wasp_mapping/sample_A_1.merge.bam', 
    genotypes='data/drosophila_wg.vcf', sample='sample', 
    regions='data/Drosophila_melanogaster.BDGP6.28.100.chr.gtf.gz', 
    barcodes=None, output='testai', nofilt=False, keeptemps=None)
    Bulk Analysis
    GTF filtered by feature
    Filtering reads that overlap regions of interest
    Bam file filtered!
    Traceback (most recent call last):
      File "/proj/milovelab/bin/WASP2/src/analysis/run_analysis.py", line 273, in <module>
        main()
      File "/proj/milovelab/bin/WASP2/src/analysis/run_analysis.py", line 266, in main
        parse_counting(args.alignment, args.genotypes, args.regions, 
    args.sample, args.output, args.stype, nofilt=args.nofilt, temp_loc=args.keeptemps, features=args.features)
      File "/proj/milovelab/bin/WASP2/src/analysis/run_analysis.py", line 53, in parse_counting
        intersect_df = preprocess_data(in_bam, in_vcf, in_region, in_sample, stype, nofilt, tmpdir, features)
      File "/proj/milovelab/bin/WASP2/src/analysis/run_analysis.py", line 34, in preprocess_data
        write_sample_snp(in_vcf, in_sample, out_dir)
      File "/proj/milovelab/bin/WASP2/src/analysis/filter_data.py", line 24, in write_sample_snp
        vcf = VariantFile(in_file)
      File "pysam/libcbcf.pyx", line 4054, in pysam.libcbcf.VariantFile.__init__
      File "pysam/libcbcf.pyx", line 4284, in pysam.libcbcf.VariantFile.open
    ValueError: invalid file `b'data/drosophila_wg.vcf'` (mode=`b'r'`) - is it VCF/BCF format?
    

    The VCF in question has some non-required fields missing but it appears valid:

    vcf-validator data/drosophila_wg.vcf
    Could not parse the fileformat version string [#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	sample], assuming VCFv4.2
    The header tag 'reference' not present. (Not required but highly recommended.)
    The "fileformat" field not present in the header, assuming VCFv4.2
    The header tag 'contig' not present for CHROM=2L. (Not required but highly recommended.)
    column sample at 2L:10591 .. FORMAT tag [GL] not listed in the header,FORMAT tag [GT] not listed in the header
    The header tag 'contig' not present for CHROM=2R. (Not required but highly recommended.)
    The header tag 'contig' not present for CHROM=3L. (Not required but highly recommended.)
    The header tag 'contig' not present for CHROM=3R. (Not required but highly recommended.)
    The header tag 'contig' not present for CHROM=4. (Not required but highly recommended.)
    The header tag 'contig' not present for CHROM=X. (Not required but highly recommended.)
    

    The VCF file looks like:

    #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	sample
    2L	10591	rs1	T	A	.	PASS	.	GT:GL	0|1:-100,0,-100
    2L	11464	rs2	A	T	.	PASS	.	GT:GL	0|1:-100,0,-100
    ...
    
    opened by mikelove 14
  • Example for barcodes files.

    Hi

    Thanks for your great tool.

    I want to perform ASE calling from 10X scRNA-seq data. Is there an example barcodes file (barcodes and their group/cell mapping) so that I can follow the tutorial? Many thanks.

    Best

    opened by fanyue322 1
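
Regarding the `VariantFile` error in the first comment above: the validator output reports that the VCF has no `##fileformat` line and no `##FORMAT` or `##contig` definitions. A plausible cause, though not a confirmed diagnosis, is that the file cannot be recognized as VCF without that header. The sketch below prepends a minimal header to such a file's lines. The helper name `add_minimal_vcf_header` is hypothetical, and the `GT`/`GL` definitions follow the standard VCF 4.2 wording rather than anything specific to WASP2.

```python
from typing import Iterable, List


def add_minimal_vcf_header(lines: Iterable[str]) -> List[str]:
    """Prepend ##fileformat, ##FORMAT and ##contig lines if they are missing."""
    body = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if body and body[0].startswith("##fileformat="):
        return body  # header already present; leave untouched
    # Collect contig names in order of first appearance.
    contigs: List[str] = []
    for ln in body:
        if not ln.startswith("#"):
            chrom = ln.split("\t", 1)[0]
            if chrom not in contigs:
                contigs.append(chrom)
    header = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        '##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype likelihoods">',
    ]
    header += [f"##contig=<ID={c}>" for c in contigs]
    return header + body


# The two records shown in the comment above.
records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample",
    "2L\t10591\trs1\tT\tA\t.\tPASS\t.\tGT:GL\t0|1:-100,0,-100",
    "2L\t11464\trs2\tA\tT\t.\tPASS\t.\tGT:GL\t0|1:-100,0,-100",
]
fixed = add_minimal_vcf_header(records)
```

Writing `"\n".join(fixed) + "\n"` to a new file gives a VCF with an explicit header. Contig lengths are omitted because they are not known from the records alone.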
Owner
McVicker Lab