Allele-specific pipeline for unbiased read mapping (WIP), QTL discovery (WIP), and allelic-imbalance analysis

McVicker Lab

Last update: Aug 11, 2022


Overview

WASP2 (currently in pre-development): Allele-specific pipeline for unbiased read mapping (WIP), QTL discovery (WIP), and allelic-imbalance analysis

Requirements

Python >= 3.7
numpy
pandas
scipy
pysam
pybedtools

Installation

Installation through conda is recommended, using the provided environment file:

conda env create -f environment.yml

Allelic Imbalance Analysis

The analysis pipeline currently consists of two tools: Count and Analysis.

Count Tool

Counts alleles in ATAC peaks that overlap heterozygous SNPs.
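
To make "heterozygous SNPs" concrete, here is a minimal sketch of how a genotype could be read from one VCF record and tested for heterozygosity. The function names are hypothetical and this is not WASP2's own code; it assumes a single-sample, tab-separated record with a GT entry in the FORMAT column.

```python
def sample_genotype(vcf_record_line: str, sample_column: int = 9) -> str:
    """Return the GT string for one sample column of a VCF data line."""
    fields = vcf_record_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")          # e.g. ["GT", "GL"]
    sample_values = fields[sample_column].split(":")
    return sample_values[format_keys.index("GT")]


def is_heterozygous(gt: str) -> bool:
    """True for diploid genotypes whose two alleles differ (0|1, 1/0, ...)."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return False                            # missing or non-diploid call
    return alleles[0] != alleles[1]


record = "2L\t10591\trs1\tT\tA\t.\tPASS\t.\tGT:GL\t0|1:-100,0,-100"
het = is_heterozygous(sample_genotype(record))
```

Only sites where the chosen sample is heterozygous can show reads from both alleles, which is why homozygous and missing calls are skipped in this sketch.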

Usage

python run_analysis.py count -a [BAM] -g [VCF] -s [VCF Sample] -r [Peaks] {OPTIONS}

Required Arguments

-a/--alignment: BAM file containing alignments.
-g/--genotypes: VCF file with genotypes.
-s/--sample: Sample name in VCF file.
-r/--regions: Regions of interest in narrowPeak, GTF, or BED format. (ONLY narrowPeak support implemented)

Single-Cell Additional Requirements

-sc/--singlecell: Flag that denotes data is single-cell.
-b/--barcodes: Two-column TSV that contains barcodes and their group/cell mapping.
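
As an illustration of the two-column barcode TSV described above, the following sketch writes one barcode per row followed by its group. The barcodes and group names are made up, and the column order (barcode first) and the absence of a header row are assumptions, since the README does not specify them.

```python
import csv
import io


def write_barcode_map(barcode_to_group, handle):
    """Write 'barcode<TAB>group' rows, one per barcode, without a header."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for barcode, group in barcode_to_group.items():
        writer.writerow([barcode, group])


# Hypothetical 10X-style barcodes mapped to hypothetical cell groups.
example = {
    "AAACCTGAGAAACCAT-1": "T_cell",
    "AAACCTGAGAAACCGC-1": "B_cell",
}

buf = io.StringIO()
write_barcode_map(example, buf)
tsv_text = buf.getvalue()
```

To write to disk instead, the same function can be passed a file opened with open("barcodes.tsv", "w", newline=""), where the filename is again only an example.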

Optional Arguments

-o/--output: Directory to output counts. (Default: CWD)
--nofilt: Skip the step that pre-filters reads overlapping the regions of interest.
--keeptemps: Keep intermediate files from the preprocessing step. Writes to the given directory if one is supplied with the flag, otherwise to CWD.

Analysis Tool

Analyzes Allelic Imbalance per ATAC peak given allelic count data

Usage

python run_analysis.py analysis [COUNTS] {OPTIONS}

Required Arguments

COUNTS: first positional argument, output data from count tool

Single-Cell Additional Requirements

-sc/--singlecell: Flag that denotes data is single-cell

Optional Arguments

--min: Minimum allele count needed for analysis. (Default: 10)
-o/--output: Directory to output counts. (Default: CWD)
-m/--model: Model used for measuring imbalance. Choice of "single", "linear", or "binomial". (Default: "single")
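
For intuition about what an imbalance test on allelic counts does, here is a minimal sketch of an exact two-sided binomial test against a 50/50 reference/alternate expectation, together with a minimum-count filter. This is a generic textbook test, not WASP2's implementation of any of its models; the function names are hypothetical, and applying the threshold to the total count is an assumption.

```python
from math import comb


def passes_min_count(ref_count: int, alt_count: int, min_count: int = 10) -> bool:
    """Assumed filter: keep a region only if total allelic reads >= min_count."""
    return ref_count + alt_count >= min_count


def binomial_two_sided_pvalue(ref_count: int, alt_count: int, p_ref: float = 0.5) -> float:
    """Exact two-sided binomial p-value: sum the probabilities of all outcomes
    that are no more likely than the observed reference count."""
    n = ref_count + alt_count
    if n == 0:
        raise ValueError("no allelic reads to test")

    def pmf(k: int) -> float:
        return comb(n, k) * p_ref**k * (1 - p_ref) ** (n - k)

    observed = pmf(ref_count)
    # Small relative tolerance guards against floating-point ties.
    total = sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed * (1 + 1e-7))
    return min(1.0, total)


balanced = binomial_two_sided_pvalue(10, 10)    # no evidence of imbalance
skewed = binomial_two_sided_pvalue(20, 0)       # all reads on one allele
```

A plain binomial test like this ignores overdispersion in read counts, so it tends to be anti-conservative on real sequencing data.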

TODO

Unbiased Read Mapping: currently in development

Allelic Imbalance Pipeline

Counts
- Need to implement RNA-Seq and Gene support
- More robust for different inputs for bulk and single-cell data
Analysis
- More specific implementations for single-cell data


Comments

Testing allelic analysis

Aaron,

Thanks for sharing the new version for testing allelic imbalance. I'm starting to try to port over our WASP AI testing to WASP2. I thought I would start from .merge.bam files and recount those.

As background, I have VCF files which represent a simulated diploid genome. Previously the chr-separated VCF worked fine with WASP pipeline all the way to CHT, but there may be some issues with those VCF, where I need to add additional fields or tags.

As a side question, can I use the AI test in WASP2 with counts from WASP? E.g. these:

wasp_cht/alt_as_counts.sample_A_1.h5
wasp_cht/hap_read_counts.sample_A_1.adj
wasp_cht/hap_read_counts.sample_A_1.hetp
wasp_cht/hap_read_counts.sample_A_1.txt
wasp_cht/other_as_counts.sample_A_1.h5
wasp_cht/read_counts.sample_A_1.h5
wasp_cht/ref_as_counts.sample_A_1.h5

If not, with the WASP2 counting script, I am now getting this error:

python run_analysis.py count --rna -ft data/Drosophila_melanogaster.BDGP6.28.100.chr.gtf.gz \
  -a wasp_mapping/sample_A_1.merge.bam -g data/drosophila_wg.vcf -s sample \
  -r data/Drosophila_melanogaster.BDGP6.28.100.chr.gtf.gz -o testai
Namespace(command='count', stype='rna', singlecell=False, 
features=['data/Drosophila_melanogaster.BDGP6.28.100.chr.gtf.gz'], 
alignment='wasp_mapping/sample_A_1.merge.bam', 
genotypes='data/drosophila_wg.vcf', sample='sample', 
regions='data/Drosophila_melanogaster.BDGP6.28.100.chr.gtf.gz', 
barcodes=None, output='testai', nofilt=False, keeptemps=None)
Bulk Analysis
GTF filtered by feature
Filtering reads that overlap regions of interest
Bam file filtered!
Traceback (most recent call last):
  File "/proj/milovelab/bin/WASP2/src/analysis/run_analysis.py", line 273, in <module>
    main()
  File "/proj/milovelab/bin/WASP2/src/analysis/run_analysis.py", line 266, in main
    parse_counting(args.alignment, args.genotypes, args.regions, 
args.sample, args.output, args.stype, nofilt=args.nofilt, temp_loc=args.keeptemps, features=args.features)
  File "/proj/milovelab/bin/WASP2/src/analysis/run_analysis.py", line 53, in parse_counting
    intersect_df = preprocess_data(in_bam, in_vcf, in_region, in_sample, stype, nofilt, tmpdir, features)
  File "/proj/milovelab/bin/WASP2/src/analysis/run_analysis.py", line 34, in preprocess_data
    write_sample_snp(in_vcf, in_sample, out_dir)
  File "/proj/milovelab/bin/WASP2/src/analysis/filter_data.py", line 24, in write_sample_snp
    vcf = VariantFile(in_file)
  File "pysam/libcbcf.pyx", line 4054, in pysam.libcbcf.VariantFile.__init__
  File "pysam/libcbcf.pyx", line 4284, in pysam.libcbcf.VariantFile.open
ValueError: invalid file `b'data/drosophila_wg.vcf'` (mode=`b'r'`) - is it VCF/BCF format?

The VCF in question has some non-required fields missing but it appears valid:

vcf-validator data/drosophila_wg.vcf
Could not parse the fileformat version string [#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	sample], assuming VCFv4.2
The header tag 'reference' not present. (Not required but highly recommended.)
The "fileformat" field not present in the header, assuming VCFv4.2
The header tag 'contig' not present for CHROM=2L. (Not required but highly recommended.)
column sample at 2L:10591 .. FORMAT tag [GL] not listed in the header,FORMAT tag [GT] not listed in the header
The header tag 'contig' not present for CHROM=2R. (Not required but highly recommended.)
The header tag 'contig' not present for CHROM=3L. (Not required but highly recommended.)
The header tag 'contig' not present for CHROM=3R. (Not required but highly recommended.)
The header tag 'contig' not present for CHROM=4. (Not required but highly recommended.)
The header tag 'contig' not present for CHROM=X. (Not required but highly recommended.)

The VCF file looks like:

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	sample
2L	10591	rs1	T	A	.	PASS	.	GT:GL	0|1:-100,0,-100
2L	11464	rs2	A	T	.	PASS	.	GT:GL	0|1:-100,0,-100
...
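
One possible workaround, not confirmed in this thread: the validator output above shows that the file has no ##fileformat line and no FORMAT definitions, which may be why pysam cannot recognize it as VCF. The sketch below prepends a minimal header to the VCF text. The function name is hypothetical, the GT and GL definitions follow the VCF 4.2 specification, and the contig lines are derived from the CHROM values found in the records (without lengths).

```python
def add_minimal_vcf_header(vcf_text: str) -> str:
    """Prepend ##fileformat, ##contig and ##FORMAT lines if ##fileformat is missing."""
    lines = vcf_text.splitlines()
    if lines and lines[0].startswith("##fileformat="):
        return vcf_text                          # header already present

    # Collect chromosome names in order of first appearance.
    contigs = []
    for line in lines:
        if line and not line.startswith("#"):
            chrom = line.split("\t", 1)[0]
            if chrom not in contigs:
                contigs.append(chrom)

    header = ["##fileformat=VCFv4.2"]
    header += [f"##contig=<ID={chrom}>" for chrom in contigs]
    header += [
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        '##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype likelihoods">',
    ]
    return "\n".join(header + lines) + "\n"


raw = (
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample\n"
    "2L\t10591\trs1\tT\tA\t.\tPASS\t.\tGT:GL\t0|1:-100,0,-100\n"
    "2L\t11464\trs2\tA\tT\t.\tPASS\t.\tGT:GL\t0|1:-100,0,-100\n"
)
fixed = add_minimal_vcf_header(raw)
```

Writing the result to a new file and re-running vcf-validator on it would show whether the warnings are resolved before retrying the count step.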

opened by mikelove 14

Example for barcodes files.

Hi

Thanks for your great tool.

I want to perform ASE calling from 10X scRNA-seq data. Is there an example file for barcodes file(barcodes and their group/cell mapping) so that I can follow the tutorial? Many thanks.

Best

opened by fanyue322 1
