Cool Bioinformatics Scripts
qqplot
You can use this script in two ways:
- read many millions of P values from stdin
# python
zcat pval.txt.gz | qqplot.py -out test -title "QQ plot on the fly"
# julia
zcat pval.txt.gz | qqplot.jl --out test --title "QQ plot on the fly"
Warning: If you have something like 100 billion P values to process, you should definitely use qqplot.jl instead of qqplot.py. On my server the Julia version processes about 3 billion lines per hour, while the Python version manages only about 700 million.
- import qqplot.py in your own Python script
import numpy as np
from qqplot import qqplot
p = np.random.random(1000000)
qqplot(x=p, figname="test.png")
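
For readers curious about what such a plot is built from, here is a minimal sketch of the points usually drawn in a P-value QQ plot: observed versus expected -log10(P) under a uniform null. This is not the code in qqplot.py; the helper name `qq_points` and the plotting position (i - 0.5)/n are assumptions chosen for illustration.

```python
import numpy as np

def qq_points(p):
    """Return (expected, observed) -log10(P) arrays for a QQ plot.

    Hypothetical helper for illustration, not part of qqplot.py.
    """
    p = np.asarray(p, dtype=float)
    n = p.size
    # Sort ascending so the smallest P value pairs with the smallest expected quantile.
    observed = -np.log10(np.sort(p))
    # Expected quantiles of n uniform draws (assumed plotting position: (i - 0.5) / n).
    expected = -np.log10((np.arange(1, n + 1) - 0.5) / n)
    return expected, observed

rng = np.random.default_rng(0)
expected, observed = qq_points(rng.random(1000))
```

Under the null, the points should fall close to the diagonal `observed == expected`; a P value of exactly 0 would map to infinity and would need special handling.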
fixref
Before running bcftools merge, you may need to fix the REF and ALT alleles and the corresponding genotypes; otherwise bcftools will surprise you.
usage: fixref.py [-h] REF_VCF IN_VCF OUT_VCF
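
To make the idea concrete, here is a rough sketch of what fixing one biallelic site could involve: when REF and ALT are swapped relative to the reference VCF, the allele indices in every genotype are flipped. This is an assumption about the kind of fix meant here, not the logic of fixref.py; `fix_swapped_site` and its arguments are hypothetical names.

```python
def fix_swapped_site(ref_alleles, in_alleles, genotypes):
    """Recode genotypes when a biallelic site has REF and ALT swapped.

    ref_alleles / in_alleles are (REF, ALT) tuples; genotypes are VCF GT
    strings such as "0/1" or "1|0". Hypothetical helper, not fixref.py.
    Returns None when the alleles cannot be reconciled by a simple swap.
    """
    if in_alleles == ref_alleles:
        return list(genotypes)  # already consistent, nothing to do
    if in_alleles != (ref_alleles[1], ref_alleles[0]):
        return None  # different alleles (e.g. a strand issue); not handled here
    flip = {"0": "1", "1": "0"}
    # Swap allele indices 0 and 1; separators and missing calls (".") pass through.
    return ["".join(flip.get(ch, ch) for ch in gt) for gt in genotypes]

print(fix_swapped_site(("A", "G"), ("G", "A"), ["0/0", "0|1", "./."]))
# prints ['1/1', '1|0', './.']
```

The sketch deliberately ignores multiallelic sites and strand flips, which a real tool would have to deal with.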
calculate genotype discordance
When you run imputation analysis with BEAGLE (or other imputation tools), you may want to know the distribution of genotype discordance between the original VCF and the imputed VCF.
Warning: Before running the script, make sure the two VCFs contain exactly the same sites and samples for each chromosome.
usage: calc_imputed_gt_discord.py [-h] [-chr STRING] VCF1 VCF2 OUT
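
As an illustration of the quantity involved, the sketch below computes discordance for a single site as the fraction of samples whose genotype differs between the two files. The definition used here (phase ignored, missing calls skipped) is an assumption, and `gt_discordance` is a hypothetical helper; calc_imputed_gt_discord.py may define and aggregate discordance differently.

```python
def gt_discordance(gts1, gts2):
    """Fraction of samples whose genotype differs between two VCF records.

    gts1 / gts2 are equal-length lists of GT strings for the same site and
    the same sample order. Phase is ignored, and samples missing in either
    input are skipped. Hypothetical helper, not calc_imputed_gt_discord.py.
    """
    if len(gts1) != len(gts2):
        raise ValueError("both records must contain the same samples")

    def alleles(gt):
        # "1|0" and "0/1" both become ["0", "1"]
        return sorted(gt.replace("|", "/").split("/"))

    compared = mismatched = 0
    for a, b in zip(gts1, gts2):
        if "." in a or "." in b:
            continue  # skip missing genotypes
        compared += 1
        mismatched += alleles(a) != alleles(b)
    return mismatched / compared if compared else float("nan")

rate = gt_discordance(["0/0", "0/1", "1/1", "./."], ["0|0", "1|0", "0|1", "0|0"])
```

In this example three samples are compared and one of them differs. The length check is a small guard reflecting the requirement that both VCFs share the same samples.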