single_snake_sequencing - sc/snRNAseq Snakemake Workflow
A Snakemake workflow for standardised sc/snRNAseq analysis.
Every single cell analysis is slightly different. This represents what I would call a "core" analysis, as nearly every analysis I perform starts with something very akin to this. Given the custom nature of single cell analysis, this workflow is not designed to be all-encompassing. Rather, it aims to be extensible, modular, and reproducible. Any given step can be easily modified, as they are all self-contained scripts, and a new rule can be easily added (see the downstream rules for an example). Finally, by taking advantage of the integrated Conda and Singularity support, the whole thing can be run in an isolated environment.
Notes on Installation
HOLDING
Notes on Configuration
⚠️ Be sure to change the configuration to suit your project!
For a full discussion of configuration, please see the configuration README.
Briefly, the general configuration file must be located at config/config.yaml. A samplesheet containing information pertaining to the data must be supplied as well. Both are schema validated.
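To give a feel for what validating a samplesheet involves, here is a minimal standard-library sketch. It is not the workflow's actual validation (Snakemake workflows typically rely on snakemake.utils.validate with a schema file). The tab-separated layout and the column names sample and fastq_dir are hypothetical; see the configuration README for the real fields.

```python
import csv
import io

# Hypothetical required columns; the real samplesheet schema may differ.
REQUIRED_COLUMNS = ("sample", "fastq_dir")


def validate_samplesheet(handle, required=REQUIRED_COLUMNS):
    """Parse a tab-separated samplesheet and return its rows.

    Raises ValueError if a required column is missing or a required
    value is empty.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in required if col not in header]
    if missing:
        raise ValueError(f"samplesheet is missing columns: {missing}")
    rows = list(reader)
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        empty = [col for col in required if not (row[col] or "").strip()]
        if empty:
            raise ValueError(f"line {line_no}: empty values for {empty}")
    return rows


# Example with an in-memory samplesheet (assumed layout).
example = io.StringIO("sample\tfastq_dir\nS1\tdata/S1\nS2\tdata/S2\n")
rows = validate_samplesheet(example)
```

Failing early on a malformed samplesheet is the point of the check: a missing column is far cheaper to catch before any alignment job has started.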
Notes on Data
This pipeline expects de-multiplexed fastq.gz files, normally produced by some derivative of bcl2fastq after sequencing. They can (technically) be placed anywhere, but we recommend creating a data directory in your project for them.
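As a quick sanity check before running the workflow, the files in the data directory can be listed and grouped by sample. The sketch below assumes the usual bcl2fastq naming scheme (sample_S1_L001_R1_001.fastq.gz); the function name group_fastqs and the example file names are made up for illustration and are not part of the workflow.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path

# Assumed bcl2fastq-style file name: <sample>_S<n>_L<lane>_<read>_001.fastq.gz
FASTQ_PATTERN = re.compile(
    r"^(?P<sample>.+)_S\d+_L(?P<lane>\d{3})_(?P<read>[RI][12])_001\.fastq\.gz$"
)


def group_fastqs(data_dir):
    """Map each sample name to the sorted fastq.gz file names found under data_dir."""
    groups = defaultdict(list)
    for path in sorted(Path(data_dir).rglob("*.fastq.gz")):
        match = FASTQ_PATTERN.match(path.name)
        if match is None:
            continue  # skip files that do not follow the assumed naming scheme
        groups[match["sample"]].append(path.name)
    return dict(groups)


# Demonstration on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in (
        "A_S1_L001_R1_001.fastq.gz",
        "A_S1_L001_R2_001.fastq.gz",
        "B_S2_L001_R1_001.fastq.gz",
        "notes.txt",
    ):
        (Path(tmp) / name).touch()
    grouped = group_fastqs(tmp)
```

A sample that appears with only one read file, or not at all, usually points to a de-multiplexing or copy problem worth resolving before the samplesheet is filled in.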
Notes on the tools
The analysis pipeline was run using Snakemake v6.6.1. The full version and software lists can be found under the relevant yaml files in workflow/envs. All reasonable efforts have been made to ensure that the repository adheres to the best practices outlined here.
Notes on the analysis
For a full discussion on the analysis methods, please see the technical documentation.
Briefly, the count matrix was produced using Cellranger, droplet calling was performed with DropletUtils::emptyDrops, doublet detection with SOLO from the scVI family, batch effect removal with harmonypy, and general analysis and data handling with scanpy.
Future work
- Supply tests
- Track lane in samples that have been pooled and de-multiplexed
- Parallelise emptyDrops
- Support custom references
- Support SCTransform?