This is the downsampling pipeline from the Sequana project.
| Overview: | Downsample NGS data sets (FastQ or FastA). |
|---|---|
| Input: | A set of FastQ or FastA files (single or paired-end). |
| Output: | Downsampled FastQ or FastA files. |
| Status: | Production |
| Citation: | Cokelaer et al. (2017), 'Sequana': a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, JOSS DOI https://doi.org/10.21105/joss.00352 |
pip install sequana_downsampling --upgrade
You will also need pigz available on your PATH.
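As a quick pre-flight check, the following Python sketch tests whether `pigz` can be found on the PATH. The helper name `tool_available` is hypothetical and is not part of Sequana; it relies only on the standard library.

```python
import shutil


def tool_available(name: str) -> bool:
    """Return True if an executable called `name` is found on the PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    # pigz is needed by the pipeline to compress its outputs.
    if tool_available("pigz"):
        print("pigz found")
    else:
        print("pigz not found: install it (for example via conda) before running")
```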
1. Set up the pipeline:
sequana_downsampling --input-directory DATAPATH
2. Run the pipeline:
cd downsampling
bash downsampling.sh
sequana_downsampling --help
Key pipeline-specific options:
- `--downsampling-input-format`: Input format: `fastq` (default), `fasta`, or `sam`.
- `--downsampling-method`: `random` (default, keeps a fixed number of reads) or `random_pct` (keeps a percentage of reads).
- `--downsampling-max-entries`: Number of reads to keep when using `random` (default: 1000).
- `--downsampling-percent`: Percentage of reads to keep when using `random_pct` (default: 10).
- `--downsampling-threads`: Number of threads used by `pigz` to compress output (default: 4).
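To illustrate the difference between the two selection methods described above, here is a minimal Python sketch that picks read indices either as a fixed count or as a percentage. It is not the pipeline's implementation: the function names `select_fixed` and `select_percent` are hypothetical, and whether the pipeline keeps an exact or an approximate percentage is not stated here, so the sketch assumes each read is kept independently with probability `percent / 100`.

```python
import random
from typing import List, Optional


def select_fixed(n_reads: int, max_entries: int, seed: Optional[int] = None) -> List[int]:
    """'random'-style selection: keep at most max_entries read indices."""
    rng = random.Random(seed)
    k = min(max_entries, n_reads)  # cannot keep more reads than exist
    return sorted(rng.sample(range(n_reads), k))


def select_percent(n_reads: int, percent: float, seed: Optional[int] = None) -> List[int]:
    """'random_pct'-style selection (assumed): keep each read with probability percent/100."""
    rng = random.Random(seed)
    return [i for i in range(n_reads) if rng.random() < percent / 100.0]


if __name__ == "__main__":
    # With the documented defaults: 1000 entries, or 10 percent.
    print(len(select_fixed(50_000, 1000, seed=1)))
    print(len(select_percent(50_000, 10, seed=1)))
```

With a fixed count the output size is known in advance, whereas with a percentage it scales with the size of the input file.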
Examples:
sequana_downsampling --input-directory DATAPATH \
--downsampling-method random --downsampling-max-entries 100
sequana_downsampling --input-directory DATAPATH \
--downsampling-method random_pct --downsampling-percent 10 \
--downsampling-input-format fasta --input-pattern "*.fasta"
Run on a SLURM cluster:
cd downsampling
sbatch downsampling.sh
Or drive Snakemake directly:
snakemake -s downsampling.rules --cores 4 --stats stats.txt
The following tools must be available (install via conda/bioconda):
mamba env create -f environment.yml
- sequana — FastQ/FastA selection (Python API)
- pigz — parallel gzip compression of outputs
The pipeline randomly selects reads from the input files (single or paired).
If the inputs are paired, the one-to-one mapping between R1 and R2 is
preserved. FastQ inputs can be gzipped; outputs are gzipped with pigz.
FastA inputs and outputs are uncompressed.
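The pairing guarantee can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the pipeline's code: it assumes four-line FastQ records and R1/R2 files that list mates in the same order, it uses reservoir sampling as one possible way to pick a fixed number of pairs, and it compresses with Python's `gzip` module rather than `pigz`. The names `read_fastq` and `downsample_pair` are hypothetical.

```python
import gzip
import random
from itertools import islice
from typing import Iterator, List, Optional, Tuple


def read_fastq(path: str) -> Iterator[Tuple[str, ...]]:
    """Yield 4-line FastQ records from a plain or gzipped file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            record = tuple(islice(handle, 4))
            if len(record) < 4:  # end of file (or a truncated record)
                break
            yield record


def downsample_pair(r1_in: str, r2_in: str, r1_out: str, r2_out: str,
                    max_entries: int, seed: Optional[int] = None) -> int:
    """Keep up to max_entries read pairs; mates are kept or dropped together."""
    rng = random.Random(seed)
    reservoir: List[Tuple[Tuple[str, ...], Tuple[str, ...]]] = []
    # Reservoir sampling over (R1, R2) pairs read in lockstep.
    for i, pair in enumerate(zip(read_fastq(r1_in), read_fastq(r2_in))):
        if i < max_entries:
            reservoir.append(pair)
        else:
            j = rng.randint(0, i)  # inclusive bounds
            if j < max_entries:
                reservoir[j] = pair
    # Write both mates in the same order so record k of R1 matches record k of R2.
    with gzip.open(r1_out, "wt") as out1, gzip.open(r2_out, "wt") as out2:
        for rec1, rec2 in reservoir:
            out1.writelines(rec1)
            out2.writelines(rec2)
    return len(reservoir)
```

Because selection operates on pairs rather than on each file separately, the k-th record of the R1 output always corresponds to the k-th record of the R2 output. The order of the kept pairs may differ from their order in the input.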
The latest documented configuration file has the following key section:
- `downsampling`: `method` (`random`/`random_pct`), `max_entries`, `percent`, `threads`, and `input_format` (`fastq`/`fasta`)
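As an illustration of how these keys fit together, the sketch below mirrors the `downsampling` section as a Python dictionary and checks its values. It is an assumption-laden sketch: the actual configuration file is not reproduced here, the default values are taken from the command-line options listed earlier, the accepted `input_format` values follow the list given for this section, and `DEFAULT_CONFIG` and `validate_downsampling` are hypothetical names.

```python
from typing import Any, Dict, List

# Hypothetical in-memory mirror of the 'downsampling' section.
DEFAULT_CONFIG: Dict[str, Dict[str, Any]] = {
    "downsampling": {
        "method": "random",
        "max_entries": 1000,
        "percent": 10,
        "threads": 4,
        "input_format": "fastq",
    }
}


def validate_downsampling(config: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a list of problems found in the 'downsampling' section (empty if none)."""
    section = config.get("downsampling", {})
    problems: List[str] = []
    if section.get("method") not in ("random", "random_pct"):
        problems.append("method must be 'random' or 'random_pct'")
    if section.get("input_format") not in ("fastq", "fasta"):
        problems.append("input_format must be 'fastq' or 'fasta'")
    max_entries = section.get("max_entries")
    if not isinstance(max_entries, int) or max_entries < 1:
        problems.append("max_entries must be a positive integer")
    percent = section.get("percent")
    if not isinstance(percent, (int, float)) or not 0 < percent <= 100:
        problems.append("percent must be in the range (0, 100]")
    threads = section.get("threads")
    if not isinstance(threads, int) or threads < 1:
        problems.append("threads must be a positive integer")
    return problems


if __name__ == "__main__":
    print(validate_downsampling(DEFAULT_CONFIG))
```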
| Version | Description |
|---|---|
| 0.10.0 | |
| 0.9.0 | |
| 0.8.5 | |
| 0.8.4 | |
| 0.8.3 | |
| 0.8.2 | |
| 0.8.1 | |
| 0.8.0 | First release. |
To contribute to this project, please take a look at the Contributing Guidelines first. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.