GitHub - sequana/downsampling: down sample NGS data

This is the downsampling pipeline from the Sequana project.

Overview:	Downsample NGS data sets (FastQ or FastA).
Input:	A set of FastQ or FastA files (single or paired-end).
Output:	Downsampled FastQ or FastA files.
Status:	Production
Citation:	Cokelaer et al. (2017), 'Sequana': a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, https://doi.org/10.21105/joss.00352

Installation

pip install sequana_downsampling --upgrade

You will also need pigz available on your PATH.

Quick Start

1. Set up the pipeline:

sequana_downsampling --input-directory DATAPATH

2. Run the pipeline:

cd downsampling
bash downsampling.sh

Usage

sequana_downsampling --help

Key pipeline-specific options:

--downsampling-input-format: fastq (default), fasta, or sam.
--downsampling-method: random (default, keeps a fixed number of reads) or random_pct (keeps a percentage of reads).
--downsampling-max-entries: Number of reads to keep when using random (default: 1000).
--downsampling-percent: Percentage of reads to keep when using random_pct (default: 10).
--downsampling-threads: Number of threads used by pigz to compress output (default: 4).
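
The two methods differ only in how the number of kept reads is derived. The sketch below illustrates that arithmetic; it is not the pipeline's actual code. The function name `reads_to_keep` is hypothetical, and it assumes that `random` is capped by the number of reads available and that `random_pct` rounds down to a whole read.

```python
def reads_to_keep(total_reads, method="random", max_entries=1000, percent=10):
    """Return how many reads would be kept under the assumptions stated above."""
    if method == "random":
        # Fixed count, capped by the number of reads actually present (assumption).
        return min(max_entries, total_reads)
    if method == "random_pct":
        # Percentage of the input, rounded down to a whole read (assumption).
        return int(total_reads * percent / 100)
    raise ValueError(f"unknown method: {method}")


# With the documented defaults, a 250,000-read sample keeps 1000 reads
# under "random" and 25,000 reads under "random_pct".
print(reads_to_keep(250_000))
print(reads_to_keep(250_000, method="random_pct"))
```

In practice, `random` gives outputs of comparable size across samples, while `random_pct` preserves the relative depth between samples.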

Examples:

sequana_downsampling --input-directory DATAPATH \
    --downsampling-method random --downsampling-max-entries 100

sequana_downsampling --input-directory DATAPATH \
    --downsampling-method random_pct --downsampling-percent 10 \
    --downsampling-input-format fasta --input-pattern "*.fasta"

Run on a SLURM cluster:

cd downsampling
sbatch downsampling.sh

Or drive Snakemake directly:

snakemake -s downsampling.rules --cores 4 --stats stats.txt

Requirements

The tools listed below must be available. They can be installed via conda/bioconda using the repository's environment.yml:

mamba env create -f environment.yml

sequana: FastQ/FastA selection (Python API)
pigz: parallel gzip compression of outputs
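
Before launching a run, it can help to confirm that both requirements are visible from the current environment. The following is a small sketch using only the Python standard library. The helper name `missing_requirements` is hypothetical and not part of Sequana.

```python
import importlib.util
import shutil


def missing_requirements(executables=("pigz",), modules=("sequana",)):
    """Return the names of executables and Python modules that cannot be found."""
    # shutil.which searches PATH, as the shell would.
    missing = [exe for exe in executables if shutil.which(exe) is None]
    # find_spec returns None when a top-level module is not installed.
    missing += [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing


if __name__ == "__main__":
    problems = missing_requirements()
    if problems:
        print("Missing: " + ", ".join(problems))
    else:
        print("All requirements found")
```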

Pipeline overview

The pipeline randomly selects reads from the input files (single or paired). If the inputs are paired, the one-to-one mapping between R1 and R2 is preserved. FastQ inputs can be gzipped; outputs are gzipped with pigz. FastA inputs and outputs are uncompressed.
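
To illustrate how the R1/R2 mapping can be preserved, the sketch below draws the record positions once and applies the same positions to both files. It is a toy, in-memory illustration written for this README, not the Sequana implementation. The function names `parse_fastq` and `downsample_paired` are hypothetical, and it assumes four-line FastQ records with mates stored in the same order in both files.

```python
import random


def parse_fastq(lines):
    """Group FastQ lines into (header, sequence, separator, quality) records."""
    lines = [line.rstrip("\n") for line in lines]
    if len(lines) % 4:
        raise ValueError("a FastQ file must contain a multiple of 4 lines")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def downsample_paired(r1, r2, max_entries, seed=None):
    """Keep the same randomly chosen record positions in R1 and R2."""
    if len(r1) != len(r2):
        raise ValueError("R1 and R2 must contain the same number of reads")
    rng = random.Random(seed)
    count = min(max_entries, len(r1))
    # Draw positions once, then apply them to both files so mates stay together.
    keep = sorted(rng.sample(range(len(r1)), count))
    return [r1[i] for i in keep], [r2[i] for i in keep]


# Toy data: five read pairs named read0..read4.
r1 = parse_fastq(line for i in range(5) for line in (f"@read{i}/1", "ACGT", "+", "IIII"))
r2 = parse_fastq(line for i in range(5) for line in (f"@read{i}/2", "TGCA", "+", "IIII"))
sub1, sub2 = downsample_paired(r1, r2, max_entries=2, seed=0)
for rec1, rec2 in zip(sub1, sub2):
    print(rec1[0], rec2[0])  # each printed pair shares the same read name
```

Sampling positions rather than sampling each file independently is what keeps every retained R1 read next to its R2 mate.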

Configuration

The latest documented configuration file is available in the repository. Key sections:

downsampling: method (random / random_pct), max_entries, percent, threads, and input_format (fastq / fasta)
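
As a rough illustration of what the downsampling section holds, the sketch below models it as a Python dictionary and checks the values. This is an assumption-laden example: the default values are copied from the command-line defaults in the Usage section, the accepted values follow the list above, and the function name `check_downsampling_section` is hypothetical. The real configuration file may use a different layout or stricter rules.

```python
# Defaults taken from the command-line options in the Usage section (assumption:
# the configuration file uses the same defaults).
DEFAULTS = {
    "method": "random",
    "max_entries": 1000,
    "percent": 10,
    "threads": 4,
    "input_format": "fastq",
}


def check_downsampling_section(section):
    """Merge a user-supplied section with the defaults and validate the result."""
    unknown = set(section) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    cfg = {**DEFAULTS, **section}
    if cfg["method"] not in ("random", "random_pct"):
        raise ValueError("method must be 'random' or 'random_pct'")
    if cfg["input_format"] not in ("fastq", "fasta"):
        raise ValueError("input_format must be 'fastq' or 'fasta'")
    if cfg["max_entries"] < 1:
        raise ValueError("max_entries must be at least 1")
    if not 0 < cfg["percent"] <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    if cfg["threads"] < 1:
        raise ValueError("threads must be at least 1")
    return cfg


# Example: keep 25% of the reads and leave everything else at its default.
print(check_downsampling_section({"method": "random_pct", "percent": 25}))
```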

Changelog

Version	Description
0.10.0	Migrate to Poetry / pyproject.toml packaging; simplify __init__.py using importlib.metadata; rewrite CLI with rich_click (replaces argparse); update CI to use setup-micromamba with generate-run-shell; add `localrules: pipeline`; add `tools.txt` and `environment.yml`; refresh README badges and usage examples
0.9.0	Maintenance release
0.8.5	Cope with R1/R2 paired data properly. Improved make file
0.8.4	Add missing MANIFEST to include missing requirements.txt
0.8.3	Comply with new API from sequana_pipetools 0.2.4
0.8.2	Add a --run option to execute the pipeline directly
0.8.1	Fix input and N in the random selection
0.8.0	First release.

Contribute & Code of Conduct

To contribute to this project, please take a look at the Contributing Guidelines first. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
