GitHub - sequana/chipseq: ChIP-seq pipeline

This is the chipseq pipeline from the Sequana project.

Overview:	ChIP-seq pipeline from raw reads to peaks, IDR statistics, and functional annotation
Input:	Paired or single-end FastQ files and a CSV experimental design file
Output:	HTML summary report, narrow/broad peak files, IDR statistics, bigwig tracks, annotation tables, and IGV session file
Status:	Production
Citation:	Cokelaer et al. (2017), 'Sequana': a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, https://doi.org/10.21105/joss.00352

Installation

pip install sequana_chipseq --upgrade

You will also need the third-party tools listed under Requirements below.

Quick Start

1. Prepare a design file design.csv:

type,condition,replicat,sample_name
IP,EXP1,1,IP_EXP1_rep1
IP,EXP1,2,IP_EXP1_rep2
Input,EXP1,1,Input_EXP1

type must be IP (immunoprecipitated) or Input (control).
sample_name must match the prefix of the corresponding FastQ file (e.g. IP_EXP1_rep1 matches IP_EXP1_rep1_R1_.fastq.gz).
At least two IP replicates per condition are required for IDR analysis.
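
The rules above can be checked before launching the pipeline. The following is a minimal sketch, not part of sequana_chipseq: the function name check_design is hypothetical, and it only tests the two constraints stated here (allowed type values and at least two IP replicates per condition).

```python
import csv
import io
from collections import Counter

ALLOWED_TYPES = {"IP", "Input"}


def check_design(text):
    """Return a list of problems found in a design CSV given as a string."""
    problems = []
    ip_replicates = Counter()
    for row in csv.DictReader(io.StringIO(text)):
        if row["type"] not in ALLOWED_TYPES:
            problems.append(f"{row['sample_name']}: unknown type {row['type']!r}")
        elif row["type"] == "IP":
            # Count IP samples per condition; Input samples are not counted.
            ip_replicates[row["condition"]] += 1
    for condition, count in sorted(ip_replicates.items()):
        if count < 2:
            problems.append(
                f"{condition}: only {count} IP replicate(s), IDR needs at least 2"
            )
    return problems


design = """type,condition,replicat,sample_name
IP,EXP1,1,IP_EXP1_rep1
IP,EXP1,2,IP_EXP1_rep2
Input,EXP1,1,Input_EXP1
"""
print(check_design(design))  # prints []
```

An empty list means the design passes both checks; it does not verify that the FastQ files exist.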

2. Prepare a genome directory named after the genome, containing:

<name>.fa — reference genome FASTA
<name>.gff or <name>.gff3 — gene annotation

Example:

ecoli_MG1655/
├── ecoli_MG1655.fa
└── ecoli_MG1655.gff
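
To confirm that a directory follows this naming convention, a small check such as the one below may help. It is an illustrative sketch only: find_genome_files is a hypothetical helper, not a sequana function, and it assumes the file prefix must equal the directory name as described above.

```python
from pathlib import Path
import tempfile


def find_genome_files(genome_dir):
    """Return (fasta, annotation) paths for a genome directory, or raise."""
    genome_dir = Path(genome_dir)
    name = genome_dir.name
    fasta = genome_dir / f"{name}.fa"
    if not fasta.is_file():
        raise FileNotFoundError(f"missing {fasta}")
    # Accept either annotation extension, preferring .gff when both exist.
    for ext in (".gff", ".gff3"):
        annotation = genome_dir / f"{name}{ext}"
        if annotation.is_file():
            return fasta, annotation
    raise FileNotFoundError(f"missing {name}.gff or {name}.gff3 in {genome_dir}")


# Demonstration on a throwaway directory mirroring the example layout.
with tempfile.TemporaryDirectory() as tmp:
    genome = Path(tmp) / "ecoli_MG1655"
    genome.mkdir()
    (genome / "ecoli_MG1655.fa").write_text(">chr\nACGT\n")
    (genome / "ecoli_MG1655.gff").write_text("##gff-version 3\n")
    fasta, annotation = find_genome_files(genome)
    print(fasta.name, annotation.name)  # prints: ecoli_MG1655.fa ecoli_MG1655.gff
```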

3. Set up the pipeline:

sequana_chipseq \
    --input-directory DATAPATH \
    --genome-directory /path/to/ecoli_MG1655 \
    --design-file design.csv

4. Run the pipeline:

cd chipseq
bash chipseq.sh

Usage

sequana_chipseq --help

Key pipeline-specific options:

--genome-directory: Path to the genome directory (must contain <name>.fa and <name>.gff).
--design-file: CSV experimental design file (see Quick Start above).
--aligner-choice: Aligner to use. Currently only bowtie2 is supported.
--blacklist-file: BED3 file of genomic regions to exclude from analysis (tab-separated: chromosome, start, end).
--genome-size: Effective genome size for macs3 peak calling. Automatically computed from the FASTA file if not provided; override with a plain integer.
--do-fingerprints: Enable plotFingerprint QC to assess ChIP enrichment quality.
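
If you prefer to pass --genome-size explicitly, the value can be estimated from the FASTA file. How the pipeline computes its default is not described here, so the sketch below uses one common approximation as an assumption: the number of non-N bases. The function name effective_genome_size is hypothetical.

```python
import io


def effective_genome_size(fasta_handle):
    """Count A/C/G/T bases (case-insensitive) in a FASTA stream.

    Header lines and ambiguous bases such as N are ignored.
    """
    total = 0
    for line in fasta_handle:
        if line.startswith(">"):
            continue
        seq = line.strip().upper()
        total += sum(seq.count(base) for base in "ACGT")
    return total


fasta = io.StringIO(">chr1\nACGTNNNN\nacgt\n>plasmid\nGGCC\n")
print(effective_genome_size(fasta))  # prints 12
```

For a real genome, open the FASTA file with open(path) and pass the resulting plain integer to --genome-size.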

Run on a SLURM cluster:

cd chipseq
sbatch chipseq.sh

Or drive Snakemake directly:

snakemake -s chipseq.rules --cores 4 --stats stats.txt

Usage with Apptainer

Run every tool inside pre-built containers — no local tool installation needed:

sequana_chipseq \
    --input-directory DATAPATH \
    --genome-directory /path/to/genome \
    --design-file design.csv \
    --apptainer-prefix ~/.sequana/apptainers

Then run as usual:

cd chipseq
bash chipseq.sh

Requirements

The tools listed below must be available. Install them via conda/bioconda using the provided environment file:

mamba env create -f environment.yml

bowtie2 — read alignment
fastp — adapter trimming and quality filtering
fastqc — per-read quality control
samtools — BAM sorting, indexing, and flagstat
bedtools — bedGraph generation from BAM files (genomeCoverageBed)
ucsc-bedgraphtobigwig — bedGraph to bigWig conversion (bedGraphToBigWig)
deeptools — fingerprint QC (plotFingerprint) and multi-sample bigwig summary (multiBigwigSummary)
macs3 — narrow and broad peak calling
homer — peak annotation (annotatePeaks.pl)
idr — Irreproducible Discovery Rate between replicates (installed from the sequana/idr fork via pip; the upstream bioconda package is Python 3.10-only)
multiqc — aggregated QC report
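
A quick way to see which of these tools are reachable is to look them up on the PATH. The sketch below is a convenience check, not part of the pipeline. The executable names are taken from the list above and may differ on your installation; missing_tools is a hypothetical helper.

```python
import shutil

# Executable names taken from the list above; adjust to your installation.
REQUIRED_TOOLS = [
    "bowtie2", "fastp", "fastqc", "samtools", "bedtools",
    "bedGraphToBigWig", "plotFingerprint", "macs3",
    "annotatePeaks.pl", "idr", "multiqc",
]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All required tools were found on PATH.")
```

When running with Apptainer containers this check is not needed, since the tools are provided by the containers.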

Pipeline overview

Trimming — fastp removes low-quality reads and adapters.
QC — FastQC on raw and cleaned reads.
Alignment — bowtie2 maps reads to the reference genome.
[Optional] Mark duplicates — Picard marks PCR duplicates.
[Optional] Blacklist removal — bedtools removes artefact-prone regions.
BigWig — per-sample coverage tracks for genome browsers (bedtools genomeCoverageBed → UCSC bedGraphToBigWig); an IGV session file (igv.xml) is generated to preload all tracks.
[Optional] Fingerprints — plotFingerprint QC to assess ChIP enrichment.
Phantom peak — strand cross-correlation analysis (NSC, RSC, Qtag scores).
Peak calling — macs3 detects narrow and broad peaks for each IP vs Input pair.
FRiP — Fraction of Reads in Peaks per sample and comparison.
IDR — Irreproducible Discovery Rate on true replicates, pseudo-replicates, and self-pseudo-replicates.
Annotation — homer annotates peaks relative to genomic features.
MultiQC — aggregated QC across all samples.
HTML report — summary with phantom peaks, FRiP plots, IDR tables, and annotation plots.
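
As an illustration of the FRiP step, the metric is the number of reads falling inside called peaks divided by the total number of reads. The sketch below is a simplified model and not the pipeline's implementation: it assumes each read is reduced to a single position and that peaks are half-open intervals, whereas the pipeline works on BAM and peak files and may count overlaps differently. The function name frip is hypothetical.

```python
def frip(read_positions, peaks):
    """Fraction of reads whose position falls inside any peak.

    read_positions: iterable of (chrom, position) pairs, one per read.
    peaks: iterable of (chrom, start, end) with half-open [start, end) intervals.
    """
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    total = in_peaks = 0
    for chrom, pos in read_positions:
        total += 1
        if any(start <= pos < end for start, end in by_chrom.get(chrom, [])):
            in_peaks += 1
    # An empty read set yields 0.0 rather than a division error.
    return in_peaks / total if total else 0.0


peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
reads = [("chr1", 150), ("chr1", 199), ("chr1", 200), ("chr1", 550), ("chr2", 150)]
print(frip(reads, peaks))  # prints 0.6
```

Here three of the five reads fall inside a peak; the read at position 200 is excluded because the interval end is exclusive.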

Configuration

See the latest documented configuration file in the repository. Key sections:

general — aligner choice and genome directory path
fastp — trimming options (length, quality, adapters)
fastqc — FastQC options and threads
bowtie2_mapping / bowtie2_index — mapping options, threads, memory
macs3 — peak calling parameters (genome size, bandwidth, q-value, broad cutoff)
idr — IDR thresholds, rank metric, number of pseudo-replicates
fingerprints — enable/disable and number of bins
mark_duplicates — enable/disable PCR duplicate marking
remove_blacklist — enable/disable and path to BED blacklist
trimming — enable/disable read trimming and choice of trimming tool
phantom — use SPP (use_spp: true) instead of the built-in sequana phantom-peak detection
igv — enable/disable generation of the IGV session file (igv.xml)
multiqc — MultiQC options

Changelog

Version	Description
0.13.0	Migrate to standard importlib.metadata version pattern
	Remove click_completion dependency
	Add snakemake, pulp dependencies to pyproject.toml
	Add dot2svg to localrules
	Fix CI: use generate-run-shell and micromamba-shell
	Update environment.yml: add graphviz, pulp, sequana_pipetools
	Fix README badges and apptainer usage
0.12.0	Fix `macs3`, `self_pseudo_replicate_peaks`, and `pseudo_replicate_peaks` rules: macs3 exits non-zero on sparse CI data; added `|| true` + conditional `touch` so the pipeline continues and downstream rules handle empty peak files gracefully
	Add `container: sequana_tools` to all macs3 rules so peak calling runs consistently inside the apptainer container
	Replace bioconda `idr` with pip install from `sequana/idr` fork; fixes CI failures on Python 3.11/3.12 (upstream package is Python 3.10-only due to Cython 3.x incompatibility)
	Fix `plot_FRiP`: was iterating over all comparisons inside each rule invocation causing `FileNotFoundError` in parallel runs; now processes only its own wildcard
	Fix IDR rules (`idr_NT`, `self_pseudo_replicate_idr`, `pseudo_replicate_idr`): IDR exits non-zero on sparse data; added `|| true` + conditional `mv` so the pipeline continues and downstream Python rules handle empty results gracefully peaks and Homer returns an empty DataFrame
	Fix `fastp` rule: use `input.fastq` / `output.r1` / `output.r2` to match the sequana-wrappers fastp shell interface; split into paired/single-end branches
	Add `log:` directives and stderr redirection to rules that were missing them: `phantom_align`, `chrom_sizes`, `fingerprints`, `bam_to_bed`, `bed_to_bigwig`, `pseudo_replicate_idr`
	Update `sequana_tools` container to `26.1.14`
	Update CI: Python 3.10/3.11/3.12; `actions/checkout@v4`
0.11.0	Switch to click and new sequana_pipetools
0.10.0	Fix design in case of samples that start with the same prefix
	Include final IDR plots and tables
	Fix containers and wrappers in the config file
	Better HTML report
0.9.1	Fix requirements and setup.py (remove wrong idr package)
0.9.0	Use latest wrappers and apptainer (for rulegraph)
0.8.0	First release.

Contribute & Code of Conduct

To contribute to this project, please take a look at the Contributing Guidelines first. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
