IntGenomicsLab/lrsomatic is a robust bioinformatics pipeline designed for processing and analyzing somatic DNA sequencing data for long-read sequencing technologies from Oxford Nanopore and PacBio. It supports both canonical base DNA and modified base calling, including specialized applications such as Fiber-seq.
This end-to-end pipeline handles the entire workflow, from raw read processing and alignment to comprehensive somatic variant calling, covering single nucleotide variants, indels, structural variants, copy number alterations, and modified bases.
It can be run in either matched tumour-normal or tumour-only mode, offering flexibility depending on the user's study design.
Developed using Nextflow DSL2, it offers high portability and scalability across diverse computing environments. By leveraging Docker or Singularity containers, installation is streamlined and results are highly reproducible. Each process runs in an isolated container, simplifying dependency management and updates. Where applicable, pipeline components are sourced from nf-core/modules, promoting reuse, interoperability, and consistency within the broader Nextflow and nf-core ecosystems.
For more information on how to run the pipeline, you can also go here.
1) Pre-processing:
a. Raw read QC (cramino)
b. Alignment to the reference genome (minimap2)
c. Post alignment QC (cramino, samtools idxstats, samtools flagstats, samtools stats)
d. Modified base calling, where applicable (Modkit, Fibertools)
2i) Matched mode: small variant calling:
a. Calling Germline SNPs (Clair3)
b. Phasing and Haplotagging the SNPs in the normal and tumour BAM (LongPhase)
c. Calling somatic SNVs (ClairS)
2ii) Tumour only mode: small variant calling:
a. Calling Germline SNPs and somatic SNVs (ClairS-TO)
b. Phasing and Haplotagging germline SNPs in tumour BAM (LongPhase)
3) Large variant calling:
a. Somatic structural variant calling (Severus)
b. Copy number alteration calling (long-read version of ASCAT)
4) Annotation:
a. Small variant annotation (VEP)
b. Structural variant annotation (VEP)
Note
If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.
First prepare a samplesheet with your input data that looks as follows:
sample,bam_tumor,bam_normal,platform,sex,fiber
sample1,tumour.bam,normal.bam,ont,female,n
sample2,tumour.bam,,ont,female,y
sample3,tumour.bam,,pb,male,n
sample4,tumour.bam,normal.bam,pb,male,y

Each row represents a sample. The bam files should always be unaligned bam files. All fields except for bam_normal are required. If bam_normal is empty, the pipeline will run in tumour only mode. platform should be either ont or pb for Oxford Nanopore Sequencing or PacBio sequencing, respectively. sex refers to the biological sex of the sample and should be either female or male. Finally, fiber specifies whether your sample is Fiber-seq data or not and should have either y for Yes or n for No.
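The column rules above can be checked before launching a run. The following is a minimal sketch in Python, not part of the pipeline itself: the function name `check_samplesheet` and the derived `mode` field are hypothetical, and the pipeline's own validation may be stricter or differ in detail.

```python
import csv
import io

COLUMNS = ("sample", "bam_tumor", "bam_normal", "platform", "sex", "fiber")
REQUIRED = ("sample", "bam_tumor", "platform", "sex", "fiber")  # bam_normal is optional
ALLOWED = {
    "platform": {"ont", "pb"},
    "sex": {"female", "male"},
    "fiber": {"y", "n"},
}


def check_samplesheet(handle):
    """Return (rows, errors) for a samplesheet opened as a text handle."""
    rows, errors = [], []
    reader = csv.DictReader(handle)
    for line_no, raw in enumerate(reader, start=2):  # line 1 is the header
        row = {key: (raw.get(key) or "").strip() for key in COLUMNS}
        for field in REQUIRED:
            if not row[field]:
                errors.append(f"line {line_no}: missing {field}")
        for field, allowed in ALLOWED.items():
            value = row[field]
            if value and value not in allowed:
                errors.append(
                    f"line {line_no}: {field}={value!r} not in {sorted(allowed)}"
                )
        # An empty bam_normal means the sample runs in tumour-only mode.
        row["mode"] = "matched" if row["bam_normal"] else "tumour-only"
        rows.append(row)
    return rows, errors


# Illustrative input; the third sample uses an invalid platform value on purpose.
example = """sample,bam_tumor,bam_normal,platform,sex,fiber
sample1,tumour.bam,normal.bam,ont,female,n
sample2,tumour.bam,,ont,female,y
sample3,tumour.bam,,pacbio,male,n
"""
rows, errors = check_samplesheet(io.StringIO(example))
for row in rows:
    print(row["sample"], row["mode"])
for error in errors:
    print(error)
```

For a real samplesheet, pass an open file instead of the in-memory string, for example `check_samplesheet(open("samplesheet.csv", newline=""))`.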
Now, you can run the pipeline using:
nextflow run IntGenomicsLab/lrsomatic \
-profile <docker/singularity/.../institute> \
--input samplesheet.csv \
--outdir <OUTDIR>

More detail is given in our usage documentation.
Warning
Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
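As an illustration of the -params-file route, the sketch below writes the two parameters shown earlier (`input` and `outdir`) to a JSON file and prints the matching command. Nextflow accepts JSON or YAML for -params-file; the file name `params.json`, the `results` output directory, and the `docker` profile are placeholders chosen for this example, not pipeline defaults.

```python
import json
import shlex

# Keys mirror the --input and --outdir options; values are placeholders.
params = {
    "input": "samplesheet.csv",
    "outdir": "results",
}

params_path = "params.json"  # hypothetical file name
with open(params_path, "w") as handle:
    json.dump(params, handle, indent=2)

# Build the command as a list so each argument is quoted safely when printed.
command = [
    "nextflow", "run", "IntGenomicsLab/lrsomatic",
    "-profile", "docker",
    "-params-file", params_path,
]
print(shlex.join(command))
```

Keeping parameters in a file like this makes a run easier to repeat and keeps them separate from any custom config passed with -c.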
IntGenomicsLab/lrsomatic was originally written by Luuk Harbers, Robert Forsyth, Alexandra Pančíková, Marios Eftychiou, Ruben Cools, Laurens Lambrechts, and Jonas Demeulemeester.
This pipeline produces a series of different output files. The main output is an aligned and phased tumour bam file. This bam file can be used by any typical downstream tool that uses bam files as input. Furthermore, we have sample-specific QC outputs from cramino (fastq), cramino (bam), mosdepth, samtools (stats/flagstat/idxstats), and optionally fibertools. Finally, we have a MultiQC report that combines the output from mosdepth and samtools into one HTML report.
Besides QC and the aligned and phased bam file, we have output from (structural) variant and copy number callers, of which some are optional. The output from these variant callers can be found in their respective folders. For small and structural variant callers (clairS, clairS-TO, and severus) these will contain, among others, vcf files with called variants. For ascat these contain files with final copy number information and plots of the copy number profiles.
Example output directory structure:
├── Sample 1
│ ├── ascat
│ ├── bamfiles
│ ├── qc
│ │ ├── tumor
│ │ │ ├── cramino_aln
│ │ │ ├── cramino_ubam
│ │ │ ├── fibertoolsrs
│ │ │ ├── mosdepth
│ │ │ ├── samtools
│ ├── variants
│ │ ├── clairS-TO
│ │ ├── severus
│ ├── vep
│ │ ├── germline
│ │ ├── somatic
│ │ ├── SVs
│
├── Sample 2
│ ├── ascat
│ ├── bamfiles
│ ├── qc
│ │ ├── tumor
│ │ │ ├── cramino_aln
│ │ │ ├── cramino_ubam
│ │ │ ├── fibertoolsrs
│ │ │ ├── mosdepth
│ │ │ ├── samtools
│ │ ├── normal
│ │ │ ├── cramino_aln
│ │ │ ├── cramino_ubam
│ │ │ ├── fibertoolsrs
│ │ │ ├── mosdepth
│ │ │ ├── samtools
│ ├── variants
│ │ ├── clair3
│ │ ├── clairS
│ │ ├── severus
│ ├── vep
│ │ ├── germline
│ │ ├── somatic
│ │ ├── SVs
├── pipeline_info
More detail is given in our output documentation.
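To get a quick overview of a finished run, the layout above can be inspected programmatically. This is a rough sketch that assumes the top-level folder names from the example tree (ascat, bamfiles, qc, variants, vep); the function name `summarise_outdir` is hypothetical. Since some callers are optional, a missing folder is not necessarily an error.

```python
import tempfile
from pathlib import Path

# Top-level folders per sample, taken from the example tree above.
EXPECTED = ("ascat", "bamfiles", "qc", "variants", "vep")


def summarise_outdir(outdir):
    """Map each sample directory to the expected subdirectories it lacks."""
    missing = {}
    for sample_dir in sorted(Path(outdir).iterdir()):
        # pipeline_info holds run metadata, not sample results.
        if not sample_dir.is_dir() or sample_dir.name == "pipeline_info":
            continue
        missing[sample_dir.name] = [
            name for name in EXPECTED if not (sample_dir / name).is_dir()
        ]
    return missing


# Demonstration on a throwaway directory that mimics a run without VEP output.
with tempfile.TemporaryDirectory() as tmp:
    for sub in ("ascat", "bamfiles", "qc", "variants"):
        (Path(tmp) / "sample1" / sub).mkdir(parents=True)
    (Path(tmp) / "pipeline_info").mkdir()
    report = summarise_outdir(tmp)
print(report)
```

On a real run, call `summarise_outdir` with the path given to --outdir.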
If you would like to contribute to this pipeline, please see the contributing guidelines.
If you use IntGenomicsLab/lrsomatic for your analysis, please cite it using the following:
LRSomatic: a highly scalable and robust pipeline for somatic variant calling in long-read sequencing data
Robert A. Forsyth*, Luuk Harbers*, Amber Verhasselt, Ana-Lucía Rocha Iraizós, Sidi Yang, Joris Vande Velde, Christopher Davies, Nischalan Pillay, Laurens Lambrechts, Jonas Demeulemeester
bioRxiv 2026.02.26.707772; doi: https://doi.org/10.64898/2026.02.26.707772
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
