Caller-agnostic adjudication of long-read RNA-seq novel isoforms.
PanIsoGuard is a post-processing / decision layer that ingests the novel isoform
calls produced by long-read isoform callers (FLAIR, IsoQuant, Bambu, ESPRESSO,
TALON, …) and/or SQANTI3 output, together
with a BAM and optional evidence inputs (short-read SJ.tab, personalized
haplotype FASTA), and re-classifies each novel call into a confidence class with a
machine-readable mechanistic attribution and a provenance / circularity flag.
PanIsoGuard does not recompute SQANTI3 QC descriptors. It consumes them as priors
and integrates them with independent path-level evidence from caller consensus,
short-read junction support, long-read mapping, personalized haplotypes, and
pangenome-supported junctions — it does not re-derive TSS/TTS, ORF/NMD, polyA, or
splice motifs, and is not a SQANTI3-style QC filter (see
docs/relationship_to_sqanti3.md). Its contribution is
the adjudication logic — how those orthogonal evidence axes are integrated into a
transparent, auditable verdict (thresholds live in a runtime
config/rules.default.toml, and each verdict carries a
rule_trace; see docs/decision_engine.md).
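To make the idea of runtime thresholds plus a rule_trace concrete, here is a minimal Python sketch. It is not the shipped engine: the threshold keys, the two rules, and their mapping to classes are hypothetical, chosen only to show how an ordered rule list can record what fired.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds for illustration only; the shipped values and key
# names live in config/rules.default.toml and may differ.
RULES = {"min_short_read_support": 3, "max_low_mapq_fraction": 0.5}


@dataclass
class Verdict:
    label: str = "AMBIGUOUS"
    rule_trace: list = field(default_factory=list)


def adjudicate(evidence: dict, rules: dict = RULES) -> Verdict:
    """Apply rules in a fixed order and record every rule that fires."""
    verdict = Verdict()
    # Rule 1 (hypothetical): too many low-MAPQ spanning reads -> artifact.
    if evidence["low_mapq_fraction"] > rules["max_low_mapq_fraction"]:
        verdict.rule_trace.append("low_mapq_fraction > max_low_mapq_fraction")
        verdict.label = "ARTIFACT"
        return verdict
    # Rule 2 (hypothetical): enough short-read junction support -> confirmed.
    if evidence["short_read_support"] >= rules["min_short_read_support"]:
        verdict.rule_trace.append("short_read_support >= min_short_read_support")
        verdict.label = "HIGH_CONF_NOVEL"
    return verdict
```

Because the thresholds are passed in as data, a changed threshold changes the verdict without touching the rule code, and the trace names the comparison that decided the call.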
PanIsoGuard is an independent project, not affiliated with or endorsed by the SQANTI3 authors. It interoperates with SQANTI3 by reading its output file only (no SQANTI3 code is bundled or linked; SQANTI3's GPL-3.0 does not reach PanIsoGuard's MIT code). If you use SQANTI3 in your pipeline, please cite it — see docs/relationship_to_sqanti3.md.
Equivalently, each candidate isoform can be viewed as a path through a gene-local
splice graph — novel junctions are edges absent from the known (reference) graph —
and the same deterministic verdict can be read in those terms. This is a framing
of the existing engine (no graph model or new algorithm); each verdict additionally
carries a machine-readable graph_trace (novel-edge count, edge support, graph
distance). See docs/method_graph.md.
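The splice-graph reading can be sketched in a few lines of Python. This is an illustration under assumptions, not the project's code: junctions are modelled as (chrom, intron_start, intron_end) tuples, and the dictionary keys are hypothetical names rather than the actual graph_trace schema. Graph distance is left out because its definition lives in docs/method_graph.md.

```python
def graph_trace(isoform_junctions, known_junctions, edge_support):
    """Summarise an isoform path against the known (reference) splice graph.

    isoform_junctions: ordered list of (chrom, start, end) tuples (the path).
    known_junctions:   set of the same tuples taken from the annotation.
    edge_support:      dict mapping a junction to its supporting read count.
    """
    # A novel edge is a junction on the path that the reference graph lacks.
    novel = [j for j in isoform_junctions if j not in known_junctions]
    return {
        "novel_edge_count": len(novel),
        "edge_support": {j: edge_support.get(j, 0) for j in novel},
    }
```

An isoform whose path uses only known edges gets a novel-edge count of zero, which is the graph-side view of a known isoform.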
- When to use PanIsoGuard · At a glance
- Subcommands · Usage · Quick example
- Confidence classes · Evidence tiers
- Build · Install (conda) · Validation · Documentation
Consumes caller + SQANTI3 output (does not replace them); adds an orthogonal,
auditable verdict layer, each call carrying a machine-readable rule_trace. The four
output classes shown are a simplified grouping — the full set is listed
below. Vector source:
docs/figures/overview.svg.
Status: alpha. Four evidence axes (SQANTI priors, short-read junctions, BAM read-level mapping, variant/reference-bias) plus a file-based pangenome reference-bias tier are implemented and tested, along with the adjudicate / benchmark / ablate / combine subcommands.
Validation. The adjudication logic is validated against ground truth (SQANTI-SIM AUPRC 0.970 vs a 0.831 baseline, and well-calibrated: ECE/Brier ≤ 0.013 on the run truth sets; see docs/validation.md).
Thresholds. A SQANTI-SIM (v49 chr22) threshold sweep finds AUPRC robust (0.969–0.970) across the grid with the shipped default within 1e-4 of grid-best — the conservative defaults are near-optimal there (benchmark/results/sqanti_sim/sweep.tsv), though not yet swept on additional datasets.
Pangenome. The file-based pangenome reference-bias rescue is validated on the real HPRC v1.1 chr22 graph (GATE-1): 0 false rescues on real FLAIR novel junctions, correct rescue on real population deletions, firewall holding under circular-risk provenance (benchmark/pangenome). Validated at chr22 scale; the in-process GBZ traversal remains future work. Treat the confidence classes as calibrated ordinal evidence integration, not a tuned probability.
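For readers unfamiliar with the two calibration metrics quoted above, the sketch below computes them in plain Python. It assumes each call already has a score in [0, 1] and a 0/1 truth label; how PanIsoGuard maps its ordinal classes to scores is described in docs/validation.md and is not reproduced here. The equal-width binning with 10 bins is a common default, not necessarily the one used in the benchmark.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted score and 0/1 truth."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted mean of |mean score - observed accuracy| over score bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Scores of exactly 1.0 go into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_score = sum(p for p, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += len(members) / total * abs(mean_score - accuracy)
    return ece
```

Both metrics are 0 for a perfectly calibrated, perfectly confident predictor, so lower is better.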
Use it when you have novel long-read isoform calls and need to decide which to trust:
- You ran more than one isoform caller (FLAIR / IsoQuant / Bambu / ESPRESSO / TALON, …) and have several disagreeing novel-isoform sets. PanIsoGuard integrates them caller-agnostically by splice chain and stratifies each novel call by cross-caller agreement — single-caller novels are mostly artifacts, multi-caller agreement is a strong, matcher-robust confidence signal (benchmark/multicaller).
- You want a transparent confidence class per novel call, not a flat GTF: each verdict is one of 7 classes with a machine-readable rule_trace (and an optional PDF report), so you can filter HIGH/MEDIUM_CONF_NOVEL and audit the rest instead of eyeballing reads.
- You have a personalized haplotype or a pangenome and want to catch reference-bias false novelty: a junction that looks novel only because the sample differs from the linear reference. The rescue is a high-specificity guardrail (it never over-promotes; a circularity firewall blocks rescues that would rest on the sample's own RNA), most useful for non-reference / personalized-genome samples (benchmark/hg002).
It is not a caller or a QC re-implementation. It sits above the callers and consumes
SQANTI3 QC as priors — it does not re-derive TSS/TTS, ORF/NMD, polyA, or splice motifs
(docs/relationship_to_sqanti3.md), and its combine step
is a clean re-implementation of gffcompare -i, not a new merge
(docs/relationship_to_merge_tools.md).
| Command | Purpose |
|---|---|
| adjudicate | classify novel isoforms → confidence class + mechanistic attribution + provenance (3 output files) |
| benchmark | non-redundancy vs SQANTI3 on novel isoforms (2×2, Jaccard, McNemar) |
| ablate | per-evidence-axis class-change (which axis drives which calls) |
| combine | integrate multiple callers' isoforms by intron-chain fingerprint → caller-support matrix |
| version | version, linked htslib, compiled-in capabilities |
panisoguard <command> --help for options. Input formats: docs/input_formats.md.
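The statistics named for the benchmark subcommand (2×2 table, Jaccard, McNemar) can be illustrated with a short Python sketch. It assumes two sets of isoform IDs that each filter accepts; the function name and the returned keys are hypothetical, and the McNemar statistic shown is the uncorrected chi-square form, which may differ from the variant the tool reports.

```python
def compare_filters(pass_a, pass_b, universe):
    """Compare two accept/reject filters over the same set of isoform IDs."""
    both = len(pass_a & pass_b)
    only_a = len(pass_a - pass_b)
    only_b = len(pass_b - pass_a)
    neither = len(universe - pass_a - pass_b)
    union = both + only_a + only_b
    # Jaccard index: overlap of the two accepted sets.
    jaccard = both / union if union else 1.0
    # McNemar chi-square (no continuity correction) on the discordant cells.
    discordant = only_a + only_b
    mcnemar = (only_a - only_b) ** 2 / discordant if discordant else 0.0
    return {
        "table": [[both, only_a], [only_b, neither]],
        "jaccard": jaccard,
        "mcnemar_chi2": mcnemar,
    }
```

A Jaccard index well below 1 together with many discordant calls is what non-redundancy between two filters looks like in this table.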
adjudicate is the main entry point. The only hard requirements are a SQANTI3
classification and an output prefix — every evidence input below is optional and
simply switches on another axis (see Evidence tiers). PanIsoGuard
is caller- and organism-agnostic; the paths below are placeholders for your own
caller output, reference, and reads (any long-read caller, any genome build).
Baseline — SQANTI priors only (no short/long-read evidence; novel calls are flagged or held, never positively confirmed):
panisoguard adjudicate \
--classification classification.txt \
--isoforms-bed isoforms.bed \
--ref-gtf annotation.gtf \
--out-prefix out/sample
Recommended: add short-read junctions (--sj-tab) and the long-read BAM
(--bam), the two axes that let a novel isoform be confirmed or rejected on evidence:
panisoguard adjudicate \
--classification classification.txt \
--isoforms-bed isoforms.bed \
--ref-gtf annotation.gtf \
--sj-tab SJ.out.tab \
--bam aligned.bam \
--reference genome.fa \
--out-prefix out/sample
Reference-bias rescue: add a personalized haplotype FASTA. Provenance gates the
circularity firewall: wgs/external may promote to a rescue verdict, while
rna_derived/unknown are held as AMBIGUOUS:
panisoguard adjudicate \
--classification classification.txt \
--isoforms-bed isoforms.bed \
--ref-gtf annotation.gtf \
--reference genome.fa \
--reference-haplotype haplotype1.fa \
--reference-haplotype haplotype2.fa \
--haplotype-provenance wgs \
--out-prefix out/sample
Pangenome reference-bias rescue (file-based; validated on HPRC v1.1 chr22): supply graph-supported splice
junctions (pre-extracted from a pangenome graph such as HPRC with vg/rpvg). An isoform
whose novel junctions are all realizable on a graph haplotype path is rescued as
reference bias. This is independent evidence only if the junction set comes from
population assemblies, so it is gated by --pangenome-provenance (the same circularity
firewall as the variant axis): population/external promote, while the default
unknown (or sample_derived) is held AMBIGUOUS:
panisoguard adjudicate \
--classification classification.txt \
--isoforms-bed isoforms.bed \
--ref-gtf annotation.gtf \
--pangenome-junctions pangenome_junctions.tsv \
--pangenome-provenance population \
--out-prefix out/sample
Combine several callers first (optional): merge isoforms by intron-chain
fingerprint into a caller-support matrix, then feed the union to adjudicate:
panisoguard combine \
--gtf flair:flair.gtf \
--gtf isoquant:isoquant.gtf \
--gtf bambu:bambu.gtf \
--ref-gtf annotation.gtf \
--out caller_support_matrix.tsv
combine is a clean re-implementation of the established N-way intron-chain comparison (it reproduces gffcompare -i exactly; multi-caller consensus is shared practice, not a PanIsoGuard invention). Its value is feeding caller agreement into the adjudicator as one auditable evidence axis. PanIsoGuard's differentiator is the reference-bias rescue + circularity firewall; see docs/relationship_to_merge_tools.md.
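The intron-chain idea behind combine can be sketched as follows. This is a simplified Python illustration, not the C++ implementation: the tuple layout, the function names, and the choice to skip mono-exonic transcripts are assumptions made for the example.

```python
from collections import defaultdict


def intron_chain(exons):
    """Return the introns (gaps) between consecutive sorted exons."""
    return tuple(
        (exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)
    )


def caller_support(calls):
    """Group transcripts by (chrom, strand, intron chain).

    calls: iterable of (caller, chrom, strand, exons), where exons is a list
    of (start, end) pairs. Returns a dict mapping each fingerprint to the
    set of callers that reported it.
    """
    matrix = defaultdict(set)
    for caller, chrom, strand, exons in calls:
        chain = intron_chain(sorted(exons))
        if chain:  # assumption: mono-exonic transcripts are skipped here
            matrix[(chrom, strand, chain)].add(caller)
    return matrix
```

Two transcripts that differ only in their outer exon ends share a fingerprint, so callers that disagree on transcript boundaries still count as agreeing on the splice chain.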
adjudicate writes three files at <out-prefix>:
| File | Contents |
|---|---|
| <prefix>.adjudicated.tsv | one row per isoform: confidence class, primary mechanism, novel-junction support counts |
| <prefix>.attribution.jsonl | per-isoform rule_trace (every rule that fired, in order) + graph_trace (splice-graph view; see docs/method_graph.md) + bio_flags (SQANTI3 QC descriptors passed through, verdict-neutral; see docs/relationship_to_sqanti3.md) |
| <prefix>.provenance.log | which axes were active + circularity status of the run |
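Downstream filtering of the JSONL output might look like the Python sketch below. Only rule_trace is a field name taken from this README; the "isoform" and "class" keys, the rule names, and the two sample records are hypothetical stand-ins, so check the real schema before relying on them.

```python
import json
from collections import Counter

# Hypothetical records; real files are written to <prefix>.attribution.jsonl.
SAMPLE = """\
{"isoform": "tx1", "class": "HIGH_CONF_NOVEL", "rule_trace": ["r1", "r2"]}
{"isoform": "tx2", "class": "ARTIFACT", "rule_trace": ["r3"]}
"""


def load_attribution(lines):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def keep_classes(records, wanted=("HIGH_CONF_NOVEL", "MEDIUM_CONF_NOVEL")):
    """Keep only records whose class is in the wanted set."""
    return [r for r in records if r["class"] in wanted]


records = load_attribution(SAMPLE.splitlines())
counts = Counter(r["class"] for r in records)  # calls per confidence class
kept = keep_classes(records)                   # calls to carry forward
```

For a real run, pass an open file handle to load_attribution instead of SAMPLE.splitlines().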
# non-redundancy vs the SQANTI3 filter on novel isoforms (2x2, Jaccard, McNemar)
panisoguard benchmark [adjudicate options] --out bench/
# per-axis contribution: which calls change when an axis is removed
panisoguard ablate [adjudicate options] --axes short_read,mapping,variant --out abl/
A tiny text-only dataset with checked-in expected outputs is available under
examples/tiny/. It exercises one known isoform, one
short-read-supported novel isoform, and one unsupported artifact call.
cmake --build build -j
cd examples/tiny
./run.sh
The script writes output/sample.{adjudicated.tsv,attribution.jsonl,provenance.log}
and compares them against examples/tiny/expected/.
HIGH_CONF_KNOWN · HIGH_CONF_NOVEL · MEDIUM_CONF_NOVEL · LOW_CONF_PARTIAL ·
PAN_REF_RESCUED_FALSE_NOVEL · AMBIGUOUS · ARTIFACT — emitted as a
deterministic projection of a 2-axis evidence grid (novelty-support ×
artifact-mechanism). PAN_REF_RESCUED_FALSE_NOVEL is reached when a novel junction
is explained by reference bias — either a personalized haplotype (variant axis,
--reference-haplotype) or a pangenome graph path (pangenome axis,
--pangenome-junctions).
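The provenance gate on the rescue class can be summarised in a small Python sketch. The provenance values are the ones documented for --haplotype-provenance and --pangenome-provenance; the function name, the argument layout, and the "all novel junctions must be explained" condition on the variant axis are assumptions (the README states that condition only for the pangenome axis).

```python
# Provenance values that count as independent evidence, per axis.
PROMOTING = {
    "variant": {"wgs", "external"},
    "pangenome": {"population", "external"},
}


def rescue_verdict(novel_junctions, explained_junctions, axis, provenance):
    """Return the rescue class, AMBIGUOUS, or None when no rescue applies."""
    novel = set(novel_junctions)
    if not novel or not novel <= set(explained_junctions):
        return None  # not fully explained by reference bias: no rescue
    if provenance in PROMOTING[axis]:
        return "PAN_REF_RESCUED_FALSE_NOVEL"
    # Circularity firewall: evidence that may derive from the sample's own
    # RNA (rna_derived, sample_derived) or is of unknown origin is held.
    return "AMBIGUOUS"
```

The same junction evidence therefore yields different classes depending only on where the haplotype or junction set came from.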
| Tier | Input | Required? | Mechanism |
|---|---|---|---|
| 0 | SQANTI3 classification (priors) + STAR SJ.tab | classification required; SJ.tab recommended | short-read junction corroboration |
| 1 | BAM (HiFi/ONT) | recommended | read-level mapping (low-MAPQ / supplementary spanning-read fractions; soft-clip / indel-near also reported) |
| 2 | personalized haplotype FASTA (--reference-haplotype) | optional | variant-created/destroyed splice-site motif |
| 3 | pangenome graph junctions (--pangenome-junctions) | optional | all novel junctions realizable on a graph haplotype path → reference bias (provenance-gated; validated on HPRC v1.1 chr22) |
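As an illustration of the tier-0 short-read axis, the Python sketch below looks up novel junctions in a STAR SJ.out.tab file. The column layout is STAR's documented one (chromosome, 1-based intron start, intron end, strand, motif, annotated flag, unique reads, multi-mapped reads, max overhang); the function names are hypothetical, and this does not claim to match how PanIsoGuard's own reader scores support.

```python
def read_sj_tab(lines):
    """Map (chrom, intron_start, intron_end) to uniquely mapped read count."""
    support = {}
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        key = (fields[0], int(fields[1]), int(fields[2]))
        support[key] = int(fields[6])  # column 7: uniquely mapping reads
    return support


def junction_support(junctions, support):
    """Return the short-read support for each queried junction (0 if absent)."""
    return {j: support.get(j, 0) for j in junctions}
```

Junctions taken from a BED12 or GTF must first be converted to STAR's 1-based intron coordinates before the lookup, otherwise every query misses.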
| Doc | Contents |
|---|---|
| docs/architecture.md | The five layers (CLI → readers → core data model → evidence → decision) and data flow. |
| docs/algorithm.md | The per-isoform adjudication algorithm and the decision projection. |
| docs/function_io.md | Module-by-module input → output contracts and key data types. |
| docs/decision_engine.md | The 2-axis grid, rescue precedence, and the circularity firewall. |
| docs/method_graph.md | The splice-graph framing: isoform = path, novel junction = edge, novelty = graph distance, and the graph_trace. |
| docs/relationship_to_sqanti3.md | What PanIsoGuard consumes from SQANTI3 vs does not recompute; the bio_flags pass-through. |
| docs/relationship_to_merge_tools.md | How combine relates to gffcompare / TAMA / Bambu-NDR, and where PanIsoGuard is actually differentiated. |
| docs/input_formats.md | Every input file format and its options. |
| docs/validation.md | What is verified and the truth-based validation plan. |
| CHANGELOG.md · docs/releasing.md | Changelog, and the release / bioconda runbook. |
PanIsoGuard/
|-- README.md # quick-start, repository map, and architecture map
|-- CMakeLists.txt # C++17/CMake build, install target, test wiring
|-- cmake/FindHTSlib.cmake # htslib discovery for source and conda builds
|-- config/rules.default.toml # default thresholds and rule gates
|-- include/panisoguard/ # typed module interfaces
| |-- types.hpp # Junction, IntronChain, Transcript, Evidence primitives
| |-- adjudicator.hpp # per-isoform evidence collection and decision API
| |-- rules.hpp, verdict.hpp # rule configuration, classes, mechanisms, rule traces
| |-- consensus.hpp, fingerprint.hpp, interval_index.hpp
| |-- gtf.hpp, bed12.hpp, sqanti.hpp, sj_tab.hpp, pangenome.hpp
| `-- bam_features.hpp, variant_motif.hpp, result_writer.hpp
|-- src/
| |-- cli/ # subcommands: adjudicate, combine, benchmark, ablate
| |-- io/ # SQANTI/GTF/BED12/SJ/pangenome readers + result writer
| |-- evidence/ # htslib-backed BAM and FASTA/faidx evidence axes
| `-- core/ # consensus, catalog, rule engine, verdict projection
|-- tests/
| |-- unit/ # Catch2 tests, module by module
| `-- data/tiny/ # minimal BAM/GTF/BED/SQANTI/SJ/FASTA fixtures
|-- docs/ # architecture, algorithm, input contracts, validation plan
|-- docs/figures/ # overview figure source and rendered README image
|-- workflow/ # optional Snakemake orchestration around PanIsoGuard
|-- benchmark/ # synthetic axes, truth sets, SIRV, HG002, calibration notes
|-- examples/tiny/ # 5-minute dataset with expected adjudicate outputs
|-- recipes/bioconda/ # Bioconda meta.yaml and build.sh
|-- thirdparty/ # vendored single-header components and licenses
|-- LICENSE
`-- THIRDPARTY.txt
%%{init: {"theme": "base", "themeCSS": "svg { background: #ffffff; }", "themeVariables": {"background": "#ffffff", "mainBkg": "#ffffff", "fontSize": "17px", "fontFamily": "Arial, sans-serif", "primaryTextColor": "#111827", "lineColor": "#334155", "arrowheadColor": "#334155"}, "flowchart": {"htmlLabels": true, "curve": "linear", "nodeSpacing": 24, "rankSpacing": 32}}}%%
flowchart LR
CLI["<b>CLI</b><br/>adjudicate | combine<br/>benchmark | ablate"]
Iso["<b>Isoform inputs</b><br/>GTF/BED12<br/>SQANTI3 classification"]
Context["<b>Evidence context</b><br/>reference GTF | SJ.tab<br/>BAM/CRAM | FASTA | graph TSV"]
Rules["<b>Rule gates</b><br/>rules.default.toml"]
Normalize["<b>1. Normalize</b><br/>src/io readers<br/>typed transcript + junction models"]
Consensus["<b>2. Merge callers</b><br/>intron-chain fingerprints<br/>caller support matrix"]
Evidence["<b>3. Build evidence</b><br/>SQANTI3 QC<br/>short‑read SJ<br/>Long Read BAM mapping<br/>Variant/Haplotype<br/>Pangenome"]
Decide["<b>4. Decide</b><br/>EvidenceVector to RuleEngine<br/>class + mechanism + trace"]
Outputs["<b>Outputs</b><br/>*.adjudicated.tsv<br/>*.attribution.jsonl | *.provenance.log<br/>caller_support_matrix.tsv"]
CLI --> Normalize
Iso --> Normalize
Normalize --> Consensus
Consensus --> Evidence
Evidence --> Decide
Decide --> Outputs
Context --> Evidence
Rules --> Decide
Consensus -.-> Outputs
classDef command fill:#fff7e6,stroke:#b7791f,stroke-width:1.8px,color:#3a2500,font-size:17px;
classDef input fill:#edf6ff,stroke:#2f6fa8,stroke-width:1.8px,color:#0f2438,font-size:17px;
classDef process fill:#eefaf1,stroke:#2f855a,stroke-width:1.8px,color:#102a16,font-size:17px;
classDef evidence fill:#f5f0ff,stroke:#6b46c1,stroke-width:2px,color:#241447,font-size:17px;
classDef decision fill:#fff1f1,stroke:#c53030,stroke-width:2.2px,color:#3b0d0d,font-size:17px;
classDef output fill:#edfafa,stroke:#2c7a7b,stroke-width:1.8px,color:#0f2f2f,font-size:17px;
class CLI command;
class Iso,Context,Rules input;
class Normalize,Consensus process;
class Evidence evidence;
class Decide decision;
class Outputs output;
linkStyle default stroke:#334155,stroke-width:3.5px;
Requires a C++17 compiler, CMake ≥ 3.20, and htslib ≥ 1.18.
git clone <repo> && cd PanIsoGuard
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j
ctest --test-dir build # unit suite
./build/panisoguard --version
htslib is discovered from $CONDA_PREFIX; override with -DCMAKE_PREFIX_PATH=/prefix.
A bioconda recipe is provided under recipes/bioconda/. Once
released:
conda install -c bioconda -c conda-forge panisoguard
Build a self-contained image (the C++ binary + the optional PDF report tool); this works today without waiting on the conda release:
docker build -t panisoguard .
docker run --rm panisoguard panisoguard --version
docker run --rm -v "$PWD":/data -w /data panisoguard \
adjudicate --classification cls.txt --isoforms-gtf iso.gtf --ref-gtf ref.gtf --out-prefix run
examples/multi_caller/run.sh # combine 3 callers -> consensus verdict, on tiny fixtures (<1 s)
examples/tiny/run.sh # single-caller adjudication
The optional PDF report (SQANTI3-style) is a Python companion: pip install ./python,
then panisoguard-report --prefix run (see python/).
Single-threaded, on a whole-genome isoform set (GRCh38 + GENCODE v49):
| Step | Wall | Peak RAM |
|---|---|---|
| reference catalog (GENCODE v49) | ~3 s | ~0.34 GB |
| adjudicate (SQANTI priors + short-read SJ) | ~5 s | ~0.55 GB |
| + variant axis (faidx motif) | ~40 s | ~0.6 GB |
| + BAM mapping axis | ~1.6 min | ~0.6 GB |
The upstream callers + alignment dominate end-to-end time; PanIsoGuard's own adjudication is the fast tail.
workflow/ provides a Snakemake pipeline that runs the upstream
callers in parallel over a single shared alignment and pipes into PanIsoGuard
(combine + adjudicate). It is a thin convenience wrapper, not the core tool.
See docs/validation.md for what is verified and the truth-based validation plan (SQANTI-SIM, HG002/HPRC, LRGASP).
PanIsoGuard is MIT-licensed — see LICENSE.
It bundles three third-party single-header components under thirdparty/, with
their license texts included: cgranges (IITree.h, MIT), toml++ (MIT), and
Catch2 (Boost Software License 1.0, test-only). See THIRDPARTY.txt
for attribution. The effective combined license of the redistributed source is
MIT AND BSL-1.0. htslib is a dynamically-linked dependency, not vendored.
