Reimplementing the bioinformatics toolchain in Rust — fail-fast, fail-loud, single-binary, easy to install.
Most upstream bioinformatics tools are single-threaded, memory-inefficient, written in 2005-era C or pure R, and waste modern multicore + SIMD + GPU resources. omics-rust ports them, one operation at a time, to Rust — under a strict release contract:
No crate is published until its hot-path benchmarks show strictly > 1.0× throughput
versus the upstream reference on the same machine, same input, same flags. Equal-to-upstream is a failure.
If a rsomics-* crate is on crates.io, it has beaten its named C/Python/R
counterpart on a recorded benchmark with checked-in provenance.
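As a rough illustration of the strict bar, the gate can be sketched as a ratio check. This is a minimal sketch, not the project's actual perfgate code: the names `Gate`, `throughput_ratio`, and `perf_gate` are hypothetical, and it assumes both tools were timed on the same input, so that the throughput ratio reduces to upstream time divided by port time.

```rust
/// Outcome of comparing a port against its upstream reference.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum Gate {
    Pass,
    Fail,
}

/// Throughput of the port relative to upstream. With identical input,
/// throughput is inversely proportional to wall-clock time, so the
/// ratio is upstream_secs / port_secs.
fn throughput_ratio(upstream_secs: f64, port_secs: f64) -> f64 {
    upstream_secs / port_secs
}

/// Strict gate: only a ratio strictly greater than 1.0 passes.
/// A tie (exactly 1.0) fails, and so does a NaN ratio.
fn perf_gate(upstream_secs: f64, port_secs: f64) -> Gate {
    if throughput_ratio(upstream_secs, port_secs) > 1.0 {
        Gate::Pass
    } else {
        Gate::Fail
    }
}

fn main() {
    // Made-up timings, not measurements of any real tool.
    println!("{:?}", perf_gate(12.0, 8.0)); // port is faster
    println!("{:?}", perf_gate(10.0, 10.0)); // tie counts as failure
}
```

The comparison is deliberately `>` rather than `>=`, which mirrors the rule that equal-to-upstream is a failure.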

    cargo install rsomics-<name>
    rsomics-<name> --help

Each tool is a single binary with --help modeled after its upstream
counterpart's flag set. You can substitute rsomics-bam-sort for samtools sort
in a shell script, and most invocations work unchanged.
Each tool lives in its own repo under this org, with its own CI as the
authoritative gate. Names follow rsomics-<format>-<operation> for tools
(e.g. rsomics-bam-sort, rsomics-vcf-merge) and rsomics-<topic> for
shared foundation libraries (rsomics-bamio, rsomics-intervals,
rsomics-pileup, rsomics-bbi, …).
The two-layer split:
- Foundation (library-only): shared primitives like BAM I/O, BED intervals, k-mers, FM-index, alignment cores, statistical kernels.
- Tools (one binary each, one operation each): samtools view, samtools sort, samtools merge, bcftools call, bedtools intersect, … are each their own crate. We don't ship Swiss-Army-knife multitools; one upstream subcommand maps to one crate, scriptable on its own.
Browse the pinned repos below, or run cargo search rsomics-, to see the
current set.
- Per-tool perfgate. Every release-tagged crate has an .autopilot/state/perf-<tool>-*.md with machine identity, fixture sha256, hyperfine output, ratio vs upstream. No numbers without provenance.
- Per-tool compat.rs. Every release-tagged crate has a tests/compat.rs that runs the upstream binary on a canonical fixture and asserts byte-level or field-level equality.
- Clean-room for GPL upstreams. When porting from a GPL tool (bowtie2, HISAT2, MEGAHIT, Trimmomatic, RSeQC, …) we reimplement from paper + format spec + black-box observation only. License stays MIT OR Apache-2.0; the README's ## Origin section credits the upstream honestly.
- No defensive programming. Errors propagate to main, exit non-zero, message on stderr. Wrong output is worse than a crash in bioinformatics: we bail rather than ship a wrong VCF.
- Cross-platform. First-class targets: x86_64-unknown-linux-gnu, aarch64-unknown-linux-gnu, x86_64-apple-darwin, aarch64-apple-darwin. CI runs all four on every release tag.
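The "errors propagate to main" rule can be illustrated with a small sketch. Everything here is an assumption made for illustration: `RecordError`, `parse_record`, and `run` are hypothetical names, and the two-column tab-separated input is a stand-in rather than a real VCF or BAM parser. The point is only the shape: parsing returns `Result`, `?` propagates the first error, and `main` turns it into a stderr message plus a non-zero exit code.

```rust
use std::fmt;
use std::process::ExitCode;

/// Hypothetical error type for a malformed input record.
#[derive(Debug, PartialEq)]
enum RecordError {
    MissingField { line: usize, field: &'static str },
    BadPosition { line: usize, value: String },
}

impl fmt::Display for RecordError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RecordError::MissingField { line, field } => {
                write!(f, "line {line}: missing field `{field}`")
            }
            RecordError::BadPosition { line, value } => {
                write!(f, "line {line}: invalid position `{value}`")
            }
        }
    }
}

impl std::error::Error for RecordError {}

/// Parse `chrom<TAB>pos` from one line; any problem is an error,
/// never a silently skipped or defaulted record.
fn parse_record(line_no: usize, line: &str) -> Result<(String, u64), RecordError> {
    let mut fields = line.split('\t');
    let chrom = fields
        .next()
        .filter(|s| !s.is_empty())
        .ok_or(RecordError::MissingField { line: line_no, field: "chrom" })?;
    let pos_text = fields
        .next()
        .ok_or(RecordError::MissingField { line: line_no, field: "pos" })?;
    let pos = pos_text.parse::<u64>().map_err(|_| RecordError::BadPosition {
        line: line_no,
        value: pos_text.to_string(),
    })?;
    Ok((chrom.to_string(), pos))
}

/// Parse every line; collecting into `Result` stops at the first error.
fn run(input: &str) -> Result<Vec<(String, u64)>, RecordError> {
    input
        .lines()
        .enumerate()
        .map(|(i, line)| parse_record(i + 1, line))
        .collect()
}

fn main() -> ExitCode {
    // Made-up input. A malformed line would take the Err branch below.
    let input = "chr1\t100\nchr2\t250\n";
    match run(input) {
        Ok(records) => {
            for (chrom, pos) in &records {
                println!("{chrom}\t{pos}");
            }
            ExitCode::SUCCESS
        }
        Err(e) => {
            eprintln!("error: {e}");
            ExitCode::FAILURE
        }
    }
}
```

Because `run` returns as soon as one record fails, no partial output is written for a bad input in this sketch; the caller sees either a complete result or an error.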
Pick a tool you'd like ported, open an issue against its rsomics-<name>
repo (or the umbrella rsomics-world
for cross-cutting work), and we'll triage the perfgate target and origin
methodology together. New crates must arrive with tests/compat.rs + a
recorded perfgate that meets the > 1.0× bar — see
CONVENTIONS.md.
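For contributors wondering what a field-level comparison might look like, here is a minimal sketch of the comparison step only. It is not taken from any rsomics crate: `Mismatch` and `first_mismatch` are hypothetical names, and the two outputs are plain strings here, whereas a real tests/compat.rs would presumably capture them by running the upstream binary and the port on the same fixture.

```rust
/// Location of the first difference between two tab-separated outputs.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum Mismatch {
    LineCount { expected: usize, actual: usize },
    Field {
        line: usize,
        column: usize,
        expected: Option<String>,
        actual: Option<String>,
    },
}

/// Compare two outputs line by line and field by field.
/// Returns `None` when every field matches, otherwise the first mismatch.
/// Line and column numbers are 1-based; a missing field is `None`.
fn first_mismatch(expected: &str, actual: &str) -> Option<Mismatch> {
    let exp: Vec<&str> = expected.lines().collect();
    let act: Vec<&str> = actual.lines().collect();
    if exp.len() != act.len() {
        return Some(Mismatch::LineCount {
            expected: exp.len(),
            actual: act.len(),
        });
    }
    for (i, (e, a)) in exp.iter().zip(&act).enumerate() {
        let ef: Vec<&str> = e.split('\t').collect();
        let af: Vec<&str> = a.split('\t').collect();
        for col in 0..ef.len().max(af.len()) {
            let x = ef.get(col).copied();
            let y = af.get(col).copied();
            if x != y {
                return Some(Mismatch::Field {
                    line: i + 1,
                    column: col + 1,
                    expected: x.map(str::to_string),
                    actual: y.map(str::to_string),
                });
            }
        }
    }
    None
}

fn main() {
    // Made-up outputs standing in for upstream and port results.
    let upstream = "chr1\t100\tA\tG\nchr1\t250\tC\tT\n";
    let port = "chr1\t100\tA\tG\nchr1\t250\tC\tT\n";
    assert_eq!(first_mismatch(upstream, port), None);
    println!("outputs match");
}
```

Reporting the line and column of the first difference, rather than a bare pass/fail, makes a failing compatibility test much quicker to diagnose.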
Each crate is dual-licensed MIT OR Apache-2.0. Pick whichever fits your downstream.