omics-rust

Reimplementing the bioinformatics toolchain in Rust — fail-fast, fail-loud, single-binary, easy to install.

crates.io · MIT OR Apache-2.0 · Rust edition 2024 · MSRV 1.91


Why this exists

Most upstream bioinformatics tools are single-threaded, memory-inefficient, written in 2005-era C or pure R, and waste modern multicore + SIMD + GPU resources. omics-rust ports them, one operation at a time, to Rust — under a strict release contract:

No crate is published until its hot-path benches show strictly > 1.0× throughput vs the upstream reference on the same machine, same input, same flags. Equal-to-upstream is a failure.

If an rsomics-* crate is on crates.io, it has beaten its named C/Python/R counterpart on a recorded benchmark with checked-in provenance.
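The gate above boils down to one strict comparison. Here is a minimal sketch of that check, assuming throughput is compared as wall-clock time on the same input; `PerfRecord` and its field names are hypothetical illustrations, not the schema of the real perfgate files.

```rust
/// Hypothetical summary of one benchmark pair (illustrative field names).
/// Both timings are mean wall-clock seconds on the same machine, input, and flags.
#[derive(Debug, Clone, PartialEq)]
pub struct PerfRecord {
    pub upstream_mean_secs: f64,
    pub port_mean_secs: f64,
}

impl PerfRecord {
    /// Throughput ratio vs upstream. For identical input this is
    /// upstream time divided by port time, so larger is better.
    pub fn ratio(&self) -> f64 {
        self.upstream_mean_secs / self.port_mean_secs
    }

    /// The release gate: strictly greater than 1.0.
    /// A tie fails, and so does any non-finite ratio (e.g. a zero timing).
    pub fn passes_gate(&self) -> bool {
        let r = self.ratio();
        r.is_finite() && r > 1.0
    }
}
```

With this sketch, a record of 12.0 s upstream vs 8.0 s for the port gives a ratio of 1.5 and passes, while 10.0 s vs 10.0 s gives exactly 1.0 and fails.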

How to use

cargo install rsomics-<name>
rsomics-<name> --help

Each tool is a single binary whose flags, and therefore its --help output, are modeled after its upstream counterpart's flag set. You can drop rsomics-bam-sort into a shell script in place of samtools sort and most invocations work unchanged.

What's here

Each tool lives in its own repo under this org, with its own CI as the authoritative gate. Names follow rsomics-<format>-<operation> for tools (e.g. rsomics-bam-sort, rsomics-vcf-merge) and rsomics-<topic> for shared foundation libraries (rsomics-bamio, rsomics-intervals, rsomics-pileup, rsomics-bbi, …).

The two-layer split:

  • Foundation (library-only): shared primitives like BAM I/O, BED intervals, k-mers, FM-index, alignment cores, statistical kernels.
  • Tools (one binary each, one operation each): samtools view / samtools sort / samtools merge / bcftools call / bedtools intersect / … are each their own crate. We don't ship Swiss-Army-knife multitools; one upstream subcommand maps to one crate, scriptable on its own.

Browse the pinned repos below or run cargo search rsomics- to see the current set.

Discipline (the part we don't shortcut)

  • Per-tool perfgate. Every release-tagged crate has a .autopilot/state/perf-<tool>-*.md file recording machine identity, fixture sha256, hyperfine output, and the ratio vs upstream. No numbers without provenance.
  • Per-tool compat.rs. Every release-tagged crate has a tests/compat.rs that runs the upstream binary on a canonical fixture and asserts byte-or-field-level equality.
  • Clean-room for GPL upstreams. When porting from a GPL tool (bowtie2, HISAT2, MEGAHIT, Trimmomatic, RSeQC, …) we reimplement from paper + format spec + black-box observation only. License stays MIT OR Apache-2.0; the README's ## Origin section credits the upstream honestly.
  • No defensive programming. Errors propagate to main, exit non-zero, message on stderr. Wrong output is worse than a crash in bioinformatics — we bail rather than ship a wrong VCF.
  • Cross-platform. First-class targets: x86_64-unknown-linux-gnu, aarch64-unknown-linux-gnu, x86_64-apple-darwin, aarch64-apple-darwin. CI runs all four on every release tag.
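As an illustration of the "no defensive programming" rule, here is a minimal sketch of fail-fast parsing. It assumes a simplified three-column, tab-separated BED input (no track lines or comments); `BedError`, `Interval`, `parse_bed`, and `exit_status` are hypothetical names, not the API of any rsomics-* crate. Every malformed record becomes an error that propagates with `?` rather than being skipped or patched up.

```rust
use std::fmt;

/// Hypothetical error type: each variant names the offending line.
#[derive(Debug, PartialEq)]
pub enum BedError {
    MissingField { line: usize, field: &'static str },
    BadCoordinate { line: usize, value: String },
    EndBeforeStart { line: usize, start: u64, end: u64 },
}

impl fmt::Display for BedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BedError::MissingField { line, field } => {
                write!(f, "line {line}: missing {field} field")
            }
            BedError::BadCoordinate { line, value } => {
                write!(f, "line {line}: invalid coordinate {value:?}")
            }
            BedError::EndBeforeStart { line, start, end } => {
                write!(f, "line {line}: end {end} is before start {start}")
            }
        }
    }
}

impl std::error::Error for BedError {}

#[derive(Debug, PartialEq)]
pub struct Interval {
    pub chrom: String,
    pub start: u64,
    pub end: u64,
}

fn parse_coord(field: Option<&str>, line: usize, name: &'static str) -> Result<u64, BedError> {
    let text = field.ok_or(BedError::MissingField { line, field: name })?;
    text.parse::<u64>()
        .map_err(|_| BedError::BadCoordinate { line, value: text.to_string() })
}

/// Parse every line or fail on the first bad one. No record is silently dropped.
pub fn parse_bed(input: &str) -> Result<Vec<Interval>, BedError> {
    let mut intervals = Vec::new();
    for (idx, raw) in input.lines().enumerate() {
        let line = idx + 1;
        let mut fields = raw.split('\t');
        let chrom = fields
            .next()
            .filter(|s| !s.is_empty())
            .ok_or(BedError::MissingField { line, field: "chrom" })?;
        let start = parse_coord(fields.next(), line, "start")?;
        let end = parse_coord(fields.next(), line, "end")?;
        if end < start {
            return Err(BedError::EndBeforeStart { line, start, end });
        }
        intervals.push(Interval { chrom: chrom.to_string(), start, end });
    }
    Ok(intervals)
}

/// What `main` would do with the result: message on stderr, non-zero status.
pub fn exit_status<T>(result: &Result<T, BedError>) -> u8 {
    match result {
        Ok(_) => 0,
        Err(e) => {
            eprintln!("error: {e}");
            1
        }
    }
}
```

In a real binary, `main` could pass the returned status to `std::process::ExitCode::from`, so a single bad record stops the run instead of producing a partial output file.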

Contributing

Pick a tool you'd like ported, open an issue against its rsomics-<name> repo (or the umbrella rsomics-world for cross-cutting work), and we'll triage the perfgate target and origin methodology together. New crates must arrive with tests/compat.rs + a recorded perfgate that meets the > 1.0× bar — see CONVENTIONS.md.
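To give a feel for what a tests/compat.rs might contain, here is a minimal sketch under stated assumptions. The helper names, the fixture path, and the assumption that rsomics-bam-sort accepts the same `-O sam --no-PG` flags as samtools sort are all hypothetical; the real conventions are in CONVENTIONS.md. The test is marked `#[ignore]` because it needs both binaries on PATH.

```rust
use std::io;
use std::process::Command;

/// Run a command and return its stdout, treating a non-zero exit as an error.
pub fn run_stdout(program: &str, args: &[&str]) -> io::Result<Vec<u8>> {
    let output = Command::new(program).args(args).output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{program} exited with {}", output.status),
        ));
    }
    Ok(output.stdout)
}

/// Return the 1-based number of the first differing line,
/// or None when the two outputs are byte-identical.
pub fn first_diff_line(expected: &[u8], actual: &[u8]) -> Option<usize> {
    let mut e = expected.split(|&b| b == b'\n');
    let mut a = actual.split(|&b| b == b'\n');
    let mut line = 1;
    loop {
        match (e.next(), a.next()) {
            (None, None) => return None,
            (x, y) if x == y => line += 1,
            _ => return Some(line),
        }
    }
}

#[test]
#[ignore = "needs samtools, rsomics-bam-sort, and a fixture; flags and path are assumptions"]
fn sort_matches_upstream() {
    // Hypothetical fixture path.
    let fixture = "tests/fixtures/example.bam";
    // --no-PG keeps tool-specific @PG header lines out of the comparison.
    let upstream = run_stdout("samtools", &["sort", "-O", "sam", "--no-PG", fixture]).unwrap();
    let ours = run_stdout("rsomics-bam-sort", &["-O", "sam", "--no-PG", fixture]).unwrap();
    assert_eq!(first_diff_line(&upstream, &ours), None);
}
```

Reporting the first differing line rather than a bare "outputs differ" keeps a failing compat run actionable; a field-level comparison would replace `first_diff_line` with a parser for the format in question.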

License

Each crate is dual-licensed MIT OR Apache-2.0. Pick whichever fits your downstream.

Popular repositories

  1. rsomics-world (Public, Shell)

  2. rsomics-common (Public, Rust)

    Shared primitives for every rsomics-* crate (errors, CLI scaffold, runner, progress, exit codes).

  3. rsomics-align-core (Public, Rust)

    Pairwise sequence alignment kernels (Smith-Waterman + Needleman-Wunsch, affine gap) for the rsomics-* tool family. Layer A primitive.

  4. rsomics-debruijn (Public, Rust)

    de Bruijn graph types + linear-path collapse + unitig extraction for the rsomics-* tool family. Layer A primitive.

  5. rsomics-fm-index (Public, Rust)

    FM-index over BWT + suffix array, with backward search / count / locate. Layer A primitive for the rsomics-* tool family.

  6. rsomics-fqgz (Public, Rust)

    Chunked parallel-libdeflate gzip (or plain) FASTQ-record writer. Layer A primitive shared by the rsomics-* fastq-* tools.
