cuSBF

Overview

cuSBF is a high-performance GPU implementation of the Super Bloom filter, optimized for high-throughput batch k-mer insertion and query on nucleotide (DNA) and protein sequences (or any other sequence type as long as a valid alphabet is provided).

It exploits the streaming nature of sequence-derived k-mers by using minimizers to group consecutive k-mers sharing the same minimiser into super-k-mers, assigning all k-mers of a super-k-mer to the same 256-bit memory shard. This amortizes random memory accesses across consecutive k-mer queries, reducing memory-bandwidth pressure. The findere scheme further reduces false positives dramatically by inserting overlapping s-mers and requiring a full run of consecutive s-mer matches.

Features

CUDA-accelerated batch k-mer insert and query from sequences
Configurable k-mer length, minimiser width, s-mer width, and hash function count
Minimizer-based shard selection for cache-efficient streaming queries
Findere false-positive reduction via overlapping s-mer membership
Header-only library design
FASTA/FASTQ stream and file support

Performance

Benchmarks (Config<31, 28, 16, 4>) comparing cuSBF against NVIDIA's cuco bloom_filter on an NVIDIA RTX PRO 6000 Blackwell GPU with random DNA sequence inputs. Throughput is reported in billions of k-mers per second (Gk-mers/s). Timings include GPU-side sequence encoding for cuco, so both methods consume the same input sequence.

Dataset Size	Operation	cuSBF	cuco Bloom	Speed-up
~4M k-mers	Insert	62.9 Gk-mers/s	31.5 Gk-mers/s	2.0×
~4M k-mers	Query	109 Gk-mers/s	44.2 Gk-mers/s	2.5×
~268M k-mers	Insert	46.6 Gk-mers/s	5.9 Gk-mers/s	7.9×
~268M k-mers	Query	101 Gk-mers/s	13.2 Gk-mers/s	7.7×

The findere scheme (s-mer width) provides strong false-positive reduction at equivalent memory. For Config<31, 28, 16, 4> on ~4.6M inserted k-mers queried against 10⁹ random k-mers:

Bits/k-mer	cuSBF FPR	cuco Bloom FPR
58	0.0035%	0.064%
116	0.00036%	0.017%
231	0.000092%	0.0054%

Benchmarks can be reproduced with:

./build/benchmarks/gpu-filter-comparison --benchmark_filter="CuSBF"
./build/benchmarks/fpr-fastx-sweep

Requirements

CUDA Toolkit >= 13.1
C++20 compatible host compiler
Meson build system
NVIDIA GPU with compute capability 8.0+ (Ampere, Lovelace, Hopper, Blackwell)

Building

meson setup build
ninja -C build

Benchmarks and tests are built automatically when this is a standalone project, but skipped when used as a subproject. Control with Meson feature options:

Option	Behaviour
`-Dbenchmarks=auto` (default)	Build benchmarks when standalone, skip when subproject
`-Dbenchmarks=enabled`	Always build benchmarks
`-Dbenchmarks=disabled`	Never build benchmarks
`-Dtests=auto` (default)	Build tests when standalone, skip when subproject
`-Dtests=enabled`	Always build tests
`-Dtests=disabled`	Never build tests
`-Dexamples=auto` (default)	Build examples when standalone, skip when subproject
`-Dexamples=enabled`	Always build examples
`-Dexamples=disabled`	Never build examples
`-Dparam_sweep=disabled` (default)	Never build parameter-sweep benchmark (s, m) for DNA or protein
`-Dparam_sweep=enabled`	Build parameter-sweep benchmark (s, m) for DNA or protein
`-Dparam_sweep_alphabet=dna`	Alphabet for param_sweep: `dna` or `protein`

Important

The parameter sweep is disabled by default for a reason, there are 208 binaries for the entire sweep when using the DNA alphabet.

Examples:

# Standalone: build everything except parameter sweep (default)
meson setup build

# Standalone: skip benchmarks and tests
meson setup build -Dbenchmarks=disabled -Dtests=disabled

# Subproject: force benchmarks and tests on
meson setup build -Dbenchmarks=enabled -Dtests=enabled

# Parameter-sweep benchmark (DNA, default)
meson setup build -Dparam_sweep=enabled

# Parameter-sweep benchmark (protein)
meson setup build -Dparam_sweep=enabled -Dparam_sweep_alphabet=protein

Usage

#include <cusbf/BloomFilter.cuh>

// Configure the filter: k-mer length, s-mer width, minimizer width, hash count
using Config = cusbf::Config<31, 28, 16, 4>;

// Create a filter with the desired capacity (in bits)
cusbf::Filter<Config> filter(1 << 24);  // ~16M bits

// Insert k-mers from a DNA sequence (synchronous)
filter.insertSequence("ACGTACGTACGTACGTACGTACGTACGTACGT");

// Query k-mers (returns vector<uint8_t>, 1 = present, 0 = absent)
auto hits = filter.containsSequence("ACGTACGTACGTACGTACGTACGTACGTACGT");

// Device-resident API (async, no synchronization)
thrust::device_vector<char> d_seq = ...;
thrust::device_vector<uint8_t> d_results(numKmers);
filter.insertSequenceDevice(cusbf::device_span<const char>(d_seq));
filter.containsSequenceDevice(
    cusbf::device_span<const char>(d_seq),
    cusbf::device_span<uint8_t>(d_results)
);

// Dense record batch API: one concatenated payload plus explicit record ranges.
std::string batchSequence = "ACGTACGTTGCATGCA";
std::vector<cusbf::RecordRange> batchRanges{
    {0, 8},
    {8, 8},
};
auto batchSummary = filter.queryRecordBatch(
    {
        batchSequence,
        cuda::std::span<const cusbf::RecordRange>{batchRanges.data(), batchRanges.size()},
    },
    [](const cusbf::RecordQueryView& record) {
        // record.sequence is the original record, record.hits is one bool per k-mer
    }
);

// FASTA/FASTQ file insertion and aggregate query (chunked internally for large inputs)
filter.insertFastxFile("reference.fasta");
auto summary = filter.queryFastxFile("queries.fastq");

// Streaming FASTX query emits each record as soon as its chunk completes
auto streamed = filter.queryFastxFileRecords(
    "queries.fastq",
    [](const cusbf::FastxRecordView& record) {
        // record.header, record.sequence, record.queriedKmers, record.hits
    }
);

// Detailed FASTX query keeps owning per-record copies in source order
auto detailed = filter.queryFastxFileDetailed("queries.fastq");

// Inspect filter state
double load = filter.loadFactor();  // fraction of set bits
uint64_t bits = filter.filterBits();

Configuration Options

The Config template accepts the following parameters:

Parameter	Description	Default
`K`	k-mer length (max depends on alphabet)	-
`S`	s-mer width for findere Bloom hash seed (1-K)	-
`M`	Minimiser width for shard selection (1-K)	-
`HashCount`	Number of independent Bloom hash functions (1-16)	4
`CudaBlockSize`	CUDA threads per block	256
`Alphabet`	Symbol encoding (DNA or protein)	`DnaAlphabet`

Protein Alphabet Support

#include <cusbf/BloomFilter.cuh>

using ProteinConfig = cusbf::Config<12, 10, 6, 4, 256, cusbf::ProteinAlphabet>;
cusbf::Filter<ProteinConfig> filter(1 << 24);

filter.insertSequence("ACDEFGHIKLMNPQRSTVWY");
auto hits = filter.containsSequence("ACDEFGHIKLMNPQRSTVWY");

Related Publications

E. Conchon-Kerjan, T. Rouzé, L. Robidou, F. Ingels, and A. Limasset, “Super Bloom: Fast and precise filter for streaming k-mer queries,” bioRxiv, 2026, doi: 10.64898/2026.03.17.712354.
D. Jünger, K. Kristensen, Y. Wang, X. Yu, and B. Schmidt, “Optimizing Bloom Filters for Modern GPU Architectures.” 2025. [Online]. Available: https://arxiv.org/abs/2512.15595

Name		Name	Last commit message	Last commit date
Latest commit History 120 Commits
.github/workflows		.github/workflows
.vscode		.vscode
benchmarks		benchmarks
examples		examples
include/cusbf		include/cusbf
scripts		scripts
subprojects		subprojects
tests		tests
.clang-format		.clang-format
.clang-tidy		.clang-tidy
.gitignore		.gitignore
.gitlab-ci.yml		.gitlab-ci.yml
Doxyfile		Doxyfile
LICENSE		LICENSE
README.md		README.md
flake.lock		flake.lock
flake.nix		flake.nix
meson.build		meson.build
meson.options		meson.options
spack.lock		spack.lock
spack.yaml		spack.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

cuSBF

Overview

Features

Performance

Requirements

Building

Usage

Configuration Options

Protein Alphabet Support

Related Publications

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

cuSBF

Overview

Features

Performance

Requirements

Building

Usage

Configuration Options

Protein Alphabet Support

Related Publications

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages