cuSBF

Overview

cuSBF is a high-performance GPU implementation of the Super Bloom filter, optimized for high-throughput batch k-mer insertion and query on nucleotide (DNA) and protein sequences (or any other sequence type as long as a valid alphabet is provided).

It exploits the streaming nature of sequence-derived k-mers by using minimizers to group consecutive k-mers sharing the same minimiser into super-k-mers, assigning all k-mers of a super-k-mer to the same 256-bit memory shard. This amortizes random memory accesses across consecutive k-mer queries, reducing memory-bandwidth pressure. The findere scheme further reduces false positives dramatically by inserting overlapping s-mers and requiring a full run of consecutive s-mer matches.

Features

CUDA-accelerated batch k-mer insert and query from sequences
Configurable k-mer length, minimiser width, s-mer width, and hash function count
Minimizer-based shard selection for cache-efficient streaming queries
Findere false-positive reduction via overlapping s-mer membership
Header-only library design
FASTA/FASTQ stream and file support

Performance

Benchmarks use Config<31, 28, 16, 4> on an NVIDIA RTX PRO 6000 Blackwell GPU. CPU Super Bloom runs on an Intel Xeon W9-3595X with 120 threads.

Compared against:

Smaller Filter (C. elegans, ~100M k-mers)

Comparison	Insert	Query
cuSBF vs Super Bloom	92× faster	234× faster
cuSBF vs GBBF	9.1× faster	7.7× faster
cuSBF vs Cuckoo-GPU	80× faster	8.0× faster
cuSBF vs TCF	12× faster	52× faster
cuSBF vs GQF	69× faster	13× faster

Large Filter (CHM13, ~3.1B k-mers)

Comparison	Insert	Query
cuSBF vs Super Bloom	59× faster	165× faster
cuSBF vs GBBF	8.2× faster	7.6× faster
cuSBF vs Cuckoo-GPU	3427× faster	7.8× faster
cuSBF vs TCF	12× faster	67× faster
cuSBF vs GQF	42× faster	11× faster

False Positive Rate

Bits/k-mer	cuSBF `s=28`	cuSBF `s=30`	cuSBF `s=31`	GBBF
21.4	0.848%	0.951%	1.593%	3.069%
85.7	0.091%	0.107%	0.210%	0.126%
342.6	0.0095%	0.0114%	0.0264%	0.0273%

Requirements

Linux (x86_64 or aarch64) with an NVIDIA GPU and driver
CUDA Toolkit >= 13.1
GCC or Clang host compiler (C++20)
Meson and Ninja
NVIDIA GPU with compute capability 8.0+ (Ampere, Lovelace, Hopper, Blackwell)

Platform support

cuSBF is developed and tested on Linux only.

WSL2 on Windows with is a reasonable dev environment (See NVIDIA docs).
Native Windows and macOS are not supported or tested. The build uses Linux-specific FASTX paths (for example mmap) and host tooling assumptions (GCC/Clang, GNU statement expressions in CUSBF_TRY/CUSBF_UNWRAP).

Building

meson setup build
ninja -C build

When this repo is the root Meson project, benchmarks, tests, and examples build by default. As a subproject they are skipped unless you force them on.

Option	Type	Default	Description
`benchmarks`	feature	`auto`	Google Benchmark binaries
`tests`	feature	`auto`	GoogleTest suite
`examples`	feature	`auto`	Example CLI
`param_sweep`	feature	`disabled`	Parameter-sweep binaries (large, see below)
`param_sweep_alphabet`	combo	`dna`	`dna` or `protein` when `param_sweep` is enabled
`large_fastx_tests`	feature	`disabled`	Large generated FASTX test (`CUSBF_LARGE_FASTX_*` env vars)

Each feature option accepts auto, enabled, or disabled:

auto — on for a standalone checkout, off when cuSBF is a subproject
enabled / disabled — override regardless of project layout

Important

Enabling param_sweep builds many binaries (208 for the DNA alphabet). Leave it disabled unless you need that sweep.

# Default standalone build
meson setup build

# Faster configure: library + examples only
meson setup build -Dbenchmarks=disabled -Dtests=disabled

# Subproject consumer forcing tests on
meson setup build -Dtests=enabled

# Parameter sweep
meson setup build -Dparam_sweep=enabled
meson setup build -Dparam_sweep=enabled -Dparam_sweep_alphabet=protein

Usage

Fallible APIs return cusbf::Result<T> (a thin wrapper over cuda::std::expected<T, Error>). Use return Err(error) (cuda::std::unexpected<Error>, deduces Result<T>) or return Ok() / return {} for Result<void>. For success with a value, return value is enough. Two helpers unwrap results:

Macro	On failure	Use when
`CUSBF_TRY(expr)`	Copies the error, then `return cuda::std::unexpected<Error>(...)` from the enclosing function	The caller returns `Result` (library glue, `examples/cusbf-main`)
`CUSBF_UNWRAP(expr)`	`throw std::runtime_error(message())`	Tests, `main`, or other code that does not return `Result`

Both work as statements or in initializers (auto x = CUSBF_UNWRAP(...)). For full control (typed errors, exit codes), use if (!result) instead.

Quick example (`CUSBF_UNWRAP`)

#include <cusbf/filter.cuh>

using Config = cusbf::Config<31, 28, 16, 4>;

int main() {
    cusbf::filter<Config> filter(1 << 24);

    CUSBF_UNWRAP(filter.insert_sequence("ACGTACGTACGTACGTACGTACGTACGTACGT"));
    const auto hits = CUSBF_UNWRAP(filter.contains_sequence("ACGTACGTACGTACGTACGTACGTACGTACGT"));

    CUSBF_UNWRAP(filter.insert_fastx_file("reference.fasta"));
    const auto summary = CUSBF_UNWRAP(filter.query_fastx_file("queries.fastq"));

    (void)hits;
    (void)summary;
    return 0;
}

Propagating errors (`CUSBF_TRY`)

When the caller already returns Result, use CUSBF_TRY so failures propagate without exceptions:

[[nodiscard]] cusbf::Result<void> run(cusbf::filter<Config>& filter) {
    CUSBF_TRY(filter.insert_fastx_file("reference.fasta"));
    const auto summary = CUSBF_TRY(filter.query_fastx_file("queries.fastq"));
    (void)summary;
    return cusbf::Ok();
}

Async device APIs, record batches, and streaming FASTX callbacks follow the same pattern. filter.load_factor() and filter.filter_bits() are synchronous and do not return Result.

Inspecting errors

if (const auto result = filter.query_fastx_file("queries.fastq"); !result) {
    const cusbf::Error& err = result.error();
    std::cerr << err.message() << '\n';
    if (const cusbf::FastxParseError* parse = err.as_fastx_parse()) {
        // parse->location.file / .line / .column
    }
    return 1;
}

CUSBF_CUDA_TRY wraps CUDA runtime calls into Result<void>; CUSBF_CUDA_CALL / CUSBF_CUDA_ABORT are for throw/abort paths only.

Configuration Options

The Config template accepts the following parameters:

Parameter	Description	Default
`K`	k-mer length (max depends on alphabet)	-
`S`	s-mer width for findere Bloom hash seed (1-K)	-
`M`	Minimiser width for shard selection (1-K)	-
`HashCount`	Number of independent Bloom hash functions (4,8,12,16)	4
`CudaBlockSize`	CUDA threads per block	256
`Alphabet`	Symbol encoding (DNA or protein)	`DnaAlphabet`

Protein Alphabet Support

#include <cusbf/filter.cuh>

using ProteinConfig = cusbf::Config<12, 10, 6, 4, 256, cusbf::ProteinAlphabet>;

[[nodiscard]] cusbf::Result<void> run_protein() {
    cusbf::filter<ProteinConfig> filter(1 << 24);
    CUSBF_TRY(filter.insert_sequence("ACDEFGHIKLMNPQRSTVWY"));
    const auto hits = CUSBF_TRY(filter.contains_sequence("ACDEFGHIKLMNPQRSTVWY"));
    (void)hits;
    return cusbf::Ok();
}

Name		Name	Last commit message	Last commit date
Latest commit History 186 Commits
.cargo		.cargo
.github/workflows		.github/workflows
.vscode		.vscode
benchmarks		benchmarks
docs		docs
examples		examples
include/cusbf		include/cusbf
scripts		scripts
subprojects		subprojects
tests		tests
.clang-format		.clang-format
.clang-tidy		.clang-tidy
.gitignore		.gitignore
.gitlab-ci.yml		.gitlab-ci.yml
Doxyfile		Doxyfile
LICENSE		LICENSE
README.md		README.md
flake.lock		flake.lock
flake.nix		flake.nix
meson.build		meson.build
meson.options		meson.options
spack.lock		spack.lock
spack.yaml		spack.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

cuSBF

Overview

Features

Performance

Smaller Filter (C. elegans, ~100M k-mers)

Large Filter (CHM13, ~3.1B k-mers)

False Positive Rate

Requirements

Platform support

Building

Usage

Quick example (`CUSBF_UNWRAP`)

Propagating errors (`CUSBF_TRY`)

Inspecting errors

Configuration Options

Protein Alphabet Support

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

cuSBF

Overview

Features

Performance

Smaller Filter (C. elegans, ~100M k-mers)

Large Filter (CHM13, ~3.1B k-mers)

False Positive Rate

Requirements

Platform support

Building

Usage

Quick example (CUSBF_UNWRAP)

Propagating errors (CUSBF_TRY)

Inspecting errors

Configuration Options

Protein Alphabet Support

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Quick example (`CUSBF_UNWRAP`)

Propagating errors (`CUSBF_TRY`)

Packages