ORE v2: variable-length & string ORE, v2 wire format, SIMD + constant-time hardening (integration branch)#90

Draft
coderdan wants to merge 2 commits into main from ore-v2

@coderdan coderdan commented Jun 16, 2026

ORE v2 — the next generation of order-revealing encryption

This is the integration / forthcoming-release branch for the ORE v2 program. It
is the umbrella PR that the stacked feature PRs land into; once the stack is merged
and the remaining crypto-review gates clear, this branch ships the whole v2 change
set to main.

Architecture & rationale: docs/plans/2026-06-12-ore-v2-architecture.md.

Why v2

ORE v1 is the Lewi–Wu block-ORE scheme at an 8-bit block width (Bit8), fixed to
a single wire format and a single fixed-length numeric domain. v2 keeps that scheme
byte-for-byte (it stays the wire-frozen default) and builds a broader, harder, and
more flexible foundation around it:

  • Variable-length & string ORE. A new chained-prefix scheme encrypts
    arbitrary-length values — strings (lexicographic order matching str::cmp),
    and a path to u128/Decimal — not just fixed-width integers.
  • A real v2 wire format. A versioned header (scheme id + parameters) so multiple
    schemes and block widths coexist and evolve without ambiguity. v1 ciphertexts are
    untouched; v2 is additive.
  • Block width as a leakage decision. Bit6 (6-bit blocks) joins Bit8 as an opt-in
    width. Width is a per-domain/per-deployment trade-off (leakage vs cost), not a
    global default — default numerics stay on Bit8 (lower online prefix leakage +
    wire compatibility); Bit6 is opt-in for at-rest-dominated / size-sensitive cases.
  • Performance. Hardware AES on aarch64, NEON + AVX2 kernels for the
    right-ciphertext encoder, an allocation-free encrypt path, and a vectorised
    GF(2¹²⁸) σ-doubling in the hash. Numbers below.
  • Security hardening, with receipts. A plaintext-dependent PRP timing channel
    closed, the comparator made constant-time at cache-line (and MemJam sub-line)
    granularity, key-derived material zeroized on drop, and a fixed-key-AES MMO hash
    adopted for the 1-bit hash — each backed by a written review on this branch.
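
As a plaintext-level illustration of the ordering contract in the first bullet: Rust's `str::cmp` orders strings lexicographically by their bytes, so a comparison that is decided by the first differing block, and otherwise by length, reproduces it. The sketch below is only a reference for that target order, assuming one byte per block. `blockwise_cmp` is a hypothetical name, the function is not constant-time, and it says nothing about how the chained-prefix scheme reaches the same answer over ciphertexts.

```rust
use std::cmp::Ordering;

/// Reference order for variable-length values (hypothetical helper).
/// The first differing block decides; if one input is a prefix of the
/// other, the shorter one sorts first.
fn blockwise_cmp(a: &[u8], b: &[u8]) -> Ordering {
    for (x, y) in a.iter().zip(b.iter()) {
        if x != y {
            return x.cmp(y);
        }
    }
    // All shared blocks are equal, so length breaks the tie.
    a.len().cmp(&b.len())
}
```

For any two strings `s` and `t`, `blockwise_cmp(s.as_bytes(), t.as_bytes())` should agree with `s.cmp(t)`, including for multi-byte UTF-8 input.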

What's in v2 (delivered via the stacked PRs)

The feature work lands as a stack that merges into this branch:

Crypto building blocks introduced along the way:

  • σ-MMO 1-bit hash (FixedPiZ2Hash): H(x,r) = LSB(π(σ(x)⊕r) ⊕ σ(x)⊕r) with π
    a fixed public-key AES-128 permutation (nothing-up-my-sleeve PI_KEY) and σ a
    GF(2¹²⁸) doubling — the BHKR/GKWY fixed-key-AES MMO construction. Security rests on
    AES as a public random permutation, so the comparator needs no key material.
  • AES-CMAC accumulator (NIST SP 800-38B validated): incremental prefix-block /
    final-block absorption for the variable-length scheme. Encryption-side only — the
    comparator never evaluates CMAC.
  • LemireFyPrp (fixed-draw Fisher–Yates, wide 64-bit draws + Lemire
    multiply-high): replaces the rejection-sampled Knuth shuffle for the new schemes,
    closing a timing channel where the rejection draw-count depended on the
    plaintext-derived seed. (Bit8 stays on the Knuth shuffle — wire-frozen; the channel
    is documented, not changed.)
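
The σ-MMO formula in the first bullet can be sketched with the standard library alone. This is a minimal illustration, not the branch's `FixedPiZ2Hash`: the names are hypothetical, `pi` is passed in as a closure standing in for the fixed-key AES-128 permutation, and the reduction polynomial (x^128 + x^7 + x^2 + x + 1) and bit order used for the doubling are assumptions that may differ from the real implementation.

```rust
/// Doubling in GF(2^128): multiply by x, then reduce.
/// Assumes bit i of the u128 is the coefficient of x^i and the
/// reduction polynomial is x^128 + x^7 + x^2 + x + 1 (0x87).
fn gf128_double(v: u128) -> u128 {
    // All-ones mask when the top bit is set, computed without a branch.
    let carry = 0u128.wrapping_sub(v >> 127);
    (v << 1) ^ (carry & 0x87)
}

/// H(x, r) = LSB(pi(y) ^ y) with y = sigma(x) ^ r.
/// `pi` is a stand-in for the fixed public-key AES-128 permutation.
fn sigma_mmo_bit(pi: impl Fn(u128) -> u128, x: u128, r: u128) -> u8 {
    let y = gf128_double(x) ^ r;
    ((pi(y) ^ y) & 1) as u8
}
```

The function takes no secret key, which matches the point that the comparator needs no key material. With a degenerate `pi` such as the identity, the output is always 0, so any real use would pass a closure that encrypts one block under the fixed public key.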
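
To make the LemireFyPrp timing argument concrete, here is a rough sketch of a fixed-draw Fisher-Yates shuffle. It is not the branch's code: `SplitMix64` is a deterministic stand-in for whatever keyed generator the real PRP uses, and all names are hypothetical. The point it shows is that each swap consumes exactly one 64-bit draw, reduced to the range with a multiply-high, so the number of draws depends only on the domain size and never on the seed.

```rust
/// SplitMix64, used only as a deterministic stand-in for a keyed PRG.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Fisher-Yates over 0..n using exactly n - 1 draws.
/// Returns the permutation and the number of draws consumed.
fn fixed_draw_permutation(n: usize, seed: u64) -> (Vec<u8>, usize) {
    assert!(n <= 256, "block domain must fit in a byte");
    let mut rng = SplitMix64(seed);
    let mut perm: Vec<u8> = (0..n).map(|i| i as u8).collect();
    let mut draws = 0;
    for i in (1..n).rev() {
        let r = rng.next_u64();
        draws += 1;
        // Multiply-high maps r into 0..=i with no rejection loop.
        let j = ((r as u128 * (i as u128 + 1)) >> 64) as usize;
        perm.swap(i, j);
    }
    (perm, draws)
}
```

Skipping the rejection step leaves a small bias, on the order of (i + 1) / 2^64 per draw, which is the reason for using wide 64-bit draws on a domain of at most 256 elements.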

Performance

The headline: encrypting a value is ~1.6× faster on the v1-compatible format, and
up to ~4.6× faster on the new Bit6 scheme. Comparison (the operation that runs
server-side at query time) was already fast and is essentially unchanged.

Same workload (encrypt a u64), same machine (Apple M1 Max), hardware AES on for
both, so this isolates the rewrite from the backend:

| | encrypt a u64 | vs v1 |
|---|---|---|
| v1 (Bit8) | ~39 µs | (baseline) |
| v2, same wire-compatible scheme (Bit8) | ~25 µs | ~1.6× |
| v2, new opt-in scheme (Bit6) | ~8.6 µs | ~4.6× |

Separately — hardware AES is now on by default. On Apple Silicon the aes crate
needs --cfg aes_armv8 for ARMv8 hardware AES; without it (the old default) every block
ran in software, ~60× slower per block — worth ~10× end to end. v2 sets it for workspace
builds (downstream crates must set it themselves, announced on release). So "old default
as it actually ran" (≈381 µs) vs v2 is ~15× (Bit8) / ~44× (Bit6) — but most of that first
~10× is the flag, not the rewrite, so it shouldn't be read as a speedup from the new code.
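
For downstream crates that need to set the flag themselves, one possible way is a Cargo configuration file. This is a sketch under assumptions: the file location and target triple below are illustrative, and whether the flag is still required depends on the `aes` crate version in use, so it is worth checking that version's documentation.

```toml
# .cargo/config.toml in the downstream workspace (illustrative location)
[target.aarch64-apple-darwin]
rustflags = ["--cfg", "aes_armv8"]
```

A `RUSTFLAGS` environment variable, if set, takes precedence over `rustflags` in the config file, so CI jobs that export `RUSTFLAGS` would need to include `--cfg aes_armv8` there as well.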

Comparison stays flat and fast: ~230 ns at Bit8, ~180 ns at Bit6, ~400 ns for a
23-block string — a constant-time prefix scan plus one hash eval and one oblivious block
read. The rewrite didn't target it; hardware AES alone took it from ~740 ns to ~230 ns.
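
The two data-independent access patterns named above, a prefix scan with no early exit and an oblivious block read, can be sketched as follows. This is an illustration with hypothetical names rather than the comparator on this branch, and it relies on branch-free integer arithmetic only. A production version would also need to guard against the compiler reintroducing branches.

```rust
/// Index of the first position where `a` and `b` differ, or `a.len()`
/// if they are equal. Every element is visited; there is no early exit.
fn first_diff_ct(a: &[u8], b: &[u8]) -> usize {
    assert_eq!(a.len(), b.len());
    let mut idx = a.len() as u64;
    let mut found = 0u64; // becomes all-ones after the first difference
    for (i, (&x, &y)) in a.iter().zip(b).enumerate() {
        let d = (x ^ y) as u64;
        // All-ones if d != 0, zero otherwise, without branching.
        let ne = 0u64.wrapping_sub((d | d.wrapping_neg()) >> 63);
        // Only the first difference is allowed to update idx.
        let take = ne & !found;
        idx = (idx & !take) | (i as u64 & take);
        found |= ne;
    }
    idx as usize
}

/// Read `table[index]` while touching every entry, so the memory
/// access pattern does not depend on `index`. Out-of-range reads yield 0.
fn oblivious_read(table: &[u8], index: usize) -> u8 {
    let mut out = 0u8;
    for (i, &v) in table.iter().enumerate() {
        let d = (i ^ index) as u64;
        // 1 if i == index, 0 otherwise, without branching.
        let eq = (((d | d.wrapping_neg()) >> 63) as u8) ^ 1;
        out |= v & 0u8.wrapping_sub(eq);
    }
    out
}
```

Because both loops always run over the full input, the cost is flat in the position of the first difference, which is consistent with the flat comparison timings quoted above.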

Criterion medians on a developer machine — indicative, not a regression gate. Full
baseline, per-PR deltas, the SIMD kernels, and the variable-length string scaling
(~0.69 µs/block) are in docs/benchmarks/.

Security reviews (carried on this branch)

docs/reviews/2026-06-16-*:

CI (test.yml) on this branch runs on every pull request (not only PRs into
main), so the whole stack gets GitHub checks while it targets feature branches.

Backwards compatibility

v1 (Bit8) is the wire-frozen default and is unchanged byte-for-byte (the #78
compatibility vectors pin it). v2 is purely additive — new schemes, new widths, new
header — selected explicitly. (A follow-up will split the scheme trait so wire-format
back-compat is enforced at the type level.)

Status

Draft. This is the aggregation point for #78–#83. Remaining gates before it can go
to main: the stack merges in, the A2/A3 CMAC + PRP-shape crypto review clears, and
the Bit6/chained wire vectors are pinned. Flip to ready once those land.

coderdan added 2 commits June 16, 2026 18:27

Review/validation artifacts from the v2 crypto review (zeroize audits for the
ZA-0001 fix and PRs 82/83, the constant-time v2-applicability analysis, and the
pl/pgSQL + pgcrypto validation of the v2 sigma-MMO comparison). ore-v2 is the
forthcoming v2 release/integration branch; the stack rebases on top of this.

Drops the `branches: [main]` filter on the pull_request trigger so stacked PRs
that target a feature/integration branch (the ore-v2 stack) run CI too. Lands on
ore-v2 first so the whole rebased stack inherits it.