ORE v2: variable-length & string ORE, v2 wire format, SIMD + constant-time hardening (integration branch) by coderdan · Pull Request #90 · cipherstash/ore.rs

coderdan · 2026-06-16T10:17:36Z

ORE v2 — the next generation of order-revealing encryption

This is the integration / forthcoming-release branch for the ORE v2 program. It
is the umbrella PR that the stacked feature PRs land into; once the stack is merged
and the remaining crypto-review gates clear, this branch ships the whole v2 change
set to main.

Architecture & rationale: docs/plans/2026-06-12-ore-v2-architecture.md.

Why v2

ORE v1 is the Lewi–Wu block-ORE scheme at an 8-bit block width (Bit8), fixed to
a single wire format and a single fixed-length numeric domain. v2 keeps that scheme
byte-for-byte (it stays the wire-frozen default) and builds a broader, harder, and
more flexible foundation around it:

Variable-length & string ORE. A new chained-prefix scheme encrypts
arbitrary-length values, not just fixed-width integers: strings (lexicographic
order matching str::cmp), with a path to u128/Decimal.

A real v2 wire format. A versioned header (scheme id + parameters) lets multiple
schemes and block widths coexist and evolve without ambiguity. v1 ciphertexts are
untouched; v2 is additive.

Block width as a leakage decision. Bit6 (6-bit blocks) is added alongside Bit8 as
an opt-in width. Width is a per-domain/per-deployment trade-off (leakage vs cost),
not a single global choice: default numerics stay on Bit8 (lower online prefix
leakage + wire compatibility), and Bit6 is opt-in for at-rest-dominated /
size-sensitive cases.

Performance. Hardware AES on aarch64, NEON + AVX2 kernels for the
right-ciphertext encoder, an allocation-free encrypt path, and a vectorised
GF(2¹²⁸) σ-doubling in the hash. Numbers below.

Security hardening, with receipts. A plaintext-dependent PRP timing channel
closed, the comparator made constant-time at cache-line (and MemJam sub-line)
granularity, key-derived material zeroized on drop, and a fixed-key-AES MMO hash
adopted for the 1-bit hash, each backed by a written review on this branch.
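To make the width trade-off concrete, here is a minimal sketch (not the crate's code) of splitting a u64 into most-significant-first w-bit blocks and finding the first block at which two values differ, which is what a block-ORE comparison reveals beyond the order itself. The helpers `split_blocks` and `first_diff_block` are hypothetical, and zero-padding the top block when 64 is not a multiple of w is an assumption about how Bit6 handles a u64.

```rust
/// Split `x` into ceil(64 / w) blocks of `w` bits, most significant block first.
/// Hypothetical helper: the top block is zero-padded when 64 % w != 0 (assumption).
fn split_blocks(x: u64, w: u32) -> Vec<u8> {
    assert!((1..=8).contains(&w), "block width must be 1..=8 bits");
    let n = (64 + w - 1) / w; // number of blocks
    let mask = (1u64 << w) - 1;
    (0..n)
        .rev() // block index n-1 holds the most significant bits
        .map(|i| ((x >> (i * w)) & mask) as u8)
        .collect()
}

/// Index of the first block at which `a` and `b` differ (None if they are equal).
/// In block ORE this index is the leakage on top of the order result.
fn first_diff_block(a: u64, b: u64, w: u32) -> Option<usize> {
    let (ba, bb) = (split_blocks(a, w), split_blocks(b, w));
    ba.iter().zip(&bb).position(|(x, y)| x != y)
}
```

With w = 8 the pair 0x00FF / 0x01FF first differs at block 6 of 8; with w = 6 it first differs at block 9 of 11. The narrower block pins the first differing bit to a 6-bit window instead of an 8-bit one, which is the extra prefix leakage the Bit8 default avoids.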

What's in v2 (delivered via the stacked PRs)

The feature work lands as a stack that merges into this branch:

#78 (1/n): compatibility vectors + bench baselines (pins v1 behaviour before the refactor)
#79 (2/n): core refactor (width abstraction, seed/tag separation, template RO keys)
#80 (3/n): efficient unary encoding + allocation-free encrypt + hardware AES on aarch64
#81 (4/n): NEON + AVX2 SIMD backends for the indicator / hash-bit masks
#82 (5/n): Bit6 6-bit block scheme + v2 wire header
#83 (6/n): chained-prefix variable-length scheme + string encryption (CMAC accumulator)

Crypto building blocks introduced along the way:

σ-MMO 1-bit hash (FixedPiZ2Hash): H(x,r) = LSB(π(σ(x)⊕r) ⊕ σ(x)⊕r), where π is
AES-128 under a fixed, public key (the nothing-up-my-sleeve PI_KEY) and σ is a
GF(2¹²⁸) doubling; this is the BHKR/GKWY fixed-key-AES MMO construction. Security
rests on AES behaving as a public random permutation, so the comparator needs no
key material.

AES-CMAC accumulator (NIST SP 800-38B validated): incremental prefix-block /
final-block absorption for the variable-length scheme. It is encryption-side only;
the comparator never evaluates CMAC.

LemireFyPrp (fixed-draw Fisher–Yates with wide 64-bit draws + Lemire
multiply-high): replaces the rejection-sampled Knuth shuffle for the new schemes,
closing a timing channel in which the number of rejection draws depended on the
plaintext-derived seed. (Bit8 stays on the Knuth shuffle because its output is
wire-frozen; the channel is documented, not changed.)
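A minimal sketch of the σ-MMO 1-bit hash, assuming 128-bit values are held as big-endian u128, that σ is doubling with reduction constant 0x87 (the same doubling CMAC uses for subkeys), and that "LSB" means bit 0 of the u128. The permutation π is passed in as a function so the sketch stays dependency-free; in FixedPiZ2Hash it is AES-128 under PI_KEY. The names `sigma_double` and `mmo_bit` are hypothetical, and the sketch is scalar, not the vectorised doubling the branch ships.

```rust
/// σ: multiply by x in GF(2^128) modulo x^128 + x^7 + x^2 + x + 1 (constant 0x87),
/// on a big-endian u128. The carry-out becomes a 0/1 multiplier, so there is no
/// data-dependent branch.
fn sigma_double(x: u128) -> u128 {
    let carry = x >> 127; // 0 or 1
    (x << 1) ^ (carry * 0x87)
}

/// σ-MMO 1-bit hash: H(x, r) = LSB(π(y) ⊕ y) with y = σ(x) ⊕ r.
/// `pi` stands in for the fixed-key AES-128 permutation (hypothetical signature).
fn mmo_bit(pi: impl Fn(u128) -> u128, x: u128, r: u128) -> u8 {
    let y = sigma_double(x) ^ r;
    ((pi(y) ^ y) & 1) as u8
}
```

Doubling fits the role of σ because both σ(x) = 2·x and σ(x) ⊕ x = 3·x are invertible linear maps, i.e. σ is a linear orthomorphism, which is the property the GKWY analysis asks of σ.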
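The value of an incremental CMAC accumulator is that the tags of every prefix of an n-block value cost a linear number of block-cipher calls rather than O(n²): the chaining value is kept running, and each prefix is finalised without disturbing it. Below is a minimal sketch of the SP 800-38B structure, assuming full 128-bit blocks held as big-endian u128 and a block cipher passed in as a plain function (AES-128 in practice). `CmacAcc` and `prefix_tags` are hypothetical names, and how the chained-prefix scheme actually maps its blocks onto CMAC input is not shown.

```rust
/// GF(2^128) doubling for CMAC subkeys (SP 800-38B, R = 0x87), big-endian u128.
fn dbl(x: u128) -> u128 {
    (x << 1) ^ ((x >> 127) * 0x87)
}

/// Incremental CMAC-style accumulator over full 128-bit blocks.
/// `enc(key, block)` stands in for the block cipher.
struct CmacAcc {
    enc: fn(u128, u128) -> u128,
    key: u128,
    k1: u128,
    k2: u128,
    state: u128,           // CBC chaining value over all blocks except the last
    pending: Option<u128>, // last block, held back until finalisation
}

impl CmacAcc {
    fn new(enc: fn(u128, u128) -> u128, key: u128) -> Self {
        let l = enc(key, 0); // L = E_K(0^128)
        let k1 = dbl(l);
        let k2 = dbl(k1);
        CmacAcc { enc, key, k1, k2, state: 0, pending: None }
    }

    /// Absorb one full block; the previously pending block is now known not to be last.
    fn absorb(&mut self, block: u128) {
        if let Some(prev) = self.pending.replace(block) {
            self.state = (self.enc)(self.key, self.state ^ prev);
        }
    }

    /// Tag over everything absorbed so far; leaves the running state untouched.
    fn finalize(&self) -> u128 {
        match self.pending {
            // Complete final block: XOR in K1.
            Some(last) => (self.enc)(self.key, self.state ^ last ^ self.k1),
            // Empty message: a single padded block 0x80 00..00, XOR K2.
            None => (self.enc)(self.key, self.state ^ (1u128 << 127) ^ self.k2),
        }
    }
}

/// Tags for every non-empty prefix of `blocks`, computed in one pass.
fn prefix_tags(enc: fn(u128, u128) -> u128, key: u128, blocks: &[u128]) -> Vec<u128> {
    let mut acc = CmacAcc::new(enc, key);
    blocks.iter().map(|&b| { acc.absorb(b); acc.finalize() }).collect()
}
```

Checking this against the SP 800-38B AES-128 example vectors would require swapping in a real AES implementation; with a stand-in cipher it only exercises the chaining and subkey logic.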

Performance

The headline: encrypting a value is ~1.6× faster on the v1-compatible format, and
up to ~4.6× faster on the new Bit6 scheme. Comparison — the operation that runs
server-side at query time — was already fast and is essentially unchanged.

Same workload (encrypt a u64), same machine (Apple M1 Max), hardware AES on for
both so this isolates the rewrite from the backend:

Variant	encrypt a u64	speedup vs v1
v1 (Bit8)	~39 µs	baseline
v2, same wire-compatible scheme (Bit8)	~25 µs	~1.6×
v2, new opt-in scheme (Bit6)	~8.6 µs	~4.6×
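A back-of-envelope count suggests where the Bit8-to-Bit6 gap comes from. In Lewi–Wu block ORE the right ciphertext holds one entry per possible value of each block, so encryption work scales with (number of blocks) × 2^w. Assuming a u64 is split into ⌈64/w⌉ blocks (an assumption about how Bit6 pads the top block) and that per-slot work dominates:

```latex
\begin{aligned}
\text{Bit8:}\quad & \lceil 64/8 \rceil \cdot 2^{8} = 8 \cdot 256 = 2048 \text{ slots} \\
\text{Bit6:}\quad & \lceil 64/6 \rceil \cdot 2^{6} = 11 \cdot 64 = 704 \text{ slots} \\
\text{ratio:}\quad & 2048 / 704 \approx 2.91
\end{aligned}
```

That ratio is in line with the measured ~25 µs / ~8.6 µs ≈ 2.9 between v2 Bit8 and v2 Bit6, though the match is suggestive rather than a derivation of the benchmark. If each slot stores one output bit of the 1-bit hash, the same count also gives the right-ciphertext payload size (2048 vs 704 bits, excluding nonces and headers), which is why Bit6 suits size-sensitive cases.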

Separately, hardware AES is now on by default. On Apple Silicon the aes crate
needs --cfg aes_armv8 to use ARMv8 hardware AES; without it (the old default) every
block ran in software, ~60× slower per block, which costs ~10× end to end. v2 sets the
flag for workspace builds (downstream crates must set it themselves; this will be
announced on release). Measured against the old default as it actually ran (≈381 µs),
v2 is ~15× faster (Bit8) / ~44× faster (Bit6), but most of that first ~10× comes from
the flag, not the rewrite, so it shouldn't be read as a speedup from the new code.

Comparison stays flat and fast: ~230 ns at Bit8, ~180 ns at Bit6, and ~400 ns for a
23-block string. Each comparison is a constant-time prefix scan plus one hash evaluation
and one oblivious block read. The rewrite didn't target comparison; hardware AES alone
took it from ~740 ns to ~230 ns.
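The oblivious block read is the step a cache-timing attacker would target: indexing a right-ciphertext block directly at a secret-dependent offset touches a secret-dependent cache line. Here is a minimal sketch of the standard countermeasure, in the spirit of the ct_select_byte hardening mentioned in the reviews but not its actual code: read every byte and keep the wanted one with an arithmetic mask. The name `oblivious_read` is hypothetical, and plain Rust gives no guarantee that the optimiser preserves constant-time behaviour; production code typically adds barriers such as `core::hint::black_box` or uses a crate like `subtle`.

```rust
/// Return table[idx] (or 0 if idx is out of range) while reading every element,
/// so the memory access pattern does not depend on `idx`.
/// Hypothetical helper; not the crate's ct_select_byte.
fn oblivious_read(table: &[u8], idx: usize) -> u8 {
    let mut out = 0u8;
    for (i, &b) in table.iter().enumerate() {
        let diff = (i ^ idx) as u64;
        // (diff | -diff) has its top bit set iff diff != 0.
        let ne = (diff | diff.wrapping_neg()) >> 63; // 1 if i != idx, else 0
        let mask = (ne as u8).wrapping_sub(1); // 0xFF if i == idx, else 0x00
        out |= b & mask;
    }
    out
}
```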

Criterion medians on a developer machine — indicative, not a regression gate. Full
baseline, per-PR deltas, the SIMD kernels, and the variable-length string scaling
(~0.69 µs/block) are in docs/benchmarks/.

Security reviews (carried on this branch)

docs/reviews/2026-06-16-*:

Zeroize audits (Trail of Bits zeroize-audit) of #79 / #82 / #83. ZA-0001
(legacy Aes128Prng keystream not wiped on drop) was found and fixed (#84, already
on main) and is inherited here; the new v2 code is clean.

Constant-time analysis (#79 sweep, with v2 applicability): the two genuinely
actionable legacy findings, encrypt-timing (CT-1) and the compare-side cache leak
(CT-4), do not apply to v2 (the LemireFyPrp switch and ct_select_byte hardening
were built to close exactly those). What remains in v2 (CT-2/CT-3) is structural
but mitigated, down to the MemJam sub-line residual.

pl/pgSQL comparator validation: the v2 σ-MMO comparison is implementable in
pl/pgSQL + pgcrypto (12/12 vectors match Rust). This matters because the interim
server-side comparator can't depend on CMAC, and it doesn't: CMAC is
encryption-side only.
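CT-1 comes from rejection sampling: a shuffle that redraws whenever a random word falls in the biased tail consumes a seed-dependent number of draws, so its running time leaks information about the plaintext-derived seed. Below is a minimal sketch of the fixed-draw alternative that LemireFyPrp is described as using: Fisher–Yates with exactly one 64-bit draw per position, mapped into range with Lemire's multiply-high (`(w · bound) >> 64`) and no rejection step. The `next_u64` closure stands in for the seeded keystream, and `fixed_draw_shuffle` is a hypothetical name, not the crate's API. Skipping rejection leaves a statistical distance below bound/2^64 from uniform per draw, the price of a draw count that depends only on n.

```rust
/// Fisher–Yates shuffle of 0..n using exactly n - 1 draws.
/// Each draw maps a 64-bit word into 0..=i with Lemire's multiply-high and no
/// rejection, so the number of draws depends only on `n`, not on the seed.
/// Note: the swap index is still secret; this sketch only removes the
/// draw-count channel.
fn fixed_draw_shuffle(n: usize, mut next_u64: impl FnMut() -> u64) -> Vec<usize> {
    let mut perm: Vec<usize> = (0..n).collect();
    for i in (1..n).rev() {
        let bound = (i + 1) as u128;
        // (w * bound) >> 64 lies in 0..bound for any 64-bit w.
        let j = ((next_u64() as u128 * bound) >> 64) as usize;
        perm.swap(i, j);
    }
    perm
}
```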

CI (test.yml) on this branch runs on every pull request (not only PRs into
main), so the whole stack gets GitHub checks while it targets feature branches.

Backwards compatibility

v1 (Bit8) is the wire-frozen default and is unchanged byte-for-byte (the #78
compatibility vectors pin it). v2 is purely additive — new schemes, new widths, new
header — selected explicitly. (A follow-up will split the scheme trait so wire-format
back-compat is enforced at the type level.)

Status

Draft. This is the aggregation point for #78–#83. Remaining gates before it can go
to main: the stack merges in, the A2/A3 CMAC + PRP-shape crypto review clears, and
the Bit6/chained wire vectors are pinned. Flip to ready once those land.

Review/validation artifacts from the v2 crypto review (zeroize audits for the ZA-0001 fix and PRs 82/83, the constant-time v2-applicability analysis, and the pl/pgSQL + pgcrypto validation of the v2 sigma-MMO comparison). ore-v2 is the forthcoming v2 release/integration branch; the stack rebases on top of this.

Drops the `branches: [main]` filter on the pull_request trigger so stacked PRs that target a feature/integration branch (the ore-v2 stack) run CI too. Lands on ore-v2 first so the whole rebased stack inherits it.

coderdan added 2 commits June 16, 2026 18:27

ci: run tests on all pull requests, not only those targeting main

8c60539


coderdan wants to merge 2 commits into main from ore-v2