ORE v2 (6/n): chained-prefix variable-length scheme + string encryption #83

Draft
coderdan wants to merge 5 commits into feat/ore-v2-bit6 from feat/ore-v2-chained

@coderdan

Stacked on #82. Implements PR 6 of the ORE v2 program: the chained-prefix, variable-length scheme that enables string encryption (and lifts the fixed-N 14-block packed-prefix cap).

Design (preliminary crypto sign-off given; detailed review in progress)

  • Plan: docs/plans/2026-06-12-ore-v2-architecture.md §5(b)
  • A2 design spec: docs/plans/2026-06-15-ore-v2-cmac-accumulator-spec.md, the precise construction this implementation was reviewed against (injective encoding, security argument, open questions). Both documents live in the commits of the base PR #82 (ORE v2 (5/n): 6-bit block scheme + v2 wire format); this PR adds the implementation, a benchmark-results doc, and the PR-6 status note.

What

  • primitives/cmac.rs — incremental AES-CMAC accumulator (NIST SP 800-38B). absorb (CBC prefix) / finalize (E_k(S⊕F⊕K1)); injective prefix/final block encoders; K1 reuses the vectorized gf128_double. Validated against the NIST AES-128 CMAC test vectors.
  • LemireFyPrp::from_stream (shape ii) — PRP keyed directly from the accumulator's PRP_STREAM branch, so there is no per-block AES key schedule. new() was refactored to call it, keeping the Bit6 vectors byte-identical.
  • scheme/chained.rs (OreAes128Bit6Chained): encrypt_var / encrypt_left_var / encrypt_str / encrypt_left_str; Vec-backed Var{Left,Right,CipherText} with the v2 header (scheme id 0x03, u16 block count); compare_raw_slices, which does a constant-time prefix scan with lexicographic semantics (shorter sorts first), reusing the BHKR σ-MMO H and the oblivious compare read.
    • One secret key (the accumulator key, derived from k1 by a labelled AES call), subsuming the fixed-N prf1/prf2 via branch tags.
    • No total-length binding in the derivation, so prefix-sharing strings of different lengths compare correctly (spec §7); length comparability is enforced at the comparator.
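To make the "lexicographic, shorter sorts first" rule concrete, here is a plaintext-level model of the ordering the comparator is meant to reproduce. It is only a sketch: `compare_blocks_model` is a hypothetical name, it works on plaintext 6-bit blocks rather than ciphertexts, and while it avoids early exit to mirror the shape of a prefix scan, it makes no timing guarantees of its own.

```rust
use std::cmp::Ordering;

/// Plaintext model of the ordering rule: compare block sequences
/// lexicographically, with a strict prefix sorting before the longer input.
/// The scan visits every shared position and never exits early.
fn compare_blocks_model(a: &[u8], b: &[u8]) -> Ordering {
    let shared = a.len().min(b.len());
    // 0 = no difference seen yet, 1 = a < b, 2 = a > b at the first difference.
    let mut verdict: u8 = 0;
    for i in 0..shared {
        let lt = (a[i] < b[i]) as u8;
        let gt = (a[i] > b[i]) as u8;
        let here = lt | (gt << 1);
        // Keep an earlier verdict if one exists; otherwise take this position's.
        let undecided = (verdict == 0) as u8; // 1 or 0
        let mask = undecided.wrapping_neg(); // 0xFF or 0x00
        verdict |= here & mask;
    }
    match verdict {
        1 => Ordering::Less,
        2 => Ordering::Greater,
        // All shared blocks are equal: lengths decide, so the shorter input sorts first.
        _ => a.len().cmp(&b.len()),
    }
}
```

The result should agree with the standard slice ordering, which is what the cross-length tests check at the string level against str::cmp.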

Tests

Cross-length lexicographic order vs str::cmp (incl. shared prefixes and >14-block strings), equality across nonces, serialize round-trip, empty string sorts first, cross-scheme byte rejection; plus the CMAC NIST KAT and accumulator unit tests. All local CI gates green (fmt, clippy -D warnings, full test).

Performance (Apple M1 Max — docs/benchmarks/2026-06-15-chained-results.md)

| op | input | time |
|---|---|---|
| encrypt | "alice" (7 blocks) | ~5.05 µs |
| encrypt | "alice@example.com" (23 blocks) | ~15.97 µs |
| encrypt | 43-char sentence (58 blocks) | ~40.18 µs |
| encrypt-left | 17 chars | ~10.09 µs |
| compare | 17 chars | ~402 ns |

~0.69 µs/block, scaling linearly — and below fixed-N Bit6's ~0.81 µs/block, because the accumulator is keyed once per ciphertext (the shape-(ii) no-key-schedule win, A3).
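The block counts in the table are consistent with packing 8-bit bytes into 6-bit blocks, i.e. ceil(8 × len / 6). A small sketch of that arithmetic follows; the function names are hypothetical and the packing rule is inferred from the reported counts, not taken from the implementation.

```rust
/// Number of 6-bit blocks needed for `byte_len` bytes: ceil(8 * byte_len / 6).
/// Assumption: bytes are packed into 6-bit blocks with no extra framing blocks.
fn block_count(byte_len: usize) -> usize {
    (byte_len * 8 + 5) / 6
}

/// Average cost per block in microseconds for a measured total time.
fn micros_per_block(total_micros: f64, byte_len: usize) -> f64 {
    total_micros / block_count(byte_len) as f64
}
```

With the figures above, 17 bytes gives 23 blocks and 15.97 / 23 ≈ 0.69 µs/block; 43 bytes gives 58 blocks and 40.18 / 58 ≈ 0.69 µs/block. The 5-byte case comes out slightly higher (5.05 / 7 ≈ 0.72 µs/block).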

Not in this PR

  • u128/i128/Decimal via the const-N + accumulator path (fixed-length, >14 blocks).
  • Pinned wire vectors — await final A2 sign-off (as Bit6 vectors awaited A1).
  • chrono Bit6 impls (independent follow-up).

Review notes

The A2 spec has 5 open questions (§11) — key-derivation slot, width-vs-scheme-id tag, shape-(ii) confirmation, birthday ceiling, u128/Decimal comparability. None block the working implementation; they shape the final freeze.

coderdan added a commit that referenced this pull request Jun 16, 2026
…nit, left-query comparator (#83 review)

Review follow-ups on the chained scheme:

- zeroize: wipe L = E_k(0) in CmacAccumulator::new (K1's source; spec section 9)
  and the per-block `ro` RO-tag buffer in encrypt_var (matches the bit2_w6
  scratch wipe). Existing Drops (k1/state/k_acc/stream) were already correct.
- Replace the BRANCH_* u8 consts + a would-be debug_assert with a non-zero
  enum Branch { RoKey = 1, PrpStream = 2 }; final_block takes Branch, so a final
  block's byte 0 can never be 0x00 and collide with a prefix block — the
  injectivity separator is now a compile-time guarantee.
- count > u16::MAX returns OreError::TooManyBlocks (new variant) in both
  encrypt_var and encrypt_left_var, instead of a debug_assert.
- Single-key init: the chained scheme is single-key by design (the CMAC
  accumulator derives every per-block secret from one key via branch tags), so
  drop the vestigial _k2 parameter — init(k1). (Bit6 still uses two PRF keys.)
- Add compare_left_to_full: the Lewi-Wu query path. The comparator core is
  factored into compare_views(left_view, full); compare_raw_slices (full,full)
  and compare_left_to_full (left,full) both call it, so the left-only artifact
  from encrypt_left_* is usable as a query against a stored ciphertext. Adds
  prop_left_query_matches_full + an artifact-rejection unit test.

No wire-format change.
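A sketch of why the non-zero Branch enum gives the separator for free. Only the enum and its discriminants come from the commit message; the 16-byte layouts, `PREFIX_TAG`, and the 15-byte payload are hypothetical stand-ins, since the real block encoders are defined by the A2 spec.

```rust
/// Branch tags for the final block. Neither discriminant is zero.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
enum Branch {
    RoKey = 1,
    PrpStream = 2,
}

/// Hypothetical: byte 0 of every prefix block.
const PREFIX_TAG: u8 = 0x00;

/// Hypothetical 16-byte prefix-block layout; only byte 0 matters for this sketch.
fn prefix_block(payload: &[u8; 15]) -> [u8; 16] {
    let mut out = [0u8; 16];
    out[0] = PREFIX_TAG;
    out[1..].copy_from_slice(payload);
    out
}

/// Hypothetical 16-byte final-block layout. Because the tag is a `Branch`,
/// the caller cannot pass 0x00, so byte 0 is always 1 or 2.
fn final_block(branch: Branch, payload: &[u8; 15]) -> [u8; 16] {
    let mut out = [0u8; 16];
    out[0] = branch as u8;
    out[1..].copy_from_slice(payload);
    out
}
```

Under these assumptions a final block and a prefix block differ in byte 0 for any payload, and the type system rules out the invalid tag where a u8 const would have needed a runtime check.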
@coderdan

Review: code / constant-time / zeroization

Ran the multi-agent code review plus the Trail of Bits constant-time-testing and zeroize-audit skills. The chained scheme is functionally correct (comparator, injective encoding, accumulator threading, wire format all verified, with property tests as the oracle). Findings and fixes (applied in f498990):

Applied

  • Left-vs-right query path (was missing): compare_raw_slices required both operands to be full ciphertexts, so the encrypt_left_* artifact was uncomparable. Added compare_left_to_full(left, full) — the Lewi-Wu query path — by factoring the comparator core into compare_views(left_view, full); both entry points share it. New prop_left_query_matches_full + artifact-rejection test.
  • Single-key init: the chained scheme is single-key by design (the CMAC accumulator derives every per-block secret from one key via branch tags), so the vestigial _k2 param is gone — init(k1). (Bit6 still uses two PRF keys; the differing key arity between fixed and chained is being discussed separately.)
  • count > u16::MAX now returns OreError::TooManyBlocks (both encrypt paths), not a debug_assert.
  • final_block branch is now a non-zero enum Branch (compile-time guarantee that a final block's byte 0 ≠ 0x00, so it can never collide with a prefix block) — replacing a would-be debug_assert.
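For illustration, the u16 block-count check can be expressed as a fallible conversion. This is a minimal sketch, not the PR's code: `encode_header` is a hypothetical helper, and the three-byte layout (scheme id followed by a big-endian count) is an assumption, since the real v2 header's field order and byte order are not shown here.

```rust
use std::convert::TryFrom;

#[derive(Debug, PartialEq, Eq)]
enum OreError {
    TooManyBlocks,
}

/// Scheme id of the chained scheme, as stated in the PR description.
const SCHEME_ID_CHAINED: u8 = 0x03;

/// Hypothetical header layout: [scheme id, block count as big-endian u16].
/// A count above u16::MAX is reported as an error in every build profile.
fn encode_header(block_count: usize) -> Result<[u8; 3], OreError> {
    let count = u16::try_from(block_count).map_err(|_| OreError::TooManyBlocks)?;
    let [hi, lo] = count.to_be_bytes();
    Ok([SCHEME_ID_CHAINED, hi, lo])
}
```

Unlike a debug_assert, the error path is still present in release builds, so an oversized input cannot silently truncate the count.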

Constant-time

Inherits Bit6's posture: CT-1 fixed (from_stream is fixed-draw), CT-2/CT-3 mitigated to MemJam sub-line (align-64, N=64), comparator prefix scan is constant-time; CMAC adds no data-dependent control flow. Two carry-overs: the post-ct_select_byte bit extract still uses a secret-amount shift (resolves on rebase onto #82's ct_bit), and the l-indexed block fetch leaks the common-prefix length (in-model online leakage per plan §5b).
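To illustrate the secret-amount shift carry-over: one way to read a bit at a secret index is to visit all eight positions with public shift amounts and mask in the matching one. `ct_bit_sketch` below is a hypothetical stand-in, not #82's ct_bit (whose implementation is not shown here), and source-level branchlessness does not by itself guarantee constant-time machine code.

```rust
/// Returns bit `index` (0 = least significant) of `byte` as 0 or 1 without
/// shifting by the secret `index`. Every position is visited, and each shift
/// amount is the public loop counter.
fn ct_bit_sketch(byte: u8, index: u8) -> u8 {
    let mut out = 0u8;
    for pos in 0u8..8 {
        // diff == 0 exactly when pos == index.
        let diff = (pos ^ index) as u16;
        // diff - 1 wraps to 0xFFFF only when diff == 0, so bit 8 is set only on a match.
        let is_match = (diff.wrapping_sub(1) >> 8) as u8 & 1;
        let bit = (byte >> pos) & 1;
        out |= bit & is_match;
    }
    out
}
```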

Zeroization

Two new gaps found and fixed: L = E_k(0) in CmacAccumulator::new (K1's source) and the per-block ro RO-tag buffer in encrypt_var are now zeroized. All existing Drops (k1/state/k_acc/stream, from_stream) were already correct. Only other finding is the pre-existing ZA-0001 (fixed on #84). Full write-up: docs/reviews/2026-06-16-zeroize-audit-pr83.md.
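As a generic illustration of what wiping such an intermediate involves, here is a std-only sketch using volatile writes and a compiler fence. `wipe` and `Scratch16` are hypothetical names; this is not the code applied in this PR, which is described in the audit write-up.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Overwrites `buf` with zeros using volatile writes so the stores are not
/// optimized away as dead writes.
fn wipe(buf: &mut [u8]) {
    for byte in buf.iter_mut() {
        // SAFETY: `byte` is a valid, aligned, exclusive reference to a u8.
        unsafe { ptr::write_volatile(byte, 0) };
    }
    // Prevent the compiler from reordering later accesses before the wipe.
    compiler_fence(Ordering::SeqCst);
}

/// Hypothetical holder for a 16-byte intermediate such as L = E_k(0).
struct Scratch16([u8; 16]);

impl Drop for Scratch16 {
    fn drop(&mut self) {
        wipe(&mut self.0);
    }
}
```

Tying the wipe to Drop means every exit path, including early returns on error, clears the buffer.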

Open (not blocking)

No pinned wire vectors yet (deliberately deferred to A2 sign-off, which is the freeze gate); a few low-severity test gaps are noted in the review.

coderdan added 5 commits June 16, 2026 19:51
…PR 6 foundation)

- primitives/cmac.rs: incremental AES-CMAC (NIST SP 800-38B) accumulator per
  the A2 spec — absorb (CBC prefix) / finalize (E_k(S^F^K1)); prefix/final
  block encoders with the injective byte layout; K1 reuses gf128_double.
  Validated against the NIST AES-128 CMAC test vectors.
- prp.rs: factor LemireFyPrp::from_stream (shape ii) out of new(); new() now
  calls it, so Bit6 vectors stay byte-identical. Lets the chained scheme key
  the PRP from the accumulator stream with no per-block AES key schedule.
- hash.rs: gf128_double_u128 -> pub(crate) (shared by H and CMAC subkeys).
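For reference, the CMAC subkey step that the doubling helper serves can be sketched in scalar form. This is an illustration of the SP 800-38B rule only: `gf128_double_sketch` and `cmac_subkeys` are hypothetical names, and the crate's own gf128_double_u128 may differ in representation.

```rust
/// Doubling in GF(2^128) with the CMAC reduction constant 0x87
/// (polynomial x^128 + x^7 + x^2 + x + 1), treating the block as a
/// big-endian 128-bit integer.
fn gf128_double_sketch(x: u128) -> u128 {
    // All-ones when the top bit is set, zero otherwise (no branch on the value).
    let carry_mask = 0u128.wrapping_sub(x >> 127);
    (x << 1) ^ (carry_mask & 0x87)
}

/// Subkeys K1 and K2 derived from L = E_k(0^128), per SP 800-38B.
fn cmac_subkeys(l: u128) -> (u128, u128) {
    let k1 = gf128_double_sketch(l);
    let k2 = gf128_double_sketch(k1);
    (k1, k2)
}
```

For the published AES-128 example key 2b7e1516 28aed2a6 abf71588 09cf4f3c, L is 7df76b0c 1ab899b3 3e42f047 b91b546f, and the doubling yields K1 = fbeed618 35713366 7c85e08f 7236a8de and K2 = f7ddac30 6ae266cc f90bc11e e46d513b.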
…on (PR 6)

OreAes128Bit6Chained: variable-length BlockORE over 6-bit blocks, driving
the AES-CMAC accumulator (A2 spec). Per block: PRP keystream from the
accumulator's PRP_STREAM branch via LemireFyPrp::from_stream (shape ii, no
per-block key schedule); ro_keys from the RO_KEY branch; left tag
f[n]=ro(n,xt[n]); right block = H-mask XOR indicator (reuses the BHKR
sigma-MMO H + oblivious compare read). Vec-backed Var{Left,Right,CipherText}
with the v2 header (scheme id 0x03, u16 block count).
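A rough sketch of how a permutation over the 64-value block domain could be drawn from a keystream with a fixed number of draws. This assumes, going by the name LemireFyPrp, a Fisher-Yates shuffle with Lemire-style multiply-shift reduction; `permutation_from_stream`, the 4-byte draw width, and the 252-byte stream length are all hypothetical, and the real from_stream may consume the stream differently.

```rust
/// Builds a permutation of 0..64 by Fisher-Yates, consuming exactly 63 draws
/// of 4 bytes each from `stream`. The draw count is fixed: there is no
/// rejection sampling, at the cost of a very small bias per draw.
fn permutation_from_stream(stream: &[u8; 252]) -> [u8; 64] {
    let mut perm = [0u8; 64];
    for (i, slot) in perm.iter_mut().enumerate() {
        *slot = i as u8;
    }
    for step in 0..63usize {
        let i = 63 - step; // i runs 63, 62, ..., 1
        let chunk = &stream[step * 4..step * 4 + 4];
        let draw = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        // Multiply-shift reduction: maps a 32-bit draw into 0..=i.
        let j = ((draw as u64 * (i as u64 + 1)) >> 32) as usize;
        perm.swap(i, j);
    }
    perm
}
```

Because the stream is supplied by the caller, nothing in this shape needs its own AES key schedule, which is the property the shape-(ii) description relies on.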

- encrypt_var / encrypt_left_var / encrypt_str / encrypt_left_str.
- compare_raw_slices: constant-time prefix scan, lexicographic (shorter
  sorts first); rejects mismatched scheme/version/length.
- No total-length binding in the derivation, so prefix-sharing strings of
  different lengths compare correctly (spec section 7).
- Tests: cross-length lexicographic order, equality across nonces, >14-block
  plaintexts, roundtrip, empty string, cross-scheme rejection.
- benches/chained.rs.

Perf (M1 Max): ~0.69 us/block (encrypt), below fixed-N Bit6's ~0.81 us/block
-- the shape-ii no-key-schedule win. compare ~402ns.
Add docs/benchmarks/2026-06-15-chained-results.md (~0.69 us/block, below
fixed-N Bit6's ~0.81 -- the shape-ii no-key-schedule win) and a PR 6 status
note in the plan roadmap.
…cy, and CMAC accumulator

Adds quickcheck properties matching the crate's existing idiom:

scheme/chained.rs (6 props, lengths capped at 32 blocks so most cases
exceed the 14-block fixed-N cap):
- block-level order == blocks.cmp over the full 6-bit domain
- shared-prefix order (drives the constant-time prefix scan)
- string order == str::cmp across arbitrary Unicode and lengths
- equality-across-nonces with byte divergence
- left ct is deterministic and byte-identical to the full ct's left half
- serialize round-trip

primitives/cmac.rs (3 props):
- incremental absorb/finalize == from-scratch CMAC for any block count
  (NIST vectors only cover t in {1,4})
- final/prefix block encodings are injective (tuple recoverable)

All pass at 4000 iterations each; clippy -D warnings and fmt clean.