ORE v2 (5/n): 6-bit block scheme + v2 wire format #82
Make explicit that the chained scheme has exactly one secret key (k), producing both branches via the branch tag; branch-tag domain separation under a good PRF is equivalent to independent per-branch keys but cheaper (one key schedule + one subkey pair). Note H's pi is public (not secret) and the nonce is not a key; contrast with fixed-N (#82, two keys) and the init(k1,k2) API (k2 redundant here); flag the single-key choice for explicit sign-off.
…ebug (#82 review)
Code-review follow-ups on the Bit6 / v2-wire PR:
- compare_raw_slices: reject a degenerate count=0 header. No OreEncrypt path produces zero blocks; without this, two crafted 0-block ciphertexts (header + nonce only) compare Equal because the scan loop never runs. (C2)
- Deduplicate encode_right_block: make the bit2 helper generic over the hash (encode_right_block<W: BlockWidth, H: Hash>) and have the Bit6 scheme call it instead of keeping a near-verbatim copy. (D1)
- Add width::ct_bit and route all four get_bit sites through it: extract the target bit with constant shift amounts + a constant-time select, instead of "byte >> (bit % 8)" (a shift by a secret amount, constant-time on x86_64/aarch64 but not guaranteed on every target). Pairs with ct_select_byte for a fully oblivious, data-independent bit read. (D3 mitigation)
- Replace #[derive(Debug)] on OreAes128 / OreAes128Bit6 with an explicit opaque Debug impl (finish_non_exhaustive) so key material can never be rendered. (Used a manual impl rather than vitaminc::OpaqueDebug: vitaminc is a 0.2.0 pre-release and ore-rs is a published crate; see PR discussion.)
No wire-format change: bit2 and bit6 compat vectors remain byte-identical.
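A minimal sketch of the ct_bit idea: every shift uses a public loop counter, and the secret bit index only feeds a branch-free equality mask. The names `ct_eq_mask` and `ct_bit(byte, bit)` are illustrative assumptions, not the crate's actual width::ct_bit signature, and a production version would also need to keep the optimiser from turning the mask back into a branch or a variable shift.

```rust
/// Branch-free equality mask: 0xFF if a == b, else 0x00.
fn ct_eq_mask(a: u8, b: u8) -> u8 {
    let x = (a ^ b) as u32;
    // For 0 < x <= 255, x | x.wrapping_neg() has bit 31 set; for x == 0 it is 0.
    let nonzero = ((x | x.wrapping_neg()) >> 31) as u8; // 1 iff a != b
    (nonzero ^ 1).wrapping_neg()
}

/// Extract bit (bit % 8) of `byte` without shifting by the secret amount.
fn ct_bit(byte: u8, bit: u8) -> u8 {
    let target = bit & 7;
    let mut out = 0u8;
    for i in 0..8u8 {
        // The shift amount is the public loop counter, never the secret index.
        out |= ((byte >> i) & 1) & ct_eq_mask(i, target);
    }
    out
}
```

The cost is eight masked candidates per read instead of one shift, which is the trade the commit makes for targets where variable shifts are not guaranteed constant-time.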
Force-pushed from 5692c8e to 7458510.
Review: constant-time & zeroization
Ran the Trail of Bits …
Constant-time
The Bit6 path delivers the intended hardening (verified against the legacy bit2 baseline):
No new timing channels introduced; the σ-MMO hash is branch-free over constant-time AES.
Zeroization
Clean: no new gaps. All new key-derived material zeroizes on drop, and the compiler/IR phase confirms the wipes survive …
The only finding is the pre-existing ZA-0001 (legacy …
Full write-ups:
Implements the (Bit6, packed prefix, fixed-N) scheme from the v2 plan:
- scheme::bit2_w6::OreAes128Bit6{,ChaCha20}: 6-bit blocks (domain 64),
64 RO evaluations per block (vs 256), 8-byte right blocks (vs 32).
u64 ciphertexts are 295 bytes (vs 408) and encrypt in ~11.5us on M1
Max (vs 25.1us for the optimized legacy scheme).
- v2 wire header (version || scheme_id || block_count u16 BE) via a new
OreCipher::WIRE_HEADER associated const, None for the legacy scheme
(whose bytes are unchanged — vectors pass). One header per serialised
artifact; parse and compare validate version, scheme and count, so
cross-scheme and cross-shape comparisons fail loudly (tested).
- Block count bound into byte 15 of all PRF inputs (left tags, RO keys,
PRP seeds) for domain separation across plaintext shapes under shared
keys; fixes the repeated-PRP-seed behaviour for this scheme.
- KnuthShufflePRP generalised over domain via macro (256 wire-frozen,
64 new); NEON 64-lane indicator kernel added.
- Z2 hash: fixed-pi MMO construction (plan section 6, option 3) — both
this and the legacy construction are implemented; the scheme picks via
a one-line type alias. PENDING CRYPTO REVIEW; wire format not frozen,
no vectors pinned yet, doc comments warn against storing ciphertexts.
- OreEncrypt impls for bool/u8..u64/i8..i64/char/f32/f64 with MSB-first
6-bit decomposition and compile-time block-count assertions. u128/
i128/Decimal exceed the 14-block packed-prefix cap and stay legacy-
only until the chained prefix lands. BREAKING-ish: the previous
blanket impls over T: OreCipher are now scheme-specific (coherence
with per-scheme block counts); code generic over OreCipher that
called .encrypt() must name a scheme.
Part of the ORE v2 program (docs/plans/2026-06-12-ore-v2-architecture.md, PR 5).
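To make the MSB-first 6-bit decomposition concrete, here is a sketch for u64, assuming the leftover 4 bits (64 = 10·6 + 4) sit in the first, most significant block. The crate may pad differently; any MSB-first layout with fixed block boundaries preserves order, because lexicographic comparison of the block arrays then matches integer comparison. `U64_BLOCKS`, `MAX_BLOCKS` and `decompose_u64` are hypothetical names; the `const _: () = assert!(..)` line imitates the compile-time block-count assertions described above.

```rust
const BLOCK_BITS: u32 = 6;
/// ceil(64 / 6) = 11 blocks for a u64.
const U64_BLOCKS: usize = ((64 + BLOCK_BITS - 1) / BLOCK_BITS) as usize;
/// Hypothetical name for the 14-block packed-prefix cap.
const MAX_BLOCKS: usize = 14;
// Fails to compile if the type needs more blocks than the cap allows.
const _: () = assert!(U64_BLOCKS <= MAX_BLOCKS);

/// MSB-first 6-bit decomposition; block 0 carries the top 4 bits.
fn decompose_u64(x: u64) -> [u8; U64_BLOCKS] {
    let mut out = [0u8; U64_BLOCKS];
    for (i, block) in out.iter_mut().enumerate() {
        // Block i covers bits [shift, shift + 6) counted from the LSB.
        let shift = BLOCK_BITS * (U64_BLOCKS - 1 - i) as u32;
        *block = ((x >> shift) & 0x3F) as u8;
    }
    out
}
```

Under the same layout a u128 would need ceil(128/6) = 22 blocks and trip the same assertion, which is consistent with u128/i128/Decimal staying legacy-only until the chained prefix lands.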
…ot rejected at q=N
Replaces Bit6's rejection-sampled Knuth shuffle with LemireFyPrp: a Fisher-Yates shuffle driven by a fixed count of wide (64-bit) draws, each reduced to range by Lemire multiply-high ((x * range) >> 64).
Why it matters beyond speed: the Knuth shuffle's rejection sampling does a seed-dependent number of draws, and the seed is PRF(plaintext prefix), so PRP construction time was weakly plaintext-dependent, a timing side channel. Fixed-count draws make construction time seed-independent and branch-free, closing it.
Uniformity is provable: each Lemire reduction is within range/2^64 of uniform, so the permutation is within < 2^-55 of a uniformly random permutation (the object Lewi-Wu models). This is a pure statistical term; no new assumption is needed.
Scope:
- Bit6 only. Bit8 (legacy) stays on the Knuth shuffle: its byte-exact output is wire-frozen, so the timing channel there is documented in the plan rather than fixed (a fix would change ciphertext bytes).
- Seed-keyed shape (i): drops into the existing Prp::new(seed) with no architectural change. Bit6 u64 encrypt 11.5us -> 8.6us.
- The remaining win to ~3.3us (pre-scheduled stream, shape ii, with no per-block AES key schedule) is deferred to PR 6, where the CMAC accumulator can emit the PRP keystream as a branch family under the same crypto review. Documented in the plan + bench doc.
Tests: permutation validity + bidirectional round-trip over 32 seeds, determinism, short-key rejection, and indicator-mask-vs-reference quickcheck (guards the gt_mask_xor_64 kernel).
Plan open-question 1 and bench results updated.
Part of the ORE v2 program (docs/plans/2026-06-12-ore-v2-architecture.md, PR 5).
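A sketch of the fixed-draw Fisher-Yates construction, assuming the draws come from a caller-supplied stream of 64-bit words. `splitmix64` is a toy, non-cryptographic generator included only so the example runs; in the scheme the draws derive from the PRF-keyed seed. `lemire_reduce`, `fixed_draw_fisher_yates` and `inverse` are illustrative names, not LemireFyPrp's API.

```rust
/// Toy 64-bit generator (SplitMix64). NOT cryptographic; stands in for a keyed stream.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Lemire multiply-high: maps a 64-bit draw into 0..range with no rejection loop.
fn lemire_reduce(x: u64, range: u64) -> u64 {
    ((x as u128 * range as u128) >> 64) as u64
}

/// Fisher-Yates over 0..N (assumes N <= 256) consuming exactly N - 1 draws.
fn fixed_draw_fisher_yates<const N: usize>(mut next_u64: impl FnMut() -> u64) -> [u8; N] {
    let mut perm = [0u8; N];
    for (i, slot) in perm.iter_mut().enumerate() {
        *slot = i as u8;
    }
    for i in (1..N).rev() {
        // One draw per step, reduced into 0..=i; no value-dependent retries.
        let j = lemire_reduce(next_u64(), (i + 1) as u64) as usize;
        perm.swap(i, j); // write at secret index j
    }
    perm
}

/// Inverse table: inv[perm[i]] = i (also a write at a secret index).
fn inverse<const N: usize>(perm: &[u8; N]) -> [u8; N] {
    let mut inv = [0u8; N];
    for (i, &p) in perm.iter().enumerate() {
        inv[p as usize] = i as u8;
    }
    inv
}
```

Every run consumes exactly N - 1 draws regardless of their values, which is what removes the seed-dependent timing. The swap and the inverse fill still write at secret indices; those are the two key-gen writes the later A4 change pins to single cache lines.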
Draft analysis for the CipherStash research blog covering the rejection-sampling timing channel in PRP generation, the wide-draw/Lemire fix and its security framing (2^-55 statistical distance vs Lewi-Wu's random-permutation model), the swap-or-not rejection on proof grounds, and the hardware-AES build-flag finding. Cross-references the vitaminc issue.
Restructured to the CipherStash standard (cipherstash-js-suite/prompts/_shared/writing-guidelines.md): definition -> why it matters -> how it works -> example -> related, with you/we voice, Note/Tip/Warning callouts, a meta description, title options, and a short 'Why this matters for CipherStash' framing. Genre-adapted for a crypto-internals post (Rust snippets rather than the TS default; no sales CTAs).
…profile) Redone against cipherstash-js-suite .claude/skills/blog-writing-voice (the canonical voice skill, which post-dated the stale local checkout I first wrote to). Narrative detective-story arc, hook opening instead of a definition, recurring 'old code I was proud of' motif bookended, first-person Dan voice, evocative headers, sparing em-dashes, US English, italic closing aphorism, and the :wq sign-off.
The post now lives in the marketing site content (cipherstash/cipherstash-js-suite#548); it doesn't belong in the library repo.
Lewi-Wu is a left/right scheme: a comparison is only evaluated between a left and a right ciphertext, and a right ciphertext in isolation reveals nothing about order. With right-only-at-rest storage (the default deployment), an offline attacker recovers nothing -- not order, and a fortiori not common-prefix length. The first-differing-block disclosure surfaces only at query time (to the legitimate operator) or to an online adversary observing query traffic. Rewrite the plan's string-leakage discussion to lead with this left/right asymmetry and the three threat tiers, and narrow the security-checklist item accordingly. The product decision (B1) is thus about acceptable query-time/online leakage, not at-rest leakage.
Standalone sign-off brief a reviewer can act on without reading the full plan or codebase. Covers: A1 (1-bit hash H instantiation -- shipped FixedPiZ2Hash), A2 (CMAC cached-state accumulator for PR 6), A3 (PRP keystream as a CMAC branch family, shape ii), A4 (secret-indexed Fisher-Yates swap). Sequencing: A1+A4 gate Bit6 vector pinning (both change ciphertexts), A2+A3 gate PR 6. Flags a discrepancy found in the shipped code: LemireFyPrp is not repr(align(64)), so the one-cache-line argument the plan claims for the secret-indexed permutation.swap(i,j) does not hold as written -- either add the alignment or take the constant-time fallback.
A4 hardening from the crypto review brief. None of these change ciphertexts (compat + comparison vectors unchanged).
- prp.rs: LemireFyPrp is now repr(C, align(64)) so each [u8;N] table is one 64-byte cache line (permutation@0, inverse@64), covering both secret-indexed key-gen writes (the FY swap and the inverse fill) for the one-cache-line argument. Add a const assert!(domain <= 64) in impl_lemire_fy_prp! so that a larger instantiation, which would span multiple lines and lose the property, is a compile error.
- width.rs: add oblivious ct_select_byte(block, idx), which scans the whole block and constant-time-selects the byte, so the access address is independent of the secret index.
- bit2/bit2_w6 comparators: route all four get_bit sites through ct_select_byte. The right-block byte read was indexed by a[l] (the secret permuted symbol); the oblivious read closes that cache-line channel and, by touching every byte, the sub-line (MemJam) channel too, which is why it was chosen over mere alignment. Cost is <=32 byte-ops per comparison.
- review brief: add the MemJam analysis (4K-aliasing, 4-byte granularity, Intel-wide scope incl. SGX, ARM/AMD out, SMT co-residence required), the oblivious-swap-FY vs swap-or-not wire-compatibility distinction, and the consequence that A4 no longer gates Bit6 vector pinning (the fixes are byte-stable; A1/H is the sole remaining gate).
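A sketch of the oblivious read: the loop touches every byte of the block in a fixed order and keeps the wanted one through a branch-free mask, so neither the addresses accessed nor the control flow depend on the secret index. The `ct_select_byte(block, idx)` shape follows the commit message, but the helper `ct_eq_mask_usize` and the exact signature are assumptions about the real width.rs code.

```rust
/// Branch-free equality mask over indices: 0xFF if a == b, else 0x00.
fn ct_eq_mask_usize(a: usize, b: usize) -> u8 {
    let x = (a ^ b) as u64;
    // For x != 0, x | x.wrapping_neg() has bit 63 set; for x == 0 it is 0.
    let nonzero = ((x | x.wrapping_neg()) >> 63) as u8; // 1 iff a != b
    (nonzero ^ 1).wrapping_neg()
}

/// Read block[idx] by scanning the whole block; the access pattern ignores idx.
fn ct_select_byte(block: &[u8], idx: usize) -> u8 {
    let mut out = 0u8;
    for (i, &b) in block.iter().enumerate() {
        out |= b & ct_eq_mask_usize(i, idx);
    }
    out
}
```

Each read is one full pass over the block: 8 bytes for Bit6's right blocks, 32 for Bit8's, which is where the "<=32 byte-ops" figure comes from.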
Add a "Block width is a leakage decision" subsection to plan section 5(b) and rewrite open question 3. Lewi-Wu leaks the first-differing-block index, so larger blocks leak less (Bit8 < Bit6 < CLWW): a u64's first differing bit is localised to an 8-bit window at Bit8 vs a 6-bit window at Bit6. That online prefix-leakage axis is the one the library cannot fix; it pulls against Bit6's encrypt-side one-cache-line CT advantage, which is only a cost difference (full oblivious CT is available at both widths, ~12x cheaper at Bit6). Conclusion: width is a per-domain/per-deployment policy keyed on target data and threat model, not a global default. Default numerics to Bit8 (lower leakage + wire-compat); Bit6 is opt-in for at-rest-dominated, size/perf, or encryptor-hostile deployments. Supersedes the earlier 'Bit6 as default' lean.
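To see the width trade-off numerically, the sketch below computes which block index a comparison would reveal for two u64 values at Bit8 and at Bit6, assuming MSB-first blocks with any padding at the top of block 0. `first_diff_block` is a hypothetical helper; it models only the index that Lewi-Wu discloses, not the scheme itself.

```rust
/// Index (0 = most significant) of the first `w`-bit block where `a` and `b`
/// differ, or None if they are equal. Padding bits sit at the top of block 0.
fn first_diff_block(a: u64, b: u64, w: u32) -> Option<u32> {
    let diff = a ^ b;
    if diff == 0 {
        return None;
    }
    let blocks = (64 + w - 1) / w;
    let pad = blocks * w - 64;
    // Position of the first differing bit, counted from the top of the padded width.
    Some((diff.leading_zeros() + pad) / w)
}
```

Differences at bit 5 and at bit 7 both surface as block 7 at Bit8, but as blocks 10 and 9 respectively at Bit6: the narrower window is the extra prefix leakage the subsection weighs against Bit6's cost advantages.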
Record mantissa+exponent / log-domain encoding as an encoding-layer option for wide-dynamic-range numerics (currency, measurements, high-range decimals): bounds block count, uniform relative precision, deliberate magnitude-band leakage; composes with the variable-block machinery. Caveats: Benford leading-digit skew (relative != flat), magnitude-band is a conscious leak, and parameters must be fixed per-domain as public (Parameter-Hiding ORE, Cash et al. 2018). Explicitly scope it as NOT a low-entropy / narrow-domain mitigation: it is order-preserving, so it cannot touch the order floor, and for a narrow domain like DOB the exponent is near-constant (increases high-order skew). DOB-class fields are mitigated by coarsening the plaintext to the queried granularity, not by re-encoding. Keeps the two ideas from being conflated later.
Keep the fixed public-key AES construction (option 3) but upgrade it with the BHKR/Zahur orthomorphism:
H(x, r) = LSB( pi(sigma(x) XOR r) XOR sigma(x) XOR r )
with pi = AES-128 under public key K0 and sigma(x) = 2x in GF(2^128) (the 'multiply by x' / CMAC-subkey doubling; reduction constant 0x87).
FixedPiZ2Hash (Bit6 only) updated in both the scalar comparator path and the bulk scalar/SIMD encryption path; added gf128_double plus tests (fixed_pi_scalar_matches_bulk, gf128_double_reduction). Legacy Bit8 keeps Aes128Z2Hash unchanged.
Justification (review brief A1, plan section 6):
- The known fixed-key-MMO attacks (GKWY; the half-gates multi-instance attack of eprint 2019/1168) require *known* hash inputs and a recoverable global Free-XOR offset. ORE has independent *secret* PRF inputs and no global offset, so the O(p*C/2^k) degradation does not arise.
- The tight tweak-as-key variant (2019/1168 Thm 2) is declined: rekeying per evaluation breaks the keyless-comparator / performance requirement and fixes a degradation ORE does not suffer.
- eprint 2025/792 cryptanalyses collision/preimage/one-wayness (not the 1-bit correlation-robustness we rely on) and only round-reduced AES (7/10 collision on AES-MMO/MP), leaving full AES-128's margin intact.
- The orthomorphism is cheap defense-in-depth: security holds by matching the named BHKR/Zahur construction rather than by a usage argument.
This changes Bit6 right ciphertexts; A1 was the gate holding Bit6 vector pinning and is now cleared. Docs mark A1 RESOLVED.
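A sketch of the σ-MMO evaluation, assuming inputs are 128-bit field elements read big-endian into a u128 and that LSB means the low bit of that integer (the crate's bit convention may differ). Since std has no AES, π is passed in as a closure; in the scheme it is AES-128 under the public key K0. `h_bit` is a hypothetical name, not FixedPiZ2Hash's API.

```rust
/// sigma(x) = 2x in GF(2^128): shift left, fold the carried-out top bit back in as 0x87.
fn gf128_double(x: u128) -> u128 {
    (x << 1) ^ ((x >> 127) * 0x87)
}

/// H(x, r) = LSB( pi(sigma(x) ^ r) ^ sigma(x) ^ r ), with pi a public permutation.
fn h_bit(pi: impl Fn(u128) -> u128, x: u128, r: u128) -> u8 {
    let y = gf128_double(x) ^ r; // sigma(x) XOR r
    ((pi(y) ^ y) & 1) as u8      // MMO feed-forward, keep only the low bit
}
```

Usage: `h_bit(|y| aes128_k0(y), x, r)`, where `aes128_k0` stands for a hypothetical fixed-key AES-128 block function under K0.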
Freeze the OreAes128Bit6 (bit2_w6) wire format now that A1 (the H construction) is resolved — BHKR σ-MMO. Mirrors compat_vectors.rs (Bit8): deterministic TestRng nonce, pinned left + full ciphertext bytes, and comparison/order fixtures over the pinned bytes. Adds a signed-int (i64) order vector since Bit6 is a new scheme; cross-checks confirm the v2 header (0x02/0x02 + u16 block count) and the orderable sign-flip equivalence (i64::MIN ↔ 0u64, i64::MAX ↔ u64::MAX). Regeneratable via: cargo test --test compat_w6_vectors -- --ignored --nocapture generate
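A sketch of the two properties the vectors cross-check, reading "0x02/0x02 + u16 block count" as version = 0x02 and scheme_id = 0x02 followed by a big-endian count. Rejecting a zero count at parse time mirrors the compare_raw_slices fix from the review follow-ups; where the crate actually performs each check, and the names `encode_header`, `parse_header`, `HeaderError` and `orderable_i64`, are assumptions.

```rust
/// v2 header fields as read from the PR text (assumed byte values).
const V2_VERSION: u8 = 0x02;
const SCHEME_BIT6: u8 = 0x02;

#[derive(Debug, PartialEq)]
enum HeaderError {
    TooShort,
    Version(u8),
    Scheme(u8),
    ZeroCount,
}

/// version || scheme_id || block_count (u16, big-endian).
fn encode_header(block_count: u16) -> [u8; 4] {
    let [hi, lo] = block_count.to_be_bytes();
    [V2_VERSION, SCHEME_BIT6, hi, lo]
}

/// Validate the header and return (block_count, remaining bytes).
fn parse_header(bytes: &[u8], expected_scheme: u8) -> Result<(u16, &[u8]), HeaderError> {
    if bytes.len() < 4 {
        return Err(HeaderError::TooShort);
    }
    if bytes[0] != V2_VERSION {
        return Err(HeaderError::Version(bytes[0]));
    }
    if bytes[1] != expected_scheme {
        return Err(HeaderError::Scheme(bytes[1]));
    }
    let count = u16::from_be_bytes([bytes[2], bytes[3]]);
    if count == 0 {
        return Err(HeaderError::ZeroCount); // no real plaintext encodes to 0 blocks
    }
    Ok((count, &bytes[4..]))
}

/// Map i64 to u64 so unsigned order matches signed order: flip the sign bit.
fn orderable_i64(x: i64) -> u64 {
    (x as u64) ^ (1u64 << 63)
}
```

If a u64 encodes to 11 six-bit blocks, the header under these assumptions would be [0x02, 0x02, 0x00, 0x0B].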
Replace the per-byte carry loop in the BHKR sigma-MMO with a single u128 word op: 2x = (x << 1) ^ ((x >> 127) * 0x87) over the big-endian field element. In the bulk encrypt path, fold the doubling and the nonce XOR into one u128 pass per block (the ~704-evals/u64 hot loop). Byte-identical output (sigma is unchanged), so the pinned Bit6 vectors and the scalar<->bulk consistency test still pass; still constant-time (branch-free, no secret-dependent control flow). Recovers the A1 sigma-MMO regression: Bit6 u64 encrypt ~12.3us -> ~8.9us (Apple M1 Max), back to the pre-orthomorphism shape-i level.
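The claim that the word op is byte-identical to the old carry loop can be checked in isolation. This sketch assumes the big-endian 16-byte layout the commit describes and implements both forms plus the fused σ(x) ⊕ nonce step; `gf128_double_bytes`, `gf128_double_word` and `sigma_xor` are illustrative names, not the crate's.

```rust
/// Per-byte carry loop over a big-endian 16-byte field element (the replaced style).
fn gf128_double_bytes(x: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    let mut carry = 0u8;
    // Walk from the least significant byte (index 15) to the most significant (index 0).
    for i in (0..16).rev() {
        out[i] = (x[i] << 1) | carry;
        carry = x[i] >> 7;
    }
    // Branch-free reduction: if the top bit was set, XOR 0x87 into the low byte.
    out[15] ^= carry.wrapping_neg() & 0x87;
    out
}

/// Single u128 word op: 2x = (x << 1) ^ ((x >> 127) * 0x87).
fn gf128_double_word(x: [u8; 16]) -> [u8; 16] {
    let v = u128::from_be_bytes(x);
    ((v << 1) ^ ((v >> 127) * 0x87)).to_be_bytes()
}

/// Bulk-path shape: doubling and nonce XOR fused into one u128 pass.
fn sigma_xor(x: [u8; 16], r: [u8; 16]) -> [u8; 16] {
    let v = u128::from_be_bytes(x);
    (((v << 1) ^ ((v >> 127) * 0x87)) ^ u128::from_be_bytes(r)).to_be_bytes()
}
```

Both forms are branch-free; the word version simply lets the compiler emit a couple of 64-bit shifts instead of sixteen byte steps.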
The detailed, review-ready spec for the PR 6 accumulator (the A2 gate), replacing the sketch in the review brief. Covers:
- Construction: CMAC (NIST SP 800-38B) over an injective prefix-block + final-block encoding; cached CBC state = incremental CMAC. Subkeys reuse the (vectorized) gf128_double.
- Injective message encoding (byte layouts) + injectivity argument.
- Per-block algorithm; the left tag f[n] = ro(n, xt[n]) is the RO_KEY branch at the permuted symbol (not a separate output), preserving mask cancellation at compare time.
- NO total-length binding, which is required so prefix-sharing strings of different lengths compare correctly; length comparability is enforced at the comparator.
- One dedicated KDF'd key unifies the old prf1/prf2 via branch tags (RO_KEY/PRP_STREAM); PRF2 is subsumed.
- Shape-(ii) PRP (A3): keystream derived as CMAC tags, no per-block key schedule (needs a LemireFyPrp::from_stream ctor).
- Security: reduction to CMAC PRF + the three auditable claims (injectivity, incremental faithfulness, zeroization) + birthday budget; the chain state is never published, so the cascade/GGM interaction does not arise.
- Test plan + open questions for the reviewer.
Linked from review brief A2 and plan section 5(b).
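A sketch of the "cached CBC state = incremental CMAC" idea for messages made of whole 128-bit blocks: the chain state after a shared prefix is computed once and reused to finish with different final blocks, and each result equals a from-scratch CMAC. It assumes a generic block cipher `e` (std has no AES; the test uses a toy, insecure permutation) and omits the spec's injective encoding, branch tags and key derivation. `CmacPrefix`, `absorb`, `finalize` and `cmac_full` are hypothetical names.

```rust
/// Doubling in GF(2^128), used for the CMAC subkey K1 = dbl(E_K(0)).
fn dbl(x: u128) -> u128 {
    (x << 1) ^ ((x >> 127) * 0x87)
}

/// CBC chain state after absorbing a prefix of complete blocks.
#[derive(Clone, Copy)]
struct CmacPrefix {
    state: u128,
}

impl CmacPrefix {
    fn new() -> Self {
        CmacPrefix { state: 0 }
    }

    /// Absorb one non-final block: C_i = E(C_{i-1} XOR M_i).
    fn absorb(&mut self, e: &impl Fn(u128) -> u128, block: u128) {
        self.state = e(self.state ^ block);
    }

    /// Finish with a complete final block: T = E(C_{n-1} XOR M_n XOR K1).
    fn finalize(&self, e: &impl Fn(u128) -> u128, k1: u128, last: u128) -> u128 {
        e(self.state ^ last ^ k1)
    }
}

/// Reference CMAC over a non-empty list of complete blocks, computed from scratch.
fn cmac_full(e: &impl Fn(u128) -> u128, blocks: &[u128]) -> u128 {
    let k1 = dbl(e(0));
    let n = blocks.len();
    assert!(n > 0, "this sketch needs at least one complete block");
    let mut c = 0u128;
    for (i, &m) in blocks.iter().enumerate() {
        let input = if i + 1 == n { c ^ m ^ k1 } else { c ^ m };
        c = e(input);
    }
    c
}
```

The point for the accumulator is that per-block work after the shared prefix is a single block-cipher call, and the intermediate `state` never leaves the encryptor.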
The final-block signature F(branch, n, s) and section 3's 'branch separation' referenced 'branch' before it was named. Add an explicit definition at the top of section 4 (RO_KEY / PRP_STREAM output families, carried as the byte-0 branch tag) and point section 3 at it.
Force-pushed from 7458510 to 92833dd.
Stacked on #81. Plan §4 + §6 (docs/plans/2026-06-12-ore-v2-architecture.md); crypto review brief docs/reviews/2026-06-14-ore-v2-crypto-review-brief.md.
Crypto review status ✅
The §6 H-construction gate (A1) is resolved (2026-06-15): the scheme uses the BHKR fixed-key-AES σ-MMO
H(x, r) = LSB(π(σ(x) ⊕ r) ⊕ σ(x) ⊕ r), with π = AES-128 under the public key K₀ and σ(x) = 2x in GF(2¹²⁸). σ is cheap defense-in-depth (security by matching the named BHKR/Zahur construction, not a usage argument). Full write-up: review brief A1.
Wire format is now frozen: Bit6 vectors are pinned in tests/compat_w6_vectors (mirrors the Bit8 compat_vectors contract: pinned left + full bytes, comparison/order fixtures, plus a signed i64 order vector).
Also lands the constant-time PRP hardening (A4): LemireFyPrp (fixed-draw Fisher–Yates + Lemire reduction, replacing the rejection-sampled shuffle; closes a plaintext-dependent timing channel and a modulo bias), #[repr(C, align(64))] + an N ≤ 64 compile guard for the secret-indexed key-gen writes, and an oblivious comparator block read (ct_select_byte, closing the cache-line / MemJam channel). See review brief A4.
Numbers (Apple M1 Max, u64)
What
- OreAes128Bit6 / OreAes128Bit6ChaCha20: 64-element domain, 64 RO evals/block, 8-byte right blocks, MSB-first 6-bit decomposition (order-preservation quickchecked).
- … (version ‖ scheme_id ‖ count) via OreCipher::WIRE_HEADER (None for legacy: Bit8 bytes unchanged, its vectors still pass). Cross-scheme / cross-shape comparisons return None; corrupt/truncated headers fail parsing.
- … (FixedPiZ2Hash): A1 resolved.
- LemireFyPrp (fixed-draw FY + Lemire reduction), constant-time hardened (A4).
- OreEncrypt for primitives ≤ 64 bits with compile-time block-count assertions; u128/i128/Decimal exceed the 14-block packed-prefix cap (documented; they arrive with the chained prefix, PR 6).
- … (tests/compat_w6_vectors).
- OreEncrypt blanket impls (T: OreCipher) became scheme-specific: coherence requires it once block count ≠ byte count. Downstream code generic over OreCipher must name a scheme.
Not in this PR