
feat: inline order-preserving xsd:decimal encoding + numeric range pushdown#1335

Open
bplatz wants to merge 27 commits into main from feature/inline-decimal-encoding

Conversation

@bplatz

@bplatz bplatz commented Jun 14, 2026

Contributor

Closes #1325.

Builds on the exactness work in #1332 (base branch fix/decimal-exactness); merge after that lands.

What this does

Small xsd:decimal values now store inline in the index, the way integers already do — a value that fits encodes directly into the 64-bit object key, and only genuinely large/high-precision decimals fall back to the per-(graph, predicate) NumBig arena. The inline key is an order-preserving, canonical base-10 float code: equal values produce identical bits (equality, dedup, joins) and raw u64 order equals numeric order (range / ORDER BY pushdown). Because the format change is the one-time expensive cost, the order-preserving layout was locked in directly rather than shipping an equality-only form first and needing a second migration.

The inline range is generous: up to 17 significant digits with the decimal point floating in [-32, 31] — e.g. money to the cent up to ~$1 quadrillion, six decimal places up to ~$100 billion. Beyond that, values spill to the arena and remain fully correct, just without the inline benefits.
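The two money examples follow from the 17-significant-digit budget alone. A minimal sketch of that arithmetic, assuming a hypothetical helper `max_integer_part` (not part of this PR) and ignoring the exponent window, which is not the limiting factor for these cases:

```rust
/// Significant-digit budget of the inline lane, per the description above.
const MAX_SIG_DIGITS: u32 = 17;

/// Exclusive upper bound on the integer part when `frac_digits` decimal
/// places must be kept exactly. Hypothetical helper, for illustration only.
fn max_integer_part(frac_digits: u32) -> Option<u64> {
    // Digits spent on the fraction are no longer available to the integer part.
    let int_digits = MAX_SIG_DIGITS.checked_sub(frac_digits)?;
    Some(10u64.pow(int_digits))
}
```

With two fractional digits this leaves 15 integer digits (below 10^15, one quadrillion); with six it leaves 11 (below 10^11, one hundred billion).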

Backward compatibility / migration

  • The encoding is a new index-root format version. Existing ledgers keep working unchanged — no reindex required.
  • A full reindex (or fresh bulk import) adopts the new format automatically and unlocks inlining for that ledger going forward. Ledgers migrate independently; old binaries refuse the new root version rather than misreading it.
  • Decode is always dual-capable, so new code reads old and new roots transparently.

Query pushdown harvested

With order-preserving keys, the fast paths now exploit decimals (and, as a no-format-change bonus, integers/doubles — their keys were already order-preserving on disk, so this works on existing indexes too):

  • ORDER BY [DESC] … LIMIT k — sorted-tail scan, sort elimination.
  • COUNT(… FILTER(?o cmp k)) — encoded-key comparison with manifest-extent shortcuts.
  • MIN / MAX — boundary-key read, no row scan.
  • SELECT … FILTER(?o cmp k) — seeks the o_key range instead of post-filtering the whole predicate, when the predicate is uniformly one order-preserving numeric type and there is no overlay.

SUM/AVG deliberately stay on the exact BigDecimal path (the f64 fast-agg lane would be lossy for decimals).

Correctness guardrails

  • Range narrowing requires a uniform predicate (manifest extent min_o_type == max_o_type) and no overlay — novelty can introduce a cross-type value whose op sorts outside the narrowed window; with overlay present we fall back to the full scan + merge + post-filter. Regression tests cover both a decimal base with an integer-novelty match and an integer base with a decimal-novelty match.
  • A mixed-type predicate (e.g. integers + decimals under one property) never narrows, so no cross-type matches are dropped.
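The narrowing precondition described in these guardrails can be sketched as a single predicate. The enum `OTypeSketch` and the function names below are illustrative stand-ins, not the PR's actual types; only the three conditions (uniform extent, order-preserving type, no overlay) come from the description.

```rust
/// Simplified stand-in for the real object-type tag (assumption).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OTypeSketch {
    Integer,
    Double,
    DecimalInline,
    NumBigOverflow,
    Other,
}

/// Types whose raw key order equals numeric order.
fn order_comparable(t: OTypeSketch) -> bool {
    matches!(
        t,
        OTypeSketch::Integer | OTypeSketch::Double | OTypeSketch::DecimalInline
    )
}

/// Narrow only when the base extent is one order-preserving type and no
/// overlay could contribute a matching value of a different type.
fn may_narrow(min_o_type: OTypeSketch, max_o_type: OTypeSketch, has_overlay: bool) -> bool {
    min_o_type == max_o_type && order_comparable(min_o_type) && !has_overlay
}
```

Any `false` result means the scan takes the general path: full scan, merge, and post-filter.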

Notable changes

  • Order-preserving inline decimal codec with monotonicity property tests (sorted-set walk + exhaustive pairwise).
  • New OType::XSD_DECIMAL_INLINE / ObjKind::NUM_DEC, a root-derived DecimalEncoding policy (sticky per root, preserved across incremental writes), and a root version bump that old binaries refuse.
  • Resolver, bulk-import sink, query-constant encoding, decode, and class/property datatype stats all wired for the inline lane.
  • Overlay-translation cache keyed on the decimal policy so a same-t reindex can't serve a stale arena-keyed translation against an inline root.
  • End-to-end and differential tests: write→read round-trips across the inline/arena boundary, equality-constant matching, and identical results between an indexed-inline read and the canonical novelty representation.

Testing

cargo fmt --check, cargo clippy --all --all-features --all-targets, and cargo nextest run --workspace --all-features all pass (the lone failure is the known LocalStack PortNotExposed testcontainer flake, unrelated to this change).

bplatz added 22 commits June 13, 2026 11:00
Add the wire contract and packing primitives for storing small, exact
xsd:decimal values inline in the (OType, ObjKey) pair instead of routing
every decimal to the per-(graph, predicate) NumBig arena.

- OType::XSD_DECIMAL_INLINE (0x0020) in the reserved embedded range, routed
  to a new DecodeKind::Decimal. Old binaries route unknown embedded payloads
  to Sentinel, never to the lossy f64 decimal lane, so the byte layout is not
  a misdecode hazard on its own.
- ObjKind::NUM_DEC (0x15) as the late-materialized binding kind.
- ObjKey::encode_decimal / decode_decimal: canonical equality-keyed packing
  [sign:1 | scale:6 | mantissa:57]. Trailing fractional zeros stripped,
  integer-valued decimals folded to scale 0, zero canonical, so equal values
  encode to identical bits. Returns None (arena fallback) when scale > 63 or
  |mantissa| >= 2^57.

Equality-keyed, not order-preserving: this kind is excluded from o_key-order
range pushdown. Nothing reads or writes the new encoding yet; resolver,
query-constant, decode, and root-version wiring follow.

Adding DecodeKind::Decimal broke exhaustiveness in decode_value_v3, the
on-disk (OType, ObjKey) -> FlakeValue decode. Decode is meant to be
unconditional (never policy-gated), so wire it now: NUM_DEC keys decode via
ObjKey::decode_decimal into FlakeValue::Decimal.

Also document the NUM_DEC carve-out in the ObjKey module docs: inline
decimals are equality-keyed, not value-ordered, so raw u64 ordering of the
packed payload is not numeric ordering and range pushdown must exclude this
kind. Other read sites still route the kind through wildcard arms; they are
unreachable until the write path emits inline decimals and will be wired in
the decode slice.

Introduce DecimalEncoding { ArenaOnly, InlineWhenFits } as the encode-time
policy for xsd:decimal, and make the FIR6 root version its carrier.

- DecimalEncoding (core): sticky per-root policy. Default ArenaOnly. Decode is
  always capable of both schemes regardless of policy, so it governs writes only.
- ROOT_V6_VERSION_INLINE_DECIMAL (3): the capability signal. A root that inlines
  decimals is written as v3; the layout is byte-identical to v2. Old binaries
  refuse a v3 root via the strict version check rather than misdecoding inline
  rows, giving the 'upgrade code first' safety property.
- decimal_encoding_for_version / version_for_decimal_encoding: the two-way
  mapping. encode() derives the version byte from the policy; decode() accepts
  v2 and v3 and sets IndexRoot::decimal_encoding accordingly.
- IndexRoot gains a sticky decimal_encoding field; all construction sites set
  ArenaOnly for now.

Still inert: no write path emits InlineWhenFits, so every root is v2/arena-only
and behavior is bit-for-bit unchanged. Tests cover the version<->policy round
trip, byte-identical-except-version invariant, and unknown-version refusal.
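
A minimal sketch of the two-way version/policy mapping described above. The enum, the constant `ROOT_V6_VERSION_INLINE_DECIMAL`, and the two function names come from this commit message; the signatures, the `Option` return type, and the name `ROOT_V6_VERSION_ARENA_ONLY` for version 2 are assumptions.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DecimalEncoding {
    ArenaOnly,
    InlineWhenFits,
}

const ROOT_V6_VERSION_ARENA_ONLY: u8 = 2; // hypothetical name for the v2 constant
const ROOT_V6_VERSION_INLINE_DECIMAL: u8 = 3;

/// Encode side: the version byte is derived from the policy.
fn version_for_decimal_encoding(enc: DecimalEncoding) -> u8 {
    match enc {
        DecimalEncoding::ArenaOnly => ROOT_V6_VERSION_ARENA_ONLY,
        DecimalEncoding::InlineWhenFits => ROOT_V6_VERSION_INLINE_DECIMAL,
    }
}

/// Decode side: known versions map back to a policy; anything else is
/// refused (None) rather than guessed.
fn decimal_encoding_for_version(version: u8) -> Option<DecimalEncoding> {
    match version {
        ROOT_V6_VERSION_ARENA_ONLY => Some(DecimalEncoding::ArenaOnly),
        ROOT_V6_VERSION_INLINE_DECIMAL => Some(DecimalEncoding::InlineWhenFits),
        _ => None,
    }
}
```

An old binary that only knows version 2 has no arm for 3, which is what yields the strict refusal instead of a misdecode.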

Wire the decimal-encoding policy through the resolve path so the write side
honors it.

- OTypeRegistry maps ObjKind::NUM_DEC -> OType::XSD_DECIMAL_INLINE, the
  inline counterpart of NUM_BIG -> NUM_BIG_OVERFLOW.
- CommitResolver and SharedResolverState carry a sticky decimal_encoding
  (default ArenaOnly) with a setter. The DecimalStr arms (both the commit and
  chunk resolvers) try ObjKey::encode_decimal first under InlineWhenFits and
  fall back to the NumBig arena when the value doesn't fit; under ArenaOnly
  they keep today's arena path.
- SharedResolverState::from_index_root derives the policy from the base root
  being extended, so an incremental rebuild inherits the root's encoding and
  never mixes inline and arena under one (o_type, o_key) identity.

Still inert at the system level: no writer sets InlineWhenFits and every root
is built ArenaOnly, so production behavior is unchanged. Tested in isolation
by setting the policy directly: small decimals -> NUM_DEC (decode round-trips),
oversized -> NUM_BIG fallback, ArenaOnly -> NUM_BIG.

Query constants must encode the same way as stored rows or equality lookups
and prefilters miss. Thread the loaded root's decimal policy through constant
encoding.

- BinaryIndexStore mirrors the root's decimal_encoding (like lex_sorted_string_ids)
  and exposes it via decimal_encoding().
- value_to_otype_okey: under InlineWhenFits a fitting decimal constant encodes
  inline (XSD_DECIMAL_INLINE) to match the stored inline row; otherwise it falls
  back to the NumBig arena handle lookup, as before.
- value_to_otype_okey_simple gains a Decimal arm (previously Unsupported): an
  inline-eligible decimal narrows the bound-object prefilter with no arena
  round-trip (issue #1328). Arena decimals stay Unsupported (un-narrowed scan,
  never NotFound) since this helper has no (graph, predicate) context.

Inert under ArenaOnly: every loaded root is arena-only today, so decimal
constants take the unchanged arena path.

Complete the decode side for inline xsd:decimal. The strategy is to
materialize NUM_DEC to FlakeValue::Decimal everywhere rather than carrying it
as an EncodedLit: inline decimals decode cheaply from o_key alone (no arena,
unlike NUM_BIG), so materialization is not a cost, and it keeps them on the
ordinary value path where equality/aggregate/sort already compare by canonical
BigDecimal against decoded sources (VALUES, BIND, novelty, arena decimals).

- late_materialized_object_binding declines NUM_DEC (returns None) so callers
  materialize via decode_value/decode_value_v3.
- Verified every other decode site is already correct: join probe/bounds/
  batched paths fall through to decode_value_v3; spool remap leaves the
  self-describing o_key untouched; decode_value_from_kind resolves
  NUM_DEC -> XSD_DECIMAL_INLINE then decodes; export's NotFound fallback only
  covers dict-backed kinds.
- Confirmed the numeric MIN/MAX and scalar-agg fast paths exclude
  XSD_DECIMAL_INLINE via their is_numeric() gate — which is also required for
  correctness, since inline o_key order is not value order.

Inline decimals are equality-keyed, not value-ordered, so any fast path that
compares or orders by o_key must exclude XSD_DECIMAL_INLINE or it would return
wrong ranges/order/counts once inline decimals are written.

- is_post_desc_orderable already excludes XSD_DECIMAL_INLINE (0x0020 is outside
  the is_numeric/is_temporal ranges); documented the exclusion so it is not
  later 'fixed' into the orderable set. This gates the reverse-POST ORDER BY
  LIMIT fast path.
- fast_count numeric-compare overlay lane: XSD_DECIMAL_INLINE now joins the
  unsupported-numeric set (defer) instead of falling through to the
  non-numeric 'no match' arm, which would have undercounted COUNT(?o cmp k)
  over inline decimals. The base lane already deferred for any non
  integer/double o_type.
- fast_star_const_order_topk numeric '>' filter now declines to its fallback
  when a row carries a numeric o_type it can't compare by o_key (inline decimal
  or arena NUM_BIG), instead of silently dropping the row. This also closes the
  pre-existing arena-decimal gap in that benchmark fast path.

MIN/MAX and scalar-agg numeric fast paths already exclude it via is_numeric.

Make the inline-decimal format live: a full rebuild now writes a v3 root under
DecimalEncoding::InlineWhenFits, so small exact decimals are stored inline and
only large/high-precision values fall back to the NumBig arena. Existing
ledgers keep running on their current format and adopt inline decimals on their
next full reindex.

Single-source invariant: the full-rebuild resolver policy and the output root
version come from one value (shared.decimal_encoding threaded into Fir6Inputs),
so a reindex can never inline-encode while writing a v2 root. Incremental
builds inherit the base root's policy (from_index_root for the resolver,
from_old_root preserves decimal_encoding for the output), staying sticky.
Bulk import stays arena-only for now (internally consistent; inlines on first
reindex).

Decode/serialize sites completed for the new format:
- db.rs core snapshot metadata decoder accepts v3 (header layout unchanged).
- build_o_type_table maps XSD_DECIMAL_INLINE -> xsd:decimal.
- resolve_datatype_sid returns xsd:decimal for XSD_DECIMAL_INLINE, so decoded
  inline decimals carry the correct datatype on output.

Tests: end-to-end round-trip across the inline/arena boundary, equality-constant
match (issue #1328 narrowing), and a novelty-vs-inline differential proving the
two representations are observably identical by value and datatype.

The SPOT/rebuild stats path (stats_record_from_v2 -> otype_to_value_type_tag)
mapped OType by value, but XSD_DECIMAL_INLINE had no arm and fell through to
ValueTypeTag::UNKNOWN. Since a full rebuild now writes inline decimals, that
path would report inline-decimal properties as UNKNOWN instead of DECIMAL,
diverging from the incremental path (which derives the tag from the declared
datatype IRI and is already correct).

Map XSD_DECIMAL_INLINE to DECIMAL. Unlike NUM_BIG_OVERFLOW (which mixes BigInt
and BigDecimal under one ObjKind and stays UNKNOWN), the inline lane carries
only decimals, so the classification is unambiguous.

The global overlay-translation cache is keyed on (ledger, snapshot_t, overlay_epoch,
store_max_t, to_t, g_id, index). A full reindex can replace an arena-only (v2)
root with an inline-decimal (v3) root at the SAME index_t — a pure re-encode of
the same committed data — so store_max_t doesn't change but a novelty decimal
now translates to a different (o_type, o_key): an inline XSD_DECIMAL_INLINE key
vs a NUM_BIG_OVERFLOW arena handle. A cached arena-keyed translation served
against the inline root (or vice versa) would not match base rows, breaking
overlay assert/retract identity.

Add the store's decimal-encoding policy to the cache key. For the same committed
data a deterministic reindex reassigns dict ids identically, so the policy is the
one translation-affecting property that flips on a same-t re-encode.

Also clarify the bulk-import comment: import writes arena-only roots and adopts
inline decimals on first reindex (its object-resolution path is separate from the
rebuild resolver).
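
The cache-key change can be sketched with a plain `HashMap`. The field names follow the tuple listed above; the field types, the struct name `TranslationKey`, and the `demo_cache` helper are assumptions made for illustration.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum DecimalEncoding {
    ArenaOnly,
    InlineWhenFits,
}

/// Cache key with the decimal policy added as the final component.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct TranslationKey {
    ledger: String,
    snapshot_t: i64,
    overlay_epoch: u64,
    store_max_t: i64,
    to_t: i64,
    g_id: u16,
    index: u8,
    decimal_encoding: DecimalEncoding,
}

/// Same-t reindex: every component is equal except the policy, so the two
/// translations occupy separate cache entries. Returns the entry count.
fn demo_cache() -> usize {
    let before = TranslationKey {
        ledger: "example/ledger".to_string(),
        snapshot_t: 10,
        overlay_epoch: 1,
        store_max_t: 10,
        to_t: 12,
        g_id: 0,
        index: 0,
        decimal_encoding: DecimalEncoding::ArenaOnly,
    };
    let after = TranslationKey {
        decimal_encoding: DecimalEncoding::InlineWhenFits,
        ..before.clone()
    };
    let mut cache: HashMap<TranslationKey, &'static str> = HashMap::new();
    cache.insert(before, "arena-keyed translation");
    cache.insert(after, "inline-keyed translation");
    cache.len()
}
```

Without the extra field the second insert would overwrite, or a lookup would hit, the arena-keyed entry.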

Bulk import has its own object-resolution path (ImportSink::resolve_object_value
via the turtle parser -> SpoolContext, separate from the rebuild resolver), so
it previously always routed decimals to the shared NumBig pool and wrote v2
arena-only roots. Wire it to the inline-decimal format like a full rebuild.

- SpoolConfig/SpoolContext carry a decimal_encoding policy; the Decimal arm
  mirrors the resolver: under InlineWhenFits a fitting decimal encodes inline
  (NUM_DEC) with no pool handle, otherwise it falls back to the NumBig pool.
- A single IMPORT_DECIMAL_ENCODING = InlineWhenFits constant feeds both the
  spool object resolution and the written root version, so they can't diverge
  (same single-source invariant as the rebuild path).

Inline encoding also skips the shared NumBig pool insert (a locked global handle
allocation) for the common small-decimal case, so it removes work from the
import hot path rather than adding it.

Test: bulk import of inline-eligible + arena-overflow decimals writes a v3 root
and round-trips every value exactly with the xsd:decimal datatype.

Apply cargo fmt to the inline-decimal encoding primitives and the
decimal-exactness test additions (line wrapping only; no behavior change).

…overflow

- Document that CommitResolver is constructed only in tests; live indexing uses
  SharedResolverState (rebuild/incremental) and ImportSink (import). If it is
  ever wired into production it must call set_decimal_encoding from the root
  policy, since it defaults to ArenaOnly and would otherwise write arena
  decimals into a v3 inline root.
- Add a unit test for the negative-scale fold path in encode_decimal: an
  integer-valued decimal (1e18/1e19) folds to scale 0 and overflows the 57-bit
  mantissa -> arena fallback, and values past the fold-exponent limit (1e20+)
  take the early fallback. The prior boundary tests only used scale-0 values.

Replace the equality-keyed inline decimal layout with an order-preserving
base-10 float code, so inline decimals support range / ORDER BY pushdown in
addition to equality. The key stays canonical (equal values -> identical bits),
so everything built on the old codec (equality, dedup, joins, #1328 prefilter)
is unaffected.

Layout (magnitude is 63 bits, sign splits around the 2^63 midpoint):
  mag = [ exponent:6 (biased, exp10 in -32..=31) | significand:57 (17 digits) ]
  value > 0  -> key = 2^63 + mag
  value == 0 -> key = 2^63            (canonical midpoint)
  value < 0  -> key = 2^63 - 1 - mag  (more negative -> smaller key)
The significand is the coefficient normalized to 17 digits (MSD leading), so
same-exponent significands compare as integers; negatives complement the
magnitude like the f64 lane. Inline-eligible iff <= 17 significant digits and
exp10 in -32..=31, else arena fallback (decimals beyond ~32 integer/fractional
places spill — rare).

Proven by property tests: order-preservation over a numerically-sorted value
set and exhaustive pairwise (numeric cmp == key cmp), plus canonical-equality
across scale variants and zero spellings, and round-trip exactness.

This is the format-locking half. Query-side fast paths still exclude inline
decimals from pushdown (safe — an ordered key is also a valid equality key);
admitting them follows next.
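
The layout above can be sketched as follows. This is an illustration under stated assumptions, not the PR's codec: the function name `encode_ordered` is hypothetical, and `exp10` is taken to be the scientific exponent of the most significant digit, so that value = sig × 10^(exp10 − 16) with sig in [10^16, 10^17).

```rust
const MIDPOINT: u64 = 1 << 63;
const SIG_BITS: u32 = 57;
const SIG_DIGITS: i32 = 17;
const EXP_MIN: i32 = -32;
const EXP_MAX: i32 = 31;

/// Encode value = (-1)^neg * coeff * 10^(-scale).
/// Returns None when the value would spill to the arena.
fn encode_ordered(neg: bool, mut coeff: u64, mut scale: i32) -> Option<u64> {
    if coeff == 0 {
        return Some(MIDPOINT); // canonical zero, regardless of sign or scale
    }
    // Drop trailing zeros so only significant digits count toward the limit.
    while coeff % 10 == 0 {
        coeff /= 10;
        scale -= 1;
    }
    let mut digits = 0i32;
    let mut t = coeff;
    while t > 0 {
        t /= 10;
        digits += 1;
    }
    if digits > SIG_DIGITS {
        return None;
    }
    // Exponent of the leading digit (assumed meaning of exp10).
    let exp10 = digits - 1 - scale;
    if exp10 < EXP_MIN || exp10 > EXP_MAX {
        return None;
    }
    // Left-align the coefficient to 17 digits so equal exponents compare
    // as plain integers; 10^17 < 2^57, so this fits the significand field.
    let sig = coeff * 10u64.pow((SIG_DIGITS - digits) as u32);
    let biased = (exp10 - EXP_MIN) as u64;
    let mag = (biased << SIG_BITS) | sig;
    Some(if neg { MIDPOINT - 1 - mag } else { MIDPOINT + mag })
}
```

Under these assumptions 0.05 and 0.5 differ in the exponent field, so their keys compare in numeric order, while 10 and 10.00 normalize to the same significand and exponent and therefore to identical bits.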

Now that inline decimal keys are order-preserving, flip the fast-path guards
that excluded them to instead treat them as o_key-order-comparable:

- is_post_desc_orderable: admit XSD_DECIMAL_INLINE, so the reverse-POST
  ORDER BY ... LIMIT fast path eliminates the sort for decimal predicates.
- fast_count numeric-compare (base + overlay lanes): encode the threshold into
  the decimal key space (encode_numeric_threshold_for_otype gains
  XSD_DECIMAL_INLINE arms for Decimal/Long/BigInt thresholds) and compare keys
  directly. New otype_okey_order_comparable helper centralizes the comparable
  set {INTEGER, DOUBLE, DECIMAL_INLINE}.
- fast_star_const_order_topk: compare inline-decimal rows against an integer
  threshold's decimal key.

Mixed-predicate safety is preserved: a predicate with both inline and arena
(NUM_BIG) decimals spans two o_types, so the single-o_type uniformity checks
still bail to the general path — an inline-only scan never drops arena rows. A
double threshold against decimal rows, or a threshold too large to encode
inline, also declines. Cross-form integer thresholds are exact (10 and 10.00
share a key).

Integration test: ORDER BY (asc + DESC LIMIT), SELECT range FILTER, and COUNT
with decimal and integer thresholds over a decimal predicate all return
numerically correct results — including 0.05 vs 0.5, which the prior
equality-keyed layout ordered wrong.

MIN/MAX(?decimal) can now read the predicate's boundary o_keys instead of
scanning + decoding every row: the inline decimal key is order-preserving, so a
predicate's first/last POST key is its min/max value. minmax_numeric_post admits
XSD_DECIMAL_INLINE (the single-o_type checks still bail a mixed inline+arena
predicate), and numeric_binding_from_otype_okey decodes the boundary key to a
FlakeValue::Decimal.

Integration test extended: MIN/MAX over the decimal predicate returns the
numerically smallest/largest values (-1 and 1000.5).

A SELECT range filter on an object (FILTER(?o > k)) previously switched to the
POST index but post-filtered every row — no o_key-range narrowing, because
numeric comparison is cross-type (an integer bound matches integer, double, and
decimal rows under different o_types, so narrowing to one o_type's key range
would drop the others). Range narrowing was therefore gated to temporal types
only.

When the predicate's POST extent is uniformly XSD_DECIMAL_INLINE
(min_o_type == max_o_type), every value is an inline decimal with no arena spill
and no other types — so the cross-type hazard is absent and the scan can seek
the decimal key range. The order-preserving codec makes that range a contiguous,
value-sorted run, turning O(predicate size) into O(log n + |result|).

- fast_count::predicate_uniform_o_type: cheap manifest-extent probe (the same
  one COUNT uses; opens <=2 boundary leaves only when a predicate shares a leaf)
  exposing the uniform-o_type precondition. encode_numeric_threshold_for_otype
  is now pub(crate) for encoding bounds into the decimal key space.
- binary_scan open(): when the predicate is uniform inline-decimal and a bound
  is numeric, encode bounds as decimal keys and set the cursor's o_key range;
  the temporal narrowing path is unchanged (and skipped when this fires). The
  existing post-filter stays as the correctness backstop, and overlay ops are
  windowed to the range by existing machinery.

Tests: a uniform-decimal predicate narrows and stays correct (range filter,
COUNT, ORDER BY); a MIXED int+decimal predicate must NOT narrow — FILTER(?v > 4)
keeps the integer 5 alongside decimals 7.5/10.5, and ORDER BY interleaves both
types numerically.
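
The O(log n + |result|) claim rests on the keys forming a sorted run that can be entered by binary search. A minimal sketch over an in-memory slice, where `seek_range` is a hypothetical stand-in for setting the cursor's o_key range:

```rust
/// Return the contiguous run of keys in [min_key, max_key].
/// `sorted_keys` must be in ascending order.
fn seek_range(sorted_keys: &[u64], min_key: u64, max_key: u64) -> &[u64] {
    // Two binary searches locate the window; nothing outside it is visited.
    let start = sorted_keys.partition_point(|&k| k < min_key);
    let end = sorted_keys.partition_point(|&k| k <= max_key);
    // An inverted range (min > max) yields an empty slice instead of a panic.
    &sorted_keys[start..end.max(start)]
}
```

This only works when every key in the run belongs to one order-preserving key space, which is why the uniform-o_type precondition gates it.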

Inline integer (encode_i64) and double/float (encode_f64) keys have always
been order-preserving, so the uniform-predicate range pushdown built for
decimals extends to them with no format change — it works on any existing
index, including un-reindexed ledgers.

- otype_okey_order_comparable now admits all inline integer subtypes
  (is_integer; overflow carries a different arena o_type, so an integer-subtype
  o_type guarantees an inline encode_i64 key) plus xsd:double/xsd:float and
  inline decimals.
- encode_numeric_threshold_for_otype encodes thresholds into the integer
  (encode_i64) and float (encode_f64) key spaces, not just decimal.
- The COUNT overlay lane family-routes per row (any integer subtype -> the i64
  threshold key, double/float -> the f64 key) instead of matching only
  XSD_INTEGER, so xsd:long/int/short counts now push down too.
- BinaryScanOperator SELECT range narrowing accepts any uniform order-preserving
  numeric predicate (not just decimal), encoding bounds into the predicate's
  o_type key space.

Test: a uniform xsd:integer predicate narrows correctly (ORDER BY across
negatives, range FILTER, COUNT).

- Overlay/novelty safety (correctness): numeric range narrowing in the scan
  operator is now gated on overlay_free_single_graph(). The base manifest extent
  only proves the BASE rows are uniform; novelty can add a matching value of a
  different type (e.g. integer 100 to a decimal predicate) whose translated
  overlay op sorts outside the narrowed o_type/o_key window and would be dropped
  before the post-filter. With overlay present we fall back to the full base
  scan + merge + post-filter. (Temporal narrowing needs no gate: a cross-type
  value can't satisfy a temporal filter, so dropping it is harmless.)
- Decimal/big-int COUNT pushdown now actually engages: the numeric-compare
  detector (extract_simple_numeric_compare_threshold) extracted only Long/Double
  constants, so FILTER(?v > 0.1) always deferred. Extract Decimal and BigInt too
  — the fast paths already encode them into the matching key space.
- fast_star_const_order_topk numeric filter now declines (Ok(None) -> fallback)
  on a non-Long/Double threshold instead of returning an empty set; with decimal
  thresholds now extractable, the empty-set path would have silently undercounted.
- Refresh stale docs: OType::XSD_DECIMAL_INLINE and DecodeKind::Decimal now
  describe the order-preserving base-10 float key; the scan-operator comment
  documents the temporal + uniform-numeric-no-overlay narrowing cases.

Test: a uniform-decimal base predicate with a cross-type novelty integer keeps
that integer under a range filter (would be dropped without the overlay gate).

…ype doc

- Add a regression test mirroring the decimal cross-type novelty guard with the
  base/overlay types swapped: a uniform xsd:integer base predicate plus a
  matching xsd:decimal novelty value must keep the decimal under a range filter
  (would be dropped if narrowing ignored the overlay). Confirms the overlay gate
  is type-agnostic across the generalized integer/double/decimal path.
- Update predicate_uniform_o_type docs: the precondition covers any
  order-preserving numeric type, not just inline decimals, and notes the caller
  must also ensure no overlay.

…ization

The comment claimed non-canonical integer widths and floats force the count to
defer; those are now o_key-order-comparable. Document that this is really the
arena NUM_BIG_OVERFLOW lane plus the dormant lossy-f64 XSD_DECIMAL lane.
@bplatz bplatz requested review from aaj3f and zonotope June 14, 2026 01:40
Base automatically changed from fix/decimal-exactness to main June 16, 2026 17:47

@aaj3f aaj3f left a comment

Contributor

The order preserving fix and the numeric range optimization are really nice. Some CI failures you may want to look into before merging. Also, I made a note in #1355 but that PR fixes a condition that is still latent in this PR in fluree-db-query/src/binary_scan.rs:2668 that retains the integer-valued-double --> i64 shortcut that was corrupting indexed doubles -- just be sure to get those fixes into this PR before it's merged (possibly by merging #1355 first then updating this PR's code to match it). Here is Claude's detailed description of this, fwiw:

This branch keeps the **double's** OType (`ot == XSD_DOUBLE`) but encodes the
key with `ObjKey::encode_i64` instead of `encode_f64`. A reader decoding an
`XSD_DOUBLE` row interprets `o_key` via `decode_f64`, so an i64-encoded key
(e.g. `100.0` → `encode_i64(100)` = `0x8000000000000064`) is read back as a
subnormal/garbage f64 — silent data corruption over an indexed double predicate.
It is also internally inconsistent with the *overlay* path in
`dict_overlay.rs:487`, which for the same integral-double case emits
`ObjKind::NUM_INT` (a different OType entirely). The two paths disagree, so a
constant encoded here will not match an overlay row encoded there, nor a
resolver-written `encode_f64` base row.

This is exactly the shortcut PR **#1355** (fix/integer-float-index-corruption)
removes — in `binary_scan.rs` *and* the mirror at `dict_overlay.rs:487` (which
this PR does not touch at all). As written, #1335 ships with the corruption
still present.

Recommendation: **land #1355 first, rebase #1335 on top.** This PR's own
`FlakeValue::Decimal` arm (`:2756`) is correct and unrelated; the problem is the
pre-existing Double lane it inherits. On rebase there will be a textual conflict
in `binary_scan.rs` (this region) and in
`fluree-db-api/tests/it_decimal_exactness.rs` (both PRs append to it). Resolve by
taking #1355's removal of the shortcut and keeping #1335's new Decimal arm + new
tests.

(Strictly, this is not a defect *introduced* by #1335 — it is a cross-PR ordering
constraint. Flagging CRITICAL because merging #1335 without #1355 lands known
corruption, and the generalized double range pushdown this PR adds runs *over the
same corrupt keys*.)
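
The mismatch described above can be reproduced in isolation. The sketch below assumes the common order-preserving encodings: sign-bit flip for i64 (consistent with the `0x8000000000000064` value quoted above) and sign-flip/complement for f64. The real `ObjKey` functions may differ in detail.

```rust
const SIGN: u64 = 1 << 63;

/// Order-preserving i64 key: flip the sign bit.
fn encode_i64(v: i64) -> u64 {
    (v as u64) ^ SIGN
}

/// Order-preserving f64 key (assumed scheme): complement negatives,
/// flip the sign bit on non-negatives.
fn encode_f64(v: f64) -> u64 {
    let bits = v.to_bits();
    if (bits & SIGN) != 0 { !bits } else { bits ^ SIGN }
}

/// Inverse of `encode_f64`.
fn decode_f64(key: u64) -> f64 {
    let bits = if (key & SIGN) != 0 { key ^ SIGN } else { !key };
    f64::from_bits(bits)
}
```

Decoding `encode_i64(100)` through `decode_f64` yields the f64 whose raw bits are `0x64`, a positive subnormal rather than 100.0, while `decode_f64(encode_f64(100.0))` round-trips exactly.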

Comment on lines +1896 to +1901
    if let Some((v, _inclusive)) = bounds.lower.as_ref() {
        range_min_okey = enc(v);
    }
    if let Some((v, _inclusive)) = bounds.upper.as_ref() {
        range_max_okey = enc(v);
    }

Contributor

I was a little confused here, because it seemed like bounds.lower/bounds.upper would be inclusive and would have ignored the _inclusive flag later when computing range_min_okey/range_max_okey. I see now the post-filter is where enforcement takes place. So for an exclusive filter it just seems to over-read by 1 at each boundary and then enforces exclusivity in post-filter rejection.

This is all fine, but maybe just add a comment around line 1896 that makes that clear, to spare future devs the validation exercise.
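
The behavior described in this comment (an inclusive key window, with exclusivity enforced afterwards) can be sketched as below. The names are hypothetical, and for simplicity the post-filter compares keys, whereas the real post-filter evaluates the FILTER expression on decoded values.

```rust
/// A bound in key space plus its inclusivity flag.
type Bound = Option<(u64, bool)>;

fn scan_with_post_filter(sorted_keys: &[u64], lower: Bound, upper: Bound) -> Vec<u64> {
    // Narrowing ignores the inclusivity flag: the window is always inclusive,
    // so an exclusive bound over-reads the boundary key itself.
    let range_min = lower.map_or(u64::MIN, |(k, _inclusive)| k);
    let range_max = upper.map_or(u64::MAX, |(k, _inclusive)| k);

    let above = |k: u64| match lower {
        Some((b, inclusive)) => if inclusive { k >= b } else { k > b },
        None => true,
    };
    let below = |k: u64| match upper {
        Some((b, inclusive)) => if inclusive { k <= b } else { k < b },
        None => true,
    };

    sorted_keys
        .iter()
        .copied()
        .filter(|&k| k >= range_min && k <= range_max) // stands in for the cursor seek
        .filter(|&k| above(k) && below(k)) // post-filter enforces exclusivity
        .collect()
}
```

The window is a superset of the true result, so correctness depends only on the post-filter; the narrowing is purely an optimization.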

Comment on lines 388 to +405
    let over_threshold = match ot {
        OType::XSD_INTEGER => batch.o_key.get(i) > thr_i_key,
        OType::XSD_DOUBLE => batch.o_key.get(i) > thr_d_key,
        // Inline decimals are order-preserving: compare keys when the
        // threshold encodes to a decimal key, else decline.
        OType::XSD_DECIMAL_INLINE => match thr_dec_key {
            Some(k) => batch.o_key.get(i) > k,
            None => {
                saw_uncomparable_numeric = true;
                false
            }
        },
        // Numeric but not o_key-comparable (arena big numerics, other
        // integer widths/floats): can't decide here.
        _ if ot.is_numeric() || ot == OType::NUM_BIG_OVERFLOW => {
            saw_uncomparable_numeric = true;
            false
        }

Contributor

This all seems to guarantee correctness, but I thought perhaps that the fast-path would still support xsd:long, xsd:int, xsd:short, or xsd:float as those would be o_key-comparable as well?

bplatz added 2 commits June 23, 2026 17:29
Add the new decimal_encoding field to IndexRoot test-helper literals
pulled in from main, and reflow a decimal property-test closure.

The parallel remote-import commit loop received parsed chunks with a
direct blocking std::mpsc recv on the tokio worker thread. On a
single-worker (current_thread) runtime, this parks the worker and
starves the spawned remote producer task — which drives storage reads
and re-parks waiting for channel capacity. With max-inflight backpressure
of one chunk (small memory budget, e.g. CI runners) the producer can
never be re-polled, so the import hangs forever. This surfaced as the
load-dependent, Linux-CI-only timeout of
remote_import_matches_local_flake_count.

Receive off the worker via spawn_blocking, mirroring the serial arm which
already documents and avoids the same hazard. Add a max_inflight_chunks
builder setter and a deterministic regression test (yielding storage +
single in-flight) that hangs on the old code and passes on the new.