fix(query): stop corrupting integer-valued doubles read through the overlay#1355
Conversation
…verlay When a double/float predicate already has indexed data, a newly inserted integer-valued double (e.g. 55000.0) was read back as a tiny subnormal (2.71736e-319). The novelty overlay encoder optimized integral doubles to encode_i64 but paired the key with the datatype-derived OType (XSD_DOUBLE/ XSD_FLOAT), whose decode kind is F64 — so the reader ran decode_f64 over the i64-encoded bits, i.e. f64::from_bits(N). The persisted commits were always correct; the corruption was purely on the read path, so affected data is recoverable without re-ingestion. value_to_otype_okey now always encodes finite doubles with encode_f64, matching the encode-side guards already in resolver.rs and import_sink.rs. The unused twin value_to_obj_pair gets the same guard defensively. Adds a regression test that indexes a double, inserts an integer-valued double, and asserts an exact round-trip.
aaj3f
left a comment
There was a problem hiding this comment.
Good catch on this 👍
It seems after this PR's work, DictOverlay::value_to_obj_pair is unused anywhere but is not tracking as dead code because it's pub. If we want to keep it just in case, perhaps we at least downgrade to pub(crate) with #[cfg(test)] coverage or with some comment like // retained for parity with value_to_otype_okey
Also, I was also just reviewing #1335 but that PR sustains the bugs that you've fixed in this PR. It's probably worth making sure that this PR lands first and (if you make the test additions I suggested in the review comments below), then in #1335 you can (a) have test-guards that will keep the issue/bug from persisting in that PR and also (b) you can have some test-driven requirements for the rework you'll want to do in that branch's it_decimal_exactness.rs code.
| #[tokio::test] | ||
| async fn integer_valued_double_over_indexed_predicate_is_not_corrupted() { | ||
| // Regression (fluree/db-r#142): an integer-valued double inserted into a | ||
| // predicate that already has INDEXED double/float data was silently | ||
| // corrupted to a tiny subnormal. The novelty overlay encoder paired the | ||
| // datatype-derived OType (F64 decode) with an i64-encoded key, so the | ||
| // reader ran decode_f64 over integer bits (55000.0 -> 2.71736e-319). | ||
| // The trigger needs a persisted index for the predicate; novelty-only | ||
| // ledgers encode the value correctly. | ||
| let fluree = FlureeBuilder::memory() | ||
| .with_ledger_cache_config(fluree_db_api::LedgerManagerConfig::default()) | ||
| .build_memory(); | ||
| let ledger_id = "double/indexed-overlay:main"; | ||
|
|
||
| let (local, handle) = start_background_indexer_local( | ||
| fluree.backend().clone(), | ||
| Arc::new(fluree.nameservice_mode().clone()), | ||
| fluree_db_indexer::IndexerConfig::small(), | ||
| ); | ||
|
|
||
| local | ||
| .run_until(async move { | ||
| let ledger = genesis_ledger(&fluree, ledger_id); | ||
|
|
||
| // Seed + index an integer-valued double for ex:amount. | ||
| let result = run_sparql_update( | ||
| &fluree, | ||
| ledger, | ||
| r#" | ||
| PREFIX ex: <http://example.org/> | ||
| PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> | ||
| INSERT DATA { ex:seed ex:amount "28575.0"^^xsd:double . } | ||
| "#, | ||
| ) | ||
| .await; | ||
| trigger_index_and_wait(&handle, ledger_id, result.receipt.t).await; | ||
| let ledger = fluree.ledger(ledger_id).await.expect("load ledger"); | ||
|
|
||
| // Insert a NEW integer-valued double into the now-indexed predicate; | ||
| // it lands in novelty and is read back through the overlay merge. | ||
| let result = run_sparql_update( | ||
| &fluree, | ||
| ledger, | ||
| r#" | ||
| PREFIX ex: <http://example.org/> | ||
| PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> | ||
| INSERT DATA { ex:a ex:amount "55000.0"^^xsd:double . } | ||
| "#, | ||
| ) | ||
| .await; | ||
| let ledger = result.ledger; | ||
|
|
||
| let query = r" | ||
| PREFIX ex: <http://example.org/> | ||
| SELECT ?amount WHERE { ex:a ex:amount ?amount . } | ||
| "; | ||
| let result = support::query_sparql(&fluree, &ledger, query) | ||
| .await | ||
| .expect("query"); | ||
| let sparql_json = result | ||
| .to_sparql_json(&ledger.snapshot) | ||
| .expect("to_sparql_json"); | ||
| let values = binding_values(&sparql_json, "amount"); | ||
| assert_eq!(values.len(), 1, "expected exactly one row, got {values:?}"); | ||
| let got: f64 = values[0].parse().expect("double result"); | ||
| assert_eq!( | ||
| got, 55000.0, | ||
| "integer-valued double over an indexed predicate must round-trip \ | ||
| exactly (was corrupted to a subnormal), got {got:e}" | ||
| ); | ||
| }) | ||
| .await; | ||
| } |
There was a problem hiding this comment.
This test is good for the bug that prompted this PR. Just small additional suggestions:
- Also assert the datatype of the returned binding is
xsd:double(not silently downgraded toxsd:integerorxsd:long), to lock in that uniform-f64 encoding doesn't change the reported type. - Add a negative/boundary companion value (e.g.
-55000.0and a large integral double like9.007199254740992e15) in the same indexed predicate to cover the sign-flip branch ofencode_f64/decode_f64and the i64-range edge. Not required for the fix, but cheap insurance.
There was a problem hiding this comment.
Given that d.is_finite() will guard against conditions like NaN/Inf, the Err(_) arm is effectively unreachable. It may just be worth a comment if nothing else to make it clear that Ok((ObjKind::NULL, ObjKey::from_u64(0))) is never reachable/returnable for a truly finite d
Address review feedback on the integer-valued-double overlay fix: - it_decimal_exactness: assert the returned binding stays xsd:double (no silent downgrade to integer/long) and add boundary companions -55000.0 (sign-flip branch) and 2^53 (largest exact integral double). - dict_overlay: value_to_obj_pair is now pub(crate) + #[expect(dead_code)] with a note explaining it is kept for parity with value_to_otype_okey. - binary_scan / dict_overlay: drop the redundant is_finite() pre-check in the Double arm; encode_f64 already rejects NaN/Inf, so its Err arm is the single reachable "unrepresentable double -> NULL sentinel" path.
2fbbc20 to
b473e09
Compare
Summary
An integer-valued double (e.g.
55000.0) inserted into axsd:double/xsd:floatpredicate that already has indexed data was silently read back as a tiny subnormal (2.71736e-319). No error, deterministic, and it surfaces in queries and aggregates (SUMover the column reads ≈ 0).Root cause
The novelty→overlay encoder
value_to_otype_okey(used when reading novelty over an indexed predicate) optimized integer-valued doubles toencode_i64, but paired the key with the datatype-derivedOType(XSD_DOUBLE/XSD_FLOAT), whose decode kind isF64. The reader then randecode_f64over theencode_i64bits:This is the same anti-pattern already guarded at three encode-side sites (
resolver.rs,import_sink.rs); the read-side overlay was the one path that still had it.Scope / impact
55000.5) and integer types were never affected.Changes
binary_scan.rs—value_to_otype_okeyalways encodes finite doubles withencode_f64.dict_overlay.rs— same guard applied to the unused twinvalue_to_obj_pair(defensive).it_decimal_exactness.rs— regression test: index a double, insert an integer-valued double, assert exact round-trip. Verified it fails without the fix with the exact2.71736e-319signature.Testing
fluree-db-queryunit (1078) + decimal exactness (14), batched-overlay (4), aggregates (16), SPARQL (136) all green.fmt+clippyclean.