fix(sparql): aggregate and sub-select edge cases #1374
Open
bplatz wants to merge 5 commits into
…ts, parse bare GROUP BY built-ins

Three SPARQL query correctness fixes plus review hardening (W3C subquery/sq11 now passes; eval 250 -> 256, no regressions).

GROUP BY bare built-in/function call:
- parse_group_condition only accepted a bare variable or a parenthesized expression, so `GROUP BY DATATYPE(?v)` was silently dropped and the query degraded to a single implicit group. Parse the bare BuiltInCall/FunctionCall form, restoring the stream position on a partial parse so it cannot desync.

Sub-SELECT ORDER BY/LIMIT/OFFSET scoping:
- A sliced sub-SELECT was seeded per parent row, so an inner LIMIT applied per row and leaked rows past the slice into the outer join. SPARQL 1.1 18.2 sub-SELECTs are uncorrelated: add SubqueryPattern.uncorrelated, set it at the SPARQL lowering sites, and force materialize-once + hash-join for those subqueries. JSON-LD subqueries keep their existing per-row behavior.
- Fix SubqueryOperator::next_batch returning None (operator exhausted) when a parent batch matched no subquery row, which dropped every later batch.

Blank-node property lists:
- `[ :p ?o ]` was a parser placeholder that discarded its inner triples, dropping the nested variables. Desugar to a fresh blank node plus its nested triples in subject and object position, and fold them into CONSTRUCT / INSERT / DELETE templates as well as WHERE BGPs. The synthetic label uses a character outside PN_CHARS so it cannot collide with a user blank node.

Adds parser unit tests and SPARQL integration regression tests, and documents the supported features in docs/reference/compatibility.md.
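The sub-SELECT scoping fix can be illustrated with a minimal sketch. It is not the engine's code: the row shapes, `slice_once`, `join_uncorrelated`, and `join_per_row` are hypothetical names chosen for illustration, and the inner query is assumed to be `ORDER BY DESC(?score) LIMIT n`. The point is only the difference between slicing once before the join and slicing per parent row.

```rust
use std::collections::HashMap;

// Hypothetical row shapes: the parent binds ?s, the sub-SELECT binds (?s, ?score).
type ParentRow = &'static str;
type SubRow = (&'static str, i64);

/// Evaluate `ORDER BY DESC(?score) LIMIT n` once, independently of any parent row.
fn slice_once(mut sub: Vec<SubRow>, limit: usize) -> Vec<SubRow> {
    sub.sort_by(|a, b| b.1.cmp(&a.1));
    sub.truncate(limit);
    sub
}

/// Uncorrelated evaluation: materialize the sliced sub-SELECT once,
/// then hash-join it with the parent rows on ?s.
fn join_uncorrelated(
    parents: &[ParentRow],
    sub: Vec<SubRow>,
    limit: usize,
) -> Vec<(ParentRow, i64)> {
    let mut by_key: HashMap<&str, Vec<i64>> = HashMap::new();
    for (s, score) in slice_once(sub, limit) {
        by_key.entry(s).or_default().push(score);
    }
    let mut out = Vec::new();
    for &p in parents {
        // A parent with no match contributes nothing, but iteration continues.
        if let Some(scores) = by_key.get(p) {
            for &score in scores {
                out.push((p, score));
            }
        }
    }
    out
}

/// Per-row seeding, shown for contrast: the LIMIT is applied separately
/// for each parent row, so rows past the global slice leak into the join.
fn join_per_row(parents: &[ParentRow], sub: &[SubRow], limit: usize) -> Vec<(ParentRow, i64)> {
    let mut out = Vec::new();
    for &p in parents {
        let seeded: Vec<SubRow> = sub.iter().copied().filter(|(s, _)| *s == p).collect();
        for (_, score) in slice_once(seeded, limit) {
            out.push((p, score));
        }
    }
    out
}
```

With three parents and `LIMIT 1`, the uncorrelated form joins against a single sliced row, while the per-row form yields one row per parent.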
Over an implicit single group (no GROUP BY) whose pattern matches nothing, SUM and AVG returned an unbound variable instead of their additive identity. COUNT already returned 0, so the empty-group behavior was internally inconsistent. NumericAcc::finalize_sum / finalize_avg now return "0"^^xsd:integer when the accumulated count is 0, matching SPARQL 1.1 18.5.1.3 / 18.5.1.4 and the W3C agg-avg-03 / agg-empty-group-sum tests. MIN/MAX/SAMPLE keep returning unbound (no identity element), and a query with GROUP BY over an empty pattern still returns zero rows (no groups). The streaming and DISTINCT aggregate paths both delegate to these helpers, so all are covered. Updates the empty-aggregate unit tests, the inner-OFFSET subquery regression test (empty SUM is now 0, not unbound), and the compatibility docs.
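The empty-group rule can be sketched as follows. This is an illustrative accumulator, not the real `NumericAcc`: the `AggValue` enum, the `f64` representation, and `finalize_min` are assumptions made to keep the example self-contained. Only the `count == 0` branches mirror what the commit describes.

```rust
/// Hypothetical stand-in for the RDF term an aggregate produces.
#[derive(Debug, Clone, PartialEq)]
enum AggValue {
    Integer(i64),
    Double(f64),
    Unbound,
}

/// Illustrative accumulator over an implicit single group.
#[derive(Default)]
struct NumericAcc {
    sum: f64,
    count: u64,
    min: Option<f64>,
}

impl NumericAcc {
    fn push(&mut self, v: f64) {
        self.sum += v;
        self.count += 1;
        self.min = Some(self.min.map_or(v, |m| m.min(v)));
    }

    /// Empty input yields the additive identity 0 (an integer), not unbound.
    fn finalize_sum(&self) -> AggValue {
        if self.count == 0 {
            AggValue::Integer(0)
        } else {
            AggValue::Double(self.sum)
        }
    }

    /// Empty input yields 0 as well, which also avoids dividing by zero.
    fn finalize_avg(&self) -> AggValue {
        if self.count == 0 {
            AggValue::Integer(0)
        } else {
            AggValue::Double(self.sum / self.count as f64)
        }
    }

    /// MIN has no identity element, so empty input stays unbound.
    fn finalize_min(&self) -> AggValue {
        match self.min {
            Some(m) => AggValue::Double(m),
            None => AggValue::Unbound,
        }
    }
}
```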
… identity 0

The scalar-aggregate fast paths bypass the generic aggregate.rs, so they needed the same empty-multiset identity fix: an indexed AVG(?o) over an absent or empty predicate returned unbound instead of "0"^^xsd:integer, diverging from the generic path. fast_predicate_scalar_agg.rs now returns the integer identity 0 for the empty AVG case (both the absent-predicate empty_result and the count==0 finalize); empty SUM already returned 0 there.

Refreshes the now-stale "empty SUM is Unbound" documentation in fast_count.rs: since empty SUM is the identity 0 (= the COUNT identity), SUM(?o cmp K) equals the match count for every input, and the empty-input branches deferring to the general pipeline are a conservative choice rather than a correctness requirement.

Adds an indexed regression for absent-predicate SUM(?o)/AVG(?o) and updates the existing absent-predicate SUM(compare) test (the shared general result is now 0).
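The claim that SUM(?o cmp K) equals the match count for every input can be checked with a small sketch. It assumes, as the fast-path documentation implies, that a true comparison contributes 1 and a false one contributes 0; the function names and the choice of `>` as the comparison are hypothetical.

```rust
/// SUM over the 0/1 value of `?o > k`; an empty input sums to the identity 0.
fn sum_of_compare(values: &[i64], k: i64) -> i64 {
    values.iter().map(|&v| if v > k { 1 } else { 0 }).sum()
}

/// COUNT of the rows where `?o > k`; an empty input counts 0.
fn count_matches(values: &[i64], k: i64) -> i64 {
    values.iter().filter(|&&v| v > k).count() as i64
}
```

Because both functions return 0 on an empty slice, the two agree on every input, including the absent-predicate case.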
A dynamic value error in a projection / BIND / ORDER BY expression (e.g. arithmetic on incompatible operand types from a heterogeneous predicate) aborted the whole query with HTTP 400 instead of leaving that variable unbound for the solution. Per SPARQL 1.1 §18.5 Extend, an expression error yields an unbound value for that variable and the remaining solutions are still returned. Queries now run with strict bind errors OFF (the spec-compliant default); transactions keep strict mode via execute_where_streaming so a computed WHERE value cannot silently become unbound before it is written. The BIND/Extend demotion is narrowed to dynamic value errors (arithmetic, comparison): structural errors — a built-in called with the wrong arity, an unknown datatype IRI — still surface as a query error, since they describe a malformed query rather than dirty data. Dynamic type mismatches already evaluate to unbound without raising. Covers W3C functions/plus-1-corrected, plus-2-corrected, and project-expression/projexp02. Adds integration tests for projection and WHERE-BIND / ORDER BY expression errors and a unit test for the new classification, and documents the semantics in docs/reference/compatibility.md.
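A minimal sketch of the classification described above, under stated assumptions: `EvalError`, `is_dynamic_value_error`, and `bind_result` are hypothetical names, and the bound value is simplified to an `i64`. It shows only the rule that dynamic value errors become unbound when strict mode is off, while structural errors and strict mode still fail.

```rust
/// Hypothetical error type for expression evaluation.
#[derive(Debug, Clone, PartialEq)]
enum EvalError {
    // Dynamic value errors: they depend on the data in a given row.
    Arithmetic(String),
    Comparison(String),
    // Structural errors: the query itself is malformed.
    WrongArity { function: String, expected: usize, got: usize },
    UnknownDatatype(String),
}

impl EvalError {
    /// Only arithmetic and comparison failures are eligible for demotion.
    fn is_dynamic_value_error(&self) -> bool {
        matches!(self, EvalError::Arithmetic(_) | EvalError::Comparison(_))
    }
}

/// Turn an expression result into a binding for one solution.
/// `Ok(None)` means the variable is left unbound for that row.
fn bind_result(
    result: Result<i64, EvalError>,
    strict: bool,
) -> Result<Option<i64>, EvalError> {
    match result {
        Ok(v) => Ok(Some(v)),
        // Queries (strict off): a dynamic error leaves the variable unbound.
        Err(e) if !strict && e.is_dynamic_value_error() => Ok(None),
        // Transactions (strict on) and structural errors still fail.
        Err(e) => Err(e),
    }
}
```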
Addresses some SPARQL aggregate and sub-select edge cases surfaced while exercising analytical queries. These are small cleanups that bring a few corner cases in line with how the rest of the engine already behaves — no large feature work.
What's here
- Sub-SELECT `ORDER BY`/`LIMIT` scoping: an inner `ORDER BY … LIMIT n` on a sub-SELECT is now evaluated to its complete sliced result before it joins the enclosing pattern (instead of the slice effectively applying after the join). Also fixes a related case where a parent batch that matched nothing could end the join early.
- Empty `SUM`/`AVG`: over an empty match these now return `0`, matching what `COUNT` already did, rather than an unbound cell. Covers both the generic and the indexed scalar-aggregate fast paths.
- `GROUP BY` on a bare built-in: `GROUP BY DATATYPE(?v)` (a built-in / function call without surrounding parens) now groups correctly; the unparenthesized form was previously dropped, collapsing the query to one implicit group.
- Blank-node property lists: `[ :p ?o ]` in subject/object position (and in `CONSTRUCT`/`INSERT`/`DELETE` templates) now binds its nested variables.
- Expression errors: a dynamic value error in a `SELECT`/`BIND`/`ORDER BY` expression (e.g. arithmetic over a heterogeneous column) leaves that one cell unbound for the row instead of failing the whole query; structural errors (wrong arity, etc.) still surface, and transactions keep strict evaluation.

Testing
Closes: #1362