feat: openCypher 9 front-end (parser, read/write, paths, relationships) by bplatz · Pull Request #1361 · fluree/db

bplatz · 2026-06-23T13:14:50Z

Summary

Adds an openCypher 9 front-end on top of the RDF 1.2 edge-annotation work in
the base branch. Cypher parses and lowers to the same query IR and transaction
pipeline as JSON-LD and SPARQL — the planner, executor, and result formatter
are shared — so this is a new surface, not a new engine. Cypher relationships
map onto the base branch's edge-annotation primitive, so property-graph edges
and RDF quoted-triple annotations are the same data read from two angles.

What's included

Read path

- Clauses: MATCH / OPTIONAL MATCH / WITH / UNWIND / RETURN, UNION /
  UNION ALL.
- CALL { … } subqueries: explicit imports (a, b), (*), uncorrelated
  broadcast, inner UNION, nesting, and strict scope/shadowing validation with
  correlated-aggregate soundness.
- Paths: shortestPath / allShortestPaths; bounded variable-length paths with
  trail enumeration and relationship-uniqueness; binding a path (MATCH p = …)
  or a relationship list (-[r:T*1..n]->).
- Relationship/path values: relationships(p), nodes(p), type(r),
  startNode(r) / endNode(r), properties(r), edge property access.
- Expressions: arithmetic (incl. ^), comparison/boolean, string predicates,
  IN, CASE, list/map literals and indexing, list comprehensions / reduce /
  quantifiers, pattern comprehensions, EXISTS, parameters.
- A scalar-function pass (string, math, casts, id/elementId), aggregates
  with implicit grouping + WITH-as-HAVING, and collect() carried through a
  WITH boundary.
- Map values and map projection (n{.name, .*}).

Write path

CREATE / SET / REMOVE / DELETE / DETACH DELETE; node and relationship
MERGE (with ON CREATE SET); WITH before a write.

Surfaces

HTTP application/cypher query route and CLI --format cypher-json
(Neo4j-compatible output, with jsonld opt-in), both policy-enforced like the
JSON-LD / SPARQL paths.

Docs

New concept doc, cookbook, and a tracked openCypher support matrix
(docs/reference/cypher-support-matrix.md) marking each feature
supported / divergent-by-design / deferred.

Model divergences (intentional)

Fluree enables a unified layer accessible by both SPARQL and Cypher: nodes are
IRIs (labels are rdf:type), relationships are edge annotations,
id()/elementId() returns the IRI string, and unbounded variable-length
traversal is reachability (bounded enumerates trails).

Testing

Covered by the Cypher parser/lowering unit suites and the
it_query_cypher / cypher_http_integration integration suites (read, write,
paths, relationships, CALL, null/type semantics), plus the cross-surface
edge-annotation tests shared with the base branch.

The Cypher language layer on top of the RDF 1.2 + query-operator engine: a front-end that parses openCypher and lowers to the shared query/transact IR, so the same executor powers Cypher, SPARQL, and JSON-LD.

- fluree-db-cypher: lexer, parser, AST, read + write lowering, diagnostics
- api: query_cypher / transact_cypher, conditional writes (MERGE / guarded DELETE via probe-then-stage), cypher-json (Neo4j-compatible) output
- transact: lower_cypher_update (CREATE / SET / REMOVE / DELETE / MERGE)
- cli: cypher query + update (--cypher / .cypher detection), cypher-json format
- server: application/cypher HTTP query + update endpoints
- CSV (neo4j-admin header convention) import to RDF 1.2 annotations

Cypher relationships map to RDF 1.2 edge annotations (LPG identity). See docs/concepts/cypher.md for the supported surface.

One bare-DELETE relationship-guard test is #[ignore]'d pending a focused fix in the per-edge-annotation probe path (its untyped rel-var probe is hidden by the edge-annotation read-side firewall).

Replace the planning-tone Cypher docs now that the front-end has landed:

- concepts/cypher.md: drop the "v1 preview" framing and GQL_CYPHER_SUPPORT.md references; add a Running Cypher section (Rust API / CLI / HTTP application/cypher); expand the write surface (CREATE / SET / REMOVE / DELETE / MERGE / MATCH…CREATE); replace the stale deferred list with an accurate "Not yet supported".
- guides/cookbook-cypher.md: task-oriented recipes (model a property graph, query relationships, MERGE find-or-create, updates/deletes, paths and shortest path, aggregation, and cross-surface round-trips).
- concepts/edge-annotations.md: cross-link Cypher as the property-graph front-end (correct the "not part of this release" note).
- Wire both into SUMMARY.md and the guides README.

…egate exprs

- expressions: `%` (Function::Mod), `XOR` (desugars to the boolean-filter IR), and expression-valued aggregate arguments (`sum(n + 1)`)
- reads: `WITH *`
- writes: `SET n = {...}` full map-replace; the bare-DELETE relationship guard now probes candidate nodes from the original MATCH rather than an untyped rel-var pattern (which the edge-annotation read-side firewall hid), so `DELETE n` on a node with relationships correctly errors
- docs: move the now-supported constructs out of "Not yet supported" (keeping `^`, which has no IR support yet); add granular MERGE / write-MATCH limits

- labels(n): a node's Cypher label strings from live rdf:type assertions (overlay-aware: reflects uncommitted novelty, not just the persisted index)
- type(r): the relationship type string for a named relationship variable, read from f:reifiesPredicate on the reifier
- unbound / non-node / non-relationship arguments yield null

Adds eval/metadata.rs, Function::Labels / Function::RelType, and the Cypher lowering for labels()/type(); moves both out of the docs' "Not yet supported".

Cypher query strings are written as raw strings for consistency even when a particular query has no inner quotes; suppress the lint per-file so it stops failing clippy on new tests.

MERGE now supports a single standalone relationship pattern `(a)-[:T]->(b)` in addition to single-node MERGE. The whole path is the match key: it lowers to one NOT EXISTS guard spanning both endpoint identities plus the directed type triple, and when absent the create branch mints both endpoints and the edge (with its f:reifies* reifier bundle) exactly once. ON CREATE SET routes to either endpoint node variable. Deferred with clear errors: property-bearing MERGE relationships (need an annotation-sidecar guard for correct match semantics), multi-hop / multi-part MERGE, undirected MERGE relationships, and ON MATCH SET on a relationship MERGE. Also folds in a stray cargo-fmt rewrap in eval/metadata.rs.

Extends relationship MERGE to allow a leading MATCH binding the endpoints — `MATCH (a),(b) MERGE (a)-[:KNOWS]->(b)`. This is a per-row find-or-create: the NOT EXISTS guard runs once per matched (a,b) row against the pre-write snapshot, exactly like SPARQL `INSERT ... WHERE { ... FILTER NOT EXISTS { ... } }`, so it needs no probe — it lowers to a single Txn. Endpoint terms are now bound-aware: a MATCH-bound variable references the existing node in both the guard and the create branch; an endpoint introduced by the MERGE still gets a fresh existential probe (guard) and a fresh blank node (create), so a mixed pattern like `MATCH (a) MERGE (a)-[:HAS_PET]->(p:Pet {name:"Rex"})` creates one Pet per matched a. The lower_update guard is relaxed accordingly: a leading MATCH is allowed before a single relationship MERGE (still rejected before a node MERGE or alongside another write), and OPTIONAL MATCH before a relationship MERGE is rejected (partial-reifier-bundle hazard, same as CREATE).
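The per-row find-or-create described above can be sketched in a few lines of Rust. This is only an illustration of the guard-against-the-pre-write-snapshot idea, not the actual lowering: `Edge`, `stage_merge`, and the set-of-tuples "database" are hypothetical stand-ins, and reifier bundles and fresh-node minting are left out.

```rust
use std::collections::BTreeSet;

/// Hypothetical edge triple: (start node, relationship type, end node).
pub type Edge = (&'static str, &'static str, &'static str);

/// For each matched (a, b) row, stage the edge only if it is absent from the
/// pre-write snapshot. The guard never sees edges staged by earlier rows,
/// which mirrors an INSERT ... WHERE { ... FILTER NOT EXISTS { ... } } shape.
pub fn stage_merge(
    snapshot: &BTreeSet<Edge>,
    rows: &[(&'static str, &'static str)],
    rel: &'static str,
) -> BTreeSet<Edge> {
    rows.iter()
        .map(|&(a, b)| (a, rel, b))
        .filter(|edge| !snapshot.contains(edge))
        .collect()
}
```

Committing the staged set and running the same MERGE again stages nothing, which is the idempotence a find-or-create is expected to have.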

Fix a doc contradiction: ON MATCH SET is supported on single-node MERGE only, not on a relationship MERGE (lower_merge rejects it; node MERGE routes ON MATCH SET through the conditional-probe path before lowering). Also document standalone-vs-bound endpoint behavior (standalone mints fresh nodes even when one endpoint already exists; bind endpoints with a leading MATCH to reuse them), add a cartesian-product warning for unfiltered per-row MERGE, and a style note on repeating labels on bound endpoints. Test gaps closed: node MERGE with a leading MATCH and OPTIONAL MATCH before a relationship MERGE are now asserted in the deferred-shapes list; ON CREATE SET on a MATCH-bound head endpoint has an API test (fires once on create, not on re-run).

Supports a WITH projection between a MATCH and a write clause, limited to the subset that maps cleanly onto the single-Txn where-pattern stream:

- pass-through variables (WITH a, b)
- renames (WITH a AS p), lowered to a Bind
- computed non-aggregate aliases carried into the write (WITH a, a.birthYear + 30 AS adultAt SET a.adultAt = adultAt), lowered to a Bind, reusing the write-WHERE expression lowering (which already rejects aggregate calls, so aggregation is refused for free)
- a post-projection WHERE that gates which rows are written

WITH applies Cypher scoping by narrowing bound_vars to the projection horizon: a dropped SET/REMOVE/DELETE target is rejected as unbound, and a dropped node referenced in a CREATE position becomes a fresh node. The execution stream keeps every matched binding, so the narrowing only gates which names a later write may reference.

Deferred with clear errors: aggregation, DISTINCT, and ORDER BY / SKIP / LIMIT on a write-side WITH (they need query-level grouping/slice the single-Txn model does not carry).

Cypher untyped variable-length paths -[*], -[*m..n] now lower to a wildcard transitive PropertyPathPattern instead of being rejected. The path operator follows ANY node->node edge per hop, with bounds carried as min_hops/max_hops.

IR (ir/path.rs): PropertyPathPattern gains `wildcard: bool` and `min_hops`/`max_hops: Option<u32>`. Existing constructors default them (wildcard=false, bounds=None), so typed paths and SPARQL/JSON-LD @path are byte-for-byte unchanged; a new `new_wildcard` constructor builds the untyped form.

Operator (property_path.rs): forward_step/backward_step branch on wildcard; the wildcard arm is a subject-prefix SPOT scan (resp. object-prefix OPST) that keeps only Ref objects and skips the reserved predicates rdf:type and f:reifies* (data properties are excluded by the Ref filter; the reifier sidecar and class memberships by the exclusion). The four BFS primitives (forward/backward/closure/path_exists) gain depth tracking and emit/expand gates derived so that the unbounded, non-wildcard case reproduces the original */+ behavior exactly. Bounded untyped paths use reachability semantics (each node reachable within the hop range, by shortest path).

Lowering (lower/pattern.rs): untyped variable-length rel -> wildcard path with bounds; undirected untyped is rejected (the operator drives a single direction from the bound endpoint).

Tests: lowering (it_lower.rs untyped_*) and end-to-end execution (it_query_cypher.rs cypher_untyped_path_*) over a mixed-type KNOWS/FOLLOWS chain, proving mixed-edge traversal, hop bounds, incoming direction, and exclusion of data properties / rdf:type / reifier edges. Typed property-path + SPARQL suites unchanged (36 + 139 + 493 pass).

…arams

Adds a first-class map value type to the query engine and wires the full Cypher map-value surface on top of it.

Value model:
- `Binding::Map(Vec<(Arc<str>, Binding)>)`: ordered entries; identity (equality / hash / group key) is key-order-insensitive, while the Vec preserves insertion order for display. Opaque to triple matching, index search, and flake generation (a map is a projection value, not an RDF term).
- `Expression::Map(Vec<(Arc<str>, Expression)>)` in the IR: keys are static, values are per-row sub-expressions; duplicate keys resolve last-wins at construction. (Chosen over a `MakeMap` function with interleaved args so intent survives into planner/eval/debug.)

Surface:
- Map literal `{k: expr}` in expression position: parser (`{` in primary), AST `Expr::Map`, lowering to `Expression::Map`, eval to `Binding::Map`.
- `properties(n)` → a map of a node's data properties (literal-valued, non-reserved predicates; multi-valued → list) and `keys(n)` → their sorted names, both via a subject-prefix scan in eval/metadata.
- Object `$params` substitute to a map value (was rejected).

Rendering: maps render as JSON objects in the JSON formatters (jsonld/typed) and native objects in cypher-json (cypherify now recurses plain objects); tabular formatters (sparql/xml/csv) treat a map like a list (one shared arm).
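To make the "ordered entries, order-insensitive identity" rule concrete, here is a small Rust sketch. It is not the engine's `Binding::Map`: `MapValue`, `from_entries`, and `hash_of` are hypothetical names, values are plain `i64` for brevity, and keeping a duplicate key at its first position is an assumption of this sketch (the commit only states that the last value wins).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

/// Hypothetical stand-in for a map value: entries kept in insertion order.
#[derive(Debug, Clone)]
pub struct MapValue(pub Vec<(Arc<str>, i64)>);

impl MapValue {
    /// Build from entries in source order; a duplicate key takes the last value.
    pub fn from_entries(entries: &[(&str, i64)]) -> Self {
        let mut out: Vec<(Arc<str>, i64)> = Vec::new();
        for &(k, v) in entries {
            let found = out.iter().position(|(ek, _)| &**ek == k);
            match found {
                Some(i) => out[i].1 = v,
                None => out.push((Arc::from(k), v)),
            }
        }
        MapValue(out)
    }

    /// Entries sorted by key: the canonical form behind equality and hashing.
    fn canonical(&self) -> Vec<(&str, i64)> {
        let mut v: Vec<(&str, i64)> = self.0.iter().map(|(k, v)| (&**k, *v)).collect();
        v.sort();
        v
    }
}

impl PartialEq for MapValue {
    fn eq(&self, other: &Self) -> bool {
        self.canonical() == other.canonical()
    }
}

impl Eq for MapValue {}

impl Hash for MapValue {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.canonical().hash(state);
    }
}

/// Convenience helper for comparing hashes in examples.
pub fn hash_of(m: &MapValue) -> u64 {
    let mut h = DefaultHasher::new();
    m.hash(&mut h);
    h.finish()
}
```

Two maps built as `{x: 1, y: 2}` and `{y: 2, x: 1}` compare and hash equal here, so they would land in the same group, while each still displays its keys in the order it was written.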

…TH before DELETE

Two review findings on the recent untyped-path and WITH-before-write work.

1. Bounded wildcard paths could omit a valid in-range endpoint. The transitive operator keyed `visited` by node, so a node first reached below `min_hops` on a shorter path was suppressed and never re-reached on a longer in-range path; the bound-bound form (`path_exists`, which checks the target before the visited gate) also disagreed with the bound-unbound traversal. Bounded paths now run a layered (node, depth) BFS (`traverse_bounded`, and the per-start loop in `compute_closure`), correct for any lower bound and consistent across forms. Only untyped paths reach the operator with bounds (typed bounded ranges lower to a UNION of chains), so typed/SPARQL paths are untouched. A path with no upper bound and a lower bound above 1 (`-[*2..]->`) can't be evaluated soundly with node-reachability state and no depth cap, so it is rejected at lowering.
2. WITH before DELETE mis-handled renames/horizon. The delete classifier and the rel-var edge map key off the raw MATCH variables, so a WITH rename (`WITH r AS edge DELETE edge`) or a dropped target (`WITH a DELETE r`) was mis-routed or, worse, silently deleted an out-of-scope edge. WITH before DELETE is now rejected with a clear error (both in `detect_conditional` and `lower_update`); WITH before CREATE/SET/REMOVE is unaffected (those honor the narrowed scope via require_bound).

Tests: diamond bound/unbound consistency + unbounded-lower-bound rejection (it_query_cypher), WITH-before-DELETE rejection at lowering and end-to-end (no silent delete). Also documents the `try_eval_to_binding` Map passthrough that lets a map var nest in another value (`{props: p}`).


… lang + list order

Two more review findings on the untyped-path and map-value work.

1. path_exists (bound-bound bounded paths) still used a node-only visited set, so an intermediate that must be revisited at a later depth was suppressed, disagreeing with the now-layered bound-unbound traversal (A->B, A->C, C->B, B->D; `A-[*3..3]->D` via A-C-B-D). Bounded path_exists now defers to the same `traverse_bounded` the bound-unbound form uses and checks target membership, so the two can never disagree.
2. properties(n) dropped language tags and list order. It now builds a lang-aware value binding (Binding::lit_lang for an rdf:langString, so JSON-LD/typed output keeps @language) and carries each value's FlakeMeta::i, ordering a multi-valued @list property by its stored index.

Tests: revisit-intermediate bound/unbound consistency (`*3..3`), and a langString + @list properties() round-trip through JSON-LD output.
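The revisit problem is easy to reproduce outside the engine. The Rust sketch below runs the graph from the message (A->B, A->C, C->B, B->D) through a node-keyed BFS and through a layered BFS whose state is effectively (node, depth). Function names and the adjacency-map representation are hypothetical, and the layered version is a simplified stand-in for `traverse_bounded`, not a copy of it.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

pub type Graph = HashMap<&'static str, Vec<&'static str>>;

/// The four-edge graph from the commit message.
pub fn example_graph() -> Graph {
    HashMap::from([("A", vec!["B", "C"]), ("C", vec!["B"]), ("B", vec!["D"])])
}

/// Node-keyed BFS: a node is expanded only at the depth it is first reached,
/// so a later in-range arrival at the same node is lost.
pub fn reach_node_visited(g: &Graph, start: &'static str, min: u32, max: u32) -> BTreeSet<&'static str> {
    let mut visited: HashSet<&'static str> = HashSet::from([start]);
    let mut frontier = vec![start];
    let mut out = BTreeSet::new();
    for depth in 1..=max {
        let mut next = Vec::new();
        for n in &frontier {
            if let Some(ns) = g.get(n) {
                for &m in ns {
                    if visited.insert(m) {
                        if depth >= min {
                            out.insert(m);
                        }
                        next.push(m);
                    }
                }
            }
        }
        frontier = next;
    }
    out
}

/// Layered BFS: each depth keeps its own node set, so the same node may be
/// expanded again at a later depth. The hop cap guarantees termination.
pub fn reach_layered(g: &Graph, start: &'static str, min: u32, max: u32) -> BTreeSet<&'static str> {
    let mut layer: BTreeSet<&'static str> = BTreeSet::from([start]);
    let mut out = BTreeSet::new();
    for depth in 1..=max {
        let mut next = BTreeSet::new();
        for n in &layer {
            if let Some(ns) = g.get(n) {
                next.extend(ns.iter().copied());
            }
        }
        if depth >= min {
            out.extend(next.iter().copied());
        }
        if next.is_empty() {
            break;
        }
        layer = next;
    }
    out
}
```

With bounds `3..3` from A, the node-keyed version marks B and D as visited at depths 1 and 2 and never reports D, while the layered version reaches D at depth 3 via A-C-B-D.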

Maps the openCypher scalar functions whose semantics match an existing engine function 1:1: toUpper→Ucase, toLower→Lcase, round→Round, ceil/ceiling→Ceil, floor→Floor, rand→Rand. Deferred where semantics differ (substring is 0- vs 1-indexed, replace is literal vs regex) or where the engine lacks an evaluator (sqrt/sign/split/trim/^).

Adds the expression-level list-iteration family on a shared local-binding foundation:

- list comprehensions `[x IN list WHERE pred | expr]`
- `reduce(acc = init, x IN list | expr)`
- `all/any/none/single(x IN list WHERE pred)`

Foundation (the reusable win): one `RowWithLocals` overlay binds the loop variable(s) per element over the base row (dynamic dispatch on the base so nested comprehensions don't recurse the type). Four new IR variants drive it (`ListComprehension`, `Reduce`, `ListPredicate`, and `Member` for eval-time property access), chosen over interleaved-arg function calls so intent survives into planner/eval/debug, with loop vars excluded from `referenced_vars` and capture-aware `substitute_var`.

Loop-local property access is first-class (the load-bearing edge): `x.name` on a comprehension variable lowers to eval-time `Member` (a map element looks the key up, a node element scans the property) instead of an outer pattern join (which only works for query variables). The cypher lowering gains a scope stack so a body's `var` resolves to a fresh synthetic id and `var.prop` becomes member access; outer-variable property access keeps the efficient auxiliary-pattern path.

Semantics: null/non-list input yields null (not an empty list); empty-list identities are all/none = true, any/single = false; duplicate map keys last-win. The list position may aggregate (`[x IN collect(p) | x.name]`). Write-side `MATCH … WHERE` rejects these for now.

Also wires the clean scalar functions toUpper/toLower/round/floor/ceil/rand, and relaxes object/list-of-map params now that map values exist.
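The null and empty-list rules for the quantifiers can be written down directly. In this Rust sketch `None` stands for a null or non-list input and `Some(slice)` for a list; the function names mirror the Cypher quantifiers but are otherwise hypothetical, and predicates that themselves evaluate to null (three-valued logic) are deliberately out of scope.

```rust
/// all(x IN list WHERE pred): true for an empty list, null for a null input.
pub fn all<T>(list: Option<&[T]>, pred: impl Fn(&T) -> bool) -> Option<bool> {
    list.map(|xs| xs.iter().all(|x| pred(x)))
}

/// any(x IN list WHERE pred): false for an empty list, null for a null input.
pub fn any<T>(list: Option<&[T]>, pred: impl Fn(&T) -> bool) -> Option<bool> {
    list.map(|xs| xs.iter().any(|x| pred(x)))
}

/// none(x IN list WHERE pred): true for an empty list, null for a null input.
pub fn none<T>(list: Option<&[T]>, pred: impl Fn(&T) -> bool) -> Option<bool> {
    list.map(|xs| !xs.iter().any(|x| pred(x)))
}

/// single(x IN list WHERE pred): exactly one match; false for an empty list.
pub fn single<T>(list: Option<&[T]>, pred: impl Fn(&T) -> bool) -> Option<bool> {
    list.map(|xs| xs.iter().filter(|x| pred(*x)).count() == 1)
}
```

Mapping over the `Option` is what keeps "null in, null out" separate from "empty list in, identity out".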

…as rewrite

Two follow-up review findings on list iteration.

1. EXISTS inside a list comprehension / reduce / list predicate reached eval unresolved and silently evaluated to false: the FilterOperator EXISTS resolver only descends through Call (and an EXISTS there usually references the loop-local element, which needs per-element async subquery evaluation the synchronous per-element eval path cannot do). Reject it at lowering with a clear error (via filter::contains_exists on the built expression).
2. The UNWIND $param alias rewrite (collect/rewrite/replace) descended into comprehension scoped bodies without checking whether the loop/acc var shadows the UNWIND alias, so `UNWIND $rows AS row ... [row IN xs | row]` could rewrite the inner loop var. The three alias walkers now always rewrite the outer list/init position but skip the scoped body/filter/map when the binder shadows the alias.

Tests: EXISTS-in-iteration rejection (it_lower) and a binder-aware rewrite unit test (shadowed loop var untouched, non-shadowed alias rewritten).
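The binder-aware rule in the second finding looks roughly like this in Rust. The `Expr` type here is a hypothetical three-variant miniature, and `rewrite_alias` simply renames a variable rather than performing the real `row.field` desugar; the point is only where the walker descends.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Var(String),
    /// `[var IN list | body]`
    ListComp { var: String, list: Box<Expr>, body: Box<Expr> },
}

/// Rename free occurrences of `alias` to `to`. The list position is evaluated
/// in the outer scope, so it is always rewritten; the body is skipped when the
/// comprehension's own binder shadows the alias.
pub fn rewrite_alias(e: &Expr, alias: &str, to: &str) -> Expr {
    match e {
        Expr::Var(v) if v.as_str() == alias => Expr::Var(to.to_string()),
        Expr::Var(_) => e.clone(),
        Expr::ListComp { var, list, body } => Expr::ListComp {
            var: var.clone(),
            list: Box::new(rewrite_alias(list, alias, to)),
            body: if var.as_str() == alias {
                body.clone()
            } else {
                Box::new(rewrite_alias(body, alias, to))
            },
        },
    }
}
```

For `[row IN xs | row]` with alias `row` the expression comes back unchanged, while `[x IN row | row]` has both occurrences rewritten.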

Adds map projection — build a map value from a node variable. `.key` selectors desugar to `{key: n.key}` (reusing the property-accessor lowering, so they join via the aux pattern for outer vars or eval-time member access for loop-locals); `key: expr` adds an explicit entry; and `n{.*}` lowers to `properties(n)`. Mixing `.*` with other selectors is deferred (needs a runtime map merge) — rejected with a clear error. Parser: a variable immediately followed by `{` is a projection (distinct from the bare map literal `{…}`). New AST `Expr::MapProjection` (boxed to keep `Expr` small) with `Property` / `AllProperties` / `Literal` selectors. Builds on the existing Binding::Map and Member machinery.

A correlated subquery that collects a projection over each match into a list — `RETURN [(a)-[:KNOWS]->(b) WHERE b.age > 30 | b.name]`. The inner pattern's existing variables correlate with the outer row (via the shared registry, like EXISTS); new ones bind inside the subquery. Mechanism: reuses the EXISTS async per-row resolution path (FilterOperator + BindOperator, gated by contains_exists). A new IR `PatternComprehension { patterns, projection }` is resolved per outer row by seeding the row's bindings, running the subquery, evaluating the projection per match, and collecting into a list. Since `FlakeValue` has no list variant, the resolver substitutes a new `Expression::Resolved(Binding)` leaf (rather than EXISTS's `Const(Bool)`) which the synchronous evaluator returns directly. The resolver also now recurses into map literals, so it composes nested (`size([(a)-->(b) | b])`, `{friends: [...]}`). Parser: `[(pattern) WHERE? | proj]` is disambiguated from a parenthesized list element by speculatively parsing a pattern + mandatory `|` and backtracking (new TokenStream mark/reset). Lowering mirrors EXISTS, plus splicing the projection's property-accessor aux patterns into the subquery so `b.name` resolves per match. Write-side MATCH WHERE rejects it.

A computed map entry holding an async subquery (e.g. {ok: EXISTS { (p)-[:KNOWS]->(:Person) }}) now resolves per row. The per-row resolver already recursed into Expression::Map; this also recurses through Map in the batch-level pre_resolve_uncorrelated pass so an uncorrelated map-nested EXISTS is resolved once per batch rather than falling through to phase 2 per-row.

… pattern params

Three pattern-comprehension correctness fixes:

- referenced_vars() now includes outer variables captured only by the projection (e.g. [(a)-->(b) | c]) so dependency trimming can't drop the correlation. EXISTS (no projection) keeps the pattern-only behavior.
- eval_pattern_comprehension_for_row resolves async subqueries inside the projection per inner match, so a nested EXISTS or pattern comprehension ([(a)-->(b) | EXISTS { ... }]) no longer falls through to a sync false.
- Parameter substitution and UNWIND alias rewriting now descend into the inner pattern, so [(a)-[:KNOWS]->(b {name: $x}) | b.name] resolves $x.

Lower CALL [(imports)] { <read query> } to the existing Pattern::Subquery rather than a new executor. The scope clause imports correlated variables; without one the subquery runs once and broadcasts. Outer rows flow into the subquery as a pipeline clause (appended, unlike WITH which consumes prior patterns) and the RETURN columns continue downstream. Imports are prepended to the subquery SELECT so SubqueryOperator correlates on parent_schema ∩ select (per-row seed or evaluate-once + hash-join). A correlated aggregating CALL would lower to implicit grouping, which collapses to a single global aggregate in join-mode; promote the body-referenced imports to GROUP BY keys so per-import aggregates are correct and consistent in both execution modes. Tradeoff: a zero-match import yields no row (use OPTIONAL MATCH inside the CALL to retain it as 0) — documented. Body is MATCH / OPTIONAL MATCH / WITH / UNWIND / nested CALL ending in RETURN (explicit columns). Deferred and rejected with clear errors: writes inside CALL, CALL (*), inner UNION, RETURN *, and a RETURN that re-binds an import.

A CALL subquery's returned names were only checked against its imports, so a RETURN re-binding a non-import outer variable (MATCH (p),(q) CALL (p) { … RETURN f AS q }) was silently accepted — the executor treats the collision as an existing parent var and drops the subquery's value. And imports were never validated as actually bound outside, so CALL (p) { MATCH (p:Person) … } would import an unbound p. Pass the visible outer vars into lower_call_subquery: reject an import not in that set, and reject any RETURN column that collides with it (not just an import). Synthetic ?#__* vars are already excluded from the visible set, so they can't false-trigger.

Two CALL refinements:

Strict shadowing boundary: a CALL body may only see its imports. Reject a body that references a non-imported variable whose name also exists in the outer scope: under proper scoped-CALL semantics that inner name is fresh, but the shared VarRegistry would silently treat it as the (unseeded, unbound) outer var and produce wrong results. Rename it or import it. (Imports-bound and RETURN-collision checks already landed.)

CALL { … UNION [ALL] … }: parse the union chain inside the body (stopping at the closing brace) and lower it to Pattern::Union wrapped in the CALL subquery, so the import seed flows CALL → UnionOperator → each branch (which already runs branches correlated/seeded from its child) and the parent-row merge happens once. Plain UNION dedups per correlation group (DISTINCT on the CALL subquery); UNION ALL keeps duplicates. Branches must share a column shape and may not mix UNION with UNION ALL.

CALL (*) { … } imports every variable visible in the outer scope, resolved at lowering from the outer-scope set the caller already threads in. With every outer var imported, the shadowing guard never fires (the body may freely reference any outer name, which correlates), while the RETURN-collision guard still applies. New AST field CallSubqueryClause.import_all; parser accepts (*); lowering sets import_vars = outer_vars in that case.

A nested CALL inside a CALL body computed its visible scope only from the already-lowered patterns of the inner branch, so it couldn't see the enclosing CALL's imports: an explicit nested CALL (p) was rejected (p not bound), and a nested CALL (*) silently uncorrelated to a global aggregate. Thread the enclosing scope into lower_single_branch and union it into the nested call's visible set. A WITH narrows scope to its projection, so the enclosing-scope vars are dropped once a WITH is seen (whatever it carried forward is already in the branch patterns) — this avoids over-including an import a WITH dropped, which would re-create the silent-uncorrelation footgun.

New scalar functions on the Cypher expression surface:

- String: substring (0-indexed → SUBSTR +1), left, right, trim/ltrim/rtrim, replace (literal replace-all), split (→ list).
- Math: sqrt, sign, log (natural), and the ^ exponent operator (right-associative, binds tighter than * / %).
- id(n)/elementId(n) → the node's IRI string (Fluree has no integer element id; documented).

New IR Function variants (ReplaceAll, Split, Trim/LTrim/RTrim, Left/Right, Sqrt/Sign/Ln/Pow) with eval in string.rs/numeric.rs/list.rs and central dispatch arms; substring/id remap onto existing primitives at lowering. New BinOp::Pow with a right-associative power tier in the parser.
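The "0-indexed → SUBSTR +1" remap is a one-line shift. The Rust sketch below uses a simplified 1-indexed primitive (`substr_1`, a hypothetical stand-in that ignores the rounding and out-of-range rules of the real SPARQL/XPath function) and layers the Cypher-style call on top of it.

```rust
/// Simplified 1-indexed substring over characters (stand-in for a SUBSTR primitive).
pub fn substr_1(s: &str, start: usize, len: usize) -> String {
    s.chars().skip(start.saturating_sub(1)).take(len).collect()
}

/// Cypher-style 0-indexed substring, expressed by shifting the start by one.
pub fn cypher_substring(s: &str, start: usize, len: usize) -> String {
    substr_1(s, start + 1, len)
}
```

For example, `cypher_substring("hello", 1, 3)` yields `"ell"`, whereas the 1-indexed primitive called with the same arguments yields `"hel"`.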

numeric_f64() (used by the new sqrt/sign/log/^ math functions) only accepted primitive numeric ComparableValues, so a string-backed xsd:float TypedLiteral — what Function::XsdFloat produces — collapsed to null instead of a numeric result. Run the value through the existing coerce_numeric_operand() normalizer first, matching how SUM/AVG and comparisons already handle xsd:float. Regression: eval-level unit test building sqrt/sign/log/^ over xsd:float(...), which yields the string-backed TypedLiteral (confirmed load-bearing — fails None vs Some(4.0) without the coercion).
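A reduced Rust model of the bug and the fix, under stated assumptions: `Value`, `numeric_strict`, and `coerce_numeric` are hypothetical miniatures of the engine's value type, `numeric_f64()`, and `coerce_numeric_operand()`, and only the xsd:float case is modelled.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Double(f64),
    /// String-backed literal, as a float cast might produce in this sketch.
    TypedLiteral { lexical: String, datatype: &'static str },
}

/// Accepts primitive numerics only: a string-backed literal collapses to None.
pub fn numeric_strict(v: &Value) -> Option<f64> {
    match v {
        Value::Double(d) => Some(*d),
        Value::TypedLiteral { .. } => None,
    }
}

/// Normalizer: parse a string-backed xsd:float into a primitive double;
/// anything else (including an unparseable lexical form) passes through.
pub fn coerce_numeric(v: &Value) -> Value {
    match v {
        Value::TypedLiteral { lexical, datatype } if *datatype == "xsd:float" => lexical
            .parse::<f64>()
            .map(Value::Double)
            .unwrap_or_else(|_| v.clone()),
        other => other.clone(),
    }
}

/// sqrt with the normalizer applied before the strict numeric check.
pub fn sqrt(v: &Value) -> Option<f64> {
    numeric_strict(&coerce_numeric(v)).map(f64::sqrt)
}
```

Without the `coerce_numeric` step, `sqrt` over the literal `"16"` typed as xsd:float would return `None`; with it the result is `Some(4.0)`, matching the regression described above.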

A collect() projected by a WITH was deferred (a stale guard assumed the list was nulled at the subquery boundary). The list now survives the boundary — the later list/map work fixed the merge + try_eval_to_binding paths — so remove the guard. The carried list flows to the next stage: projected, fed to list functions (size), and UNWINDed (collect→unwind round-trip). Harden the one edge the blanket guard implicitly covered: ORDER BY directly on a collect() list in a WITH is now rejected via reject_order_by_on_list (mirroring the RETURN path) — sorting a list value is unsound in v1. (A carried list var reaching a downstream ORDER BY hits sort.rs's defensive element-wise total order, so it's deterministic, not corrupting.)

The ProjectionState::list_outputs doc claimed collect() lists are Binding::Grouped and must not flow out of a WITH — both now false: collect() yields a real Binding::List that carries through the WITH boundary. Tracking is now only to reject ORDER BY directly on a list.

A bound relationship variable is the reified edge's node, so the full relationship-value surface works off it: type(r) and properties(r)/r.prop already resolved via the reifier; add startNode(r)/endNode(r) reading f:reifiesSubject/f:reifiesObject (mirroring type(r)'s f:reifiesPredicate lookup). New IR Functions StartNode/EndNode + eval in metadata.rs + dispatch + cypher lowering (startnode/endnode). Test cypher_relationship_value_semantics: type, r.stars, properties(r), startNode(r)==a, endNode(r)==m over a reified -[r:RATED {stars}]-> edge.

A diamond graph (two distinct 2-hop paths A→D) pins the current semantics: bounded var-length enumerates both trails (2 rows); unbounded is reachability (D reached once → 1 row). Marks exactly where true unbounded path enumeration is still missing.
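The two semantics can be compared on the same diamond with a short Rust sketch. `trail_ends` enumerates trails (no edge reused) and returns one end node per trail, while `reachable` is plain reachability; both names and the edge-list representation are hypothetical and unrelated to the engine's operators.

```rust
use std::collections::BTreeSet;

pub type Edge = (&'static str, &'static str);

/// Diamond: two distinct 2-hop paths from A to D.
pub fn diamond() -> Vec<Edge> {
    vec![("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
}

/// Enumerate trails of `min..=max` hops from `start`, one output row per trail.
pub fn trail_ends(edges: &[Edge], start: &'static str, min: usize, max: usize) -> Vec<&'static str> {
    fn go(
        edges: &[Edge],
        at: &'static str,
        used: &mut Vec<usize>,
        min: usize,
        max: usize,
        out: &mut Vec<&'static str>,
    ) {
        if !used.is_empty() && used.len() >= min {
            out.push(at);
        }
        if used.len() == max {
            return;
        }
        for (i, (s, t)) in edges.iter().enumerate() {
            if *s == at && !used.contains(&i) {
                used.push(i);
                go(edges, *t, used, min, max, out);
                used.pop();
            }
        }
    }
    let mut out = Vec::new();
    go(edges, start, &mut Vec::new(), min, max, &mut out);
    out
}

/// Unbounded reachability: each reachable node appears once.
pub fn reachable(edges: &[Edge], start: &'static str) -> BTreeSet<&'static str> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![start];
    while let Some(n) = stack.pop() {
        for (s, t) in edges {
            if *s == n && seen.insert(*t) {
                stack.push(*t);
            }
        }
    }
    seen
}
```

On the diamond, the bounded 2-hop enumeration reports D twice (once per trail) and the reachability set contains D once, which is the 2-row versus 1-row difference the test pins.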

A feature-by-feature status grid against openCypher 9 (clauses, patterns/paths, expressions, functions, null/type semantics), tagged supported / divergent-by-design / deferred. Adds the 'divergent by design' axis the prose concept doc lacked (RDF-model choices: nodes are IRIs, relationships are edge annotations, id() is an IRI, unbounded var-length is reachability). Wired into SUMMARY.md and linked from the concept doc. Next step (noted in-doc): drive the marks from executable openCypher TCK scenarios.

Post-rebase integration: the SPARQL table fast-path formatter in the CLI (fluree-db-cli/src/output.rs) predates pr/3's Binding::Map/Rel variants and the Path tuple→struct reshape. Update Path to the struct pattern and add Map / Rel arms (defensive — these Cypher-only values aren't reached via the SPARQL table surface, but the match must be exhaustive).

pr/2 left this Cypher-over-HTTP policy regression test #[ignore]'d because the application/cypher transport wasn't implemented (#1357). pr/3 added that route (execute_cypher_ledger with policy wrapping), so the test now passes — un-ignore it and refresh the stale doc. Not a duplicate: pr/3's cypher_http_integration.rs has no policy coverage.

GovernanceOptions has exactly the specified fields, so the struct-update is a no-op (clippy::needless_update). Removes the warning I added to the Cypher route's qc_opts during conflict resolution, and the matching pre-existing one in sparql_qc_opts that landed in the reconstructed region.

Three policy-enforcement gaps where traversal/probe reads bypassed the per-flake view-policy filter that scan operators apply:

- property_path.rs compute_closure: the typed-predicate `else` arm ingested raw edges with no filter_edges call, while the wildcard arm filtered. This regressed SPARQL/JSON-LD `?x :p+ ?y` transitive paths, which under a non-root policy emitted policy-hidden edges/nodes.
- property_path.rs forward_step/backward_step wildcard branches read edges without filter_edges (typed branches already filtered).
- server cypher write: the conditional-write branch probe (MERGE ON CREATE/ON MATCH, DELETE relationship guard) ran against an unpolicied GraphDb, making the committed branch a one-bit existence oracle over policy-hidden nodes. Wrap the probe in the same view policy the commit uses, matching the Cypher read path and SPARQL UPDATE's policy-wrapped WHERE.

filter_edges short-circuits for root / no-policy, so non-policy queries are unaffected.

- Exponentiation `^` now binds tighter than unary `-`, matching openCypher/Neo4j precedence: `-2 ^ 2` is -(2^2) = -4, not (-2)^2 = 4. The `^` right operand still accepts a signed exponent (`2 ^ -3`), and `^` stays right-associative (`-2 ^ 2 ^ 2` = -16). Reordered the parse layering to mult > unary > power > postfix.
- Variable-length path bounds (`*n` / `*n..m`) now reject out-of-range values via u32::try_from instead of a truncating `as u32` cast, which silently wrapped e.g. *4294967312 to a 16-hop expansion.

Adds an end-to-end precedence regression test.
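Both fixes are small enough to sketch. The Rust block below is a toy recursive-descent evaluator that follows the mult > unary > power layering from the message (with a bare integer literal standing in for the postfix tier), plus a checked hop-bound conversion. `Parser`, `eval`, and `hop_bound` are hypothetical names, and the grammar covers only `*`, `/`, unary `-`, `^`, and unsigned integers.

```rust
use std::convert::TryFrom;

pub struct Parser {
    toks: Vec<char>,
    pos: usize,
}

impl Parser {
    pub fn new(src: &str) -> Self {
        Parser { toks: src.chars().filter(|c| !c.is_whitespace()).collect(), pos: 0 }
    }

    fn peek(&self) -> Option<char> {
        self.toks.get(self.pos).copied()
    }

    /// mult := unary (('*' | '/') unary)*
    pub fn mult(&mut self) -> f64 {
        let mut lhs = self.unary();
        while let Some(op) = self.peek().filter(|c| *c == '*' || *c == '/') {
            self.pos += 1;
            let rhs = self.unary();
            lhs = if op == '*' { lhs * rhs } else { lhs / rhs };
        }
        lhs
    }

    /// unary := '-' unary | power   (so unary minus wraps a whole power term)
    fn unary(&mut self) -> f64 {
        if self.peek() == Some('-') {
            self.pos += 1;
            return -self.unary();
        }
        self.power()
    }

    /// power := atom ('^' unary)?   (the right operand re-enters unary, which
    /// gives both the signed exponent and right-associativity)
    fn power(&mut self) -> f64 {
        let base = self.atom();
        if self.peek() == Some('^') {
            self.pos += 1;
            let exp = self.unary();
            return base.powf(exp);
        }
        base
    }

    /// atom := unsigned integer literal (NaN if no digits are present)
    fn atom(&mut self) -> f64 {
        let start = self.pos;
        while self.peek().map_or(false, |c| c.is_ascii_digit()) {
            self.pos += 1;
        }
        self.toks[start..self.pos].iter().collect::<String>().parse().unwrap_or(f64::NAN)
    }
}

pub fn eval(src: &str) -> f64 {
    Parser::new(src).mult()
}

/// Checked hop bound: out-of-range values are an error instead of wrapping.
pub fn hop_bound(n: u64) -> Result<u32, std::num::TryFromIntError> {
    u32::try_from(n)
}
```

Here `eval("-2 ^ 2")` evaluates to -4 and `eval("-2 ^ 2 ^ 2")` to -16, while `hop_bound(4294967312)` is an error where `4294967312u64 as u32` would have produced 16.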

The unboxed Rel { start, predicate, end, reifier } variant (96-byte payload) made it the size driver of Binding, growing size_of::<Binding>() from 88 to 104 bytes. Binding is the engine's per-cell value, cloned and scattered through join/sort/materializer on every SPARQL/JSON-LD query, so this taxed queries that never touch Cypher. Move the payload into a boxed RelValue struct; the rare relationship value pays one indirection and the common variants restore 88 bytes.
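The effect of boxing the large variant can be checked with `size_of`. This Rust sketch uses a hypothetical two-variant enum and a 96-byte payload of plain arrays, so it does not reproduce the 88 and 104 byte figures above (those depend on the other `Binding` variants); it only shows the direction of the change.

```rust
use std::mem::size_of;

/// Hypothetical 96-byte relationship payload (four 24-byte fields).
pub struct RelPayload {
    pub start: [u64; 3],
    pub predicate: [u64; 3],
    pub end: [u64; 3],
    pub reifier: [u64; 3],
}

/// Payload stored inline: the whole enum is sized by its largest variant.
pub enum InlineBinding {
    Small(u64),
    Rel(RelPayload),
}

/// Payload behind a Box: the rare variant costs one pointer plus an indirection.
pub enum BoxedBinding {
    Small(u64),
    Rel(Box<RelPayload>),
}

/// Returns (payload size, inline enum size, boxed enum size) in bytes.
pub fn sizes() -> (usize, usize, usize) {
    (size_of::<RelPayload>(), size_of::<InlineBinding>(), size_of::<BoxedBinding>())
}
```

Every value of the inline enum pays for the 96-byte payload even when it holds the small variant; the boxed enum is sized by its small variants instead.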

aaj3f

Leaving an approve review, but I found some non-cypher-path (i.e. jsonld / sparql) performance and policy-security regressions, amongst other things, that prompted me to then ask Claude to do a full review. I mentioned this in-person, but I've passed that full audit to you in Slack (https://fluree-internal.slack.com/files/UKWLAAHBQ/F0BCYDH697C/pr-1361-review.md) for you to investigate and address as you see fit

(Oh, should add--super excited for this!)

The Cypher metadata functions (labels/keys/properties/type/startNode/endNode) and loop-local member access (`[x IN list | x.prop]`) read graph flakes lazily during synchronous scalar expression evaluation, which could not await the engine's async policy enforcer, so under a non-root view policy they returned policy-hidden flakes. Reuse the existing enforcement and the EXISTS-style async pre-resolution rail rather than adding a new mechanism:

- metadata.rs: split each reader into a raw read + a pure flake→binding reduction, and add policy-filtered async variants that thread the raw flakes through BinaryScanOperator::filter_flakes_by_policy (the same filter scans use). The synchronous readers are fail-closed under a non-root policy (return empty + warn) as a safety net.
- metadata_resolve.rs: a per-row async resolver that, under an active policy, evaluates metadata calls / Member / list-comprehension / reduce / list-predicate through the filtered path and substitutes the computed value as Expression::Resolved, so the later sync evaluator never reaches a raw read. Comprehensions/reduce are resolved whole (their loop-local scope only exists during iteration).
- bind.rs / filter.rs: BindOperator and FilterOperator route through the resolver when the expression contains a metadata read and a policy is active; the no-policy fast path is unchanged.
- where_plan.rs: do not fuse a metadata-read bind or filter into the synchronous inline-operator path (apply_inline can't await), mirroring the existing EXISTS exclusion; they become deferred Bind/Filter operators that resolve through the async path.

Outer-var `n.prop` and pattern-comprehension projections already lower to auxiliary scan joins, so they were already policy-correct and are untouched.

Adds it_policy_cypher covering properties/keys/WHERE/list comprehension, each with a no-policy positive control proving the feature works generally, not merely fail-closed.

- expr_touches_list (the ORDER-BY-over-collect() guard) fell through to `false` for Index/Case/List/Map/comprehension/reduce, so e.g. `ORDER BY vs[0]` or `ORDER BY CASE … vs … END` over a collected list bypassed the guard and could reach the sort comparator on an unorderable value. Enumerate all compound forms (deny-by-default).
- UNWIND row.field desugar (collect/rewrite/replace alias helpers) skipped Case and Exists, so `row.field` inside a CASE/EXISTS lowered to null: a silent wrong write. Recurse into both, matching subst_expr.
- Unterminated `/* …` block comment leaked its final byte as a token instead of erroring; add LexError::UnterminatedComment and fail cleanly.
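The unterminated-comment fix in the last bullet can be illustrated with a minimal sketch. Only the variant name `LexError::UnterminatedComment` comes from the commit message; the payload field, the `skip_block_comment` helper, and its signature are assumptions made for illustration and are not the PR's actual lexer.

```rust
/// Hypothetical error type. Only the variant name is taken from the commit
/// message; the `start` payload is an assumption.
#[derive(Debug, PartialEq, Eq)]
pub enum LexError {
    UnterminatedComment { start: usize },
}

/// Given the byte offset of a `/*` opener, return the offset just past the
/// matching `*/`, or an error if the input ends before the comment closes.
/// Scanning bytes is safe on UTF-8 input because `*` and `/` are ASCII.
pub fn skip_block_comment(src: &str, start: usize) -> Result<usize, LexError> {
    let bytes = src.as_bytes();
    // Begin after the two-byte opener so that `/*/` is not treated as closed.
    let mut i = start + 2;
    while i + 1 < bytes.len() {
        if bytes[i] == b'*' && bytes[i + 1] == b'/' {
            return Ok(i + 2);
        }
        i += 1;
    }
    // Reaching the end of input is an error, never a stray trailing token.
    Err(LexError::UnterminatedComment { start })
}
```

The point of the sketch is that end-of-input inside a comment has exactly one outcome, an error, so no trailing byte can escape as a token.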

- Bounded path_exists (`(a)-[:T*1..k]->(b)`) enumerated the entire bounded frontier then tested membership, so a near target behind a high-fan-out anchor did full-closure work and could spuriously hit max_visited → ResourceLimit. Thread the target into traverse_bounded and return on first in-range reach (the unbounded form already short-circuits).
- DETACH DELETE lowered the outbound and inbound edge scans as two independent OPTIONALs sharing only the bound node, so the WHERE solver cross-joined them into O×I transient rows for a hub node. Emit one OPTIONAL over a UNION of both directions: each row binds one direction, the other delete template skips, giving O+I rows. The OPTIONAL still preserves node-only deletion when the node has no edges.
- SET with multiple property items pushed an independent OPTIONAL old-value lookup per item, cross-joining into k_a×k_b×k_c rows over multi-valued predicates. Collect the lookups and emit one OPTIONAL over a UNION of the per-predicate branches (Σkᵢ rows). Final flakes are unchanged.
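The early-exit idea from the first bullet can be sketched as a layered frontier walk that returns as soon as the target is seen at an in-range depth. This is a simplified illustration, not the PR's traverse_bounded: the adjacency-map representation, the function name, and the signature are assumptions, and the sketch tracks node frontiers only, ignoring Cypher's relationship-uniqueness rule.

```rust
use std::collections::{HashMap, HashSet};

/// Returns true if `target` is reachable from `start` by a walk of between
/// `min` and `max` hops (inclusive). Hypothetical helper for illustration.
pub fn path_exists_bounded(
    adj: &HashMap<u32, Vec<u32>>,
    start: u32,
    target: u32,
    min: usize,
    max: usize,
) -> bool {
    if min == 0 && start == target {
        return true;
    }
    let mut frontier: HashSet<u32> = HashSet::from([start]);
    for depth in 1..=max {
        let mut next = HashSet::new();
        for node in &frontier {
            if let Some(neighbours) = adj.get(node) {
                for &nb in neighbours {
                    // Short-circuit: stop on the first in-range reach instead
                    // of materialising the whole bounded frontier first.
                    if depth >= min && nb == target {
                        return true;
                    }
                    next.insert(nb);
                }
            }
        }
        if next.is_empty() {
            return false;
        }
        frontier = next;
    }
    false
}
```

With a target one hop away from a high-fan-out anchor, this returns during the first layer rather than after expanding every layer up to `max`.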

shortestPath/allShortestPaths built a per-hop edges Vec (a second allocation plus a predicate Arc-bump and two Sid Arc-bumps per hop) for every emitted path, but only Cypher's relationships(p) reads it. The operator is shared with the JSON-LD/FQL query surface, which has no relationships() function, so those queries paid the cost for nothing. Add ShortestPathPattern::needs_relationships — false for the JSON-LD/FQL lowering, conservatively true for Cypher (relationships() usage isn't visible at pattern-lowering time). The operator builds edges only when set; edges remain derivable from nodes + the single predicate/direction.

Two unauthenticated denial-of-service vectors in the openCypher front-end, both reachable from a single small request on the HTTP query/update routes (the parser runs synchronously on the request thread, so either aborts the whole server process):

- Unbounded recursion. The recursive-descent parser had no nesting limit, so deeply-nested input (parens, NOT/unary chains, or nested CALL subqueries) overflowed the stack -> SIGABRT. TokenStream now tracks recursion depth and errors past MAX_PARSE_DEPTH. Guards sit at the expression re-entry point (parse_or), the self-recursive unary layers (parse_not, parse_unary), parse_statement, and parse_call_subquery -- the CALL cycle recurses parse_call_subquery <-> parse_call_body and never re-enters parse_statement, so it needs its own guard. Bounding the parser depth also bounds every downstream AST walker.
- Exponential XOR. parse_xor desugared `a XOR b` into `(a OR b) AND NOT(a AND b)`, cloning the left operand twice per operator -> O(2^n) AST; ~60 XOR terms exhausted memory. XOR is now a first-class operator: BinOp::Xor in the AST and Function::Xor in the shared IR, evaluated as a two-valued boolean fold that reproduces the old truthiness semantics exactly, with no duplication.

Adds regression tests covering deep-nesting rejection and a 2000-term XOR chain.
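A depth guard of the kind described in the first bullet can be sketched on a toy grammar. The name `MAX_PARSE_DEPTH` comes from the commit message, but its value here, the `Parser` type, the error enum, and the one-rule grammar (`expr := '(' expr ')' | digit`) are all assumptions for illustration; the real guard lives on TokenStream and covers more entry points.

```rust
/// Assumed limit; the actual value used in the PR is not stated.
pub const MAX_PARSE_DEPTH: usize = 128;

#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    TooDeep,
    Unexpected { at: usize },
}

/// Toy recursive-descent parser for `expr := '(' expr ')' | digit`.
pub struct Parser<'a> {
    src: &'a [u8],
    pos: usize,
    depth: usize,
}

impl<'a> Parser<'a> {
    pub fn new(src: &'a str) -> Self {
        Parser { src: src.as_bytes(), pos: 0, depth: 0 }
    }

    /// Recursion re-entry point: every nested expression passes through here,
    /// so one check bounds the stack depth for the whole grammar.
    pub fn parse_expr(&mut self) -> Result<u8, ParseError> {
        if self.depth >= MAX_PARSE_DEPTH {
            return Err(ParseError::TooDeep);
        }
        self.depth += 1;
        let result = self.parse_expr_inner();
        self.depth -= 1;
        result
    }

    fn parse_expr_inner(&mut self) -> Result<u8, ParseError> {
        match self.src.get(self.pos).copied() {
            Some(b'(') => {
                self.pos += 1;
                let value = self.parse_expr()?;
                if self.src.get(self.pos) == Some(&b')') {
                    self.pos += 1;
                    Ok(value)
                } else {
                    Err(ParseError::Unexpected { at: self.pos })
                }
            }
            Some(d) if d.is_ascii_digit() => {
                self.pos += 1;
                Ok(d - b'0')
            }
            _ => Err(ParseError::Unexpected { at: self.pos }),
        }
    }
}
```

Because the check runs before each recursive descent, ten thousand nested parentheses produce `ParseError::TooDeep` after 128 frames instead of exhausting the stack.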

is_cypher_query() called to_ascii_lowercase(), heap-allocating a String on every query and transact request -- the Cypher check runs before the JSON-LD/SPARQL dispatch, so the standard RDF surface paid the cost on its hot path. Replace it with a borrowed, non-allocating case-insensitive substring scan that matches the same two media types.
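A borrowed, non-allocating scan of this kind might look like the sketch below. The helper names are hypothetical, and only the `application/cypher` media type is checked because it is the only one named in this PR; the second media type the real function matches is not shown here.

```rust
/// Case-insensitive (ASCII) substring test that never allocates: it compares
/// fixed-size byte windows of the haystack against the needle in place.
pub fn contains_ignore_ascii_case(haystack: &str, needle: &str) -> bool {
    let (h, n) = (haystack.as_bytes(), needle.as_bytes());
    if n.is_empty() {
        return true; // `windows(0)` would panic, so handle this up front
    }
    if n.len() > h.len() {
        return false;
    }
    h.windows(n.len()).any(|w| w.eq_ignore_ascii_case(n))
}

/// Hypothetical stand-in for the content-type check described above.
pub fn is_cypher_content_type(content_type: &str) -> bool {
    contains_ignore_ascii_case(content_type, "application/cypher")
}
```

The window scan costs O(len(haystack) × len(needle)) comparisons in the worst case, which is negligible for header-sized strings and avoids the per-request String that `to_ascii_lowercase()` creates.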

Cypher writes staged directly via the cached handle instead of going through transact_via_consensus like every other write surface, with two consequences:

- No idempotency. A retried submission (client timeout + retry) was not deduplicated and committed twice; the Idempotency-Key header was never read.
- Pre-lock TOCTOU. A conditional `MERGE ... ON MATCH/ON CREATE` chose its branch by probing the cached pre-lock snapshot, so a concurrent writer could create the node between the probe and the commit, producing a duplicate that MERGE's uniqueness contract forbids.

Add a TransactionBody::Cypher variant carrying the raw statement plus its bound parameters. The monolithic committer lowers it to a Txn inside the stage+commit retry loop, under the ledger write lock -- resolving a conditional plan with a policy-wrapped probe and re-resolving on each retry -- so the branch choice is consistent with the committed head and retries are deduplicated by Idempotency-Key. The route now mirrors the SPARQL UPDATE path.

Adds an HTTP idempotency regression test.
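The two properties, deduplication by key and probing under the write lock, can be shown together in a deliberately tiny in-memory model. Everything here (the `Ledger` struct, `merge_node`, a `Mutex` standing in for the ledger write lock) is a hypothetical illustration and not Fluree's committer.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;

/// Toy ledger state: node names, a commit counter, and the commit number
/// previously returned for each idempotency key.
#[derive(Default)]
pub struct Ledger {
    pub nodes: HashSet<String>,
    pub commits: u64,
    pub seen_keys: HashMap<String, u64>,
}

/// MERGE-like write: creates `name` only if absent and returns the commit
/// number. Both the dedup check and the existence probe run while the lock
/// is held, so no other writer can slip in between probe and commit.
pub fn merge_node(ledger: &Mutex<Ledger>, idempotency_key: Option<&str>, name: &str) -> u64 {
    let mut l = ledger.lock().expect("ledger lock poisoned");
    if let Some(key) = idempotency_key {
        if let Some(&t) = l.seen_keys.get(key) {
            return t; // retried submission: replay the earlier result
        }
    }
    if !l.nodes.contains(name) {
        l.nodes.insert(name.to_string()); // ON CREATE branch
    }
    l.commits += 1;
    let t = l.commits;
    if let Some(key) = idempotency_key {
        l.seen_keys.insert(key.to_string(), t);
    }
    t
}
```

A client that times out and resubmits with the same key gets the original commit number back, and two concurrent merges of the same name cannot both take the create branch.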

The both-vars-unbound bounded closure ran a layered BFS per start node with no cancellation check -- unlike the unbounded branch, which polls per dequeue. Over a dense graph this loop runs nodes x edges x depth work uninterruptibly, so a query could not be cancelled mid-closure. Poll cancellation at the top of the per-start loop and the bounded BFS step.
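The two polling points can be sketched as follows, assuming cancellation is exposed as a shared `AtomicBool`. That flag, the `Cancelled` type, the adjacency-map graph, and the function name are assumptions for illustration; the engine's actual cancellation mechanism is not shown in this PR text.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::atomic::{AtomicBool, Ordering};

#[derive(Debug, PartialEq, Eq)]
pub struct Cancelled;

/// For every start node, collect the nodes reachable in 1..=max_depth hops.
/// Hypothetical sketch of a both-ends-unbound bounded closure.
pub fn bounded_closure(
    adj: &HashMap<u32, Vec<u32>>,
    max_depth: usize,
    cancel: &AtomicBool,
) -> Result<Vec<(u32, u32)>, Cancelled> {
    let mut starts: Vec<u32> = adj.keys().copied().collect();
    starts.sort_unstable();
    let mut pairs = Vec::new();
    for start in starts {
        // Poll once per start node...
        if cancel.load(Ordering::Relaxed) {
            return Err(Cancelled);
        }
        let mut frontier = HashSet::from([start]);
        let mut seen = HashSet::new();
        for _ in 0..max_depth {
            // ...and once per BFS layer, so a dense graph stays interruptible.
            if cancel.load(Ordering::Relaxed) {
                return Err(Cancelled);
            }
            let mut next = HashSet::new();
            for node in &frontier {
                if let Some(neighbours) = adj.get(node) {
                    for &nb in neighbours {
                        if seen.insert(nb) {
                            next.insert(nb);
                        }
                    }
                }
            }
            if next.is_empty() {
                break;
            }
            frontier = next;
        }
        let mut reached: Vec<u32> = seen.into_iter().collect();
        reached.sort_unstable();
        pairs.extend(reached.into_iter().map(|t| (start, t)));
    }
    Ok(pairs)
}
```

Polling per layer rather than per edge keeps the overhead small while bounding the uninterruptible work to a single layer expansion.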

The GROUP BY key and the defensive sort order for the Cypher-only Binding::Path / Rel / Map variants disagreed with their PartialEq/Hash identity, a latent inconsistency where GROUP BY and DISTINCT (or a sort) could classify the same values differently:

- Path group key keyed on nodes only, but Eq/Hash key on nodes + edges. Two paths over the same node sequence reached via different parallel edges are distinct paths (Cypher WITH path, collect over allShortestPaths); GROUP BY merged them while DISTINCT kept them apart. The key now includes edges. The RDF/JSON-LD surface never populates edges, so this is a no-op there.
- compare_bindings ordered Path by nodes only, Rel by the full (reifier, start, predicate, end) tuple, and Map by insertion order, none matching Eq. Align all three so equal values compare Equal: Path compares edges after nodes, Rel uses reifier-only identity when both are reified, and Map compares entries in key order.

These variants are not produced on the SPARQL/JSON-LD surfaces, so there is no standard-RDF behavior change; this is internal-consistency hardening. Adds a group-key regression test.
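The invariant being restored, that a comparator returns Equal exactly when Eq does, can be shown on a stripped-down path type. The `Path` struct with integer ids and the helper names below are assumptions for illustration, not the real Binding::Path.

```rust
use std::cmp::Ordering;
use std::collections::HashSet;

/// Simplified stand-in for a path value: Eq and Hash cover nodes AND edges.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Path {
    pub nodes: Vec<u32>,
    pub edges: Vec<u32>,
}

/// Comparator aligned with Eq: nodes first, then edges as a tie-breaker, so
/// two paths compare Equal only when they are also `==`.
pub fn compare_paths(a: &Path, b: &Path) -> Ordering {
    a.nodes.cmp(&b.nodes).then_with(|| a.edges.cmp(&b.edges))
}

/// DISTINCT-style count, driven by Eq/Hash.
pub fn distinct_count(paths: &[Path]) -> usize {
    paths.iter().collect::<HashSet<_>>().len()
}

/// GROUP-BY-style count, driven by the comparator (sort, then merge runs).
pub fn group_count(paths: &[Path]) -> usize {
    let mut sorted: Vec<&Path> = paths.iter().collect();
    sorted.sort_by(|a, b| compare_paths(*a, *b));
    sorted.dedup_by(|a, b| compare_paths(*a, *b) == Ordering::Equal);
    sorted.len()
}
```

If `compare_paths` looked at `nodes` only, two paths over parallel edges would collapse into one group while still counting as two distinct values, which is the disagreement described above.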

…ONAL REMOVE n.a, n.b, … pushed one independent OPTIONAL { ?n p ?old } per property, so on multi-valued predicates the staged WHERE cross-joined to Πkᵢ transient rows -- the same hub blow-up the SET multi-property fix already addressed. Route REMOVE's property items through the existing push_unioned_old_values helper so they share a single OPTIONAL { … UNION … } (Σkᵢ rows). The now-redundant per-property push_optional_old_value helper is removed.

const_usize clamped a negative literal SKIP/LIMIT with (*n).max(0), silently turning LIMIT -1 / SKIP -5 into 0. openCypher errors on a negative bound; clamping is a surprising divergence a driver would not expect. Reject it with a clear message.
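Rejecting rather than clamping can be expressed with a checked conversion. The function name `const_usize` appears in the commit message, but the signature, the `clause` parameter, and the error wording below are assumptions.

```rust
/// Convert a literal SKIP/LIMIT value to `usize`, rejecting negatives instead
/// of clamping them to 0. `clause` is only used to label the error message.
pub fn const_usize(clause: &str, n: i64) -> Result<usize, String> {
    usize::try_from(n)
        .map_err(|_| format!("{clause} must be a non-negative integer, got {n}"))
}
```

`usize::try_from` fails for any negative input, so `LIMIT -1` surfaces as an error instead of silently returning zero rows.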

The lookup_node_ref doc block still described the old buggy approach -- claiming anonymous nodes "break" and that patterns with an anonymous node in a relationship are rejected. The implementation keys anonymous nodes on their source span (?#__anon_{start}_{end}) so every appearance shares one VarId, and the case works (anonymous_relationship_lowers_to_plain_triple). Rewrite the comment to describe the actual behavior.
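The span-keyed naming can be sketched as below. The `?#__anon_{start}_{end}` format is taken from the commit message; the `VarTable` interner, its method names, and the use of `u32` as a VarId are hypothetical.

```rust
use std::collections::HashMap;

/// Build the synthetic variable name for an anonymous node from its source
/// span, using the format quoted in the commit message.
pub fn anon_var_name(start: usize, end: usize) -> String {
    format!("?#__anon_{start}_{end}")
}

/// Minimal name interner: the same name always maps to the same id.
#[derive(Default)]
pub struct VarTable {
    ids: HashMap<String, u32>,
}

impl VarTable {
    pub fn get_or_insert(&mut self, name: &str) -> u32 {
        let next = self.ids.len() as u32;
        *self.ids.entry(name.to_string()).or_insert(next)
    }

    /// Look up the id for an anonymous node by its source span.
    pub fn anon(&mut self, start: usize, end: usize) -> u32 {
        let name = anon_var_name(start, end);
        self.get_or_insert(&name)
    }
}
```

Because the name is a pure function of the span, every lowering step that encounters the same anonymous node resolves to one shared variable, while distinct anonymous nodes stay separate.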

The CLI classified any valid JSON body as JSON-LD before checking for Cypher, so a `{"cypher": "...", "params": {...}}` envelope (the form the server accepts under Content-Type: application/cypher) was routed to the JSON-LD pipeline and failed — forcing an explicit --format cypher / --cypher. Both the query and update sniffers now recognize a JSON object with a string `cypher` field as Cypher, via a shared looks_like_cypher_envelope helper, before the generic JSON fallthrough.

The request span's input_format was only ever "sparql" or "json-ld", so Cypher requests were traced as "json-ld". Tag them "cypher" (mirroring the route's own SPARQL/Cypher/JSON-LD dispatch order). Observability only.

Adds cypher_write_under_employee_bearer_denied, the write-side analogue of sparql_update_under_employee_bearer_denied: a restricted employee identity issuing a Cypher SET on a policy-protected predicate is rejected with the policy's exMessage. Locks in f:modify enforcement on the Cypher update route (which now commits through consensus with the same PolicyContext).

Replace the redundant `.and_then(|t| t.as_i64())` with `.and_then(serde_json::Value::as_i64)` to satisfy the workspace clippy::redundant-closure-for-method-calls lint (denied in CI).

bplatz added 30 commits June 23, 2026 09:12

chore(cypher): allow needless_raw_string_hashes in cypher test files

e31bd57

Cypher query strings are written as raw strings for consistency even when a particular query has no inner quotes; suppress the lint per-file so it stops failing clippy on new tests.

bplatz added 6 commits June 23, 2026 09:12

test(query): lock the Binding::Rel Eq/Hash identity contract

d22d512

bplatz requested review from aaj3f and zonotope June 23, 2026 13:14

Base automatically changed from pr/2-rdf12-annotations to main June 23, 2026 13:17

bplatz added 3 commits June 24, 2026 15:54

aaj3f approved these changes Jun 24, 2026

View reviewed changes

bplatz added 17 commits June 24, 2026 19:48

fmt

faf10db

fix(server): label Cypher requests as "cypher" in tracing

98cc25b


fix(test): use a method reference in the Cypher idempotency test

86e91d7


bplatz wants to merge 62 commits into main from pr/3-cypher
