feat: openCypher 9 front-end (parser, read/write, paths, relationships)#1361

Open
bplatz wants to merge 62 commits into main from pr/3-cypher

@bplatz bplatz commented Jun 23, 2026

Contributor

Summary

Adds an openCypher 9 front-end on top of the RDF 1.2 edge-annotation work in
the base branch. Cypher parses and lowers to the same query IR and transaction
pipeline as JSON-LD and SPARQL (the planner, executor, and result formatter
are shared), so this is a new surface, not a new engine. Cypher relationships
map onto the base branch's edge-annotation primitive, so property-graph edges
and RDF quoted-triple annotations are the same data read from two angles.

What's included

Read path

  • Clauses: MATCH / OPTIONAL MATCH / WITH / UNWIND / RETURN, UNION /
    UNION ALL.
  • CALL { … } subqueries: explicit imports (a, b), (*), uncorrelated
    broadcast, inner UNION, nesting, and strict scope/shadowing validation with
    correlated-aggregate soundness.
  • Paths: shortestPath / allShortestPaths; bounded variable-length paths with
    trail enumeration and relationship-uniqueness; binding a path (MATCH p = …)
    or a relationship list (-[r:T*1..n]->).
  • Relationship/path values: relationships(p), nodes(p), type(r),
    startNode(r) / endNode(r), properties(r), edge property access.
  • Expressions: arithmetic (incl. ^), comparison/boolean, string predicates,
    IN, CASE, list/map literals and indexing, list comprehensions / reduce /
    quantifiers, pattern comprehensions, EXISTS, parameters.
  • A scalar-function pass (string, math, casts, id/elementId), aggregates
    with implicit grouping + WITH-as-HAVING, and collect() carried through a
    WITH boundary.
  • Map values and map projection (n{.name, .*}).

Write path

  • CREATE / SET / REMOVE / DELETE / DETACH DELETE; node and relationship
    MERGE (with ON CREATE SET); WITH before a write.

Surfaces

  • HTTP application/cypher query route and CLI --format cypher-json
    (Neo4j-compatible output, with jsonld opt-in), both policy-enforced like the
    JSON-LD / SPARQL paths.

Docs

  • New concept doc, cookbook, and a tracked openCypher support matrix
    (docs/reference/cypher-support-matrix.md) marking each feature
    supported / divergent-by-design / deferred.

Model divergences (intentional)

Fluree enables a unified layer accessible by both SPARQL and Cypher: nodes are
IRIs (labels are rdf:type), relationships are edge annotations,
id()/elementId() returns the IRI string, and unbounded variable-length
traversal is reachability (bounded enumerates trails).

Testing

Covered by the Cypher parser/lowering unit suites and the
it_query_cypher / cypher_http_integration integration suites (read, write,
paths, relationships, CALL, null/type semantics), plus the cross-surface
edge-annotation tests shared with the base branch.

bplatz added 30 commits June 23, 2026 09:12
The Cypher language layer on top of the RDF 1.2 + query-operator engine: a
front-end that parses openCypher and lowers to the shared query/transact IR, so
the same executor powers Cypher, SPARQL, and JSON-LD.

- fluree-db-cypher: lexer, parser, AST, read + write lowering, diagnostics
- api: query_cypher / transact_cypher, conditional writes (MERGE / guarded
  DELETE via probe-then-stage), cypher-json (Neo4j-compatible) output
- transact: lower_cypher_update (CREATE / SET / REMOVE / DELETE / MERGE)
- cli: cypher query + update (--cypher / .cypher detection), cypher-json format
- server: application/cypher HTTP query + update endpoints
- CSV (neo4j-admin header convention) import to RDF 1.2 annotations

Cypher relationships map to RDF 1.2 edge annotations (LPG identity). See
docs/concepts/cypher.md for the supported surface.

One bare-DELETE relationship-guard test is #[ignore]'d pending a focused fix in
the per-edge-annotation probe path (its untyped rel-var probe is hidden by the
edge-annotation read-side firewall).
Replace the planning-tone Cypher docs now that the front-end has landed:

- concepts/cypher.md: drop the "v1 preview" framing and GQL_CYPHER_SUPPORT.md
  references; add a Running Cypher section (Rust API / CLI / HTTP
  application/cypher); expand the write surface (CREATE / SET / REMOVE /
  DELETE / MERGE / MATCH…CREATE); replace the stale deferred list with an
  accurate "Not yet supported".
- guides/cookbook-cypher.md: task-oriented recipes — model a property graph,
  query relationships, MERGE find-or-create, updates/deletes, paths and
  shortest path, aggregation, and cross-surface round-trips.
- concepts/edge-annotations.md: cross-link Cypher as the property-graph
  front-end (correct the "not part of this release" note).
- Wire both into SUMMARY.md and the guides README.
…egate exprs

- expressions: `%` (Function::Mod), `XOR` (desugars to the boolean-filter IR),
  and expression-valued aggregate arguments (`sum(n + 1)`)
- reads: `WITH *`
- writes: `SET n = {...}` full map-replace; the bare-DELETE relationship guard
  now probes candidate nodes from the original MATCH rather than an untyped
  rel-var pattern (which the edge-annotation read-side firewall hid), so
  `DELETE n` on a node with relationships correctly errors
- docs: move the now-supported constructs out of "Not yet supported" (keeping
  `^`, which has no IR support yet); add granular MERGE / write-MATCH limits
- labels(n): a node's Cypher label strings from live rdf:type assertions
  (overlay-aware — reflects uncommitted novelty, not just the persisted index)
- type(r): the relationship type string for a named relationship variable,
  read from f:reifiesPredicate on the reifier
- unbound / non-node / non-relationship arguments yield null

Adds eval/metadata.rs, Function::Labels / Function::RelType, and the Cypher
lowering for labels()/type(); moves both out of the docs' "Not yet supported".
Cypher query strings are written as raw strings for consistency even when a
particular query has no inner quotes; suppress the lint per-file so it stops
failing clippy on new tests.
MERGE now supports a single standalone relationship pattern
`(a)-[:T]->(b)` in addition to single-node MERGE. The whole path is
the match key: it lowers to one NOT EXISTS guard spanning both
endpoint identities plus the directed type triple, and when absent the
create branch mints both endpoints and the edge (with its f:reifies*
reifier bundle) exactly once. ON CREATE SET routes to either endpoint
node variable.

Deferred with clear errors: property-bearing MERGE relationships (need
an annotation-sidecar guard for correct match semantics), multi-hop /
multi-part MERGE, undirected MERGE relationships, and ON MATCH SET on a
relationship MERGE.

Also folds in a stray cargo-fmt rewrap in eval/metadata.rs.
Extends relationship MERGE to allow a leading MATCH binding the
endpoints — `MATCH (a),(b) MERGE (a)-[:KNOWS]->(b)`. This is a per-row
find-or-create: the NOT EXISTS guard runs once per matched (a,b) row
against the pre-write snapshot, exactly like SPARQL
`INSERT ... WHERE { ... FILTER NOT EXISTS { ... } }`, so it needs no
probe — it lowers to a single Txn.

Endpoint terms are now bound-aware: a MATCH-bound variable references
the existing node in both the guard and the create branch; an endpoint
introduced by the MERGE still gets a fresh existential probe (guard) and
a fresh blank node (create), so a mixed pattern like
`MATCH (a) MERGE (a)-[:HAS_PET]->(p:Pet {name:"Rex"})` creates one Pet
per matched a.

The lower_update guard is relaxed accordingly: a leading MATCH is
allowed before a single relationship MERGE (still rejected before a node
MERGE or alongside another write), and OPTIONAL MATCH before a
relationship MERGE is rejected (partial-reifier-bundle hazard, same as
CREATE).
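
A minimal sketch of the per-row find-or-create described above, assuming a simplified model in which the ledger is a set of (start, type, end) triples. The names `Edge` and `stage_merge` are hypothetical and are not the lowering's actual types; the point is only that the guard reads the pre-write snapshot for every row and never the edges staged so far.

```rust
use std::collections::BTreeSet;

/// A directed, typed edge: (start node, relationship type, end node).
type Edge = (String, String, String);

/// Hypothetical helper: for each matched (a, b) row, stage the edge only when
/// the pre-write snapshot does not already contain it (the NOT EXISTS guard).
fn stage_merge(snapshot: &BTreeSet<Edge>, rows: &[(&str, &str)], rel_type: &str) -> BTreeSet<Edge> {
    let mut staged = BTreeSet::new();
    for (a, b) in rows {
        let edge = (a.to_string(), rel_type.to_string(), b.to_string());
        // The guard consults `snapshot`, never `staged`, so every row sees the
        // same pre-write state.
        if !snapshot.contains(&edge) {
            staged.insert(edge);
        }
    }
    staged
}
```

With a snapshot holding only alice-KNOWS-bob, the rows (alice, bob) and (alice, carol) stage a single new edge, alice-KNOWS-carol.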
Fix a doc contradiction: ON MATCH SET is supported on single-node MERGE
only, not on a relationship MERGE (lower_merge rejects it; node MERGE
routes ON MATCH SET through the conditional-probe path before lowering).
Also document standalone-vs-bound endpoint behavior (standalone mints
fresh nodes even when one endpoint already exists; bind endpoints with a
leading MATCH to reuse them), add a cartesian-product warning for
unfiltered per-row MERGE, and a style note on repeating labels on bound
endpoints.

Test gaps closed: node MERGE with a leading MATCH and OPTIONAL MATCH
before a relationship MERGE are now asserted in the deferred-shapes
list; ON CREATE SET on a MATCH-bound head endpoint has an API test
(fires once on create, not on re-run).
Supports a WITH projection between a MATCH and a write clause, limited to
the subset that maps cleanly onto the single-Txn where-pattern stream:
- pass-through variables (WITH a, b)
- renames (WITH a AS p) — lowered to a Bind
- computed non-aggregate aliases carried into the write
  (WITH a, a.birthYear + 30 AS adultAt SET a.adultAt = adultAt) — lowered
  to a Bind, reusing the write-WHERE expression lowering (which already
  rejects aggregate calls, so aggregation is refused for free)
- a post-projection WHERE that gates which rows are written

WITH applies Cypher scoping by narrowing bound_vars to the projection
horizon: a dropped SET/REMOVE/DELETE target is rejected as unbound, and a
dropped node referenced in a CREATE position becomes a fresh node — the
execution stream keeps every matched binding, so the narrowing only gates
which names a later write may reference.

Deferred with clear errors: aggregation, DISTINCT, and ORDER BY / SKIP /
LIMIT on a write-side WITH (they need query-level grouping/slice the
single-Txn model does not carry).
Cypher untyped variable-length paths -[*], -[*m..n] now lower to a
wildcard transitive PropertyPathPattern instead of being rejected. The
path operator follows ANY node->node edge per hop, with bounds carried
as min_hops/max_hops.

IR (ir/path.rs): PropertyPathPattern gains `wildcard: bool` and
`min_hops`/`max_hops: Option<u32>`. Existing constructors default them
(wildcard=false, bounds=None), so typed paths and SPARQL/JSON-LD @path
are byte-for-byte unchanged; a new `new_wildcard` constructor builds the
untyped form.

Operator (property_path.rs): forward_step/backward_step branch on
wildcard — a subject-prefix SPOT scan (resp. object-prefix OPST) that
keeps only Ref objects and skips the reserved predicates rdf:type and
f:reifies* (data properties are excluded by the Ref filter; the reifier
sidecar and class memberships by the exclusion). The four BFS primitives
(forward/backward/closure/path_exists) gain depth tracking and
emit/expand gates derived so that the unbounded, non-wildcard case
reproduces the original */+ behavior exactly. Bounded untyped paths use
reachability semantics (each node reachable within the hop range, by
shortest path).

Lowering (lower/pattern.rs): untyped variable-length rel -> wildcard
path with bounds; undirected untyped is rejected (the operator drives a
single direction from the bound endpoint).

Tests: lowering (it_lower.rs untyped_*) and end-to-end execution
(it_query_cypher.rs cypher_untyped_path_*) over a mixed-type
KNOWS/FOLLOWS chain, proving mixed-edge traversal, hop bounds, incoming
direction, and exclusion of data properties / rdf:type / reifier edges.
Typed property-path + SPARQL suites unchanged (36 + 139 + 493 pass).
…arams

Adds a first-class map value type to the query engine and wires the full
Cypher map-value surface on top of it.

Value model:
- `Binding::Map(Vec<(Arc<str>, Binding)>)` — ordered entries; identity
  (equality / hash / group key) is key-order-insensitive, while the Vec
  preserves insertion order for display. Opaque to triple matching, index
  search, and flake generation (a map is a projection value, not an RDF
  term).
- `Expression::Map(Vec<(Arc<str>, Expression)>)` in the IR — keys are
  static, values are per-row sub-expressions; duplicate keys resolve
  last-wins at construction. (Chosen over a `MakeMap` function with
  interleaved args so intent survives into planner/eval/debug.)

Surface:
- Map literal `{k: expr}` in expression position — parser (`{` in
  primary), AST `Expr::Map`, lowering to `Expression::Map`, eval to
  `Binding::Map`.
- `properties(n)` → a map of a node's data properties (literal-valued,
  non-reserved predicates; multi-valued → list) and `keys(n)` → their
  sorted names, both via a subject-prefix scan in eval/metadata.
- Object `$params` substitute to a map value (was rejected).

Rendering: maps render as JSON objects in the JSON formatters
(jsonld/typed) and native objects in cypher-json (cypherify now recurses
plain objects); tabular formatters (sparql/xml/csv) treat a map like a
list (one shared arm).
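
The map identity rule above (insertion order kept for display, equality and hash insensitive to key order, duplicate keys last-wins) could be sketched as follows. `MapValue` is a simplified stand-in with `i64` values, not the engine's `Binding::Map`, and keeping a repeated key at its first position is an assumption of this sketch.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Simplified stand-in for a map binding; values are i64 for brevity.
#[derive(Debug, Clone)]
struct MapValue(Vec<(String, i64)>);

impl MapValue {
    /// Build from entries. A repeated key takes the last value; keeping its
    /// first position is an assumption of this sketch.
    fn new(entries: Vec<(&str, i64)>) -> Self {
        let mut out: Vec<(String, i64)> = Vec::new();
        for (k, v) in entries {
            match out.iter_mut().find(|entry| entry.0 == k) {
                Some(slot) => slot.1 = v,
                None => out.push((k.to_string(), v)),
            }
        }
        MapValue(out)
    }

    /// Entries ordered by key, used only for identity (never for display).
    fn sorted(&self) -> Vec<&(String, i64)> {
        let mut refs: Vec<_> = self.0.iter().collect();
        refs.sort_by(|a, b| a.0.cmp(&b.0));
        refs
    }
}

impl PartialEq for MapValue {
    fn eq(&self, other: &Self) -> bool {
        self.sorted() == other.sorted()
    }
}
impl Eq for MapValue {}

impl Hash for MapValue {
    fn hash<H: Hasher>(&self, state: &mut H) {
        for (k, v) in self.sorted() {
            k.hash(state);
            v.hash(state);
        }
    }
}

/// Convenience for comparing hashes in examples.
fn hash_of(m: &MapValue) -> u64 {
    let mut h = DefaultHasher::new();
    m.hash(&mut h);
    h.finish()
}
```

Sorting on every comparison keeps the sketch short; a real implementation would likely pick a cheaper order-insensitive scheme.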
…TH before DELETE

Two review findings on the recent untyped-path and WITH-before-write work.

1. Bounded wildcard paths could omit a valid in-range endpoint. The
   transitive operator keyed `visited` by node, so a node first reached
   below `min_hops` on a shorter path was suppressed and never re-reached
   on a longer in-range path — and the bound-bound form (`path_exists`,
   which checks the target before the visited gate) disagreed with the
   bound-unbound traversal. Bounded paths now run a layered (node, depth)
   BFS (`traverse_bounded`, and the per-start loop in `compute_closure`),
   correct for any lower bound and consistent across forms. Only untyped
   paths reach the operator with bounds (typed bounded ranges lower to a
   UNION of chains), so typed/SPARQL paths are untouched. An UNBOUNDED
   lower bound above 1 (`-[*2..]->`) can't be evaluated soundly with
   node-reachability state and no depth cap, so it is rejected at lowering.

2. WITH before DELETE mis-handled renames/horizon. The delete classifier
   and the rel-var edge map key off the raw MATCH variables, so a WITH
   rename (`WITH r AS edge DELETE edge`) or a dropped target (`WITH a
   DELETE r`) was mis-routed or — worse — silently deleted an out-of-scope
   edge. WITH before DELETE is now rejected with a clear error (both in
   `detect_conditional` and `lower_update`); WITH before CREATE/SET/REMOVE
   is unaffected (those honor the narrowed scope via require_bound).

Tests: diamond bound/unbound consistency + unbounded-lower-bound rejection
(it_query_cypher), WITH-before-DELETE rejection at lowering and end-to-end
(no silent delete). Also documents the `try_eval_to_binding` Map passthrough
that lets a map var nest in another value (`{props: p}`).
… lang + list order

Two more review findings on the untyped-path and map-value work.

1. path_exists (bound-bound bounded paths) still used a node-only visited
   set, so an intermediate that must be revisited at a later depth was
   suppressed — disagreeing with the now-layered bound-unbound traversal
   (A->B, A->C, C->B, B->D; `A-[*3..3]->D` via A-C-B-D). Bounded
   path_exists now defers to the same `traverse_bounded` the bound-unbound
   form uses and checks target membership, so the two can never disagree.

2. properties(n) dropped language tags and list order. It now builds a
   lang-aware value binding (Binding::lit_lang for an rdf:langString, so
   JSON-LD/typed output keeps @language) and carries each value's
   FlakeMeta::i, ordering a multi-valued @list property by its stored
   index.

Tests: revisit-intermediate bound/unbound consistency (`*3..3`), and a
langString + @list properties() round-trip through JSON-LD output.
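
To illustrate why a node-only visited set loses the `*3..3` result on the A->B, A->C, C->B, B->D graph, here is a small sketch contrasting it with a layered traversal whose state is (node, depth). The function names and the adjacency-map `Graph` type are hypothetical; this is not the operator's `traverse_bounded`, only the same idea in miniature.

```rust
use std::collections::{BTreeMap, BTreeSet};

type Graph = BTreeMap<&'static str, Vec<&'static str>>;

/// Node-keyed BFS: a node is expanded only the first time it is reached.
fn node_visited_bfs(g: &Graph, start: &'static str, min: u32, max: u32) -> BTreeSet<&'static str> {
    let mut visited = BTreeSet::from([start]);
    let mut frontier = vec![start];
    let mut out = BTreeSet::new();
    for depth in 1..=max {
        let mut next = Vec::new();
        for node in &frontier {
            for &n in g.get(node).into_iter().flatten() {
                if visited.insert(n) {
                    if depth >= min {
                        out.insert(n);
                    }
                    next.push(n);
                }
            }
        }
        frontier = next;
    }
    out
}

/// Layered BFS: each depth keeps its own frontier set, so a node first seen
/// below `min` can be reached again at a later, in-range depth.
fn layered_bfs(g: &Graph, start: &'static str, min: u32, max: u32) -> BTreeSet<&'static str> {
    let mut frontier = BTreeSet::from([start]);
    let mut out = BTreeSet::new();
    for depth in 1..=max {
        let next: BTreeSet<&'static str> = frontier
            .iter()
            .flat_map(|node| g.get(node).into_iter().flatten().copied())
            .collect();
        if depth >= min {
            out.extend(next.iter().copied());
        }
        frontier = next;
    }
    out
}

/// The graph from the commit message: A->B, A->C, C->B, B->D.
fn example_graph() -> Graph {
    BTreeMap::from([("A", vec!["B", "C"]), ("C", vec!["B"]), ("B", vec!["D"])])
}
```

On this graph the node-keyed version returns nothing for a 3..3 range from A, because B is consumed at depth 1; the layered version reaches D at depth 3 via A-C-B-D.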
Maps the openCypher scalar functions whose semantics match an existing
engine function 1:1: toUpper→Ucase, toLower→Lcase, round→Round,
ceil/ceiling→Ceil, floor→Floor, rand→Rand. Deferred where semantics
differ (substring is 0- vs 1-indexed, replace is literal vs regex) or
where the engine lacks an evaluator (sqrt/sign/split/trim/^).
Adds the expression-level list-iteration family on a shared local-binding
foundation:
- list comprehensions `[x IN list WHERE pred | expr]`
- `reduce(acc = init, x IN list | expr)`
- `all/any/none/single(x IN list WHERE pred)`

Foundation (the reusable win): one `RowWithLocals` overlay binds the loop
variable(s) per element over the base row (dynamic dispatch on the base so
nested comprehensions don't recurse the type). Four new IR variants drive
it — `ListComprehension`, `Reduce`, `ListPredicate`, and `Member` for
eval-time property access — chosen over interleaved-arg function calls so
intent survives into planner/eval/debug, with loop vars excluded from
`referenced_vars` and capture-aware `substitute_var`.

Loop-local property access is first-class (the load-bearing edge): `x.name`
on a comprehension variable lowers to eval-time `Member` — a map element
looks the key up, a node element scans the property — instead of an outer
pattern join (which only works for query variables). The cypher lowering
gains a scope stack so a body's `var` resolves to a fresh synthetic id and
`var.prop` becomes member access; outer-variable property access keeps the
efficient auxiliary-pattern path.

Semantics: null/non-list input yields null (not an empty list); empty-list
identities are all/none = true, any/single = false; duplicate map keys
last-win. The list position may aggregate (`[x IN collect(p) | x.name]`).
Write-side `MATCH … WHERE` rejects these for now.
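
The null and empty-list rules above can be sketched with a plain boolean predicate. `Quantifier` and `eval_quantifier` are hypothetical names, `Option::None` stands in for a null or non-list input, and predicates that themselves evaluate to null are deliberately left out of this sketch.

```rust
#[derive(Clone, Copy)]
enum Quantifier {
    All,
    Any,
    NoneOf,
    Single,
}

/// `list == None` models a null or non-list input; the result is then null.
fn eval_quantifier<T>(q: Quantifier, list: Option<&[T]>, pred: impl Fn(&T) -> bool) -> Option<bool> {
    let items = list?;
    let hits = items.iter().filter(|x| pred(*x)).count();
    Some(match q {
        Quantifier::All => hits == items.len(), // vacuously true on []
        Quantifier::Any => hits > 0,            // false on []
        Quantifier::NoneOf => hits == 0,        // true on []
        Quantifier::Single => hits == 1,        // false on []
    })
}
```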

Also wires the clean scalar functions toUpper/toLower/round/floor/ceil/rand,
and relaxes object/list-of-map params now that map values exist.
…as rewrite

Two follow-up review findings on list iteration.

1. EXISTS inside a list comprehension / reduce / list predicate reached
   eval unresolved and silently evaluated to false: the FilterOperator
   EXISTS resolver only descends through Call (and an EXISTS there usually
   references the loop-local element, which needs per-element async
   subquery evaluation the synchronous per-element eval path cannot do).
   Reject it at lowering with a clear error (via filter::contains_exists on
   the built expression).

2. The UNWIND $param alias rewrite (collect/rewrite/replace) descended into
   comprehension scoped bodies without checking whether the loop/acc var
   shadows the UNWIND alias, so `UNWIND $rows AS row ... [row IN xs | row]`
   could rewrite the inner loop var. The three alias walkers now always
   rewrite the outer list/init position but skip the scoped body/filter/map
   when the binder shadows the alias.

Tests: EXISTS-in-iteration rejection (it_lower) and a binder-aware rewrite
unit test (shadowed loop var untouched, non-shadowed alias rewritten).
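
A sketch of the binder-aware rewrite from finding 2, over a toy expression type. `Expr` and `rewrite_alias` are hypothetical and much smaller than the real AST; the replacement expression passed in is also just a placeholder for whatever the UNWIND rewrite substitutes.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Var(String),
    Param(String),
    /// `[var IN list | body]`
    ListComprehension { var: String, list: Box<Expr>, body: Box<Expr> },
}

/// Replace references to `alias` with `replacement`, respecting loop-variable shadowing.
fn rewrite_alias(expr: &Expr, alias: &str, replacement: &Expr) -> Expr {
    match expr {
        Expr::Var(name) if name == alias => replacement.clone(),
        Expr::Var(_) | Expr::Param(_) => expr.clone(),
        Expr::ListComprehension { var, list, body } => {
            // The list position is evaluated in the outer scope: always rewrite it.
            let list = Box::new(rewrite_alias(list, alias, replacement));
            // The body sees the loop variable; leave it alone when the binder
            // shadows the alias.
            let body = if var == alias {
                body.clone()
            } else {
                Box::new(rewrite_alias(body, alias, replacement))
            };
            Expr::ListComprehension { var: var.clone(), list, body }
        }
    }
}
```

For `[row IN xs | row]` with alias `row`, the expression comes back unchanged; for `[x IN row | row]`, both the list position and the body are rewritten.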
Adds map projection — build a map value from a node variable. `.key`
selectors desugar to `{key: n.key}` (reusing the property-accessor
lowering, so they join via the aux pattern for outer vars or eval-time
member access for loop-locals); `key: expr` adds an explicit entry; and
`n{.*}` lowers to `properties(n)`. Mixing `.*` with other selectors is
deferred (needs a runtime map merge) — rejected with a clear error.

Parser: a variable immediately followed by `{` is a projection (distinct
from the bare map literal `{…}`). New AST `Expr::MapProjection` (boxed to
keep `Expr` small) with `Property` / `AllProperties` / `Literal`
selectors. Builds on the existing Binding::Map and Member machinery.
A correlated subquery that collects a projection over each match into a
list — `RETURN [(a)-[:KNOWS]->(b) WHERE b.age > 30 | b.name]`. The inner
pattern's existing variables correlate with the outer row (via the shared
registry, like EXISTS); new ones bind inside the subquery.

Mechanism: reuses the EXISTS async per-row resolution path (FilterOperator
+ BindOperator, gated by contains_exists). A new IR `PatternComprehension
{ patterns, projection }` is resolved per outer row by seeding the row's
bindings, running the subquery, evaluating the projection per match, and
collecting into a list. Since `FlakeValue` has no list variant, the
resolver substitutes a new `Expression::Resolved(Binding)` leaf (rather
than EXISTS's `Const(Bool)`) which the synchronous evaluator returns
directly. The resolver also now recurses into map literals, so it composes
when nested (`size([(a)-->(b) | b])`, `{friends: [...]}`).

Parser: `[(pattern) WHERE? | proj]` is disambiguated from a parenthesized
list element by speculatively parsing a pattern + mandatory `|` and
backtracking (new TokenStream mark/reset). Lowering mirrors EXISTS, plus
splicing the projection's property-accessor aux patterns into the subquery
so `b.name` resolves per match. Write-side MATCH WHERE rejects it.
A computed map entry holding an async subquery (e.g.
{ok: EXISTS { (p)-[:KNOWS]->(:Person) }}) now resolves per row. The
per-row resolver already recursed into Expression::Map; this also
recurses through Map in the batch-level pre_resolve_uncorrelated pass so
an uncorrelated map-nested EXISTS is resolved once per batch rather than
falling through to phase 2 per-row.
… pattern params

Three pattern-comprehension correctness fixes:

- referenced_vars() now includes outer variables captured only by the
  projection (e.g. [(a)-->(b) | c]) so dependency trimming can't drop the
  correlation. EXISTS (no projection) keeps the pattern-only behavior.
- eval_pattern_comprehension_for_row resolves async subqueries inside the
  projection per inner match, so a nested EXISTS or pattern comprehension
  ([(a)-->(b) | EXISTS { ... }]) no longer falls through to a sync false.
- Parameter substitution and UNWIND alias rewriting now descend into the
  inner pattern, so [(a)-[:KNOWS]->(b {name: $x}) | b.name] resolves $x.
Lower CALL [(imports)] { <read query> } to the existing Pattern::Subquery
rather than a new executor. The scope clause imports correlated variables;
without one the subquery runs once and broadcasts. Outer rows flow into the
subquery as a pipeline clause (appended, unlike WITH which consumes prior
patterns) and the RETURN columns continue downstream.

Imports are prepended to the subquery SELECT so SubqueryOperator correlates
on parent_schema ∩ select (per-row seed or evaluate-once + hash-join). A
correlated aggregating CALL would lower to implicit grouping, which collapses
to a single global aggregate in join-mode; promote the body-referenced imports
to GROUP BY keys so per-import aggregates are correct and consistent in both
execution modes. Tradeoff: a zero-match import yields no row (use OPTIONAL
MATCH inside the CALL to retain it as 0) — documented.

Body is MATCH / OPTIONAL MATCH / WITH / UNWIND / nested CALL ending in RETURN
(explicit columns). Deferred and rejected with clear errors: writes inside
CALL, CALL (*), inner UNION, RETURN *, and a RETURN that re-binds an import.
A CALL subquery's returned names were only checked against its imports, so a
RETURN re-binding a non-import outer variable (MATCH (p),(q) CALL (p) { … RETURN
f AS q }) was silently accepted — the executor treats the collision as an
existing parent var and drops the subquery's value. And imports were never
validated as actually bound outside, so CALL (p) { MATCH (p:Person) … } would
import an unbound p.

Pass the visible outer vars into lower_call_subquery: reject an import not in
that set, and reject any RETURN column that collides with it (not just an
import). Synthetic ?#__* vars are already excluded from the visible set, so they
can't false-trigger.
Two CALL refinements:

Strict shadowing boundary: a CALL body may only see its imports. Reject a body
that references a non-imported variable whose name also exists in the outer
scope — under proper scoped-CALL semantics that inner name is fresh, but the
shared VarRegistry would silently treat it as the (unseeded, unbound) outer
var and produce wrong results. Rename it or import it. (Imports-bound and
RETURN-collision checks already landed.)

CALL { … UNION [ALL] … }: parse the union chain inside the body (stopping at
the closing brace) and lower it to Pattern::Union wrapped in the CALL
subquery, so the import seed flows CALL → UnionOperator → each branch (which
already runs branches correlated/seeded from its child) and the parent-row
merge happens once. Plain UNION dedups per correlation group (DISTINCT on the
CALL subquery); UNION ALL keeps duplicates. Branches must share a column shape
and may not mix UNION with UNION ALL.
CALL (*) { … } imports every variable visible in the outer scope, resolved at
lowering from the outer-scope set the caller already threads in. With every
outer var imported, the shadowing guard never fires (the body may freely
reference any outer name, which correlates), while the RETURN-collision guard
still applies. New AST field CallSubqueryClause.import_all; parser accepts (*);
lowering sets import_vars = outer_vars in that case.
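
The CALL scope rules accumulated over these commits (imports must be bound outside, the body may not reuse a non-imported outer name, RETURN columns may not collide with outer variables, and `(*)` imports everything) could be sketched as one check. `validate_call_scope` and `vars` are hypothetical helpers over plain string sets, not the signature of `lower_call_subquery`.

```rust
use std::collections::BTreeSet;

type Vars = BTreeSet<String>;

/// Returns the effective import set, or the first scope violation found.
fn validate_call_scope(
    outer: &Vars,
    imports: &Vars,
    import_all: bool,
    body_refs: &Vars,
    returns: &Vars,
) -> Result<Vars, String> {
    // CALL (*) imports every variable visible in the outer scope.
    let imports: Vars = if import_all { outer.clone() } else { imports.clone() };
    if let Some(v) = imports.difference(outer).next() {
        return Err(format!("import `{v}` is not bound in the outer scope"));
    }
    if let Some(v) = body_refs.iter().find(|v| outer.contains(*v) && !imports.contains(*v)) {
        return Err(format!("`{v}` shadows an outer variable; rename it or import it"));
    }
    if let Some(v) = returns.intersection(outer).next() {
        return Err(format!("RETURN column `{v}` collides with an outer variable"));
    }
    Ok(imports)
}

/// Build a variable set from string literals.
fn vars(names: &[&str]) -> Vars {
    names.iter().map(|n| n.to_string()).collect()
}
```

Because imports are a subset of the outer scope, the RETURN check also covers the earlier "RETURN that re-binds an import" case, and with `import_all` set the shadowing check can never fire.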
A nested CALL inside a CALL body computed its visible scope only from the
already-lowered patterns of the inner branch, so it couldn't see the enclosing
CALL's imports: an explicit nested CALL (p) was rejected (p not bound), and a
nested CALL (*) silently lost its correlation and collapsed to a global aggregate.

Thread the enclosing scope into lower_single_branch and union it into the
nested call's visible set. A WITH narrows scope to its projection, so the
enclosing-scope vars are dropped once a WITH is seen (whatever it carried
forward is already in the branch patterns) — this avoids over-including an
import a WITH dropped, which would re-create the silent-uncorrelation footgun.
New scalar functions on the Cypher expression surface:
- String: substring (0-indexed → SUBSTR +1), left, right, trim/ltrim/rtrim,
  replace (literal replace-all), split (→ list).
- Math: sqrt, sign, log (natural), and the ^ exponent operator
  (right-associative, binds tighter than * / %).
- id(n)/elementId(n) → the node's IRI string (Fluree has no integer element
  id; documented).

New IR Function variants (ReplaceAll, Split, Trim/LTrim/RTrim, Left/Right,
Sqrt/Sign/Ln/Pow) with eval in string.rs/numeric.rs/list.rs and central
dispatch arms; substring/id remap onto existing primitives at lowering. New
BinOp::Pow with a right-associative power tier in the parser.
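
As an illustration of the `substring` remap (0-indexed in Cypher, lowered onto a 1-indexed primitive by adding one to the start), here is a character-based sketch. Both function names are hypothetical, and the 1-indexed helper only approximates SPARQL `SUBSTR`: it ignores that function's edge cases for starts below 1.

```rust
/// 1-indexed substring in the style of SPARQL SUBSTR, counting characters.
fn substr_one_indexed(s: &str, start: usize, len: usize) -> String {
    s.chars().skip(start.saturating_sub(1)).take(len).collect()
}

/// Cypher's substring is 0-indexed, so the lowering shifts the start by one.
fn cypher_substring(s: &str, start: usize, len: usize) -> String {
    substr_one_indexed(s, start + 1, len)
}
```

`cypher_substring("hello", 1, 3)` yields `"ell"`, while the 1-indexed primitive called directly with the same arguments yields `"hel"`.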
numeric_f64() (used by the new sqrt/sign/log/^ math functions) only accepted
primitive numeric ComparableValues, so a string-backed xsd:float TypedLiteral —
what Function::XsdFloat produces — collapsed to null instead of a numeric
result. Run the value through the existing coerce_numeric_operand() normalizer
first, matching how SUM/AVG and comparisons already handle xsd:float.

Regression: eval-level unit test building sqrt/sign/log/^ over xsd:float(...),
which yields the string-backed TypedLiteral (confirmed load-bearing — fails
None vs Some(4.0) without the coercion).
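
The shape of this fix can be sketched with a toy value type: normalize the operand to a number first, then apply the math function. `Value`, `coerce_numeric`, and `sqrt` here are hypothetical stand-ins; they are not the engine's `ComparableValue`, `coerce_numeric_operand()`, or `numeric_f64()`.

```rust
#[derive(Debug, Clone)]
enum Value {
    Double(f64),
    Long(i64),
    /// String-backed typed literal, e.g. lexical "16.0" with datatype xsd:float.
    TypedLiteral { lexical: String, datatype: String },
}

/// Normalize a value to f64 when it is numeric, including string-backed floats.
fn coerce_numeric(v: &Value) -> Option<f64> {
    match v {
        Value::Double(d) => Some(*d),
        Value::Long(i) => Some(*i as f64),
        Value::TypedLiteral { lexical, datatype } if datatype == "xsd:float" => lexical.parse().ok(),
        Value::TypedLiteral { .. } => None,
    }
}

/// Math functions go through the normalizer instead of matching primitives only.
fn sqrt(v: &Value) -> Option<f64> {
    coerce_numeric(v).map(f64::sqrt)
}
```

Without the `TypedLiteral` arm in the normalizer, a string-backed `xsd:float` of 16.0 would fall through to `None` instead of `Some(4.0)`.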
A collect() projected by a WITH was deferred (a stale guard assumed the list was
nulled at the subquery boundary). The list now survives the boundary — the
later list/map work fixed the merge + try_eval_to_binding paths — so remove the
guard. The carried list flows to the next stage: projected, fed to list
functions (size), and UNWINDed (collect→unwind round-trip).

Harden the one edge the blanket guard implicitly covered: ORDER BY directly on a
collect() list in a WITH is now rejected via reject_order_by_on_list (mirroring
the RETURN path) — sorting a list value is unsound in v1. (A carried list var
reaching a downstream ORDER BY hits sort.rs's defensive element-wise total
order, so it's deterministic, not corrupting.)
The ProjectionState::list_outputs doc claimed collect() lists are Binding::Grouped
and must not flow out of a WITH — both now false: collect() yields a real
Binding::List that carries through the WITH boundary. Tracking is now only to
reject ORDER BY directly on a list.
A bound relationship variable is the reified edge's node, so the full
relationship-value surface works off it: type(r) and properties(r)/r.prop
already resolved via the reifier; add startNode(r)/endNode(r) reading
f:reifiesSubject/f:reifiesObject (mirroring type(r)'s f:reifiesPredicate
lookup). New IR Functions StartNode/EndNode + eval in metadata.rs + dispatch +
cypher lowering (startnode/endnode).

Test cypher_relationship_value_semantics: type, r.stars, properties(r),
startNode(r)==a, endNode(r)==m over a reified -[r:RATED {stars}]-> edge.
bplatz added 6 commits June 23, 2026 09:12
A diamond graph (two distinct 2-hop paths A→D) pins the current semantics:
bounded var-length enumerates both trails (2 rows); unbounded is reachability
(D reached once → 1 row). Marks exactly where true unbounded path enumeration
is still missing.
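
The difference the diamond test pins down can be reproduced in a few lines: counting edge-unique trails gives two results for A to D, while reachability records D once. The helpers below are hypothetical and operate on a hard-coded edge list; they are not the engine's path operator.

```rust
use std::collections::BTreeSet;

type Edge = (&'static str, &'static str);

/// Count trails (no repeated edge) of exactly `hops` edges from `node` to `target`.
fn count_trails(edges: &[Edge], node: &str, target: &str, hops: u32, used: &mut Vec<usize>) -> usize {
    if hops == 0 {
        return usize::from(node == target);
    }
    let mut total = 0;
    for (i, (from, to)) in edges.iter().enumerate() {
        if *from == node && !used.contains(&i) {
            used.push(i);
            total += count_trails(edges, to, target, hops - 1, used);
            used.pop();
        }
    }
    total
}

/// Reachability: the set of nodes reachable from `start` in one or more hops.
fn reachable(edges: &[Edge], start: &'static str) -> BTreeSet<&'static str> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![start];
    while let Some(n) = stack.pop() {
        for (from, to) in edges {
            if *from == n && seen.insert(*to) {
                stack.push(*to);
            }
        }
    }
    seen
}

/// Diamond: two distinct 2-hop paths from A to D.
const DIAMOND: [Edge; 4] = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")];
```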
A feature-by-feature status grid against openCypher 9 (clauses, patterns/paths,
expressions, functions, null/type semantics), tagged supported / divergent-by-
design / deferred. Adds the 'divergent by design' axis the prose concept doc
lacked (RDF-model choices: nodes are IRIs, relationships are edge annotations,
id() is an IRI, unbounded var-length is reachability). Wired into SUMMARY.md and
linked from the concept doc. Next step (noted in-doc): drive the marks from
executable openCypher TCK scenarios.
Post-rebase integration: the SPARQL table fast-path formatter in the CLI
(fluree-db-cli/src/output.rs) predates pr/3's Binding::Map/Rel variants and the
Path tuple→struct reshape. Update Path to the struct pattern and add Map / Rel
arms (defensive — these Cypher-only values aren't reached via the SPARQL table
surface, but the match must be exhaustive).
pr/2 left this Cypher-over-HTTP policy regression test #[ignore]'d because the
application/cypher transport wasn't implemented (#1357). pr/3 added that route
(execute_cypher_ledger with policy wrapping), so the test now passes — un-ignore
it and refresh the stale doc. Not a duplicate: pr/3's cypher_http_integration.rs
has no policy coverage.
GovernanceOptions has exactly the specified fields, so the struct-update is a
no-op (clippy::needless_update). Removes the warning I added to the Cypher
route's qc_opts during conflict resolution, and the matching pre-existing one in
sparql_qc_opts that landed in the reconstructed region.
@bplatz bplatz requested review from aaj3f and zonotope June 23, 2026 13:14
Base automatically changed from pr/2-rdf12-annotations to main June 23, 2026 13:17
bplatz added 3 commits June 24, 2026 15:54
Three policy-enforcement gaps where traversal/probe reads bypassed the
per-flake view-policy filter that scan operators apply:

- property_path.rs compute_closure: the typed-predicate `else` arm
  ingested raw edges with no filter_edges call, while the wildcard arm
  filtered. This regressed SPARQL/JSON-LD `?x :p+ ?y` transitive paths,
  which under a non-root policy emitted policy-hidden edges/nodes.
- property_path.rs forward_step/backward_step wildcard branches read
  edges without filter_edges (typed branches already filtered).
- server cypher write: the conditional-write branch probe (MERGE
  ON CREATE/ON MATCH, DELETE relationship guard) ran against an
  unpolicied GraphDb, making the committed branch a one-bit existence
  oracle over policy-hidden nodes. Wrap the probe in the same view
  policy the commit uses, matching the Cypher read path and SPARQL
  UPDATE's policy-wrapped WHERE.

filter_edges short-circuits for root / no-policy, so non-policy queries
are unaffected.
- Exponentiation `^` now binds tighter than unary `-`, matching
  openCypher/Neo4j precedence: `-2 ^ 2` is -(2^2) = -4, not (-2)^2 = 4.
  The `^` right operand still accepts a signed exponent (`2 ^ -3`), and
  `^` stays right-associative (`-2 ^ 2 ^ 2` = -16). Reordered the parse
  layering to mult > unary > power > postfix.
- Variable-length path bounds (`*n` / `*n..m`) now reject out-of-range
  values via u32::try_from instead of a truncating `as u32` cast, which
  silently wrapped e.g. *4294967312 to a 16-hop expansion.
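
The wrap-around described in the second bullet is easy to reproduce. In this sketch `parse_hop_bound` and `truncating_bound` are hypothetical names; only `u32::try_from` and the `as u32` cast are taken from the commit message.

```rust
use std::convert::TryFrom; // already in the prelude on edition 2021

/// Parse a hop bound such as the `n` in `*n`, rejecting values that do not fit in u32.
fn parse_hop_bound(text: &str) -> Result<u32, String> {
    let wide: u64 = text
        .parse()
        .map_err(|e| format!("invalid hop bound `{text}`: {e}"))?;
    u32::try_from(wide).map_err(|_| format!("hop bound {wide} is out of range"))
}

/// The truncating cast, kept only for contrast: it keeps the low 32 bits.
fn truncating_bound(wide: u64) -> u32 {
    wide as u32
}
```

4294967312 is 2^32 + 16, so the cast keeps 16 while the checked conversion returns an error.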

Adds an end-to-end precedence regression test.
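
A toy evaluator showing the unary-over-power layering described above. It handles only numbers, unary `-`, and `^`; the tokenizer, `Parser`, and `eval` are hypothetical and unrelated to the PR's parser, and malformed input simply panics.

```rust
/// Split into number, `-`, and `^` tokens; any other character is skipped.
fn tokenize(src: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut num = String::new();
    for c in src.chars() {
        if c.is_ascii_digit() || c == '.' {
            num.push(c);
            continue;
        }
        if !num.is_empty() {
            out.push(std::mem::take(&mut num));
        }
        if c == '-' || c == '^' {
            out.push(c.to_string());
        }
    }
    if !num.is_empty() {
        out.push(num);
    }
    out
}

struct Parser {
    tokens: Vec<String>,
    pos: usize,
}

impl Parser {
    /// Consume the next token if it equals `tok`.
    fn eat(&mut self, tok: &str) -> bool {
        let hit = self.tokens.get(self.pos).map_or(false, |t| t == tok);
        if hit {
            self.pos += 1;
        }
        hit
    }

    /// unary := '-' unary | power   (so `-` applies to the whole power expression)
    fn unary(&mut self) -> f64 {
        if self.eat("-") { -self.unary() } else { self.power() }
    }

    /// power := primary ('^' unary)?   (right operand may be signed; right-associative)
    fn power(&mut self) -> f64 {
        let base = self.primary();
        if self.eat("^") { base.powf(self.unary()) } else { base }
    }

    fn primary(&mut self) -> f64 {
        let value: f64 = self.tokens[self.pos].parse().expect("expected a number");
        self.pos += 1;
        value
    }
}

fn eval(src: &str) -> f64 {
    Parser { tokens: tokenize(src), pos: 0 }.unary()
}
```

Because `unary` sits above `power`, `-2 ^ 2` parses as the negation of `2 ^ 2`, and routing the right operand back through `unary` is what admits `2 ^ -3` and makes chains associate to the right.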
The unboxed Rel { start, predicate, end, reifier } variant (96-byte
payload) made it the size driver of Binding, growing size_of::<Binding>()
from 88 to 104 bytes. Binding is the engine's per-cell value, cloned and
scattered through join/sort/materializer on every SPARQL/JSON-LD query,
so this taxed queries that never touch Cypher.

Move the payload into a boxed RelValue struct; the rare relationship
value pays one indirection and the common variants restore 88 bytes.
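
A minimal sketch of why boxing the large variant shrinks the enum. The types below are hypothetical stand-ins (plain `[u64; 3]` fields rather than the engine's real field types), so the absolute sizes differ from the 88/104 bytes quoted above; only the relationship between the two layouts carries over.

```rust
use std::mem::size_of;

// Hypothetical 96-byte relationship payload: four 24-byte fields.
struct RelValue {
    start: [u64; 3],
    predicate: [u64; 3],
    end: [u64; 3],
    reifier: [u64; 3],
}

// Inline layout: the enum is at least as large as its biggest variant,
// so every value pays for the relationship payload.
enum InlineBinding {
    Unbound,
    Scalar(u64),
    Rel {
        start: [u64; 3],
        predicate: [u64; 3],
        end: [u64; 3],
        reifier: [u64; 3],
    },
}

// Boxed layout: the rare variant holds one pointer; the payload lives on the heap.
enum BoxedBinding {
    Unbound,
    Scalar(u64),
    Rel(Box<RelValue>),
}

// Returns (payload size, inline enum size, boxed enum size) in bytes.
fn binding_sizes() -> (usize, usize, usize) {
    (
        size_of::<RelValue>(),
        size_of::<InlineBinding>(),
        size_of::<BoxedBinding>(),
    )
}
```

The trade-off is one heap allocation and one pointer hop when a relationship value is actually produced, in exchange for a smaller value on every other code path.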

@aaj3f aaj3f left a comment

Contributor

Leaving an approving review, but I found some performance and policy-security regressions on the non-Cypher paths (i.e. JSON-LD / SPARQL), among other things, which prompted me to ask Claude to do a full review. I mentioned this in person, and I've passed that full audit to you in Slack (https://fluree-internal.slack.com/files/UKWLAAHBQ/F0BCYDH697C/pr-1361-review.md) for you to investigate and address as you see fit.

(Oh, should add--super excited for this!)

bplatz added 17 commits June 24, 2026 19:48
The Cypher metadata functions (labels/keys/properties/type/startNode/
endNode) and loop-local member access (`[x IN list | x.prop]`) read graph
flakes lazily during synchronous scalar expression evaluation, which could
not await the engine's async policy enforcer — so under a non-root view
policy they returned policy-hidden flakes.

Reuse the existing enforcement and the EXISTS-style async pre-resolution
rail rather than adding a new mechanism:

- metadata.rs: split each reader into a raw read + a pure flake→binding
  reduction, and add policy-filtered async variants that thread the raw
  flakes through BinaryScanOperator::filter_flakes_by_policy (the same
  filter scans use). The synchronous readers are fail-closed under a
  non-root policy (return empty + warn) as a safety net.
- metadata_resolve.rs: a per-row async resolver that, under an active
  policy, evaluates metadata calls / Member / list-comprehension / reduce
  / list-predicate through the filtered path and substitutes the computed
  value as Expression::Resolved, so the later sync evaluator never reaches
  a raw read. Comprehensions/reduce are resolved whole (their loop-local
  scope only exists during iteration).
- bind.rs / filter.rs: BindOperator and FilterOperator route through the
  resolver when the expression contains a metadata read and a policy is
  active; the no-policy fast path is unchanged.
- where_plan.rs: do not fuse a metadata-read bind or filter into the
  synchronous inline-operator path (apply_inline can't await), mirroring
  the existing EXISTS exclusion; they become deferred Bind/Filter
  operators that resolve through the async path.

Outer-var `n.prop` and pattern-comprehension projections already lower to
auxiliary scan joins, so they were already policy-correct and are
untouched. Adds it_policy_cypher covering properties/keys/WHERE/list
comprehension, each with a no-policy positive control proving the feature
works generally, not merely fail-closed.
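
The pre-resolution idea can be sketched roughly as follows. This is a synchronous toy model with hypothetical types (`Flake`, `Policy`, `Expr`) and only a `keys()`-style read; the real resolver is async and goes through BinaryScanOperator::filter_flakes_by_policy, which is not reproduced here.

```rust
use std::collections::HashSet;

// Hypothetical, simplified flake: just a subject and a predicate.
struct Flake {
    subject: &'static str,
    predicate: &'static str,
}

// Hypothetical policy: root sees everything, otherwise some predicates are hidden.
enum Policy {
    Root,
    Hide(HashSet<&'static str>),
}

#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Keys(&'static str),          // metadata read, e.g. keys(n)
    Resolved(Vec<&'static str>), // value substituted by pre-resolution
}

// Raw read: every flake for the subject, with no policy applied.
fn raw_read<'a>(flakes: &'a [Flake], subject: &str) -> Vec<&'a Flake> {
    flakes.iter().filter(|f| f.subject == subject).collect()
}

// Pure flake -> binding reduction, shared by the raw and filtered paths.
fn reduce_keys(flakes: &[&Flake]) -> Vec<&'static str> {
    flakes.iter().map(|f| f.predicate).collect()
}

fn filter_by_policy<'a>(flakes: Vec<&'a Flake>, policy: &Policy) -> Vec<&'a Flake> {
    match policy {
        Policy::Root => flakes,
        Policy::Hide(hidden) => flakes
            .into_iter()
            .filter(|f| !hidden.contains(f.predicate))
            .collect(),
    }
}

// Pre-resolution: evaluate the metadata read through the filter and
// substitute the result, so later evaluation never performs a raw read.
fn pre_resolve(expr: Expr, flakes: &[Flake], policy: &Policy) -> Expr {
    match expr {
        Expr::Keys(subject) => {
            let visible = filter_by_policy(raw_read(flakes, subject), policy);
            Expr::Resolved(reduce_keys(&visible))
        }
        other => other,
    }
}

// Synchronous evaluator: fail-closed if an unresolved read meets a non-root policy.
fn eval_sync(expr: &Expr, flakes: &[Flake], policy: &Policy) -> Vec<&'static str> {
    match (expr, policy) {
        (Expr::Resolved(v), _) => v.clone(),
        (Expr::Keys(subject), Policy::Root) => reduce_keys(&raw_read(flakes, subject)),
        (Expr::Keys(_), Policy::Hide(_)) => Vec::new(),
    }
}
```

Splitting the raw read from the reduction is what lets the filtered and unfiltered paths share one reduction, with the policy filter slotted in between.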
- expr_touches_list (the ORDER-BY-over-collect() guard) fell through to
  `false` for Index/Case/List/Map/comprehension/reduce, so e.g.
  `ORDER BY vs[0]` or `ORDER BY CASE … vs … END` over a collected list
  bypassed the guard and could reach the sort comparator on an unorderable
  value. Enumerate all compound forms (deny-by-default).
- UNWIND row.field desugar (collect/rewrite/replace alias helpers) skipped
  Case and Exists, so `row.field` inside a CASE/EXISTS lowered to null —
  a silent wrong write. Recurse into both, matching subst_expr.
- Unterminated `/* …` block comment leaked its final byte as a token
  instead of erroring; add LexError::UnterminatedComment and fail cleanly.
- Bounded path_exists (`(a)-[:T*1..k]->(b)`) enumerated the entire bounded
  frontier then tested membership, so a near target behind a high-fan-out
  anchor did full-closure work and could spuriously hit max_visited →
  ResourceLimit. Thread the target into traverse_bounded and return on
  first in-range reach (the unbounded form already short-circuits).
- DETACH DELETE lowered the outbound and inbound edge scans as two
  independent OPTIONALs sharing only the bound node, so the WHERE solver
  cross-joined them into O×I transient rows for a hub node. Emit one
  OPTIONAL over a UNION of both directions: each row binds one direction,
  the other delete template skips, giving O+I rows. The OPTIONAL still
  preserves node-only deletion when the node has no edges.
- SET with multiple property items pushed an independent OPTIONAL old-value
  lookup per item, cross-joining into k_a×k_b×k_c rows over multi-valued
  predicates. Collect the lookups and emit one OPTIONAL over a UNION of the
  per-predicate branches (Σkᵢ rows). Final flakes are unchanged.
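
The bounded path_exists change can be sketched as a layered BFS that returns on the first in-range reach of the target. This is a simplified model under stated assumptions: the function name, the adjacency-map graph, and the `max_visited` accounting are hypothetical, and it uses walk semantics rather than the relationship-unique trails the engine enumerates.

```rust
use std::collections::{HashMap, HashSet};

// Returns Ok(true) as soon as `target` is reached at a depth within
// [min_hops, max_hops]; Err models the ResourceLimit case.
fn path_exists_bounded(
    adj: &HashMap<u32, Vec<u32>>,
    start: u32,
    target: u32,
    min_hops: u32,
    max_hops: u32,
    max_visited: usize,
) -> Result<bool, String> {
    let mut frontier: HashSet<u32> = HashSet::from([start]);
    let mut expanded = 0usize;
    for depth in 1..=max_hops {
        let mut next = HashSet::new();
        for node in &frontier {
            if let Some(nbrs) = adj.get(node) {
                for &nbr in nbrs {
                    // Early exit: no need to finish enumerating the frontier.
                    if nbr == target && depth >= min_hops {
                        return Ok(true);
                    }
                    expanded += 1;
                    if expanded > max_visited {
                        return Err("ResourceLimit".to_string());
                    }
                    next.insert(nbr);
                }
            }
        }
        if next.is_empty() {
            return Ok(false);
        }
        frontier = next;
    }
    Ok(false)
}
```

With a hub whose first neighbor is the target, the search answers after a single edge instead of expanding the whole bounded frontier and tripping the visit limit.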
shortestPath/allShortestPaths built a per-hop edges Vec (a second
allocation plus a predicate Arc-bump and two Sid Arc-bumps per hop) for
every emitted path, but only Cypher's relationships(p) reads it. The
operator is shared with the JSON-LD/FQL query surface, which has no
relationships() function, so those queries paid the cost for nothing.

Add ShortestPathPattern::needs_relationships — false for the JSON-LD/FQL
lowering, conservatively true for Cypher (relationships() usage isn't
visible at pattern-lowering time). The operator builds edges only when
set; edges remain derivable from nodes + the single predicate/direction.
Fixes two unauthenticated denial-of-service vectors in the openCypher
front-end, both reachable from a single small request on the HTTP
query/update routes (the parser runs synchronously on the request
thread, so either one aborts the whole server process):

- Unbounded recursion. The recursive-descent parser had no nesting
  limit, so deeply-nested input (parens, NOT/unary chains, or nested
  CALL subqueries) overflowed the stack -> SIGABRT. TokenStream now
  tracks recursion depth and errors past MAX_PARSE_DEPTH. Guards sit at
  the expression re-entry point (parse_or), the self-recursive unary
  layers (parse_not, parse_unary), parse_statement, and
  parse_call_subquery -- the CALL cycle recurses
  parse_call_subquery <-> parse_call_body and never re-enters
  parse_statement, so it needs its own guard. Bounding the parser depth
  also bounds every downstream AST walker.

- Exponential XOR. parse_xor desugared `a XOR b` into
  `(a OR b) AND NOT(a AND b)`, cloning the left operand twice per
  operator -> O(2^n) AST; ~60 XOR terms exhausted memory. XOR is now a
  first-class operator: BinOp::Xor in the AST and Function::Xor in the
  shared IR, evaluated as a two-valued boolean fold that reproduces the
  old truthiness semantics exactly, with no duplication.

Adds regression tests covering deep-nesting rejection and a 2000-term
XOR chain.
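
Both guards can be sketched in a few lines. The parenthesis-only grammar, the `ParseError` type, and the limit value of 128 are assumptions for illustration (the source names MAX_PARSE_DEPTH but not its value); the XOR fold shows the linear-time evaluation of a left-associative chain.

```rust
// Assumed limit; the actual value of MAX_PARSE_DEPTH is not stated in the source.
const MAX_PARSE_DEPTH: usize = 128;

#[derive(Debug, PartialEq)]
enum ParseError {
    TooDeep,
    Unbalanced,
}

// Recursive descent over nested parentheses with an explicit depth budget,
// so hostile input errors out instead of overflowing the stack.
fn parse_parens(input: &[u8], pos: &mut usize, depth: usize) -> Result<(), ParseError> {
    if depth > MAX_PARSE_DEPTH {
        return Err(ParseError::TooDeep);
    }
    while *pos < input.len() && input[*pos] == b'(' {
        *pos += 1;
        parse_parens(input, pos, depth + 1)?;
        if input.get(*pos) != Some(&b')') {
            return Err(ParseError::Unbalanced);
        }
        *pos += 1;
    }
    Ok(())
}

fn check_nesting(src: &str) -> Result<(), ParseError> {
    let bytes = src.as_bytes();
    let mut pos = 0;
    parse_parens(bytes, &mut pos, 0)?;
    if pos == bytes.len() {
        Ok(())
    } else {
        Err(ParseError::Unbalanced)
    }
}

// XOR as a first-class fold: one pass, no operand duplication.
fn eval_xor_chain(terms: &[bool]) -> bool {
    terms.iter().fold(false, |acc, &t| acc ^ t)
}
```

The fold visits each operand once, so a 2000-term chain costs 2000 steps, whereas the desugared form doubled the tree at every operator.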
is_cypher_query() called to_ascii_lowercase(), heap-allocating a String
on every query and transact request -- the Cypher check runs before the
JSON-LD/SPARQL dispatch, so the standard RDF surface paid the cost on its
hot path. Replace it with a borrowed, non-allocating case-insensitive
substring scan that matches the same two media types.
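
One way to write such a scan, as a sketch: the helper name is hypothetical, and `application/cypher` is used in the usage note only because it is the media type mentioned elsewhere in this PR.

```rust
// Borrowed, non-allocating, ASCII case-insensitive substring check.
fn contains_ignore_ascii_case(haystack: &str, needle: &str) -> bool {
    let (h, n) = (haystack.as_bytes(), needle.as_bytes());
    if n.is_empty() {
        return true; // `windows(0)` would panic, and "" is contained in everything
    }
    // Compare every needle-sized window without lowercasing a copy.
    h.windows(n.len()).any(|w| w.eq_ignore_ascii_case(n))
}
```

For example, `contains_ignore_ascii_case("Application/Cypher; charset=utf-8", "application/cypher")` is true, and no `String` is created along the way.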
Cypher writes were staged directly via the cached handle instead of going
through transact_via_consensus like every other write surface, with two
consequences:

- No idempotency. A retried submission (client timeout + retry) was not
  deduplicated and committed twice; the Idempotency-Key header was never
  read.
- Pre-lock TOCTOU. A conditional `MERGE ... ON MATCH/ON CREATE` chose its
  branch by probing the cached pre-lock snapshot, so a concurrent writer
  could create the node between the probe and the commit, producing a
  duplicate that MERGE's uniqueness contract forbids.

Add a TransactionBody::Cypher variant carrying the raw statement plus its
bound parameters. The monolithic committer lowers it to a Txn inside the
stage+commit retry loop, under the ledger write lock -- resolving a
conditional plan with a policy-wrapped probe and re-resolving on each
retry -- so the branch choice is consistent with the committed head and
retries are deduplicated by Idempotency-Key. The route now mirrors the
SPARQL UPDATE path. Adds an HTTP idempotency regression test.
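
A toy model of the two properties, assuming a single in-process lock: the `Ledger`, `State`, and `Branch` types are hypothetical, and the real path goes through consensus and a stage+commit retry loop that is not modeled here.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Branch {
    OnCreate,
    OnMatch,
}

#[derive(Default)]
struct State {
    t: u64,
    nodes: HashSet<String>,
    receipts: HashMap<String, (u64, Branch)>, // keyed by Idempotency-Key
}

#[derive(Default)]
struct Ledger {
    state: Mutex<State>,
}

impl Ledger {
    // MERGE-like write: dedupe check, branch probe, and commit all happen
    // while holding the write lock, so no writer can slip in between.
    fn merge(&self, idempotency_key: Option<&str>, node: &str) -> (u64, Branch) {
        let mut st = self.state.lock().unwrap();
        if let Some(key) = idempotency_key {
            if let Some(&prior) = st.receipts.get(key) {
                return prior; // retried submission: return the original receipt
            }
        }
        // Probe against the locked head, not a cached pre-lock snapshot.
        let branch = if st.nodes.insert(node.to_string()) {
            Branch::OnCreate
        } else {
            Branch::OnMatch
        };
        st.t += 1;
        let receipt = (st.t, branch);
        if let Some(key) = idempotency_key {
            st.receipts.insert(key.to_string(), receipt);
        }
        receipt
    }
}
```

A retry carrying the same key gets the first receipt back without a second commit, and a second writer merging the same node under a different key takes the match branch rather than creating a duplicate.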
The both-vars-unbound bounded closure ran a layered BFS per start node
with no cancellation check -- unlike the unbounded branch, which polls
per dequeue. Over a dense graph this loop runs nodes × edges × depth work
uninterruptibly, so a query could not be cancelled mid-closure. Poll
cancellation at the top of the per-start loop and the bounded BFS step.
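
A sketch of the two polling points, using an `AtomicBool` as a stand-in for the engine's cancellation signal. The function name, graph representation, and error type are hypothetical.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::atomic::{AtomicBool, Ordering};

// Bounded closure from every start node, checking for cancellation both
// per start node and per BFS layer so a long run can be interrupted.
fn bounded_closure(
    adj: &HashMap<u32, Vec<u32>>,
    starts: &[u32],
    max_hops: u32,
    cancelled: &AtomicBool,
) -> Result<Vec<(u32, u32)>, &'static str> {
    let mut pairs = Vec::new();
    for &start in starts {
        if cancelled.load(Ordering::Relaxed) {
            return Err("cancelled"); // poll at the top of the per-start loop
        }
        let mut frontier = vec![start];
        let mut seen = HashSet::new();
        for _ in 0..max_hops {
            if cancelled.load(Ordering::Relaxed) {
                return Err("cancelled"); // poll at each bounded BFS step
            }
            let mut next = Vec::new();
            for node in &frontier {
                if let Some(nbrs) = adj.get(node) {
                    for &nbr in nbrs {
                        // Record each reachable node once per start.
                        if seen.insert(nbr) {
                            pairs.push((start, nbr));
                            next.push(nbr);
                        }
                    }
                }
            }
            if next.is_empty() {
                break;
            }
            frontier = next;
        }
    }
    Ok(pairs)
}
```

Polling once per layer rather than once per edge keeps the check cheap while still bounding how long a cancelled query keeps running.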
The GROUP BY key and the defensive sort order for the Cypher-only
Binding::Path / Rel / Map variants disagreed with their PartialEq/Hash
identity, a latent inconsistency where GROUP BY and DISTINCT (or a sort)
could classify the same values differently:

- Path group key keyed on nodes only, but Eq/Hash key on nodes + edges.
  Two paths over the same node sequence reached via different parallel
  edges are distinct paths (Cypher WITH path, collect over
  allShortestPaths); GROUP BY merged them while DISTINCT kept them apart.
  The key now includes edges. The RDF/JSON-LD surface never populates
  edges, so this is a no-op there.
- compare_bindings ordered Path by nodes only, Rel by the full
  (reifier, start, predicate, end) tuple, and Map by insertion order —
  none matching Eq. Align all three so equal values compare Equal: Path
  compares edges after nodes, Rel uses reifier-only identity when both
  are reified, and Map compares entries in key order.

These variants are not produced on the SPARQL/JSON-LD surfaces, so there
is no standard-RDF behavior change; this is internal-consistency
hardening. Adds a group-key regression test.
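
The Path mismatch can be reproduced in miniature. The `Path` struct and helper names below are hypothetical, with plain integer ids standing in for nodes and edges.

```rust
use std::collections::HashSet;
use std::hash::Hash;

// Equality and hashing cover nodes and edges, as described above.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Path {
    nodes: Vec<u32>,
    edges: Vec<u32>,
}

// Old behavior (sketch): the group key ignores edges.
fn group_key_nodes_only(p: &Path) -> Vec<u32> {
    p.nodes.clone()
}

// Fixed behavior (sketch): the group key matches Eq/Hash identity.
fn group_key(p: &Path) -> (Vec<u32>, Vec<u32>) {
    (p.nodes.clone(), p.edges.clone())
}

fn count_groups<K: Eq + Hash>(paths: &[Path], key: impl Fn(&Path) -> K) -> usize {
    paths.iter().map(key).collect::<HashSet<K>>().len()
}

fn count_distinct(paths: &[Path]) -> usize {
    paths.iter().collect::<HashSet<&Path>>().len()
}
```

For two paths over the same nodes via different parallel edges, `count_distinct` reports 2 while the nodes-only key reports 1 group; the edge-aware key brings the two back into agreement.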
…ONAL

REMOVE n.a, n.b, … pushed one independent OPTIONAL { ?n p ?old } per
property, so on multi-valued predicates the staged WHERE cross-joined to
Πkᵢ transient rows -- the same hub blow-up the SET multi-property fix
already addressed. Route REMOVE's property items through the existing
push_unioned_old_values helper so they share a single
OPTIONAL { … UNION … } (Σkᵢ rows). The now-redundant per-property
push_optional_old_value helper is removed.
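
The row-count difference is simple arithmetic. The sketch below uses hypothetical helper names and assumes the usual OPTIONAL behavior that an empty match still yields one row.

```rust
// Independent OPTIONALs multiply: each contributes max(k_i, 1) rows.
fn cross_joined_rows(ks: &[u64]) -> u64 {
    ks.iter().map(|&k| k.max(1)).product()
}

// One OPTIONAL over a UNION adds: the sum of k_i, or one row if all are empty.
fn unioned_rows(ks: &[u64]) -> u64 {
    ks.iter().sum::<u64>().max(1)
}
```

For three properties holding 3, 4, and 5 old values, the cross join stages 60 transient rows and the unioned form stages 12.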
const_usize clamped a negative literal SKIP/LIMIT with (*n).max(0),
silently turning LIMIT -1 / SKIP -5 into 0. openCypher errors on a
negative bound; clamping is a surprising divergence a driver would not
expect. Reject it with a clear message.
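
A sketch of the checked conversion: the signature and error text here are assumptions, and only the const_usize name comes from the commit.

```rust
// Rejects negative SKIP/LIMIT literals instead of clamping them to zero.
fn const_usize(clause: &str, n: i64) -> Result<usize, String> {
    usize::try_from(n)
        .map_err(|_| format!("{clause} must be a non-negative integer, got {n}"))
}
```

Under the old clamp, `(-1i64).max(0)` quietly became 0; here `const_usize("LIMIT", -1)` returns an error naming the clause and the offending value.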
The lookup_node_ref doc block still described the old buggy approach --
claiming anonymous nodes "break" and that patterns with an anonymous node
in a relationship are rejected. The implementation keys anonymous nodes
on their source span (?#__anon_{start}_{end}) so every appearance shares
one VarId, and the case works (anonymous_relationship_lowers_to_plain_triple).
Rewrite the comment to describe the actual behavior.
The CLI classified any valid JSON body as JSON-LD before checking for
Cypher, so a `{"cypher": "...", "params": {...}}` envelope (the form the
server accepts under Content-Type: application/cypher) was routed to the
JSON-LD pipeline and failed — forcing an explicit --format cypher /
--cypher. Both the query and update sniffers now recognize a JSON object
with a string `cypher` field as Cypher, via a shared
looks_like_cypher_envelope helper, before the generic JSON fallthrough.
The request span's input_format was only ever "sparql" or "json-ld", so
Cypher requests were traced as "json-ld". Tag them "cypher" (mirroring the
route's own SPARQL/Cypher/JSON-LD dispatch order). Observability only.
Adds cypher_write_under_employee_bearer_denied, the write-side analogue of
sparql_update_under_employee_bearer_denied: a restricted employee identity
issuing a Cypher SET on a policy-protected predicate is rejected with the
policy's exMessage. Locks in f:modify enforcement on the Cypher update
route (which now commits through consensus with the same PolicyContext).
Replace the redundant `.and_then(|t| t.as_i64())` with
`.and_then(serde_json::Value::as_i64)` to satisfy the workspace
clippy::redundant-closure-for-method-calls lint (denied in CI).

2 participants