chore: consolidated feature branch for ENVITED-X pipeline#14

Draft
jdsika wants to merge 12 commits into main from feat/envited-x-pipeline

jdsika commented May 7, 2026

Purpose

This PR exists solely to observe CI status for the combined feature branch used by the ENVITED-X asset pipeline. It is NOT intended to be merged.

Contents

14 commits stacked on upstream main (1c5f68e4, includes #3447):

| # | PR | Description |
|---|----|-------------|
| 1 | linkml/linkml#3309 | fix(owlgen): warn on covering axiom edge cases |
| 2 | linkml/linkml#3446 | feat(generators): --normalize-prefixes |
| 3 | linkml/linkml#3449 | feat(generators): --default-language |
| 4 | linkml/linkml#3450 | feat(gen-shacl): --message-template + {comments} (depends on #3) |
| 5 | linkml/linkml#3451 | feat(gen-shacl): sh:sparql from rules |
| 6 | | feat(gen-shacl): exclusive-value SPARQL rule pattern (extends #5) |
| 7 | linkml/linkml#3473 | fix(shaclgen): minCount/maxCount 0 for zero cardinality |
| 8 | linkml/linkml#3485 | fix(shaclgen): sh:pattern in any_of |
| 9–13 | ASCS-eV/linkml#1 | feat(generators): --deterministic (RDFC-1.0 + WL hashing, 5 commits) |
| 14 | | fix(shaclgen): apply default_language to SPARQL constraint messages |

Changes from previous version

  • Rebuilt from scratch on current main — all merge commits removed, clean cherry-pick stack
  • Fixed import ordering: the from linkml.utils.rdf_canonicalize import is now sorted with the linkml.* group (ruff isort)
  • Fixed trailing newlines — stripped double \n from generated tutorial .ttl files (end-of-file-fixer)
  • Fixed owlgen formatting — collapsed multi-line self._present(...) to single line (ruff format)
  • Bundled rdf_canonicalize.py into linkml.utils — avoids PyPI linkml-runtime missing the module when installed via git
  • Added commit 14 — applies --default-language to SPARQL constraint sh:message literals
  • Deterministic output now 5 commits (was 2) — includes trailing-dot CURIE fix, test/fixture updates, pre-commit pass

Known CI issues

  • Docker build skipped due to prefixmaps git pin (no git in container) — expected until prefixmaps#82 releases v0.2.8

TODO

  • Fix trailing newline in deterministic_turtle() / canonicalize_rdf_graph(): rdflib.Graph.serialize(format="turtle") produces \n\n; currently patched by stripping in committed files

DO NOT MERGE

This branch will be force-pushed when upstream PRs are updated. It will be removed once all upstream PRs are merged.

@jdsika jdsika force-pushed the feat/envited-x-pipeline branch 9 times, most recently from a705c35 to 3d3a52a Compare May 12, 2026 16:35
@jdsika jdsika force-pushed the feat/envited-x-pipeline branch 2 times, most recently from 8a9a3ad to e21079c Compare June 9, 2026 16:28
jdsika added 12 commits June 10, 2026 21:34
Emit warnings for abstract class covering axiom edge cases:

- Zero children: warn that no covering axiom will be generated
- One child: warn that the covering axiom degenerates to an equivalence
  (Parent = Child), recommending --skip-abstract-class-as-unionof-subclasses

Both axioms are still emitted when applicable (semantically correct per
OWL 2), but warnings alert users who extend the ontology downstream.

Tests verify warnings are logged, flag suppression works, the
single-child covering axiom triple is correctly asserted, plus
negative tests for multi-child and concrete class cases, and the
mixin-only children edge case.

Refs: linkml#3309, linkml#3219
Signed-off-by: jdsika <carlo.van-driesten@bmw.de>
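
The two warning cases described above can be sketched in plain Python. This is only an illustration: `covering_axiom_warnings` and its message wording are hypothetical and are not taken from owlgen.

```python
def covering_axiom_warnings(parent: str, children: list[str]) -> list[str]:
    """Return warning messages for an abstract class and its direct subclasses.

    Hypothetical helper: the real generator logs warnings instead of
    returning them, and its wording may differ.
    """
    if len(children) == 0:
        # Nothing to union over, so no covering axiom can be generated.
        return [f"{parent}: abstract class has no subclasses; "
                "no covering axiom will be generated"]
    if len(children) == 1:
        # A union of one class collapses to Parent = Child.
        return [f"{parent}: covering axiom degenerates to an equivalence "
                f"with {children[0]}; consider "
                "--skip-abstract-class-as-unionof-subclasses"]
    # Two or more children: an ordinary covering axiom, nothing to warn about.
    return []
```
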
… names

Add an opt-in --normalize-prefixes flag to OWL, SHACL, and JSON-LD
Context generators that normalises non-standard prefix aliases to
well-known names from a static prefix map (derived from rdflib 7.x
defaults, cross-checked against prefix.cc consensus).

Key design decisions:
- Static frozen map (MappingProxyType) instead of runtime
  Graph().namespaces() lookup eliminates rdflib version dependency
- Both http://schema.org/ and https://schema.org/ map to 'schema'
- Shared normalize_graph_prefixes() helper used by OWL and SHACL
- Two-phase graph normalisation: Phase 1 normalises schema-declared
  prefixes, Phase 2 cleans up runtime-injected bindings
- Collision detection: skip with warning when standard prefix name
  is already user-declared for a different namespace
- Phase 2 guard prevents overwriting HTTPS bindings with HTTP variants

The flag defaults to off, preserving existing behaviour.

Tests cover OWL, SHACL, and context generators with sdo->schema,
dce->dc, http/https edge case, custom prefix preservation, flag-off
backward compatibility, cross-generator consistency, prefix collision
detection, schema1 regression prevention, Phase 2 HTTPS guard, empty
schema edge case, and static map integrity.

Signed-off-by: jdsika <carlo.van-driesten@bmw.de>
Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
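
A minimal sketch of the static-map idea with collision detection, using only the standard library. The map below is a hypothetical three-entry excerpt, and `normalize_prefixes` is an illustrative stand-in operating on a plain dict rather than the actual `normalize_graph_prefixes()` helper.

```python
import logging
from types import MappingProxyType

logger = logging.getLogger(__name__)

# Hypothetical excerpt of a read-only namespace -> well-known prefix map.
WELL_KNOWN_PREFIXES = MappingProxyType({
    "http://schema.org/": "schema",
    "https://schema.org/": "schema",
    "http://purl.org/dc/elements/1.1/": "dc",
})


def normalize_prefixes(declared: dict[str, str]) -> dict[str, str]:
    """Rename non-standard aliases; skip any rename that would collide."""
    result = dict(declared)
    for alias, namespace in declared.items():
        standard = WELL_KNOWN_PREFIXES.get(namespace)
        if standard is None or standard == alias:
            continue  # custom namespace, or alias is already the standard one
        if standard in result and result[standard] != namespace:
            # The standard name is taken by a different namespace: keep the alias.
            logger.warning("prefix %r is already bound to %s; keeping %r",
                           standard, result[standard], alias)
            continue
        del result[alias]
        result[standard] = namespace
    return result
```

With this sketch, `{"sdo": "http://schema.org/"}` becomes `{"schema": "http://schema.org/"}`, while a schema that already binds `schema` to some other namespace keeps its `sdo` alias and logs a warning.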
Python truthiness check `if s.maximum_cardinality:` evaluates to False
when the value is 0 (an integer), silently skipping sh:maxCount 0 emission.
The same bug affected minimum_cardinality and exact_cardinality.

Replace all three truthiness checks with explicit `is not None` guards:
- `if s.minimum_cardinality is not None:`
- `if s.maximum_cardinality is not None:`
- `elif s.exact_cardinality is not None:` (two occurrences)

Add regression tests:
- test_zero_maximum_cardinality_emits_maxcount
- test_zero_exact_cardinality_emits_both_counts

This is the primary mechanism for suppressing inherited slots on subclasses
via slot_usage (OWL maxCardinality 0 pattern).

Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
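
The difference between the two guards shows up in a few lines of plain Python. This is a minimal sketch: `emit_counts` and the dict-based slot are hypothetical stand-ins, not the generator's actual code or control flow.

```python
def emit_counts(slot: dict) -> dict:
    """Collect SHACL count constraints for a slot (hypothetical helper)."""
    shape = {}
    minimum = slot.get("minimum_cardinality")
    maximum = slot.get("maximum_cardinality")
    exact = slot.get("exact_cardinality")
    # `is not None` keeps a legitimate 0; a bare `if maximum:` would drop it
    # because bool(0) is False.
    if exact is not None:
        shape["sh:minCount"] = exact
        shape["sh:maxCount"] = exact
    else:
        if minimum is not None:
            shape["sh:minCount"] = minimum
        if maximum is not None:
            shape["sh:maxCount"] = maximum
    return shape


print(emit_counts({"maximum_cardinality": 0}))  # {'sh:maxCount': 0}
```
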
The SHACL generator translated any_of branches by dispatching
solely on `any.range` (class, type, enum, or simple datatype).
If a branch specified `pattern:` — either alone or combined
with a range — the constraint was silently dropped, producing
an empty blank node `[ ]` (trivially satisfied) instead of the
intended `[ sh:pattern "..." ]`.

This is a problem for schemas that use pattern alternatives in
`any_of`, such as the SPDX license field where valid values are
either members of a fixed enum (SPDX identifiers), IRIs, or
custom identifiers matching the LicenseRef- pattern defined in
SPDX Specification v2.3 Annex D (ABNF: license-ref =
["DocumentRef-"(idstring)":"]"LicenseRef-"(idstring)).

The fix adds a single check after the range dispatch:

    if any.pattern:
        g.add((range_list[-1], SH.pattern, Literal(any.pattern)))

This correctly handles:
- Pattern-only branches (no range): node gets only sh:pattern
- Range + pattern branches: node gets both sh:datatype and sh:pattern
- Range-only branches (no pattern): unchanged behaviour

The test suite now includes a dedicated schema exercising all
three cases, with assertions on both the generated RDF triples
and pyshacl validation of conforming/non-conforming data.

Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
…erals

Add a `--default-language` CLI option to both gen-owl and gen-shacl that
emits BCP 47 language-tagged string literals for human-readable annotations.

gen-owl changes:
- New `default_language` field on OwlSchemaGenerator
- `_LANGUAGE_TAGGABLE_RANGES` frozenset (string, ncname) guards tagging
- `_resolve_language()` checks element-level in_language first, then default
- `_literal()` helper creates properly tagged Literal objects
- `add_metadata()` tags string-range and fallback-range literals
- `add_enum()` PV labels respect language tags
- New `--default-language` Click option

gen-shacl changes:
- New `default_language` field on ShaclGenerator
- NodeShape rdfs:label / rdfs:comment get language tags
- PropertyShape sh:name / sh:description get language tags via prop_pv_text()
- Numeric literals (sh:order, sh:minCount, etc.) are never tagged
- New `--default-language` Click option

Tests:
- 3 new OWL tests: tagged labels, backward-compat plain literals, URI ranges
- 4 new SHACL tests: NodeShape, PropertyShape, plain literals, numeric guard

Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
Resolve all five line-level review comments from amc-corey-cox on the
--default-language feature.

Code fixes
- Drop the unconditional language-tag emission from the catch-all branch
  of OwlSchemaGenerator.add_metadata. This branch fires for ranges that
  are neither types, subsets, nor classes -- in practice enum-ranged
  metaslots such as pv_formula (range pv_formula_options) on a
  PermissibleValue / EnumDefinition, obligation_level on a
  SlotDefinition, or alias_predicate on a StructuredAlias. Tagging these
  permissible-value identifiers shifts the datatype from xsd:string to
  rdf:langString and breaks downstream sh:in / owl:oneOf matching.
  (The original review comment cited status="testing" as the example;
  status has range uriorcurie in the metamodel and takes the URIRef
  branch -- the illustrative concern was correct, the metaslot named
  was not.)
- Extract the duplicated BCP 47 regex and resolution policy from
  owlgen.py and shaclgen.py into a new shared module
  linkml.utils.language_tags. The module exposes BCP47_RE (RFC 5646
  section 2.1 ABNF), is_well_formed_bcp47() (well-formedness per
  section 2.2.9), and a LanguageTagResolver class.
- LanguageTagResolver validates the default tag once at construction
  and remembers per-element in_language tags it has already warned
  about, collapsing "hundreds of warnings per run" to one per distinct
  malformed tag. The missing-ClassVar observation on shaclgen is moot
  with the inline regex removed entirely.
- Assign self._language_resolver before super().__post_init__() in
  both generators so any parent-class hook can safely call
  _resolve_language during initialisation.

Test changes
- Rewrite test_default_language_does_not_tag_uri_range_metaslots with a
  strong negative assertion: walk every triple in the generated graph
  and require that any language-tagged literal sits under a predicate
  in a fixed allowlist (rdfs:label, rdfs:comment, skos:definition,
  skos:prefLabel, skos:altLabel, skos:editorialNote, skos:note,
  skos:example, dcterms:title, dcterms:description). Also assert
  bibo:status (uriorcurie range) emits a URIRef.
- Add test_default_language_does_not_tag_enum_ranged_metaslot_in_catchall_branch:
  monkey-patches pv_formula's slot URI to a non-linkml: value so the
  catch-all else branch actually fires, then asserts the emitted
  permissible-value identifier carries no language tag.
- Add test_default_language_bcp47_warning_is_deduplicated to both
  test_owlgen.py and test_shaclgen.py: stamp the same malformed tag on
  multiple elements and assert exactly one warning per distinct tag.

Standards references
- RFC 5646 section 2.1 (Syntax / ABNF) and section 2.2.9 (Classes of
  Conformance): https://www.rfc-editor.org/rfc/rfc5646
- RDF 1.1 Concepts section 3.3 (Literals -- language-tagged strings):
  https://www.w3.org/TR/rdf11-concepts/
- SHACL section 2.3.2.1 (sh:name / sh:description) -- the predicates
  this feature stamps with language tags.

Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
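
The resolution policy and warning de-duplication could look roughly like the sketch below. It makes several assumptions: the class name `DedupLanguageResolver` is hypothetical, the regex is a deliberately simplified stand-in for the RFC 5646 grammar, and both raising on a malformed default and falling back to the default for a malformed element tag are guesses at the behaviour, not confirmed details of `LanguageTagResolver`.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Simplified pattern (language, optional script, optional region); the full
# RFC 5646 grammar has more subtag kinds than this sketch covers.
SIMPLE_TAG_RE = re.compile(
    r"^[A-Za-z]{2,3}(-[A-Za-z]{4})?(-([A-Za-z]{2}|[0-9]{3}))?$"
)


class DedupLanguageResolver:
    """Element-level tag wins over the default; warn once per malformed tag."""

    def __init__(self, default_language=None):
        # Validate the default exactly once, at construction time (assumption).
        if default_language is not None and not SIMPLE_TAG_RE.match(default_language):
            raise ValueError(f"malformed default language tag: {default_language!r}")
        self.default_language = default_language
        self._warned = set()

    def resolve(self, in_language=None):
        if in_language is None:
            return self.default_language
        if SIMPLE_TAG_RE.match(in_language):
            return in_language
        if in_language not in self._warned:
            # Remember the tag so later elements with the same tag stay quiet.
            self._warned.add(in_language)
            logger.warning("ignoring malformed language tag %r", in_language)
        return self.default_language
```
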
…apes

Add a new --message-template option that attaches sh:message literals to
each property shape using a user-defined template string.

Supported placeholders:
  {name}        — slot name (underscore-separated)
  {title}       — slot title (human-readable), falls back to name
  {description} — slot description, falls back to empty string
  {comments}    — slot comments joined with "; ", falls back to empty string
  {class}       — enclosing class name
  {path}        — property IRI (compact or full)

The resulting message is stripped of leading/trailing whitespace and
omitted entirely when empty (avoids blank sh:message literals).

When --default-language is also set, the literal is language-tagged.

Example:
  gen-shacl --message-template "{name} ({class}): {description} [{comments}]"

Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
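
The placeholder substitution and fallbacks listed above can be approximated with `str.format_map`. This is a sketch under assumptions: `render_message` and the dict-shaped slot are hypothetical, and the real generator may build the values differently.

```python
def render_message(template: str, slot: dict, class_name: str, path: str):
    """Fill a message template; return None when the result is empty."""
    values = {
        "name": slot["name"],
        "title": slot.get("title") or slot["name"],       # falls back to name
        "description": slot.get("description") or "",     # falls back to ""
        "comments": "; ".join(slot.get("comments", [])),  # joined with "; "
        "class": class_name,
        "path": path,
    }
    # format_map takes the dict directly, so the {class} placeholder needs
    # no special handling even though "class" is a Python keyword.
    message = template.format_map(values).strip()
    return message or None
```

Returning `None` for an empty result lets the caller skip the `sh:message` triple entirely instead of emitting a blank literal.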
Implement SHACL-SPARQL constraint generation for the boolean-guard
pattern commonly used in conditional validation rules. When a LinkML
class has rules: blocks with preconditions (value_presence: PRESENT)
and postconditions (equals_string: true), the generator now emits
sh:SPARQLConstraint nodes on the corresponding sh:NodeShape.

Features:
- New _add_rules() method translates recognised rule patterns to SPARQL
- Boolean-guard pattern: if value present then flag must be true
- Rule description mapped to sh:message on the constraint
- Deactivated rules are skipped
- Warnings emitted for bidirectional/open_world rule flags
- New --emit-rules/--no-emit-rules CLI flag (default: enabled)
- Full URI references in SPARQL (no PREFIX declarations needed)

The generated SPARQL follows W3C SHACL Section 5 and uses the pre-bound
$this variable per Section 5.3.1. Constraints are validated by pyshacl
with advanced=True.

Refs: linkml#2464
Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
Add a --deterministic / --no-deterministic CLI flag (default off) to OWL,
SHACL, JSON-LD Context, and JSON-LD generators that produces diff-stable
output using Weisfeiler-Lehman structural hashing on top of the RDFC-1.0
canonicalization from upstream (linkml#3407).

Three-phase hybrid pipeline (when --deterministic is set):
1. RDFC-1.0 canonicalization (upstream) produces sequential _:c14nN IDs
2. Weisfeiler-Lehman structural hashing replaces sequential IDs with
   content-based _:b<sha256> hashes that remain stable when unrelated
   triples are added/removed
3. rdflib re-serialization recovers idiomatic Turtle (inline blank
   nodes, collection syntax, filtered prefixes, preserved xsd:string)

Without --deterministic, upstream's always-on RDFC-1.0 canonicalization
is used directly (via canonicalize_rdf_graph).

Additional features gated behind --deterministic:
- Expression sorting (any_of/all_of/none_of/exactly_one_of) in owlgen
- Collection sorting (sh:in, sh:ignoredProperties) in shaclgen
- Permissible value sorting in owlgen and shaclgen
- JSON-LD deterministic key ordering (deterministic_json)
- JSON-LD context structured ordering (jsonldcontextgen)

Rebased on top of upstream linkml#3407 (pyoxigraph RDFC-1.0).

Refs: linkml#1847, linkml#3407
Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
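
Phase 2 of the pipeline can be illustrated with a small, standard-library-only sketch of Weisfeiler-Lehman label refinement over blank nodes. Everything here is an assumption made for illustration: triples are plain string tuples, blank nodes are recognised by the `_:` prefix, and `wl_blank_node_ids` is a hypothetical name, not the function used in `rdf_canonicalize.py`.

```python
import hashlib


def is_bnode(term: str) -> bool:
    return term.startswith("_:")


def wl_blank_node_ids(triples, rounds: int = 3) -> dict:
    """Map each blank node to a content-based ID via WL refinement."""
    bnodes = {t for s, _, o in triples for t in (s, o) if is_bnode(t)}
    labels = {b: "" for b in bnodes}  # every blank node starts with the same label

    def describe(term):
        # Blank-node neighbours contribute their current label, not their name.
        return ("bnode", labels[term]) if is_bnode(term) else ("term", term)

    for _ in range(rounds):
        refined = {}
        for b in bnodes:
            signature = []
            for s, p, o in triples:
                if s == b:
                    signature.append(("out", p, describe(o)))
                if o == b:
                    signature.append(("in", p, describe(s)))
            signature.sort()  # order-independent view of the neighbourhood
            payload = repr((labels[b], signature)).encode("utf-8")
            refined[b] = hashlib.sha256(payload).hexdigest()
        labels = refined
    return {b: f"_:b{labels[b]}" for b in bnodes}
```

Because each ID depends only on the node's neighbourhood, renaming a blank node or adding unrelated triples elsewhere in the graph leaves its hash unchanged. Structurally identical blank nodes receive the same hash in this sketch, which is one reason the described pipeline runs RDFC-1.0 first.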
When --default-language is set, the sh:message literal on SPARQL
constraints (sh:SPARQLConstraint) was emitted without a language tag.
Add lang=self._resolve_language() to the Literal() constructor call
for SPARQL rule descriptions.

Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
rdflib's Turtle serializer always emits a trailing double newline.
Normalize to single newline in deterministic_turtle() and the rdflib
fallback path in canonicalize_rdf_graph() for consistent file endings.

Note: CLI print() still adds a newline after serialize()'s trailing
newline. Callers capturing stdout should strip trailing blank lines
(e.g. via sed).

Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
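
The normalisation itself is a one-liner. The helper name below is hypothetical and the sketch works on an already serialized string rather than on an rdflib graph.

```python
def normalize_trailing_newline(text: str) -> str:
    """Collapse any run of trailing newlines to exactly one."""
    return text.rstrip("\n") + "\n"


turtle = "ex:a ex:b ex:c .\n\n"
print(normalize_trailing_newline(turtle), end="")  # end="" avoids a second newline
```

Passing `end=""` to `print()` is one way for a caller to avoid the extra newline mentioned in the note above.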
Address all 6 review comments from @amc-corey-cox:

1. bidirectional rules: now skip-and-warn (continue) instead of
   emit-and-warn, preventing silent semantic divergence.

2. elseconditions: add explicit warning when else branch is dropped,
   so schema authors know only the forward branch is emitted.

3. PresenceEnum wrapping: keep PresenceEnum(PresenceEnum.PRESENT) -
   it is NOT redundant. PresenceEnum.PRESENT is a PermissibleValue,
   but parsed schemas return PresenceEnum instances; the wrapping
   ensures type-compatible comparison. Added explanatory comment.

4. xsd:boolean vs xsd:string: change SPARQL from ?flag != true to
   str(?flag) != "true" so the comparison works regardless of
   whether the data stores the flag as xsd:boolean or xsd:string.

5. Unused fixture: delete boolean_guard_rules.yaml (tests inline
   duplicate schemas as Python strings).

6. End-to-end pyshacl test: add test_rule_boolean_guard_pyshacl_end_to_end
   that validates a conforming instance passes and a violating instance
   (missing flag) is correctly flagged by pyshacl with advanced=True.

Additional tests added:
- test_rule_with_elseconditions_warns (warning emission)
- test_rule_bidirectional_skipped (skip behavior + warning)

Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
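
A boolean-guard query using the string-based comparison from item 4 might be assembled as in the sketch below. The query shape is an assumption: `boolean_guard_select` is a hypothetical helper, and the `OPTIONAL` / `!bound` handling of a missing flag is one possible way to get the behaviour described in item 6, not necessarily what the generator emits.

```python
def boolean_guard_select(guard_iri: str, flag_iri: str) -> str:
    """Build a SELECT query returning focus nodes that violate the guard.

    A node violates the rule when the guarded property is present but the
    flag is missing or its lexical form is not "true".
    """
    return (
        "SELECT $this WHERE {\n"
        f"  $this <{guard_iri}> ?value .\n"
        f"  OPTIONAL {{ $this <{flag_iri}> ?flag . }}\n"
        # str() compares the lexical form, so xsd:boolean and xsd:string
        # encodings of the flag are treated alike.
        '  FILTER (!bound(?flag) || str(?flag) != "true")\n'
        "}"
    )


print(boolean_guard_select("https://example.org/hasLicense",
                           "https://example.org/licenseAccepted"))
```

Full IRIs in angle brackets keep the query free of PREFIX declarations; the example.org IRIs are placeholders.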
@jdsika jdsika force-pushed the feat/envited-x-pipeline branch from cab84ad to b2b3dba Compare June 10, 2026 19:42