feat(lint): consolidate consistency rules into linter (#99 part 1)#147
feat(lint): consolidate consistency rules into linter (#99 part 1)#147JohnRDOrazio wants to merge 20 commits intodevfrom
Conversation
Captures the brainstormed design before implementation begins: * Two-PR rollout: PR1 adds 6 new lint rules and reconciles 3 partial- overlap rules; PR2 removes consistency_service, its routes, the worker task, and the frontend Consistency tab. * Settled rule semantics: orphan-class keeps the loose definition, undefined-parent renames to dangling-ref and expands to cover rdfs:domain and rdfs:range, duplicate-label becomes case-insensitive + same-type + all entity types. * Level placements per the issue body (orphan-individual and deprecated-parent at L2; the other four at L4). * Out of scope: discussion #87's rdflib-vs-SQL question, the duplicate detection pipeline, the cross-references endpoint. Two reconciliation choices remain pending damienriehl's input on the issue thread; this spec reflects the working decisions and will be revised if those decisions change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
12-task TDD plan executing the design doc's PR1 scope: 6 new rules, 3 reconciled rules, level description refresh, level-membership coverage test, full regression, PR open. Each task is bite-sized (write failing test → run → implement → run → commit) with concrete code, exact paths, and specific shell commands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Flags properties (ObjectProperty / DatatypeProperty / AnnotationProperty / rdf:Property) declared in the ontology but never used as a predicate in any triple. Mirrors consistency_service._check_unused_property; lives at L4 (Quality).
Address code review on Task 1 (#99): * The s != prop guard protects against a degenerate (prop, prop, X) triple, not against the rdf:type declaration as the previous comment claimed. graph.subjects(prop, None) queries triples where prop is the predicate, so the rdf:type triple was never in scope. * Number the test section header to match the file's convention. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Flags individuals whose rdf:type target is not declared as owl:Class in this ontology. One finding per (individual, undeclared-type) pair. Mirrors consistency_service._check_orphan_individual; lives at L2 (Consistency).
Flags ObjectProperty and DatatypeProperty declarations with no rdfs:domain. AnnotationProperty is intentionally excluded (annotations are by convention domain-agnostic). L4 (Quality).
Flags ObjectProperty and DatatypeProperty declarations with no rdfs:range. AnnotationProperty intentionally excluded. L4 (Quality).
Flags classes that subclass an owl:deprecated class. Reuses the shared is_deprecated helper from rdf_utils which accepts both boolean and string literal forms. L2 (Consistency). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fires once when an ontology has more than 5 root classes (classes with no parent except owl:Thing). Ontology-scope finding: subject_iri=None, subject_type='other'. L4 (Quality). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pure rename. Behavior unchanged — the rule still only checks subClassOf targets. Predicate-axis expansion comes in the next commit. Existing LintIssue rows with rule_id='undefined-parent' are left in place; they're snapshots of past runs and lint results are regenerable.
Address code review on Task 7 (#99): * test_linter.py:279 — comment said "undefined-parent issues expected" * linter.py:_check_dangling_ref — docstring still described the old rule. Both updated to refer to dangling-ref; the docstring also notes that domain/range coverage arrives in the next commit.
The rule now inspects rdfs:subClassOf, rdfs:domain, and rdfs:range targets uniformly. Each finding carries details.predicate so the UI can show which axis triggered the dangling reference. References into well-known namespaces (rdf/rdfs/owl/xsd/skos/dc/dcterms) and into namespaces declared via owl:imports are skipped, mirroring consistency_service._check_dangling_ref. The details payload is also reshaped: undefined_parent / undefined_parent_local are renamed to dangling_target / dangling_target_local since the rule no longer applies only to parent classes.
Address code review on Task 8 (#99): * Drop the redundant declared_subjects intermediate — graph.subjects() already returns a superset, so the union was a no-op. * Drop the now-redundant 'obj == OWL.Thing or' guard since OWL.Thing is already in the known set. * Test docstring listed 6 of 7 well-known namespaces; add the missing 'dc' so the doc matches the implementation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ly (#99) Matching is now case-insensitive, per language, and grouped by entity type — so a class and a property sharing 'knows' is no longer a false positive, but two ObjectProperties with the same label still are. Scope expanded from class-only to all entity types. Mirrors the per- type semantics from consistency_service._check_duplicate_label while keeping the lint rule's case-insensitive matching.
Address code review on Task 9 (#99): * Dedup IRIs in the per-group list — a resource whose two labels fold to the same key (e.g. 'Apple' and 'apple') was being inserted twice and inflating total_duplicates / duplicate_iris. * Normalise language tags with .lower() in the grouping key, matching _check_label_per_language and BCP 47's case-insensitive semantics. * Use the etype from the group key for subject_type instead of re-invoking _determine_entity_type per emit. * Tests: add explicit subject_type assertions in the class and individual duplicate tests so the per-type emission is verified. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
L2 now mentions orphan-individual and deprecated-parent. L4 now mentions unused-property, empty-domain/range, and multi-root.
Locks in the level placements from the design doc so a future accidental edit to LINT_LEVELS gets caught at test time. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughConsolidates ontology consistency checks into the linter: adds six new lint rules, replaces ChangesConsistency-to-Lint Consolidation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 60 minutes.Comment |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ontokit/services/linter.py`:
- Around line 609-627: original_label_for is keyed by subject IRI so when a
subject has multiple rdfs:label values the wrong label can be reported for a
duplicate group; change the logic in the grouping loop that builds groups and
original_label_for so you track one canonical display label per group key (key =
(etype, label_str.lower(), lang_key)) instead of per subject: when you add
subj_iri to groups[key] also set original_label_for[key] = label_str (use
setdefault so the first-seen label for that group wins deterministically), then
later references that expect a representative label (e.g., code that builds
message/details["label"]) should read from original_label_for[key] not
original_label_for[subj_iri]; update all uses in the duplicate reporting portion
(the code handling groups and message/details) accordingly.
- Around line 472-477: The loop that normalizes owl:imports entries (inside the
graph.triples loop that builds imported_ns) only adds a slash-terminated
namespace, causing hash-qualified terms to appear dangling; update the
normalization logic in that loop (variables: imported, imp_str, imported_ns) to
also account for hash namespaces by preserving existing trailing '#' or adding a
'#' variant in addition to the '/' variant (e.g., if imported URI already ends
with '#' keep it, otherwise add both imp_str + '/' and imp_str + '#') so both
slash and hash namespace forms are considered imported.
- Around line 1430-1439: The loop currently restricts inspection to subjects
typed as OWL.NamedIndividual by using graph.subjects(RDF.type,
OWL.NamedIndividual), which misses common instances typed only as rdf:type
ex:Person; change the iteration to consider all subjects that have an rdf:type
(e.g., graph.subjects(RDF.type, None) or graph.subjects(predicate=RDF.type)),
keep the existing guards for URIRef on ind and type_target, remove the
special-case check that requires type_target == OWL.NamedIndividual, and
continue skipping types found in declared_classes so the rule runs for ordinary
rdf:type instances (references: ind, type_target, OWL.NamedIndividual,
declared_classes, graph.objects(..., RDF.type)).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 45d6c8f4-bdf8-4a87-9f7f-944d31e77d55
📒 Files selected for processing (5)
docs/superpowers/plans/2026-05-03-issue-99-pr1-consistency-to-lint.mddocs/superpowers/specs/2026-05-03-issue-99-consolidate-consistency-into-lint-design.mdontokit/services/linter.pytests/unit/test_lint_config.pytests/unit/test_linter.py
Address CodeRabbit + cumulative-review feedback in one pass: * _check_orphan_individual now iterates all subjects with rdf:type and filters via _determine_entity_type, catching implicit individuals declared with just rdf:type ex:Person (no owl:NamedIndividual marker). * RDFS.Class coverage extended across deprecated-parent, orphan-individual, and multi-root via a new _class_subjects helper; previously only owl:Class subjects were considered. * _check_duplicate_label keys original_label_for by group key instead of subject IRI so a subject with two labels in different groups reports each group's actual label rather than the first-seen one. * owl:imports namespace handling now considers both the slash and the hash form when the imported IRI has no trailing separator, so hash-style imported terms are no longer flagged as dangling. * Stale test names from the undefined-parent rename are updated. * duplicate-label gets a details-shape assertion. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Heads-up notes from the cumulative review for the canary period (PR1 leaves Behavioral divergences from
|
Lift PR patch coverage above the codecov target by adding targeted tests for the blank-node and edge-case guards in the new check methods. Each new test exercises one of the previously-uncovered 'continue' branches: blank-node iterations in unused-property, orphan-individual, empty-domain, empty-range, deprecated-parent, and multi-root; plus the no-type / non-literal / empty-label / multi-group guards in duplicate-label.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/test_linter.py`:
- Around line 1237-1250: The test test_dangling_ref_flags_undefined_range
currently only asserts the reported predicate for the "dangling-ref" issue; add
an assertion that the issue's dangling_target equals the undefined datatype IRI
to prevent regressions. Locate the test function
test_dangling_ref_flags_undefined_range and after the existing predicate
assertion (matches[0].details["predicate"] == str(RDFS.range)) add an assertion
that matches[0].details["dangling_target"] == str(EX.UndeclaredDatatype") so the
linter's reported target (for EX.age range) is validated along with the
predicate.
- Around line 1267-1279: The test function
test_dangling_ref_skips_well_known_namespaces should be extended to include
dc/dcterms examples: add triples to the Graph similar to the existing XSD/SKOS
examples (e.g., using EX.something with RDF.type OWL.ObjectProperty and
RDFS.range set to DC.title and another to DCTERMS.creator or similar) so the
OntologyLinter lint call for enabled_rules={"dangling-ref"} validates that dc
and dcterms URIs are also skipped; keep the assertion _results_with_rule(issues,
"dangling-ref") == [] unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 03f88a0e-a520-421b-b29b-f29d9b97e68e
📒 Files selected for processing (2)
ontokit/services/linter.pytests/unit/test_linter.py
🚧 Files skipped from review as they are similar to previous changes (1)
- ontokit/services/linter.py
Two tiny CodeRabbit-flagged gaps in the dangling-ref tests: * test_dangling_ref_flags_undefined_range only asserted details.predicate, not the dangling_target IRI. Add the target assertion so a regression in either field is caught. * test_dangling_ref_skips_well_known_namespaces only exercised XSD and SKOS — the implementation also skips DC and DCTERMS. Add a DC.title range and a DCTERMS.creator range so all 7 well-known namespaces are represented in the test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
PR1 of two for issue #99. Adds 6 new rules to the linter and reconciles 3 partial-overlap rules so the lint system covers everything
consistency_service.pychecks.consistency_service.py, its routes, the worker task, and the frontend Consistency tab are all left in place; PR2 will remove them once we have confidence the lint pipeline produces equivalent findings.New rules
rule_idunused-propertyorphan-individualrdf:typenot declared asowl:Classempty-domainrdfs:domainempty-rangerdfs:rangedeprecated-parentowl:deprecatedclassmulti-rootReconciled rules
undefined-parent→dangling-ref(rename + expand). Now scansrdfs:subClassOf,rdfs:domain, andrdfs:range. Each finding carriesdetails.predicate. Stays at L1 (Critical). ExistingLintIssuerows with the oldrule_idare left in place; lint runs are user-triggered and regenerable. Thedetailspayload is also reshaped:undefined_parent/undefined_parent_localare renamed todangling_target/dangling_target_localsince the rule no longer applies only to parent classes.duplicate-label(broaden). Scope expanded from class-only to all entity types; matching key is now(entity_type, label_lower, lang)so cross-type collisions and language-tag-case differences are no longer false positives. Stays at L3.orphan-class— no code change. The "no instances" variant from consistency simply disappears with PR2.Out of scope
consistency_service.pyand its routes (PR2)Design
docs/superpowers/specs/2026-05-03-issue-99-consolidate-consistency-into-lint-design.mdPlan
docs/superpowers/plans/2026-05-03-issue-99-pr1-consistency-to-lint.mdTest plan
pytest tests/green🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Tests