
feat(lint): consolidate consistency rules into linter (#99 part 1) #147

Open
JohnRDOrazio wants to merge 20 commits into dev from feat/issue-99-consolidate-consistency

Conversation

Member

@JohnRDOrazio JohnRDOrazio commented May 3, 2026

Summary

PR1 of two for issue #99. Adds 6 new rules to the linter and reconciles 3 partial-overlap rules so the lint system covers everything consistency_service.py checks. consistency_service.py, its routes, the worker task, and the frontend Consistency tab are all left in place; PR2 will remove them once we have confidence the lint pipeline produces equivalent findings.

New rules

| rule_id | Severity | Level | Notes |
|---|---|---|---|
| unused-property | warning | L4 | Property declared but never used as predicate |
| orphan-individual | warning | L2 | Individual's rdf:type not declared as owl:Class |
| empty-domain | info | L4 | Object/Datatype property with no rdfs:domain |
| empty-range | info | L4 | Same for rdfs:range |
| deprecated-parent | warning | L2 | Class subclasses an owl:deprecated class |
| multi-root | info | L4 | Fires once if >5 root classes; ontology-scope finding |

Reconciled rules

  • undefined-parent → dangling-ref (rename + expand). Now scans rdfs:subClassOf, rdfs:domain, and rdfs:range. Each finding carries details.predicate. Stays at L1 (Critical). Existing LintIssue rows with the old rule_id are left in place; lint runs are user-triggered and regenerable. The details payload is also reshaped: undefined_parent / undefined_parent_local are renamed to dangling_target / dangling_target_local since the rule no longer applies only to parent classes.
  • duplicate-label (broaden). Scope expanded from class-only to all entity types; matching key is now (entity_type, label_lower, lang), with the language tag lower-cased, so cross-type collisions are no longer false positives and labels whose language tags differ only in case are grouped together. Stays at L3.
  • orphan-class — no code change. The "no instances" variant from consistency simply disappears with PR2.

Out of scope

Design

docs/superpowers/specs/2026-05-03-issue-99-consolidate-consistency-into-lint-design.md

Plan

docs/superpowers/plans/2026-05-03-issue-99-pr1-consistency-to-lint.md

Test plan

  • Per-rule unit tests for every new and reconciled rule (26 new tests)
  • Level-membership coverage test
  • Full backend regression pytest tests/ green
  • ruff + mypy strict clean

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added six new lint rules to catch unused properties, orphan individuals, missing property domains/ranges, deprecated parents, and multi-root class situations.
    • Broadened reference validation to flag dangling subclass/domain/range targets while skipping well-known/imported vocabularies.
    • Improved duplicate-label detection with case-insensitive matching and entity-type + language grouping.
  • Documentation

    • Added a detailed design and implementation plan for consolidating consistency checks into the linter and updated human-readable lint level descriptions.
  • Tests

    • Extensive unit tests added/expanded to cover new/renamed rules, edge cases, and regression guards.

JohnRDOrazio and others added 17 commits May 3, 2026 13:51
Captures the brainstormed design before implementation begins:

* Two-PR rollout: PR1 adds 6 new lint rules and reconciles 3 partial-
  overlap rules; PR2 removes consistency_service, its routes, the worker
  task, and the frontend Consistency tab.
* Settled rule semantics: orphan-class keeps the loose definition,
  undefined-parent renames to dangling-ref and expands to cover
  rdfs:domain and rdfs:range, duplicate-label becomes case-insensitive
  + same-type + all entity types.
* Level placements per the issue body (orphan-individual and
  deprecated-parent at L2; the other four at L4).
* Out of scope: discussion #87's rdflib-vs-SQL question, the duplicate
  detection pipeline, the cross-references endpoint.

Two reconciliation choices remain pending damienriehl's input on the
issue thread; this spec reflects the working decisions and will be
revised if those decisions change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
12-task TDD plan executing the design doc's PR1 scope: 6 new rules,
3 reconciled rules, level description refresh, level-membership
coverage test, full regression, PR open. Each task is bite-sized
(write failing test → run → implement → run → commit) with concrete
code, exact paths, and specific shell commands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Flags properties (ObjectProperty / DatatypeProperty / AnnotationProperty /
rdf:Property) declared in the ontology but never used as a predicate in
any triple. Mirrors consistency_service._check_unused_property; lives at
L4 (Quality).

Address code review on Task 1 (#99):
* The s != prop guard protects against a degenerate (prop, prop, X)
  triple, not against the rdf:type declaration as the previous comment
  claimed. graph.subjects(prop, None) queries triples where prop is the
  predicate, so the rdf:type triple was never in scope.
* Number the test section header to match the file's convention.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
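
The guard discussed in this review note can be illustrated with a minimal, dependency-free sketch. Triples are modelled here as plain (subject, predicate, object) tuples instead of a real graph object, and `find_unused_properties` and the example IRIs are hypothetical names, not the code in linter.py.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
RDF_TYPE = RDF + "type"
PROPERTY_TYPES = {
    OWL + "ObjectProperty",
    OWL + "DatatypeProperty",
    OWL + "AnnotationProperty",
    RDF + "Property",
}


def find_unused_properties(triples):
    """Return declared properties that never appear in predicate position."""
    declared = {s for s, p, o in triples if p == RDF_TYPE and o in PROPERTY_TYPES}
    unused = []
    for prop in sorted(declared):
        # Only triples whose *predicate* is prop count as usage, so the
        # rdf:type declaration (prop in subject position) is never in scope.
        # The s != prop guard ignores the degenerate (prop, prop, X) triple.
        used = any(p == prop and s != prop for s, p, o in triples)
        if not used:
            unused.append(prop)
    return unused


EX = "http://example.org/"  # hypothetical namespace
triples = {
    (EX + "knows", RDF_TYPE, OWL + "ObjectProperty"),
    (EX + "likes", RDF_TYPE, OWL + "ObjectProperty"),
    (EX + "alice", EX + "knows", EX + "bob"),
    (EX + "likes", EX + "likes", EX + "bob"),  # degenerate self-use only
}
print(find_unused_properties(triples))  # ['http://example.org/likes']
```

In this sketch `ex:knows` counts as used because another subject uses it as a predicate, while `ex:likes` is reported because its only "use" is the degenerate triple.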
Flags individuals whose rdf:type target is not declared as owl:Class
in this ontology. One finding per (individual, undeclared-type) pair.
Mirrors consistency_service._check_orphan_individual; lives at L2
(Consistency).
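
As a rough illustration of "one finding per (individual, undeclared-type) pair", the sketch below uses plain tuples for triples and follows the explicit owl:NamedIndividual variant described in this commit. The function name and example IRIs are assumptions made for the example only.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
OWL_NAMED_INDIVIDUAL = "http://www.w3.org/2002/07/owl#NamedIndividual"


def find_orphan_individuals(triples):
    """Return (individual, type) pairs whose type is not a declared owl:Class."""
    declared_classes = {
        s for s, p, o in triples if p == RDF_TYPE and o == OWL_CLASS
    }
    individuals = {
        s for s, p, o in triples if p == RDF_TYPE and o == OWL_NAMED_INDIVIDUAL
    }
    findings = []
    for s, p, o in sorted(triples):
        if p != RDF_TYPE or s not in individuals:
            continue
        # The owl:NamedIndividual marker itself is not a "type" to validate.
        if o == OWL_NAMED_INDIVIDUAL or o in declared_classes:
            continue
        findings.append((s, o))
    return findings


EX = "http://example.org/"  # hypothetical namespace
triples = {
    (EX + "Person", RDF_TYPE, OWL_CLASS),
    (EX + "alice", RDF_TYPE, OWL_NAMED_INDIVIDUAL),
    (EX + "alice", RDF_TYPE, EX + "Person"),   # declared, no finding
    (EX + "alice", RDF_TYPE, EX + "Ghost"),    # undeclared
    (EX + "alice", RDF_TYPE, EX + "Phantom"),  # undeclared
}
for individual, missing_type in find_orphan_individuals(triples):
    print(individual, "->", missing_type)
```

A single individual with two undeclared types yields two findings, one per pair.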
Flags ObjectProperty and DatatypeProperty declarations with no
rdfs:domain. AnnotationProperty is intentionally excluded (annotations
are by convention domain-agnostic). L4 (Quality).

Flags ObjectProperty and DatatypeProperty declarations with no
rdfs:range. AnnotationProperty intentionally excluded. L4 (Quality).
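
The two rules differ only in the axis they look at, which a small sketch can make explicit. Triples are plain tuples here, and `find_missing_axis` is a hypothetical helper; the real rules are separate check methods.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
RDFS_DOMAIN = RDFS + "domain"
RDFS_RANGE = RDFS + "range"
# AnnotationProperty is deliberately absent from the checked types.
CHECKED_TYPES = {OWL + "ObjectProperty", OWL + "DatatypeProperty"}


def find_missing_axis(triples, axis):
    """Return checked properties that have no triple with the given axis."""
    props = {s for s, p, o in triples if p == RDF_TYPE and o in CHECKED_TYPES}
    with_axis = {s for s, p, o in triples if p == axis}
    return sorted(props - with_axis)


EX = "http://example.org/"  # hypothetical namespace
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"
triples = {
    (EX + "owns", RDF_TYPE, OWL + "ObjectProperty"),
    (EX + "owns", RDFS_RANGE, EX + "Thing"),          # range only
    (EX + "hasAge", RDF_TYPE, OWL + "DatatypeProperty"),
    (EX + "hasAge", RDFS_DOMAIN, EX + "Person"),      # domain only
    (EX + "note", RDF_TYPE, OWL + "AnnotationProperty"),  # excluded
}
print(find_missing_axis(triples, RDFS_DOMAIN))  # empty-domain candidates
print(find_missing_axis(triples, RDFS_RANGE))   # empty-range candidates
```

The annotation property has neither a domain nor a range, yet it is reported by neither call because it is never in the checked set.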
Flags classes that subclass an owl:deprecated class. Reuses the
shared is_deprecated helper from rdf_utils which accepts both boolean
and string literal forms. L2 (Consistency).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
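
A minimal sketch of the deprecated-parent idea follows, assuming a simplified stand-in for the shared is_deprecated helper. Literal values are modelled as Python `bool` or `str`, and the exact string forms the real helper in rdf_utils accepts are an assumption here.

```python
RDFS_SUBCLASSOF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
OWL_DEPRECATED = "http://www.w3.org/2002/07/owl#deprecated"


def is_deprecated(value):
    """Stand-in helper: accept boolean True or the string literal 'true'."""
    if isinstance(value, bool):
        return value
    # Assumption: string forms are compared case-insensitively.
    return isinstance(value, str) and value.strip().lower() == "true"


def find_deprecated_parents(triples):
    """Return (child, parent) pairs where the parent is marked deprecated."""
    deprecated = {
        s for s, p, o in triples if p == OWL_DEPRECATED and is_deprecated(o)
    }
    return sorted(
        (child, parent)
        for child, p, parent in triples
        if p == RDFS_SUBCLASSOF and parent in deprecated
    )


EX = "http://example.org/"  # hypothetical namespace
triples = {
    (EX + "OldThing", OWL_DEPRECATED, True),    # boolean form
    (EX + "Legacy", OWL_DEPRECATED, "true"),    # string form
    (EX + "Fine", OWL_DEPRECATED, False),       # explicitly not deprecated
    (EX + "A", RDFS_SUBCLASSOF, EX + "OldThing"),
    (EX + "B", RDFS_SUBCLASSOF, EX + "Legacy"),
    (EX + "C", RDFS_SUBCLASSOF, EX + "Fine"),
}
print(find_deprecated_parents(triples))
```

Both literal forms mark the parent as deprecated, so `A` and `B` are reported and `C` is not.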
Fires once when an ontology has more than 5 root classes (classes with
no parent except owl:Thing). Ontology-scope finding: subject_iri=None,
subject_type='other'. L4 (Quality).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
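
The "fires once, ontology-scope" behaviour can be sketched as below. Triples are plain tuples, and the function name and the `root_count` detail key are hypothetical; only `subject_iri=None`, `subject_type='other'`, and the threshold of 5 come from the commit message.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASSOF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
OWL_THING = "http://www.w3.org/2002/07/owl#Thing"


def check_multi_root(triples, threshold=5):
    """Emit at most one ontology-scope finding when roots exceed threshold."""
    classes = {s for s, p, o in triples if p == RDF_TYPE and o == OWL_CLASS}
    # A parent other than owl:Thing disqualifies a class from being a root.
    has_real_parent = {
        s for s, p, o in triples if p == RDFS_SUBCLASSOF and o != OWL_THING
    }
    roots = classes - has_real_parent
    if len(roots) <= threshold:
        return []
    return [
        {
            "rule_id": "multi-root",
            "subject_iri": None,      # ontology-scope, no single subject
            "subject_type": "other",
            "details": {"root_count": len(roots)},  # hypothetical key
        }
    ]


EX = "http://example.org/"  # hypothetical namespace
roots = [EX + f"Root{i}" for i in range(6)]
triples = {(r, RDF_TYPE, OWL_CLASS) for r in roots} | {
    (EX + "Child", RDF_TYPE, OWL_CLASS),
    (EX + "Child", RDFS_SUBCLASSOF, roots[0]),
    (roots[1], RDFS_SUBCLASSOF, OWL_THING),  # still counts as a root
}
print(check_multi_root(triples))
```

With six roots the sketch returns a single finding; dropping any one root brings the count to the threshold and the result is empty.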
Pure rename. Behavior unchanged — the rule still only checks
subClassOf targets. Predicate-axis expansion comes in the next
commit. Existing LintIssue rows with rule_id='undefined-parent' are
left in place; they're snapshots of past runs and lint results are
regenerable.

Address code review on Task 7 (#99):
* test_linter.py:279 — comment said "undefined-parent issues expected"
* linter.py:_check_dangling_ref — docstring still described the old
  rule. Both updated to refer to dangling-ref; the docstring also
  notes that domain/range coverage arrives in the next commit.

The rule now inspects rdfs:subClassOf, rdfs:domain, and rdfs:range
targets uniformly. Each finding carries details.predicate so the UI
can show which axis triggered the dangling reference. References into
well-known namespaces (rdf/rdfs/owl/xsd/skos/dc/dcterms) and into
namespaces declared via owl:imports are skipped, mirroring
consistency_service._check_dangling_ref.

The details payload is also reshaped: undefined_parent /
undefined_parent_local are renamed to dangling_target /
dangling_target_local since the rule no longer applies only to
parent classes.
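
To make the expanded rule and the reshaped details payload concrete, here is a minimal sketch over plain tuple triples. `find_dangling_refs`, `local_name`, and the treatment of "known" as "appears as a subject" are assumptions for illustration; imported namespaces are passed in directly instead of being read from owl:imports.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
CHECKED_PREDICATES = {RDFS + "subClassOf", RDFS + "domain", RDFS + "range"}
WELL_KNOWN_NS = (
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",  # rdf
    RDFS,                                           # rdfs
    "http://www.w3.org/2002/07/owl#",               # owl
    "http://www.w3.org/2001/XMLSchema#",            # xsd
    "http://www.w3.org/2004/02/skos/core#",         # skos
    "http://purl.org/dc/elements/1.1/",             # dc
    "http://purl.org/dc/terms/",                    # dcterms
)


def local_name(iri):
    """Return the fragment or last path segment of an IRI."""
    return iri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def find_dangling_refs(triples, imported_ns=()):
    known = {s for s, _, _ in triples}  # assumption: any subject is "known"
    skip = WELL_KNOWN_NS + tuple(imported_ns)
    findings = []
    for s, p, o in sorted(triples):
        if p not in CHECKED_PREDICATES or o in known or o.startswith(skip):
            continue
        findings.append({
            "rule_id": "dangling-ref",
            "subject_iri": s,
            "details": {
                "predicate": p,  # which axis triggered the finding
                "dangling_target": o,
                "dangling_target_local": local_name(o),
            },
        })
    return findings


EX = "http://example.org/"  # hypothetical namespace
triples = {
    (EX + "Dog", RDFS + "subClassOf", EX + "Animal"),  # Animal undeclared
    (EX + "age", RDFS + "domain", EX + "Dog"),         # Dog is a subject
    (EX + "age", RDFS + "range", "http://www.w3.org/2001/XMLSchema#integer"),
    (EX + "owner", RDFS + "range", "http://other.org/onto#Person"),
}
for f in find_dangling_refs(triples, imported_ns=["http://other.org/onto#"]):
    print(f["subject_iri"], f["details"])
```

Without the `imported_ns` argument the same input would also report `ex:owner`, since its range target is neither declared locally nor in a well-known namespace.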
Address code review on Task 8 (#99):
* Drop the redundant declared_subjects intermediate — graph.subjects()
  already returns a superset, so the union was a no-op.
* Drop the now-redundant 'obj == OWL.Thing or' guard since OWL.Thing is
  already in the known set.
* Test docstring listed 6 of 7 well-known namespaces; add the missing
  'dc' so the doc matches the implementation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ly (#99)

Matching is now case-insensitive, per language, and grouped by entity
type — so a class and a property sharing 'knows' is no longer a false
positive, but two ObjectProperties with the same label still are.
Scope expanded from class-only to all entity types. Mirrors the per-
type semantics from consistency_service._check_duplicate_label while
keeping the lint rule's case-insensitive matching.

Address code review on Task 9 (#99):
* Dedup IRIs in the per-group list — a resource whose two labels fold
  to the same key (e.g. 'Apple' and 'apple') was being inserted twice
  and inflating total_duplicates / duplicate_iris.
* Normalise language tags with .lower() in the grouping key, matching
  _check_label_per_language and BCP 47's case-insensitive semantics.
* Use the etype from the group key for subject_type instead of
  re-invoking _determine_entity_type per emit.
* Tests: add explicit subject_type assertions in the class and
  individual duplicate tests so the per-type emission is verified.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
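
The grouping key and the IRI dedup from this review pass can be sketched as follows. Labels are modelled as (iri, entity_type, label, lang) tuples, and the function name and input shape are hypothetical, chosen only to keep the example self-contained.

```python
from collections import defaultdict


def find_duplicate_labels(labels):
    """Group labels by (entity type, folded label, folded language tag)."""
    groups = defaultdict(list)
    for iri, etype, label, lang in labels:
        # Language tags are compared case-insensitively, so fold them too.
        key = (etype, label.lower(), (lang or "").lower())
        # Dedup: one resource with 'Apple' and 'apple' is listed only once.
        if iri not in groups[key]:
            groups[key].append(iri)
    return {key: iris for key, iris in groups.items() if len(iris) > 1}


labels = [
    ("ex:Apple", "class", "Apple", "en"),
    ("ex:Apple", "class", "apple", "EN"),     # same resource, same key
    ("ex:Fruit", "class", "APPLE", "en"),     # real duplicate of ex:Apple
    ("ex:knows", "property", "knows", None),
    ("ex:Knows", "class", "knows", None),     # cross-type, not a duplicate
]
print(find_duplicate_labels(labels))
# {('class', 'apple', 'en'): ['ex:Apple', 'ex:Fruit']}
```

The class and the property that share the label "knows" land in different groups because the entity type is part of the key.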
L2 now mentions orphan-individual and deprecated-parent. L4 now
mentions unused-property, empty-domain/range, and multi-root.

Locks in the level placements from the design doc so a future
accidental edit to LINT_LEVELS gets caught at test time.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
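
A level-membership guard of this kind might look like the sketch below. The shape of `LINT_LEVELS` here (a dict from level number to the rule ids introduced at that level) is an assumption, and the real structure in linter.py may differ, for example by being cumulative. The placements themselves are the ones stated in this PR.

```python
# Hypothetical stand-in for the real LINT_LEVELS structure.
LINT_LEVELS = {
    1: {"dangling-ref"},
    2: {"orphan-individual", "deprecated-parent"},
    3: {"duplicate-label"},
    4: {"unused-property", "empty-domain", "empty-range", "multi-root"},
}

# Placements as described in the PR summary.
EXPECTED_LEVEL = {
    "dangling-ref": 1,
    "orphan-individual": 2,
    "deprecated-parent": 2,
    "duplicate-label": 3,
    "unused-property": 4,
    "empty-domain": 4,
    "empty-range": 4,
    "multi-root": 4,
}


def level_of(rule_id, levels=LINT_LEVELS):
    """Return the single level that contains rule_id, or raise."""
    matches = [lvl for lvl, rules in levels.items() if rule_id in rules]
    if len(matches) != 1:
        raise ValueError(f"{rule_id} found in levels {matches}")
    return matches[0]


def test_level_membership():
    for rule_id, level in EXPECTED_LEVEL.items():
        assert level_of(rule_id) == level, rule_id


test_level_membership()
```

If a later edit moved a rule to another level, the assertion would name the offending rule id.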

coderabbitai Bot commented May 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 77b969dd-56b0-49db-816d-8c64a1c70e21

📥 Commits

Reviewing files that changed from the base of the PR and between fd2897b and 8b2dbb4.

📒 Files selected for processing (1)
  • tests/unit/test_linter.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/test_linter.py

Walkthrough

Consolidates ontology consistency checks into the linter: adds six new lint rules, replaces undefined-parent with expanded dangling-ref (covers subClassOf/domain/range and import skips), rewrites duplicate-label to be entity-type+language+case-insensitive, updates lint-level definitions, and adds broad unit-test coverage.

Changes

Consistency-to-Lint Consolidation

| Layer / File(s) | Summary |
|---|---|
| Design / Plan: docs/superpowers/specs/2026-05-03-issue-99-consolidate-consistency-into-lint-design.md, docs/superpowers/plans/2026-05-03-issue-99-pr1-consistency-to-lint.md | PR1 design and implementation plan: add 6 new linter rules, rename/expand undefined-parent → dangling-ref, broaden duplicate-label, enumerate tasks/tests and PR2 cutover steps. |
| Rule Registry & Levels: ontokit/services/linter.py (top / LINT_RULES, _LEVEL_*, LINT_LEVEL_DEFINITIONS) | Adds dangling-ref (replaces undefined-parent), and new rules unused-property, orphan-individual, empty-domain, empty-range, deprecated-parent, multi-root; updates duplicate-label metadata and lint-level composition and descriptions. |
| Core Checkers / Helpers: ontokit/services/linter.py (check implementations / helpers) | Implements _check_dangling_ref covering rdfs:subClassOf, rdfs:domain, rdfs:range with well-known/import skip logic; rewrites _check_duplicate_label to group by entity type+language, case-insensitive matching; adds _check_unused_property, _check_orphan_individual, _check_empty_domain, _check_empty_range, _check_deprecated_parent (uses is_deprecated), _check_multi_root, and helper _class_subjects. |
| Tests (config): tests/unit/test_lint_config.py | Updates level-1 expectation to include dangling-ref instead of undefined-parent. |
| Tests (rule coverage): tests/unit/test_linter.py | Removes old undefined-parent tests; adds/updates extensive tests: dangling-ref variants (subClassOf/domain/range + import/namespace skips + predicate detail), ~25 new tests for the six new rules, expanded duplicate-label tests (case-insensitive, type- and language-aware), lint-level membership assertions, and many defensive/regression guards. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related issues

  • #99 — Directly implemented by this PR: consolidates consistency checks into the linter and introduces the listed new and reconciled rules.

Possibly related PRs

  • #103 — Related to label/language handling; touches similar label normalization/grouping concerns.

"I hopped through triples, sniffed each node and name,
Dangling refs found, and labels called by new rule-frame;
Six rules now patrol the ontology glade,
Tests are plenty, so no surprises stayed.
A rabbit cheers the linted parade." 🐇✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: consolidating consistency rules into the linter for issue #99 part 1, which aligns with the substantial refactoring of linter rules shown in the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 83.58% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



codecov Bot commented May 3, 2026

Codecov Report

❌ Patch coverage is 92.19858% with 11 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ontokit/services/linter.py | 92.19% | 4 Missing and 7 partials ⚠️ |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ontokit/services/linter.py`:
- Around line 609-627: original_label_for is keyed by subject IRI so when a
subject has multiple rdfs:label values the wrong label can be reported for a
duplicate group; change the logic in the grouping loop that builds groups and
original_label_for so you track one canonical display label per group key (key =
(etype, label_str.lower(), lang_key)) instead of per subject: when you add
subj_iri to groups[key] also set original_label_for[key] = label_str (use
setdefault so the first-seen label for that group wins deterministically), then
later references that expect a representative label (e.g., code that builds
message/details["label"]) should read from original_label_for[key] not
original_label_for[subj_iri]; update all uses in the duplicate reporting portion
(the code handling groups and message/details) accordingly.
- Around line 472-477: The loop that normalizes owl:imports entries (inside the
graph.triples loop that builds imported_ns) only adds a slash-terminated
namespace, causing hash-qualified terms to appear dangling; update the
normalization logic in that loop (variables: imported, imp_str, imported_ns) to
also account for hash namespaces by preserving existing trailing '#' or adding a
'#' variant in addition to the '/' variant (e.g., if imported URI already ends
with '#' keep it, otherwise add both imp_str + '/' and imp_str + '#') so both
slash and hash namespace forms are considered imported.
- Around line 1430-1439: The loop currently restricts inspection to subjects
typed as OWL.NamedIndividual by using graph.subjects(RDF.type,
OWL.NamedIndividual), which misses common instances typed only as rdf:type
ex:Person; change the iteration to consider all subjects that have an rdf:type
(e.g., graph.subjects(RDF.type, None) or graph.subjects(predicate=RDF.type)),
keep the existing guards for URIRef on ind and type_target, remove the
special-case check that requires type_target == OWL.NamedIndividual, and
continue skipping types found in declared_classes so the rule runs for ordinary
rdf:type instances (references: ind, type_target, OWL.NamedIndividual,
declared_classes, graph.objects(..., RDF.type)).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 45d6c8f4-bdf8-4a87-9f7f-944d31e77d55

📥 Commits

Reviewing files that changed from the base of the PR and between 1125819 and c1f03ae.

📒 Files selected for processing (5)
  • docs/superpowers/plans/2026-05-03-issue-99-pr1-consistency-to-lint.md
  • docs/superpowers/specs/2026-05-03-issue-99-consolidate-consistency-into-lint-design.md
  • ontokit/services/linter.py
  • tests/unit/test_lint_config.py
  • tests/unit/test_linter.py

Address CodeRabbit + cumulative-review feedback in one pass:

* _check_orphan_individual now iterates all subjects with rdf:type and
  filters via _determine_entity_type, catching implicit individuals
  declared with just rdf:type ex:Person (no owl:NamedIndividual marker).
* RDFS.Class coverage extended across deprecated-parent,
  orphan-individual, and multi-root via a new _class_subjects helper;
  previously only owl:Class subjects were considered.
* _check_duplicate_label keys original_label_for by group key instead
  of subject IRI so a subject with two labels in different groups
  reports each group's actual label rather than the first-seen one.
* owl:imports namespace handling now considers both the slash and the
  hash form when the imported IRI has no trailing separator, so
  hash-style imported terms are no longer flagged as dangling.
* Stale test names from the undefined-parent rename are updated.
* duplicate-label gets a details-shape assertion.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
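
The slash-and-hash handling for owl:imports described in this commit can be sketched in a few lines. `import_namespaces` and `is_imported` are hypothetical helper names, and the sketch works on plain strings without touching a graph.

```python
def import_namespaces(import_iri):
    """Return the namespace prefixes implied by one owl:imports IRI."""
    # An IRI that already ends in a separator is used as-is.
    if import_iri.endswith(("#", "/")):
        return {import_iri}
    # Otherwise consider both the slash and the hash form.
    return {import_iri + "/", import_iri + "#"}


def is_imported(term, import_iris):
    """True if term falls under any namespace derived from the imports."""
    prefixes = tuple(
        ns for iri in import_iris for ns in import_namespaces(iri)
    )
    return term.startswith(prefixes)


imports = ["http://other.org/onto"]  # hypothetical imported ontology IRI
print(is_imported("http://other.org/onto#Person", imports))   # True
print(is_imported("http://other.org/onto/Person", imports))   # True
print(is_imported("http://other.org/ontology2#X", imports))   # False
```

Appending a separator before matching keeps a sibling ontology such as `http://other.org/ontology2` from being treated as imported just because it shares a string prefix.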
@JohnRDOrazio
Member Author

Heads-up notes from the cumulative review for the canary period (PR1 leaves consistency_service.py in place so we can compare findings against the lint pipeline before PR2 deletes it).

Behavioral divergences from consistency_service to be aware of

These are intentional improvements / pre-existing patterns in the linter, not bugs — but if you're spot-checking parity during the canary, expect them to surface:

  1. dangling-ref skiplist includes DC (Dublin Core Elements). consistency_service._check_dangling_ref only skips DCTERMS. The linter version also skips http://purl.org/dc/elements/1.1/. So a dangling reference into dc:title will be silent in the lint pipeline but flagged by the consistency pipeline. Defensible — DC Elements is a well-established standard vocabulary and a domain/range pointing into it shouldn't be treated as a broken reference.

  2. dangling-ref deduplication is finer-grained. The linter's dedup key is the 3-tuple (subject, predicate, target); consistency_service uses the 2-tuple (subject, target). So a property whose rdfs:domain and rdfs:range both reference the same undeclared URI produces two findings in the linter and one in consistency. The linter's behavior is more informative.

  3. _check_orphan_individual now also catches implicit individuals. After this PR's polish commit, the rule iterates all rdf:type subjects (filtered to "individual" entity type via _determine_entity_type), not just subjects explicitly declared as owl:NamedIndividual. So EX.Alice rdf:type EX.UndeclaredPerson (without owl:NamedIndividual marker) is now flagged in lint but not in consistency. This is in the same direction as the stated goal of #99 (consolidate consistency checks into the lint rule system), namely covering more real cases, and doesn't drop any findings the consistency rule would catch.

  4. _check_orphan_individual, _check_deprecated_parent, and _check_multi_root now include rdfs:Class declarations. Previously they iterated only OWL.Class subjects, missing classes typed as rdfs:Class. After the polish commit they use a shared _class_subjects() helper that unions both. consistency_service._iter_classes already covered both, so this brings the two pipelines closer together for these rules, but the pre-PR linter behavior was OWL-only.
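
The dedup difference in item 2 is easy to see with a tiny sketch. The `dedupe` helper and the prefixed names are hypothetical; only the two key shapes, (subject, predicate, target) versus (subject, target), come from the note above.

```python
def dedupe(candidates, key):
    """Keep the first candidate for each distinct key, preserving order."""
    seen = set()
    kept = []
    for candidate in candidates:
        k = key(candidate)
        if k not in seen:
            seen.add(k)
            kept.append(candidate)
    return kept


# One property whose domain and range both point at the same undeclared IRI.
candidates = [
    ("ex:owns", "rdfs:domain", "ex:Ghost"),
    ("ex:owns", "rdfs:range", "ex:Ghost"),
]

# Linter-style key: (subject, predicate, target).
lint_findings = dedupe(candidates, key=lambda c: c)
# consistency_service-style key: (subject, target).
consistency_findings = dedupe(candidates, key=lambda c: (c[0], c[2]))

print(len(lint_findings), len(consistency_findings))  # 2 1
```

When spot-checking parity during the canary, a count mismatch of this shape is therefore expected and not a regression.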

Cosmetic items deferred (not in this PR)

  • The cumulative reviewer flagged that the spec doc says details.duplicates while the implementation uses details.duplicate_iris. The implementation is the source of truth (the key name was carried over from the pre-PR code, no consumer reads it differently). I'll fix the spec doc separately rather than churning this PR.

Tracking the canary

Once this PR is merged, consistency_service.py and the lint pipeline will produce findings in parallel for the same projects. The four divergences above are the expected differences — anything else surfacing during the canary period is worth investigating before PR2 deletes the consistency pipeline.

Lift PR patch coverage above the codecov target by adding targeted
tests for the blank-node and edge-case guards in the new check
methods. Each new test exercises one of the previously-uncovered
'continue' branches: blank-node iterations in unused-property,
orphan-individual, empty-domain, empty-range, deprecated-parent, and
multi-root; plus the no-type / non-literal / empty-label / multi-group
guards in duplicate-label.
@JohnRDOrazio
Member Author

@coderabbitai review


coderabbitai Bot commented May 3, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/test_linter.py`:
- Around line 1237-1250: The test test_dangling_ref_flags_undefined_range
currently only asserts the reported predicate for the "dangling-ref" issue; add
an assertion that the issue's dangling_target equals the undefined datatype IRI
to prevent regressions. Locate the test function
test_dangling_ref_flags_undefined_range and after the existing predicate
assertion (matches[0].details["predicate"] == str(RDFS.range)) add an assertion
that matches[0].details["dangling_target"] == str(EX.UndeclaredDatatype) so the
linter's reported target (for EX.age range) is validated along with the
predicate.
- Around line 1267-1279: The test function
test_dangling_ref_skips_well_known_namespaces should be extended to include
dc/dcterms examples: add triples to the Graph similar to the existing XSD/SKOS
examples (e.g., using EX.something with RDF.type OWL.ObjectProperty and
RDFS.range set to DC.title and another to DCTERMS.creator or similar) so the
OntologyLinter lint call for enabled_rules={"dangling-ref"} validates that dc
and dcterms URIs are also skipped; keep the assertion _results_with_rule(issues,
"dangling-ref") == [] unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 03f88a0e-a520-421b-b29b-f29d9b97e68e

📥 Commits

Reviewing files that changed from the base of the PR and between c1f03ae and fd2897b.

📒 Files selected for processing (2)
  • ontokit/services/linter.py
  • tests/unit/test_linter.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • ontokit/services/linter.py

Two tiny CodeRabbit-flagged gaps in the dangling-ref tests:

* test_dangling_ref_flags_undefined_range only asserted details.predicate,
  not the dangling_target IRI. Add the target assertion so a regression
  in either field is caught.
* test_dangling_ref_skips_well_known_namespaces only exercised XSD and
  SKOS — the implementation also skips DC and DCTERMS. Add a DC.title
  range and a DCTERMS.creator range so all 7 well-known namespaces are
  represented in the test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>