Improve validator CLI safety, detailed reporting, and documentation #12

Open
atibaut wants to merge 8 commits into main from refactor/universal-dataset-validation

atibaut commented Apr 13, 2026

Summary

This PR improves the validation workflow to prevent terminal overload during large runs while preserving full diagnostics in a report file. It also clarifies CLI behavior and separates universal validation from suite-style testing.

Why

Large validation runs can produce a very high volume of terminal output, which can degrade VS Code responsiveness. We needed:

  • compact terminal output by default
  • full validation detail persisted to file
  • clear user-facing CLI docs for defaults and tuning options
  • explicit separation between universal validation and sample-suite expectations

Workflow separation (new)

The repository now has two distinct CLI workflows:

  • Universal validation: scripts/validate.py
    • Pure SHACL conformance validation for arbitrary datasets
    • No implicit positives/negatives expectation logic
  • Suite testing: scripts/validate_suite.py
    • Intended for sample test suites organized with positives/negatives
    • Evaluates expected outcomes from path semantics and reports PASS/FAIL/UNCLASSIFIED

This removes ambiguity and keeps test-suite semantics out of the generic validator.
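
As a rough illustration of the path-semantics idea behind the suite runner, the sketch below derives an expected outcome from directory names and compares it with the actual conformance flag. The function names and the matching rule (a path component named exactly positives or negatives) are assumptions for illustration, not the code in scripts/validate_suite.py.

```python
from pathlib import PurePosixPath
from typing import Optional


def expected_conformance(path: str) -> Optional[bool]:
    """Infer the expected outcome from the path (hypothetical rule).

    Returns True under a 'positives' directory, False under 'negatives',
    and None when the path carries no expectation.
    """
    parts = {part.lower() for part in PurePosixPath(path).parts}
    if "positives" in parts:
        return True
    if "negatives" in parts:
        return False
    return None


def suite_status(path: str, conforms: bool) -> str:
    """Compare actual conformance with the path-derived expectation."""
    expected = expected_conformance(path)
    if expected is None:
        return "UNCLASSIFIED"
    return "PASS" if conforms == expected else "FAIL"
```

Under this rule a non-conforming file in a negatives directory counts as PASS, which is exactly the kind of expectation logic the universal validator no longer applies.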

What changed

  • Added full detailed report export for validation runs in scripts/validate.py.
  • Kept terminal output compact while still showing status, counts, and per-file invalid summaries in scripts/validate.py.
  • Added and documented a configurable report path, defaulting to:
    • logs/validation-report.txt
  • Kept the cap on per-file terminal listings configurable via the CLI (default 50):
    • --max-files-report (0 means unlimited)
  • Added optional per-file timeout and progress controls in scripts/validate.py.
  • Added timeout support plumbing in validation engine in scripts/validator.py.
  • Removed ambiguous ValidationResult.passed() semantics from model layer.
  • Added dedicated suite runner scripts/validate_suite.py for expected-outcome testing.
  • Updated docs for:
    • minimal default CLI usage
    • optional tuning usage
    • report behavior and defaults
    • vocabulary loading rationale
    • supported RDF serializations and mixed-format directory validation
    • explicit two-workflow usage guidance
  • Added logs directory to git ignore in .gitignore.
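
For readers who want a picture of the CLI surface, here is a minimal argparse sketch of the options named in this PR. The flag names and the two documented defaults (logs/validation-report.txt and 50) come from the description; the positional path argument, the float type and None default for --timeout, and --progress being a boolean switch are assumptions, and the real parser in scripts/validate.py may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options described in this PR."""
    parser = argparse.ArgumentParser(
        description="Validate RDF files against SHACL shapes."
    )
    # Assumption: a single positional file-or-directory argument.
    parser.add_argument("path", help="file or directory to validate")
    parser.add_argument(
        "--report-file",
        default="logs/validation-report.txt",
        help="where the full detailed report is written",
    )
    parser.add_argument(
        "--max-files-report",
        type=int,
        default=50,
        help="max invalid files listed in the terminal (0 means unlimited)",
    )
    # Assumption: timeout in seconds, disabled unless given.
    parser.add_argument("--timeout", type=float, default=None)
    # Assumption: progress output is opt-in.
    parser.add_argument("--progress", action="store_true")
    return parser
```

With this shape, the minimal invocation needs only the path, and every tuning option stays optional.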

Validation and testing performed

  • Ran validator against local sample suites and confirmed:
    • compact terminal output
    • full violation details written to report file
  • Ran validator against latest upstream draft assets from mobilityDCAT-AP and confirmed successful end-to-end execution with report generation.
  • Ran dedicated suite runner and confirmed expected pass/fail behavior by positives/negatives paths.
  • Confirmed no diagnostics for changed files.

Notes

  • No change to SHACL rule semantics.
  • Universal conformance and suite-expectation testing are now separated by command.
  • Full details remain available in report outputs even when terminal listing is capped.
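
The split between a capped terminal listing and an uncapped report file could look roughly like the sketch below. Both function names, the line formats, and the dict-of-violations input shape are assumptions made for illustration; only the cap semantics (0 means unlimited) and the idea of writing full details to a file come from this PR.

```python
from pathlib import Path
from typing import Dict, List


def summarize(invalid: Dict[str, List[str]], max_files: int = 50) -> List[str]:
    """Build compact terminal lines, one per invalid file, capped at max_files.

    A cap of 0 means unlimited, matching --max-files-report.
    """
    names = sorted(invalid)
    shown = names if max_files == 0 else names[:max_files]
    lines = [f"INVALID {name}: {len(invalid[name])} violation(s)" for name in shown]
    hidden = len(names) - len(shown)
    if hidden > 0:
        lines.append(f"... {hidden} more invalid file(s); see the report file")
    return lines


def write_report(invalid: Dict[str, List[str]], report_path: str) -> None:
    """Write every violation for every file, regardless of the terminal cap."""
    path = Path(report_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        for name in sorted(invalid):
            handle.write(f"{name}\n")
            for violation in invalid[name]:
                handle.write(f"  - {violation}\n")
```

Keeping the two writers separate means the cap only ever affects what is printed, never what is persisted.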

Files changed

  • .gitignore
  • README.md
  • docs/README.md
  • scripts/validate.py
  • scripts/validate_suite.py
  • scripts/validator.py

atibaut added 2 commits April 13, 2026 08:49
…alidation

- Simplify ValidationResult: remove is_positive_test() and is_negative_test()
- Update passed() to directly return conformance status
- Refactor validate.py to group by conformance (valid/invalid) instead of test expectations
- Files can now be located anywhere without folder structure constraints
- Works with any example datasets regardless of naming conventions
Copilot AI review requested due to automatic review settings April 13, 2026 09:44

Copilot AI left a comment

Pull request overview

This PR updates the SHACL validation CLI to keep terminal output manageable during large directory runs while still persisting full diagnostics to a report file, and documents the updated CLI defaults/options.

Changes:

  • Added detailed report generation to a configurable --report-file path (default logs/validation-report.txt) and kept terminal output compact with optional verbosity.
  • Added optional per-file timeout and per-file progress output for directory validation.
  • Updated top-level and docs README CLI documentation and ignored the generated logs directory.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| scripts/validator.py | Adds SIGALRM-based per-file timeout plumbing and adjusts validation result semantics. |
| scripts/validate.py | Adds report writer, CLI flags (--report-file, --max-files-report, --timeout, --progress), and compact directory output behavior. |
| README.md | Documents CLI defaults, report behavior, and tuning options. |
| docs/README.md | Expands CLI option documentation and usage examples. |
| .gitignore | Ignores /logs/ to avoid committing generated reports. |
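
For context on what SIGALRM-based timeout plumbing generally involves, here is a minimal sketch of the technique. The names time_limit and ValidationTimeout are hypothetical and this is not the code in scripts/validator.py. Note the usual limits of this approach: SIGALRM exists only on Unix-like systems, and signal handlers can only be installed from the main thread, so the sketch silently skips the limit where SIGALRM is unavailable.

```python
import signal
from contextlib import contextmanager


class ValidationTimeout(Exception):
    """Raised when a single file takes longer than the allowed time."""


@contextmanager
def time_limit(seconds):
    """Abort the wrapped block after `seconds` using SIGALRM.

    A falsy value (None or 0) disables the limit, as does a platform
    without SIGALRM (for example Windows).
    """
    if not seconds or not hasattr(signal, "SIGALRM"):
        yield
        return

    def _on_alarm(signum, frame):
        raise ValidationTimeout(f"validation exceeded {seconds}s")

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Always cancel the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)
```

A caller would wrap each per-file validation in `with time_limit(args.timeout):` and treat ValidationTimeout as a per-file failure rather than aborting the whole run.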

Comment on lines +112 to +114
return True, result, None

return True # Return True if file was successfully validated (regardless of result)

Copilot AI Apr 13, 2026

validate_single_file() now returns True whenever the file loads/validates without a loader error, even if result.conforms is False. Since main() uses this boolean to decide the process exit code, an invalid single-file validation will currently exit 0, which makes it hard to use the CLI in CI or scripting. Consider returning result.conforms (or providing a dedicated flag to ignore conformance in exit status).

Suggested change
- return True, result, None
- return True # Return True if file was successfully validated (regardless of result)
+ return result.conforms, result, None
+ return result.conforms
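
To make the CI impact concrete, the sketch below shows how a conformance-based return value maps to a process exit code. FakeResult, single_file_outcome, and exit_code are hypothetical stand-ins, and the (ok, result, error) tuple shape is inferred from the snippet above rather than taken from scripts/validate.py.

```python
from typing import NamedTuple, Optional, Tuple


class FakeResult(NamedTuple):
    """Hypothetical stand-in for the PR's ValidationResult."""
    conforms: bool


def single_file_outcome(
    result: Optional[FakeResult], error: Optional[str]
) -> Tuple[bool, Optional[FakeResult], Optional[str]]:
    """Return (ok, result, error) where ok reflects conformance, not just loading."""
    if error is not None or result is None:
        return False, None, error
    return result.conforms, result, None


def exit_code(ok: bool) -> int:
    """Map the boolean outcome to a shell exit status."""
    return 0 if ok else 1
```

With this mapping, a file that loads but violates the shapes exits non-zero, so a CI step running the validator fails as expected.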

atibaut and others added 4 commits April 13, 2026 13:10
That is good proposal.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
That is good proposal.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Also good proposal.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
this is good.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
3 participants