
fix(report): filter empty LLM findings and add SARIF rules[] array#158

Open
mimran-khan wants to merge 1 commit into NVIDIA:main from mimran-khan:fix/sarif-compliance-and-empty-findings

Conversation

@mimran-khan

Contributor

Summary

Two SARIF output correctness issues:

  1. Empty findings from LLM failures inflate results — When the LLM meta-analyzer fails or returns partial data, findings with empty rule_id or message leak into the SARIF output, producing invalid results that confuse downstream tooling (IDEs, CI integrations).

  2. Missing tool.driver.rules[] array — The SARIF 2.1.0 spec recommends rules[] containing reportingDescriptor entries for each rule referenced by results. Many consumers (GitHub Code Scanning, VS Code SARIF Viewer) use this to display rule metadata.

Changes

src/skillspector/sarif_models.py

  • Added SarifReportingDescriptor model with id and shortDescription fields
  • Added optional rules field to SarifDriver
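For reviewers who want a picture of the model change, here is a minimal sketch of what the two additions could look like. It uses standard-library dataclasses so it runs on its own; the actual base class in sarif_models.py is not shown in this PR description, and `SarifMessage`, `to_dict`, and the `name` field are assumptions made for illustration. The only parts taken from the description are the `SarifReportingDescriptor` name, its `id` and `shortDescription` fields, and the optional `rules` field on `SarifDriver`.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class SarifMessage:
    # Hypothetical helper: SARIF wraps description text in an object with a "text" key.
    text: str


@dataclass
class SarifReportingDescriptor:
    # Field names follow the SARIF 2.1.0 property names.
    id: str
    shortDescription: SarifMessage


@dataclass
class SarifDriver:
    # "name" is assumed here; only the optional "rules" field comes from the PR text.
    name: str
    rules: Optional[List[SarifReportingDescriptor]] = None

    def to_dict(self) -> dict:
        """Serialize to a plain dict, omitting "rules" when it was never set."""
        data = asdict(self)
        if data["rules"] is None:
            del data["rules"]
        return data


driver = SarifDriver(
    name="example-tool",  # placeholder tool name
    rules=[SarifReportingDescriptor(id="RULE001", shortDescription=SarifMessage(text="Example rule"))],
)
serialized = driver.to_dict()
```

Keeping `rules` optional means a driver built without it serializes exactly as before, which is one way to keep existing output unchanged for callers that do not set it.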

src/skillspector/nodes/report.py (_build_sarif)

  • Filter findings where rule_id or message is falsy before building results
  • Collect unique rule IDs during result building
  • Generate sorted rules[] array from seen rule IDs
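The three steps above can be sketched roughly as follows. This is not the code from report.py: the function name `build_results_and_rules`, the dict-shaped findings, and the choice to use the first message seen for a rule as its `shortDescription` are all assumptions made so the example is self-contained.

```python
from typing import Any, Dict, List, Tuple


def build_results_and_rules(
    findings: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Hypothetical stand-in for the filtering and rule collection in _build_sarif."""
    results: List[Dict[str, Any]] = []
    descriptions: Dict[str, str] = {}  # rule id -> text used for shortDescription

    for finding in findings:
        rule_id = finding.get("rule_id")
        message = finding.get("message")
        # Step 1: skip findings whose rule_id or message is falsy (None or "").
        if not rule_id or not message:
            continue
        results.append({"ruleId": rule_id, "message": {"text": message}})
        # Step 2: remember each rule id once (assumption: first message wins).
        descriptions.setdefault(rule_id, message)

    # Step 3: emit one descriptor per unique rule id, sorted by id.
    rules = [
        {"id": rule_id, "shortDescription": {"text": descriptions[rule_id]}}
        for rule_id in sorted(descriptions)
    ]
    return results, rules


sample_findings = [
    {"rule_id": "B-RULE", "message": "second rule"},
    {"rule_id": "", "message": "dropped: empty rule_id"},
    {"rule_id": "A-RULE", "message": None},
    {"rule_id": "A-RULE", "message": "first rule"},
    {"rule_id": "B-RULE", "message": "second rule again"},
]
results, rules = build_results_and_rules(sample_findings)
```

Sorting the rule ids keeps the output deterministic across runs, which makes SARIF files easier to diff and to assert against in tests.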

Testing

12 new tests covering:

  • Empty/None rule_id filtered
  • Empty/None message filtered
  • All-empty findings produce zero results
  • Valid findings unchanged
  • rules[] present with correct id and shortDescription
  • Multiple findings for same rule produce single rule entry
  • Rules sorted alphabetically by id
  • No rules emitted when all findings are empty
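A few of the cases listed above might look like this in pytest style. The tests in this PR are not reproduced in the description, so the test names are invented and `build_stub` is a hypothetical, simplified stand-in for `_build_sarif` that exists only to make the sketch runnable.

```python
def build_stub(findings):
    """Hypothetical stand-in for _build_sarif; returns (results, rules)."""
    kept = [f for f in findings if f.get("rule_id") and f.get("message")]
    results = [{"ruleId": f["rule_id"], "message": {"text": f["message"]}} for f in kept]
    rules = [{"id": rule_id} for rule_id in sorted({f["rule_id"] for f in kept})]
    return results, rules


def test_empty_or_none_rule_id_filtered():
    results, _ = build_stub([
        {"rule_id": "", "message": "m"},
        {"rule_id": None, "message": "m"},
    ])
    assert results == []


def test_same_rule_produces_single_entry():
    _, rules = build_stub([
        {"rule_id": "R1", "message": "a"},
        {"rule_id": "R1", "message": "b"},
    ])
    assert rules == [{"id": "R1"}]


def test_rules_sorted_by_id():
    _, rules = build_stub([
        {"rule_id": "Z9", "message": "a"},
        {"rule_id": "A1", "message": "b"},
    ])
    assert [rule["id"] for rule in rules] == ["A1", "Z9"]


def test_all_empty_findings_emit_no_rules():
    results, rules = build_stub([{"rule_id": None, "message": None}])
    assert results == [] and rules == []
```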

Existing 44 tests in test_sarif.py and test_report.py continue to pass.

Fixes #146, Fixes #148

Two SARIF output issues fixed:

1. Empty findings (missing rule_id or message) from LLM meta-analyzer
   failures are now filtered before SARIF serialization. Previously
   these could produce invalid SARIF results with empty ruleId.

2. SARIF output now includes the tool.driver.rules[] array containing
   reportingDescriptor entries for each unique rule referenced by
   results. This brings output closer to SARIF 2.1.0 spec compliance
   and enables IDE integrations that require rule metadata.

Added SarifReportingDescriptor model to sarif_models.py.

Fixes NVIDIA#146, NVIDIA#148
