fix(findings): deduplicate cross-analyzer findings before scoring#142
fix(findings): deduplicate cross-analyzer findings before scoring#142mimran-khan wants to merge 2 commits into
Conversation
|
PS: These issues were found while running bunch of skill evals using SkillSpector and hence raising all the issues so that i can use these with correct reference to Upstream code |
rng1995
left a comment
There was a problem hiding this comment.
Review: deduplicate cross-analyzer findings before scoring
The deduplicate function itself is correct and very well tested (highest-confidence retention, whitespace normalization, no-matched-text passthrough, stable sort). My concern is how it's wired into report(), so I'm requesting changes.
Blocking: cross-file dedup removes findings from the report, not just from scoring
In report() the deduped list is assigned to filtered_findings and fed to both _compute_risk_score and _build_sarif:
findings = deduplicate(raw_findings)
filtered_findings = findingsThe cross-file pass keys on (rule_id, matched_text[:100]) and ignores file, so the same matched pattern across N files collapses to a single finding with a single location. Consequences for a security scanner:
- The emitted SARIF/JSON shows one affected file; the other N-1 files are silently unrepresented. A consumer (e.g., SARIF code scanning) only annotates one location, which can leave genuinely affected files un-remediated.
- The PR title scopes this to "before scoring," but the implementation also mutates the reported findings — broader than advertised.
- "cross-analyzer" in the title implies merging duplicate reports from different analyzers, but the key is a single
rule_id, so this actually collapses one rule's matches across files.
Recommendations (either is fine)
- Apply dedup to the score computation only and keep the full findings list in the report; or
- Aggregate collapsed instances into the surviving finding (retain all file locations / an occurrence count) instead of discarding them, so no affected-file information is lost.
Minor
- The sort uses
severity_order.get(f.severity, 4)without normalizing case. A non-uppercase severity sorts last; usef.severity.upper()to match the rest of the codebase (e.g._compute_risk_score). matched_text[:100]means two distinct payloads sharing a 100-char prefix collapse cross-file — acceptable as noise reduction but worth a comment given the security context.
Inter-PR note
PR #139 (per-rule diminishing returns) targets the same repeated-pattern saturation via a different mechanism and edits report.py from the same base. Once both land, dedup runs first, so #139's diminishing weights rarely trigger for identical patterns (partial redundancy). Please confirm the combined behavior is intended.
|
Thanks for the thorough review! I've addressed the blocking concern: Cross-file dedup now only affects scoring, not the report.
Also fixed the case sensitivity issue on the severity sort ( The |
When multiple analyzers detect the same pattern (e.g. subprocess.run with shell=True) in multiple files of a skill, each occurrence was treated as a completely independent finding. A skill calling subprocess in 5 modules got 5 findings, each contributing full points — even though it represents the same design choice observed in different places. This commit adds a two-pass deduplication step in the report node: 1. Same-file dedup: identical (rule_id, file, matched_text) collapsed to the highest-confidence instance 2. Cross-file dedup: identical (rule_id, matched_text) across different files collapsed to one representative finding Findings without matched_text are never cross-file deduplicated since they lack a reliable identity signal. Output is sorted by severity then file for stable, predictable reports. Fixes NVIDIA#137 Signed-off-by: Mohammed Imran Khan <mohammed_imran.khan@outlook.com>
…ll findings in report Cross-file deduplication now only affects risk score calculation. The full findings list (with all affected file locations) is preserved in SARIF, JSON, markdown, and terminal output so that consumers see every affected file rather than a single representative. Also normalizes severity case in the sort key to handle non-uppercase severity strings consistently.
4163b16 to
e6620ea
Compare
Summary
When multiple analyzers detect the same pattern in multiple files of a skill, each occurrence was treated as a completely independent finding. A skill that calls
subprocess.run(cmd, shell=True)in 5 modules got 5 TM1 findings, each contributing full points to the risk score — even though it represents the same architectural decision observed in different places.Changes
nodes/deduplicate.py: New module with a purededuplicate()function implementing two-pass deduplication:(rule_id, file, matched_text)→ keep highest confidence(rule_id, matched_text)across files → keep highest confidence representativenodes/report.py: Callsdeduplicate()on findings before scoring and SARIF generationDesign Decisions
matched_textmatched_textBehavioral Impact
shell=Truein 5 files, same patternsubprocesscalls in same fileTest Plan
ruff checkandruff formatcleanFixes #137