fix(findings): deduplicate cross-analyzer findings before scoring by mimran-khan · Pull Request #142 · NVIDIA/SkillSpector

mimran-khan · 2026-06-22T17:56:29Z

Summary

When multiple analyzers detect the same pattern in multiple files of a skill, each occurrence was treated as a completely independent finding. A skill that calls subprocess.run(cmd, shell=True) in 5 modules got 5 TM1 findings, each contributing full points to the risk score — even though it represents the same architectural decision observed in different places.

Changes

nodes/deduplicate.py: New module with a pure deduplicate() function implementing two-pass deduplication:
1. Same-file dedup: identical (rule_id, file, matched_text) → keep highest confidence
2. Cross-file dedup: identical (rule_id, matched_text) across files → keep highest confidence representative
nodes/report.py: Calls deduplicate() on findings before scoring and SARIF generation

Design Decisions

Decision	Rationale
Dedup key uses first 100 chars of `matched_text`	Prevents pathologically long matches from consuming memory while maintaining specificity
No cross-file dedup for empty `matched_text`	Without matched text, there's no reliable identity signal
Dedup before scoring (in report node)	Preserves raw findings for meta_analyzer LLM analysis, only deduplicates for final output
Sort output by severity then file	Stable, predictable report ordering

Behavioral Impact

Scenario	Before	After
`shell=True` in 5 files, same pattern	5 × TM1 HIGH findings	1 × TM1 HIGH finding
Different `subprocess` calls in same file	Multiple findings	Multiple (different matched_text)
Same rule_id, different patterns	All kept	All kept (different dedup keys)
Findings without matched_text	All kept	All kept (no cross-file dedup)

Test Plan

17 unit tests covering same-file dedup, cross-file dedup, no-matched-text preservation, edge cases (empty list, sorting, whitespace normalization, truncation, mixed scenarios, real-world simulation)
Full test suite: 693 passed, 0 failures, 0 regressions
ruff check and ruff format clean

Fixes #137

mimran-khan · 2026-06-22T19:14:46Z

PS: These issues were found while running bunch of skill evals using SkillSpector and hence raising all the issues so that i can use these with correct reference to Upstream code

rng1995

Review: deduplicate cross-analyzer findings before scoring

The deduplicate function itself is correct and very well tested (highest-confidence retention, whitespace normalization, no-matched-text passthrough, stable sort). My concern is how it's wired into report(), so I'm requesting changes.

Blocking: cross-file dedup removes findings from the report, not just from scoring

In report() the deduped list is assigned to filtered_findings and fed to both _compute_risk_score and _build_sarif:

findings = deduplicate(raw_findings)
filtered_findings = findings

The cross-file pass keys on (rule_id, matched_text[:100]) and ignores file, so the same matched pattern across N files collapses to a single finding with a single location. Consequences for a security scanner:

The emitted SARIF/JSON shows one affected file; the other N-1 files are silently unrepresented. A consumer (e.g., SARIF code scanning) only annotates one location, which can leave genuinely affected files un-remediated.
The PR title scopes this to "before scoring," but the implementation also mutates the reported findings — broader than advertised.
"cross-analyzer" in the title implies merging duplicate reports from different analyzers, but the key is a single rule_id, so this actually collapses one rule's matches across files.

Recommendations (either is fine)

Apply dedup to the score computation only and keep the full findings list in the report; or
Aggregate collapsed instances into the surviving finding (retain all file locations / an occurrence count) instead of discarding them, so no affected-file information is lost.

Minor

The sort uses severity_order.get(f.severity, 4) without normalizing case. A non-uppercase severity sorts last; use f.severity.upper() to match the rest of the codebase (e.g. _compute_risk_score).
matched_text[:100] means two distinct payloads sharing a 100-char prefix collapse cross-file — acceptable as noise reduction but worth a comment given the security context.

Inter-PR note

PR #139 (per-rule diminishing returns) targets the same repeated-pattern saturation via a different mechanism and edits report.py from the same base. Once both land, dedup runs first, so #139's diminishing weights rarely trigger for identical patterns (partial redundancy). Please confirm the combined behavior is intended.

mimran-khan · 2026-06-22T19:55:39Z

Thanks for the thorough review! I've addressed the blocking concern:

Cross-file dedup now only affects scoring, not the report.

deduplicate() is applied only to the findings list passed to _compute_risk_score()
The full (non-deduped) findings list is used for SARIF, JSON, markdown, and terminal output — all affected file locations are preserved
Added a test (test_report_dedup_affects_score_only_not_report_output) that verifies 4 files with the same pattern all appear in the report while the score remains lower than 4x base points

Also fixed the case sensitivity issue on the severity sort (f.severity.upper()).

The matched_text[:100] prefix collision is kept as-is since it's an acceptable noise reduction tradeoff — added a note in the code comment.

When multiple analyzers detect the same pattern (e.g. subprocess.run with shell=True) in multiple files of a skill, each occurrence was treated as a completely independent finding. A skill calling subprocess in 5 modules got 5 findings, each contributing full points — even though it represents the same design choice observed in different places. This commit adds a two-pass deduplication step in the report node: 1. Same-file dedup: identical (rule_id, file, matched_text) collapsed to the highest-confidence instance 2. Cross-file dedup: identical (rule_id, matched_text) across different files collapsed to one representative finding Findings without matched_text are never cross-file deduplicated since they lack a reliable identity signal. Output is sorted by severity then file for stable, predictable reports. Fixes NVIDIA#137 Signed-off-by: Mohammed Imran Khan <mohammed_imran.khan@outlook.com>

…ll findings in report Cross-file deduplication now only affects risk score calculation. The full findings list (with all affected file locations) is preserved in SARIF, JSON, markdown, and terminal output so that consumers see every affected file rather than a single representative. Also normalizes severity case in the sort key to handle non-uppercase severity strings consistently.

This was referenced Jun 22, 2026

fix(scoring): prevent risk score saturation via per-rule diminishing returns #139

Merged

feat(cli): add --recursive flag for multi-skill directory scanning #141

Open

rng1995 requested changes Jun 22, 2026

View reviewed changes

rng1995 mentioned this pull request Jun 22, 2026

fix(meta-analyzer): add heuristic fallback filter for --no-llm mode #143

Open

3 tasks

mimran-khan added 2 commits June 23, 2026 01:28

mimran-khan force-pushed the fix/cross-analyzer-deduplication-137 branch from 4163b16 to e6620ea Compare June 22, 2026 20:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(findings): deduplicate cross-analyzer findings before scoring#142

fix(findings): deduplicate cross-analyzer findings before scoring#142
mimran-khan wants to merge 2 commits into
NVIDIA:mainfrom
mimran-khan:fix/cross-analyzer-deduplication-137

mimran-khan commented Jun 22, 2026

Uh oh!

mimran-khan commented Jun 22, 2026

Uh oh!

rng1995 left a comment

Uh oh!

mimran-khan commented Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mimran-khan commented Jun 22, 2026

Summary

Changes

Design Decisions

Behavioral Impact

Test Plan

Uh oh!

mimran-khan commented Jun 22, 2026

Uh oh!

rng1995 left a comment

Choose a reason for hiding this comment

Review: deduplicate cross-analyzer findings before scoring

Blocking: cross-file dedup removes findings from the report, not just from scoring

Recommendations (either is fine)

Minor

Inter-PR note

Uh oh!

mimran-khan commented Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants