Skip to content

fix(scoring): document confidence scaling, sort by severity within rule bucket#153

Merged
keshprad merged 2 commits into
NVIDIA:mainfrom
mimran-khan:fix/scoring-confidence-docs-and-sort
Jun 24, 2026
Merged

fix(scoring): document confidence scaling, sort by severity within rule bucket#153
keshprad merged 2 commits into
NVIDIA:mainfrom
mimran-khan:fix/scoring-confidence-docs-and-sort

Conversation

@mimran-khan

Copy link
Copy Markdown
Contributor

Summary

Follow-up to #139 addressing non-blocking review feedback from @rng1995:

  1. Confidence scaling documented_compute_risk_score docstring now explicitly states that each finding's contribution is scaled by its confidence value (clamped to [0,1]) and that zero-confidence findings are skipped without consuming a weight slot.

  2. Severity sort within rule bucket — Findings are now sorted by (rule_id, severity DESC) before applying diminishing weights, ensuring the highest-severity occurrence of each rule always receives the full weight regardless of input order.

Test plan

  • All 38 existing test_report.py tests pass (scoring, bands, edge cases, real-world scenarios)
  • test_same_rule_mixed_severities validates the sort ensures CRITICAL gets full weight

Closes review feedback on #139.

@keshprad keshprad left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes for regression coverage: the implementation behavior looks correct from local validation, but this PR should add or adjust a test for the order-dependent case it fixes.

The existing test_same_rule_mixed_severities has CRITICAL before LOW, so it would pass on the base branch too. Please add a [LOW, CRITICAL] case for the same rule_id asserting score 52, which directly covers the new severity sort behavior.

@rng1995 rng1995 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review

The implementation is correct — sorting findings by (rule_id, severity_rank) before applying diminishing weights ensures the highest-severity occurrence of each rule gets full weight, and the docstring updates are clear.

But @keshprad's requested regression test is still missing — no new commit since the review (head is still 6c9287e1, and the diff only touches report.py). As noted there, the existing test_same_rule_mixed_severities orders CRITICAL before LOW, so it would also pass on the base branch and doesn't actually exercise the new sort. Keeping this at request changes pending:

  • A same-rule_id case in [LOW, CRITICAL] input order asserting the sorted score (~52). Without the sort that input scores lower (LOW full + CRITICAL half-weighted), so the case directly locks in the fix.

@rng1995 rng1995 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review — implementation is correct, but the requested regression test is still missing

Head is unchanged since @keshprad's review (6c9287e1); the diff is still report.py only (+14/-1) with no test changes. Re-requesting changes for the same reason. Credit to @keshprad for the catch.

The implementation itself is correct: sorting by (rule_id, severity_rank) before applying the diminishing per-rule weights ensures the highest-severity occurrence in each bucket always receives the full weight, and the confidence-scaling docstring is a useful clarification.

The gap is regression coverage of the behavior this PR fixes. The existing test_same_rule_mixed_severities lists CRITICAL before LOW, so it already passes on the base branch (no sort) and does not exercise the new ordering. Please add a same-rule_id [LOW, CRITICAL] case (input order LOW first) asserting the sorted score of 52:

  • With the sort: CRITICAL receives the full weight and LOW the diminished weight -> 52.
  • Without the sort (base branch): LOW would take full weight and CRITICAL the half weight -> a lower score.

That input-order-sensitive case is exactly what locks in the fix and guards against a future regression. Once it's added I'm happy to approve.

@mimran-khan

Copy link
Copy Markdown
Contributor Author

Thanks for review, will work on this and add it.

…le bucket

Addresses non-blocking review feedback:

1. Docstring now explicitly documents that each finding's contribution is
   scaled by its confidence value, and that zero-confidence findings are
   skipped without consuming a weight slot.

2. Findings are now sorted by (rule_id, severity DESC) before applying
   diminishing weights, ensuring the highest-severity occurrence of each
   rule always receives the full weight regardless of input order.
… sort

Add test_same_rule_low_before_critical_sorted_correctly that feeds
[LOW, CRITICAL] for the same rule_id. Without the severity sort the
score is 30 (LOW gets full weight); with it the score is 52 (CRITICAL
gets full weight). This locks in the fix and guards against regression.

Addresses review feedback from @keshprad and @rng1995.

Signed-off-by: mimran-khan <mohammed_imran.khan@outlook.com>
@mimran-khan mimran-khan force-pushed the fix/scoring-confidence-docs-and-sort branch from 6c9287e to 4d50828 Compare June 23, 2026 09:35
@mimran-khan

Copy link
Copy Markdown
Contributor Author

Addressed — pushed the regression test:

test_same_rule_low_before_critical_sorted_correctly: feeds [LOW, CRITICAL] for the same rule_id and asserts score 52. Without the severity sort the score would be 30 (LOW gets full weight at 51.0=5, CRITICAL gets diminished at 500.5=25). With the sort it's 52 (CRITICAL first at 501.0=50, LOW second at 50.5=2.5).

Branch rebased on latest main (5 PRs merged upstream today).

@rng1995 rng1995 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Automated SkillSpector Review]

Re-review: blocker resolved — approving.

The requested input-order regression test is now present: test_same_rule_low_before_critical_sorted_correctly feeds [LOW, CRITICAL] for the same rule_id and asserts score == 52. That directly locks in the (rule_id, severity_rank) sort — without it the same input scores 30 (LOW full-weight, CRITICAL half), so the test would fail on the pre-sort behavior. Implementation and docstring were already correct. Approving.

@mimran-khan mimran-khan requested a review from keshprad June 24, 2026 08:37

@keshprad keshprad left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review: requested regression coverage is now present and the change looks good.

The new test_same_rule_low_before_critical_sorted_correctly feeds [LOW, CRITICAL] for the same rule_id and asserts score 52, directly covering the input-order case from the previous review. I verified the report tests and the full default test suite against the PR source with PYTHONPATH=src; both pass. Approving.

@keshprad keshprad merged commit 7bc9c0f into NVIDIA:main Jun 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants