fix(scoring): document confidence scaling, sort by severity within rule bucket#153
Conversation
keshprad
left a comment
There was a problem hiding this comment.
Requesting changes for regression coverage: the implementation behavior looks correct from local validation, but this PR should add or adjust a test for the order-dependent case it fixes.
The existing test_same_rule_mixed_severities has CRITICAL before LOW, so it would pass on the base branch too. Please add a [LOW, CRITICAL] case for the same rule_id asserting score 52, which directly covers the new severity sort behavior.
rng1995
left a comment
There was a problem hiding this comment.
Re-review
The implementation is correct — sorting findings by (rule_id, severity_rank) before applying diminishing weights ensures the highest-severity occurrence of each rule gets full weight, and the docstring updates are clear.
But @keshprad's requested regression test is still missing — no new commit since the review (head is still 6c9287e1, and the diff only touches report.py). As noted there, the existing test_same_rule_mixed_severities orders CRITICAL before LOW, so it would also pass on the base branch and doesn't actually exercise the new sort. Keeping this at request changes pending:
- A same-
rule_idcase in[LOW, CRITICAL]input order asserting the sorted score (~52). Without the sort that input scores lower (LOW full + CRITICAL half-weighted), so the case directly locks in the fix.
rng1995
left a comment
There was a problem hiding this comment.
Re-review — implementation is correct, but the requested regression test is still missing
Head is unchanged since @keshprad's review (6c9287e1); the diff is still report.py only (+14/-1) with no test changes. Re-requesting changes for the same reason. Credit to @keshprad for the catch.
The implementation itself is correct: sorting by (rule_id, severity_rank) before applying the diminishing per-rule weights ensures the highest-severity occurrence in each bucket always receives the full weight, and the confidence-scaling docstring is a useful clarification.
The gap is regression coverage of the behavior this PR fixes. The existing test_same_rule_mixed_severities lists CRITICAL before LOW, so it already passes on the base branch (no sort) and does not exercise the new ordering. Please add a same-rule_id [LOW, CRITICAL] case (input order LOW first) asserting the sorted score of 52:
- With the sort: CRITICAL receives the full weight and LOW the diminished weight -> 52.
- Without the sort (base branch): LOW would take full weight and CRITICAL the half weight -> a lower score.
That input-order-sensitive case is exactly what locks in the fix and guards against a future regression. Once it's added I'm happy to approve.
|
Thanks for review, will work on this and add it. |
…le bucket Addresses non-blocking review feedback: 1. Docstring now explicitly documents that each finding's contribution is scaled by its confidence value, and that zero-confidence findings are skipped without consuming a weight slot. 2. Findings are now sorted by (rule_id, severity DESC) before applying diminishing weights, ensuring the highest-severity occurrence of each rule always receives the full weight regardless of input order.
… sort Add test_same_rule_low_before_critical_sorted_correctly that feeds [LOW, CRITICAL] for the same rule_id. Without the severity sort the score is 30 (LOW gets full weight); with it the score is 52 (CRITICAL gets full weight). This locks in the fix and guards against regression. Addresses review feedback from @keshprad and @rng1995. Signed-off-by: mimran-khan <mohammed_imran.khan@outlook.com>
6c9287e to
4d50828
Compare
|
Addressed — pushed the regression test:
Branch rebased on latest |
rng1995
left a comment
There was a problem hiding this comment.
[Automated SkillSpector Review]
Re-review: blocker resolved — approving.
The requested input-order regression test is now present: test_same_rule_low_before_critical_sorted_correctly feeds [LOW, CRITICAL] for the same rule_id and asserts score == 52. That directly locks in the (rule_id, severity_rank) sort — without it the same input scores 30 (LOW full-weight, CRITICAL half), so the test would fail on the pre-sort behavior. Implementation and docstring were already correct. Approving.
keshprad
left a comment
There was a problem hiding this comment.
Re-review: requested regression coverage is now present and the change looks good.
The new test_same_rule_low_before_critical_sorted_correctly feeds [LOW, CRITICAL] for the same rule_id and asserts score 52, directly covering the input-order case from the previous review. I verified the report tests and the full default test suite against the PR source with PYTHONPATH=src; both pass. Approving.
Summary
Follow-up to #139 addressing non-blocking review feedback from @rng1995:
Confidence scaling documented —
_compute_risk_scoredocstring now explicitly states that each finding's contribution is scaled by its confidence value (clamped to [0,1]) and that zero-confidence findings are skipped without consuming a weight slot.Severity sort within rule bucket — Findings are now sorted by
(rule_id, severity DESC)before applying diminishing weights, ensuring the highest-severity occurrence of each rule always receives the full weight regardless of input order.Test plan
test_report.pytests pass (scoring, bands, edge cases, real-world scenarios)test_same_rule_mixed_severitiesvalidates the sort ensures CRITICAL gets full weightCloses review feedback on #139.