fix(0.40.5): gates compute a real holdout delta (baseline/candidate were conflated)#101

Merged
drewstone merged 1 commit into main from fix/0.40.5-gate-baseline-candidate-separation
May 25, 2026

@drewstone
Contributor

Second substrate gap caught by the gtm cycle. runImprovementLoop merged baseline + candidate judge scores into one cellId-keyed map; since they share cellIds, baseline overwrote candidate → delta always 0 → improvement could never ship.

Fix: GateContext gains baselineJudgeScores (separate from candidate judgeScores); runImprovementLoop populates both distinctly; heldOutGate + defaultProductionGate read the candidate from judgeScores and the baseline from baselineJudgeScores. Tests assert a real ship (delta 4.0) + real hold (delta 0), replacing the loose assertion that masked the bug. Suite 1422/1422. 0.40.4 → 0.40.5.

…ere conflated)

CRITICAL: runImprovementLoop merged baseline + candidate judge scores into ONE
map keyed by cellId. Baseline and candidate share cellIds (same holdout
scenarios), so the baseline overwrote the candidate — both heldOutGate and
defaultProductionGate then computed baseline == candidate, delta == 0, and a
real improvement could NEVER ship. Surfaced by the gtm cycle (the second
substrate gap the first real consumer caught).
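
A minimal sketch of the failure mode described above, not the project's actual code. The `JudgeScores` type, the `meanScore` helper, the cell ids, and the score values are all hypothetical, and a plain object stands in for the cellId-keyed map. It assumes the gates compared mean scores; the source only says they computed a delta.

```typescript
// Hypothetical stand-in for a cellId-keyed score map.
type JudgeScores = Record<string, number>;

// Hypothetical helper: mean judge score across all cells (0 for an empty map).
function meanScore(scores: JudgeScores): number {
  const cellIds = Object.keys(scores);
  if (cellIds.length === 0) return 0;
  let total = 0;
  for (const cellId of cellIds) total += scores[cellId];
  return total / cellIds.length;
}

// Baseline and candidate are judged on the same holdout cells, so keys collide.
const baselineScores: JudgeScores = { "cell-a": 3, "cell-b": 4 };
const candidateScores: JudgeScores = { "cell-a": 7, "cell-b": 8 };

// Conflated form: one map for both runs. Later entries win, so every
// candidate value is replaced by the baseline value for the same cellId.
const mergedScores: JudgeScores = { ...candidateScores, ...baselineScores };

// Both sides of the comparison are read from the same map, so the delta is 0.
const conflatedDelta = meanScore(mergedScores) - meanScore(mergedScores);

// Separated form: each run keeps its own map and the delta is real.
const realDelta = meanScore(candidateScores) - meanScore(baselineScores);

console.log({ mergedScores, conflatedDelta, realDelta });
```

With these illustrative values the merged map holds only the baseline scores, so the conflated delta is 0 while the separated delta is 4.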

Fix:
- GateContext gains `baselineJudgeScores` (separate from candidate `judgeScores`).
- runImprovementLoop populates them distinctly (no merge).
- heldOutGate + defaultProductionGate read candidate from `judgeScores` and
  baseline from `baselineJudgeScores` — a real delta.
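
The shape of the fix might look roughly like the sketch below. Only the field names `judgeScores` and `baselineJudgeScores` and the `ship`/`hold` outcomes come from this PR; the rest of the `GateContext` shape, the `heldOutGateSketch` function, the `minDelta` threshold, and the mean-based comparison are assumptions made for illustration.

```typescript
type JudgeScores = Record<string, number>;
type GateDecision = "ship" | "hold";

// Hypothetical context shape: the real GateContext likely carries more fields.
interface GateContext {
  judgeScores: JudgeScores;         // candidate scores, keyed by cellId
  baselineJudgeScores: JudgeScores; // baseline scores, kept in their own map
}

// Hypothetical helper: mean score across cells (0 for an empty map).
function meanOf(scores: JudgeScores): number {
  const cellIds = Object.keys(scores);
  if (cellIds.length === 0) return 0;
  let total = 0;
  for (const cellId of cellIds) total += scores[cellId];
  return total / cellIds.length;
}

// Hypothetical gate: ship only when the candidate beats the baseline
// by more than minDelta (an assumed threshold, not from the source).
function heldOutGateSketch(
  ctx: GateContext,
  minDelta = 0,
): { decision: GateDecision; delta: number } {
  const delta = meanOf(ctx.judgeScores) - meanOf(ctx.baselineJudgeScores);
  return { decision: delta > minDelta ? "ship" : "hold", delta };
}

// An improved candidate produces a positive delta and ships.
const improved = heldOutGateSketch({
  judgeScores: { "cell-a": 7, "cell-b": 8 },
  baselineJudgeScores: { "cell-a": 3, "cell-b": 4 },
});

// An unchanged candidate produces a zero delta and holds.
const unchanged = heldOutGateSketch({
  judgeScores: { "cell-a": 3, "cell-b": 4 },
  baselineJudgeScores: { "cell-a": 3, "cell-b": 4 },
});

console.log(improved, unchanged);
```

Keeping the two maps as separate fields means shared cellIds can no longer collide, which is the property the commit relies on.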

Tests: heldOutGate now asserts a real ship (delta 4.0) AND a real hold
(delta 0) with separate maps — replacing the loose `['ship','hold']`
assertion that masked the bug. Suite 1422/1422, build + lint clean.
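
To illustrate why a membership check against `['ship','hold']` could not catch the regression, here is a small sketch. `brokenGate` is a hypothetical stand-in for a gate that always sees a zero delta; it is not the project's test code.

```typescript
type GateDecision = "ship" | "hold";

// Hypothetical broken gate: always reports no improvement.
const brokenGate = (): GateDecision => "hold";

// Loose check: accepts any decision the gate can return, so it can never fail.
const looseCheckPasses = ["ship", "hold"].indexOf(brokenGate()) !== -1;

// Strict check: an improved candidate is expected to ship, which exposes the bug.
const strictCheckPasses = brokenGate() === "ship";

console.log({ looseCheckPasses, strictCheckPasses });
```

The loose check passes against the broken gate while the strict check fails, which is why asserting an exact outcome per scenario is the more useful test.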

0.40.4 → 0.40.5 (npm + python lockstep). Unblocks gated improvement everywhere.
drewstone merged commit 360c670 into main on May 25, 2026
1 check passed
drewstone deleted the fix/0.40.5-gate-baseline-candidate-separation branch on May 25, 2026 19:11