Fix temperature scaling in HardConcrete deterministic gates #41
Merged
`HardConcrete._deterministic_gates` computed `sigmoid(qz_logits)` without dividing by `temperature`, so `.eval()` output silently ignored the temperature parameter. This contradicted `_sample_gates`, `get_penalty`, and `get_active_prob`, which all apply the temperature scaling. For PolicyEngine's default `temperature=0.25` this was a 4x distortion in log-odds space and broke train/eval consistency for every `L0Linear` / `L0Conv2d` / `L0DepthwiseConv2d` / `L0Gate` user. `TemperatureScheduler` updates were also silently dropped at eval time.

The fix uses `sigmoid(qz_logits / temperature)`, matching Louizos et al. 2017 Eq. 11 and the temperature-correct implementations in `SparseCalibrationWeights` (`calibration.py`) and `SparseL0Linear` (`sparse.py`).

Adds three regression tests:

- `test_deterministic_gates_respect_temperature` holds the logits fixed, varies the temperature, and asserts that the eval output changes. Confirmed to FAIL on the pre-fix code (`AssertionError: Deterministic gates should depend on temperature`).
- `test_deterministic_gates_match_reference_formula` pins the closed-form output against `sigmoid(log_alpha / beta) * (zeta - gamma) + gamma`.
- `test_sparsity_stats_match_eval_activation` checks that `get_sparsity()` (temperature-aware) stays consistent with the fraction of gates actually non-zero at eval time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
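The effect of the fix is easiest to see on a single gate. Below is a minimal scalar sketch in plain Python (no torch) of the eval-time gate before and after the change. It assumes the common hard-concrete stretch bounds `gamma = -0.1`, `zeta = 1.1` and a final clamp to [0, 1]; those constants, the clamp, and the function names are illustrative assumptions, not read from `l0/distributions.py`. It also checks the "4x in log-odds" claim: at `temperature=0.25` the fixed mean is `sigmoid(4 * qz_logit)`, while the pre-fix mean was `sigmoid(qz_logit)`.

```python
import math

# Assumed hard-concrete stretch bounds (common defaults, not taken from l0/).
GAMMA, ZETA = -0.1, 1.1


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def deterministic_gate(qz_logit: float, temperature: float, *, fixed: bool = True) -> float:
    """Eval-time gate: stretched sigmoid mean, clamped to [0, 1].

    fixed=False reproduces the pre-fix behaviour (temperature ignored).
    """
    scaled = qz_logit / temperature if fixed else qz_logit
    stretched = sigmoid(scaled) * (ZETA - GAMMA) + GAMMA
    return min(1.0, max(0.0, stretched))


if __name__ == "__main__":
    x, t = 0.5, 0.25
    # Pre-fix mean has log-odds x; the fixed mean has log-odds x / t = 4x.
    print(logit(sigmoid(x)), logit(sigmoid(x / t)))
    print(deterministic_gate(x, t, fixed=False), deterministic_gate(x, t))
```

At `temperature=1.0` the two formulas coincide, which is why the bug only shows up once a non-unit temperature is configured or scheduled.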
MaxGhenis added a commit that referenced this pull request Apr 17, 2026
This file documented a temperature-scaling bug in `HardConcrete._deterministic_gates` and pointed readers at a "gold standard" reference file `l0_louizos_improved_gate.py` that does not actually exist in the repository. The bug itself is fixed by #41 (the `sigmoid(qz_logits / temperature)` change in `l0/distributions.py`), and the three standalone modules (`distributions.py`, `calibration.py`, `sparse.py`) are now consistent. Remove the stale doc now that its content is either obsolete or misleading. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MaxGhenis (Contributor, Author) commented Apr 17, 2026
LGTM. The critical fix is exactly right.

Verified:

- `l0/distributions.py:137` changes `torch.sigmoid(self.qz_logits)` to `torch.sigmoid(self.qz_logits / self.temperature)`, which matches Louizos et al. 2017 Eq. 11 and the standalone `SparseCalibrationWeights` / `SparseL0Linear` closed form.
- Only the `_deterministic_gates` mean is changed; `_sample_gates`, `get_penalty`, and `get_active_prob` were already temperature-aware and are untouched (no over-reach).
- The three regression tests are load-bearing:
  - `test_deterministic_gates_respect_temperature`: confirmed to FAIL on the pre-fix implementation. I monkeypatched `_deterministic_gates` back to `sigmoid(qz_logits)` and observed `torch.allclose(high_gates, low_gates) == True`, which violates the test's `not torch.allclose` assertion.
  - `test_deterministic_gates_match_reference_formula`: pins the closed form (the fix's Eq. 11).
  - `test_sparsity_stats_match_eval_activation`: subsumes finding #3 (reported vs. actual sparsity agree within 0.15).
- Findings #1 (critical), #2 (test gap), and #3 (within-class inconsistency) are all addressed.
- No unrelated changes: just the 5-line code fix, the 89-line test block, and the changelog.
Will go green once #44 merges and lint unblocks.
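The reviewer's monkeypatch check can be reproduced without torch. This hypothetical pytest-style sketch evaluates the same logits at a high and a low temperature and asserts that the eval gates differ; swapping in the pre-fix formula makes the two gate vectors identical, which is the `torch.allclose(high_gates, low_gates) == True` outcome the reviewer observed. The helper names, temperatures, logits, and stretch bounds (`gamma=-0.1`, `zeta=1.1`) are assumptions for illustration, not copied from the repository's test file.

```python
import math

GAMMA, ZETA = -0.1, 1.1  # assumed stretch bounds


def eval_gates(logits, temperature, *, temperature_aware=True):
    """Hypothetical stand-in for HardConcrete._deterministic_gates."""
    gates = []
    for q in logits:
        z = q / temperature if temperature_aware else q
        s = 1.0 / (1.0 + math.exp(-z))
        # Stretch to (gamma, zeta), then clamp to [0, 1].
        gates.append(min(1.0, max(0.0, s * (ZETA - GAMMA) + GAMMA)))
    return gates


def allclose(a, b, tol=1e-6):
    return all(math.isclose(x, y, rel_tol=0.0, abs_tol=tol) for x, y in zip(a, b))


def test_deterministic_gates_respect_temperature():
    logits = [-1.0, -0.3, 0.2, 0.7]  # identical logits for both runs
    high = eval_gates(logits, temperature=2.0)
    low = eval_gates(logits, temperature=0.25)
    assert not allclose(high, low), "Deterministic gates should depend on temperature"


def test_pre_fix_formula_ignores_temperature():
    logits = [-1.0, -0.3, 0.2, 0.7]
    high = eval_gates(logits, 2.0, temperature_aware=False)
    low = eval_gates(logits, 0.25, temperature_aware=False)
    # Identical outputs: this is why the regression test fails pre-fix.
    assert allclose(high, low)
```

One design note: a logit of exactly 0 maps to the same gate at every temperature, so a regression test built only on zero-initialised logits would not catch this bug; the sketch deliberately uses non-zero logits.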
MaxGhenis added a commit that referenced this pull request Apr 17, 2026
* Delete CRITICAL_TEMPERATURE_BUG.md

  This file documented a temperature-scaling bug in `HardConcrete._deterministic_gates` and pointed readers at a "gold standard" reference file `l0_louizos_improved_gate.py` that does not actually exist in the repository. The bug itself is fixed by #41 (the `sigmoid(qz_logits / temperature)` change in `l0/distributions.py`), and the three standalone modules (`distributions.py`, `calibration.py`, `sparse.py`) are now consistent. Remove the stale doc now that its content is either obsolete or misleading.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Switch CI lint job from black to ruff

  #40 switched the repo's formatter from `black -l 79` to `ruff format` (default 88-char line length) and updated the Makefile, but the reusable lint workflow still invoked `lgeiger/black-action` with `. -l 79 --check`. Since ruff-formatted files don't pass `black -l 79`, every PR's `lint` check has been failing since #40. Replace the black action with a `uvx ruff format --check .` run, matching what `make format-check` would do locally.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- `HardConcrete._deterministic_gates` was computing `sigmoid(qz_logits)` without the `/ temperature` scaling, so `.eval()` output silently ignored the temperature parameter. For PolicyEngine's default `temperature=0.25` this was a 4x distortion in log-odds space and broke train/eval consistency for every `L0Linear` / `L0Conv2d` / `L0DepthwiseConv2d` / `L0Gate` caller, and `TemperatureScheduler` updates were dropped at eval time.
- The fix uses `sigmoid(qz_logits / temperature)`, matching Louizos et al. 2017 Eq. 11 and the temperature-correct implementations in `SparseCalibrationWeights` (`calibration.py`) and `SparseL0Linear` (`sparse.py`).

Addresses bug-hunt findings
- #1 (critical): missing temperature scaling in `_deterministic_gates` (fixed in `l0/distributions.py:132`).
- #2 (test gap): `test_deterministic_gates_respect_temperature`, confirmed to FAIL on pre-fix code and PASS after.
- #3 (within-class inconsistency): `get_active_prob` / `get_sparsity` used temperature but `_deterministic_gates` did not (`test_sparsity_stats_match_eval_activation` pins the consistency after the fix).

Test plan
- `uv run pytest tests -x -q` passes (86 passed, 1 skipped; previously 83).
- With the fix reverted in `l0/distributions.py`, re-ran `test_deterministic_gates_respect_temperature`; it fails with `AssertionError: Deterministic gates should depend on temperature`.
- `test_deterministic_gates_match_reference_formula` pins the closed form against `SparseCalibrationWeights` / `SparseL0Linear`.
- `test_sparsity_stats_match_eval_activation` verifies that `get_sparsity_stats` agrees with the actual eval-time activation rate.
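Finding #3 and the sparsity-consistency test have a compact closed form behind them. Under the standard hard-concrete parameterisation, the probability that a gate is non-zero is `sigmoid(log_alpha - beta * log(-gamma / zeta))`, and solving `sigmoid(log_alpha / beta) * (zeta - gamma) + gamma > 0` shows that the fixed eval gate is open exactly when that probability exceeds 0.5. The pre-fix gate instead opens at `log_alpha > log(-gamma / zeta)` regardless of `beta`, so at low temperatures a band of gates is open at eval time while their active probability is at most one half. The sketch below checks this numerically; the constants, function names, and the 0.5-threshold framing are illustrative assumptions, not how `get_sparsity` / `get_sparsity_stats` are actually implemented.

```python
import math

GAMMA, ZETA = -0.1, 1.1  # assumed stretch bounds


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def active_prob(log_alpha, beta):
    """P(gate != 0) under the hard-concrete distribution (temperature-aware)."""
    return sigmoid(log_alpha - beta * math.log(-GAMMA / ZETA))


def eval_gate(log_alpha, beta, *, fixed=True):
    """Clamped eval-time gate; fixed=False drops the / beta (pre-fix)."""
    u = log_alpha / beta if fixed else log_alpha
    return min(1.0, max(0.0, sigmoid(u) * (ZETA - GAMMA) + GAMMA))


def disagreements(log_alphas, beta, *, fixed=True):
    """Logits whose eval on/off state disagrees with active_prob > 0.5."""
    return [
        la
        for la in log_alphas
        if (eval_gate(la, beta, fixed=fixed) > 0.0) != (active_prob(la, beta) > 0.5)
    ]


if __name__ == "__main__":
    grid = [i / 10 for i in range(-40, 41)]
    print(len(disagreements(grid, 0.25)), len(disagreements(grid, 0.25, fixed=False)))
```

With `beta=0.25` the fixed gate agrees with the active-probability threshold at every grid point, while the pre-fix gate disagrees on the band `log(1/11) < log_alpha <= 0.25 * log(1/11)`, roughly -2.40 to -0.60 for these assumed bounds.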