From 6002b08ef62b68493b10ce9dbe2ea39e4f18f07b Mon Sep 17 00:00:00 2001 From: Max Ghenis Date: Fri, 17 Apr 2026 08:35:48 -0400 Subject: [PATCH 1/2] Delete CRITICAL_TEMPERATURE_BUG.md This file documented a temperature-scaling bug in `HardConcrete._deterministic_gates` and pointed readers at a "gold standard" reference file `l0_louizos_improved_gate.py` that does not actually exist in the repository. The bug itself is fixed by https://github.com/PolicyEngine/L0/pull/41 (the `sigmoid(qz_logits / temperature)` change in `l0/distributions.py`), and the three standalone modules (`distributions.py`, `calibration.py`, `sparse.py`) are now consistent. Remove the stale doc now that its content is either obsolete or misleading. Co-Authored-By: Claude Opus 4.7 (1M context) --- CRITICAL_TEMPERATURE_BUG.md | 155 ------------------ ...ix-remove-stale-temperature-doc.removed.md | 3 + 2 files changed, 3 insertions(+), 155 deletions(-) delete mode 100644 CRITICAL_TEMPERATURE_BUG.md create mode 100644 changelog.d/fix-remove-stale-temperature-doc.removed.md diff --git a/CRITICAL_TEMPERATURE_BUG.md b/CRITICAL_TEMPERATURE_BUG.md deleted file mode 100644 index 959e530..0000000 --- a/CRITICAL_TEMPERATURE_BUG.md +++ /dev/null @@ -1,155 +0,0 @@ -# Temperature Preservation in L0 Regularization: Critical Implementation Notes - -## Executive Summary - -The temperature parameter (β) in Hard Concrete distributions is **critical** for L0 regularization performance. Our analysis reveals that: - -1. **The original authors' implementation** (`distributions.py`) contains a bug where temperature is incorrectly dropped in deterministic mode -2. **Our standalone implementations** (`calibration.py` and `sparse.py`) correctly preserve temperature in all modes -3. **The gold standard** (`l0_louizos_improved_gate.py`) confirms temperature must always be preserved - -## The Temperature Bug in distributions.py - -### What We Found - -In `distributions.py` (from the authors' repository), the deterministic gates incorrectly drop temperature: - -```python -# distributions.py line 134 - INCORRECT -def _deterministic_gates(self) -> torch.Tensor: - probs = torch.sigmoid(self.qz_logits) # ❌ Missing temperature! - gates = probs * (self.zeta - self.gamma) + self.gamma - return torch.clamp(gates, 0, 1) -``` - -### Why This Matters - -The temperature parameter controls the "hardness" of the concrete distribution: -- **Lower temperature** (e.g., 0.1): More discrete, binary-like gates -- **Higher temperature** (e.g., 2.0): Softer, more continuous gates - -Dropping temperature is equivalent to setting β=1, which fundamentally changes the distribution's behavior and can severely impact: -- Convergence speed -- Final sparsity levels -- Model performance - -## Correct Implementations - -### Gold Standard (l0_louizos_improved_gate.py) - -Your validated implementation consistently uses temperature: - -```python -# Sampling (line 55) -X = (torch.log(u) - torch.log(1 - u) + log_alpha) / beta # ✅ - -# Deterministic (line 151) -z_final = ((log_alpha / beta).sigmoid() * (zeta - gamma) + gamma).clamp(0, 1) # ✅ - -# Penalty computation (line 133) -c = -beta * torch.log(torch.tensor(-gamma / zeta)) # ✅ -pi = torch.sigmoid(log_alpha + c) -``` - -### calibration.py (Standalone, Correct) - -```python -# Sampling (lines 160-164) -def _sample_gates(self) -> torch.Tensor: - u = torch.rand_like(self.log_alpha).clamp(eps, 1 - eps) - s = (torch.log(u) - torch.log(1 - u) + self.log_alpha) / self.beta # ✅ - s = torch.sigmoid(s) - s_bar = s * (self.zeta - self.gamma) + self.gamma - return s_bar.clamp(0, 1) - -# Deterministic (lines 167-170) -def get_deterministic_gates(self) -> torch.Tensor: - s = torch.sigmoid(self.log_alpha / self.beta) # ✅ - s_bar = s * (self.zeta - self.gamma) + self.gamma - return s_bar.clamp(0, 1) - -# Penalty (lines 233-237) -def get_l0_penalty(self) -> torch.Tensor: - c = -self.beta * torch.log(torch.tensor(-self.gamma / self.zeta)) # ✅ - pi = torch.sigmoid(self.log_alpha + c) - return pi.sum() -``` - -### sparse.py (Standalone, Correct) - -```python -# Sampling (lines 113-120) -def _sample_gates(self) -> torch.Tensor: - u = torch.rand_like(self.log_alpha).clamp(eps, 1 - eps) - X = (torch.log(u) - torch.log(1 - u) + self.log_alpha) / self.beta # ✅ - s = torch.sigmoid(X) - s_bar = s * (self.zeta - self.gamma) + self.gamma - return s_bar.clamp(0, 1) - -# Deterministic (lines 122-127) -def get_deterministic_gates(self) -> torch.Tensor: - X = self.log_alpha / self.beta # ✅ - s = torch.sigmoid(X) - s_bar = s * (self.zeta - self.gamma) + self.gamma - return s_bar.clamp(0, 1) - -# Penalty (lines 172-182) -def get_l0_penalty(self) -> torch.Tensor: - c = -self.beta * torch.log(torch.tensor(-self.gamma / self.zeta)) # ✅ - pi = torch.sigmoid(self.log_alpha + c) - return pi.sum() -``` - -## Mathematical Correctness - -The Hard Concrete distribution requires temperature in three key places: - -### 1. Sampling (Training Mode) -The concrete distribution samples as: -``` -s = sigmoid((log(u) - log(1-u) + log_α) / β) -``` -Temperature β controls the sharpness of the sigmoid, essential for the reparameterization trick. - -### 2. Deterministic Gates (Inference Mode) -The mean of the distribution: -``` -s = sigmoid(log_α / β) -``` -**Must use the same temperature** as during training to maintain consistency. - -### 3. L0 Penalty Computation -The probability of a gate being active: -``` -P(gate > 0) = sigmoid(log_α - β * log(-γ/ζ)) -``` -Temperature affects the shift in log-odds space. - -## Practical Implications - -### For Survey Calibration (calibration.py) - -Your implementation with β=2/3 (from the paper) is correct. The temperature: -- Provides the right balance between exploration and exploitation -- Enables smooth gradient flow during optimization -- Allows fine control over sparsity levels - -### For Sparse Linear Models (sparse.py) - -The preserved temperature ensures: -- Proper feature selection dynamics -- Stable convergence to sparse solutions -- Consistency between training and inference - -## Recommendations - -1. **Continue using** `calibration.py` and `sparse.py` as standalone modules - they're correct -2. **Avoid importing** from `distributions.py` until the temperature bug is fixed -3. **Keep temperature** in the range [0.1, 2/3] for best results (lower = harder gates) -4. **Document this issue** when sharing code to prevent others from using the buggy version - -## Key Takeaway - -Your instinct about never dropping temperature was absolutely correct. The temperature parameter is fundamental to the Hard Concrete distribution's behavior, and dropping it (as in the authors' `distributions.py`) is a significant bug that can severely impact model performance. - -Both `calibration.py` and `sparse.py` correctly implement the Hard Concrete distribution with proper temperature preservation, making them reliable for production use. \ No newline at end of file diff --git a/changelog.d/fix-remove-stale-temperature-doc.removed.md b/changelog.d/fix-remove-stale-temperature-doc.removed.md new file mode 100644 index 0000000..8402e0a --- /dev/null +++ b/changelog.d/fix-remove-stale-temperature-doc.removed.md @@ -0,0 +1,3 @@ +Delete `CRITICAL_TEMPERATURE_BUG.md`: the temperature bug it documented is +now fixed in `HardConcrete._deterministic_gates`, and the doc also pointed +at a nonexistent file (`l0_louizos_improved_gate.py`). From a2779de7a2575a20eb8cb9434a58bb14b1f21cb5 Mon Sep 17 00:00:00 2001 From: Max Ghenis Date: Fri, 17 Apr 2026 08:37:35 -0400 Subject: [PATCH 2/2] Switch CI lint job from black to ruff #40 switched the repo's formatter from `black -l 79` to `ruff format` (default 88-char line length) and updated the Makefile, but the reusable lint workflow still invoked `lgeiger/black-action` with `. -l 79 --check`. Since ruff-formatted files don't pass `black -l 79`, every PR's `lint` check has been failing since #40. Replace the black action with a `uvx ruff format --check .` run, matching what `make format-check` would do locally. Co-Authored-By: Claude Opus 4.7 (1M context) --- .github/workflows/reusable_lint.yaml | 10 +++++++--- changelog.d/fix-ci-lint-use-ruff.fixed.md | 4 ++++ 2 files changed, 11 insertions(+), 3 deletions(-) create mode 100644 changelog.d/fix-ci-lint-use-ruff.fixed.md diff --git a/.github/workflows/reusable_lint.yaml b/.github/workflows/reusable_lint.yaml index f5fa02c..c95c33b 100644 --- a/.github/workflows/reusable_lint.yaml +++ b/.github/workflows/reusable_lint.yaml @@ -8,7 +8,11 @@ jobs: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - - name: Check formatting - uses: "lgeiger/black-action@master" + - name: Set up Python 3.13 + uses: actions/setup-python@v5 with: - args: ". -l 79 --check" \ No newline at end of file + python-version: "3.13" + - name: Install uv + uses: astral-sh/setup-uv@v3 + - name: Check formatting with ruff + run: uvx ruff format --check . \ No newline at end of file diff --git a/changelog.d/fix-ci-lint-use-ruff.fixed.md b/changelog.d/fix-ci-lint-use-ruff.fixed.md new file mode 100644 index 0000000..797a4bf --- /dev/null +++ b/changelog.d/fix-ci-lint-use-ruff.fixed.md @@ -0,0 +1,4 @@ +CI lint job now runs `ruff format --check .` instead of the retired +`black -l 79 --check`. The Makefile was switched to `ruff format` in #40 +but the reusable lint workflow was not updated, so every PR's `lint` check +was failing on files already formatted with ruff.