From 6002b08ef62b68493b10ce9dbe2ea39e4f18f07b Mon Sep 17 00:00:00 2001
From: Max Ghenis <mghenis@gmail.com>
Date: Fri, 17 Apr 2026 08:35:48 -0400
Subject: [PATCH 1/2] Delete CRITICAL_TEMPERATURE_BUG.md

This file documented a temperature-scaling bug in
`HardConcrete._deterministic_gates` and pointed readers at a "gold
standard" reference file `l0_louizos_improved_gate.py` that does not
actually exist in the repository. The bug itself is fixed by
https://github.com/PolicyEngine/L0/pull/41 (the `sigmoid(qz_logits /
temperature)` change in `l0/distributions.py`), and the three standalone
modules (`distributions.py`, `calibration.py`, `sparse.py`) are now
consistent. Remove the stale doc now that its content is either obsolete
or misleading.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 CRITICAL_TEMPERATURE_BUG.md                   | 155 ------------------
 ...ix-remove-stale-temperature-doc.removed.md |   3 +
 2 files changed, 3 insertions(+), 155 deletions(-)
 delete mode 100644 CRITICAL_TEMPERATURE_BUG.md
 create mode 100644 changelog.d/fix-remove-stale-temperature-doc.removed.md

diff --git a/CRITICAL_TEMPERATURE_BUG.md b/CRITICAL_TEMPERATURE_BUG.md
deleted file mode 100644
index 959e530..0000000
--- a/CRITICAL_TEMPERATURE_BUG.md
+++ /dev/null
@@ -1,155 +0,0 @@
-# Temperature Preservation in L0 Regularization: Critical Implementation Notes
-
-## Executive Summary
-
-The temperature parameter (β) in Hard Concrete distributions is **critical** for L0 regularization performance. Our analysis reveals that:
-
-1. **The original authors' implementation** (`distributions.py`) contains a bug where temperature is incorrectly dropped in deterministic mode
-2. **Our standalone implementations** (`calibration.py` and `sparse.py`) correctly preserve temperature in all modes
-3. **The gold standard** (`l0_louizos_improved_gate.py`) confirms temperature must always be preserved
-
-## The Temperature Bug in distributions.py
-
-### What We Found
-
-In `distributions.py` (from the authors' repository), the deterministic gates incorrectly drop temperature:
-
-```python
-# distributions.py line 134 - INCORRECT
-def _deterministic_gates(self) -> torch.Tensor:
-    probs = torch.sigmoid(self.qz_logits)  # ❌ Missing temperature!
-    gates = probs * (self.zeta - self.gamma) + self.gamma
-    return torch.clamp(gates, 0, 1)
-```
-
-### Why This Matters
-
-The temperature parameter controls the "hardness" of the concrete distribution:
-- **Lower temperature** (e.g., 0.1): More discrete, binary-like gates
-- **Higher temperature** (e.g., 2.0): Softer, more continuous gates
-
-Dropping temperature is equivalent to setting β=1, which fundamentally changes the distribution's behavior and can severely impact:
-- Convergence speed
-- Final sparsity levels
-- Model performance
-
-## Correct Implementations
-
-### Gold Standard (l0_louizos_improved_gate.py)
-
-Your validated implementation consistently uses temperature:
-
-```python
-# Sampling (line 55)
-X = (torch.log(u) - torch.log(1 - u) + log_alpha) / beta  # ✅
-
-# Deterministic (line 151)
-z_final = ((log_alpha / beta).sigmoid() * (zeta - gamma) + gamma).clamp(0, 1)  # ✅
-
-# Penalty computation (line 133)
-c = -beta * torch.log(torch.tensor(-gamma / zeta))  # ✅
-pi = torch.sigmoid(log_alpha + c)
-```
-
-### calibration.py (Standalone, Correct)
-
-```python
-# Sampling (lines 160-164)
-def _sample_gates(self) -> torch.Tensor:
-    u = torch.rand_like(self.log_alpha).clamp(eps, 1 - eps)
-    s = (torch.log(u) - torch.log(1 - u) + self.log_alpha) / self.beta  # ✅
-    s = torch.sigmoid(s)
-    s_bar = s * (self.zeta - self.gamma) + self.gamma
-    return s_bar.clamp(0, 1)
-
-# Deterministic (lines 167-170)  
-def get_deterministic_gates(self) -> torch.Tensor:
-    s = torch.sigmoid(self.log_alpha / self.beta)  # ✅
-    s_bar = s * (self.zeta - self.gamma) + self.gamma
-    return s_bar.clamp(0, 1)
-
-# Penalty (lines 233-237)
-def get_l0_penalty(self) -> torch.Tensor:
-    c = -self.beta * torch.log(torch.tensor(-self.gamma / self.zeta))  # ✅
-    pi = torch.sigmoid(self.log_alpha + c)
-    return pi.sum()
-```
-
-### sparse.py (Standalone, Correct)
-
-```python
-# Sampling (lines 113-120)
-def _sample_gates(self) -> torch.Tensor:
-    u = torch.rand_like(self.log_alpha).clamp(eps, 1 - eps)
-    X = (torch.log(u) - torch.log(1 - u) + self.log_alpha) / self.beta  # ✅
-    s = torch.sigmoid(X)
-    s_bar = s * (self.zeta - self.gamma) + self.gamma
-    return s_bar.clamp(0, 1)
-
-# Deterministic (lines 122-127)
-def get_deterministic_gates(self) -> torch.Tensor:
-    X = self.log_alpha / self.beta  # ✅
-    s = torch.sigmoid(X)
-    s_bar = s * (self.zeta - self.gamma) + self.gamma
-    return s_bar.clamp(0, 1)
-
-# Penalty (lines 172-182)
-def get_l0_penalty(self) -> torch.Tensor:
-    c = -self.beta * torch.log(torch.tensor(-self.gamma / self.zeta))  # ✅
-    pi = torch.sigmoid(self.log_alpha + c)
-    return pi.sum()
-```
-
-## Mathematical Correctness
-
-The Hard Concrete distribution requires temperature in three key places:
-
-### 1. Sampling (Training Mode)
-The concrete distribution samples as:
-```
-s = sigmoid((log(u) - log(1-u) + log_α) / β)
-```
-Temperature β controls the sharpness of the sigmoid, essential for the reparameterization trick.
-
-### 2. Deterministic Gates (Inference Mode)
-The mean of the distribution:
-```
-s = sigmoid(log_α / β)
-```
-**Must use the same temperature** as during training to maintain consistency.
-
-### 3. L0 Penalty Computation
-The probability of a gate being active:
-```
-P(gate > 0) = sigmoid(log_α - β * log(-γ/ζ))
-```
-Temperature affects the shift in log-odds space.
-
-## Practical Implications
-
-### For Survey Calibration (calibration.py)
-
-Your implementation with β=2/3 (from the paper) is correct. The temperature:
-- Provides the right balance between exploration and exploitation
-- Enables smooth gradient flow during optimization
-- Allows fine control over sparsity levels
-
-### For Sparse Linear Models (sparse.py)
-
-The preserved temperature ensures:
-- Proper feature selection dynamics
-- Stable convergence to sparse solutions
-- Consistency between training and inference
-
-## Recommendations
-
-1. **Continue using** `calibration.py` and `sparse.py` as standalone modules - they're correct
-2. **Avoid importing** from `distributions.py` until the temperature bug is fixed
-3. **Keep temperature** in the range [0.1, 2/3] for best results (lower = harder gates)
-4. **Document this issue** when sharing code to prevent others from using the buggy version
-
-## Key Takeaway
-
-Your instinct about never dropping temperature was absolutely correct. The temperature parameter is fundamental to the Hard Concrete distribution's behavior, and dropping it (as in the authors' `distributions.py`) is a significant bug that can severely impact model performance.
-
-Both `calibration.py` and `sparse.py` correctly implement the Hard Concrete distribution with proper temperature preservation, making them reliable for production use.
\ No newline at end of file
diff --git a/changelog.d/fix-remove-stale-temperature-doc.removed.md b/changelog.d/fix-remove-stale-temperature-doc.removed.md
new file mode 100644
index 0000000..8402e0a
--- /dev/null
+++ b/changelog.d/fix-remove-stale-temperature-doc.removed.md
@@ -0,0 +1,3 @@
+Delete `CRITICAL_TEMPERATURE_BUG.md`: the temperature bug it documented is
+now fixed in `HardConcrete._deterministic_gates`, and the doc also pointed
+at a nonexistent file (`l0_louizos_improved_gate.py`).

From a2779de7a2575a20eb8cb9434a58bb14b1f21cb5 Mon Sep 17 00:00:00 2001
From: Max Ghenis <mghenis@gmail.com>
Date: Fri, 17 Apr 2026 08:37:35 -0400
Subject: [PATCH 2/2] Switch CI lint job from black to ruff

#40 switched the repo's formatter from `black -l 79` to `ruff format`
(default 88-char line length) and updated the Makefile, but the reusable
lint workflow still invoked `lgeiger/black-action` with `. -l 79 --check`.
Since ruff-formatted files don't pass `black -l 79`, every PR's `lint`
check has been failing since #40.

Replace the black action with a `uvx ruff format --check .` run, matching
what `make format-check` would do locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .github/workflows/reusable_lint.yaml      | 10 +++++++---
 changelog.d/fix-ci-lint-use-ruff.fixed.md |  4 ++++
 2 files changed, 11 insertions(+), 3 deletions(-)
 create mode 100644 changelog.d/fix-ci-lint-use-ruff.fixed.md

diff --git a/.github/workflows/reusable_lint.yaml b/.github/workflows/reusable_lint.yaml
index f5fa02c..c95c33b 100644
--- a/.github/workflows/reusable_lint.yaml
+++ b/.github/workflows/reusable_lint.yaml
@@ -8,7 +8,11 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v4
-      - name: Check formatting
-        uses: "lgeiger/black-action@master"
+      - name: Set up Python 3.13
+        uses: actions/setup-python@v5
         with:
-          args: ". -l 79 --check"
\ No newline at end of file
+          python-version: "3.13"
+      - name: Install uv
+        uses: astral-sh/setup-uv@v3
+      - name: Check formatting with ruff
+        run: uvx ruff format --check .
\ No newline at end of file
diff --git a/changelog.d/fix-ci-lint-use-ruff.fixed.md b/changelog.d/fix-ci-lint-use-ruff.fixed.md
new file mode 100644
index 0000000..797a4bf
--- /dev/null
+++ b/changelog.d/fix-ci-lint-use-ruff.fixed.md
@@ -0,0 +1,4 @@
+CI lint job now runs `ruff format --check .` instead of the retired
+`black -l 79 --check`. The Makefile was switched to `ruff format` in #40
+but the reusable lint workflow was not updated, so every PR's `lint` check
+was failing on files already formatted with ruff.