dCDH Phase 3a: placebo SE, non-binary treatment, parity SE assertions#300
dCDH Phase 3a: placebo SE, non-binary treatment, parity SE assertions#300
Conversation
- Add _compute_per_group_if_placebo_horizon() for multi-horizon placebo analytical SE and bootstrap SE (resolves Phase 2 deferral) - Remove binary treatment enforcement; support ordinal and continuous treatment via L_max path (5 downstream code path fixes) - Generalize multi-switch detection to count non-zero treatment diffs - Add SE/CI/placebo parity assertions for multi-horizon R golden values - Sweep stale Phase 2 comments, add TODO.md entries for remaining deferrals Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add REGISTRY Note for non-binary treatment: paper basis, L_max requirement, float baselines, cohort definition, dose-magnitude pooling - Fix _drop_crossing_cells() docstring to match implementation: >1 treatment-change period (not "direction reversal") - Update _validate_and_aggregate_to_cells() and fit() docstrings to reflect non-binary support - Fix type annotations: Tuple[int,int,int] -> Tuple[float,int,int] for cohort keys - Add test_monotone_multi_step_dropped for 0->1->2 path Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Overall Assessment ⛔ Blocker Execution note: I did not run the changed tests in this sandbox because the available Python lacks Executive Summary
Methodology Cross-check against the cited source: the dynamic paper’s event-study estimator compares each switcher to groups with the same period-one treatment, so the baseline-matching part of the non-binary extension is directionally consistent with the paper. The placebo section separately defines
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
P0: Add explicit ValueError for non-binary treatment + L_max=None.
P0: Change multi-horizon gate from L_max>=2 to L_max>=1 so per-group
DID_{g,1} path activates at L_max=1 (handles non-binary correctly).
Populate overall_att from per-group l=1 when per-period path yields NaN.
P1: Update REGISTRY Notes to document placebo SE as library extension
(paper's Theorem 1 is for DID_l, placebo IF applies same structure
to backward outcome differences). Update results docstring.
P2: Fix test_non_binary_treatment_accepted to assert ValueError,
add test_non_binary_treatment_with_lmax regression.
Update test_L_max_1 to test per-group path behavior (not cross-path
equality, since per-group and per-period are documented different
estimands).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Execution note: I could not run the changed tests in this sandbox because the available Python environment is missing Executive Summary
Methodology
Code Quality
Performance Maintainability Tech Debt
Security Documentation/Tests
Path to Approval
|
… N_S=0
P0: When L_max >= 1, always set overall_att from per-group DID_{g,1}
(not conditional on NaN). Fixes mixed binary/non-binary panels where
per-period N_S > 0 but excludes non-binary switches.
P1: Gate TWFE diagnostic and twowayfeweights() to binary-only treatment.
Emit warning on fit(), raise ValueError on standalone helper.
P1: Refactor _compute_dcdh_bootstrap() to skip scalar DID_M when
divisor_overall <= 0 but still process multi_horizon_inputs and
placebo_horizon_inputs. Fixes non-binary bootstrap path.
P2: Add regressions for mixed 0->1/0->2 panel at L_max=1, non-binary
bootstrap, and TWFE diagnostic skip on non-binary.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Execution note: I could not run the changed tests in this sandbox because the available Python environment is missing Executive Summary
Methodology The paper itself does not make the placebo/non-binary direction suspicious here: it covers placebo estimators and extends beyond the binary case, while
Code Quality No findings beyond the two methodology/inference contract issues above. Performance No findings. Maintainability No findings beyond the documentation drift noted below. Tech Debt No findings. The new TODO entries are appropriate tracking for deferrable work, but they do not mitigate the two Security No findings. Documentation/Tests
Path to Approval
|
…nment
P1: Sync overall_* from event_study_effects[1] AFTER bootstrap propagation
so bootstrap SE/p/CI flow to top-level surface for L_max=1.
P1: Label overall estimand as "DID_1" (not "DID_M") when L_max=1 in
__repr__. Update REGISTRY and results docstrings to document L_max=1
as per-group DID_1 path.
P2: Add binary + non-binary L_max=1 bootstrap regressions asserting
overall_* == event_study_effects[1].
P3: Update TODO.md, REGISTRY L_max >= 2 references to L_max >= 1.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Highest unmitigated severity: Execution note: I could not run the changed tests in this sandbox because Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…udy gate
P1: Propagate L_max=1 => DID_1 label through summary(), to_dataframe("overall"),
to_dataframe("joiners_leavers"), CV label, and __repr__. Show event study
table in summary() when L_max >= 1 (not just >= 2). Update normalized
effects error message to reference L_max >= 1.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Highest unmitigated severity: Execution note: I could not run the changed tests in this sandbox because Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…tion
P1: For non-binary L_max >= 1, use per-group N_l as n_switcher_cells
(not binary N_S=0), count non-baseline obs as n_treated_obs, and
suppress joiner/leaver decomposition (binary concept). Ensures
DID_1 label pairs with consistent per-group metadata.
P2: Add test_nonbinary_lmax1_renderer_contract asserting DID_1 label
in __repr__/to_dataframe, n_switcher_cells > 0, joiners unavailable.
P3: Update to_dataframe() docstring for L_max >= 1 contract.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Needs changes. Highest unmitigated severity: Execution note: I could not run the targeted tests in this sandbox because Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
P1: Fix to_dataframe("joiners_leavers") overall row n_obs: use
n_treated_obs when joiners/leavers are suppressed (non-binary)
instead of summing zeroed joiner/leaver obs counts.
P2: Add joiners_leavers and summary() assertions to
test_nonbinary_lmax1_renderer_contract.
P3: Update to_dataframe() docstring: event_study/normalized available
at L_max >= 1 (was >= 2).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Highest unmitigated severity: Execution note: I could not run targeted reproductions in this sandbox because Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…itcher guard
P1: Suppress joiner/leaver decomposition for ALL L_max >= 1 (not just
non-binary). The decomposition is a per-period DID_M concept that
can differ from the per-group DID_1 estimand on mixed panels.
P1: Add no-switcher guard after multi-horizon computation - raise
ValueError if N_l == 0 at horizon 1 (catches constant-treatment
non-binary panels).
P3: Update REGISTRY SE parity tolerance note (10%/15% for multi-horizon,
5% for single-horizon). Fix stale Step 12c comment.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good. Highest unmitigated severity: Execution note: this is a static re-review. The sandbox does not have Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
P2: Add test_constant_nonbinary_treatment_raises (no-switcher guard)
and test_L_max_1_suppresses_joiner_leaver_decomposition (mixed-
direction binary L_max=1 renderer contract).
P3: Add REGISTRY Note documenting TWFE diagnostic as binary-only
(fit() warns+skips, twowayfeweights() raises on non-binary).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Static review only: the sandbox does not have Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
P1: Sync bootstrap_results.overall_se/ci/p_value from event_study
horizon 1 when L_max >= 1. The nested bootstrap object now matches
the top-level DID_1 surface.
P2: Use "Eligible switchers (N_1)" label in summary() when L_max >= 1
(was "Switcher cells (N_S)" which is a DID_M concept).
P2: Add test_L_max_1_bootstrap_results_overall_synced asserting
bootstrap_results.overall_* == event_study horizon 1.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Static review only: the sandbox does not have Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…acts
P1: Narrow bootstrap_results.overall_* sync to L_max == 1 only.
When L_max >= 2, the cost-benefit delta overrides overall_*, so
bootstrap_results.overall_* stays on the scalar DID_M bootstrap.
P1: Gate binary-only Phase 1 artifacts on non-binary panels:
per_period_effects emptied, single-period placebo_* set to NaN,
placebo_available set to False. These are DID_M concepts that
don't apply to non-binary treatment.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment
Static review only: Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
P1: Clear bootstrap_results.bootstrap_distribution to None when
L_max == 1. The stored DID_M distribution doesn't match the
DID_1 summary statistics. Per-horizon bootstrap stats are
accessible via event_study_ses/cis/p_values.
P2: Add bootstrap_distribution is None assertion to the L_max=1
bootstrap sync regression test.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Static review only: this sandbox does not have the runtime deps needed to execute the added tests ( Executive Summary
Methodology
Code Quality No material findings. Performance No material findings. Maintainability
Tech Debt
Security No findings. Documentation/Tests
|
…date placebo text
P2: Gate cost_benefit_delta computation to L_max >= 2 (was >= 1).
Delta is only meaningful with multiple horizons.
P2: Update n_switcher_cells docstring to document dual semantics:
N_S cell count for L_max=None, N_1 group count for L_max >= 1.
P3: Update stale placebo-bootstrap text in results docstrings and
REGISTRY: multi-horizon placebos now have valid bootstrap SE/CI/p
via placebo_horizon_ses/cis/p_values.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Static review only: this sandbox does not have Executive Summary
Methodology
Code Quality No material findings. Performance No material findings. Maintainability
Tech Debt
Security No findings. Documentation/Tests
|
Summary
_compute_per_group_if_placebo_horizon()helper (resolves Phase 2 deferral of placebo inference)L_max >= 1): remove binary enforcement gate, generalize within-cell-varying check, fixastype(int)truncation, rewrite multi-switch detectionDIDmultiplegtDYNgolden valuesMethodology references (required if estimator / math changes)
L_max >= 1because the per-period DID path uses binary joiner/leaver categorization; the multi-horizon per-groupDID_{g,l}handles non-binary correctly (documented in REGISTRY)(D_{g,1}, F_g, S_g)- consistent with the paper's general setupValidation
tests/test_chaisemartin_dhaultfoeuille.py(+8 new tests: placebo SE finite, placebo bootstrap SE, ordinal treatment, within-cell heterogeneity rejection, single large dose, multi-switch detection, monotone multi-step drop, normalized effects general formula),tests/test_chaisemartin_dhaultfoeuille_parity.py(+3 SE/placebo parity tests)Security / privacy
Generated with Claude Code