
dCDH Phase 3a: placebo SE, non-binary treatment, parity SE assertions#300

Merged
igerber merged 14 commits into main from dcdh-phase3a-cleanup-nonbinary
Apr 13, 2026
Conversation

igerber (Owner) commented Apr 12, 2026

Summary

  • Add multi-horizon placebo analytical SE and bootstrap SE via new _compute_per_group_if_placebo_horizon() helper (resolves Phase 2 deferral of placebo inference)
  • Support non-binary (ordinal and continuous) treatment via the multi-horizon per-group path (L_max >= 1): remove binary enforcement gate, generalize within-cell-varying check, fix astype(int) truncation, rewrite multi-switch detection
  • Add per-horizon SE and placebo SE parity assertions against R DIDmultiplegtDYN golden values
  • Sweep stale Phase 2 comments, add TODO.md entries for remaining deferrals
  • Add REGISTRY.md documentation for non-binary treatment support

Methodology references (required if estimator / math changes)

  • Method name(s): ChaisemartinDHaultfoeuille (dCDH) - placebo influence function, non-binary treatment generalization
  • Paper / source link(s): de Chaisemartin & D'Haultfoeuille (2022, NBER WP 29873), Web Appendix Section 3.7.3 (variance formula), Section 2 (general treatment setup)
  • Any intentional deviations from the source (and why):
    • Non-binary treatment requires L_max >= 1 because the per-period DID path uses binary joiner/leaver categorization; the multi-horizon per-group DID_{g,l} handles non-binary correctly (documented in REGISTRY)
    • Groups with >1 treatment-change period are dropped (including monotone multi-step paths like 0->1->2) because the per-group building block attributes the full outcome change to the first switch (documented in REGISTRY)
    • Cohort definition under non-binary pools groups with different dose magnitudes but same (D_{g,1}, F_g, S_g) - consistent with the paper's general setup
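The multi-switch drop rule in the deviations above can be sketched as follows. This is a minimal illustration, not the library's actual API: the helper names are hypothetical, and the rule is simply that a group is kept only if its treatment path changes in at most one period, regardless of direction or dose magnitude.

```python
# Hypothetical sketch of the multi-switch drop rule: count the periods
# where a group's treatment differs from the previous period, and drop
# groups with more than one such change (including monotone multi-step
# paths like 0 -> 1 -> 2). Names are illustrative only.
import numpy as np

def count_treatment_changes(d_path):
    """Number of periods where treatment differs from the previous period."""
    d = np.asarray(d_path, dtype=float)
    return int(np.count_nonzero(np.diff(d)))

def keep_group(d_path):
    """Keep a group only if it has at most one treatment-change period."""
    return count_treatment_changes(d_path) <= 1
```

Under this rule a single large dose (0 -> 2, then constant) is kept, while both a reversal (0 -> 1 -> 0) and a monotone multi-step path (0 -> 1 -> 2) are dropped, matching the REGISTRY deviation described above.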

Validation

  • Tests added/updated: tests/test_chaisemartin_dhaultfoeuille.py (+8 new tests: placebo SE finite, placebo bootstrap SE, ordinal treatment, within-cell heterogeneity rejection, single large dose, multi-switch detection, monotone multi-step drop, normalized effects general formula), tests/test_chaisemartin_dhaultfoeuille_parity.py (+3 SE/placebo parity tests)
  • 93 unit tests + 12 parity tests + 19 methodology tests all passing

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

igerber and others added 2 commits April 12, 2026 18:43
- Add _compute_per_group_if_placebo_horizon() for multi-horizon placebo
  analytical SE and bootstrap SE (resolves Phase 2 deferral)
- Remove binary treatment enforcement; support ordinal and continuous
  treatment via L_max path (5 downstream code path fixes)
- Generalize multi-switch detection to count non-zero treatment diffs
- Add SE/CI/placebo parity assertions for multi-horizon R golden values
- Sweep stale Phase 2 comments, add TODO.md entries for remaining deferrals

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add REGISTRY Note for non-binary treatment: paper basis, L_max requirement,
  float baselines, cohort definition, dose-magnitude pooling
- Fix _drop_crossing_cells() docstring to match implementation: >1
  treatment-change period (not "direction reversal")
- Update _validate_and_aggregate_to_cells() and fit() docstrings to reflect
  non-binary support
- Fix type annotations: Tuple[int,int,int] -> Tuple[float,int,int] for
  cohort keys
- Add test_monotone_multi_step_dropped for 0->1->2 path

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

Overall Assessment

⛔ Blocker

Execution note: I did not run the changed tests in this sandbox because the available Python lacks numpy/pandas; this review is based on diff and source inspection.

Executive Summary

  • One unmitigated P0 and one unmitigated P1 remain in the estimator changes.
  • Non-binary treatment is now accepted on code paths that still use the binary-only per-period logic, so unsupported inputs can yield wrong or missing estimates with no explicit guard.
  • Dynamic placebo SE / p-value / CI are now emitted, but the public methodology contract still says those fields should remain NaN.
  • The new TODO.md tracking is fine for deferred parity follow-up, but it does not mitigate either blocker.

Methodology

Cross-check against the cited source: the dynamic paper’s event-study estimator compares each switcher to groups with the same period-one treatment, so the baseline-matching part of the non-binary extension is directionally consistent with the paper. The placebo section separately defines DID^{pl}_ℓ as a testable implication of the assumptions, while the confidence-interval theorem and cohort-recentered variance formula are stated for DID_ℓ / δ_ℓ, not for the placebo estimand. (arxiv.org)

Code Quality

  • No additional code-quality finding beyond the methodology blockers above.

Performance

  • No material performance regressions stood out in the changed paths.

Maintainability

  • No independent maintainability blocker beyond the contract split already called out between code, registry, docstrings, and tests.

Tech Debt

  • Severity P3 — The new TODO tracking is acceptable for the mixed-direction SE parity follow-up, but it does not mitigate the P0 / P1 issues above because project rules do not allow TODO tracking to downgrade correctness or methodology defects. Impact: none by itself. Concrete fix: keep the TODOs, but resolve the blockers in code/registry rather than treating them as deferred. Relevant file: TODO.md:59.

Security

  • No security findings in the changed files.

Documentation/Tests

Path to Approval

  1. Enforce the non-binary gate before the Phase 1 per-period aggregation, and either route L_max=1 through the per-group DID_{g,1} machinery or reject L_max=1 until that path exists.
  2. Resolve the placebo inference contract: either revert dynamic-placebo inference to NaN, or document the new placebo IF/SE surface as an explicit methodology note/deviation and update every public docstring/result surface to match.
  3. Add regressions for non-binary + L_max=None, non-binary + L_max=1, and mixed binary/non-binary switch panels so the unsupported fallback cannot silently reappear.
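The gate requested in step 1 can be sketched as a small validation helper. This is an assumption-laden illustration, not the library's code: the function name and signature are hypothetical, and it only captures the stated contract that non-binary treatment requires the multi-horizon per-group path (L_max >= 1).

```python
# Hypothetical sketch of the non-binary gate: the per-period DID path
# assumes binary joiner/leaver categorization, so non-binary treatment
# must be rejected unless the per-group path (L_max >= 1) is active.
import numpy as np

def check_treatment_support(d_values, L_max):
    """Raise if treatment is non-binary and the per-group path is inactive."""
    d = np.unique(np.asarray(d_values, dtype=float))
    is_binary = bool(np.all(np.isin(d, (0.0, 1.0))))
    if not is_binary and (L_max is None or L_max < 1):
        raise ValueError(
            "Non-binary treatment requires L_max >= 1 (per-group DID path); "
            "the per-period path is binary-only."
        )
```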

P0: Add explicit ValueError for non-binary treatment + L_max=None.
P0: Change multi-horizon gate from L_max>=2 to L_max>=1 so per-group
    DID_{g,1} path activates at L_max=1 (handles non-binary correctly).
    Populate overall_att from per-group l=1 when per-period path yields NaN.
P1: Update REGISTRY Notes to document placebo SE as library extension
    (paper's Theorem 1 is for DID_l, placebo IF applies same structure
    to backward outcome differences). Update results docstring.
P2: Fix test_non_binary_treatment_accepted to assert ValueError,
    add test_non_binary_treatment_with_lmax regression.
    Update test_L_max_1 to test per-group path behavior (not cross-path
    equality, since per-group and per-period are documented different
    estimands).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
igerber (Owner, Author) commented Apr 12, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: df6db2c415fc4ab8ffb4d0624253bbf30a066686


Overall Assessment
⛔ Blocker

Execution note: I could not run the changed tests in this sandbox because the available Python environment is missing numpy and pytest; this review is based on diff and source inspection.

Executive Summary

  • The prior placebo-contract blocker is resolved: REGISTRY.md and the results docstrings now document multi-horizon placebo SE/bootstrap as an intentional library extension, so I am not counting that issue again.
  • The previous non-binary blocker is only partially fixed: mixed 0→1 and 0→2 panels with L_max=1 can still return a top-level overall_att / overall_se built from the binary-only per-period path, even though the supported general-design estimand is the per-group DID_{g,1} path. (nber.org)
  • [Newly identified] The PR widens non-binary support into twfe_diagnostic / twowayfeweights(), but that diagnostic still hard-codes treated cells as d_gt == 1; the reference TwoWayFEWeights API treats D as the treatment variable and its summary measures over treated (g,t) cells generally, not only D == 1. (rdrr.io)
  • [Newly identified] Pure non-binary panels with n_bootstrap > 0 still lose all horizon/bootstrap inference because bootstrap execution returns early on the binary-only N_S divisor before processing the valid multi-horizon inputs. The dynamic paper explicitly allows non-binary, non-absorbing treatments, so this new supported input class needs its inference path wired end to end. (nber.org)
  • The new tests cover pure non-binary panels, but they still miss the mixed 0→1 / 0→2 L_max=1 case and the non-binary bootstrap path, so the remaining correctness issues are not protected.

Methodology
Previous placebo-inference documentation issues are resolved and do not block approval.

  • Severity P0 — Previous blocker only partially resolved: mixed binary/non-binary panels with L_max=1 still put the wrong estimand on the top-level results surface. Impact: _compute_per_period_dids() still defines switchers only through 0→1 / 1→0, Step 10 computes overall_att from that binary-only subset whenever N_S > 0, and Step 20 only swaps in multi_horizon_inference[1] when overall_att is NaN. On a supported mixed panel, results.overall_att / overall_se can silently exclude the non-binary switchers while event_study_effects[1] includes them. That contradicts both the PR’s own non-binary support claim and the dynamic paper’s general-design DID_{g,\ell} construction, which compares switchers to groups with the same period-one treatment that have not changed yet by the relevant horizon. Concrete fix: when any treatment value outside {0,1} is present and L_max=1, bypass the per-period DID_M path entirely and set the top-level overall_* fields from multi_horizon_inference[1] for all such panels, or reject L_max=1 until that contract is explicitly documented. Refs: diff_diff/chaisemartin_dhaultfoeuille.py:2056, diff_diff/chaisemartin_dhaultfoeuille.py:911, diff_diff/chaisemartin_dhaultfoeuille.py:1577. (nber.org)
  • Severity P1 — [Newly identified] Non-binary support is now exposed on the TWFE diagnostic path, but the diagnostic math still assumes binary treatment. Impact: the shared validator now accepts non-binary inputs, fit() computes twfe_* by default on that input, and _compute_twfe_diagnostic() normalizes weights, counts negative weights, and computes sigma_fe using treated_mask = d_arr == 1. Cells with dose 2, 3, etc. are dropped from the very diagnostic the PR now blesses via test_twowayfeweights_accepts_non_binary_treatment, so results.twfe_* and standalone twowayfeweights() can silently misstate the diagnostic surface. Concrete fix: either restore a binary-only guard for twfe_diagnostic / twowayfeweights() or implement the general-treatment formula consistent with the reference TwoWayFEWeights package. Refs: diff_diff/chaisemartin_dhaultfoeuille.py:202, diff_diff/chaisemartin_dhaultfoeuille.py:611, diff_diff/chaisemartin_dhaultfoeuille.py:3468, tests/test_chaisemartin_dhaultfoeuille.py:1674. (rdrr.io)

Code Quality

  • Severity P1 — [Newly identified] The new non-binary path is not propagated through bootstrap orchestration. Impact: fit() correctly builds multi_horizon_inputs and placebo_horizon_inputs, but still calls _compute_dcdh_bootstrap() with divisor_overall=N_S; on pure non-binary panels N_S == 0, the mixin returns immediately with all-NaN bootstrap results before it reaches the event-study or placebo-horizon bootstrap loops. Supported non-binary panels therefore cannot obtain the requested bootstrap SE/CI/p-values for DID_l or dynamic placebos. Concrete fix: make _compute_dcdh_bootstrap() skip only the scalar overall DID_M bootstrap when divisor_overall <= 0, while still processing multi_horizon_inputs and placebo_horizon_inputs if they were provided. Refs: diff_diff/chaisemartin_dhaultfoeuille.py:1521, diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:162, diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:253. (nber.org)
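The control-flow fix proposed in this finding can be sketched as below. All names are illustrative stand-ins (the real `_compute_dcdh_bootstrap()` signature is not shown here): the point is only that a non-positive overall divisor should disable the scalar DID_M bootstrap without short-circuiting the horizon-level work.

```python
# Hypothetical sketch of the proposed bootstrap orchestration: skip only
# the scalar overall DID_M bootstrap when divisor_overall <= 0, but still
# process multi-horizon and placebo-horizon inputs if they were provided.
def run_bootstrap(divisor_overall, multi_horizon_inputs,
                  placebo_horizon_inputs, boot_overall, boot_horizons):
    results = {"overall": None, "horizons": None, "placebos": None}
    if divisor_overall > 0:
        results["overall"] = boot_overall(divisor_overall)
    # Do NOT return early here: pure non-binary panels have N_S == 0
    # (divisor_overall == 0) but still carry valid horizon-level inputs.
    if multi_horizon_inputs:
        results["horizons"] = boot_horizons(multi_horizon_inputs)
    if placebo_horizon_inputs:
        results["placebos"] = boot_horizons(placebo_horizon_inputs)
    return results
```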

Performance
No findings in the changed paths.

Maintainability
No independent maintainability finding beyond the contract/propagation issues above.

Tech Debt

  • Severity P3 — The new TODO entries are appropriately tracked deferred work and do not block merge by project policy. Impact: none by itself; the single-period placebo-SE limitation and mixed-direction parity follow-up are now explicitly recorded. Concrete fix: no action required beyond keeping the entries tracked. Ref: TODO.md:59.

Security
No security findings in the changed files.

Documentation/Tests

  • Severity P2 — The new tests still miss the two branches that matter for this PR’s risk surface, and one new test encodes the wrong contract. Impact: there is no regression for a mixed 0→1 / 0→2 panel at L_max=1, no non-binary panel with n_bootstrap > 0, and test_twowayfeweights_accepts_non_binary_treatment currently blesses the diagnostic bug above instead of catching it. Concrete fix: add a mixed-panel L_max=1 regression that asserts the intended top-level overall_* contract on non-binary input, add a pure non-binary bootstrap regression expecting finite event_study_ses[1] (and placebo bootstrap when available), and replace the standalone-helper acceptance test with whichever contract you actually support. Refs: tests/test_chaisemartin_dhaultfoeuille.py:133, tests/test_chaisemartin_dhaultfoeuille.py:1674, tests/test_chaisemartin_dhaultfoeuille.py:1823, tests/test_chaisemartin_dhaultfoeuille.py:1988.
  • Severity P3 — The parity-tolerance documentation is now stale. Impact: REGISTRY.md still says pure-direction SE parity uses a 5% tolerance, while the updated tests now use 10% generally and 15% on the long-panel scenario. Concrete fix: update the registry note to match the new tolerated gap, or tighten the tests back to the documented tolerance if 5% is still the intended contract. Refs: docs/methodology/REGISTRY.md:604, tests/test_chaisemartin_dhaultfoeuille_parity.py:235.

Path to Approval

  1. For any non-binary input with L_max=1, stop using the per-period binary DID_M aggregate as the top-level result. Either map overall_att / overall_se / overall_p_value / overall_conf_int to multi_horizon_inference[1] unconditionally, or reject L_max=1 until that contract is explicit.
  2. Prevent non-binary data from flowing into binary-only TWFE diagnostics. Either reject non-binary twfe_diagnostic / twowayfeweights() calls with a clear error, or implement the general-treatment formulas consistent with TwoWayFEWeights.
  3. Refactor _compute_dcdh_bootstrap() so divisor_overall <= 0 only disables the scalar DID_M bootstrap; it should still compute event_study_ses / placebo_horizon_ses when those inputs exist.
  4. Add regressions for: mixed 0→1 + 0→2 at L_max=1; pure 0→2 with n_bootstrap > 0; and the intended non-binary TWFE-diagnostic contract.

… N_S=0

P0: When L_max >= 1, always set overall_att from per-group DID_{g,1}
    (not conditional on NaN). Fixes mixed binary/non-binary panels where
    per-period N_S > 0 but excludes non-binary switches.
P1: Gate TWFE diagnostic and twowayfeweights() to binary-only treatment.
    Emit warning on fit(), raise ValueError on standalone helper.
P1: Refactor _compute_dcdh_bootstrap() to skip scalar DID_M when
    divisor_overall <= 0 but still process multi_horizon_inputs and
    placebo_horizon_inputs. Fixes non-binary bootstrap path.
P2: Add regressions for mixed 0->1/0->2 panel at L_max=1, non-binary
    bootstrap, and TWFE diagnostic skip on non-binary.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
igerber (Owner, Author) commented Apr 12, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 966a00d1044692b4faaa5cc12ae7c91d3ab13a4c


Overall Assessment

⚠️ Needs changes

Execution note: I could not run the changed tests in this sandbox because the available Python environment is missing numpy; this review is based on diff and source inspection.

Executive Summary

  • Previously flagged blockers look resolved: mixed non-binary L_max=1 panels now route the top-level effect through horizon 1, non-binary TWFE diagnostics are guarded, and the bootstrap mixin no longer returns early when N_S == 0. Refs: diff_diff/chaisemartin_dhaultfoeuille.py:L1585-L1600, diff_diff/chaisemartin_dhaultfoeuille.py:L611-L622, diff_diff/chaisemartin_dhaultfoeuille.py:L3671-L3679, diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L163-L190.
  • The high-level direction of the PR is consistent with the source material: the paper discusses placebo estimators and explicitly notes that its results can extend beyond the binary treatment case to ordered discrete treatments. The remaining issues are about the Python contract around L_max=1, not about widening support per se.
  • Severity P1: L_max=1 now changes the top-level overall estimand from Phase 1 DID_M to per-group DID_1, but REGISTRY and the public results surface still describe that top-level quantity as DID_M.
  • Severity P1: when n_bootstrap > 0, Step 20 overwrites the top-level overall_* inference for L_max=1 with pre-bootstrap analytical horizon-1 values, so results.overall_* can silently disagree with results.event_study_effects[1].
  • Severity P2/P3: tests and docs still miss the new L_max=1 public-surface contract and contain stale L_max/placebo/parity wording.

Methodology

The paper itself does not make the placebo/non-binary direction suspicious here: it covers placebo estimators and extends beyond the binary case, while REGISTRY.md explicitly labels placebo SE as a documented library extension.

Code Quality

No findings beyond the two methodology/inference contract issues above.

Performance

No findings.

Maintainability

No findings beyond the documentation drift noted below.

Tech Debt

No findings. The new TODO entries are appropriate tracking for deferrable work, but they do not mitigate the two P1 issues above.

Security

No findings.

Documentation/Tests

Path to Approval

  1. Pick a single L_max=1 overall-estimand contract. If overall_* should remain Phase 1 DID_M, stop overriding from multi_horizon_inference[1]; if it should be DID_1, update REGISTRY and every top-level label/docstring/output surface to say so.
  2. Preserve bootstrap inference on that same L_max=1 overall surface. After bootstrap propagation, make overall_se, overall_t_stat, overall_p_value, and overall_conf_int match event_study_effects[1] whenever L_max=1.
  3. Add regressions for both a binary and a non-binary L_max=1 panel with n_bootstrap > 0 asserting that the top-level overall surface and horizon-1 event-study surface are identical.
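The synchronization asked for in step 2 can be sketched as a post-bootstrap pass. This is a schematic illustration only: `results` is modeled as a plain dict, and the field names mirror the `overall_*` surface named in the review rather than the library's real results object.

```python
# Hypothetical sketch: after bootstrap values have been written into the
# horizon-1 event-study entry, copy that entry onto the top-level
# overall_* fields so the two surfaces cannot disagree at L_max=1.
def sync_overall_from_horizon1(results):
    h1 = results["event_study"][1]
    for field in ("att", "se", "t_stat", "p_value", "conf_int"):
        results[f"overall_{field}"] = h1[field]
    return results
```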

…nment

P1: Sync overall_* from event_study_effects[1] AFTER bootstrap propagation
    so bootstrap SE/p/CI flow to top-level surface for L_max=1.
P1: Label overall estimand as "DID_1" (not "DID_M") when L_max=1 in
    __repr__. Update REGISTRY and results docstrings to document L_max=1
    as per-group DID_1 path.
P2: Add binary + non-binary L_max=1 bootstrap regressions asserting
    overall_* == event_study_effects[1].
P3: Update TODO.md, REGISTRY L_max >= 2 references to L_max >= 1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
igerber (Owner, Author) commented Apr 12, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: e59840d2eefeb68a503e00c10eacf4cf4ee1f4d4


Overall Assessment

⚠️ Needs changes

Highest unmitigated severity: P1.

Execution note: I could not run the changed tests in this sandbox because pytest is not installed.

Executive Summary

Methodology

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings beyond the L_max=1 public-surface inconsistency above.

Tech Debt

  • Severity P3 (informational). Impact: the PR adds the remaining deferrable items to TODO.md, which is the right mitigation under this repo’s review policy. Concrete fix: none required for approval. Refs: TODO.md:L59-L60

Security

  • No findings.

Documentation/Tests

Path to Approval

  1. Make the L_max=1 public surface internally consistent with the implemented/registered estimand: update summary(), to_dataframe("overall"), and to_dataframe("joiners_leavers") so the overall row is DID_1, not DID_M, and either use horizon-1 counts/metadata for that row or expose DID_M separately.
  2. Add L_max=1 renderer regressions on both binary and mixed/non-binary panels, with and without bootstrap, asserting the chosen contract for summary(), to_dataframe("overall"), and to_dataframe("joiners_leavers").

…udy gate

P1: Propagate L_max=1 => DID_1 label through summary(), to_dataframe("overall"),
    to_dataframe("joiners_leavers"), CV label, and __repr__. Show event study
    table in summary() when L_max >= 1 (not just >= 2). Update normalized
    effects error message to reference L_max >= 1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
igerber (Owner, Author) commented Apr 12, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: bb745703c36824d555996be217a4ecc2944523c9


Overall Assessment

⚠️ Needs changes

Highest unmitigated severity: P1.

Execution note: I could not run the changed tests in this sandbox because pytest and pandas are not installed.

Executive Summary

  • The previous L_max=1 public-surface P1 is only partially fixed: the top-level label now says DID_1, but the decomposition/metadata surfaces still come from the old binary DID_M path.
  • On mixed or pure non-binary panels, that can silently pair a finite DID_1 with binary-only counts such as Switcher cells (N_S)=0, Treated observations=0, or joiners_leavers counts that exclude valid 0->2 switchers.
  • I do not count the new non-binary DID_l path or the dynamic-placebo SE work as undocumented methodology defects. The dynamic companion paper is written for general treatment paths and placebo estimands, and the extra Python restrictions/extensions are now documented in REGISTRY.md as notes/deviations. (nber.org)
  • The new TODO.md rows correctly track the remaining non-blocking deferrals around single-period placebo SE and mixed-direction SE parity.
  • Tests still miss L_max=1 renderer/metadata assertions, which is where the remaining contract bug lives.

Methodology

Code Quality

  • No additional findings beyond the L_max=1 public-contract issue above.

Performance

  • No findings.

Maintainability

  • No additional findings beyond the public-contract issue above.

Tech Debt

  • Severity P3. Impact: the PR adds the remaining deferrable dCDH items to TODO.md, so the single-period placebo-SE gap and the mixed-direction SE-parity limitation are properly tracked and non-blocking under this repo’s policy. Concrete fix: none required for approval. Refs: TODO.md:L59, TODO.md:L60

Security

  • No findings.

Documentation/Tests

Path to Approval

  1. Make the L_max=1 public surface internally consistent: do not label the overall row/block as DID_1 while still attaching DID_M/binary-only decomposition metadata to it.
  2. Recompute or suppress the binary-only metadata on non-binary fits: n_treated_obs, n_switcher_cells, and the joiners_leavers row counts cannot continue to come from D==1 and 0->1 / 1->0 masks when non-binary switchers are supported.
  3. Add L_max=1 renderer regressions on mixed and non-binary panels for summary() and to_dataframe("joiners_leavers"), asserting label/count consistency with event_study_effects[1].
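Step 2's metadata recomputation can be sketched as follows. The names here are hypothetical: the sketch only illustrates the stated contract that non-binary fits should report the per-group horizon-1 switcher count instead of the binary N_S, and count observations away from each group's baseline dose instead of D == 1.

```python
# Hypothetical sketch of non-binary metadata: n_switcher_cells comes from
# the per-group horizon-1 count (N_1), and n_treated_obs counts periods
# where a group's dose differs from its baseline (first-period) dose.
import numpy as np

def nonbinary_metadata(d_by_group_period, N_1):
    n_treated_obs = 0
    for path in d_by_group_period:
        d = np.asarray(path, dtype=float)
        n_treated_obs += int(np.count_nonzero(d != d[0]))  # non-baseline obs
    return {"n_switcher_cells": N_1, "n_treated_obs": n_treated_obs}
```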

…tion

P1: For non-binary L_max >= 1, use per-group N_l as n_switcher_cells
    (not binary N_S=0), count non-baseline obs as n_treated_obs, and
    suppress joiner/leaver decomposition (binary concept). Ensures
    DID_1 label pairs with consistent per-group metadata.
P2: Add test_nonbinary_lmax1_renderer_contract asserting DID_1 label
    in __repr__/to_dataframe, n_switcher_cells > 0, joiners unavailable.
P3: Update to_dataframe() docstring for L_max >= 1 contract.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
igerber (Owner, Author) commented Apr 13, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 56b2526cfbb61fedb52d1cbd73aafd7a7b40f995


Overall Assessment

Needs changes.

Highest unmitigated severity: P1.

Execution note: I could not run the targeted tests in this sandbox because pytest and pandas are not installed.

Executive Summary

  • The previous L_max=1 blocker is only partially fixed. The PR now correctly promotes the top-level overall estimand to DID_1, but the joiners/leavers reporting surface is still not fully consistent with that estimand.
  • On mixed-direction binary L_max=1 fits, summary() and to_dataframe("joiners_leavers") still expose DID_+ / DID_- from the old per-period DID_M path even though the reported overall row is now DID_1.
  • On non-binary L_max=1 fits, the PR suppresses DID_+ / DID_-, but the overall joiners_leavers row still computes n_obs from zeroed joiner/leaver counts, so it can emit a finite DID_1 with n_obs=0.
  • I did not find an undocumented source-material defect in the new non-binary control-pool logic or the new multi-horizon placebo SE itself. The dynamic paper explicitly allows treatments that can increase, stay constant, or decrease over time, and defines placebo estimators as backward analogues built off the same baseline-matched control logic. citeturn1search0turn1search13
  • The new tests improve numeric coverage, but they still miss the renderer/dataframe cases where the remaining L_max=1 contract bug lives.

Methodology

Code Quality

  • No additional findings beyond the L_max=1 public-surface contract issue above.

Performance

  • No findings.

Maintainability

  • No additional findings beyond the L_max=1 surface inconsistency above.

Tech Debt

  • Severity P3. Impact: the remaining single-period placebo-SE limitation and mixed-direction SE-parity limitation are now explicitly tracked in TODO.md, so they are non-blocking under the repo policy. Concrete fix: none required for approval. References: TODO.md:L59, TODO.md:L60.

Security

  • No findings.

Documentation/Tests

Path to Approval

  1. Make the L_max=1 public reporting surface internally consistent: do not present DID_1 as the overall row while still showing Phase 1 DID_M joiner/leaver views as if they decompose it.
  2. For non-binary L_max=1 fits, stop building the joiners_leavers overall row’s n_obs from n_joiner_obs + n_leaver_obs; either compute a DID_1-appropriate count or mark that row/surface unavailable.
  3. Add renderer/dataframe regressions for summary() and to_dataframe("joiners_leavers") on:
    1. a mixed-direction binary L_max=1 panel where DID_1 can differ from DID_M;
    2. a non-binary L_max=1 panel, asserting the surface does not emit n_obs=0 for a finite overall estimate.
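The n_obs fix requested in step 2 amounts to a small fallback, sketched below with hypothetical names: when the binary joiner/leaver decomposition is suppressed, the overall row should fall back to the treated-observation count rather than summing zeroed joiner/leaver counts.

```python
# Hypothetical sketch of the overall-row n_obs fallback for
# to_dataframe("joiners_leavers"): suppressed decompositions (non-binary
# fits) report n_treated_obs, so a finite overall estimate never pairs
# with n_obs = 0. Names are illustrative only.
def overall_row_n_obs(n_joiner_obs, n_leaver_obs, n_treated_obs,
                      decomposition_suppressed):
    if decomposition_suppressed:
        return n_treated_obs
    return n_joiner_obs + n_leaver_obs
```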

P1: Fix to_dataframe("joiners_leavers") overall row n_obs: use
    n_treated_obs when joiners/leavers are suppressed (non-binary)
    instead of summing zeroed joiner/leaver obs counts.
P2: Add joiners_leavers and summary() assertions to
    test_nonbinary_lmax1_renderer_contract.
P3: Update to_dataframe() docstring: event_study/normalized available
    at L_max >= 1 (was >= 2).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
igerber (Owner, Author) commented Apr 13, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 46273b6045dae501503ec67711c7658ab27c0097


Overall Assessment
Needs changes.

Highest unmitigated severity: P1.

Execution note: I could not run targeted reproductions in this sandbox because numpy is not installed, so this re-review is based on static inspection of the diff and the surrounding dCDH code.

Executive Summary

  • The previous L_max=1 blocker is only partially fixed. The PR fixes the non-binary renderer/dataframe contract, but mixed-direction binary L_max=1 fits still expose DID_1 as the overall estimand while leaving DID_+ / DID_- on the old per-period path.
  • The prior non-binary joiners_leavers n_obs=0 bug appears fixed: non-binary L_max=1 now suppresses the binary-only decomposition and gives the overall row a positive count.
  • The new non-binary branch introduces a separate edge-case bug: constant-treatment non-binary panels with L_max >= 1 can now fall through to an all-NaN results object instead of raising the documented “no switchers” ValueError.
  • I did not find an unmitigated paper/registry defect in the new non-binary support or placebo-horizon SE path itself.
  • Test coverage improved, but it still misses the mixed-direction binary L_max=1 renderer/dataframe path and the new non-binary no-switcher path.

Methodology

Code Quality

  • Severity P1. Impact: the new non-binary L_max >= 1 path no longer enforces the documented “no switchers” failure mode. The old N_S == 0 guard is now bypassed for all non-binary panels when L_max is set, but there is no replacement check based on actual treatment changes. A constant-treatment non-binary panel therefore falls through _compute_multi_horizon_dids() with N_l = 0, then fit() still builds event_study_effects[1] and syncs the top-level overall_* fields from that all-NaN entry instead of raising the documented ValueError. This is missing empty-result handling on a new code path. Concrete fix: after switch metadata is computed, raise if no group ever changes treatment, even when L_max >= 1 and treatment is non-binary; reserve the current NaN/warning behavior for cases where switchers exist but a horizon is not estimable. References: diff_diff/chaisemartin_dhaultfoeuille.py#L922, diff_diff/chaisemartin_dhaultfoeuille.py#L1046, diff_diff/chaisemartin_dhaultfoeuille.py#L2480, diff_diff/chaisemartin_dhaultfoeuille.py#L1588, docs/methodology/REGISTRY.md#L577.

Performance

  • No findings.

Maintainability

  • No findings beyond the contract issues above.

Tech Debt

  • Severity P3. Impact: the remaining single-period placebo-SE limitation and the mixed-direction SE-parity limitation are now explicitly tracked in TODO.md, so they are non-blocking under the repo’s deferred-work policy. Concrete fix: none required for approval. References: TODO.md#L59, TODO.md#L60.

Security

  • No findings.

Documentation/Tests

  • Severity P2. Impact: the new tests cover the non-binary L_max=1 renderer path and mixed binary/non-binary numeric path, but they still do not pin the unresolved mixed-direction binary L_max=1 renderer/dataframe contract or the new non-binary no-switcher edge case. That leaves both live P1 paths unguarded. Concrete fix: add one mixed-direction binary regression asserting that summary() and to_dataframe("joiners_leavers") do not present legacy DID_+ / DID_- as a decomposition of a DID_1 overall row, and add one constant-treatment non-binary regression with L_max=1 asserting the documented no-switchers ValueError. References: tests/test_chaisemartin_dhaultfoeuille.py#L2354, tests/test_chaisemartin_dhaultfoeuille.py#L2425.
  • Severity P3. Impact: REGISTRY still says pure-direction SE parity uses a 5% tolerance, but the PR relaxes the parity tests to 10% generally and 15% for the long-panel scenario. That is documentation drift, not a blocker. Concrete fix: either update the REGISTRY parity note or tighten the test tolerances back to the documented thresholds after explaining the new gap. References: docs/methodology/REGISTRY.md#L604, tests/test_chaisemartin_dhaultfoeuille_parity.py#L232.

Path to Approval

  1. Make the mixed-direction binary L_max=1 public surface estimand-consistent by either suppressing joiners_* / leavers_* and the joiners-leavers renderer/dataframe, or recomputing a DID_1-consistent decomposition.
  2. Restore the documented no-switchers error for non-binary L_max >= 1 using actual treatment-change metadata rather than the binary N_S proxy.
  3. Add regressions for both cases above: mixed-direction binary summary() / to_dataframe("joiners_leavers"), and constant-treatment non-binary L_max=1 expecting ValueError.
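For the mixed-direction fixture in step 3, the panel must contain both a joiner and a leaver; a self-contained sketch of that classification (the helper and column names are assumptions, not the library's categorization code):

```python
import pandas as pd

def switch_direction(panel: pd.DataFrame, group_col: str = "group",
                     period_col: str = "period",
                     treat_col: str = "treatment") -> pd.Series:
    # Sign of each group's first treatment change:
    # +1 joiner, -1 leaver, 0 never switches.
    panel = panel.sort_values([group_col, period_col])

    def first_change(s: pd.Series) -> int:
        d = s.diff().dropna()
        d = d[d != 0]
        return 0 if d.empty else int(d.iloc[0] > 0) * 2 - 1

    return panel.groupby(group_col)[treat_col].apply(first_change)

# Mixed-direction binary panel: group 1 joins (0->1), group 2 leaves (1->0),
# group 3 is a never-switcher control.
panel = pd.DataFrame({
    "group":     [1, 1, 2, 2, 3, 3],
    "period":    [1, 2, 1, 2, 1, 2],
    "treatment": [0, 1, 1, 0, 0, 0],
})
directions = switch_direction(panel)
```

A regression on a panel like this can then assert that summary() and the joiners-leavers dataframe do not present DID_+ / DID_- as a decomposition of the DID_1 overall row.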

…itcher guard

P1: Suppress joiner/leaver decomposition for ALL L_max >= 1 (not just
    non-binary). The decomposition is a per-period DID_M concept that
    can differ from the per-group DID_1 estimand on mixed panels.
P1: Add no-switcher guard after multi-horizon computation - raise
    ValueError if N_l == 0 at horizon 1 (catches constant-treatment
    non-binary panels).
P3: Update REGISTRY SE parity tolerance note (10%/15% for multi-horizon,
    5% for single-horizon). Fix stale Step 12c comment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 13, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 79a01abb790981cc4febb262331c1c1812dcce02


Overall Assessment

✅ Looks good.

Highest unmitigated severity: P2.

Execution note: this is a static re-review. The sandbox does not have pytest, numpy, or pandas, so I could not run the added tests or do numeric reproductions.

Executive Summary

  • The two prior P1 blockers appear resolved in code: L_max=1 now routes the public surface through DID_1 and suppresses the old joiner/leaver decomposition on that surface, and the non-binary L_max>=1 path now raises on no-switcher data instead of surfacing an all-NaN result.
  • I did not find an unmitigated methodology mismatch in the new placebo-horizon SE or non-binary support. Both are now documented in REGISTRY.md, with placebo-horizon SE explicitly labeled as a library extension.
  • The remaining known limitations are properly tracked in TODO.md: single-period placebo DID_M^pl still has NaN SE, and mixed-direction SE parity remains intentionally unasserted.
  • Remaining gaps are lower severity: the exact regression tests requested in the prior review are still missing, and the new non-binary TWFE-diagnostic behavior is not fully reflected in the registry/docstrings.

Methodology

  • No findings.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • Severity P3. Impact: The remaining single-period placebo-SE limitation and mixed-direction SE-parity limitation are now explicitly tracked in TODO.md, so they are non-blocking under the repo’s deferred-work policy. Concrete fix: none. TODO.md#L59, TODO.md#L60

Security

  • No findings.

Documentation/Tests

P2: Add test_constant_nonbinary_treatment_raises (no-switcher guard)
    and test_L_max_1_suppresses_joiner_leaver_decomposition (mixed-
    direction binary L_max=1 renderer contract).
P3: Add REGISTRY Note documenting TWFE diagnostic as binary-only
    (fit() warns+skips, twowayfeweights() raises on non-binary).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 13, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 285aee9d82e4eb2fd04d0b5ea0a11b29e7c45fc9


Overall Assessment

⚠️ Needs changes

Static review only: the sandbox does not have pytest, numpy, or pandas, so I could not execute the added tests.

Executive Summary

  • The prior re-review gaps are largely addressed: this PR adds the missing constant-nonbinary L_max=1 failure test, the L_max=1 joiner/leaver suppression regression, and the non-binary TWFE diagnostic note.
  • Severity P1: on the new L_max=1 surface, the top-level overall_* fields are remapped to DID_1, but results.bootstrap_results.overall_* is still computed from the old scalar DID_M bootstrap branch. That leaves a public inference object reporting the wrong SE/CI/p-value for the reported overall estimand on the newly added path.
  • I did not find an unmitigated source-material mismatch in the non-binary DID_{g,l} route, the documented multi-switch drop rule, or the placebo-horizon SE extension; those choices are now explicitly disclosed in the registry and are compatible with the paper’s per-group event-study / placebo constructions.
  • The new TODO.md entries correctly track the remaining non-blocking deferrals: single-period DID_M^pl still has no SE, and mixed-direction SE parity is intentionally not asserted.

Methodology

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

Tech Debt

  • Severity P3. Impact: the remaining single-period placebo-SE limitation and mixed-direction SE-parity limitation are properly tracked under the repo’s deferred-work policy, so they are non-blocking here. Concrete fix: none. Relevant tracking: TODO.md:59, TODO.md:60.

Security

  • No findings.

Documentation/Tests

  • Severity P2. Impact: the new tests correctly pin the top-level overall_* == event_study_effects[1] contract, but they never assert the nested bootstrap_results.overall_* contract on L_max=1, which is why the P1 inference mismatch above slips through. Concrete fix: extend the new L_max=1 bootstrap tests to assert bootstrap_results.overall_* too, or assert that those fields are intentionally empty when L_max=1. Relevant tests: tests/test_chaisemartin_dhaultfoeuille.py:1823, tests/test_chaisemartin_dhaultfoeuille.py:2459.

Path to Approval

  1. Make the nested bootstrap surface consistent with the new L_max=1 overall estimand: either sync bootstrap_results.overall_* to horizon-1 event-study bootstrap results, or null those fields and document that event_study_ses[1] is the only valid bootstrap source for DID_1.
  2. Add a binary mixed-direction L_max=1 regression that checks the nested bootstrap object, not just the top-level overall_* fields.
  3. Add the same nested-bootstrap regression on a non-binary L_max=1 panel.
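Step 1's sync amounts to recomputing the nested summary statistics from the horizon-1 bootstrap draws. A minimal percentile-bootstrap sketch under assumed conventions (the estimator's actual bootstrap details and field names may differ):

```python
import numpy as np

def bootstrap_summary(draws: np.ndarray, alpha: float = 0.05):
    """Percentile-bootstrap SE, CI, and two-sided p-value from per-horizon draws."""
    se = float(np.std(draws, ddof=1))
    lo, hi = np.percentile(draws, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    # Two-sided percentile p-value: twice the smaller tail mass around zero.
    p = 2.0 * min(float(np.mean(draws <= 0.0)), float(np.mean(draws >= 0.0)))
    return se, (float(lo), float(hi)), min(1.0, p)

# Illustrative horizon-1 draws; in the estimator these would come from the
# event-study bootstrap, not a fresh simulation.
rng = np.random.default_rng(0)
draws = rng.normal(loc=0.5, scale=0.1, size=999)
se, ci, p = bootstrap_summary(draws)
```

Syncing means bootstrap_results.overall_se/ci/p_value are populated from the same draws as event_study_ses[1], so the nested object cannot drift onto a different estimand.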

P1: Sync bootstrap_results.overall_se/ci/p_value from event_study
    horizon 1 when L_max >= 1. The nested bootstrap object now matches
    the top-level DID_1 surface.
P2: Use "Eligible switchers (N_1)" label in summary() when L_max >= 1
    (was "Switcher cells (N_S)" which is a DID_M concept).
P2: Add test_L_max_1_bootstrap_results_overall_synced asserting
    bootstrap_results.overall_* == event_study horizon 1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 13, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 8813170f4e76c7dae53a1cd8ab76ebba6e361d78


Overall Assessment

⚠️ Needs changes

Static review only: the sandbox does not have pytest, numpy, or pandas, so I could not execute the added tests.

Executive Summary

  • The prior L_max=1 scalar bootstrap_results.overall_* mismatch is addressed, and the new REGISTRY.md / TODO.md notes correctly mitigate the documented deferrals.
  • I did not find an unmitigated source-material mismatch in the documented non-binary DID_{g,l} route, the monotone multi-step drop rule, or the dynamic placebo SE extension; those are now explicitly registered.
  • Non-binary support did not propagate to the binary-only Phase 1 result surfaces: per_period_effects and single-period placebo_effect / placebo_available still run the hard-coded 0/1 path and can silently omit non-binary switchers.
  • The new nested-bootstrap sync block fixes L_max=1, but it also rewires bootstrap_results.overall_* for all L_max >= 1, so L_max >= 2 fits now expose DID_1 in the nested bootstrap object while top-level overall_* is delta.
  • Test coverage improved materially, but it still misses both of those public-contract failures.

Methodology

Code Quality

  • No material findings.

Performance

  • No material findings.

Maintainability

  • No material findings beyond the public-contract ambiguity noted above.

Tech Debt

  • Severity P3. Impact: the remaining single-period placebo-SE limitation and mixed-direction SE-parity limitation are properly tracked in TODO.md:59 and TODO.md:60, so they are non-blocking under the repo policy. Concrete fix: none in this PR.

Security

  • No findings.

Documentation/Tests

  • Severity P2. Impact: the new regressions pin L_max=1 nested-bootstrap syncing at tests/test_chaisemartin_dhaultfoeuille.py:1866 and general non-binary success paths at tests/test_chaisemartin_dhaultfoeuille.py:2282, but there is still no coverage for non-binary per_period_effects / top-level placebo_effect, and no L_max >= 2 regression for the nested bootstrap contract. Those gaps allowed the two P1 issues above to slip through. Concrete fix: add a mixed 0->1 / 0->2 panel with placebo=True asserting the binary-only Phase 1 outputs are suppressed or warned off, and add an L_max=3, n_bootstrap>0 regression that asserts the intended meaning of bootstrap_results.overall_* and bootstrap_distribution.

Path to Approval

  1. On non-binary fits, stop surfacing binary-only Phase 1 result artifacts unless they are explicitly generalized and documented. At minimum, gate per_period_effects, the single-period placebo_* fields, and any related binary-only bootstrap fields behind a warning/NaN contract.
  2. Narrow the nested-bootstrap overall_* sync to L_max == 1 and define an explicit L_max >= 2 contract for results.bootstrap_results. Do not leave bootstrap_results.overall_* pointing at DID_1 when top-level overall_* is delta.
  3. Add regressions for both contracts: one mixed binary/non-binary placebo/per-period case, and one multi-horizon bootstrap case (L_max >= 2) covering the nested bootstrap object.
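The gate in step 1 reduces to a support check on the observed treatment at fit time; a hypothetical sketch (the result-field names are illustrative, not the library's actual attributes):

```python
import pandas as pd

def is_binary_treatment(panel: pd.DataFrame,
                        treat_col: str = "treatment") -> bool:
    # Binary means the observed support is a subset of {0, 1}.
    return set(panel[treat_col].unique()) <= {0, 1}

def gate_phase1_artifacts(results: dict, panel: pd.DataFrame) -> dict:
    # On non-binary panels, suppress DID_M-only outputs instead of
    # silently reporting them over a partial set of switchers.
    if not is_binary_treatment(panel):
        results["per_period_effects"] = {}
        results["placebo_effect"] = float("nan")
        results["placebo_available"] = False
    return results

# Mixed 0->1 / 0->2 dosing makes the panel non-binary, so the gate fires.
panel = pd.DataFrame({"treatment": [0, 1, 2, 0]})
gated = gate_phase1_artifacts(
    {"per_period_effects": {2: 0.3}, "placebo_effect": 0.01,
     "placebo_available": True},
    panel,
)
```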

…acts

P1: Narrow bootstrap_results.overall_* sync to L_max == 1 only.
    When L_max >= 2, the cost-benefit delta overrides overall_*, so
    bootstrap_results.overall_* stays on the scalar DID_M bootstrap.
P1: Gate binary-only Phase 1 artifacts on non-binary panels:
    per_period_effects emptied, single-period placebo_* set to NaN,
    placebo_available set to False. These are DID_M concepts that
    don't apply to non-binary treatment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 13, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 6f26709a57ad725ba465637b3ddbcc6bd75d2df1


Overall Assessment

⚠️ Needs changes

Static review only: pytest is not installed in this sandbox, so I could not execute the added tests.

Executive Summary

  • The prior non-binary Phase 1 artifact issue looks addressed: non-binary L_max >= 1 fits now suppress binary-only per_period_effects / top-level placebo surfaces and skip the TWFE diagnostic.
  • The prior bootstrap-contract issue is only partially fixed: bootstrap_results.overall_* is now re-synced only for L_max == 1, but bootstrap_results.bootstrap_distribution still keeps the old DID_M bootstrap target.
  • I did not find an unmitigated source-material mismatch in the documented non-binary general-treatment route, multi-switch drop rule, or placebo-horizon extension itself.
  • Test coverage improved materially for placebo SEs, non-binary support, and L_max=1, but it still does not pin the nested bootstrap_distribution contract.

Methodology

  • Severity P1. Impact: results.bootstrap_results is internally inconsistent when L_max == 1. The bootstrap mixin always stores bootstrap_distribution from the scalar Phase 1 DID_M path at diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:163, but fit() later overwrites only bootstrap_results.overall_se, overall_ci, and overall_p_value with horizon-1 DID_1 values at diff_diff/chaisemartin_dhaultfoeuille.py:1661. The public docstring still describes bootstrap_distribution as the stored distribution for the overall target at diff_diff/chaisemartin_dhaultfoeuille_results.py:94. On panels where documented DID_1 != DID_M, this silently exposes the wrong bootstrap distribution for the reported overall estimand. Concrete fix: when L_max == 1, either replace bootstrap_distribution with the horizon-1 event-study distribution, or clear/rename that field so it cannot be read as matching overall_*.
  • Severity P3. Impact: I did not find an unmitigated source-material defect in the non-binary general-treatment path or the placebo-horizon estimator itself. The cited dynamic paper explicitly allows nonbinary, non-absorbing treatments and defines multi-horizon DID / normalized / cost-benefit estimands; its placebo estimators are introduced separately as assumption tests, while the registry now documents placebo SE as a library extension. Concrete fix: none. See docs/methodology/REGISTRY.md:471, docs/methodology/REGISTRY.md:474, and docs/methodology/REGISTRY.md:546. (nber.org)

Code Quality

  • No material findings.

Performance

  • No material findings.

Maintainability

Tech Debt

  • Severity P3. Impact: the remaining single-period placebo-SE limitation and mixed-direction parity-SE limitation are properly tracked in TODO.md:59 and TODO.md:60, so they are non-blocking under the repo policy. Concrete fix: none in this PR.

Security

  • No findings.

Documentation/Tests

  • Severity P2. Impact: the new regressions pin overall_* == event_study_effects[1] and the nested overall_* sync for L_max=1 at tests/test_chaisemartin_dhaultfoeuille.py:1823 and tests/test_chaisemartin_dhaultfoeuille.py:1866, but there is still no regression for the estimand carried by bootstrap_results.bootstrap_distribution, and no L_max=1 assertion covering the cost_benefit_delta / renderer contract. Those gaps allowed the remaining inconsistencies through. Concrete fix: add a mixed-direction L_max=1, n_bootstrap>0 regression that asserts the chosen bootstrap_distribution behavior, plus an L_max=1 surface test that asserts whether cost_benefit_delta should or should not exist.

Path to Approval

  1. Make results.bootstrap_results self-consistent for L_max == 1: either store the horizon-1 DID_1 bootstrap distribution in bootstrap_distribution, or clear/rename that field instead of leaving the Phase 1 DID_M distribution attached to DID_1 summary statistics.
  2. Add a regression on a mixed-direction L_max=1, n_bootstrap>0 fit that fails if bootstrap_results.overall_* and bootstrap_distribution drift back onto different estimands.

P1: Clear bootstrap_results.bootstrap_distribution to None when
    L_max == 1. The stored DID_M distribution doesn't match the
    DID_1 summary statistics. Per-horizon bootstrap stats are
    accessible via event_study_ses/cis/p_values.
P2: Add bootstrap_distribution is None assertion to the L_max=1
    bootstrap sync regression test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 13, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: fc83a528751788a6a2be8435866ebd18e9477049


Overall Assessment

✅ Looks good.

Static review only: this sandbox does not have the runtime deps needed to execute the added tests (numpy import failed), so this is based on diff inspection plus surrounding source/registry context.

Executive Summary

  • The prior blocker around L_max=1 bootstrap contract looks resolved: the code now re-syncs nested bootstrap_results.overall_* to horizon 1 and clears the stale scalar distribution at diff_diff/chaisemartin_dhaultfoeuille.py#L1685, with regression coverage added at tests/test_chaisemartin_dhaultfoeuille.py#L1866.
  • I did not find an unmitigated source-material mismatch in the changed estimator math. The non-binary path, multi-switch drop rule, and placebo-horizon SE are all explicitly documented in the registry at docs/methodology/REGISTRY.md#L471, docs/methodology/REGISTRY.md#L474, and docs/methodology/REGISTRY.md#L546.
  • Severity P2: fit(..., L_max=1) still computes/stores/renders cost_benefit_delta, even though the public contract now says delta is only an L_max >= 2 surface.
  • Severity P2: the n_switcher_cells metadata contract is still mixed for L_max >= 1; code now stores N_1 switcher-group counts in fields still documented/exported as switching-cell counts.
  • Severity P3: the placebo-bootstrap docs are partially stale; some text still says the placebo path is deferred/unchanged even though placebo_event_study now carries finite analytical/bootstrap inference.

Methodology

Code Quality

No material findings.

Performance

No material findings.

Maintainability

Tech Debt

  • Severity P3. Impact: The remaining Phase 1 per-period placebo-SE limitation and the mixed-direction parity-SE limitation are properly tracked in TODO.md#L59 and TODO.md#L60, so they are non-blocking under the repo’s deferred-work policy. Concrete fix: none in this PR.

Security

No findings.

Documentation/Tests

…date placebo text

P2: Gate cost_benefit_delta computation to L_max >= 2 (was >= 1).
    Delta is only meaningful with multiple horizons.
P2: Update n_switcher_cells docstring to document dual semantics:
    N_S cell count for L_max=None, N_1 group count for L_max >= 1.
P3: Update stale placebo-bootstrap text in results docstrings and
    REGISTRY: multi-horizon placebos now have valid bootstrap SE/CI/p
    via placebo_horizon_ses/cis/p_values.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 13, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 7bfa4c63d582e3653bac9d28fe6ac698fee63539


Overall Assessment

✅ Looks good.

Static review only: this sandbox does not have numpy, pandas, or pytest, so I could not execute the added tests. This assessment is based on diff inspection plus cross-checking against the methodology registry and nearby source/docstrings.

Executive Summary

  • I did not find an unmitigated methodology defect in the estimator changes. The non-binary treatment support, generalized multi-switch drop rule, L_max=1 -> DID_1 behavior, and placebo-horizon SE extension are all documented in docs/methodology/REGISTRY.md.
  • The prior L_max=1 delta-surface issue appears resolved: delta is now computed/stored only for L_max >= 2.
  • The prior n_switcher_cells issue is only partially resolved: docs/summary now say N_1, but the DID_1 dataframe surface still mixes group-count, cell-count, and “changed from baseline” count semantics.
  • The PR correctly tracks the remaining single-period placebo-SE limitation and mixed-direction parity-SE gap in TODO.md, so those are non-blocking under the repo’s deferred-work policy.
  • Coverage improved materially for non-binary treatment and placebo SEs, but there is still no regression that negative-horizon bootstrap p_value / conf_int propagate onto placebo_event_study.

Methodology

  • No findings.

Code Quality

No material findings.

Performance

No material findings.

Maintainability

Tech Debt

  • Severity: P3-informational. Impact: The PR adds the remaining dCDH deferred items to TODO.md, so the single-period placebo-SE limitation and the mixed-direction parity-SE limitation are properly tracked and non-blocking under project policy. Concrete fix: none. (TODO.md:L57-L60)

Security

No findings.

Documentation/Tests

  • Severity: P3. Impact: The new code propagates negative-horizon bootstrap percentile p_value / conf_int onto placebo_event_study, but the new tests only assert finite placebo_horizon_ses. A regression in the public inference surface could therefore slip through CI. Concrete fix: add a regression that checks, for at least one estimable placebo lag, that placebo_event_study[-l]["p_value"] and ["conf_int"] match bootstrap_results.placebo_horizon_p_values[l] and placebo_horizon_cis[l]. (diff_diff/chaisemartin_dhaultfoeuille.py:L1806-L1838, tests/test_chaisemartin_dhaultfoeuille.py:L2085-L2106)
  • Severity: P3. Impact: A couple of touched docstrings are still stale after the non-binary / DID_1 changes: _validate_and_aggregate_to_cells() still says non-binary raw treatment values raise ValueError, and DCDHBootstrapResults.overall_* is still documented as always referring to DID_M. Concrete fix: update those docstrings to match the current contracts. (diff_diff/chaisemartin_dhaultfoeuille.py:L137-L142, diff_diff/chaisemartin_dhaultfoeuille_results.py:L69-L74)

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 13, 2026
@igerber igerber merged commit 0028102 into main Apr 13, 2026
22 of 24 checks passed
@igerber igerber deleted the dcdh-phase3a-cleanup-nonbinary branch April 13, 2026 12:35