
Remove survey-based national calibration targets #831

Open
MaxGhenis wants to merge 3 commits into main from codex/remove-national-survey-spm-targets

MaxGhenis commented Apr 26, 2026

Summary

  • remove circular survey-derived and SPM-preservation targets from dense national calibration: SPM thresholds, capped housing subsidies, capped work childcare expenses, non-Part B medical/OTC/premium aggregates, and gross alimony and child-support levels
  • keep defensible component anchors for SPM-relevant expenses: ACS 2024 state/national contract rent and owner real-estate taxes, BLS CE 2024 childcare expenses, and IRS SOI itemized real-estate-tax amount/count targets
  • add net (non-gross) accounting constraints requiring alimony paid minus alimony received, and child support paid minus child support received, to each equal zero; these use absolute-error scaling so the sparse ECPS keeps the zero-valued targets
  • keep age-bucketed health calibration limited to Medicare Part B and add regression tests for national loss/ETL target inclusion/exclusion
  • add structured IRS SOI EITC-by-AGI-and-child-count target rows to policy_data.db and include that domain in local target selection
  • add structured IRS SOI taxable-filer AGI amount and count targets by AGI band and filing status, including the joint/surviving-spouse filing status in the constraints
  • add policyengine_us_data.utils.national_target_parity, which maps every legacy national build_loss_matrix() label to a structured target ID or an explicit legacy/deprecated/lossy-label reason
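A net constraint such as "alimony paid minus alimony received" has a target of zero, so the usual relative error (estimate - target) / target is undefined for it. The sketch below shows one way absolute-error scaling can keep such rows in a squared-error loss. It is a minimal illustration that assumes a weighted sum over household-level columns and a hypothetical per-target `abs_scale`; the function names and the scale value are invented for illustration and are not the calibration code in this PR.

```python
import numpy as np


def net_flow_estimate(weights, paid, received):
    """Weighted total of paid minus received (e.g. alimony or child support)."""
    weights = np.asarray(weights, dtype=float)
    diff = np.asarray(paid, dtype=float) - np.asarray(received, dtype=float)
    return float(weights @ diff)


def scaled_errors(estimates, targets, abs_scale):
    """Hypothetical per-target error scaling.

    Nonzero targets use relative error. Zero targets, where relative error
    is undefined, use absolute error divided by a fixed abs_scale instead.
    """
    estimates = np.asarray(estimates, dtype=float)
    targets = np.asarray(targets, dtype=float)
    abs_scale = np.broadcast_to(np.asarray(abs_scale, dtype=float), targets.shape)
    is_zero = targets == 0
    denom = np.where(is_zero, abs_scale, np.abs(targets))
    return (estimates - targets) / denom


def squared_loss(estimates, targets, abs_scale):
    """Sum of squared scaled errors across all targets."""
    return float(np.sum(scaled_errors(estimates, targets, abs_scale) ** 2))


# Toy example: three households with weights and reported alimony flows.
weights = np.array([1.0, 2.0, 1.5])
alimony_paid = np.array([1_000.0, 0.0, 500.0])
alimony_received = np.array([0.0, 600.0, 200.0])
net = net_flow_estimate(weights, alimony_paid, alimony_received)  # 250.0
print(net, squared_loss([net], [0.0], abs_scale=1e6))
```

Because the zero-target row is divided by a fixed scale rather than by the target itself, it contributes a finite, tunable penalty instead of being discarded, which is the behavior the summary describes for the sparse ECPS.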

Parity manifest

Generated from this branch against a rebuilt temporary policy_data.db and enhanced_cps_2024.h5:

  • 921 national legacy loss labels
  • 385 matched to structured target DB rows
  • 536 classified as legacy-only, lossy legacy labels, or intentionally omitted with explicit reasons
  • 0 accidental db_missing labels
  • The new taxable AGI/filing-status constructor adds 148 DB rows: 136 legacy labels match exactly, while 12 old labels are explicitly classified as lossy because loss.py rounded away the true SOI AGI bucket boundaries.
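The manifest's invariant is that every legacy label is either matched to a DB row or classified with an explicit reason, with no db_missing labels, so the counts must add up (385 + 536 = 921). A check like that can be automated. The sketch below is minimal and assumes each manifest entry is a dict with hypothetical `label`, `status`, and `reason` keys and the status names shown; the JSON actually written by national_target_parity may use a different schema. The example entries are synthetic and only mirror the reported counts.

```python
from collections import Counter

# Hypothetical status values; the real manifest schema may differ.
MATCHED = "matched"
EXPLAINED = {"legacy_only", "lossy_legacy_label", "intentionally_omitted"}
MISSING = "db_missing"


def summarize_parity(entries):
    """Tally manifest entries by status and reject unrecognized statuses."""
    counts = Counter(entry["status"] for entry in entries)
    unknown = set(counts) - EXPLAINED - {MATCHED, MISSING}
    if unknown:
        raise ValueError(f"unrecognized statuses: {sorted(unknown)}")
    return {
        "total": len(entries),
        "matched": counts[MATCHED],
        "explained": sum(counts[s] for s in EXPLAINED),
        "db_missing": counts[MISSING],
    }


def check_parity(entries):
    """Fail if any label lacks a DB target or an explicit non-empty reason."""
    summary = summarize_parity(entries)
    if summary["db_missing"]:
        raise AssertionError(f"{summary['db_missing']} labels have no DB target")
    unexplained = [
        e["label"] for e in entries if e["status"] in EXPLAINED and not e.get("reason")
    ]
    if unexplained:
        raise AssertionError(f"missing reasons for: {unexplained[:5]}")
    return summary


# Synthetic entries that only mirror the counts reported above.
entries = [{"label": f"legacy/{i}", "status": MATCHED} for i in range(385)]
entries += [
    {"label": f"legacy/{i}", "status": "legacy_only", "reason": "no structured source"}
    for i in range(385, 921)
]
print(check_parity(entries))
# {'total': 921, 'matched': 385, 'explained': 536, 'db_missing': 0}
```

Raising on unrecognized statuses, rather than silently counting them, keeps a new classification from slipping past the "0 accidental db_missing labels" check.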

Tests

  • env -u UV_FROZEN uv run pytest tests/unit/calibration/test_loss_targets.py tests/unit/test_etl_national_targets.py tests/unit/datasets/test_enhanced_cps_seeding.py
  • env -u UV_FROZEN uv run ruff format --check .
  • env -u UV_FROZEN uv run towncrier check --compare-with upstream/main --staged
  • git diff --check
  • uv run ruff format --check .
  • uv run ruff check policyengine_us_data/db/etl_irs_soi.py policyengine_us_data/utils/national_target_parity.py policyengine_us_data/calibration/calibration_utils.py policyengine_us_data/utils/constraint_validation.py policyengine_us_data/db/create_field_valid_values.py tests/unit/test_etl_irs_soi_overlay.py tests/unit/test_national_target_parity.py tests/unit/calibration/test_unified_matrix_builder_merge.py tests/unit/test_constraint_validation.py
  • uv run pytest -q tests/unit/test_national_target_parity.py tests/unit/test_etl_irs_soi_overlay.py tests/unit/test_constraint_validation.py tests/unit/calibration/test_unified_matrix_builder_merge.py
  • uv run pytest -q tests/integration/test_database_build.py
  • uv run python -m policyengine_us_data.utils.national_target_parity --dataset-path /Users/maxghenis/PolicyEngine/policyengine-us-data/policyengine_us_data/storage/enhanced_cps_2024.h5 --target-db /tmp/policy_data_with_taxable_agi_status.db --period 2024 --output /tmp/policyengine-national-target-parity-with-taxable-agi-status.json

MaxGhenis force-pushed the codex/remove-national-survey-spm-targets branch from 1c12fce to 8c90de2 on April 26, 2026, 11:16
MaxGhenis force-pushed the codex/remove-national-survey-spm-targets branch from 8c90de2 to 376767c on April 26, 2026, 11:19
MaxGhenis force-pushed the codex/remove-national-survey-spm-targets branch 2 times, most recently from 9bf6782 to 8c9e1c7 on April 26, 2026, 16:28
MaxGhenis force-pushed the codex/remove-national-survey-spm-targets branch from 8c9e1c7 to 1348bc9 on April 26, 2026, 16:51