
Add TRACE case study and harden reproducibility artifacts #315

Merged
MaxGhenis merged 6 commits into main from trace-case-study-for-aea on Apr 26, 2026

MaxGhenis (Contributor) commented Apr 21, 2026

Summary

  • Adds docs/trace-case-study.md, covering the PolicyEngine TRACE use case for the AEA / TRACE grant conversation.
  • Hardens TRACE TRO emission so bundle TROs remain schema-valid when the external data release manifest is unavailable, while marking the downgrade with pe:dataReleaseManifestStatus: unavailable.
  • Updates the US release bundle pins to policyengine-us==1.667.1 and policyengine-us-data==1.78.2, including the SHA-256 hash of the certified dataset.
  • Extends simulation TROs to bind input.json, request.json, runtime.json, the bundle TRO, and results.json, so a published run can be hash-checked and rerun from concrete artifacts.
  • Threads the missing-manifest fallback through the CLI, TaxBenefitModelVersion.trace_tro, and TRACE TRO regeneration.
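The missing-manifest fallback can be sketched roughly as follows, under stated assumptions: the function name bundle_tro_fields and the dictionary layout are hypothetical and are not PolicyEngine's actual TRO schema or the TRACE vocabulary. Only the package pins and the pe:dataReleaseManifestStatus: unavailable marker come from this PR. The certified dataset hash is deliberately left out rather than invented.

```python
from typing import Optional

# Version pins quoted in this PR. The certified dataset SHA-256 is omitted
# here rather than invented.
BUNDLE_PINS = {
    "policyengine-us": "1.667.1",
    "policyengine-us-data": "1.78.2",
}


def bundle_tro_fields(pins: dict, data_manifest: Optional[dict]) -> dict:
    """Hypothetical sketch of the bundle-TRO manifest fallback.

    The pins are emitted whether or not the external data release manifest
    was available, so the TRO keeps the same core shape; a missing manifest
    is recorded explicitly instead of being silently dropped.
    """
    fields = {"pins": dict(pins)}
    if data_manifest is None:
        # Downgrade marker taken from the PR; all other keys are assumptions.
        fields["pe:dataReleaseManifestStatus"] = "unavailable"
    else:
        fields["dataReleaseManifest"] = data_manifest
    return fields


limited = bundle_tro_fields(BUNDLE_PINS, None)
print(limited["pe:dataReleaseManifestStatus"])  # unavailable
```

The point of the pattern is that the downgrade is recorded as data in the TRO instead of making emission fail, so a consumer can distinguish a complete bundle TRO from a limited one.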

Why

The replication gap raised by Tara Watson and John Sabelhaus is not just package versioning; it is the interaction between policy code, calibrated data, app/runtime request payloads, and result artifacts. Lars Vilhuber pointed us toward TRACE as the right standards-based structure. This PR moves the implementation toward that structure and also includes the case study writeup for the TRACE team.

Verification

  • uv run ruff check src/policyengine/provenance/trace.py src/policyengine/results/trace_tro.py src/policyengine/cli.py src/policyengine/core/tax_benefit_model_version.py src/policyengine/provenance/bundle.py scripts/generate_trace_tros.py tests/test_trace_tro.py
  • uv run pytest tests/test_trace_tro.py tests/test_release_manifests.py tests/test_bundle_refresh.py -q (69 passed)
  • uv run policyengine trace-tro us --out /tmp/pe-repro-us-bundle.trace.tro.jsonld
  • uv run policyengine trace-tro-validate /tmp/pe-repro-us-bundle.trace.tro.jsonld
  • Local replication example under /tmp/pe-repro-example: artifact hashes matched the TRO, composition fingerprint recomputed, and household rerun matched results.json exactly.
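A rough sketch of the hash-check half of that replication example follows: recompute the SHA-256 of each artifact a simulation TRO binds, compare each against its recorded digest, then recompute a composition fingerprint. Assumptions, stated loudly: check_run and composition_fingerprint are hypothetical helpers, the bundle TRO filename bundle.trace.tro.jsonld is a placeholder, and the fingerprint rule (SHA-256 over the sorted digests, concatenated) is a guess that may not match how TRACE or PolicyEngine actually compute it. The sketch does not rerun the household.

```python
import hashlib
from pathlib import Path

# Artifacts the PR says a simulation TRO binds; the bundle TRO filename is
# a hypothetical placeholder.
ARTIFACTS = ["input.json", "request.json", "runtime.json",
             "bundle.trace.tro.jsonld", "results.json"]


def sha256_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def composition_fingerprint(digests: dict) -> str:
    """ASSUMED rule: SHA-256 over the sorted artifact digests, concatenated."""
    return hashlib.sha256("".join(sorted(digests.values())).encode()).hexdigest()


def check_run(run_dir: Path, expected: dict, expected_fingerprint: str) -> list:
    """Return mismatch descriptions; an empty list means every check passed."""
    problems = []
    actual = {}
    for name, want in expected.items():
        path = run_dir / name
        if not path.exists():
            problems.append(f"missing: {name}")
            continue
        actual[name] = sha256_file(path)
        if actual[name] != want:
            problems.append(f"hash mismatch: {name}")
    # Only compare the fingerprint once every individual artifact matched.
    if not problems and composition_fingerprint(actual) != expected_fingerprint:
        problems.append("composition fingerprint mismatch")
    return problems
```

Checking per-artifact digests before the fingerprint means a failure names the specific file that drifted (for example, a changed results.json) rather than reporting only that the composition as a whole no longer matches.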

Follow-up

  • Publish richer data-release manifests / build recipes for current data releases so bundle TROs do not need the limited-manifest path.
  • Finish API-side signed/durable web-run TRO storage and app display.

MaxGhenis and others added 2 commits April 21, 2026 11:38
Working draft describing the PolicyEngine use case for TRACE, prepared
after the 2026-04-21 meeting with Lars Vilhuber, Tara Watson, John
Sabelhaus, Tim Clark, and Casper.

Structured around the reframe that emerged in the meeting: TRACE
should wrap PolicyEngine infrastructure (the us-data build pipeline
and policyengine.org webapp runs) rather than be embedded in the
end-user Python package. Covers:

- Which PolicyEngine surfaces warrant institutional certification
- The precise claims a TRO lets us make (rules, data, reform, inputs,
  outputs including per-household frame, institutional attestation)
- UK data as the strongest TRACE case for us
- Three concrete implementation workstreams with linked issues
- What TRACE gets from us as a case study (infrastructure-certifying
  vs author-certifying; microdata provenance; pe:* extension
  discipline)
- Three open questions (per-household frame default, retention and
  durable addressing, signing and key rotation)

Lars explicitly asked for this kind of writeup during the meeting to
feed the TRACE grant proposal and vocabulary design work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…and adjacent work, clarify institution-backed self-attestation

Codex review of the 2026-04-21 meeting transcript vs. this writeup
flagged four issues:

1. UK was oversold as 'the strongest' or 'only' TRACE case. The transcript
   supports 'a strong case' but not 'the strongest', and we are
   considering a recalibrated UK variant that would partly lift the
   restriction anyway.
2. Missing explicit non-scenario section. The meeting was emphatic
   that researcher-laptop TRO emission, transitive dep tracing, and
   plain version-identification are NOT TRACE's job for us.
3. Missing adjacent workstreams that came up but are not TRACE-solved:
   preservation-grade archiving (HuggingFace vs Zenodo), PolicyEngine-
   specific TRACE vocabulary contribution, and non-TRACE version-
   identification work (Casper's point).
4. 'Institutional certification' language oversold what PolicyEngine
   actually provides. An institution certifying its own runs 'carries
   technically no difference' from an author certifying their own runs;
   the value comes from institutional reputation and structured
   evidence, not from a cryptographic equivalent of arm's-length
   independence.

Also: back off the claim, which the transcript doesn't support, that the
per-household frame is 'the highest-value downstream artifact'; flag it
as an open design question. Drop 'transitive Python deps' from the
rules-bundle section, since the transcript explicitly says TRACE has not
built that in. Add three more open-question items (retention +
preservation, key trust model, production-runtime binding) surfaced by
codex.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex flagged that the writeup slipped from 'claims we want to make'
into 'claims we can make now' for service-account signatures,
durable URLs, and per-household frames: three things the transcript
does not actually settle.

Changes:
- Reframe section title from 'The precise claims a PolicyEngine TRO
  lets us make' to 'The claims a PolicyEngine TRO should let us
  make'. Every present-tense claim about what a TRO 'lets us' do is
  softened to what a TRO 'would let us' do, conditional on the
  design questions still being settled.
- Per-household frame: drop the 'for US runs the TRO includes the
  full frame' assertion; replace with explicit open-design-question
  framing. Cite the transcript exchange for traceability.
- Signing mechanism: remove the claim that a service-account
  signature is the answer. List service-account + DNS-keychain +
  Sigstore as options under consideration.
- Institutional-attestation claim gains a caveat that the service-
  account signature is 'one implementation, not the only one.'
- The workstream list for policyengine-api#3485 is rewritten from 'signed
  by a PolicyEngine service account, persisted to GCS with durable
  URL' (which asserts design decisions that have not been made) to
  explicitly naming the strawman and the alternatives.
- The two workstreams the writeup describes gain an explicit
  live / not-yet-live marker: us-data build TRO emission is live
  (us-data#746 shipped); webapp-run emission + Cite UI is not
  (api#3485, app#2830, api#3486).

The open-questions section already handled this correctly; this
change aligns the main body with that section so the writeup is
internally consistent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MaxGhenis changed the title from "Add TRACE case study writeup for AEA / TRACE grant team" to "Add TRACE case study and harden reproducibility artifacts" on Apr 26, 2026
MaxGhenis merged commit 5484a6f into main on Apr 26, 2026
12 checks passed
MaxGhenis deleted the trace-case-study-for-aea branch on April 26, 2026 at 14:10