Zenodo preservation-mirror foundation: client + manifest extension #811

Open
MaxGhenis wants to merge 1 commit into main from zenodo-preservation-mirror

@MaxGhenis
Contributor

Summary

  • New `policyengine_us_data/utils/zenodo_client.py`: typed wrapper around the Zenodo REST API. One public function, `create_and_publish_deposit()`, handles create-deposit + upload files + set metadata + publish, returning the version + concept DOIs plus per-file download URLs and checksums. Env-var gated: `ZENODO_ACCESS_TOKEN` required or it raises `ZenodoNotConfigured`. Callers treat that as "preservation disabled for this release," not a failure.
  • Extends `build_release_manifest()` with two new kwargs, `preservation_mirrors_by_artifact` (per-artifact mirror metadata) and `preservation_dois` (release-level DOIs), that populate the manifest fields introduced in PolicyEngine/policyengine.py#317 (Add preservation-mirror fields to DataReleaseManifest).
  • 11 new zenodo-client tests + 3 new release-manifest tests. Full unit suite green (853 passed, 3 pre-existing skips).
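
For reviewers, a minimal sketch of the env-var gate and of how a caller might treat `ZenodoNotConfigured` as "preservation disabled" rather than a failure. This is not the code in this PR: the `create_and_publish_deposit()` signature, the stand-in exception class, and the `mirror_release()` helper are all assumptions made for illustration, and the network calls are left out.

```python
import os


class ZenodoNotConfigured(RuntimeError):
    """Stand-in for the exception named in this PR (real class may differ)."""


def create_and_publish_deposit(files, metadata):
    # Hypothetical signature; only the token gate is modelled here.
    if not os.environ.get("ZENODO_ACCESS_TOKEN"):
        raise ZenodoNotConfigured("ZENODO_ACCESS_TOKEN is not set")
    # The real function would create, upload, describe, and publish here.
    raise NotImplementedError("network calls omitted from this sketch")


def mirror_release(files, metadata):
    """Hypothetical caller: return deposit info, or None if mirroring is off."""
    try:
        return create_and_publish_deposit(files, metadata)
    except ZenodoNotConfigured:
        # Missing token means mirroring is disabled for this release,
        # so the release itself should still proceed.
        return None
```

Only `ZenodoNotConfigured` is swallowed in this sketch, so any other error from the deposit call would still propagate to the caller.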

Scope

This PR is the foundation — contract + client + tests — with no production behavior change. Modal-build wiring is a follow-up PR that requires a production Zenodo access token and a sandbox round-trip to verify. Shipping it in two PRs keeps this one reviewable without needing secrets.

Why

2026-04-21 meeting with Lars Vilhuber (AEA Data Editor): HuggingFace doesn't publish a preservation commitment. A TRO citation URL that resolves only through HF can 404 decades from now. Zenodo (CERN / OpenAIRE-operated, DOI-minting) is the reference preservation-grade host Lars pointed at. Fixes #810.

Depends on

Test plan

  • `uv run pytest tests/unit/utils/test_zenodo_client.py` → 11/11
  • `uv run pytest policyengine_us_data/tests/test_release_manifest.py` → all pass, 3 new tests added
  • `uv run pytest tests/unit/` → 853 pass, 3 pre-existing skips
  • `uv run ruff format` clean
  • Follow-up PR: wire into `modal_app/data_build.py`, plumb `ZENODO_ACCESS_TOKEN` as a Modal secret, test a real deposit against sandbox.zenodo.org
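
For the sandbox round-trip in the follow-up, the four-step flow can be sketched as an ordered request plan. This is an illustration based on Zenodo's publicly documented deposit endpoints, not the client in this PR; `deposit_request_plan()` and its parameters are hypothetical, and it sends nothing over the network. In a real run the deposition id and bucket URL would come from the response to the first call (its `id` and `links.bucket` fields).

```python
from urllib.parse import quote

# Sandbox base URL; production would use https://zenodo.org/api instead.
SANDBOX_BASE = "https://sandbox.zenodo.org/api"


def deposit_request_plan(base_url, deposition_id, bucket_url, filenames, metadata):
    """Return the ordered (method, url, json_body) calls for one deposit."""
    deposits = f"{base_url}/deposit/depositions"
    # Step 1: create an empty deposition.
    plan = [("POST", deposits, {})]
    # Step 2: upload each file to the deposition's bucket (body is the file bytes).
    for name in filenames:
        plan.append(("PUT", f"{bucket_url}/{quote(name)}", None))
    # Step 3: set the deposition metadata.
    plan.append(("PUT", f"{deposits}/{deposition_id}", {"metadata": metadata}))
    # Step 4: publish, which mints the DOI.
    plan.append(("POST", f"{deposits}/{deposition_id}/actions/publish", None))
    return plan
```

Each call would also need the access token, for example as an `Authorization: Bearer` header. Keeping the plan as plain data makes the call order easy to assert in unit tests before any real token is involved.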

🤖 Generated with Claude Code

Groundwork for the Zenodo upload workstream (issue #810): durable
mirror of each certified microdata release to a preservation-grade
host (CERN/OpenAIRE-operated, DOI-minting) so TRO citation URLs stay
verifiable decades from now even if HuggingFace changes its hosting.

- New policyengine_us_data/utils/zenodo_client.py: typed wrapper
  around the Zenodo REST API. One public function,
  create_and_publish_deposit(), handles the four-step Zenodo flow
  (create deposit, upload files, set metadata, publish) and returns
  the version + concept DOIs plus per-file download URLs and
  checksums. Env-var gated: ZENODO_ACCESS_TOKEN must be set or the
  function raises ZenodoNotConfigured, which callers should treat
  as 'preservation mirroring disabled for this release' rather than
  a failure.
- Extends build_release_manifest() with two new kwargs:
  preservation_mirrors_by_artifact (per-artifact Zenodo or other
  mirror metadata) and preservation_dois (release-level Zenodo DOIs).
  Populates the fields introduced in PolicyEngine/policyengine.py#317
  on the emitted manifest JSON.
- 11 zenodo-client tests (happy path, missing token, missing file,
  API error wrapping, metadata payload serialization, env-var
  handling). 3 release-manifest tests (no fields when not provided,
  per-artifact mirror preserved, empty list treated as absent).
- Full unit suite green (853 passed, 3 pre-existing skips).

Modal-build wiring is deferred to a follow-up PR that requires a real
Zenodo access token and a sandbox test round-trip. This commit is
the contract + client + tests, with no production behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Mirror certified releases to a preservation-grade host (Zenodo) with DOI
