refactor(solver): execution output layout uses execution_id, not dataset_hash by lewisjared · Pull Request #655 · Climate-REF/climate-ref

lewisjared · 2026-05-05T01:57:36Z

Description

Replaces the existing <provider>/<diagnostic>/<dataset_hash>/ output layout with a new <provider>/<diagnostic>/<group_short>/<execution_id>/
layout so reruns of the same diagnostic group never clobber prior outputs.

The new compute_group_short helper produces a deterministic, ASCII,
human-readable hint;
the trailing Execution.id is the uniqueness guarantee.
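
The properties claimed for the hint (deterministic, ASCII, bounded in length) can be illustrated with a minimal sketch. This is not the `compute_group_short` implementation from this PR: the function name `sketch_group_short`, its `selectors` argument, the 64-character cap, and the hash suffix are all assumptions made for illustration.

```python
import hashlib
import re

# Hypothetical cap for this sketch; the PR does not state the value of _GROUP_SHORT_MAX.
SKETCH_GROUP_SHORT_MAX = 64


def sketch_group_short(selectors: dict[str, str], max_len: int = SKETCH_GROUP_SHORT_MAX) -> str:
    """Build a deterministic, ASCII-only, length-capped hint from group selectors."""
    # Sort by key so the same selectors always produce the same string.
    raw = "_".join(f"{key}-{selectors[key]}" for key in sorted(selectors))
    # Replace every run of characters outside [A-Za-z0-9.-] (including non-ASCII) with "_".
    slug = re.sub(r"[^A-Za-z0-9.-]+", "_", raw).strip("_") or "group"
    # A short digest of the untruncated input keeps truncated slugs distinguishable.
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:8]
    room = max_len - len(digest) - 1
    # The final slice is a hard cap: the result never exceeds max_len.
    return f"{slug[:room]}_{digest}"[:max_len]


# Key order does not matter, so both calls give the same hint.
a = sketch_group_short({"source_id": "ACCESS-ESM1-5", "experiment_id": "historical"})
b = sketch_group_short({"experiment_id": "historical", "source_id": "ACCESS-ESM1-5"})
# Very long and non-ASCII inputs still give a short ASCII hint.
long_hint = sketch_group_short({"source_id": "x" * 500})
unicode_hint = sketch_group_short({"institution": "Météo-France"})
```

The hint alone is not expected to be unique across reruns; in the layout described here that job belongs to the trailing `Execution.id`.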

Solver, validate_result, and reingest now insert each Execution row
with output_fragment="_pending", flush to obtain the new id, then rewrite the fragment in place.
Reingest also drops the timestamp suffix and the
allocate_output_fragment call (the helper itself stays for now).
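
The insert, flush, rewrite sequence can be sketched with the standard-library `sqlite3` module standing in for the project's database layer. The table definition, the `insert_execution` helper, and the example provider, diagnostic, and group names are hypothetical; only the `"_pending"` placeholder and the target layout come from this description.

```python
import sqlite3

PLACEHOLDER_FRAGMENT = "_pending"


def insert_execution(conn: sqlite3.Connection, provider: str, diagnostic: str, group_short: str) -> str:
    """Insert a row with a placeholder fragment, then rewrite it once the id is known."""
    # Step 1: insert with the placeholder so the database can allocate an id.
    cursor = conn.execute(
        "INSERT INTO execution (output_fragment) VALUES (?)",
        (PLACEHOLDER_FRAGMENT,),
    )
    execution_id = cursor.lastrowid
    # Step 2: build the final fragment, which ends in the freshly allocated id.
    fragment = f"{provider}/{diagnostic}/{group_short}/{execution_id}"
    # Step 3: rewrite the fragment in place on the same row.
    conn.execute(
        "UPDATE execution SET output_fragment = ? WHERE id = ?",
        (fragment, execution_id),
    )
    return fragment


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE execution (id INTEGER PRIMARY KEY, output_fragment TEXT NOT NULL)")

# Two runs of the same (hypothetical) diagnostic group.
first = insert_execution(conn, "example-provider", "example-diagnostic", "example-group")
second = insert_execution(conn, "example-provider", "example-diagnostic", "example-group")
pending_rows = conn.execute(
    "SELECT COUNT(*) FROM execution WHERE output_fragment = ?",
    (PLACEHOLDER_FRAGMENT,),
).fetchone()[0]
```

Both fragments share the `<provider>/<diagnostic>/<group_short>` prefix and differ only in the trailing id, so the second run gets its own directory instead of overwriting the first.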

No DB schema change is needed: existing rows on disk keep working because output_fragment
is an opaque string that is resolved relative to config.paths.results /
config.paths.scratch at read time.
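
A minimal sketch of why old and new rows can coexist, assuming the read path simply joins the stored fragment onto a configured root. The `resolve_output_dir` helper, the `/data/results` root, and both example fragments are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_output_dir(root: str, output_fragment: str) -> PurePosixPath:
    """Join a stored fragment onto a configured root at read time."""
    # The fragment is treated as opaque: its internal layout is never parsed.
    return PurePosixPath(root) / output_fragment


results_root = "/data/results"  # stand-in for config.paths.results

# A row written under the old <provider>/<diagnostic>/<dataset_hash> layout.
old_row = "example-provider/example-diagnostic/0123abcd"
# A row written under the new <provider>/<diagnostic>/<group_short>/<execution_id> layout.
new_row = "example-provider/example-diagnostic/example-group/42"

old_dir = resolve_output_dir(results_root, old_row)
new_dir = resolve_output_dir(results_root, new_row)
```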

Why

Diagnostic reruns currently overwrite the original output directory.
The old dataset_hash segment is opaque and unbounded in length.
Versioning needs a stable, unique per-execution segment that does not depend on a content hash.

Checklist

Please confirm that this pull request has done the following:

Tests added
Documentation added (where applicable)
Changelog item added to changelog/

…/<group_short>/<execution_id>

PR-1 of the diagnostic-versioning rework. Replaces the old ``<provider>/<diagnostic>/<dataset_hash>/`` output layout with a new ``<provider>/<diagnostic>/<group_short>/<execution_id>/`` layout so reruns of the same diagnostic group never clobber prior outputs.

The new ``compute_group_short`` helper produces a deterministic, ASCII, human-readable hint; the trailing ``Execution.id`` is the uniqueness guarantee.

Solver, ``validate_result``, and reingest now insert each ``Execution`` row with ``output_fragment="_pending"``, flush to obtain the new id, then rewrite the fragment in place. Reingest also drops the timestamp suffix and the ``allocate_output_fragment`` call (the helper itself stays for now).

No DB schema change in this PR; existing rows on disk keep working since ``output_fragment`` is an opaque string.

Tests cover the new helper (determinism, embedded ``g``/``v`` markers, truncation, collision resistance), assert the placeholder shape from ``DiagnosticExecution.build_execution_definition``, verify two solves on the same group emit fragments that share a prefix and differ only on the trailing ``/<execution_id>``, and rewrite the reingest "twice creates distinct fragments" test to match the new id-based layout.

…fety cap

Promote magic strings to module constants (PLACEHOLDER_FRAGMENT, _TEMP_DIAGNOSTIC_VERSION), extract the repeated flush-then-rewrite block into assign_execution_fragment() in fragment.py, add a hard-slice safety net to compute_group_short so the result never exceeds _GROUP_SHORT_MAX, trim narrating comments throughout, and add a boundary test for the cap.

codecov · 2026-05-05T02:13:39Z

Codecov Report

❌ Patch coverage is 85.71429% with 13 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...s/climate-ref/src/climate_ref/executor/fragment.py | 83.01% | 5 Missing and 4 partials ⚠️ |
| ...s/climate-ref/src/climate_ref/executor/reingest.py | 87.50% | 1 Missing and 3 partials ⚠️ |

| Flag | Coverage Δ | |
|---|---|---|
| core | `93.21% <85.71%> (-0.09%)` | ⬇️ |
| providers | `91.85% <ø> (ø)` | |

| Files with missing lines | Coverage Δ | |
|---|---|---|
| packages/climate-ref/src/climate_ref/solver.py | `98.50% <100.00%> (+0.01%)` | ⬆️ |
| packages/climate-ref/src/climate_ref/testing.py | `91.07% <100.00%> (+0.16%)` | ⬆️ |
| ...s/climate-ref/src/climate_ref/executor/reingest.py | `94.07% <87.50%> (-0.33%)` | ⬇️ |
| ...s/climate-ref/src/climate_ref/executor/fragment.py | `85.48% <83.01%> (-14.52%)` | ⬇️ |

lewisjared added 3 commits May 5, 2026 11:44

docs: rename changelog placeholder to 655.improvement.md

6da6249

docs(fragment): rename _TEMP_DIAGNOSTIC_VERSION and drop future-PR co…

eebc320

…mment The constant is the actual default for diagnostics that don't declare their own ``version`` attribute, not a temporary placeholder.

lewisjared wants to merge 4 commits into main from feat/path-redesign-v2
