refactor(solver): execution output layout uses execution_id, not dataset_hash#655
Open
lewisjared wants to merge 4 commits intomainfrom
Open
refactor(solver): execution output layout uses execution_id, not dataset_hash#655lewisjared wants to merge 4 commits intomainfrom
lewisjared wants to merge 4 commits intomainfrom
Conversation
…/<group_short>/<execution_id> PR-1 of the diagnostic-versioning rework. Replaces the old ``<provider>/<diagnostic>/<dataset_hash>/`` output layout with a new ``<provider>/<diagnostic>/<group_short>/<execution_id>/`` layout so reruns of the same diagnostic group never clobber prior outputs. The new ``compute_group_short`` helper produces a deterministic, ASCII, human-readable hint; the trailing ``Execution.id`` is the uniqueness guarantee. Solver, ``validate_result``, and reingest now insert each ``Execution`` row with ``output_fragment="_pending"``, flush to obtain the new id, then rewrite the fragment in place. Reingest also drops the timestamp suffix and the ``allocate_output_fragment`` call (the helper itself stays for now). No DB schema change in this PR; existing rows on disk keep working since ``output_fragment`` is an opaque string. Tests cover the new helper (determinism, embedded ``g``/``v`` markers, truncation, collision resistance), assert the placeholder shape from ``DiagnosticExecution.build_execution_definition``, verify two solves on the same group emit fragments that share a prefix and differ only on the trailing ``/<execution_id>``, and rewrite the reingest "twice creates distinct fragments" test to match the new id-based layout.
…fety cap Promote magic strings to module constants (PLACEHOLDER_FRAGMENT, _TEMP_DIAGNOSTIC_VERSION), extract the repeated flush-then-rewrite block into assign_execution_fragment() in fragment.py, add a hard-slice safety net to compute_group_short so the result never exceeds _GROUP_SHORT_MAX, trim narrating comments throughout, and add a boundary test for the cap.
Codecov Report❌ Patch coverage is
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
…mment The constant is the actual default for diagnostics that don't declare their own ``version`` attribute, not a temporary placeholder.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Replaces the existing
<provider>/<diagnostic>/<dataset_hash>/output layout with a new<provider>/<diagnostic>/<group_short>/<execution_id>/layout so reruns of the same diagnostic group never clobber prior outputs.
The new
compute_group_shorthelper produces a deterministic, ASCII,human-readable hint;
the trailing
Execution.idis the uniqueness guarantee.Solver,
validate_result, and reingest now insert eachExecutionrowwith
output_fragment="_pending", flush to obtain the new id, then rewrite the fragment in place.Reingest also drops the timestamp suffix and the
allocate_output_fragmentcall (the helper itself stays for now).No DB schema change as the existing rows on disk keep working since
output_fragmentis an opaque string and is resolved relative to
config.paths.results/config.paths.scratchat read time.Why
dataset_hashsegment is opaque and unbounded in length.Checklist
Please confirm that this pull request has done the following:
changelog/