refactor(sutta-studio): consolidate response schemas to one canonical location (PR A) #62
Merged
… location (PR A / Phase 2c)
The 7 LLM response schemas (skeleton, anatomist, lexicographer, weaver,
typesetter, phase, morph) used to live in TWO files with the same names
but divergent bodies:
- services/compiler/schemas.ts (legacy, treated as "production")
- services/suttaStudioPassPrompts.ts (treated as "bench")
The "bench" copy had two fields the "production" copy lacked:
- wordRange (skeleton): [start, end) indices for sub-segment splitting
- refrainId (anatomist): tag for recurring-phrase styling
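To make the half-open [start, end) convention concrete, here is a minimal sketch of slicing a segment's words by a wordRange. The SegmentWords shape and the sliceWords helper are hypothetical illustrations, not the project's applyWordRangeToSegments or its real types.

```typescript
// Hypothetical shapes for illustration; the project's real types live in
// types/suttaStudio.ts and may differ.
type WordRange = [start: number, end: number];

interface SegmentWords {
  id: string;
  words: string[];
}

// Select the words covered by a half-open range [start, end):
// index `start` is included, index `end` is excluded.
// Hypothetical helper, not the project's applyWordRangeToSegments.
function sliceWords(segment: SegmentWords, range?: WordRange): string[] {
  // No wordRange means the phase uses the whole segment.
  if (range === undefined) return segment.words.slice();
  const [start, end] = range;
  const valid =
    Number.isInteger(start) &&
    Number.isInteger(end) &&
    start >= 0 &&
    start < end &&
    end <= segment.words.length;
  if (!valid) {
    throw new RangeError(`invalid wordRange [${start}, ${end}) for ${segment.id}`);
  }
  return segment.words.slice(start, end);
}

// A six-word segment split across two phases: [0, 3) and [3, 6)
// cover every word exactly once, with no overlap.
const segment: SegmentWords = {
  id: "example-segment",
  words: ["evaṃ", "me", "sutaṃ", "ekaṃ", "samayaṃ", "bhagavā"],
};
const firstHalf = sliceWords(segment, [0, 3]);
const secondHalf = sliceWords(segment, [3, 6]);
```

Because the end index is exclusive, adjacent phases share a boundary value (3 here) without sharing a word, which is what makes sub-segment splits compose cleanly.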
Both fields are PRODUCTION features (not bench artifacts):
- applyWordRangeToSegments lives in services/sutta-studio/utils.ts and
is called by the production orchestrator
- config/suttaStudioPromptContext.ts already tells the LLM to emit
wordRange for split phases
- types/suttaStudio.ts has refrainId in the top-level types
- config/suttaStudioExamples.ts uses refrainId in golden examples
- CURATION_PROTOCOL.md §217 codifies the refrainId rule
The result was a real bug: production prompted the LLM for these fields
via prompt context but didn't enforce them via schema, so the LLM emitted
them inconsistently.
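A check along these lines would have surfaced the gap mechanically by comparing what the prompt asks for against what the schema declares. It is a hypothetical sketch: ObjectSchema is a minimal JSON-Schema-like subset, the property lists are invented except for wordRange, and the list of prompt-requested fields would have to be kept in sync with config/suttaStudioPromptContext.ts by hand or derived from it.

```typescript
// Minimal JSON-Schema-like subset (hypothetical; the real schemas may
// carry more keywords).
interface ObjectSchema {
  type: "object";
  properties: Record<string, unknown>;
}

// Fields the prompt context asks the LLM to emit on a skeleton phase.
// Hypothetical list: only wordRange comes from this PR.
const promptRequestedPhaseFields = ["id", "wordRange"] as const;

// Illustrative stale phase schema: it never declares wordRange.
const stalePhaseSchema: ObjectSchema = {
  type: "object",
  properties: { id: { type: "string" } },
};

// Illustrative canonical phase schema: wordRange is declared as a
// two-integer [start, end) tuple.
const canonicalPhaseSchema: ObjectSchema = {
  type: "object",
  properties: {
    id: { type: "string" },
    wordRange: { type: "array", items: { type: "integer" }, minItems: 2, maxItems: 2 },
  },
};

// Return every prompt-requested field that the schema does not declare.
function fieldsMissingFromSchema(fields: readonly string[], schema: ObjectSchema): string[] {
  return fields.filter(
    (field) => !Object.prototype.hasOwnProperty.call(schema.properties, field),
  );
}
```

Against the stale schema this reports ["wordRange"]; against the canonical one it reports nothing.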
This PR creates services/sutta-studio/schemas.ts as the single canonical
source. All 9 consumers (6 canonical passes + 2 legacy compiler files +
1 benchmark script) now import from there. The two legacy locations
become thin re-export shims (deleted in Phase 4 cleanup).
Adds a contract test (schemas-canonical.test.ts) asserting:
- All three locations export the SAME object reference (no future drift)
- skeletonResponseSchema phases include wordRange
- anatomistResponseSchema words include refrainId
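The reference-identity assertion is stricter than a deep-equality check: a copy-pasted duplicate is deep-equal to the canonical schema on the day it is copied and only diverges later, whereas Object.is fails for any duplicate immediately. A minimal self-contained sketch, in which the module objects are hypothetical stand-ins for the real imports:

```typescript
type SchemaModule = Record<string, object>;

// Hypothetical stand-in for services/sutta-studio/schemas.ts.
const canonicalModule: SchemaModule = {
  skeletonResponseSchema: { type: "object", properties: { phases: { type: "array" } } },
  anatomistResponseSchema: { type: "object", properties: { words: { type: "array" } } },
};

// A re-export shim hands out the very same objects (spread copies references).
const shimModule: SchemaModule = { ...canonicalModule };

// A copy-pasted duplicate holds structurally identical but distinct objects.
const duplicateModule: SchemaModule = JSON.parse(JSON.stringify(canonicalModule));

// Names whose export is not the same object (by reference) as the canonical one.
function driftedExports(canonical: SchemaModule, other: SchemaModule): string[] {
  return Object.keys(canonical).filter((name) => !Object.is(canonical[name], other[name]));
}
```

Here driftedExports(canonicalModule, shimModule) is empty, while the duplicate is flagged on every export even though it is still deep-equal. In Jest and Vitest, expect(a).toBe(b) performs the same Object.is comparison.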
Test results: 1425 passing / 0 failing (everything passing on main still passes, plus 9 new tests).
Per the 5-PR refactor plan from 2026-05-16 archaeology: this is PR A
(schema reconciliation). PRs B-E (LLM caller migration, dictionary/
segments/skeleton migration, orchestrator port, final shim cleanup)
remain to fully complete #40 / #45.
Closes Phase 2c (a substep of Phase 2 that was silently deferred in
c43d656; surfacing it explicitly now per CONSOLIDATION.md design).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
The 7 LLM response schemas (skeleton, anatomist, lexicographer, weaver, typesetter, phase, morph) lived in two files with the same names but divergent bodies: the production schema in services/compiler/schemas.ts was stale, while the "bench" copy in services/suttaStudioPassPrompts.ts had two extra fields (wordRange, refrainId) that production already needed.
This PR creates services/sutta-studio/schemas.ts as the single canonical source. Both legacy locations become thin re-export shims (deleted in Phase 4 cleanup, planned PR E).
Why this is a real bug fix, not just refactor purity
Both wordRange and refrainId are production features:
- wordRange: services/sutta-studio/utils.ts:255 (canonical applyWordRangeToSegments), config/suttaStudioPromptContext.ts:52-68 (production prompt context), tests/services/compiler/utils.test.ts:181 (5+ test cases). ~35 of MN10's 51 phases use it.
- refrainId: types/suttaStudio.ts:194,352 (top-level types), config/suttaStudioExamples.ts:277,324-326 (golden examples), CURATION_PROTOCOL.md:217 (curation rule), ADR-003 §300/§392 (specifies the refrainColors feature flag). Actively used in MN10 phases a-h curation.
The production compiler prompted the LLM for these fields but didn't enforce them via schema, so the LLM emitted them inconsistently. Tightening the schema closes the gap.
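As an illustration of how a refrainId tag can drive styling, here is a hypothetical sketch that gives each distinct refrain a stable palette color in order of first appearance. The AnatomistWord shape, the surface field, the refrain ids, and assignRefrainColors are all assumptions; the PR does not describe how the refrainColors feature actually consumes the tag.

```typescript
// Hypothetical word shape; only refrainId comes from this PR.
interface AnatomistWord {
  surface: string;
  refrainId?: string;
}

// Map each distinct refrainId to a palette color, in order of first
// appearance, so every occurrence of a refrain shares one color.
// Colors cycle if there are more refrains than palette entries.
function assignRefrainColors(
  words: readonly AnatomistWord[],
  palette: readonly string[],
): Map<string, string> {
  if (palette.length === 0) throw new Error("palette must not be empty");
  const colors = new Map<string, string>();
  for (const { refrainId } of words) {
    if (refrainId === undefined || colors.has(refrainId)) continue;
    colors.set(refrainId, palette[colors.size % palette.length]);
  }
  return colors;
}

// Hypothetical tagging of a few words; untagged words get no color.
const words: AnatomistWord[] = [
  { surface: "ajjhattaṃ", refrainId: "refrain-1" },
  { surface: "vā" },
  { surface: "bahiddhā", refrainId: "refrain-1" },
  { surface: "viharati", refrainId: "refrain-2" },
];
const colors = assignRefrainColors(words, ["#b45309", "#1d4ed8"]);
```

Keying colors on the tag rather than on word position means a refrain keeps its color wherever it recurs, which only works if the LLM reliably emits refrainId.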
Scope (mechanical, no product decision)
- services/sutta-studio/schemas.ts (new, +434 lines, verbatim from the bench copy)
- 6 canonical passes (services/sutta-studio/passes/*.ts), 2 legacy compiler files (services/compiler/{index,skeleton}.ts), 1 benchmark script: all imports rewritten to the canonical path
- services/compiler/schemas.ts 401→19 lines (re-export shim); services/suttaStudioPassPrompts.ts 441→45 lines (already a shim for prompts+utils; now also re-exports schemas)
- tests/services/sutta-studio/schemas-canonical.test.ts: 9 tests asserting (a) all three locations export the SAME object reference, (b) wordRange and refrainId are present in the canonical schemas
Net: +550 / -823 = -273 lines (mostly from de-duplication).
Context: why this PR exists
Per the 2026-05-16 archaeology (which found the prior compiler-consolidation Phases 1-3 partially overstated), the original CONSOLIDATION.md design called for moving schemas as part of Phase 2 step 4, but commit c43d656 (Phase 2a+2b) shipped only the utilities and pass-runner moves; the schema reconciliation was silently deferred. This PR retroactively makes Phase 2c explicit and closes the gap that was masking a real production bug.
The remaining 4 PRs of the 5-PR refactor plan:
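- PR B: LLM caller migration (services/compiler/llm.ts → canonical)
- PR C: dictionary/segments/skeleton migration
- PR D: orchestrator port
- PR E: final shim cleanup
Test plan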
- … (tsx scripts/sutta-studio/run-phase-experiment.ts --phase phase-2) produces the same output shape (manual)
🤖 Generated with Claude Code