
refactor(sutta-studio): consolidate response schemas to one canonical location (PR A)#62

Merged
anantham merged 1 commit into main from feat/opus-schema-reconcile
May 16, 2026


@anantham
Owner

Summary

The 7 LLM response schemas (skeleton, anatomist, lexicographer, weaver, typesetter, phase, morph) lived in two files with the same names but divergent bodies — production schema in services/compiler/schemas.ts was stale; the "bench" copy in services/suttaStudioPassPrompts.ts had two extra fields (wordRange, refrainId) that production already needed.

This PR creates services/sutta-studio/schemas.ts as the single canonical source. Both legacy locations become thin re-export shims, to be deleted in the Phase 4 cleanup (planned as PR E).

Why this is a real bug fix, not just refactor purity

Both wordRange and refrainId are production features:

| Field | Production usage |
| --- | --- |
| wordRange | services/sutta-studio/utils.ts:255 (canonical applyWordRangeToSegments), config/suttaStudioPromptContext.ts:52-68 (production prompt context), tests/services/compiler/utils.test.ts:181 (5+ test cases). ~35 of MN10's 51 phases use it. |
| refrainId | types/suttaStudio.ts:194,352 (top-level types), config/suttaStudioExamples.ts:277,324-326 (golden examples), CURATION_PROTOCOL.md:217 (curation rule), ADR-003 §300/§392 (specifies refrainColors feature flag). Actively used in MN10 phases a-h curation. |
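
The commit message for this PR describes wordRange as `[start, end)` indices for sub-segment splitting. As a rough illustration of that half-open convention, here is a minimal sketch. `sliceWordsByRange` is a hypothetical helper written for this example; the real applyWordRangeToSegments works on segments and may have a different signature and different edge-case handling.

```typescript
// Hypothetical helper, NOT the project's applyWordRangeToSegments.
// A word range is half-open: start is inclusive, end is exclusive.
type WordRange = [number, number];

function sliceWordsByRange(text: string, range: WordRange): string {
  const words = text.trim().split(/\s+/);
  const [start, end] = range;
  const valid =
    Number.isInteger(start) &&
    Number.isInteger(end) &&
    start >= 0 &&
    end <= words.length &&
    start < end;
  if (!valid) {
    throw new RangeError(
      `invalid wordRange [${start}, ${end}) for ${words.length} words`
    );
  }
  // Array.prototype.slice is itself half-open, so it maps directly.
  return words.slice(start, end).join(" ");
}

// Two phases splitting one five-word segment without overlap or gap.
const line = "one two three four five";
const firstHalf = sliceWordsByRange(line, [0, 2]);
const secondHalf = sliceWordsByRange(line, [2, 5]);
console.log(firstHalf);  // one two
console.log(secondHalf); // three four five
```

With half-open ranges, adjacent phases share a boundary index (`2` above), so a segment can be partitioned without off-by-one adjustments.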

The production compiler prompted the LLM for these fields but didn't enforce them via schema. So the LLM emitted them inconsistently. Tightening the schema closes the gap.
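
To make the prompt-versus-schema gap concrete, the sketch below declares wordRange in a schema fragment and checks a response against it. Everything here is an assumption for illustration: `phaseSchemaSketch`, `PhaseSketch`, and `checkPhase` are hypothetical names, the JSON-Schema-like shape may not match the real skeletonResponseSchema, and treating wordRange as optional is a guess.

```typescript
// Hypothetical, simplified schema fragment (JSON-Schema-like).
// The real schema in services/sutta-studio/schemas.ts may differ.
const phaseSchemaSketch = {
  type: "object",
  properties: {
    id: { type: "string" },
    wordRange: {
      type: "array",
      items: { type: "integer" },
      minItems: 2,
      maxItems: 2,
    },
  },
  required: ["id"],
} as const;

type PhaseSketch = { id: string; wordRange?: unknown };

// Returns a list of problems; an empty list means the phase conforms.
function checkPhase(phase: PhaseSketch): string[] {
  const problems: string[] = [];
  const range = phase.wordRange;
  if (range === undefined) return problems; // optional in this sketch
  const spec = phaseSchemaSketch.properties.wordRange;
  if (
    !Array.isArray(range) ||
    range.length < spec.minItems ||
    range.length > spec.maxItems
  ) {
    problems.push(`${phase.id}: wordRange must be an array of 2 items`);
  } else if (!range.every((n) => Number.isInteger(n))) {
    problems.push(`${phase.id}: wordRange items must be integers`);
  } else if (range[0] < 0 || range[0] >= range[1]) {
    problems.push(`${phase.id}: wordRange must satisfy 0 <= start < end`);
  }
  return problems;
}

console.log(checkPhase({ id: "phase-a", wordRange: [0, 3] }).length); // 0
console.log(checkPhase({ id: "phase-b", wordRange: "0-3" }).length);  // 1
```

Once the field is declared in the schema, a malformed value such as the string `"0-3"` is caught mechanically rather than depending on the model following the prompt.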

Scope (mechanical, no product decision)

| Change | Detail |
| --- | --- |
| New canonical file | services/sutta-studio/schemas.ts (+434 lines, verbatim from bench copy) |
| 9 consumer updates | 6 canonical passes (services/sutta-studio/passes/*.ts), 2 legacy compiler files (services/compiler/{index,skeleton}.ts), 1 benchmark script; all imports rewritten to the canonical path |
| Two shrinkages | services/compiler/schemas.ts 401→19 lines (re-export shim); services/suttaStudioPassPrompts.ts 441→45 lines (already a shim for prompts+utils; now also re-exports schemas) |
| New contract test | tests/services/sutta-studio/schemas-canonical.test.ts: 9 tests asserting (a) all three locations export the SAME object reference, (b) wordRange and refrainId are present in canonical |
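
The idea behind the contract test can be sketched in a single file. This is not the contents of schemas-canonical.test.ts: the object names, the simplified schema shapes, and the `sharesReferences` and `hasField` helpers are all hypothetical, and plain objects stand in for the three modules. In the real test, an identity assertion such as vitest's `expect(a).toBe(b)` would presumably play the role of `===` here.

```typescript
// Simplified stand-ins for schema objects (hypothetical shapes).
type SchemaSketch = { properties: Record<string, unknown> };

const skeletonPhaseSchema: SchemaSketch = {
  properties: { id: {}, wordRange: {} },
};
const anatomistWordSchema: SchemaSketch = {
  properties: { id: {}, refrainId: {} },
};

// "canonical" stands in for the single source of truth.
const canonical = { skeletonPhaseSchema, anatomistWordSchema };

// A re-export shim hands out the same object references (shallow copy).
const compilerShim = { ...canonical };
const benchShim = { ...canonical };

// A duplicated copy with its own body, like the stale pre-PR schema.
const driftedCopy = {
  skeletonPhaseSchema: { properties: { id: {} } },
  anatomistWordSchema,
};

// True only if every key in `a` points at the identical object in `b`.
function sharesReferences(
  a: Record<string, SchemaSketch>,
  b: Record<string, SchemaSketch>
): boolean {
  return Object.keys(a).every((key) => a[key] === b[key]);
}

function hasField(schema: SchemaSketch, field: string): boolean {
  return Object.prototype.hasOwnProperty.call(schema.properties, field);
}

console.log(sharesReferences(canonical, compilerShim)); // true
console.log(sharesReferences(canonical, benchShim));    // true
console.log(sharesReferences(canonical, driftedCopy));  // false
console.log(hasField(canonical.skeletonPhaseSchema, "wordRange")); // true
console.log(hasField(canonical.anatomistWordSchema, "refrainId")); // true
```

Checking identity rather than deep equality is the stricter choice: a copy-pasted schema that happens to match today would pass a deep-equality check and could still drift later, whereas a re-export cannot.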

Net: +550 / -823 = -273 lines (mostly from de-duplication).

Context: why this PR exists

The 2026-05-16 archaeology found that the prior compiler-consolidation Phases 1-3 were partially overstated. The original CONSOLIDATION.md design called for moving schemas as part of Phase 2 step 4, but commit c43d656 (Phase 2a+2b) shipped only the utilities and pass-runner moves; the schema reconciliation was silently deferred, with no documented deferral. This PR retroactively makes Phase 2c explicit and closes the gap that was masking a real production bug.

The remaining 4 PRs of the 5-PR refactor plan:

  • PR B: LLM caller migration (services/compiler/llm.ts → canonical)
  • PR C: Dictionary + segments + skeleton migration
  • PR D: Orchestrator port (the big one — 773 lines)
  • PR E: Final shim cleanup (the original Phase 4)

Test plan

  • 1425 unit/integration tests pass (vitest, 47s) — same as main
  • 9 new contract tests pass (schemas-canonical.test.ts)
  • CI workflow from #61 ("ci: add test gate workflow", which runs vitest on PRs) gates the suite automatically, provided #61 is merged before this PR
  • Smoke-test the live UI compile on a sutta route (manual; recommend before merge)
  • Benchmark CLI smoke (tsx scripts/sutta-studio/run-phase-experiment.ts --phase phase-2) produces the same output shape (manual)

🤖 Generated with Claude Code

… location (PR A / Phase 2c)

The 7 LLM response schemas (skeleton, anatomist, lexicographer, weaver,
typesetter, phase, morph) used to live in TWO files with the same names
but divergent bodies:
- services/compiler/schemas.ts (legacy, treated as "production")
- services/suttaStudioPassPrompts.ts (treated as "bench")

The "bench" copy had two fields the "production" copy lacked:
  - wordRange (skeleton): [start, end) indices for sub-segment splitting
  - refrainId (anatomist): tag for recurring-phrase styling

Both fields are PRODUCTION features (not bench artifacts):
  - applyWordRangeToSegments lives in services/sutta-studio/utils.ts and
    is called by the production orchestrator
  - config/suttaStudioPromptContext.ts already tells the LLM to emit
    wordRange for split phases
  - types/suttaStudio.ts has refrainId in the top-level types
  - config/suttaStudioExamples.ts uses refrainId in golden examples
  - CURATION_PROTOCOL.md §217 codifies the refrainId rule

The result was a real bug: production prompted the LLM for these fields
via prompt context but didn't enforce them via schema, so the LLM emitted
them inconsistently.

This PR creates services/sutta-studio/schemas.ts as the single canonical
source. All 9 consumers (6 canonical passes + 2 legacy compiler files +
1 benchmark script) now import from there. The two legacy locations
become thin re-export shims (deleted in Phase 4 cleanup).

Adds a contract test (schemas-canonical.test.ts) asserting:
  - All three locations export the SAME object reference (no future drift)
  - skeletonResponseSchema phases include wordRange
  - anatomistResponseSchema words include refrainId

Test results: 1425 passing / 0 failing (same as main, +9 new tests).

Per the 5-PR refactor plan from 2026-05-16 archaeology: this is PR A
(schema reconciliation). PRs B-E (LLM caller migration, dictionary/
segments/skeleton migration, orchestrator port, final shim cleanup)
remain to fully complete #40 / #45.

Closes Phase 2c (a substep of Phase 2 that was silently deferred in
c43d656; surfacing it explicitly now per CONSOLIDATION.md design).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented May 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| lexicon-forge | Ready | Preview, Comment | May 16, 2026 3:24pm |

@anantham anantham merged commit 0d81bec into main May 16, 2026
3 checks passed