refactor(sutta-studio): consolidate response schemas to one canonical location (PR A) by anantham · Pull Request #62 · anantham/LexiconForge

anantham · 2026-05-16T15:24:17Z

Summary

The 7 LLM response schemas (skeleton, anatomist, lexicographer, weaver, typesetter, phase, morph) lived in two files with the same names but divergent bodies. The production schema in services/compiler/schemas.ts was stale; the "bench" copy in services/suttaStudioPassPrompts.ts had two extra fields (wordRange, refrainId) that production already needed.

This PR creates services/sutta-studio/schemas.ts as the single canonical source. Both legacy locations become thin re-export shims (to be deleted in the Phase 4 cleanup, planned as PR E).

Why this is a real bug fix, not just refactor purity

Both wordRange and refrainId are production features:

| Field | Production usage |
| --- | --- |
| `wordRange` | `services/sutta-studio/utils.ts:255` (canonical `applyWordRangeToSegments`), `config/suttaStudioPromptContext.ts:52-68` (production prompt context), `tests/services/compiler/utils.test.ts:181` (5+ test cases). ~35 of MN10's 51 phases use it. |
| `refrainId` | `types/suttaStudio.ts:194,352` (top-level types), `config/suttaStudioExamples.ts:277,324-326` (golden examples), `CURATION_PROTOCOL.md:217` (curation rule), ADR-003 §300/§392 (specifies `refrainColors` feature flag). Actively used in MN10 phases a-h curation. |
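
The commit message for this PR describes `wordRange` as `[start, end)` indices for sub-segment splitting. As a minimal sketch of that half-open convention only, assuming whitespace-separated words: `sliceWords` and the sample text are hypothetical and are not the project's `applyWordRangeToSegments`.

```typescript
// Hypothetical helper for illustration; NOT the project's applyWordRangeToSegments.
// A word range is half-open: start is inclusive, end is exclusive.
type WordRange = [number, number];

function sliceWords(text: string, range: WordRange): string {
  const [start, end] = range;
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const valid =
    Number.isInteger(start) &&
    Number.isInteger(end) &&
    start >= 0 &&
    end <= words.length &&
    start < end;
  if (!valid) {
    throw new RangeError(
      `invalid wordRange [${start}, ${end}) for ${words.length} words`
    );
  }
  return words.slice(start, end).join(" ");
}

// Placeholder text; two adjacent ranges cover the segment without overlap.
const segment = "one two three four five";
const first = sliceWords(segment, [0, 2]); // "one two"
const rest = sliceWords(segment, [2, 5]); // "three four five"
```

With half-open ranges, the `end` of one sub-segment can be reused as the `start` of the next, so adjacent phases never share or skip a word.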

The production compiler prompted the LLM for these fields but didn't enforce them via schema. So the LLM emitted them inconsistently. Tightening the schema closes the gap.
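
One way such a prompt/schema mismatch can surface is sketched below. This assumes JSON-Schema-style objects checked with `additionalProperties: false`; the real schema bodies and the provider's enforcement mode are not shown in this PR, so `ObjectSchema`, `violations`, the `id` field, and both schema literals are hypothetical.

```typescript
// Assumption: schemas are JSON-Schema-style objects. Shapes are illustrative only.
interface ObjectSchema {
  type: "object";
  properties: Record<string, { type: string }>;
  required: string[];
  additionalProperties: boolean;
}

// Stand-in for a stale schema that does not know about wordRange.
const stalePhaseSchema: ObjectSchema = {
  type: "object",
  properties: { id: { type: "string" } },
  required: ["id"],
  additionalProperties: false,
};

// Stand-in for the canonical schema: same shape plus an optional wordRange.
const canonicalPhaseSchema: ObjectSchema = {
  ...stalePhaseSchema,
  properties: { ...stalePhaseSchema.properties, wordRange: { type: "array" } },
};

// Minimal check: report missing required keys and keys the schema never declared.
function violations(
  schema: ObjectSchema,
  value: Record<string, unknown>
): string[] {
  const problems: string[] = [];
  for (const key of schema.required) {
    if (!(key in value)) problems.push(`missing required "${key}"`);
  }
  if (!schema.additionalProperties) {
    for (const key of Object.keys(value)) {
      if (!(key in schema.properties)) problems.push(`undeclared "${key}"`);
    }
  }
  return problems;
}

// A response that follows the prompt and includes wordRange.
const llmPhase = { id: "phase-2", wordRange: [0, 4] };
const staleProblems = violations(stalePhaseSchema, llmPhase);
const canonicalProblems = violations(canonicalPhaseSchema, llmPhase);
```

In this sketch the stale schema flags `wordRange` as undeclared while the canonical one accepts it, which is the kind of disagreement between prompt and schema that a single source of truth removes.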

Scope (mechanical, no product decision)

| Change | Detail |
| --- | --- |
| New canonical file | `services/sutta-studio/schemas.ts` (+434 lines, verbatim from bench copy) |
| 9 consumer updates | 6 canonical passes (`services/sutta-studio/passes/*.ts`), 2 legacy compiler files (`services/compiler/{index,skeleton}.ts`), 1 benchmark script; all imports rewritten to the canonical path |
| Two shrinkages | `services/compiler/schemas.ts` 401→19 lines (re-export shim); `services/suttaStudioPassPrompts.ts` 441→45 lines (already a shim for prompts+utils; now also re-exports schemas) |
| New contract test | `tests/services/sutta-studio/schemas-canonical.test.ts`: 9 tests asserting (a) all three locations export the SAME object reference, (b) `wordRange` and `refrainId` are present in canonical |
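
The "same object reference" assertion is what separates a re-export shim from a copied definition. A single-file sketch of the idea follows; the three modules are modeled as plain objects, and the schema shape is an illustrative assumption rather than the contents of `services/sutta-studio/schemas.ts`.

```typescript
// Single-file stand-in for separate modules. The schema body is illustrative only.
const skeletonResponseSchema = {
  type: "object",
  properties: { wordRange: { type: "array" } },
};

// Stand-in for the canonical module's exports.
const canonical = { skeletonResponseSchema };

// A re-export shim forwards the same binding. In a real module this might read
// (assumed form): export { skeletonResponseSchema } from "../sutta-studio/schemas";
const compilerShim = {
  skeletonResponseSchema: canonical.skeletonResponseSchema,
};

// A duplicated definition is a different object even when its contents match,
// so nothing stops it from drifting later.
const duplicatedCopy = {
  skeletonResponseSchema: JSON.parse(
    JSON.stringify(skeletonResponseSchema)
  ) as typeof skeletonResponseSchema,
};

// Identity checks (===) rather than deep equality.
const shimIsSameReference =
  compilerShim.skeletonResponseSchema === canonical.skeletonResponseSchema;
const copyIsSameReference =
  duplicatedCopy.skeletonResponseSchema === canonical.skeletonResponseSchema;

// Field-presence check, analogous to part (b) of the contract test.
const hasWordRange =
  "wordRange" in canonical.skeletonResponseSchema.properties;
```

In a vitest suite the identity check would typically use `expect(a).toBe(b)`, which compares with `Object.is`, rather than `toEqual`, which would also pass for a drift-prone copy.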

Net: +550 / -823 = -273 lines (mostly from de-duplication).

Context: why this PR exists

Per the 2026-05-16 archaeology (the prior compiler-consolidation Phases 1-3 were partially overstated), the original CONSOLIDATION.md design called for moving schemas as part of Phase 2 step 4. Commit c43d656 (Phase 2a+2b) shipped only the utilities and pass-runner moves, and the schema reconciliation was deferred without being documented. This PR retroactively makes Phase 2c explicit and closes the gap that was masking a real production bug.

The remaining 4 PRs of the 5-PR refactor plan:

PR B: LLM caller migration (services/compiler/llm.ts → canonical)
PR C: Dictionary + segments + skeleton migration
PR D: Orchestrator port (the big one — 773 lines)
PR E: Final shim cleanup (the original Phase 4)

Test plan

1425 unit/integration tests pass (vitest, 47s) — same as main
9 new contract tests pass (schemas-canonical.test.ts)
CI workflow from #61 ("ci: add test gate workflow — runs vitest on PRs"), if merged before this PR, gates the suite automatically
Smoke-test the live UI compile on a sutta route (manual; recommend before merge)
Benchmark CLI smoke (tsx scripts/sutta-studio/run-phase-experiment.ts --phase phase-2) produces the same output shape (manual)

🤖 Generated with Claude Code

… location (PR A / Phase 2c)

The 7 LLM response schemas (skeleton, anatomist, lexicographer, weaver, typesetter, phase, morph) used to live in TWO files with the same names but divergent bodies:
- services/compiler/schemas.ts (legacy, treated as "production")
- services/suttaStudioPassPrompts.ts (treated as "bench")

The "bench" copy had two fields the "production" copy lacked:
- wordRange (skeleton): [start, end) indices for sub-segment splitting
- refrainId (anatomist): tag for recurring-phrase styling

Both fields are PRODUCTION features (not bench artifacts):
- applyWordRangeToSegments lives in services/sutta-studio/utils.ts and is called by the production orchestrator
- config/suttaStudioPromptContext.ts already tells the LLM to emit wordRange for split phases
- types/suttaStudio.ts has refrainId in the top-level types
- config/suttaStudioExamples.ts uses refrainId in golden examples
- CURATION_PROTOCOL.md §217 codifies the refrainId rule

The result was a real bug: production prompted the LLM for these fields via prompt context but didn't enforce them via schema, so the LLM emitted them inconsistently.

This PR creates services/sutta-studio/schemas.ts as the single canonical source. All 9 consumers (6 canonical passes + 2 legacy compiler files + 1 benchmark script) now import from there. The two legacy locations become thin re-export shims (deleted in Phase 4 cleanup).

Adds a contract test (schemas-canonical.test.ts) asserting:
- All three locations export the SAME object reference (no future drift)
- skeletonResponseSchema phases include wordRange
- anatomistResponseSchema words include refrainId

Test results: 1425 passing / 0 failing (same as main, +9 new tests).

Per the 5-PR refactor plan from 2026-05-16 archaeology: this is PR A (schema reconciliation). PRs B-E (LLM caller migration, dictionary/segments/skeleton migration, orchestrator port, final shim cleanup) remain to fully complete #40 / #45.

Closes Phase 2c (a substep of Phase 2 that was silently deferred in c43d656; surfacing it explicitly now per CONSOLIDATION.md design).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel · 2026-05-16T15:24:22Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
lexicon-forge	Ready	Preview, Comment	May 16, 2026 3:24pm

vercel Bot deployed to Preview May 16, 2026 15:24 View deployment

anantham merged commit 0d81bec into main May 16, 2026
3 checks passed

anantham mentioned this pull request May 17, 2026

refactor(sutta-studio): consolidate LLM caller (PR B / Phase 3) #63

Open

3 tasks

anantham merged 1 commit into main from feat/opus-schema-reconcile
