Skip to content

feat(PlutusSchema): annotation-driven Plutus Data encoding#253

Open
solidsnakedev wants to merge 42 commits intomainfrom
worktree-schema-plutus-annotation
Open

feat(PlutusSchema): annotation-driven Plutus Data encoding#253
solidsnakedev wants to merge 42 commits intomainfrom
worktree-schema-plutus-annotation

Conversation

@solidsnakedev
Copy link
Copy Markdown
Collaborator

Plutus Data encoding currently requires TSchema-specific combinators (TSchema.Struct, TSchema.Variant, etc.) that users must learn on top of Effect Schema. There is no way to derive encoding from standard Effect Schema types, and no compile-time enforcement that all AST node types are handled.

This adds PlutusSchema — an annotation-driven compiler that derives Plutus Data encoding from any Effect Schema. Users write Plutus.data(Schema.Struct({...})) and the compiler walks the AST using Effect's canonical Match<A> + getCompiler pattern (same as Pretty, Arbitrary) to produce bidirectional codecs. Custom annotations (ConstrIndexId, FlatInUnionId, etc.) control encoding where inference isn't enough. The compiler handles all 22 AST node types exhaustively, throws for unsupported types (Date, Duration, String, Number) instead of silently passing through, and produces CBOR output byte-for-byte identical to TSchema. TSchema continues to work unchanged — this is additive.

New modules: PlutusAnnotation (5 annotation symbols + module augmentation), PlutusCompiler (internal AST compiler), PlutusSchema (public API). 175 tests covering annotations, compiler handlers, real-world types (Address, Credential, Value, CIP68), edge cases, and benchmarks.

Set up phased research loop to design TypeScript annotation system
for Plutus Data encoding/decoding using Effect Schema.

6 phases: annotation deep-dive, pattern catalog, candidate designs,
evaluation, prototype, edge cases.
Key findings:
- Match<A> + getCompiler() is the canonical derivation pattern
- Annotations attach to all AST nodes, custom via Symbol.for()
- Three derivation approaches: AST Compiler, Two-Phase, Annotation Hook
- Schema.suspend handles recursive types with memoized thunks
- TSchema already uses annotation-driven patterns internally
Cataloged 33 distinct encoding patterns across 8 categories:
primitives, collections, structs (8 variants), unions (6 variants),
nullable, literal, recursive, and composition patterns.

Includes real-world compositions and validation rules.
Candidate A: Annotation-Driven (AST Compiler)
Candidate B: Fluent Builder (thin TSchema wrapper)
Candidate C: Schema.Class Protocol (Haskell-like classes)
Candidate D: Hybrid (annotated Effect Schema + derive layer)

Each candidate includes full API examples covering all 33 patterns,
implementation sketches, and pros/cons analysis.
Weighted evaluation across 8 criteria (type safety, ergonomics,
completeness, recursion, compatibility, extensibility, Effect
alignment, migration).

Winner: Candidate D (Hybrid) at 48.5 — non-breaking dual-path
approach with type inference and Effect-native implementation.
Runner-up: A (39.5). Rejected: B (37.5), C (33.5).
Old phases 5-6 scrapped. Prototype was wrong: manual switch(ast._tag)
instead of Effect's Match<A> + getCompiler pattern with custom annotations.

New phases: study real Effect compiler impls, define annotation symbols,
build AST compiler, wire public API, edge cases, real-world validation.
Studied Pretty.ts (single-phase Match<A>), Arbitrary.ts (two-phase
Description), Schema.equivalence (manual switch), SchemaAST types,
and memoizeThunk. Decision: use Pretty.ts single-phase pattern.
5 Symbol.for() annotation keys following Effect conventions:
ConstrIndexId, EncodingId, FlatInUnionId, FlatFieldsId, TagFieldId.
Curried getters + convenience helpers. 15 tests passing.
Match<PlutusCodec> + getCompiler following Pretty.ts pattern.
All 22 AST tags handled. Annotation-first for ConstrIndex,
FlatInUnion. memoizeThunk for Suspend recursion. 25 tests passing.
Plutus.data(), makeIsData(), makeIsDataIndexed(), codec().
Annotation re-exports. TSchema interop via Schema.encodeSync.
24 PlutusSchema tests + all 161 evolution tests passing.
27 edge case tests: deep recursion, nested options, non-sequential
indices, tag field control, TSchema mixing, performance benchmarks.
Limitations documented: Map, flatFields, mutual recursion.
Byte-for-byte CBOR match: OutputReference, Credential,
StakeCredential, Address, Value, CIP68Metadata. 26 tests.
All 10 research phases complete. 117 new tests total.
Adversarial review phase: question compiler pattern, type safety,
annotation coverage, try to break it with edge cases, compare
with Haskell, benchmark against TSchema, review error quality.
35 challenge tests. Bug fixed: Schema.Record now throws instead of
silently dropping data. Validated: compiler pattern, type safety,
error quality, Haskell complex types (TxInfo, NativeScript).
249 total tests passing.
Repeating phase with priority-ordered backlog: encode/decode
overhead, flatFields, Schema.Class, Map auto-derivation, Effect
error channel, mutual recursion, module augmentation, docs.
tschemaFastCodec() bypasses Schema.encodeSync for known TSchema
types (Boolean, NullOr, UndefinedOr). Encode with TSchema.Boolean
field now within 3x of pure TSchema (was 5x). 250 tests passing.
TypeLiteral handler now supports FlatFieldsId annotation and
TSchema backward compat. Inner struct fields inlined into parent
Constr during encoding, reconstructed during decoding.
4 new tests, 254 total passing.
Transformation handler detects Transformation(TypeLiteral, Declaration)
pattern and compiles from-side TypeLiteral. TaggedClass _tag field
auto-stripped. 254 tests passing.
Declaration handler detects Map/MapFromSelf via Description
annotation. Compiles key/value codecs recursively. Nested maps,
maps in structs, CBOR byte-match with TSchema.Map. 259 tests.
Mutual recursion already works via memoizeThunk + Schema.suspend.
Tested Expr/BinOp and A→B→A patterns. Effect error channel
deferred — raw throws already caught by Data.withSchema. 261 tests.
declare module "effect/SchemaAST" extends Annotations interface
with ConstrIndexId, EncodingId, FlatInUnionId, FlatFieldsId,
TagFieldId. Autocomplete + type checking in .annotations() calls.
262 tests passing.
Side-by-side examples for all patterns: primitives, struct,
union, option, map, array, recursive, Schema.Class, codec.
All 8 backlog items complete.
PlutusCompiler.ts: 10→0, PlutusSchema.ts: 4→0. Used discriminated
union narrowing for AST types, narrower casts with documented
reasons for return types. TypeScript clean. 262 tests passing.
Recursive schemas use explicit encoded type annotation in suspend
thunk — matches Effect's own test pattern. Remaining 2 are
intentional wrong-type error tests. 262 tests passing.
Remove stale CRITICAL warning about wrong prototype. Add "backlog
empty" instruction. Add no-as-any rule to loop execution rules.
Handler-by-handler audit: TypeLiteral, Union, TupleType, Suspend,
Transformation, Declaration, Literal, Map, flatFields edge cases.
Handler-by-handler audit: TypeLiteral, Union, TupleType, Suspend,
Literal, Map, flatFields, Transformation, roundtrip stress.
All degenerate inputs produce correct output. 292 tests passing.
Declaration handler no longer silently passes through. Throws by
default for unrecognized types (Date, Duration, FiberId, etc.).
Added Set→Set, List/Chunk→Array, HashMap/ReadonlyMap→Map
detection via Description prefixes. 300 tests passing.
Profile hot path, eliminate Schema.transform overhead, cache
compiled codecs, benchmark realistic workloads, report actual
numbers instead of loose threshold assertions.
5000-iteration benchmarks with proper warmup. Results: 2-field 1.0x,
10-field 1.0x, Address 0.7x (faster!), decode 1.0x, CBOR 1.0x.
Earlier 3-5x was warmup artifact. 308 tests passing.
Backlog items 14-16 from cross-ecosystem comparison with Haskell
PlutusTx, Rust CML derive macros, and Scalus.
makeEnum("Red", "Green", "Blue") auto-generates makeIsDataIndexed
with empty fields and indices from declaration order. CBOR matches
manual equivalent. 4 new tests, 312 total passing.
Users should compose from primitives, not accumulate convenience
wrappers. Newtype: use raw schema directly. Auto-index: explicit
indices are clearer and less fragile than implicit key order.
Added .loop-phase file for state tracking. Added health check
section (tests + tsc before every commit). Cleaned backlog to
summary table. Added watchdog mode for empty backlog. Added
transition rules, draft-before-commit, no-convenience-wrapper
rules. Condensed completed phases to one-liners.
312 tests pass, zero TS errors, no coverage gaps in compiler
pass-through sites. Backlog empty, no regressions.
Users compose from primitives: Plutus.data() + Schema.Struct/Union
+ ConstrIndexId/FlatInUnionId annotations. No convenience wrappers.
All 6 test files rewritten. 312 tests pass, zero TS errors.
Phase 1: consolidate 8 test files → 1 polished file
Phase 2: polish production code (JSDoc, no dead code)
Phase 3: wire exports (index.ts, package.json)
Phase 4: final review (read as reviewer)
Phase 5: PR prep (squash, description, rebase)
6 sections: Annotations, Compiler, Public API, Real-world types,
Edge cases, Benchmarks. 176 tests in one file, 273 total passing.
Removed duplicates and redundant enum comparison test.
Removed fromSchema alias, duplicate Variant export. Fixed stale
header comment. Narrowed applyAnnotations from any to SchemaAST.AST.
Removed fromSchema test. Zero as any in production. 272 tests.
Added to index.ts barrel exports. Fixed applyAnnotations to use
SchemaAST.annotations() (build-safe). PlutusCompiler stays internal.
Build passes (225 files). 272 tests passing.
5 phases done: test consolidation, code polish, export wiring,
final review, PR prep. Branch ready for rebase + PR.
Copilot AI review requested due to automatic review settings April 16, 2026 02:28
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an annotation-driven Plutus Data derivation layer on top of Effect Schema (PlutusSchema), backed by an internal AST compiler (PlutusCompiler) and a small annotation module (PlutusAnnotation), plus a large consolidated test suite validating byte-for-byte CBOR compatibility with existing TSchema.

Changes:

  • Introduces PlutusAnnotation (symbol-based annotations + module augmentation) to control encoding details.
  • Implements PlutusCompiler (SchemaAST.Match + getCompiler) to derive bidirectional toData / fromData codecs from Effect Schema AST.
  • Adds PlutusSchema public API (data(), codec(), re-exports) and wires exports through the package barrel.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
packages/evolution/src/PlutusAnnotation.ts Defines Plutus-specific annotation symbols, getters/helpers, and SchemaAST module augmentation.
packages/evolution/src/PlutusCompiler.ts Internal AST compiler producing Plutus Data codecs from annotated Effect Schema AST nodes.
packages/evolution/src/PlutusSchema.ts Public API wrapping the compiler as a Schema<A, Data.Data> transform and re-exporting TSchema primitives.
packages/evolution/src/index.ts Exposes PlutusAnnotation and PlutusSchema via barrel exports.
packages/evolution/test/PlutusData.test.ts Consolidated test suite for annotations, compiler handlers, public API, real-world types, edge cases, and benchmarks.
.claude/research/* Research/design logs documenting implementation rationale and cleanup loop progress.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +174 to +183
const ps = typeLiteral.propertySignatures
// Count non-tag fields (same logic as the TypeLiteral handler)
let count = 0
for (const p of ps) {
const name = p.name as string
if ((KNOWN_TAG_FIELDS as ReadonlyArray<string>).includes(name)) {
// Check if it's actually a literal tag
if (p.type._tag === "Literal") continue
if (p.type._tag === "Transformation" && p.type.to._tag === "Literal") continue
}
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

countStructFields() is used to determine how many parent Constr fields to slice when decoding flatFields, but it hard-codes tag stripping based on KNOWN_TAG_FIELDS only. This can miscount when the nested struct uses TagFieldId overrides (custom tag field name) or explicitly disables tag stripping (TagFieldId: false), causing misaligned slicing and incorrect decoding for flatFields structs.

Suggestion: make countStructFields use the same tag detection logic as the TypeLiteral handler (i.e., respect PA.getTagField(ast) and isLiteralTag(ps, tagFieldOverride)), or derive the count from the already-compiled nested struct codec metadata instead of re-implementing tag logic.

Suggested change
const ps = typeLiteral.propertySignatures
// Count non-tag fields (same logic as the TypeLiteral handler)
let count = 0
for (const p of ps) {
const name = p.name as string
if ((KNOWN_TAG_FIELDS as ReadonlyArray<string>).includes(name)) {
// Check if it's actually a literal tag
if (p.type._tag === "Literal") continue
if (p.type._tag === "Transformation" && p.type.to._tag === "Literal") continue
}
const isLiteralTagField = (p: SchemaAST.PropertySignature): boolean =>
p.type._tag === "Literal" ||
(p.type._tag === "Transformation" && p.type.to._tag === "Literal")
const tagFieldOverride = PA.getTagField(ast)
const ps = typeLiteral.propertySignatures
// Count non-tag fields using the same tag-field semantics as the TypeLiteral handler:
// - explicit string override => only that field is considered the tag field
// - explicit false => never strip a tag field
// - otherwise fall back to well-known tag field names
let count = 0
for (const p of ps) {
const name = p.name as string
if (tagFieldOverride === false) {
count++
continue
}
if (typeof tagFieldOverride === "string") {
if (name === tagFieldOverride && isLiteralTagField(p)) continue
count++
continue
}
if ((KNOWN_TAG_FIELDS as ReadonlyArray<string>).includes(name) && isLiteralTagField(p)) {
continue
}

Copilot uses AI. Check for mistakes.
Comment on lines +526 to +544
toData: (a: any) => {
// Find matching member via tag field or trial
let memberIdx: number
if (tagField && tagMap && typeof a === "object" && a !== null) {
memberIdx = tagMap.get(String(a[tagField])) ?? 0
} else {
// Fallback: try each member's codec (first match wins)
memberIdx = 0
}

const member = memberCodecs[memberIdx]
const encoded = member.codec.toData(a)

if (member.isFlat && encoded instanceof Data.Constr) {
return new Data.Constr({ index: BigInt(member.index), fields: encoded.fields })
}

return new Data.Constr({ index: BigInt(member.index), fields: [encoded] })
},
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Union encoding currently doesn’t actually select the correct member: if no tag field is detected it always picks member 0, and even when a tag field is detected it falls back to member 0 when the tag value isn’t in tagMap. This makes unions like Schema.Union(Schema.BigIntFromSelf, Schema.Boolean) encode true using the first member’s codec, producing invalid/incorrect Plutus Data without an error.

Suggestion: implement real member selection (e.g., similar to TSchema.Union using Schema.is / ParseResult.is against each member AST), and throw a descriptive error when no member matches or when a detected tag value is unknown.

Copilot uses AI. Check for mistakes.
Comment on lines +549 to +559
// Find matching member by index
const flatMember = memberCodecs.find((m) => m.isFlat && m.index === idx)
if (flatMember) {
return flatMember.codec.fromData(d) // Flat: decode directly from Constr
}

// Non-flat: member at position idx, unwrap one level
const member = memberCodecs[idx]
if (!member) {
throw new Error(`PlutusCompiler: invalid union index ${idx}, expected 0..${memberCodecs.length - 1}`)
}
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Union decoding assumes constr.index is a positional index for non-flat members (memberCodecs[idx]), but encoding allows arbitrary ConstrIndexId on any member (memberIndex = getConstrIndex(t) ?? i). If a non-flat member has a custom index, encoding will emit Constr(customIndex, [encoded]) but decoding will either pick the wrong member or throw because idx is out of range.

Suggestion: decode by looking up the member by its declared member.index (for both flat and non-flat), rather than indexing into memberCodecs by position. Alternatively, constrain ConstrIndexId usage to flat members only and enforce/throw if a non-flat member has a custom index.

Suggested change
// Find matching member by index
const flatMember = memberCodecs.find((m) => m.isFlat && m.index === idx)
if (flatMember) {
return flatMember.codec.fromData(d) // Flat: decode directly from Constr
}
// Non-flat: member at position idx, unwrap one level
const member = memberCodecs[idx]
if (!member) {
throw new Error(`PlutusCompiler: invalid union index ${idx}, expected 0..${memberCodecs.length - 1}`)
}
// Find matching member by its declared constructor index
const member = memberCodecs.find((m) => m.index === idx)
if (!member) {
throw new Error(`PlutusCompiler: invalid union index ${idx}`)
}
if (member.isFlat) {
return member.codec.fromData(d) // Flat: decode directly from Constr
}
// Non-flat: unwrap the single encoded payload field

Copilot uses AI. Check for mistakes.
const booleanCodec: PlutusCodec = {
toData: (a: boolean) =>
a ? new Data.Constr({ index: 1n, fields: [] }) : new Data.Constr({ index: 0n, fields: [] }),
fromData: (d: Data.Data) => (d as Data.Constr).index === 1n
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

booleanCodec.fromData treats any Constr index other than 1n as false and doesn’t validate the Constr shape (index must be 0/1 and fields must be empty). This diverges from TSchema.Boolean which throws on invalid indices/fields, and can cause malformed data (e.g., Constr(3, [])) to silently decode as false.

Suggestion: add the same validation as TSchema.Boolean (reject indices other than 0/1 and non-empty fields) so invalid Plutus data doesn’t silently coerce to a boolean.

Suggested change
fromData: (d: Data.Data) => (d as Data.Constr).index === 1n
fromData: (d: Data.Data) => {
if (!(d instanceof Data.Constr)) {
throw new Error("Invalid boolean data: expected Constr")
}
if (d.fields.length !== 0) {
throw new Error("Invalid boolean data: expected empty Constr fields")
}
if (d.index !== 0n && d.index !== 1n) {
throw new Error(`Invalid boolean data: expected Constr index 0 or 1, got ${String(d.index)}`)
}
return d.index === 1n
}

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants