fix(selector): arg-move cycles via the parallel-move resolver — exhaustion gone AND a latent swap miscompile killed (#326)#327

Merged
avrabe merged 1 commit into main from fix/326-arg-move-resolver on Jun 11, 2026

@avrabe avrabe commented Jun 11, 2026


gale's #326, root-caused two layers deep

The exhaustion site is the cycle-breaking scratch in emit_arg_moves: with #311's (correct) pair tagging saturating R4–R8, a genuine arg cycle had nowhere to park and the Err propagated. The principled fix wires VCR-RA-004's resolver into its designed first consumer: cycles now break via a register scratch when one is free, and otherwise via a stack-scratch cell, so the structural demand for a callee-saved register is gone.

The investigation also found that the old cycle-breaker emitted wrong code: emit_parallel_move parked the cycle source but clobbered the value the next move still needed, so a 2-swap produced R0=v1, R1=v1 (verified by simulation). The bug stayed latent because marshal cycles are rare; the replacement removes it. (The RISC-V copy does not share the bug; it was checked and left untouched.)
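
The failure mode can be reproduced on a toy register file. The sketch below is not the old emitter's code: it assumes one move sequence consistent with the description above (park the cycle source, overwrite the cycle destination, then read the overwritten register) and compares it with a swap that parks the value about to be overwritten. The function names and the three-register model are illustrative assumptions.

```rust
/// Apply register moves strictly in order; each pair is (dst, src).
/// `regs` is a toy register file: index 0 = R0, 1 = R1, 2 = scratch.
fn apply_sequential(regs: &mut [u32; 3], moves: &[(usize, usize)]) {
    for &(dst, src) in moves {
        regs[dst] = regs[src];
    }
}

/// Assumed shape of the miscompile for the cycle R0 <- R1, R1 <- R0:
/// the cycle source (R1) is parked, R0 is overwritten from the scratch,
/// and the second move then reads the already-overwritten R0.
fn clobbering_swap(v0: u32, v1: u32) -> (u32, u32) {
    let mut regs = [v0, v1, 0];
    apply_sequential(&mut regs, &[(2, 1), (0, 2), (1, 0)]);
    (regs[0], regs[1])
}

/// Correct swap: park the value that is about to be overwritten (R0),
/// move R1 into R0, then restore the parked value into R1.
fn parked_swap(v0: u32, v1: u32) -> (u32, u32) {
    let mut regs = [v0, v1, 0];
    apply_sequential(&mut regs, &[(2, 0), (0, 1), (1, 2)]);
    (regs[0], regs[1])
}
```

With v0 = 10 and v1 = 20, `clobbering_swap` leaves both registers holding 20 (the R0=v1, R1=v1 symptom), while `parked_swap` yields (20, 10).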

Evidence

  • NEW scripts/repro/mutex_pressure.wat (three live i64 pairs plus a param/result swap) reproduces gale's exact error message on main on the first try. On the branch it compiles and passes 7/7 unicorn-vs-wasmtime; the marshal is visible as str/mov/ldr, verified with capstone.
  • Fixtures sha-identical ×7 (incl. all pressure lanes — acyclic with-scratch output byte-identical by pinning the resolver's emission order); frozen differentials PASS; 387/387 lib + workspace green; clippy/fmt clean.
  • Two more latent holes closed en route: free_callee_saved could return an arg source (resolver filters against the move set); the no-spill-area i32 cycle edge now returns the ladder-recoverable Err instead of aliasing frame slots.

Honest limits

The resolver slot is drawn from the 8-slot pool (pool exhaustion stays a clean Err); calls with more than 4 args or with i64 args remain under the pre-existing #195 bound; the repro pins pairs via i64 constants rather than gale's import-call results, which gives the same pressure class, the same site, and the same message.

Rides v0.11.40 with #325 — the acceptance release unblocks gale's k_mutex_unlock lane (native ref 124 cyc, staged for same-day silicon).

🤖 Generated with Claude Code

… no callee-saved demand (#326)

gale's dissolved z_impl_k_mutex_unlock stopped compiling on v0.11.36: the
#311 call-result pair tagging legitimately keeps i64 pairs live across the
surrounding code, and when a later call's argument marshalling contains a
genuine register cycle, emit_arg_moves demanded a free callee-saved cycle
scratch (free_callee_saved -> Err) with R4-R8 all pinned — an exhaustion
class the 3b-lite retry ladder never matched.

Fix: emit_arg_moves now builds the same move set and hands it to the
v0.11.38 parallel-move resolver (synth_synthesis::parallel_move), which
breaks cycles WITHOUT a register when none is free: one SpillState slot,
lowered as `str rX, [sp, #slot]; mov...; ldr rY, [sp, #slot]` (slot freed
after the sequence). A free callee-saved register, when one exists, is
still passed as the scratch candidate and produces pure MOVs.
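
A minimal sketch of this kind of resolver, assuming unique destinations and a caller that has already allocated the stack slot. The `Op` enum, the `resolve` signature, and the choice to start each cycle at its lowest destination are illustrative assumptions, not the API of synth_synthesis::parallel_move.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, PartialEq, Eq)]
enum Op {
    Mov { dst: u8, src: u8 },   // mov rDst, rSrc
    Str { src: u8, slot: u32 }, // str rSrc, [sp, #slot]
    Ldr { dst: u8, slot: u32 }, // ldr rDst, [sp, #slot]
}

/// Order a set of simultaneous (dst, src) register moves.
/// `scratch` is a candidate free register; `slot` is a stack offset
/// used only when no usable scratch register exists.
fn resolve(moves: &[(u8, u8)], scratch: Option<u8>, slot: u32) -> Vec<Op> {
    // dst -> src, with no-op self-moves dropped.
    let mut pending: BTreeMap<u8, u8> =
        moves.iter().copied().filter(|(d, s)| d != s).collect();
    // Reject a scratch candidate that appears anywhere in the move set.
    let scratch = scratch.filter(|r| !pending.iter().any(|(d, s)| d == r || s == r));
    let mut out = Vec::new();

    while !pending.is_empty() {
        // Phase 1: a move is ready when no pending move still reads its dst.
        let ready: BTreeSet<u8> = pending
            .keys()
            .copied()
            .filter(|d| !pending.values().any(|s| s == d))
            .collect();
        if let Some(&dst) = ready.iter().next() {
            let src = pending.remove(&dst).expect("ready dst is pending");
            out.push(Op::Mov { dst, src });
            continue;
        }
        // Phase 2: only cycles remain. Park the first destination's old value.
        let start = *pending.keys().next().expect("pending is non-empty");
        out.push(match scratch {
            Some(r) => Op::Mov { dst: r, src: start },
            None => Op::Str { src: start, slot },
        });
        let mut cur = start;
        loop {
            let src = pending.remove(&cur).expect("cycle member is pending");
            if src == start {
                // The parked value closes the cycle.
                out.push(match scratch {
                    Some(r) => Op::Mov { dst: cur, src: r },
                    None => Op::Ldr { dst: cur, slot },
                });
                break;
            }
            out.push(Op::Mov { dst: cur, src });
            cur = src;
        }
    }
    out
}
```

For the r0/r1 swap with no scratch register and slot 0x18, this sketch yields `str r0, [sp, #0x18]; mov r0, r1; ldr r1, [sp, #0x18]`; with a free r4 passed as the candidate it yields three plain MOVs.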

Two latent bugs die with the old emitter:
  * its cycle-break was WRONG CODE — it parked cycle_src but clobbered
    cycle_dst whose old value the next move still needed (a 2-swap left
    arg1 = arg0's value; verified by simulation). The RISC-V copy in
    synth-backend-riscv does NOT share the bug (it defers through scratch
    correctly) and is untouched.
  * free_callee_saved could hand back a register that WAS an arg source
    (args are popped before the scratch query); the resolver filters its
    scratch against the move set instead.

Bit-identity: the resolver's phase 1 now pops the lowest-destination ready
move first (BTreeSet), which is exactly the legacy scan order for the
ascending-destination arg lists — acyclic marshals (every function that
compiled on main) emit identical bytes. Cycle marshals previously either
Err'd (this bug) or miscompiled (above), so no correct output changes.
7 fixtures sha256-identical vs origin/main (control_step, flight_seam,
flight_seam_flat, high_pressure_i32/i64, u64_unpack, u64_unpack_inlined);
the 3 frozen differentials PASS (13/13, 0x07FDF307 x2).
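
The ordering argument can be illustrated in isolation. The sketch below assumes every move in the list is ready (an acyclic marshal with independent sources) and compares a front-to-back scan with popping the smallest destination from a BTreeSet; both function names are hypothetical.

```rust
use std::collections::BTreeSet;

/// Legacy-style order: walk the arg list front to back.
/// The list is assumed to be ascending by destination register.
fn scan_order(moves: &[(u8, u8)]) -> Vec<u8> {
    moves.iter().map(|&(dst, _)| dst).collect()
}

/// Resolver-style order: keep ready destinations in a BTreeSet and
/// always pop the smallest one, regardless of insertion order.
fn lowest_dst_first(moves: &[(u8, u8)]) -> Vec<u8> {
    let mut ready: BTreeSet<u8> = moves.iter().map(|&(dst, _)| dst).collect();
    let mut order = Vec::new();
    while let Some(dst) = ready.pop_first() {
        order.push(dst);
    }
    order
}
```

For an ascending-destination list the two orders coincide, which is why pinning the pop order keeps acyclic output byte-identical; an unordered worklist would give no such guarantee.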

i32-only edge: a cycle needing the slot in a function whose first pass
reserved NO spill area fails with the ladder-recoverable exhaustion Err
(SpillState.area_reserved, mirrored from compute_local_layout) instead of
silently aliasing the param-backing slots; the backend retry reserves the
area and the resolver then succeeds.
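
The recover-and-retry behaviour can be sketched as follows. Only the names SpillState and area_reserved and the 8-slot pool size come from the text above; the error type, the slot offsets, and the two-rung ladder are assumptions made for illustration.

```rust
#[derive(Debug, PartialEq, Eq)]
enum SelectError {
    /// Recoverable: the caller may retry with a spill area reserved.
    SpillExhausted,
}

struct SpillState {
    area_reserved: bool,
    free_slots: Vec<u32>,
}

impl SpillState {
    fn new(area_reserved: bool) -> Self {
        // Eight 4-byte slots, pushed highest-first so pop() returns the lowest.
        let free_slots = if area_reserved {
            (0..8u32).rev().map(|i| i * 4).collect()
        } else {
            Vec::new()
        };
        SpillState { area_reserved, free_slots }
    }

    /// Refuse to hand out a slot when no area was reserved, instead of
    /// returning an offset that would alias some other frame slot.
    fn alloc_slot(&mut self) -> Result<u32, SelectError> {
        if !self.area_reserved {
            return Err(SelectError::SpillExhausted);
        }
        self.free_slots.pop().ok_or(SelectError::SpillExhausted)
    }
}

/// One selection pass: a cycle marshal needs exactly one slot.
fn select(has_cycle: bool, reserve_area: bool) -> Result<Option<u32>, SelectError> {
    let mut spill = SpillState::new(reserve_area);
    if has_cycle {
        spill.alloc_slot().map(Some)
    } else {
        Ok(None)
    }
}

/// Two-rung ladder: try without a spill area first, and retry with one
/// reserved if the pass reports the recoverable exhaustion error.
/// Returns the final result and the number of passes taken.
fn select_with_retry(has_cycle: bool) -> (Result<Option<u32>, SelectError>, u32) {
    match select(has_cycle, false) {
        Err(SelectError::SpillExhausted) => (select(has_cycle, true), 2),
        done => (done, 1),
    }
}
```

In this model a function with a cycle fails the first pass and succeeds on the second with slot offset 0, while a function without a cycle finishes in one pass and never reserves the area.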

Repro: scripts/repro/mutex_pressure.wat — three live i64 pairs pin
(r3,r4)/(r5,r6)/(r7,r8) across two calls, param reload lands r1, call
result lands r0, swap2(param, result) is a genuine r0/r1 swap. On
v0.11.36..39: the exact #326 Err. After this fix: compiles, marshal is
`str r0,[sp,#0x18]; mov r0,r1; ldr r1,[sp,#0x18]; bl`, and
mutex_pressure_differential.py (wasmtime vs unicorn, BL relocs resolved,
order-sensitive 2a-b callee) passes 7/7.

Tests: 387 lib (+5: swap-under-saturation spills-not-errs, with-scratch
register path, acyclic-saturated plain MOVs, no-area ladder Err, resolver
lowest-dst-first order pin); workspace green; clippy/fmt clean.

Honest bounds: >4 args / i64 args are still outside emit_arg_moves' scope
(pre-existing); the resolver slot comes from the 8-slot spill pool — pool
exhaustion mid-marshal remains a hard Err (same class as the #320 bound).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

codecov Bot commented Jun 11, 2026

Codecov Report

❌ Patch coverage is 94.78673% with 11 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/synth-synthesis/src/instruction_selector.rs | 93.52% | 11 Missing ⚠️ |

@avrabe avrabe merged commit b53c807 into main Jun 11, 2026
14 checks passed
@avrabe avrabe deleted the fix/326-arg-move-resolver branch June 11, 2026 14:43