fix(selector): arg-move cycles via the parallel-move resolver; exhaustion gone AND a latent swap miscompile killed (#326) #327
Merged
… no callee-saved demand (#326)

gale's dissolved z_impl_k_mutex_unlock stopped compiling on v0.11.36: the #311 call-result pair tagging legitimately keeps i64 pairs live across the surrounding code, and when a later call's argument marshalling contains a genuine register cycle, emit_arg_moves demanded a free callee-saved cycle scratch (free_callee_saved -> Err) with R4-R8 all pinned. That is an exhaustion class the 3b-lite retry ladder never matched.

Fix: emit_arg_moves now builds the same move set and hands it to the v0.11.38 parallel-move resolver (synth_synthesis::parallel_move), which breaks cycles WITHOUT a register when none is free: one SpillState slot, lowered as `str rX, [sp, #slot]; mov...; ldr rY, [sp, #slot]` (slot freed after the sequence). A free callee-saved register, when one exists, is still passed as the scratch candidate and produces pure MOVs.

Two latent bugs die with the old emitter:

* its cycle-break was WRONG CODE: it parked cycle_src but clobbered cycle_dst, whose old value the next move still needed (a 2-swap left arg1 = arg0's value; verified by simulation). The RISC-V copy in synth-backend-riscv does NOT share the bug (it defers through scratch correctly) and is untouched.
* free_callee_saved could hand back a register that WAS an arg source (args are popped before the scratch query); the resolver filters its scratch against the move set instead.

Bit-identity: the resolver's phase 1 now pops the lowest-destination ready move first (BTreeSet), which is exactly the legacy scan order for the ascending-destination arg lists: acyclic marshals (every function that compiled on main) emit identical bytes. Cycle marshals previously either Err'd (this bug) or miscompiled (above), so no correct output changes. 7 fixtures sha256-identical vs origin/main (control_step, flight_seam, flight_seam_flat, high_pressure_i32/i64, u64_unpack, u64_unpack_inlined); the 3 frozen differentials PASS (13/13, 0x07FDF307 x2).

i32-only edge: a cycle needing the slot in a function whose first pass reserved NO spill area fails with the ladder-recoverable exhaustion Err (SpillState.area_reserved, mirrored from compute_local_layout) instead of silently aliasing the param-backing slots; the backend retry reserves the area and the resolver then succeeds.

Repro: scripts/repro/mutex_pressure.wat. Three live i64 pairs pin (r3,r4)/(r5,r6)/(r7,r8) across two calls, param reload lands r1, call result lands r0, swap2(param, result) is a genuine r0/r1 swap. On v0.11.36..39: the exact #326 Err. After this fix: compiles, marshal is `str r0,[sp,#0x18]; mov r0,r1; ldr r1,[sp,#0x18]; bl`, and mutex_pressure_differential.py (wasmtime vs unicorn, BL relocs resolved, order-sensitive 2a-b callee) passes 7/7.

Tests: 387 lib (+5: swap-under-saturation spills-not-errs, with-scratch register path, acyclic-saturated plain MOVs, no-area ladder Err, resolver lowest-dst-first order pin); workspace green; clippy/fmt clean.

Honest bounds: >4 args / i64 args are still outside emit_arg_moves' scope (pre-existing); the resolver slot comes from the 8-slot spill pool; pool exhaustion mid-marshal remains a hard Err (same class as the #320 bound).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
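For readers who want the resolver behaviour in miniature, here is a rough sketch of the two phases the commit message describes: ready moves popped lowest-destination first, then cycles broken through a scratch register if one is usable, else through a single stack slot. It is not the code in `synth_synthesis::parallel_move`; the names `Reg`, `Step`, and `resolve` are hypothetical, the slot is left implicit, and spill-pool accounting and the `area_reserved` check are omitted.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Register number (0 = r0, ...). Hypothetical alias for this sketch.
type Reg = u8;

/// One lowered step. The single stack slot is implicit in Store/Load.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Step {
    Mov { dst: Reg, src: Reg }, // mov dst, src
    Store { src: Reg },         // str src, [sp, #slot]
    Load { dst: Reg },          // ldr dst, [sp, #slot]
}

/// Sequentialize parallel moves given as (dst, src) pairs.
/// Assumes every destination appears at most once.
fn resolve(moves: &[(Reg, Reg)], scratch: Option<Reg>) -> Vec<Step> {
    // dst -> src, with no-op self moves dropped.
    let mut pending: BTreeMap<Reg, Reg> =
        moves.iter().copied().filter(|(d, s)| d != s).collect();
    // Reject a scratch candidate that is itself part of the move set.
    let scratch = scratch.filter(|r| moves.iter().all(|(d, s)| d != r && s != r));
    let mut out = Vec::new();

    while !pending.is_empty() {
        // Phase 1: a move is ready when no pending move still reads its
        // destination. The BTreeSet yields the lowest destination first.
        // (Recomputed each round for clarity, not speed.)
        let ready: BTreeSet<Reg> = pending
            .keys()
            .copied()
            .filter(|d| !pending.values().any(|s| s == d))
            .collect();
        if let Some(&dst) = ready.iter().next() {
            let src = pending.remove(&dst).expect("ready dst is pending");
            out.push(Step::Mov { dst, src });
            continue;
        }

        // Phase 2: only cycles remain. Park the OLD value of the lowest
        // destination before anything overwrites it.
        let start = *pending.keys().next().expect("pending is non-empty");
        out.push(match scratch {
            Some(t) => Step::Mov { dst: t, src: start },
            None => Step::Store { src: start },
        });
        // Walk the cycle; the move that wanted `start` reads the parked copy.
        let mut dst = start;
        loop {
            let src = pending.remove(&dst).expect("cycle member is pending");
            if src == start {
                out.push(match scratch {
                    Some(t) => Step::Mov { dst, src: t },
                    None => Step::Load { dst },
                });
                break;
            }
            out.push(Step::Mov { dst, src });
            dst = src;
        }
    }
    out
}
```

Under these assumptions, `resolve(&[(0, 1), (1, 0)], None)` yields a store of r0, `mov r0, r1`, then a load into r1, which has the same shape as the repro's marshal; passing `Some(4)` yields three plain moves through r4 instead.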
This was referenced Jun 11, 2026
gale's #326, root-caused two layers deep

The exhaustion site is `emit_arg_moves`' cycle-breaking scratch: with #311's (correct) pair tagging saturating R4–R8, a genuine arg cycle had nowhere to park and the `Err` propagated. The principled fix is wiring VCR-RA-004's resolver into its designed first consumer: cycles now break via a register scratch when one is free, else a stack-scratch cell. The demand for a callee-saved register is gone structurally.

And the investigation found the old cycle-breaker was wrong-code: `emit_parallel_move` parked the cycle source but clobbered the next move's still-needed value, so a 2-swap produced `R0=v1, R1=v1` (verified by simulation). Latent because marshal cycles are rare; killed by the replacement. (The RISC-V copy does not share the bug: checked, untouched.)

Evidence

* `scripts/repro/mutex_pressure.wat` reproduces gale's exact error message on main, first try (three live i64 pairs + a param/result swap) → compiles + 7/7 unicorn-vs-wasmtime on the branch (marshal visible as `str`/`mov`/`ldr`, capstone-verified).
* `free_callee_saved` could return an arg source (resolver filters against the move set); the no-spill-area i32 cycle edge now returns the ladder-recoverable `Err` instead of aliasing frame slots.

Honest limits

Resolver slot draws from the 8-slot pool (pool exhaustion stays a clean `Err`); >4/i64 args remain the pre-existing #195 bound; the repro pins pairs via i64 constants rather than gale's import-call results: same pressure class, same site, same message.

Rides v0.11.40 with #325; the acceptance release unblocks gale's `k_mutex_unlock` lane (native ref 124 cyc, staged for same-day silicon).

🤖 Generated with Claude Code
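The "verified by simulation" claim about the 2-swap is easy to reproduce on paper. The sketch below runs two three-instruction sequences over a toy register file. The first is an assumed reconstruction of the old break (park the cycle source, then overwrite the destination that the next move still reads); the actual instruction order of the removed emitter is not shown in this PR, so treat `old_swap` as illustrative only. The second parks the value that is about to be clobbered. `run`, `old_swap`, and `fixed_swap` are hypothetical names, and r4 stands in for the scratch.

```rust
/// Execute register-to-register moves `(dst, src)` strictly in program order
/// and return the resulting register file.
fn run(mut regs: [u32; 8], seq: &[(usize, usize)]) -> [u32; 8] {
    for &(dst, src) in seq {
        regs[dst] = regs[src];
    }
    regs
}

/// ASSUMED shape of the old cycle break for the parallel moves
/// {r0 <- r1, r1 <- r0}: the cycle source r1 is parked, then r0 is
/// overwritten even though the next move still needs its old value.
fn old_swap() -> [(usize, usize); 3] {
    [
        (4, 1), // mov r4, r1   park cycle source
        (0, 4), // mov r0, r4   clobbers old r0
        (1, 0), // mov r1, r0   reads the NEW r0
    ]
}

/// Break that parks the value about to be clobbered (old r0) instead.
fn fixed_swap() -> [(usize, usize); 3] {
    [
        (4, 0), // mov r4, r0   park old r0
        (0, 1), // mov r0, r1
        (1, 4), // mov r1, r4   old r0 arrives in r1
    ]
}
```

Starting from r0 = v0 and r1 = v1, the first sequence ends with both registers holding v1, matching the `R0=v1, R1=v1` outcome reported above, while the second ends with r0 = v1 and r1 = v0.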