Tracking issue for the general codegen-optimization effort. This is not a gale-specific tweak: improvements must help every --relocatable compile. gale posts on-target statistics and regression feedback here; I post research and findings, and land general optimizations validated against the wasmtime-vs-unicorn oracle (scripts/repro/wake_path_differential.py).

Baseline (gale's z_impl_k_sem_give, --target cortex-m4 --relocatable, v0.11.15)

- .text: 219 instructions / 694 bytes
- ~35% of instructions are loads/stores/register-moves (memory traffic + shuffling)
- 18% are SP-relative frame spill/reload (ldr/str [sp,#k])
- 0 adjacent redundancies (str[k];ldr[k], ldr[k];ldr[k], mov rX,rX) → the waste is non-local (a local/param reloaded on each local.get; values re-materialized across the function), so a naive adjacent peephole won't help; it needs a small local dataflow pass.

Root cause

--relocatable routes to select_with_stack (the direct selector, #197), which bypasses the synth-opt IR optimizer (CSE/DCE/const-fold/regalloc). It emits straight-line stack-machine code: every operand is materialized to a register, every local.get is a fresh reload, and param frame-backing (#204) adds a reload per read.

General optimization options (ranked by leverage vs risk)

1. Local redundant-memory elimination on the selector output: track which register currently holds each frame slot; rewrite a reload to a mov (or drop it) when the value is still live in a register; drop dead stores. General, local, low-risk, oracle-checkable. (Recommended first.)
2. Keep multi-read locals/params in registers in select_with_stack (load once, reuse) instead of reload-per-local.get. Higher payoff, touches the allocator.
3. Route --relocatable through synth-opt (the big lever: reuse the real optimizer) once its ABI is made relocatable-correct (the reason #197 bypassed it: absolute linmem base + non-preserving calls). Highest payoff, highest risk.

Asks for gale

- Post the on-target measurement (cycles/instructions for the hot functions, ideally a per-function or per-region breakdown) so I optimize what's actually hot, not what merely looks redundant.
- Flag any correctness regression from an optimization immediately (the oracle harness is my pre-merge guard, but on-hardware is ground truth).

Each optimization ships as a normal bugfix-cadence release with a falsification statement; correctness is gated by the differential oracle + full suite + the #193/#186 fuzz.
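
For discussion, a minimal sketch of what the first option (local redundant-memory elimination) could look like. This is a Python model, not the synth selector. The tuple instruction format, the function name eliminate_redundant_frame_traffic, and the opcode sets are hypothetical and exist only for illustration. It assumes frame slots are accessed only SP-relative (no aliasing through pointers) and that SP does not move inside the body. It forgets everything at each label or branch, so it stays local to a basic block. It forwards reloads and drops stores of a value the slot already holds. Real dead-store elimination needs a backward liveness pass and is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical instruction model (NOT the synth IR):
#   ("ldr", rd, slot)  load frame slot [sp,#slot] into rd
#   ("str", rs, slot)  store rs to frame slot [sp,#slot]
#   ("bl", target)     call; clobbers the AAPCS32 call-clobbered registers
#   ("label", name) and branches end the basic block
#   anything else      first operand is the destination register
Inst = Tuple

CALLER_SAVED = {"r0", "r1", "r2", "r3", "r12", "lr"}
BLOCK_ENDS = {"label", "b", "beq", "bne", "cbz", "cbnz", "bx"}
NO_DEST = {"cmp", "tst", "nop"}  # opcodes that write no register


def eliminate_redundant_frame_traffic(code: List[Inst]) -> List[Inst]:
    # register -> frame slot whose current value that register is known to hold
    holds: Dict[str, int] = {}
    out: List[Inst] = []
    for inst in code:
        op = inst[0]
        if op in BLOCK_ENDS:
            holds.clear()  # stay local: forget everything at block edges
        elif op == "bl":
            for reg in CALLER_SAVED:
                holds.pop(reg, None)
        elif op == "ldr":
            _, rd, slot = inst
            if holds.get(rd) == slot:
                continue  # rd already holds this slot: drop the reload
            src = next((r for r, s in holds.items() if s == slot), None)
            holds[rd] = slot
            if src is not None:
                inst = ("mov", rd, src)  # value is live elsewhere: reload -> mov
        elif op == "str":
            _, rs, slot = inst
            if holds.get(rs) == slot:
                continue  # slot already contains this value: drop the store
            # Other registers mirroring the old slot contents are now stale.
            for reg in [r for r, s in holds.items() if s == slot]:
                del holds[reg]
            holds[rs] = slot
        elif op not in NO_DEST and len(inst) > 1:
            holds.pop(inst[1], None)  # destination register is overwritten
        out.append(inst)
    return out


demo = [
    ("str", "r0", 0),            # spill param
    ("ldr", "r1", 0),            # becomes mov r1, r0
    ("ldr", "r2", 4),
    ("add", "r1", "r1", "r2"),
    ("ldr", "r2", 4),            # dropped: r2 still holds slot 4
    ("str", "r1", 0),
    ("bl", "callee"),
    ("ldr", "r0", 0),            # kept: r0 and r1 were clobbered by the call
]
optimized = eliminate_redundant_frame_traffic(demo)
```

On the demo sequence this removes one reload and turns another into a mov (8 instructions in, 7 out). Tracking register-to-slot rather than slot-to-register keeps the clobber step a single dictionary pop, at the cost of a short scan on each reload. Any real version would still need to go through the differential oracle, since the no-aliasing assumption is the part most likely to be wrong.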