
fix: make Base onboarding work — chain id, register deploy gas, worker RPC rate-limit + PublicNode #311

Open
hanwencheng wants to merge 6 commits into main from claude/naughty-benz-f1ace0


hanwencheng commented Jun 16, 2026

Four independent bugs blocked Base web-app onboarding. They were peeled off in order via live testing (each fix revealed the next). Deploy the fixes together: setup-broker-host.sh --base --ref main after merge (or --ref claude/naughty-benz-f1ace0 to test pre-merge).

Bug 1 — bundler on the wrong chain id (44035d4)

The bundler defaulted chain_id to Heima's 212013 when AGENTKEYS_CHAIN_ID was unset → it signed Base handleOps for the wrong chain → the tx was never broadcast. Fix: resolve chain_id from the compiled chain profile keyed by AGENTKEYS_CHAIN, with no silent default. Also un-swallow the submit reason (broker tracing::warn; the daemon surfaces the real body), which made the whole diagnosis possible.

Bug 2 — under-sized register deploy gas (fcccbf1)

Register reverted with AA13: the D6 register deploys the P256Account inside the UserOp (~1.3M gas), which runs within verificationGasLimit, but base.json pinned that limit at 200k. Fix: verification_gas_limit 200k→1.6M; broker gas_fees is now profile-driven; Heima fees stay byte-identical.

Bug 3 — chain-verify dies on the public-RPC rate limit (ba9901f + d5c3e94)

Reads (credentials/memory/config) returned 502 on -32016 "over rate limit". ba9901f retries rate-limit errors as a backstop. d5c3e94 is the real fix: the free mainnet.base.org endpoint throttles the broker's AWS IP (measured from the host: 15/20 concurrent eth_calls rate-limited, vs 0/20 on PublicNode), so rpc.http switches to https://base-rpc.publicnode.com (already the profile's WSS endpoint; free and keyless; the one change covers worker, broker, and bundler).

Bug 4 — AA31, paymaster deposit below one-op prefund (9408903)

After register lands once, the paymaster deposit drains below the next op's prefund, so every later register reverts with AA31 (deposit 0.0062 ETH < one-op prefund 0.0079 ETH). Two causes: (1) the prefund (Σgas × maxFeePerGas) was 2× higher than Base needs (base fee ~0.005 gwei); max_fee_gwei 2→1 halves it to 0.00395 ETH, which fits the existing deposit (no new ETH); (2) the funding targets were 0.001 ETH, below a single prefund; paymaster_deposit_wei 0.001→0.02 and min 0.001→0.008, so the deposit ceremony keeps a multi-op buffer.

Debug runbook (d70fd5f)

docs/operator-runbook-base-onboarding.md — error→cause→fix table, host log commands, on-chain forensics. Profile-anchored.

Verification

cargo test bundler/core/broker/daemon/worker-creds green; clippy 0; fmt clean; contracts-sync ok. Live: register lands ✅; the AA31 fix is verified by arithmetic (prefund 0.00395 ETH < deposit 0.0062 ETH); the final live onboarding run is pending the redeploy.

🤖 Generated with Claude Code

The in-house bundler defaulted chain_id to Heima's 212013 when no
AGENTKEYS_CHAIN_ID[_<CHAIN>] env var was set. setup-broker-host.sh never sets it, so
the Base broker host's bundler signed the outer handleOps EIP-155 tx for Heima
(212013) and broadcast it to the Base RPC, which rejects a tx signed for another
chain. The master-register UserOp was therefore NEVER broadcast (the submitter nonce
stayed 0), and the broker returned a fast, silent 502 ("handleOps did not
broadcast"): onboarding step 4/5 failed and config/init cascaded to
device_not_active. Base-only: on Heima the 212013 default happened to be right.
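Why a wrong chain id means "never broadcast" can be shown in a few lines. This is a minimal sketch, assuming the outer handleOps is a legacy EIP-155 transaction as described above: EIP-155 folds the chain id into the signature's v value, so a signature made for 212013 recovers to the wrong chain on Base (mainnet chain id 8453). The helper names are illustrative, not the bundler's code; for typed (EIP-1559) transactions the chain id is an explicit signed field instead, with the same outcome.

```rust
// Illustrative helpers, not the bundler's code.

/// EIP-155: a legacy transaction's signature value v commits to the chain id as
/// v = recovery_id + chain_id * 2 + 35, where recovery_id is 0 or 1.
fn eip155_v(chain_id: u64, recovery_id: u8) -> u64 {
    assert!(recovery_id <= 1, "recovery id must be 0 or 1");
    u64::from(recovery_id) + chain_id * 2 + 35
}

/// Chain id a signature was made for; None for pre-EIP-155 v values (27/28).
fn chain_id_from_v(v: u64) -> Option<u64> {
    if v >= 35 {
        Some((v - 35) / 2)
    } else {
        None
    }
}

/// Simplified node-side check: accept only a tx signed for this node's chain.
/// (Some nodes can also be configured to accept unprotected pre-EIP-155 txs.)
fn accepted_by(node_chain_id: u64, v: u64) -> bool {
    chain_id_from_v(v) == Some(node_chain_id)
}

fn main() {
    const HEIMA: u64 = 212_013; // the old silent default
    const BASE: u64 = 8_453; // Base mainnet
    let v = eip155_v(HEIMA, 0);
    println!("v = {v}, signed for chain {:?}", chain_id_from_v(v));
    println!("accepted by Base? {}", accepted_by(BASE, v));
}
```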

- bundler: resolve chain_id from the compiled chain profile keyed by
  AGENTKEYS_CHAIN (the SoT the broker already uses); explicit
  AGENTKEYS_CHAIN_ID[_<CHAIN>] still overrides for profileless chains. No silent
  Heima default -- an unresolved id now fails loud.
- broker: tracing::warn on the two accept/submit failure paths (bundler
  broadcast error + handleOps revert); they were returned without logging, so
  the broker journal was empty during the incident.
- daemon: surface the broker's real error body + status on register/submit
  instead of the generic "broker /v1/register/submit failed" fallback.
- setup-broker-host.sh: document why the bundler unit has no AGENTKEYS_CHAIN_ID
  line (it would duplicate the profile SoT -- the gap that hid this).
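The resolution order in the list above can be sketched as follows. This is a minimal sketch, not the bundler's actual code: `compiled_profiles` is a hypothetical name→chain-id map standing in for the compiled profiles, and checking the suffixed AGENTKEYS_CHAIN_ID_<CHAIN> before the plain AGENTKEYS_CHAIN_ID is an assumed precedence. Environment lookup is injected as a closure so the logic is testable.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the compiled chain profiles (base.json, heima.json):
/// only the chain id matters for this sketch.
fn compiled_profiles() -> HashMap<&'static str, u64> {
    HashMap::from([("base", 8_453), ("heima", 212_013)])
}

/// Resolve the bundler's chain id with no silent default.
/// `env` abstracts environment lookup so the logic is testable.
fn resolve_chain_id(env: &dyn Fn(&str) -> Option<String>) -> Result<u64, String> {
    let chain = env("AGENTKEYS_CHAIN").map(|c| c.trim().to_lowercase());

    // 1. An explicit override wins (assumed order: suffixed, then plain).
    let mut keys = Vec::new();
    if let Some(c) = &chain {
        keys.push(format!("AGENTKEYS_CHAIN_ID_{}", c.to_uppercase()));
    }
    keys.push("AGENTKEYS_CHAIN_ID".to_string());
    for key in &keys {
        if let Some(raw) = env(key.as_str()) {
            return raw
                .trim()
                .parse::<u64>()
                .map_err(|e| format!("{key}={raw:?} is not a chain id: {e}"));
        }
    }

    // 2. Otherwise the compiled profile keyed by AGENTKEYS_CHAIN is the source of truth.
    let chain = chain.ok_or("AGENTKEYS_CHAIN unset and no AGENTKEYS_CHAIN_ID override")?;
    compiled_profiles()
        .get(chain.as_str())
        .copied()
        // 3. Fail loud instead of falling back to Heima's 212013.
        .ok_or_else(|| format!("no compiled profile for chain {chain:?}"))
}

fn main() {
    let real_env = |k: &str| std::env::var(k).ok();
    match resolve_chain_id(&real_env) {
        Ok(id) => println!("bundler chain_id = {id}"),
        Err(e) => eprintln!("refusing to start: {e}"),
    }
}
```

With AGENTKEYS_CHAIN=base and no override, this resolves to 8453; with nothing set, it returns an error instead of silently signing for Heima.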
…(AA13/AA31)

The #278 D6 register deploys the P256Account inside the UserOp (initCode), and the
EntryPoint runs that ~1.3M-gas deploy within verificationGasLimit. Base's profile pinned
verificationGasLimit=200k, which is correct for the cheap RIP-7212 precompile verify on
accept/scope/revoke (which DON'T deploy), but the register's account deploy ran out of gas at
200k -> AA13 "initCode failed or OOG". Separately, gas_fees was hardcoded to
DEF_MAX_FEE=40 gwei (Heima's base-fee scale), ignoring the profile, so once AA13
cleared, the prefund reserve (~0.16 ETH at 40 gwei) blew past the paymaster's 0.010 ETH
deposit -> AA31.
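The AA31 arithmetic in the paragraph above can be checked directly. A minimal sketch, assuming the prefund is (sum of the UserOp gas limits) × maxFeePerGas as the text models it, and that AA31 fires when the deposit is below that prefund. `TOTAL_GAS` is not stated in the text; it is back-derived from the later-observed "0.0079 ETH at 2 gwei" (3.95M gas), and the helper names are hypothetical.

```rust
const GWEI: u128 = 1_000_000_000; // wei per gwei
const MILLI_ETH: u128 = 1_000_000_000_000_000; // 0.001 ETH in wei

/// Assumed sum of the UserOp gas limits, back-derived from the text's
/// "0.0079 ETH prefund at 2 gwei" (0.0079 ETH / 2 gwei = 3.95M gas).
const TOTAL_GAS: u128 = 3_950_000;

/// Hypothetical helper: one-op prefund = sum of gas limits x maxFeePerGas.
fn prefund_wei(total_gas: u128, max_fee_gwei: u128) -> u128 {
    total_gas * max_fee_gwei * GWEI
}

/// Hypothetical helper: AA31 when the paymaster deposit cannot cover the prefund.
fn hits_aa31(deposit_wei: u128, max_fee_gwei: u128) -> bool {
    deposit_wei < prefund_wei(TOTAL_GAS, max_fee_gwei)
}

fn main() {
    let deposit = 10 * MILLI_ETH; // the paymaster's 0.010 ETH deposit
    for fee in [40, 2] {
        let p = prefund_wei(TOTAL_GAS, fee);
        println!(
            "max fee {fee:>2} gwei: prefund {:.4} ETH, AA31 = {}",
            p as f64 / 1e18,
            hits_aa31(deposit, fee)
        );
    }
}
```

At the hardcoded 40 gwei the prefund is ~0.158 ETH, far above the 0.010 ETH deposit; at 2 gwei it drops to 0.0079 ETH and fits.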

- base.json gas.verification_gas_limit 200000 -> 1_600_000 (covers the ~1.3M deploy,
  confirmed via eth_estimateGas of factory.createAccount on Base mainnet).
- base.json gas.max_fee_gwei 50 -> 2 (Base base fee ~0.005 gwei; keeps the prefund at
  ~0.0079 ETH, under the existing 0.010 ETH deposit -- no re-funding needed).
- broker accept.rs: gas_fees now read from the resolved chain profile (gwei->wei),
  DEF fallback when a profile carries no usable fee. Was hardcoded DEF_MAX_FEE.
- heima.json gas.max_fee_gwei 1000 -> 40 (equal to the existing DEF_MAX_FEE), so Heima UserOp
  fees are byte-identical after the wiring (no behavior change).
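The profile-driven fee with a DEF fallback, and why setting heima.json to 40 keeps Heima byte-identical, can be sketched as below. This assumes integer gwei values and treats a missing or zero fee as "no usable fee"; `GasProfile` and `max_fee_per_gas_wei` are hypothetical names, not the broker's accept.rs API, and only maxFeePerGas is shown.

```rust
/// Fallback used when a profile has no usable fee: DEF_MAX_FEE = 40 gwei.
const DEF_MAX_FEE_GWEI: u64 = 40;
const WEI_PER_GWEI: u128 = 1_000_000_000;

/// Hypothetical slice of a resolved chain profile's `gas` section.
struct GasProfile {
    max_fee_gwei: Option<u64>,
}

/// Hypothetical helper: maxFeePerGas in wei, taken from the profile's gwei value
/// when usable, else the DEF fallback.
fn max_fee_per_gas_wei(profile: Option<&GasProfile>) -> u128 {
    let gwei = profile
        .and_then(|p| p.max_fee_gwei)
        .filter(|&g| g > 0) // assumption: a zero fee counts as "not usable"
        .unwrap_or(DEF_MAX_FEE_GWEI);
    u128::from(gwei) * WEI_PER_GWEI
}

fn main() {
    let base = GasProfile { max_fee_gwei: Some(2) };
    let heima = GasProfile { max_fee_gwei: Some(40) };
    println!("base:  {} wei", max_fee_per_gas_wei(Some(&base)));
    println!("heima: {} wei", max_fee_per_gas_wei(Some(&heima)));
    println!("none:  {} wei", max_fee_per_gas_wei(None));
}
```

Because the Heima profile value equals the fallback, the profile path and the old hardcoded path produce the same wei value.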

Verified: core 194 + broker tests green, clippy 0, fmt clean, contracts-sync ok.
hanwencheng changed the title from "fix(bundler): derive chain id from chain profile, not the Heima default (Base register never broadcast)" to "fix: make Base master register land — wrong bundler chain id + under-sized deploy gas" on Jun 16, 2026
…nfig 502)

After the master register lands on Base, config/init's worker chain re-check hit the
free public RPC's rate limit (-32016 "over rate limit") and failed, because eth_call
treated EVERY JSON-RPC error as a deterministic revert and returned it WITHOUT retry.
A rate limit is transient, not a revert, so back off and retry it like a 5xx
(is_rate_limit_error covers -32016 / -32005 / -32029 plus rate-limit messages). ATTEMPTS
4 -> 5 for burst headroom. A dedicated (non-throttled) Base RPC is the systemic fix;
this retry keeps a public-endpoint burst from failing onboarding. Adds a unit test.
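The retry policy described above can be sketched as follows. The error codes and ATTEMPTS = 5 come from the text; the `RpcError`/`CallError` types, the message patterns beyond "over rate limit", and the 200 ms doubling backoff are assumptions for illustration, not the worker's actual code. `sleep` is injected so the schedule can be tested without waiting.

```rust
use std::time::Duration;

const ATTEMPTS: u32 = 5;

/// A JSON-RPC error object (hypothetical shape for this sketch).
#[derive(Debug)]
struct RpcError {
    code: i64,
    message: String,
}

/// Failure of a single eth_call attempt (hypothetical).
#[derive(Debug)]
enum CallError {
    Rpc(RpcError), // JSON-RPC error body: a revert unless it is a rate limit
    Http(u16),     // transport status such as 502/503: always transient here
}

fn is_rate_limit_error(e: &RpcError) -> bool {
    let m = e.message.to_lowercase();
    matches!(e.code, -32016 | -32005 | -32029)
        || m.contains("rate limit")
        || m.contains("rate-limit")
        || m.contains("too many requests")
}

/// Retry transient failures with exponential backoff; return reverts at once.
fn eth_call_with_retry(
    mut send: impl FnMut() -> Result<String, CallError>,
    mut sleep: impl FnMut(Duration),
) -> Result<String, String> {
    let mut last = String::new();
    for attempt in 0..ATTEMPTS {
        match send() {
            Ok(result) => return Ok(result),
            Err(CallError::Rpc(e)) if is_rate_limit_error(&e) => {
                last = format!("rate-limited ({}: {})", e.code, e.message);
            }
            // Any other JSON-RPC error is treated as a deterministic revert.
            Err(CallError::Rpc(e)) => return Err(format!("reverted ({}): {}", e.code, e.message)),
            Err(CallError::Http(status)) => last = format!("HTTP {status}"),
        }
        if attempt + 1 < ATTEMPTS {
            sleep(Duration::from_millis(200u64 << attempt)); // 200, 400, 800, 1600 ms
        }
    }
    Err(format!("eth_call failed after {ATTEMPTS} attempts: {last}"))
}

fn main() {
    let mut n = 0;
    let out = eth_call_with_retry(
        || {
            n += 1;
            if n < 3 {
                Err(CallError::Rpc(RpcError { code: -32016, message: "over rate limit".into() }))
            } else {
                Ok("0x".to_string())
            }
        },
        std::thread::sleep,
    );
    println!("{out:?} after {n} attempts");
}
```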
hanwencheng changed the title from "fix: make Base master register land — wrong bundler chain id + under-sized deploy gas" to "fix: make Base onboarding work — bundler chain id, register deploy gas, worker RPC rate-limit" on Jun 16, 2026
Operator cheat-sheet for the onboarding/accept failure classes surfaced while landing
PR #311: an error->cause->fix table (wrong chain id / AA13 deploy-gas / AA31 prefund /
-32016 rate-limit / device_not_active), the host log commands (the broker submit-relay
tracing PR #311 added, bundler eth_chainId, worker journal, nginx), and read-only
on-chain forensics (account code, sponsor nonce, paymaster deposit, reverted-tx
gasUsed%). Profile-anchored -- no literal addresses (passes the doc-literal gate).
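Two of the read-only checks above reduce to plain JSON-RPC. A minimal sketch, assuming the deployment's EntryPoint exposes the ERC-4337 StakeManager `balanceOf(address)` deposit getter: it parses the bundler's eth_chainId result (Base mainnet is 0x2105 = 8453; Heima's 212013 is 0x33c2d) and builds the eth_call that reads the paymaster deposit. The helper names and the placeholder addresses are hypothetical; in keeping with the runbook, real addresses come from the chain profile.

```rust
/// Hypothetical helper: parse an eth_chainId result (0x-prefixed hex quantity).
fn parse_chain_id(result: &str) -> Result<u64, String> {
    let digits = result
        .strip_prefix("0x")
        .ok_or_else(|| format!("missing 0x prefix: {result}"))?;
    u64::from_str_radix(digits, 16).map_err(|e| format!("bad chain id {result}: {e}"))
}

/// Hypothetical helper: calldata for EntryPoint.balanceOf(address), the deposit getter.
/// 0x70a08231 is the 4-byte selector of balanceOf(address); the address is
/// left-padded with zeros to a 32-byte ABI word.
fn balance_of_calldata(account: &str) -> Result<String, String> {
    let hex = account.strip_prefix("0x").unwrap_or(account).to_lowercase();
    if hex.len() != 40 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return Err(format!("not a 20-byte address: {account}"));
    }
    Ok(format!("0x70a08231{hex:0>64}"))
}

/// Hypothetical helper: JSON-RPC body for a read-only eth_call at the latest block.
fn eth_call_body(to: &str, data: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":1,"method":"eth_call","params":[{{"to":"{to}","data":"{data}"}},"latest"]}}"#
    )
}

fn main() {
    // Placeholder addresses only; real ones come from the chain profile.
    let entry_point = format!("0x{:040x}", 1u8);
    let paymaster = format!("0x{:040x}", 2u8);
    let data = balance_of_calldata(&paymaster).expect("valid address");
    println!("{}", eth_call_body(&entry_point, &data));
    println!("{:?}", parse_chain_id("0x2105"));
}
```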
…hrottles the broker IP)

The broker's AWS egress IP is hard-throttled by the free mainnet.base.org: from the
host, 15/20 concurrent eth_calls return -32016 "over rate limit" (vs 0/20 on PublicNode
and 0/25 from a fresh laptop IP). The worker retry (ba9901f) can't ride out a 75%
rejection rate, so onboarding's read burst (credentials + memory + config, each a 3-read
chain verify) all 502'd with "eth_call failed after 5 attempts: rate-limited".
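To make "can't ride out a 75% rejection rate" concrete, here is a back-of-the-envelope model, not a measurement. It assumes each attempt is rate-limited independently with p = 15/20 and reads the onboarding burst as 9 reads (3 endpoints × a 3-read chain verify); real throttling is bursty and correlated, so independence is a simplification. The helper names are illustrative.

```rust
/// Illustrative model: probability one read exhausts all retries, assuming each
/// attempt is rate-limited independently with probability `p`.
fn read_fails(p: f64, attempts: i32) -> f64 {
    p.powi(attempts)
}

/// Illustrative model: probability that every read in a burst eventually succeeds.
fn burst_succeeds(p: f64, attempts: i32, reads: i32) -> f64 {
    (1.0 - read_fails(p, attempts)).powi(reads)
}

fn main() {
    let p = 15.0 / 20.0; // rejection rate measured from the broker host
    println!("one read fails all 5 attempts: {:.3}", read_fails(p, 5));
    println!("one 3-read chain verify succeeds: {:.3}", burst_succeeds(p, 5, 3));
    println!("9-read onboarding burst succeeds: {:.3}", burst_succeeds(p, 5, 9));
    println!("same burst on PublicNode (0/20): {:.3}", burst_succeeds(0.0, 5, 9));
}
```

Under these assumptions a single chain verify succeeds only about 44% of the time and the whole burst under 9%, while PublicNode's 0/20 makes the burst succeed every time.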

Switch rpc.http to https://base-rpc.publicnode.com -- already the profile's WSS endpoint,
free + keyless, full headroom from the broker IP. setup-broker-host.sh derives
AGENTKEYS_CHAIN_RPC_HTTP from this, so the redeploy points workers + broker + bundler at
it in one shot. The retry stays as the burst backstop; a keyed RPC is the long-term
hardening if PublicNode ever throttles.
hanwencheng changed the title from "fix: make Base onboarding work — bundler chain id, register deploy gas, worker RPC rate-limit" to "fix: make Base onboarding work — chain id, register deploy gas, worker RPC rate-limit + PublicNode" on Jun 16, 2026
…r deposit (AA31)

After register lands once, the paymaster's EntryPoint deposit drains below the NEXT
op's prefund -> AA31 "deposit below threshold" on every subsequent register (observed
live: deposit 0.0062 ETH < one-op prefund 0.0079 ETH; the first register dropped it
from ~0.010). Two causes:
- the one-op prefund (Σgas × maxFeePerGas) was 2x higher than Base needs: Base's base
  fee is ~0.005 gwei, so cutting maxFeePerGas from 2 to 1 gwei halves the prefund to
  0.00395 ETH, which fits the EXISTING deposit (no new ETH, so an immediate unblock).
- base.json funding targets were 0.001 ETH, BELOW a single op's prefund, so the deposit
  was never sized for the workload: paymaster_deposit_wei 0.001->0.02,
  paymaster_min_deposit_wei 0.001->0.008 (the deposit ceremony then keeps a multi-op
  buffer when run with a funded deployer).
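A quick check of what the new funding targets buy, using the text's own figure of a 0.00395 ETH one-op prefund at 1 gwei. Counting every op at its full prefund is a worst-case assumption (the EntryPoint refunds unused gas after execution), and `ops_covered` is a hypothetical helper.

```rust
const MILLI_ETH: u128 = 1_000_000_000_000_000; // 0.001 ETH in wei

/// One-op prefund at maxFeePerGas = 1 gwei, as stated in the text: 0.00395 ETH.
const PREFUND_AT_1_GWEI: u128 = 3_950_000_000_000_000;

/// Hypothetical helper: worst-case number of back-to-back ops a deposit can
/// prefund before AA31, counting each op at its full prefund.
fn ops_covered(deposit_wei: u128, prefund_wei: u128) -> u128 {
    deposit_wei / prefund_wei
}

fn main() {
    let targets = [
        ("old target 0.001 ETH", MILLI_ETH),
        ("observed 0.0062 ETH", 62 * MILLI_ETH / 10),
        ("new min 0.008 ETH", 8 * MILLI_ETH),
        ("new target 0.02 ETH", 20 * MILLI_ETH),
    ];
    for (label, deposit) in targets {
        println!("{label}: {} op(s) at 1 gwei", ops_covered(deposit, PREFUND_AT_1_GWEI));
    }
}
```

The 0.02 ETH target covers 5 prefunds and the 0.008 ETH minimum covers 2, while the old 0.001 ETH covers none. The observed 0.0062 ETH covers 1 prefund at 1 gwei but 0 at 2 gwei (0.0079 ETH), matching the live AA31.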