fix: make Base onboarding work — chain id, register deploy gas, worker RPC rate-limit + PublicNode#311
Open
hanwencheng wants to merge 6 commits into
The in-house bundler defaulted chain_id to Heima's 212013 when no
AGENTKEYS_CHAIN_ID[_<CHAIN>] env was set. setup-broker-host.sh never sets it, so
the Base broker host's bundler signed the outer handleOps EIP-155 tx for Heima
(212013) and broadcast it to the Base RPC, which rejects a tx signed for another
chain. The master-register UserOp therefore NEVER broadcast (submitter nonce
stayed 0) and the broker returned a fast, silent 502 ("handleOps did not
broadcast") -- onboarding step 4/5 failed and config/init cascaded to
device_not_active. Base-only: on Heima the 212013 default was accidentally right.
- bundler: resolve chain_id from the compiled chain profile keyed by
AGENTKEYS_CHAIN (the SoT the broker already uses); explicit
AGENTKEYS_CHAIN_ID[_<CHAIN>] still overrides for profileless chains. No silent
Heima default -- an unresolved id now fails loud.
- broker: tracing::warn on the two accept/submit failure paths (bundler
broadcast error + handleOps revert); they were returned without logging, so
the broker journal was empty during the incident.
- daemon: surface the broker's real error body + status on register/submit
instead of the generic "broker /v1/register/submit failed" fallback.
- setup-broker-host.sh: document why the bundler unit has no AGENTKEYS_CHAIN_ID
line (it would duplicate the profile SoT -- the gap that hid this).
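The resolution order described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the bundler's actual code: `compiled_profile_chain_id` and `resolve_chain_id` are hypothetical names, the environment is passed in as a map so the logic is testable, and the precedence between `AGENTKEYS_CHAIN_ID_<CHAIN>` and `AGENTKEYS_CHAIN_ID` is an assumption. The Heima id comes from the commit message; 8453 is Base mainnet's chain id.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the compiled chain profiles (chain name -> chain id).
fn compiled_profile_chain_id(chain: &str) -> Option<u64> {
    match chain {
        "heima" => Some(212013),
        "base" => Some(8453),
        _ => None,
    }
}

/// Resolve the chain id: explicit override first, then the profile keyed by
/// AGENTKEYS_CHAIN, and an error (never a silent default) if neither resolves.
fn resolve_chain_id(env: &HashMap<String, String>) -> Result<u64, String> {
    let chain = env.get("AGENTKEYS_CHAIN").map(|s| s.to_lowercase());

    // Assumed precedence: the per-chain override wins over the generic one.
    let mut keys = Vec::new();
    if let Some(c) = &chain {
        keys.push(format!("AGENTKEYS_CHAIN_ID_{}", c.to_uppercase()));
    }
    keys.push("AGENTKEYS_CHAIN_ID".to_string());

    for key in &keys {
        if let Some(raw) = env.get(key) {
            return raw
                .trim()
                .parse::<u64>()
                .map_err(|e| format!("{} is not a valid chain id: {}", key, e));
        }
    }

    if let Some(c) = &chain {
        if let Some(id) = compiled_profile_chain_id(c) {
            return Ok(id);
        }
    }

    Err("chain id unresolved: set AGENTKEYS_CHAIN or AGENTKEYS_CHAIN_ID".to_string())
}
```

With only `AGENTKEYS_CHAIN=base` set, this sketch returns 8453 from the profile; with nothing set it returns an error instead of falling back to Heima's id.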
…(AA13/AA31)

The #278 D6 register deploys the P256Account inside the UserOp (initCode), and the EntryPoint runs that ~1.3M deploy within verificationGasLimit. Base's profile pinned verificationGasLimit=200k -- correct for the cheap RIP-7212 precompile verify on accept/scope/revoke (which DON'T deploy), but the register's account deploy OOG'd at 200k -> AA13 "initCode failed or OOG". Separately, gas_fees was hardcoded to DEF_MAX_FEE=40 gwei (Heima's base-fee scale), ignoring the profile, so once AA13 cleared, the prefund reserve (~0.16 ETH at 40 gwei) blew past the paymaster's 0.010 ETH deposit -> AA31.

- base.json gas.verification_gas_limit 200000 -> 1_600_000 (covers the ~1.3M deploy, confirmed via eth_estimateGas of factory.createAccount on Base mainnet).
- base.json gas.max_fee_gwei 50 -> 2 (Base base fee ~0.005 gwei; keeps the prefund at ~0.0079 ETH, under the existing 0.010 ETH deposit -- no re-funding needed).
- broker accept.rs: gas_fees now read from the resolved chain profile (gwei->wei), DEF fallback when a profile carries no usable fee. Was hardcoded DEF_MAX_FEE.
- heima.json gas.max_fee_gwei 1000 -> 40 = the existing DEF, so Heima UserOp fees are byte-identical after the wiring (no behavior change).

Verified: core 194 + broker tests green, clippy 0, fmt clean, contracts-sync ok.
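The fee wiring and the prefund arithmetic in this commit can be illustrated with a small sketch. It is not the code in accept.rs: `max_fee_wei` and `prefund_wei` are hypothetical helpers, and the total gas figure of 3,950,000 is an assumption back-derived from the stated numbers (0.0079 ETH at 2 gwei).

```rust
const GWEI: u128 = 1_000_000_000;
const ETH: u128 = 1_000_000_000_000_000_000;

/// DEF_MAX_FEE from the commit message: 40 gwei, expressed in wei.
const DEF_MAX_FEE_WEI: u128 = 40 * GWEI;

/// Hypothetical helper: convert the profile's max fee (gwei) to wei,
/// falling back to the default when the profile has no usable (non-zero) fee.
fn max_fee_wei(profile_max_fee_gwei: Option<u64>) -> u128 {
    match profile_max_fee_gwei {
        Some(g) if g > 0 => g as u128 * GWEI,
        _ => DEF_MAX_FEE_WEI,
    }
}

/// One-op prefund: the sum of the op's gas limits times maxFeePerGas.
fn prefund_wei(total_gas: u128, fee_wei: u128) -> u128 {
    total_gas * fee_wei
}

/// Assumed sum of gas limits, back-derived from "0.0079 ETH at 2 gwei".
const ASSUMED_TOTAL_GAS: u128 = 3_950_000;

/// The paymaster deposit quoted in the commit message: 0.010 ETH.
const DEPOSIT_WEI: u128 = ETH / 100;
```

Under these assumptions, the hardcoded 40 gwei gives a prefund of 0.158 ETH (the "~0.16 ETH" above), far over the 0.010 ETH deposit, while the profile's 2 gwei gives 0.0079 ETH, which fits.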
…nfig 502)

After the master register lands on Base, config/init's worker chain re-check hit the free public RPC's rate limit (-32016 "over rate limit") and failed -- because eth_call treated EVERY JSON-RPC error as a deterministic revert and returned it WITHOUT retry. A rate limit is transient, not a revert: back off + retry it like a 5xx (is_rate_limit_error covers -32016 / -32005 / -32029 + rate-limit messages). ATTEMPTS 4 -> 5 for burst headroom. A dedicated (non-throttled) Base RPC is the systemic fix; this keeps a public-endpoint burst from failing onboarding. +unit test.
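A rough sketch of the classify-then-retry behaviour described here, under stated assumptions: the name `is_rate_limit_error` and the three codes come from the commit message, but its signature, the `RpcFailure` type, `call_with_retry`, the message substrings, and the exponential backoff are all hypothetical choices for illustration, not the worker's actual implementation.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical failure type for a single eth_call attempt.
#[derive(Debug, Clone, PartialEq)]
enum RpcFailure {
    /// A JSON-RPC error object (code + message).
    JsonRpc { code: i64, message: String },
    /// An HTTP 5xx from the endpoint.
    Http5xx(u16),
}

/// Transient rate limit, as opposed to a deterministic revert.
/// The substring checks are an assumption about "rate-limit messages".
fn is_rate_limit_error(code: i64, message: &str) -> bool {
    let msg = message.to_lowercase();
    matches!(code, -32016 | -32005 | -32029)
        || msg.contains("rate limit")
        || msg.contains("rate-limit")
}

const ATTEMPTS: u32 = 5;

/// Retry transient failures (rate limits, 5xx) with exponential backoff;
/// return any other JSON-RPC error immediately as a revert.
fn call_with_retry<T>(
    mut call: impl FnMut() -> Result<T, RpcFailure>,
    base_delay: Duration,
) -> Result<T, String> {
    for attempt in 1..=ATTEMPTS {
        match call() {
            Ok(v) => return Ok(v),
            Err(RpcFailure::JsonRpc { code, message })
                if !is_rate_limit_error(code, &message) =>
            {
                return Err(format!("eth_call reverted: {} {}", code, message));
            }
            // Last attempt: stop without sleeping.
            Err(_) if attempt == ATTEMPTS => break,
            // Transient: back off base_delay, 2x, 4x, ... then try again.
            Err(_) => sleep(base_delay * 2u32.pow(attempt - 1)),
        }
    }
    Err(format!("eth_call failed after {} attempts", ATTEMPTS))
}
```

The key design point is the split: a revert is deterministic, so retrying it only adds latency, while a rate limit is expected to clear and is worth waiting out.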
Operator cheat-sheet for the onboarding/accept failure classes surfaced while landing PR #311: an error->cause->fix table (wrong chain id / AA13 deploy-gas / AA31 prefund / -32016 rate-limit / device_not_active), the host log commands (the broker submit-relay tracing PR #311 added, bundler eth_chainId, worker journal, nginx), and read-only on-chain forensics (account code, sponsor nonce, paymaster deposit, reverted-tx gasUsed%). Profile-anchored -- no literal addresses (passes the doc-literal gate).
…hrottles the broker IP)

The broker's AWS egress IP is hard-throttled by the free mainnet.base.org: from the host, 15/20 concurrent eth_calls return -32016 "over rate limit" (vs 0/20 on PublicNode and 0/25 from a fresh laptop IP). The worker retry (#ba9901f) can't ride out a 75% rejection, so onboarding's read burst (credentials + memory + config, each a 3-read chain verify) all 502'd with "eth_call failed after 5 attempts: rate-limited".

Switch rpc.http to https://base-rpc.publicnode.com -- already the profile's WSS endpoint, free + keyless, full headroom from the broker IP. setup-broker-host.sh derives AGENTKEYS_CHAIN_RPC_HTTP from this, so the redeploy points workers + broker + bundler at it in one shot. The retry stays as the burst backstop; a keyed RPC is the long-term hardening if PublicNode ever throttles.
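A back-of-the-envelope estimate shows why a 5-attempt retry cannot absorb a 75% rejection rate. The sketch below assumes each attempt is rejected independently with the measured rate (15/20), which real throttling almost certainly violates, so treat the numbers as illustrative only. The function names are hypothetical.

```rust
/// Probability that one eth_call is still failing after `attempts` tries,
/// ASSUMING each try is rejected independently with probability `reject_rate`.
fn p_call_exhausts(reject_rate: f64, attempts: u32) -> f64 {
    reject_rate.powi(attempts as i32)
}

/// Probability that every one of `reads` eth_calls in a burst eventually succeeds,
/// under the same independence assumption.
fn p_burst_succeeds(reject_rate: f64, attempts: u32, reads: u32) -> f64 {
    (1.0 - p_call_exhausts(reject_rate, attempts)).powi(reads as i32)
}
```

With `reject_rate = 0.75` and 5 attempts, a single call exhausts its retries about 24% of the time (0.75^5 ≈ 0.237). For the onboarding burst of 9 reads (3 endpoints, each a 3-read verify), the chance that all of them get through is under 10%, which is consistent with every read path returning 502.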
…r deposit (AA31)

After register lands once, the paymaster's EntryPoint deposit drains below the NEXT op's prefund -> AA31 "deposit below threshold" on every subsequent register (observed live: deposit 0.0062 ETH < one-op prefund 0.0079 ETH; the first register dropped it from ~0.010). Two causes:

- the one-op prefund (Σgas × maxFeePerGas) was 2x higher than Base needs -- Base base fee is ~0.005 gwei, so maxFeePerGas 2->1 halves the prefund to 0.00395 ETH, which fits the EXISTING deposit (no new ETH -- immediate unblock).
- base.json funding targets were 0.001 ETH, BELOW a single op's prefund, so the deposit was never sized for the workload: paymaster_deposit_wei 0.001->0.02, paymaster_min_deposit_wei 0.001->0.008 (the deposit ceremony then keeps a multi-op buffer when run with a funded deployer).
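The deposit sizing can be checked with integer arithmetic in wei. This is a simplified sketch: `ops_covered` is a hypothetical helper, and it assumes an op is accepted whenever the deposit is at least one full prefund, ignoring refunds of unused gas. The wei constants are the ETH figures quoted above.

```rust
/// How many ops a deposit can prefund, assuming each op reserves a full
/// `prefund_wei` and nothing is refunded in between (a simplification).
fn ops_covered(deposit_wei: u128, prefund_wei: u128) -> u128 {
    if prefund_wei == 0 {
        return 0; // avoid dividing by zero on a misconfigured profile
    }
    deposit_wei / prefund_wei
}

// Figures from the commit message, in wei.
const OBSERVED_DEPOSIT: u128 = 6_200_000_000_000_000; // 0.0062 ETH
const PREFUND_AT_2_GWEI: u128 = 7_900_000_000_000_000; // 0.0079 ETH
const PREFUND_AT_1_GWEI: u128 = 3_950_000_000_000_000; // 0.00395 ETH
const NEW_DEPOSIT_TARGET: u128 = 20_000_000_000_000_000; // 0.02 ETH
const NEW_MIN_DEPOSIT: u128 = 8_000_000_000_000_000; // 0.008 ETH
```

Under this model the observed 0.0062 ETH deposit covers zero ops at 2 gwei and one op at 1 gwei, the new 0.02 ETH target covers five ops, and the 0.008 ETH minimum still covers two.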
Independent bugs blocked Base web-app onboarding, peeled in order via live testing (each fix revealed the next). Deploy together: `setup-broker-host.sh --base --ref main` after merge (or `--ref claude/naughty-benz-f1ace0` to test pre-merge).

Bug 1: bundler on the wrong chain id (`44035d4`)

Bundler defaulted `chain_id` to Heima `212013` when `AGENTKEYS_CHAIN_ID` was unset → signed Base `handleOps` for the wrong chain → never broadcast. Fix: resolve `chain_id` from the compiled chain profile keyed by `AGENTKEYS_CHAIN`, no silent default. + un-swallow the submit reason (broker `tracing::warn` + daemon surfaces the real body), which made the whole diagnosis possible.

Bug 2: under-sized register deploy gas (`fcccbf1`)

Register reverted `AA13`: the D6 register deploys the P256Account inside the UserOp (~1.3M gas) within `verificationGasLimit`, but `base.json` pinned `200k`. Fix: `verification_gas_limit` 200k→1.6M; broker `gas_fees` now profile-driven; Heima byte-identical.

Bug 3: chain-verify dies on the public-RPC rate limit (`ba9901f` + `d5c3e94`)

Reads (`credentials`/`memory`/`config`) 502'd on `-32016 "over rate limit"`. `ba9901f` retries rate-limit errors (backstop). `d5c3e94` = the real fix: the broker's AWS IP is throttled by free `mainnet.base.org` (measured from the host: 15/20 concurrent `eth_call`s rate-limited, vs 0/20 on PublicNode) → switch `rpc.http` to `https://base-rpc.publicnode.com` (already the profile's WSS; free, keyless; covers worker + broker + bundler).

Bug 4: AA31, paymaster deposit below one-op prefund (`9408903`)

Once register lands once, the paymaster deposit drains below the next op's prefund → every later register reverts `AA31` (deposit 0.0062 ETH < one-op prefund 0.0079 ETH). Two causes: the prefund (Σgas × maxFeePerGas) was 2× higher than Base needs (base fee ~0.005 gwei), so `max_fee_gwei` 2→1 halves it to 0.00395 ETH, fitting the existing deposit (no new ETH); and the funding targets were `0.001 ETH` (below one prefund), so `paymaster_deposit_wei` 0.001→0.02 and `min` 0.001→0.008 let the deposit ceremony keep a multi-op buffer.

Debug runbook (`d70fd5f`)

`docs/operator-runbook-base-onboarding.md`: error→cause→fix table, host log commands, on-chain forensics. Profile-anchored.

Verification

`cargo test` bundler/core/broker/daemon/worker-creds green; clippy 0; fmt clean; contracts-sync ok. Live: register lands ✅; the AA31 fix verified by arithmetic (prefund 0.00395 < deposit 0.0062); final live onboarding pending the redeploy.

🤖 Generated with Claude Code