feat: wire autobahn state sync with naive provider (CON-252) #3304
Draft
wen-coding wants to merge 1 commit into wen/fix_autobahn_restart_no_statesync from
Extends the autobahn restart fix (PR #3300) to cover the "new validator joining" and "disk-wiped node" recovery paths via CometBFT state sync.

- node.go: stop force-disabling stateSync in giga mode. Gate on a direct app.Info() check since CometBFT's state.LastBlockHeight never advances under autobahn. postSyncHook is a no-op in giga mode (no block-sync reactor to hand control to). ssReactor wired only when stateSync is actually enabled.
- statesync/reactor.go: new optional stateProviderFactory param, used in place of RPC/P2P selection when set. node.go injects it in giga mode.
- statesync/giga_stateprovider.go (new): naive provider that returns empty AppHash (opt out of pre-verification), minimal Commit, and an sm.State built from GenesisDoc + static committee. Peers are trusted optimistically for this PR; see TODO(autobahn-snapshot-proof).
- statesync/syncer.go: skip the post-restore AppHash check when the provider returned an empty trustedAppHash. Vanilla RPC/P2P providers always return non-empty, so their behaviour is unchanged.
- autobahn/data/state.go: expose SkipTo public method wrapping internal skipTo, used post-state-sync to align data cursors with the app height.
- p2p/giga_router.go: three-way branch in runExecute: fresh (InitChain), state-sync restart (SkipTo, no PushAppHash), plain restart (PushAppHash).

Tests: GigaStateProvider unit tests (5), SkipTo unit tests (2), syncer empty-trustedAppHash skip test (2). Non-giga behaviour preserved.

Follow-up: integration test for JoinFromStateSync (wipe node, restart with statesync.enable=true, verify catchup) is deferred to a subsequent PR along with the snapshot-proof mechanism (peers bundling AppQC in snapshot metadata) that closes the loop on malicious-snapshot retry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## wen/fix_autobahn_restart_no_statesync #3304 +/- ##
=======================================================================
Coverage 58.31% 58.31%
=======================================================================
Files 2085 2086 +1
Lines 209065 209185 +120
=======================================================================
+ Hits 121907 121993 +86
- Misses 78366 78398 +32
- Partials 8792 8794 +2
pompon0 (Contributor) reviewed on Apr 23, 2026

// State just after NewState, before any Push*). The caller is responsible
// for ensuring no concurrent Push* races with this call — giga state sync
// uses it between data.NewState and GigaRouter.Run.
func (s *State) SkipTo(n types.GlobalBlockNumber) {

IMO this method should do proper pruning instead of requiring the caller to guarantee that State is empty.
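One way to read this suggestion, as a toy sketch only: the `toyState` type, its fields, and the payload type below are hypothetical and do not reflect the real `data.State`, which would also need locking. Here `skipTo` drops every retained block below the target and then advances the cursors, so it is safe on a non-empty state.

```go
package main

import "fmt"

// toyState is a hypothetical stand-in for the autobahn data state.
// Assumption: blocks are keyed by a global block number and the state
// tracks the first retained block and the next block to insert.
type toyState struct {
	first  uint64            // lowest block number still retained
	next   uint64            // next block number expected to be inserted
	blocks map[uint64]string // retained blocks (payload simplified to a string)
}

func newToyState() *toyState {
	return &toyState{blocks: map[uint64]string{}}
}

// push appends a block at the next cursor position.
func (s *toyState) push(payload string) {
	s.blocks[s.next] = payload
	s.next++
}

// skipTo moves the cursors forward to n and prunes every block below n,
// so the caller does not have to guarantee that the state is empty.
// Moving backwards is treated as a no-op.
func (s *toyState) skipTo(n uint64) {
	if n <= s.first {
		return
	}
	for h := range s.blocks {
		if h < n {
			delete(s.blocks, h) // deleting during range is allowed in Go
		}
	}
	s.first = n
	if s.next < n {
		s.next = n
	}
}

func main() {
	s := newToyState()
	s.push("a")
	s.push("b")
	s.push("c")
	s.skipTo(2)
	fmt.Println(s.first, s.next, len(s.blocks))
}
```

A real implementation would also have to decide what happens to in-flight Push* calls, which is the race the doc comment on SkipTo currently leaves to the caller.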
Summary
Extends #3300 to cover the new-validator joining and disk-wiped node recovery paths via CometBFT state sync. A joiner state-syncs the app to some height M, then runExecute resumes at M+1, pulling subsequent blocks from giga peers.

Changes
- node.go: stop force-disabling stateSync in giga mode. Gate on a direct app.Info() check since CometBFT's state.LastBlockHeight never advances under autobahn. postSyncHook is a no-op in giga mode (no block-sync reactor to hand control to). ssReactor wired only when stateSync is actually enabled.
- statesync/reactor.go: new optional stateProviderFactory param to NewReactor, used in place of the RPC/P2P selection when set. Non-giga callers pass nil; existing behaviour preserved.
- statesync/giga_stateprovider.go (new): naive provider: empty AppHash (opts out of pre-verification), minimal Commit, sm.State from GenesisDoc + static committee.
- statesync/syncer.go: skip the post-restore AppHash comparison when trustedAppHash is empty. Vanilla providers always return non-empty, so their behaviour is unchanged.
- autobahn/data/state.go: expose public SkipTo wrapping the existing internal skipTo. Used post-state-sync to align data cursors so peer-streamed blocks from M+1 onward insert correctly.
- p2p/giga_router.go: new state-sync branch in runExecute (last > 0 && NextBlock() <= last): SkipTo(last+1), no PushAppHash. Fresh / plain-restart branches unchanged from #3300 (Support autobahn node restart by skipping CometBFT handshaker (CON-252)).

New startup path (case E, added to #3300's A–D)
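The branch selection in runExecute can be sketched roughly as follows. This is a minimal illustration, not the code in p2p/giga_router.go: `classifyStartup` and the `startupPath` type are hypothetical names, `last` is assumed to be the app's last committed height, `nextBlock` is assumed to be the value of NextBlock() on the data state, and treating `last == 0` as the fresh case is an assumption inferred from the `last > 0` condition above.

```go
package main

import "fmt"

// startupPath names the three branches described for runExecute.
type startupPath int

const (
	pathFresh        startupPath = iota // no app state yet: InitChain
	pathStateSync                       // app restored from a snapshot: SkipTo(last+1), no PushAppHash
	pathPlainRestart                    // data state already past the app height: PushAppHash
)

func (p startupPath) String() string {
	return [...]string{"fresh", "state-sync", "plain-restart"}[p]
}

// classifyStartup is a hypothetical helper mirroring the condition quoted
// in the change list: last > 0 && NextBlock() <= last selects the
// state-sync branch. The fresh case (last == 0) is an assumption.
func classifyStartup(last, nextBlock uint64) startupPath {
	switch {
	case last == 0:
		return pathFresh
	case nextBlock <= last:
		return pathStateSync
	default:
		return pathPlainRestart
	}
}

func main() {
	fmt.Println(classifyStartup(0, 0))     // nothing committed yet
	fmt.Println(classifyStartup(100, 0))   // app at 100, data state empty
	fmt.Println(classifyStartup(100, 101)) // data state already covers 100
}
```

In the state-sync case the data cursors lag behind the app, which is why that branch calls SkipTo(last+1) before any peer-streamed block is inserted.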
[Table for case E not recoverable from extraction. Surviving fragments, in order: shouldHandshake, stateSync, InitChain by, FinalizeBlock, deliverState, "last>0 skips it)".]

Trust model: naive for v1, AppQC-bundled for v2
This PR trusts the snapshot producer optimistically: stateProvider.AppHash returns empty, syncer.verifyApp skips the comparison, and a corrupt snapshot wedges the joiner until external restart-with-wipe. The vanilla SyncAny retry loop only helps against honest-but-unavailable peers, not malicious ones: a malicious peer can loop-trap a joiner.

The planned v2 mechanism is tracked by TODO(autobahn-snapshot-proof) in giga_stateprovider.go.

Why not in this PR:
- avail.latestAppQC is a single utils.Option, not a queue. Historical AppQCs aren't persisted.
- AppQC@M forms after the app commits block M (committee votes are asynchronous), so the snapshot taken at Commit(M) doesn't yet have its anchor.

Until v2 lands: operators should only point joiners at known-honest peers; a bad snapshot is an ops issue, not a safety issue (the cluster is unaffected, only the joiner is stuck).
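The opt-out described in this section amounts to a guard in front of the hash comparison. The sketch below is a standalone illustration, not the code in statesync/syncer.go: `verifyAppHash` and `errAppHashMismatch` are hypothetical names, and the real verifyApp works with the app's Info response rather than raw byte slices.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// errAppHashMismatch is a hypothetical sentinel error for this sketch.
var errAppHashMismatch = errors.New("app hash mismatch")

// verifyAppHash compares the restored app hash against the trusted one.
// An empty trusted hash means the state provider opted out of
// pre-verification (the naive giga provider), so the check is skipped.
func verifyAppHash(trusted, restored []byte) error {
	if len(trusted) == 0 {
		return nil // naive trust: nothing to compare against
	}
	if !bytes.Equal(trusted, restored) {
		return fmt.Errorf("%w: expected %X, got %X", errAppHashMismatch, trusted, restored)
	}
	return nil
}

func main() {
	fmt.Println(verifyAppHash(nil, []byte{0xAA}))          // skipped
	fmt.Println(verifyAppHash([]byte{0xAA}, []byte{0xAA})) // match
	fmt.Println(verifyAppHash([]byte{0xAA}, []byte{0xBB})) // mismatch
}
```

Because the guard keys only on an empty trusted hash, providers that always return a non-empty hash keep the strict comparison, which matches the claim that vanilla behaviour is unchanged.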
Test plan
- go build ./... clean; gofmt -s -l . clean
- go test ./internal/statesync/... ./internal/p2p/... ./internal/autobahn/... ./node/... ./internal/consensus/...: all green
- TestGigaStateProvider_*, 2 × TestState_SkipTo_*, 2 × TestSyncer_verifyApp_EmptyTrustedAppHashSkipsCheck

Not yet:
- JoinFromStateSync integration subtest: wipe a node's data dir, statesync.enable = true, restart, verify catchup. snapshot-interval=100 already configured in the docker cluster. Deferred to land together with v2 (snapshot-proof), since integration-testing the naive-trust path without proper verification is of limited value.

Scope / Follow-ups
- TODO(autobahn-snapshot-proof): bundle AppQC in snapshot.Metadata, verify cryptographically, enable in-loop retry.
- JoinFromStateSync: lands with v2.
- TODO(epoch) in giga_stateprovider.go: committee / validator set derivation moves to epoch lookup once autobahn supports dynamic committees.

🤖 Generated with Claude Code