Skip to content

fix: align fork-choice finality and aggregate validation with leanSpec#437

Merged
pablodeymo merged 5 commits into
mainfrom
fix/forkchoice-spectest-failures
Jun 16, 2026
Merged

fix: align fork-choice finality and aggregate validation with leanSpec#437
pablodeymo merged 5 commits into
mainfrom
fix/forkchoice-spectest-failures

Conversation

@MegaRedHand

Copy link
Copy Markdown
Collaborator

Summary

CI (make test) is failing against the latest leanSpec fixtures. This fixes the 3 failing fork-choice / state-transition spec-test fixtures.

Fixture Root cause Fix
test_losing_fork_higher_finalized_does_not_latch finalized was latched monotonically (max-seen) in on_block_core Derive finalized from the chosen head's post-state in update_head and pin it to the head's own ancestry (matches leanSpec update_head); keep justified monotonic via advance_to
test_gossip_aggregated_attestation_empty_participants_rejected An aggregate naming no participants slipped through the mocked-proof path Reject empty participants (EmptyAggregationBits / EMPTY_AGGREGATION_BITS) before bounds-check and verification, mirroring the spec's to_validator_indices()
test_genesis_registry_holds_exact_validator_entries The stf runner's PostState (deny_unknown_fields) rejected the new post-state validators registry check Parse validators and compare each entry's index and public keys

Why the finality change matters

Finality is not a monotonic max over everything ever seen. The spec derives the finalized checkpoint from the chosen head's post-state and pins it to the head's ancestry. When fork choice switches to a fork that finalized less than a now-losing fork, finalized must follow the head downward; otherwise a losing fork's finalization persists on a chain that never contained it.

Notes

  • Only the first two failures appeared in the most recent CI run (forkchoice spectests). The third (stf genesis) comes from an even-newer fixtures drop and is a test-runner schema gap, not a client bug; fixed here so the next CI run stays green.

Testing

  • cargo test --workspace --release — all green
  • make fmt, make lint (-D warnings) — clean

Three spec-test fixtures from the latest leanSpec release failed:

- test_losing_fork_higher_finalized_does_not_latch: finalized was latched
  monotonically in on_block_core. The spec derives the finalized checkpoint
  from the chosen head's post-state in update_head and pins it to the head's
  own ancestry, so a fork switch can move finalization downward. Move that
  computation into update_head; keep justified monotonic (advance_to) as the
  spec does.

- test_gossip_aggregated_attestation_empty_participants_rejected: an aggregate
  naming no participants slipped through the mocked-proof path. Reject empty
  participants (EMPTY_AGGREGATION_BITS) before bounds-check and verification,
  matching the spec's to_validator_indices().

- test_genesis_registry_holds_exact_validator_entries: the stf runner rejected
  the new post-state 'validators' registry check (deny_unknown_fields). Parse
  it and compare each validator's index and public keys.
@github-actions

Copy link
Copy Markdown

🤖 Codex Code Review

  1. crates/blockchain/src/store.rs and crates/storage/src/store.rs introduce a consensus-sensitive stale-finalized case. recompute_finalized() explicitly returns None when the finalized block is not stored locally, and update_checkpoints() interprets None as “keep the old finalized checkpoint”. After a head switch, that can leave latest_finalized pointing at a block that is not on the new head’s chain. That is not just metadata drift: crates/blockchain/src/lib.rs uses latest_finalized().slot as a hard reject for block import, so a stale finalized slot/root can cause valid blocks on the selected branch to be discarded. This path needs an explicit policy for unresolved finalized ancestry rather than silently reusing prior metadata.

  2. crates/blockchain/src/store.rs and crates/blockchain/src/store.rs add an O(distance(head, finalized)) parent walk to every update_head() call. That is now in the fork-choice hot path for block imports and slot-driven head recomputation. In long non-finalizing stretches, this turns head updates into repeated ancestry scans and extra DB reads. Since the post-state already carries latest_finalized, I would avoid rescanning on every update or at least cache the resolved checkpoint.

No other clear correctness problems stood out in the diff. The empty-aggregation-bits check is a good validation hardening step, and the extra state-transition validator assertions improve spec-test coverage.

I could not run cargo test here because the toolchain tried to write under /home/runner/.rustup, which is read-only in this environment.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@github-actions

Copy link
Copy Markdown

🤖 Claude Code Review

Here is my full code review of PR #437:


Code Review: PR #437 — fix: align fork-choice finality and aggregate validation with leanSpec

Overview

Three focused spec-test fixes: non-monotonic finality tracking, empty-participant aggregate rejection, and a test-runner schema gap for the validator registry. All three are correctness fixes, not refactors. The changes are narrow and well-scoped.


Fix 1: recompute_finalized and non-monotonic finality (store.rs)

Correctness: correct.

The core semantic change — deriving finalized from the head's post-state rather than latching the max-seen checkpoint — is sound. I verified the key invariants:

  • update_checkpoints at storage/src/store.rs:764 treats None finalized as "leave unchanged" (if let Some(finalized)). So when recompute_finalized returns None (checkpoint-sync anchor with missing pre-anchor history), the store's finalized checkpoint is preserved rather than cleared. This is the correct fallback.

  • prune_live_chain (the lightweight pruning triggered by update_checkpoints) operates on the LiveChain table only, not on BlockHeaders. Since recompute_finalized walks back via get_block_header (reading the BlockHeaders table, which is pruned lazily by prune_old_data), the ancestor-walk remains valid even after a prior finalization advance on a losing fork.

  • The pruning guard at storage/src/store.rs:776 (finalized.slot > old_finalized_slot) correctly skips pruning when finalization regresses. No data is lost when switching to a less-finalized fork.

One edge case worth noting in the loop:

// store.rs:112-117
let mut finalized_root = head;
while let Some(header) = store.get_block_header(&finalized_root) {
    if header.slot <= finalized_slot {
        break;
    }
    finalized_root = header.parent_root;
}

When finalized_slot falls in a slot gap (no block produced at that exact slot), the loop breaks at the first ancestor with slot < finalized_slot. The subsequent filter(|h| h.slot == finalized_slot) then correctly returns None. This is handled gracefully, but the doc comment (line 105-107) only mentions the checkpoint-sync case. A one-liner noting the slot-gap case would prevent future confusion:

// Returns `None` for checkpoint-sync anchors with missing pre-anchor history,
// and also when the finalized slot falls in a gap (no block at that exact slot,
// which is technically invalid state but handled conservatively).

Not a bug, just a documentation gap.

justified handling in update_head:

Passing justified: None at line 65 is correct — on_block_core already applies the monotonic advance_to logic when processing each block, so by the time update_head runs, the store's justified checkpoint is already at or above any fork's justified slot. Leaving it unchanged in update_head matches the leanSpec contract.

Performance: The loop is O(depth since last finalization). With 3SF-mini finalizing every few slots at most, the typical depth is < 10. No practical concern.


Fix 2: Empty aggregation bits rejection (store.rs:443-448)

Correctness: correct and properly placed.

// store.rs:449-452
if participant_indices.is_empty() {
    return Err(StoreError::EmptyAggregationBits);
}

Positioned before the bounds-check loop and before any signature verification, matching the spec's to_validator_indices() ordering. The error variant name and message ("Aggregated attestation references no participants") are clear and consistent with the rest of StoreError.

No issues.


Fix 3: Validator registry check in stf_spectests.rs

Correctness: correct.

The length equality check before indexing ensures no out-of-bounds panics at post_validators[i]. Comparing all three fields (index, attestation_pubkey, proposal_pubkey) is thorough.

One minor style point: expected_validator.clone().into() allocates a full Validator for each comparison just to access its fields. This is fine in test code, but if the FixtureValidator type exposes the fields directly, it would be slightly cleaner to compare without the allocation. Not worth changing unless there's a reason.

The validators field is Option<Container<Validator>>, so existing fixtures without this field are unaffected. Correct.


Summary

  • The finality fix is semantically correct and the invariants hold up under scrutiny. The recompute_finalized function handles all edge cases (slot gaps, checkpoint-sync, head == finalized block) through the filter + None fallback chain.
  • The empty-participant guard is a straightforward and correctly-placed correctness fix.
  • The test runner addition is clean and safe.

One minor suggestion: add a note in recompute_finalized's doc comment about the slot-gap case (line 101-107 of store.rs). Everything else looks good to merge.


Automated review by Claude (Anthropic) · sonnet · custom prompt

@greptile-apps

greptile-apps Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes three failing leanSpec spec-test fixtures by aligning fork-choice finality tracking and aggregate-attestation validation with the spec. The finalized checkpoint is now derived from the chosen head's post-state ancestry on every update_head call (replacing the previous monotonic max-seen latch in on_block_core), and aggregated attestations with zero participants are rejected before bounds-checking.

  • store.rs: Removes the monotonic finalized latch from on_block_core; adds recompute_finalized to walk the head's ancestor chain to the finalized slot on each update_head invocation; adds EmptyAggregationBits guard to gossip-aggregate validation.
  • stf_spectests.rs / types.rs: Extends the STF test runner's PostState schema to accept and compare the new validators registry field from the latest fixtures, fixing a deny_unknown_fields parse failure.

Confidence Score: 3/5

Safe to merge for the spec-test fixtures; the skipped-slot handling in recompute_finalized warrants verification before deploying against a live chain where epoch-boundary blocks are occasionally missing.

The finality ancestry walk returns None — leaving the stored finalized checkpoint unchanged — whenever the block at the finalized slot is absent from the store. The comment attributes this only to checkpoint-sync anchors, but skipped slots at epoch boundaries are real in a live network. If that None path triggers while a stale wrong-fork finalized value is already stored, the client silently misreports finality.

crates/blockchain/src/store.rs — specifically the recompute_finalized function and its None return path

Important Files Changed

Filename Overview
crates/blockchain/src/store.rs Core fork-choice fix: moves finalized-checkpoint derivation from monotonic latching in on_block_core to a per-call ancestry walk in update_head; also adds EmptyAggregationBits guard. Logic is sound; one concern around the skipped-slot None path in recompute_finalized.
crates/blockchain/state_transition/tests/stf_spectests.rs Test-runner update to destructure and compare the new validators registry field; length check guards the indexed access correctly.
crates/blockchain/state_transition/tests/types.rs Schema extension adds optional validators field to PostState; deny_unknown_fields is now satisfied and the From impl in test-fixtures handles deserialization correctly.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Gossip
    participant on_block_core
    participant update_head
    participant recompute_finalized
    participant Store

    Gossip->>on_block_core: new block arrives
    on_block_core->>Store: "store block & state"
    note over on_block_core: advance justified (monotonic)<br/>finalized NOT latched here
    on_block_core->>update_head: trigger head update

    update_head->>Store: compute_lmd_ghost_head(latest_justified.root)
    Store-->>update_head: new_head
    update_head->>recompute_finalized: recompute_finalized(store, new_head)
    recompute_finalized->>Store: get_state(new_head) → finalized_slot
    recompute_finalized->>Store: walk ancestors until slot ≤ finalized_slot
    Store-->>recompute_finalized: ancestor block at finalized_slot
    recompute_finalized-->>update_head: "Option<Checkpoint>"
    update_head->>Store: update_checkpoints(head, None, finalized)
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant Gossip
    participant on_block_core
    participant update_head
    participant recompute_finalized
    participant Store

    Gossip->>on_block_core: new block arrives
    on_block_core->>Store: "store block & state"
    note over on_block_core: advance justified (monotonic)<br/>finalized NOT latched here
    on_block_core->>update_head: trigger head update

    update_head->>Store: compute_lmd_ghost_head(latest_justified.root)
    Store-->>update_head: new_head
    update_head->>recompute_finalized: recompute_finalized(store, new_head)
    recompute_finalized->>Store: get_state(new_head) → finalized_slot
    recompute_finalized->>Store: walk ancestors until slot ≤ finalized_slot
    Store-->>recompute_finalized: ancestor block at finalized_slot
    recompute_finalized-->>update_head: "Option<Checkpoint>"
    update_head->>Store: update_checkpoints(head, None, finalized)
Loading
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
crates/blockchain/src/store.rs:64-65
`recompute_finalized` is now called on every `update_head` invocation (including the steady-state where the head hasn't moved). Each call issues `get_block_header` calls for every slot between the head and the finalized epoch — ~64 store reads per slot (every 4 s) on a live chain. Consider short-circuiting when `new_head == old_head` and the stored finalized slot already matches what the walk would return.

```suggestion
    // Skip the ancestry walk when the head is unchanged and finalized is already
    // up-to-date; avoids O(epoch_depth) store reads + a DB write every slot.
    let finalized = if new_head != old_head
        || store
            .get_state(&new_head)
            .map(|s| s.latest_finalized.slot)
            != Some(store.latest_finalized().slot)
    {
        recompute_finalized(store, new_head)
    } else {
        None
    };
    store.update_checkpoints(ForkCheckpoints::new(new_head, None, finalized));
```

### Issue 2 of 2
crates/blockchain/src/store.rs:108-127
**Skipped-slot edge case silently preserves stale finalized**

When `finalized_slot` falls on a skipped slot, `filter(|header| header.slot == finalized_slot)` returns `None`, causing `recompute_finalized` to return `None` and `update_head` to leave the stored finalized checkpoint unchanged. The comment attributes this only to checkpoint-sync anchors, but skipped slots at epoch boundaries occur in normal operation too. If the previous finalized checkpoint was on a losing fork, silently keeping it leaves a stale wrong-fork value. Consider falling back to the nearest ancestor with `slot ≤ finalized_slot`, matching how the beacon spec's `get_block_root_at_slot` resolves epoch-boundary roots.

Reviews (1): Last reviewed commit: "fix: align fork-choice finality and aggr..." | Re-trigger Greptile

Comment thread crates/blockchain/src/store.rs Outdated
Comment thread crates/blockchain/src/store.rs Outdated
The head's post-state already records latest_finalized with both root and
slot, and that root is an ancestor of the head by construction. Read it
directly and confirm the DB holds the block, rather than re-deriving the root
by walking the parent chain. Returns None (keep existing checkpoint) when the
finalized block is absent, e.g. a checkpoint-sync anchor's pre-anchor history.
The single-use recompute_finalized helper added little over an inline
expression; fold it into update_head.
Comment thread crates/blockchain/src/store.rs Outdated
Comment thread crates/blockchain/src/store.rs Outdated
@pablodeymo pablodeymo merged commit aae20ed into main Jun 16, 2026
2 checks passed
@pablodeymo pablodeymo deleted the fix/forkchoice-spectest-failures branch June 16, 2026 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants