fix(rewards): recover missing reward disbursements and prevent future loss#826

Merged
rickyrombo merged 1 commit into main from mp/fix-reward-indexer-gap-recovery
May 19, 2026

@rickyrombo rickyrombo commented May 19, 2026

Summary

  • Stops the program indexer from silently dropping reward_manager transactions when ProcessTransaction returns an error: solana/indexer/program/indexer.go:113 now surfaces the error so it lands on the existing retry queue, and solana/indexer/program/backfiller.go:210 now logs failures instead of advancing the cursor in silence.
  • Adds a gap-detection cron (jobs/checkpoint_gap_backfill.go, scheduled every 1h in solana/indexer/solana_indexer.go) that scans sol_slot_checkpoints, finds uncovered slot ranges per indexer name, and dispatches Backfill(fromSlot, toSlot) on anything satisfying the new jobs.Backfillable interface. Each indexer owns its own backfill strategy; the program indexer's lives next to it via the moved Backfiller.
  • Adds migration 0201_backfill_missing_reward_disbursements.sql, a one-shot recovery of ~29k legacy challenge_disbursements rows excluded by migration 0152's INNER JOIN user_bank_accounts. Uses a LATERAL join against sol_claimable_accounts to pick the current AUDIO claimable account per user. Rows for hard-deleted users (~20k) are intentionally skipped because no recoverable relational state remains.
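
To illustrate the first bullet, here is a minimal sketch of surfacing a per-transaction error so the caller can route it to a retry queue instead of dropping it. This is not the code at indexer.go:113; `Tx`, `RetryQueue`, `processTransaction`, `handleUpdate`, and `indexAll` are hypothetical stand-ins, and the in-memory queue only mimics the role of sol_retry_queue.

```go
package main

import (
	"errors"
	"fmt"
)

// Tx is a hypothetical stand-in for a decoded Solana transaction.
type Tx struct {
	Signature string
	Slot      uint64
}

// RetryEntry and RetryQueue are hypothetical in-memory stand-ins for sol_retry_queue.
type RetryEntry struct {
	Signature string
	Error     string
}

type RetryQueue struct {
	Entries []RetryEntry
}

// Add records a failed transaction together with its error text.
func (q *RetryQueue) Add(sig string, err error) {
	q.Entries = append(q.Entries, RetryEntry{Signature: sig, Error: err.Error()})
}

// processTransaction simulates per-transaction processing that can fail.
func processTransaction(tx Tx) error {
	if tx.Signature == "bad" {
		return errors.New("simulated reward_manager processing failure")
	}
	return nil
}

// handleUpdate returns the processing error (wrapped with context)
// instead of discarding it.
func handleUpdate(tx Tx) error {
	if err := processTransaction(tx); err != nil {
		return fmt.Errorf("process transaction %s at slot %d: %w", tx.Signature, tx.Slot, err)
	}
	return nil
}

// indexAll sends failed transactions to the retry queue and returns
// the number of transactions that were processed successfully.
func indexAll(txs []Tx, q *RetryQueue) int {
	ok := 0
	for _, tx := range txs {
		if err := handleUpdate(tx); err != nil {
			q.Add(tx.Signature, err)
			continue
		}
		ok++
	}
	return ok
}

func main() {
	q := &RetryQueue{}
	txs := []Tx{
		{Signature: "good", Slot: 1},
		{Signature: "bad", Slot: 2},
		{Signature: "good2", Slot: 3},
	}
	ok := indexAll(txs, q)
	fmt.Println("indexed:", ok, "queued for retry:", len(q.Entries))
}
```

The point of the sketch is only that the error travels up to a caller that can record it; how the real indexer enqueues retries is not shown in this PR description.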

Background

~55k challenge_disbursements rows were missing from sol_reward_disbursements. Investigation split the gap into three causes, each addressed independently:

  1. Per-tx silent drops (row count unknown): error returns from ProcessTransaction were discarded by both the live HandleUpdate path and Backfiller.backfillAddressTransactions. The slot checkpoint kept advancing past the failed tx, leaving no record. Fixed at the two source lines above.
  2. Subscription gaps (~hundreds of rows, e.g. 2026-03-23 outage): the existing Backfiller type was never invoked in production — no caller existed outside its unit tests. The new cron job wires it up via the Backfillable interface and writes a checkpoint row after each successful gap fill so subsequent runs don't re-trigger.
  3. 0152 backfill exclusion (~29k rows pre-Go-indexer): the original challenge_disbursements -> sol_reward_disbursements backfill inner-joined user_bank_accounts, dropping any disbursement whose author lacked a current user_bank entry. Migration 0201 re-runs that backfill using sol_claimable_accounts (the modern source for user banks) and LATERAL deduping.
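
The gap scan in cause 2 can be pictured as an interval merge over covered slot ranges. The sketch below is a rough illustration under assumptions, not the logic in jobs/checkpoint_gap_backfill.go: `SlotRange` and `findGaps` are hypothetical names, the ranges are treated as inclusive, and the real job additionally groups rows by indexer name before scanning.

```go
package main

import (
	"fmt"
	"sort"
)

// SlotRange is an inclusive range of slots covered by one checkpoint row.
// The field names are assumptions, not the real sol_slot_checkpoints columns.
type SlotRange struct {
	From, To uint64
}

// findGaps merges overlapping or adjacent covered ranges and returns
// the uncovered ranges that lie between them.
func findGaps(covered []SlotRange) []SlotRange {
	if len(covered) == 0 {
		return nil
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := make([]SlotRange, len(covered))
	copy(sorted, covered)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].From < sorted[j].From })

	var gaps []SlotRange
	end := sorted[0].To // highest slot covered so far
	for _, r := range sorted[1:] {
		// A gap exists only if at least one slot lies strictly between
		// the covered prefix and the start of the next range.
		if r.From > end && r.From-end > 1 {
			gaps = append(gaps, SlotRange{From: end + 1, To: r.From - 1})
		}
		if r.To > end {
			end = r.To
		}
	}
	return gaps
}

func main() {
	covered := []SlotRange{
		{From: 100, To: 200},
		{From: 150, To: 250}, // overlaps the first range
		{From: 300, To: 400},
		{From: 401, To: 500}, // adjacent to the previous range, so no gap
		{From: 600, To: 700},
	}
	for _, g := range findGaps(covered) {
		fmt.Printf("gap: %d-%d\n", g.From, g.To)
	}
}
```

With the sample input, the uncovered ranges are 251-299 and 501-599; each of these would be a candidate for a Backfill(fromSlot, toSlot) call.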

Test plan

  • Run go test ./solana/indexer/program/ ./jobs/ ./solana/indexer/... locally — all pass.
  • After deploy, confirm the gap-detection cron logs "Job started"/"Job completed successfully" on its first tick (CheckpointGapBackfillJob in indexer logs).
  • Apply migration 0201 against staging; verify sol_reward_disbursements row count increases by ~29k and that v_challenge_disbursements now returns the previously-missing rewards for affected users.
  • Verify the next reward_manager EvaluateAttestations transaction with a per-tx processing error lands in sol_retry_queue instead of being silently dropped (can be confirmed against sol_retry_queue.error after deploy).
  • Watch for repeated firings on the same gap — markGapFilled should write a checkpoint row that suppresses re-triggering. If you see duplicate runs, the subscription_hash logic in markGapFilled needs review.
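
The last check can be rehearsed locally with a small simulation: fill a gap, record it as covered, and confirm a second run dispatches nothing for it. This is a sketch under stated assumptions. The `Backfillable` shape (in particular `Name()`), `stubIndexer`, `checkpointStore`, and `runOnce` are hypothetical, and the subscription_hash handling in the real markGapFilled is not modeled here.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// SlotRange is an inclusive slot range (hypothetical shape).
type SlotRange struct{ From, To uint64 }

// Backfillable approximates the interface described in the PR.
// The Name method is an assumption made for this sketch.
type Backfillable interface {
	Name() string
	Backfill(fromSlot, toSlot uint64) error
}

// stubIndexer counts Backfill calls and can be told to fail.
type stubIndexer struct {
	name  string
	fail  bool
	calls int
}

func (s *stubIndexer) Name() string { return s.name }

func (s *stubIndexer) Backfill(fromSlot, toSlot uint64) error {
	s.calls++
	if s.fail {
		return errors.New("simulated backfill failure")
	}
	return nil
}

// checkpointStore is an in-memory stand-in for sol_slot_checkpoints,
// keyed by indexer name.
type checkpointStore struct {
	covered map[string][]SlotRange
}

// gaps returns the uncovered ranges between covered ranges for one indexer.
func (c *checkpointStore) gaps(name string) []SlotRange {
	rs := append([]SlotRange(nil), c.covered[name]...)
	if len(rs) == 0 {
		return nil
	}
	sort.Slice(rs, func(i, j int) bool { return rs[i].From < rs[j].From })
	var out []SlotRange
	end := rs[0].To
	for _, r := range rs[1:] {
		if r.From > end && r.From-end > 1 {
			out = append(out, SlotRange{From: end + 1, To: r.From - 1})
		}
		if r.To > end {
			end = r.To
		}
	}
	return out
}

// markGapFilled records a filled gap as covered so later scans skip it.
func (c *checkpointStore) markGapFilled(name string, gap SlotRange) {
	c.covered[name] = append(c.covered[name], gap)
}

// runOnce dispatches Backfill for every gap and marks only the
// successful ones as filled. It returns the number of dispatches.
func runOnce(store *checkpointStore, indexers []Backfillable) int {
	dispatched := 0
	for _, ix := range indexers {
		for _, gap := range store.gaps(ix.Name()) {
			dispatched++
			if err := ix.Backfill(gap.From, gap.To); err != nil {
				fmt.Printf("backfill %s %d-%d failed: %v\n", ix.Name(), gap.From, gap.To, err)
				continue // leave the gap open so the next run retries it
			}
			store.markGapFilled(ix.Name(), gap)
		}
	}
	return dispatched
}

func main() {
	store := &checkpointStore{covered: map[string][]SlotRange{
		"program": {{From: 1, To: 100}, {From: 201, To: 300}},
	}}
	ix := &stubIndexer{name: "program"}
	fmt.Println("first run dispatches:", runOnce(store, []Backfillable{ix}))
	fmt.Println("second run dispatches:", runOnce(store, []Backfillable{ix}))
}
```

In this model a failed backfill leaves its gap open, so repeated firings on the same range are expected only while Backfill keeps failing; repeats after a successful fill would point at the bookkeeping step.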

🤖 Generated with Claude Code

… loss

Resolves a ~55k-row gap between challenge_disbursements and
sol_reward_disbursements traced to three independent causes:

1. ProcessTransaction errors were silently discarded in the live program
   indexer and Backfiller, so per-tx failures advanced the slot checkpoint
   without ever reaching the retry queue. Both paths now surface errors so
   the retry queue (live) and zap logs (backfill) see them.

2. No cron ever invoked the existing Backfiller, so subscription gaps
   (e.g. the 17h45m outage on 2026-03-23) were never recovered. Added a
   gap-detection job that scans sol_slot_checkpoints, merges intervals,
   and dispatches Backfill on each gap via a Backfillable interface that
   indexers can implement per their own subscription shape.

3. Migration 0152's INNER JOIN on user_bank_accounts excluded ~29k rows
   from the original challenge_disbursements -> sol_reward_disbursements
   backfill. Migration 0201 re-runs the backfill using the latest
   sol_claimable_account per (wallet, AUDIO mint), recovering rows whose
   user is still current. Rows for hard-deleted users are intentionally
   skipped (no recoverable relational state).

Also moves the Backfiller into the program package since it's
program-indexer-specific (walks GetSignaturesForAddress for hardcoded
program IDs); other indexers will own their own Backfill implementations.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@rickyrombo rickyrombo merged commit 9c1bac2 into main May 19, 2026
5 checks passed
@rickyrombo rickyrombo deleted the mp/fix-reward-indexer-gap-recovery branch May 19, 2026 00:51