perf(migrations): speed up reward disbursements backfill by rickyrombo · Pull Request #829 · AudiusProject/api

rickyrombo · 2026-05-19T16:36:45Z

Summary

Adds two CREATE INDEX CONCURRENTLY statements at the top of 0201_backfill_missing_reward_disbursements.sql (outside the BEGIN/COMMIT):
- sol_reward_disbursements (challenge_id, specifier) — lets the dedup LEFT JOIN find existing rows by index instead of a per-row sequential scan.
- sol_claimable_accounts (ethereum_address, mint, slot DESC) — supports the "latest claimable account per wallet" lookup pattern (used by this migration and the live reward_manager indexer).
Replaces the per-row LATERAL subquery with a WITH user_banks AS MATERIALIZED CTE that pre-computes DISTINCT ON (ethereum_address) once and hash-joins against the result.
SET LOCAL session_replication_role = replica inside the backfill transaction to suppress the on_sol_reward_disbursement trigger, which fires per row to create challenge_reward notifications + pg_notify. For a one-shot backfill of months-old historical rewards we don't want to spam users, and the trigger work was a meaningful chunk of the per-row cost.
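
The layout described above can be sketched as follows. This is a minimal outline, not the actual migration file: the index names are taken from the test plan below, and the backfill INSERT itself is elided because its column list is not shown in this PR description.

```sql
-- Sketch of the migration layout; the INSERT body is intentionally elided.

-- Outside any transaction block: CONCURRENTLY cannot run inside BEGIN/COMMIT.
CREATE INDEX CONCURRENTLY IF NOT EXISTS sol_reward_disbursements_challenge_specifier_idx
    ON sol_reward_disbursements (challenge_id, specifier);

CREATE INDEX CONCURRENTLY IF NOT EXISTS sol_claimable_accounts_eth_mint_slot_idx
    ON sol_claimable_accounts (ethereum_address, mint, slot DESC);

BEGIN;

-- LOCAL scopes the setting to this transaction only, so other sessions
-- (such as the live indexer) keep firing on_sol_reward_disbursement.
SET LOCAL session_replication_role = replica;

-- ... backfill INSERT ... ON CONFLICT (signature, instruction_index) DO NOTHING ...

COMMIT;
```

Two things to keep in mind with this approach: changing session_replication_role needs elevated privileges (superuser on older PostgreSQL versions), and the replica setting skips all ordinary triggers in the transaction, including the internal ones that enforce foreign keys.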

Why

The 0201 backfill is taking over an hour against prod. Diagnosis:

The LEFT JOIN on (challenge_id, specifier) had no index — sol_reward_disbursements is keyed by (signature, instruction_index), and the only other indexes (from 0198) are on recipient_eth_address and created_at.
The LATERAL against sol_claimable_accounts reran ORDER BY slot DESC LIMIT 1 per row.
The row-level trigger added DB work and unwanted historical notifications.

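With the new index alone, the LEFT JOIN goes from O(n×m) to O(n log m), where n is the number of challenge_disbursements rows probed and m is the number of sol_reward_disbursements rows. Turning the trigger off and substituting the CTE cut the remaining per-row cost (the trigger work and the repeated ORDER BY slot DESC LIMIT 1 lookup). Expected runtime: well under a minute, vs >1h currently.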
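
To illustrate the LATERAL-to-CTE substitution, here is a small sketch that runs on a scratch PostgreSQL 12+ database. The tables `claimable` and `pending`, the `account` column, and the sample values are hypothetical stand-ins; the real migration's tables and column list may differ.

```sql
-- Hypothetical stand-in tables; not the real schema.
CREATE TEMP TABLE claimable (ethereum_address text, mint text, slot bigint, account text);
CREATE TEMP TABLE pending (challenge_id text, specifier text, ethereum_address text);

INSERT INTO claimable VALUES
    ('0xaaa', 'mintA', 100, 'bank-old'),
    ('0xaaa', 'mintA', 200, 'bank-new'),
    ('0xbbb', 'mintA', 150, 'bank-b');
INSERT INTO pending VALUES
    ('c1', 's1', '0xaaa'),
    ('c1', 's2', '0xbbb'),
    ('c2', 's3', '0xccc');

-- Before: the subquery re-runs ORDER BY slot DESC LIMIT 1 once per pending row.
SELECT p.challenge_id, p.specifier, ub.account
FROM pending p
LEFT JOIN LATERAL (
    SELECT c.account
    FROM claimable c
    WHERE c.ethereum_address = p.ethereum_address
      AND c.mint = 'mintA'
    ORDER BY c.slot DESC
    LIMIT 1
) ub ON true;

-- After: compute the latest row per address once, then join against the result.
WITH user_banks AS MATERIALIZED (
    SELECT DISTINCT ON (c.ethereum_address) c.ethereum_address, c.account
    FROM claimable c
    WHERE c.mint = 'mintA'
    ORDER BY c.ethereum_address, c.slot DESC
)
SELECT p.challenge_id, p.specifier, ub.account
FROM pending p
LEFT JOIN user_banks ub ON ub.ethereum_address = p.ethereum_address;
```

Both queries return the same three rows: bank-new for 0xaaa (the higher slot wins), bank-b for 0xbbb, and NULL for 0xccc, which has no claimable account.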

Migration idempotency

CREATE INDEX CONCURRENTLY IF NOT EXISTS: re-running is a no-op when a valid index already exists. IF NOT EXISTS only checks the index name, so an invalid index left behind by a previous failed CONCURRENTLY run is also skipped and stays invalid; it requires a manual DROP INDEX first.
INSERT … ON CONFLICT (signature, instruction_index) DO NOTHING — unchanged; safe on re-run.
Since the migration was never committed in prod (the in-flight one is what we're killing), changing the SQL body just bumps the md5 in pg_migrate.sh's check; the next deploy will run the new shape.
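
The re-run behavior of the ON CONFLICT clause can be checked on a throwaway table. The table `disbursements_demo` below is a hypothetical stand-in that only mirrors the (signature, instruction_index) key named above.

```sql
-- Hypothetical stand-in table keyed like sol_reward_disbursements.
CREATE TEMP TABLE disbursements_demo (
    signature         text,
    instruction_index int,
    challenge_id      text,
    specifier         text,
    PRIMARY KEY (signature, instruction_index)
);

-- First run: the row is new, so it is inserted (command tag: INSERT 0 1).
INSERT INTO disbursements_demo VALUES ('sig1', 0, 'c1', 's1')
ON CONFLICT (signature, instruction_index) DO NOTHING;

-- Re-run: the key already exists, so the row is skipped (command tag: INSERT 0 0).
INSERT INTO disbursements_demo VALUES ('sig1', 0, 'c1', 's1')
ON CONFLICT (signature, instruction_index) DO NOTHING;
```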

Test plan

Cancel the in-flight 0201 backfill (pg_cancel_backend(<pid>) on the stuck session).
Confirm neither index already exists as invalid: SELECT c.relname AS indexname, i.indisvalid FROM pg_class c JOIN pg_index i ON i.indexrelid = c.oid WHERE c.relname IN ('sol_reward_disbursements_challenge_specifier_idx', 'sol_claimable_accounts_eth_mint_slot_idx'); Drop any invalid ones.
Deploy via the migration Job; expect the Job to complete in well under a minute rather than running for over an hour.
Verify recovered row count: SELECT COUNT(*) FROM challenge_disbursements cd LEFT JOIN sol_reward_disbursements rd ON rd.challenge_id = cd.challenge_id AND rd.specifier = cd.specifier WHERE rd.signature IS NULL AND cd.slot > 355300886; — should drop from ~29k toward 0 (modulo the no-current-user bucket which is intentionally not recoverable).

🤖 Generated with Claude Code

The 0201 backfill is taking over an hour in prod. Three structural issues account for the slowdown:

1. The dedup LEFT JOIN on (challenge_id, specifier) has no index. sol_reward_disbursements is keyed by (signature, instruction_index) and only indexed on recipient_eth_address and created_at. The join degenerates to a sequential scan per challenge_disbursements row.
2. The LATERAL subquery against sol_claimable_accounts re-runs an "ORDER BY slot DESC LIMIT 1" filter per row, without an index on (ethereum_address, mint).
3. The on_sol_reward_disbursement trigger fires for every insert, doing three SELECTs and possibly an INSERT into notification. 29k rows × that overhead is significant, and notifying users about months-old historical rewards is undesirable anyway.

Fixes:

- Add sol_reward_disbursements (challenge_id, specifier) index. Useful permanently, not just for this migration. CREATE CONCURRENTLY so the live indexer's writes aren't blocked; moved outside the BEGIN/COMMIT since CONCURRENTLY can't run inside an explicit transaction (psql runs each statement in its own implicit tx when not wrapped).
- Add sol_claimable_accounts (ethereum_address, mint, slot DESC) index. Same reasoning: the live indexer also benefits from this lookup shape for user_bank resolution.
- Replace the per-row LATERAL with a MATERIALIZED CTE that pre-computes DISTINCT ON (ethereum_address) once, then hash-joins. One indexed scan instead of N LATERAL invocations.
- SET LOCAL session_replication_role = replica inside the backfill transaction to suppress on_sol_reward_disbursement. LOCAL keeps the setting scoped to this transaction so concurrent indexer writes still fire the trigger normally.

Both index creations use IF NOT EXISTS so re-running is safe; the backfill INSERT is already idempotent via ON CONFLICT DO NOTHING.

Follow-up in #830 (fix(migrations): drop CONCURRENTLY from 0201 indexes):

## Summary

- Switches `0201_backfill_missing_reward_disbursements.sql` from `CREATE INDEX CONCURRENTLY` to plain `CREATE INDEX` inside the migration's `BEGIN/COMMIT`.
- Both indexes (`sol_reward_disbursements (challenge_id, specifier)` and `sol_claimable_accounts (ethereum_address, mint, slot DESC)`) are now atomic with the backfill INSERT: if anything fails, the schema rolls back cleanly.

## Why

`CREATE INDEX CONCURRENTLY` waits on a `virtualxid` lock for every transaction open during its build phases: not just transactions that touch the target table, but every one in the cluster. The legacy Python `index_rewards_manager` Celery task on discovery-provider keeps ~3-minute transactions open against `challenge_disbursements` continuously. As fast as one ends, another is already open. So the CONCURRENTLY build can wait indefinitely without ever seeing a quiet moment, and it did: in tonight's deploy it sat blocked on `Lock/virtualxid` for 10+ minutes.

Trade-off accepted: regular `CREATE INDEX` takes a `ShareLock` on the target table for the duration of the build, blocking writes. But both target tables are written only by the Go indexer, and only on reward_manager `EvaluateAttestations` and claimable token `Create` instructions, which are sparse on-chain. At current row counts each build completes in seconds; the blocked writes just queue on pgxpool and resume right after.

## Test plan

- [ ] Cancel any in-flight 0201 attempt and drop any invalid index it left behind:
  ```sql
  SELECT pg_cancel_backend(pid) FROM pg_stat_activity WHERE query ILIKE 'CREATE INDEX CONCURRENTLY%';
  DROP INDEX IF EXISTS sol_reward_disbursements_challenge_specifier_idx;
  DROP INDEX IF EXISTS sol_claimable_accounts_eth_mint_slot_idx;
  ```
- [ ] Roll the new image; migration Job's `bridge migrate` should complete in well under a minute.
- [ ] Verify both indexes exist as `indisvalid = true`:
  ```sql
  SELECT indexrelid::regclass, indisvalid
  FROM pg_index
  WHERE indexrelid::regclass::text IN (
    'sol_reward_disbursements_challenge_specifier_idx',
    'sol_claimable_accounts_eth_mint_slot_idx'
  );
  ```
- [ ] Verify missing-row count drops as expected (per #829's test plan).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
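
The `Lock/virtualxid` stall described above can be observed from `pg_stat_activity`. The queries below are a diagnostic sketch using standard catalog columns; they are not part of either PR.

```sql
-- Sessions currently waiting on a virtualxid lock,
-- for example a stalled CREATE INDEX CONCURRENTLY.
SELECT pid,
       wait_event_type,
       wait_event,
       now() - query_start AS query_age,
       left(query, 80)     AS query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
  AND wait_event = 'virtualxid';

-- Open transactions, oldest first. Long-lived entries here are the ones
-- a concurrent index build may have to wait for.
SELECT pid,
       now() - xact_start AS xact_age,
       state,
       left(query, 80)    AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND pid <> pg_backend_pid()
ORDER BY xact_start;
```

If the second query always shows at least one transaction that is minutes old, that matches the "never a quiet moment" pattern described in the Why section.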

rickyrombo requested a review from raymondjacobson May 19, 2026 16:42

rickyrombo merged commit 89c8794 into main May 19, 2026
5 checks passed

rickyrombo deleted the mp/speed-up-reward-disbursements-backfill branch May 19, 2026 16:45

rickyrombo mentioned this pull request May 19, 2026

fix(migrations): drop CONCURRENTLY from 0201 indexes #830

Merged

4 tasks

rickyrombo merged 1 commit into main from mp/speed-up-reward-disbursements-backfill
