perf(sqlite): add secondary indexes for frontend queries #77

Merged
Androz2091 merged 1 commit into main from perf/sqlite-indexes on May 2, 2026
@Androz2091
Member

Summary

Closes #14.

The client-side SQLite database (the one shipped to the browser via `/blob`) had only PRIMARY KEYs on its tables. Common frontend reads — top guilds, top channels, top DMs, sending times, usage stats, etc. — were doing full table scans on the `activity` table (which can have 100k+ rows on heavy users) plus the `sessions` and `voice_sessions` tables (which had no usable index for their date-range filters at all).

What's added

Six secondary indexes, built after the bulk insert so each index is populated in a single pass instead of being updated on every row insert, and before `VACUUM` so that the vacuum repacks the index pages too:

| Table | Index | Why |
| --- | --- | --- |
| `activity` | `(event_name, associated_guild_id, day)` | Guild-scoped reads (use-guild-data, use-related-guild, top-guilds, top-channels filtered by guild) |
| `activity` | `(event_name, associated_channel_id, day)` | Channel-scoped reads (use-channel-data, use-dm-data, top-dms, top-channels) |
| `sessions` | `(started_date)` | Every `sessions` query in use-usage-stats-data filters on this; table had no PK |
| `voice_sessions` | `(started_date)` | Every `voice_sessions` query filters only on `started_date`; existing PK leads with `channel_id` |
| `dm_channels_data` | `(dm_user_id)` | Top-DM and DM-detail queries group/filter by this |
| `guild_channels_data` | `(guild_id)` | Guild-scoped channel listings filter by this |
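
The build order described above (bulk insert, then index creation, then `VACUUM`) can be sketched as follows. This is a minimal illustration and not the code from this PR: it assumes a Python worker using the standard `sqlite3` module, the schema is cut down to two tables, and the non-key columns (`occurrence_count`, `duration_mins`), the `idx_*` index names, and the `build_database` helper are all hypothetical.

```python
import sqlite3

# Hypothetical, simplified schema; the real tables have more columns.
SCHEMA = """
CREATE TABLE activity (
    event_name TEXT NOT NULL,
    day TEXT NOT NULL,
    associated_channel_id TEXT,
    associated_guild_id TEXT,
    occurrence_count INTEGER NOT NULL,
    PRIMARY KEY (event_name, day, associated_channel_id, associated_guild_id)
);
CREATE TABLE sessions (
    started_date TEXT NOT NULL,
    duration_mins INTEGER NOT NULL
);
"""

# Index names are placeholders; the column lists come from the table above.
INDEXES = """
CREATE INDEX idx_activity_guild ON activity (event_name, associated_guild_id, day);
CREATE INDEX idx_activity_channel ON activity (event_name, associated_channel_id, day);
CREATE INDEX idx_sessions_started ON sessions (started_date);
"""


def build_database(path, activity_rows, session_rows):
    # Autocommit mode: transactions are managed explicitly, and VACUUM
    # cannot run inside an open transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.executescript(SCHEMA)

    # 1. Bulk insert in one transaction, with no secondary indexes yet.
    conn.execute("BEGIN")
    conn.executemany("INSERT INTO activity VALUES (?, ?, ?, ?, ?)", activity_rows)
    conn.executemany("INSERT INTO sessions VALUES (?, ?)", session_rows)
    conn.execute("COMMIT")

    # 2. Build each index in a single pass over the loaded rows.
    conn.executescript(INDEXES)

    # 3. VACUUM last, so the rebuilt file includes the index pages.
    conn.execute("VACUUM")
    return conn
```

Calling `build_database(":memory:", rows, sessions)` is enough to try the ordering locally; a real worker would pass a file path instead.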

The picks were validated against every `SELECT` in `dumpus-app/src/hooks/data/use-*.ts` and `stores/database.ts`. Patterns the existing PKs already covered (e.g. global `event_name + day` time-range scans hit the activity PK directly) didn't get a redundant index.
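
One way to check a pick like this against a given `SELECT` is `EXPLAIN QUERY PLAN`, which reports the index SQLite intends to use. The sketch below is an assumption about how such a check could look, not the validation actually performed for this PR: the table is a simplified stand-in, and the query, the `occurrence_count` column, the index name, and the `query_plan` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity (
    event_name TEXT NOT NULL,
    day TEXT NOT NULL,
    associated_channel_id TEXT,
    associated_guild_id TEXT,
    occurrence_count INTEGER NOT NULL,
    PRIMARY KEY (event_name, day, associated_channel_id, associated_guild_id)
);
CREATE INDEX idx_activity_guild ON activity (event_name, associated_guild_id, day);
""")


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # columns: id, parent, notused, detail


# Hypothetical guild-scoped read, similar in shape to a per-guild chart query.
guild_query = """
SELECT day, SUM(occurrence_count)
FROM activity
WHERE event_name = ? AND associated_guild_id = ?
GROUP BY day
"""

plan = query_plan(conn, guild_query, ("message_sent", "g1"))
for detail in plan:
    print(detail)
```

If the plan names the new index, the query is covered by it; a plan that only mentions the primary-key index or a table scan suggests the index columns do not match the query's filters.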

Tradeoffs

  • Slightly larger DB before `gzip` (indexes are repetitive, so they compress well — net size impact is small).
  • Worker write time goes up marginally (one-shot index build over all activity rows). On the test heavy package, the build is well under a second.
  • Frontend cold-start parse time is unchanged (sql.js doesn't lazy-load indexes).
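
The size tradeoff can be measured rather than estimated. The following sketch builds the same synthetic table with and without one index and reports raw and gzip-compressed file sizes. Everything here is illustrative: the row generator, the reduced schema, the index name, and the `build`, `sizes`, and `measure` helpers are assumptions, and the numbers it produces say nothing about real packages.

```python
import gzip
import os
import sqlite3
import tempfile


def build(path, with_index):
    conn = sqlite3.connect(path, isolation_level=None)  # autocommit mode
    conn.execute(
        "CREATE TABLE activity (event_name TEXT, day TEXT, "
        "associated_guild_id TEXT, occurrence_count INTEGER)"
    )
    # Synthetic, deterministic rows (not representative of real data).
    rows = [
        ("message_sent", f"2024-01-{i % 28 + 1:02d}", f"guild{i % 50}", i % 7)
        for i in range(20000)
    ]
    conn.execute("BEGIN")
    conn.executemany("INSERT INTO activity VALUES (?, ?, ?, ?)", rows)
    conn.execute("COMMIT")
    if with_index:
        conn.execute(
            "CREATE INDEX idx_activity_guild "
            "ON activity (event_name, associated_guild_id, day)"
        )
    conn.execute("VACUUM")
    conn.close()


def sizes(path):
    """Return (raw_bytes, gzipped_bytes) for the file at path."""
    with open(path, "rb") as f:
        raw = f.read()
    return len(raw), len(gzip.compress(raw))


def measure():
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for label, with_index in (("plain", False), ("indexed", True)):
            path = os.path.join(tmp, label + ".sqlite")
            build(path, with_index)
            results[label] = sizes(path)
    return results


if __name__ == "__main__":
    for label, (raw, zipped) in measure().items():
        print(f"{label}: raw={raw} bytes, gzip={zipped} bytes")
```

Running the same comparison on a real heavy package is the meaningful test, since compressibility depends on the actual IDs and dates.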

Test plan

  • CI deploys; `scripts/reprocess.sh` against a real heavy package.
  • Verify the resulting SQLite has the 6 indexes (`SELECT name FROM sqlite_master WHERE type='index'`).
  • Open the package in the web app — every page that runs aggregation queries (Top Guilds, Top Channels, Top DMs, Usage Stats, Guild Detail, Channel Detail, DM Detail) should render the same data, just faster on heavy packages.
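
For the index check, note that `sqlite_master` also lists the automatic indexes SQLite creates to back composite PRIMARY KEYs (named `sqlite_autoindex_*`), so the raw query can return more than six rows. A small sketch of a filtered check follows; the `voice_sessions` schema, the index name, and the `secondary_indexes` helper are hypothetical, and filtering on `sql IS NOT NULL` relies on SQLite storing a NULL `sql` value for automatic indexes.

```python
import sqlite3


def secondary_indexes(conn):
    """List (index_name, table_name) for explicitly created indexes only."""
    return conn.execute(
        "SELECT name, tbl_name FROM sqlite_master "
        "WHERE type = 'index' AND sql IS NOT NULL "
        "ORDER BY name"
    ).fetchall()


# Demo on a throwaway database with a hypothetical voice_sessions schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE voice_sessions (
    channel_id TEXT NOT NULL,
    started_date TEXT NOT NULL,
    duration_mins INTEGER NOT NULL,
    PRIMARY KEY (channel_id, started_date)
);
CREATE INDEX idx_voice_sessions_started ON voice_sessions (started_date);
""")

all_indexes = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'"
).fetchall()
print(len(all_indexes))         # includes the primary-key auto-index
print(secondary_indexes(conn))  # only the explicitly created index
```

Against a reprocessed package, the same helper should return exactly the six indexes from the table in this description.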

The exported SQLite database had only PRIMARY KEYs on its tables, so
common frontend reads ended up doing full scans. Worst offenders:

- activity: PK starts (event_name, day, ...), but most reads also
  filter on associated_guild_id or associated_channel_id, which the
  PK couldn't help with.
- sessions: no PK at all; every read filters on started_date.
- voice_sessions: PK leads with channel_id, but reads only filter on
  started_date.
- dm_channels_data / guild_channels_data: PK on channel_id, but reads
  also group/filter by dm_user_id / guild_id.

Add 6 secondary indexes built after bulk insert (so each row only
writes once) and before VACUUM (so vacuum repacks the index pages too).

Closes #14.
Androz2091 merged commit 3058f46 into main on May 2, 2026
1 check passed
Androz2091 deleted the perf/sqlite-indexes branch on May 2, 2026 02:40
Linked issue: optimize queries with indexes in the SQLite db
