feat(eap-items): add generic name column with bloom_filter index by phacops · Pull Request #8016 · getsentry/snuba

phacops · 2026-06-10T18:19:34Z

Summary

Adds a generic name column to the eap_items_1 tables, with a bloom_filter index, to store and index a per-item-type "name" attribute that we frequently filter on. Promoting this commonly-filtered attribute to a dedicated indexed column lets those queries skip granules via the bloom_filter instead of scanning the attribute maps.

The column is intentionally generic (name) rather than tied to a single item type, since each item type has its own notion of a primary name attribute it gets filtered by.
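To illustrate why a bloom_filter index lets a query skip granules, here is a minimal Python sketch of the idea. It is a toy model, not ClickHouse's implementation: the `BloomFilter` class, its bit and hash counts, and the `granules_to_read` helper are all hypothetical names chosen for illustration. Each granule gets its own filter; a granule is read only if its filter cannot rule the value out.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits: int = 256, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bitset stored in a Python int

    def _positions(self, value: str):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def granules_to_read(granules: list[list[str]], wanted: str) -> list[int]:
    """Return the indexes of granules that cannot be ruled out for `wanted`."""
    filters = []
    for names in granules:
        bf = BloomFilter()
        for name in names:
            bf.add(name)
        filters.append(bf)
    return [i for i, bf in enumerate(filters) if bf.might_contain(wanted)]


if __name__ == "__main__":
    sample = [["db.query", "http.client"], ["cache.get"], ["http.client"]]
    print(granules_to_read(sample, "http.client"))
```

Granules that do contain the value are always returned; a granule that does not contain it is usually, but not always, skipped, which is the trade-off a Bloom filter makes.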

Changes

- New column name: LowCardinality(String) CODEC(ZSTD(1)), positioned after retention_days so it sits ahead of the entire attributes block (attributes_bool, attributes_int, and the string/float buckets).
  - Added to all eap_items tables: eap_items_1_local/_dist, the three downsampled tables (_downsample_8/64/512, local + dist), and the read-only distributed tables (_dist_ro) under the EVENTS_ANALYTICS_PLATFORM_RO storage set.
  - retention_days is used as the placement anchor because it's the only pre-attributes column present on every target table (the downsampled and _dist_ro variants lack downsampled_retention_days). Column order has no effect on correctness since ClickHouse inserts/selects are by name.
- New bf_name bloom_filter index (granularity 1) on every local table: eap_items_1_local and the three _downsample_{8,64,512}_local tables, so name filtering is accelerated on the full-resolution and downsampled read paths alike.
- Fully reversible: backwards_ops drops the indexes first, then the column from every table.

Generated SQL (forwards)

ALTER TABLE eap_items_1_local ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
-- + dist, the 3 downsampled local/dist tables, and the 4 _dist_ro tables
ALTER TABLE eap_items_1_local ADD INDEX IF NOT EXISTS bf_name name TYPE bloom_filter GRANULARITY 1;
-- + the 3 _downsample_{8,64,512}_local tables
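The abbreviated statements above expand to twelve ADD COLUMN and four ADD INDEX operations. The following Python sketch enumerates them from the table naming pattern. It uses plain string formatting rather than the snuba migration API, and the helper names (`table_prefixes`, `forward_statements`) are hypothetical; the ON CLUSTER clause emitted by the real migration runner is omitted here.

```python
BASE = "eap_items_1"
DOWNSAMPLE_FACTORS = (8, 64, 512)
COLUMN_DDL = "name LowCardinality(String) CODEC (ZSTD(1))"


def table_prefixes() -> list[str]:
    """Full-resolution prefix followed by the three downsampled prefixes."""
    return [BASE] + [f"{BASE}_downsample_{n}" for n in DOWNSAMPLE_FACTORS]


def forward_statements() -> list[str]:
    prefixes = table_prefixes()

    # The column goes on every local/dist pair, then on the _dist_ro tables.
    column_tables: list[str] = []
    for prefix in prefixes:
        column_tables += [f"{prefix}_local", f"{prefix}_dist"]
    column_tables += [f"{prefix}_dist_ro" for prefix in prefixes]

    # The bloom_filter index goes on local tables only.
    index_tables = [f"{prefix}_local" for prefix in prefixes]

    statements = [
        f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS {COLUMN_DDL} AFTER retention_days;"
        for table in column_tables
    ]
    statements += [
        f"ALTER TABLE {table} ADD INDEX IF NOT EXISTS bf_name name TYPE bloom_filter GRANULARITY 1;"
        for table in index_tables
    ]
    return statements


if __name__ == "__main__":
    for statement in forward_statements():
        print(statement)
```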

Notes

- _dist_ro tables are created via CREATE TABLE ... AS and don't inherit later schema changes, so the column is added explicitly there for the read path.
- LowCardinality keeps its dictionary per data part (not global), so the encoding stays effective even as the global set of names grows.
- This extends the bloom_filter index to the downsampled local tables, which goes a step beyond the existing bf_trace_id/bf_hashed_keys indexes (those live on the main local table only). This is intentional, so that name filtering benefits downsampled queries too.
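The note about per-part dictionaries can be pictured with a toy dictionary encoder. This is only a conceptual sketch in Python, not ClickHouse's on-disk format, and `encode_part` is a hypothetical helper: each part stores its own list of distinct values plus one small integer per row, so a part's dictionary size depends on the names seen in that part rather than on the global set.

```python
def encode_part(values: list[str]) -> tuple[list[str], list[int]]:
    """Dictionary-encode one part: distinct values plus one index per row."""
    dictionary: list[str] = []
    positions: dict[str, int] = {}
    indexes: list[int] = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indexes.append(positions[value])
    return dictionary, indexes


if __name__ == "__main__":
    # Two parts with different names: each gets its own small dictionary.
    part_a = ["db.query", "http.client", "db.query"]
    part_b = ["cache.get", "cache.get"]
    print(encode_part(part_a))
    print(encode_part(part_b))
```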

Testing

Verified the migration loads and renders the expected forward/backward SQL via EventsAnalyticsPlatformLoader.

🤖 Generated with Claude Code

Add a `trace_metric_name` LowCardinality(String) column to the eap_items_1 tables (local, dist, and the downsampled 8/64/512 tables) to store the trace metric name. This sets up a future bloom_filter index on the column.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/NYTrkkz-0_0mKttPxUNymU3l-z042ptNDua5SgEa-Wk

github-actions · 2026-06-10T18:25:03Z

This PR has a migration; here is the generated SQL for ./snuba/migrations/groups.py ()

-- start migrations

-- forward migration events_analytics_platform : 0057_add_name_column_and_index
Local op: ALTER TABLE eap_items_1_local ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Distributed op: ALTER TABLE eap_items_1_dist ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Local op: ALTER TABLE eap_items_1_downsample_8_local ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Distributed op: ALTER TABLE eap_items_1_downsample_8_dist ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Local op: ALTER TABLE eap_items_1_downsample_64_local ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Distributed op: ALTER TABLE eap_items_1_downsample_64_dist ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Local op: ALTER TABLE eap_items_1_downsample_512_local ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Distributed op: ALTER TABLE eap_items_1_downsample_512_dist ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Distributed op: ALTER TABLE eap_items_1_dist_ro ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Distributed op: ALTER TABLE eap_items_1_downsample_8_dist_ro ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Distributed op: ALTER TABLE eap_items_1_downsample_64_dist_ro ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Distributed op: ALTER TABLE eap_items_1_downsample_512_dist_ro ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Local op: ALTER TABLE eap_items_1_local ON CLUSTER 'cluster_one_sh' ADD INDEX IF NOT EXISTS bf_name name TYPE bloom_filter GRANULARITY 1;
Local op: ALTER TABLE eap_items_1_downsample_8_local ON CLUSTER 'cluster_one_sh' ADD INDEX IF NOT EXISTS bf_name name TYPE bloom_filter GRANULARITY 1;
Local op: ALTER TABLE eap_items_1_downsample_64_local ON CLUSTER 'cluster_one_sh' ADD INDEX IF NOT EXISTS bf_name name TYPE bloom_filter GRANULARITY 1;
Local op: ALTER TABLE eap_items_1_downsample_512_local ON CLUSTER 'cluster_one_sh' ADD INDEX IF NOT EXISTS bf_name name TYPE bloom_filter GRANULARITY 1;
-- end forward migration events_analytics_platform : 0057_add_name_column_and_index




-- backward migration events_analytics_platform : 0057_add_name_column_and_index
Local op: ALTER TABLE eap_items_1_local ON CLUSTER 'cluster_one_sh' DROP INDEX IF EXISTS bf_name;
Local op: ALTER TABLE eap_items_1_downsample_8_local ON CLUSTER 'cluster_one_sh' DROP INDEX IF EXISTS bf_name;
Local op: ALTER TABLE eap_items_1_downsample_64_local ON CLUSTER 'cluster_one_sh' DROP INDEX IF EXISTS bf_name;
Local op: ALTER TABLE eap_items_1_downsample_512_local ON CLUSTER 'cluster_one_sh' DROP INDEX IF EXISTS bf_name;
Distributed op: ALTER TABLE eap_items_1_dist ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Local op: ALTER TABLE eap_items_1_local ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Distributed op: ALTER TABLE eap_items_1_downsample_8_dist ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Local op: ALTER TABLE eap_items_1_downsample_8_local ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Distributed op: ALTER TABLE eap_items_1_downsample_64_dist ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Local op: ALTER TABLE eap_items_1_downsample_64_local ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Distributed op: ALTER TABLE eap_items_1_downsample_512_dist ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Local op: ALTER TABLE eap_items_1_downsample_512_local ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Distributed op: ALTER TABLE eap_items_1_dist_ro ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Distributed op: ALTER TABLE eap_items_1_downsample_8_dist_ro ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Distributed op: ALTER TABLE eap_items_1_downsample_64_dist_ro ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Distributed op: ALTER TABLE eap_items_1_downsample_512_dist_ro ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
-- end backward migration events_analytics_platform : 0057_add_name_column_and_index
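The forward and backward listings above can be cross-checked mechanically. The sketch below is a hypothetical standalone checker, not part of snuba: it parses the rendered statements with regular expressions and confirms that every added column or index has a matching drop, and that all index drops come before the first column drop (presumably required because the index is defined on the column).

```python
import re

ADD_RE = re.compile(r"ALTER TABLE (\S+) .*?ADD (COLUMN|INDEX) IF NOT EXISTS (\w+)")
DROP_RE = re.compile(r"ALTER TABLE (\S+) .*?DROP (COLUMN|INDEX) IF EXISTS (\w+)")


def parse(statements: list[str], pattern: re.Pattern) -> list[tuple[str, str, str]]:
    """Extract (kind, table, object_name) from each matching statement."""
    ops = []
    for statement in statements:
        match = pattern.search(statement)
        if match:
            ops.append((match.group(2), match.group(1), match.group(3)))
    return ops


def is_reversible(forward: list[str], backward: list[str]) -> bool:
    added = parse(forward, ADD_RE)
    dropped = parse(backward, DROP_RE)
    kinds = [kind for kind, _, _ in dropped]
    # No INDEX drop may appear at or after the first COLUMN drop.
    indexes_first = (
        "COLUMN" not in kinds or "INDEX" not in kinds[kinds.index("COLUMN"):]
    )
    return set(added) == set(dropped) and indexes_first
```

Feeding it the "Local op" and "Distributed op" lines with their prefixes left in place works as well, since the patterns search for the ALTER TABLE clause anywhere in the line.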

The _dist_ro tables are created via `CREATE TABLE ... AS` and do not inherit schema changes from their source tables, so the column must be added explicitly for the read path to see it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Rename the new column from `trace_metric_name` to `name` and add a `bf_name` bloom_filter index on `eap_items_1_local`, matching the existing bloom_filter index convention (local table only).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/qdL1I1HW5uIk6IHkW6Dx9PXWP25TprCOs-FEFCgoxQg

Add the bf_name bloom_filter index to the downsampled local tables (eap_items_1_downsample_{8,64,512}_local) in addition to the main eap_items_1_local table, so name lookups are accelerated on the downsampled read paths as well.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Place the name column after attributes_int (before attributes_string_0) so it sits ahead of all the string/float attribute bucket columns instead of after attributes_array.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/D5xH-5PKdtHd7RX0JgQzXhCz8wJnAXR_d5sDw_M9dcw

Anchor the name column after retention_days so it precedes the entire attributes block (attributes_bool, attributes_int, and the string/float buckets). retention_days is used as the anchor because it exists on every target table, including the downsampled and _dist_ro variants.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit ea9a3e7.

cursor · 2026-06-10T19:34:18Z

+        ),
+    ),
+]
+after = "retention_days"


Wrong column placement anchor

Medium Severity

The migration sets after to retention_days, so every AddColumn emits AFTER retention_days. The PR and intended DDL place name immediately after attributes_array at the end of the attribute columns, not next to retention metadata.

The AFTER retention_days placement is intentional — the column was deliberately moved to sit before all the attribute columns (attributes_bool/attributes_int and the string/float buckets), not after attributes_array. retention_days is used as the anchor because it's the only pre-attributes column present on every target table (the downsampled and _dist_ro variants don't have downsampled_retention_days). The PR description referenced the older placement and has been updated to match. Column order doesn't affect correctness anyway, since ClickHouse inserts/selects are by name.
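The anchor choice described in this reply amounts to picking the last column before the attributes block that every target table shares. A minimal Python sketch of that selection follows. The column lists are abbreviated and hypothetical (they are not the real eap_items schemas), and `closest_shared_anchor` is an illustrative helper, not something in the migration.

```python
def closest_shared_anchor(tables: dict[str, list[str]], first_attribute: str) -> str:
    """Last column that precedes `first_attribute` in every table."""
    prefixes = []
    for columns in tables.values():
        cut = columns.index(first_attribute)
        prefixes.append(columns[:cut])
    shared = set(prefixes[0]).intersection(*map(set, prefixes[1:]))
    # Keep the first table's ordering so "last" means closest to the attributes.
    ordered = [column for column in prefixes[0] if column in shared]
    return ordered[-1]


if __name__ == "__main__":
    # Hypothetical, abbreviated schemas for illustration only.
    tables = {
        "eap_items_1_local": [
            "timestamp", "retention_days", "downsampled_retention_days", "attributes_bool",
        ],
        "eap_items_1_downsample_8_local": [
            "timestamp", "retention_days", "attributes_bool",
        ],
    }
    print(closest_shared_anchor(tables, "attributes_bool"))
```

Under these assumed schemas, downsampled_retention_days is excluded because one table lacks it, leaving retention_days as the closest shared anchor.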

— Claude Code

Populate the new generic `name` column (added in #8016) in the EAP items consumer. The value is sourced per item type from the primary name attribute we commonly filter on: `sentry.op` for spans and `sentry.name` for metrics. The source attribute is still written into the attribute maps as before; this only promotes it into the dedicated indexed column.

Declare the `name` column in the eap_items storage and all sibling storages (read-only, downsampled, and DLQ replay) so the read path sees it as well.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/RnmbvjU4NSsHdJEJwL_H0KFTCm_2y3okwyxmphZmzms
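The per-item-type sourcing described in this commit message can be sketched as a small lookup. This Python example is illustrative only and is not the consumer's actual code: the item-type keys (`"span"`, `"metric"`), the `promoted_name` helper, and the empty-string fallback for item types or items without a source attribute are all assumptions.

```python
# Assumed mapping from item type to the attribute promoted into `name`.
NAME_SOURCE_ATTRIBUTE = {
    "span": "sentry.op",
    "metric": "sentry.name",
}


def promoted_name(item_type: str, attributes: dict[str, str]) -> str:
    """Value for the dedicated `name` column; the attribute map is left untouched."""
    source = NAME_SOURCE_ATTRIBUTE.get(item_type)
    if source is None:
        return ""  # assumed default for item types without a name attribute
    return attributes.get(source, "")


if __name__ == "__main__":
    attrs = {"sentry.op": "db.query", "other": "value"}
    print(promoted_name("span", attrs))
    print(attrs)  # the source attribute is still present in the map
```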

phacops requested review from a team as code owners June 10, 2026 18:19

github-actions Bot added the migrations label Jun 10, 2026

sentry Bot reviewed Jun 10, 2026

Comment thread snuba/snuba_migrations/events_analytics_platform/0057_add_name_column_and_index.py

phacops and others added 2 commits June 10, 2026 11:28

phacops changed the title ~~feat(eap-items): add trace_metric_name column~~ feat(eap-items): add name column with bloom_filter index Jun 10, 2026

phacops changed the title ~~feat(eap-items): add name column with bloom_filter index~~ feat(eap-items): add generic name column with bloom_filter index Jun 10, 2026

phacops and others added 2 commits June 10, 2026 12:24

cursor Bot reviewed Jun 10, 2026

phacops mentioned this pull request Jun 10, 2026

feat(eap-items): populate name column from sentry.op and sentry.name #8017

Open

phacops wants to merge 6 commits into master from feat/add-trace-metric-name-column
