feat(eap-items): add generic name column with bloom_filter index #8016
phacops wants to merge 6 commits into
Add a `trace_metric_name` LowCardinality(String) column to the eap_items_1 tables (local, dist, and the downsampled 8/64/512 tables) to store the trace metric name. This sets up a future bloom_filter index on the column.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/NYTrkkz-0_0mKttPxUNymU3l-z042ptNDua5SgEa-Wk
This PR has a migration; here is the generated SQL for
-- start migrations
-- forward migration events_analytics_platform : 0057_add_name_column_and_index
Local op: ALTER TABLE eap_items_1_local ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Distributed op: ALTER TABLE eap_items_1_dist ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Local op: ALTER TABLE eap_items_1_downsample_8_local ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Distributed op: ALTER TABLE eap_items_1_downsample_8_dist ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Local op: ALTER TABLE eap_items_1_downsample_64_local ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Distributed op: ALTER TABLE eap_items_1_downsample_64_dist ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Local op: ALTER TABLE eap_items_1_downsample_512_local ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Distributed op: ALTER TABLE eap_items_1_downsample_512_dist ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Distributed op: ALTER TABLE eap_items_1_dist_ro ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Distributed op: ALTER TABLE eap_items_1_downsample_8_dist_ro ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Distributed op: ALTER TABLE eap_items_1_downsample_64_dist_ro ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Distributed op: ALTER TABLE eap_items_1_downsample_512_dist_ro ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Local op: ALTER TABLE eap_items_1_local ON CLUSTER 'cluster_one_sh' ADD INDEX IF NOT EXISTS bf_name name TYPE bloom_filter GRANULARITY 1;
Local op: ALTER TABLE eap_items_1_downsample_8_local ON CLUSTER 'cluster_one_sh' ADD INDEX IF NOT EXISTS bf_name name TYPE bloom_filter GRANULARITY 1;
Local op: ALTER TABLE eap_items_1_downsample_64_local ON CLUSTER 'cluster_one_sh' ADD INDEX IF NOT EXISTS bf_name name TYPE bloom_filter GRANULARITY 1;
Local op: ALTER TABLE eap_items_1_downsample_512_local ON CLUSTER 'cluster_one_sh' ADD INDEX IF NOT EXISTS bf_name name TYPE bloom_filter GRANULARITY 1;
-- end forward migration events_analytics_platform : 0057_add_name_column_and_index
-- backward migration events_analytics_platform : 0057_add_name_column_and_index
Local op: ALTER TABLE eap_items_1_local ON CLUSTER 'cluster_one_sh' DROP INDEX IF EXISTS bf_name;
Local op: ALTER TABLE eap_items_1_downsample_8_local ON CLUSTER 'cluster_one_sh' DROP INDEX IF EXISTS bf_name;
Local op: ALTER TABLE eap_items_1_downsample_64_local ON CLUSTER 'cluster_one_sh' DROP INDEX IF EXISTS bf_name;
Local op: ALTER TABLE eap_items_1_downsample_512_local ON CLUSTER 'cluster_one_sh' DROP INDEX IF EXISTS bf_name;
Distributed op: ALTER TABLE eap_items_1_dist ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Local op: ALTER TABLE eap_items_1_local ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Distributed op: ALTER TABLE eap_items_1_downsample_8_dist ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Local op: ALTER TABLE eap_items_1_downsample_8_local ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Distributed op: ALTER TABLE eap_items_1_downsample_64_dist ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Local op: ALTER TABLE eap_items_1_downsample_64_local ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Distributed op: ALTER TABLE eap_items_1_downsample_512_dist ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Local op: ALTER TABLE eap_items_1_downsample_512_local ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Distributed op: ALTER TABLE eap_items_1_dist_ro ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Distributed op: ALTER TABLE eap_items_1_downsample_8_dist_ro ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Distributed op: ALTER TABLE eap_items_1_downsample_64_dist_ro ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Distributed op: ALTER TABLE eap_items_1_downsample_512_dist_ro ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
-- end backward migration events_analytics_platform : 0057_add_name_column_and_index
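The op ordering in the generated SQL can be reproduced with a short Python sketch, which makes the structure explicit: the column goes on the local, dist, and `_dist_ro` tables; the index goes only on local tables; and the backward path drops the indexes before the columns. This is not snuba's migration API. `Op`, `alter`, `forward_ops`, and `backward_ops` are hypothetical helpers that only regenerate the same statement text.

```python
from __future__ import annotations

from dataclasses import dataclass

# Values copied from the generated SQL in this PR.
CLUSTER = "cluster_one_sh"
COLUMN_DDL = "name LowCardinality(String) CODEC (ZSTD(1))"
ANCHOR = "retention_days"
SUFFIXES = ["", "_downsample_8", "_downsample_64", "_downsample_512"]


@dataclass(frozen=True)
class Op:
    is_local: bool  # True for "Local op", False for "Distributed op"
    sql: str

    def render(self) -> str:
        label = "Local op" if self.is_local else "Distributed op"
        return f"{label}: {self.sql}"


def alter(table: str, clause: str) -> str:
    return f"ALTER TABLE {table} ON CLUSTER '{CLUSTER}' {clause};"


def forward_ops() -> list[Op]:
    add_col = f"ADD COLUMN IF NOT EXISTS {COLUMN_DDL} AFTER {ANCHOR}"
    ops = []
    for s in SUFFIXES:
        ops.append(Op(True, alter(f"eap_items_1{s}_local", add_col)))
        ops.append(Op(False, alter(f"eap_items_1{s}_dist", add_col)))
    # _dist_ro tables are built with CREATE TABLE ... AS and don't pick up
    # later schema changes, so each one gets its own ALTER.
    for s in SUFFIXES:
        ops.append(Op(False, alter(f"eap_items_1{s}_dist_ro", add_col)))
    # The bloom_filter skip index is only added to the local tables.
    for s in SUFFIXES:
        ops.append(Op(True, alter(
            f"eap_items_1{s}_local",
            "ADD INDEX IF NOT EXISTS bf_name name TYPE bloom_filter GRANULARITY 1",
        )))
    return ops


def backward_ops() -> list[Op]:
    # Indexes first, then the column (dist before local), then the _dist_ro tables.
    ops = [Op(True, alter(f"eap_items_1{s}_local", "DROP INDEX IF EXISTS bf_name"))
           for s in SUFFIXES]
    for s in SUFFIXES:
        ops.append(Op(False, alter(f"eap_items_1{s}_dist", "DROP COLUMN IF EXISTS name")))
        ops.append(Op(True, alter(f"eap_items_1{s}_local", "DROP COLUMN IF EXISTS name")))
    for s in SUFFIXES:
        ops.append(Op(False, alter(f"eap_items_1{s}_dist_ro", "DROP COLUMN IF EXISTS name")))
    return ops


if __name__ == "__main__":
    for op in forward_ops() + backward_ops():
        print(op.render())
```

Both lists have 16 entries, one per `Local op`/`Distributed op` line in the bot output, and printing them reproduces that output line for line.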
The _dist_ro tables are created via `CREATE TABLE ... AS` and do not inherit schema changes from their source tables, so the column must be added explicitly for the read path to see it.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rename the new column from `trace_metric_name` to `name` and add a `bf_name` bloom_filter index on `eap_items_1_local`, matching the existing bloom_filter index convention (local table only).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/qdL1I1HW5uIk6IHkW6Dx9PXWP25TprCOs-FEFCgoxQg
Add the bf_name bloom_filter index to the downsampled local tables
(eap_items_1_downsample_{8,64,512}_local) in addition to the main
eap_items_1_local table, so name lookups are accelerated on the
downsampled read paths as well.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
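To make "accelerated" concrete: a `bloom_filter` skip index with `GRANULARITY 1` keeps one small filter per data granule, and a filter such as `name = 'db.query'` only reads granules whose filter might contain the value. The toy Python below illustrates that idea. The SHA-256 hashing, bit count (`m_bits`), and hash count (`k`) are arbitrary choices rather than ClickHouse's implementation, and `BloomFilter`, `build_index`, and `granules_to_read` are hypothetical names.

```python
from __future__ import annotations

import hashlib


class BloomFilter:
    """Toy Bloom filter: an m-bit integer bitmap with k bit positions per value."""

    def __init__(self, m_bits: int = 256, k: int = 3) -> None:
        self.m = m_bits
        self.k = k
        self.bits = 0

    def _positions(self, value: str) -> list[int]:
        # Derive k positions from one SHA-256 digest (4 bytes per position).
        digest = hashlib.sha256(value.encode()).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m for i in range(self.k)]

    def add(self, value: str) -> None:
        for p in self._positions(value):
            self.bits |= 1 << p

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> p) & 1 for p in self._positions(value))


def build_index(column: list[str], granule_rows: int) -> list[BloomFilter]:
    """One filter per granule of `granule_rows` rows, i.e. GRANULARITY 1."""
    index = []
    for start in range(0, len(column), granule_rows):
        bf = BloomFilter()
        for value in column[start:start + granule_rows]:
            bf.add(value)
        index.append(bf)
    return index


def granules_to_read(index: list[BloomFilter], value: str) -> list[int]:
    """Granules that a `name = value` filter cannot skip."""
    return [i for i, bf in enumerate(index) if bf.might_contain(value)]


if __name__ == "__main__":
    names = ["http.server"] * 4 + ["db.query"] * 4 + ["http.server"] * 4
    index = build_index(names, granule_rows=4)
    print(granules_to_read(index, "db.query"))  # granule 1, plus any false positives
```

A Bloom filter never yields false negatives, so a granule containing the value is always read. A false positive only costs one extra granule read.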
Place the name column after attributes_int (before attributes_string_0) so it sits ahead of all the string/float attribute bucket columns instead of after attributes_array.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/D5xH-5PKdtHd7RX0JgQzXhCz8wJnAXR_d5sDw_M9dcw
Anchor the name column after retention_days so it precedes the entire attributes block (attributes_bool, attributes_int, and the string/float buckets). retention_days is used as the anchor because it exists on every target table, including the downsampled and _dist_ro variants.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
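One way to state the anchor rule in this commit is "the last column before the attributes block that every target table has". The sketch below encodes that rule. `common_anchor` is not part of snuba, and the column lists are abbreviated and hypothetical. Only `retention_days`, `downsampled_retention_days`, and the `attributes_*` names come from this PR.

```python
from __future__ import annotations


def columns_before_block(columns: list[str], block_prefix: str) -> list[str]:
    """Columns that appear before the first column starting with block_prefix."""
    before = []
    for col in columns:
        if col.startswith(block_prefix):
            break
        before.append(col)
    return before


def common_anchor(schemas: dict[str, list[str]], block_prefix: str = "attributes_") -> str:
    """Last pre-block column, in the first table's order, present in every table."""
    prefixes = [columns_before_block(cols, block_prefix) for cols in schemas.values()]
    shared = set(prefixes[0]).intersection(*prefixes[1:])
    for col in reversed(prefixes[0]):
        if col in shared:
            return col
    raise ValueError("no pre-block column is shared by every table")


# Hypothetical, abbreviated schemas: only the main table has downsampled_retention_days.
SCHEMAS = {
    "eap_items_1_local": [
        "organization_id", "project_id", "timestamp", "retention_days",
        "downsampled_retention_days", "attributes_bool", "attributes_int", "attributes_string_0",
    ],
    "eap_items_1_downsample_8_local": [
        "organization_id", "project_id", "timestamp", "retention_days",
        "attributes_bool", "attributes_int", "attributes_string_0",
    ],
    "eap_items_1_dist_ro": [
        "organization_id", "project_id", "timestamp", "retention_days",
        "attributes_bool", "attributes_int", "attributes_string_0",
    ],
}

if __name__ == "__main__":
    print(common_anchor(SCHEMAS))  # retention_days
```

`AFTER` only fixes the position relative to the anchor. On the main table, `name` therefore lands between `retention_days` and `downsampled_retention_days`, which is still ahead of the attributes block.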
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit ea9a3e7.
    ),
    ),
    ]
    after = "retention_days"
Wrong column placement anchor
Medium Severity
The migration sets `after` to `retention_days`, so every `AddColumn` emits `AFTER retention_days`. The PR and intended DDL place `name` immediately after `attributes_array` at the end of the attribute columns, not next to retention metadata.
The `AFTER retention_days` placement is intentional: the column was deliberately moved to sit before all the attribute columns (`attributes_bool`/`attributes_int` and the string/float buckets), not after `attributes_array`. `retention_days` is used as the anchor because it's the only pre-attributes column present on every target table (the downsampled and `_dist_ro` variants don't have `downsampled_retention_days`). The PR description referenced the older placement and has been updated to match. Column order doesn't affect correctness anyway, since our ClickHouse inserts and selects address columns by name, not by position.
— Claude Code
Populate the new generic `name` column (added in #8016) in the EAP items consumer. The value is sourced per item type from the primary name attribute we commonly filter on: `sentry.op` for spans and `sentry.name` for metrics. The source attribute is still written into the attribute maps as before; this only promotes it into the dedicated indexed column. Declare the `name` column in the eap_items storage and all sibling storages (read-only, downsampled, and DLQ replay) so the read path sees it as well.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/RnmbvjU4NSsHdJEJwL_H0KFTCm_2y3okwyxmphZmzms
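The per-item-type promotion in this commit can be sketched as follows. It is written in Python for illustration only: `ItemType`, `NAME_SOURCE_ATTRIBUTE`, `promote_name`, and the row layout are hypothetical stand-ins for the consumer's real types. Writing an empty string when an item type has no source attribute is an assumption, chosen because `''` is the default value of a non-nullable `LowCardinality(String)` column.

```python
from __future__ import annotations

from enum import Enum


class ItemType(Enum):
    # Hypothetical subset of item types.
    SPAN = "span"
    METRIC = "metric"
    LOG = "log"


# Source attribute promoted into `name`, per item type (from the commit message).
NAME_SOURCE_ATTRIBUTE = {
    ItemType.SPAN: "sentry.op",
    ItemType.METRIC: "sentry.name",
}


def promote_name(item_type: ItemType, attributes: dict[str, str]) -> dict:
    """Build a row with a dedicated `name` value; the attribute map is kept as-is."""
    source = NAME_SOURCE_ATTRIBUTE.get(item_type)
    # Assumption: items without a mapped or present source attribute get "".
    name = attributes.get(source, "") if source is not None else ""
    return {"name": name, "attributes": dict(attributes)}


if __name__ == "__main__":
    row = promote_name(ItemType.SPAN, {"sentry.op": "http.server"})
    print(row["name"])  # http.server
```

Because the attribute is copied rather than moved, existing queries against the attribute maps keep working while new queries can filter on `name`.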


Summary
Adds a generic `name` column to the `eap_items_1` tables, with a `bloom_filter` index, to store and index a per-item-type "name" attribute that we frequently filter on. Promoting this commonly filtered attribute to a dedicated indexed column lets those queries skip granules via the `bloom_filter` index instead of scanning the attribute maps.

The column is intentionally generic (`name`) rather than tied to a single item type, since each item type has its own notion of a primary name attribute that it is filtered by.

Changes
- `name`: `LowCardinality(String) CODEC(ZSTD(1))`, positioned after `retention_days` so it sits ahead of the entire attributes block (`attributes_bool`, `attributes_int`, and the string/float buckets).
- Added to `eap_items_1_local`/`_dist`, the three downsampled tables (`_downsample_8/64/512`, local + dist), and the read-only distributed tables (`_dist_ro`) under the `EVENTS_ANALYTICS_PLATFORM_RO` storage set.
- `retention_days` is used as the placement anchor because it's the only pre-attributes column present on every target table (the downsampled and `_dist_ro` variants lack `downsampled_retention_days`). Column order has no effect on correctness since ClickHouse inserts/selects are by name.
- A `bf_name` `bloom_filter` index (granularity 1) on every local table: `eap_items_1_local` and the three `_downsample_{8,64,512}_local` tables, so name filtering is accelerated on the full-resolution and downsampled read paths alike.
- `backwards_ops` drops the indexes first, then the column from every table.

Generated SQL (forwards)
Notes
- `_dist_ro` tables are created via `CREATE TABLE ... AS` and don't inherit later schema changes, so the column is added explicitly there for the read path.
- `LowCardinality` keeps its dictionary per data part (not global), so the encoding stays effective even as the global set of names grows.
- Adding `bf_name` to the downsampled local tables departs from the existing `bf_trace_id`/`bf_hashed_keys` indexes (those live on the main local table only); this is intentional so name filtering benefits on downsampled queries too.

Testing
Verified the migration loads and renders the expected forward/backward SQL via `EventsAnalyticsPlatformLoader`.

🤖 Generated with Claude Code