feat(eap-items): add generic name column with bloom_filter index#8016

Open: phacops wants to merge 6 commits into master from feat/add-trace-metric-name-column

@phacops phacops commented Jun 10, 2026

Summary

Adds a generic name column to the eap_items_1 tables, with a bloom_filter index, to store and index a per-item-type "name" attribute that we frequently filter on. Promoting this commonly-filtered attribute to a dedicated indexed column lets those queries skip granules via the bloom_filter instead of scanning the attribute maps.

The column is intentionally generic (name) rather than tied to a single item type, since each item type has its own notion of a primary name attribute it gets filtered by.
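
To make the granule-skipping idea concrete, here is a minimal Python sketch. It is not ClickHouse's implementation: the `TinyBloom` class, its bit and hash counts, and the sample names are all hypothetical. Each granule gets its own small filter over its `name` values, and a lookup only has to read the granules whose filter reports a possible match.

```python
import hashlib


class TinyBloom:
    """Toy bloom filter; sizes and hashing are illustrative assumptions."""

    def __init__(self, num_bits: int = 256, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bitset stored in a Python int

    def _positions(self, value: str):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_index(granules):
    """Build one filter per granule, mirroring GRANULARITY 1."""
    index = []
    for rows in granules:
        bloom = TinyBloom()
        for name in rows:
            bloom.add(name)
        index.append(bloom)
    return index


def granules_to_read(index, name):
    """Return the positions of granules that cannot be skipped."""
    return [i for i, bloom in enumerate(index) if bloom.might_contain(name)]


# Hypothetical sample data: three granules of `name` values.
granules = [
    ["db.query", "http.client"],
    ["cache.get", "db.query"],
    ["ui.render", "http.server"],
]
index = build_index(granules)
candidates = granules_to_read(index, "db.query")
```

A bloom filter never produces false negatives, so `candidates` always includes granules 0 and 1. The third granule is skipped unless the filter reports a false positive, in which case it is read and filtered as usual.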

Changes

  • New column name: LowCardinality(String) CODEC(ZSTD(1)), positioned after retention_days so it sits ahead of the entire attributes block (attributes_bool, attributes_int, and the string/float buckets).
    • Added to all eap_items tables: eap_items_1_local/_dist, the three downsampled tables (_downsample_8/64/512, local + dist), and the read-only distributed tables (_dist_ro) under the EVENTS_ANALYTICS_PLATFORM_RO storage set.
    • retention_days is used as the placement anchor because it's the only pre-attributes column present on every target table (the downsampled and _dist_ro variants lack downsampled_retention_days). Column order has no effect on correctness since ClickHouse inserts/selects are by name.
  • New bf_name bloom_filter index (granularity 1) on every local table: eap_items_1_local and the three _downsample_{8,64,512}_local tables, so name filtering is accelerated on the full-resolution and downsampled read paths alike.
  • Fully reversible: backwards_ops drops the indexes first, then the column from every table.

Generated SQL (forwards)

ALTER TABLE eap_items_1_local ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
-- + dist, the 3 downsampled local/dist tables, and the 4 _dist_ro tables
ALTER TABLE eap_items_1_local ADD INDEX IF NOT EXISTS bf_name name TYPE bloom_filter GRANULARITY 1;
-- + the 3 _downsample_{8,64,512}_local tables
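
The abbreviated statements above expand to sixteen operations: twelve `ADD COLUMN` and four `ADD INDEX`. The following Python sketch reproduces that expansion. It is only an illustration: the helper names are hypothetical, it does not use Snuba's migration API, and it omits the `ON CLUSTER` clause that the real generated SQL carries.

```python
# Table names follow the generated SQL in this PR; the helpers are
# hypothetical and independent of Snuba's migration framework.
SUFFIXES = ["", "_downsample_8", "_downsample_64", "_downsample_512"]
COLUMN_DDL = "name LowCardinality(String) CODEC (ZSTD(1))"


def table(suffix: str, kind: str) -> str:
    """Build a table name such as eap_items_1_downsample_8_local."""
    return f"eap_items_1{suffix}_{kind}"


def forward_statements() -> list[str]:
    statements = []
    # Add the column to each local/dist pair, then to the _dist_ro tables.
    for kinds in (("local", "dist"), ("dist_ro",)):
        for suffix in SUFFIXES:
            for kind in kinds:
                statements.append(
                    f"ALTER TABLE {table(suffix, kind)} ADD COLUMN IF NOT EXISTS "
                    f"{COLUMN_DDL} AFTER retention_days;"
                )
    # Add the skip index on local tables only.
    for suffix in SUFFIXES:
        statements.append(
            f"ALTER TABLE {table(suffix, 'local')} ADD INDEX IF NOT EXISTS "
            "bf_name name TYPE bloom_filter GRANULARITY 1;"
        )
    return statements


for statement in forward_statements():
    print(statement)
```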

Notes

  • _dist_ro tables are created via CREATE TABLE ... AS and don't inherit later schema changes, so the column is added explicitly there for the read path.
  • LowCardinality keeps its dictionary per data part (not global), so the encoding stays effective even as the global set of names grows.
  • This extends the bloom_filter index to the downsampled local tables, going a step beyond the existing bf_trace_id/bf_hashed_keys indexes, which live on the main local table only. This is intentional, so that name filtering also benefits downsampled queries.
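
As a rough illustration of the per-part dictionary note, the Python sketch below dictionary-encodes two parts independently. It is a simplified model and does not reflect ClickHouse's on-disk format; `encode_part` and the sample names are hypothetical.

```python
def encode_part(names):
    """Dictionary-encode one part and return (dictionary, codes)."""
    dictionary = []  # distinct names seen in this part, in first-seen order
    positions = {}   # name -> index into the dictionary
    codes = []       # one small integer per row
    for name in names:
        if name not in positions:
            positions[name] = len(dictionary)
            dictionary.append(name)
        codes.append(positions[name])
    return dictionary, codes


# Hypothetical parts: each one only sees a subset of all names.
parts = [
    ["db.query", "db.query", "http.client", "db.query"],
    ["ui.render", "ui.render", "cache.get"],
]
encoded = [encode_part(part) for part in parts]
```

Each part's dictionary holds only the names that occur in that part, so its size depends on local variety rather than on the total number of names across the table.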

Testing

Verified the migration loads and renders the expected forward/backward SQL via EventsAnalyticsPlatformLoader.

🤖 Generated with Claude Code

Add a `trace_metric_name` LowCardinality(String) column to the
eap_items_1 tables (local, dist, and the downsampled 8/64/512 tables)
to store the trace metric name. This sets up a future bloom_filter
index on the column.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/NYTrkkz-0_0mKttPxUNymU3l-z042ptNDua5SgEa-Wk
@phacops phacops requested review from a team as code owners June 10, 2026 18:19

github-actions Bot commented Jun 10, 2026

This PR has a migration; here is the generated SQL for ./snuba/migrations/groups.py ()

-- start migrations

-- forward migration events_analytics_platform : 0057_add_name_column_and_index
Local op: ALTER TABLE eap_items_1_local ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Distributed op: ALTER TABLE eap_items_1_dist ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Local op: ALTER TABLE eap_items_1_downsample_8_local ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Distributed op: ALTER TABLE eap_items_1_downsample_8_dist ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Local op: ALTER TABLE eap_items_1_downsample_64_local ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Distributed op: ALTER TABLE eap_items_1_downsample_64_dist ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Local op: ALTER TABLE eap_items_1_downsample_512_local ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Distributed op: ALTER TABLE eap_items_1_downsample_512_dist ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Distributed op: ALTER TABLE eap_items_1_dist_ro ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Distributed op: ALTER TABLE eap_items_1_downsample_8_dist_ro ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Distributed op: ALTER TABLE eap_items_1_downsample_64_dist_ro ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Distributed op: ALTER TABLE eap_items_1_downsample_512_dist_ro ON CLUSTER 'cluster_one_sh' ADD COLUMN IF NOT EXISTS name LowCardinality(String) CODEC (ZSTD(1)) AFTER retention_days;
Local op: ALTER TABLE eap_items_1_local ON CLUSTER 'cluster_one_sh' ADD INDEX IF NOT EXISTS bf_name name TYPE bloom_filter GRANULARITY 1;
Local op: ALTER TABLE eap_items_1_downsample_8_local ON CLUSTER 'cluster_one_sh' ADD INDEX IF NOT EXISTS bf_name name TYPE bloom_filter GRANULARITY 1;
Local op: ALTER TABLE eap_items_1_downsample_64_local ON CLUSTER 'cluster_one_sh' ADD INDEX IF NOT EXISTS bf_name name TYPE bloom_filter GRANULARITY 1;
Local op: ALTER TABLE eap_items_1_downsample_512_local ON CLUSTER 'cluster_one_sh' ADD INDEX IF NOT EXISTS bf_name name TYPE bloom_filter GRANULARITY 1;
-- end forward migration events_analytics_platform : 0057_add_name_column_and_index




-- backward migration events_analytics_platform : 0057_add_name_column_and_index
Local op: ALTER TABLE eap_items_1_local ON CLUSTER 'cluster_one_sh' DROP INDEX IF EXISTS bf_name;
Local op: ALTER TABLE eap_items_1_downsample_8_local ON CLUSTER 'cluster_one_sh' DROP INDEX IF EXISTS bf_name;
Local op: ALTER TABLE eap_items_1_downsample_64_local ON CLUSTER 'cluster_one_sh' DROP INDEX IF EXISTS bf_name;
Local op: ALTER TABLE eap_items_1_downsample_512_local ON CLUSTER 'cluster_one_sh' DROP INDEX IF EXISTS bf_name;
Distributed op: ALTER TABLE eap_items_1_dist ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Local op: ALTER TABLE eap_items_1_local ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Distributed op: ALTER TABLE eap_items_1_downsample_8_dist ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Local op: ALTER TABLE eap_items_1_downsample_8_local ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Distributed op: ALTER TABLE eap_items_1_downsample_64_dist ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Local op: ALTER TABLE eap_items_1_downsample_64_local ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Distributed op: ALTER TABLE eap_items_1_downsample_512_dist ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Local op: ALTER TABLE eap_items_1_downsample_512_local ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Distributed op: ALTER TABLE eap_items_1_dist_ro ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Distributed op: ALTER TABLE eap_items_1_downsample_8_dist_ro ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Distributed op: ALTER TABLE eap_items_1_downsample_64_dist_ro ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
Distributed op: ALTER TABLE eap_items_1_downsample_512_dist_ro ON CLUSTER 'cluster_one_sh' DROP COLUMN IF EXISTS name;
-- end backward migration events_analytics_platform : 0057_add_name_column_and_index
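
The backward operations above follow a fixed order: the `bf_name` index is dropped from every local table before the column is dropped anywhere. The Python sketch below rebuilds that sequence and checks the ordering. The helper names are hypothetical, and the `ON CLUSTER` clause is omitted for brevity.

```python
SUFFIXES = ["", "_downsample_8", "_downsample_64", "_downsample_512"]


def backward_statements() -> list[str]:
    statements = []
    # 1. Drop the skip index from each local table.
    for suffix in SUFFIXES:
        statements.append(
            f"ALTER TABLE eap_items_1{suffix}_local DROP INDEX IF EXISTS bf_name;"
        )
    # 2. Drop the column, dist before local within each pair.
    for suffix in SUFFIXES:
        for kind in ("dist", "local"):
            statements.append(
                f"ALTER TABLE eap_items_1{suffix}_{kind} DROP COLUMN IF EXISTS name;"
            )
    # 3. Drop the column from the read-only distributed tables.
    for suffix in SUFFIXES:
        statements.append(
            f"ALTER TABLE eap_items_1{suffix}_dist_ro DROP COLUMN IF EXISTS name;"
        )
    return statements


def indexes_dropped_before_columns(statements: list[str]) -> bool:
    """True when every DROP INDEX precedes every DROP COLUMN."""
    last_index = max(i for i, s in enumerate(statements) if "DROP INDEX" in s)
    first_column = min(i for i, s in enumerate(statements) if "DROP COLUMN" in s)
    return last_index < first_column


ordered = indexes_dropped_before_columns(backward_statements())
```

This is the ordering one would expect, since the index is defined over the column it is being removed with.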

phacops and others added 2 commits June 10, 2026 11:28
The _dist_ro tables are created via `CREATE TABLE ... AS` and do not
inherit schema changes from their source tables, so the column must be
added explicitly for the read path to see it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rename the new column from `trace_metric_name` to `name` and add a
`bf_name` bloom_filter index on `eap_items_1_local`, matching the
existing bloom_filter index convention (local table only).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/qdL1I1HW5uIk6IHkW6Dx9PXWP25TprCOs-FEFCgoxQg
@phacops phacops changed the title from "feat(eap-items): add trace_metric_name column" to "feat(eap-items): add name column with bloom_filter index" Jun 10, 2026
Add the bf_name bloom_filter index to the downsampled local tables
(eap_items_1_downsample_{8,64,512}_local) in addition to the main
eap_items_1_local table, so name lookups are accelerated on the
downsampled read paths as well.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@phacops phacops changed the title from "feat(eap-items): add name column with bloom_filter index" to "feat(eap-items): add generic name column with bloom_filter index" Jun 10, 2026
phacops and others added 2 commits June 10, 2026 12:24
Place the name column after attributes_int (before attributes_string_0)
so it sits ahead of all the string/float attribute bucket columns
instead of after attributes_array.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/D5xH-5PKdtHd7RX0JgQzXhCz8wJnAXR_d5sDw_M9dcw
Anchor the name column after retention_days so it precedes the entire
attributes block (attributes_bool, attributes_int, and the string/float
buckets). retention_days is used as the anchor because it exists on
every target table, including the downsampled and _dist_ro variants.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit ea9a3e7. Configure here.

),
),
]
after = "retention_days"

Wrong column placement anchor

Medium Severity

The migration sets after to retention_days, so every AddColumn emits AFTER retention_days. The PR and intended DDL place name immediately after attributes_array at the end of the attribute columns, not next to retention metadata.

Contributor Author

The AFTER retention_days placement is intentional — the column was deliberately moved to sit before all the attribute columns (attributes_bool/attributes_int and the string/float buckets), not after attributes_array. retention_days is used as the anchor because it's the only pre-attributes column present on every target table (the downsampled and _dist_ro variants don't have downsampled_retention_days). The PR description referenced the older placement and has been updated to match. Column order doesn't affect correctness anyway, since ClickHouse inserts/selects are by name.

— Claude Code

phacops added a commit that referenced this pull request Jun 10, 2026
Populate the new generic `name` column (added in #8016) in the EAP items
consumer. The value is sourced per item type from the primary name
attribute we commonly filter on: `sentry.op` for spans and `sentry.name`
for metrics. The source attribute is still written into the attribute
maps as before; this only promotes it into the dedicated indexed column.

Declare the `name` column in the eap_items storage and all sibling
storages (read-only, downsampled, and DLQ replay) so the read path sees
it as well.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/RnmbvjU4NSsHdJEJwL_H0KFTCm_2y3okwyxmphZmzms
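
The per-item-type sourcing described in the commit message above could look roughly like the Python sketch below. It is not the consumer's actual code: the item-type keys, the function name, and the empty-string fallback are assumptions, and only the `sentry.op` and `sentry.name` attribute names come from the commit message.

```python
# Source attribute per item type, as stated in the commit message.
# The dictionary keys ("span", "metric") are hypothetical labels.
NAME_SOURCE_ATTRIBUTE = {
    "span": "sentry.op",
    "metric": "sentry.name",
}


def promoted_name(item_type: str, attributes: dict) -> str:
    """Return the value for the `name` column, or "" when none applies.

    Assumption: items without a mapped or present source attribute get
    an empty string, the default value of a ClickHouse String column.
    """
    source = NAME_SOURCE_ATTRIBUTE.get(item_type)
    if source is None:
        return ""
    return str(attributes.get(source, ""))


# The source attribute stays in the attribute map; it is only copied.
span_attributes = {"sentry.op": "db.query", "sentry.description": "SELECT 1"}
span_name = promoted_name("span", span_attributes)
```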