perf(eap-items): Index-prune and dedupe sentry.timestamp range filters#8018

Draft
phacops wants to merge 2 commits into master from feat/eap-items-timestamp-filter-index

@phacops phacops commented Jun 11, 2026

EAP item queries carry two overlapping timestamp conditions: the mandatory time-range bound (on the raw timestamp column) and any client-supplied sentry.timestamp filter. The latter was emitted as CAST(timestamp, 'Float64'), which can't drive granule or partition pruning, so it degenerated into a per-row scan layered on top of the mandatory bound — showing up as one shard being consistently slow on skewed data.

Range filters compare the raw column

sentry.timestamp is a normalized column, and attribute_key_to_expression maps every normalized column to CAST(<col>, <type>). Since timestamp is in the primary key (organization_id, project_id, item_type, timestamp) and the partition key toMonday(timestamp), the CAST defeats pruning. Range comparisons (<, <=, >, >=) on sentry.timestamp are now rewritten to target the raw DateTime column:

-- before
(CAST(timestamp, 'Float64') AS `sentry.timestamp_TYPE_DOUBLE`) >= 1781040732

-- after
timestamp >= toDateTime('2026-06-09 21:32:12')

The column is second-resolution DateTime, so CAST(timestamp, 'Float64') already yields whole seconds — comparing against toDateTime(value) is equivalent for the integer unix-second values clients send. EQUALS/IN and SELECTing sentry.timestamp (which must still return a number) keep the CAST path.
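The rewrite rule can be sketched in Python. This is a minimal illustration, not the code in this PR: `to_datetime_literal` and `rewrite_timestamp_filter` are hypothetical helpers that emit plain strings instead of AST expressions, the `sentry.timestamp_TYPE_DOUBLE` alias is omitted, and the sketch assumes integer unix seconds interpreted as UTC.

```python
from datetime import datetime, timezone

# Operators that get rewritten onto the raw DateTime column.
RANGE_OPS = {"<", "<=", ">", ">="}


def to_datetime_literal(unix_seconds: int) -> str:
    """Format unix seconds as toDateTime('YYYY-MM-DD HH:MM:SS'), assuming UTC."""
    dt = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return "toDateTime('" + dt.strftime("%Y-%m-%d %H:%M:%S") + "')"


def rewrite_timestamp_filter(op: str, value: int) -> str:
    """Return a WHERE fragment for a sentry.timestamp filter (hypothetical helper)."""
    if op in RANGE_OPS:
        # Range comparison: target the raw column so it stays prunable.
        return f"timestamp {op} {to_datetime_literal(value)}"
    # Any other operator (e.g. equality) keeps the CAST path.
    return f"CAST(timestamp, 'Float64') {op} {value}"


print(rewrite_timestamp_filter(">=", 1781040732))
# timestamp >= toDateTime('2026-06-09 21:32:12')
```

Formatting the bound through a single helper is what lets a client filter and the mandatory time-range bound end up as the same string when their values match.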

Dedupe of identical conditions

The rewrite reuses the same canonical toDateTime('YYYY-MM-DD HH:MM:SS') form as the mandatory time-range bound, so a client filter whose bounds equal the request window becomes a byte-identical expression. treeify_or_and_conditions now drops structurally-identical conjuncts from the top-level WHERE AND (A AND A == A) before re-nesting, collapsing those duplicates. Distinct ranges are left untouched — the AND of them yields the tightest window — and the pass is a no-op (no rebuild) when there are no duplicates.

Nothing downstream dedupes conditions (the ClickHouse formatter joins AND args verbatim), so this is the only place the duplicate gets removed. Because every RPC endpoint calls treeify_or_and_conditions, they all get both the raw-column rewrite (shared filter path) and the dedupe.
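The dedupe pass can be sketched as follows. This is an illustration under stated assumptions rather than the actual `treeify_or_and_conditions` change: `Condition` is a hypothetical frozen dataclass standing in for the real expression nodes, and `dedupe_conjuncts` is a made-up name. It relies only on structural equality and hashing, keeps the first occurrence of each conjunct, and returns the input object unchanged when nothing was dropped.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Condition:
    """Hypothetical stand-in for an AST node; frozen dataclasses compare by value."""
    column: str
    op: str
    literal: str


def dedupe_conjuncts(conjuncts: List[Condition]) -> List[Condition]:
    """Drop structurally identical conjuncts (A AND A == A), preserving order."""
    seen = set()
    unique = []
    for cond in conjuncts:
        if cond not in seen:
            seen.add(cond)
            unique.append(cond)
    # No duplicates found: hand back the original list without rebuilding.
    return conjuncts if len(unique) == len(conjuncts) else unique


lower = Condition("timestamp", ">=", "toDateTime('2026-06-09 21:32:12')")
upper = Condition("timestamp", "<", "toDateTime('2026-06-10 21:32:12')")

# The client filter repeats the mandatory lower bound, so it collapses.
print(len(dedupe_conjuncts([lower, upper, lower])))  # 2
```

Distinct bounds compare unequal, so they all survive and their AND still yields the tightest window.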

Tests build the query AST via build_query and assert on the generated WHERE clause (no live ClickHouse needed): equal ranges collapse to one bound, different ranges keep both, and 3+ distinct bounds are all kept and never use a Float cast.

phacops and others added 2 commits June 10, 2026 17:04
`sentry.timestamp` is a normalized column that attribute_key_to_expression
maps to `CAST(timestamp, 'Float64')`. Wrapping the primary-key/partition
column in a CAST stops ClickHouse from using it for granule and partition
pruning, so client-supplied range filters on sentry.timestamp degenerate into
per-row scans on top of the mandatory time-range condition already applied on
the raw column.

Rewrite range comparisons (<, <=, >, >=) on sentry.timestamp to compare the
raw DateTime `timestamp` column against toDateTime(value) so the condition is
index- and partition-prunable and folds with the mandatory time bounds. This
mirrors the pattern the trace-item-table resolver already uses to enable
optimize_read_in_order. EQUALS/IN and SELECTing sentry.timestamp are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nge bound

Builds on the sentry.timestamp range-filter rewrite: emit the rewritten bound
using the same canonical toDateTime('YYYY-MM-DD HH:MM:SS') form as the mandatory
time-range condition, so a client filter whose bounds equal the request window
produces a byte-identical expression.

treeify_or_and_conditions now drops structurally-identical conjuncts from the
top-level WHERE AND (A AND A == A) before re-nesting. Nothing downstream in the
query pipeline dedupes conditions (the ClickHouse formatter joins AND args
verbatim), so without this the duplicate bound would reach ClickHouse. Because
every RPC endpoint treeifies, they all get the dedupe. Distinct ranges are left
untouched -- the AND of them yields the tightest window -- and the pass is a
no-op (no rebuild) when there are no duplicates.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@phacops phacops changed the title from "perf(eap-items): Make sentry.timestamp range filters index-prunable" to "perf(eap-items): Index-prune and dedupe sentry.timestamp range filters" Jun 11, 2026