perf(eap-items): Index-prune and dedupe sentry.timestamp range filters#8018
Draft
phacops wants to merge 2 commits into
Draft
perf(eap-items): Index-prune and dedupe sentry.timestamp range filters#8018phacops wants to merge 2 commits into
phacops wants to merge 2 commits into
Conversation
`sentry.timestamp` is a normalized column that attribute_key_to_expression maps to `CAST(timestamp, 'Float64')`. Wrapping the primary-key/partition column in a CAST stops ClickHouse from using it for granule and partition pruning, so client-supplied range filters on sentry.timestamp degenerate into per-row scans on top of the mandatory time-range condition already applied on the raw column. Rewrite range comparisons (<, <=, >, >=) on sentry.timestamp to compare the raw DateTime `timestamp` column against toDateTime(value) so the condition is index- and partition-prunable and folds with the mandatory time bounds. This mirrors the pattern the trace-item-table resolver already uses to enable optimize_read_in_order. EQUALS/IN and SELECTing sentry.timestamp are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nge bound
Builds on the sentry.timestamp range-filter rewrite: emit the rewritten bound
using the same canonical toDateTime('YYYY-MM-DD HH:MM:SS') form as the mandatory
time-range condition, so a client filter whose bounds equal the request window
produces a byte-identical expression.
treeify_or_and_conditions now drops structurally-identical conjuncts from the
top-level WHERE AND (A AND A == A) before re-nesting. Nothing downstream in the
query pipeline dedupes conditions (the ClickHouse formatter joins AND args
verbatim), so without this the duplicate bound would reach ClickHouse. Because
every RPC endpoint treeifies, they all get the dedupe. Distinct ranges are left
untouched -- the AND of them yields the tightest window -- and the pass is a
no-op (no rebuild) when there are no duplicates.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
EAP item queries carry two overlapping timestamp conditions: the mandatory time-range bound (on the raw
timestampcolumn) and any client-suppliedsentry.timestampfilter. The latter was emitted asCAST(timestamp, 'Float64'), which can't drive granule or partition pruning, so it degenerated into a per-row scan layered on top of the mandatory bound — showing up as one shard being consistently slow on skewed data.Range filters compare the raw column
sentry.timestampis a normalized column, andattribute_key_to_expressionmaps every normalized column toCAST(<col>, <type>). Sincetimestampis in the primary key(organization_id, project_id, item_type, timestamp)and the partition keytoMonday(timestamp), the CAST defeats pruning. Range comparisons (<,<=,>,>=) onsentry.timestampare now rewritten to target the rawDateTimecolumn:The column is second-resolution
DateTime, soCAST(timestamp, 'Float64')already yields whole seconds — comparing againsttoDateTime(value)is equivalent for the integer unix-second values clients send.EQUALS/INandSELECTingsentry.timestamp(which must still return a number) keep the CAST path.Dedupe of identical conditions
The rewrite reuses the same canonical
toDateTime('YYYY-MM-DD HH:MM:SS')form as the mandatory time-range bound, so a client filter whose bounds equal the request window becomes a byte-identical expression.treeify_or_and_conditionsnow drops structurally-identical conjuncts from the top-level WHERE AND (A AND A == A) before re-nesting, collapsing those duplicates. Distinct ranges are left untouched — the AND of them yields the tightest window — and the pass is a no-op (no rebuild) when there are no duplicates.Nothing downstream dedupes conditions (the ClickHouse formatter joins AND args verbatim), so this is the only place the duplicate gets removed. Because every RPC endpoint calls
treeify_or_and_conditions, they all get both the raw-column rewrite (shared filter path) and the dedupe.Tests build the query AST via
build_queryand assert on the generated WHERE clause (no live ClickHouse needed): equal ranges collapse to one bound, different ranges keep both, and 3+ distinct bounds are all kept and never use a Float cast.