Skip to content

📝 CodeRabbit Chat: Fix ClickHouse trigger in adaptive export service#67

Closed
coderabbitai[bot] wants to merge 37 commits into
mainfrom
coderabbitai/chat/274f6f4
Closed

📝 CodeRabbit Chat: Fix ClickHouse trigger in adaptive export service#67
coderabbitai[bot] wants to merge 37 commits into
mainfrom
coderabbitai/chat/274f6f4

Conversation

@coderabbitai

@coderabbitai coderabbitai Bot commented Jun 20, 2026

Copy link
Copy Markdown

Code changes was requested by @entlein.

The following files were modified:

  • src/e2e_test/adaptive_export_loadtest/k8s/gen-pod.tmpl.yaml
  • src/vizier/services/adaptive_export/internal/trigger/clickhouse.go
  • src/vizier/services/adaptive_export/internal/trigger/clickhouse_internal_test.go

Entlein and others added 30 commits June 8, 2026 12:13
New env-gated background loop that runs the same PxL shape AE's
anomaly-gated path uses, but with an empty Target (no ns/pod predicate)
and over a configurable rolling window. Writes via the existing sink so
the byte-shape of forensic_db rows is comparable between the
PASSTHROUGH=1 phase (EVERYTHING) and the PASSTHROUGH=0 phase
(AE-FILTER). One-shot A/B that yields the per-table capture fraction of
the adaptive write path.

- internal/passthrough/passthrough.go — Loop + Config; defaults to
  30s window / 30s refresh / clickhouse.PixieTables() table list.
- internal/passthrough/passthrough_test.go — 6 tests; the load-bearing
  one is TestLoop_EmitsEmptyTargetPxL (asserts neither df.namespace nor
  df.pod predicates appear in the emitted PxL).
- cmd/main.go: ADAPTIVE_PASSTHROUGH + _WINDOW_SEC + _REFRESH_SEC env
  knobs. Adapter is constructed unconditionally when passthrough is on
  (joins the existing PushPixie / streaming construction path so the
  same pxapi grpc stream is reused). Loop is registered with the
  shutdown WaitGroup so SIGTERM waits for the in-flight tick.
- cmd/BUILD.bazel: drop @px// load (other AE BUILD.bazel files use
  //bazel — sticking out as the only one with @px is a leftover from a
  prior gazelle run; align). Add passthrough dep.
Stand-alone workflow that builds entlein/dx (private Active-Diagnosis
Framework) into ghcr.io/k8sstormcenter/dx-daemon. Separates the dx image
publish from the bazel-based vizier_release pipeline; the dx repo ships
its own Dockerfile.dxd (Go cross-compile + distroless final stage) so it
doesn't need to live as a submodule inside src/vizier/services/dx.

Triggers:
- tag push 'release/dx/v*' on this repo cuts a release build, image tag
  derived from the tag suffix (release/dx/v0.1.0 -> image tag 0.1.0).
- workflow_dispatch lets us build any dx ref on demand with a custom tag
  (default: short sha of the resolved dx commit).

Pulls dx via DX_ENTLEIN_PAT (already configured on the repo). Multi-arch
build (linux/amd64, linux/arm64); Dockerfile.dxd cross-compiles in the
native BUILDPLATFORM stage and the final stage is COPY-only, so target
emulation isn't required.
The dx image build pipeline lives in entlein/dx itself (PR #53,
branch feat/bazel-release): bazel-based with @px external pin to
pixie's ae-prod tip, pushes to docker.io/entlein/dx-daemon on a
release/dx/v* tag in the dx repo. The pixie-side buildx workflow
this reverts duplicated that intent in the wrong repo + the wrong
build system (docker buildx instead of bazel + pl_go_image macros)
+ the wrong registry (ghcr.io/k8sstormcenter instead of
docker.io/entlein).
adaptive_export: ADAPTIVE_PASSTHROUGH firehose loop (A/B capture-fraction measurement)
Consolidates the recurring content_type silent-drop incident class
into one default-suite test gate (6 tests, ~15ms):

  I1 TestContract_ContentTypeIsInt64InSchema
  I2 TestContract_FastEncodeContentTypeAsInt
  I3 TestContract_SilentDropDetected
  I3.b TestContract_SilentDropNotTriggeredOnSuccess
  I3.c TestContract_SilentDropToleratesMissingSummaryHeader
  I4 TestContract_HTTPEventsRoundTrip

Top-of-file docstring chronicles the incident timeline so future
operators can grep their way to the contract.
…affordances

Fixes the silent-halt bug: the trigger gated on a RAW event_time high-water-mark,
so a single anomaly in a larger unit (ms/ns) drove the watermark past all real
seconds rows and AE stopped processing forever (data still on Pixie). Normalize
event_time to canonical nanoseconds in the poll SELECT filter+order and in the
in-memory/persisted cursor, boundary-dedup, and maxSeen (normalizeEventTimeNanos
+ chNormEventTimeNanos). Validated at the data layer: vs a poisoned watermark the
raw filter returns 0 rows, the normalized filter recovers all 60.

Also adds ADAPTIVE_PUSH_REFRESH_SEC (negative = single-shot pull) for the
reproducible load-test harness, an in-package trigger unit test, and an e2e
hermetic load test (mock PixieQuerier, exact rows+bytes).
…ness

Consolidate the adaptive_export load-test harness under src/e2e_test/ (matching
vzconn_loadtest / px_cluster conventions). Control-plane experiments (E1-E4, E6,
E8 sustained) are proven exactly-reproducible on a live rig; the data-plane
experiments (E5, E8 data-mode) are authored and pending live validation on a
vizier-registered rig (status documented in README + FINDINGS_AND_BACKLOG).

- harness/: shell + python (inject, exp_control, exp_e8, stats, ...) + lib helpers
- fixtures/EXPERIMENTS.md: curated kubescape_logs data-set catalog + expected outputs
- k8s/: isolated sinks + per-rep generator pod (no probes)
- tools/loadgen/: cleanloadgen + httpsink (docker-built test tool; .bazelignore'd
  pending a bazel target — lib/pq is already vendored in the module)
- FINDINGS_AND_BACKLOG.md: F8 watermark-poison bug + the fix + AE backlog

The AE Go unit/e2e tests live with the service (internal/{trigger,e2e}).
…iagram; gen sustained-DNS mode

C15 = AE must keep re-pulling+writing an active pod until t_end or DX stop (the
contract DX steers on; last week's 'wrote then stopped' is its violation). Add
DX-steering sequence diagram. Generator gains SUSTAIN_SEC (distinct-DNS trickle,
a Pixie-traced protocol) + configurable SETTLE_PRE_MS warm-up for fresh-pod
capture; harness wires GEN_SUSTAIN_SEC/GEN_SETTLE_MS.
…lisation

The 700821d trigger unit-normalisation wrapped event_time in a
multiIf(...) inside both the WHERE filter and the ORDER BY. Three
existing tests in watermark_test.go + one in clickhouse_test.go pinned
the raw 'event_time >= N' substring and broke at HEAD.

Update each test's expected substring to match the new normalized form
(') >= <ns-scaled N>' — the closing paren of multiIf, then the value
in canonical nanoseconds). Per-test ns-scaling:

  watermark_test.go:94  1744000000000000000  already ns -> unchanged
  watermark_test.go:125 InitialWatermark=42  < 1e10 sec -> * 1e9
  watermark_test.go:156 InitialWatermark=7   < 1e10 sec -> * 1e9
  watermark_test.go:297 event_time='5000'    < 1e10 sec -> * 1e9
  clickhouse_test.go:82 ref.T=1744..303e9 ns already ns -> unchanged

go test ./src/vizier/services/adaptive_export/... all green.
Records one forensic_db.ae_reconcile row per data-plane pull (read_count vs
wrote_count, window, ns/pod) across ALL three write paths — controller fan-out
(filter), passthrough firehose, and streaming scanner — so a reconcile run
localizes loss to query (R5: read<PEM) vs sink (R6: wrote<read) and quantifies
re-pull dup (C8). Counts alone (write >= read) were proven insufficient.

- new internal/reconcile leaf package (Row, Recorder, Nop) — no import cycle
- sink.Record: CH-backed recorder (INSERT INTO forensic_db.ae_reconcile)
- ae_reconcile table: schema.sql + KnownTables + OperatorOwnedTables (synced);
  not a pixie table (absent from PixieTables, so VerifyPixieSchema ignores it)
- wired: passthrough.tick, controller.pushPixieRows (deferred, all return
  paths), streaming.scanner.Run; gated by ADAPTIVE_RECONCILE=true, else Nop
- unit test proves read/wrote capture incl. the sink-drop read>wrote shape
- fixed apply_test trailing-tables guard for the new operator table
- harness: exp_row_reconcile.sh (row-level PEM<->CH), ae_vs_all.sh, exp_datavolume_extreme.sh
px -o json empty result previously printed one blank line → counted as a phantom
LOSS=1. Guard: empty set → 0-byte keys file; drop all-empty-field keys. Confirmed
against the controlled log4j run (backend http 14/14 exact, conn 66>=12, no loss).
…log4shell

Reliable BY CONSTRUCTION against bob#140 (stateful/unreliable exploit on re-fire):
fresh-JVM backend (delete pod) + attacker-before-backend + the WORKING resolvable FQDN
attacker.<ns>.svc.cluster.local:1389 (NOT the bare attacker-ns.svc which NXDOMAINs and
gets dropped), then VERIFY the actual backend->:1389 LDAP egress in forensic_db.conn_stats
and RETRY until confirmed (the validity gate). Never assumes the exploit fired. Node-side.
…tion)

Reword from offensive 'exploit' to detection-signal-generation language: this validates
the kubescape->DX->AE detection chain. No logic change.
… http2

- pxl.CompilePassthrough/Render: precompile per-table PxL once (fixed
  window => constant relative start_time), only the two time_ bounds are
  stamped per tick. Rendered output is byte-identical to QueryFor with an
  empty Target (TestCompilePassthrough_MatchesQueryFor), so this is a
  structural change, not a capture change. upid->pod/ns stays in PxL.
- passthrough: tickConcurrent fans every table out at once (was a serial
  loop); shared pull() helper. Sink/recorder are pool/HTTP-backed and
  already called concurrently elsewhere.
- drop http2_messages.beta from the firehose set (not materialised on
  every cluster => ""Table not found"" every tick); shared PixieTables/DDL
  lists untouched.
- toggle ADAPTIVE_PASSTHROUGH_COMPILED (default on; =false reverts to the
  legacy serial QueryFor path).
…e.go

Fixes "missing strict dependencies: import of .../internal/reconcile" that
broke the AE image build for passthrough, sink, streaming, controller, cmd
(pre-existing since the ADAPTIVE_RECONCILE commit added the package + imports
without bazel deps; never CI-built). Also wires the new pxl/compile.go srcs
+ passthrough/pxl test srcs (compiled_test.go, reconcile_test.go, compile_test.go).

- new internal/reconcile/BUILD.bazel (go_library, stdlib-only)
- +//internal/reconcile dep: passthrough, sink, streaming, controller, cmd
- pxl go_library +compile.go; pxl_test +compile_test.go; passthrough_test +compiled_test.go,reconcile_test.go (+reconcile dep)
F1 RCA: Pixie caps every px.display at max_output_rows_per_table (default
10000, query_flags.go) — the planner add_limit_to_batch_result_sink_rule
silently truncates wide firehose windows / busy pods at the READ (write path
is clean: ae_reconcile shows read==wrote). Fix uses Pixie own native knob —
prepend `#px:set max_output_rows_per_table=1000000` to every generated PxL
(QueryFor + CompilePassthrough) so all pull paths are uncapped. Validated on
rig: a 14208-row window returned 10000 (capped) vs 14298 (with flag). No
pagination loop, no extra round-trips. See memory project-ae-passthrough-10k-cap.
Measures the AdaptiveExport value prop: datavolume REDUCTION of DX-steered AE
(rev-3 streaming, AE writes only DX-steered activeSet pods over the control
surface) vs saving ALL data (passthrough firehose). Two arms, same fixed load,
forensic_db active-part deltas (rows+bytes) per table; reduction = 1 - DX/ALL.
Uses the canonical resolvable JNDI FQDN (attacker.attacker-ns.svc.cluster.local)
so the chain fires + DX can classify (a malformed host → NXDOMAIN → no steer).
Successor to ae_vs_all.sh, whose AE arm used the rev-2 controller gate + stale JNDI.
Measures all AE non-functional requirements under steady load on the rig:
throughput (rows+bytes/sec), capture completeness (AE read vs broker count =
F1 cap proof), write fidelity (read==wrote + write-error count), end-to-end
freshness latency (now - max(time_) in CH), resource footprint (AE pod cpu/mem
idle vs loaded), per-cycle cadence. Emits a structured report; companion to
exp_dx_steering_reduction.sh. Real-data only.
…ring + live-pod guard)

Run-1 reported false 100% reduction because stale adaptive_attribution windows
rehydrated DEAD pods (deleted loaders) into the activeSet → AE streamed dead pods
→ 0 rows. Clear adaptive_attribution before the DX arm so the activeSet only gets
freshly-steered LIVE pods; add a guard that prints the steered pods + marshalsec
fire count + live log4j-poc pods so a dead-arm result is caught, not reported as
a reduction.
lag query used now()-DateTime64 (type error -> na); use dateDiff(second,...).
Capture-completeness vs broker was window-misaligned (623% artifact) -> drop it;
report tot_read vs tot_wrote (read==wrote) + errs instead. The 10k-cap/completeness
proof is the dedicated F1 test (max_read>10000 vs broker for the SAME window).
…ary)

Run-2 byte-delta reduction came out negative because system.parts byte delta is
compaction-noisy (merges land mid-window). Report rows reduction as primary
(actual captured-row count, noise-free); keep bytes as secondary context.
Eviction-RCA finding (PR #63 NFR campaign): AE had NO memory limit (only cpu
300m) and was CPU-pinned at 300m under concurrent passthrough. AE measured tiny
(16-38Mi steady), but the raised 1M-row passthrough cap can spike, so cap at 1Gi
so AE can never memory-pressure a node; raise cpu limit 300m->1 core (was
throttling). NOTE node evictions were NOT AE/OOM — node-01 went NotReady
(network/heartbeat); the memory consumer is PEM (1365Mi, OOMs at the 2Gi default).
Root cause of the recurring "AE unauthenticated / writes 0 / crashloop" reverts:
kustomization.yaml bundled adaptive_export_secrets.yaml (placeholder pixie-api-key)
with the role+deployment, so EVERY infra re-apply (make log4j) clobbered the real
key that ae-auth had written. Separation of concerns: remove the secret from the
kustomization — infra (role+deployment) stays re-appliable; the secret holds real
creds and is owned solely by `make ae-auth`, created once, never touched by infra
re-applies. Secret manifest kept as a hand-applied seed-only template (documented).
The DX-steered arm was failing because the harness only fired stage-1 (JNDI/LDAP).
That generates ldap-egress but NO kubescape R0001 → backend never flagged → DX no
case → indeterminate → AE steers wrong/no pods. R0001 comes from stage-2 (post-
exploitation exec). fire() now does stage-1 (JNDI) + stage-2 (whoami/shadow/token/
getent in the backend) → kubescape R0001+R0006 → DX rules backend MALIGNANT →
backend enters AE activeSet → reduction is measurable. Verified live: DX evidence
unexpected-spawn+sensitive-file-read → verdict ruled_in generic=MALIGNANT.
…dd DX-steering diagnostics

Standing terminology rule: allowlist/blocklist, never whitelist/blacklist.
Pure rename (no behavior change) of the rev-3 streaming filter:
  FilterModeWhitelist  → FilterModeAllowlist
  MaxWhitelistSize     → MaxAllowlistSize
  ADAPTIVE_STREAM_MAX_WHITELIST → ADAPTIVE_STREAM_MAX_ALLOWLIST  (env)
  mode=whitelist log string → mode=allowlist
plus all comments/identifiers/tests in streaming, activeset, cmd/main.

DX-steering diagnostics (the reason DX-arm-writes-0 has been hard to RCA —
we could not tell "empty ActiveSet" from "broker returned 0 rows"):
  - scanner: log the empty-allowlist short-circuit (was silent) so an
    empty ActiveSet is visible in logs, distinct from "query completed rows=0".
  - FilterUpdater: emitted-filter log Debug→Info so the steered pod count
    per ActiveSet change is visible without debug logging.

NOTE: ADAPTIVE_STREAM_MAX_WHITELIST env renamed → tooling that sets the old
name must switch to ADAPTIVE_STREAM_MAX_ALLOWLIST.
The Pixie dx_evidence_graph UI reads dx_attack_graph via px.DataFrame
clickhouse_dsn, whose query template hardcodes event_time + hostname and
ORDER BY event_time. A table without those columns fails 'Unknown
identifier event_time'; a table created by hand (local, not via the
operator) isn't globally registered. Fix: make AE own it like the other
forensic tables.

- schema.sql: dx_attack_graph DDL with event_time(UInt64 nanos) + hostname,
  edge columns, fromUnixTimestamp64Nano partition/TTL (nanos-correct).
- KnownTables + OperatorOwnedTables: register it so Apply creates it at boot.
- apply_test: assert last-applied DDL == last OperatorOwnedTables entry
  (robust to appended operator tables) instead of hardcoding trigger_watermark.

go test ./.../clickhouse green.
Pixie's clickhouse_dsn type mapper reads UInt8 as BOOLEAN and does not
handle UInt16/UInt32/Float32 -> px fails with 'Column[N] given incorrect
type' rendering the dx_evidence_graph. weight/max_severity/num_findings
-> Int64, confidence -> Float64 (map cleanly to INT64/FLOAT64). Verified
live: px run returns all 6 edges with every column. event_time stays
UInt64 (matches kubescape_logs, which px reads).
Entlein and others added 7 commits June 18, 2026 15:42
…canner buildPxL

The DX/streaming arm silently capped each per-table pull at Pixie's default
10000-row limit while the passthrough/ALL arm (pxl.CompilePassthrough /
QueryFor) already raises it to 1,000,000 via the broker's #px:set query flag.
Validated live on 6a33dac0: a single streaming http_events pull returned
exactly rows=10000 (the cap). Left unfixed this UNDER-counts the DX arm and
OVERSTATES the DX-vs-ALL volume reduction. Prepend the same #px:set directive
to the streaming scanner's PxL so both arms are uncapped and comparable.
Adds the rule-ins-only view (condition != '') to the canonical schema.sql,
registers it in KnownTables + OperatorOwnedTables (after dx_attack_graph), and
teaches DDL() to extract CREATE VIEW headers. AE now creates it on boot so the
dx_evidence_graph UI's default malicious-only read is standard, not a per-rig
manual step. Tests updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
dx POSTs a JSON array of edges to /dx/attack_graph; AE writes them to
forensic_db.dx_attack_graph via JSONEachRow (Applier.WriteAttackGraph). Wired in
main.go when CONTROL_ADDR is set; 501 if the sink is unset. This is the AE half
of the dx->AE->CH attack-graph write path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ntract

adaptive_export/sink: content_type silent-drop contract suite
AE datavolume: DX-steered vs save-ALL reduction measurement
…licious-view

ae(clickhouse): dx_attack_graph_malicious view at boot
@coderabbitai coderabbitai Bot requested a review from entlein June 20, 2026 16:00
@coderabbitai

coderabbitai Bot commented Jun 20, 2026

Copy link
Copy Markdown
Author

Important

Review skipped

This PR was authored by the user configured for CodeRabbit reviews. CodeRabbit does not review PRs authored by this user. It's recommended to use a dedicated user account to post CodeRabbit review feedback.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: f9d0bbb9-de8f-4bad-b565-bea80bacc066

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Comment @coderabbitai help to get the list of available commands and usage tips.

@entlein entlein force-pushed the ae-prod branch 2 times, most recently from 3b5f551 to e0a19dd Compare June 21, 2026 20:50
Base automatically changed from ae-prod to main June 21, 2026 21:03
entlein pushed a commit that referenced this pull request Jun 22, 2026
…#67)

ae-prod's boundary handling accumulates the seen-fingerprint set on a no-progress
tick, but if >PollLimit rows share one normalized event_time the SQL
(>= watermark ORDER BY time LIMIT N, no secondary key) returns the same N boundary
rows every poll → rows beyond N at that timestamp are never emitted (infinite
boundary). Detect all-skipped-at-capacity and advance the watermark by 1ns to make
forward progress (fingerprint dedup already tolerates the 1ns overlap).

Cherry-picked the trigger fix + its test from the stale CodeRabbit-chat PR #67
(687851d); dropped that PR's unrelated gen-pod.tmpl.yaml churn. #67 itself is
NOT mergeable (77 commits behind ae-prod, re-adds deleted terraform).
@entlein

entlein commented Jun 22, 2026

Copy link
Copy Markdown

Closing — not mergeable and superseded.

This is a CodeRabbit-chat snapshot 77 commits behind ae-prod, CONFLICTING against main, and would re-add ~12k lines of terraform that ae-prod already deleted — merging/rebasing it would regress the AE branch.

Its one genuine change — the ClickHouse trigger watermark boundary-escape (687851d8c: advance the watermark 1ns when >PollLimit rows share one normalized event_time and all are boundary-skips, so the inclusive >= watermark LIMIT N query can't loop on the same N rows forever) — is a real gap ae-prod's accumulate-seen approach doesn't cover. I cherry-picked just that fix + its test onto the AE followup #68 (dropped this PR's unrelated gen-pod.tmpl.yaml churn). So the fix is preserved; this stale PR is not needed.

@entlein entlein closed this Jun 22, 2026
ConstanzeTU pushed a commit that referenced this pull request Jun 24, 2026
… → 73.7% cov)

The trigger's incremental kubescape-events pump has 3 moving parts
that must agree: watermark advancement, boundary fingerprint dedup,
and PollLimit-saturated draining (PR #67 fix). Existing tests cover
each in isolation. This new file pins them TOGETHER against the
simplest possible reference ("drain everything in event_time order,
dedupe by fingerprint, advance the cursor to max(event_time)"). If
the iterative trigger and the reference disagree on the set of
emitted rows for ANY poll sequence, one of the three parts is wrong.

Four oracle tests:

  TestOracle_TriggerEmitsNaiveSet_StaggeredCorpus — 50 rows scattered
  across distinct event_times, PollLimit=10 forces ≥5 polls; trigger
  must emit exactly the 50 rows the naive reference computes.

  TestOracle_PollLimitSaturation_AtCapacity — regression guard for PR
  #67 (dfdc465): when EXACTLY PollLimit rows share a boundary
  event_time, every one of them must emit, and the cursor must clear
  the boundary for the next-event_time row that follows.

  TestOracle_PollLimitOverflow_DocumentsLossBound — pins PR #67's
  intentional trade-off: when >PollLimit rows share a boundary
  event_time, the first PollLimit emit, then the 1ns escape advances
  the cursor past the surplus. Lock-in test: if a future fix recovers
  the surplus (good!) this test fails loudly so both can update in
  lock-step instead of regressing silently.

  TestOracle_BoundaryDedup_NoDuplicates — when CH returns the same
  boundary row in two consecutive polls (real production case: a new
  row lands at the same event_time after a previous poll's cursor
  advanced past it), the seenAtBoundary map must filter the duplicate.

Coverage: trigger 71.6% → 73.7% (statement-level). The oracle's value
isn't in statement count — it's in property-level coverage on
invariants the per-feature tests can't enforce together.

Build.bazel: adds oracle_test.go to the existing trigger_test target.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant