Skip to content

feat(clickhouse): support configurable storage engine for events table#4271

Open
pdeaudney wants to merge 7 commits into
openmeterio:mainfrom
pdeaudney:feat/clickhouse-replicated-engine
Open

feat(clickhouse): support configurable storage engine for events table#4271
pdeaudney wants to merge 7 commits into
openmeterio:mainfrom
pdeaudney:feat/clickhouse-replicated-engine

Conversation

@pdeaudney

@pdeaudney pdeaudney commented May 3, 2026

Copy link
Copy Markdown

Adds aggregation.clickhouse.eventsTableEngine config block so self-hosted multi-node ClickHouse deployments can create the events table with ReplicatedMergeTree (with optional ON CLUSTER) instead of the hardcoded MergeTree. Default behavior is unchanged: ClickHouse Cloud users continue to rely on Cloud's transparent rewrite to SharedMergeTree.

The connector now plumbs an EventsTableEngine value through to the SQL builder. Cluster names are backtick-quoted in the emitted DDL so any identifier ClickHouse permits in <remote_servers> is accepted (including hyphens) without re-implementing identifier validation. ZooKeeper path and replica name are escaped against single-quote injection and validated for non-whitespace content.

Coverage:

  • Unit tests on rendered SQL for legacy default, explicit MergeTree, ReplicatedMergeTree with macros, ON CLUSTER rendering, hyphenated cluster names, embedded backtick escaping, and single-quote escaping in zk path
  • Validation tests for required fields, whitespace-only inputs, and unknown engine types in both the config and connector layers
  • YAML unmarshal coverage in TestComplete pinning the camelCase Viper key naming for all four engine fields
  • Live single-node integration tests (gated by TEST_CLICKHOUSE_DSN) exercising explicit MergeTree, default-zero-value, and invalid-config paths through the connector constructor
  • Live 2-node replicated integration test (gated by TEST_CLICKHOUSE_REPLICATED=1) that inserts on node-01 and reads back from node-02 to prove Keeper substituted {shard}/{replica} and replication is wired end-to-end

Adds docker-compose.replicated.yaml + make up-replicated/down-replicated targets bringing up a Keeper-backed 2-replica cluster on ports 38123/39000 and 38124/39001 (namespaced clear of the default stack), with server logs streaming to stdout via docker logs.

Adds docs/migration-guides/2026-05-02-clickhouse-replicated-engine.md with the config reference, an offline RENAME + INSERT SELECT recipe for converting an existing MergeTree table, and the
system.distributed_ddl_queue / system.replicas checks operators should run between steps.

Refs #3050

Summary by CodeRabbit

  • New Features

    • Configure ClickHouse events table engine (MergeTree or ReplicatedMergeTree) with startup validation and MergeTree default.
    • Make targets to start/stop a local 2-node replicated ClickHouse test cluster.
  • Documentation

    • Migration guide for switching om_events to ReplicatedMergeTree, offline migration steps, single-shard alternative, and local test stack usage.
  • Tests

    • New unit and integration tests covering engine validation, SQL rendering, and replicated end-to-end behavior.

Adds aggregation.clickhouse.eventsTableEngine config block so self-hosted
multi-node ClickHouse deployments can create the events table with
ReplicatedMergeTree (with optional ON CLUSTER) instead of the hardcoded
MergeTree. Default behavior is unchanged: ClickHouse Cloud users continue
to rely on Cloud's transparent rewrite to SharedMergeTree.

The connector now plumbs an EventsTableEngine value through to the SQL
builder. Cluster names are backtick-quoted in the emitted DDL so any
identifier ClickHouse permits in <remote_servers> is accepted (including
hyphens) without re-implementing identifier validation. ZooKeeper path
and replica name are escaped against single-quote injection and validated
for non-whitespace content.

Coverage:

- Unit tests on rendered SQL for legacy default, explicit MergeTree,
  ReplicatedMergeTree with macros, ON CLUSTER rendering, hyphenated cluster
  names, embedded backtick escaping, and single-quote escaping in zk path
- Validation tests for required fields, whitespace-only inputs, and
  unknown engine types in both the config and connector layers
- YAML unmarshal coverage in TestComplete pinning the camelCase Viper key
  naming for all four engine fields
- Live single-node integration tests (gated by TEST_CLICKHOUSE_DSN)
  exercising explicit MergeTree, default-zero-value, and invalid-config
  paths through the connector constructor
- Live 2-node replicated integration test (gated by
  TEST_CLICKHOUSE_REPLICATED=1) that inserts on node-01 and reads back
  from node-02 to prove Keeper substituted {shard}/{replica} and
  replication is wired end-to-end

Adds docker-compose.replicated.yaml + make up-replicated/down-replicated
targets bringing up a Keeper-backed 2-replica cluster on ports
38123/39000 and 38124/39001 (namespaced clear of the default stack), with
server logs streaming to stdout via docker logs.

Adds docs/migration-guides/2026-05-02-clickhouse-replicated-engine.md
with the config reference, an offline RENAME + INSERT SELECT recipe for
converting an existing MergeTree table, and the
system.distributed_ddl_queue / system.replicas checks operators should
run between steps.

Refs openmeterio#3050

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pdeaudney pdeaudney requested a review from a team as a code owner May 3, 2026 00:36
@coderabbitai

coderabbitai Bot commented May 3, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds configurable ClickHouse events table engine support (MergeTree or ReplicatedMergeTree) with validation and SQL/ON CLUSTER rendering, wires the engine through the streaming connector, adds unit and integration tests (gated replicated E2E), provides a 2-node replicated Docker Compose stack plus Make targets, and publishes a migration guide.

Changes

ClickHouse ReplicatedMergeTree Engine Support

Layer / File(s) Summary
Config Schema & Validation
app/config/aggregation.go
Adds ClickhouseEventsTableEngineConfig, engine type constants, Validate() and ResolvedType(), and imports strings for trimming.
Config Defaults & Fixtures
app/config/aggregation.go, app/config/testdata/complete.yaml, app/config/config_test.go
Defaults aggregation.clickhouse.eventsTableEngine.type to MergeTree; testdata and expected config updated with a ReplicatedMergeTree example.
SQL Types & CREATE TABLE Wiring
openmeter/streaming/clickhouse/event_query.go
Introduces EventsTableEngine type, Validate(), engineClause(), escaping/backtick helpers, and threads Engine into createEventsTable/toSQL() with optional ON CLUSTER.
Connector Integration
openmeter/streaming/clickhouse/connector.go, app/common/streaming.go
Connector Config gains EventsTableEngine; Config.Validate() validates it; NewStreamingConnector forwards resolved engine fields into the ClickHouse connector config.
Unit Tests (engine + SQL)
openmeter/streaming/clickhouse/event_query_test.go, app/config/aggregation_test.go
Adds table-driven tests for SQL rendering and for engine config validation/resolution, including escaping/quoting cases and defaulting behavior.
Integration Tests
openmeter/streaming/clickhouse/events_table_engine_test.go
New suite asserting table engine in system.tables, validation-failure behavior, and a gated replicated E2E test that creates the DB/table across replicas and verifies replication.
Local Replicated Stack & Make targets
docker-compose.replicated.yaml, Makefile
Adds 2-node replicated ClickHouse compose stack (Keeper + 2 nodes) and make up-replicated / make down-replicated targets.
Documentation / Migration Guide
docs/migration-guides/2026-05-02-clickhouse-replicated-engine.md
New migration guide with config schema, new-deploy and offline migration procedures, backfill/verification steps, local test stack usage, and notes on defaults/Cloud behavior.

Sequence Diagram(s)

sequenceDiagram
    participant OM as OpenMeter
    participant Keeper as ClickHouse Keeper
    participant CH1 as ClickHouse Node 1
    participant CH2 as ClickHouse Node 2
    OM->>OM: Validate EventsTableEngine config
    OM->>Keeper: Connect / coordinate (ZooKeeper path)
    OM->>CH1: CREATE TABLE (ENGINE clause / ON CLUSTER if set)
    CH1->>CH2: Distributed DDL replication
    CH1->>Keeper: ReplicatedMergeTree metadata registration
    CH2->>Keeper: Replica registration
    OM->>CH1: Insert event (streaming.BatchInsert)
    CH1->>CH2: Replicate inserted row
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • tothandras
  • gergely-kurucz-konghq
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 18.18% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat(clickhouse): support configurable storage engine for events table' clearly and concisely describes the main change: adding support for configurable storage engines (ReplicatedMergeTree vs MergeTree) for the ClickHouse events table.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
docker-compose.replicated.yaml (1)

1-220: 💤 Low value

Well-designed test cluster!

A few nice touches: the imok healthcheck comment explaining BusyBox nc quirks, stdout logging so docker logs is actually useful, and internal_replication: true which is the right setting for this topology. The distributed_ddl path is also correctly wired so CREATE TABLE ... ON CLUSTER propagates properly.

One small note: the Keeper image on line 27 (clickhouse/clickhouse-keeper:25.12.3-alpine) isn't SHA-pinned, while both server images are. Totally fine for a dev stack, but pinning it would make the setup fully reproducible.

📌 Pin keeper image for full reproducibility
-    image: clickhouse/clickhouse-keeper:25.12.3-alpine
+    image: clickhouse/clickhouse-keeper:25.12.3-alpine@sha256:<keeper-image-sha>

Run docker pull clickhouse/clickhouse-keeper:25.12.3-alpine and grab the digest from docker inspect to fill in the SHA.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose.replicated.yaml` around lines 1 - 220, The clickhouse-keeper
image in the clickhouse-keeper-01 service is not SHA-pinned (image:
clickhouse/clickhouse-keeper:25.12.3-alpine); update that image field to a
digest-pinned form (e.g.,
clickhouse/clickhouse-keeper:25.12.3-alpine@sha256:<digest>) to make the stack
fully reproducible—obtain the digest by running docker pull
clickhouse/clickhouse-keeper:25.12.3-alpine and docker inspect to copy the
sha256 digest, then replace the image value under the clickhouse-keeper-01
service.
openmeter/streaming/clickhouse/events_table_engine_test.go (1)

100-196: 💤 Low value

Clever defer trick and thorough end-to-end coverage. A couple of optional things to consider:

  1. Hardcoded node-02 address (line 176): The DSN 127.0.0.1:39001 is fine given the skip gate, but you could read it from an env var (e.g., TEST_CLICKHOUSE_REPLICA2_DSN) to make the test stack more flexible without breaking anything.

  2. Polling loop ceiling (lines 184–194): 3 s max wait is tight on a slow VM. A TEST_CLICKHOUSE_REPLICATION_TIMEOUT env var or bumping to 50 iterations wouldn't hurt, though it's probably fine for local use.

Neither is blocking — just flagging for future ergonomics.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openmeter/streaming/clickhouse/events_table_engine_test.go` around lines 100
- 196, The test TestReplicatedMergeTree hardcodes the secondary ClickHouse DSN
and uses a fixed 30-iteration poll loop; change it to read node-02 DSN from an
env var (e.g., TEST_CLICKHOUSE_REPLICA2_DSN) when creating node02Opts and
fallback to the current "clickhouse://default:default@127.0.0.1:39001/openmeter"
if unset, and make the replication polling ceiling configurable via an env var
(e.g., TEST_CLICKHOUSE_REPLICATION_TIMEOUT or
TEST_CLICKHOUSE_REPLICATION_ATTEMPTS) used in the for-loop that scans into seen
(the loop around row := node02.QueryRow(...)); ensure default behavior remains
the same when env vars are absent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/migration-guides/2026-05-02-clickhouse-replicated-engine.md`:
- Around line 86-146: Update the migration steps that use the ClickHouse ON
CLUSTER macro (specifically the statements "RENAME TABLE openmeter.om_events TO
openmeter.om_events_legacy ON CLUSTER {cluster};", the "CREATE TABLE ... ON
CLUSTER {cluster} (...)" statement, and "DROP TABLE openmeter.om_events_legacy
ON CLUSTER {cluster} SYNC;") to include a short clarifying note that {cluster}
is a ClickHouse server-side <macros> value (not an auto-substituted placeholder)
and advise operators to either define the cluster macro in their ClickHouse
config or replace {cluster} with their literal cluster name (e.g.
openmeter_cluster) when running the DDL; add the note inline where each ON
CLUSTER usage appears so readers see the guidance for steps 2, 3 and 7.

---

Nitpick comments:
In `@docker-compose.replicated.yaml`:
- Around line 1-220: The clickhouse-keeper image in the clickhouse-keeper-01
service is not SHA-pinned (image: clickhouse/clickhouse-keeper:25.12.3-alpine);
update that image field to a digest-pinned form (e.g.,
clickhouse/clickhouse-keeper:25.12.3-alpine@sha256:<digest>) to make the stack
fully reproducible—obtain the digest by running docker pull
clickhouse/clickhouse-keeper:25.12.3-alpine and docker inspect to copy the
sha256 digest, then replace the image value under the clickhouse-keeper-01
service.

In `@openmeter/streaming/clickhouse/events_table_engine_test.go`:
- Around line 100-196: The test TestReplicatedMergeTree hardcodes the secondary
ClickHouse DSN and uses a fixed 30-iteration poll loop; change it to read
node-02 DSN from an env var (e.g., TEST_CLICKHOUSE_REPLICA2_DSN) when creating
node02Opts and fallback to the current
"clickhouse://default:default@127.0.0.1:39001/openmeter" if unset, and make the
replication polling ceiling configurable via an env var (e.g.,
TEST_CLICKHOUSE_REPLICATION_TIMEOUT or TEST_CLICKHOUSE_REPLICATION_ATTEMPTS)
used in the for-loop that scans into seen (the loop around row :=
node02.QueryRow(...)); ensure default behavior remains the same when env vars
are absent.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f5546f08-5193-4e71-84f2-156111b962a8

📥 Commits

Reviewing files that changed from the base of the PR and between ff379af and 1bbb53a.

📒 Files selected for processing (12)
  • Makefile
  • app/common/streaming.go
  • app/config/aggregation.go
  • app/config/aggregation_test.go
  • app/config/config_test.go
  • app/config/testdata/complete.yaml
  • docker-compose.replicated.yaml
  • docs/migration-guides/2026-05-02-clickhouse-replicated-engine.md
  • openmeter/streaming/clickhouse/connector.go
  • openmeter/streaming/clickhouse/event_query.go
  • openmeter/streaming/clickhouse/event_query_test.go
  • openmeter/streaming/clickhouse/events_table_engine_test.go

Comment thread docs/migration-guides/2026-05-02-clickhouse-replicated-engine.md
pdeaudney and others added 2 commits May 3, 2026 10:47
Note added at the first ON CLUSTER {cluster} occurrence in the migration
guide, telling operators that {cluster} is a ClickHouse <macros> value
they may need to define (or replace with the literal cluster name).
The note extends to every subsequent ON CLUSTER {cluster} in the steps,
so we don't repeat it for each.

Addresses CodeRabbit feedback on PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The replicated compose stack was pinning every image to tag@sha256:...
except clickhouse-keeper, which was tag-only. Match the convention used
for clickhouse-server, kafka, postgres, redis, and svix in
docker-compose.base.yaml so this stack is reproducible across hosts and
not silently affected if the upstream :25.12.3-alpine tag is rebuilt.

Digest is the multi-platform manifest index, so the same line works on
both arm64 (local Mac dev) and amd64 (CI / Linux), matching the existing
clickhouse-server pin.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@GAlexIHU GAlexIHU left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pdeaudney for your contrib, it's likely something that will be well received by users!

I only have a few comments, once those are resolved I'd be happy to merge this :)

Comment thread openmeter/streaming/clickhouse/event_query_test.go
Comment thread openmeter/streaming/clickhouse/events_table_engine_test.go Outdated
Comment thread openmeter/streaming/clickhouse/event_query.go Outdated
… env-drive replicated tests

- Reject `Cluster` set with non-Replicated engine in both event_query.go
  and app/config/aggregation.go validation. MergeTree + ON CLUSTER
  produces independent (non-replicated) tables per node, which is
  almost never intended for the events table.
- Replace handrolled single-quote doubling with a NewReplacer that
  follows the ClickHouse string-literal grammar (mirrors clickhouse-go's
  stringQuoteReplacer): `\` -> `\\`, `'` -> `\'`.
- Drive the ReplicatedMergeTree integration test entirely from env vars
  (cluster, zk path, replica name, node-02 DSN, expected replica count)
  using the same skip-on-missing pattern as the existing
  TEST_CLICKHOUSE_DSN gate, so the test decouples from the local
  docker-compose stack.

Addresses PR review comments from @GAlexIHU.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread docs/migration-guides/2026-05-02-clickhouse-replicated-engine.md

@GAlexIHU GAlexIHU left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me now, thanks for addressing the earlier points, just one minor thing i found.

i'll gather another review from the team

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note/feature Release note: Exciting New Features

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants