From 51743243e5008ca0f23845ba6adf91aaac2d414e Mon Sep 17 00:00:00 2001 From: Andy Stark Date: Fri, 13 Mar 2026 15:37:47 +0000 Subject: [PATCH 1/7] DOC-6370 started Snowflake source prep docs --- .../data-pipelines/prepare-dbs/snowflake.md | 302 ++++++++++++++++++ 1 file changed, 302 insertions(+) create mode 100644 content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md diff --git a/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md new file mode 100644 index 0000000000..f0daa18abd --- /dev/null +++ b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md @@ -0,0 +1,302 @@ +--- +Title: Prepare Snowflake for RDI +alwaysopen: false +categories: +- docs +- integrate +- rs +- rdi +description: Prepare Snowflake databases to work with RDI +group: di +linkTitle: Prepare Snowflake +summary: Redis Data Integration keeps Redis in sync with the primary database in near + real time. +type: integration +weight: 20 +--- + +This guide describes the steps required to prepare a Snowflake database as a source for Redis Data Integration (RDI) pipelines. + +RDI uses the [RIOTX](https://redis.github.io/riotx/) collector to stream data from Snowflake to Redis. +During the [snapshot]({{< relref "/integrate/redis-data-integration/data-pipelines#pipeline-lifecycle" >}}) phase, RDI reads the current state of the database using the JDBC driver. In the +[streaming]({{< relref "/integrate/redis-data-integration/data-pipelines#pipeline-lifecycle" >}}) +phase, RDI uses [Snowflake Streams](https://docs.snowflake.com/en/user-guide/streams) to +capture changes related to the monitored tables. Note that RIOTX will automatically create and manage +the required streams. + +## Setup + +The following checklist shows the steps to prepare a Snowflake database for RDI, +with links to the sections that explain the steps in full detail. 
You may find it helpful to track your progress with the checklist as you
complete each step.

{{< note >}}
Snowflake is only supported with RDI deployed on Kubernetes/Helm. RDI VM mode does not support Snowflake as a source database.
{{< /note >}}

```checklist {id="snowflakelist"}
- [ ] [Set up Snowflake permissions](#1-set-up-snowflake-permissions)
- [ ] [Configure authentication](#2-configure-authentication)
- [ ] [Set up secrets for Kubernetes deployment](#3-set-up-secrets-for-kubernetes-deployment)
- [ ] [Configure RDI for Snowflake](#4-configure-rdi-for-snowflake)
```

## 1. Set up Snowflake permissions

The RDI user requires the following permissions to connect and capture data from Snowflake:

- `SELECT` on source tables
- `CREATE STREAM` permission (RIOTX automatically creates and manages Snowflake Streams for CDC)
- `USAGE` permission on the warehouse for query execution

Grant the required permissions to your RDI user:

```sql
-- Grant usage on the warehouse
GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE rdi_role;

-- Grant usage on the database and schema
GRANT USAGE ON DATABASE MYDB TO ROLE rdi_role;
GRANT USAGE ON SCHEMA MYDB.PUBLIC TO ROLE rdi_role;

-- Grant SELECT on tables to capture
GRANT SELECT ON TABLE MYDB.PUBLIC.customers TO ROLE rdi_role;
GRANT SELECT ON TABLE MYDB.PUBLIC.orders TO ROLE rdi_role;

-- Grant CREATE STREAM permission for CDC
GRANT CREATE STREAM ON SCHEMA MYDB.PUBLIC TO ROLE rdi_role;

-- Assign the role to your RDI user
GRANT ROLE rdi_role TO USER rdi_user;
```

## 2. Configure authentication

RDI supports two authentication methods for Snowflake. You must configure one of these methods.

### Password authentication

Use standard username and password credentials. Store these securely using Kubernetes secrets (see step 3).

### Private key authentication

For enhanced security, use key-pair authentication:

1. Generate a private key:

    ```bash
    openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt
    ```

1. Generate the public key:

    ```bash
    openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
    ```

1. Register the public key with your Snowflake user:

    ```sql
    ALTER USER rdi_user SET RSA_PUBLIC_KEY='<public-key>';
    ```

## 3. Set up secrets for Kubernetes deployment

Before deploying the RDI pipeline, configure the necessary secrets.

### Password authentication

```bash
kubectl create secret generic source-db \
  --namespace=rdi \
  --from-literal=SOURCE_DB_USERNAME=your_username \
  --from-literal=SOURCE_DB_PASSWORD=your_password
```

### Private key authentication

Create a secret with the private key file:

```bash
kubectl create secret generic source-db-ssl \
  --namespace=rdi \
  --from-file=client.key=/path/to/rsa_key.p8
```

Also create the source-db secret with the username:

```bash
kubectl create secret generic source-db \
  --namespace=rdi \
  --from-literal=SOURCE_DB_USERNAME=your_username
```

## 4. Configure RDI for Snowflake

Use the following example configuration in your `config.yaml` file:

```yaml
sources:
  snowflake:
    type: riotx
    connection:
      type: snowflake
      url: "jdbc:snowflake://myaccount.snowflakecomputing.com/"
      username: "${SOURCE_DB_USERNAME}"
      password: "${SOURCE_DB_PASSWORD}" # Omit for key-pair auth
      database: "MYDB"
      schema: "PUBLIC"
      warehouse: "COMPUTE_WH"
      # role: "RDI_ROLE" # Optional: Snowflake role
      # cdcDatabase: "CDC_DB" # Optional: Separate database for CDC streams
      # cdcSchema: "CDC_SCHEMA" # Optional: Separate schema for CDC streams
    tables:
      customers: {}
      orders: {}
    advanced:
      riotx:
        poll: "30s"
        snapshot: "INITIAL" # Or "NEVER" to skip initial snapshot
        # streamLimit: 100000 # Optional: Max stream length
        # clearOffset: false # Optional: Clear offset on start

targets:
  target:
    connection:
      type: redis
      host: ${TARGET_DB_HOST}
      port: ${TARGET_DB_PORT}
      user: ${TARGET_DB_USERNAME}
      password: ${TARGET_DB_PASSWORD}

processors:
  target_data_type: json
```

{{< note >}}
The Snowflake connector supports connecting to exactly one database and schema. All table names in the `tables` section are assumed to be in the configured database and schema.
{{< /note >}}

### Snowflake connection properties

| Property      | Type   | Required | Description                                                    |
|---------------|--------|----------|----------------------------------------------------------------|
| `type`        | string | Yes      | Must be `"snowflake"`                                          |
| `url`         | string | Yes      | JDBC URL: `jdbc:snowflake://<account>.snowflakecomputing.com/` |
| `username`    | string | Yes      | Snowflake username                                             |
| `password`    | string | No*      | Snowflake password                                             |
| `database`    | string | Yes      | Snowflake database name                                        |
| `schema`      | string | Yes      | Snowflake schema name                                          |
| `warehouse`   | string | Yes      | Snowflake warehouse name                                       |
| `role`        | string | No       | Snowflake role name                                            |
| `cdcDatabase` | string | No       | Database for CDC streams (if different from source)            |
| `cdcSchema`   | string | No       | Schema for CDC streams (if different from source)              |

* Either `password` or private key authentication is required. See [Configure authentication](#2-configure-authentication) for details.

### Advanced RIOTX options

Configure under `sources.<source-name>.advanced.riotx`:

| Property      | Type    | Default     | Description                            |
|---------------|---------|-------------|----------------------------------------|
| `poll`        | string  | `"30s"`     | Polling interval for stream changes    |
| `snapshot`    | string  | `"INITIAL"` | Snapshot mode: `INITIAL` or `NEVER`    |
| `streamLimit` | integer | -           | Maximum stream length (XTRIM MAXLEN)   |
| `keyColumns`  | array   | -           | Columns to use as message keys         |
| `clearOffset` | boolean | `false`     | Clear existing offset on start         |
| `count`       | integer | `0`         | Limit records per poll (0 = unlimited) |

## Troubleshooting

### Connection issues

**Error: "Failed to connect to Snowflake"**

- Verify the account URL is correct (format: `<account>.snowflakecomputing.com`)
- Check network connectivity to Snowflake
- Verify the warehouse is running and accessible
- Check firewall rules allow outbound HTTPS (port 443)

**Error: "Authentication failed"**

- For password auth: verify username and password are correct
- For key-pair auth: verify the private key matches the public key registered in Snowflake
- Ensure the user has appropriate permissions

**Error: "Warehouse not found"**

- Verify the warehouse name is correct
- Ensure the user has USAGE permission on the warehouse

### CDC issues

**No data appearing in Redis**

1. Verify Snowflake Streams exist for target tables:

    ```sql
    SHOW STREAMS IN SCHEMA my_database.my_schema;
    ```

1. Check the polling interval configuration
1. Verify Redis connection is working
1. Check RIOTX collector logs:

    ```bash
    kubectl logs -n rdi -l app=riotx-collector-source
    ```

**Stale or missing changes**

- Snowflake Streams have a retention period (default 14 days)
- If the collector was offline longer than retention, changes may be lost
- Consider using `clearOffset: true` to restart from current state

### Performance tuning

**High Snowflake API usage**

- Increase `poll` interval (e.g., `"60s"` or `"120s"`)
- Use a dedicated warehouse for CDC operations

**Redis memory concerns**

- Set `streamLimit` to cap stream length
- Use `count` to limit records per poll batch

**Initial snapshot too slow**

- Use `snapshot: "NEVER"` to skip initial snapshot
- Pre-load data using other methods if needed

### Enable debug logging

Enable debug logging in the source configuration:

```yaml
sources:
  snowflake:
    type: riotx
    logging:
      level: debug
    # ... rest of configuration
```

View collector logs:

```bash
kubectl logs -n rdi -l app=riotx-collector-source -f
```

## 5. Configuration is complete

Once you have followed the steps above, your Snowflake database is ready for RDI to use.
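Before starting the pipeline, you can sanity-check the setup from within Snowflake. The statements below are a minimal sketch that assumes the example names used in this guide (`rdi_role` and `rdi_user`); substitute your own role and user names:

```sql
-- List the privileges currently granted to the RDI role
SHOW GRANTS TO ROLE rdi_role;

-- Confirm that the role is assigned to the RDI user
SHOW GRANTS TO USER rdi_user;
```

The first statement should list the `USAGE`, `SELECT`, and `CREATE STREAM` grants from step 1, and the second should include `rdi_role` among the roles granted to the user.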
## See also

- [Snowflake Streams Documentation](https://docs.snowflake.com/en/user-guide/streams)
- [Snowflake Key Pair Authentication](https://docs.snowflake.com/en/user-guide/key-pair-auth)
- [RDI Deployment Guide]({{< relref "/integrate/redis-data-integration/data-pipelines/deploy" >}})

From 0070eeffc2e790f8eabd4ba7caddb65936290d9a Mon Sep 17 00:00:00 2001
From: Andy Stark
Date: Fri, 13 Mar 2026 16:45:10 +0000
Subject: [PATCH 2/7] DOC-6370 replaced 'streaming' with 'CDC'

---
 .../data-pipelines/prepare-dbs/snowflake.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md
index f0daa18abd..c0061cae7a 100644
--- a/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md
+++ b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md
@@ -19,7 +19,7 @@ This guide describes the steps required to prepare a Snowflake database as a sou
 RDI uses the [RIOTX](https://redis.github.io/riotx/) collector to stream data from Snowflake to Redis.
 During the [snapshot]({{< relref "/integrate/redis-data-integration/data-pipelines#pipeline-lifecycle" >}}) phase, RDI reads the current state of the database using the JDBC driver. In the
-[streaming]({{< relref "/integrate/redis-data-integration/data-pipelines#pipeline-lifecycle" >}})
+[Change data capture (CDC)]({{< relref "/integrate/redis-data-integration/data-pipelines#pipeline-lifecycle" >}})
 phase, RDI uses [Snowflake Streams](https://docs.snowflake.com/en/user-guide/streams) to
 capture changes related to the monitored tables. Note that RIOTX will automatically create and manage
 the required streams.
From ab7a4770d66150859bc61d1f5781bef6bece2211 Mon Sep 17 00:00:00 2001 From: Andy Stark Date: Fri, 10 Apr 2026 09:17:06 +0100 Subject: [PATCH 3/7] DOC-6370 remove RIOT-X mentions in text --- .../data-pipelines/prepare-dbs/snowflake.md | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md index c0061cae7a..1b3eed6e34 100644 --- a/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md +++ b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md @@ -16,12 +16,11 @@ weight: 20 --- This guide describes the steps required to prepare a Snowflake database as a source for Redis Data Integration (RDI) pipelines. - -RDI uses the [RIOTX](https://redis.github.io/riotx/) collector to stream data from Snowflake to Redis. + During the [snapshot]({{< relref "/integrate/redis-data-integration/data-pipelines#pipeline-lifecycle" >}}) phase, RDI reads the current state of the database using the JDBC driver. In the [Change data capture (CDC)]({{< relref "/integrate/redis-data-integration/data-pipelines#pipeline-lifecycle" >}}) phase, RDI uses [Snowflake Streams](https://docs.snowflake.com/en/user-guide/streams) to -capture changes related to the monitored tables. Note that RIOTX will automatically create and manage +capture changes related to the monitored tables. Note that RDI will automatically create and manage the required streams. ## Setup @@ -47,7 +46,7 @@ Snowflake is only supported with RDI deployed on Kubernetes/Helm. 
RDI VM mode do The RDI user requires the following permissions to connect and capture data from Snowflake: - `SELECT` on source tables -- `CREATE STREAM` permission (RIOTX automatically creates and manages Snowflake Streams for CDC) +- `CREATE STREAM` permission (RDI automatically creates and manages Snowflake Streams for CDC) - `USAGE` permission on the warehouse for query execution Grant the required permissions to your RDI user: @@ -195,7 +194,7 @@ The Snowflake connector supports connecting to exactly one database and schema. * Either `password` or private key authentication is required. See [Configure authentication](#2-configure-authentication) for details. -### Advanced RIOTX options +### Advanced configuration options Configure under `sources..advanced.riotx`: @@ -242,7 +241,7 @@ Configure under `sources..advanced.riotx`: 1. Check the polling interval configuration 1. Verify Redis connection is working -1. Check RIOTX collector logs: +1. Check the collector logs: ```bash kubectl logs -n rdi -l app=riotx-collector-source From 9c8d6504ec66e3643f45572fdd271fdf4d931d6e Mon Sep 17 00:00:00 2001 From: Jeremy Plichta Date: Sun, 19 Apr 2026 09:23:15 -0600 Subject: [PATCH 4/7] Align Snowflake docs with updated RDI config --- .../data-pipelines/prepare-dbs/snowflake.md | 105 ++++++++++++------ 1 file changed, 69 insertions(+), 36 deletions(-) diff --git a/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md index 1b3eed6e34..c55bb8d980 100644 --- a/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md +++ b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md @@ -16,12 +16,12 @@ weight: 20 --- This guide describes the steps required to prepare a Snowflake database as a source for Redis Data Integration (RDI) pipelines. 
- -During the [snapshot]({{< relref "/integrate/redis-data-integration/data-pipelines#pipeline-lifecycle" >}}) phase, RDI reads the current state of the database using the JDBC driver. In the + +During both the [snapshot]({{< relref "/integrate/redis-data-integration/data-pipelines#pipeline-lifecycle" >}}) and [Change data capture (CDC)]({{< relref "/integrate/redis-data-integration/data-pipelines#pipeline-lifecycle" >}}) -phase, RDI uses [Snowflake Streams](https://docs.snowflake.com/en/user-guide/streams) to -capture changes related to the monitored tables. Note that RDI will automatically create and manage -the required streams. +phases, RDI uses [Snowflake Streams](https://docs.snowflake.com/en/user-guide/streams) to read data from the monitored +tables. For the initial snapshot, RDI creates the stream with `SHOW_INITIAL_ROWS = TRUE` so it can read the current +table contents before continuing with ongoing CDC. RDI automatically creates and manages the required streams. ## Setup @@ -43,19 +43,25 @@ Snowflake is only supported with RDI deployed on Kubernetes/Helm. RDI VM mode do ## 1. Set up Snowflake permissions -The RDI user requires the following permissions to connect and capture data from Snowflake: +The RDI user requires permissions to read the source tables and to create the Snowflake objects RDI uses for CDC: + +- `USAGE`, `OPERATE` on the warehouse used for RDI reads +- `USAGE` on the source database and source schema +- `SELECT` on the source tables +- `USAGE` on the CDC schema used by RDI +- `CREATE STREAM`, `CREATE TABLE` on the CDC schema used by RDI -- `SELECT` on source tables -- `CREATE STREAM` permission (RDI automatically creates and manages Snowflake Streams for CDC) -- `USAGE` permission on the warehouse for query execution +If you configure `cdcDatabase` and `cdcSchema`, grant the CDC permissions there. Otherwise, grant them in the source +schema. 
If your Snowflake setup requires it, also grant any additional cross-database privileges needed for the CDC +schema to reference the source tables. Grant the required permissions to your RDI user: ```sql -- Grant usage on the warehouse -GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE rdi_role; +GRANT USAGE, OPERATE ON WAREHOUSE COMPUTE_WH TO ROLE rdi_role; --- Grant usage on the database and schema +-- Grant usage on the source database and schema GRANT USAGE ON DATABASE MYDB TO ROLE rdi_role; GRANT USAGE ON SCHEMA MYDB.PUBLIC TO ROLE rdi_role; @@ -63,8 +69,9 @@ GRANT USAGE ON SCHEMA MYDB.PUBLIC TO ROLE rdi_role; GRANT SELECT ON TABLE MYDB.PUBLIC.customers TO ROLE rdi_role; GRANT SELECT ON TABLE MYDB.PUBLIC.orders TO ROLE rdi_role; --- Grant CREATE STREAM permission for CDC -GRANT CREATE STREAM ON SCHEMA MYDB.PUBLIC TO ROLE rdi_role; +-- Grant permissions on the schema RDI uses for CDC objects +GRANT USAGE ON SCHEMA MYDB.RDI_CDC TO ROLE rdi_role; +GRANT CREATE STREAM, CREATE TABLE ON SCHEMA MYDB.RDI_CDC TO ROLE rdi_role; -- Assign the role to your RDI user GRANT ROLE rdi_role TO USER rdi_user; @@ -78,6 +85,13 @@ RDI supports two authentication methods for Snowflake. You must configure one of Use standard username and password credentials. Store these securely using Kubernetes secrets (see step 3). +{{< note >}} +Many Snowflake accounts require MFA for password-based sign-ins. If you want to use password authentication for RDI, +configure the Snowflake user as a service user that is allowed to authenticate non-interactively. Otherwise, use +private key authentication instead. For more information, see the Snowflake +[MFA rollout documentation](https://docs.snowflake.com/en/user-guide/security-mfa-rollout). 
+{{< /note >}} + ### Private key authentication For enhanced security, use key-pair authentication: @@ -142,22 +156,26 @@ sources: connection: type: snowflake url: "jdbc:snowflake://myaccount.snowflakecomputing.com/" - username: "${SOURCE_DB_USERNAME}" + user: "${SOURCE_DB_USERNAME}" password: "${SOURCE_DB_PASSWORD}" # Omit for key-pair auth database: "MYDB" - schema: "PUBLIC" warehouse: "COMPUTE_WH" # role: "RDI_ROLE" # Optional: Snowflake role # cdcDatabase: "CDC_DB" # Optional: Separate database for CDC streams # cdcSchema: "CDC_SCHEMA" # Optional: Separate schema for CDC streams + schemas: + - PUBLIC tables: - customers: {} - orders: {} + PUBLIC.customers: {} + PUBLIC.orders: {} advanced: riotx: poll: "30s" snapshot: "INITIAL" # Or "NEVER" to skip initial snapshot + # streamPrefix: "data:" # Optional: Redis stream prefix # streamLimit: 100000 # Optional: Max stream length + # keyColumns: # Recommended: stable key columns + # - "id" # clearOffset: false # Optional: Clear offset on start targets: @@ -174,7 +192,9 @@ processors: ``` {{< note >}} -The Snowflake connector supports connecting to exactly one database and schema. All table names in the `tables` section are assumed to be in the configured database and schema. +Snowflake uses one configured `database` and one or more source-level `schemas`. In the `tables` section, specify each +table as `SCHEMA.table`. Even when you configure only one schema, explicit `SCHEMA.table` names are recommended for +clarity. {{< /note >}} ### Snowflake connection properties @@ -183,10 +203,9 @@ The Snowflake connector supports connecting to exactly one database and schema. 
|---------------|--------|----------|----------------------------------------------------------------| | `type` | string | Yes | Must be `"snowflake"` | | `url` | string | Yes | JDBC URL: `jdbc:snowflake://.snowflakecomputing.com/` | -| `username` | string | Yes | Snowflake username | +| `user` | string | Yes | Snowflake user | | `password` | string | No* | Snowflake password | | `database` | string | Yes | Snowflake database name | -| `schema` | string | Yes | Snowflake schema name | | `warehouse` | string | Yes | Snowflake warehouse name | | `role` | string | No | Snowflake role name | | `cdcDatabase` | string | No | Database for CDC streams (if different from source) | @@ -194,18 +213,28 @@ The Snowflake connector supports connecting to exactly one database and schema. * Either `password` or private key authentication is required. See [Configure authentication](#2-configure-authentication) for details. +### Snowflake source properties + +| Property | Type | Required | Description | +|------------|--------|----------|------------------------------------------------------------------| +| `schemas` | array | Yes | Schema names to capture from | +| `tables` | object | Yes | Tables to capture, keyed as `SCHEMA.table` | + ### Advanced configuration options Configure under `sources..advanced.riotx`: -| Property | Type | Default | Description | -|---------------|---------|-------------|----------------------------------------| -| `poll` | string | `"30s"` | Polling interval for stream changes | -| `snapshot` | string | `"INITIAL"` | Snapshot mode: `INITIAL` or `NEVER` | -| `streamLimit` | integer | - | Maximum stream length (XTRIM MAXLEN) | -| `keyColumns` | array | - | Columns to use as message keys | -| `clearOffset` | boolean | `false` | Clear existing offset on start | -| `count` | integer | `0` | Limit records per poll (0 = unlimited) | +| Property | Type | Default | Description | 
+|----------------|---------|-------------|----------------------------------------------| +| `poll` | string | `"30s"` | Polling interval for stream changes | +| `snapshot` | string | `"INITIAL"` | Snapshot mode: `INITIAL` or `NEVER` | +| `streamPrefix` | string | `"data:"` | Prefix for the Redis stream written by RDI | +| `streamLimit` | integer | - | Maximum stream length (XTRIM MAXLEN) | +| `keyColumns` | array | - | Stable source columns to use as message keys | +| `clearOffset` | boolean | `false` | Clear existing offset on start | +| `count` | integer | `0` | Limit records per poll (0 = unlimited) | + +For reliable update and delete handling, define `keyColumns` with a stable business key or surrogate key when possible. ## Troubleshooting @@ -233,10 +262,10 @@ Configure under `sources..advanced.riotx`: **No data appearing in Redis** -1. Verify Snowflake Streams exist for target tables: +1. Verify Snowflake Streams exist in the CDC schema: ```sql - SHOW STREAMS IN SCHEMA my_database.my_schema; + SHOW STREAMS IN SCHEMA my_cdc_database.my_cdc_schema; ``` 1. Check the polling interval configuration @@ -244,21 +273,24 @@ Configure under `sources..advanced.riotx`: 1. 
Check the collector logs:

```bash
- kubectl logs -n rdi -l app=riotx-collector-source
+ kubectl get deployments -n rdi | grep riotx-collector
+ kubectl logs -n rdi deployment/<deployment-name>
```

**Stale or missing changes**

-- Snowflake Streams have a retention period (default 14 days)
-- If the collector was offline longer than retention, changes may be lost
+- Snowflake Streams depend on Snowflake change tracking and retention settings
+- If the collector was offline longer than the available retention window, changes may be lost
- Consider using `clearOffset: true` to restart from current state

### Performance tuning

-**High Snowflake API usage**
+**High Snowflake warehouse usage**

- Increase `poll` interval (e.g., `"60s"` or `"120s"`)
- Use a dedicated warehouse for CDC operations
+- Each poll first calls Snowflake's `SYSTEM$STREAM_HAS_DATA` function to check whether the stream has new data. This
+  check does not start the warehouse; warehouse compute starts only when RDI reads rows from the stream.

**Redis memory concerns**

- Set `streamLimit` to cap stream length
- Use `count` to limit records per poll batch

**Initial snapshot too slow**

- Use `snapshot: "NEVER"` to skip initial snapshot
- Pre-load data using other methods if needed

### Enable debug logging

Enable debug logging in the source configuration:

```yaml
sources:
  snowflake:
    type: riotx
    logging:
      level: debug
    # ... rest of configuration
```

View collector logs:

```bash
-kubectl logs -n rdi -l app=riotx-collector-source -f
+kubectl get deployments -n rdi | grep riotx-collector
+kubectl logs -n rdi deployment/<deployment-name> -f
```

## 5. 
Configuration is complete @@ -297,5 +330,5 @@ Once you have followed the steps above, your Snowflake database is ready for RDI - [Snowflake Streams Documentation](https://docs.snowflake.com/en/user-guide/streams) - [Snowflake Key Pair Authentication](https://docs.snowflake.com/en/user-guide/key-pair-auth) +- [Snowflake MFA rollout documentation](https://docs.snowflake.com/en/user-guide/security-mfa-rollout) - [RDI Deployment Guide]({{< relref "/integrate/redis-data-integration/data-pipelines/deploy" >}}) - From 324169c5386a95139991612657066e991985e034 Mon Sep 17 00:00:00 2001 From: Jeremy Plichta Date: Sun, 19 Apr 2026 09:35:06 -0600 Subject: [PATCH 5/7] Clarify Snowflake stream permission prerequisites --- .../data-pipelines/prepare-dbs/snowflake.md | 23 ++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md index c55bb8d980..eb35bad7f1 100644 --- a/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md +++ b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md @@ -43,7 +43,8 @@ Snowflake is only supported with RDI deployed on Kubernetes/Helm. RDI VM mode do ## 1. Set up Snowflake permissions -The RDI user requires permissions to read the source tables and to create the Snowflake objects RDI uses for CDC: +The following are the minimum runtime permissions for the RDI role to read the source tables and create the Snowflake +objects RDI uses for CDC: - `USAGE`, `OPERATE` on the warehouse used for RDI reads - `USAGE` on the source database and source schema @@ -55,6 +56,17 @@ If you configure `cdcDatabase` and `cdcSchema`, grant the CDC permissions there. schema. If your Snowflake setup requires it, also grant any additional cross-database privileges needed for the CDC schema to reference the source tables. 
+{{< note >}} +Before RDI can create the initial stream for a source table, Snowflake change tracking must already be enabled on that +table, or the role creating the initial stream must own the table. If the source tables are not owned by the RDI role, +ask a Snowflake administrator or table owner to enable change tracking first: + +```sql +ALTER TABLE MYDB.PUBLIC.customers SET CHANGE_TRACKING = TRUE; +ALTER TABLE MYDB.PUBLIC.orders SET CHANGE_TRACKING = TRUE; +``` +{{< /note >}} + Grant the required permissions to your RDI user: ```sql @@ -77,6 +89,15 @@ GRANT CREATE STREAM, CREATE TABLE ON SCHEMA MYDB.RDI_CDC TO ROLE rdi_role; GRANT ROLE rdi_role TO USER rdi_user; ``` +If you use centralized grant management, you can also add future grants in the CDC schema so newly created tables and +streams automatically receive the desired privileges. These grants are optional and are not part of the minimum runtime +permissions: + +```sql +GRANT SELECT ON FUTURE TABLES IN SCHEMA MYDB.RDI_CDC TO ROLE rdi_role; +GRANT SELECT ON FUTURE STREAMS IN SCHEMA MYDB.RDI_CDC TO ROLE rdi_role; +``` + ## 2. Configure authentication RDI supports two authentication methods for Snowflake. You must configure one of these methods. From 9feb9a72b253143bfa9ce9d1656daffb4bf088ad Mon Sep 17 00:00:00 2001 From: Jeremy Plichta Date: Sun, 19 Apr 2026 10:06:36 -0600 Subject: [PATCH 6/7] Refine Snowflake stream ownership guidance --- .../data-pipelines/prepare-dbs/snowflake.md | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md index eb35bad7f1..2961300b0f 100644 --- a/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md +++ b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md @@ -57,9 +57,13 @@ schema. 
If your Snowflake setup requires it, also grant any additional cross-dat schema to reference the source tables. {{< note >}} -Before RDI can create the initial stream for a source table, Snowflake change tracking must already be enabled on that -table, or the role creating the initial stream must own the table. If the source tables are not owned by the RDI role, -ask a Snowflake administrator or table owner to enable change tracking first: +RDI manages the Snowflake streams it uses for snapshot and CDC. The collector creates the stream in the configured CDC +schema and later issues `CREATE OR REPLACE STREAM` statements to keep the stream aligned with the expected offset, so +the RDI role must be able to create and own those stream objects in the CDC schema. + +There is one stricter bootstrap requirement for the first stream created on a source table: if Snowflake change +tracking is not already enabled on that table, only the table owner can create that initial stream. If the source +tables are not owned by the RDI role, ask a Snowflake administrator or table owner to enable change tracking first: ```sql ALTER TABLE MYDB.PUBLIC.customers SET CHANGE_TRACKING = TRUE; From f7eb9f49d0657c19bb8e040a7a2335810cf1e461 Mon Sep 17 00:00:00 2001 From: Jeremy Plichta Date: Sun, 19 Apr 2026 10:37:48 -0600 Subject: [PATCH 7/7] Mark Snowflake source docs as private preview --- .../data-pipelines/prepare-dbs/snowflake.md | 1 + 1 file changed, 1 insertion(+) diff --git a/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md index 2961300b0f..c047971901 100644 --- a/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md +++ b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/snowflake.md @@ -11,6 +11,7 @@ group: di linkTitle: Prepare Snowflake summary: Redis Data Integration keeps Redis in sync with the primary database in near 
 real time.
+bannerText: Snowflake source support for Redis Data Integration is currently in private preview. Features and behavior are subject to change. General private preview terms apply.
 type: integration
 weight: 20
 ---