DOC-6370 Snowflake source prep docs #2889

Open
andy-stark-redis wants to merge 7 commits into main from DOC-6370-rdi-snowflake

andy-stark-redis commented Mar 13, 2026

Based on https://github.com/RedisLabs/redis-data-integration/blob/main/docs/snowflake-connector.md

The AI tool has expanded on the SQL in the original document, so please check that this is OK. Any other corrections or suggestions are, of course, most welcome :-)


Note

Low Risk
Adds a new documentation page only; no runtime code changes, so the main risk is inaccurate setup instructions for Snowflake permissions/auth.

Overview
Adds a new Prepare Snowflake for RDI guide describing how to use Snowflake as an RDI source (private preview) when deployed on Kubernetes/Helm.

The page documents required Snowflake grants (including change tracking prerequisites), supported auth methods (password or key-pair), Kubernetes secret setup, an example config.yaml for the Snowflake source, and troubleshooting/performance tips.

Reviewed by Cursor Bugbot for commit f7eb9f4. Bugbot is set up for automated code reviews on this repo.

github-actions Bot commented Mar 13, 2026

DOC-6370

jit-ci Bot commented Mar 13, 2026

🛡️ Jit Security Scan Results

CRITICAL HIGH MEDIUM

✅ No security findings were detected in this PR


Security scan by Jit

dwdougherty left a comment

One minor nitpick; otherwise, language LGTM.


RDI uses the [RIOTX](https://redis.github.io/riotx/) collector to stream data from Snowflake to Redis.
During the [snapshot]({{< relref "/integrate/redis-data-integration/data-pipelines#pipeline-lifecycle" >}}) phase, RDI reads the current state of the database using the JDBC driver. In the
[streaming]({{< relref "/integrate/redis-data-integration/data-pipelines#pipeline-lifecycle" >}})

Collaborator

The set of bullets that this link points to doesn't include "streaming" as a keyword. Maybe add a note that streaming is CDC, or something similar.

Contributor Author

@dwdougherty Great catch, thanks!


During the [snapshot]({{< relref "/integrate/redis-data-integration/data-pipelines#pipeline-lifecycle" >}}) phase, RDI reads the current state of the database using the JDBC driver. In the
[Change data capture (CDC)]({{< relref "/integrate/redis-data-integration/data-pipelines#pipeline-lifecycle" >}})
phase, RDI uses [Snowflake Streams](https://docs.snowflake.com/en/user-guide/streams) to

Contributor

Both the initial snapshot and CDC use a Snowflake STREAM. When an initial snapshot is desired, the SHOW_INITIAL_ROWS = TRUE option is set on the Snowflake STREAM.
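
To make the stream option concrete, here is a minimal Python sketch that only builds the text of a `CREATE STREAM` statement. The helper name `create_stream_sql` and the object names in the usage lines are hypothetical, and this is not the SQL that RDI or RIOTX actually issues. `SHOW_INITIAL_ROWS` is the Snowflake stream parameter mentioned in the comment above.

```python
def create_stream_sql(stream: str, table: str, initial_snapshot: bool) -> str:
    """Build a Snowflake CREATE STREAM statement (illustrative helper, not RDI code).

    Assumes `stream` and `table` are trusted identifiers; nothing is escaped here.
    """
    sql = f"CREATE STREAM IF NOT EXISTS {stream} ON TABLE {table}"
    if initial_snapshot:
        # With SHOW_INITIAL_ROWS = TRUE, the first read of the stream returns
        # the rows that existed in the table when the stream was created.
        sql += " SHOW_INITIAL_ROWS = TRUE"
    return sql + ";"


if __name__ == "__main__":
    # Hypothetical object names, for illustration only.
    print(create_stream_sql("SALES.ORDERS_STREAM", "SALES.ORDERS", initial_snapshot=True))
    print(create_stream_sql("SALES.ORDERS_STREAM", "SALES.ORDERS", initial_snapshot=False))
```

The only difference between the snapshot and CDC-only cases in this sketch is the one stream option, which matches the point that both phases read from a stream.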


## 1. Set up Snowflake permissions

The RDI user requires the following permissions to connect and capture data from Snowflake:

Contributor

These are likely not enough for this to work. See the "minimally required permission" section at https://redis.github.io/riotx/databases/snowflake.html. I don't think we should link to that page, but we should use it as a reference and copy the required permissions here.


## 2. Configure authentication

RDI supports two authentication methods for Snowflake. You must configure one of these methods.

Contributor

Many production Snowflake accounts require MFA for username-and-password logins. If you want to create the RDI account with a username and password, you must designate it as a service account so that it does not require MFA. See https://docs.snowflake.com/en/user-guide/security-mfa-rollout for more information.


### Performance tuning

**High Snowflake API usage**

Contributor

Maybe word this differently from "High Snowflake API usage", for example "High Snowflake Warehouse usage".

It is also worth noting that on each poll interval the RDI Snowflake connector calls the system function SYSTEM$STREAM_HAS_DATA (https://docs.snowflake.com/en/sql-reference/functions/system_stream_has_data) to check whether there is new data on the stream. This system function does not start a warehouse, so it does not incur warehouse compute time. If the function returns true, indicating that there is new data to import, the next queries that the connector issues to read the data from the stream automatically spin up a warehouse as needed.
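
The check-then-read pattern described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the connector's implementation: `poll_once`, `has_data`, and `read_changes` are hypothetical names, and the two callables stand in for the real Snowflake queries so that the sketch runs without a Snowflake account.

```python
from typing import Callable, Dict, List


def poll_once(
    stream: str,
    has_data: Callable[[str], bool],
    read_changes: Callable[[str], List[Dict]],
) -> List[Dict]:
    """Run one poll cycle: cheap check first, read only when there is new data."""
    # has_data stands in for a query such as:
    #   SELECT SYSTEM$STREAM_HAS_DATA('<stream>')
    if not has_data(stream):
        return []  # nothing to import, so no read query is issued
    # read_changes stands in for the queries that read rows from the stream;
    # per the comment above, those are the ones that need a running warehouse.
    return read_changes(stream)


if __name__ == "__main__":
    read_calls: List[str] = []

    def fake_read(stream: str) -> List[Dict]:
        read_calls.append(stream)
        return [{"id": 1}]

    poll_once("MY_STREAM", lambda s: False, fake_read)  # idle interval
    poll_once("MY_STREAM", lambda s: True, fake_read)   # interval with changes
    print(f"read queries issued: {len(read_calls)}")
```

In this sketch an idle poll interval never reaches `read_changes`, which is why a quiet source table should not drive warehouse usage.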

Contributor

Added two follow-up commits to keep this docs PR aligned with the current Snowflake source implementation direction in RedisLabs/redis-data-integration#2532.

9c8d6504e updates the page to use the new config shape (connection.user, source-level schemas, and SCHEMA.table entries) and also corrects a few Snowflake-specific details in the original draft, including:

  • snapshot behavior using Snowflake streams
  • stronger minimum role/warehouse/schema guidance
  • the MFA/service-user note for password auth
  • troubleshooting and warehouse usage wording

324169c53 is a narrower follow-up on permissions. The earlier revision still understated one bootstrap requirement: if change tracking is not already enabled on a source table, the role creating the initial stream must own that table. That commit makes the docs explicit about the CHANGE_TRACKING = TRUE prerequisite and keeps future table/stream grants documented as optional admin convenience rather than minimum runtime permissions.
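
The bootstrap rule described for 324169c53 can be summarised as a small Python predicate. This is only a sketch of the rule as stated in this comment: the function and parameter names are hypothetical, role names are compared literally, and the other grants that the role also needs (database, schema, warehouse, and table privileges) are deliberately ignored.

```python
def can_create_initial_stream(
    change_tracking_enabled: bool,
    table_owner_role: str,
    rdi_role: str,
) -> bool:
    """Return True if the role can create the first stream on a source table.

    Models only the rule from the PR comment; it is not a full privilege check.
    """
    if change_tracking_enabled:
        # CHANGE_TRACKING = TRUE is already set on the table, so ownership
        # is not required for this step.
        return True
    # Otherwise creating the stream has to enable change tracking,
    # which requires the creating role to own the table.
    return table_owner_role == rdi_role


if __name__ == "__main__":
    # Hypothetical role names, for illustration only.
    print(can_create_initial_stream(True, "DATA_ADMIN", "RDI_ROLE"))
    print(can_create_initial_stream(False, "DATA_ADMIN", "RDI_ROLE"))
    print(can_create_initial_stream(False, "RDI_ROLE", "RDI_ROLE"))
```

The second case is the one the commit documents: an administrator who owns the table enables change tracking ahead of time, so the RDI role does not need ownership.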

jit-ci Bot commented Apr 19, 2026

❌ Jit Scanner failed - Our team is investigating

Jit Scanner failed - Our team has been notified and is working to resolve the issue. Please contact support if you have any questions.


💡 Need to bypass this check? Comment @sera bypass to override.
