perf: allow V1 incremental path to skip config change metadata queries #1403
moomindani wants to merge 3 commits into `databricks:main`
(databricks#1402) Replace the inline config change detection/application in the V1 incremental path with the existing `process_config_changes()` macro already used by V2. This lets users set `incremental_apply_config_changes: false` to skip 8 unnecessary metadata queries per incremental model execution.

Co-authored-by: Isaac
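The gating this commit relies on can be modelled in a few lines of Python. This is a minimal sketch, assuming (as the PR description states) that `process_config_changes()` checks `incremental_apply_config_changes`, defaulting to `true`, before any metadata is fetched. `FakeAdapter`, `process_config_changes_model`, and the dict-based config shapes are hypothetical stand-ins, not dbt-databricks APIs.

```python
from dataclasses import dataclass

# Number of metadata queries the PR description attributes to get_relation_config().
METADATA_QUERIES_PER_CALL = 8


@dataclass
class FakeAdapter:
    """Hypothetical stand-in for the adapter: counts queries instead of running them."""

    existing: dict
    queries_run: int = 0

    def get_relation_config(self) -> dict:
        # Models the tag, column tag, constraint, column mask, TBLPROPERTIES and
        # DESCRIBE TABLE EXTENDED lookups as a single counter bump.
        self.queries_run += METADATA_QUERIES_PER_CALL
        return dict(self.existing)


def process_config_changes_model(adapter: FakeAdapter, model_config: dict) -> dict:
    """Hypothetical model of process_config_changes(): returns the changeset to apply."""
    # Assumption: the flag defaults to True, so existing projects keep today's behavior.
    if not model_config.get("incremental_apply_config_changes", True):
        return {}  # no metadata fetched, nothing applied
    existing = adapter.get_relation_config()
    desired = {k: v for k, v in model_config.items()
               if k != "incremental_apply_config_changes"}
    # Only config types whose desired value differs from the live table are applied.
    return {k: v for k, v in desired.items() if existing.get(k) != v}


if __name__ == "__main__":
    adapter = FakeAdapter(existing={"tags": {"team": "a"}, "liquid_clustering": "auto"})
    print(process_config_changes_model(
        adapter, {"tags": {"team": "b"}, "liquid_clustering": "auto"}))  # {'tags': {'team': 'b'}}
    print(adapter.queries_run)  # 8
```

With `incremental_apply_config_changes` set to `False`, the model returns an empty changeset and `queries_run` stays at 0, which is the saving the commit targets.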
Hi @moomindani
Thanks for the PR.
Now that this code path uses `process_config_changes`, it also brings `column_tags`, `column_masks`, and `comment` handling to the V1 incremental path. Can you please add V1 functional tests to validate these changes?
Also, please add a functional test for the behaviour when `incremental_apply_config_changes` is set to `false`, verifying that the metadata is not fetched.
```jinja
{% endif %}
{%- endif -%}
{{ process_config_changes(target_relation) }}
{% do persist_docs(target_relation, model, for_relation=True) %}
```
Doesn't `process_config_changes` take care of `persist_docs` as well?
- Remove the `persist_docs` call from the V1 incremental merge path, since `process_config_changes` -> `apply_config_changeset` already handles `comment` and `column_comments` changes (aligns with V2 behavior)
- Add V1 functional tests for `column_tags` and `column_masks` changes
- Add a functional test verifying that `incremental_apply_config_changes=false` skips all metadata fetch queries (`fetch_tags`, `fetch_column_tags`, `fetch_non_null_constraint_columns`, `fetch_primary_key_constraints`, `fetch_foreign_key_constraints`, `fetch_column_masks`)

Co-authored-by: Isaac
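The assertion behind the new skip test can be sketched without a warehouse by recording which metadata fetch macros run. The six macro names come from the commit message; `QueryRecorder`, `run_incremental_merge`, and the `"merge"` step name are hypothetical and do not reflect the actual dbt functional-test fixtures, which run real models against Databricks.

```python
# Metadata fetch macros named in the commit message.
METADATA_FETCHES = (
    "fetch_tags",
    "fetch_column_tags",
    "fetch_non_null_constraint_columns",
    "fetch_primary_key_constraints",
    "fetch_foreign_key_constraints",
    "fetch_column_masks",
)


class QueryRecorder:
    """Hypothetical stand-in for the adapter: records macro names instead of running SQL."""

    def __init__(self) -> None:
        self.executed: list[str] = []

    def execute_macro(self, name: str) -> None:
        self.executed.append(name)


def run_incremental_merge(recorder: QueryRecorder, config: dict) -> None:
    """Hypothetical model of the V1 merge path after this PR."""
    # Assumption: the flag defaults to True, so metadata is fetched unless disabled.
    if config.get("incremental_apply_config_changes", True):
        for name in METADATA_FETCHES:
            recorder.execute_macro(name)
    recorder.execute_macro("merge")  # the MERGE itself always runs


def test_skip_config_changes_fetches_no_metadata() -> None:
    recorder = QueryRecorder()
    run_incremental_merge(recorder, {"incremental_apply_config_changes": False})
    # None of the fetch macros may appear; only the merge runs.
    assert not set(METADATA_FETCHES) & set(recorder.executed)
    assert recorder.executed == ["merge"]


def test_default_still_fetches_metadata() -> None:
    recorder = QueryRecorder()
    run_incremental_merge(recorder, {})
    assert set(METADATA_FETCHES) <= set(recorder.executed)
```

Pairing the skip test with a default-path test guards against a test that passes only because no metadata is ever fetched.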
Thanks for the review! Addressed both points in 2882e9c.

Re: Yes.

Re: functional tests: Added three tests (see 2882e9c).
Resolves #1402
Description
The V1 incremental materialization path calls `adapter.get_relation_config()` unconditionally during every incremental merge run. This triggers 8 sequential metadata queries against `information_schema` and `system` tables, even when none of the related features (tags, constraints, column masks) are in use.

The V2 path already uses `process_config_changes()`, which respects the `incremental_apply_config_changes` config flag. This PR replaces the V1 inline code with the same macro, bringing V1 in line with V2.

Change: replace the V1 inline config change detection/application with a `{{ process_config_changes(target_relation) }}` call.

Queries eliminated when `incremental_apply_config_changes: false`:

- `SELECT ... FROM system.information_schema.table_tags`
- `SELECT ... FROM system.information_schema.column_tags`
- `SELECT ... FROM information_schema.columns` (NOT NULL constraints)
- `SELECT ... FROM information_schema.key_column_usage` (PRIMARY KEY)
- `SELECT ... FROM information_schema.key_column_usage` (FOREIGN KEY)
- `SELECT ... FROM system.information_schema.column_masks`
- `SHOW TBLPROPERTIES`
- `DESCRIBE TABLE EXTENDED`

Measured results (Serverless SQL Warehouse, incremental merge model with `auto_liquid_cluster: true`): `get_relation_config` overhead with and without `incremental_apply_config_changes: false`.

Test assets used for benchmarking
dbt_project.yml:
models/incremental_merge_test.sql (default run):
```sql
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='id',
    auto_liquid_cluster=true
) }}

SELECT id, name, category, amount, updated_at
FROM (
    VALUES
        (1, 'Alice', 'A', 100.0, current_timestamp()),
        (2, 'Bob', 'B', 200.0, current_timestamp()),
        (3, 'Charlie', 'A', 150.0, current_timestamp()),
        (4, 'Diana', 'C', 300.0, current_timestamp()),
        (5, 'Eve', 'B', 250.0, current_timestamp())
) AS t(id, name, category, amount, updated_at)
```

models/incremental_merge_test.sql (skip run; only the config block changes):

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='id',
    auto_liquid_cluster=true,
    incremental_apply_config_changes=false
) }}
```

Steps: `dbt run` (initial table creation) → `dbt run` (incremental merge, default) → add `incremental_apply_config_changes=false` → `dbt run` (incremental merge, skip). Compare `dbt.log` from the last two runs.

Query-level breakdown (default run)
Query-level breakdown (skip run)
Functional parity:
`apply_config_changeset` (used by `process_config_changes`) handles all config types that the V1 inline code handled (tags, tblproperties, liquid_clustering, constraints) plus additional types (column_comments, column_tags, column_masks).

When `incremental_apply_config_changes` is `true` (default), behavior is unchanged: all 8 queries run and config changes are applied.

Checklist

- I have updated the `CHANGELOG.md` and added information about my change to the "dbt-databricks next" section.