perf: reuse columns from process_schema_changes in incremental materialization #1412

Open
moomindani wants to merge 1 commit into databricks:main from moomindani:feat/reuse-schema-change-columns

moomindani commented Apr 21, 2026

Summary

Closes #1411.

The incremental materialization currently discards the return value of
process_schema_changes, so each downstream strategy macro (merge,
append, delete+insert) issues another DESCRIBE TABLE EXTENDED
on the target relation even though check_for_schema_changes has just
DESCRIBEd it. This PR reuses those columns, eliminating one metadata
round-trip per incremental model, per run.
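
The round-trip count described above can be modeled with a small Python sketch. It is not the Jinja macro code from this PR: `describe`, `merge_sql`, `run_before`, `run_after`, and the relation names are hypothetical stand-ins, and the simplified `process_schema_changes` assumes the source and target schemas match. It only illustrates why capturing the return value removes one DESCRIBE on the target.

```python
from collections import Counter

# Hypothetical stand-ins; none of these are the real dbt-databricks macros.
describe_calls = Counter()
SCHEMAS = {"stg_tmp": ["id", "amount"], "stg_orders": ["id", "amount"]}


def describe(relation):
    """Pretend to run DESCRIBE TABLE EXTENDED and count the round-trip."""
    describe_calls[relation] += 1
    return list(SCHEMAS[relation])


def process_schema_changes(source, target):
    """Describe both relations to compare them, then return a column list."""
    source_columns = describe(source)
    target_columns = describe(target)
    if source_columns != target_columns:
        raise ValueError("schema changed")  # mimics on_schema_change='fail'
    return target_columns


def merge_sql(source, target, dest_columns=None):
    """Strategy stand-in: describe the target only if no columns were supplied."""
    if dest_columns is None:
        dest_columns = describe(target)
    cols = ", ".join(dest_columns)
    return f"merge into {target} using {source} on ... /* columns: {cols} */"


def run_before(source, target):
    process_schema_changes(source, target)  # return value discarded
    return merge_sql(source, target)        # strategy describes the target again


def run_after(source, target):
    dest_columns = process_schema_changes(source, target)  # return value captured
    return merge_sql(source, target, dest_columns=dest_columns)


describe_calls.clear()
run_before("stg_tmp", "stg_orders")
before = describe_calls["stg_orders"]

describe_calls.clear()
run_after("stg_tmp", "stg_orders")
after = describe_calls["stg_orders"]

print(before, after)  # 2 1
```

Both flows build the same SQL string in this model; only the number of target lookups differs.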

Changes

  • dbt/include/databricks/macros/materializations/incremental/incremental.sql
    • Capture columns from process_schema_changes in both V1 and V2 paths.
    • When on_schema_change == 'ignore' (returns {}), fall back to a
      single adapter.get_columns_in_relation(existing_relation).
    • Thread the result through strategy_arg_dict['dest_columns']
      (previously hard-coded to none).
    • Extend get_build_sql with a dest_columns=none parameter so the
      V2 path can pass through.
  • dbt/include/databricks/macros/materializations/incremental/strategies.sql
    • databricks__get_merge_sql: only DESCRIBE the target when
      dest_columns is none.
    • get_delete_insert_sql: honor arg_dict['dest_columns'] when set.
    • get_insert_into_sql: accept a dest_columns=none parameter and
      honor it; databricks__get_incremental_append_sql now passes
      arg_dict['dest_columns'] through.
  • CHANGELOG.md: new entry under ## dbt-databricks next, Under the Hood.
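
As a rough illustration of how the changes above fit together, the following Python sketch models an optional `dest_columns` keyword that is threaded through an argument dictionary to a strategy builder. The function names mirror the macros listed above, but the bodies, the `describe` callback, the `STRATEGIES` table, and the generated SQL are assumptions made for the example, not the PR's Jinja code.

```python
def get_insert_into_sql(source, target, dest_columns=None, describe=None):
    """Append-style builder; `describe` is a hypothetical column-lookup callback."""
    if dest_columns is None:
        dest_columns = describe(target)  # fallback: look the columns up here
    cols = ", ".join(dest_columns)
    return f"insert into {target} ({cols}) select {cols} from {source}"


def get_incremental_append_sql(arg_dict, describe):
    """Strategy entry point: forwards arg_dict['dest_columns'] when present."""
    return get_insert_into_sql(
        arg_dict["temp_relation"],
        arg_dict["target_relation"],
        dest_columns=arg_dict.get("dest_columns"),
        describe=describe,
    )


STRATEGIES = {"append": get_incremental_append_sql}


def get_build_sql(strategy, source, target, describe, dest_columns=None):
    """The new keyword defaults to None, so callers that omit it still work."""
    strategy_arg_dict = {
        "temp_relation": source,
        "target_relation": target,
        "dest_columns": dest_columns,  # hard-coded to None before the change
    }
    return STRATEGIES[strategy](strategy_arg_dict, describe)


lookups = []


def describe(relation):
    """Record each lookup so the example can show when one happens."""
    lookups.append(relation)
    return ["id", "amount"]


# Columns supplied up front: the strategy performs no lookup of its own.
new_sql = get_build_sql("append", "stg_tmp", "stg_orders", describe,
                        dest_columns=["id", "amount"])
lookups_after_new = len(lookups)

# Keyword omitted: the strategy falls back to describing the target itself.
old_sql = get_build_sql("append", "stg_tmp", "stg_orders", describe)

print(lookups_after_new, lookups)  # 0 ['stg_orders']
```

Defaulting the keyword to `None` is what keeps the signature backward compatible in this model: a caller that never passes `dest_columns` gets the same SQL, at the cost of the extra lookup.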

Behavior

  • When on_schema_change is 'fail', 'sync_all_columns', or
    'append_new_columns': process_schema_changes already DESCRIBEd
    both relations, so we reuse its result — one fewer DESCRIBE.
  • When on_schema_change == 'ignore': we issue exactly one DESCRIBE on
    the existing relation, matching today's total count for that path.
  • Existing public macro signatures are preserved. get_build_sql gains
    an optional keyword argument that defaults to none.
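
The per-mode behavior can be sketched in Python as follows. This is a simplified model under stated assumptions: `resolve_dest_columns` and `make_describe` are hypothetical helpers, and the stand-in `process_schema_changes` only reproduces the two facts given above (it returns `{}` for `'ignore'` and otherwise describes both relations).

```python
def process_schema_changes(on_schema_change, source, target, describe):
    """Stand-in: {} for 'ignore', otherwise describe both relations."""
    if on_schema_change == "ignore":
        return {}
    describe(source)
    return describe(target)


def resolve_dest_columns(on_schema_change, source, target, describe):
    """Reuse the schema-change result, or fall back to a single lookup."""
    columns = process_schema_changes(on_schema_change, source, target, describe)
    if not columns:  # {} is falsy, so 'ignore' lands here
        columns = describe(target)
    return columns


def make_describe(log):
    """Build a lookup callback that records every relation it is asked about."""
    def describe(relation):
        log.append(relation)
        return ["id", "amount"]
    return describe


target_describes = {}
for mode in ("ignore", "fail", "append_new_columns", "sync_all_columns"):
    log = []
    resolve_dest_columns(mode, "stg_tmp", "stg_orders", make_describe(log))
    target_describes[mode] = log.count("stg_orders")

print(target_describes)
# {'ignore': 1, 'fail': 1, 'append_new_columns': 1, 'sync_all_columns': 1}
```

In this model every mode ends up with exactly one lookup on the target, which matches the counts stated in the list above.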

Test plan

Manually verified on a live Databricks SQL Warehouse with a project of
9 incremental stg models (on_schema_change: 'fail', 7 merge + 2
append strategies).

Target DESCRIBE TABLE EXTENDED … AS JSON count per incremental model:

| Path | Before | After |
|---|---|---|
| V1 (use_materialization_v2: false) | 2 | 1 |
| V2 (use_materialization_v2: true) | 2 | 1 |

Wall-clock impact on a full dbt run is within measurement noise at
this scale (9 small models, 16 threads); the saved round-trips get
absorbed by parallelism. The win here is fewer metadata round-trips
(lower warehouse load, less API traffic), not a dramatic wall-clock
speedup.

  • Ruff lint clean on changed files.

…alization

Incremental materialization previously discarded the return value of
`process_schema_changes`, causing each strategy macro (`merge`, `append`,
`delete+insert`) to issue a second `DESCRIBE TABLE EXTENDED` on the target
relation even though `check_for_schema_changes` had just DESCRIBEd it.

This change:
- captures the columns returned by `process_schema_changes` in both V1
  and V2 paths
- falls back to a single `adapter.get_columns_in_relation(existing_relation)`
  when `on_schema_change == 'ignore'`
- threads the result through `strategy_arg_dict['dest_columns']`
- teaches `databricks__get_merge_sql`, `get_delete_insert_sql`, and
  `get_insert_into_sql` to honor a pre-supplied `dest_columns` and skip
  their own `DESCRIBE` when provided

Net effect: one fewer `DESCRIBE TABLE EXTENDED … AS JSON` round-trip per
incremental model, per run.

Verified on a project with 9 incremental stg models (V1 path,
`on_schema_change: 'fail'`): target DESCRIBE count drops from 2 to 1 per
model across merge, append, and delete+insert strategies.

Resolves databricks#1411

Co-authored-by: Isaac

Successfully merging this pull request may close these issues.

Incremental strategies fire a redundant DESCRIBE on the target even after process_schema_changes