fix: import unity struct and array columns as structured ODCS types #1300

Merged
jochenchrist merged 5 commits into main from fix/unity-import-complex-types on Jun 18, 2026
@jochenchrist

Contributor

Fixes the import side of #1280.

Problem

datacontract import unity emitted complex column types only as a flat Spark DDL string (physicalType: struct<value:bigint>, array<bigint>) with logicalType: object, while datacontract test and the ODCS spec expect the structured representation (nested properties for structs, items for arrays, logicalType: array for arrays).
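To make the difference concrete, the two shapes can be written as Python dicts that mirror the YAML. This is only an illustrative sketch: the column names `s` and `a` are made up, and the nested `physicalType`/`logicalType` values for `bigint` are assumptions rather than output copied from the importer.

```python
# Flat shape (before the fix): complex types appear only as a Spark DDL
# string, and both columns are tagged logicalType: object.
flat = {
    "s": {"physicalType": "struct<value:bigint>", "logicalType": "object"},
    "a": {"physicalType": "array<bigint>", "logicalType": "object"},
}

# Structured shape (what datacontract test and ODCS expect): structs carry
# nested properties, arrays carry items and logicalType: array.
# The nested values for bigint below are assumed for illustration.
structured = {
    "s": {
        "physicalType": "struct<value:bigint>",
        "logicalType": "object",
        "properties": [
            {"name": "value", "physicalType": "bigint", "logicalType": "integer"},
        ],
    },
    "a": {
        "physicalType": "array<bigint>",
        "logicalType": "array",
        "items": {"physicalType": "bigint", "logicalType": "integer"},
    },
}
```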

Changes

  • The unity importer now parses Unity's type_json (Spark StructField JSON; StructField.fromJson is pure Python, no JVM or SparkSession needed) and emits nested properties for struct columns and items for array columns, reusing the spark importer's recursive conversion. Nested field comments are carried over. If pyspark is unavailable or type_json is missing/unparseable, it logs a warning and falls back to the previous flat output instead of failing the import.
  • map_type_from_sql now maps array<...> to logicalType: array (previously object), with explicit struct/map branches. This also corrects the sql and snowflake importers, which share the function.
  • Map columns keep the flat map<k,v> string in physicalType with logicalType: object for now: ODCS v3.1 has no map representation. Once ODCS v3.2 lands (logicalType: map with map.key/map.value, RFC 0030) and the model library supports it, the importer can emit structured maps from the already-parsed MapType in the same helper.
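The conversion described in these bullets can be sketched without pyspark by walking the Spark type JSON directly. This is not the importer's code, which reuses the spark importer's conversion on top of `StructField.fromJson`. The names `convert_type`, `convert_field`, `import_column`, and the `PRIMITIVES` table are hypothetical, and the flat fallback here keeps only the DDL string.

```python
import json
import logging

logger = logging.getLogger(__name__)

# Hypothetical mapping from Spark JSON primitive names to ODCS logical types.
PRIMITIVES = {
    "byte": "integer", "short": "integer", "integer": "integer", "long": "integer",
    "float": "number", "double": "number",
    "string": "string", "boolean": "boolean", "date": "date",
}


def convert_type(spark_type):
    """Recursively convert a Spark JSON data type (str or dict)."""
    if isinstance(spark_type, str):
        # Primitive types are plain strings in Spark's JSON schema format.
        return {"logicalType": PRIMITIVES.get(spark_type, "string")}
    kind = spark_type["type"]
    if kind == "struct":
        fields = [convert_field(f) for f in spark_type["fields"]]
        return {"logicalType": "object", "properties": fields}
    if kind == "array":
        return {"logicalType": "array", "items": convert_type(spark_type["elementType"])}
    # Maps stay unstructured: ODCS v3.1 has no map representation.
    return {"logicalType": "object"}


def convert_field(field):
    """Convert one nested struct field, carrying its comment over."""
    prop = {"name": field["name"], **convert_type(field["type"])}
    comment = field.get("metadata", {}).get("comment")
    if comment:
        prop["description"] = comment
    return prop


def import_column(name, type_text, type_json):
    """Return a structured column, or the flat form if type_json is unusable."""
    try:
        structured = convert_type(json.loads(type_json)["type"])
    except (TypeError, ValueError, KeyError) as exc:
        logger.warning("Cannot parse type_json for %s, keeping flat type: %s", name, exc)
        return {"name": name, "physicalType": type_text}
    return {"name": name, "physicalType": type_text, **structured}
```

Catching the parse errors in one place keeps a single bad column from failing the whole import, which matches the fallback behaviour described above.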

Tests

New fixture modeled on the issue's repro table (bigint, array<bigint>, struct<value:bigint>, plus array<struct<...>> and map<string,bigint>) with a test asserting the structured output and that the result passes lint. The issue's example table now imports as exactly the contract the reporter had to write by hand.

Note: this does not change the test engine's type checking, which still compares only top-level type categories (the v1.0.x part of the issue). The imported contracts now carry the nested type information, so deeper validation can build on this.
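As a rough illustration of what comparing only top-level type categories means, the sketch below reduces a DDL type string to its outer type name. It is an assumption about the general idea, not the test engine's actual code, and `top_level_category` and `same_top_level` are hypothetical names.

```python
def top_level_category(ddl_type: str) -> str:
    """Return the outer type name, ignoring nested or parameterised parts."""
    for index, char in enumerate(ddl_type):
        if char in "<(":
            return ddl_type[:index].strip().lower()
    return ddl_type.strip().lower()


def same_top_level(expected: str, actual: str) -> bool:
    """Compare two types by outer category only."""
    return top_level_category(expected) == top_level_category(actual)


# A nested mismatch goes unnoticed when only the outer category is compared.
print(same_top_level("struct<value:bigint>", "struct<value:string>"))
print(same_top_level("array<bigint>", "struct<value:bigint>"))
```

Under this kind of comparison the first pair counts as matching even though the nested field types differ, which is why nested information in the contract is a precondition for deeper validation.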

🤖 Generated with Claude Code

…1280)

The unity importer only emitted the flat Spark DDL type string
(physicalType: struct<value:bigint>) with logicalType: object for all
complex types. Now it parses Unity's type_json (Spark StructField JSON,
pure-Python parser, no JVM needed) and emits nested properties for
structs and items for arrays, reusing the spark importer's recursive
conversion. Arrays also get the correct logicalType: array via
map_type_from_sql, which fixes the sql and snowflake importers too.

Map columns keep the flat map<k,v> physicalType until ODCS v3.2 adds
logicalType: map (RFC 0030).
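
The branch logic described for map_type_from_sql could look roughly like the following. This is a simplified stand-in and not the function from sql_importer.py: `sketch_logical_type` is a hypothetical name, and the primitive handling is reduced to a few assumed cases.

```python
def sketch_logical_type(sql_type: str) -> str:
    """Map a SQL/Spark DDL type string to an ODCS logicalType (simplified)."""
    normalized = sql_type.strip().lower()
    if normalized.startswith("array<"):
        return "array"  # previously this fell through to "object"
    if normalized.startswith("struct<"):
        return "object"
    if normalized.startswith("map<"):
        return "object"  # no map representation in ODCS v3.1
    if normalized in ("tinyint", "smallint", "int", "integer", "bigint"):
        return "integer"
    if normalized in ("float", "double") or normalized.startswith("decimal"):
        return "number"
    if normalized == "boolean":
        return "boolean"
    # Assumed default for anything else in this sketch.
    return "string"
```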
Comment thread datacontract/imports/sql_importer.py Outdated
@jochenchrist

Contributor Author

@copilot resolve the merge conflicts in this pull request

Copilot AI commented Jun 18, 2026

Contributor

@copilot resolve the merge conflicts in this pull request

Merge conflicts resolved in d021298. The CHANGELOG.md conflict was resolved by combining the Unity import fix with the new features from main (Trino authentication, ClickHouse export). All Unity importer tests pass.

@jochenchrist jochenchrist merged commit e419e65 into main Jun 18, 2026
14 checks passed
@jochenchrist jochenchrist deleted the fix/unity-import-complex-types branch June 18, 2026 14:04
3 participants