
binary type promotion #69

Merged

vustef merged 3 commits into main from vs-type-promotion-binary on Mar 20, 2026

Conversation

@vustef (Collaborator) commented Mar 20, 2026

With this table:

CREATE OR REPLACE ICEBERG TABLE vustef_db_12092025.public.ibt_all_types (
    id INT,
    c_bigint BIGINT,
    c_float FLOAT,
    c_double DOUBLE,
    c_decimal DECIMAL(10,2),
    c_boolean BOOLEAN,
    c_date DATE,
    c_time TIME,
    c_timestamp TIMESTAMP_NTZ(6),
    c_timestamptz TIMESTAMP_LTZ(6),
    c_string STRING,
    c_uuid BINARY(16)
)
CATALOG = 'SNOWFLAKE'
EXTERNAL_VOLUME = 'snowflake_managed';

INSERT INTO vustef_db_12092025.public.ibt_all_types
  (ID, C_BIGINT, C_FLOAT, C_DOUBLE, C_DECIMAL, C_BOOLEAN, C_DATE, C_TIME, C_TIMESTAMP, C_TIMESTAMPTZ, C_STRING, C_UUID)
SELECT column1, column2, column3, column4, column5, column6, column7, column8, column9, column10, column11, HEX_DECODE_BINARY(REPLACE(UUID_STRING(), '-', ''))
FROM VALUES
(11, 1000000010, 10.1::FLOAT, 10.0010::FLOAT, 99999.99::NUMBER(10,2), FALSE, '2025-10-25'::DATE, '19:30:00'::TIME, '2025-10-25 19:30:00'::TIMESTAMP_NTZ, '2025-10-25 19:30:00 +01:00'::TIMESTAMP_LTZ, 'juliet');

we uncovered a bug:

  • when we query the whole table, we get an error "failed to process iceberg_next_batch with error: Error reading batch: Unexpected => Stream error: Unexpected => failed to read record batch, source: Unexpected => unexpected target column type FixedSizeBinary(16)" - this is an error for the C_UUID column
  • when we query only C_UUID, it works
  • when we query any other column in addition to C_UUID, we get the error again.

Root Cause

  1. Snowflake writes FIXED_LEN_BYTE_ARRAY(16) in Parquet for the fixed[16] column, but the Arrow Parquet reader decodes it as Binary (not FixedSizeBinary(16)).
  2. arrow_schema_to_schema converts Binary to Primitive(Binary), while the Iceberg table metadata says the column is Primitive(Fixed(16)).
  3. type_promotion_is_valid rejects (Binary, Fixed(16)); there's no arm allowing this "promotion" (or rather, this type mapping mismatch).
  4. The column is excluded from the Parquet projection mask, so it's never read from the file.
  5. RecordBatchTransformer sees the column as "missing" and tries to fill it via ColumnSource::Add with None (null default).
  6. create_primitive_array_repeated has no FixedSizeBinary arm, producing the error (see the sketch after this list).
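To make step 6 concrete, here's a minimal self-contained sketch of that shape. fill_with_nulls is a hypothetical stand-in for create_primitive_array_repeated (which handles many more types), but the structure is the same: a match over target Arrow types with no FixedSizeBinary arm, so a null-fill request for fixed[16] falls through to the error case.

use arrow_array::{new_null_array, ArrayRef};
use arrow_schema::DataType;

// Hypothetical stand-in for create_primitive_array_repeated: build a column
// of `num_rows` repeated default values (here: nulls) for a target type.
fn fill_with_nulls(target: &DataType, num_rows: usize) -> Result<ArrayRef, String> {
    match target {
        DataType::Int32 | DataType::Int64 | DataType::Utf8 | DataType::Binary => {
            Ok(new_null_array(target, num_rows))
        }
        // No DataType::FixedSizeBinary(_) arm, so fixed[16] lands here.
        other => Err(format!("unexpected target column type {other}")),
    }
}

fn main() {
    // The shape of the crash: filling the "missing" C_UUID column fails.
    assert!(fill_with_nulls(&DataType::FixedSizeBinary(16), 3).is_err());
}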

The fix is at step 3: add (Some(PrimitiveType::Binary), Some(PrimitiveType::Fixed(_))) to type_promotion_is_valid, since Binary in Parquet/Arrow is a valid representation of Fixed(N) in Iceberg. This lets the column be read from the file normally, and a cast or passthrough at the transformer level then handles it.
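A minimal sketch of the new arm, assuming a simplified PrimitiveType (the real enum and the real arm list in this codebase are much longer):

// Simplified, hypothetical PrimitiveType; the real enum has many more variants.
#[derive(Debug, PartialEq)]
enum PrimitiveType {
    Int,
    Long,
    Binary,
    Fixed(u64),
}

fn type_promotion_is_valid(source: Option<&PrimitiveType>, target: Option<&PrimitiveType>) -> bool {
    match (source, target) {
        // Identical types always pass.
        (Some(s), Some(t)) if s == t => true,
        // Example of an existing widening promotion.
        (Some(PrimitiveType::Int), Some(PrimitiveType::Long)) => true,
        // The new arm: Parquet/Arrow Binary is a valid physical
        // representation of Iceberg fixed[N].
        (Some(PrimitiveType::Binary), Some(PrimitiveType::Fixed(_))) => true,
        _ => false,
    }
}

fn main() {
    assert!(type_promotion_is_valid(
        Some(&PrimitiveType::Binary),
        Some(&PrimitiveType::Fixed(16)),
    ));
}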

Why it works for C_UUID only

When you project only [12] and field 12 fails the type check, indices is empty → ProjectionMask::all() → all columns are read, including C_UUID. The transformer then sees C_UUID in the source because it was read (despite never properly entering the projection mask), and since FixedSizeBinary(16) comes through from the actual Parquet read, the equals_datatype check in the transformer passes against the target type.

So: the projected scan "works" by accident; the empty-indices fallback reads everything, and the column happens to be there. A sketch of that fallback follows.
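The fallback looks roughly like this (a sketch against the parquet crate's ProjectionMask; build_mask and the surrounding wiring are illustrative, not the exact code in this repo):

use parquet::arrow::ProjectionMask;
use parquet::schema::types::SchemaDescriptor;

// `indices` holds the Parquet leaf indices of the columns that passed
// type_promotion_is_valid.
fn build_mask(schema: &SchemaDescriptor, indices: Vec<usize>) -> ProjectionMask {
    if indices.is_empty() {
        // No column passed the check: fall back to reading everything,
        // which is why the C_UUID-only scan "works" by accident.
        ProjectionMask::all()
    } else {
        // Normal path: read only the passing columns, which is how a
        // full scan ends up excluding C_UUID.
        ProjectionMask::leaves(schema, indices)
    }
}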

That's the full picture:

  • Full scan: 11 of 12 columns pass the type check → indices is non-empty → ProjectionMask::leaves(indices) excludes C_UUID → the transformer fills it with nulls → crashes because create_primitive_array_repeated doesn't support FixedSizeBinary
  • Projected scan of only C_UUID: 0 of 1 columns pass → indices is empty → ProjectionMask::all() fallback → all columns are read → C_UUID is present → works by accident

What happens if the Binary data doesn't match the Fixed(N) length?

With the current fix, type_promotion_is_valid just says "yes, this column is readable" — it lets the
Parquet reader include the column in the projection mask. The actual data flows through as-is
(Binary type from Arrow). The RecordBatchTransformer then checks
source_field.data_type().equals_datatype(target_type):

  • Source: Binary
  • Target: FixedSizeBinary(16)

These don't match, so it goes to ColumnSource::Promote, which calls arrow_cast::cast(Binary →
FixedSizeBinary(16)). Arrow's cast implementation will fail at runtime if any value's length != 16.
So you'd get an error like "cannot cast Binary to FixedSizeBinary(16): value at index N has length
M" — not silent corruption, but a hard error.

This is actually reasonable behavior: if a Parquet file claims to have Fixed(16) data but contains
values of a different length, that's a corrupt file, and erroring is correct.

@vustef vustef requested review from gbrgr and mjschleich March 20, 2026 14:47
@vustef vustef enabled auto-merge March 20, 2026 14:49
@vustef vustef disabled auto-merge March 20, 2026 14:54
@vustef vustef enabled auto-merge March 20, 2026 15:01
@vustef vustef merged commit ff28348 into main Mar 20, 2026
19 checks passed
@vustef vustef deleted the vs-type-promotion-binary branch March 20, 2026 15:12