Skip to content

Core: Fix row ID assignment for EXISTING entry during a manifest merge#16263

Draft
amogh-jahagirdar wants to merge 1 commit intoapache:mainfrom
amogh-jahagirdar:existing-row-id-carryover
Draft

Core: Fix row ID assignment for EXISTING entry during a manifest merge#16263
amogh-jahagirdar wants to merge 1 commit intoapache:mainfrom
amogh-jahagirdar:existing-row-id-carryover

Conversation

@amogh-jahagirdar
Copy link
Copy Markdown
Contributor

@amogh-jahagirdar amogh-jahagirdar commented May 9, 2026

We observed some cases where when EXISTING entries are carried over in an operation as part of a manifest merge, the first row ID assignment is not preserved and instead gets a new first row ID based on the inheritance rules which is unexpected (and is a spec violation).

@github-actions github-actions Bot added the core label May 9, 2026
}
};
} else {
// data file's first_row_id is null when the manifest's first_row_id is null
Copy link
Copy Markdown
Contributor Author

@amogh-jahagirdar amogh-jahagirdar May 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still checking if changing idAssigner like this for all cases is really the right way to fix this or if during manifest merging we should actually pass in our own id assignment transformer that preserves the existing case.

Here we're changing the manifest reader because

// data file's first_row_id is null when the manifest's first_row_id is null

isn't true in the context of reading filtered manifests that have EXISTING entries which need to have first row IDs preserved but the filtered manifest does not have a first row ID.

Copy link
Copy Markdown
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

i ran the test on main and it failed. one nit on using rewriteFiles as it is deprecated

table.newAppend().appendFile(FILE_A).appendFile(FILE_B).commit();
// FILE_A→0, FILE_B→125; nextRowId=225
table.updateProperties().set(TableProperties.MANIFEST_MIN_MERGE_COUNT, "1").commit();
table.newRewrite().rewriteFiles(Set.of(FILE_A), Set.of(FILE_C)).commit();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
table.newRewrite().rewriteFiles(Set.of(FILE_A), Set.of(FILE_C)).commit();
table.newRewrite().deleteFile(FILE_A).addFile(FILE_C).commit();

nit: rewriteFiles is deprecated


return entry;
};
// preserve any existing file-level first_row_id
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a more explanatory comment here?

Suggested change
// preserve any existing file-level first_row_id
// We must preserve existing row id assignments. Existing entries
// carry their own file-level first_row_id from the source manifest,
// even though intermediate manifests may have a null first_row_id.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants