[AMORO-4006][ams] Fix base table snapshot expiration failure for keyed Mixed-Iceberg#4210
Open
fightBoxing wants to merge 2 commits intoapache:masterfrom
Conversation
added 2 commits
May 8, 2026 16:22
…d Mixed-Iceberg For Mixed-Iceberg tables managed by AMS internal REST catalog, the base store and the change store share the same server-side TableMetadata record and therefore the same meta_version. In MixedTableMaintainer the changeMaintainer.expireSnapshots() is invoked before baseMaintainer, and after it commits the shared meta_version has been bumped. The baseMaintainer however still holds the Iceberg Table instance captured at construction time, whose cached TableMetadata points to the old meta_version. When baseMaintainer then tries to commit its own snapshot expiration it trips the optimistic-lock check and fails with CommitFailedException, leaving the base store snapshots never expired. Refresh the base table metadata right after the change store commits so that the base store's subsequent expireSnapshots() commit starts from the latest version. A try/catch guards against transient refresh failures so they don't mask the original error signal. Closes apache#4006
…expiration refresh Adds two regression tests in TestSnapshotExpire covering issue apache#4006: 1. testRefreshBaseTableBeforeBaseExpiration — wraps the base maintainer's cached Iceberg Table with a Mockito spy via reflection and verifies that refresh() is invoked during MixedTableMaintainer.expireSnapshots(). Prior to the fix, no refresh() call happens on the base maintainer's cached table during the expire-snapshots flow, so this test fails without the fix and passes with it. 2. testExpireSnapshotsShrinksBothStores — end-to-end smoke test that exercises expireSnapshots() on a keyed Mixed-Iceberg table with expirable snapshots on both stores and asserts that both the base store and the change store snapshot lists shrink. Serves as a safety net for the happy path. The BasicCatalogTestHelper backend used by this test class cannot faithfully reproduce the shared-meta_version scenario (base and change stores live in independent Iceberg tables), so the first test asserts the fix behaviourally (refresh() is called), which is the direct code path introduced by the fix. Verified locally: with the fix applied both tests pass; after reverting MixedTableMaintainer to upstream/master the first test fails as expected.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why are the changes needed?
Closes #4006.
For keyed Mixed-Iceberg tables managed by AMS's internal REST catalog, the base store and the change store share the same server-side
TableMetadatarecord (and therefore the samemeta_version). InMixedTableMaintainer.expireSnapshots()the change maintainer runs before the base maintainer:After the change maintainer commits, the shared
meta_versionhas been bumped on the server side, but thebaseMaintainer.tablereference was captured at construction time and its cachedTableMetadatastill points to the old version. The subsequent base commit then trips the optimistic-lock check and fails withCommitFailedException, so base-store snapshots are never actually expired — only the change store is cleaned up, causing steady metadata growth over time as reported in the issue.Brief change log
Fix —
MixedTableMaintainer.javachangeMaintainer.expireSnapshots()(and its@VisibleForTestingoverload) finishes, refresh the base maintainer's cached IcebergTablebefore running the base expiration, so the next commit starts from the latestmeta_version.try/catchthat logs a warning on failure — a transient refresh error must not mask the original error signal coming frombaseMaintainer.expireSnapshots().Regression tests —
TestSnapshotExpire.javatestRefreshBaseTableBeforeBaseExpiration— wrapsbaseMaintainer.tablewith a Mockito spy (injected via reflection) and verifies thatrefresh()is invoked duringexpireSnapshots(). The pre-fix code path does not callrefresh()inside the expire-snapshots flow, so this test fails without the fix and passes with it (verified locally by revertingMixedTableMaintainer.javatoupstream/master).testExpireSnapshotsShrinksBothStores— end-to-end smoke test that drivesexpireSnapshots()on a keyed table with multiple expirable snapshots on both stores and asserts that both snapshot lists shrink. Acts as a happy-path safety net.How was this patch tested?
Locally with JDK 11 + Maven against the reactor:
testRefreshBaseTableBeforeBaseExpirationtestExpireSnapshotsShrinksBothStoresupstream/masterrefresh()never calledmeta_version)Final result with the fix in place:
(8 = 4 parameterised configs × 2 methods; 4 skipped are non-keyed configs gated by
Assume.assumeTrue(isKeyedTable()).)Also verified:
mvn spotless:apply— no format violations.mvn -pl amoro-ams -am test-compile—BUILD SUCCESSacross the whole reactor.Documentation
Does this PR introduce any user-facing change?