fix(incremental-scan): treat EXISTING manifest entries as appends when from=None#74
Merged
fix(incremental-scan): treat EXISTING manifest entries as appends when from=None#74
Conversation
…n from=None When `from_snapshot_id` is None (scan from the beginning of time), the incremental scan now uses full-scan semantics instead of silently dropping EXISTING manifest entries. Previously, data files from expired snapshots that appear as EXISTING entries in later manifests were missed, causing incomplete results. Changes: - `IncrementalPlanContext` gains a `from_snapshot_id: Option<i64>` field to track whether the user provided an explicit starting snapshot. - `build_manifest_file_contexts`: when `from=None`, reads ALL manifests from `to_snapshot`'s manifest list without filtering by `added_snapshot_id`, and clears `filter_fn` so entries from expired snapshots pass through. - `plan_files`: when `from=None`, treats EXISTING data entries as appends (alongside Added entries) and skips Deleted entries; also skips `from_snapshot_collection` since equality deletes are already handled per-file via `AppendedFileScanTask`. - Adds regression test that simulates an expired snapshot: snapshot 1 is removed from the catalog metadata while its data files (referenced as EXISTING entries in snapshot 2's manifest) must still appear in the append stream. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Update version comments to match the exact tags that the pinned commit SHAs correspond to, fixing zizmor v1.23.1 ref-version-mismatch findings. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extract `scan_and_verify` free function from `verify_incremental_scan` so tests can pass `Option<i64>` snapshot IDs and an arbitrary `Table`. Add `IncrementalTestFixture::with_expired_snapshots(n)` that rebuilds table metadata with the `n` oldest snapshots removed, encapsulating the inline JSON construction that was in the test body. The regression test drops from ~80 lines to ~20. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
from_snapshot_idisNone(scan from the beginning of time), the incremental scan was silently droppingEXISTINGmanifest entries, causing data from expired snapshots to go missingadded_snapshot_idmanifest filter, thefilter_fnentry filter, and theelse { Ok(()) }status checkfrom=None, switch to full-scan semantics — use onlyto_snapshot's manifest list (noadded_snapshot_idfilter), clearfilter_fn, and treatEXISTINGentries as appends alongsideAddedentriesTest plan
test_incremental_scan_from_none_includes_existing_entries_from_expired_snapshots— creates a 2-snapshot table, rebuilds the Table with only snapshot 2 in the metadata (simulating snapshot 1 being expired), and asserts thatfrom=Nonereturns all live rows including those from the expired snapshot🤖 Generated with Claude Code