Aliyun, Dell: Handle EOF in OSS and ECS InputStream read() variants#16266
Open
jbbqqf wants to merge 1 commit intoapache:mainfrom
Open
Aliyun, Dell: Handle EOF in OSS and ECS InputStream read() variants#16266jbbqqf wants to merge 1 commit intoapache:mainfrom
jbbqqf wants to merge 1 commit intoapache:mainfrom
Conversation
…e[], off, len) OSSInputStream and EcsSeekableInputStream forwarded the underlying stream's -1 EOF return to the caller, but unconditionally advanced pos / next and incremented the readBytes / readOperations metrics by the returned value (-1) on every EOF read. A sequential reader that loops until EOF would shift the position counter past the file size and skew metrics by one read per call. This is the same bug as the GCS / S3 / ADLS variant fixed in apache#16055 for the other cloud-storage InputStream implementations; apache#16062 called out that OSSInputStream and EcsSeekableInputStream were left unfixed. The fix follows the same shape: detect -1 from the underlying read, return it without mutating position or metrics, and mirror the guard for both single-byte and buffered overloads. Adds EOF regression tests to TestOSSInputStream and TestEcsSeekableInputStream covering both overloads. All four tests fail on origin/main and pass with this change. Resolves apache#16062 for the aliyun and dell modules. The remaining pieces (GCS, S3, ADLS) are tracked by apache#16055. Co-Authored-By: Claude Code <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
OSSInputStream(aliyun) andEcsSeekableInputStream(dell) had the same EOF-handling bug fixed in #16055 forGCSInputStream/S3InputStream/ADLSInputStream: when the underlyingread()returns-1, the wrapper still advancedpos(andnext, in OSS's case) and incremented thereadBytes/readOperationsmetric counters by-1.This PR applies the same one-shape fix the maintainers picked in #16055 to the two modules left out of that PR, plus regression tests covering both overloads.
Resolves #16062 for
aliyunanddell. The remaining pieces (GCS,S3,ADLS) are still tracked by the in-flight #16055.Context
The original issue body for #16062 explicitly enumerates the four affected files; #16055 covers
GCS/S3/ADLSand the issue comment confirmsOSSInputStreamandEcsSeekableInputStreamare still pending.In Iceberg's hot path, files are read via known-offset range reads rather than sequential reads-to-EOF, so the bug is latent in production today. It still matters because:
read() == -1would silently corruptpos/nextand the metric counters.InputStream's contract — every other Iceberg*InputStreamalready returns-1cleanly after GCS, S3, ADLS: Handle EOF in inputStreams #16055.FileIOMetricsContextsnapshots that overshoot the file size.Changes
aliyun/src/main/java/org/apache/iceberg/aliyun/oss/OSSInputStream.javaread()andread(byte[], int, int)now check the underlying return value and, if-1, return immediately without mutatingpos/nextor incrementingreadBytes/readOperations. Same pattern as GCS, S3, ADLS: Handle EOF in inputStreams #16055'sS3InputStreamchange.dell/src/main/java/org/apache/iceberg/dell/ecs/EcsSeekableInputStream.java— same shape.aliyun/src/test/java/org/apache/iceberg/aliyun/oss/TestOSSInputStream.java— addstestReadSingleByteAtEofandtestReadBufferAtEof. Both run against the in-processAliyunOSSMockExtension(no live OSS access required).dell/src/test/java/org/apache/iceberg/dell/ecs/TestEcsSeekableInputStream.java— same two tests against the existingEcsS3MockRule.All four new tests fail on
origin/mainand pass with this change.Reproduce BEFORE/AFTER yourself (copy-paste)
The only thing that changes between BEFORE and AFTER is the checked-out ref of the production InputStream classes; the test files are the same in both runs.
What I ran locally
./gradlew :iceberg-dell:test --tests "...TestEcsSeekableInputStream"onorigin/mainwith the new tests imported → 6 tests, 2 failures (regression visible)../gradlew :iceberg-aliyun:test --tests "...TestOSSInputStream"onorigin/mainwith the new tests imported → 5 tests, 2 failures../gradlew :iceberg-aliyun:spotlessCheck :iceberg-dell:spotlessCheck→ PASS.Edge cases tested
read()past the last byte of a 4-byte object-1;getPos()stays at file size; idempotent on a second callTestEcsSeekableInputStream#testReadSingleByteAtEofread(byte[], 0, len)past EOF-1;getPos()stays at file sizeTestEcsSeekableInputStream#testReadBufferAtEofread()past EOF on an 8-byte object-1;getPos()stays; idempotentTestOSSInputStream#testReadSingleByteAtEofread(byte[], 0, dataSize+1)followed by another EOF buffered readdataSize, second returns-1;getPos()stays atdataSizeTestOSSInputStream#testReadBufferAtEofRisk / blast radius
*InputStreamclasses, mirroring the GCS/S3/ADLS shape that was already chosen by maintainers in GCS, S3, ADLS: Handle EOF in inputStreams #16055.InputStream.read()is documented to return-1at EOF without side effects, and that is now what these wrappers do.readBytes/readOperationsno longer drift on EOF reads. Pre-fix snapshots overcounted reads; post-fix they don't. This is observable but should be considered a fix, not a regression.Release note
PR drafted with assistance from Claude Code. The change was reviewed manually against
apache/iceberg's source onmainand against the corresponding fix shape in #16055. The reproducer block above was used during development; it is the same one a reviewer can paste verbatim.