feat: Address pagination API gap with aligned approach#345
Conversation
sichanyoo
left a comment
There was a problem hiding this comment.
Comments for additional context
| @@ -0,0 +1,40 @@ | |||
| # frozen_string_literal: true | |||
There was a problem hiding this comment.
This is the custom enumerator with limited API surface to safeguard against unsafe Enumerable methods
| @@ -4,6 +4,10 @@ module Smithy | |||
| module Schema | |||
| # A module mixed into Structs that provides utility methods for Structure shapes. | |||
| module Structure | |||
There was a problem hiding this comment.
Add each undef to prevent silent data loss in the case pageable_response plugin doesn't run for whatever reason (e.g., customer customization on plugin list that leaves it out)
There was a problem hiding this comment.
Probably better if you turn the above into a code comment for future us.
There was a problem hiding this comment.
Added it as inline comment.
| @@ -36,6 +36,13 @@ def initialize(response) | |||
| # This yields one response object per API call made. The SDK retrieves additional | |||
| # pages of data to complete the request. | |||
| # | |||
There was a problem hiding this comment.
Updates the PageableResponse module that gets added to Response to return the custom enumerator when not give a block, and if given a block, enumerate using the custom enumerator and yield results. As it has been, PageableResponse#each will paginate over pages instead of items due to uniqueness of AWS background discussed offline.
jterapin
left a comment
There was a problem hiding this comment.
The overall approach looks good but curious what we can do about the enumerator being created at every method call. 🤔
Also - I was wondering if the PR description could contain overall info about the motivation of this approach and what decisions we make.
We should also have wangrch take a look when he comes back.
| def enum | ||
| Enumerator.new(&@block) | ||
| end |
There was a problem hiding this comment.
The current PageEnumerator creates a new Enumerator on every method call (map, select, first, etc.) via the private enum method. Is there a way to prevent this or was this intentional?
There was a problem hiding this comment.
The allocation cost is negligible compared to the API calls each iteration triggers & returning brand new enumerator each time I think has a subtle benefit of thread safety with interleaved paging operations when it's used concurrently. But changing it to reuse enumerator is a one line change so if this is based on rubyist convention / you feel strongly for reusing, we can do that
| def reduce(*, &) = enum.reduce(*, &) | ||
|
|
||
| def first(val = (no_arg = true | ||
| nil)) |
There was a problem hiding this comment.
Can we do: def first(*args) if we are just forwarding args?
There was a problem hiding this comment.
Good catch! Changed as suggested 👍
| @@ -107,16 +126,29 @@ def each_page(&) | |||
| end | |||
There was a problem hiding this comment.
What about returning self here to enable chaining?
There was a problem hiding this comment.
Q: What could be the usecase for returning self to allow chaining here? If user doesn't provide a block, they get enumerator they can chain to already, and if user provides a block, whatever the user needs to do would be specified within that block. Fwiw, V3 also returns nil instead of self for each when user provides a block: https://github.com/aws/aws-sdk-ruby/blob/313cb58ad76dac97c27ff6d63b99eaadd5ad4d00/gems/aws-sdk-core/lib/aws-sdk-core/pageable_response.rb#L188-L196
| @@ -4,6 +4,10 @@ module Smithy | |||
| module Schema | |||
| # A module mixed into Structs that provides utility methods for Structure shapes. | |||
| module Structure | |||
There was a problem hiding this comment.
Probably better if you turn the above into a code comment for future us.
| require_relative 'smithy-client/paginators/page_enumerator' | ||
| require_relative 'smithy-client/paginators/pageable_response' |
There was a problem hiding this comment.
Ruby convention: a directory paginators/ implies a module Paginators. But PageEnumerator and PageableResponse don't live under a Smithy::Client::Paginators module
Feels longwinded if we add a new module so maybe we should keep this flat.
There was a problem hiding this comment.
I see! Updated to keep the paginator file structure flat.
| response = response.next_page | ||
| @paginator.items(response.data).each(&) | ||
| # @return [PageEnumerator, nil] | ||
| def each_item(&block) |
There was a problem hiding this comment.
If the API does not define items and customer calls .each_item, they'll get a NotImplementedError. Should we document this?
There was a problem hiding this comment.
Good point, added a @raise doc comment.
richardwang1124
left a comment
There was a problem hiding this comment.
Looks good overall, can take a second pass review once the comments are addressed.
| def each(&block) | ||
| return each_page unless block | ||
|
|
||
| each_page(&block) | ||
| end |
There was a problem hiding this comment.
I think this could be one line? Maybe
def each(&block)
each_page(&block)
end
There was a problem hiding this comment.
Good catch! Changed as suggested 👍
sichanyoo
left a comment
There was a problem hiding this comment.
Address initial review comments.
| def enum | ||
| Enumerator.new(&@block) | ||
| end |
There was a problem hiding this comment.
The allocation cost is negligible compared to the API calls each iteration triggers & returning brand new enumerator each time I think has a subtle benefit of thread safety with interleaved paging operations when it's used concurrently. But changing it to reuse enumerator is a one line change so if this is based on rubyist convention / you feel strongly for reusing, we can do that
| def reduce(*, &) = enum.reduce(*, &) | ||
|
|
||
| def first(val = (no_arg = true | ||
| nil)) |
There was a problem hiding this comment.
Good catch! Changed as suggested 👍
| @@ -107,16 +126,29 @@ def each_page(&) | |||
| end | |||
There was a problem hiding this comment.
Q: What could be the usecase for returning self to allow chaining here? If user doesn't provide a block, they get enumerator they can chain to already, and if user provides a block, whatever the user needs to do would be specified within that block. Fwiw, V3 also returns nil instead of self for each when user provides a block: https://github.com/aws/aws-sdk-ruby/blob/313cb58ad76dac97c27ff6d63b99eaadd5ad4d00/gems/aws-sdk-core/lib/aws-sdk-core/pageable_response.rb#L188-L196
| def each(&block) | ||
| return each_page unless block | ||
|
|
||
| each_page(&block) | ||
| end |
There was a problem hiding this comment.
Good catch! Changed as suggested 👍
| response = response.next_page | ||
| @paginator.items(response.data).each(&) | ||
| # @return [PageEnumerator, nil] | ||
| def each_item(&block) |
There was a problem hiding this comment.
Good point, added a @raise doc comment.
| require_relative 'smithy-client/paginators/page_enumerator' | ||
| require_relative 'smithy-client/paginators/pageable_response' |
There was a problem hiding this comment.
I see! Updated to keep the paginator file structure flat.
| @@ -4,6 +4,10 @@ module Smithy | |||
| module Schema | |||
| # A module mixed into Structs that provides utility methods for Structure shapes. | |||
| module Structure | |||
There was a problem hiding this comment.
Added it as inline comment.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Background & Decisions
Responses are Struct subclasses, and Ruby's Struct includes Enumerable by default. Without this fix, calling .each on a paginated response silently iterated struct field values instead of pages, a possible silent data-loss bug. This PR fixes the gap by adding .each to PageableResponse (delegating to each_page), undefining each on Smithy::Schema::Structure as a safety net, and introducing a PageEnumerator class that exposes only safe enumeration methods (map, select, flat_map, reduce,first, take, lazy) while blocking dangerous ones (count, sort, to_a, etc.) that would silently trigger full pagination with real API calls and real cost.
We chose a custom PageEnumerator over mixing Enumerable into the response because starting with a narrow surface and expanding later is non-breaking, whereas removing a dangerous method after customers depend on it is breaking.
.each iterates pages rather than items because items is optional in Smithy's @paginated trait. About 21% of paginated operations at AWS have no items defined currently, and some operations return multiple result collections, making page iteration the only universally safe default.