Skip to content

feat: Address pagination API gap with aligned approach#345

Open
sichanyoo wants to merge 5 commits into
decaffrom
feat/pagination-api-gap
Open

feat: Address pagination API gap with aligned approach#345
sichanyoo wants to merge 5 commits into
decaffrom
feat/pagination-api-gap

Conversation

@sichanyoo

@sichanyoo sichanyoo commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Background & Decisions

Responses are Struct subclasses, and Ruby's Struct includes Enumerable by default. Without this fix, calling .each on a paginated response silently iterated struct field values instead of pages, a possible silent data-loss bug. This PR fixes the gap by adding .each to PageableResponse (delegating to each_page), undefining each on Smithy::Schema::Structure as a safety net, and introducing a PageEnumerator class that exposes only safe enumeration methods (map, select, flat_map, reduce,first, take, lazy) while blocking dangerous ones (count, sort, to_a, etc.) that would silently trigger full pagination with real API calls and real cost.

We chose a custom PageEnumerator over mixing Enumerable into the response because starting with a narrow surface and expanding later is non-breaking, whereas removing a dangerous method after customers depend on it is breaking.

.each iterates pages rather than items because items is optional in Smithy's @paginated trait. About 21% of paginated operations at AWS have no items defined currently, and some operations return multiple result collections, making page iteration the only universally safe default.

@sichanyoo sichanyoo left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments for additional context

@@ -0,0 +1,40 @@
# frozen_string_literal: true

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the custom enumerator with limited API surface to safeguard against unsafe Enumerable methods

@@ -4,6 +4,10 @@ module Smithy
module Schema
# A module mixed into Structs that provides utility methods for Structure shapes.
module Structure

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add each undef to prevent silent data loss in the case pageable_response plugin doesn't run for whatever reason (e.g., customer customization on plugin list that leaves it out)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better if you turn the above into a code comment for future us.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added it as inline comment.

@@ -36,6 +36,13 @@ def initialize(response)
# This yields one response object per API call made. The SDK retrieves additional
# pages of data to complete the request.
#

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates the PageableResponse module that gets added to Response to return the custom enumerator when not give a block, and if given a block, enumerate using the custom enumerator and yield results. As it has been, PageableResponse#each will paginate over pages instead of items due to uniqueness of AWS background discussed offline.

@jterapin jterapin left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall approach looks good but curious what we can do about the enumerator being created at every method call. 🤔

Also - I was wondering if the PR description could contain overall info about the motivation of this approach and what decisions we make.

We should also have wangrch take a look when he comes back.

Comment on lines +35 to +37
def enum
Enumerator.new(&@block)
end

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current PageEnumerator creates a new Enumerator on every method call (map, select, first, etc.) via the private enum method. Is there a way to prevent this or was this intentional?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The allocation cost is negligible compared to the API calls each iteration triggers & returning brand new enumerator each time I think has a subtle benefit of thread safety with interleaved paging operations when it's used concurrently. But changing it to reuse enumerator is a one line change so if this is based on rubyist convention / you feel strongly for reusing, we can do that

def reduce(*, &) = enum.reduce(*, &)

def first(val = (no_arg = true
nil))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix alignment.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do: def first(*args) if we are just forwarding args?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Changed as suggested 👍

@@ -107,16 +126,29 @@ def each_page(&)
end

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about returning self here to enable chaining?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: What could be the usecase for returning self to allow chaining here? If user doesn't provide a block, they get enumerator they can chain to already, and if user provides a block, whatever the user needs to do would be specified within that block. Fwiw, V3 also returns nil instead of self for each when user provides a block: https://github.com/aws/aws-sdk-ruby/blob/313cb58ad76dac97c27ff6d63b99eaadd5ad4d00/gems/aws-sdk-core/lib/aws-sdk-core/pageable_response.rb#L188-L196

@@ -4,6 +4,10 @@ module Smithy
module Schema
# A module mixed into Structs that provides utility methods for Structure shapes.
module Structure

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better if you turn the above into a code comment for future us.

Comment thread gems/smithy-client/lib/smithy-client.rb Outdated
Comment on lines +29 to +30
require_relative 'smithy-client/paginators/page_enumerator'
require_relative 'smithy-client/paginators/pageable_response'

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ruby convention: a directory paginators/ implies a module Paginators. But PageEnumerator and PageableResponse don't live under a Smithy::Client::Paginators module

Feels longwinded if we add a new module so maybe we should keep this flat.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see! Updated to keep the paginator file structure flat.

response = response.next_page
@paginator.items(response.data).each(&)
# @return [PageEnumerator, nil]
def each_item(&block)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the API does not define items and customer calls .each_item, they'll get a NotImplementedError. Should we document this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, added a @raise doc comment.

@richardwang1124 richardwang1124 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, can take a second pass review once the comments are addressed.

Comment on lines +134 to +138
def each(&block)
return each_page unless block

each_page(&block)
end

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be one line? Maybe

def each(&block)
  each_page(&block)
end

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Changed as suggested 👍

@sichanyoo sichanyoo left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Address initial review comments.

Comment on lines +35 to +37
def enum
Enumerator.new(&@block)
end

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The allocation cost is negligible compared to the API calls each iteration triggers & returning brand new enumerator each time I think has a subtle benefit of thread safety with interleaved paging operations when it's used concurrently. But changing it to reuse enumerator is a one line change so if this is based on rubyist convention / you feel strongly for reusing, we can do that

def reduce(*, &) = enum.reduce(*, &)

def first(val = (no_arg = true
nil))

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Changed as suggested 👍

@@ -107,16 +126,29 @@ def each_page(&)
end

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: What could be the usecase for returning self to allow chaining here? If user doesn't provide a block, they get enumerator they can chain to already, and if user provides a block, whatever the user needs to do would be specified within that block. Fwiw, V3 also returns nil instead of self for each when user provides a block: https://github.com/aws/aws-sdk-ruby/blob/313cb58ad76dac97c27ff6d63b99eaadd5ad4d00/gems/aws-sdk-core/lib/aws-sdk-core/pageable_response.rb#L188-L196

Comment on lines +134 to +138
def each(&block)
return each_page unless block

each_page(&block)
end

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Changed as suggested 👍

response = response.next_page
@paginator.items(response.data).each(&)
# @return [PageEnumerator, nil]
def each_item(&block)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, added a @raise doc comment.

Comment thread gems/smithy-client/lib/smithy-client.rb Outdated
Comment on lines +29 to +30
require_relative 'smithy-client/paginators/page_enumerator'
require_relative 'smithy-client/paginators/pageable_response'

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see! Updated to keep the paginator file structure flat.

@@ -4,6 +4,10 @@ module Smithy
module Schema
# A module mixed into Structs that provides utility methods for Structure shapes.
module Structure

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added it as inline comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants