Paginate by creation time instead of key order #96
benthecarman wants to merge 1 commit into lightningdevkit:main from
Conversation
👋 Thanks for assigning @tankyleo as a reviewer!
defad39 to cb70f98
🔔 1st Reminder Hey @tankyleo! This PR has been waiting for your review.
tankyleo left a comment:
We'll want to update types.rs to require this ordering too.
dc205c9 to cd32024
responded to review
proto/vss.proto (outdated)
      message ListKeyVersionsResponse {
-       // Fetched keys and versions.
+       // Fetched keys and versions, ordered by creation time (oldest first).
To confirm here, the server is free to choose any ordering they would like in case two keys have the same timestamp.
We have the server sort by key after timestamp. I'll add that here
Yes, but I don't think this is a requirement on the VSS API, right? I.e., clients don't care what ordering is picked beyond newest first.
I think it's better to be more explicit than less. We have to define a tiebreaker anyway, so we may as well put it in the docs.
Let me make sure I see this right: this defines an API constraint, in addition to just documentation? For sure I see the use for documentation here, but I don't want clients to start relying on this behavior.
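The tiebreaker being debated amounts to a total order on (creation time, key). A minimal illustrative sketch, not the server's actual code — the struct and field names here are assumptions:

```rust
// Hypothetical record shape; the real VSS response carries more fields.
#[derive(Debug, PartialEq)]
struct KeyVersion {
    key: String,
    created_at_ms: u64,
}

// Total order used for pagination: creation time first, key as tiebreaker.
// The tiebreaker makes the page cursor deterministic when timestamps collide.
fn sort_for_pagination(items: &mut Vec<KeyVersion>) {
    items.sort_by(|a, b| {
        a.created_at_ms
            .cmp(&b.created_at_ms)
            .then_with(|| a.key.cmp(&b.key))
    });
}

fn main() {
    let mut items = vec![
        KeyVersion { key: "b".into(), created_at_ms: 100 },
        KeyVersion { key: "a".into(), created_at_ms: 100 },
        KeyVersion { key: "c".into(), created_at_ms: 50 },
    ];
    sort_for_pagination(&mut items);
    let keys: Vec<&str> = items.iter().map(|i| i.key.as_str()).collect();
    // Oldest first; the two records sharing timestamp 100 are ordered by key.
    assert_eq!(keys, ["c", "a", "b"]);
    println!("{:?}", keys);
}
```

Whether clients may rely on the tiebreaker (API constraint) or only on the primary creation-time order (documentation) is exactly the open question in this thread.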
Please request review when you are ready so I don't miss this PR.
6622747 to c02dcda
    // Use this value to query for next-page of paginated `ListKeyVersions` operation, by specifying
    // this value as the `page_token` in the next request.
    //
    // If `next_page_token` is empty (""), then the "last page" of results has been processed and
No, I think we were intentionally following protobuf's/Google's best practices here. https://google.aip.dev/158 states:
- Response messages for collections should define a string next_page_token field, providing the user with a page token that may be used to retrieve the next page.
- The field containing pagination results should be the first field in the message and have a field number of 1. It should be a repeated field containing a list of resources constituting a single page of results.
- If the end of the collection has been reached, the next_page_token field must be empty. This is the only way to communicate "end-of-collection" to users.
- If the end of the collection has not been reached (or if the API can not determine in time), the API must provide a next_page_token.
IMO would be good to revert and include this context in the docs.
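The AIP-158 semantics quoted above boil down to one client-side rule: loop until `next_page_token` comes back empty, and never treat page length as a termination signal. A sketch under assumed names (`Page`, `list_key_versions` stand in for the real VSS client call; the toy server and its index-based token are purely illustrative):

```rust
// What one page of results looks like to the client (illustrative names).
struct Page {
    key_versions: Vec<String>,
    next_page_token: String,
}

// Toy stand-in for the server: serves `all` in chunks, token = start index.
// A real VSS server would use an opaque token, not an index.
fn list_key_versions(all: &[String], page_size: usize, page_token: &str) -> Page {
    let start: usize = if page_token.is_empty() { 0 } else { page_token.parse().unwrap() };
    let end = (start + page_size).min(all.len());
    let next = if end == all.len() { String::new() } else { end.to_string() };
    Page { key_versions: all[start..end].to_vec(), next_page_token: next }
}

// AIP-158 client loop: the ONLY end-of-collection signal is an empty token.
fn fetch_all(all: &[String], page_size: usize) -> Vec<String> {
    let mut out = Vec::new();
    let mut token = String::new();
    loop {
        let page = list_key_versions(all, page_size, &token);
        // Do NOT assume the page has `page_size` entries; the server may
        // return fewer (it "may further constrain" the page size).
        out.extend(page.key_versions);
        if page.next_page_token.is_empty() {
            break;
        }
        token = page.next_page_token;
    }
    out
}

fn main() {
    let all: Vec<String> = (0..7).map(|i| format!("k{i}")).collect();
    assert_eq!(fetch_all(&all, 3), all);
}
```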
c02dcda to 34d4241
🔔 1st Reminder Hey @tankyleo! This PR has been waiting for your review.
tankyleo left a comment:
Rushed some comments out, will be back after lunch.
rust/api/src/kv_store_tests.rs (outdated)
    }

    let first_page = ctx.list(None, Some(page_size), None).await?;
    assert_eq!(first_page.key_versions.len(), page_size as usize);
Given the VSS API, the page could have length 0 here. As long as the page token is not empty, the client would be expected to make another request with the new page token.
I'm confused what you're saying here. This is the first page we are requesting; the only way we'd get a length of 0 is if we had no items.
Just that I think a VSS-server is within bounds if it returns a response with an empty list, but a non-empty token. The client would be expected to continue asking for pages.
I'm mostly going from this line in the docs of ListKeyVersionsRequest:
/// `page_size` is used by clients to specify the maximum number of results that can be returned by
/// the server.
/// The server may further constrain the maximum number of results returned in a single page.
/// If the `page_size` is 0 or not set, the server will decide the number of results to be returned.
#[prost(int32, optional, tag = "3")]
pub page_size: ::core::option::Option<i32>,
TLDR can't assume the page you get back is the same length as the page size in your ListKeyVersionsRequest
TLDR can't assume the page you get back is the same length as the page size in your ListKeyVersionsRequest
This is true.
if it returns a response with an empty list, but a non-empty token.
But there is no reason to respond with an empty list and a non-empty pagination token. (I don't think this should happen; can add a kvstore test/assert for it if one doesn't exist already.)
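One way to picture the server behavior under discussion is keyset pagination over an in-memory list kept in (created_at, key) order. Everything here is an illustrative sketch, not the real VSS implementation: the `ts:key` token format is a toy (it breaks if keys contain `:`), the names are assumptions, and `page_size >= 1` is assumed. Note the invariant from the comment above falls out naturally: the token is empty exactly when the end of the collection is reached.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Record {
    created_at_ms: u64,
    key: String,
}

// `sorted` must already be ordered by (created_at_ms, key).
// Returns one page plus the next page token ("" at end of collection).
fn list_page(sorted: &[Record], page_size: usize, token: &str) -> (Vec<Record>, String) {
    // Resume strictly after the cursor, comparing (created_at, key) tuples;
    // the key tiebreaker keeps the cursor unique on timestamp collisions.
    let start = match token.split_once(':') {
        Some((ts, key)) => {
            let ts: u64 = ts.parse().unwrap();
            sorted
                .iter()
                .position(|r| (r.created_at_ms, r.key.as_str()) > (ts, key))
                .unwrap_or(sorted.len())
        }
        None => 0, // empty/absent token: start from the beginning
    };
    let end = (start + page_size).min(sorted.len());
    let next = if end == sorted.len() {
        String::new() // end of collection: empty token, even if the page is full
    } else {
        let last = &sorted[end - 1];
        format!("{}:{}", last.created_at_ms, last.key)
    };
    (sorted[start..end].to_vec(), next)
}

fn main() {
    let mut recs = vec![
        Record { created_at_ms: 1, key: "b".into() },
        Record { created_at_ms: 1, key: "a".into() },
        Record { created_at_ms: 2, key: "c".into() },
        Record { created_at_ms: 3, key: "d".into() },
        Record { created_at_ms: 3, key: "e".into() },
    ];
    recs.sort_by(|x, y| x.created_at_ms.cmp(&y.created_at_ms).then_with(|| x.key.cmp(&y.key)));

    // Page through with duplicate timestamps present; the concatenated pages
    // must equal the full sorted list, with no skips or repeats.
    let mut got = Vec::new();
    let mut token = String::new();
    loop {
        let (page, next) = list_page(&recs, 2, &token);
        got.extend(page);
        if next.is_empty() {
            break;
        }
        token = next;
    }
    assert_eq!(got, recs);
}
```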
Done with this pass here, just needed to add the comment about the VSS server API constraint.
Adding some historical context, VSS was purposefully built to be storage engine agnostic. The VSS protocol/API/data-model doesn't enforce a specific underlying storage database. We can totally revisit this requirement and say VSS only works with postgres-like relational databases. But extending pagination to order by creation_time limits the capability to use any modern pure KV store as an underlying database engine like DynamoDB or CosmosDB. I understand the need to efficiently paginate through large result sets. A couple of alternatives that preserve storage engine agnosticism:
34d4241 to e17c676
Fixed @tankyleo's comments about tests.
@G8XSU Thank you for the feedback; I'll be considering the tradeoffs over the next few days. I was wondering, do you have an email where I can reach you? Feel free to ping me at "hello at leonash dot net". I was also wondering if we could get the vss-client crate at https://crates.io/crates/vss-client from you.
@G8XSU my perspective:
In theory, yes, but ldk-node isn't using this and migrating would be a huge lift.
ldk-node uses txid/payment_hash as the key, so we can't really use prefixes.
Yes, but then the pagination isn't really useful anymore. You just end up getting a random set of payments rather than an actual ordered list.
imo postgres scales really well, and if you're scaling past something like postgres then it's likely you aren't using the off-the-shelf solution anymore and will have a whole team to manage things like this. And being able to sort by something useful rather than just lexicographic order is worth it.
Clients often need to fetch recent payments or entries without iterating over the entire keyspace. Ordering by created_at lets callers retrieve the newest records first and stop early, which is not possible with lexicographic key ordering.

The page token now encodes (created_at, key) so the cursor remains unique even when multiple rows share the same timestamp. A composite index on (user_token, store_id, created_at, key) keeps the query efficient, and a migration back-fills any NULL created_at values and adds the NOT NULL constraint.
e17c676 to 36811fc
Yes, I understand, but vss wasn't built with just the ldk-node payment-store in mind. Maybe it is not the right fit for the payment-store currently. But tying the vss interface to be relational-db specific is a one-way-door decision, and such an interface change hasn't been made to either ldk-node or ldk, iiuc.
Hmm, maybe, but why are we only thinking about the payment store? The payment store requires the PaginatedKvStore interface, and on top of that it needs creation-time-based sorting. Imo the right solution would still be changing payment storage to use something like uuidv7 so that it can work with any key-based paginated store efficiently.
It is not about scaling at all but operational flexibility: can we get the change to sort by creation_time through the ldk-storage interface? If not, then similar reasons apply here as well, i.e. clients want to use the storage that they want to use.
It's not about this server code in the repo but the interoperable protocol and API model. I understand it is limiting to some extent, but I just wanted to flag the reason for having a conservative api/protocol approach here.
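For context on the uuidv7 suggestion above: UUIDv7 (RFC 9562) places a 48-bit big-endian Unix-millisecond timestamp in the leading bytes, so plain lexicographic key order coincides with creation order, and any key-ordered KV store paginates by time with no schema change. A hand-rolled sketch of the layout (a real client would use a maintained uuid library rather than this):

```rust
// Build a UUIDv7 string from a millisecond timestamp and 10 random bytes.
// Layout per RFC 9562: 48-bit timestamp, then version/variant bits set
// inside the random tail.
fn uuid_v7(unix_ms: u64, rand_bytes: [u8; 10]) -> String {
    let mut b = [0u8; 16];
    b[..6].copy_from_slice(&unix_ms.to_be_bytes()[2..8]); // 48-bit big-endian timestamp
    b[6..].copy_from_slice(&rand_bytes);
    b[6] = (b[6] & 0x0f) | 0x70; // version 7
    b[8] = (b[8] & 0x3f) | 0x80; // RFC variant
    let hex: String = b.iter().map(|x| format!("{:02x}", x)).collect();
    format!(
        "{}-{}-{}-{}-{}",
        &hex[..8], &hex[8..12], &hex[12..16], &hex[16..20], &hex[20..]
    )
}

fn main() {
    let older = uuid_v7(1_700_000_000_000, [0xff; 10]);
    let newer = uuid_v7(1_700_000_000_001, [0x00; 10]);
    // Lexicographic (key) order matches creation order regardless of the
    // random tails, which is the property the comment above is after.
    assert!(older < newer);
    println!("{older}\n{newer}");
}
```

Under this scheme a payment store could keep creation-time pagination on top of the existing key-ordered VSS API, which is the storage-engine-agnostic alternative being argued for.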