[SPARK-56369][SS] Introduce a new API "rangeScan/rangeScanWithMultiValues" in StateStore #55226

Open
HeartSaVioR wants to merge 7 commits into apache:master from HeartSaVioR:SPARK-56369

@HeartSaVioR commented Apr 7, 2026

What changes were proposed in this pull request?

This PR introduces a new API, "rangeScan/rangeScanWithMultiValues", in StateStore. The new API is primarily an optimization for the RocksDB state store provider.

The new API takes a startKey (inclusive) and an endKey (exclusive) and returns the valid entries in that range. It requires the column family (CF) to use a state key encoder that supports range scan, and this PR adds a new flag on the state key encoder to indicate that support. At this point, three state key encoders support the new API: the range scan encoder, the timestamp prefix encoder, and the timestamp postfix encoder.
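
To make the inclusive/exclusive semantics concrete, here is a minimal sketch that models one column family as a sorted map over byte-ordered keys. It is not the StateStore implementation: the object name, the `encode`/`decode` helpers, and the 8-byte big-endian timestamp layout are assumptions made purely for illustration, and the real API works on encoded state rows rather than raw longs.

```scala
import java.nio.ByteBuffer
import java.util.Arrays
import scala.collection.immutable.TreeMap

object RangeScanSemantics {
  // Unsigned lexicographic byte order, matching RocksDB's default bytewise comparator.
  implicit val byteOrder: Ordering[Array[Byte]] = new Ordering[Array[Byte]] {
    def compare(a: Array[Byte], b: Array[Byte]): Int = Arrays.compareUnsigned(a, b)
  }

  // Hypothetical key layout: an 8-byte big-endian, non-negative timestamp.
  def encode(ts: Long): Array[Byte] = ByteBuffer.allocate(8).putLong(ts).array()
  def decode(key: Array[Byte]): Long = ByteBuffer.wrap(key).getLong()

  // Stand-in for one column family: entries kept sorted by encoded key.
  val store: TreeMap[Array[Byte], String] =
    TreeMap(Seq(10L, 20L, 30L, 40L).map(ts => encode(ts) -> s"value-$ts"): _*)

  // startKey is inclusive, endKey is exclusive.
  def rangeScan(startKey: Array[Byte], endKey: Array[Byte]): Iterator[(Array[Byte], String)] =
    store.range(startKey, endKey).iterator

  def main(args: Array[String]): Unit =
    rangeScan(encode(20L), encode(40L)).foreach { case (k, v) =>
      println(s"${decode(k)} -> $v")
    }
}
```

Scanning from `encode(20L)` to `encode(40L)` yields the entries for timestamps 20 and 30; the entry at 40 is excluded because the upper bound is exclusive.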

Note that the new API supports the "prefix + range (+ remaining)" key layout as well as "range + remaining", so callers must reason about key ordering based on the state key encoder the CF uses before calling the API. The API implementation may not detect incorrectly ordered bounds (e.g. a startKey that sorts after the endKey, or unexpected keys that fall between startKey and endKey in the binary ordering of the keys), so callers should compose startKey and endKey very carefully.
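
The ordering hazards described above can be illustrated with a small sketch. It assumes a hypothetical "prefix + range" layout (a 4-byte group id followed by an 8-byte big-endian timestamp) and compares keys in unsigned byte order; none of these helper names come from the PR, and the actual encoders may lay out keys differently.

```scala
import java.nio.ByteBuffer
import java.util.Arrays

object KeyOrderingSketch {
  // Hypothetical layout: 4-byte big-endian group id (prefix) + 8-byte big-endian timestamp (range).
  def compose(groupId: Int, ts: Long): Array[Byte] =
    ByteBuffer.allocate(12).putInt(groupId).putLong(ts).array()

  // True only when startKey sorts strictly before endKey in unsigned byte order.
  def isWellOrdered(startKey: Array[Byte], endKey: Array[Byte]): Boolean =
    Arrays.compareUnsigned(startKey, endKey) < 0

  // True when key falls in [startKey, endKey) in unsigned byte order.
  def inRange(key: Array[Byte], startKey: Array[Byte], endKey: Array[Byte]): Boolean =
    Arrays.compareUnsigned(startKey, key) <= 0 && Arrays.compareUnsigned(key, endKey) < 0

  // Same prefix, ascending timestamps: a sensible pair of bounds.
  val sameGroup: Boolean = isWellOrdered(compose(7, 100L), compose(7, 200L))

  // Bounds swapped: startKey sorts after endKey.
  val reversed: Boolean = isWellOrdered(compose(7, 200L), compose(7, 100L))

  // Bounds that span two prefixes are well ordered byte-wise, yet the range also
  // covers keys the caller may not expect, such as (group 7, timestamp 999).
  val strayKeyIncluded: Boolean =
    inRange(compose(7, 999L), compose(7, 100L), compose(8, 200L))

  // A naive big-endian encoding of a negative timestamp sorts after a positive one.
  val negativeBeforePositive: Boolean = isWellOrdered(compose(7, -1L), compose(7, 1L))
}
```

Here `sameGroup` is true and `reversed` is false, `strayKeyIncluded` is true even though timestamp 999 lies outside the intended 100 to 200 window, and `negativeBeforePositive` is false because the sign bit makes the negative value's leading byte larger in unsigned order. A caller-side check like `isWellOrdered` is only a guard against swapped bounds; it cannot catch the cross-prefix case.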

Why are the changes needed?

This lets the range scan pattern be optimized further: within the CF, the scan can skip both tombstones and valid keys that lie outside the range on either side (before the lower bound or after the upper bound).

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New UTs.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude 4.6 Opus

@HeartSaVioR

There may be a couple of possible improvements (or open discussion points):

  1. The name of the API: "iterator" could also work.
  2. The default implementation of the API: we could call iterator() and filter out keys that fall outside the range.
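
The second point could look roughly like the sketch below. `SimpleStore` is a hypothetical, simplified stand-in for the store interface (raw byte-array keys and values, unsigned byte comparison); it is not the actual StateStore trait or its signatures.

```scala
import java.util.Arrays

object DefaultRangeScanSketch {
  type Entry = (Array[Byte], Array[Byte])

  // Hypothetical, simplified store interface for illustration only.
  trait SimpleStore {
    def iterator(): Iterator[Entry]

    // Fallback: full scan, keeping only keys in [startKey, endKey).
    def rangeScan(startKey: Array[Byte], endKey: Array[Byte]): Iterator[Entry] =
      iterator().filter { case (key, _) =>
        Arrays.compareUnsigned(startKey, key) <= 0 && Arrays.compareUnsigned(key, endKey) < 0
      }
  }

  // In-memory store whose iterator() returns entries in no particular key order.
  class UnorderedStore(entries: Seq[Entry]) extends SimpleStore {
    def iterator(): Iterator[Entry] = entries.iterator
  }

  def key(b: Int): Array[Byte] = Array(b.toByte)

  val store = new UnorderedStore(Seq(key(9) -> key(0), key(1) -> key(0), key(5) -> key(0)))

  // Returns the single-byte keys found in [start, end) as ints, in iteration order.
  def scannedKeys(start: Int, end: Int): List[Int] =
    store.rangeScan(key(start), key(end)).map { case (k, _) => k(0).toInt }.toList
}
```

With this fallback, `scannedKeys(1, 9)` returns keys 1 and 5. Two caveats of the approach: it still reads every entry, so it gives none of the skipping benefit, and the results come back in whatever order iterator() produces rather than in key order.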

@HeartSaVioR
Copy link
Copy Markdown
Contributor Author

I just changed the API name to "rangeScan" to clarify its meaning. I'll update the PR title and description accordingly. As with prefixScan, we will require the state key encoder to be compatible with the API.
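
As a rough illustration of that requirement, the sketch below gates a range scan on a per-encoder capability flag. The type names, the `supportsRangeScan` flag name, and the error message are all hypothetical and only mirror the description in this PR; they are not the identifiers used in the patch.

```scala
object EncoderGateSketch {
  // Hypothetical encoder kinds; the flag name is illustrative only.
  sealed trait KeyEncoderKind { def supportsRangeScan: Boolean }
  case object NoPrefixEncoder extends KeyEncoderKind { val supportsRangeScan = false }
  case object RangeScanEncoder extends KeyEncoderKind { val supportsRangeScan = true }
  case object TimestampPrefixEncoder extends KeyEncoderKind { val supportsRangeScan = true }
  case object TimestampPostfixEncoder extends KeyEncoderKind { val supportsRangeScan = true }

  // Rejects range scans on column families whose encoder lacks the capability.
  def checkRangeScanAllowed(cfName: String, encoder: KeyEncoderKind): Either[String, Unit] =
    if (encoder.supportsRangeScan) Right(())
    else Left(s"Column family '$cfName' uses $encoder, which does not support rangeScan")
}
```

In this sketch the check returns an error value for an unsupported encoder; whether the real API throws, and at which point it validates, is not stated here.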

@HeartSaVioR changed the title from [SPARK-56369][SS] Introduce a new API "scan/scanWithMultiValues" in StateStore to [SPARK-56369][SS] Introduce a new API "rangeScan/rangeScanWithMultiValues" in StateStore on Apr 7, 2026