SDSTOR-21797 Propose scrub flatbuffers schema and blob scrub design adr #407
JacksonYao287 wants to merge 3 commits into eBay:main
Conversation
Codecov Report: ✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##             main     #407       +/-   ##
===========================================
- Coverage   63.15%   52.95%   -10.21%
===========================================
  Files          32       36        +4
  Lines        1900     5284     +3384
  Branches      204      656      +452
===========================================
+ Hits         1200     2798     +1598
- Misses        600     2196     +1596
- Partials      100      290      +190
Force-push: 6e09014 to f44babb
scrub_info: ScrubInfo;
blob_scrub_results: [BlobScrubResultEntry];
}
Can you have an option of
table ScrubResultEntry {
shard_id: uint64;
blob_id: uint64;
scrub_result: ScrubResult; // default = NONE → absent for healthy blobs (saves space)
hash: [ubyte]; // absent for shallow scrub or corrupted blobs
}
// use single scrub map for Blob/Shard/PG
table ScrubMap {
reqID: uint64;
type: ScrubType;
scrub_result: [ScrubResultEntry];
}
For shard, the blob_id field is left empty.
For PG, both shard_id and blob_id are left empty.
For shard, I want ScrubResultEntry to contain the shard_id and blob count;
for PG, I want ScrubResultEntry to contain the shard count and blob count of this PG.
So they will not be empty.
table ScrubResultEntry {
key_1: uint64;
key_2: uint64;
scrub_result: ScrubResult; // default = NONE → absent for healthy blobs (saves space)
hash: [ubyte]; // absent for shallow scrub or corrupted blobs
}
table ScrubMap {
scrub_info: ScrubInfo;
scrub_result: [ScrubResultEntry];
}
This is a more generic definition:
if scrub type is deep_blob/shallow_blob, key_1 is shard_id and key_2 is blob_id;
if scrub type is deep_shard/shallow_shard, key_1 is shard_id and key_2 is blob count;
if scrub type is pg_meta, key_1 is total_shard_count and key_2 is total_blob_count.
total_shard_count and total_blob_count will be useful to compare the replicas at a high level.
How do you expect the blob_count on the PG level to be used? If we can keep one result per entry, it is more straightforward.
Also, maybe we can change ScrubResult to int64, using negative values for errors and positive values for a value?
I don't have a clear use for this so far. I think they will be useful in some scenario in the future, and it is transferred only once (pg_meta scrub is executed only once) and not expensive, so I put it here.
positive values for a value?
What does the value stand for, for a single blob?
If it doesn't impact a single blob:
NONE: 0
ERROR: NEGATIVE
POSITIVE: APP_defined.
How do you expect APP_defined to be used? Or is it just kept for later extension?
For shard it is blob_count;
for PG it is shard count (optionally, if we really want blob count, the hash field can be used);
for blob, it is reserved for later extension.
Force-push: 5a07bba to 22e199c
xiaoxichen left a comment
Skipped the flatbuffer review; only reviewed the blob-range part, as the latter depends on the former.
Force-push: e73e4a7 to 0fb6854
xiaoxichen left a comment
lgtm. Ideally, if a sequence diagram can be generated by AI, it will be easier to follow.
Force-push: 8661ad8 to f7293b4
Force-push: f7293b4 to 6274618
Force-push: e61fab5 to 9fa82a8
/// Randomised per-request identifier used by the receiver to match a response to
/// the outstanding request and to discard stale responses from a previous scrub
/// epoch. The leader generates a distinct req_id for every (replica, request) pair:
/// two replicas receiving the same scrub round will each see a different req_id,
add comments for req_id
This design combines data produced during the shard scrub phase with a dynamic bin-packing algorithm and a streaming response protocol to satisfy all three constraints simultaneously.

- ### Step 1: Build shard_blob_count from the Shard Scrub Phase
+ ### Step 1: META Scrub
@xiaoxichen pls take a look, I have added the description for the meta scrub.
Force-push: edab125 to 77ec375
xiaoxichen left a comment
lgtm.
A few other thinking points you might document here or in another doc, or re-purpose the first doc into a scrub-flow doc, etc.:
- In Step 1, when querying the PG btree, assuming all the shards are big shards, consider the lock contention with incoming IOs.
- How the shard metablk crc works.
- When the leader starts a scrubbing task, it is not inside the state machine, which means the read of lsn, last_shard_id, last_blob_id is not transactional. Say LSN 101 is a create shard(0xAA) and LSN 102 is a put blob(0xBB); if we read <LSN=100, 0xAA, 0xBB>, that will cause some false positives of missing shard/missing PG. If we do a spot check, then it is not a big deal. A 60%-confidence thought is that we read the LSN last, which ensures the scrub-lsn covers last_shard_id and last_blob_id, making it <LSN=102, 0xAA, 0xBB>. More careful thinking is needed to bump up the confidence level.
IIUC, when we do a range query, let's say the range covers 2 btree nodes; when we scan the second node, [we will not lock the first node](https://github.com/eBay/HomeStore/blob/860330af43973ff99a862f6da17329de045dedc1/src/include/homestore/btree/detail/btree_query_impl.ipp#L33)?
The key point for the pg/shard crc is which fields we will use to calculate the metablk crc, since some fields will probably differ among replicas. We can discuss it later.
And in my implementation, we get last_shard_id and last_blob_id before getting scrub_lsn. This ensures that when committing to scrub_lsn, a replica should at least see last_shard_id and last_blob_id. This seems basically the same point as you put here.
Makes sense, just putting them into the doc as well.
Sure, I have added it to the doc. By the way, I have checked the index btree with AI: when we do a range query, we will not lock the entire btree, only the btree node we are currently scanning. Other nodes of the btree can be read and written in the middle of a range query.