
Use relative pointers in block encoding #21

Open
caolan wants to merge 2 commits into holepunchto:main from caolan:relative-pointers2


@caolan caolan commented Jan 26, 2026

Contributor

This patch rewrites all DeltaOps to use seq values relative to the seq number of the enclosing block. This takes advantage of variable width integer encoding to reduce block size when pointers reference nearby blocks (a common case).

Note: a pointer can reference a higher seq number than the block it was written in when adding writes on top of a remote Hyperbee2. To avoid needing to support negative relative seq values, the optimisation is restricted to pointers where core=0. Pointers to other cores are always absolute. This allows the use of a uint for relative seq values, which compresses better.
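As a rough illustration of where the saving comes from, here is a minimal sketch. It assumes the uint width rule I understand compact-encoding to use (1, 3, 5, or 9 bytes depending on magnitude) and that a relative seq is computed as `blockSeq - seq`. The `{ core, seq }` pointer shape and the helper names are hypothetical and not taken from the patch.

```javascript
// Assumed width rule for a variable width uint (1, 3, 5, or 9 bytes).
function uintByteLength (n) {
  if (n <= 0xfc) return 1
  if (n <= 0xffff) return 3
  if (n <= 0xffffffff) return 5
  return 9
}

// Hypothetical pointer shape: { core, seq }.
// Only core=0 pointers are made relative; others stay absolute.
function toRelative (pointer, blockSeq) {
  if (pointer.core !== 0) return { core: pointer.core, seq: pointer.seq }
  if (pointer.seq > blockSeq) {
    throw new RangeError('core=0 pointer is ahead of its enclosing block')
  }
  return { core: 0, seq: blockSeq - pointer.seq }
}

// Inverse of toRelative for the same blockSeq.
function toAbsolute (pointer, blockSeq) {
  if (pointer.core !== 0) return { core: pointer.core, seq: pointer.seq }
  return { core: 0, seq: blockSeq - pointer.seq }
}

const blockSeq = 100000
const nearby = { core: 0, seq: 99990 } // points 10 blocks back
const remote = { core: 2, seq: 150000 } // other core, may be ahead of blockSeq

const relNearby = toRelative(nearby, blockSeq)
const relRemote = toRelative(remote, blockSeq)

console.log(uintByteLength(nearby.seq), uintByteLength(relNearby.seq))
console.log(relRemote.seq === remote.seq)
```

Under these assumptions the nearby pointer's seq shrinks from the 5-byte range to the 1-byte range, while the pointer into another core is left untouched, so no signed value is ever needed.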

Compression benchmarks indicate this patch results in an approximately 20% reduction in block overhead costs. Impact on read/write performance does not seem significant.

I considered three ways to implement this change:

  1. Write new code inside encoding.js that converts pointers to relative before encoding and back to absolute after decoding (this is what I did).
  2. Write a new encoder for compact-encoding that does the same work. This has the advantage of not having to clone the DeltaOps first.
  3. Update the tree to work with relative pointers internally, requiring no change to encode/decode logic (apart from supporting past versions).

I chose option 1 because it is the least intrusive code change. However, it does involve cloning DeltaOps before making their pointers relative because they are referenced directly in the tree. It also requires special care around ownership of objects between encoding.js, index.js, and write.js. Ownership might be made clearer by moving these steps inside write.js or index.js (when creating the batches or inflating the blocks) but that code already felt complex enough.
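To make the cloning concern concrete, here is a minimal sketch of the shape of option 1. The DeltaOp fields (`type`, `index`, `pointer`), the function names, and the JSON round trip standing in for encode/decode are all hypothetical. The real code in encoding.js may differ.

```javascript
// Hypothetical DeltaOp shape: { type, index, pointer: { core, seq } }.

// Clone each op before rewriting its pointer, because the originals are
// still referenced by the in-memory tree and must keep absolute seqs.
function relativizeOps (ops, blockSeq) {
  return ops.map((op) => {
    const { core, seq } = op.pointer
    const relSeq = core === 0 ? blockSeq - seq : seq
    return { ...op, pointer: { core, seq: relSeq } }
  })
}

// Decoded ops are assumed to be freshly allocated and owned by the
// caller, so they can be rewritten in place without a clone.
function absolutizeOps (ops, blockSeq) {
  for (const op of ops) {
    if (op.pointer.core === 0) op.pointer.seq = blockSeq - op.pointer.seq
  }
  return ops
}

const blockSeq = 42
const treeOps = [
  { type: 0, index: 0, pointer: { core: 0, seq: 41 } },
  { type: 0, index: 1, pointer: { core: 1, seq: 900 } }
]

const toEncode = relativizeOps(treeOps, blockSeq)

// Stand-in for a real encode/decode round trip.
const decoded = absolutizeOps(JSON.parse(JSON.stringify(toEncode)), blockSeq)

console.log(toEncode[0].pointer.seq, decoded[0].pointer.seq)
```

The asymmetry is the ownership point: the encode side must copy, while the decode side can mutate only if nothing else holds a reference to the decoded objects yet.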

Option 2 was rejected because it requires new context during the encoding/decoding process. This process is currently well decoupled from the rest of the logic and it makes sense to keep it that way.

Option 3 was rejected because it is a much more intrusive change and I'd be concerned about regressions.

@caolan caolan force-pushed the relative-pointers2 branch from fdbf7f2 to 3784b61 Compare February 2, 2026 10:47
@caolan caolan force-pushed the relative-pointers2 branch from 83e059b to b727bca Compare February 3, 2026 13:58

caolan commented Feb 10, 2026

Contributor Author

Note: this change will currently make the block size estimates used by WriteBatch less accurate.
