Use relative pointers in block encoding#21
Open
caolan wants to merge 2 commits into
Open
Conversation
This patch rewrites all DeltaOps to use `seq` values relative to the `seq` number of the enclosing block. This takes advantage of variable width integer encoding to reduce block size when pointers reference nearby blocks (a common case).
fdbf7f2 to
3784b61
Compare
83e059b to
b727bca
Compare
Contributor
Author
|
Note: this change will currently make the block size estimates used by WriteBatch less accurate. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This patch rewrites all DeltaOps to use
seqvalues relative to theseqnumber of the enclosing block. This takes advantage of variable width integer encoding to reduce block size when pointers reference nearby blocks (a common case).Note: it is possible to have a pointer point to a higher
seqnumber than the block it was written in when addingwrites on top of a remote Hyperbee2. To avoid needing to support negative relative
seqvalues, the optimisation isrestricted to pointers where
core=0. Pointers to other cores are always absolute. This allows the use of auintfor relativeseqvalues, which compresses better.Compression benchmarks indicate this patch results in an approximately 20% reduction in block overhead costs. Impact on read/write performance does not seem significant.
I considered three ways to implement this change:
encoding.jsthat converts pointers to relative before encoding and back to absolute after decoding (this is what I did).DeltaOpsfirst.I chose option 1 because it is the least intrusive code change. However, it does involve cloning DeltaOps before making their pointers relative because they are referenced directly in the tree. It also requires special care around ownership of objects between
encoding.js,index.js, andwrite.js. Ownership might be made clearer by moving these steps insidewrite.jsorindex.js(when creating the batches or inflating the blocks) but that code already felt complex enough.Option 2 was rejected because it requires new context during the encoding/decoding process. This process is currently well decoupled from the rest of the logic and it makes sense to keep it that way.
Option 3 was rejected because it is a much more intrusive change and I'd be concerned about regressions.