Reduce method #46
Draft
caolan wants to merge 5 commits into
Note: this proposal needs at least a test suite before it can be merged.
A reducer API for Hyperbee2
I'm not recommending you merge this yet. I'm opening the pull request as a place to discuss the feature. Take a look in the examples directory to give you a feel for how it works.
You'll notice this only includes the 'reduce' part of 'map/reduce'. That is intentional. A map is essentially another index/tree built on top of this one via a map function. I imagine, if required, we would create a new Hyperbee2 that watches the source tree for changes and applies a map function to any updates. This allows querying the mapped data with range requests, reducers, etc. using the regular Hyperbee2 API.
The intermediate output of reducers can be ephemeral (temporary), cached in memory, or cached on disk (written to a batch in Hypercore). Caching the output of reducers for subtrees greatly improves query performance.
Writing to disk requires providing the desired reducer functions (and names) to flush() on the WriteBatch. With this API it is possible to stop using old reducers without incurring the cost of their ongoing recalculation and write overhead, and to introduce new reducers as necessary. The API does not rely on eagerly writing reducer output to nodes on every operation because this would be inefficient, but more importantly, it would make it difficult to layer new reducers on top of trees forked from a remote peer. Recalculation on demand during flush() and writing to your own batch avoids those issues.
When written to a batch, the cached results are included directly on the tree node inside the batch (JSON encoded for now). There is currently no indirection (as there is with value pointers); the reducer output is written directly into the node. Since any update to the node or its descendants will invalidate the output of the reducer, I didn't see a reason to link them across batches via a pointer.
The API asks that any reducer using a cache provide a unique name as a string. This makes it easy to drop old reducers and introduce new ones as the application develops. If you change a reducer in a backwards-incompatible way, you must also change its name string to invalidate the cache.
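To make the naming rule concrete, here is a minimal sketch of why a changed reducer needs a new name. It is a toy model, not Hyperbee2 code: the node object, the cachedReduce helper, and the 'count-v1' / 'count-v2' names are all hypothetical, and the real cache lives on tree nodes rather than in a Map.

```javascript
// Toy model (NOT Hyperbee2 code): one node with a per-name cache of reducer output.
const node = {
  entries: [{ key: 'a', value: '1' }, { key: 'b', value: '' }],
  cache: new Map()
}

let reducerCalls = 0

// Hypothetical helper: look up by name, compute and store on a miss.
// A null name means a one-off reduce that never touches the cache.
function cachedReduce (node, name, reducer) {
  if (name !== null && node.cache.has(name)) return node.cache.get(name)
  reducerCalls++
  const output = reducer(node.entries, false)
  if (name !== null) node.cache.set(name, output)
  return output
}

const sum = (values) => values.reduce((a, b) => a + b, 0)
const countAll = (values, rereduce) => rereduce ? sum(values) : values.length
// A backwards-incompatible change: empty values are no longer counted.
const countNonEmpty = (values, rereduce) =>
  rereduce ? sum(values) : values.filter((e) => e.value !== '').length

const first = cachedReduce(node, 'count-v1', countAll) // computed
const again = cachedReduce(node, 'count-v1', countAll) // served from the cache
const stale = cachedReduce(node, 'count-v1', countNonEmpty) // old name: stale output
const fresh = cachedReduce(node, 'count-v2', countNonEmpty) // new name: recomputed

console.log({ first, again, stale, fresh, reducerCalls })
```

Reusing 'count-v1' after changing the reducer returns the output of the old function, while the new name forces a recalculation.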
API
await db.reduce(name, reducer)
Calculates an accumulated value for all entries in the tree.
To cache intermediate results between calls, provide a string as the name argument. This must be unique to the reducer function and be updated if the reducer changes in a backwards-incompatible way.
To perform a one-off temporary reduce, provide null as the name.
The reducer is a function that will calculate the accumulated value. It takes two arguments: values and rereduce.
When rereduce is false, the values argument will be an Array of tree entries with key and value properties. When the rereduce argument is true, the values argument will be an Array of values returned from previous reducer calls.
The output of a reducer must be a JSON-compatible value.
Note: while entries provided to the reducer are in sorted key order, those entries might not be contiguous across reducer calls. For example, a reducer might receive [3,4], [6,7], [9], [5,8] across 4 calls.
await db.reduceRange(name, reducer, start, end)
Calculates an accumulated value for a range of entries in the tree.
start can be null (to begin with the first entry) or a Buffer less than or equal to the first key to include.
end can be null (to end after the last entry), or a Buffer greater than the last key to include.
The name and reducer arguments are described in the documentation for db.reduce().
await batch.flush([reducers])
The reducers argument is an Object with reducer functions (as described by the documentation for db.reduce()) keyed by unique strings. If provided, these will be recalculated for all nodes lacking a cached reduce result and updated values will be written to Hypercore as part of the batch. This greatly improves query time for reducers at the expense of writing a larger batch.
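As an illustration of the reducer contract and the range bounds described above, here is a small sketch that calls a reducer by hand instead of going through Hyperbee2. The statsReducer function, the entry and inRange helpers, and the encoding of values as decimal strings in Buffers are assumptions made for the example. The range filter only mimics the documented bounds (start inclusive, end exclusive, null meaning open); it is not how the tree would evaluate a range.

```javascript
// Hypothetical reducer: returns a JSON-compatible { count, sum } summary.
function statsReducer (values, rereduce) {
  let count = 0
  let sum = 0
  for (const v of values) {
    if (rereduce) {
      // v is the output of an earlier statsReducer call.
      count += v.count
      sum += v.sum
    } else {
      // v is a tree entry. ASSUMPTION: value is a Buffer holding a decimal string.
      count += 1
      sum += Number(v.value.toString())
    }
  }
  return { count, sum }
}

// Hypothetical helper that builds an entry with key and value properties.
const entry = (key, n) => ({ key: Buffer.from(key), value: Buffer.from(String(n)) })
const entries = [entry('a', 1), entry('b', 2), entry('c', 10), entry('d', 20)]

// Simulate two first-pass calls followed by one rereduce call.
const left = statsReducer(entries.slice(0, 2), false)
const right = statsReducer(entries.slice(2), false)
const total = statsReducer([left, right], true)

// Range bounds as documented: start is inclusive, end is exclusive, null is open.
function inRange (key, start, end) {
  if (start !== null && Buffer.compare(key, start) < 0) return false
  if (end !== null && Buffer.compare(key, end) >= 0) return false
  return true
}

const selected = entries.filter((e) => inRange(e.key, Buffer.from('b'), Buffer.from('d')))
const ranged = statsReducer(selected, false)

console.log(total, ranged)
```

With the proposed API, the same function would presumably be passed as db.reduce('stats-v1', statsReducer) for a cached reduce, or db.reduce(null, statsReducer) for a one-off reduce; the name 'stats-v1' is only an example.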
Limitations
Because Hyperbee2 is a B-tree and not a B+tree, entries are stored in internal nodes as well as in leaves. The entries within each reducer call are sorted, but they might not be contiguous across calls. This is probably not what the author of a reducer expects. For example, a reducer might receive [3,4], [6,7], [9], [5,8] across 4 calls.
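The following sketch shows one way that call sequence can arise, and what it means for a reducer. It uses a toy B-tree with numeric keys, not Hyperbee2's internals. The traversal order (children first, then the node's own entries, then one rereduce call) is an assumption chosen to reproduce the example above, and reduceNode, sumValues, and listKeys are hypothetical names.

```javascript
// Toy B-tree (NOT Hyperbee2 internals): internal nodes hold entries too.
const tree = {
  entries: [5, 8],
  children: [
    { entries: [3, 4], children: [] },
    { entries: [6, 7], children: [] },
    { entries: [9], children: [] }
  ]
}

// ASSUMPTION: children are reduced first, then the node's own entries,
// and the partial outputs are combined with a single rereduce call.
function reduceNode (node, reducer, seen) {
  const partials = node.children.map((child) => reduceNode(child, reducer, seen))
  seen.push(node.entries) // record what each first-pass call receives
  partials.push(reducer(node.entries.map((n) => ({ key: n, value: n })), false))
  return partials.length === 1 ? partials[0] : reducer(partials, true)
}

// Order-independent reducer: unaffected by how entries are grouped.
const sumValues = (values, rereduce) =>
  values.reduce((acc, v) => acc + (rereduce ? v : v.value), 0)

// Order-sensitive reducer: naively concatenates keys in call order.
const listKeys = (values, rereduce) =>
  rereduce ? values.flat() : values.map((e) => e.key)

const seen = []
const total = reduceNode(tree, sumValues, seen)
const keys = reduceNode(tree, listKeys, [])
// One possible fix: restore key order explicitly instead of relying on call order.
const sortedKeys = [...keys].sort((a, b) => a - b)

console.log(seen, total, keys, sortedKeys)
```

A commutative reducer such as the sum is unaffected, while a reducer that relies on call order returns keys out of order unless it sorts or merges them itself.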