Reduce method by caolan · Pull Request #46 · holepunchto/hyperbee2

caolan · 2026-05-05T16:10:44Z

A reducer API for Hyperbee2

I'm not recommending you merge this yet. I'm opening the pull request as a place to discuss the feature. Take a look in the examples directory to give you a feel for how it works.

You'll notice this only includes the 'reduce' part of 'map/reduce'. That is intentional. A map is essentially another index/tree built on top of this one via a map function. I imagine, if required, we would create a new Hyperbee2 that watches the source tree for changes and applies a map function to any updates. This allows querying the mapped data with range requests, reducers, etc. using the regular Hyperbee2 API.

The intermediate output of reducers can either be ephemeral (temporary), cached in memory, or cached on disk (written to a batch in Hypercore). Caching the output of reducers for subtrees greatly improves query performance.
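To illustrate why caching subtree output helps, here is a minimal in-memory sketch. It is not Hyperbee2's implementation: the node shape (`entries`, `children`), `makeNode`, `reduceNode`, and the `WeakMap` cache are all assumptions made for illustration, and the cache is keyed by node only rather than by reducer name.

```javascript
// Hypothetical in-memory tree; not Hyperbee2's real node format.
const entry = (n) => ({ key: n, value: n })
const makeNode = (nums, children = []) => ({ entries: nums.map(entry), children })

const root = makeNode([5, 8], [makeNode([3, 4]), makeNode([6, 7]), makeNode([9])])

let calls = 0 // counts reducer invocations so the effect of caching is visible
const sum = (values, rereduce) => {
  calls++
  return values.reduce((acc, v) => acc + (rereduce ? v : v.value), 0)
}

// Assumed design: one cached result per node object.
const cache = new WeakMap()

function reduceNode (node, reducer) {
  if (cache.has(node)) return cache.get(node)
  // Reduce this node's own entries, then combine with each child's result.
  const own = reducer(node.entries, false)
  const parts = [own, ...node.children.map((c) => reduceNode(c, reducer))]
  const result = node.children.length > 0 ? reducer(parts, true) : own
  cache.set(node, result)
  return result
}

const first = reduceNode(root, sum)
const callsAfterFirst = calls
const second = reduceNode(root, sum) // served from the cache
const callsAfterSecond = calls

// Copy-on-write update: a new leaf and a new root; the other leaves are reused.
const root2 = { ...root, children: [makeNode([1, 3, 4]), root.children[1], root.children[2]] }
const third = reduceNode(root2, sum)

console.log(first, callsAfterFirst)   // 42 5
console.log(second, callsAfterSecond) // 42 5
console.log(third, calls)             // 43 8
```

In this model the second query costs no reducer calls, and the update only recalculates the changed leaf and its ancestors.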

Writing to disk requires providing the desired reducer functions (and names) to flush() on the WriteBatch. It is possible with this API to stop using old reducers without incurring the cost of their ongoing recalculation and write overhead, and to introduce new reducers as necessary. The API does not rely on eagerly writing reducer output to nodes on every operation because this would be inefficient, but more importantly, it would make it difficult to layer new reducers on top of trees forked from a remote peer. Recalculation on demand during flush() and writing to your own batch avoids those issues.

When written to a batch, the cached results are included directly on the tree node inside the batch (JSON encoded for now). There is currently no indirection (as there is with value pointers); the reducer output is written directly into the node. Since any update to the node or its descendants will invalidate the output of the reducer, I didn't see a reason to link them across batches via a pointer.

The API asks that any reducers using a cache provide a unique name as a string. This facilitates dropping old reducers and introducing new ones as the application develops. If you change a reducer in a backwards-incompatible way, you must also change its name string to invalidate the cache.

API

`await db.reduce(name, reducer)`

Calculates an accumulated value for all entries in the tree.

To cache intermediate results between calls, provide a string as the
name argument. This must be unique to the reducer function and be
updated if the reducer changes in a backwards-incompatible way.

To perform a one-off temporary reduce, provide null as the name.

The reducer is a function that will calculate the accumulated value.
It takes two arguments: values and rereduce.

When rereduce is false, the values argument will be an Array of
tree entries with key and value properties. When the rereduce
argument is true, the values argument will be an Array of values
returned from previous reducer calls.

The output of a reducer must be a JSON-compatible value.

Note: while entries provided to the reducer are in sorted key order,
those entries might not be contiguous across reducer calls. For example,
a reducer might receive [3,4], [6,7], [9], [5,8] across 4 calls.

import Hyperbee from '../index.js'
import Corestore from 'corestore'

// Sums all entries, treating each value as a number string
const total = (values, rereduce) => {
  let sum = 0
  for (const v of values) {
    if (rereduce) {
      // v is the output of a previous call to this reducer
      sum += v
    } else {
      // v is a tree entry with key and value properties
      sum += Number(v.value.toString())
    }
  }
  return sum
}

const b = new Hyperbee(new Corestore('./sandbox'))
await b.ready()

// Calculate total of all number strings
console.log(await b.reduce('total', total))
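The note above on non-contiguous entries matters mostly for reducers that carry more than a simple sum. As a sketch, the reducer below computes a count, minimum, and maximum without assuming that one call's entries follow on from the previous call's. It is called by hand here to mimic the four calls from the note plus a final rereduce; the order in which Hyperbee2 actually invokes a reducer is an assumption, and `stats` and `entry` are hypothetical names.

```javascript
// Hypothetical reducer: gives the same answer whether or not groups are contiguous.
// Assumes every call receives at least one value (Infinity is not JSON-compatible).
const stats = (values, rereduce) => {
  let count = 0
  let min = Infinity
  let max = -Infinity
  for (const v of values) {
    if (rereduce) {
      // v is the output of an earlier call: { count, min, max }
      count += v.count
      min = Math.min(min, v.min)
      max = Math.max(max, v.max)
    } else {
      // v is a tree entry; key and value are assumed to be Buffers
      const n = Number(v.value.toString())
      count += 1
      min = Math.min(min, n)
      max = Math.max(max, n)
    }
  }
  return { count, min, max }
}

const entry = (n) => ({ key: Buffer.from(String(n)), value: Buffer.from(String(n)) })

// The four groups from the note: sorted within a call, not contiguous across calls.
const groups = [[3, 4], [6, 7], [9], [5, 8]]
const partials = groups.map((g) => stats(g.map(entry), false))
const result = stats(partials, true)

console.log(result) // { count: 7, min: 3, max: 9 }
```

A reducer that depends on adjacency, such as one tracking the gap between neighbouring keys, would need to carry extra state, because 5 and 8 arrive after 9 in this sequence.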

`await db.reduceRange(name, reducer, start, end)`

Calculates an accumulated value for a range of entries in the tree.
start can be null (to begin with the first entry) or a Buffer less
than or equal to the first key to include. end can be null (to end
after the last entry), or a Buffer greater than the last key to include.

The name and reducer arguments are described in the documentation for
db.reduce().
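
The bounds described above can be read as start-inclusive and end-exclusive. A minimal sketch of that reading, using `Buffer.compare` from Node.js: `inRange` and `pick` are hypothetical helpers, and bytewise key comparison is an assumption rather than something this pull request states.

```javascript
// Hypothetical helper: true when key falls in [start, end).
// null means "unbounded" on that side.
function inRange (key, start, end) {
  if (start !== null && Buffer.compare(key, start) < 0) return false
  if (end !== null && Buffer.compare(key, end) >= 0) return false
  return true
}

const keys = ['a', 'b', 'c', 'd'].map((k) => Buffer.from(k))
const pick = (start, end) => keys.filter((k) => inRange(k, start, end)).map(String)

console.log(pick(Buffer.from('b'), Buffer.from('d'))) // [ 'b', 'c' ]
console.log(pick(null, Buffer.from('c')))             // [ 'a', 'b' ]
console.log(pick(Buffer.from('c'), null))             // [ 'c', 'd' ]
```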

`await batch.flush([reducers])`

The reducers argument is an Object with reducer functions (as described by
the documentation for db.reduce()) keyed by unique strings. If provided,
these will be recalculated for all nodes lacking a cached reduce result and
updated values will be written to Hypercore as part of the batch. This greatly
improves query time for reducers at the expense of writing a larger batch.
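
A rough in-memory model of the behaviour described above, under stated assumptions: `fillReduced`, the `reduced` map, and the node shape are hypothetical, and the real implementation works on Hypercore-backed nodes rather than plain objects. It shows results being computed only for nodes that lack one, and stored JSON encoded under the reducer's name.

```javascript
// Hypothetical node shape: { entries, children, reduced: Map<name, jsonString> }
const node = (entries, children = []) => ({ entries, children, reduced: new Map() })

// Fill in missing results for each named reducer.
// Returns how many results were written.
function fillReduced (n, reducers) {
  let written = 0
  // Children first, so their results exist when the parent combines them.
  for (const child of n.children) written += fillReduced(child, reducers)
  for (const [name, reducer] of Object.entries(reducers)) {
    if (n.reduced.has(name)) continue // already cached, nothing to write
    const own = reducer(n.entries, false)
    const parts = [own, ...n.children.map((c) => JSON.parse(c.reduced.get(name)))]
    const result = n.children.length > 0 ? reducer(parts, true) : own
    n.reduced.set(name, JSON.stringify(result))
    written++
  }
  return written
}

const total = (values, rereduce) => values.reduce((a, v) => a + (rereduce ? v : v.value), 0)
const count = (values, rereduce) => values.reduce((a, v) => a + (rereduce ? v : 1), 0)

const e = (n) => ({ key: n, value: n })
const tree = node([e(5), e(8)], [node([e(3), e(4)]), node([e(6), e(7)]), node([e(9)])])

const firstFlush = fillReduced(tree, { total })         // 4: one result per node
const secondFlush = fillReduced(tree, { total, count }) // 4: only `count` was missing
const thirdFlush = fillReduced(tree, { total, count })  // 0: everything is cached

console.log(tree.reduced.get('total'), tree.reduced.get('count')) // 42 7
```

In this model, dropping a reducer is a matter of no longer passing it: its old results are simply not recalculated for new nodes.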

Limitations

Because Hyperbee2 is a B-tree and not a B+tree, entries are stored on internal nodes as well as leaves. The entries within a single reducer call are sorted, but they might not be contiguous across calls. This is probably not what the author of a reducer expects. For example, a reducer might receive [3,4], [6,7], [9], [5,8] across 4 calls.

caolan · 2026-05-05T16:41:53Z

Note: this proposal needs at least a test suite before it can be merged.

caolan added 4 commits May 5, 2026 11:08

Reducer output stored on tree node pointers

d31dc11

Add reduceRange method

53c581d

Allow temporary unnamed reducers

62bbdf3

Provide entries with key and value properties to reducers

6328977

caolan marked this pull request as draft May 5, 2026 16:10

caolan force-pushed the map-reduce2 branch 2 times, most recently from bb76da5 to 1324d78 May 5, 2026 16:20

Document reducer methods

eb08387

caolan force-pushed the map-reduce2 branch from 1324d78 to eb08387 May 5, 2026 16:21

caolan wants to merge 5 commits into holepunchto:main from caolan:map-reduce2
