feat(reader): surface per-zone stats from the zone-map table (ADR 0013 §6) #181

Merged
dfa1 merged 2 commits into main from feat/zone-stats-surface on Jun 27, 2026

dfa1 (Owner) commented on Jun 27, 2026

What

Adds the read-side surface for ADR 0013 §6 aggregate push-down: a way to read per-zone statistics without decoding data segments.

  • ScanIterator.columnZoneStats(col): returns one ArrayStats per zone (min/max/sum/null-count). Zones align positionally with chunkRowCounts() on the fallback path and for files this writer produces (one zone per chunk). This is the whole-zone tier that a reduce kernel folds; boundary zones fall back to streaming decode (future work).
  • ArrayStats gains a sum component; fromFbs decodes it (forward-compat).
  • ZonedStatsSchema moves from inspector to reader so the read path can reconstruct the stats-table dtype. cli/inspector imports updated; test moved with it.
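
The whole-zone fold described above could look roughly like the sketch below. `ZoneStats` is a hypothetical stand-in for `ArrayStats` (whose real shape is not shown in this PR), and `ZoneSumFold.foldSum` is an illustrative helper, not part of the reader API. It assumes integer sums and reports "no answer" as soon as one zone lacks a sum, so a caller can fall back to decoding.

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical stand-in for the reader's ArrayStats; field names are assumptions.
record ZoneStats(Long min, Long max, Long sum, long nullCount) {}

final class ZoneSumFold {
    private ZoneSumFold() {}

    // Adds up per-zone sums without touching data segments.
    // Returns empty if any zone has no sum, signalling that a scan is needed.
    static OptionalLong foldSum(List<ZoneStats> zones) {
        long total = 0L;
        for (ZoneStats zone : zones) {
            if (zone.sum() == null) {
                return OptionalLong.empty();
            }
            // addExact throws on overflow instead of silently wrapping.
            total = Math.addExact(total, zone.sum());
        }
        return OptionalLong.of(total);
    }
}
```

Returning an empty `OptionalLong` rather than a partial total keeps the metadata-only path all-or-nothing, which matches the "every zone's sum non-null" condition the parity test checks.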

Where sum comes from

Sum is decoded from the column's vortex.stats zone-map table, not per-flat node stats — matching Rust, whose flat writer retains only pre-computed stats (flat/writer.rs) and emits SUM only in the zoned table (zoned/writer.rs). When a column has no zone map, columnZoneStats falls back to per-chunk node stats (sum null).
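
A minimal sketch of that source selection, under stated assumptions: `ZoneRow`, `NodeStats`, and `ZoneStatsSource.resolve` are hypothetical names invented for illustration, and the real reader's types and control flow may differ. The point is only that the fallback path yields one entry per chunk with a null sum.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical per-chunk node stats: no sum is retained on this path.
record NodeStats(Long min, Long max, long nullCount) {}

// Hypothetical per-zone row as decoded from a zone-map table.
record ZoneRow(Long min, Long max, Long sum, long nullCount) {}

final class ZoneStatsSource {
    private ZoneStatsSource() {}

    // Prefer the zone-map table; otherwise derive rows from per-chunk stats with sum = null.
    static List<ZoneRow> resolve(Optional<List<ZoneRow>> zoneMap, List<NodeStats> perChunk) {
        if (zoneMap.isPresent()) {
            return zoneMap.get();
        }
        return perChunk.stream()
                .map(n -> new ZoneRow(n.min(), n.max(), null, n.nullCount()))
                .toList();
    }
}
```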

VortexWriter is unchanged functionally (comment only) — sum already lived in the zone-map table.

Tests

  • ColumnZoneStatsTest (writer): per-zone min/max/sum/null-count, whole-zone SUM fold, float→Double sums, missing-column → empty-per-zone.
  • New parity test in RustWritesJavaReadsIntegrationTest: a Rust-written 200k-row file — every zone's sum non-null, folds to the exact column total. Proves the reader decodes Rust's zone-map sum.
  • Verified both interop directions (Rust→Java 13, Java→Rust 217, roundtrip 1) and reader/writer/inspector/cli unit suites.

Scope

ADR 0013 stays Proposed. Out of scope (follow-ups): Mask/Predicate/kernel vocab, the two-tier reduce, and rewiring calcite VortexAggregates.SUM off its full-scan (now trivial — sum is available per-zone).

🤖 Generated with Claude Code

dfa1 force-pushed the feat/zone-stats-surface branch from 62a74d4 to 778f863 on June 27, 2026 09:57
…3 §6)

Add ScanIterator.columnZoneStats(col): one ArrayStats per zone with
min/max/sum/null-count, the read-side feed for aggregate push-down.

Sum is decoded from the column's vortex.stats zone-map table rather than
per-flat node stats — matching Rust, whose flat writer retains only
pre-computed stats (flat/writer.rs) and emits SUM only in the zoned table
(zoned/writer.rs). Falls back to per-chunk node stats (sum null) when a
column has no zone map.

- ArrayStats gains a sum component; fromFbs decodes it (forward-compat).
- ZonedStatsSchema moves inspector -> reader so the read path can
  reconstruct the stats-table dtype; cli/inspector imports updated.
- VortexWriter is unchanged functionally (comment only); sum continues to
  live in the zone-map table.

Calcite VortexAggregates.SUM/AVG now fold the per-zone sums instead of a
full scan: metadata-only when every zone carries a sum, falling back to a
streaming scan only when a column has no zone map.

Verified both interop directions, incl. a new test folding per-zone sums
from a Rust-written file to the exact column total.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
dfa1 force-pushed the feat/zone-stats-surface branch from 778f863 to 3df5b7d on June 27, 2026 10:06
… ref

columnZoneStats javadoc overstated alignment: zone-map rows align with
chunkRowCounts() only on the fallback path and for files this writer
produces (one zone per chunk). A foreign writer may use a fixed zone
length independent of chunk boundaries, so the zone count need not match.
Reword to scope the guarantee.
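
To illustrate why the counts can diverge, here is a small worked sketch. The zone length of 8192 rows and the two 100,000-row chunks are made-up numbers, and `ZoneAlignment` is a hypothetical helper, not code from this branch.

```java
final class ZoneAlignment {
    private ZoneAlignment() {}

    // Zones needed to cover totalRows with a fixed zone length (ceiling division).
    static long zoneCount(long totalRows, long zoneLen) {
        if (zoneLen <= 0) {
            throw new IllegalArgumentException("zoneLen must be positive");
        }
        return (totalRows + zoneLen - 1) / zoneLen;
    }

    // Equal counts are a necessary (not sufficient) condition for pairing
    // zone stats positionally with chunk row counts.
    static boolean countsMatch(long[] chunkRowCounts, long zones) {
        return chunkRowCounts.length == zones;
    }
}
```

With those assumed numbers, `zoneCount(200_000, 8192)` gives 25 zones against 2 chunks, so positional pairing with chunkRowCounts() would not be valid for such a file.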

Also fix VortexWriter's stale [io.github.dfa1.vortex.inspect] reference to
ZonedStatsSchema — the class moved to the reader package in this branch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
dfa1 merged commit 3166974 into main on Jun 27, 2026
6 checks passed
dfa1 deleted the feat/zone-stats-surface branch on June 27, 2026 14:59