
[GH-2877] Add Box2D type and Box2DUDT #2878

Merged
jiayuasu merged 7 commits into apache:master from jiayuasu:feature/box2d-type
May 2, 2026

@jiayuasu
Member

@jiayuasu jiayuasu commented May 1, 2026

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

Adds the `Box2D` value type and its UDT, the foundation for the bbox work tracked in #2877. Functions (`ST_Box2D`, `ST_MakeBox2D`, `ST_Extent`, accessor overloads, casts) follow in subsequent PRs.

  • `common/.../geometryObjects/Box2D.java`: planar 2D bounding box. Always a valid finite bbox; absence of a bbox is represented by SQL NULL at the column level (PostGIS-compatible). `xmin > xmax` is intentionally not used as an in-band empty marker, so it remains free for future antimeridian-wraparound semantics on geography bboxes (cf. `apache/sedona-db`'s `WraparoundInterval`).
  • `spark/common/.../UDT/Box2DUDT.scala`: struct-backed UDT with `sqlType = struct<xmin, ymin, xmax, ymax>` (all `double`, non-nullable). It is struct-backed rather than binary-backed so that values round-trip natively to Parquet and align zero-copy with GeoParquet 1.1 bbox covering columns.
  • `spark/common/.../UDT/UdtRegistratorWrapper.scala`: registers the `Box2D ↔ Box2DUDT` mapping.
  • `python/sedona/spark/...`: matching `Box2DType` UDT and `Box2D` value class, so that a Box2D column materialized in PySpark resolves cleanly.
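
To illustrate why `xmin > xmax` is worth keeping free, here is a minimal sketch of how a wraparound interval could read that ordering as "crosses the antimeridian" rather than "empty". The `XInterval` class and its methods are hypothetical names invented for this illustration; they are not part of this PR and are not the `WraparoundInterval` API from `apache/sedona-db`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XInterval:
    """Hypothetical longitude interval in degrees (illustration only)."""

    lo: float
    hi: float

    def wraps(self) -> bool:
        # lo > hi is interpreted as "crosses the antimeridian", not "empty".
        return self.lo > self.hi

    def contains(self, x: float) -> bool:
        if self.wraps():
            # Covers [lo, 180] together with [-180, hi].
            return x >= self.lo or x <= self.hi
        return self.lo <= x <= self.hi


fiji = XInterval(lo=177.0, hi=-178.0)   # spans the antimeridian
europe = XInterval(lo=-10.0, hi=40.0)   # ordinary interval

print(fiji.wraps(), fiji.contains(179.5), fiji.contains(0.0))
print(europe.wraps(), europe.contains(0.0))
```

If the same ordering also meant "empty", the two readings would be indistinguishable, which is the conflict the PR avoids by using SQL NULL for absence.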

Field names (`xmin/ymin/xmax/ymax`) match the GeoParquet 1.1 spec and `apache/sedona-db`'s GeoParquet writer for direct cross-engine interop.
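
The struct layout can be sketched in plain Python as follows. This is a standalone stand-in that assumes the four-field order described above; it does not import PySpark, and the `Box2D` class, `serialize`, `deserialize`, and `to_covering_dict` here are illustrative names rather than the code added under `python/sedona/spark/...`.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

# Field order assumed from the PR description: struct<xmin, ymin, xmax, ymax>.
FIELD_NAMES = ("xmin", "ymin", "xmax", "ymax")


@dataclass(frozen=True)
class Box2D:
    """Illustrative value class: four doubles, no in-band empty marker."""

    xmin: float
    ymin: float
    xmax: float
    ymax: float


def serialize(box: Box2D) -> Tuple[float, float, float, float]:
    # A struct-backed UDT stores the value as a row of four doubles.
    return (box.xmin, box.ymin, box.xmax, box.ymax)


def deserialize(row: Sequence[float]) -> Box2D:
    # Rebuild the value object from the four struct fields, in order.
    return Box2D(*(float(v) for v in row))


def to_covering_dict(box: Box2D) -> Dict[str, float]:
    # Same field names as a GeoParquet 1.1 bbox covering column.
    return dict(zip(FIELD_NAMES, serialize(box)))


box = Box2D(-1.0, 2.0, 3.5, 4.0)
print(serialize(box))
print(to_covering_dict(box))
```

Because the stored form is just the four named doubles, no binary decoding step sits between the Parquet column and the value object.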

How was this patch tested?

`Box2DUDTSuite` (new) covers:

  • `UdtRegistratorWrapper.registerAll()` registers Box2D
  • JSON schema round-trip
  • Box2D struct serde round-trip
  • Case-object equality and `hashCode`
  • Parquet write/read of a Box2D column
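
As a rough picture of what a JSON schema round-trip checks, the sketch below builds the struct portion of the schema in the JSON layout Spark uses for struct types and verifies that it survives encoding and decoding. It covers only the underlying struct, not the UDT wrapper that `Box2DUDTSuite` exercises, and the helper name `box2d_struct_schema` is made up for this example.

```python
import json


def box2d_struct_schema() -> dict:
    # Four non-nullable double fields, in the order given in the PR description.
    return {
        "type": "struct",
        "fields": [
            {"name": name, "type": "double", "nullable": False, "metadata": {}}
            for name in ("xmin", "ymin", "xmax", "ymax")
        ],
    }


schema = box2d_struct_schema()
restored = json.loads(json.dumps(schema))

# The round-trip must preserve field names, order, types, and nullability.
print(restored == schema)
print([field["name"] for field in restored["fields"]])
```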

Python `Box2DType` was smoke-tested locally for `serialize` / `deserialize` round-trip and `scalaUDT` linkage. Function-level Python tests arrive with the function PRs that introduce constructors.

Did this PR include necessary documentation updates?

  • No, this PR does not affect any public SQL API so no need to change the documentation. `Box2D` is not directly user-constructible until the follow-up PR adds `ST_Box2D` / `ST_MakeBox2D`; documentation lands with that PR.

Introduces a planar bounding-box value type backed by a struct UDT
(struct<xmin,ymin,xmax,ymax>, all double, non-nullable) so values
round-trip natively to Parquet and align with GeoParquet 1.1 bbox
covering columns. Empty boxes are encoded as xmin > xmax (JTS Envelope
convention), making union/expand a no-op against empty.

This change adds only the type and its registration. Functions
(ST_Box2D, ST_MakeBox2D, ST_Extent, accessor overloads, casts) follow
in subsequent commits per the plan in apache#2877.
@jiayuasu jiayuasu marked this pull request as draft May 1, 2026 06:13
Contributor

Copilot AI left a comment

Pull request overview

Adds a new JVM/Spark-native planar bounding-box value type (Box2D) and a Spark UDT (Box2DUDT) as groundwork for bbox-related SQL functions and GeoParquet bbox covering-column interoperability (per GH-2877).

Changes:

  • Introduce Box2D (Java) with empty-box semantics and basic conversions (Envelope/Polygon).
  • Add struct-backed Box2DUDT (struct<xmin,ymin,xmax,ymax> doubles) for Spark SQL serialization/deserialization.
  • Register Box2D ↔ Box2DUDT in UdtRegistratorWrapper.registerAll().

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
common/src/main/java/org/apache/sedona/common/geometryObjects/Box2D.java New planar bbox value type with empty/union helpers and conversion utilities.
spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/UDT/Box2DUDT.scala New struct-backed Spark UDT for Box2D, including JSON schema support.
spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/UDT/UdtRegistratorWrapper.scala Registers the new Box2D UDT mapping alongside existing Sedona UDTs.

Comment thread common/src/main/java/org/apache/sedona/common/geometryObjects/Box2D.java Outdated
jiayuasu added 3 commits May 1, 2026 00:05
Mirrors the JVM Box2DUDT so a Box2D column materialized in PySpark
(e.g. via a JVM-created DataFrame) resolves to the matching Python
type. Round-trips through the struct sqlType cleanly, including the
empty-box encoding (xmin > xmax || ymin > ymax).

Test coverage: UDT registration, JSON schema round-trip, Box2D serde
round-trip (including empty), case-object equality, Parquet write/read
of a Box2D column.

Javadoc on Box2D updated to match isEmpty() (xmin > xmax || ymin > ymax),
not just xmin > xmax.

Drops the in-band 'xmin > xmax' empty marker. A Box2D is now always a
valid finite bbox; absence (bbox of empty geometry, extent over zero
rows) is represented by SQL NULL at the column level. This matches
PostGIS behavior (where Box2D(EMPTY) returns NULL) and leaves
xmin > xmax free for a future antimeridian-wraparound semantics on
geography bboxes (cf. sedona-db's WraparoundInterval, S2's
S2LatLngRect).

Drops Box2D.empty() / isEmpty() and the Python equivalents. The
expandToInclude(null) no-op is preserved so aggregation buffers can
fold over a stream of geometries that may produce null bboxes.
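
The null-tolerant fold described in this commit can be sketched as below. This is a plain-Python illustration under stated assumptions: the real `expandToInclude` lives on the Java `Box2D` class and may mutate in place, whereas this version returns a new value, and the `extent` helper is a hypothetical stand-in for an aggregation buffer, not the `ST_Extent` implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Box2D:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def expand_to_include(self, other: Optional["Box2D"]) -> "Box2D":
        # A missing bbox (None, i.e. SQL NULL) leaves the box unchanged.
        if other is None:
            return self
        return Box2D(
            min(self.xmin, other.xmin),
            min(self.ymin, other.ymin),
            max(self.xmax, other.xmax),
            max(self.ymax, other.ymax),
        )


def extent(boxes: Iterable[Optional[Box2D]]) -> Optional[Box2D]:
    """Fold a stream of possibly-missing boxes; None means no bbox at all."""
    acc: Optional[Box2D] = None
    for box in boxes:
        # The first non-missing box seeds the buffer; later ones expand it.
        acc = box if acc is None else acc.expand_to_include(box)
    return acc


print(extent([Box2D(0.0, 0.0, 1.0, 1.0), None, Box2D(-2.0, 0.5, 0.5, 3.0)]))
print(extent([]))
```

With zero rows, or with only missing inputs, the fold yields `None`, which corresponds to the SQL NULL result described above instead of an in-band empty box.
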
@jiayuasu
Member Author

jiayuasu commented May 1, 2026

@zhangfengcdt @paleolimbot what do you think of this?

Member

@paleolimbot paleolimbot left a comment

I can't speak to the Spark details but the definition looks good to me!

It also matches the GeoArrow naming and definition of the box type, which is what we'll match this with in SedonaDB: https://geoarrow.org/format.html#box

@zhangfengcdt
Member

Would be clearer if we just name it BOXUDT, with the possibility to extend to z and m dimensions later on (if needed)? The parquet bbox does not limit to 2D scenario.

@jiayuasu
Member Author

jiayuasu commented May 1, 2026

> Would be clearer if we just name it BOXUDT, with the possibility to extend to z and m dimensions later on (if needed)? The parquet bbox does not limit to 2D scenario.

@zhangfengcdt Yes, there will be a BOX3D type. This is to maintain compatibility with PostGIS box2d and box3d

@zhangfengcdt
Member

> > Would be clearer if we just name it BOXUDT, with the possibility to extend to z and m dimensions later on (if needed)? The parquet bbox does not limit to 2D scenario.
>
> @zhangfengcdt Yes, there will be a BOX3D type. This is to maintain compatibility with PostGIS box2d and box3d

Got it, makes sense to me.

jiayuasu added 2 commits May 1, 2026 21:37
fromEnvelope(Envelope) and toEnvelope() are not used by the Phase 1
SQL surface (ST_Box2D, ST_MakeBox2D, ST_Extent, accessors, CAST AS
geometry, ST_AsText). Removing them in line with the PostGIS box
function set we're targeting.

The polygon conversion is only needed by CAST(box2d AS geometry),
which lands with the function PR. Dropping until then keeps Box2D as
pure data plus the Geometry intake (fromGeometry) and the merge
primitive (expandToInclude) that ST_Extent needs. Removes Polygon,
Coordinate, GeometryFactory imports.
@jiayuasu jiayuasu marked this pull request as ready for review May 2, 2026 04:43
@jiayuasu jiayuasu added this to the sedona-1.9.1 milestone May 2, 2026
@jiayuasu jiayuasu merged commit 1786a14 into apache:master May 2, 2026
44 checks passed

Development

Successfully merging this pull request may close these issues.

Add a native Box2D type for bounding boxes

4 participants