[AURON #2183] Implement native support for ORC InsertIntoHiveTable writes #2191

Open
weimingdiit wants to merge 1 commit into apache:master from weimingdiit:feat/orc-sink_native_iceberg

@weimingdiit
Contributor

Which issue does this PR close?

Closes #2183

Rationale for this change

Auron already supports native Parquet InsertIntoHiveTable writes, but ORC Hive writes still fall back to Spark’s regular execution path. This leaves native write coverage incomplete for a common Hive storage format.

This PR adds native support for ORC InsertIntoHiveTable writes so eligible Hive ORC write workloads can stay on the native path instead of falling back.

What changes are included in this PR?

This PR:

  • adds native ORC sink support in the native engine
  • adds planner / proto support for ORC sink execution
  • adds Spark-side physical plan support for native ORC InsertIntoHiveTable
  • extends AuronConverters to convert supported Hive ORC write plans to the native path
  • adds ORC sink utilities for task output path generation and output completion
  • preserves dynamic partition write handling on the native ORC write path
  • adapts input batches to the expected ORC/Hive output schema before writing
  • records output row and byte metrics for native ORC writes
  • adds execution coverage in AuronExecSuite
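
For readers unfamiliar with dynamic partition writes, the sketch below shows the Hive-style directory layout (`column=value` segments) that such a write has to produce for each distinct partition tuple. It is a simplified illustration only: the object and function names are hypothetical and are not taken from this PR, and real Hive paths also escape special characters, which is omitted here.

```scala
// Illustrative sketch only: names are hypothetical, not Auron code.
object PartitionPathSketch {
  // Hive's default directory value for null or empty partition values.
  val DefaultPartitionName = "__HIVE_DEFAULT_PARTITION__"

  /** Builds the relative directory for one partition tuple, in column order. */
  def partitionDir(values: Seq[(String, Option[String])]): String =
    values.map { case (col, v) =>
      // Null or empty values fall back to the default partition name.
      val rendered = v.filter(_.nonEmpty).getOrElse(DefaultPartitionName)
      s"$col=$rendered" // simplification: no escaping of special characters
    }.mkString("/")
}
```

For example, `PartitionPathSketch.partitionDir(Seq("dt" -> Some("2026-04-13"), "region" -> None))` yields `dt=2026-04-13/region=__HIVE_DEFAULT_PARTITION__`.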

Are there any user-facing changes?

Yes.

Hive table writes using ORC may now remain on the native execution path when they match the supported InsertIntoHiveTable write pattern, instead of falling back to Spark’s regular write execution.

How was this patch tested?

CI.

@weimingdiit force-pushed the feat/orc-sink_native_iceberg branch 4 times, most recently from 4682ae0 to 65f7ae7, on April 13, 2026 05:27
Shims.get.createNativeParquetInsertIntoHiveTableExec(cmd, sortedChild)
Shims.get.createNativeParquetInsertIntoHiveTableExec(cmd, sortInsertChild(cmd, child))

case DataWritingCommandExec(cmd: InsertIntoHiveTable, child)
Contributor

Currently Auron only has auron.enable.data.writing to control whether writes are converted to native execution; there is no way to enable or disable it per format. I recommend adding separate per-format switches.

Contributor Author

Good point. I changed the write gating to support separate format-level controls on top of the existing global spark.auron.enable.data.writing switch. The converter now checks spark.auron.enable.data.writing.parquet and spark.auron.enable.data.writing.orc before converting InsertIntoHiveTable.
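
The two-level gating described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual converter code: the three configuration keys come from the comment above, but the `WriteGating` object, the use of a plain `Map` in place of the Spark configuration, and the default values (global switch off, per-format switches on) are assumptions made for the example.

```scala
// Hypothetical sketch of global + per-format write gating; not Auron's code.
object WriteGating {
  sealed trait HiveWriteFormat { def confSuffix: String }
  case object Parquet extends HiveWriteFormat { val confSuffix = "parquet" }
  case object Orc extends HiveWriteFormat { val confSuffix = "orc" }

  // Key named in the review thread; per-format keys append ".parquet" / ".orc".
  private val GlobalKey = "spark.auron.enable.data.writing"

  // Reads a boolean flag, falling back to `default` when the key is unset.
  private def flag(conf: Map[String, String], key: String, default: Boolean): Boolean =
    conf.get(key).map(_.trim.toBoolean).getOrElse(default)

  /** True only if the global switch AND the format-level switch are both on. */
  def nativeWriteEnabled(conf: Map[String, String], format: HiveWriteFormat): Boolean =
    flag(conf, GlobalKey, default = false) && // assumed default: off
      flag(conf, s"$GlobalKey.${format.confSuffix}", default = true) // assumed default: on
}
```

Under these assumed semantics, a configuration with `spark.auron.enable.data.writing=true` and `spark.auron.enable.data.writing.orc=false` would keep Parquet writes eligible for the native path while ORC writes fall back to Spark's regular write execution.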

@weimingdiit force-pushed the feat/orc-sink_native_iceberg branch from 65f7ae7 to bc8c720 on April 13, 2026 09:44
…ble writes

Signed-off-by: weimingdiit <weimingdiit@gmail.com>
@weimingdiit force-pushed the feat/orc-sink_native_iceberg branch from bc8c720 to 28946b0 on April 13, 2026 11:20
@weimingdiit marked this pull request as ready for review on April 13, 2026 12:31