
feat(write): add write pipeline with DataFusion INSERT INTO/OVERWRITE support #234

Open
JingsongLi wants to merge 1 commit into apache:main from JingsongLi:writer

Conversation

@JingsongLi
Contributor

Purpose

Subtask of #232

Add TableWrite for writing Arrow RecordBatches to Paimon append-only tables. Each (partition, bucket) pair gets its own DataFileWriter that writes directly (matching the delta-rs DeltaWriter pattern). File rolling uses tokio::spawn to close sealed files in the background, and prepare_commit uses try_join_all to finalize all partition writers in parallel.
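The per-(partition, bucket) routing described above can be sketched as a map from that pair to a dedicated writer, created lazily on first use. This is a minimal illustration, not the real TableWrite/DataFileWriter API; `PartitionWriter` and the method names here are stand-ins.

```rust
use std::collections::HashMap;

// Stand-in for DataFileWriter: only counts rows for illustration.
#[derive(Default)]
struct PartitionWriter {
    rows_written: usize,
}

// Sketch of TableWrite's routing: one writer per (partition, bucket).
#[derive(Default)]
struct TableWriteSketch {
    writers: HashMap<(String, u32), PartitionWriter>,
}

impl TableWriteSketch {
    // Route rows to the writer for this (partition, bucket),
    // creating it on first sight.
    fn write(&mut self, partition: &str, bucket: u32, rows: usize) {
        let w = self
            .writers
            .entry((partition.to_string(), bucket))
            .or_default();
        w.rows_written += rows;
    }

    fn writer_count(&self) -> usize {
        self.writers.len()
    }
}

fn main() {
    let mut tw = TableWriteSketch::default();
    tw.write("dt=2026-04-11", 0, 100);
    tw.write("dt=2026-04-11", 1, 50);
    tw.write("dt=2026-04-12", 0, 25);
    tw.write("dt=2026-04-11", 0, 10); // reuses the existing writer
    assert_eq!(tw.writer_count(), 3);
}
```

Holding one writer per pair keeps each output file confined to a single partition/bucket, which is what makes the later per-writer finalization in prepare_commit independent and parallelizable.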

Key components:

  • TableWrite: routes batches by partition/bucket, holds DataFileWriters
  • DataFileWriter: manages parquet file lifecycle with rolling support
  • WriteBuilder: creates TableWrite and TableCommit instances
  • PaimonDataSink: DataFusion DataSink integration for INSERT/OVERWRITE
  • FormatFileWriter: extended with flush() and in_progress_size()

Configurable options via CoreOptions:

  • file.compression (default: zstd)
  • target-file-size (default: 256MB)
  • write.parquet-buffer-size (default: 256MB)
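The defaults above might be resolved roughly as follows. The option keys come from the PR description; the accessor functions are illustrative, not the actual CoreOptions API.

```rust
use std::collections::HashMap;

// Hypothetical option lookup with the documented defaults.
fn file_compression(opts: &HashMap<String, String>) -> String {
    opts.get("file.compression")
        .cloned()
        .unwrap_or_else(|| "zstd".to_string())
}

fn target_file_size(opts: &HashMap<String, String>) -> u64 {
    opts.get("target-file-size")
        .and_then(|v| v.parse().ok())
        .unwrap_or(256 * 1024 * 1024) // 256MB
}

fn main() {
    let mut opts: HashMap<String, String> = HashMap::new();
    // Unset options fall back to the defaults from the PR.
    assert_eq!(file_compression(&opts), "zstd");
    assert_eq!(target_file_size(&opts), 256 * 1024 * 1024);
    // Explicit settings win.
    opts.insert("file.compression".into(), "snappy".into());
    assert_eq!(file_compression(&opts), "snappy");
}
```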

Includes E2E integration tests for unpartitioned, partitioned, fixed-bucket, multi-commit, column projection, and bucket filtering.


```rust
let row = BinaryRow::from_serialized_bytes(&msg.partition)?;
let mut spec = HashMap::new();
for (i, key) in partition_keys.iter().enumerate() {
    if let Some(datum) = extract_datum(&row, i, &data_types[i])? {
```
@littlecoder04 (Contributor) Apr 11, 2026


This will drop NULL partition keys from the overwrite predicate. I reproduced a case where overwriting the NULL partition also deletes other partitions.
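The failure mode the reviewer describes can be shown in isolation: if the spec-building loop skips NULL partition values (the `if let Some(..)` in the excerpt), the resulting overwrite predicate loses that column entirely. This is a self-contained sketch of the bug shape, not the PR's actual code; `build_spec_dropping_nulls` is a hypothetical name.

```rust
use std::collections::HashMap;

// Builds a partition spec the way the excerpt does: NULL values
// are silently skipped instead of being recorded as IS NULL.
fn build_spec_dropping_nulls(
    keys: &[&str],
    values: &[Option<&str>],
) -> HashMap<String, String> {
    let mut spec = HashMap::new();
    for (key, value) in keys.iter().zip(values) {
        if let Some(v) = value {
            spec.insert(key.to_string(), v.to_string());
        }
    }
    spec
}

fn main() {
    // Overwriting the (dt = NULL) partition yields an EMPTY spec,
    // i.e. an unfiltered overwrite that matches every partition.
    let spec = build_spec_dropping_nulls(&["dt"], &[None]);
    assert!(spec.is_empty());

    // A non-NULL value is kept, so the predicate stays scoped.
    let spec = build_spec_dropping_nulls(&["dt"], &[Some("2026-04-11")]);
    assert_eq!(spec.len(), 1);
}
```

An empty spec is indistinguishable from "overwrite the whole table", which matches the reproduced behavior of the NULL-partition overwrite deleting other partitions.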

