Skip to content

Difficult to track/debug export info #1824

@DimensionWieldr

Description

@DimensionWieldr

It is difficult to debug issues with exports because information is scattered across various system tables and storage locations.

For example, to see parquet files for a certain table on S3, I need to know its storage location and use something like the following to list them (applies to metadata.json files as well where **.parquet needs to be changed to **.metadata.json:

SELECT _path
FROM s3('s3://protvjoe-1lfyj7kp-iceberg/default/destination12**.parquet', One)
ORDER BY _path ASC

And for information about each written iceberg snapshot, I query system.iceberg_history:

SELECT *
FROM system.iceberg_history
WHERE `table` = 'default.destination12'
ORDER BY made_current_at ASC

For per-partition info (export status, source replica, etc.):

SELECT *
FROM system.replicated_partition_exports
WHERE (source_database = 'default') AND (source_table = 'source14') AND (destination_database = 'export-test') AND (destination_table = 'default.destination14')
ORDER BY partition_id ASC

And there's also system.part_log.

The point is, it is confusing for users to debug anything related to exports. It feels a bit like a scavenger hunt for information. I'm not sure what the best solution would be for this, but it would be nice to expose table, parquet, and metadata file locations in some way that doesn't flood system tables with too much info, but also doesn't involve guessing if there was an issue with storage setup, ice version, or something else by counting parquet files.

Maybe someone has an idea about how to better organize information?

Metadata

Metadata

Assignees

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions