[python] Add asyncio-native streaming API for bounded log reading #545

@charlesdong1991

Search before asking

  • I searched in the issues and found nothing similar.

Description

LogScanner.to_arrow_batch_reader() currently returns a synchronous pyarrow.RecordBatchReader that blocks the calling thread on each __next__() call. That may be acceptable for Arrow interop (e.g., feeding into DuckDB or Polars), but it is not suitable for asyncio-native Python code, where a blocking call stalls the event loop.

Ideally we should add an async counterpart, e.g., async for batch in scanner.read_batches(), which yields RecordBatch objects without blocking the event loop.
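To make the intended calling pattern concrete, here is a rough sketch of how an async iterator could be layered over any blocking reader. It is only an illustration, not the proposed implementation: the free function read_batches and the helper _next_or_done are hypothetical names, and offloading each blocking next() call to a worker thread with asyncio.to_thread (Python 3.9+) is just one possible approach. A native implementation could instead await the underlying client directly.

```python
import asyncio
from typing import AsyncIterator, Iterable, TypeVar

T = TypeVar("T")

# Sentinel marking exhaustion of the blocking iterator (hypothetical helper).
_DONE = object()


def _next_or_done(it):
    # StopIteration does not propagate cleanly out of a future,
    # so exhaustion is mapped to a sentinel value instead.
    return next(it, _DONE)


async def read_batches(reader: Iterable[T]) -> AsyncIterator[T]:
    """Yield items from a blocking reader without blocking the event loop.

    `reader` stands in for something like the pyarrow.RecordBatchReader
    returned by to_arrow_batch_reader(); any blocking iterable works here.
    """
    it = iter(reader)
    while True:
        # Each blocking next() runs in a worker thread; the event loop
        # stays free to run other tasks while the batch is fetched.
        item = await asyncio.to_thread(_next_or_done, it)
        if item is _DONE:
            return
        yield item


async def main() -> None:
    # A plain list stands in for real record batches in this sketch.
    async for batch in read_batches(["batch-0", "batch-1"]):
        print(batch)


if __name__ == "__main__":
    asyncio.run(main())
```

A thread-offloading wrapper like this keeps the event loop responsive but still occupies one worker thread per in-flight read, which is part of the motivation for an asyncio-native API rather than a wrapper.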

Willingness to contribute

  • I'm willing to submit a PR!
