feat(core): support list_with_glob via GlobLayer #7629

Open
chitralverma wants to merge 9 commits into apache:main from chitralverma:feat/oplist-glob-capability

Contributor

@chitralverma chitralverma commented May 28, 2026

Which issue does this PR close?

Closes #6535.

Rationale for this change

Issue #6535 requests glob support on list operations. Naive client-side list-then-filter over recursive listings is wasteful for selective patterns. RFC-6209 + #6535 specify guided traversal: walk the pattern segment-by-segment, list only what could match, prune everything else.

What changes are included in this PR?

  • OpList::with_glob / ListOptions::glob / Capability::list_with_glob plumbing, .glob() on FutureList/FutureLister, capability-check rejection.
  • Internal glob_matcher (no new deps): fsspec/pathlib syntax — *, ?, **, [abc], [!abc], {a,b}. Rejects a**b and trailing \.
  • Opt-in GlobLayer (sibling of SimulateLayer):
    • Passthrough when native_capability.list_with_glob = true.
    • Otherwise drives guided traversal: literals consumed without listing; glob segments → one non-recursive list per matching parent; ** fans out only over plausible dirs.
    • Frame dedup via visited: HashSet<(path, idx)> (NFA→DFA); emission-side path dedup as belt-and-braces.
    • limit enforced post-filter; start_after rejected under client-side glob.
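The guided traversal described in the list above can be illustrated with a minimal, std-only sketch. Everything here is an assumption made for illustration: the `Seg` enum, `compile`, `seg_match`, and `glob_walk` are hypothetical names, the in-memory `store` map stands in for a real backend, and only `*` and `?` are handled inside a segment. It is not the PR's implementation. In particular, this sketch fans `**` out over every child directory rather than only plausible ones, and it does not handle a trailing `**`.

```rust
use std::collections::{BTreeMap, BTreeSet, HashSet};

// Hypothetical segment type for this sketch; not the PR's actual definition.
enum Seg {
    Literal(String),
    Glob(String), // one path component; only `*` and `?` are handled here
    DoubleStar,
}

fn compile(pattern: &str) -> Vec<Seg> {
    pattern
        .split('/')
        .filter(|s| !s.is_empty())
        .map(|s| match s {
            "**" => Seg::DoubleStar,
            _ if s.contains(|c: char| c == '*' || c == '?') => Seg::Glob(s.to_string()),
            _ => Seg::Literal(s.to_string()),
        })
        .collect()
}

// Backtracking match of one path component against `*` and `?`.
fn seg_match(pat: &[char], name: &[char]) -> bool {
    match pat.split_first() {
        None => name.is_empty(),
        Some((&'*', rest)) => (0..=name.len()).any(|i| seg_match(rest, &name[i..])),
        Some((&'?', rest)) => !name.is_empty() && seg_match(rest, &name[1..]),
        Some((c, rest)) => name.first() == Some(c) && seg_match(rest, &name[1..]),
    }
}

// `store` maps a directory path ("" is the root) to its children; directory
// children end with '/'. Returns the matched files and the directories listed.
fn glob_walk(
    store: &BTreeMap<String, Vec<String>>,
    pattern: &str,
) -> (BTreeSet<String>, Vec<String>) {
    let segs = compile(pattern);
    let (mut out, mut listed) = (BTreeSet::new(), Vec::new());
    let mut visited: HashSet<(String, usize)> = HashSet::new();
    let mut stack = vec![(String::new(), 0usize)];
    while let Some((dir, idx)) = stack.pop() {
        if !visited.insert((dir.clone(), idx)) {
            continue; // this (path, idx) frame was already expanded
        }
        let last = idx + 1 == segs.len();
        match segs.get(idx) {
            None => {} // pattern exhausted on a directory: nothing to emit
            Some(Seg::Literal(name)) if !last => {
                // A non-final literal is consumed without listing anything.
                stack.push((format!("{}{}/", dir, name), idx + 1));
            }
            Some(Seg::DoubleStar) => {
                stack.push((dir.clone(), idx + 1)); // `**` matching zero directories
                listed.push(dir.clone());
                for child in store.get(&dir).into_iter().flatten() {
                    if child.ends_with('/') {
                        stack.push((format!("{}{}", dir, child), idx)); // stay on `**`
                    }
                }
            }
            Some(seg) => {
                listed.push(dir.clone()); // one non-recursive list of this parent
                for child in store.get(&dir).into_iter().flatten() {
                    let (name, is_dir) = match child.strip_suffix('/') {
                        Some(n) => (n, true),
                        None => (child.as_str(), false),
                    };
                    let hit = match seg {
                        Seg::Literal(l) => l.as_str() == name,
                        Seg::Glob(p) => {
                            let p: Vec<char> = p.chars().collect();
                            let n: Vec<char> = name.chars().collect();
                            seg_match(&p, &n)
                        }
                        Seg::DoubleStar => false, // handled in the arm above
                    };
                    if hit && last && !is_dir {
                        out.insert(format!("{}{}", dir, name));
                    } else if hit && !last && is_dir {
                        stack.push((format!("{}{}", dir, child), idx + 1));
                    }
                }
            }
        }
    }
    (out, listed)
}
```

For a pattern such as `photos/**/*.jpg`, the walk never lists the root or any sibling of `photos/`, which is the saving over list-then-filter. The `visited` set keeps a `(path, idx)` frame from being expanded twice when consecutive `**` segments reach the same directory by different routes.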

Are there any user-facing changes?

Yes, additive only. No breaking changes.

  • New Capability::list_with_glob (defaults false).
  • New OpList::with_glob / glob(), ListOptions::glob.
  • New .glob(&str) on FutureList / FutureLister.
  • New opt-in layers::GlobLayer.

Usage:

use opendal::layers::GlobLayer;
use opendal::{services::S3, Operator};

let op = Operator::new(S3::default().bucket("b"))?
    .layer(GlobLayer) // no-op if service has native glob
    .finish();

let entries = op.list_with("photos/").glob("**/*.jpg").await?;
let top10 = op.list_with("/").glob("*.{jpg,png}").limit(10).await?;

Without .layer(GlobLayer) and without native support, .glob(...) returns Unsupported. The pattern is relative to the list root. Syntax: * ? ** [abc] [!abc] {a,b}; matching is case-sensitive.
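To make the single-segment syntax concrete, here is a small std-only sketch of a matcher for `*`, `?`, `[abc]`, `[!abc]`, and `{a,b}` within one path component. The function names (`expand_braces`, `match_chars`, `segment_matches`) are hypothetical and this is not the PR's internal glob_matcher. It assumes braces are not nested, treats a class as a plain character set (no ranges), and leaves `**` to the traversal layer.

```rust
// Expand `{a,b}` alternatives into plain patterns, one group at a time.
// Nested braces are out of scope for this sketch.
fn expand_braces(pat: &str) -> Vec<String> {
    if let (Some(open), Some(close)) = (pat.find('{'), pat.find('}')) {
        if open < close {
            let (head, tail) = (&pat[..open], &pat[close + 1..]);
            return pat[open + 1..close]
                .split(',')
                .flat_map(|alt| expand_braces(&format!("{}{}{}", head, alt, tail)))
                .collect();
        }
    }
    vec![pat.to_string()]
}

// Backtracking match of one component; comparison is case-sensitive.
fn match_chars(pat: &[char], name: &[char]) -> bool {
    match pat.split_first() {
        None => name.is_empty(),
        Some((&'*', rest)) => (0..=name.len()).any(|i| match_chars(rest, &name[i..])),
        Some((&'?', rest)) => !name.is_empty() && match_chars(rest, &name[1..]),
        Some((&'[', rest)) => {
            match (rest.iter().position(|&c| c == ']'), name.split_first()) {
                (Some(end), Some((c, tail))) => {
                    // A leading `!` negates the set.
                    let (negate, set) = match rest[..end].split_first() {
                        Some((&'!', s)) => (true, s),
                        _ => (false, &rest[..end]),
                    };
                    set.contains(c) != negate && match_chars(&rest[end + 1..], tail)
                }
                _ => false, // unclosed class, or nothing left to match
            }
        }
        Some((c, rest)) => name.first() == Some(c) && match_chars(rest, &name[1..]),
    }
}

// True if `name` (a single path component) matches `pattern`.
pub fn segment_matches(pattern: &str, name: &str) -> bool {
    let name: Vec<char> = name.chars().collect();
    expand_braces(pattern).iter().any(|p| {
        let p: Vec<char> = p.chars().collect();
        match_chars(&p, &name)
    })
}
```

Expanding braces up front keeps the character-level matcher simple, at the cost of trying each alternative separately.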

AI Usage Statement

Implemented with assistance from Claude (Opus 4.7) via opencode. All code reviewed and tested locally; 24 unit/integration tests added against services::Memory (including a **-zero regression for the consecutive-DoubleStar case).

Copilot AI review requested due to automatic review settings May 28, 2026 09:21
@chitralverma chitralverma requested a review from Xuanwo as a code owner May 28, 2026 09:21
@dosubot added the size:XXL (this PR changes 1000+ lines, ignoring generated files) and releases-note/feat (the PR implements a new feature or has a title that begins with "feat") labels May 28, 2026
@chitralverma chitralverma changed the title feat(core): add list_with_glob via GlobLayer (RFC-6209 guided traversal) feat(core): support list_with_glob via GlobLayer May 28, 2026
…ield

Both java and python bindings constructed core ListOptions with full
field literals, breaking when the new glob field was added. Use
..Default::default() spread to remain forward-compatible with future
ListOptions additions.
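The forward-compatibility point in this commit message can be shown with Rust's struct update syntax. The `ListOptions` below is a stand-in defined only for this sketch (it is not OpenDAL's real struct), and `from_binding` is a hypothetical conversion function.

```rust
// Stand-in options struct for illustration; the field set is an assumption.
#[derive(Debug, Default, Clone, PartialEq)]
struct ListOptions {
    recursive: bool,
    versions: bool,
    deleted: bool,
    glob: Option<String>, // a newly added field
}

// Hypothetical binding-side conversion. Because the remaining fields come
// from `Default`, adding another field to `ListOptions` later does not
// break this function.
fn from_binding(recursive: Option<bool>, versions: Option<bool>) -> ListOptions {
    ListOptions {
        recursive: recursive.unwrap_or_default(),
        versions: versions.unwrap_or_default(),
        ..Default::default()
    }
}
```

A full field literal would instead fail to compile as soon as a field like `glob` is introduced, which is the CI breakage described above.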

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Comment thread bindings/java/src/lib.rs
recursive: convert::read_bool_field(env, options, "recursive").unwrap_or_default(),
versions: convert::read_bool_field(env, options, "versions").unwrap_or_default(),
deleted: convert::read_bool_field(env, options, "deleted").unwrap_or_default(),
..Default::default()
Contributor Author

fixes breaking ci

recursive: opts.recursive.unwrap_or(false),
versions: opts.versions.unwrap_or(false),
deleted: opts.deleted.unwrap_or(false),
..Default::default()
Contributor Author

fixes breaking ci

compile_segments now folds consecutive literal segments at the leading
edge of a pattern into a single Literal variant. This lets guided
traversal walk the prefix in one frame-loop iteration instead of one
per segment, which is the common case (e.g. a/b/c/**/*.json).

Literals appearing after a Glob or DoubleStar are kept as separate
single-component Literal segments so downstream basename matching in
handle_active_entry remains unambiguous.

Adds unit tests for the fold and an e2e test exercising a long literal
prefix followed by a recursive glob.
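The literal-prefix fold described in this commit message can be sketched as follows. The `Segment` enum and this `compile_segments` body are assumptions reconstructed from the description, not the PR's code; the glob-character check is deliberately simplistic.

```rust
// Hypothetical segment type; variant names follow the commit message.
#[derive(Debug, PartialEq)]
enum Segment {
    Literal(String),
    Glob(String),
    DoubleStar,
}

fn compile_segments(pattern: &str) -> Vec<Segment> {
    let mut out: Vec<Segment> = Vec::new();
    // True until the first Glob or DoubleStar is seen.
    let mut in_prefix = true;
    for part in pattern.split('/').filter(|p| !p.is_empty()) {
        if part == "**" {
            in_prefix = false;
            out.push(Segment::DoubleStar);
            continue;
        }
        if part.contains(|c: char| "*?[{".contains(c)) {
            in_prefix = false;
            out.push(Segment::Glob(part.to_string()));
            continue;
        }
        if in_prefix {
            // Fold consecutive leading literals into one multi-component Literal.
            if let Some(Segment::Literal(prefix)) = out.last_mut() {
                prefix.push('/');
                prefix.push_str(part);
                continue;
            }
        }
        // First literal of the prefix, or a literal after a glob: keep it as
        // its own single-component segment.
        out.push(Segment::Literal(part.to_string()));
    }
    out
}
```

With this shape, `a/b/c/**/*.json` yields a single `Literal("a/b/c")` followed by `DoubleStar` and `Glob("*.json")`, while the literals in `a/*/b/c` that follow the glob stay separate.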
Same fix as java/python: replace full struct literal with
..Default::default() spread so future ListOptions additions don't break
the binding.
@chitralverma chitralverma force-pushed the feat/oplist-glob-capability branch from 8bb6be3 to fd3052b Compare May 28, 2026 18:54
@chitralverma chitralverma requested a review from suyanhanx as a code owner May 28, 2026 18:54
recursive: value.recursive.unwrap_or_default(),
versions: value.versions.unwrap_or_default(),
deleted: value.deleted.unwrap_or_default(),
..Default::default()
Contributor Author

fixes ci

@chitralverma
Contributor Author

@Xuanwo please have a look when you can.

Development

Successfully merging this pull request may close these issues.

new feature: Efficient Client-Side Glob Implementation via Guided Traversal