Skip to content

feat: add AI skill to find and improve the Pythonic interface to functions#1484

Draft
timsaucer wants to merge 4 commits intoapache:mainfrom
timsaucer:feat/pythonic-skill
Draft

feat: add AI skill to find and improve the Pythonic interface to functions#1484
timsaucer wants to merge 4 commits intoapache:mainfrom
timsaucer:feat/pythonic-skill

Conversation

@timsaucer
Copy link
Copy Markdown
Member

Which issue does this PR close?

None

Rationale for this change

This adds an AI agent skill that can be used to search the repository and identify cases where we can make our interface more intuitive to users. Attached is also the diff recommended when using this skill in coordination with our existing agent directives about how to write functions.

What changes are included in this PR?

Add skill for searching repository for functions, investigating their upstream equivalent, and update the function inputs where appropriate.

I ran the skill and updated many function signatures.

Are there any user-facing changes?

Improved type hints and inputs allowed in Python.

timsaucer and others added 3 commits April 9, 2026 11:44
…uiring lit()

Update 47 functions in functions.py to accept native Python types (int, float,
str) for arguments that are contextually literals, eliminating verbose lit()
wrapping. For example, users can now write split_part(col("a"), ",", 2) instead
of split_part(col("a"), lit(","), lit(2)). All changes are backward compatible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ions

Update instr and position (aliases of strpos) to accept Expr | str for
the substring parameter, matching the updated primary function signature.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Alias functions that delegate to a primary function must have their type
hints updated to match, even though coercion logic is only added to the
primary. Added a new Step 3 to the implementation workflow for this.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@timsaucer
Copy link
Copy Markdown
Member Author

Since @kevinjqliu asked about it regarding the last skill I wrote, here is an export of the chat history:
chat-export-2026-04-09.md

One of the important things to note in the chat history is that I intentionally exited the session and started a new session so that the skill would be applied without prior context. Then as I reviewed the code it generated I gave it feedback on the fact that it missed the aliases. And so then I had the agent update the skill it was using.

I've found that this has to be an iterative process. The next step I'm going to take is to start a fresh session and have it review this PR, both the skill and the updates it makes. I'll keep iterating on the process of having the agent and myself review the code suggestions and update the skill.

@timsaucer timsaucer requested a review from Copilot April 9, 2026 16:05
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR adds a new AI “make-pythonic” skill and applies its recommendations to make datafusion-python function wrappers accept native Python literals (e.g., str, int, float) in places where callers previously had to wrap values with lit().

Changes:

  • Added a reusable AI skill definition documenting how to audit and “pythonicize” function signatures.
  • Updated many python/datafusion/functions.py APIs to accept native types and internally coerce them to Expr.literal(...).
  • Updated doctest examples to demonstrate the simplified calling convention.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 9 comments.

File Description
python/datafusion/functions.py Broad signature + coercion updates to accept native Python types; doctest examples updated accordingly.
.ai/skills/make-pythonic/SKILL.md New skill documentation describing how to identify and implement pythonic argument coercions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.



def log(base: Expr, num: Expr) -> Expr:
def log(base: Expr | float, num: Expr) -> Expr:
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new type hints for log, power, and pow only allow float for base/exponent, but the coercion (Expr.literal(...)) will also accept int cleanly and the upstream signatures typically coerce “numeric”/“integer” for these parameters. To avoid incorrect/overly-restrictive typing for users and type-checkers, widen these hints to Expr | int | float (or at least include int) for the numeric literal parameters.

Suggested change
def log(base: Expr | float, num: Expr) -> Expr:
def log(base: Expr | int | float, num: Expr) -> Expr:

Copilot uses AI. Check for mistakes.
Comment on lines +1212 to 1214
if not isinstance(base, Expr):
base = Expr.literal(base)
return Expr(f.log(base.expr, num.expr))
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new type hints for log, power, and pow only allow float for base/exponent, but the coercion (Expr.literal(...)) will also accept int cleanly and the upstream signatures typically coerce “numeric”/“integer” for these parameters. To avoid incorrect/overly-restrictive typing for users and type-checkers, widen these hints to Expr | int | float (or at least include int) for the numeric literal parameters.

Copilot uses AI. Check for mistakes.


def power(base: Expr, exponent: Expr) -> Expr:
def power(base: Expr, exponent: Expr | float) -> Expr:
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new type hints for log, power, and pow only allow float for base/exponent, but the coercion (Expr.literal(...)) will also accept int cleanly and the upstream signatures typically coerce “numeric”/“integer” for these parameters. To avoid incorrect/overly-restrictive typing for users and type-checkers, widen these hints to Expr | int | float (or at least include int) for the numeric literal parameters.

Copilot uses AI. Check for mistakes.
Comment on lines +1446 to +1451
if not isinstance(exponent, Expr):
exponent = Expr.literal(exponent)
return Expr(f.power(base.expr, exponent.expr))


def pow(base: Expr, exponent: Expr) -> Expr:
def pow(base: Expr, exponent: Expr | float) -> Expr:
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new type hints for log, power, and pow only allow float for base/exponent, but the coercion (Expr.literal(...)) will also accept int cleanly and the upstream signatures typically coerce “numeric”/“integer” for these parameters. To avoid incorrect/overly-restrictive typing for users and type-checkers, widen these hints to Expr | int | float (or at least include int) for the numeric literal parameters.

Copilot uses AI. Check for mistakes.
Comment on lines +1687 to 1706
if not isinstance(regex, Expr):
regex = Expr.literal(regex)

def _to_raw(val: Any) -> Any:
if val is None:
return None
if not isinstance(val, Expr):
val = Expr.literal(val)
return val.expr

return Expr(
f.regexp_instr(
values.expr,
regex.expr,
start,
n,
flags,
sub_expr,
_to_raw(start),
_to_raw(n),
_to_raw(flags),
_to_raw(sub_expr),
)
)
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regexp_instr introduces a nested _to_raw helper, which is inconsistent with the inline coercion approach used throughout the rest of this file (and conflicts with the newly-added skill guidance that explicitly says not to introduce helper functions for coercion). Recommend expanding this to explicit inline conversions per-argument (similar to regexp_count / string_to_array) so the coercion is uniform and easier to scan/review. Also, the use of Any here requires Any to be in scope for annotation evaluation; avoiding Any (or ensuring it’s imported / annotations are deferred) prevents potential runtime issues.

Copilot uses AI. Check for mistakes.
Comment on lines +2362 to +2367
def date_part(part: Expr | str, date: Expr) -> Expr:
"""Extracts a subfield from the date.

Args:
part: The part of the date to extract. Must be one of ``"year"``,
``"month"``, ``"day"``, ``"hour"``, ``"minute"``, ``"second"``, etc.
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstrings describe part as an enum-like literal (e.g., "year", "month", …), but the signatures still accept Expr | str. If Expr is supported only for backward compatibility (and/or the Rust implementation requires a literal scalar), please document that expectation explicitly (e.g., “must be a literal scalar string; column expressions are not supported”) to avoid misleading users. Alternatively (more breaking), consider narrowing the type to str if non-literal expressions are not meaningful here.

Copilot uses AI. Check for mistakes.
Comment on lines +2379 to 2381
if not isinstance(part, Expr):
part = Expr.literal(part)
return Expr(f.date_part(part.expr, date.expr))
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstrings describe part as an enum-like literal (e.g., "year", "month", …), but the signatures still accept Expr | str. If Expr is supported only for backward compatibility (and/or the Rust implementation requires a literal scalar), please document that expectation explicitly (e.g., “must be a literal scalar string; column expressions are not supported”) to avoid misleading users. Alternatively (more breaking), consider narrowing the type to str if non-literal expressions are not meaningful here.

Copilot uses AI. Check for mistakes.


def date_trunc(part: Expr, date: Expr) -> Expr:
def date_trunc(part: Expr | str, date: Expr) -> Expr:
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstrings describe part as an enum-like literal (e.g., "year", "month", …), but the signatures still accept Expr | str. If Expr is supported only for backward compatibility (and/or the Rust implementation requires a literal scalar), please document that expectation explicitly (e.g., “must be a literal scalar string; column expressions are not supported”) to avoid misleading users. Alternatively (more breaking), consider narrowing the type to str if non-literal expressions are not meaningful here.

Copilot uses AI. Check for mistakes.
Comment on lines +2411 to 2413
if not isinstance(part, Expr):
part = Expr.literal(part)
return Expr(f.date_trunc(part.expr, date.expr))
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstrings describe part as an enum-like literal (e.g., "year", "month", …), but the signatures still accept Expr | str. If Expr is supported only for backward compatibility (and/or the Rust implementation requires a literal scalar), please document that expectation explicitly (e.g., “must be a literal scalar string; column expressions are not supported”) to avoid misleading users. Alternatively (more breaking), consider narrowing the type to str if non-literal expressions are not meaningful here.

Copilot uses AI. Check for mistakes.
Update SKILL.md to prevent three classes of issues: clarify that float
already accepts int per PEP 484 (avoiding redundant int | float that
fails ruff PYI041), add backward-compat rule for Category B so existing
Expr params aren't removed, and add guidance for inline coercion with
many optional nullable params instead of local helpers.

Replace regexp_instr's _to_raw() helper with inline coercion matching
the pattern used throughout the rest of the file.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants