minor: remove deprecated interfaces #1481

Open
timsaucer wants to merge 14 commits into apache:main from timsaucer:minor/remove-deprecated

@timsaucer
Member

Which issue does this PR close?

None

Rationale for this change

We have a bunch of stale interfaces, some of which have been marked as deprecated for over 10 releases. This removes any that have been deprecated for more than 3 releases.

What changes are included in this PR?

Remove deprecated interfaces.

Are there any user-facing changes?

Yes, these old interfaces are no longer available.

timsaucer self-assigned this on Apr 7, 2026
timsaucer marked this pull request as ready for review on April 8, 2026, 13:33

Contributor

Copilot AI left a comment

Pull request overview

This PR removes a set of long-deprecated Python and Rust-facing interfaces in the DataFusion Python bindings, and updates examples/docs to use the supported APIs (notably migrating window usage to Expr.over(Window(...))).

Changes:

  • Remove deprecated Python modules/methods/classes (e.g., datafusion.html_formatter, datafusion.udf module alias, Expr.display_name, functions.window, SessionContext.tables, Catalog.database, etc.).
  • Remove deprecated Rust/PyO3 bindings that backed those Python APIs (e.g., the window() pyfunction and deprecated DataFrame methods).
  • Update docs and TPCH examples to use the non-deprecated window/formatter APIs.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| python/tests/test_expr.py | Removes test coverage for the deleted Expr.display_name() deprecation path. |
| python/datafusion/udf.py | Deletes deprecated datafusion.udf module shim. |
| python/datafusion/substrait.py | Removes deprecated lowercase alias classes (plan, serde, producer, consumer). |
| python/datafusion/html_formatter.py | Deletes deprecated datafusion.html_formatter module shim. |
| python/datafusion/functions.py | Removes deprecated functions.window() API and related exports/imports. |
| python/datafusion/expr.py | Removes deprecated Expr.display_name() method and decorator import usage. |
| python/datafusion/dataframe.py | Removes deprecated select_columns() and unnest_column() Python methods. |
| python/datafusion/dataframe_formatter.py | Updates docstring examples to use datafusion.dataframe_formatter instead of datafusion.html_formatter. |
| python/datafusion/context.py | Removes deprecated RuntimeConfig, from_arrow_table(), and tables() Python APIs. |
| python/datafusion/catalog.py | Removes deprecated Catalog.database() and Database alias class. |
| python/datafusion/__init__.py | Stops exporting Database from the public top-level API. |
| examples/tpch/q22_global_sales_opportunity.py | Migrates aggregate window usage from F.window(...) to F.avg(...).over(Window(...)). |
| examples/tpch/q17_small_quantity_order.py | Migrates aggregate window usage from F.window(...) to F.avg(...).over(Window(...)). |
| examples/tpch/q15_top_supplier.py | Migrates aggregate window usage from F.window(...) to F.max(...).over(Window(...)). |
| examples/tpch/q11_important_stock_identification.py | Migrates aggregate window usage from F.window(...) to F.sum(...).over(Window(...)). |
| examples/tpch/q02_minimum_cost_supplier.py | Migrates aggregate window usage from F.window(...) to F.min(...).over(Window(...)). |
| docs/source/user-guide/common-operations/windows.rst | Updates documentation to demonstrate windowing aggregates via .over(Window(...)) rather than functions.window(). |
| crates/core/src/functions.rs | Removes deprecated window PyO3 function and its helper lookup logic. |
| crates/core/src/dataframe.rs | Removes deprecated select_columns/unnest_column bindings; switches __getitem__ to select_exprs. |
| crates/core/src/context.rs | Removes deprecated tables() binding implementation. |

Comment on lines 750 to 754
      Example:
-         >>> from datafusion.html_formatter import get_formatter
+         >>> from datafusion.dataframe_formatter import get_formatter
          >>> formatter = get_formatter()
          >>> formatter.max_cell_length = 50  # Increase cell length
      """

Copilot AI Apr 8, 2026

The docs still reference the removed datafusion.html_formatter module (e.g. docs/source/user-guide/dataframe/rendering.rst), but this PR deletes python/datafusion/html_formatter.py. Update those documentation imports/usages to datafusion.dataframe_formatter (or keep a small compatibility shim) to avoid broken docs/examples.

timsaucer mentioned this pull request on Apr 8, 2026 (14 tasks)
Contributor

@ntjohnson1 ntjohnson1 left a comment

I basically just verified that everything removed was marked as deprecated. I didn't check the exact deprecation dates.

Contributor

@nuno-faria nuno-faria left a comment

Thanks @timsaucer.

Comment on lines -557 to -563
    #[pyo3(signature = (*args))]
    fn select_columns(&self, args: Vec<PyBackedStr>) -> PyDataFusionResult<Self> {
        let args = args.iter().map(|s| s.as_ref()).collect::<Vec<&str>>();
        let df = self.df.as_ref().clone().select_columns(&args)?;
        Ok(Self::new(df))
    }

Contributor

Strange that this is deprecated in datafusion-python but not in datafusion: https://docs.rs/datafusion/53.0.0/datafusion/dataframe/struct.DataFrame.html#method.select_columns

Member Author

Because select can take both kinds of arguments (column names and expressions), we deprecated select_columns a long time ago: it's more Pythonic to have just one interface.

Comment on lines +189 to +196
f.avg(col('"Attack"')).over(
    Window(
        window_frame=WindowFrame("rows", None, None),
        partition_by=[col('"Type 1"')],
        order_by=[col('"Speed"')],
        null_treatment=NullTreatment.IGNORE_NULLS,
    )
).alias("Average Attack"),

Contributor

I don't think the order_by, window_frame, or null_treatment make sense here.

Member Author

The window_frame is necessary so that we get the same avg across the entire partition. Otherwise we'd need a sort on it afterwards to make sure it shows up in the "running avg", and I think using the frame is easier to understand. But you're right about the null_treatment and order_by.
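
One way to read the distinction being discussed is the plain-Python sketch below, which does not use datafusion at all. It assumes the usual SQL convention that an ordered window without an explicit frame ends at the current row, which gives a running average, while a frame that is unbounded on both sides gives every row the average of its whole partition. The rows and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (partition key, order key, value), e.g. (type, speed, attack).
ROWS = [
    ("fire", 60, 50),
    ("fire", 90, 70),
    ("fire", 100, 90),
    ("water", 40, 40),
    ("water", 80, 60),
]


def running_avg(rows):
    """Average over 'partition start .. current row', walking rows in order-key order."""
    seen = defaultdict(list)
    result = {}
    for key, order, value in sorted(rows):
        seen[key].append(value)
        result[(key, order)] = sum(seen[key]) / len(seen[key])
    return result


def whole_partition_avg(rows):
    """Average over the full partition, i.e. a frame unbounded on both sides."""
    groups = defaultdict(list)
    for key, _, value in rows:
        groups[key].append(value)
    return {(key, order): sum(groups[key]) / len(groups[key]) for key, order, _ in rows}


running = running_avg(ROWS)
whole = whole_partition_avg(ROWS)
```

With the running version, the value depends on where a row falls in the ordering. With the whole-partition version, every row in a partition gets the same number, which matches the "same avg across the entire partition" behavior described in the reply.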
