…been deprecated since DF48.
Pull request overview
This PR removes a set of long-deprecated Python and Rust-facing interfaces in the DataFusion Python bindings, and updates examples/docs to use the supported APIs (notably migrating window usage to Expr.over(Window(...))).
Changes:
- Remove deprecated Python modules/methods/classes (e.g., datafusion.html_formatter, datafusion.udf module alias, Expr.display_name, functions.window, SessionContext.tables, Catalog.database, etc.).
- Remove deprecated Rust/PyO3 bindings that backed those Python APIs (e.g., the window() pyfunction and deprecated DataFrame methods).
- Update docs and TPCH examples to use the non-deprecated window/formatter APIs.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| python/tests/test_expr.py | Removes test coverage for the deleted Expr.display_name() deprecation path. |
| python/datafusion/udf.py | Deletes deprecated datafusion.udf module shim. |
| python/datafusion/substrait.py | Removes deprecated lowercase alias classes (plan, serde, producer, consumer). |
| python/datafusion/html_formatter.py | Deletes deprecated datafusion.html_formatter module shim. |
| python/datafusion/functions.py | Removes deprecated functions.window() API and related exports/imports. |
| python/datafusion/expr.py | Removes deprecated Expr.display_name() method and decorator import usage. |
| python/datafusion/dataframe.py | Removes deprecated select_columns() and unnest_column() Python methods. |
| python/datafusion/dataframe_formatter.py | Updates docstring examples to use datafusion.dataframe_formatter instead of datafusion.html_formatter. |
| python/datafusion/context.py | Removes deprecated RuntimeConfig, from_arrow_table(), and tables() Python APIs. |
| python/datafusion/catalog.py | Removes deprecated Catalog.database() and Database alias class. |
| python/datafusion/__init__.py | Stops exporting Database from the public top-level API. |
| examples/tpch/q22_global_sales_opportunity.py | Migrates aggregate window usage from F.window(...) to F.avg(...).over(Window(...)). |
| examples/tpch/q17_small_quantity_order.py | Migrates aggregate window usage from F.window(...) to F.avg(...).over(Window(...)). |
| examples/tpch/q15_top_supplier.py | Migrates aggregate window usage from F.window(...) to F.max(...).over(Window(...)). |
| examples/tpch/q11_important_stock_identification.py | Migrates aggregate window usage from F.window(...) to F.sum(...).over(Window(...)). |
| examples/tpch/q02_minimum_cost_supplier.py | Migrates aggregate window usage from F.window(...) to F.min(...).over(Window(...)). |
| docs/source/user-guide/common-operations/windows.rst | Updates documentation to demonstrate windowing aggregates via .over(Window(...)) rather than functions.window(). |
| crates/core/src/functions.rs | Removes deprecated window PyO3 function and its helper lookup logic. |
| crates/core/src/dataframe.rs | Removes deprecated select_columns/unnest_column bindings; switches __getitem__ to select_exprs. |
| crates/core/src/context.rs | Removes deprecated tables() binding implementation. |
      Example:
    - >>> from datafusion.html_formatter import get_formatter
    + >>> from datafusion.dataframe_formatter import get_formatter
      >>> formatter = get_formatter()
      >>> formatter.max_cell_length = 50 # Increase cell length
      """
The docs still reference the removed datafusion.html_formatter module (e.g. docs/source/user-guide/dataframe/rendering.rst), but this PR deletes python/datafusion/html_formatter.py. Update those documentation imports/usages to datafusion.dataframe_formatter (or keep a small compatibility shim) to avoid broken docs/examples.
Here is the referenced page: https://datafusion.apache.org/python/user-guide/dataframe/rendering.html
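For reference, the "small compatibility shim" option suggested above could look roughly like the sketch below. It is only an illustration under stated assumptions: the module names `demo_html_formatter` and `demo_dataframe_formatter` and the helper `install_deprecated_alias` are hypothetical stand-ins, not the shim this PR deletes and not part of datafusion-python. It uses only the standard library (`warnings` plus the PEP 562 module-level `__getattr__` hook).

```python
import sys
import types
import warnings


def install_deprecated_alias(old_name, new_module):
    """Register `old_name` in sys.modules as a warning alias for `new_module`."""
    shim = types.ModuleType(old_name)

    def __getattr__(attr):
        # Let import-machinery probes for dunder names (e.g. __path__) fail quietly.
        if attr.startswith("__"):
            raise AttributeError(attr)
        warnings.warn(
            f"{old_name} is deprecated; use {new_module.__name__} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(new_module, attr)

    # PEP 562: a module-level __getattr__ handles names the module lacks.
    shim.__getattr__ = __getattr__
    sys.modules[old_name] = shim
    return shim


# Hypothetical stand-in for the new module that holds the real implementation.
new_formatter = types.ModuleType("demo_dataframe_formatter")
new_formatter.get_formatter = lambda: "formatter"

install_deprecated_alias("demo_html_formatter", new_formatter)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The old import path still resolves, but emits a DeprecationWarning.
    from demo_html_formatter import get_formatter
```

With a shim like this, old imports keep working while warning users; the alternative is what the comment asks for first, namely updating the documentation imports to the new module.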
ntjohnson1 left a comment
I basically just verified that everything removed was marked as deprecated. I didn't check the exact deprecation dates.
    #[pyo3(signature = (*args))]
    fn select_columns(&self, args: Vec<PyBackedStr>) -> PyDataFusionResult<Self> {
        let args = args.iter().map(|s| s.as_ref()).collect::<Vec<&str>>();
        let df = self.df.as_ref().clone().select_columns(&args)?;
        Ok(Self::new(df))
    }
Strange that this is deprecated in datafusion-python but not in datafusion: https://docs.rs/datafusion/53.0.0/datafusion/dataframe/struct.DataFrame.html#method.select_columns
Because select can take both kinds of arguments, select_columns is redundant. We deprecated select_columns a long time ago because it's more Pythonic to have just one interface.
    f.avg(col('"Attack"')).over(
        Window(
            window_frame=WindowFrame("rows", None, None),
            partition_by=[col('"Type 1"')],
            order_by=[col('"Speed"')],
            null_treatment=NullTreatment.IGNORE_NULLS,
        )
    ).alias("Average Attack"),
I don't think the order_by, window_frame, or null_treatment make sense here.
The window_frame is necessary so that we get the same avg across the entire partition. Otherwise we'd need a sort on it afterwards to make sure it shows up in the "running avg", and I think using the frame is easier to understand. But you're right about the null treatment and order_by.
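To make the distinction in this thread concrete, here is a minimal pure-Python sketch of the two behaviours being discussed: an average over the whole partition (what an unbounded frame gives) versus a running average in sort order. It does not call DataFusion at all. The helper names and the sample rows are made up for illustration, and the running version assumes distinct values in the ordering column, so it sidesteps how ties are handled.

```python
from collections import defaultdict


def partition_avg(rows, key, value):
    """Average of `value` over each whole partition, repeated on every row."""
    totals = defaultdict(lambda: [0.0, 0])  # partition -> [sum, count]
    for row in rows:
        acc = totals[row[key]]
        acc[0] += row[value]
        acc[1] += 1
    return [totals[row[key]][0] / totals[row[key]][1] for row in rows]


def running_avg(rows, key, value, order):
    """Average from the start of the partition up to the current row, in `order`."""
    result = [None] * len(rows)
    state = defaultdict(lambda: [0.0, 0])  # partition -> [sum so far, count so far]
    for idx in sorted(range(len(rows)), key=lambda i: rows[i][order]):
        acc = state[rows[idx][key]]
        acc[0] += rows[idx][value]
        acc[1] += 1
        result[idx] = acc[0] / acc[1]
    return result


# Made-up sample data, loosely modelled on the columns in the snippet above.
rows = [
    {"type": "Fire", "speed": 65, "attack": 52},
    {"type": "Fire", "speed": 80, "attack": 64},
    {"type": "Water", "speed": 43, "attack": 48},
]

whole = partition_avg(rows, "type", "attack")
running = running_avg(rows, "type", "attack", "speed")
```

Here `whole` carries the same value on every row of a partition, while `running` changes row by row within the "Fire" partition, which is the difference the explicit frame is meant to remove.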
…he deprecated html_formatter
Which issue does this PR close?
None
Rationale for this change
We have a bunch of stale interfaces, some of which have been marked as deprecated for over 10 releases. This removes any that have been deprecated for more than 3 releases.
What changes are included in this PR?
Remove deprecated interfaces.
Are there any user-facing changes?
Yes, these old interfaces are no longer available.