rowLevelResultsAsDataFrame returns all True on multi-partition DataFrames

## Bug

`VerificationResult.rowLevelResultsAsDataFrame()` returns all `true` Boolean values for row-level check columns when the input DataFrame has multiple partitions (e.g., read from a Delta table with `spark.table()`).

## Reproduction

1. Read a DataFrame from a persisted Delta table (~200K rows, multiple partitions).
2. Run a `VerificationSuite` with an `isUnique` constraint. The aggregate result correctly reports uniqueness < 1.0.
3. Call `VerificationResult.rowLevelResultsAsDataFrame(spark, result, data)`.
4. The Boolean column is `true` for every row, so no rows are flagged as failures.

A minimal in-memory DataFrame (e.g., 4 rows with `spark.createDataFrame`) works correctly.
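
The steps above can be sketched roughly as follows. This is an illustration, not the exact code that produced the failure: the function names `reproduce` and `count_failures`, the table name, the key column, and the check description are hypothetical, and the `rowLevelResultsAsDataFrame` call uses the signature quoted in step 3.

```python
def reproduce(spark, table_name, key_column):
    """Run an isUnique check and return (aggregate results, row-level results)."""
    # Imported lazily so this module can be loaded without Spark or pydeequ installed.
    from pydeequ.checks import Check, CheckLevel
    from pydeequ.verification import VerificationResult, VerificationSuite

    data = spark.table(table_name)  # multi-partition input, as in step 1
    check = Check(spark, CheckLevel.Error, "uniqueness").isUnique(key_column)
    result = VerificationSuite(spark).onData(data).addCheck(check).run()

    aggregate = VerificationResult.checkResultsAsDataFrame(spark, result)
    # Call signature as quoted in step 3 of this report.
    row_level = VerificationResult.rowLevelResultsAsDataFrame(spark, result, data)
    return aggregate, row_level


def count_failures(row_level, flag_column):
    """Count rows whose Boolean row-level column is false."""
    # The caller supplies the column name; inspect row_level.columns to find it.
    return row_level.filter(f"NOT `{flag_column}`").count()
```

With the bug present, `count_failures` would return 0 even though the aggregate result reports uniqueness below 1.0.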

## Expected behavior

Duplicate rows should have `false` in the row-level Boolean column.
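
To make the expectation concrete, here is a small plain-Python sketch of the flags for a single column. It assumes that every row whose value occurs more than once counts as a duplicate (including the first occurrence); the helper name `expected_unique_flags` is hypothetical and is not part of pydeequ.

```python
from collections import Counter


def expected_unique_flags(values):
    """Return True for values that occur exactly once, False otherwise."""
    counts = Counter(values)
    return [counts[v] == 1 for v in values]


flags = expected_unique_flags(["a", "b", "a", "c"])
# "a" occurs twice, so both of its rows are flagged False:
# [False, True, False, True]
```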

## Workaround

Calling `df.repartition(1)` on the DataFrame before passing it to the `VerificationSuite` produces correct row-level results. This suggests a row-ordering or partition-alignment issue in the underlying Deequ JVM method.
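
A minimal sketch of the workaround, assuming the same repartitioned DataFrame is used for both the suite run and the row-level call. The helper `run_on_single_partition` and the `run_suite` callable are hypothetical names introduced for illustration.

```python
def run_on_single_partition(data, run_suite):
    """Collapse `data` to one partition and run the checks on that DataFrame.

    `run_suite` is a hypothetical callable that runs the VerificationSuite
    and the row-level call on the DataFrame it receives.
    """
    single = data.repartition(1)  # shuffles all rows into one partition
    return single, run_suite(single)
```

Because `repartition(1)` moves all rows into a single partition, downstream work on that DataFrame runs as a single task. That may be acceptable at ~200K rows but is unlikely to scale to much larger tables.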

## Related

- Root cause issue filed on awslabs/deequ: https://github.com/awslabs/deequ/issues/753

## Environment

- Spark 3.5 (Microsoft Fabric)
- pydeequ (latest)
- Delta Lake table source
