
fix(doc-tests): closes #244 — auto-downsample retina 2× screenshots before SSIM #298

Merged
proggeramlug merged 1 commit into main from fix/issue-244 on Apr 29, 2026


@proggeramlug
Contributor

Summary

  • Root cause (crates/perry-doc-tests/src/image_diff.rs:35-43): the screenshot comparison hard-errored on any size mismatch before SSIM ran. On retina macOS the gallery binary captures at 2× backing scale (e.g. 1800×1940), but the blessed baseline is 1× (900×970). The strict equality check returned Err("size mismatch") immediately, which surfaced as a SCREENSHOT_DIFF failure for 30+ consecutive commits.
  • Fix: when the actual screenshot is exactly 2× the baseline in both dimensions, downsample it with a 2×2 box filter (halve()) before SSIM. After downsampling, the resulting 1× image matches the 1× baseline well within the existing 0.05 SSIM threshold. Any other size mismatch still returns Err, so genuine resize regressions are still caught.
  • Unit tests (3 new, cargo test -p perry-doc-tests → 10/10):
    • halve_averages_2x2_blocks — verifies box-filter arithmetic
    • diff_retina_2x_against_1x_baseline_passes — the exact retina repro shape (4×4 actual vs 2×2 baseline)
    • diff_arbitrary_size_mismatch_errors — non-2× mismatches still produce a size-mismatch error
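
For readers unfamiliar with box filtering, here is a minimal sketch of what a 2×2 box-filter downsampler like halve() might look like. It is not the code from this PR: the Rgba8 struct is a hypothetical stand-in for whatever image type image_diff.rs actually uses, and rounding to nearest (rather than truncating) is an assumption.

```rust
/// Hypothetical stand-in for an RGBA8 image buffer (not the PR's real type).
#[derive(Debug, Clone, PartialEq)]
struct Rgba8 {
    width: usize,
    height: usize,
    data: Vec<u8>, // row-major, 4 bytes per pixel
}

/// Downsample by 2 in each dimension by averaging every 2x2 block per channel.
/// Assumes the caller has already checked that width and height are even.
fn halve(src: &Rgba8) -> Rgba8 {
    let (w, h) = (src.width / 2, src.height / 2);
    let mut data = Vec::with_capacity(w * h * 4);
    for y in 0..h {
        for x in 0..w {
            for c in 0..4 {
                // Read channel `c` of the source pixel at (px, py).
                let at = |px: usize, py: usize| {
                    src.data[(py * src.width + px) * 4 + c] as u32
                };
                let sum = at(2 * x, 2 * y)
                    + at(2 * x + 1, 2 * y)
                    + at(2 * x, 2 * y + 1)
                    + at(2 * x + 1, 2 * y + 1);
                // Assumption: round to nearest; the real halve() may truncate.
                data.push(((sum + 2) / 4) as u8);
            }
        }
    }
    Rgba8 { width: w, height: h, data }
}
```

Under these assumptions, a 1800×1940 capture would come out as 900×970, which is the size the baseline comparison expects.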

Files changed

| File | Change |
| --- | --- |
| crates/perry-doc-tests/src/image_diff.rs | Add halve() downsampler; update diff() to use it on 2× captures; add #[derive(Debug)] to DiffOutcome; 3 unit tests |
| crates/perry-doc-tests/Cargo.toml | Add tempfile = "3" as dev-dependency (for unit tests only) |

Test plan

  • cargo test --release -p perry-doc-tests → 10/10 passed (was 7/10)
  • On macOS retina hardware: ./target/release/doc-tests --skip-xcompile → 80/80 passed (was 79/80)

Closes #244

https://claude.ai/code/session_01FVjeQDnyymLz7yTJbnsyKG


Generated by Claude Code

…efore SSIM

The screenshot comparison in `image_diff::diff()` hard-errored on any
size mismatch before SSIM ever ran.  On macOS retina hardware the
gallery binary captures a 2× PNG (e.g. 1800×1940) while the blessed
baseline is 1× (900×970); the strict `actual.width() != baseline.width()`
check returned `Err("size mismatch")` immediately, producing a recurring
SCREENSHOT_DIFF failure for 30+ commits.

Fix: when the actual screenshot is exactly 2× the baseline in both
dimensions, downsample with a 2×2 box-filter (`halve()`) before SSIM.
Any other size mismatch still returns an error (so genuine resize
regressions are still caught). A downsampled 1× image compared with
the existing 1× baseline falls well within the 0.05 SSIM threshold.
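
The size gate described above could be sketched roughly as follows. This is an illustration only: `SizeCheck` and `check_size` are hypothetical names, and the real `diff()` may structure the check differently.

```rust
/// How the actual capture relates to the baseline in size (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum SizeCheck {
    /// Dimensions match; compare directly.
    Same,
    /// Actual is exactly 2x the baseline in both dimensions; halve first.
    Retina2x,
}

/// Classify the size relationship, or return an error for any other mismatch.
fn check_size(actual: (u32, u32), baseline: (u32, u32)) -> Result<SizeCheck, String> {
    let (aw, ah) = actual;
    let (bw, bh) = baseline;
    if (aw, ah) == (bw, bh) {
        Ok(SizeCheck::Same)
    } else if bw.checked_mul(2) == Some(aw) && bh.checked_mul(2) == Some(ah) {
        Ok(SizeCheck::Retina2x)
    } else {
        // Anything else (3x, one dimension off, etc.) is still a hard error.
        Err(format!(
            "size mismatch: actual {aw}x{ah}, baseline {bw}x{bh}"
        ))
    }
}
```

With this shape, `check_size((1800, 1940), (900, 970))` takes the downsample path, while a capture that is 2× in only one dimension is still rejected.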

Added `#[derive(Debug)]` to `DiffOutcome` (needed for `unwrap_err()` in
tests) and 3 unit tests pinning the new behaviour:
- `halve_averages_2x2_blocks` — verifies box-filter arithmetic
- `diff_retina_2x_against_1x_baseline_passes` — the exact retina repro
- `diff_arbitrary_size_mismatch_errors` — non-2× mismatches still fail
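
As background on the `#[derive(Debug)]` change: `Result::unwrap_err()` requires the `Ok` type to implement `Debug`, because it prints the unexpected `Ok` value when it panics. A minimal sketch, where the `ssim_distance` field and the `diff_stub` function are hypothetical and not taken from the PR:

```rust
/// Hypothetical shape; the real `DiffOutcome` fields are not shown in this PR.
#[derive(Debug)] // without this derive, calling `unwrap_err()` would not compile
struct DiffOutcome {
    ssim_distance: f64,
}

/// Hypothetical stand-in for `diff()`: succeeds only when the sizes agree.
fn diff_stub(same_size: bool) -> Result<DiffOutcome, String> {
    if same_size {
        Ok(DiffOutcome { ssim_distance: 0.0 })
    } else {
        Err("size mismatch".to_string())
    }
}
```

A test in the style of `diff_arbitrary_size_mismatch_errors` could then call `diff_stub(false).unwrap_err()` and assert on the returned message.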

https://claude.ai/code/session_01FVjeQDnyymLz7yTJbnsyKG
@proggeramlug proggeramlug merged commit 51b58dd into main Apr 29, 2026
4 of 9 checks passed
@proggeramlug proggeramlug deleted the fix/issue-244 branch April 29, 2026 19:44
proggeramlug added a commit that referenced this pull request Apr 29, 2026
… merges

The lint job on c61296b caught rustfmt drift from the recent batch
of squash-merges:

- crates/perry-doc-tests/src/image_diff.rs (PR #298)
- crates/perry-hir/src/destructuring.rs (PR #301)
- crates/perry-hir/src/lower/expr_new.rs (PR #301)
- crates/perry-hir/src/lower_decl.rs (PR #301)
- crates/perry-stdlib/src/fetch.rs (PR #301)
- crates/perry-stdlib/src/streams.rs (PR #301)
- crates/perry-stdlib/src/ethers.rs (PR #299, my conflict resolution
  inserted a long line that needed wrapping)
- crates/perry-codegen/src/lower_call.rs +
  crates/perry-codegen/src/lower_call/builtin.rs +
  crates/perry-codegen/src/runtime_decls.rs (PR #301)

Plus Cargo.lock sync from 0.5.384 → 0.5.385 (my v0.5.385 commit
landed Cargo.toml's version bump but I forgot to stage Cargo.lock).

Pure `cargo fmt --all` output, no hand edits. Verified
`cargo build --release -p perry -p perry-runtime -p perry-stdlib`
clean post-fmt in 1m 31s.

No version bump — same precedent as ea95e85 (rustfmt baseline as
chore companion to #294).

Development

Successfully merging this pull request may close these issues.

doc-tests: ui/gallery.ts retina screenshot baseline mismatch (recurring)
