
Perf optimization #1

Merged
wrignj08 merged 18 commits into main from perf-optimization on May 4, 2026

Conversation

Contributor

@wrignj08 commented on May 4, 2026

No description provided.

wrignj08 added 18 commits on May 1, 2026 at 20:18

Performance: NLUM benchmark 85.6s -> 40.1s (53% less wall time, a
~2.1x speedup). Public API unchanged; outputs bit-identical to baseline
on all bench scenarios.

Algorithmic changes (multiclean/utils.py, multiclean/multiclean.py):
* Replace float32 smoothed_labels (4 bytes/px) with uint8/uint16 code
  array (1-2 bytes/px). Per-class equality scan in build_invalid_mask
  is now 2-4x cheaper in memory bandwidth.
* Fold find_small_islands into build_invalid_mask, OR-reducing each
  per-class mask in flight (see the sketch after this list). Replaces a
  Dict[int, ndarray] of K bool masks (~43 GB peak on NLUM) with a single
  shared invalid_mask.
* In-place fill in fill_invalids (no smoothed_labels.copy()).
* distance_transform_edt(return_distances=False) skips scipy's
  internal float64 distance allocation.
* (array == c).view(np.uint8) instead of .astype(np.uint8) -- no
  bool->uint8 conversion copy.
* Drop redundant np.isin scan in fill_invalids (every non-NaN pixel
  is already a valid class label by construction).
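
A sketch of the fused scan (the connectivity choice and exact
small-island criterion here are assumptions, not the repo's code):

    import cv2
    import numpy as np

    def build_invalid_mask(codes, classes, min_island_size):
        # One shared bool mask, OR-reduced in flight; no Dict[int, ndarray]
        # of K per-class masks is ever materialised.
        invalid = np.zeros(codes.shape, dtype=bool)
        for c in classes:
            # .view(np.uint8) reinterprets the bool buffer in place where
            # .astype(np.uint8) would allocate and copy.
            class_mask = (codes == c).view(np.uint8)
            n, labels, stats, _ = cv2.connectedComponentsWithStats(
                class_mask, connectivity=8
            )
            # Label 0 is the background; components smaller than
            # min_island_size are OR-ed straight into the shared mask.
            small = np.flatnonzero(
                stats[1:, cv2.CC_STAT_AREA] < min_island_size
            ) + 1
            if small.size:
                invalid |= np.isin(labels, small)
        return invalid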

Edge-case fixes (also exercised by new tests):
* float64, int64, and int32 (values > 2^24) inputs are now preserved
  exactly; see the check after this list. The previous implementation
  routed everything through float32 and silently lost precision (and
  emitted a RuntimeWarning) on large ints.
* All-NaN float input with fill_nan=True now deterministically returns
  NaN instead of relying on np.empty's uninitialised memory.
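
A minimal check of the dtype contract (assuming the array is
clean_array's first positional argument):

    import numpy as np
    from multiclean import clean_array

    # 2**24 + 1 is the first integer float32 cannot represent exactly, so
    # any float32 round-trip would corrupt these values.
    arr = np.arange(2**24, 2**24 + 16, dtype=np.int64).reshape(4, 4)
    out = clean_array(arr)
    assert out.dtype == np.int64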

Tests (tests/test_multiclean.py):
* test_all_nan_fill_nan_true_is_deterministically_nan
* test_dtype_preserved_for_float64_and_large_ints
* test_many_classes_exercises_uint16_code_path
* test_subset_targets_leave_multiple_background_classes_untouched

Tooling:
* .pre-commit-config.yaml -- ruff-check, ruff-format, pytest pre-push.
* .github/workflows/ci.yml -- ruff + mypy + pytest on push/PR.
* pyproject.toml -- add mypy + pre-commit dev deps, [tool.mypy].

Locks in that clean_array(smooth_edge_size=3, min_island_size=5) on the
full Landsat cloud-and-cloud-shadow mask used by notebooks/Cloud
example.ipynb stays bit-identical to the output produced by the
original main-branch implementation.

The expected output (1.7 MB compressed, ~63 MB raw uint8) was generated
on main and lives at tests/data/landsat_expected.npz. The test loads
it back and asserts np.array_equal against the current implementation's
output. Skipped automatically if the fixture or the input tif is missing.

Test runs in ~3 s on the full array (8011x7901, 4 classes).
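
The shape of that test, roughly (the input filename and the npz key are
assumptions):

    from pathlib import Path

    import cv2
    import numpy as np
    import pytest

    from multiclean import clean_array

    DATA = Path(__file__).parent / "data"
    EXPECTED = DATA / "landsat_expected.npz"
    INPUT = DATA / "cloud_shadow_mask.tif"  # hypothetical filename

    @pytest.mark.skipif(
        not (EXPECTED.exists() and INPUT.exists()),
        reason="Landsat fixture or input tif not present",
    )
    def test_landsat_cloud_shadow_matches_main_branch_output():
        mask = cv2.imread(str(INPUT), cv2.IMREAD_UNCHANGED)
        expected = np.load(EXPECTED)["expected"]  # key name assumed
        out = clean_array(mask, smooth_edge_size=3, min_island_size=5)
        assert np.array_equal(out, expected)
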
Tests should own their fixtures. The cloud/shadow mask is the input
for both the regression test and the notebooks/Cloud example.ipynb
demo, so move it to tests/data/ as the canonical location and have
the notebook reach into it via Path().cwd().parent / "tests" / "data".

No code change; only file relocation and notebook path adjustment.

Reverts the notebook half of the prior move: the cloud/shadow tif now
lives in BOTH notebooks/data/ (so notebooks/Cloud example.ipynb stays
self-contained for users following the example) and tests/data/ (so
the regression test owns its own fixture and does not reach across
the repo). 3.2 MB duplicated, but each side stands alone.

Drops the scipy dependency entirely (the EDT call was its only
consumer) and makes the nearest-neighbour fill ~3.4x faster on the
Landsat cloud/shadow notebook example (fill stage 2.05s -> 0.60s; whole
clean_array 2.48s -> 1.07s, a ~2.3x end-to-end speedup).

Mathematically equivalent: cv2.distanceTransformWithLabels under
DIST_MASK_PRECISE returns an exact L2 nearest-source assignment. The
only difference vs scipy.ndimage.distance_transform_edt is the choice of
which equidistant source pixel wins a tie; both pick a source at the
identical minimum distance. On the Landsat fixture this affects ~90k of
the 354k invalid pixels, every one of them a tied-distance case.

The approach is uniformly faster across image sizes >=512x512 (2.5-4.2x
on the fill stage; see edt_bench in the perf-optimization scratch space).
For sizes >=1024x1024 with synthetic blocky data the outputs are even
byte-identical because the layout has no tied-distance configurations.
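
For reference, the cv2-based fill reduces to something like this sketch
(``nearest_valid_fill`` and the label-to-index recovery are
illustrative, not the repo's exact code):

    import cv2
    import numpy as np

    def nearest_valid_fill(codes, invalid_mask):
        # cv2 measures distance to the nearest *zero* pixel, so valid
        # pixels must be 0 in the source image and invalid pixels 1.
        src = invalid_mask.view(np.uint8)
        # DIST_LABEL_PIXEL gives every valid (zero) pixel its own label id;
        # DIST_MASK_PRECISE makes the L2 assignment exact, not chamfered.
        _, labels = cv2.distanceTransformWithLabels(
            src, cv2.DIST_L2, cv2.DIST_MASK_PRECISE,
            labelType=cv2.DIST_LABEL_PIXEL,
        )
        # Each valid pixel carries its own label, so reading the label map
        # at the valid pixels recovers label -> flat index with no
        # assumption about cv2's label ordering.
        flat_labels = labels.ravel()
        valid_idx = np.flatnonzero(~invalid_mask.ravel())
        lut = np.zeros(int(flat_labels.max()) + 1, dtype=np.intp)
        lut[flat_labels[valid_idx]] = valid_idx
        nearest = codes.ravel()[lut[flat_labels]].reshape(codes.shape)
        return np.where(invalid_mask, nearest, codes)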

Updated tests/data/landsat_expected.npz to lock in the cv2 tie-break
choices so the regression test stays a meaningful baseline going
forward.
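
Regenerating the fixture after a deliberate behaviour change stays a
short script (filename and npz key assumed as above):

    import cv2
    import numpy as np
    from multiclean import clean_array

    mask = cv2.imread("tests/data/cloud_shadow_mask.tif", cv2.IMREAD_UNCHANGED)
    out = clean_array(mask, smooth_edge_size=3, min_island_size=5)
    np.savez_compressed("tests/data/landsat_expected.npz", expected=out)
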
Three additions targeting gaps the cv2 EDT swap exposed:

* test_fill_works_on_tiny_image_with_one_invalid_pixel -- cv2 has
  minimum-size requirements for some operations; pin a 4x4 array with a
  single invalid pixel through the full fill path.
* test_single_valid_pixel_propagates_to_all_invalid -- one valid source
  surrounded by an all-NaN grid (with fill_nan=True); also exercises
  the ``int(valid_labels.max()) + 1`` lookup-size calculation when the
  label table has exactly one real entry. Sketched after this list.
* Renamed test_landsat_cloud_shadow_matches_main_branch_output to
  test_landsat_cloud_shadow_regression -- the fixture was regenerated
  from this branch when we swapped in cv2, so "matches main branch
  output" is no longer accurate. New docstring documents how to
  regenerate the fixture if the implementation is ever deliberately
  changed again.
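
A sketch of the second test (the explicit kwargs are assumptions,
chosen so the lone pixel isn't culled as a small island):

    import numpy as np

    from multiclean import clean_array

    def test_single_valid_pixel_propagates_to_all_invalid():
        arr = np.full((8, 8), np.nan, dtype=np.float32)
        arr[3, 4] = 2.0  # the lone valid source
        out = clean_array(arr, fill_nan=True, min_island_size=1)
        assert np.array_equal(out, np.full((8, 8), 2.0, dtype=np.float32))
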
The Changelog follows Keep a Changelog format with the in-progress
optimisation work captured under [Unreleased]. Older entries (0.1.0
initial release, 0.2.0 fill_nan addition) are reconstructed from git
history.

Linked from the README via an absolute GitHub URL so it renders as a
working link on PyPI as well as GitHub. Also exposed via
``[project.urls] Changelog`` in pyproject.toml so package metadata
(and tools like ``uv tree``, GitHub repository sidebar, etc.) point at
the same canonical location, mirroring the iiq2img layout.

The ~43 GB figure was naive math (147 bool masks * 292 MB each), not
measured RSS. macOS memory compression (and the run-length structure of
bool masks) made the dict-of-K-masks approach much cheaper in resident
memory than that arithmetic suggested. Replace with the actually
measured peak RSS savings: 5.4 GB -> 2.8 GB on the 4-class Landsat
example, with smaller absolute savings on the 147-class raster.
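
For context, peak RSS can be sampled in-process like this (a sketch;
not necessarily how the numbers above were taken):

    import resource
    import sys

    def peak_rss_gb():
        # ru_maxrss is reported in bytes on macOS but kilobytes on Linux.
        rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        return rss * (1 if sys.platform == "darwin" else 1024) / 1e9
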
Closes the [Unreleased] section in CHANGELOG with a 0.3.0 release dated
today. Headline changes: ~2x faster clean_array on Landsat-scale inputs,
strict input-dtype preservation (no more silent float32 round-trip),
and the scipy runtime dependency dropped (cv2 now handles the
nearest-valid fill).

* Drop SciPy from the "Fast: NumPy + OpenCV + SciPy" feature line --
  it's no longer a runtime dependency since the cv2-fill swap.
* Correct "elliptical kernel" -> "circular kernel" in the How It Works
  section. ``create_circle_kernel`` thresholds on Euclidean distance
  from centre, so the structuring element is circular, not elliptical.
* Tighten the API-Reference return-value note: the output now matches
  the input shape AND dtype (the prior wording was carried over from
  when float64 inputs got silently downcast to float32 internally).
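
A plausible shape for ``create_circle_kernel`` given that description
(an illustrative sketch, not the repo's code):

    import numpy as np

    def create_circle_kernel(radius):
        # Threshold squared Euclidean distance from the centre: the result
        # is a disk (circular) structuring element, not an ellipse.
        y, x = np.ogrid[-radius : radius + 1, -radius : radius + 1]
        return (x * x + y * y <= radius * radius).view(np.uint8)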

Verified the Quick Start code block runs as written; all referenced
files (assets/land_use_before_after.png, the three notebooks) exist
in the tree; clean_array's signature matches the documented one
exactly (default values and parameter names).

Mirrors iiq2img's publish.yml. Triggers on tag pushes matching ``v*``,
builds sdist+wheel via ``uv build``, uploads them as a workflow artifact,
and uses ``pypa/gh-action-pypi-publish`` to push to PyPI under a GitHub
``pypi`` Environment.

No PyPI API token is stored; authentication is via an OIDC
trusted-publisher config that has to be set up once on PyPI's side --
see the release notes added in this commit's accompanying README change.

Verified ``uv build`` produces a clean 10 KB wheel + 15 KB sdist with
correct 0.3.0 version, single ``opencv-python>=4.0`` runtime dep, and
the README + Project-URLs (Homepage / Repository / Changelog) in the
package metadata.

Adds an empty multiclean/py.typed marker file and tells setuptools to
ship it via [tool.setuptools.package-data]. With this, downstream
consumers' type-checkers (mypy, pyright, pylance, etc.) pick up the
inline type hints already present in clean_array and the utils module
instead of treating multiclean as untyped.

Verified the marker is present in the built wheel:
    multiclean/py.typed   (0 bytes)

Without this file, type-checkers default to ignoring annotations from
installed third-party packages even when the source is annotated.
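
That check is scriptable (the dist/ path assumes a default ``uv build``
output directory):

    import zipfile
    from pathlib import Path

    wheel = next(Path("dist").glob("multiclean-*.whl"))
    assert "multiclean/py.typed" in zipfile.ZipFile(wheel).namelist()
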
Mirrors iiq2img: setuptools-scm reads the version from the most recent
``v*`` tag, so cutting a release becomes a single ``git tag`` step and
the ``multiclean/__version__.py`` file no longer needs hand-bumping.

Changes:
* pyproject.toml: ``setuptools-scm>=8`` added to build-system requires;
  the ``[tool.setuptools.dynamic]`` block is replaced by an empty
  ``[tool.setuptools_scm]`` block that enables the plugin.
* multiclean/__init__.py: ``__version__`` is now read at runtime via
  ``importlib.metadata.version("multiclean")`` so the in-package symbol
  always tracks what's actually installed (sketched after this list).
* multiclean/__version__.py: deleted; no longer hand-maintained.
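
A minimal sketch of that runtime lookup (the fallback branch is an
assumption, not necessarily in the real file):

    from importlib.metadata import PackageNotFoundError, version

    try:
        __version__ = version("multiclean")
    except PackageNotFoundError:
        # Uninstalled source tree: no dist-info to read from.
        __version__ = "0.0.0"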

When a build runs from a clean tag (e.g. ``v0.3.0``), the wheel filename
and ``multiclean.__version__`` both resolve to ``0.3.0`` exactly. From
an untagged checkout setuptools-scm produces ``0.1.devN+g<sha>`` so it's
visually obvious the build wasn't from a release tag.

Captures the full cut-a-release sequence (update CHANGELOG, merge,
tag, approve deployment, verify on PyPI) plus the one-time PyPI/GH
setup so future-me / contributors don't have to reverse-engineer
publish.yml + the trusted-publisher config.

Linked from README under "Contributing" with an absolute GitHub URL
so it works on PyPI as well as GitHub.
wrignj08 merged commit 42238d3 into main on May 4, 2026
2 checks passed