Dev#87

Merged: vc1492a merged 6 commits into main from dev on May 26, 2026

@vc1492a vc1492a commented May 26, 2026

No description provided.

vc1492a and others added 6 commits March 20, 2026 11:12
Rewrite the distance computation engine from scratch on top of v0.3.5:

- Vectorized kNN distances using NumPy broadcasting with chunked
  processing for memory efficiency and progress bar support
- Add n_jobs parameter for cross-cluster multiprocessing via
  concurrent.futures (n_jobs=-1 uses all cores)
- Restructure Numba path with non-generator kernels that support
  numba.prange for thread-level parallelism
- Optional scipy.spatial.distance.cdist and scipy.special.erf
  acceleration when scipy is available
- Vectorize _standard_distances, _prob_distances, and
  _norm_prob_outlier_factor pipeline methods
- Fully backward-compatible: all existing API calls work unchanged

Closes #36

Made-with: Cursor
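
The chunked, broadcast-based kNN distance computation described in the commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: the function name `chunked_knn_distances`, the `chunk_size` parameter, and the use of Euclidean distance are hypothetical choices made for the example.

```python
import numpy as np


def chunked_knn_distances(data, k, chunk_size=256):
    """Return (distances, indices) of the k nearest neighbours of each row.

    Hypothetical helper for illustration only; not the library's function.
    """
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        data = data[:, None]  # treat 1D input as a single feature column
    n = data.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of rows")

    dists = np.empty((n, k))
    idxs = np.empty((n, k), dtype=np.intp)

    # Process query rows in chunks so the intermediate array stays at
    # (chunk, n, d) instead of (n, n, d).
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        # Broadcasting: (chunk, 1, d) - (1, n, d) -> (chunk, n, d)
        diff = data[start:stop, None, :] - data[None, :, :]
        block = np.sqrt((diff ** 2).sum(axis=2))
        # Exclude each point from its own neighbour list.
        rows = np.arange(stop - start)
        block[rows, np.arange(start, stop)] = np.inf
        # Select the k smallest per row (unordered), then sort just those k.
        part = np.argpartition(block, k - 1, axis=1)[:, :k]
        part_d = np.take_along_axis(block, part, axis=1)
        order = np.argsort(part_d, axis=1)
        idxs[start:stop] = np.take_along_axis(part, order, axis=1)
        dists[start:stop] = np.take_along_axis(part_d, order, axis=1)
    return dists, idxs
```

Each loop iteration is also a natural place to advance a progress bar, which is presumably why chunking and progress reporting are mentioned together in the commit message.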

Update version across loop.py, setup.py, and README badge.
Add changelog entry documenting all new features and improvements.

Made-with: Cursor

…mark

- Add tests for 1D data vectorized path, parallel progress bar,
  n_jobs=-2 validation, and single-cluster progress bar
- Update benchmark script with n_jobs parallel examples and
  __main__ guard for macOS multiprocessing compatibility
- Note n_jobs replacing num_threads in changelog

Co-authored-by: Cursor <cursoragent@cursor.com>

Numba prange dispatch was incorrectly capped to the number of clusters,
preventing intra-cluster thread parallelism for single-cluster data.
This fix provides 2-3x speedups on multi-core machines. Adds dedicated
Numba tests, a parallel benchmark script, and updated documentation.

Co-authored-by: Cursor <cursoragent@cursor.com>

Multiprocessing via ProcessPoolExecutor was consistently slower than
sequential execution due to process startup and data serialization
overhead. Numba prange provides 2-3x speedups without that overhead.
n_jobs now only takes effect with use_numba=True; a warning is issued
otherwise.

Co-authored-by: Cursor <cursoragent@cursor.com>
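
The `n_jobs` behavior described in the commit above (it only takes effect with `use_numba=True`, and a warning is issued otherwise) could look roughly like the sketch below. The helper name `resolve_n_jobs`, the warning text, and the rule that only `-1` or a positive integer is accepted are assumptions for illustration; the PR only states that `n_jobs=-1` uses all cores and that `n_jobs=-2` is covered by a validation test.

```python
import os
import warnings


def resolve_n_jobs(n_jobs, use_numba):
    """Return the effective worker count for a hypothetical prange kernel.

    Illustrative only; the real library may validate n_jobs differently.
    """
    # Assumption: only -1 (all cores) or a positive integer is valid.
    if n_jobs == 0 or n_jobs < -1:
        raise ValueError("n_jobs must be -1 or a positive integer")

    if not use_numba:
        # Without Numba there is no parallel path, so warn and run sequentially.
        if n_jobs != 1:
            warnings.warn(
                "n_jobs has no effect unless use_numba=True; "
                "running sequentially.",
                UserWarning,
                stacklevel=2,
            )
        return 1

    if n_jobs == -1:
        return os.cpu_count() or 1
    return n_jobs
```

Resolving the value once, up front, keeps the warning in a single place rather than scattering checks through the distance kernels.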

feat: add parallel distance computation and vectorized pipeline
@vc1492a vc1492a merged commit d9c90e5 into main May 26, 2026
17 checks passed