Rewrite the distance computation engine from scratch on top of v0.3.5:

- Vectorized kNN distances using NumPy broadcasting with chunked processing for memory efficiency and progress bar support
- Add n_jobs parameter for cross-cluster multiprocessing via concurrent.futures (n_jobs=-1 uses all cores)
- Restructure Numba path with non-generator kernels that support numba.prange for thread-level parallelism
- Optional scipy.spatial.distance.cdist and scipy.special.erf acceleration when scipy is available
- Vectorize _standard_distances, _prob_distances, and _norm_prob_outlier_factor pipeline methods
- Fully backward-compatible: all existing API calls work unchanged

Closes #36

Made-with: Cursor
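The chunked, broadcast-based kNN distance computation described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: the function name `knn_distances`, the `chunk_size` parameter, and the choice of Euclidean distance are all assumptions made for the example.

```python
import numpy as np


def knn_distances(X, k, chunk_size=256):
    """Hypothetical helper: distances and indices of each row's k nearest neighbours."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat 1D input as n samples with a single feature
    n = X.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n_samples")

    dists = np.empty((n, k))
    idx = np.empty((n, k), dtype=np.intp)

    # Process rows in chunks so the temporary (chunk, n, d) array stays small.
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        # Broadcasting: (chunk, 1, d) - (1, n, d) -> (chunk, n, d)
        diff = X[start:stop, None, :] - X[None, :, :]
        d = np.sqrt((diff ** 2).sum(axis=-1))
        # A point is not its own neighbour, so mask the self-distances.
        d[np.arange(stop - start), np.arange(start, stop)] = np.inf
        order = np.argsort(d, axis=1)[:, :k]
        idx[start:stop] = order
        dists[start:stop] = np.take_along_axis(d, order, axis=1)
    return dists, idx
```

Each chunk iteration is also a natural place to advance a progress bar, and the result does not depend on the chunk size; only peak memory does.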
Update version across loop.py, setup.py, and README badge. Add changelog entry documenting all new features and improvements.

Made-with: Cursor
…mark

- Add tests for the 1D data vectorized path, parallel progress bar, n_jobs=-2 validation, and single-cluster progress bar
- Update benchmark script with n_jobs parallel examples and a __main__ guard for macOS multiprocessing compatibility
- Note n_jobs replacing num_threads in changelog

Co-authored-by: Cursor <cursoragent@cursor.com>
Numba prange dispatch was incorrectly capped to the number of clusters, preventing intra-cluster thread parallelism for single-cluster data. This fix provides 2-3x speedups on multi-core machines. Adds dedicated Numba tests, a parallel benchmark script, and updated documentation.

Co-authored-by: Cursor <cursoragent@cursor.com>
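To illustrate what a non-generator kernel with intra-cluster `prange` parallelism might look like, here is a hedged sketch. The kernel name `pairwise_distances` and the pure-Python fallback are assumptions for the example and are not taken from the pull request; the point is only that the outer loop runs over rows of a single cluster, so threads are useful even when there is just one cluster.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Assumption for this sketch: without Numba, run the same loop in plain Python.
    prange = range

    def njit(*args, **kwargs):
        def decorator(func):
            return func
        return decorator


@njit(parallel=True)
def pairwise_distances(X):
    """Hypothetical kernel: Euclidean distance matrix for one cluster."""
    n, dim = X.shape
    out = np.empty((n, n))
    # prange splits the rows across threads; every iteration writes only
    # to its own row of `out`, so there is no race between threads.
    for i in prange(n):
        for j in range(n):
            acc = 0.0
            for k in range(dim):
                diff = X[i, k] - X[j, k]
                acc += diff * diff
            out[i, j] = np.sqrt(acc)
    return out
```

Because the kernel returns a filled array rather than yielding values, Numba can compile it with `parallel=True`; generators are not supported in that mode.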
Multiprocessing via ProcessPoolExecutor was consistently slower than sequential due to process startup and data serialization overhead. Numba prange provides 2-3x speedups with zero overhead. n_jobs now only takes effect with use_numba=True; a warning is issued otherwise.

Co-authored-by: Cursor <cursoragent@cursor.com>
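The resulting n_jobs behaviour could be sketched along these lines. The helper name `resolve_n_jobs`, the warning text, and the decision to reject values other than -1 and positive integers are assumptions for illustration; the commit only states that n_jobs=-1 uses all cores and that a warning is issued when use_numba is not enabled.

```python
import os
import warnings


def resolve_n_jobs(n_jobs=1, use_numba=False):
    """Hypothetical helper: turn the n_jobs argument into a thread count."""
    if isinstance(n_jobs, bool) or not isinstance(n_jobs, int):
        raise TypeError("n_jobs must be an integer")
    if n_jobs == -1:
        requested = os.cpu_count() or 1  # -1 means all available cores
    elif n_jobs >= 1:
        requested = n_jobs
    else:
        # Assumption: any other value (for example -2 or 0) is rejected.
        raise ValueError("n_jobs must be -1 or a positive integer")

    if requested > 1 and not use_numba:
        # Parallelism comes only from Numba threads, so fall back to sequential.
        warnings.warn(
            "n_jobs has no effect unless use_numba=True; running sequentially.",
            UserWarning,
            stacklevel=2,
        )
        return 1
    return requested
```

Resolving the value once, up front, keeps the warning in a single place and lets the rest of the pipeline work with a plain positive integer.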
feat: add parallel distance computation and vectorized pipeline
No description provided.