Skip to content

Merge #193 + large-tree-migration tip-limit hardening#196

Open
ms609 wants to merge 11 commits intomainfrom
copilot/merge-193-large-tree-features
Open

Merge #193 + large-tree-migration tip-limit hardening#196
ms609 wants to merge 11 commits intomainfrom
copilot/merge-193-large-tree-features

Conversation

@ms609
Copy link
Copy Markdown
Owner

@ms609 ms609 commented Apr 16, 2026

Summary

This PR combines the missing large-tree resilience features from PR #193 and large-tree-migration, selecting the stronger approach where they differed.

Included features

C++ hardening

  • Added split_int alias (int32-based) in public types and re-exported in src/ints.h.
  • Migrated serial and batch split/tip/bin counters in tree-distance code to split_int/int32 to remove int16 bottlenecks.
  • check_ntip() now hard-stops with Rcpp::stop() and reports requested tips + compiled limit.
  • Added int64_t overflow-safe products in add_ic_element().
  • Added serial Rcpp::checkUserInterrupt() checks every 1024 outer iterations in all seven serial scorers.
  • Added check_ntip() validation at all batch C++ entry points.
  • Kept dynamic RF/MSI scratch buffers (no fixed SL_MAX_* stack arrays).

R-side guard centralization

  • Added cpp_max_tips() export and wired Rcpp exports.
  • Added .SL_MAX_TIPS initialization on load and .CheckMaxTips() helper.
  • Replaced hardcoded 32767L guards with centralized .CheckMaxTips() calls.
  • Preserved stricter intrinsic limits for MAST / consensus-info with clear messages.

Tests

  • Added tests/testthat/test-large-trees.R:
    • over-limit rejection tests based on .SL_MAX_TIPS
    • positive guarded large-tree functional tests (4000 tips; feasible 8000-tip RF/IRF)
  • Updated existing tip-limit tests to use .SL_MAX_TIPS/.CheckMaxTips.

Validation

  • Targeted tests passed after these changes:
    • devtools::test(filter = 'batch_coverage|tree_distance_utilities')
    • PASS 346, FAIL 0, WARN 0, SKIP 0

(Previous full-suite run reported one failure caused by missing optional TBRDist in environment.)

- add split_int alias and migrate serial/batch split counters to split_int/int32
- harden check_ntip with runtime Rcpp::stop() and informative compiled-limit message
- add overflow-safe int64 arithmetic in add_ic_element
- add serial interrupt checks every 1024 iterations in all seven serial scorers
- centralize R-side tip limits via .SL_MAX_TIPS + .CheckMaxTips loaded from cpp_max_tips()
- replace hardcoded 32767L guards across distance, transfer, NNI, MAST and consensus paths
- add C++ n_tip guards to all batch entry points
- add large-tree functional/guard tests and update existing tip-limit tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 16, 2026

Codecov Report

❌ Patch coverage is 82.91262% with 88 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.27%. Comparing base (f0ac649) to head (c66ee0f).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
src/tree_distances.cpp 79.19% 31 Missing ⚠️
src/pairwise_distances.cpp 86.84% 25 Missing ⚠️
src/tree_distances.h 52.94% 24 Missing ⚠️
inst/include/TreeDist/mutual_clustering.h 75.00% 6 Missing ⚠️
src/nni_distance.cpp 50.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #196      +/-   ##
==========================================
- Coverage   95.84%   94.27%   -1.58%     
==========================================
  Files          57       57              
  Lines        5514     5678     +164     
==========================================
+ Hits         5285     5353      +68     
- Misses        229      325      +96     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

ms609 and others added 2 commits April 17, 2026 09:26
Use deterministic 4000-tip as.phylo near-neighbour trees for fast shortcut paths, remove redundant 8000-tip split coverage, and assert known RF expected values plus batch/pairwise agreement.

Add a source-level regression test that verifies serial interrupt checks and batch n_tip guards remain wired in C++ paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Guard the C++ source-inspection test so it runs in source-tree test sessions but skips in installed-package testthat runs where src/*.cpp files are unavailable.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ms609
Copy link
Copy Markdown
Owner Author

ms609 commented Apr 17, 2026

Addressed follow-up feedback:

  • Updated large-tree tests to use deterministic near-neighbour trees (as.phylo(0, 4000) vs as.phylo(1, 4000)) so they stay fast and exercise shortcut paths.
  • Removed the separate 8000-tip functional duplicate.
  • Added known-answer expectation for RF (2) plus batch-vs-pairwise equality checks for CID/MSD/IRF.
  • Added an explicit guard-wiring regression test for serial interrupt checks and batch check_ntip wiring.
  • Adjusted that source-inspection test to skip in installed-package checks where src/*.cpp is unavailable.

Validation run locally:

  • devtools::test() passes (FAIL 0 | WARN 0 | SKIP 6 | PASS 2079)
  • R CMD build . passes
  • R CMD check TreeDist_2.13.0.9002.tar.gz passes with _R_CHECK_FORCE_SUGGESTS_=false (status OK)

Repository owner deleted a comment from github-actions bot Apr 17, 2026
Repository owner deleted a comment from github-actions bot Apr 17, 2026
Remove the .SL_MAX_TIPS global assignment and make tip-limit checks reflect real constraints.

Use TreeTools SL_MAX_TIPS as a cache/stack threshold only: add log lookup fallbacks for larger trees, update MCI/SPI/MSI/CID kernels to call safe lookup helpers, and keep runtime guards based on integer/type limits.

Add explicit runtime NNI guard at 32768 tips, keep algorithm-specific caps for MAST and consensus-info, and update large-tree/guard tests to validate fast 4000-tip known-answer behaviour and new guard semantics.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ms609
Copy link
Copy Markdown
Owner Author

ms609 commented Apr 17, 2026

Addressed the SL_MAX_TIPS concern in commit b8440e96.

What changed:

  • Removed the .SL_MAX_TIPS global assignment (R/zzz.R) and refactored .CheckMaxTips() to avoid using TreeTools stack thresholds as a hard cap.
  • cpp_max_tips() now returns the true C++ type-based upper bound (split/int32), not SL_MAX_TIPS.
  • Added fallback log lookups for nTip > SL_MAX_TIPS in mutual-clustering headers and switched all relevant C++ score paths (tree_distances.cpp, pairwise_distances.cpp, installable header impl) to use these safe lookups.
  • check_ntip() in core tree-distance C++ now validates numeric/type limits only, not SL_MAX_TIPS.
  • Added explicit runtime NNI guard in C++ at the real algorithm limit (32768) and aligned R-side guarding/tests.
  • Updated large-tree tests to focus on fast deterministic 4000-tip known-answer behavior and batch/pairwise agreement (without .SL_MAX_TIPS rejection assumptions).

Validation on this branch:

  • devtools::test() passed (FAIL 0 | WARN 0 | SKIP 5 | PASS 2080)
  • R CMD build . passed
  • R CMD check TreeDist_2.13.0.9002.tar.gz passed with _R_CHECK_FORCE_SUGGESTS_=false (Status: OK)

ms609 and others added 4 commits April 17, 2026 11:15
Add an ASSERT in add_ic_element() to ensure nTip stays within the safe int32 multiplication range, and keep numerator / denominator as int32 products.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use int-native constexpr limits with static_assert fit checks in cpp_max_tips(), removing unnecessary int64_t intermediate types while preserving safety.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Avoid narrowing overflow in cpp_max_tips() on platforms where int_fast32_t exceeds int width by clamping comparisons in native source types before any cast to int.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ms609
Copy link
Copy Markdown
Owner Author

ms609 commented Apr 17, 2026

Fixed the Ubuntu vignette failure (Trees with > -1 tips are not yet supported) in commit d965ab16.

Root cause: cpp_max_tips() could narrow-overflow when int_fast32_t is wider than int (platform-dependent), producing -1 after cast.

Fix: cpp_max_tips() now clamps in native source types before any cast to int, so it is robust across different int_fast32_t widths.

Validation re-run on branch:

  • devtools::test() passed (FAIL 0 | WARN 0 | SKIP 5 | PASS 2081)
  • R CMD build . passed
  • R CMD check TreeDist_2.13.0.9002.tar.gz with _R_CHECK_FORCE_SUGGESTS_=false passed (Status: OK)

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 17, 2026

⚠️ This benchmark result is outdated. See the latest comment below.

Performance benchmark results

Call Status Change Time (ms)
ClusteringInfoDistance(tr200) ⚪ NSD 0.14% 15.5 →
15.4, 15.5
ClusteringInfoDistance(tr50) ⚪ NSD 0.27% 11.6 →
11.5, 11.6
LAPJV(test2000) ⚪ NSD -1.06% 93.3 →
93.1, 96.2
LAPJV(test40) ⚪ NSD -1.54% 0.0175 →
0.0178, 0.0178
LAPJV(test400) ⚪ NSD -0.76% 3.19 →
3.2, 3.23
MutualClusteringInfo(tr200) ⚪ NSD 0.59% 21.8 →
21.6, 21.7
MutualClusteringInfo(tr50) ⚪ NSD 4.07% 23.2 →
22.2, 22.7
PathDist(postTrees) 🟠 Slower 🙁 -46.46% 3.62 →
5.29, 5.3
PhylogeneticInfoDistance(tr200) 🟠 Slower 🙁 -22.88% 236 →
289, 290
PhylogeneticInfoDistance(tr50) 🟠 Slower 🙁 -14.11% 82 →
92.9, 94
RobinsonFoulds(tr200) ⚪ NSD -0.98% 2.89 →
2.9, 2.94
RobinsonFoulds(tr200) ⚪ NSD -1.29% 2.67 →
2.67, 2.73
RobinsonFoulds(tr50) ⚪ NSD -0.08% 4.44 →
4.4, 4.46

Use a local cached accessor in .CheckMaxTips() to avoid repeated cpp_max_tips() calls.
Re-introduce direct table-indexed fast paths for SPI/MSI scoring when n_tips
is within lookup-table range, while retaining safe lookup fallbacks for larger
trees.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown

Performance benchmark results

Call Status Change Time (ms)
ClusteringInfoDistance(tr200) ⚪ NSD -0.03% 16.3 →
16.3, 16.2
ClusteringInfoDistance(tr50) ⚪ NSD -0.31% 10.8 →
10.7, 11
LAPJV(test2000) ⚪ NSD 2.2% 131 →
126, 130
LAPJV(test40) ⚪ NSD 0.29% 0.0175 →
0.0174, 0.0174
LAPJV(test400) ⚪ NSD 0.27% 3.51 →
3.47, 3.54
MutualClusteringInfo(tr200) ⚪ NSD 0.24% 22.6 →
22.5, 22.6
MutualClusteringInfo(tr50) ⚪ NSD -0.05% 22.1 →
21.8, 22.4
PathDist(postTrees) ⚪ NSD -0.31% 3.51 →
3.52, 3.54
PhylogeneticInfoDistance(tr200) 🟣 ~same 1.28% 237 →
234, 234
PhylogeneticInfoDistance(tr50) 🟢 Faster! 7.09% 76.7 →
71.2, 71.3
RobinsonFoulds(tr200) ⚪ NSD -0.4% 2.83 →
2.83, 2.86
RobinsonFoulds(tr200) ⚪ NSD -0.14% 2.57 →
2.56, 2.59
RobinsonFoulds(tr50) ⚪ NSD -0.04% 4.34 →
4.3, 4.37

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant