fix(pathfinder): make find_nvidia_binary_utility deterministic, never search CWD#2196
fix(pathfinder): make find_nvidia_binary_utility deterministic, never search CWD#2196aryanputta wants to merge 3 commits into
Conversation
… search CWD find_nvidia_binary_utility assembled a bounded list of trusted directories (NVIDIA wheel bin/, CONDA_PREFIX, CUDA_HOME/CUDA_PATH) and then delegated to shutil.which(name, path=trusted_dirs). On Windows shutil.which prepends the process current working directory to the search even when an explicit path= is supplied, so a binary located in an arbitrary (possibly attacker-writable) CWD could be returned in preference to the trusted CUDA / Conda / wheel binary. That violates the pathfinder contract of a deterministic lookup over a documented, bounded set of trusted roots. Replace the shutil.which delegation with an explicit resolver that searches only the trusted directories, in order, returning the first executable match. The current working directory and ambient PATH are never consulted. POSIX execute-bit (X_OK) and Windows extension semantics are preserved, so behavior is unchanged except for removing the CWD/PATH leakage. Names resolved in the existing trusted dirs return exactly as before. Rewrites the search-path tests to assert the deterministic probe order and adds TestResolveInTrustedDirs covering CWD isolation, first-match-wins, empty/duplicate dir skipping, and POSIX non-executable rejection. Fixes NVIDIA#2119
|
Hi @aryanputta, this is great, but in isolation it'll probably break some users. This was pointed out by @kkraus14 here before:
I ran the context and my thoughts by GPT around the same time, which lead to the conclusion at the end of this comment:
I still think that's the right compromise:
The behavior difference is significant enough though that I'd want to bump the pathfinder minor version, which we can do; I don't have concerns about that. How do you feel about expanding this PR to fold in the CTK-root canary fallback? — I believe the code changes will still be quite modest. |
…utility After the deterministic search over the explicit trusted directories (NVIDIA wheel bin/, CONDA_PREFIX, CUDA_HOME/CUDA_PATH) misses, fall back to a CTK-root canary probe: resolve cudart through the OS dynamic loader, which honors LD_LIBRARY_PATH on Linux and the native DLL search on Windows, derive the CUDA Toolkit root from its absolute path, and search that root's bin layout. This addresses the concern raised on NVIDIA#2196: users who follow the CUDA Linux installation guide set LD_LIBRARY_PATH for libraries and PATH for executables. The bounded finder alone would stop finding the utility for them because PATH is intentionally never consulted. The canary fallback recovers that case through LD_LIBRARY_PATH instead of PATH. LD_LIBRARY_PATH is still an attack vector, but a significantly harder one to exploit than PATH, and the ambient PATH and process CWD remain unused. The canary runs only after the explicit trusted dirs miss, so the common wheel/conda/CUDA_HOME cases never spawn the resolver subprocess. The canary -> CTK-root resolution is factored into a shared resolve_ctk_root_via_canary helper reused by the dynamic-library CTK-root canary flow. Adds tests for the fallback (found, ordering, Windows bin layout, not consulted when found earlier, cached) and for resolve_ctk_root_via_canary. Adds 1.6.0 release notes for the minor version bump.
|
Thanks @rwgk, that compromise makes sense and I've folded it in (pushed What the new commit adds Search order is unchanged for the explicit trusted dirs, with the canary as a strict fallback:
Key points:
Version Added Tests Added coverage for the fallback in |
pre-commit.ci mypy flagged returning Any from resolve_ctk_root_via_canary and _resolve_ctk_root_via_canary (both declared -> str | None), because derive_ctk_root resolves to Any under the pathfinder mypy config. Bind the result to an annotated local before returning, matching the pattern used elsewhere in the package.
|
Quick check-in: is there anything else you would like changed here after the CTK-root canary fallback update, or is this mainly waiting on review / runner validation now? Happy to adjust if the version note or fallback behavior should be shaped differently. |
Summary
Fixes #2119.
find_nvidia_binary_utilitypreviously delegated toshutil.which(name, path=trusted_dirs). On Windows, CPython'sshutil.whichprepends the process current working directory even when an explicitpath=is supplied. That allowed a binary in an arbitrary CWD to shadow the CUDA / Conda / wheel binary that pathfinder is meant to discover through a bounded search.This PR replaces that delegation with an explicit resolver over trusted directories only, then folds in the maintainer-requested CTK-root canary fallback so users who follow the CUDA installation guide continue to work when the toolkit is visible through the dynamic loader.
Fix
The binary lookup order is now:
bin/directories from site-packages.CONDA_PREFIX.CUDA_HOME/CUDA_PATH.cudartthrough the OS dynamic loader, derive the CUDA Toolkit root, and search that root'sbinlayout.The current working directory and ambient
PATHare never consulted.Notes:
Tests
bin/x64,bin/x86_64, andbinlayout.resolve_ctk_root_via_canary(...).Local checks:
ruff checkandruff format --checkpass on the touched files.