fix(onboard): skip CDI GPU mode on Docker Desktop WSL (#5512) #5537

Open
abhi-0906 wants to merge 2 commits into NVIDIA:main from abhi-0906:fix/wsl-docker-desktop-gpu-patch-mode

Conversation

abhi-0906 commented Jun 17, 2026

Contributor

Summary

On Docker Desktop + WSL2 with an NVIDIA GPU, onboard's [6/8] Docker GPU patch recreates the sandbox container with --device nvidia.com/gpu=all (CDI syntax) and fails:

CDI device injection failed: unresolvable CDI devices nvidia.com/gpu=all

even though preflight already logs that it will use the --gpus compatibility path. The only workaround today is --no-gpu / NEMOCLAW_SANDBOX_GPU=0, which disables GPU entirely.

Root cause

Docker Desktop advertises CDI spec directories, so dockerReportsNvidiaCdiDevices() returns true and buildDockerGpuModeCandidates() offers CDI as the first candidate. The create-only probe (docker create … true) passes, but the real recreate fails because the WSL distro exposes no usable nvidia.com/gpu spec. The Docker Desktop WSL status was detected at preflight but never reached the mode selector — selectDockerGpuPatchMode only received {image, device, backend}.

PR #5198 (which closed #5180) added the CDI-injection failure classification, the --no-gpu recovery hint, and the warning that NEMOCLAW_DOCKER_GPU_PATCH=0 is ignored on this runtime — but it did not change mode selection. This is the unaddressed root cause.
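
The mismatch between the create-only probe and the real recreate can be modelled in a few lines. This is a hypothetical sketch, not the code in docker-gpu-patch.ts: `FakeRuntime`, `candidatesBeforeFix`, and `selectByProbe` are illustrative names, and the candidate order after CDI is an assumption.

```typescript
// Hypothetical model of the pre-fix selection; none of these names are the real NemoClaw API.
type GpuMode = { name: "cdi" | "gpus"; args: string[] };

interface FakeRuntime {
  create(args: string[]): boolean;   // stands in for the create-only probe
  recreate(args: string[]): boolean; // stands in for the real sandbox recreate
}

// Assumed pre-fix ordering: CDI first whenever the daemon advertises CDI.
function candidatesBeforeFix(cdiAdvertised: boolean): GpuMode[] {
  const gpus: GpuMode = { name: "gpus", args: ["--gpus", "all"] };
  const cdi: GpuMode = { name: "cdi", args: ["--device", "nvidia.com/gpu=all"] };
  return cdiAdvertised ? [cdi, gpus] : [gpus];
}

// Pick the first candidate whose create-only probe succeeds.
function selectByProbe(runtime: FakeRuntime, candidates: GpuMode[]): GpuMode | undefined {
  return candidates.find((c) => runtime.create(c.args));
}

// Simulated Docker Desktop WSL: the probe accepts the CDI flag, the recreate rejects it.
const dockerDesktopWslLike: FakeRuntime = {
  create: () => true,
  recreate: (args) => !args.includes("nvidia.com/gpu=all"),
};

const chosen = selectByProbe(dockerDesktopWslLike, candidatesBeforeFix(true));
const recreateSucceeded = chosen !== undefined && dockerDesktopWslLike.recreate(chosen.args);
console.log(chosen?.name, recreateSucceeded);
```

In this model the probe cannot tell the two modes apart, so any fix has to come from information the probe does not have, such as the runtime detection done at preflight.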

Fix

Thread the existing Docker Desktop WSL detection (isDockerDesktopWslRuntime(), already used to gate the patch) through selectDockerGpuPatchMode into buildDockerGpuModeCandidates, and skip the CDI candidate when on Docker Desktop WSL so the patch uses --gpus all — the path preflight already commits to.
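
A minimal sketch of the guard described above, assuming a simplified candidate shape. `buildGpuModeCandidatesSketch`, `CandidateOptions`, and `cdiAdvertised` are hypothetical stand-ins; the real buildDockerGpuModeCandidates() signature is not shown in this PR text.

```typescript
// Hypothetical option and candidate shapes, chosen for illustration only.
interface CandidateOptions {
  cdiAdvertised: boolean;     // stand-in for the result of dockerReportsNvidiaCdiDevices()
  dockerDesktopWsl?: boolean; // stand-in for the result of isDockerDesktopWslRuntime()
}

type GpuModeCandidate = { kind: "cdi" | "gpus"; dockerArgs: string[] };

function buildGpuModeCandidatesSketch(opts: CandidateOptions): GpuModeCandidate[] {
  const candidates: GpuModeCandidate[] = [];
  // Offer CDI only when it is advertised AND the runtime is not Docker Desktop WSL.
  if (opts.cdiAdvertised && opts.dockerDesktopWsl !== true) {
    candidates.push({ kind: "cdi", dockerArgs: ["--device", "nvidia.com/gpu=all"] });
  }
  // The --gpus compatibility path is always available as a candidate in this sketch.
  candidates.push({ kind: "gpus", dockerArgs: ["--gpus", "all"] });
  return candidates;
}

const onWsl = buildGpuModeCandidatesSketch({ cdiAdvertised: true, dockerDesktopWsl: true });
const onNative = buildGpuModeCandidatesSketch({ cdiAdvertised: true });
console.log(onWsl[0].dockerArgs.join(" "), "|", onNative[0].dockerArgs.join(" "));
```

Making the flag optional and comparing against `true` keeps existing callers that do not pass it on the previous behaviour, which matches the stated goal of leaving native Docker-CDI hosts unaffected.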

Testing

  • New unit tests in docker-gpu-patch-wsl.test.ts: CDI is skipped (first candidate is --gpus all) when dockerDesktopWsl is true even with CDI advertised, and CDI is still preferred otherwise.
  • tsc -p tsconfig.src.json clean; GPU-patch suites pass (remaining failures are pre-existing Windows-only /etc/cdi path tests, identical on main).

Notes / follow-up

Fixes #5512.

Summary by CodeRabbit

  • Bug Fixes
    • Improved GPU configuration for Docker Desktop on Windows Subsystem for Linux (WSL). The system now detects the Docker Desktop WSL runtime and automatically selects a GPU acceleration mode that works there, avoiding modes that are unavailable or incompatible in that environment.

Signed-off-by: Abhimanyu Kumar <abhimanyukumar7290@gmail.com>

On Docker Desktop + WSL2, onboard's [6/8] Docker GPU patch recreates the
sandbox with `--device nvidia.com/gpu=all` (CDI) and fails with "CDI
device injection failed: unresolvable CDI devices nvidia.com/gpu=all",
even though preflight already commits to the `--gpus` compatibility path.
Docker Desktop advertises CDI spec directories, so dockerReportsNvidiaCdiDevices()
returns true and buildDockerGpuModeCandidates offers CDI first; the
create-only probe passes but the real recreate fails because the WSL
distro exposes no usable nvidia.com/gpu spec.

Thread the existing Docker Desktop WSL detection (isDockerDesktopWslRuntime,
already used to gate the patch) through selectDockerGpuPatchMode into
buildDockerGpuModeCandidates, and skip the CDI candidate when on Docker
Desktop WSL so the patch uses `--gpus all`. Native Docker-CDI hosts are
unaffected and still prefer CDI (preserving the NVIDIA#4948 gateway
supervisor-wiring contract).

Reached only after the [2/8] gateway-bind issue (NVIDIA#5513 / NVIDIA#5534). A
follow-up is still needed for the orphaned `*-nemoclaw-gpu-backup-*`
container left behind on an early patch failure.

Signed-off-by: Abhimanyu Kumar <abhimanyukumar7290@gmail.com>

copy-pr-bot Bot commented Jun 17, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

cv commented Jun 17, 2026

Collaborator

@abhi-0906 can you add a DCO 'Signed-off-by' to the PR description, please?

coderabbitai Bot commented Jun 17, 2026

Contributor

Walkthrough

Adds a dockerDesktopWsl boolean flag to GPU mode candidate selection to suppress CDI candidates on Docker Desktop WSL runtimes where CDI is advertised but unusable. The flag is threaded through four function signatures in docker-gpu-patch.ts, sourced from isDockerDesktopWslRuntime() in docker-gpu-sandbox-create.ts, and validated by new tests.

Changes

Docker Desktop WSL CDI skip in GPU patch flow

| Layer / File(s) | Summary |
| --- | --- |
| CDI skip logic and propagation chain (src/lib/onboard/docker-gpu-patch.ts) | Expands buildDockerGpuModeCandidates options type with dockerDesktopWsl?: boolean and adds a guard that omits CDI from the candidate list when the flag is true. Threads the flag through selectDockerGpuPatchMode, recreateOpenShellDockerSandboxWithGpu, and applyDockerGpuPatchOrExit. |
| Runtime probe wiring in sandbox-create (src/lib/onboard/docker-gpu-sandbox-create.ts) | Adds dockerDesktopWsl?: boolean to DockerGpuSandboxCreatePatchOptions (defaulting to isDockerDesktopWslRuntime()) and passes it into applyOptions. |
| Tests for WSL CDI skip behavior (src/lib/onboard/docker-gpu-patch-wsl.test.ts) | Imports buildDockerGpuModeCandidates and adds a describe block asserting CDI is excluded when dockerDesktopWsl is true and that CDI remains first for non-Docker-Desktop-WSL hosts that advertise CDI. |
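
The "optional flag that defaults to the runtime probe" wiring summarized above could look roughly like the following. This is a sketch under assumptions: every name here is hypothetical, the probe is injected as a parameter so the example stays self-contained, and the selector assumes CDI is advertised.

```typescript
// Hypothetical options type; the real DockerGpuSandboxCreatePatchOptions has more fields.
interface SandboxCreatePatchOptionsSketch {
  dockerDesktopWsl?: boolean;
}

// Resolve the flag once at the boundary: an explicit value wins, otherwise ask the probe.
function resolveDockerDesktopWsl(
  options: SandboxCreatePatchOptionsSketch,
  probe: () => boolean, // stand-in for isDockerDesktopWslRuntime()
): boolean {
  return options.dockerDesktopWsl ?? probe();
}

// Innermost layer: picks the docker args (assumes CDI is advertised by the daemon).
function selectModeSketch(dockerDesktopWsl: boolean): string[] {
  return dockerDesktopWsl ? ["--gpus", "all"] : ["--device", "nvidia.com/gpu=all"];
}

// Intermediate layer: only forwards the already-resolved boolean.
function applyPatchSketch(dockerDesktopWsl: boolean): string[] {
  return selectModeSketch(dockerDesktopWsl);
}

let probeCalls = 0;
const probe = (): boolean => {
  probeCalls += 1;
  return true; // pretend the host is Docker Desktop WSL
};

const fromProbe = applyPatchSketch(resolveDockerDesktopWsl({}, probe));
const fromOverride = applyPatchSketch(resolveDockerDesktopWsl({ dockerDesktopWsl: false }, probe));
console.log(fromProbe.join(" "), "|", fromOverride.join(" "), "| probe calls:", probeCalls);
```

Because `??` short-circuits, the probe runs only when the caller leaves the flag unset, which is what lets unit tests pin the value without touching the host environment.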

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#5198: Modifies docker-gpu-patch.ts to adjust CDI-related GPU patch eligibility and messaging for Docker Desktop WSL, directly overlapping with the CDI candidate skip and dockerDesktopWsl threading introduced in this PR.

Suggested labels

bug-fix, platform: wsl, area: sandbox, v0.0.65

Suggested reviewers

  • cv

Poem

🐇 On WSL the CDI spec hides away,
So --gpus all shall carry the day.
The flag hops through each function in line,
Skipping the path that would fail every time.
No orphan containers, no exit 1 cry —
The GPU patch lands and bunnies say "hi!" 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately and specifically describes the main change: skipping CDI GPU mode selection on Docker Desktop WSL to fix the onboard process failure. |
| Linked Issues check | ✅ Passed | The PR implementation threads the dockerDesktopWsl flag through the GPU patch selection logic to skip CDI mode on Docker Desktop WSL, directly addressing both issue requirements to use --gpus compatibility path instead of CDI. |
| Out of Scope Changes check | ✅ Passed | All changes are tightly scoped to GPU mode candidate selection and parameter threading; no unrelated refactoring, documentation changes, or error classification improvements are included. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@abhi-0906

Contributor Author

Thanks @cv — added the Signed-off-by line to the PR description. I've also added it to the descriptions of the related PRs in this chain (#5534, #5536, #5541) so they're DCO-ready for squash-merge. All commits are individually signed off too; let me know if anything else is needed.

Labels

None yet

Projects

None yet

2 participants