fix(onboard): bind gateway to 0.0.0.0 on Docker Desktop WSL (#5513) #5534
abhi-0906 wants to merge 1 commit into
📝 Walkthrough
Changes
Dynamic Gateway Bind Address for Docker Desktop WSL
Sequence Diagram(s)
sequenceDiagram
participant Onboard as nemoclaw onboard
participant GatewayEnv as docker-driver-gateway-env
participant WSLProbe as wsl-docker-desktop-gpu
participant DockerAdapter as adapters/docker (lazy)
Onboard->>GatewayEnv: resolveGatewayBindAddress(deps?)
GatewayEnv->>GatewayEnv: check NEMOCLAW_GATEWAY_BIND_ADDRESS in env
alt env override present
GatewayEnv-->>Onboard: return env override address
else no override
GatewayEnv->>WSLProbe: resolveWslDockerDesktopStatus()
WSLProbe->>DockerAdapter: require('../adapters/docker').dockerInfoFormat()
DockerAdapter-->>WSLProbe: docker info output
WSLProbe-->>GatewayEnv: "docker-desktop" | other
alt WSL docker-desktop
GatewayEnv-->>Onboard: return 0.0.0.0 (wildcard)
else native Linux
GatewayEnv-->>Onboard: return 127.0.0.1 (default)
end
end
Onboard->>GatewayEnv: getGatewayStartNetworkEnv(deps?)
GatewayEnv-->>Onboard: { OPENSHELL_BIND_ADDRESS: resolvedAddr, ... }
Onboard->>GatewayEnv: getGatewayPortCheckOptions(deps?)
GatewayEnv-->>Onboard: { host: resolvedAddr }
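The resolution order in the diagram can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function and environment variable names come from the diagram, while the `BindDeps` shape, the `isDockerDesktopWsl` callback, and the whitespace trimming of the override are assumptions made so the example is self-contained.

```typescript
// Hypothetical dependency bag so the resolver can run without Docker or process.env.
interface BindDeps {
  env: Record<string, string | undefined>;
  // Assumed probe callback: true when the host is WSL backed by Docker Desktop.
  isDockerDesktopWsl: () => boolean;
}

const LOOPBACK = "127.0.0.1";
const WILDCARD = "0.0.0.0";

// An explicit override wins; otherwise widen to the wildcard only on Docker Desktop WSL.
function resolveGatewayBindAddress(deps: BindDeps): string {
  const override = deps.env["NEMOCLAW_GATEWAY_BIND_ADDRESS"]?.trim();
  if (override) return override;
  return deps.isDockerDesktopWsl() ? WILDCARD : LOOPBACK;
}

// Both consumers derive from the same resolver, so launch and preflight agree.
function getGatewayStartNetworkEnv(deps: BindDeps): Record<string, string> {
  return { OPENSHELL_BIND_ADDRESS: resolveGatewayBindAddress(deps) };
}

function getGatewayPortCheckOptions(deps: BindDeps): { host: string } {
  return { host: resolveGatewayBindAddress(deps) };
}
```

Routing both the start env and the port check through one resolver is what keeps the launch address and the preflight check from disagreeing.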
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
On Docker Desktop + WSL2, onboard [2/8] starts the host-mode OpenShell gateway bound to NemoClaw's default 127.0.0.1, but sandbox containers reach it via Docker's host-gateway route, which Docker Desktop maps to its own bridge IP rather than the WSL distro loopback. The [2/8] sandbox-bridge reachability probe then fails with Connection refused, 100% of the time, blocking onboarding on every Docker Desktop WSL host.
Resolve the effective bind address with Docker Desktop WSL awareness: with no explicit NEMOCLAW_GATEWAY_BIND_ADDRESS override, bind 0.0.0.0 on Docker Desktop WSL so the host-gateway route can reach the gateway. This mirrors the containerized compat path, which already binds 0.0.0.0 for the same reason. An explicit override still wins, and the existing wildcard-bind warning now fires for the auto-widened case too.
The decision is computed in the gateway env layer so it flows consistently into the launch, the runtime drift identity/marker, the preflight port check, and the systemd env file. To keep the detector out of source-level unit tests, wsl-docker-desktop-gpu now lazy-loads the Docker adapter only on an actual probe.
Signed-off-by: Abhimanyu Kumar <abhimanyukumar7290@gmail.com>
b61603e to a3590d1
Opened a small follow-up, #5536, for the secondary orphaned-gateway symptom in #5513: on a genuine sandbox-bridge probe failure, onboard now tears down the gateway it started (via the existing …). The two are complementary: this PR (#5534) keeps the probe from failing on Docker Desktop WSL in the first place; #5536 handles cleanup for any genuine probe failure on any host. They touch different files (…
…VIDIA#5513)
When onboard's [2/8] sandbox-bridge reachability probe fails, NemoClaw aborts via process.exit(1) without stopping the OpenShell gateway it started (or reused/adopted) earlier in the same run. The gateway is left running, bound to the loopback address, so the accompanying "restart Docker and re-run" hint is misleading: the stale listener survives a Docker restart and collides with the next attempt.
Add an onUnreachable hook to verifySandboxBridgeGatewayReachableOrExit that fires only on a genuine unreachable result (not the soft probe_unavailable skip or a successful probe), and wire it at the three host-mode gateway paths in startDockerDriverGateway (fresh start, reuse, adopt) to tear the gateway down via the existing stopDockerDriverGatewayProcess(). That helper reads the pid file written in every path and only terminates a verified gateway process, so it is a safe no-op otherwise.
Follow-up to the bind-address fix in NVIDIA#5534: that change keeps the probe from failing on Docker Desktop WSL in the first place, while this ensures any genuine probe failure no longer orphans the gateway.
Signed-off-by: Abhimanyu Kumar <abhimanyukumar7290@gmail.com>
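The hook behaviour described in this commit message might look roughly like the sketch below. Only the names verifySandboxBridgeGatewayReachableOrExit, onUnreachable, and probe_unavailable come from the text; the `ProbeResult` union, the `VerifyOptions` shape, and the injected `exit` callback (standing in for process.exit) are hypothetical simplifications, and the real signature may differ.

```typescript
// Assumed result set: only "probe_unavailable" is named in the commit message.
type ProbeResult = "reachable" | "unreachable" | "probe_unavailable";

// Hypothetical options bag; the real function's parameters are not shown in the PR text.
interface VerifyOptions {
  probe: () => ProbeResult;
  onUnreachable?: () => void; // cleanup hook, e.g. stopping the gateway
  exit: (code: number) => void; // injected stand-in for process.exit
}

function verifySandboxBridgeGatewayReachableOrExit(opts: VerifyOptions): void {
  const result = opts.probe();
  // A successful probe and the soft skip both return without running the hook.
  if (result !== "unreachable") return;
  try {
    opts.onUnreachable?.();
  } finally {
    // exit(1) still runs if the cleanup hook throws.
    opts.exit(1);
  }
}
```

Keeping the hook optional means callers that never started a gateway can leave it unset and get the old exit-only behaviour.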
@abhi-0906 can you add a DCO "Signed-off-by: ..." line to the PR description, please?
Thanks @cv. Added the Signed-off-by line to the PR description.
Summary
On Docker Desktop + WSL2, nemoclaw onboard step [2/8] starts the host-mode OpenShell gateway bound to NemoClaw's default 127.0.0.1. Sandbox containers reach the gateway via Docker's host-gateway route, which Docker Desktop maps to its own bridge IP rather than the WSL distro loopback, so the [2/8] sandbox-bridge reachability probe fails with Connection refused, 100% of the time, blocking onboarding on every Docker Desktop WSL host.
Root cause
NemoClaw chooses the gateway bind address and passes it to OpenShell via OPENSHELL_BIND_ADDRESS. In host mode that value defaulted to 127.0.0.1 (src/lib/core/gateway-address.ts, src/lib/onboard/docker-driver-gateway-env.ts) with no Docker Desktop WSL branch. The containerized compatibility path already binds 0.0.0.0 for exactly this reason; the host-mode gateway (used on modern-glibc WSL distros) did not.
Fix
Resolve the effective bind address with Docker Desktop WSL awareness:
No explicit NEMOCLAW_GATEWAY_BIND_ADDRESS override → bind 0.0.0.0 on Docker Desktop WSL so the host-gateway route reaches the gateway; 127.0.0.1 everywhere else.
… 127.0.0.1 back on).
… (getGatewayConnectHost maps 0.0.0.0 → 127.0.0.1); only the listen surface widens, and the existing wildcard-bind warning now fires for the auto-widened case.
The decision is computed in the gateway env layer so it flows consistently into the launch, the runtime drift identity/marker, the preflight port check, and the systemd env file (no drift-driven restart loop). Detection is memoized so onboard runs docker info at most once. To keep the detector out of source-level unit tests, wsl-docker-desktop-gpu now lazy-loads the Docker adapter only on an actual probe.
Testing
Tests in docker-driver-gateway-env.test.ts cover wildcard bind on Docker Desktop WSL, loopback on native Linux, explicit-override precedence in both directions, and the start-env / port-check / warning wiring.
tsc -p tsconfig.src.json clean; gateway/onboard suites pass (remaining failures are pre-existing Windows-only chmod/path tests, identical on main).
Fixes #5513.
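The memoized, lazily loaded detection and the connect-host mapping mentioned under Fix could be sketched as below. This is an illustration under stated assumptions rather than the code in this PR: `createWslDockerDesktopDetector` and `DockerInfoLoader` are hypothetical names, `dockerInfoFormat` is simplified to a no-argument call returning a string, and the comparison against "docker-desktop" is inferred from the walkthrough diagram. Only the getGatewayConnectHost name and its 0.0.0.0 → 127.0.0.1 mapping are taken from the description.

```typescript
// Hypothetical loader standing in for a lazy require of the Docker adapter.
type DockerInfoLoader = () => { dockerInfoFormat: () => string };

// Returns a detector that loads the adapter and queries Docker at most once.
function createWslDockerDesktopDetector(loadAdapter: DockerInfoLoader): () => boolean {
  let cached: boolean | undefined;
  return () => {
    if (cached === undefined) {
      // The adapter is loaded only here, on the first real probe.
      cached = loadAdapter().dockerInfoFormat().trim() === "docker-desktop";
    }
    return cached;
  };
}

// A wildcard listen address is not a usable connect target, so map it to loopback.
function getGatewayConnectHost(bindAddress: string): string {
  return bindAddress === "0.0.0.0" ? "127.0.0.1" : bindAddress;
}
```

Because the loader is only invoked inside the returned closure, importing the detector module in a unit test does not pull in the Docker adapter.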
Summary by CodeRabbit
Release Notes
New Features
… NEMOCLAW_GATEWAY_BIND_ADDRESS override.
Tests
Refactor
Signed-off-by: Abhimanyu Kumar <abhimanyukumar7290@gmail.com>