fix(onboard): stop the gateway when its sandbox-bridge probe fails (#5513)#5536
fix(onboard): stop the gateway when its sandbox-bridge probe fails (#5513)#5536abhi-0906 wants to merge 1 commit into
Conversation
📝 WalkthroughWalkthroughAdds an optional ChangesGateway orphan cleanup on sandbox-bridge unreachability
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related issues
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/onboard.ts`:
- Around line 2193-2206: The src/lib/onboard.ts file has exceeded the growth
budget with a net addition of 17 lines, blocking the CI guardrail. To fix this,
reduce the net additions by either condensing the lengthy inline comment block
(the "// `#5513`:" explanation) that precedes the stopGatewayStartedDuringOnboard
function definition, or move the helper function and its supporting
documentation to a separate utility file. Focus on compressing the explanatory
text while preserving the essential logic and functionality of the
stopGatewayStartedDuringOnboard implementation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 841e023b-2b23-41d5-9288-3a50f900392a
📒 Files selected for processing (3)
src/lib/onboard.tssrc/lib/onboard/gateway-sandbox-reachability.test.tssrc/lib/onboard/gateway-sandbox-reachability.ts
…VIDIA#5513) When onboard's [2/8] sandbox-bridge reachability probe fails, NemoClaw aborts via process.exit(1) without stopping the OpenShell gateway it started (or reused/adopted) earlier in the same run. The gateway is left running, bound to the loopback address, so the accompanying "restart Docker and re-run" hint is misleading: the stale listener survives a Docker restart and collides with the next attempt. Add an onUnreachable hook to verifySandboxBridgeGatewayReachableOrExit that fires only on a genuine unreachable result (not the soft probe_unavailable skip or a successful probe), and wire it at the three host-mode gateway paths in startDockerDriverGateway (fresh start, reuse, adopt) to tear the gateway down via the existing stopDockerDriverGatewayProcess(). That helper reads the pid file written in every path and only terminates a verified gateway process, so it is a safe no-op otherwise. Follow-up to the bind-address fix in NVIDIA#5534: that change keeps the probe from failing on Docker Desktop WSL in the first place, while this ensures any genuine probe failure no longer orphans the gateway. Signed-off-by: Abhimanyu Kumar <abhimanyukumar7290@gmail.com>
8aa9e98 to
bb23247
Compare
Summary
Follow-up to #5534 (the Docker Desktop WSL gateway bind fix). When onboard's
[2/8]sandbox-bridge reachability probe fails, NemoClaw aborts viaprocess.exit(1)without stopping the OpenShell gateway it started (or reused/adopted) earlier in the same run. The gateway is left running, bound to the loopback address — so the accompanying "restart Docker and re-run" hint is misleading: the stale listener survives a Docker restart and collides with the next attempt (the orphaned-gateway symptom called out in #5513).Fix
onUnreachablehook toverifySandboxBridgeGatewayReachableOrExitthat fires only on a genuine unreachable result — not the softprobe_unavailableskip, and not a successful probe.startDockerDriverGateway(fresh start, reuse, adopt) to tear the gateway down via the existingstopDockerDriverGatewayProcess(). That helper reads the pid file written in every path and only terminates a verified gateway process, so it is a safe no-op otherwise.Relationship to #5534: #5534 stops the probe from failing on Docker Desktop WSL in the first place (bind
0.0.0.0); this PR ensures that any genuine probe failure — on any host — no longer orphans the gateway, and makes the "re-run onboard" guidance accurate.Testing
gateway-sandbox-reachability.test.ts:onUnreachablefires on a genuine unreachable probe, and not on a successful probe or a softprobe_unavailableresult.tsc -p tsconfig.src.jsonclean; reachability + onboard-runtime suites pass (remaining failures are pre-existing Windows-only path / CLI-not-found tests, identical onmain).Refs #5513.
Summary by CodeRabbit
Bug Fixes
Tests
Signed-off-by: Abhimanyu Kumar abhimanyukumar7290@gmail.com