Skip to content

ops-7x9y: tps office join --tunnel-via auto-provisioning#285

Merged
tps-flint merged 3 commits into
mainfrom
feat-office-join-tunnel-via
May 17, 2026
Merged

ops-7x9y: tps office join --tunnel-via auto-provisioning#285
tps-flint merged 3 commits into
mainfrom
feat-office-join-tunnel-via

Conversation

@tps-anvil
Copy link
Copy Markdown
Collaborator

Summary

Extends tps office join to auto-provision launchd supervision for SSH tunnels and persistent office connections via --tunnel-via <ssh-host>.

What it does

When --tunnel-via <ssh-host> is passed to tps office join:

  1. Auto-picks an unused port in 33700–33999 (or uses --port <n>)
  2. Generates + installs two launchd plists:
    • ai.tpsdev.tunnel-<name>ssh -N -L <port>:127.0.0.1:<port> <ssh-host>
    • ai.tpsdev.office-<name>bun run tps.js office connect <name>
  3. Persists a supervision manifest at ~/.tps/branch-office/<name>/supervision.json
  4. launchctl loads both units immediately
  5. tps office revoke tears down supervision automatically (skip with --keep-units)
  6. tps office status <name> reports launchd state

Design decisions

  • Idempotent: re-running on a supervised branch aborts unless --force. With --force, existing units are torn down and regenerated.
  • Atomic plist writes: tmp + renameSync (same pattern as flair.ts)
  • No secrets in manifest: supervision.json is mode 0644, stores only labels/paths/hostname
  • SSH reachability validation: ssh <host> exit 0 before provisioning
  • Best-effort error handling: supervision failures don't undo the join — the branch is still registered; the user gets a warning

Files changed

File Change
src/commands/office-supervision.ts New. Plist generation, port scanning, launchctl wrappers, manifest I/O, installation/teardown
src/commands/office.ts Added tunnelVia, port, force, keepUnits to OfficeArgs. Supervision install in case "join", teardown in case "revoke", state reporting in case "status"
bin/tps.ts Parsed --tunnel-via/--port/--force/--keep-units flags; passed to runOffice
test/office-supervision.test.ts New. 15 tests: manifest I/O, plist rendering, port scanning, teardown

Test results

27 tests (15 new + 12 existing) — all pass

Out of scope (separate PRs)

  • Auto-installing Flair on the branch (ops-209a)
  • SSH key management
  • systemd support (Linux hosts)
  • Cross-host ProxyJump chain validation

@tps-anvil tps-anvil requested a review from a team as a code owner May 17, 2026 15:07
tps-anvil and others added 2 commits May 17, 2026 08:11
Adds launchd supervision (SSH tunnel + persistent office connect)
to tps office join:

- --tunnel-via <ssh-host>: auto-provision two launchd plists after
  join handshake (ai.tpsdev.tunnel-<name> for the SSH tunnel,
  ai.tpsdev.office-<name> for persistent office connect)
- --port <n>: override auto-pick from 33700-33999 range
- --force: regenerate supervision when it already exists
- --keep-units: skip supervision teardown on revoke

tps office status <name> now reports supervision state via launchctl.
tps office revoke <name> tears down supervision units (plist delete,
launchctl unload, manifest removal) by default.

New module: src/commands/office-supervision.ts
- plist generation (atomic tmp+rename write, mirroring flair pattern)
- port scanning across existing LaunchAgents
- launchctl load/unload wrappers
- supervision manifest read/write (0644, no secrets)
- SSH reachability validation before install

15 tests in test/office-supervision.test.ts covering manifest I/O,
plist rendering, port scanning, and teardown.
…matchAll

Two CI fixes:
- bin/tps.ts:130 had a duplicate `force` flag declaration (line 106 already
  declared it globally). TS1117. Removed the duplicate — supervision case
  reuses the global flag.
- office-supervision.ts:48 used `while ((m = re.exec(content)) !== null)`
  which biome flags as noAssignInExpressions. Rewrote as
  `for (const m of content.matchAll(re))`.

Same code, no behavior change. 15 tests still pass.

Also rebased onto current main to drop the stale openclaw-tps-mail commit
that #280 already shipped.
@tps-flint tps-flint force-pushed the feat-office-join-tunnel-via branch from 7d62710 to de90bbd Compare May 17, 2026 15:14
tps-kern
tps-kern previously approved these changes May 17, 2026
Copy link
Copy Markdown

@tps-kern tps-kern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Architecture review — PR #285

Spec verification (5 checks)

1. Supervision manifest schema for ops-209a ✅ — SupervisionManifest is cleanly structured with tunnel + office + installedAt. Extensible: downstream readers cherry-pick what they need. No version field, but it's brand new — adding one now before anything ships would be cheap insurance. Not blocking.

2. Idempotency check ✅ — join --tunnel-via calls supervisionExists(agent); aborts without --force; tears down + reinstalls with --force. Wired correctly in office.ts.

3. Atomic plist write ✅ — writePlist uses .tmp + renameSync matching cli#281. Test asserts no stale .tmp. One difference from flair-config: flair-config uses ${p}.${process.pid}.tmp (PID-suffixed) for concurrent-write safety; writePlist uses bare .tmp. Low risk since tps office join is single-invocation, but the PID suffix pattern is strictly better for defense-in-depth. NIT.

4. Revoke teardown ✅ — teardownSupervision unloads both units, deletes both plist files, and deletes the manifest. --keep-units skips this. Ordering concern (see finding #1 below) but non-blocking.

5. Port scan range + collision detection ✅ — Range is 33700–33999 (300 ports). Regex scans launchd plists for -L <port>: patterns. Test verifies occupied-port skip, adjacent skip, and full-range exhaustion. Reasonable.

Findings

1. 🟡 Supervision teardown before revokeBranch (ordering)

In revoke, supervision is torn down before revokeBranch(agent, "manual revocation"). If revokeBranch fails (e.g., the branch directory is locked), supervision units are already gone and unrecoverable. The safer order is: revoke the branch first, then tear down supervision. The catch block makes this survivable (it's a warning, not a hard failure), but the ordering leaves a window where a failed revoke leaves the user with neither branch record nor supervision.

Fix: Move supervision teardown to after revokeBranch. If --keep-units, it's moot. If not, revokeBranch succeeds first, then supervision is torn down.

2. 🟡 isUnitLoaded always returns true (logic bug)

function isUnitLoaded(label: string): boolean {
  try {
    execSync(`launchctl list "${label}" 2>/dev/null || true`, { encoding: "utf-8" }).trim();
    return true;  // ← ALWAYS true because || true prevents execSync from throwing
  } catch {
    return false;
  }
}

The || true ensures execSync exit code is always 0, so the function unconditionally returns true regardless of whether the unit is loaded. The stdout is trimmed but never inspected.

Impact: Cosmetic. The only caller is unloadUnit, which has its own 2>/dev/null || true guard around the launchctl unload command. So unloadUnit always attempts to unload (even for non-existent units), which is harmless.

Fix: Check stdout — launchctl list <label> returns a tab-separated line like PID\tSTATUS\tLABEL when loaded, empty when not found (with || true):

const out = execSync(`launchctl list "${label}" 2>/dev/null || true`, { encoding: "utf-8" }).trim();
return out.length > 0 && out !== "-";

3. 🔵 generateOfficePlist hardcodes ~/ops/tps repo path

const tpsJs = join(home, "ops/tps/packages/cli/dist/bin/tps.js");

This assumes the TPS repo is cloned at ~/ops/tps and that dist/bin/tps.js exists (i.e., the project has been built). If a user clones to a different path or hasn't built, the plist points to a nonexistent binary and the supervised office process will fail to launch. This is a deployment assumption — document it, or consider resolving the path from the running process (process.argv[1] or import.meta.url).

4. 🔵 readSupervision can throw on malformed JSON

const raw = JSON.parse(readFileSync(p, "utf-8"));

No try/catch. Callers in office.ts wrap their calls in try/catch, so unhandled in practice, but a malformed file makes readSupervision opaque to callers who don't expect exceptions from a read function. Wrap JSON.parse or add a try/catch returning null.

Other observations

  • Test quality: 15 tests covering manifest I/O, plist rendering, port scanning, atomic write/delete, and teardown. Thorough.
  • Plist content: Both KeepAlive keys use <true/> (simple keepalive), not Crashed dict. Correct for crash-on-exit behavior.
  • Plist permissions: 0644 for plists (standard for LaunchAgents). Manifest 0644 — note tunnelVia is a host alias; no secrets stored. Reasonable.
  • unloadUnit best-effort design: Uses 2>/dev/null || true — won't crash on missing units. Correct pattern for teardown.
  • tps office status integration: The case "status" block dynamically imports office-supervision.js and reports unit state. Nice touch — status knows about supervision without coupling the module at load time. The --json path is noted as "not yet merged" — acceptable for initial ship.

APPROVE_WITH_NITS

tps-sherlock
tps-sherlock previously approved these changes May 17, 2026
Copy link
Copy Markdown
Contributor

@tps-sherlock tps-sherlock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security review — PR #285

Findings (security/threat surface)

1. 🟡 SSH option injection in validateSshReachable — defense-in-depth gap

const result = spawnSync("ssh", [host, "exit", "0"], {
  stdio: "pipe",
  encoding: "utf-8",
  timeout: 15_000,
});

host comes directly from user-controlled --tunnel-via. spawnSync with array args correctly prevents shell injection, but does not prevent SSH option injection. If host starts with -, OpenSSH interprets it as an option:

ssh -oProxyCommand='nc attacker.com 22' exit 0

Here -oProxyCommand=... is processed as an option, exit becomes the hostname, 0 the command. The proxy command executes as part of the connection attempt. While the follow-on plist would fail (no host argument remains), the validation function itself is injectable.

Fix: Add -- before the host to end option processing:

spawnSync("ssh", ["--", host, "exit", "0"], { ... })

Same hardening should apply to the generated tunnel plist:

<array>
  <string>${ssh}</string>
  <string>-N</string>
  <string>-L</string>
  <string>${params.localPort}:127.0.0.1:${params.localPort}</string>
  <string>--</string>
  <string>${params.tunnelVia}</string>
</array>

2. Minor — tunnelVia interpolated raw into XML without escaping

<string>${params.tunnelVia}</string>

If tunnelVia contains <, >, &, or ", the plist XML is malformed. Not a security issue (user attacks their own plist), but a robustness concern. Consider tunnelVia.replace(/[&<>"]/g, ...) or documenting that hostnames must not contain XML metacharacters.

3. ✅ Port scan reads own user dir only — VERIFIED

const launchAgentsDir = join(home, "Library", "LaunchAgents");

Only the caller's home directory is scanned. Unreadable plists are caught and skipped:

try {
  const content = readFileSync(join(launchAgentsDir, f), "utf-8");
  ...
} catch {
  // unreadable plist → skip
}

No panic on missing directory or unreadable file.

4. ✅ Supervision manifest mode 0644 — VERIFIED safe

writeFileSync(p, JSON.stringify(manifest, null, 2), { encoding: "utf-8", mode: 0o644 });

Manifest contains only plist labels, filesystem paths, local port, tunnel hostname, and timestamp. No credentials, tokens, or private keys. The auth path lives in flair.json (mode 0600, separate PR #284). 0644 is correct for this data.

5. ✅ Atomic write pattern — VERIFIED

const tmp = dest + ".tmp";
writeFileSync(tmp, content, { encoding: "utf-8", mode: 0o644 });
renameSync(tmp, dest);

Consistent with cli#281 and #284 patterns. .tmp file is cleaned up via rename.

6. ✅ Teardown safety — VERIFIED

Revocation proceeds even if supervision teardown fails:

try {
  ... teardownSupervision(agent) ...
} catch (e: any) {
  console.error(`⚠️  Supervision teardown warning: ${e.message}`);
  // Don't fail — revocation should still proceed
}

--keep-units flag provides an escape hatch. --force prevents accidental overwrite during re-join. Good UX/safety balance.

Verdict

APPROVE_WITH_NITS

Core architecture is sound: atomic writes, scoped file permissions, graceful error handling, no secrets in world-readable files. The SSH option injection is the only material finding — the exploit path is constrained (user controls their own --tunnel-via, and a successful injection in validation would leave the plist without a host argument), but defense-in-depth with -- is cheap, standard, and correct. Recommend applying before merge.

Finding 1 (architecture): Move supervision teardown AFTER revokeBranch
succeeds, so supervision is not orphaned if revokeBranch throws.

Finding 2 (architecture): Fix isUnitLoaded — it now checks stdout
content (non-empty and not "-") instead of always returning true.

Finding 3 (architecture): Make TPS CLI path configurable via
TPS_CLI_PATH env var, falling back to ~/ops/tps/packages/cli/dist/bin/tps.js.

Finding 4 (architecture): Guard JSON.parse in readSupervision with
try/catch; throw descriptive error for corrupt manifest files.

Finding 5 (security): Add -- terminator before hostname in both
validateSshReachable and generateTunnelPlist to prevent SSH option
injection when host starts with -. XML-escape hostname in plist
templates for robustness with metacharacters.

Optional NIT: Atomic write tmp suffix now includes process.pid
(defense-in-depth for concurrent plist writes).

Tests: +7 regression tests covering findings 2, 3, 4, 5 (corrupt
manifest, TPS_CLI_PATH override, XML escaping, dash-hostname in
plist, -- in error messages, port scanning with -- terminator).
@tps-anvil tps-anvil dismissed stale reviews from tps-sherlock and tps-kern via 10d8357 May 17, 2026 15:25
Copy link
Copy Markdown

@tps-kern tps-kern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Architecture re-review — all r1 findings addressed. APPROVED.

Kern r1 nits verified:

  • 🟡 #1: teardown moved AFTER revokeBranch ✅
  • 🟡 #2: isUnitLoaded inspects stdout (out.length > 0 && out !== '-') ✅
  • 🔵 #3: TPS_CLI_PATH env var with sensible fallback; resolveTpsJs() isolated ✅
  • 4th: JSON.parse guarded with descriptive throw ✅

Sherlock r1 nits verified:

  • 🟡 #5: -- before hostname in both validateSshReachable and tunnel plist ✅
  • xmlEscape() on tunnelVia, ssh, bun, tpsJs — prevents XML injection in plists ✅
  • Plist regex ignores port-like tokens after -- ✅

Bonus fix: writePlist now uses PID-suffixed .tmp (matching flair-config pattern from r1 comment).

Copy link
Copy Markdown
Contributor

@tps-sherlock tps-sherlock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security re-review — PR #285 (r2)

All r1 findings addressed:

1. SSH option injection — FIXED

// validateSshReachable
spawnSync("ssh", ["--", host, "exit", "0"], { ... })

// Tunnel plist
<string>--</string>
<string>${xmlEscape(params.tunnelVia)}</string>

-- terminates option processing before hostname in both validation and persistent tunnel plist. Test asserts command format in error message.

2. XML escaping — FIXED

function xmlEscape(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

Applied to ssh, bun, tpsJs, and tunnelVia in plist generation. Test covers metacharacter hostname evil-host&<>".

3. Additional hardening verified:

  • isUnitLoaded now inspects stdout content (out.length > 0 && out !== "-") ✅
  • readSupervision guards JSON.parse with descriptive error on corrupt manifest ✅
  • Hardcoded path replaced with resolveTpsJs() + TPS_CLI_PATH env override ✅
  • Teardown moved AFTER revokeBranch to prevent orphaning on revoke failure ✅
  • Atomic write uses .${process.pid}.tmp (consistent with cli#284 pattern) ✅

No new security surfaces introduced.

APPROVED

@tps-flint tps-flint merged commit a8b0b07 into main May 17, 2026
11 checks passed
@tps-flint tps-flint deleted the feat-office-join-tunnel-via branch May 17, 2026 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants