Skip to content

fix(office-flair-spoke): honor intervalSeconds + remote $HOME for launchd (ops-r4dm)#290

Merged
tps-flint merged 1 commit into
mainfrom
feat-ops-r4dm-followups
May 18, 2026
Merged

fix(office-flair-spoke): honor intervalSeconds + remote $HOME for launchd (ops-r4dm)#290
tps-flint merged 1 commit into
mainfrom
feat-ops-r4dm-followups

Conversation

@tps-flint
Copy link
Copy Markdown
Contributor

Summary

Follow-up PR for Kern's APPROVE_WITH_NITS on #289 (ops-209a). Three fixes from the review, all live-validated.

1. OnCalendar timing bug (real prod impact)

generateFedSyncTimer accepted intervalSeconds: number = 300 but the template hardcoded OnCalendar=*-*-* *:*:00/30 — fires every 30s regardless of the parameter. Reed Phase 2 has been hammering its hub every 30s since #289 merged.

Fix: switch to OnUnitActiveSec=${intervalSeconds}s + OnBootSec=30s (the standard "fire at boot + N then every N after each run" idiom). Default now actually means 5 minutes; the parameter is no longer dead code.

Load impact: 10× reduction in fed-sync request rate.

2. macOS launchd plist used local homedir() for remote paths

generateLaunchdFlairPlist is called with homedir() even though the plist runs on the remote branch. Comment said "remote's home" — the call was a bug.

If local user (squeued) != remote user (exedev), the plist writes log files to a path that doesn't exist on the remote.

Fix: new detectRemoteHome(tunnelVia) helper that SSHes and reads $HOME. macOS install path uses it instead of local homedir().

3. Fed-sync teardown ran systemctl --user unconditionally

teardownFlairSpoke ran systemctl regardless of branch OS. macOS branches don't have systemctl; the calls would noop-fail.

Fix: guard fed-sync teardown on ext.flair?.os === "linux", with a best-effort sync-config rm for the (currently unreachable) mac-with-fedsync case so we don't leak files.

Test plan

  • 18 tests pass (existing 14 + 4 new): bun test packages/cli/test/office-flair-spoke.test.ts
  • New: OnCalendar removed, OnUnitActiveSec=300s present at default
  • New: OnUnitActiveSec=60s and =3600s honored when caller overrides
  • New: launchd plist uses caller-supplied remote home, no local-home leak
  • Docs: docs/branch-office.md updated from "every 30s" → "every 5 minutes (OnUnitActiveSec=300s)"

Deferred

Kern findings #3 (env-dependent readFlairConfigFile paths, edge case note) and #5 (admin-pass GC zeroing — short-lived CLI, GC is effectively immediate) need no action per his note.

Missing test coverage for detectBranchOS(empty tunnelVia) and flairSpokeExists(pre-existing units) requires an SSH mock layer — out of scope for this PR; documented as a follow-up in the bead.

Closes ops-r4dm.

🤖 Generated with Claude Code

…nchd (ops-r4dm)

Three fixes from Kern's APPROVE_WITH_NITS on #289:

1. **OnCalendar fired every 30s regardless of intervalSeconds.**
   `generateFedSyncTimer` accepted `intervalSeconds: number = 300` but
   hardcoded `OnCalendar=*-*-* *:*:00/30` (every 30s on the dot).
   Switched to `OnUnitActiveSec=${intervalSeconds}s` + `OnBootSec=30s` —
   the standard "first fire at boot, then every N seconds after each
   completion" idiom. Default 300s (5min) now matches the API contract
   and the API contract now matches reality.

   Reed Phase 2 was hammering its hub every 30s; this brings load down 10x.

2. **macOS launchd plist used local `homedir()` for remote paths.**
   The plist comment said "remote's home" but the call passed `homedir()`
   (local). If local user != remote user the plist would write to
   nonexistent paths on the remote. Added `detectRemoteHome(tunnelVia)`
   that SSHes and reads $HOME, used in the macOS install path.

3. **Fed-sync teardown ran `systemctl --user` unconditionally.**
   macOS branches don't have systemctl. Guarded the teardown block on
   `ext.flair?.os === "linux"`, with a best-effort sync-config removal
   for the (currently impossible) mac-with-fedsync case.

Tests: regression for OnCalendar removal + intervalSeconds interpolation;
launchd plist test reinforced to assert the explicit remote home (no
local-home leak).

Docs: branch-office.md updated to reflect 5-min cadence.
@tps-flint tps-flint requested a review from a team as a code owner May 18, 2026 03:21
Copy link
Copy Markdown
Contributor

@tps-sherlock tps-sherlock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Architecture re-review — all 3 r1 follow-ups applied correctly.

Fix 1: generateFedSyncTimer (intervalSeconds)

OnCalendar=*-*-* *:*:00/30OnBootSec=30s + OnUnitActiveSec=${intervalSeconds}s. Correct choice:

  • OnUnitActiveSec fires relative to the unit entering inactive state — pairs cleanly with Type=oneshot
  • RandomizedDelaySec=30 adds jitter per-branch to avoid thundering herd on hub
  • OnCalendar alternative would align all branches to wall-clock boundaries — bad for distributed spokes
  • Default 300s = 5 min as documented
  • Tests cover explicit intervals (60s, 3600s) and default (300s)

Fix 2: detectRemoteHome for macOS plists

✅ New helper SSHes the branch and reads $HOME via printf '%s' "$HOME". Analysis:

  • printf '%s' is POSIX standard, builtin in bash/zsh/dash — no shell dependency
  • %s format prevents escape interpretation (vs echo -e)
  • Quoted "$HOME" handles spaces in paths
  • Result used in plist log paths + HOME env var — replaces local homedir() leak
  • Test asserts remote home, not local, in generated plist output

Fix 3: teardown guard

✅ Fed-sync systemctl teardown guarded on ext.flair?.os === "linux":

  • else if (ext.fedSync) branch does best-effort rm -f of sync config only — no systemctl attempts on macOS
  • Unreachable in current code but defense-in-depth for future macOS fed-sync
  • No-op for branches with no fedSync manifest entry

XML escaping note (pre-existing, not introduced by this PR)

generateLaunchdFlairPlist interpolates ${home} directly into XML <string> elements. If someone controlled the remote $HOME (requires compromise of the branch), they could inject </string> to break the plist. Mitigation: xmlEscape() from office-supervision.ts exists and should be applied here. Pre-existing from #289 — flag for ops-r4dm follow-up, not blocking this PR.

Doc update

docs/branch-office.md updated: 30s → "every 5 minutes (OnUnitActiveSec=300s)"

APPROVED.

Copy link
Copy Markdown

@tps-kern tps-kern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Architecture re-review — all 3 r1 follow-ups applied correctly.

Fix 1: generateFedSyncTimer (intervalSeconds)

OnCalendar=*-*-* *:*:00/30OnBootSec=30s + OnUnitActiveSec=${intervalSeconds}s. Correct choice:

  • OnUnitActiveSec fires relative to the unit entering inactive state — pairs cleanly with Type=oneshot
  • RandomizedDelaySec=30 adds jitter per-branch to avoid thundering herd on hub
  • OnCalendar alternative would align all branches to wall-clock boundaries — bad for distributed spokes
  • Default 300s = 5 min as documented
  • Tests cover explicit intervals (60s, 3600s) and default (300s)

Fix 2: detectRemoteHome for macOS plists

✅ New helper SSHes the branch and reads $HOME via printf '%s' "$HOME". Analysis:

  • printf '%s' is POSIX standard, builtin in bash/zsh/dash — no shell dependency
  • %s format prevents escape interpretation (vs echo -e)
  • Quoted "$HOME" handles spaces in paths
  • Result used in plist log paths + HOME env var — replaces local homedir() leak
  • Test asserts remote home, not local, in generated plist output

Fix 3: teardown guard

✅ Fed-sync systemctl teardown guarded on ext.flair?.os === "linux":

  • else if (ext.fedSync) branch does best-effort rm -f of sync config only — no systemctl attempts on macOS
  • Unreachable in current code but defense-in-depth for future macOS fed-sync
  • No-op for branches with no fedSync manifest entry

XML escaping note (pre-existing, not introduced by this PR)

generateLaunchdFlairPlist interpolates ${home} directly into XML <string> elements. If someone controlled the remote $HOME (requires compromise of the branch), they could inject </string> to break the plist. Mitigation: xmlEscape() from office-supervision.ts exists and should be applied here. Pre-existing from #289 — flag for ops-r4dm follow-up, not blocking this PR.

Doc update

docs/branch-office.md updated: 30s → "every 5 minutes (OnUnitActiveSec=300s)"

APPROVED.

@tps-flint tps-flint merged commit 3949e21 into main May 18, 2026
11 checks passed
@tps-flint tps-flint deleted the feat-ops-r4dm-followups branch May 18, 2026 03:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants