Conversation
Today AI clients (Claude Code, Codex, Cursor, …) running shell commands
go through their built-in Bash tool — invisible to the user. The user
asked for shell work to happen in their actual BossTerm window while
the agent still receives stdout/stderr + exit code, so they can watch
what's running and the agent isn't blind.
New write tool mcp__bossterm__run_command, blocking variant of
run_in_panel:
- Spawns (or reuses) a visible BossTerm pane, writes the script, waits
for OSC 133;D, returns {exitCode, output, durationMs, paneId,
truncated, error}.
- Pane reuse: first call splits below the focused pane (or whatever
mcpRunCommandDefaultPanel says); subsequent calls in the same tab
reuse that pane. The cache (tabId → paneId) is guarded by a per-tab
Mutex so two concurrent calls can't both miss the cache and orphan
a pane. A per-pane Mutex serializes overlapping calls into the same
pane's stdin.
- Output capture: absolute history-line tracking via the OSC 133;B and
;D events, so the slice stays correct when output scrolls into
history mid-command. ANSI-stripped via the emulator's per-line .text
accessor (same path read_scrollback uses).
- Alt-screen / TUI detection: polls textBuffer.isUsingAlternateBuffer
while waiting for D; on transition, completes with
error: "TUI detected …" instead of timing out. The process stays
alive so the caller can drive it via send_input + read_scrollback.
- Reserved tool: added to UNDISABLABLE_TOOLS so manage_tools rejects
attempts to hide it (otherwise the PreToolUse hook below would route
Bash to a tool that's no longer on the wire). applyDisabledSet also
filters reserved names defensively against hand-edits of
disabledMcpTools in settings.json.
Cross-client preference advertising:
- Server now passes a BOSSTERM_MCP_INSTRUCTIONS string into the SDK's
4-arg Server constructor. The MCP initialize response surfaces it
into every client's system prompt, so the "prefer run_command over
your built-in shell" guidance lands without per-project config.
Port marker file for cheap external probes:
- BossTermMcpManager atomically writes the actual bound port (after
the 7676→7685 fallback) to ~/.bossterm/mcp.port on every successful
bind, deletes it on clean stop. Lets the user-global PreToolUse hook
decide reachability with stat + nc -z (~5ms) instead of an HTTP
probe with timeout (~300ms worst case).
Settings:
- mcpRunCommandDefaultTimeoutMs (default 120_000, UI input)
- mcpRunCommandDefaultPanel (default horizontal_split, UI dropdown;
invalid values silently fall back to horizontal_split — never bubble
the user's bad setting up as a per-call error)
- mcpRunCommandMaxOutputBytes (default 120_000, advanced — sized to
fit under mcpMaxAnswerChars=150_000 with JSON-wrapper headroom)
- mcpRunCommandShellReadyTimeoutMs (default 1_500, advanced)
UI:
- Two new rows under "BossTerm MCP Server" (panel dropdown + timeout
input).
- ExposedToolsSection grays out reserved tools and gains a blurb
surfacing the always-exposed manage_tools meta-tool (which lives
outside BUILT_IN_READ/WRITE_TOOLS).
Docs:
- README intro updated; new "Using as Claude Code's default shell"
subsection pointing at the docs.
- docs/mcp-server.md: full run_command schema + response shape +
error contract, ~/.bossterm/mcp.port marker file note, initialize-
time instructions section, settings reference rows, "Using as
Claude Code's default shell" how-to (copy-paste hook script +
settings.json + CLAUDE.md), multi-instance + sudo + nc caveats in
Troubleshooting.
Verification:
- ./gradlew :compose-ui:compileKotlinDesktop --no-daemon clean.
- Hook script smoke-tested across all four branches (no marker /
dead port / live port / no nc).
- End-to-end (MCP Inspector + two concurrent run_command calls + real
Bash hooking) is still manual; the listener-before-send, alt-screen
detection, and cache lock can't be exercised without a live UI.
User-global config (~/.claude/CLAUDE.md, hooks/prefer-bossterm.sh,
settings.json hook entry) is per-user and lives outside this repo —
the docs walk through it.
Generated with [Claude Code](https://claude.com/claude-code)
Review of PR #262 —
|
…stered Today McpTerminalRegistry.primaryState() returns the first-registered window, so run_command / run_in_panel / get_active_tab without a tab_id target the wrong window whenever multiple BossTerm windows are open — the user sees commands run somewhere they aren't looking. The correct invariant is "the window the calling MCP client is running INSIDE." New file ProcessAncestry.kt resolves that window by: 1. Resolving the PID owning the loopback TCP socket on the client side (lsof on macOS; /proc/net/tcp + /proc/<pid>/fd on Linux). 2. Walking the parent process tree from that PID. 3. Matching the first ancestor PID against every tracked pane's shell PID across all registered states. 4. Returning the state owning that pane — or null if no ancestor matches (client running outside any BossTerm pane: Claude Desktop, external Inspector, CI…). BossTermMcpManager's Ktor interceptor calls the resolver on every incoming MCP request and records the result in McpTerminalRegistry.lastResolvedClientWindow (AtomicReference). The three tool handlers that previously used registry.primaryState() (get_active_tab, run_in_panel, run_command) now read the resolved client window first, with the existing primaryState() as fallback. Design notes: - Resolution failures are silent — caller falls back to first- registered. Never crashes a tool call on a parent-walk hiccup. - Concurrent multi-client racing is acceptable last-writer-wins for v1 — the dominant single-client case is consistent. - Resolver runs INSIDE the existing intercept block, AFTER the DNS-rebinding check, so a hostile request never triggers lsof. - Stale lastResolvedClientWindow entries (window closed) are detected lazily on read (state is no longer in `states` list) and cleared. Platform support: macOS + Linux. Windows and unknown platforms skip the resolver (resolveClientWindow returns null) and the server falls back to first-registered. Cost: ~50-150ms per request on macOS (two shell-outs per parent hop — lsof, then ps for each ancestor up to 4 hops deep), ~5-15ms on Linux (pure /proc reads). Docs: new "Caller-window resolution" subsection in mcp-server.md covering mechanism, scenarios, platform support, and the multi-client race caveat. Verification: - ./gradlew :compose-ui:compileKotlinDesktop --no-daemon clean. - Live end-to-end testing needs a restart of BossTerm (currently running binary is pre-change) and a multi-window scenario — manual, reviewer to confirm before merge. Generated with [Claude Code](https://claude.com/claude-code)
Follow-up review pass — ProcessAncestry and items the prior review did not surfaceThe previous review covered the main run_command correctness issues (hook script nc-missing branch, panel new_tab tab-vs-pane id confusion, multi-statement script slice, stale OSC 133;D from concurrent writers, UTF-16-chars vs bytes for maxOutputBytes, per-pane mutex leak, doc nit on 200 KB). Those still stand. This pass focuses on ProcessAncestry.kt and a couple of items that pass did not cover. ProcessAncestry — bugs and performance
Items the prior review did not surface
Test coverage — same as prior reviewStill no automated coverage for any of: slice math (history delta, wraparound), cache-race resolution, UNDISABLABLE_TOOLS filter, ProcessAncestry IPv6 / subprocess-timeout paths. The prior review three suggested unit tests would catch most of what is flagged here too. Net: lots of careful engineering (per-tab cache lock + per-pane mutex layering, absolute-history slice, sealed PaneResolution for exhaustive when, atomic marker write, defensive effectiveDisabled filter). The ProcessAncestry subprocess hang and per-request cost are the items most worth fixing before merge alongside the prior review P0/P1 list. Generated with Claude Code |
Today consecutive `run_in_panel` calls with `panel: "horizontal_split"` each split whichever pane currently has focus — so a second long-running script lands wherever the user happened to click last, often splitting the agent's own pane and pushing the prior MCP pane around. With three or four parallel fire-and-forget launches the layout becomes random and unpleasant. Anchor the new pane off the last MCP-created scratch pane instead of the focused pane: a horizontal_split splits the existing scratch pane *vertically* (right of it), and the new pane becomes the next anchor. Result: consecutive MCP scratch panes line up as a horizontal strip along the bottom of the tab, and the agent's primary pane keeps its real estate. New: `TabbedTerminalState.splitVerticalFromPane(tabId, anchorPaneId, ratio, initialCommand)`. Implemented by extending the existing `performSplit` helper with an optional `anchorPaneId` argument — when provided, `SplitViewState.setFocusedPane` aims focus at the anchor before the existing `splitFocusedPane` call runs, so the same code path handles both flavors. Anchor lookup degrades gracefully: if the cached paneId no longer resolves to a session (user closed the pane), the caller falls back to the original `splitHorizontal` (focused-pane) behavior. run_in_panel changes: - When `panel == "horizontal_split"` and `registry.getScratchPane(tab)` points at a still-live pane, route through `splitVerticalFromPane` to stack the new pane right of the anchor. - Every successful split now updates the scratch-pane cache to the new paneId, so the *next* call sees the new pane as its anchor (chains horizontally). Previously only run_command updated the cache; now run_in_panel does too, and the two tools share the same anchor for the same tab. `vertical_split` and `new_tab` keep their existing semantics — no behavior change for callers that explicitly opt out of the anchor stacking. run_command is intentionally untouched: it reuses the cached pane on the next call (no new pane created), so the anchor-stacking rule doesn't apply. Verification: - `./gradlew :compose-ui:compileKotlinDesktop --no-daemon` clean. - Manual (reviewer): two run_in_panel horizontal_split calls in a row should yield two panes side-by-side in the bottom row, not nested. Generated with [Claude Code](https://claude.com/claude-code)
Review of PR #262 —
|
P0 #1 — hook script in docs/mcp-server.md "Using as Claude Code's default shell" didn't match the actual hook file shipped to ~/.claude/hooks/ prefer-bossterm.sh: it falls through and emits the deny JSON when nc is missing, contradicting the doc's own "fails closed" prose. Match the shipped script's `if ! command -v nc; then exit 0; fi; nc -z … || exit 0` shape so a missing nc never routes Claude to an unverified port. P0 #2 — `panel: "new_tab"` in run_command was structurally broken. `state.createTab(...)` returns a TAB id, but the code stuffed it into `newPaneId`, cached `mcpScratchPanes[origTabId] = newTabId`, and returned `RunCommandResult(tabId = origTabId, paneId = newTabId)`. Consequences: the next call with the same `tab_id` would look up `findSession(origTab, newTab)` (which can't resolve — newTab isn't a pane in origTab's split tree), evict as stale, and create yet another fresh tab — so the new_tab mode never got its reuse. The response's tabId was also wrong. Refactor: - `PaneResolution.Hit` now carries an `effectiveTabId` field. Splits and cache-hits carry the input tab id forward; new_tab path overrides with the freshly-created tab's id (and uses `findSession(newTab, newTab)` for a single-pane lookup that actually resolves). - The cache writes / reads land on the new tab id when applicable, so `panel: "reuse"` follow-ups on the same effective tab actually reuse. - `RunCommandResult.tabId` is now `resolveResult.tabId` (the effective tab id), so callers passing it back land in the right tab. P0 #3 — Multi-statement scripts (raw "\n" inside `script`) corrupted the result. The shell fires B/D once per statement; my `onCommandStarted` unconditionally overwrote historyAtB/cursorYAtB, so by the time the awaiter wakes up the start mark might point at the SECOND statement while the exit code is from the first. Fix: guard the snapshot with an `ourCommandStarted` AtomicBoolean flipped via `compareAndSet(false, true)`. Only the first B after our write records markers; later statements' B events no-op. Also added a schema note in the docs telling callers to use `bash -lc '…'` for compound logic, since the captured exit code is still only the first statement's even with the fix. P0 #4 — Stale OSC 133;D from a prior writer on the same pane (a fire-and-forget run_in_panel script, a send_input keystroke, the user manually typing) could complete `finishedSignal` before our script even left the gate, returning that command's exit code with a garbled slice. The per-pane Mutex serializes against other run_command calls but not other writers on the pane. Fix: an AtomicBoolean `weHaveWritten` is flipped right BEFORE the `session.writeUserInput(...)` call. The listener ignores any B until `weHaveWritten` is set, and ignores any D until `ourCommandStarted` has been set (which only happens on a B we accepted). Together they gate the entire B/D cycle to OUR command. Captured-var visibility across threads is handled via atomics — captured `var`s become kotlin.jvm.internal.Ref whose .element field isn't volatile. P1 #5 — `mcpRunCommandMaxOutputBytes` was a misnomer: `sb.length` and `line.length` are Kotlin String lengths (UTF-16 code units), not UTF-8 bytes. For ASCII it matches; for emoji-heavy output the real byte count can be 4× the configured value. Renamed everywhere to `mcpRunCommandMaxOutputChars` and updated the kdoc to spell out the unit choice (apples-to-apples with the also- char-counting `mcpMaxAnswerChars`). Settings.json entries set under the old name silently revert to the default — acceptable for a setting that shipped on the same dev branch. P1 #6 — `paneMutexes` only sheds entries on the cache-miss-with-stale- pane path. If a user closes the cached pane and never re-runs against that tab, the mutex sits forever. Bounded by pane-creation rate but unbounded over long-lived sessions. Added opportunistic GC in `McpTerminalRegistry.setScratchPane`: when the mutex map exceeds 32 entries, walk every state's tabs + pane snapshots, collect live ids, evict everything else. Naturally batches behind real new-pane work; no listener wiring needed. Also sheds matching stale `mcpScratchPanes` entries while the sweep is hot. Build: `./gradlew :compose-ui:compileKotlinDesktop --no-daemon` clean. Generated with [Claude Code](https://claude.com/claude-code)
Review — PR #262
|
Summary
mcp__bossterm__run_commandthat runs a shell command in a visible BossTerm pane and returns its stdout/stderr, exit code, and duration. AI clients (Claude Code, Codex, Cursor) now have a default shell that the user can watch in their real terminal — no more invisible Bash calls.instructionsfield — every spec-compliant client sees "preferrun_commandover your built-in shell" in its system prompt at handshake, no per-project config needed.~/.bossterm/mcp.portmarker file written atomically on every successful bind, deleted on clean stop — lets external hooks (e.g. a Claude Code PreToolUse hook) decide reachability with a 5 msstat + nc -zinstead of a 300 ms HTTP probe.What's in the box
Server tool (
BossTermMcpServer.registerRunCommand):pane_id> cached "MCP scratch pane" for the tab > newly-created viasplitHorizontal/splitVertical/createTab.Mutexso two concurrent calls can't both miss the cache and orphan a pane.Mutexso two concurrent calls into the same pane can't interleave their scripts in the shell's stdin.CommandStateListenerBEFORE the script is written so OSC 133;A/B/D are all captured.isUsingAlternateBufferwhile waiting for D. On transition, returnserror: "TUI detected"and leaves the process alive so the caller can drive it viasend_input+read_scrollback.UNDISABLABLE_TOOLS):manage_toolsrejects attempts to hide it, andapplyDisabledSetdefensively filters reserved names against hand-edits ofdisabledMcpToolsinsettings.json.Settings (
TerminalSettings.kt,McpSettingsSection.kt):mcpRunCommandDefaultTimeoutMs(120_000, UI input).mcpRunCommandDefaultPanel("horizontal_split", UI dropdown; invalid values silently fall back tohorizontal_splitso a typo'd setting never bubbles up as a per-call error).mcpRunCommandMaxOutputBytes(120_000, advanced — sized to fit undermcpMaxAnswerChars=150_000with JSON-wrapper headroom so the response-shortening ladder never has to truncaterun_commandoutput).mcpRunCommandShellReadyTimeoutMs(1_500, advanced).manage_tools" blurb so users see that tool exists even though it's not in the toggle list.Docs (
README.md,docs/mcp-server.md):run_commandsection: schema, response shape, error contract (TUI / timeout / shell-integration missing).~/.bossterm/mcp.portmarker file note + initialize-time instructions section.~/.claude/hooks/prefer-bossterm.sh),settings.jsonsnippet, optionalCLAUDE.mdaddendum.sudo/ missingnc.Rigorous review pass
Three independent review agents (server-side, settings/UI, integration/hook) audited the implementation before this commit. They surfaced 3 P0 bugs, 5 P1 issues, 4 P2 doc/UX items, and verified 11 items as correct. All P0 and P1 findings are addressed here:
ncis missing — previously it would emit deny without a probe and could route Claude to a dead port.Mutexprevents the concurrent-call orphan-pane race.mcpRunCommandDefaultPanel(e.g."reuse") silently falls back tohorizontal_split; error messages reference both the per-callpanelarg and the setting.clearScratchPane(tabId, paneId?); gradle-build kdoc note added; shell-ready=0semantics tightened.What's NOT in this PR
~/.claude/CLAUDE.md,~/.claude/hooks/prefer-bossterm.sh,~/.claude/settings.jsonhook entry) — per-user, lives outside the repo. The docs walk through wiring them up.Test plan
./gradlew :compose-ui:compileKotlinDesktop --no-daemon— BUILD SUCCESSFUL.nc— all PASS.ls— pane should appear in the active tab and output returns to Claude.run_commandcalls in parallel from MCP Inspector against the same tab — confirm only ONE new pane spawns (cache race fix).run_commandwithscript: "vim\n"— expecterror: "TUI detected", pane stays alive.~/.bossterm/mcp.portis gone.~/.bossterm/settings.jsonto"mcpRunCommandDefaultPanel": "reuse"— firstrun_commandcall should silently fall back tohorizontal_splitinstead of erroring.🤖 Generated with Claude Code