Skip to content

feat(acp): ACP session protocol — init, lifecycle, modes, loadSession, streaming, tool display, command output#116

Open
marton78 wants to merge 19 commits into
dirac-run:masterfrom
marton78:acp-fixes
Open

feat(acp): ACP session protocol — init, lifecycle, modes, loadSession, streaming, tool display, command output#116
marton78 wants to merge 19 commits into
dirac-run:masterfrom
marton78:acp-fixes

Conversation

@marton78

Copy link
Copy Markdown
Contributor

Summary

This branch implements and stabilises the ACP (Agent Communication Protocol) session layer, taking the server from a bare skeleton to a fully functional ACP endpoint. The branch builds on top of fix-tests (PR #115) and contains 11 logical commits.


1. feat(acp) — acp-probe CLI script for end-to-end testing

Adds cli/scripts/acp-probe.mts, a standalone ACP client that drives the --acp server without an editor. Speaks JSON-RPC 2.0 over stdin/stdout — same transport path as Zed.

  • prompt subcommand: session/create → session/prompt, streams updates, exits on stopReason
  • load-prompt subcommand: loadSession → session/prompt, exercises the session continuation path
  • --pre-set key=value: apply session config (e.g. mode, apiProvider) before the prompt
  • --approve-delay-ms: configurable delay before auto-approving permissions
  • DIRAC_ACP_DEBUG=1: log raw JSON-RPC to stderr

2. fix(acp) — init refactor: shared helper, ErrorService, surface I/O errors

Extracts cli/src/initCoreServices.ts called by both ACP and interactive paths. Surfaces init failures (missing API key, bad config path, etc.) as JSON-RPC errors instead of silent EOF. Adds test coverage for the init-failure path.


3. fix(acp) — robust turn lifecycle: error paths, error semantics, cancel, retry cleanup

  • Terminate every error path: unhandled runPromise rejections now resolve the pending prompt via handleUnhandledHandlerError() instead of hanging. Exposes task.runError from core.
  • Errors as failed tool_call: failures emit tool_call("error") + tool_call_update(status:"failed") instead of looking like model text.
  • session/cancelstopReason:"cancelled": as required by the ACP spec.
  • Retry/terminal-failure state reset at the start of each new turn.

4. feat(acp) — session mode picker with per-session Plan/Act/Auto/YOLO overrides

session/setMode RPC lets clients switch modes per session. Auto-approve and YOLO flags are now stored in a per-session SessionOverrides map in StateManager instead of global state — no more cross-session leakage.


5. fix(acp) — atomic cancel flag claim and listener leak on task reinit

  • Cancel atomicity: cancel() claims pending.resolved.value = true synchronously before await cancelTask() so no concurrent path can steal the slot.
  • Listener leak: diracMessagesChanged listener now captures the task reference once at registration, not via controller.task which can be replaced by cancelTask().

6. fix(acp) — file edits apply after permission approval

ACPDiffViewProvider.saveChanges() was never called on approval, so edits were silently discarded. Fixed.


7. fix(acp) — loadSession continuity: sessionId as taskId, history replay, user messages, timing fix

  • sessionId = taskId: tasks are now findable after restart. New acp-session-tasks.ts helper; dead code in backfill.ts / utils.ts removed.
  • History replay on loadSession: re-emits all prior tool_call / agent_message_chunk events so reconnecting clients can reconstruct state.
  • User messages in replay: previously skipped.
  • Timing fix: handleWebviewAskResponse was called before ask("resume_task") fired, causing TaskMessenger.ask() to wipe the pre-set response. Now waits for the diracMessagesChanged event before delivering the prompt.

8. fix(acp) — format tool results as markdown

  • Read tools (readFile, getFunction, etc., newFileCreated): content replaced with * path: [File Hash: xxx] bullets; full source still in rawOutput.
  • List tools (listFilesTopLevel, etc.): converted to * item bullet lists.
  • Tool call titles: file paths deduplicated (preserving insertion order).

9. fix(acp) — streaming text deduplication

  • Doubled completion_result text
  • Delta tracker race producing duplicate streamed chunks
  • execute_command output leaking through the generic streaming path as plain text
  • Duplicate followup / plan_mode_respond text in streaming turns

10. fix(acp) — multi-command permission: variable shadowing and wrong tool attribution

  • Permission storm: shadowed command variable caused approval of one command to trigger requests for all. Fixed + refactored ask:command handling in messageTranslator.ts.
  • Wrong toolCallId on permission requests in multi-tool turns: now correctly attributed per command.

11. fix(acp) — surface execute_command output and completion in ACP stream

execute_command mutates ask:command in place (no say:command_output); the translator previously ignored commandCompleted and output entirely. Now emits tool_call_update with aggregated output and terminal status (completed/failed) once done.

Also fixes a Zed rendering issue: the in_progress update no longer includes a placeholder content string when there is no output yet — Zed renders the first content it sees and ignores subsequent updates, so the placeholder was permanently shown.

marton78 and others added 19 commits May 30, 2026 15:06
…ctor

Tool-call label changed from "Tool:" to "Tool Call", command and
browser_action_launch ask labels changed to "Use tool?", and browser_action
moved to the verbose-only fallback path. Update the four affected
assertions in display.test.ts to match the current output.
ink v7 causes TerminalInfoProvider to emit �[16t (terminal pixel-size
query) during tests, corrupting stdout and breaking assertions. Add
vi.mock("ink-picture", ...) to App, App.resize-initial-prompt, ChatView,
and QuitCommand tests. Also rewrite QuitCommand's stdin.write-based
keyboard simulation to use a useChatInputHandler wrapper mock, which the
ink v7 event model requires.
…mock

ink v7 no longer delivers stdin events to useInput handlers via
ink-testing-library, so all keyboard-interaction tests in
SkillsPanelContent.test.tsx were failing. Replace stdin.write() calls with
a vi.mock("ink") that captures the registered useInput handler into a ref,
then invoke it directly via a pressKey() helper.
…pt text changes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…arameters

Previously only added in strict mode, causing the OpenAI no-params tool test
to fail. Also switches tools snapshot comparison to deep object equality so
JSON property ordering no longer causes spurious failures.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sage output

openRouterDefaultModelInfo now has non-zero pricing, so stub models with
zero prices and assert totalCost:0 explicitly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- debug_syntax: add missing messageState stub (syncCache reads conversation history)
- RenameSymbol: resolve displayPath to absolute before writing in saveChanges mock,
  and add showReview/hideReview stubs required by the non-auto-approve path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ialMessageIfExistsWithType

The wrapper was always passing the optional third argument explicitly, even
when undefined. sinon's calledWithExactly treats f("a","b",undefined) and
f("a","b") as different calls, causing SubagentToolHandler partial-streaming
tests to fail.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds cli/scripts/acp-probe.mts, a standalone ACP client that drives the
--acp server without an editor, enabling reproducible end-to-end tests of
the full ACP session lifecycle.

The probe speaks JSON-RPC 2.0 over stdin/stdout, exercising the same
transport path as a real ACP client (e.g. Zed).

Subcommands:
- prompt <text>: session/create → session/prompt, streams all session
  updates to stdout and exits once a stopReason is received.
- load-prompt <sessionId> <text>: loadSession → session/prompt,
  exercising the session continuation path (resumeTaskFromHistory).

Flags:
- --pre-set <key=value>: apply session-config entries (e.g. mode,
  apiProvider) before the first prompt, so tests can exercise different
  session modes without interactive setup.
- --approve-delay-ms <ms>: insert a configurable delay before
  auto-approving permission requests, useful for testing long-running
  permission waits.
- DIRAC_ACP_DEBUG=1: emit raw JSON-RPC messages to stderr for tracing.
…rors

Extracts a new cli/src/initCoreServices.ts helper called by both the ACP
path and the interactive CLI path, ensuring ErrorService, workspace paths,
and core service registrations are initialised consistently regardless of
which mode is active.

Also surfaces initialisation I/O failures (missing API key, bad config
path, workspace resolution errors, etc.) as structured JSON-RPC error
responses instead of crashing the process or hanging on stdin. Clients
receive an error object with a descriptive message rather than a silent EOF.

Adds DiracAgent.review.test.ts with test coverage for the init-failure path.
…l, retry cleanup

Several fixes to the prompt-turn state machine in DiracAgent:

1. Terminate every error path: previously, certain runPromise rejections
   and unhandled handler errors left the prompt promise permanently
   unresolved, causing the client to hang. A universal catch handler plus
   handleUnhandledHandlerError() now resolves the pending prompt with a
   structured failure on any un-caught exception. Also exposes
   task.runError from src/core/task/index.ts and
   src/core/controller/index.ts so the agent can distinguish provider
   errors from task errors.

2. Render errors as a failed tool_call: instead of emitting error text as
   an agent_message_chunk (which looks like model output in the UI),
   failed turns now emit a tool_call("error") + tool_call_update(status:
   "failed") pair. Clients can distinguish structured failures from real
   model output.

3. session/cancel responds with stopReason "cancelled": the cancel()
   handler now resolves the pending prompt with { stopReason: "cancelled" }
   as required by the ACP spec, rather than leaving the turn unresolved.

4. Clean up retry/terminal-failure state across turns: retry and
   terminal-failure tracking variables are reset at the start of each new
   message and turn, preventing stale state from a previous turn bleeding
   into the next. The relevant tracking fields are added to public-types.ts
   and the message-handling logic in messageTranslator.ts.
…errides

Exposes a session/setMode RPC that clients use to switch between Plan,
Act, Auto, and YOLO modes on a per-session basis.

- DiracAgent handles a new "setMode" action in the session request handler,
  mapping it through setSessionOverride() so the mode is stored as a
  per-session override rather than a global state mutation.

- Auto-approve and YOLO flag overrides are now scoped per session via a
  SessionOverrides map in StateManager. Previously these were written to
  global state, meaning a YOLO session leaked its approvals to any
  subsequent session on the same process. Each session now carries its own
  overrides; cancelling or ending a session clears them.
…reinit

Two correctness fixes in DiracAgent:

1. Atomic cancel flag claim: cancel() now claims pending.resolved.value =
   true synchronously before awaiting cancelTask(), ensuring no concurrent
   resolution path can steal the slot during the async gap. The handler
   then calls pending.resolve({ stopReason: "cancelled" }) after the await,
   so the response is sent once the task is truly stopped. Satisfies the
   invariant: claim-then-act, never act-then-claim.

2. Listener leak on cancel→reinit: the diracMessagesChanged listener
   previously closed over controller.task, but cancelTask() can replace
   controller.task with a freshly initialised task. The stale reference
   caused the listener to fire against the old task and never subscribe to
   the new one. The fix captures the task reference once at listener
   registration time.
ACPDiffViewProvider.saveChanges() was not being called when the user
approved a file-edit permission request in the ACP flow. AcpAgent now
invokes saveChanges() on the provider after the permission is granted, so
edits are actually written to disk instead of being silently discarded.
Also removes a stale event-listener registration that was no longer needed.
…y, user messages, timing fix

Four related fixes to make loadSession + prompt work correctly after a
process restart:

1. sessionId as taskId: after a restart the agent must locate the task by
   ID. Previously taskId was derived from a controller-internal counter,
   causing a mismatch on reload. Now the ACP sessionId IS the taskId,
   routed through a new acp-session-tasks.ts helper and stored in
   state-keys. Dead code in backfill.ts / utils.ts that worked around the
   old mismatch is removed.

2. Replay session history on loadSession: AcpAgent.loadSession now calls
   replayHistory() in DiracAgent to re-emit all prior tool_call /
   agent_message_chunk events, so a client that reconnects after a restart
   can reconstruct the full conversation state.

3. Restore user messages in history replay: the replay loop previously
   skipped "user" role messages, causing gaps in clients that display the
   full thread. All message types are now emitted.

4. Timing fix for continuation: the previous implementation called
   handleWebviewAskResponse immediately after reinitExistingTaskFromId,
   but resumeTaskFromHistory() runs asynchronously — TaskMessenger.ask()
   clears taskState.askResponse at its start, wiping the pre-set response
   before pWaitFor sees it and causing the task to hang. The fix subscribes
   to task.messageStateHandler's diracMessagesChanged event and delivers
   the user's prompt only once ask("resume_task" | "resume_completed_task")
   actually fires.
…s, dedup paths

Three display fixes so tool results render usefully in clients like Zed:

1. Format read-tool content as hash bullets: instead of passing full file
   source verbatim into the content field, read tools (readFile,
   readLineRange, getFunction, getFileSkeleton, newFileCreated) now produce
   "* path: [File Hash: xxx]" bullets. Full source is still available in
   rawOutput for programmatic use.

2. Format list-tool content as markdown bullets: directory listings from
   listFilesTopLevel, listFilesRecursive, and listCodeDefinitionNames are
   converted to "* item" bullet lists, stripping the header/count lines.

3. Deduplicate file paths in tool call titles: multi-file operations (e.g.
   readFile with paths=[a, b, a]) previously showed duplicate paths in the
   title. Paths are now deduplicated while preserving insertion order.

Adds test coverage for all three cases.
…ecute_command leakage, duplicate followup

Four related streaming/deduplication fixes:

1. Prevent doubled completion_result text: completion_result content was
   emitted twice — once as an agent_message_chunk and again as a
   tool_call_update content field. Added a guard that skips re-emitting
   text already surfaced as a completion_result.

2. Fix streaming text duplication from delta tracker race: two concurrent
   delta-tracker flushes each emitted the full accumulated text, producing
   doubled output on the client. The tracker is now reset correctly after
   each flush.

3. Stop execute_command streaming from leaking as raw text: the command
   output branch in messageTranslator.ts was inadvertently falling through
   to the generic streaming path, showing raw command text as plain agent
   text in the client. The command-output branch now handles its own
   content and explicitly skips the generic path.

4. Fix duplicate followup/plan_mode_respond text in streaming turns:
   followup and plan_mode_respond messages appeared twice in streaming turns
   (once streamed, once as a final agent_message_chunk). DiracAgent now
   tracks whether text was already streamed and skips the terminal re-emit.
… tool attribution

Two bugs in the command-execution permission path:

1. Permission storm from variable shadowing: a let/const shadowing bug in
   the command handler caused the approval callback to close over the outer
   `command` binding instead of the inner loop variable, so approving any
   single command triggered permission requests for all pending commands
   simultaneously. Fixed by renaming the inner binding. Also refactors the
   ask:command handling in messageTranslator.ts to correctly distinguish
   the approval, preview, executing, and completed states.

2. Wrong tool call for permission on multi-command turns: in turns with
   multiple tool calls the permission request was always attributed to the
   first toolCallId, not the one actually requesting permission. The handler
   now tracks and surfaces the correct toolCallId per command. Adds test
   coverage in acp/index.ts and messageTranslator.test.ts.
execute_command results were invisible in the ACP stream: the tool_call
stayed permanently "pending" with no output and no terminal status.

Root cause: execute_command does not emit a say:command_output message —
it mutates the ask:command message in place, accumulating output in
multiCommandState.commands[].output and flipping commandCompleted
false→true. The ask:command translator ignored those fields entirely.

Fix (messageTranslator.ts): once commandCompleted is defined, emit
tool_call_update(s) carrying aggregated output and a terminal status
(in_progress → completed/failed). New helpers formatMultiCommandOutput
and buildCommandExecutionUpdates produce the content path plus
terminal_output/terminal_exit metadata for terminal-capable clients.
A pendingToolCalls status guard prevents a late update from regressing a
completed command back to in_progress.

Also fixes a Zed rendering issue: the initial in_progress update no longer
includes a placeholder content string for commands with no output yet.
Only include content when output is non-empty; Zed renders the first
content it sees and ignores subsequent updates, so the placeholder was
permanently shown instead of the actual output.

Adds 5 unit tests for the command-output path in messageTranslator.test.ts.
@marton78

Copy link
Copy Markdown
Contributor Author

@pleibers I spent the few days testing ACP in Zed, this branch makes the ACP connection pretty much functional!

@pleibers

pleibers commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Amazing!
Thank you

@pleibers pleibers mentioned this pull request Jun 1, 2026
@dirac-run

Copy link
Copy Markdown
Owner

tried to merge this but it conflicts with a just PR #114

also, a lot of this will change fundamentally after the modular tooling changes are pushed today. I will pin this for now since the change is quite large. is there any way to redo some of those after pulling the latest changes? if not, i will do another pass in the new master and try to absorb these later

@pleibers

pleibers commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

I think this PR does it better than what I did, since it makes the ACP fully functional and not just polish what was there. So if possible prefer this over #114.

@dirac-run

Copy link
Copy Markdown
Owner

thanks! okay so I just finished with the merge conflicts from other PRs. I think what i will do is, just update the master first and then separately work on absorbing this PR. it is valuable i agree

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants