fix: Gemini ACP runs recorded as false 15m timeouts by seancdavis · Pull Request #43 · netlify/axis

seancdavis · 2026-06-22T20:29:55Z

Problem

Some Gemini eval runs were recorded as "Scenario time limit reached (15m)" failures (exit 1, no score, 0 tokens) even though the agent completed the task — the transcripts end with a terminal prompt_result/end_turn and real token usage.

Root cause, all in the shared ACP adapter:

It ignored input.timeoutMs (the scenario limit) and used its own hardcoded 10‑minute default, so runs were killed at ~10m. The two affected runs both died at ~600s.
It never handled the runner's abort signal, and only killed the CLI process — leaving subprocesses it spawned (e.g. node --test) lingering and holding the job open.
A clean turn that produced only tool calls (no final assistant text) was marked failed: cleanup SIGTERMs the lingering CLI → non‑zero exit → a fabricated "exited with non-zero code" error.

Fix

Honour input.timeoutMs (falling back to the adapter default) so runs get their full configured budget instead of a hidden 10m cap.
Tear down the process tree: spawn the CLI detached and signal the whole process group on completion/timeout/abort, so lingering subprocesses are reaped and can't hold the job open.
Wire up the abort signal (SIGTERM → SIGKILL), matching the base adapter, so token/run limits and Ctrl‑C actually stop the agent.
Don't fail a clean completion for a cleanup exit code: a non‑zero/signal exit after a terminal prompt_result is our own teardown, not a failure — even when the agent produced no final text. (Addresses the CodeRabbit review finding.)
Treat a hard cutoff as a failure: a time limit, token limit, or Ctrl‑C fails the run even if a terminal result lands in the same instant — but the token usage observed before the cut is still attributed.

Tests

Adapter unit tests cover: input.timeoutMs is honoured over the adapter default; an abort tears down the process group; timeout/abort cutoffs fail with token usage attributed; and a clean completion with no final text is a success despite a non‑zero exit.

Builds on #44 (isFailedRun), which scores a completed run as passed even when its process exited via signal.

netlify · 2026-06-22T20:30:03Z

✅ Deploy Preview for axis-docs ready!

Name	Link
🔨 Latest commit	`68f837e`
🔍 Latest deploy log	https://app.netlify.com/projects/axis-docs/deploys/6a3acc997ef33c0008dfd958
😎 Deploy Preview	https://deploy-preview-43--axis-docs.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.
🤖 Make changes	Run an agent on this branch

To edit notification comments on pull requests, go to your Netlify project configuration.

netlify · 2026-06-22T20:30:10Z

✅ Deploy Preview for axisproject ready!

Name	Link
🔨 Latest commit	`68f837e`
🔍 Latest deploy log	https://app.netlify.com/projects/axisproject/deploys/6a3acc99638c470008fee49e
😎 Deploy Preview	https://deploy-preview-43--axisproject.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.
🤖 Make changes	Run an agent on this branch

To edit notification comments on pull requests, go to your Netlify project configuration.

coderabbitai · 2026-06-22T20:30:17Z

📝 Walkthrough

Walkthrough

createAcpBasedAdapter's run() now spawns ACP CLI child processes with detached: true to form a process group. A new killProcessTree helper signals the entire group via process.kill(-pid, signal) with a direct-child fallback, and the Ctrl-C cleanup path is updated to use it.

Timeout handling switches from a fixed adapter default to input.timeoutMs (with adapter default fallback). A new input.signal abort path is added, tracking abortedBySignal/abortReason and scheduling SIGTERM→SIGKILL. A completedCleanly flag (set when stopReason !== "cancelled") prevents timeout or abort teardown from overriding a result that already completed successfully. Exit codes and errors are normalized when the child was killed after clean completion.

Tests add a process.kill spy and four race-condition scenarios covering completion-before-timeout, completion-before-abort, timeout without completion, and abort-before-completion process-group signaling.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main fix: resolving false 15-minute timeout recordings in Gemini ACP runs that had actually completed.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The PR description clearly explains the problem, root causes, and the implemented fix with specific details about timeouts, process management, and result handling.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/gemini-acp-completion-timeout

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

🧹 Nitpick comments (1)

test/unit/adapters/acp-adapter.test.ts (1)
924-1071: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider moving fake timer cleanup to afterEach for robustness.

The tests correctly call vi.useRealTimers() at the end, but if a test fails midway, fake timers could leak. Since these tests are grouped together and each starts with vi.useFakeTimers(), this is unlikely to cause issues in practice.
♻️ Suggested approach

Either add cleanup to the existing afterEach:
  afterEach(() => {
    processKillSpy.mockRestore();
+   vi.useRealTimers();
  });
Or wrap individual tests with try/finally:
it("...", async () => {
  vi.useFakeTimers();
  try {
    // test body
  } finally {
    vi.useRealTimers();
  }
});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/unit/adapters/acp-adapter.test.ts` around lines 924 - 1071, Move the
vi.useRealTimers() cleanup call from the end of each test method to a
centralized afterEach hook to ensure fake timers are properly reset even if a
test fails before reaching the cleanup code. The affected tests are the ones
starting with vi.useFakeTimers() such as the tests for timeout behavior, abort
signal handling, and the timeout failure scenarios. This prevents fake timer
state from leaking between tests when exceptions occur during test execution.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/unit/adapters/acp-adapter.test.ts`:
- Around line 924-1071: Move the vi.useRealTimers() cleanup call from the end of
each test method to a centralized afterEach hook to ensure fake timers are
properly reset even if a test fails before reaching the cleanup code. The
affected tests are the ones starting with vi.useFakeTimers() such as the tests
for timeout behavior, abort signal handling, and the timeout failure scenarios.
This prevents fake timer state from leaking between tests when exceptions occur
during test execution.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1f198963-f31c-4100-90ef-d29fad26846e

📥 Commits

Reviewing files that changed from the base of the PR and between 51539fe and 695d46c.

⛔ Files ignored due to path filters (1)

package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (2)

src/adapters/base/acp-adapter.ts
test/unit/adapters/acp-adapter.test.ts

🔗 Linked repositories identified

CodeRabbit considers these linked repositories for cross-repo context during reviews:

netlify/blueprints (manual)

The ACP adapter used only its own 10m DEFAULT_TIMEOUT_MS and ignored `input.timeoutMs`, so Gemini scenarios with the default 15m limit were actually killed at 10m. It also ignored `input.signal` entirely, so per-scenario/overall token limits and Ctrl-C never tore down the child. Use `input.timeoutMs ?? default` for the timeout, wire up the abort signal (SIGTERM → SIGKILL) the way the base adapter does, and attribute any observed token usage on the timeout/abort paths. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

When the gemini CLI (or a subprocess it spawned, e.g. `node --test`) lingered after the agent finished, the run could be killed by the timeout/abort right after `connection.prompt()` returned a terminal `prompt_result`/`end_turn` — and the timeout path discarded the result and token usage, turning a success into a "timeout" failure with 0 tokens and no score. Track whether the prompt reached a genuine terminal result (any stop reason other than `cancelled`). If it did, record the run as completed with its real result and token usage even when the timeout or abort fired, and normalise the child's kill-induced non-zero exit code. Runs that never reach a terminal result still fail as timeouts/aborts. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ACP agents shell out to subprocesses (e.g. `node --test`) that can outlive the CLI. Killing only the CLI left those grandchildren running, leaking resources and — when they held the child's stdio pipes open — delaying the `close` event that ends the job. Spawn the CLI detached so its pid doubles as a process-group id, and add killProcessTree() to signal the whole group (with a direct child.kill fallback) at every teardown site: timeout, abort, Ctrl-C cleanup, and the normal post-prompt shutdown. The acp-adapter unit tests now stub process.kill so the group signal can't reach the real OS during the run. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Adds regression tests for the false-timeout bug: a run whose terminal prompt_result lands after the timeout or abort fires is recorded as a completed success (real result, real token usage, exit 0); a run that never reaches a terminal result still fails as a timeout; the per-run input.timeoutMs is honoured over the adapter default; and an abort before completion tears down the whole process group. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/adapters/base/acp-adapter.ts`:
- Around line 454-462: The `killedAfterCompletion` logic on line 456 does not
account for the SIGTERM sent by the finally block after clean completion,
causing clean completions with no assistant text to be incorrectly marked as
failed. Modify the error gate condition on line 459 (the check that includes
`exitCode !== 0 && !state.lastAssistantMessage`) to also gate on
`!completedCleanly` so that when the process completes cleanly, the non-zero
exit code from the finally block's SIGTERM is not treated as an error. This
ensures clean completions are not fabricating an error message just because of
the normal cleanup SIGTERM.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f76c1aad-d89c-4f44-bf46-a98352a7a6ca

📥 Commits

Reviewing files that changed from the base of the PR and between 695d46c and 98ffa1b.

📒 Files selected for processing (2)

src/adapters/base/acp-adapter.ts
test/unit/adapters/acp-adapter.test.ts

🔗 Linked repositories identified

CodeRabbit considers these linked repositories for cross-repo context during reviews:

netlify/blueprints (manual)

🚧 Files skipped from review as they are similar to previous changes (1)

test/unit/adapters/acp-adapter.test.ts

A run that reached a terminal result (end_turn) but produced only tool calls — no final assistant text — left lastAssistantMessage null. The cleanup path closes stdin and SIGTERMs a lingering CLI, so the child exits non-zero, and the exit-code error gate then manufactured "Agent process exited with non-zero code", marking a successful run as failed. Gate that error (and the exit-code normalisation) on completedCleanly rather than only on a timeout/abort kill, so any non-zero/signal exit after a clean turn is treated as our own cleanup, not a failure. Addresses CodeRabbit review finding on PR #43. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A scenario time limit, token limit, or Ctrl-C is a hard cap. Previously the adapter preserved a run that completed in the same instant the timeout/abort fired as a success — but the runner already re-stamped aborts as failures, so time-limit and token-limit races resolved differently (pass vs. fail). Drop the completion-race preservation from the timeout/abort paths so a cutoff always fails, regardless of a last-instant terminal result. Token usage observed before the cut is still attributed. The original false-timeout bug stays fixed: those runs finish well within the limit (honouring input.timeoutMs) and are reaped via the process tree, so no cutoff fires for them. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai Bot reviewed Jun 22, 2026

View reviewed changes

seancdavis and others added 5 commits June 23, 2026 13:43

test(gemini): harden ACP adapter fake timer cleanup

98ffa1b

seancdavis force-pushed the fix/gemini-acp-completion-timeout branch from 695d46c to 98ffa1b Compare June 23, 2026 17:45

coderabbitai Bot reviewed Jun 23, 2026

View reviewed changes

Comment thread src/adapters/base/acp-adapter.ts Outdated

seancdavis and others added 2 commits June 23, 2026 14:08

seancdavis merged commit 3eacdba into main Jun 23, 2026
9 checks passed

seancdavis deleted the fix/gemini-acp-completion-timeout branch June 23, 2026 19:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Gemini ACP runs recorded as false 15m timeouts#43

fix: Gemini ACP runs recorded as false 15m timeouts#43
seancdavis merged 7 commits into
mainfrom
fix/gemini-acp-completion-timeout

seancdavis commented Jun 22, 2026 •

edited

Loading

Uh oh!

netlify Bot commented Jun 22, 2026 •

edited

Loading

Uh oh!

netlify Bot commented Jun 22, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 22, 2026 •

edited

Loading

Walkthrough

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

seancdavis commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Fix

Tests

Uh oh!

netlify Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for axis-docs ready!

Uh oh!

netlify Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for axisproject ready!

Uh oh!

coderabbitai Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

seancdavis commented Jun 22, 2026 •

edited

Loading

netlify Bot commented Jun 22, 2026 •

edited

Loading

netlify Bot commented Jun 22, 2026 •

edited

Loading

coderabbitai Bot commented Jun 22, 2026 •

edited

Loading