
fix(ai): recover from invalid tool-call input instead of aborting the agent stream#2192

Closed
boomyao wants to merge 4 commits into
vercel:mainfrom
boomyao:fix/durable-agent-recover-invalid-tool-input

Conversation

@boomyao boomyao commented Jun 1, 2026

Contributor

Problem

In DurableAgent, the two kinds of tool error are handled in opposite ways inside executeTool:

  • execute() throws → caught and converted to an error-text tool result fed back to the model, so the agent recovers and the stream continues. The code comment even says this "aligns with AI SDK's streamText behavior for individual tool failures."
  • Tool-call arguments fail inputSchema validation (and no experimental_repairToolCall fixes them) → throw, which propagates out of executeTool, aborts agent.stream(), and fails the entire durable workflow run.

A model occasionally emitting a slightly-malformed tool call (an empty array where .min(1) is required, a missing required field, a wrong type, truncated-then-JSON-repaired args) is a recoverable event — the model will usually fix it if told. But today it is fatal: one bad tool call kills a long-running task, with no chance for the agent to self-correct. The only hook on this path, experimental_repairToolCall, can't help here because its returned tool call must itself pass the schema — so it can fix malformed JSON syntax but cannot express "tell the model its arguments were invalid and let it regenerate."

This is inconsistent (the framework already recovers from the harder case — execute() throwing) and looks like an oversight rather than intent.

Reproduction

import { z } from 'zod';
import { DurableAgent } from '@workflow/ai/agent';
import { MockLanguageModelV3, convertArrayToReadableStream } from 'ai/test';

const toolCall = (toolName: string, input: string) =>
  convertArrayToReadableStream<any>([
    { type: 'stream-start', warnings: [] },
    { type: 'tool-call', toolCallId: 'c1', toolName, input },
    { type: 'finish', finishReason: 'tool-calls', usage: { inputTokens: 1, outputTokens: 1, totalTokens: 2 } },
  ]);
const stop = () =>
  convertArrayToReadableStream<any>([
    { type: 'stream-start', warnings: [] },
    { type: 'text-start', id: 't' }, { type: 'text-delta', id: 't', delta: 'ok' }, { type: 'text-end', id: 't' },
    { type: 'finish', finishReason: 'stop', usage: { inputTokens: 1, outputTokens: 1, totalTokens: 2 } },
  ]);

async function run(toolName: string, inputSchema: any, execute: any, input: string) {
  let n = 0;
  const model = new MockLanguageModelV3({ doStream: async () => (++n === 1 ? { stream: toolCall(toolName, input) } : { stream: stop() }) });
  const agent = new DurableAgent({ model: () => model, instructions: 'x', tools: { [toolName]: { description: 'd', inputSchema, execute } } });
  try {
    await agent.stream({ messages: [{ role: 'user', content: 'go' }], activeTools: [toolName], maxSteps: 5, writable: new WritableStream({ write() {} }), preventClose: true, sendFinish: false });
    return 'STREAM SURVIVED';
  } catch (e) { return `STREAM ABORTED: ${(e as Error).message}`; }
}

// A: strict schema, model sends invalid args (empty string violates .min(1))
console.log(await run('strict', z.object({ x: z.string().min(1) }), () => ({ ok: true }), '{"x":""}'));
// B: permissive schema, but execute() throws
console.log(await run('thrower', z.object({}), () => { throw new Error('boom'); }, '{}'));

Before this PR:

A (schema-invalid input) → STREAM ABORTED: Invalid input for tool "strict": [ ... too_small ... ]
B (execute() throws)     → STREAM SURVIVED

A should survive too.

Fix

executeTool already funnels both malformed JSON and the re-thrown "Invalid input for tool ..." schema-validation error through a single throw parseError at the end of the parse/validate block. This PR changes that one escape point to return the error as an error-text tool result, identical to how execute() errors are handled a few lines below, so the agent receives the error as a tool result and can correct its arguments and retry within maxSteps. experimental_repairToolCall still runs first; only the final give-up changes from throw to recover.

After this PR, both A and B print STREAM SURVIVED.
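
The change described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual DurableAgent code: the `ToolResult`, `SketchTool`, and `executeToolSketch` names are hypothetical, and a hand-written `validate` function stands in for the zod `inputSchema`. The only point it tries to show is that both failure kinds end in a returned error-text result rather than a throw.

```typescript
// Hypothetical shapes for illustration; not the real DurableAgent internals.
type ToolOutput =
  | { type: 'json'; value: unknown }
  | { type: 'error-text'; value: string };

interface ToolResult {
  toolCallId: string;
  toolName: string;
  output: ToolOutput;
}

interface SketchTool<T> {
  // Returns the validated input, or throws with a descriptive message.
  validate: (raw: unknown) => T;
  execute: (input: T) => unknown;
}

const errorMessage = (e: unknown): string =>
  e instanceof Error ? e.message : String(e);

async function executeToolSketch<T>(
  call: { toolCallId: string; toolName: string; input: string },
  tool: SketchTool<T>,
): Promise<ToolResult> {
  const base = { toolCallId: call.toolCallId, toolName: call.toolName };

  let input: T;
  try {
    // Covers both malformed JSON and schema-invalid arguments.
    input = tool.validate(JSON.parse(call.input));
  } catch (e) {
    // Former "give up" point: throwing here would abort the whole stream.
    const value = `Invalid input for tool "${call.toolName}": ${errorMessage(e)}`;
    return { ...base, output: { type: 'error-text', value } };
  }

  try {
    return { ...base, output: { type: 'json', value: await tool.execute(input) } };
  } catch (e) {
    // Execution errors were already fed back to the model this way.
    return { ...base, output: { type: 'error-text', value: errorMessage(e) } };
  }
}

// Stand-in for a tool with inputSchema z.object({ x: z.string().min(1) }).
const strictTool: SketchTool<{ x: string }> = {
  validate: (raw) => {
    const x = (raw as { x?: unknown } | null)?.x;
    if (typeof x !== 'string' || x.length < 1) {
      throw new Error('x must contain at least 1 character');
    }
    return { x };
  },
  execute: () => ({ ok: true }),
};
```

Calling `executeToolSketch({ toolCallId: 'c1', toolName: 'strict', input: '{"x":""}' }, strictTool)` resolves to an error-text result in this sketch, so a caller looping over model turns can pass the message back instead of failing.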

Notes

  • Behavior change: a tool call with invalid arguments that previously rejected the stream now feeds the validation error back to the model (bounded by maxSteps), consistent with execute() errors and AI SDK streamText. Happy to gate it behind an option (e.g. onInvalidToolInput: 'feedback' | 'throw', default 'feedback') if you'd prefer to preserve the throw for some callers — let me know.
  • Added a regression test mirroring the existing "tool execution error → error-text" test.
  • Added a changeset (@workflow/ai patch).

Verified locally: packages/ai typecheck clean, vitest run (47 tests) green, Biome clean on the changed files.

… stream

DurableAgent.executeTool threw when a tool call's arguments failed inputSchema
validation (and no experimental_repairToolCall fixed it), aborting the whole
agent stream — which fails the entire durable workflow run. Tool *execution*
errors are already recovered (returned to the model as an error-text tool
result so the agent can self-correct); this makes input parse/validation
failures consistent: return the error as an error-text tool result instead of
throwing, so a single occasionally-malformed model tool-call can no longer kill
a long-running task. Aligns with AI SDK streamText behavior.

Signed-off-by: yao <zhangyaoruo@outlook.com>
@boomyao boomyao requested a review from a team as a code owner June 1, 2026 05:19
changeset-bot Bot commented Jun 1, 2026


🦋 Changeset detected

Latest commit: e2266dd

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name          Type
@workflow/ai  Patch


vercel Bot commented Jun 1, 2026

Contributor

@boomyao is attempting to deploy a commit to the Vercel Labs Team on Vercel.

A member of the Team first needs to authorize it.

@VaguelySerious VaguelySerious left a comment

Member


AI review: no blocking issues

Comment thread packages/ai/src/agent/durable-agent.ts Outdated
// of aborting the entire stream. This aligns with AI SDK's streamText behavior
// for tool failures. Reaches here both for malformed JSON and for the
// re-thrown "Invalid input for tool ..." schema-validation error above.
return {

Member


AI Review: Note

Recovering the run is the right call. One side effect worth being intentional about: because nothing throws out of executeTool for invalid input anymore, this path no longer reaches the outer catch that invokes onError, and the return happens before the recordSpan block below — so an invalid tool call now produces no ai.toolCall span and no onError callback. That's consistent with how execute() errors and AI SDK's tool-error are handled (neither surfaces via onError), but it's a real change from today, and the execute() path at least still emits a span. A caller relying on onError to observe malformed model output will go silent. Suggest emitting an ai.toolCall span here with the error recorded, so the recovered failure stays visible in traces.

Related follow-up (out of scope here): a hallucinated/unknown tool name still throws and aborts the whole stream higher up in this function — the same class of recoverable model mistake — so the two paths aren't fully consistent yet.
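
The observability suggestion above could look something like the sketch below. It is only an illustration under stated assumptions: the project's real recordSpan helper and its signature are not shown in this thread, so an in-memory `SketchSpan` list stands in for the tracer, and the attribute keys and function names are hypothetical.

```typescript
// Hypothetical in-memory span model standing in for the real tracing helper.
interface SketchSpan {
  name: string;
  attributes: Record<string, string>;
  status: 'OK' | 'ERROR';
  errorMessage?: string;
}

const recordedSpans: SketchSpan[] = [];

function recordInvalidInputSpan(toolName: string, toolCallId: string, error: Error): void {
  recordedSpans.push({
    name: 'ai.toolCall',
    // Attribute keys are assumptions made for this sketch.
    attributes: { 'tool.name': toolName, 'tool.callId': toolCallId },
    status: 'ERROR',
    errorMessage: error.message,
  });
}

function recoverFromInvalidInput(toolName: string, toolCallId: string, error: Error) {
  // Emit the span first so the recovered failure is visible in traces.
  recordInvalidInputSpan(toolName, toolCallId, error);
  // Then return the error as a tool result instead of throwing, which
  // deliberately bypasses any onError callback.
  return {
    toolCallId,
    toolName,
    output: { type: 'error-text' as const, value: error.message },
  };
}
```

The ordering is the design choice worth noting: recording before returning means the trace entry exists even though nothing propagates to an outer catch.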

});
});

it('should convert invalid tool input to error-text result instead of failing stream', async () => {

Member


AI Review: Nit

This verifies the validation error is fed back as error-text, but the mocked second next() returns { done: true }, so it doesn't prove the agent productively recovers. Consider having the second turn return a corrected tool call and asserting the tool's execute actually runs with the fixed input — that exercises the full "self-correct and retry within maxSteps" claim, not just the error-feedback half.
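
A rough sketch of the scenario suggested above, with scripted model turns instead of the real mocks. Everything here is hypothetical (`Turn`, `runScriptedAgent`, the hand-written validator standing in for `z.object({ x: z.string().min(1) })`); it only illustrates the shape of the assertion: one feedback message for the invalid call, then exactly one execution with the corrected input.

```typescript
// Each scripted turn is either a tool call with raw JSON input or a stop.
type Turn = { toolInput: string } | { done: true };

function runScriptedAgent(turns: Turn[], maxSteps: number) {
  const executedWith: Array<{ x: string }> = [];
  const feedback: string[] = [];

  // Stand-in validator: x must be a non-empty string.
  const validate = (raw: unknown): { x: string } => {
    const x = (raw as { x?: unknown } | null)?.x;
    if (typeof x !== 'string' || x.length < 1) {
      throw new Error('x must contain at least 1 character');
    }
    return { x };
  };

  // The loop is bounded by maxSteps, like the agent's retry budget.
  for (const turn of turns.slice(0, maxSteps)) {
    if ('done' in turn) break;
    try {
      // Valid input: the tool "executes" (here, we just record the input).
      executedWith.push(validate(JSON.parse(turn.toolInput)));
    } catch (e) {
      // Invalid input: feed the error back instead of aborting the loop.
      feedback.push(`Invalid input for tool "strict": ${(e as Error).message}`);
    }
  }
  return { executedWith, feedback };
}

// First turn is invalid, second turn is the model's corrected call.
const result = runScriptedAgent(
  [{ toolInput: '{"x":""}' }, { toolInput: '{"x":"fixed"}' }, { done: true }],
  5,
);
```

In this sketch, `result.feedback` holds one entry and `result.executedWith` holds the single corrected input, which is the pair of facts the suggested test would assert.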

… productive recovery

Address review feedback on recovering from invalid tool-call input:

- The invalid-input recovery path no longer threw, so it produced no
  ai.toolCall span (the execute()-error path still does). Emit a span here
  that records the validation error and ERROR status, so the recovered
  failure stays observable in traces even though it is intentionally not
  surfaced via onError (matching tool-execution errors and AI SDK).
- Add a test that drives invalid -> corrected tool call and asserts the
  tool actually executes once with the fixed input, proving the agent
  productively self-corrects rather than only feeding the error back.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@VaguelySerious VaguelySerious requested a review from ijjk as a code owner June 29, 2026 19:14
@VaguelySerious

Member

(AI) Pushed a follow-up commit addressing two of the review points directly:

  • Observability (Note): the invalid-input recovery path now emits an ai.toolCall span recording the validation error + ERROR status, so the recovered failure stays visible in traces (it remains intentionally off the onError path, matching tool-execution errors / AI SDK).
  • Test (Nit): added a test that drives invalid → corrected tool call and asserts the tool actually executes once with the fixed input, proving productive self-correction rather than just error feedback.

The unknown-tool-name case is left as a separate follow-up, as noted inline. @workflow/ai build + tests green locally.

VaguelySerious and others added 2 commits June 29, 2026 12:30
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@VaguelySerious

Member

(AI) Opened #2699 as an in-repo copy of this PR so the full CI suite (including the deploy-backed E2E lanes that fork PRs can't run) executes against the change. Credit for the fix remains with @boomyao — the copy is co-authored accordingly. #2699 is set to close this PR when it merges.

@VaguelySerious

Member

@boomyao Can't merge your PR due to signed-commit requirements. I created a copy in #2699, will merge that, and will make sure you end up in the git log.

VaguelySerious added a commit that referenced this pull request Jun 30, 2026
… agent stream (#2699)

In-repo copy of #2192 by @boomyao, opened to run the full CI suite.

Co-authored-by: yao <zhangyaoruo@outlook.com>
github-actions Bot added a commit that referenced this pull request Jun 30, 2026
… agent stream (#2699)

In-repo copy of #2192 by @boomyao, opened to run the full CI suite.

Co-authored-by: yao <zhangyaoruo@outlook.com>
Signed-off-by: Peter Wielander <mittgfu@gmail.com>
VaguelySerious added a commit that referenced this pull request Jul 1, 2026
… agent stream (#2699) (#2703)

#2192 by @boomyao

Co-authored-by: yao <zhangyaoruo@outlook.com>
Co-authored-by: Peter Wielander <mittgfu@gmail.com>

2 participants