fix(ai): recover from invalid tool-call input instead of aborting the agent stream #2699
… agent stream
When a model emits a tool call whose arguments fail `inputSchema` validation (and no `experimental_repairToolCall` fixes it), `executeTool` now returns the validation error to the model as an `error-text` tool result, the same way tool execution errors are already handled, instead of throwing and aborting the whole agent stream. The recovery path also emits an `ai.toolCall` span recording the error so the failure stays observable in traces.
In-repo copy of #2192 by @boomyao, opened to run the full CI suite.
Co-Authored-By: yao <zhangyaoruo@outlook.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
🦋 Changeset detected
Latest commit: 5fe17a4
The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
🧪 E2E Test Results
✅ All tests passed
Details by Category:
- ✅ ▲ Vercel Production
- ✅ 💻 Local Development
- ✅ 📦 Local Production
- ✅ 🐘 Local Postgres
- ✅ 🪟 Windows
- ✅ 📋 Other
📊 Benchmark Results
(Result tables not captured. Each benchmark was reported for 💻 Local Development and ▲ Production (Vercel), with observability links for Nitro, Express, and Next.js (Turbopack).)
- workflow with no steps, with 1 step, and with 10 / 25 / 50 sequential steps
- Promise.all with 10 / 25 / 50 concurrent steps
- Promise.race with 10 / 25 / 50 concurrent steps
- workflow with 10 / 25 / 50 sequential data payload steps (10KB)
- workflow with 10 / 25 / 50 concurrent data payload steps (10KB)
- Stream Benchmarks (includes TTFB metrics): workflow with stream; stream pipeline with 5 transform steps (1MB); 10 parallel streams (1MB each); fan-out fan-in 10 streams (1MB each)
Summary: Fastest Framework by World and Fastest World by Framework (winner determined by most benchmark wins).
karthikscale3
left a comment
Approving. What I checked:
- Fix correctness: `executeTool` now returns an `error-text` `LanguageModelV3ToolResultPart` for unparseable/invalid input, identical in shape to the existing `execute()`-error catch below it.
- Bug mechanism: both call sites use `Promise.all(toolCalls.map(executeTool))`; the old `throw` rejected the whole batch → outer catch → `onError` → aborted the durable run. Returning instead lets the loop continue and feed the error back via `iterator.next(toolResults)`, bounded by `maxSteps` (no infinite-retry risk).
- Still-throwing paths preserved intentionally: "tool not found" / "no execute function" still throw and abort (unrecoverable, not model mistakes).
- Helpers: `recordSpan` accepts a sync `fn` and passes `undefined` span when telemetry is off (guarded); `getErrorMessage` handles any thrown shape.
- Tests: recovery test asserts `execute` runs exactly once with the corrected input, proving productive self-correction, not just error feedback.
- CI: 106 pass / 0 fail, including the full deploy-backed E2E matrix + unit tests (ubuntu & windows).
Non-blocking notes below.
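The batch-rejection mechanism described above can be illustrated with a minimal, self-contained sketch. The types and function names here (`ToolCall`, `ToolResult`, `executeToolThrowing`, `executeToolRecovering`, `runBatch`) are hypothetical simplifications for illustration, not the actual `@workflow/ai` definitions, and `JSON.parse` stands in for `inputSchema` validation.

```typescript
// Hypothetical, simplified shapes; not the real @workflow/ai types.
type ToolCall = { toolCallId: string; toolName: string; input: string };
type ToolResult = {
  toolCallId: string;
  output: { type: "json"; value: unknown } | { type: "error-text"; value: string };
};

// Sketch of the old behavior: invalid input throws out of the function.
async function executeToolThrowing(call: ToolCall): Promise<ToolResult> {
  const parsed: unknown = JSON.parse(call.input); // throws on malformed JSON
  return { toolCallId: call.toolCallId, output: { type: "json", value: parsed } };
}

// Sketch of the new behavior: the failure is returned as an error-text result.
async function executeToolRecovering(call: ToolCall): Promise<ToolResult> {
  try {
    return await executeToolThrowing(call);
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { toolCallId: call.toolCallId, output: { type: "error-text", value: message } };
  }
}

const toolCalls: ToolCall[] = [
  { toolCallId: "a", toolName: "getWeather", input: '{"city":"Oslo"}' },
  { toolCallId: "b", toolName: "getWeather", input: '{"city":' }, // malformed on purpose
];

// One rejected promise rejects the whole Promise.all, so a single bad call
// discards the results of every other call in the batch.
async function runBatch(
  exec: (call: ToolCall) => Promise<ToolResult>,
): Promise<ToolResult[] | "aborted"> {
  try {
    return await Promise.all(toolCalls.map((call) => exec(call)));
  } catch {
    return "aborted";
  }
}
```

With the throwing variant, `runBatch` resolves to `"aborted"` even though the first call was valid; with the recovering variant it resolves to two results, the second carrying the `error-text` output.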
// matching the tool-execution-error path below. Emit an `ai.toolCall` span
// recording the failure so the recovered error stays observable in traces.
const parseErrorMessage = getErrorMessage(parseError);
return recordSpan({
Verified this returns the same `error-text` tool-result shape as the `execute()`-error catch below, so `Promise.all` no longer rejects and the run recovers within `maxSteps` instead of aborting.
Two non-blocking notes:
- Telemetry asymmetry: this path sets `span` status=ERROR + `ai.toolCall.error`, but the existing `execute()`-error path returns error-text without marking its span errored. New path is strictly more observable; worth unifying eventually.
- Behavior change (intentional, matches `streamText`): callers that relied on the throw / `onError` for invalid tool input no longer get that signal; the run now succeeds carrying an error-text result.
Backport PR opened against |
In-repo copy of #2192 by @boomyao, opened so the full CI suite (including deploy-backed E2E lanes that fork PRs can't run) executes against the change. Credit for the original fix goes to @boomyao.
Problem
In `DurableAgent`, a tool call whose arguments fail `inputSchema` validation (and that `experimental_repairToolCall` can't fix) caused `executeTool` to `throw`, which aborts `agent.stream()` and fails the entire durable run. Meanwhile a tool whose `execute()` throws is already caught and fed back to the model as an `error-text` result so the agent recovers. A model occasionally emitting a slightly-malformed tool call is recoverable, so the two paths were inconsistent.
Fix
`executeTool` now returns the validation error as an `error-text` tool result instead of throwing (identical to how `execute()` errors are handled), so the agent sees the error as a tool result and can correct its arguments and retry within `maxSteps`. This matches AI SDK's `streamText` tool-error behavior. The recovery path also emits an `ai.toolCall` span recording the error so the failure stays observable in traces.
Tests
… `error-text` result instead of rejecting the stream.
Changeset included (`@workflow/ai` patch).
Closes #2192
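The recover-and-retry behavior described under Fix can be sketched roughly as follows. This is a toy model under stated assumptions, not the `DurableAgent` implementation: `safeValidate`, `fakeModel`, `runAgent`, and the hand-written `{ city: string }` check are all hypothetical stand-ins for the schema validation, the language model, and the agent loop.

```typescript
// All names below are illustrative assumptions, not @workflow/ai APIs.
type Output = { type: "text"; value: string } | { type: "error-text"; value: string };
type Validation = { ok: true; city: string } | { ok: false; message: string };

let executeCalls = 0; // counts how often the tool body actually runs

// Stand-in for inputSchema validation: expects { city: string }.
function safeValidate(input: unknown): Validation {
  if (typeof input === "object" && input !== null) {
    const city = (input as { city?: unknown }).city;
    if (typeof city === "string") return { ok: true, city };
  }
  return { ok: false, message: "Invalid input: expected { city: string }" };
}

// Invalid input is returned to the caller as an error-text result, not thrown.
function executeTool(input: unknown): Output {
  const parsed = safeValidate(input);
  if (!parsed.ok) return { type: "error-text", value: parsed.message };
  executeCalls += 1;
  return { type: "text", value: `Sunny in ${parsed.city}` };
}

// Stand-in model: sends a malformed call first, corrects it after an error.
function fakeModel(previous: Output | undefined): unknown {
  return previous?.type === "error-text" ? { city: "Oslo" } : { town: "Oslo" };
}

// The loop is bounded by maxSteps, so a model that never corrects itself
// cannot retry forever.
function runAgent(maxSteps: number): Output | undefined {
  let last: Output | undefined;
  for (let step = 0; step < maxSteps; step++) {
    last = executeTool(fakeModel(last));
    if (last.type !== "error-text") break;
  }
  return last;
}
```

In this sketch, `runAgent(1)` ends with the `error-text` result because the step budget runs out before the model can correct itself, while a larger budget lets the second step succeed with the tool body running exactly once.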
🤖 Generated with Claude Code