
fix(ai): recover from invalid tool-call input instead of aborting the agent stream#2699

Merged
VaguelySerious merged 1 commit into main from peter/durable-agent-recover-invalid-tool-input on Jun 30, 2026

Conversation

@VaguelySerious

Member

In-repo copy of #2192 by @boomyao, opened so the full CI suite (including deploy-backed E2E lanes that fork PRs can't run) executes against the change. Credit for the original fix goes to @boomyao.

Problem

In DurableAgent, a tool call whose arguments failed inputSchema validation (and that experimental_repairToolCall could not fix) caused executeTool to throw, which aborted agent.stream() and failed the entire durable run. By contrast, a tool whose execute() throws was already caught and fed back to the model as an error-text result, so the agent could recover. An occasional slightly malformed tool call from the model is just as recoverable, so the two paths were inconsistent.

Fix

executeTool now returns the validation error as an error-text tool result instead of throwing, exactly as execute() errors are handled. The model therefore sees the error as a tool result and can correct its arguments and retry within maxSteps. This matches the AI SDK's streamText tool-error behavior. The recovery path also emits an ai.toolCall span recording the error so the failure stays observable in traces.
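
A minimal sketch of the control flow described above, under simplified assumptions: `ToolDef`, `ToolCall`, `ToolResult`, `errorMessage`, and this `executeTool` are hypothetical stand-ins, not the actual `@workflow/ai` source or the AI SDK's tool-result types, and schema validation is modeled as a plain `validate` function that throws. It shows the intended split: configuration errors still throw, while invalid input and execute() failures both come back as error-text results.

```typescript
// Hypothetical, simplified stand-ins for the AI SDK tool types.
type ToolOutput =
  | { type: "json"; value: unknown }
  | { type: "error-text"; value: string };

type ToolResult = { toolCallId: string; toolName: string; output: ToolOutput };

// The model's raw tool call; `input` is the unparsed JSON argument string.
type ToolCall = { toolCallId: string; toolName: string; input: string };

type ToolDef = {
  // Stand-in for inputSchema validation: returns parsed input or throws.
  validate: (raw: unknown) => unknown;
  execute?: (input: unknown) => Promise<unknown>;
};

function errorMessage(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}

async function executeTool(
  call: ToolCall,
  tools: Record<string, ToolDef>,
): Promise<ToolResult> {
  const tool = tools[call.toolName];
  // Configuration errors are not model mistakes: keep throwing.
  if (!tool) throw new Error(`Tool not found: ${call.toolName}`);
  const execute = tool.execute;
  if (!execute) throw new Error(`No execute function: ${call.toolName}`);

  const asError = (err: unknown): ToolResult => ({
    toolCallId: call.toolCallId,
    toolName: call.toolName,
    output: { type: "error-text", value: errorMessage(err) },
  });

  let input: unknown;
  try {
    // Unparseable JSON and schema failures both land in the catch below.
    input = tool.validate(JSON.parse(call.input));
  } catch (parseError) {
    return asError(parseError); // previously: thrown, aborting the run
  }

  try {
    const value = await execute(input);
    return { toolCallId: call.toolCallId, toolName: call.toolName, output: { type: "json", value } };
  } catch (executeError) {
    return asError(executeError); // unchanged: already fed back to the model
  }
}
```

Returning a value from both catch blocks keeps the two failure modes indistinguishable to the caller, which is the consistency the fix is after.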

Tests

  • Regression test: invalid tool input becomes an error-text result instead of rejecting the stream.
  • Recovery test: invalid → corrected tool call, asserting the tool executes once with the fixed input (productive self-correction, not just error feedback).
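
The recovery test's shape can be sketched without the real harness. This is an illustration under assumptions: the actual tests presumably drive a mock language model, whereas `weatherTool`, `runTool`, `scriptedModel`, and `runAgent` here are hypothetical. The scripted model first sends malformed arguments, sees the error-text result, then retries with corrected ones; the check is that execute() runs exactly once, with the fixed input.

```typescript
// Hypothetical stand-ins; not the repository's actual test harness.
type Call = { toolName: string; input: unknown };
type Result = { type: "json"; value: unknown } | { type: "error-text"; value: string };

// Records every input execute() actually receives.
const executeCalls: unknown[] = [];

const weatherTool = {
  validate(input: unknown): { city: string } {
    const city = (input as { city?: unknown } | null)?.city;
    if (typeof city !== "string") throw new Error("Expected { city: string }");
    return { city };
  },
  async execute(input: { city: string }): Promise<string> {
    executeCalls.push(input);
    return `Sunny in ${input.city}`;
  },
};

// Invalid input becomes an error-text result rather than a rejection.
async function runTool(call: Call): Promise<Result> {
  let input: { city: string };
  try {
    input = weatherTool.validate(call.input);
  } catch (err) {
    return { type: "error-text", value: err instanceof Error ? err.message : String(err) };
  }
  return { type: "json", value: await weatherTool.execute(input) };
}

// Scripted "model": malformed call first, a corrected retry after an
// error-text result, then no more tool calls.
function scriptedModel(step: number, last?: Result): Call | undefined {
  if (step === 0) return { toolName: "weather", input: { town: "Berlin" } };
  if (last?.type === "error-text") return { toolName: "weather", input: { city: "Berlin" } };
  return undefined;
}

async function runAgent(maxSteps: number): Promise<Result[]> {
  const results: Result[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const call = scriptedModel(step, results[results.length - 1]);
    if (!call) break;
    results.push(await runTool(call));
  }
  return results;
}
```

Asserting on `executeCalls` rather than only on the error-text result is what distinguishes productive self-correction from mere error feedback.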

Changeset included (@workflow/ai patch).

Closes #2192

🤖 Generated with Claude Code

… agent stream

When a model emits a tool call whose arguments fail inputSchema validation
(and no experimental_repairToolCall fixes it), executeTool now returns the
validation error to the model as an error-text tool result — the same way
tool execution errors are already handled — instead of throwing and aborting
the whole agent stream. The recovery path also emits an ai.toolCall span
recording the error so the failure stays observable in traces.

In-repo copy of #2192 by @boomyao, opened to run the full CI suite.

Co-Authored-By: yao <zhangyaoruo@outlook.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@VaguelySerious VaguelySerious requested review from a team and ijjk as code owners June 29, 2026 22:08
@changeset-bot

changeset-bot Bot commented Jun 29, 2026


🦋 Changeset detected

Latest commit: 5fe17a4

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@workflow/ai Patch


@vercel

vercel Bot commented Jun 29, 2026

Contributor

The latest updates on your projects.

Project Deployment Updated (UTC)
example-nextjs-workflow-turbopack Ready Jun 29, 2026 10:11pm
example-nextjs-workflow-webpack Ready Jun 29, 2026 10:11pm
example-workflow Ready Jun 29, 2026 10:11pm
workbench-astro-workflow Ready Jun 29, 2026 10:11pm
workbench-express-workflow Ready Jun 29, 2026 10:11pm
workbench-fastify-workflow Ready Jun 29, 2026 10:11pm
workbench-hono-workflow Ready Jun 29, 2026 10:11pm
workbench-nitro-workflow Ready Jun 29, 2026 10:11pm
workbench-nuxt-workflow Ready Jun 29, 2026 10:11pm
workbench-sveltekit-workflow Ready Jun 29, 2026 10:11pm
workbench-tanstack-start-workflow Ready Jun 29, 2026 10:11pm
workbench-vite-workflow Ready Jun 29, 2026 10:11pm
workflow-docs Ready Jun 29, 2026 10:11pm
workflow-swc-playground Ready Jun 29, 2026 10:11pm
workflow-tarballs Ready Jun 29, 2026 10:11pm
workflow-web Ready Jun 29, 2026 10:11pm

@github-actions

github-actions Bot commented Jun 29, 2026

Contributor

🧪 E2E Test Results

All tests passed

Summary

Passed Failed Skipped Total
✅ ▲ Vercel Production 1442 0 230 1672
✅ 💻 Local Development 1605 0 219 1824
✅ 📦 Local Production 1605 0 219 1824
✅ 🐘 Local Postgres 1593 0 231 1824
✅ 🪟 Windows 152 0 0 152
✅ 📋 Other 885 0 179 1064
Total 7282 0 1078 8360
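
The Total row is a plain column sum, and each row's Total should equal Passed + Failed + Skipped. A small sketch that checks both, using the counts copied from the summary above (the `checkSummary` helper is illustrative, not part of the CI bot):

```typescript
// [passed, failed, skipped, total] per category, copied from the table above.
type Counts = [number, number, number, number];

const summary: Record<string, Counts> = {
  "Vercel Production": [1442, 0, 230, 1672],
  "Local Development": [1605, 0, 219, 1824],
  "Local Production": [1605, 0, 219, 1824],
  "Local Postgres": [1593, 0, 231, 1824],
  Windows: [152, 0, 0, 152],
  Other: [885, 0, 179, 1064],
};

// Verifies each row adds up, then returns the column sums (the Total row).
function checkSummary(rows: Record<string, Counts>): Counts {
  const totals: Counts = [0, 0, 0, 0];
  for (const [name, [passed, failed, skipped, total]] of Object.entries(rows)) {
    if (passed + failed + skipped !== total) {
      throw new Error(`${name}: counts do not add up`);
    }
    totals[0] += passed;
    totals[1] += failed;
    totals[2] += skipped;
    totals[3] += total;
  }
  return totals;
}

console.log(checkSummary(summary).join(" ")); // 7282 0 1078 8360
```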

Details by Category

✅ ▲ Vercel Production
App Passed Failed Skipped
✅ astro 125 0 27
✅ example 125 0 27
✅ express 125 0 27
✅ fastify 125 0 27
✅ hono 125 0 27
✅ nextjs-turbopack 149 0 3
✅ nextjs-webpack 149 0 3
✅ nitro 125 0 27
✅ nuxt 125 0 27
✅ sveltekit 144 0 8
✅ vite 125 0 27
✅ 💻 Local Development
App Passed Failed Skipped
✅ astro-stable 127 0 25
✅ express-stable 127 0 25
✅ fastify-stable 127 0 25
✅ hono-stable 127 0 25
✅ nextjs-turbopack-canary 133 0 19
✅ nextjs-turbopack-stable 152 0 0
✅ nextjs-webpack-canary 133 0 19
✅ nextjs-webpack-stable 152 0 0
✅ nitro-stable 127 0 25
✅ nuxt-stable 127 0 25
✅ sveltekit-stable 146 0 6
✅ vite-stable 127 0 25
✅ 📦 Local Production
App Passed Failed Skipped
✅ astro-stable 127 0 25
✅ express-stable 127 0 25
✅ fastify-stable 127 0 25
✅ hono-stable 127 0 25
✅ nextjs-turbopack-canary 133 0 19
✅ nextjs-turbopack-stable 152 0 0
✅ nextjs-webpack-canary 133 0 19
✅ nextjs-webpack-stable 152 0 0
✅ nitro-stable 127 0 25
✅ nuxt-stable 127 0 25
✅ sveltekit-stable 146 0 6
✅ vite-stable 127 0 25
✅ 🐘 Local Postgres
App Passed Failed Skipped
✅ astro-stable 126 0 26
✅ express-stable 126 0 26
✅ fastify-stable 126 0 26
✅ hono-stable 126 0 26
✅ nextjs-turbopack-canary 132 0 20
✅ nextjs-turbopack-stable 151 0 1
✅ nextjs-webpack-canary 132 0 20
✅ nextjs-webpack-stable 151 0 1
✅ nitro-stable 126 0 26
✅ nuxt-stable 126 0 26
✅ sveltekit-stable 145 0 7
✅ vite-stable 126 0 26
✅ 🪟 Windows
App Passed Failed Skipped
✅ nextjs-turbopack 152 0 0
✅ 📋 Other
App Passed Failed Skipped
✅ e2e-local-dev-nest-stable 127 0 25
✅ e2e-local-dev-tanstack-start- 127 0 25
✅ e2e-local-postgres-nest-stable 126 0 26
✅ e2e-local-postgres-tanstack-start- 126 0 26
✅ e2e-local-prod-nest-stable 127 0 25
✅ e2e-local-prod-tanstack-start- 127 0 25
✅ e2e-vercel-prod-tanstack-start 125 0 27

📋 View full workflow run

@github-actions

github-actions Bot commented Jun 29, 2026

Contributor

📊 Benchmark Results

📈 Comparing against baseline from main branch. Green 🟢 = faster, Red 🔺 = slower.

workflow with no steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Nitro 0.045s (+1.8%) 1.006s (~) 0.961s 10 1.00x
💻 Local Express 0.051s (+4.3%) 1.008s (~) 0.957s 10 1.13x
💻 Local Next.js (Turbopack) 0.052s (-7.2% 🟢) 1.008s (~) 0.956s 10 1.16x
🐘 Postgres Next.js (Turbopack) 0.060s (-2.0%) 1.012s (~) 0.952s 10 1.34x
🐘 Postgres Nitro 0.067s (+1.1%) 1.013s (~) 0.946s 10 1.49x
🐘 Postgres Express 0.071s (+10.5% 🔺) 1.013s (~) 0.943s 10 1.58x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Nitro 0.227s (+7.8% 🔺) 2.133s (+5.6% 🔺) 1.906s 10 1.00x
▲ Vercel Express 0.244s (-5.9% 🟢) 1.901s (-9.5% 🟢) 1.657s 10 1.07x
▲ Vercel Next.js (Turbopack) 0.401s (+63.3% 🔺) 2.094s (-11.8% 🟢) 1.693s 10 1.77x

🔍 Observability: Nitro | Express | Next.js (Turbopack)

workflow with 1 step

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Express 1.083s (~) 2.007s (~) 0.924s 10 1.00x
💻 Local Nitro 1.083s (~) 2.006s (~) 0.923s 10 1.00x
💻 Local Next.js (Turbopack) 1.084s (-0.8%) 2.007s (~) 0.922s 10 1.00x
🐘 Postgres Express 1.098s (+0.7%) 2.011s (~) 0.912s 10 1.01x
🐘 Postgres Next.js (Turbopack) 1.103s (+0.8%) 2.011s (~) 0.908s 10 1.02x
🐘 Postgres Nitro 1.103s (+0.6%) 2.011s (~) 0.908s 10 1.02x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Nitro 1.417s (+2.4%) 3.119s (+8.6% 🔺) 1.702s 10 1.00x
▲ Vercel Express 1.471s (+5.1% 🔺) 3.234s (+5.2% 🔺) 1.763s 10 1.04x
▲ Vercel Next.js (Turbopack) 2.479s (+5.3% 🔺) 3.869s (+5.1% 🔺) 1.390s 10 1.75x

🔍 Observability: Nitro | Express | Next.js (Turbopack)

workflow with 10 sequential steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Nitro 10.427s (~) 11.023s (~) 0.596s 3 1.00x
💻 Local Express 10.461s (~) 11.022s (~) 0.561s 3 1.00x
💻 Local Next.js (Turbopack) 10.468s (~) 11.021s (~) 0.553s 3 1.00x
🐘 Postgres Express 10.496s (~) 11.022s (~) 0.526s 3 1.01x
🐘 Postgres Next.js (Turbopack) 10.496s (~) 11.018s (~) 0.522s 3 1.01x
🐘 Postgres Nitro 10.531s (~) 11.020s (~) 0.489s 3 1.01x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Express 11.716s (~) 13.550s (+2.2%) 1.834s 3 1.00x
▲ Vercel Next.js (Turbopack) 13.702s (+4.2%) 15.669s (+5.3% 🔺) 1.967s 2 1.17x
▲ Vercel Nitro 13.755s (+17.3% 🔺) 15.189s (+11.0% 🔺) 1.434s 2 1.17x

🔍 Observability: Express | Next.js (Turbopack) | Nitro

workflow with 25 sequential steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Nitro 13.563s (~) 14.027s (~) 0.465s 5 1.00x
💻 Local Next.js (Turbopack) 13.627s (-0.8%) 14.027s (~) 0.400s 5 1.00x
🐘 Postgres Nitro 13.633s (~) 14.022s (~) 0.389s 5 1.01x
💻 Local Express 13.668s (~) 14.030s (~) 0.362s 5 1.01x
🐘 Postgres Next.js (Turbopack) 13.695s (+0.5%) 14.025s (~) 0.330s 5 1.01x
🐘 Postgres Express 13.761s (+1.0%) 14.023s (~) 0.261s 5 1.01x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Nitro 16.295s (~) 17.733s (-4.8%) 1.439s 4 1.00x
▲ Vercel Express 16.681s (+1.5%) 18.560s (+2.8%) 1.879s 4 1.02x
▲ Vercel Next.js (Turbopack) 18.035s (-3.8%) 20.082s (-2.3%) 2.047s 3 1.11x

🔍 Observability: Nitro | Express | Next.js (Turbopack)

workflow with 50 sequential steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Nitro 12.121s (-0.9%) 13.019s (~) 0.897s 7 1.00x
💻 Local Next.js (Turbopack) 12.169s (-2.1%) 13.026s (~) 0.856s 7 1.00x
💻 Local Express 12.177s (~) 13.026s (~) 0.848s 7 1.00x
💻 Local Nitro 12.263s (~) 13.025s (~) 0.762s 7 1.01x
🐘 Postgres Express 12.290s (+0.9%) 13.020s (~) 0.730s 7 1.01x
🐘 Postgres Next.js (Turbopack) 12.335s (-2.1%) 13.016s (-1.1%) 0.682s 7 1.02x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Nitro 19.816s (+10.6% 🔺) 21.730s (+10.1% 🔺) 1.914s 5 1.00x
▲ Vercel Express 20.168s (+14.1% 🔺) 22.370s (+16.7% 🔺) 2.202s 5 1.02x
▲ Vercel Next.js (Turbopack) 22.466s (+10.1% 🔺) 24.390s (+8.1% 🔺) 1.924s 4 1.13x

🔍 Observability: Nitro | Express | Next.js (Turbopack)

Promise.all with 10 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Nitro 1.175s (-2.7%) 2.009s (~) 0.834s 15 1.00x
🐘 Postgres Next.js (Turbopack) 1.183s (-1.4%) 2.007s (~) 0.824s 15 1.01x
🐘 Postgres Express 1.186s (~) 2.009s (~) 0.822s 15 1.01x
💻 Local Nitro 1.403s (+1.7%) 2.006s (~) 0.603s 15 1.19x
💻 Local Express 1.419s (-0.8%) 2.006s (~) 0.588s 15 1.21x
💻 Local Next.js (Turbopack) 1.445s (-4.9%) 2.073s (+3.3%) 0.628s 15 1.23x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Nitro 2.285s (~) 3.663s (-4.8%) 1.378s 9 1.00x
▲ Vercel Express 2.324s (+5.6% 🔺) 3.776s (+8.0% 🔺) 1.452s 8 1.02x
▲ Vercel Next.js (Turbopack) 4.234s (+32.7% 🔺) 5.679s (+15.9% 🔺) 1.445s 6 1.85x

🔍 Observability: Nitro | Express | Next.js (Turbopack)

Promise.all with 25 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Next.js (Turbopack) 1.324s (~) 3.008s (+3.1%) 1.684s 10 1.00x
🐘 Postgres Nitro 1.330s (~) 2.736s (+14.3% 🔺) 1.407s 11 1.00x
🐘 Postgres Express 1.340s (-1.7%) 2.471s (+6.7% 🔺) 1.131s 13 1.01x
💻 Local Nitro 2.325s (-6.6% 🟢) 2.736s (-6.2% 🟢) 0.411s 11 1.76x
💻 Local Express 2.607s (-4.8%) 3.109s (-3.1%) 0.502s 10 1.97x
💻 Local Next.js (Turbopack) 2.649s (+3.1%) 3.008s (-3.2%) 0.359s 10 2.00x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Express 2.571s (+5.1% 🔺) 4.168s (+16.8% 🔺) 1.596s 8 1.00x
▲ Vercel Nitro 2.600s (+3.3%) 3.982s (+4.1%) 1.382s 8 1.01x
▲ Vercel Next.js (Turbopack) 4.399s (+22.9% 🔺) 6.351s (+19.6% 🔺) 1.953s 5 1.71x

🔍 Observability: Express | Nitro | Next.js (Turbopack)

Promise.all with 50 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Nitro 1.618s (+0.9%) 4.137s (~) 2.520s 8 1.00x
🐘 Postgres Express 1.647s (+2.7%) 4.264s (+3.1%) 2.617s 8 1.02x
🐘 Postgres Next.js (Turbopack) 2.731s (-7.8% 🟢) 5.680s (-5.6% 🟢) 2.949s 6 1.69x
💻 Local Nitro 6.380s (+11.2% 🔺) 7.016s (+6.0% 🔺) 0.636s 5 3.94x
💻 Local Next.js (Turbopack) 6.473s (-4.7%) 7.518s (~) 1.044s 4 4.00x
💻 Local Express 6.512s (-8.4% 🟢) 6.816s (-10.5% 🟢) 0.304s 5 4.02x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Express 3.488s (+30.1% 🔺) 5.453s (+20.7% 🔺) 1.965s 6 1.00x
▲ Vercel Nitro 3.643s (+35.2% 🔺) 5.222s (+15.5% 🔺) 1.579s 6 1.04x
▲ Vercel Next.js (Turbopack) 5.248s (+29.3% 🔺) 7.118s (+19.3% 🔺) 1.871s 5 1.50x

🔍 Observability: Express | Nitro | Next.js (Turbopack)

Promise.race with 10 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Nitro 1.180s (-1.2%) 2.007s (~) 0.827s 15 1.00x
🐘 Postgres Next.js (Turbopack) 1.182s (-1.9%) 2.007s (~) 0.825s 15 1.00x
🐘 Postgres Express 1.204s (+1.0%) 2.007s (~) 0.803s 15 1.02x
💻 Local Nitro 1.381s (+2.4%) 2.006s (~) 0.625s 15 1.17x
💻 Local Next.js (Turbopack) 1.388s (-6.4% 🟢) 2.006s (~) 0.618s 15 1.18x
💻 Local Express 1.438s (~) 2.007s (~) 0.569s 15 1.22x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Express 2.282s (+9.6% 🔺) 3.712s (+11.4% 🔺) 1.430s 9 1.00x
▲ Vercel Nitro 2.283s (+16.3% 🔺) 3.832s (+2.4%) 1.549s 8 1.00x
▲ Vercel Next.js (Turbopack) 3.502s (+13.8% 🔺) 5.104s (+7.1% 🔺) 1.603s 6 1.53x

🔍 Observability: Express | Nitro | Next.js (Turbopack)

Promise.race with 25 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Next.js (Turbopack) 1.306s (-0.7%) 3.009s (~) 1.702s 10 1.00x
🐘 Postgres Express 1.319s (~) 2.676s (+11.7% 🔺) 1.357s 12 1.01x
🐘 Postgres Nitro 1.347s (+3.2%) 2.509s (~) 1.163s 12 1.03x
💻 Local Nitro 2.627s (+6.2% 🔺) 3.008s (~) 0.382s 10 2.01x
💻 Local Next.js (Turbopack) 2.834s (+5.3% 🔺) 3.109s (-7.0% 🟢) 0.275s 10 2.17x
💻 Local Express 2.869s (+13.2% 🔺) 3.345s (+4.2%) 0.476s 9 2.20x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Nitro 2.446s (-8.1% 🟢) 3.813s (-8.5% 🟢) 1.367s 8 1.00x
▲ Vercel Express 2.706s (+12.6% 🔺) 4.363s (+19.5% 🔺) 1.657s 7 1.11x
▲ Vercel Next.js (Turbopack) 4.449s (+32.6% 🔺) 6.167s (+22.9% 🔺) 1.718s 5 1.82x

🔍 Observability: Nitro | Express | Next.js (Turbopack)

Promise.race with 50 concurrent steps

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Nitro 1.624s (+1.1%) 4.298s (+3.9%) 2.673s 7 1.00x
🐘 Postgres Express 1.657s (+1.7%) 4.139s (~) 2.483s 8 1.02x
🐘 Postgres Next.js (Turbopack) 2.935s (+14.4% 🔺) 5.849s (-2.8%) 2.914s 6 1.81x
💻 Local Next.js (Turbopack) 5.627s (-21.8% 🟢) 6.217s (-20.0% 🟢) 0.590s 5 3.46x
💻 Local Nitro 6.125s (+0.7%) 6.614s (-2.9%) 0.489s 5 3.77x
💻 Local Express 7.415s (+6.5% 🔺) 8.267s (+6.4% 🔺) 0.851s 4 4.56x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Express 3.289s (+8.5% 🔺) 5.175s (+10.6% 🔺) 1.885s 6 1.00x
▲ Vercel Nitro 3.795s (+26.5% 🔺) 5.199s (+5.6% 🔺) 1.404s 6 1.15x
▲ Vercel Next.js (Turbopack) 4.749s (+6.6% 🔺) 6.551s (+6.6% 🔺) 1.802s 5 1.44x

🔍 Observability: Express | Nitro | Next.js (Turbopack)

workflow with 10 sequential data payload steps (10KB)

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Next.js (Turbopack) 0.550s (~) 1.006s (~) 0.456s 60 1.00x
🐘 Postgres Nitro 0.564s (+4.4%) 1.024s (~) 0.460s 59 1.02x
🐘 Postgres Express 0.565s (+3.1%) 1.007s (-1.6%) 0.442s 60 1.03x
💻 Local Next.js (Turbopack) 0.573s (-8.1% 🟢) 1.005s (~) 0.432s 60 1.04x
💻 Local Nitro 0.593s (+0.6%) 1.022s (~) 0.429s 59 1.08x
💻 Local Express 0.609s (+3.1%) 1.022s (+1.7%) 0.413s 59 1.11x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Express 3.039s (+21.0% 🔺) 4.748s (+18.4% 🔺) 1.709s 13 1.00x
▲ Vercel Nitro 3.048s (+14.4% 🔺) 4.745s (+10.9% 🔺) 1.697s 13 1.00x
▲ Vercel Next.js (Turbopack) 4.331s (+11.0% 🔺) 5.980s (+6.6% 🔺) 1.649s 11 1.43x

🔍 Observability: Express | Nitro | Next.js (Turbopack)

workflow with 25 sequential data payload steps (10KB)

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Express 1.326s (+2.4%) 2.007s (~) 0.681s 45 1.00x
🐘 Postgres Next.js (Turbopack) 1.331s (+1.8%) 2.007s (~) 0.676s 45 1.00x
🐘 Postgres Nitro 1.359s (+5.7% 🔺) 2.030s (+1.1%) 0.671s 45 1.02x
💻 Local Nitro 1.430s (~) 2.006s (~) 0.576s 45 1.08x
💻 Local Next.js (Turbopack) 1.447s (-5.9% 🟢) 2.028s (+1.1%) 0.581s 45 1.09x
💻 Local Express 1.506s (+2.4%) 2.007s (~) 0.501s 45 1.14x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Nitro 7.052s (+17.1% 🔺) 8.666s (+13.2% 🔺) 1.614s 11 1.00x
▲ Vercel Express 7.495s (+25.6% 🔺) 9.322s (+28.5% 🔺) 1.827s 10 1.06x
▲ Vercel Next.js (Turbopack) 9.462s (+14.5% 🔺) 11.434s (+17.2% 🔺) 1.972s 8 1.34x

🔍 Observability: Nitro | Express | Next.js (Turbopack)

workflow with 50 sequential data payload steps (10KB)

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Nitro 2.629s (+3.5%) 3.059s (~) 0.430s 40 1.00x
🐘 Postgres Express 2.683s (+4.2%) 3.136s (+1.7%) 0.454s 39 1.02x
🐘 Postgres Next.js (Turbopack) 2.753s (+1.8%) 3.137s (+1.7%) 0.384s 39 1.05x
💻 Local Nitro 3.091s (-2.5%) 3.790s (~) 0.699s 32 1.18x
💻 Local Next.js (Turbopack) 3.166s (-7.0% 🟢) 3.736s (-7.6% 🟢) 0.570s 33 1.20x
💻 Local Express 3.304s (+2.8%) 4.009s (~) 0.705s 30 1.26x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Express 13.208s (+13.2% 🔺) 15.504s (+17.3% 🔺) 2.295s 8 1.00x
▲ Vercel Nitro 13.673s (+13.4% 🔺) 15.711s (+14.3% 🔺) 2.037s 8 1.04x
▲ Vercel Next.js (Turbopack) 19.873s (+21.6% 🔺) 22.031s (+20.5% 🔺) 2.159s 6 1.50x

🔍 Observability: Express | Nitro | Next.js (Turbopack)

workflow with 10 concurrent data payload steps (10KB)

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Next.js (Turbopack) 0.183s (+3.8%) 1.006s (~) 0.824s 60 1.00x
🐘 Postgres Nitro 0.219s (+2.1%) 1.007s (~) 0.787s 60 1.20x
🐘 Postgres Express 0.224s (+4.3%) 1.006s (~) 0.782s 60 1.22x
💻 Local Nitro 0.467s (+5.0% 🔺) 1.004s (~) 0.538s 60 2.56x
💻 Local Express 0.510s (+16.5% 🔺) 1.022s (+1.7%) 0.512s 59 2.80x
💻 Local Next.js (Turbopack) 0.603s (-8.0% 🟢) 1.005s (-1.7%) 0.402s 60 3.30x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Nitro 1.158s (+11.5% 🔺) 2.464s (-6.1% 🟢) 1.307s 25 1.00x
▲ Vercel Express 1.188s (+15.8% 🔺) 2.698s (+15.1% 🔺) 1.510s 23 1.03x
▲ Vercel Next.js (Turbopack) 2.555s (+22.4% 🔺) 4.077s (+7.9% 🔺) 1.521s 15 2.21x

🔍 Observability: Nitro | Express | Next.js (Turbopack)

workflow with 25 concurrent data payload steps (10KB)

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Next.js (Turbopack) 0.280s (-0.6%) 1.017s (~) 0.737s 89 1.00x
🐘 Postgres Express 0.324s (-3.0%) 1.006s (~) 0.682s 90 1.16x
🐘 Postgres Nitro 0.335s (+3.8%) 1.018s (+1.2%) 0.683s 89 1.20x
💻 Local Nitro 2.092s (-4.4%) 2.686s (-4.0%) 0.594s 34 7.46x
💻 Local Express 2.252s (+7.0% 🔺) 2.885s (+5.4% 🔺) 0.632s 32 8.03x
💻 Local Next.js (Turbopack) 2.769s (-1.4%) 3.009s (-15.2% 🟢) 0.239s 30 9.88x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Nitro 1.787s (+34.3% 🔺) 3.399s (+21.1% 🔺) 1.612s 27 1.00x
▲ Vercel Express 1.883s (+42.6% 🔺) 3.657s (+42.9% 🔺) 1.775s 25 1.05x
▲ Vercel Next.js (Turbopack) 3.443s (+31.4% 🔺) 5.072s (+25.4% 🔺) 1.629s 18 1.93x

🔍 Observability: Nitro | Express | Next.js (Turbopack)

workflow with 50 concurrent data payload steps (10KB)

💻 Local Development

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Next.js (Turbopack) 0.525s (~) 3.010s (-3.3%) 2.485s 40 1.00x
🐘 Postgres Express 0.536s (+1.1%) 1.097s (+4.5%) 0.561s 110 1.02x
🐘 Postgres Nitro 0.545s (+4.8%) 1.172s (+13.6% 🔺) 0.627s 103 1.04x
💻 Local Next.js (Turbopack) 8.991s (-11.7% 🟢) 10.024s (-10.6% 🟢) 1.033s 13 17.14x
💻 Local Nitro 9.615s (~) 10.776s (~) 1.160s 12 18.33x
💻 Local Express 9.852s (-2.0%) 11.026s (-1.6%) 1.173s 11 18.78x

▲ Production (Vercel)

World Framework Workflow Time Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Express 2.672s (+51.8% 🔺) 4.558s (+32.6% 🔺) 1.885s 27 1.00x
▲ Vercel Nitro 2.901s (+43.0% 🔺) 4.593s (+16.8% 🔺) 1.692s 27 1.09x
▲ Vercel Next.js (Turbopack) 4.461s (+13.0% 🔺) 6.508s (+16.2% 🔺) 2.047s 19 1.67x

🔍 Observability: Express | Nitro | Next.js (Turbopack)

Stream Benchmarks (includes TTFB metrics)
workflow with stream

💻 Local Development

World Framework Workflow Time TTFB Slurp Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Next.js (Turbopack) 1.135s (-2.6%) 1.968s (~) 0.010s (-18.1% 🟢) 2.017s (~) 0.882s 10 1.00x
💻 Local Nitro 1.156s (-0.7%) 2.004s (~) 0.010s (-1.9%) 2.017s (~) 0.861s 10 1.02x
🐘 Postgres Next.js (Turbopack) 1.163s (~) 2.001s (~) 0.001s (-15.4% 🟢) 2.012s (~) 0.848s 10 1.02x
💻 Local Express 1.164s (~) 2.004s (~) 0.013s (+2.4%) 2.020s (~) 0.856s 10 1.03x
🐘 Postgres Nitro 1.164s (-0.6%) 1.998s (~) 0.001s (~) 2.011s (~) 0.847s 10 1.03x
🐘 Postgres Express 1.165s (~) 1.998s (~) 0.001s (-14.3% 🟢) 2.011s (~) 0.846s 10 1.03x

▲ Production (Vercel)

World Framework Workflow Time TTFB Slurp Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Nitro 2.163s (+6.9% 🔺) 3.035s (-7.4% 🟢) 2.368s (+39.3% 🔺) 5.845s (+6.1% 🔺) 3.683s 10 1.00x
▲ Vercel Express 2.330s (+16.7% 🔺) 3.447s (+16.9% 🔺) 1.887s (-8.9% 🟢) 5.803s (+5.3% 🔺) 3.473s 10 1.08x
▲ Vercel Next.js (Turbopack) 3.783s (+5.4% 🔺) 3.823s (-3.1%) 1.690s (+17.9% 🔺) 7.193s (+3.9%) 3.410s 10 1.75x

🔍 Observability: Nitro | Express | Next.js (Turbopack)

stream pipeline with 5 transform steps (1MB)

💻 Local Development

World Framework Workflow Time TTFB Slurp Wall Time Overhead Samples vs Fastest
💻 Local 🥇 Nitro 1.540s (~) 2.010s (~) 0.012s (+2.2%) 2.025s (~) 0.486s 30 1.00x
🐘 Postgres Nitro 1.547s (~) 2.003s (~) 0.005s (+6.8% 🔺) 2.027s (~) 0.480s 30 1.00x
💻 Local Next.js (Turbopack) 1.548s (-6.4% 🟢) 1.970s (-1.6%) 0.012s (-6.6% 🟢) 2.024s (-1.7%) 0.476s 30 1.01x
💻 Local Express 1.562s (~) 2.009s (~) 0.013s (+1.0%) 2.025s (~) 0.464s 30 1.01x
🐘 Postgres Express 1.567s (~) 2.004s (~) 0.005s (+4.1%) 2.025s (~) 0.458s 30 1.02x
🐘 Postgres Next.js (Turbopack) 1.609s (-2.6%) 2.009s (~) 0.005s (-11.9% 🟢) 2.026s (~) 0.417s 30 1.05x

▲ Production (Vercel)

World Framework Workflow Time TTFB Slurp Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Express 6.222s (+11.8% 🔺) 7.583s (+10.7% 🔺) 0.254s (+17.2% 🔺) 8.342s (+11.4% 🔺) 2.120s 8 1.00x
▲ Vercel Nitro 6.242s (+10.7% 🔺) 7.372s (+3.6%) 0.205s (-11.6% 🟢) 8.018s (+2.3%) 1.776s 8 1.00x
▲ Vercel Next.js (Turbopack) 10.101s (+9.7% 🔺) 11.356s (+9.6% 🔺) 0.202s (-62.1% 🟢) 12.328s (+3.7%) 2.227s 5 1.62x

🔍 Observability: Express | Nitro | Next.js (Turbopack)

10 parallel streams (1MB each)

💻 Local Development

World Framework Workflow Time TTFB Slurp Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Express 0.763s (-0.5%) 1.064s (+1.8%) 0.000s (-100.0% 🟢) 1.085s (+2.3%) 0.322s 56 1.00x
🐘 Postgres Nitro 0.795s (+3.5%) 1.046s (~) 0.000s (+293.0% 🔺) 1.060s (-1.4%) 0.265s 57 1.04x
🐘 Postgres Next.js (Turbopack) 0.987s (-5.3% 🟢) 1.304s (-15.2% 🟢) 0.000s (+Infinity% 🔺) 1.324s (-14.3% 🟢) 0.337s 46 1.29x
💻 Local Nitro 1.222s (-5.7% 🟢) 2.014s (~) 0.000s (~) 2.017s (~) 0.795s 30 1.60x
💻 Local Express 1.239s (-4.3%) 1.981s (-1.6%) 0.000s (+141.9% 🔺) 1.985s (-1.6%) 0.746s 31 1.62x
💻 Local Next.js (Turbopack) 1.345s (-4.6%) 1.979s (~) 0.000s (-88.5% 🟢) 2.016s (~) 0.672s 30 1.76x

▲ Production (Vercel)

World Framework Workflow Time TTFB Slurp Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Express 3.383s (+17.2% 🔺) 4.569s (+9.6% 🔺) 0.011s (+15766.7% 🔺) 5.068s (+10.0% 🔺) 1.686s 12 1.00x
▲ Vercel Nitro 3.517s (+18.2% 🔺) 4.550s (+4.3%) 0.000s (-45.8% 🟢) 5.013s (+3.0%) 1.497s 12 1.04x
▲ Vercel Next.js (Turbopack) 5.058s (+12.8% 🔺) 6.226s (+14.4% 🔺) 0.000s (NaN%) 7.058s (+12.4% 🔺) 2.000s 9 1.50x

🔍 Observability: Express | Nitro | Next.js (Turbopack)

fan-out fan-in 10 streams (1MB each)

💻 Local Development

World Framework Workflow Time TTFB Slurp Wall Time Overhead Samples vs Fastest
🐘 Postgres 🥇 Express 1.778s (+5.3% 🔺) 2.303s (~) 0.000s (~) 2.334s (+0.7%) 0.556s 26 1.00x
🐘 Postgres Nitro 1.814s (+3.3%) 2.416s (+5.5% 🔺) 0.000s (-100.0% 🟢) 2.435s (+5.7% 🔺) 0.621s 26 1.02x
🐘 Postgres Next.js (Turbopack) 2.650s (-8.6% 🟢) 3.101s (-8.5% 🟢) 0.000s (-100.0% 🟢) 3.136s (-7.9% 🟢) 0.486s 20 1.49x
💻 Local Next.js (Turbopack) 3.385s (-7.2% 🟢) 3.927s (-4.8%) 0.002s (+150.0% 🔺) 3.970s (-4.8%) 0.585s 16 1.90x
💻 Local Express 3.457s (-5.8% 🟢) 4.025s (-3.4%) 0.001s (+175.0% 🔺) 4.031s (-3.3%) 0.574s 15 1.94x
💻 Local Nitro 3.506s (-6.1% 🟢) 4.027s (-4.7%) 0.001s (+25.0% 🔺) 4.033s (-4.6%) 0.528s 15 1.97x

▲ Production (Vercel)

World Framework Workflow Time TTFB Slurp Wall Time Overhead Samples vs Fastest
▲ Vercel 🥇 Nitro 4.742s (+10.1% 🔺) 5.710s (~) 0.000s (NaN%) 6.101s (-1.9%) 1.359s 10 1.00x
▲ Vercel Express 4.756s (+11.2% 🔺) 5.976s (+9.0% 🔺) 0.000s (-100.0% 🟢) 6.460s (+9.0% 🔺) 1.704s 10 1.00x
▲ Vercel Next.js (Turbopack) 7.741s (+20.3% 🔺) 8.594s (+22.1% 🔺) 0.010s (+Infinity% 🔺) 9.865s (+20.8% 🔺) 2.123s 7 1.63x

🔍 Observability: Nitro | Express | Next.js (Turbopack)

Summary

Fastest Framework by World

Winner determined by most benchmark wins

World 🥇 Fastest Framework Wins
💻 Local Nitro 14/21
🐘 Postgres Next.js (Turbopack) 8/21
▲ Vercel Nitro 11/21
Fastest World by Framework

Winner determined by most benchmark wins

Framework 🥇 Fastest World Wins
Express 🐘 Postgres 14/21
Next.js (Turbopack) 🐘 Postgres 14/21
Nitro 🐘 Postgres 15/21
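
"Most benchmark wins" amounts to counting 🥇 placements per candidate. A minimal sketch, assuming a simple count over per-benchmark winners; the input is an illustrative subset taken from three of the Vercel tables above (no steps, 1 step, 10 sequential steps), not the full 21 benchmarks, and the tie-breaking rule is an assumption because the report does not state one.

```typescript
// 🥇 winners in the Vercel world for three benchmarks above (subset only).
const vercelWinners = ["Nitro", "Nitro", "Express"];

function tallyWins(winners: string[]): { winner: string; label: string } {
  const counts = new Map<string, number>();
  for (const w of winners) counts.set(w, (counts.get(w) ?? 0) + 1);
  let winner = "";
  let best = 0;
  // Ties keep the first-seen name; the real report's tie rule is unknown.
  for (const [name, n] of counts) {
    if (n > best) {
      winner = name;
      best = n;
    }
  }
  return { winner, label: `${best}/${winners.length}` };
}

console.log(tallyWins(vercelWinners)); // { winner: 'Nitro', label: '2/3' }
```
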
Column Definitions
  • Workflow Time: Runtime reported by workflow (completedAt - createdAt) - primary metric
  • TTFB: Time to First Byte - time from workflow start until first stream byte received (stream benchmarks only)
  • Slurp: Time from first byte to complete stream consumption (stream benchmarks only)
  • Wall Time: Total testbench time (trigger workflow + poll for result)
  • Overhead: Testbench overhead (Wall Time - Workflow Time)
  • Samples: Number of benchmark iterations run
  • vs Fastest: How much slower compared to the fastest configuration for this benchmark
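
The derived columns can be recomputed from the primary ones. This sketch assumes, based on the published numbers, that "vs Fastest" is the ratio of each row's Workflow Time to the smallest Workflow Time in the same table. It uses the three Vercel rows of "workflow with no steps", which reproduce the printed Overhead (1.906s, 1.657s, 1.693s) and vs Fastest (1.00x, 1.07x, 1.77x) values. Because the tables print rounded times, recomputing other rows can differ in the last digit (for example, 0.060 / 0.045 gives 1.33x where the table shows 1.34x).

```typescript
type Row = { framework: string; workflowTime: number; wallTime: number };

// Vercel rows from "workflow with no steps" (seconds).
const noStepsVercel: Row[] = [
  { framework: "Nitro", workflowTime: 0.227, wallTime: 2.133 },
  { framework: "Express", workflowTime: 0.244, wallTime: 1.901 },
  { framework: "Next.js (Turbopack)", workflowTime: 0.401, wallTime: 2.094 },
];

// Overhead = Wall Time - Workflow Time; vs Fastest = time / fastest time.
function derive(rows: Row[]) {
  const fastest = Math.min(...rows.map((r) => r.workflowTime));
  return rows.map((r) => ({
    framework: r.framework,
    overhead: (r.wallTime - r.workflowTime).toFixed(3),
    vsFastest: `${(r.workflowTime / fastest).toFixed(2)}x`,
  }));
}

console.log(derive(noStepsVercel));
```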

Worlds:

  • 💻 Local: In-memory filesystem world (local development)
  • 🐘 Postgres: PostgreSQL database world (local development)
  • ▲ Vercel: Vercel production/preview deployment
  • 🌐 Turso: Community world (local development)
  • 🌐 MongoDB: Community world (local development)
  • 🌐 Redis: Community world (local development)
  • 🌐 Jazz: Community world (local development)
  • 🌐 Redis + BullMQ: Community world (local development)
  • 🌐 Cloudflare: Community world (local development)
  • 🌐 MySQL: Community world (local development)
  • 🌐 Azure: Community world (local development)
  • 🌐 NATS JetStream: Community world (local development)
  • 🌐 Upstash: Community world (local development)
  • 🌐 Platformatic: Community world (local development)

📋 View full workflow run

@VaguelySerious VaguelySerious enabled auto-merge (squash) June 29, 2026 22:52

@karthikscale3 karthikscale3 left a comment

Contributor


Approving. What I checked:

  • Fix correctness: executeTool now returns an error-text LanguageModelV3ToolResultPart for unparseable/invalid input — identical in shape to the existing execute()-error catch below it.
  • Bug mechanism: both call sites use Promise.all(toolCalls.map(executeTool)); the old throw rejected the whole batch → outer catch → onError → aborted the durable run. Returning instead lets the loop continue and feed the error back via iterator.next(toolResults), bounded by maxSteps (no infinite-retry risk).
  • Still-throwing paths preserved intentionally: "tool not found" / "no execute function" still throw and abort (unrecoverable, not model mistakes).
  • Helpers: recordSpan accepts a synchronous fn and passes an undefined span when telemetry is off (guarded); getErrorMessage handles any thrown value shape.
  • Tests: recovery test asserts execute runs exactly once with the corrected input — proving productive self-correction, not just error feedback.
  • CI: 106 pass / 0 fail, including the full deploy-backed E2E matrix + unit tests (ubuntu & windows).
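
The bug mechanism in the second bullet is ordinary Promise.all semantics: one rejected promise rejects the whole batch, while promises that resolve with an error value let every result through. A minimal sketch with hypothetical runners (`runToolThrowing`, `runToolReturning`, and `stepsTaken` are illustrative, not the repository's code), plus a loop showing why a model that never fixes its input still stops once the step budget is spent.

```typescript
// Hypothetical tool runner: call 2 has invalid input and throws.
async function runToolThrowing(id: number): Promise<string> {
  if (id === 2) throw new Error(`Invalid input for tool call ${id}`);
  return `ok:${id}`;
}

// Same runner, but errors are converted into an error-text style value.
async function runToolReturning(id: number): Promise<string> {
  try {
    return await runToolThrowing(id);
  } catch (err) {
    return `error-text:${err instanceof Error ? err.message : String(err)}`;
  }
}

async function compareBatches(ids: number[]) {
  let batchRejected = false;
  try {
    await Promise.all(ids.map((id) => runToolThrowing(id)));
  } catch {
    batchRejected = true; // one throw rejects the whole batch
  }
  const results = await Promise.all(ids.map((id) => runToolReturning(id)));
  return { batchRejected, results };
}

// Each loop iteration stands for one model step that calls a tool;
// maxSteps bounds the retries even if the model never corrects itself.
function stepsTaken(maxSteps: number, modelCallsTool: (step: number) => boolean): number {
  let step = 0;
  while (step < maxSteps && modelCallsTool(step)) step++;
  return step;
}
```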

Non-blocking notes below.

```ts
// matching the tool-execution-error path below. Emit an `ai.toolCall` span
// recording the failure so the recovered error stays observable in traces.
const parseErrorMessage = getErrorMessage(parseError);
return recordSpan({
```
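
For context on the excerpt, here is a self-contained sketch of helpers with the behavior the review describes: getErrorMessage normalizes any thrown value, and recordSpan runs a synchronous fn with an undefined span when telemetry is off. `SpanLike`, `TracerLike`, string status codes, `recoverFromParseError`, and both helper signatures are assumptions for illustration; the real code uses its own tracing types. The span attribute name `ai.toolCall.error` and the ERROR status come from the review notes.

```typescript
// Hypothetical minimal tracing interfaces (string status codes for brevity).
interface SpanLike {
  setAttribute(key: string, value: string): void;
  setStatus(status: { code: "OK" | "ERROR"; message?: string }): void;
  end(): void;
}
interface TracerLike {
  startSpan(name: string): SpanLike;
}

// Normalizes anything that can be thrown into a readable message.
function getErrorMessage(error: unknown): string {
  if (error instanceof Error) return error.message;
  if (typeof error === "string") return error;
  try {
    const json = JSON.stringify(error);
    return typeof json === "string" ? json : String(error);
  } catch {
    return String(error); // e.g. circular objects
  }
}

// Runs a synchronous fn inside a span; the span is undefined when no tracer
// is configured (telemetry off), so fn must guard its span calls.
function recordSpan<T>(
  opts: { name: string; tracer?: TracerLike },
  fn: (span: SpanLike | undefined) => T,
): T {
  const span = opts.tracer?.startSpan(opts.name);
  try {
    return fn(span);
  } finally {
    span?.end();
  }
}

// Recovery path: mark the span as errored and return an error-text result.
function recoverFromParseError(parseError: unknown, tracer?: TracerLike) {
  const parseErrorMessage = getErrorMessage(parseError);
  return recordSpan({ name: "ai.toolCall", tracer }, (span) => {
    span?.setAttribute("ai.toolCall.error", parseErrorMessage);
    span?.setStatus({ code: "ERROR", message: parseErrorMessage });
    return { type: "error-text" as const, value: parseErrorMessage };
  });
}
```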

Contributor


Verified this returns the same error-text tool-result shape as the execute()-error catch below, so Promise.all no longer rejects and the run recovers within maxSteps instead of aborting.

Two non-blocking notes:

  • Telemetry asymmetry: this path sets span status=ERROR + ai.toolCall.error, but the existing execute()-error path returns error-text without marking its span errored. New path is strictly more observable; worth unifying eventually.
  • Behavior change (intentional, matches streamText): callers that relied on the throw / onError for invalid tool input no longer get that signal — the run now succeeds carrying an error-text result.
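
Callers that used to rely on the throw can still detect invalid tool input after the run by scanning tool results for error-text outputs. A minimal sketch, assuming a simplified tool-result shape loosely modeled on the AI SDK's; `ToolResultPart` and `findToolErrors` are hypothetical, and where such parts are exposed depends on the agent API in use.

```typescript
// Hypothetical, simplified tool-result shape for illustration only.
type ToolResultPart = {
  type: "tool-result";
  toolCallId: string;
  toolName: string;
  output: { type: "json"; value: unknown } | { type: "error-text"; value: string };
};

// Collects the error-text results so callers can log, alert, or fail on them.
function findToolErrors(parts: ToolResultPart[]): { toolName: string; message: string }[] {
  return parts.flatMap((p) =>
    p.output.type === "error-text" ? [{ toolName: p.toolName, message: p.output.value }] : [],
  );
}
```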

@VaguelySerious VaguelySerious merged commit 654f959 into main Jun 30, 2026
115 checks passed
@VaguelySerious VaguelySerious deleted the peter/durable-agent-recover-invalid-tool-input branch June 30, 2026 01:45
github-actions Bot added a commit that referenced this pull request Jun 30, 2026
… agent stream (#2699)

In-repo copy of #2192 by @boomyao, opened to run the full CI suite.

Co-authored-by: yao <zhangyaoruo@outlook.com>
Signed-off-by: Peter Wielander <mittgfu@gmail.com>
@github-actions

Contributor

Backport PR opened against stable: #2703. (backport job run)

