fix(e2e): allow dynamic multi-turn tool retries #173

Open
BlackishGreen33 wants to merge 1 commit into vercel:main from BlackishGreen33:bg/fix-dynamic-eval-retry-122

@BlackishGreen33 BlackishGreen33 commented Jun 22, 2026

What

The dynamic-tools/multi-turn eval no longer requires exactly two matching echo_dynamic tool calls across the whole run.

The eval already checks the important behavior inside each turn:

  • turn one calls echo_dynamic and gets dynamic-echo-ok-X7R2
  • turn two calls echo_dynamic after session serialization and gets the same token

This fixes a valid retry case where the model calls echo_dynamic more than once in turn two. The dynamic tool is still working in that case, but the old run-wide times: 2 assertion fails because it sees three successful calls.
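
To make the failure mode concrete, here is a minimal, self-contained sketch of the two kinds of check applied to a run with one retry in turn two. The ToolCall type and both helper functions are hypothetical stand-ins written for illustration; they are not the eval's real assertion API.

```typescript
// Hypothetical model of recorded tool calls; not the real eval API.
type ToolCall = { turn: number; tool: string; output: string };

const ECHO_TOOL = "echo_dynamic";
const TOKEN = "dynamic-echo-ok-X7R2";

// A run where the model retries the tool once in turn two.
const retryRun: ToolCall[] = [
  { turn: 1, tool: ECHO_TOOL, output: TOKEN },
  { turn: 2, tool: ECHO_TOOL, output: TOKEN },
  { turn: 2, tool: ECHO_TOOL, output: TOKEN },
];

// Run-wide exact-count check, analogous to the removed `times: 2` assertion.
function calledExactly(calls: ToolCall[], tool: string, times: number): boolean {
  return calls.filter((c) => c.tool === tool).length === times;
}

// Per-turn check: at least one call in the turn returned the expected token.
function turnReturnedToken(
  calls: ToolCall[],
  turn: number,
  tool: string,
  token: string,
): boolean {
  return calls.some((c) => c.turn === turn && c.tool === tool && c.output === token);
}

const runWideOk = calledExactly(retryRun, ECHO_TOOL, 2);
const perTurnOk = [1, 2].every((turn) =>
  turnReturnedToken(retryRun, turn, ECHO_TOOL, TOKEN),
);

console.log({ runWideOk, perTurnOk }); // { runWideOk: false, perTurnOk: true }
```

The exact-count check rejects the run only because of the extra call, while the per-turn checks accept it because each turn produced the expected token at least once.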

Closes #122.

How

Removed the final run-wide t.calledTool(ECHO_TOOL, { ..., times: 2 }) assertion from dynamic-tools/multi-turn.

The per-turn requireToolOutput(...) checks stay in place, so the eval still verifies that the dynamic tool survives across turns and returns the expected token each time.
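
As a rough illustration of what the remaining checks still guard against, the sketch below uses a hypothetical requireTokenInTurn helper. It is an assumption about the general shape of such a check, not the actual requireToolOutput(...) implementation, whose signature is not shown in this PR.

```typescript
// Hypothetical per-turn record; the real requireToolOutput(...) helper may differ.
type TurnResult = { turn: number; toolOutputs: string[] };

const EXPECTED_TOKEN = "dynamic-echo-ok-X7R2";

// Throws unless some tool output in the turn contains the expected token.
function requireTokenInTurn(result: TurnResult, token: string): void {
  if (!result.toolOutputs.some((out) => out.includes(token))) {
    throw new Error(`turn ${result.turn}: expected tool output containing ${token}`);
  }
}

// Returns the failure message, or null when the check passes.
function checkTurn(result: TurnResult): string | null {
  try {
    requireTokenInTurn(result, EXPECTED_TOKEN);
    return null;
  } catch (err) {
    return err instanceof Error ? err.message : String(err);
  }
}

// A retried turn passes: the token appears, and the call count is irrelevant.
const retried = checkTurn({ turn: 2, toolOutputs: [EXPECTED_TOKEN, EXPECTED_TOKEN] });

// A turn where the tool did not survive serialization still fails.
const broken = checkTurn({ turn: 2, toolOutputs: ["tool not found"] });

console.log(retried, broken);
```

Under these assumptions, a retry is tolerated, but a turn in which the dynamic tool never returns the token is still reported as a failure.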

Tests / changeset

  • No changeset needed: this only changes an e2e fixture assertion, not the published eve package behavior.
  • No runtime or assertion API changes.

Verification

  • fnm exec --using v24.15.0 pnpm typecheck
  • fnm exec --using v24.15.0 pnpm fmt
  • fnm exec --using v24.15.0 pnpm lint
  • fnm exec --using v24.15.0 pnpm build
  • git diff --check

Signed-off-by: 墨綠BG <s5460703@gmail.com>
Copilot AI review requested due to automatic review settings June 22, 2026 07:46
vercel Bot commented Jun 22, 2026

@BlackishGreen33 is attempting to deploy a commit to the Vercel Team on Vercel.

A member of the Team first needs to authorize it.

Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

Development

Successfully merging this pull request may close these issues.

[CI] E2E dynamic multi-turn eval fails on valid tool retries

2 participants