Skip to content

fix(pi): narrow converter guards — trailing assistant, tool/image/surrogate, signed-thinking#32

Open
iceteaSA wants to merge 2 commits into
cortexkit:mainfrom
iceteaSA:fix/pi-improvements
Open

fix(pi): narrow converter guards — trailing assistant, tool/image/surrogate, signed-thinking#32
iceteaSA wants to merge 2 commits into
cortexkit:mainfrom
iceteaSA:fix/pi-improvements

Conversation

@iceteaSA
Copy link
Copy Markdown
Contributor

@iceteaSA iceteaSA commented May 21, 2026

Narrow, Pi-owned converter hardening for packages/pi/src/convert.ts, beyond what was merged in #17. All changes are scoped to concrete invalid shapes the Pi converter can produce — no broad payload "healing".

Commit 1: strip trailing assistant messages, improve sanitize and error guards

  • Strip trailing assistant messages from converted requests to prevent Anthropic prefill 400s on models that reject assistant-message prefill.
  • Unicode regex (/gu flag) for spec-compliant surrogate handling — valid surrogate pairs (emoji / astral characters) are preserved; only lone surrogates are replaced with \uFFFD.
  • Simplified sanitizeToolId with a cleaner length check (removes a redundant guard).
  • Error-content guard injects an Error placeholder for is_error=true tool results with empty content (!content, or an empty array).
  • Comprehensive test coverage, including a regression test asserting that valid surrogate pairs survive sanitisation.

Commit 2: preserve signed thinking verbatim, guard lone surrogates

  • Send signed thinking blocks byte-identical. The signature is computed over the original text, so sanitizing would alter it and Anthropic rejects the block as modified. Previously the converter sanitized signed thinking unconditionally; now it preserves it verbatim (the normal path for Anthropic-origin reasoning).
  • Lone-surrogate exception only. A signed block containing a lone (unpaired) UTF-16 surrogate can't keep its signature — sanitizing breaks it, and the raw surrogate is invalid UTF-8 (400). hasLoneSurrogate() detects exactly that case, and only then is the signature dropped and the text sanitized.
  • Tests cover verbatim signed-thinking preservation and the lone-surrogate downgrade.

Scope note (per review)

This PR intentionally does not include the broad "downgrade all historical signed thinking to text" healing. That approach can avoid 400s but discards valid signed/redacted reasoning when the request shape is otherwise correct. The signed-thinking reordering problem is better fixed at the OpenCode layer in anomalyco/opencode#30182. For Pi, the converter should avoid broad destructive healing unless a concrete Pi-generated invalid shape requires it — none currently does, so it is omitted here.

Greptile Summary

This PR hardens the Pi converter (packages/pi/src/convert.ts) with four targeted fixes: stripping trailing assistant messages to avoid Anthropic prefill 400s, preserving signed thinking blocks verbatim (previously they were sanitized, which broke the signature), downgrading signed blocks that contain lone surrogates, and tightening the empty-content guard for is_error=true tool results.

  • Signed-thinking verbatim pass-through (convert.ts lines 149-168): the old code called sanitize() on signed thinking text, which mutated the content and invalidated the signature. The new code sends the text byte-for-byte identical, and only falls back to sanitized text when hasLoneSurrogate() detects an unpaired surrogate that would cause a UTF-8 400 anyway.
  • Trailing-assistant stripping (buildAnthropicRequest, lines 306-312): a while loop removes any assistant messages left at the tail of the converted array before the request is sent, covering models that reject assistant-role prefill.
  • Test suite added (convert.test.ts, 428 lines): covers all new guards including surrogate sanitisation, signed-thinking preservation, error tool-result placeholder injection, and the trailing-assistant strip.

Confidence Score: 5/5

Safe to merge — all changes are narrowly scoped to concrete invalid shapes the Pi converter produces, and the new test suite exercises every guard added.

The signed-thinking verbatim pass-through correctly identifies that sanitizing a signed block breaks its signature, and the lone-surrogate fallback covers the only edge case where verbatim sending would itself cause a 400. The trailing-assistant strip is a simple while-loop applied after conversion, which cannot affect any mid-conversation message. The error-content guard correctly handles the full string | Array union returned by convertTextAndImages. Test coverage is thorough and the emoji/astral-character preservation regression test is present.

No files require special attention.

Important Files Changed

Filename Overview
packages/pi/src/convert.ts Core converter hardening: verbatim signed-thinking pass-through, lone-surrogate detection, trailing-assistant strip, and simplified empty-error-content guard — all changes are logically correct and well-commented.
packages/pi/src/tests/convert.test.ts New test file with comprehensive coverage of all changed behaviours; assertions are correct and the emoji/astral-character preservation case is included (lines 219-227).
packages/pi/package.json Adds "test": "bun test src/tests" script; no other changes.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[thinking block] --> B{thinkingSignature set?}
    B -- No --> E[sanitize text → emit as text block]
    B -- Yes --> C{hasLoneSurrogate?}
    C -- No --> D[emit verbatim thinking block\nwith original text + signature]
    C -- Yes --> F[drop signature\nsanitize text → emit as text block]

    G[tool_result content] --> H{isError = true?}
    H -- No --> I[emit content as-is]
    H -- Yes --> J{content empty/blank?}
    J -- No --> I
    J -- Yes --> K[inject placeholder: Error]

    L[converted messages array] --> M{trailing message role = assistant?}
    M -- Yes --> N[pop and repeat]
    M -- No --> O[send to Anthropic]
Loading

Reviews (2): Last reviewed commit: "fix(pi): preserve signed thinking verbat..." | Re-trigger Greptile

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 3 files

Reply with feedback, questions, or to request a fix.

Fix all with cubic | Re-trigger cubic

Comment thread packages/pi/src/tests/convert.test.ts
@iceteaSA iceteaSA force-pushed the fix/pi-improvements branch from b659ab4 to 768e897 Compare May 22, 2026 17:08
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 3 files

Tip: cubic could auto-approve low-risk PRs like this, if it thinks it's safe to merge. Learn more

Re-trigger cubic

@iceteaSA iceteaSA force-pushed the fix/pi-improvements branch from 768e897 to 432f588 Compare May 29, 2026 17:28
@ualtinok
Copy link
Copy Markdown
Contributor

ualtinok commented Jun 1, 2026

Reviewed this in context with OpenCode PR anomalyco/opencode#30182 and the related OpenCode-side healing PRs here (#49, #51).

My read:

  • The Pi conversion tests and the narrower conversion hardening are useful: trailing assistant stripping, empty is_error tool result handling, tool ID/image/surrogate guards are all reasonable Pi-owned converter fixes.
  • The historical signed-thinking downgrade should not be merged as-is. It is the same broad “heal the final payload by converting signed thinking to text” approach as fix(opencode): repair thinking blocks rejected by Anthropic API #49. That can avoid Anthropic 400s, but it also discards valid signed/redacted thinking that should be preserved when the converter/request shape is correct.
  • For OpenCode, PR #30182 is the better layer for the signed-thinking issue because it prevents OpenCode’s Anthropic reorder from moving signed/redacted reasoning in the first place. For Pi, OpenCode cannot fix the converter, but the Pi converter should still avoid broad destructive healing unless a concrete Pi-generated invalid shape requires it.

Recommendation: split this PR. Keep the tests and the narrow Pi conversion guards, including trailing assistant stripping. Drop or separately rework the historical signed-thinking downgrade into a much narrower fix backed by a concrete Pi repro.

@iceteaSA iceteaSA force-pushed the fix/pi-improvements branch 2 times, most recently from c3d0d62 to 0003da7 Compare June 1, 2026 08:03
@iceteaSA iceteaSA changed the title fix(pi): strip trailing assistant messages, improve sanitize and error guards fix(pi): narrow converter guards — trailing assistant, tool/image/surrogate, signed-thinking Jun 1, 2026
@iceteaSA
Copy link
Copy Markdown
Contributor Author

iceteaSA commented Jun 1, 2026

Recommendation: split this PR. Keep the tests and the narrow Pi conversion guards, including trailing assistant stripping. Drop or separately rework the historical signed-thinking downgrade into a much narrower fix backed by a concrete Pi repro.

Dropped it

@iceteaSA iceteaSA force-pushed the fix/pi-improvements branch from 0003da7 to ae0626e Compare June 1, 2026 14:51
…r guards

- Strip trailing assistant-role messages from converted requests to prevent
  Anthropic prefill 400 errors on some models
- Add Unicode flag to surrogate regex for spec compliance
- Simplify sanitizeToolId with cleaner length check
- Improve error content guard to catch empty string case
- Add comprehensive test coverage for all pi conversion logic
@iceteaSA iceteaSA force-pushed the fix/pi-improvements branch from ae0626e to dbf4f8e Compare June 3, 2026 18:17
Comment thread packages/pi/src/convert.ts
Comment thread packages/pi/src/tests/convert.test.ts
Signed thinking blocks must be sent back byte-identical — the signature is
computed over the original text, so sanitizing would alter it and Anthropic
rejects the block as modified. Send signed thinking verbatim instead of
sanitizing it.

The one exception is a signed block containing a lone (unpaired) UTF-16
surrogate: the signature cannot survive sanitization and the raw surrogate is
invalid UTF-8 (400). Detect lone surrogates with hasLoneSurrogate() and, only
in that concrete case, drop the signature and downgrade to sanitized text.

No broad historical-thinking healing — valid signed/redacted thinking is
preserved. (Signed-thinking reordering is handled OpenCode-side in #30182.)
@iceteaSA iceteaSA force-pushed the fix/pi-improvements branch from dbf4f8e to 3bcd3e9 Compare June 3, 2026 19:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants