Decode piped stdin as UTF-8 and strip any BOM (PowerShell 5.1 compat)#433

Open
Jovenkemp wants to merge 1 commit into browser-use:main from Jovenkemp:fix/stdin-bom

Jovenkemp commented Jun 12, 2026

Problem

On Windows PowerShell 5.1, piped text often arrives with a UTF-8 BOM (EF BB BF). The stock [Text.Encoding]::UTF8 $OutputEncoding emits its preamble when piping to native commands, and files written by PS 5.1 (Out-File -Encoding utf8) carry BOMs that survive re-piping. sys.stdin.read() keeps that BOM in the decoded source, as U+FEFF or as ï»¿ when decoded via the locale code page, so exec() fails on line 1 of every piped script:

SyntaxError: invalid non-printable character U+FEFF (<string>, line 1)

Since piping code into browser-harness is the documented (and only) usage mode, this breaks the harness entirely for PowerShell 5.1 callers.
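
The failure can be reproduced without PowerShell. The following is a minimal sketch, assuming the harness decodes stdin to text and hands the result to exec(); the variable names are illustrative and not taken from run.py.

```python
import codecs

# Bytes as a PowerShell 5.1 pipe might deliver them: UTF-8 BOM, then the script.
piped = codecs.BOM_UTF8 + b"x = 1\n"

# Plain utf-8 decoding keeps the BOM as a leading U+FEFF character.
source = piped.decode("utf-8")
starts_with_bom = source[0] == "\ufeff"

# exec() on a str does not skip that character, so line 1 fails to parse.
try:
    exec(source, {})
    failed = False
except SyntaxError as err:
    failed = True
    print("SyntaxError:", err.msg)

# Decoding the same three BOM bytes as cp1252 yields mojibake instead of U+FEFF.
mojibake = piped[:3].decode("cp1252")
```

Either way the first line of the script no longer starts with valid Python, which is why every piped script is affected rather than only some.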

Fix

Read stdin as raw bytes and decode with utf-8-sig, which strips one leading UTF-8 BOM when present and is byte-for-byte identical to utf-8 otherwise.
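
A sketch of what this could look like, assuming the source is read in one call from the binary stream behind sys.stdin. The helper name read_source and its optional stream parameter are hypothetical, added here so the behavior can be exercised without a real pipe; the actual change in run.py may be structured differently.

```python
import io
import sys


def read_source(stream=None):
    """Read all bytes from a binary stream and decode them as UTF-8,
    dropping one leading BOM if present (hypothetical helper)."""
    if stream is None:
        # sys.stdin.buffer is the raw byte stream underneath text-mode stdin.
        stream = sys.stdin.buffer
    return stream.read().decode("utf-8-sig")


# Simulated pipes: one with a BOM (PowerShell 5.1 style), one without (bash/cmd).
with_bom = read_source(io.BytesIO(b"\xef\xbb\xbfprint('hi')\n"))
without_bom = read_source(io.BytesIO(b"print('hi')\n"))

# Only a single leading BOM is removed; a second one would remain in the text.
double_bom = read_source(io.BytesIO(b"\xef\xbb\xbf\xef\xbb\xbfx = 1\n"))
```

Both simulated pipes produce the same source text, so the string passed on to exec() no longer depends on which shell did the piping.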

Compatibility notes (each verified against CPython)

  • bash / cmd / heredoc input is unchanged. With no BOM present, utf-8-sig decodes exactly like utf-8.
  • Newline translation is not lost. Text-mode stdin used to translate \r\n to \n; compile()/exec() normalize newlines themselves, including inside string literals (exec('s="""a\r\nb"""') yields 'a\nb'), so CRLF input behaves identically.
  • UTF-16 stdin: deliberately out of scope. CPython rejects UTF-16 source files (SyntaxError: Non-UTF-8 code starting with '\xff' ... see PEP 263), so the harness keeps the same contract for piped source. A UTF-16 pipe now fails loudly at decode time (UnicodeDecodeError, or ValueError: source code string cannot contain null bytes) instead of producing mojibake.
  • Side benefit: non-ASCII piped source now decodes as UTF-8 on Windows instead of the locale code page (cp1252), matching how Python defines source encoding (PEP 3120).
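
The first three notes above can be spot-checked with a few lines of Python. This is an illustrative sketch rather than the verification the author ran; the sample inputs are made up for the purpose.

```python
import codecs

# Note 1: with no BOM present, utf-8-sig and utf-8 decode to the same text.
plain = 'name = "caf\u00e9"\n'.encode("utf-8")
same_without_bom = plain.decode("utf-8-sig") == plain.decode("utf-8")

# Note 2: CRLF inside a triple-quoted literal is normalized when compiling a str.
namespace = {}
exec('s = """a\r\nb"""\r\n', namespace)
crlf_normalized = namespace["s"] == "a\nb"

# Note 3: UTF-16 input with a BOM starts with bytes that are never valid UTF-8,
# so decoding fails immediately instead of yielding mojibake.
utf16 = codecs.BOM_UTF16_LE + "x = 1\n".encode("utf-16-le")
try:
    utf16.decode("utf-8-sig")
    utf16_rejected = False
except UnicodeDecodeError:
    utf16_rejected = True
```

UTF-16 without a BOM is a different case: pure-ASCII content decodes as UTF-8 but carries NUL bytes, which is where the "null bytes" error mentioned above comes from.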

This mirrors the existing UTF-8 normalization the file already does for stdout (run.py lines 3–8).

🤖 Generated with Claude Code


Summary by cubic

Fix stdin decoding so piped scripts work on Windows PowerShell 5.1. We now read bytes and decode as UTF‑8 while stripping a BOM, preventing the first-line SyntaxError when piping into browser-harness.

  • Bug Fixes
    • Read from sys.stdin.buffer and decode with utf-8-sig to drop a leading BOM from PowerShell 5.1 pipes.
    • No change for non-BOM input; newline handling unchanged. UTF‑16 stdin remains unsupported and now fails early.

Written for commit 29655f1. Summary will update on new commits.


Windows PowerShell 5.1 commonly prepends a UTF-8 BOM when piping text
to a native command (its UTF8 $OutputEncoding emits a preamble, and
files written by PS 5.1 carry BOMs that survive re-piping).
sys.stdin.read() leaves that BOM in the source -- decoded as U+FEFF,
or as mojibake under the locale code page -- so exec() fails with a
SyntaxError on the first line of every piped script.

Read raw bytes and decode with utf-8-sig instead: it strips one
leading UTF-8 BOM when present and is byte-for-byte identical to
utf-8 otherwise, so piped input from bash/cmd decodes exactly as
before. Newline handling is unchanged too: compile() normalizes
\r\n itself, including inside string literals. Side benefit:
non-ASCII source now decodes as UTF-8 on Windows rather than the
locale code page.

UTF-16 stdin stays unsupported, matching CPython, which rejects
UTF-16 source files outright; it now fails loudly at decode time
instead of producing mojibake.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 1 file
