Decode piped stdin as UTF-8 and strip any BOM (PowerShell 5.1 compat)#433
Open
Jovenkemp wants to merge 1 commit into
Windows PowerShell 5.1 commonly prepends a UTF-8 BOM when piping text to a native command (its UTF8 $OutputEncoding emits a preamble, and files written by PS 5.1 carry BOMs that survive re-piping). sys.stdin.read() leaves that BOM in the source (decoded as U+FEFF, or as mojibake under the locale code page), so exec() fails with a SyntaxError on the first line of every piped script.

Read raw bytes and decode with utf-8-sig instead: it strips one leading UTF-8 BOM when present and is byte-for-byte identical to utf-8 otherwise, so piped input from bash/cmd decodes exactly as before. Newline handling is unchanged too: compile() normalizes \r\n itself, including inside string literals.

Side benefit: non-ASCII source now decodes as UTF-8 on Windows rather than the locale code page.

UTF-16 stdin stays unsupported, matching CPython, which rejects UTF-16 source files outright; it now fails loudly at decode time instead of producing mojibake.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
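A minimal sketch of the read-and-decode step described in the commit message, assuming the harness reads the whole script from stdin and hands it to `exec()`. `read_piped_source` and `run_piped_script` are hypothetical names, not the actual code in `run.py`; an in-memory `io.BytesIO` stands in for a PowerShell 5.1 pipe (BOM plus CRLF).

```python
import io
import sys
from typing import BinaryIO, Optional


def read_piped_source(stream: Optional[BinaryIO] = None) -> str:
    """Hypothetical helper: read a piped script and decode it as UTF-8."""
    if stream is None:
        # The binary layer bypasses the locale code page and text-mode
        # newline translation.
        stream = sys.stdin.buffer
    # 'utf-8-sig' drops one leading UTF-8 BOM (EF BB BF) if present and is
    # identical to 'utf-8' otherwise; invalid UTF-8 raises UnicodeDecodeError.
    return stream.read().decode("utf-8-sig")


def run_piped_script(stream: Optional[BinaryIO] = None) -> dict:
    """Hypothetical helper: execute the piped script in a fresh namespace."""
    namespace: dict = {}
    source = read_piped_source(stream)
    # compile() normalizes CRLF line endings itself, so no extra handling here.
    exec(compile(source, "<stdin>", "exec"), namespace)
    return namespace


# Simulate a PowerShell 5.1 pipe: UTF-8 BOM plus CRLF line endings.
ns = run_piped_script(io.BytesIO(b"\xef\xbb\xbfx = 1\r\ny = x + 1\r\n"))
print(ns["y"])  # 2
```

Reading from `sys.stdin.buffer` rather than `sys.stdin` is what keeps the locale code page out of the picture; the only decoding decision left is the codec name.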
Problem
On Windows PowerShell 5.1, piped text often arrives with a UTF-8 BOM (`EF BB BF`): a `$OutputEncoding` set to the stock `[Text.Encoding]::UTF8` emits its preamble when piping to native commands, and files written by PS 5.1 (`Out-File -Encoding utf8`) carry BOMs that survive re-piping. `sys.stdin.read()` keeps that BOM in the decoded source (as `U+FEFF`, or as mojibake when decoded via the locale code page), so `exec()` fails with a `SyntaxError` on line 1 of every piped script.

Since piping code into `browser-harness` is the documented (and only) usage mode, this breaks the harness entirely for PowerShell 5.1 callers.
Fix
Read stdin as raw bytes and decode with `utf-8-sig`, which strips one leading UTF-8 BOM when present and is byte-for-byte identical to `utf-8` otherwise.
Compatibility notes (each verified against CPython)
- Input without a BOM (e.g. piped from bash or cmd): `utf-8-sig` decodes exactly like `utf-8`.
- Newlines: reading raw bytes skips text-mode translation of `\r\n` to `\n`; `compile()`/`exec()` normalize newlines themselves, including inside string literals (`exec('s="""a\r\nb"""')` yields `'a\nb'`), so CRLF input behaves identically.
- UTF-16: stays unsupported, matching CPython, which rejects UTF-16 source files outright (`SyntaxError: Non-UTF-8 code starting with '\xff' ... see PEP 263`), so the harness keeps the same contract for piped source. A UTF-16 pipe now fails loudly at decode time (`UnicodeDecodeError`, or `ValueError: source code string cannot contain null bytes`) instead of producing mojibake.

This mirrors the existing UTF-8 normalization the file already does for stdout (`run.py` lines 3–8).
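The compatibility notes can be spot-checked with the standard library alone. This is a sketch, not the PR's test suite: `decode_piped` and `rejection` are hypothetical helpers, and the cp1252 check assumes a Western European ANSI code page as the example locale.

```python
import codecs


def decode_piped(data: bytes) -> str:
    # Same decoding step as the fix: strip at most one leading UTF-8 BOM.
    return data.decode("utf-8-sig")


def rejection(data: bytes) -> str:
    # Name of the exception raised while decoding or compiling, else "accepted".
    try:
        compile(decode_piped(data), "<stdin>", "exec")
    except (UnicodeDecodeError, ValueError, SyntaxError) as exc:
        return type(exc).__name__
    return "accepted"


# Input without a BOM: utf-8-sig and utf-8 agree, non-ASCII included.
plain = "print('h\u00e9llo')\n".encode("utf-8")
assert decode_piped(plain) == plain.decode("utf-8")

# With a BOM: utf-8 keeps U+FEFF, a cp1252 code page yields mojibake,
# and utf-8-sig removes exactly one BOM.
bommed = codecs.BOM_UTF8 + b"x = 1\n"
assert bommed.decode("utf-8")[0] == "\ufeff"
assert bommed.decode("cp1252").startswith("\u00ef\u00bb\u00bf")
assert decode_piped(bommed) == "x = 1\n"
assert decode_piped(codecs.BOM_UTF8 * 2 + b"x") == "\ufeffx"

# CRLF inside a triple-quoted literal is normalized by exec() itself.
ns: dict = {}
exec('s = """a\r\nb"""', ns)
assert ns["s"] == "a\nb"

# UTF-16 with a BOM: FF FE / FE FF is never valid UTF-8, so decoding fails.
print(rejection("x = 1\n".encode("utf-16")))  # UnicodeDecodeError
# BOM-less UTF-16 decodes, but compile() rejects the NUL bytes
# (ValueError or SyntaxError, depending on the CPython version).
print(rejection("x = 1\n".encode("utf-16-le")))
```

The BOM-prefixed UTF-16 case comes out as `UnicodeDecodeError` on any host, since the bytes `0xFF` and `0xFE` never appear in valid UTF-8; only BOM-less UTF-16 gets as far as `compile()`.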
🤖 Generated with Claude Code
Summary by cubic
Fix stdin decoding so piped scripts work on Windows PowerShell 5.1. We now read bytes and decode as UTF-8 while stripping a BOM, preventing the first-line SyntaxError when piping into `browser-harness`.
- Read `sys.stdin.buffer` and decode with `utf-8-sig` to drop a leading BOM from PowerShell 5.1 pipes.

Written for commit 29655f1.