macOS: make Ime::Preedit cursor range surrogate-safe in setMarkedText by William-Selna · Pull Request #4596 · rust-windowing/winit

William-Selna · 2026-06-12T21:09:27Z

The macOS backend turns the IME's UTF-16 selectedRange into the UTF-8 byte offsets in Ime::Preedit by slicing substringToIndex: prefixes and measuring them with NSString::len() (lengthOfBytesUsingEncoding:NSUTF8StringEncoding). When an offset lands inside a surrogate pair the prefix ends in a lone high surrogate, which UTF-8 can't represent, and lengthOfBytesUsingEncoding: returns 0 for the whole prefix instead of the convertible part — so the offset collapses to 0.

For preedit "a😀b":

selectedRange {2, 0} emits Some((0, 0)) instead of (1, 1) — the cursor jumps to the start and the valid "a" prefix is dropped.
selectedRange {1, 1} emits Some((1, 0)), i.e. lower > upper. Highlighting the active clause with &preedit[range.0..range.1]
then panics: byte index starts at 1 but ends at 0.

This is the macOS counterpart of #3967 which was fixed for Windows in #4201; the macOS backend has its own copy of the conversion that never got the same fix.

The fix drops the substringToIndex: calls and maps the UTF-16 offset to a UTF-8 offset over the already-decoded Rust string, snapping a mid-surrogate offset down to the char boundary and clamping out-of-range offsets to s.len():

fn utf16_to_utf8_offset(s: &str, utf16_offset: usize) -> usize {
    let mut utf16_pos = 0;
    for (utf8_pos, ch) in s.char_indices() {
        if utf16_pos >= utf16_offset {
            return utf8_pos;
        }
        utf16_pos += ch.len_utf16();
        if utf16_pos > utf16_offset {
            return utf8_pos; // offset splits this char's surrogate pair
        }
    }
    s.len()
}

It's monotone and location <= end for an NSRange, so the range can't come out inverted. On well-formed input it's byte-for-byte identical to the old code. And since it no longer calls any NSString range API, it also covers the out-of-bounds case the .min(len) clamp in rust-windowing/winit#4494 was added for (no NSRangeException possible) and drops two ObjC allocations per preedit update.

Malformed offsets here aren't hypothetical: rust-windowing/winit#4494 added that clamp because the native Pinyin IME sends out-of-range indices (alacritty/alacritty#8791), and rust-windowing/winit#3967 came from a non-BMP preedit (a 🐖 from the Microsoft Japanese IME) corrupting the range — same shape, one code unit over, lands mid-pair on macOS.

Tests are a #[cfg(test)] mod tests of pure functions (no AppKit, so they run on any host): the mid-surrogate snap, the previously-inverted and previously-collapsed cases, the out-of-bounds clamp, identity on well-formed input, and a monotonicity sweep. cargo build / cargo clippy -p winit-appkit are clean on macOS.

…ed_text `setMarkedText:selectedRange:replacementRange:` converted the IME's UTF-16 `selectedRange` into UTF-8 byte offsets by taking `substringToIndex:` prefixes and measuring them with `NSString::len()` (`lengthOfBytesUsingEncoding:NSUTF8StringEncoding`). When an index falls inside a surrogate pair, the prefix ends in a lone high surrogate, which UTF-8 cannot represent, so `lengthOfBytesUsingEncoding:` returns 0 for the entire prefix and the offset silently collapses to 0. For preedit "a😀b": - `selectedRange {2,0}` emitted `Some((0, 0))` instead of `Some((1, 1))`, discarding the valid "a" prefix (cursor jumps to the start). - `selectedRange {1,1}` emitted `Some((1, 0))` — an inverted range with `lower > upper`, which panics the natural consumer pattern `&preedit[range.0..range.1]` used to highlight the selected clause. Replace the `substringToIndex:`/`len()` conversion with a pure, total helper `utf16_to_utf8_offset` that walks the already-materialized Rust `String`, snapping an offset that would split a surrogate pair down to the character boundary and clamping an out-of-bounds offset to the string length. The helper is monotone non-decreasing, so the emitted range can never be inverted; the happy path (well-formed boundary indices) is byte-for-byte unchanged. Because no `NSString` range API is called anymore, this also subsumes the out-of-bounds `.min(len)` clamp added in rust-windowing#4494 (no `NSRangeException` is possible) and drops two Objective-C string allocations per preedit update. This is the macOS twin of the Windows bug reported in rust-windowing#3967 and fixed by rust-windowing#4201; the macOS backend has its own independent copy of the conversion that was never given the equivalent fix. Adds pure unit tests (no AppKit) covering the mid-surrogate snap, the previously-inverted and previously-collapsed cases, the out-of-bounds clamp, identity on well-formed inputs, and a monotonicity sweep.

William-Selna requested a review from madsmtm as a code owner June 12, 2026 21:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

macOS: make Ime::Preedit cursor range surrogate-safe in setMarkedText#4596

macOS: make Ime::Preedit cursor range surrogate-safe in setMarkedText#4596
William-Selna wants to merge 1 commit into
rust-windowing:masterfrom
William-Selna:fix/macos-ime-preedit-surrogate-cursor

William-Selna commented Jun 12, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant

Conversation

William-Selna commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant

William-Selna commented Jun 12, 2026 •

edited

Loading