Improvements to the safety of boa_string#5385
Merged
Merged
Conversation
68f925f to
58ba1c2
Compare
Test262 conformance changes
Fixed tests (1):Tested main commit: |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5385 +/- ##
===========================================
+ Coverage 47.24% 60.21% +12.97%
===========================================
Files 476 566 +90
Lines 46892 63023 +16131
===========================================
+ Hits 22154 37951 +15797
- Misses 24738 25072 +334 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Contributor
Author
|
Idk why the test is failing, could that be intermittent somehow? |
hansl
approved these changes
May 31, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
I had it on my todo list to unsafe review
boa_string. I didn't get around to it: it's a lot of unsafe code, but I did spend some time asking an appropriately primed agent to perform a review, and it had some findings.It found some issues around ascii-checking.
Full report:
Details
Unsafe Rust Review
This review performs a first pass safety audit of the
unsafecode inboa_stringv0.21.1), applying the Rust Safety Comments guidelines.Executive Summary
The codebase generally follows good safety practices, including many explicit
// SAFETY:comments. However, we identified:Segment::is_asciiwhich affects safe builder behavior (confusing behavior and missed optimizations, though technically sound).JsStringBuilder::<u16>::is_asciiwhich returns incorrect results for non-ASCII UTF-16 characters.unwrap_uncheckedin layout math, redundant-lookingtransmute).Detailed Findings by File
1.
third_party/rust/boa_string/v0_21/src/builder.rs🔴 Inconsistent
Segment::is_ascii(Logical Inconsistency)Location: builder.rs:L616-624
Code:
Analysis:
This method is highly inconsistent:
StringandStrsegments, it checksis_latin1(), which returnstruefor any Latin1 string (values0..=255), including non-ASCII characters (e.g.,é/0xEB).Latin1(u8) andCodePoint(char) segments, it strictly checks for ASCII (<= 0x7F).This causes inconsistent behavior in safe
CommonJsStringBuilder::build():Segment::Stringcontaining non-ASCII Latin1,is_ascii()returnstrue, and it builds viabuild_as_latin1.Segment::Latin1(0xEB)(same character),is_ascii()returnsfalse, forcing a fallback toUTF-16encoding.Safety Impact Evaluation:
While this method acts as a guard for the
unsafefunctionbuild_as_latin1(), this inconsistency does not lead to unsoundness (UB):String/Str) is safe because these segments are already Latin1 and fit perfectly intou8without loss of representation or truncation.Latin1/CodePointto ASCII) are overly conservative but safe. They merely cause a safe fallback to UTF-16.Recommendation:
Rename
is_asciitocan_be_latin1(or similar) and make the checks consistent:If the intent was strictly ASCII, then
StringandStrsegments must be scanned to ensure they contain no characters> 0x7F.🔴 Severe Bug in
JsStringBuilder::<u16>::is_ascii(Potential Unsoundness)Location: builder.rs:L309-317
Code:
Analysis:
When
D = u16(i.e.,Utf16JsStringBuilder), this function casts the raw*mut u16data pointer to*const u8and checks if the raw bytes are ASCII.This is broken on Little Endian systems for certain non-ASCII UTF-16 characters:
0x0100(Ā) is represented in memory as bytes[0x00, 0x01].0x00and0x01are< 128,data.is_ascii()will returntrue.Safety Impact Evaluation:
Currently, this method is only called on
Latin1JsStringBuilder(JsStringBuilder<u8>), where it behaves correctly. However, becauseJsStringBuilderis generic and this method ispub, this is a critical latent bug. If future code or external users rely onis_ascii()to guard unsafe operations onUtf16JsStringBuilder(e.g., assuming it can be safely downgraded to Latin1/u8), it would lead to silent data corruption or UB.Recommendation:
Move
is_asciito a specializedimpl JsStringBuilder<u8>block so it cannot be called on theu16variant, or implement it correctly foru16(e.g., checking if allu16elements are<= 0x7F).🟡 Missing Safety Justification for
unwrap_uncheckedunwrap_uncheckedis used onLayout::arrayandLayout::extend. This is unsafe if they returnErr(which happens on overflow).self.inner). Since it was successfully allocated previously (vianew_layout), we know the capacity and type size do not overflow.// SAFETY:comment explaining this invariant.🟢
extend_from_slice_uncheckedPointers copyThe safety comment is correct but could be more thorough by addressing the other preconditions of
copy_nonoverlapping(alignment, overlap):2.
third_party/rust/boa_string/v0_21/src/common.rs🟡 Redundant/Confusing
transmuteinget_stringRAW_STATICS_CACHEis defined asLazyLock<FxHashSet<&'static JsStr<'static>>>.RAW_STATICS_CACHE.getshould returnOption<&&'static JsStr<'static>>.*) should yield&'static JsStr<'static>.stris already&'static JsStr<'static>, the transmute from&JsStr<'_>to&'static JsStr<'static>is redundant and confusing.transmuteand see if it compiles. It might require binding the cache deref to a static lifetime variable first:transmuteis indeed required, clarify the safety comment:3.
third_party/rust/boa_string/v0_21/src/str.rsJsStrusesstd::slice::from_raw_partsto reconstruct slices from raw pointers invariant()andas_latin1().is_latin1tag and the fact thatJsStris constructed from valid Rust slices with lifetime'a(which is preserved inJsStr<'a>).SyncandSendimplementations are correctly documented and sound (the type is immutable).