Skip to content

Add UTF-16 Strings#34

Open
JoltedJon wants to merge 7 commits into
Redot-Engine:masterfrom
JoltedJon:strings
Open

Add UTF-16 Strings#34
JoltedJon wants to merge 7 commits into
Redot-Engine:masterfrom
JoltedJon:strings

Conversation

@JoltedJon

@JoltedJon JoltedJon commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Add UTF-16 Strings

Summary by CodeRabbit

  • New Features
    • Added UTF-16 string type with comparison, searching, and substring operations
    • Added string builder for efficient string construction with integer formatting support
    • Added string iterator for character-by-character traversal
    • Added math utility functions (min/max operations)
    • Added integral type concept for enhanced type safety

@coderabbitai

coderabbitai Bot commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds Unicode primitive type aliases (utf8, utf16, utf32, rune) to core.stdtypes, an integral concept to core.defs, and min/max templates to core.math. Introduces a new core.containers.string module implementing String (immutable UTF-16 view), StringIterator (surrogate-aware decoder), and StringBuilder (allocator-backed growable buffer). Registers the containers CMake library target.

Changes

UTF-16 String Module

Layer / File(s) Summary
Unicode type aliases, integral concept, and min/max utilities
Engine/cpp/runtime/core/definitions/stdtypes.cppm, Engine/cpp/runtime/core/definitions/definitions.cppm, Engine/cpp/runtime/core/math/functions.cppm
Adds utf8/utf16/utf32 aliases, changes rune from u32 to char32_t, introduces the draco::integral concept via std::is_integral_v<T>, and adds constexpr min/max templates constrained by arithmetic.
String, StringIterator, and StringBuilder declarations
Engine/cpp/runtime/core/containers/string.cppm (lines 1–135)
Exports the core.containers.string module declaring the immutable UTF-16 String view with comparison/substring/search API, the EMPTY_STRING constant, StringIterator with Advance, and StringBuilder with its Error enum, Write overloads, and WriteInt template signature.
StringLength, String algorithms, StringIterator, and WriteInt
Engine/cpp/runtime/core/containers/string.cppm (lines 137–394)
Implements StringLength via alignment-aware 64-bit chunk scanning; StringCompare with consteval element-wise and runtime memcmp paths; Substr, StartsWith, EndsWith, FindFirst, FindLast; StringIterator::Advance with surrogate-pair validation; and WriteInt for base-2/10/16 UTF-16 integer formatting.
StringBuilder capacity management and Write overloads
Engine/cpp/runtime/core/containers/string.cpp
Implements Reserve (allocate/copy/free UTF-16 buffer), GrowCapacity (half-growth heuristic), and four Write overloads covering single code units, rune to UTF-16 surrogate-pair encoding, null-terminated strings, and String objects.
CMake wiring and containers umbrella module
Engine/cpp/runtime/core/CMakeLists.txt, Engine/cpp/runtime/core/containers/containers.cppm
Registers the containers library target with dependencies on definitions, memory, and math; declares the core.containers umbrella module that re-exports core.containers.string.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Redot-Engine/DraconicEngine#23: Directly modifies core.stdtypes including the draco::rune alias and common type aliases that this PR also extends with utf8/utf16/utf32.

Suggested reviewers

  • OldDev78
  • VictorSohier
  • mcdubhghlas

Poem

🐇 Hippity-hop through UTF-16 land,
A StringBuilder grows right on demand!
Surrogate pairs? No problem here,
min, max, and integral all appear.
The bunny writes runes with a cheerful tap —
core.containers.string fills the gap! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add UTF-16 Strings' accurately and concisely describes the main change: introducing UTF-16 string container types and utilities across multiple new files and modules.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@engine/native/core/containers/string.cppm`:
- Around line 503-507: The WriteInt function has two issues: (1) negating
minimum signed integer values like INT_MIN causes undefined behavior, and (2)
GrowCapacity return values are not checked for failures, potentially leading to
out-of-bounds writes. In the signed value negation block (checking if val < 0),
use a safer approach such as converting to an unsigned type, using absolute
value safely, or checking for the minimum value before negating. Additionally,
wherever GrowCapacity is called within WriteInt, verify its return value
indicates success before continuing with the write operation; if allocation
fails, return an appropriate error code or handle the failure gracefully to
prevent buffer overruns.
- Around line 351-368: The `StringBuilder::Create` method has a stack-allocated
`fixed_allocator` variable that is used to initialize the `allocator` member,
but the allocator stores a pointer to this local variable which becomes invalid
after the function returns. To fix this, the `fixed_allocator` must be stored as
a member of the `StringBuilder` struct itself to ensure it persists for the
lifetime of the object, allowing subsequent growth and free operations to safely
access it without dereferencing dangling memory.
- Around line 437-446: The StringBuilder::Write methods call GrowCapacity with
only the incoming string length, not accounting for the existing buffer size.
Change the GrowCapacity calls from GrowCapacity(length) to GrowCapacity(size +
length) in both Write overloads (the utf16 version and the other overload
mentioned as also applying to lines 449-457) to ensure the total required
capacity is correctly calculated when appending to a non-empty builder.
- Around line 228-233: The Substr method incorrectly rejects valid substring
requests by returning EMPTY_STRING whenever the requested range touches or
exceeds the string boundary. Instead of returning EMPTY_STRING when pos + len >=
size, check only if pos >= size (completely out of bounds). When pos is valid
but the requested length extends beyond the string end, truncate len to size -
pos so the method returns the available substring. This truncation contract fix
will also resolve the false negatives in StartsWith and EndsWith methods that
depend on Substr.
- Around line 320-324: The `StringBuilder::Create` method has a unit mismatch
where the `capacity` parameter represents UTF-16 elements but is passed directly
to `allocator.alloc` as a byte size without conversion. Multiply `capacity` by
the size of a single UTF-16 element (sizeof(utf16)) when calling
`allocator.alloc` to correctly convert from element count to byte count. This
ensures the allocated storage is large enough to hold the requested number of
UTF-16 characters without under-allocating and risking buffer overflow.
- Around line 371-385: The Reserve method in StringBuilder has two issues: the
memcpy call copies size elements unconditionally even when newCapacity is
smaller, risking buffer overflow, and the oldDst.size field is set as an element
count rather than bytes, causing the allocator.free call to receive incorrect
size units. Fix the memcpy to copy only the minimum of size and newCapacity
elements (to avoid overflow), and set oldDst.size to capacity * sizeof(utf16)
instead of just capacity to ensure the allocator.free call receives the size in
bytes as expected.
- Around line 182-205: The StringCompare method has two critical bugs: the
min_size variable is initialized using itself as arguments to draco::math::min
instead of using the actual size members of both strings, which causes undefined
behavior; additionally, the null-pointer checks don't handle the case where both
string texts are null, causing two empty strings to incorrectly compare as
unequal. Fix the min_size initialization by comparing the size member of the
current string with the size member of the parameter string s, and add a check
at the beginning of the method to return 0 (equal) when both text pointers are
null, ensuring that EMPTY_STRING == EMPTY_STRING evaluates correctly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f9a636d9-6f20-4bfc-a1a3-f770f0cfaa40

📥 Commits

Reviewing files that changed from the base of the PR and between 28e62ca and 1d6442e.

📒 Files selected for processing (6)
  • engine/native/core/CMakeLists.txt
  • engine/native/core/containers/containers.cppm
  • engine/native/core/containers/string.cppm
  • engine/native/core/definitions/definitions.cppm
  • engine/native/core/definitions/stdtypes.cppm
  • engine/native/core/math/functions.cppm

Comment thread engine/native/core/containers/string.cppm Outdated
Comment thread engine/native/core/containers/string.cppm Outdated
Comment thread engine/native/core/containers/string.cppm Outdated
Comment thread engine/native/core/containers/string.cppm Outdated
Comment thread engine/native/core/containers/string.cppm Outdated
Comment thread engine/native/core/containers/string.cppm Outdated
Comment thread Engine/cpp/runtime/core/containers/string.cppm
@VictorSohier

Copy link
Copy Markdown
Contributor

The one structural thing that was made obvious to me when I was making my PR for allocators was that @OldDev78 wants as much implementation to be in .cpp files and there are some functions that are amenable to having their implementation in .cpp files.

@JoltedJon

Copy link
Copy Markdown
Contributor Author

The one structural thing that was made obvious to me when I was making my PR for allocators was that @OldDev78 wants as much implementation to be in .cpp files and there are some functions that are amenable to having their implementation in .cpp files.

The constexpr items and templates need to be in the module, I'll see what @OldDev78 says about the .cpp implementation to see if it would make sense to put the very few StringBuilder functions into a .cpp file

@OldDev78

OldDev78 commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

The one structural thing that was made obvious to me when I was making my PR for allocators was that @OldDev78 wants as much implementation to be in .cpp files and there are some functions that are amenable to having their implementation in .cpp files.

The constexpr items and templates need to be in the module, I'll see what @OldDev78 says about the .cpp implementation to see if it would make sense to put the very few StringBuilder functions into a .cpp file

If a function serves best as template and/or can be effectively constrexpr (can be constant evaluated), then it's okay to be implemented in the .cppm. Otherwise, the implementation should just be separated into the .cpp. Take note that my biggest concern is to expose dependency on headers as little as possible (so, when I can, also move the includes to the .cpp).

@OldDev78

Copy link
Copy Markdown
Contributor

The integral concept is a merge conflict for me :/

return result;
}

bool constexpr String::operator==(String s) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Especially when the function is this simple, I´d rather it be inlined in the declaration above.

Comment on lines +86 to +97
/// Initialize with preallocated buffer with initial capacity
///
/// @param allocator Initalized allocator that allows for string builder to grow
/// @param buffer Preallocated buffer that was created with allocator
/// @param capacity The initial capacity that buffer has before it needs to be reallocated
memory::Error Create(Allocator allocator, utf16 *buffer, isize capacity);

/// Initialize with preallocated constant sized buffer, size in number of elements
///
/// @param buffer Preallocated buffer with no access to allocations
/// @param bufferMaxCapacity Max Capacity of preallocated buffer
memory::Error Create(utf16 *buffer, isize bufferMaxCapacity);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These constructor functions aren't needed. A fixed allocator already provides this functionality by wrapping a fixed block of memory and can be abstracted to an Allocator type. It just returns the whole block of memory and aligns it for you. The abstraction is there to avoid duplicates like this.

Comment on lines +319 to +323
// Initializes with initial capacity
memory::Error StringBuilder::Create(Allocator allocator, isize capacity) {
// TODO(Jon) Assert capacity is greater than 0
Slice dst;
memory::Error err = allocator.alloc(&dst, capacity, alignof(utf16));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this initial allocation be defered to the first write? It would make this function error free.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can change the design to work with that. I was initially concerned over capacity if it starts out at 0 but that should be fine as all of the write functions ensure it adds a minimum amount, which I didn't know at the time as I wrote this function first


memory::Error StringBuilder::Write(rune r) {
if (r > 0x10FFFF || (r >= 0xD800 && r <= 0xDFFF)) {
return Error::Other;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe another error type unioned with the allocator error type would be a better approach. It would allow for more granularity in this interface.

Comment on lines +480 to +497
static constexpr utf16 hexadecimalNibbles[] = {
u'0',
u'1',
u'2',
u'3',
u'4',
u'5',
u'6',
u'7',
u'8',
u'9',
u'A',
u'B',
u'C',
u'D',
u'E',
u'F',
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a base64 table and reused for all bases less than 64 (I wouldn't worry about padding equals signs here). Unfortunately, the ascii table isn't laid out so conveniently. It'd also give the last case the utility of all bases supported by the table.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is probably the least important note here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this can be a later addition if we need it

Comment thread engine/native/core/containers/string.cppm Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
engine/native/core/containers/string.cppm (2)

349-354: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Infinite loop: base-2 loop condition checks val instead of unsignedVal.

The loop condition while (val != 0) checks the original (unchanged) parameter, but the loop body only modifies unsignedVal. Since val is never updated, this creates an infinite loop for any non-zero input when using base 2.

The base-16 (line 356) and base-10 (line 361) cases correctly use unsignedVal != 0.

Suggested fix
 	} else if (base == 2) {
-		while (val != 0) {
+		while (unsignedVal != 0) {
 			std::memcpy(buff + index, binaryNibbles[unsignedVal & 0xF], sizeof(binaryNibbles[0]));
 			index += 4;
 			unsignedVal >>= 4;
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@engine/native/core/containers/string.cppm` around lines 349 - 354, The base-2
conversion loop condition checks the parameter `val` instead of `unsignedVal`,
causing an infinite loop since `val` is never modified while only `unsignedVal`
is updated in the loop body. Change the `while (val != 0)` condition in the
`else if (base == 2)` block to `while (unsignedVal != 0)` to match the correct
behavior used in the base-16 and base-10 cases.

194-197: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Potential integer overflow when sizes differ significantly.

The cast (int)(size - s.size) can overflow when the size difference exceeds INT_MAX (e.g., comparing a 3-billion-element string against an empty one). The result would be incorrect comparison semantics.

Suggested fix
 	if (result == 0) {
-		return (int)(size - s.size);
+		if (size == s.size) return 0;
+		return size < s.size ? -1 : 1;
 	}
-	return result;
+	return result < 0 ? -1 : 1;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@engine/native/core/containers/string.cppm` around lines 194 - 197, The cast
(int)(size - s.size) in the return statement can overflow when the size
difference exceeds INT_MAX, causing incorrect comparison results for strings
with significantly different lengths. Fix this by safely comparing the sizes
without casting a potentially large size_t difference to int. Instead, check if
size is greater than s.size and return 1 if true, otherwise return -1 when sizes
differ, to maintain proper comparison semantics without overflow risk.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@engine/native/core/containers/string.cppm`:
- Around line 32-49: The comparison operators (operator==, operator!=,
operator>, operator>=, operator<, operator<=) do not modify the String object
and should be marked as const member functions. Add the const keyword after the
parameter list for each of these six operators to indicate they don't modify the
object state and to allow them to be called on const String instances.
Additionally, update the StringCompare method declaration (referenced on line
31) to also be const since it is called by these operators and should not
require a mutable String instance.

---

Outside diff comments:
In `@engine/native/core/containers/string.cppm`:
- Around line 349-354: The base-2 conversion loop condition checks the parameter
`val` instead of `unsignedVal`, causing an infinite loop since `val` is never
modified while only `unsignedVal` is updated in the loop body. Change the `while
(val != 0)` condition in the `else if (base == 2)` block to `while (unsignedVal
!= 0)` to match the correct behavior used in the base-16 and base-10 cases.
- Around line 194-197: The cast (int)(size - s.size) in the return statement can
overflow when the size difference exceeds INT_MAX, causing incorrect comparison
results for strings with significantly different lengths. Fix this by safely
comparing the sizes without casting a potentially large size_t difference to
int. Instead, check if size is greater than s.size and return 1 if true,
otherwise return -1 when sizes differ, to maintain proper comparison semantics
without overflow risk.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 99d10637-fa46-476e-b1ad-5f8fdfb9ad32

📥 Commits

Reviewing files that changed from the base of the PR and between 7026321 and 7a0c807.

📒 Files selected for processing (2)
  • engine/native/core/containers/string.cpp
  • engine/native/core/containers/string.cppm

Comment thread engine/native/core/containers/string.cppm Outdated
return err;
}

memcpy(newDst.data, buffer, size * sizeof(utf16));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for future issue: Make slice version of memcpy.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Engine/cpp/runtime/core/containers/string.cpp`:
- Around line 79-82: The StringBuilder::Write method with the utf16 const *str
parameter calls StringLength(str) without first checking if the input pointer is
null, which will cause a null pointer dereference and crash. Add a null check at
the beginning of the StringBuilder::Write function (before calling StringLength)
and return an appropriate error code if str is null pointer, similar to how the
TODO comment suggests an assertion should be added.
- Line 96: In the string append operation at line 96 where memcpy is called to
copy str.text into the buffer, replace memcpy with memmove to safely handle the
case where str.text points into the builder's own buffer, preventing undefined
behavior and data corruption from overlapping memory regions.
- Around line 28-29: The capacity field is being assigned newDst.size which
represents byte count, but the write and grow logic throughout the code treats
capacity as the number of UTF-16 code units (elements). This mismatch causes a
critical overflow risk. Fix this by dividing newDst.size by sizeof(utf16) when
assigning to capacity at the line where capacity = newDst.size is set,
converting the byte size to the correct element count so that all subsequent
bounds checking and write operations use consistent units.

In `@Engine/cpp/runtime/core/containers/string.cppm`:
- Around line 358-363: The binary conversion loop in the base == 2 branch has a
non-terminating loop condition issue. The while loop checks `val != 0` but only
updates `unsignedVal` through right-shift operations, which means the loop
condition will never become false if val is non-zero, causing the loop to run
indefinitely and overflow the buffer. Change the loop condition from `while (val
!= 0)` to `while (unsignedVal != 0)` so that the loop terminates when
unsignedVal is completely processed to zero.
- Line 70: The StringIterator constructor performs pointer arithmetic on a
potentially null pointer when initializing the end member, which violates C++
standards. In the StringIterator constructor's member initializer list, add a
guard condition to check if s.text is null before performing the addition s.text
+ s.size. When s.text is nullptr, set end to nullptr instead of attempting the
pointer arithmetic, ensuring the constructor handles empty strings (like
EMPTY_STRING) safely without invoking undefined behavior.
- Around line 190-205: In the StringCompare function's runtime path (the else
branch following the consteval block), replace the memcmp call with
std::char_traits<char16_t>::compare to properly compare UTF-16 code units
instead of raw bytes, which will ensure correct ordering on all endianness
systems. Additionally, replace the cast from isize to int in the final return
statement that compares size and s.size with a ternary operator that returns -1
if size is less than s.size, 1 if greater, or 0 if equal, to avoid integer
overflow and maintain correct relational semantics for large size deltas.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 785a91cc-508b-4684-bbb1-58fa235dcf5a

📥 Commits

Reviewing files that changed from the base of the PR and between 226981d and c2047f9.

📒 Files selected for processing (7)
  • Engine/cpp/runtime/core/CMakeLists.txt
  • Engine/cpp/runtime/core/containers/containers.cppm
  • Engine/cpp/runtime/core/containers/string.cpp
  • Engine/cpp/runtime/core/containers/string.cppm
  • Engine/cpp/runtime/core/definitions/definitions.cppm
  • Engine/cpp/runtime/core/definitions/stdtypes.cppm
  • Engine/cpp/runtime/core/math/functions.cppm
✅ Files skipped from review due to trivial changes (1)
  • Engine/cpp/runtime/core/containers/containers.cppm

Comment thread Engine/cpp/runtime/core/containers/string.cpp Outdated
Comment on lines +79 to +82
StringBuilder::Error StringBuilder::Write(utf16 const *str) {
// TODO(Jon) assert not null
isize length = StringLength(str);
Error err = GrowCapacity(size + length);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing null handling in Write(utf16 const* str) can dereference null.

At Line 81, StringLength(str) is called unconditionally. A null input will crash before returning an error.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Engine/cpp/runtime/core/containers/string.cpp` around lines 79 - 82, The
StringBuilder::Write method with the utf16 const *str parameter calls
StringLength(str) without first checking if the input pointer is null, which
will cause a null pointer dereference and crash. Add a null check at the
beginning of the StringBuilder::Write function (before calling StringLength) and
return an appropriate error code if str is null pointer, similar to how the TODO
comment suggests an assertion should be added.

if (err != Error::Okay) {
return err;
}
memcpy(buffer + size, str.text, str.size * sizeof(utf16));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use overlap-safe copy when appending from a String view.

At Line 96, memcpy is undefined for overlapping regions. If str.text points into the builder’s own buffer, this can corrupt output or crash. memmove is the safe primitive here.

Suggested fix
-	memcpy(buffer + size, str.text, str.size * sizeof(utf16));
+	memmove(buffer + size, str.text, str.size * sizeof(utf16));
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
memcpy(buffer + size, str.text, str.size * sizeof(utf16));
memmove(buffer + size, str.text, str.size * sizeof(utf16));
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Engine/cpp/runtime/core/containers/string.cpp` at line 96, In the string
append operation at line 96 where memcpy is called to copy str.text into the
buffer, replace memcpy with memmove to safely handle the case where str.text
points into the builder's own buffer, preventing undefined behavior and data
corruption from overlapping memory regions.

utf16 const *start;
utf16 const *end;

constexpr StringIterator(String s) : start{ s.text }, end{ s.text + s.size } {}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's find and examine the string.cppm file
find . -name "string.cppm" -type f

Repository: Redot-Engine/DraconicEngine

Length of output: 120


🏁 Script executed:

# Read the relevant file to understand the String and StringIterator definitions
cat -n Engine/cpp/runtime/core/containers/string.cppm | head -100

Repository: Redot-Engine/DraconicEngine

Length of output: 3480


🏁 Script executed:

# Search for uses of StringIterator with empty strings or potential nullptr scenarios
rg -n "StringIterator\s*\(" Engine/cpp/runtime/core/containers/string.cppm | head -20

Repository: Redot-Engine/DraconicEngine

Length of output: 155


🏁 Script executed:

# Look for uses of EMPTY_STRING
rg -n "EMPTY_STRING" Engine/cpp/runtime/core/containers/

Repository: Redot-Engine/DraconicEngine

Length of output: 249


🏁 Script executed:

# Check if there are operator[] or other accesses on StringIterator
rg -n "StringIterator" Engine/cpp/runtime/core/ -A 3 | head -50

Repository: Redot-Engine/DraconicEngine

Length of output: 970


🏁 Script executed:

# Let's check if StringIterator is actually instantiated or used with String objects
rg -n "StringIterator" Engine/cpp/runtime/core/ --type cpp --type cppm -B 2 -A 2 | head -80

Repository: Redot-Engine/DraconicEngine

Length of output: 100


🏁 Script executed:

# Check the line 211 context to see what returns EMPTY_STRING
sed -n '200,220p' Engine/cpp/runtime/core/containers/string.cppm

Repository: Redot-Engine/DraconicEngine

Length of output: 533


🏁 Script executed:

# Search for StringIterator usage/instantiation
rg -n "StringIterator" Engine/cpp/runtime/core/ -B 1 -A 1

Repository: Redot-Engine/DraconicEngine

Length of output: 739


🏁 Script executed:

# Check if there are any methods that create StringIterator objects
rg -n "return.*StringIterator\|StringIterator\s*\{" Engine/cpp/runtime/core/

Repository: Redot-Engine/DraconicEngine

Length of output: 53


🏁 Script executed:

# Look at the full string.cppm file to see if StringIterator is actually used anywhere
wc -l Engine/cpp/runtime/core/containers/string.cppm

Repository: Redot-Engine/DraconicEngine

Length of output: 122


🏁 Script executed:

# Search for any instantiation of StringIterator in the entire repository
rg -n "StringIterator\s*\(" --type-list | grep -i cppm || echo "No cppm type, trying broader search"

Repository: Redot-Engine/DraconicEngine

Length of output: 515


🏁 Script executed:

# Broader search for StringIterator constructor calls
rg -n "StringIterator" --glob "**/*.cpp" --glob "**/*.cppm" --glob "**/*.h" --glob "**/*.hpp" -A 1 -B 1

Repository: Redot-Engine/DraconicEngine

Length of output: 739


🏁 Script executed:

# Check if the Advance method is called anywhere, which would require a StringIterator
rg -n "\.Advance\(" Engine/cpp/runtime/core/ -B 3 -A 1

Repository: Redot-Engine/DraconicEngine

Length of output: 53


🏁 Script executed:

# Let's understand the C++ standard on null pointer arithmetic by examining if there are comments or documentation
grep -n -i "nullptr\|null.*pointer\|undefined" Engine/cpp/runtime/core/containers/string.cppm | head -20

Repository: Redot-Engine/DraconicEngine

Length of output: 328


Fix null-pointer arithmetic in StringIterator constructor.

The expression s.text + s.size at line 70 invokes undefined behavior when s.text == nullptr, which is a valid state for strings (see EMPTY_STRING at line 64). Even though the addition offset is zero, pointer arithmetic on null pointers violates the C++ standard. Apply the suggested guard:

Fix
-constexpr StringIterator(String s) : start{ s.text }, end{ s.text + s.size } {}
+constexpr StringIterator(String s)
+	: start{ s.text }, end{ s.text ? (s.text + s.size) : nullptr } {}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
constexpr StringIterator(String s) : start{ s.text }, end{ s.text + s.size } {}
constexpr StringIterator(String s)
: start{ s.text }, end{ s.text ? (s.text + s.size) : nullptr } {}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Engine/cpp/runtime/core/containers/string.cppm` at line 70, The
StringIterator constructor performs pointer arithmetic on a potentially null
pointer when initializing the end member, which violates C++ standards. In the
StringIterator constructor's member initializer list, add a guard condition to
check if s.text is null before performing the addition s.text + s.size. When
s.text is nullptr, set end to nullptr instead of attempting the pointer
arithmetic, ensuring the constructor handles empty strings (like EMPTY_STRING)
safely without invoking undefined behavior.

Comment on lines +190 to +205
isize min_size = draco::math::min(size, s.size);

int result = 0;
if consteval {
for (isize i = 0; i < min_size; ++i) {
if (text[i] != s.text[i]) {
return text[i] < s.text[i] ? -1 : 1;
}
}
} else {
result = memcmp(text, s.text, min_size * sizeof(utf16));
}

if (result == 0) {
return (int)(size - s.size);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

StringCompare runtime path has incorrect UTF-16 comparison and overflow risk.

Line 200 uses memcmp which compares bytes, not UTF-16 code units—on little-endian systems this produces wrong ordering for non-ASCII values. Line 204 casts the isize difference to int, which overflows for large size deltas and breaks relational semantics. The consteval path correctly compares code units.

Replace memcmp with std::char_traits<char16_t>::compare and use a ternary operator for the size comparison:

Suggested fix
 	} else {
-		result = memcmp(text, s.text, min_size * sizeof(utf16));
+		result = std::char_traits<char16_t>::compare(text, s.text, static_cast<usize>(min_size));
 	}
 
 	if (result == 0) {
-		return (int)(size - s.size);
+		return size < s.size ? -1 : (size > s.size ? 1 : 0);
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
isize min_size = draco::math::min(size, s.size);
int result = 0;
if consteval {
for (isize i = 0; i < min_size; ++i) {
if (text[i] != s.text[i]) {
return text[i] < s.text[i] ? -1 : 1;
}
}
} else {
result = memcmp(text, s.text, min_size * sizeof(utf16));
}
if (result == 0) {
return (int)(size - s.size);
}
isize min_size = draco::math::min(size, s.size);
int result = 0;
if consteval {
for (isize i = 0; i < min_size; ++i) {
if (text[i] != s.text[i]) {
return text[i] < s.text[i] ? -1 : 1;
}
}
} else {
result = std::char_traits<char16_t>::compare(text, s.text, static_cast<usize>(min_size));
}
if (result == 0) {
return size < s.size ? -1 : (size > s.size ? 1 : 0);
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Engine/cpp/runtime/core/containers/string.cppm` around lines 190 - 205, In
the StringCompare function's runtime path (the else branch following the
consteval block), replace the memcmp call with
std::char_traits<char16_t>::compare to properly compare UTF-16 code units
instead of raw bytes, which will ensure correct ordering on all endianness
systems. Additionally, replace the cast from isize to int in the final return
statement that compares size and s.size with a ternary operator that returns -1
if size is less than s.size, 1 if greater, or 0 if equal, to avoid integer
overflow and maintain correct relational semantics for large size deltas.

Comment thread Engine/cpp/runtime/core/containers/string.cppm
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants