
fix: handle invalid UTF-8 in Ruby and Vue preprocessors #19588

Merged
RobinMalfait merged 3 commits into tailwindlabs:main from khasinski:fix/ruby-preprocessor-utf8-panic on Jun 4, 2026

Conversation

@khasinski
Contributor

Summary

This PR fixes a panic that occurs when the Ruby or Vue preprocessors encounter files with invalid UTF-8 bytes.

The issue:

  • ruby.rs:37 and vue.rs:18 used std::str::from_utf8(content).unwrap()
  • This panics when processing files containing invalid UTF-8 bytes

Error message:

thread panicked at crates/oxide/src/extractor/pre_processors/ruby.rs:37:59:
called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 45, error_len: Some(1) }
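
For readers unfamiliar with `Utf8Error`, the two fields in the message can be reproduced with a small standalone sketch. The input below is made up for illustration (45 ASCII bytes followed by a lone continuation byte); the actual file that triggered the panic is not part of this PR. `sample_input` and `inspect` are hypothetical helper names.

```rust
/// Illustrative data only: 45 valid ASCII bytes, then one invalid byte.
fn sample_input() -> Vec<u8> {
    let mut bytes = vec![b'a'; 45];
    bytes.push(0x80); // a continuation byte with no leading byte before it
    bytes
}

/// Returns (valid_up_to, error_len) when the input is not valid UTF-8,
/// or None when it is valid.
fn inspect(content: &[u8]) -> Option<(usize, Option<usize>)> {
    match std::str::from_utf8(content) {
        Ok(_) => None,
        Err(err) => Some((err.valid_up_to(), err.error_len())),
    }
}
```

Here `valid_up_to` is the byte offset of the first invalid byte, and `error_len: Some(1)` means a single byte was rejected outright rather than the input ending mid-character.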

The fix:

  • Wrap UTF-8 conversion in if let Ok(...) to gracefully handle invalid UTF-8
  • Skip regex-based template extraction when UTF-8 conversion fails
  • Allow byte-level processing to continue (in Ruby's case)
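
The shape of the fix can be sketched roughly as follows. This is a minimal standalone approximation, not the code in `ruby.rs`: `extract_templates` and `byte_level_pass` are hypothetical stand-ins for the regex-based extraction and the byte-level processing, and `process` is an assumed wrapper name.

```rust
/// Hypothetical stand-in for the regex-based template extraction,
/// which needs a &str and therefore valid UTF-8.
fn extract_templates(text: &str) -> Vec<String> {
    text.lines()
        .filter(|line| line.contains("class="))
        .map(|line| line.trim().to_string())
        .collect()
}

/// Hypothetical stand-in for byte-level processing, which works on raw
/// bytes and does not care whether they are valid UTF-8.
fn byte_level_pass(content: &[u8]) -> Vec<u8> {
    // Replace every '#' with a space, leaving all other bytes untouched.
    content
        .iter()
        .map(|&b| if b == b'#' { b' ' } else { b })
        .collect()
}

fn process(content: &[u8]) -> (Vec<String>, Vec<u8>) {
    let mut templates = Vec::new();

    // Before the fix: std::str::from_utf8(content).unwrap() panicked here
    // on invalid bytes. With `if let Ok(...)`, the text-level step is
    // skipped instead.
    if let Ok(text) = std::str::from_utf8(content) {
        templates = extract_templates(text);
    }

    // The byte-level step runs either way.
    (templates, byte_level_pass(content))
}
```

With valid input both steps run; with invalid input only the byte-level step contributes to the result.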

This can happen in Rails projects when:

  • Binary files are inadvertently scanned
  • Files contain non-UTF-8 encodings
  • Files are truncated in the middle of a multi-byte character during parallel processing
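
As a rough illustration of how these cases look to `std::str::from_utf8`, the sketch below labels input by the kind of error it produces. `utf8_problem` is a hypothetical helper and the labels are a heuristic only: a Latin-1 file that happens to end on a high byte would also be reported as truncated.

```rust
/// Heuristic label for a byte slice, based on the Utf8Error it produces.
fn utf8_problem(content: &[u8]) -> &'static str {
    match std::str::from_utf8(content) {
        Ok(_) => "valid",
        // error_len() is None when the input ends partway through a
        // multi-byte character.
        Err(err) if err.error_len().is_none() => "truncated mid-character",
        // Otherwise some byte sequence is invalid outright, as with
        // binary data or a non-UTF-8 encoding.
        Err(_) => "invalid byte sequence",
    }
}
```

For example, the first byte of `é` (`0xC3`) on its own is reported as truncated, while Latin-1 `café!` (`0xE9` followed by `!`) and a PNG header are reported as invalid byte sequences.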

Test plan

  • Added test_invalid_utf8_does_not_panic test for Ruby preprocessor
  • Added test_valid_utf8_with_multibyte_chars test for Ruby preprocessor
  • Added test_invalid_utf8_does_not_panic test for Vue preprocessor
  • All existing tests pass (cargo test pre_processors - 43 tests)
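
The added tests presumably follow a pattern along these lines. This is a self-contained sketch: the `PreProcessor` trait here is a hypothetical stand-in whose signature may differ from the crate's, `SketchPreProcessor` is invented for illustration, and returning invalid input unchanged is an assumption. In the crate these checks would be `#[test]` functions.

```rust
/// Hypothetical stand-in for the crate's preprocessor trait.
trait PreProcessor {
    fn process(&self, content: &[u8]) -> Vec<u8>;
}

/// Invented preprocessor: replaces tabs with spaces, but only when the
/// input is valid UTF-8 (mirroring the guard added in this PR).
struct SketchPreProcessor;

impl PreProcessor for SketchPreProcessor {
    fn process(&self, content: &[u8]) -> Vec<u8> {
        match std::str::from_utf8(content) {
            Ok(text) => text.replace('\t', " ").into_bytes(),
            // Assumption: invalid input is passed through untouched.
            Err(_) => content.to_vec(),
        }
    }
}

fn check_invalid_utf8_does_not_panic() {
    // 0x80 is a continuation byte without a leading byte.
    let invalid_utf8: &[u8] = &[0x80, 0x81, 0x82];
    let output = SketchPreProcessor.process(invalid_utf8);
    assert_eq!(output, invalid_utf8.to_vec());
}

fn check_valid_utf8_with_multibyte_chars() {
    let input = "héllo\t✓".as_bytes();
    let output = SketchPreProcessor.process(input);
    assert_eq!(output, "héllo ✓".as_bytes().to_vec());
}
```

The first check only needs the call to return; the second guards against the fix accidentally skipping valid input that contains multi-byte characters.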

@khasinski khasinski requested a review from a team as a code owner January 21, 2026 22:52
@coderabbitai
Contributor

coderabbitai Bot commented Jan 21, 2026

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between 97427eb and e9bbde5.

📒 Files selected for processing (1)
  • CHANGELOG.md
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md

Walkthrough

UTF-8 validation checks were added to two pre-processor modules. In the Ruby processor, HEREDOC extraction now runs only when content is valid UTF-8; otherwise the byte-level Ruby processing remains. In the Vue processor, template/tag processing is executed only for valid UTF-8 content. Tests were added to ensure invalid UTF-8 input does not panic and that valid UTF-8 with multibyte characters is processed as expected. No public API signatures were changed.

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately and specifically describes the main change: fixing handling of invalid UTF-8 in Ruby and Vue preprocessors, which is the primary focus of the changeset. |
| Description check | ✅ Passed | The description is directly related to the changeset, providing detailed context about the panic issue, the root cause, the fix applied, and comprehensive test coverage. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@khasinski khasinski force-pushed the fix/ruby-preprocessor-utf8-panic branch from 73128a2 to e8fb8b6 Compare February 16, 2026 16:13
Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
crates/oxide/src/extractor/pre_processors/ruby.rs (1)

433-445: Redundant import of PreProcessor.

Line 435 re-imports PreProcessor, which is already brought into scope at line 230. The test compiles either way, but the inner import is unnecessary.

🧹 Remove redundant import
     #[test]
     fn test_invalid_utf8_does_not_panic() {
-        use crate::extractor::pre_processors::pre_processor::PreProcessor;
-
         // Invalid UTF-8 sequence: 0x80 is a continuation byte without a leading byte
         let invalid_utf8: &[u8] = &[0x80, 0x81, 0x82];

khasinski added 2 commits June 4, 2026 12:26
The Ruby and Vue preprocessors were using `from_utf8().unwrap()` which
panics when processing files containing invalid UTF-8 bytes. This can
happen when:
- Binary files are inadvertently scanned
- Files are truncated at multi-byte character boundaries
- Files use non-UTF-8 encodings

This change wraps the UTF-8 conversion in `if let Ok(...)` to gracefully
skip the regex-based template extraction when UTF-8 conversion fails,
while still allowing the byte-level processing to continue (in Ruby's
case).

Fixes panic: `thread panicked at crates/oxide/src/extractor/pre_processors/ruby.rs:37:59`

The import is already at the module level (line 230).
@RobinMalfait RobinMalfait force-pushed the fix/ruby-preprocessor-utf8-panic branch from aa0d86b to 97427eb Compare June 4, 2026 10:27
@greptile-apps
Contributor

greptile-apps Bot commented Jun 4, 2026

Confidence Score: 5/5

This PR is safe to merge — it converts two definite panics into graceful no-ops with no behaviour change on valid UTF-8 input.

The change is minimal and mechanical: two .unwrap() calls replaced with if let Ok guards. Valid UTF-8 content follows exactly the same code path as before. Invalid UTF-8 now silently skips the regex extraction instead of panicking, which is the correct recovery behaviour. No other preprocessors in the codebase share the same pattern, and the new tests cover both the panic case and the happy path.

No files require special attention.

Reviews (2): Last reviewed commit: "update changelog" | Re-trigger Greptile

@RobinMalfait RobinMalfait enabled auto-merge (squash) June 4, 2026 10:31
Member

@RobinMalfait RobinMalfait left a comment


Thanks!

@RobinMalfait RobinMalfait merged commit d42b34a into tailwindlabs:main Jun 4, 2026
8 checks passed
@khasinski khasinski deleted the fix/ruby-preprocessor-utf8-panic branch June 4, 2026 11:57