fix: handle invalid UTF-8 in Ruby and Vue preprocessors #19588
Walkthrough

UTF-8 validation checks were added to two pre-processor modules. In the Ruby processor, HEREDOC extraction now runs only when content is valid UTF-8; otherwise the byte-level Ruby processing remains. In the Vue processor, template/tag processing is executed only for valid UTF-8 content. Tests were added to ensure invalid UTF-8 input does not panic and that valid UTF-8 with multibyte characters is processed as expected. No public API signatures were changed.
73128a2 to e8fb8b6
🧹 Nitpick comments (1)
crates/oxide/src/extractor/pre_processors/ruby.rs (1)
433-445: Redundant import of `PreProcessor`.

Line 435 re-imports `PreProcessor`, which is already brought into scope at line 230. The test compiles either way, but the inner import is unnecessary.

🧹 Remove redundant import

```diff
 #[test]
 fn test_invalid_utf8_does_not_panic() {
-    use crate::extractor::pre_processors::pre_processor::PreProcessor;
-
     // Invalid UTF-8 sequence: 0x80 is a continuation byte without a leading byte
     let invalid_utf8: &[u8] = &[0x80, 0x81, 0x82];
```
The Ruby and Vue preprocessors were using `from_utf8().unwrap()`, which panics when processing files containing invalid UTF-8 bytes. This can happen when:

- Binary files are inadvertently scanned
- Files are truncated at multi-byte character boundaries
- Files use non-UTF-8 encodings

This change wraps the UTF-8 conversion in `if let Ok(...)` to gracefully skip the regex-based template extraction when UTF-8 conversion fails, while still allowing the byte-level processing to continue (in Ruby's case).

Fixes panic: `thread panicked at crates/oxide/src/extractor/pre_processors/ruby.rs:37:59`
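The guard pattern described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the actual oxide code: the function `process`, the tab-to-space byte pass, and the blanking of a `<<~` marker are all hypothetical stand-ins for the real byte-level processing and regex-based extraction. Only `std::str::from_utf8` and the `if let Ok(...)` shape come from the PR.

```rust
use std::str;

/// Hypothetical preprocessor, for illustration only (not the oxide code).
/// Returns the processed bytes and whether the text-level step ran.
fn process(content: &[u8]) -> (Vec<u8>, bool) {
    // Byte-level step: runs on any input, valid UTF-8 or not.
    // Stand-in transformation: turn tabs into spaces.
    let mut result: Vec<u8> = content
        .iter()
        .map(|&b| if b == b'\t' { b' ' } else { b })
        .collect();

    // Text-level step: needs a `&str`, so it is guarded instead of unwrapped.
    // With `str::from_utf8(content).unwrap()` invalid input would panic here.
    let mut text_step_ran = false;
    if let Ok(text) = str::from_utf8(content) {
        // Stand-in for regex-based extraction: blank out the first `<<~` marker.
        // `find` returns a byte offset, and `result` has the same length as `content`.
        if let Some(pos) = text.find("<<~") {
            result[pos..pos + 3].fill(b' ');
        }
        text_step_ran = true;
    }

    (result, text_step_ran)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn invalid_utf8_does_not_panic() {
        // 0x80 is a continuation byte without a leading byte.
        let (out, ran) = process(&[b'\t', 0x80]);
        assert_eq!(out, vec![b' ', 0x80]);
        assert!(!ran);
    }

    #[test]
    fn valid_utf8_with_multibyte_chars() {
        let (out, ran) = process("é<<~SQL".as_bytes());
        assert!(ran);
        assert_eq!(out, "é   SQL".as_bytes().to_vec());
    }
}
```

In this sketch the byte-level pass still produces output for invalid input, and only the step that requires a `&str` is skipped, which mirrors the behaviour the commit message describes for the Ruby case.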
The import is already at the module level (line 230).
aa0d86b to 97427eb
Confidence Score: 5/5

This PR is safe to merge: it converts two definite panics into graceful no-ops with no behaviour change on valid UTF-8 input.

The change is minimal and mechanical: two […]

No files require special attention.

Reviews (2): Last reviewed commit: "update changelog"
Summary
This PR fixes a panic that occurs when the Ruby or Vue preprocessors encounter files with invalid UTF-8 bytes.
The issue:
- `ruby.rs:37` and `vue.rs:18` used `std::str::from_utf8(content).unwrap()`

Error message:
The fix:
- Use `if let Ok(...)` to gracefully handle invalid UTF-8

This can happen in Rails projects when:
Test plan
- `test_invalid_utf8_does_not_panic` test for Ruby preprocessor
- `test_valid_utf8_with_multibyte_chars` test for Ruby preprocessor
- `test_invalid_utf8_does_not_panic` test for Vue preprocessor
- `cargo test pre_processors` (43 tests)
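For readers who want to see how the kinds of input these tests exercise differ, the standard library alone can tell them apart. The sketch below is an illustration under stated assumptions: the `Utf8Check` enum and `check_utf8` helper are hypothetical names that do not appear in the PR, while `str::from_utf8`, `Utf8Error::valid_up_to`, and `Utf8Error::error_len` are standard library APIs.

```rust
use std::str;

/// Hypothetical classification of a byte slice, for illustration only.
#[derive(Debug, PartialEq)]
enum Utf8Check {
    /// The whole slice is valid UTF-8.
    Valid { chars: usize },
    /// An invalid byte sequence was found after `valid_up_to` valid bytes.
    Invalid { valid_up_to: usize },
    /// The slice ends in the middle of a multi-byte character.
    Truncated { valid_up_to: usize },
}

fn check_utf8(bytes: &[u8]) -> Utf8Check {
    match str::from_utf8(bytes) {
        Ok(text) => Utf8Check::Valid { chars: text.chars().count() },
        // `error_len()` is `None` when the input ended unexpectedly,
        // and `Some(_)` when an invalid sequence was encountered.
        Err(e) => match e.error_len() {
            Some(_) => Utf8Check::Invalid { valid_up_to: e.valid_up_to() },
            None => Utf8Check::Truncated { valid_up_to: e.valid_up_to() },
        },
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn classifies_inputs() {
        // Stray continuation bytes, as in the PR's invalid-input test.
        assert_eq!(
            check_utf8(&[0x80, 0x81, 0x82]),
            Utf8Check::Invalid { valid_up_to: 0 }
        );

        // Valid multi-byte text.
        assert_eq!(
            check_utf8("こんにちは".as_bytes()),
            Utf8Check::Valid { chars: 5 }
        );

        // A file cut off in the middle of a multi-byte character.
        let full = "aこ".as_bytes();
        assert_eq!(
            check_utf8(&full[..full.len() - 1]),
            Utf8Check::Truncated { valid_up_to: 1 }
        );
    }
}
```

All three failure shapes surface as `Err` from `str::from_utf8`, so a single `if let Ok(...)` guard covers them without needing to distinguish the cases.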