Skip to content

Fix UBSAN issue by switching to smart_str#83

Open
ndossche wants to merge 1 commit into
kjdev:masterfrom
ndossche:use-smart-str
Open

Fix UBSAN issue by switching to smart_str#83
ndossche wants to merge 1 commit into
kjdev:masterfrom
ndossche:use-smart-str

Conversation

@ndossche
Copy link
Copy Markdown

@ndossche ndossche commented May 22, 2026

When a smart_string is never updated, it's c field remains NULL. This is possible with a brotli library failure or with an empty output. This is triggerable via e.g. uncompress_add().
When that happens, a NULL pointer is passed to zend_string_init() which is not allowed and will cause undefined behaviour in the subsequent memcpy() call that zend_string_init() performs.

One way to fix this would be to guard the invocation of RETVAL_STRINGL(). However, a cleaner way is to switch to the smart_str API which handles everything already nicely for us.
As a bonus, smart_str will avoid reallocation of the string, so the new code is not only cleaner but also faster.

Example UBSAN report:

/usr/local/include/php/Zend/zend_string.h:191:2: runtime error: null pointer passed as argument 2, which is declared to never be null
    #0 0x7c8d9852a291 in zend_string_init /usr/local/include/php/Zend/zend_string.h:191
    #1 0x7c8d98535adf in zif_brotli_uncompress_add /work/php-ext-brotli/brotli.c:1830
    #2 0x623731f73e26 in zend_test_execute_internal /work/php-src/ext/zend_test/observer.c:306
    #3 0x6237323a22bd in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER /work/php-src/Zend/zend_vm_execute.h:2154
    #4 0x62373259894d in execute_ex /work/php-src/Zend/zend_vm_execute.h:116519
    #5 0x6237325a9126 in zend_execute /work/php-src/Zend/zend_vm_execute.h:121962
    #6 0x623732779b56 in zend_execute_script /work/php-src/Zend/zend.c:1980
    #7 0x623731fa8cce in php_execute_script_ex /work/php-src/main/main.c:2645
    #8 0x623731fa8fd8 in php_execute_script /work/php-src/main/main.c:2685
    #9 0x623732791599 in main /work/php-src/sapi/cgi/cgi_main.c:2529
    #10 0x7c8d9a8381c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #11 0x7c8d9a83828a in __libc_start_main_impl ../csu/libc-start.c:360
    #12 0x623730c09ad4 in _start (/work/php-src/build-dbg-ubsan/sapi/cgi/php-cgi+0x1409ad4) (BuildId: 4ad96c24d1e97297ff4db76de2af34c057b26433)

Note: this was found by a hybrid static-dynamic analyzer I'm developing.

Summary by CodeRabbit

  • Refactor
    • Internal improvements to compression/decompression buffer handling for better code maintainability and performance optimization.

Review Change Stack

When a `smart_string` is never updated, it's `c` field remains NULL.
This is possible with a brotli library failure or with an empty output.
This is triggerable via e.g. `uncompress_add()`.
When that happens, a NULL pointer is passed to `zend_string_init()`
which is not allowed and will cause undefined behaviour in the
subsequent `memcpy()` call that `zend_string_init()` performs.

One way to fix this would be to guard the invocation of `RETVAL_STRINGL()`.
However, a cleaner way is to switch to the `smart_str` API which handles
everything already nicely for us.
As a bonus, `smart_str` will avoid reallocation of the string, so the new
code is not only cleaner but also faster.

Example UBSAN report:
```
/usr/local/include/php/Zend/zend_string.h:191:2: runtime error: null pointer passed as argument 2, which is declared to never be null
    #0 0x7c8d9852a291 in zend_string_init /usr/local/include/php/Zend/zend_string.h:191
    kjdev#1 0x7c8d98535adf in zif_brotli_uncompress_add /work/php-ext-brotli/brotli.c:1830
    kjdev#2 0x623731f73e26 in zend_test_execute_internal /work/php-src/ext/zend_test/observer.c:306
    kjdev#3 0x6237323a22bd in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER /work/php-src/Zend/zend_vm_execute.h:2154
    kjdev#4 0x62373259894d in execute_ex /work/php-src/Zend/zend_vm_execute.h:116519
    kjdev#5 0x6237325a9126 in zend_execute /work/php-src/Zend/zend_vm_execute.h:121962
    kjdev#6 0x623732779b56 in zend_execute_script /work/php-src/Zend/zend.c:1980
    kjdev#7 0x623731fa8cce in php_execute_script_ex /work/php-src/main/main.c:2645
    kjdev#8 0x623731fa8fd8 in php_execute_script /work/php-src/main/main.c:2685
    kjdev#9 0x623732791599 in main /work/php-src/sapi/cgi/cgi_main.c:2529
    kjdev#10 0x7c8d9a8381c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    kjdev#11 0x7c8d9a83828a in __libc_start_main_impl ../csu/libc-start.c:360
    kjdev#12 0x623730c09ad4 in _start (/work/php-src/build-dbg-ubsan/sapi/cgi/php-cgi+0x1409ad4) (BuildId: 4ad96c24d1e97297ff4db76de2af34c057b26433)
```
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 22, 2026

📝 Walkthrough

Walkthrough

This PR modernizes the Brotli extension's buffer handling by replacing the legacy smart_string API with the Zend Engine's smart_str API. The change updates the module header includes and refactors all four main compression/decompression functions—both synchronous and incremental variants—to use the new buffering mechanism for encoding/decoding operations.

Changes

Buffer API Modernization

Layer / File(s) Summary
Header and include updates
brotli.c
Adds zend_smart_str.h to the module includes to enable the modern smart_str API.
Compression buffering modernization
brotli.c
Updates brotli_compress and brotli_compress_add to initialize output buffers with smart_str, append compressed chunks via smart_str_appendl, and extract/free results using the modern API.
Decompression buffering modernization
brotli.c
Updates brotli_uncompress and brotli_uncompress_add to initialize output buffers with smart_str, append decompressed chunks via smart_str_appendl, and extract/free results using the modern API.

Possibly related PRs

  • kjdev/php-ext-brotli#76: Earlier PR switching from ext/standard smart string support to Zend smart string implementation by updating includes and API usage paths.

Poem

🐰 From old strings to the new way bright,
Smart_str now makes buffering right,
Compress, decompress, back and forth,
Modern APIs prove their worth!
The Brotli hop continues strong! 🎉


🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately reflects the main change: replacing smart_string with smart_str to fix a UBSAN issue. It is concise and clearly identifies the primary fix.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
brotli.c (1)

1806-1822: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle decoder error/finish state before returning output.

brotli_uncompress_add() returns smart_str_extract(&out) regardless of BrotliDecoderResult. If decompression ends with BROTLI_DECODER_RESULT_ERROR (or mode == BROTLI_OPERATION_FINISH but not SUCCESS), this can return partial/empty data as a successful call.

Proposed fix
     BrotliDecoderResult result = BROTLI_DECODER_RESULT_NEEDS_MORE_OUTPUT;
     while (result == BROTLI_DECODER_RESULT_NEEDS_MORE_OUTPUT) {
         ctx->available_out = buffer_size;
         ctx->next_out = buffer;
         result = BrotliDecoderDecompressStream(ctx->decoder,
                                                &ctx->available_in,
                                                &ctx->next_in,
                                                &ctx->available_out,
                                                &ctx->next_out,
                                                0);
         size_t buffer_used = buffer_size - ctx->available_out;
         if (buffer_used) {
             smart_str_appendl(&out, buffer, buffer_used);
         }
     }

+    if (result == BROTLI_DECODER_RESULT_ERROR ||
+        (mode == BROTLI_OPERATION_FINISH && result != BROTLI_DECODER_RESULT_SUCCESS)) {
+        efree(buffer);
+        smart_str_free(&out);
+        php_error_docref(NULL, E_WARNING, "failed to incremental uncompress");
+        RETURN_FALSE;
+    }
+
     RETVAL_STR(smart_str_extract(&out));

     efree(buffer);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@brotli.c` around lines 1806 - 1822, The function brotli_uncompress_add
currently returns RETVAL_STR(smart_str_extract(&out)) without checking the final
BrotliDecoderResult; after the loop that calls BrotliDecoderDecompressStream,
inspect the final result and handle non-success cases: if result !=
BROTLI_DECODER_RESULT_SUCCESS (including BROTLI_DECODER_RESULT_ERROR or
incomplete/NEEDS_MORE_INPUT states), release/clear the smart_str out buffer and
return an error/false (or raise an appropriate PHP error) instead of returning
partial data; only call RETVAL_STR(smart_str_extract(&out)) when the result is
BROTLI_DECODER_RESULT_SUCCESS. Ensure you reference ctx, out,
BrotliDecoderDecompressStream, and RETVAL_STR in the fix so the check is
implemented right after the decompression loop.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@brotli.c`:
- Around line 1806-1822: The function brotli_uncompress_add currently returns
RETVAL_STR(smart_str_extract(&out)) without checking the final
BrotliDecoderResult; after the loop that calls BrotliDecoderDecompressStream,
inspect the final result and handle non-success cases: if result !=
BROTLI_DECODER_RESULT_SUCCESS (including BROTLI_DECODER_RESULT_ERROR or
incomplete/NEEDS_MORE_INPUT states), release/clear the smart_str out buffer and
return an error/false (or raise an appropriate PHP error) instead of returning
partial data; only call RETVAL_STR(smart_str_extract(&out)) when the result is
BROTLI_DECODER_RESULT_SUCCESS. Ensure you reference ctx, out,
BrotliDecoderDecompressStream, and RETVAL_STR in the fix so the check is
implemented right after the decompression loop.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d997d699-9e43-458e-b234-b8926f5c697c

📥 Commits

Reviewing files that changed from the base of the PR and between 4924acc and a4828f5.

📒 Files selected for processing (1)
  • brotli.c

@kjdev
Copy link
Copy Markdown
Owner

kjdev commented May 24, 2026

Since version 7.0 is currently supported, this appears to be an error.

PHP Startup: Unable to load dynamic library '/path/to/brotli.so' - Error relocating /path/to/brotli.so: smart_str_extract: symbol not found in Unknown on line 0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants