Skip to content

Expose normalized transport-error detail (ErrorString/ErrorSymbol/ErrorCode)#31

Open
matthargett wants to merge 5 commits into
BabylonJS:mainfrom
rebeckerspecialties:transport-error-reporting
Open

Expose normalized transport-error detail (ErrorString/ErrorSymbol/ErrorCode)#31
matthargett wants to merge 5 commits into
BabylonJS:mainfrom
rebeckerspecialties:transport-error-reporting

Conversation

@matthargett

@matthargett matthargett commented Jun 10, 2026

Copy link
Copy Markdown

Closes the transport-error information gap on the Apple and libcurl backends: failures like DNS errors, refused connections, TLS problems, and missing local files previously completed with StatusCode() == 0 and no other information. The Apple backend discarded the NSError (the existing // TODO: Consider logging or otherwise exposing the error message...), and the curl backend caught and dropped its own failure exception (necessarily, to protect its std::thread), so consumers like JsRuntimeHost's fetch/XMLHttpRequest polyfills can only say "network request failed". For applications that embed Babylon Native and live off field crash reports, every failure class is currently indistinguishable.

API (purely additive)

std::string_view ErrorString();  // "<domain>:<symbol>(<code>): <detail>", empty if no transport failure
std::string_view ErrorSymbol();  // stable token, e.g. "CURLE_COULDNT_RESOLVE_HOST", "NSURLErrorTimedOut"
int32_t          ErrorCode();    // raw numeric platform code (CURLcode / NSError code)

SendAsync() still completes normally with status 0 on transport failure, so existing consumers are completely unaffected until they read the new accessors (same opt-in shape as StatusText() in #30).

Normalized, greppable shape

<domain> and <symbol> are stable ASCII tokens, so observability pipelines (Splunk etc.) can filter classes of failure with exact substrings like "curl:CURLE_COULDNT_RESOLVE_HOST" or "nsurl:NSURLErrorTimedOut"; <detail> carries the platform's human-readable message including host/port/path specifics. Real output from this branch (macOS):

nsurl:NSURLErrorCannotConnectToHost(-1004): Could not connect to the server.
nsurl:NSURLErrorCannotFindHost(-1003): A server with the specified hostname could not be found.
nsurl:NSURLErrorFileDoesNotExist(-1100): The requested URL was not found on this server.
nsurl:NSURLErrorServerCertificateUntrusted(-1202): The certificate for this server is invalid. You might be connecting to a server that is pretending to be "expired.badssl.com" ...
urllib:AppResourceNotFound(0): no bundled resource for 'app:///missing.js'

and on Linux/curl, CURLOPT_ERRORBUFFER detail is preferred over generic curl_easy_strerror text:

curl:CURLE_COULDNT_CONNECT(7): Failed to connect to 127.0.0.1 port 47651 after 0 ms: Couldn't connect to server
curl:CURLE_FILE_COULDNT_READ_FILE(37): Couldn't open file /tmp/missing.bin

Notes:

  • Apple: NSURLErrorDomain codes map to stable symbols; other domains pass through verbatim with a synthesized symbol. The underlying-NSError chain is walked (bounded, deduped by code) so POSIX-level specifics surface when present. The <detail> text is OS-localized on Apple; domain/symbol/code are the stable filterable parts.
  • The missing-app:///-resource case (probably the most common asset-loading failure in shipped apps) is now distinguishable from a network failure via urllib:AppResourceNotFound(0).
  • The invalid-URL Open() throw on Apple now includes the offending URL string.

Scope

Apple (NSURLSession) and Linux (libcurl) backends per the priority consumers; Windows and Android backends keep reporting empty/zero through the shared base (the SetError plumbing in UrlRequest_Base.h is ready for them as a follow-up).

Tests

Tests/UrlRequestErrorReporting.cpp adds 7 offline-deterministic gtest cases: local-file success (no error), missing local file, connection refused (ephemeral bound-then-closed loopback port), NXDOMAIN (RFC 6761 .invalid TLD), grammar-shape regex, request-reuse clears prior error, and missing app:/// resource (Apple). Tests are built only when UrlLib is the top-level project (URLLIB_TESTS, default ON for desktop top-level builds, OFF when consumed via FetchContent), so JsRuntimeHost et al. see no change. I looked at how bun, nodejs, and WPT tests around this area.

CI for this repo is added separately in #32 (Win32 / macOS / Linux, mirroring the JsRuntimeHost workflow conventions). Landing order: this PR first, then #32 — its Run Tests steps execute the UrlLibTests target introduced here.

Verified: built and ran the suite on macOS arm64 (7/7 passing); curl TU compiles clean with -Wall -Wextra (clang); Linux (gcc + clang), macOS, and Win32 execution via the #32 workflows running against this branch on the fork mirror (rebeckerspecialties#1): https://github.com/rebeckerspecialties/UrlLib/actions/runs/27312871316

Context / follow-ups

This is step 1 of improving fetch-failure telemetry / crash reporting for BabylonNative/JsRuntimeHost (see the fetch polyfill added in BabylonJS/JsRuntimeHost#188, whose transport-failure rejection is currently the constant string "fetch: network request failed"). To have actionable support information for triaging/fixing, these errors need to have the non-PII fidelity and reach the top-level abort/reject handler that Sentry/bugsnag/etc hook.

Follow-up in JsRuntimeHost as the next hop in the exception/reject accuracy chain: reject with a TypeError carrying a stable message plus cause/code properties populated from these accessors, mirroring how #30's StatusText() was consumed by BabylonJS/JsRuntimeHost#195.

Noticed while here (not addressed in this PR): UrlRequest::Abort() only has an effect on the Windows backend today, and the curl backend rejects POST with "Not implemented".

Transport failures (DNS, connection refused, TLS, missing local files)
previously completed with StatusCode 0 and no further information: the
Apple backend discarded the NSError (an existing TODO), and the curl
backend caught and dropped its own failure exception. Consumers like
JsRuntimeHost's fetch/XMLHttpRequest polyfills could only report
'network request failed', which makes field crash reports undiagnosable.

New accessors, empty/zero when the request did not fail at the transport
layer, normalized for log-pipeline (Splunk etc.) substring filtering:

  ErrorString()  ->  "<domain>:<symbol>(<code>): <detail>"
  ErrorSymbol()  ->  e.g. CURLE_COULDNT_RESOLVE_HOST, NSURLErrorTimedOut
  ErrorCode()    ->  raw numeric platform code

- Apple: capture NSError domain/code/description (resolves the TODO),
  walking the underlying-error chain for distinct-code levels; missing
  app:/// bundle resources report urllib:AppResourceNotFound(0).
- curl: CURLOPT_ERRORBUFFER detail (host/port/path specifics) with
  curl_easy_strerror fallback and stable CURLE_* symbols.
- Purely additive: SendAsync still completes normally with status 0, so
  existing consumers are unaffected until they opt in.
- Windows/Android backends still report empty/zero (follow-up).

Adds offline-deterministic gtest coverage (7 cases: success, missing
file, connection refused, NXDOMAIN via RFC 6761 .invalid, grammar shape,
reuse-clears-error, missing app resource) and a CI workflow running them
on ubuntu-latest (curl path) and macos-latest (NSURLSession path).
Copilot AI review requested due to automatic review settings June 10, 2026 22:22

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds transport-level error introspection to UrlRequest (normalized error string + stable symbol + numeric code) and validates behavior via new cross-platform unit tests and CI.

Changes:

  • Expose UrlRequest::ErrorString(), ErrorSymbol(), and ErrorCode() in the public API and shared base implementation.
  • Populate transport error details for Apple (NSURLSession/NSError) and Unix (libcurl) backends.
  • Add GTest-based unit tests + CMake plumbing, and run them in GitHub Actions on Ubuntu/macOS.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
Tests/UrlRequestErrorReporting.cpp Adds offline-deterministic tests for new transport error accessors (file, refused connection, DNS failure, reopen clearing).
Tests/CMakeLists.txt Introduces GTest via FetchContent and builds/runs a UrlLibTests executable.
Source/UrlRequest_Unix.cpp Captures CURLOPT_ERRORBUFFER detail and maps common CURLcode values to stable symbols.
Source/UrlRequest_Shared.h Wires new UrlRequest accessors through to the impl.
Source/UrlRequest_Base.h Adds shared storage/reset + normalized SetError() formatting.
Source/UrlRequest_Apple.mm Maps common NSURLErrorDomain codes to stable symbols and builds chained detail from underlying errors.
README.md Documents transport error contract, accessors, examples, and platform support.
Include/UrlLib/UrlLib.h Adds public declarations/docs for error accessors; includes <cstdint> for int32_t.
CMakeLists.txt Adds URLLIB_TESTS option and conditionally enables/creates the Tests subdir.
.github/workflows/ci.yml Adds CI job to configure/build/test on ubuntu-latest and macos-latest.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread Tests/UrlRequestErrorReporting.cpp Outdated
Comment on lines +27 to +40
// Block until SendAsync completes. Returns false on timeout (a hung transport), in
// which case the shared promise keeps the continuation's target alive regardless.
[[nodiscard]] bool SendAndWait(UrlLib::UrlRequest& request)
{
auto done = std::make_shared<std::promise<void>>();
auto future = done->get_future();

request.SendAsync().then(arcana::inline_scheduler, arcana::cancellation::none(),
[done](const arcana::expected<void, std::exception_ptr>&) {
done->set_value();
});

return future.wait_for(std::chrono::seconds{120}) == std::future_status::ready;
}

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adopted in 06b5287: the wait is now 30s and the helper calls request.Abort() before failing on timeout.

One honest caveat from tracing the backends while making this change: UrlRequest::Abort() only cancels m_cancellationSource, which the curl and NSURLSession backends never observe — so cancellation is only actually effective on the Windows backend today, and on the platforms this CI runs the destructor can still block until the transport gives up on its own. The real bound for this suite is that every scenario is offline-deterministic (loopback refusal, RFC 6761 .invalid NXDOMAIN, local files), so completion is fast by construction; the 30s wait and ctest --timeout are backstops. Making Abort() interrupt the Apple/curl transports is part of the follow-up chain this PR feeds (fetch AbortSignal wiring in JsRuntimeHost).

Comment thread Tests/UrlRequestErrorReporting.cpp Outdated
Comment on lines +44 to +61
uint16_t AcquireClosedPort()
{
const int fd = ::socket(AF_INET, SOCK_STREAM, 0);
EXPECT_GE(fd, 0);

sockaddr_in address{};
address.sin_family = AF_INET;
address.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
address.sin_port = 0;
EXPECT_EQ(::bind(fd, reinterpret_cast<const sockaddr*>(&address), sizeof(address)), 0);

socklen_t addressLength = sizeof(address);
EXPECT_EQ(::getsockname(fd, reinterpret_cast<sockaddr*>(&address), &addressLength), 0);
const uint16_t port = ntohs(address.sin_port);

::close(fd);
return port;
}

@matthargett matthargett Jun 10, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both points adopted in 06b5287, with one twist based on my on-device testing. The helper is now a move-only RAII RefusingPort whose acquisition returns std::optional with ASSERT_TRUE at call sites, so setup failure stops the test instead of continuing on a bad fd.

On the hold-the-bind suggestion: it turns out to be platform-dependent. On Linux a SYN to a bound-but-not-listening port is refused with RST, so there the socket is kept bound for the test's lifetime exactly as you describe — race-free. On Darwin, though, that same SYN is silently dropped: the connect attempt times out (NSURLErrorTimedOut(-1001)) instead of being refused (-1004) — verified by running it. So on Apple the port number is reserved via bind/getsockname and then closed before connecting, accepting the microseconds-wide reuse window. The class documents the split.

Comment thread Tests/UrlRequestErrorReporting.cpp Outdated
Comment on lines +63 to +69
std::filesystem::path WriteTempFile(const char* name, const char* contents)
{
const auto path = std::filesystem::temp_directory_path() / name;
std::ofstream stream{path, std::ios::trunc};
stream << contents;
return path;
}

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adopted in 06b5287 as an RAII TempFile that suffixes the process id (so parallel jobs sharing a temp directory get distinct paths) and removes the file in its destructor; the missing-file test reuses the same fixture with null contents for a guaranteed-absent path. One note on the suggested API: std::filesystem::unique_path is Boost-only — it never made it into std::filesystem — hence the pid-suffixed naming.

…imeout

- RefusingPort: race-free held-bind refusal on Linux; on Darwin a SYN to a
  bound-but-not-listening port is dropped (times out, -1001) rather than
  refused, so close after reserving the port number there. optional<> +
  ASSERT at call sites instead of EXPECT inside the helper.
- TempFile: pid-suffixed unique names, removed on destruction.
- SendAndWait: 30s wait, Abort() on timeout (currently only effective on
  the Windows backend; the offline-deterministic scenarios are the real
  bound).
Replace the single-file matrix workflow with the family pattern used by
both sibling repos: a ci.yml orchestrator calling reusable per-platform
workflow_call files (build-linux.yml, build-macos.yml), checkout@v5,
timeout-minutes, Ninja + explicit gcc/clang jobs on Linux, pinned Xcode
selection with -G Xcode on macOS, Build/<platform> dirs, and running the
test binary directly. Verified the macOS steps locally (Xcode generator,
RelWithDebInfo, 7/7 tests).
@matthargett

Copy link
Copy Markdown
Author

CI evidence while first-time-workflow approval is pending here: this branch is mirrored as a draft PR on our fork (rebeckerspecialties#1), where the workflows ran green across all three jobs — Ubuntu_gcc, Ubuntu_clang, and macOS_Xcode164: https://github.com/rebeckerspecialties/UrlLib/actions/runs/27312401213

That includes the first real execution of the libcurl error path (previously only syntax-checked): the Linux jobs pass the exact-symbol assertions — CURLE_COULDNT_CONNECT(7) with the host detail from CURLOPT_ERRORBUFFER, CURLE_COULDNT_RESOLVE_HOST(6), and CURLE_FILE_COULDNT_READ_FILE(37).

Also note the CI was restructured in 4ccfaba to match the JsRuntimeHost/BabylonNative conventions: a ci.yml orchestrator calling reusable build-linux.yml / build-macos.yml workflows (checkout@v5, Ninja with gcc+clang jobs, pinned Xcode via xcode-select, test binaries run directly), so extending to Win32/UWP/Android jobs later follows the same shape as the sibling repos.

build-win32.yml mirrors JsRuntimeHost's: windows-2022 (VS2022 generator
pin), -A platform input, RelWithDebInfo with /m, crash-dump registry
staging and artifact upload on failure.

Test fixtures now compile on MSVC: Winsock variant of RefusingPort
(WSAStartup, SOCKET/closesocket), _getpid, and file:///C:/... URL
formation. The five assertions that depend on transport-error detail
GTEST_SKIP on Windows with an explicit message, since the Windows
backend does not populate the new accessors yet -- the gap stays visible
in test output while the suite (and the local-file success case, which
runs for real) stays green.
@matthargett

Copy link
Copy Markdown
Author

Win32 is now in the CI matrix per Microsoft-project table stakes: build-win32.yml mirrors JsRuntimeHost's (windows-2022 / VS2022 generator pin, -A platform input, RelWithDebInfo with /m, crash-dump staging + artifact upload on failure), with a Win32_x64 job in ci.yml.

All four jobs green on the fork mirror: https://github.com/rebeckerspecialties/UrlLib/actions/runs/27312871316

On Windows the suite builds and runs (Winsock fixture variant, _getpid, file:///C:/... URL formation): the local-file success case passes for real — validating the Win32 backend's file path and the empty-ErrorString() no-error contract — while the five assertions that need transport-error detail GTEST_SKIP with an explicit "the Windows backend does not populate transport-error detail yet" message, so the platform gap stays visible in test output without a permanently red job. Populating the Windows backend (the winrt::hresult_error catch sites in UrlRequest_Windows_Shared.h) flips those skips to live coverage as the follow-up.

The GitHub Actions setup now lives in its own PR so the infrastructure
can be reviewed independently; it depends on this PR's UrlLibTests
target and lands after it.
@matthargett

Copy link
Copy Markdown
Author

Repo-infrastructure split: the CI workflows have been moved out of this PR into #32 so they can be reviewed on their own merits — this PR is now scoped to the error-reporting code, tests, and docs (the .github/ files are gone from the diff as of c2c7fce).

Landing order: this PR first, then #32. #32's Run Tests steps execute the UrlLibTests target introduced here, so it depends on this one (noted in its description).

The fork mirror (rebeckerspecialties#1) now validates the combined state of both PRs — it targets the #32 branch as its base with this branch as the head, so every push to either PR re-runs the full Win32/macOS/Linux matrix against exactly what main will look like after both land.

@matthargett

Copy link
Copy Markdown
Author

Correction to the mirror note above: the validation mirror is now rebeckerspecialties#2 (a fork-only branch combining this PR with #32's workflows — base-side-only workflow files turn out not to trigger pull_request runs, so the combined-head shape it is). Post-split state validated green across all four jobs: https://github.com/rebeckerspecialties/UrlLib/actions/runs/27313677186

matthargett added a commit to rebeckerspecialties/UrlLib that referenced this pull request Jun 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants