Expose normalized transport-error detail (ErrorString/ErrorSymbol/ErrorCode)#31
Expose normalized transport-error detail (ErrorString/ErrorSymbol/ErrorCode)#31matthargett wants to merge 5 commits into
Conversation
Transport failures (DNS, connection refused, TLS, missing local files) previously completed with StatusCode 0 and no further information: the Apple backend discarded the NSError (an existing TODO), and the curl backend caught and dropped its own failure exception. Consumers like JsRuntimeHost's fetch/XMLHttpRequest polyfills could only report 'network request failed', which makes field crash reports undiagnosable. New accessors, empty/zero when the request did not fail at the transport layer, normalized for log-pipeline (Splunk etc.) substring filtering: ErrorString() -> "<domain>:<symbol>(<code>): <detail>" ErrorSymbol() -> e.g. CURLE_COULDNT_RESOLVE_HOST, NSURLErrorTimedOut ErrorCode() -> raw numeric platform code - Apple: capture NSError domain/code/description (resolves the TODO), walking the underlying-error chain for distinct-code levels; missing app:/// bundle resources report urllib:AppResourceNotFound(0). - curl: CURLOPT_ERRORBUFFER detail (host/port/path specifics) with curl_easy_strerror fallback and stable CURLE_* symbols. - Purely additive: SendAsync still completes normally with status 0, so existing consumers are unaffected until they opt in. - Windows/Android backends still report empty/zero (follow-up). Adds offline-deterministic gtest coverage (7 cases: success, missing file, connection refused, NXDOMAIN via RFC 6761 .invalid, grammar shape, reuse-clears-error, missing app resource) and a CI workflow running them on ubuntu-latest (curl path) and macos-latest (NSURLSession path).
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds transport-level error introspection to UrlRequest (normalized error string + stable symbol + numeric code) and validates behavior via new cross-platform unit tests and CI.
Changes:
- Expose
UrlRequest::ErrorString(),ErrorSymbol(), andErrorCode()in the public API and shared base implementation. - Populate transport error details for Apple (
NSURLSession/NSError) and Unix (libcurl) backends. - Add GTest-based unit tests + CMake plumbing, and run them in GitHub Actions on Ubuntu/macOS.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| Tests/UrlRequestErrorReporting.cpp | Adds offline-deterministic tests for new transport error accessors (file, refused connection, DNS failure, reopen clearing). |
| Tests/CMakeLists.txt | Introduces GTest via FetchContent and builds/runs a UrlLibTests executable. |
| Source/UrlRequest_Unix.cpp | Captures CURLOPT_ERRORBUFFER detail and maps common CURLcode values to stable symbols. |
| Source/UrlRequest_Shared.h | Wires new UrlRequest accessors through to the impl. |
| Source/UrlRequest_Base.h | Adds shared storage/reset + normalized SetError() formatting. |
| Source/UrlRequest_Apple.mm | Maps common NSURLErrorDomain codes to stable symbols and builds chained detail from underlying errors. |
| README.md | Documents transport error contract, accessors, examples, and platform support. |
| Include/UrlLib/UrlLib.h | Adds public declarations/docs for error accessors; includes <cstdint> for int32_t. |
| CMakeLists.txt | Adds URLLIB_TESTS option and conditionally enables/creates the Tests subdir. |
| .github/workflows/ci.yml | Adds CI job to configure/build/test on ubuntu-latest and macos-latest. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Block until SendAsync completes. Returns false on timeout (a hung transport), in | ||
| // which case the shared promise keeps the continuation's target alive regardless. | ||
| [[nodiscard]] bool SendAndWait(UrlLib::UrlRequest& request) | ||
| { | ||
| auto done = std::make_shared<std::promise<void>>(); | ||
| auto future = done->get_future(); | ||
|
|
||
| request.SendAsync().then(arcana::inline_scheduler, arcana::cancellation::none(), | ||
| [done](const arcana::expected<void, std::exception_ptr>&) { | ||
| done->set_value(); | ||
| }); | ||
|
|
||
| return future.wait_for(std::chrono::seconds{120}) == std::future_status::ready; | ||
| } |
There was a problem hiding this comment.
Adopted in 06b5287: the wait is now 30s and the helper calls request.Abort() before failing on timeout.
One honest caveat from tracing the backends while making this change: UrlRequest::Abort() only cancels m_cancellationSource, which the curl and NSURLSession backends never observe — so cancellation is only actually effective on the Windows backend today, and on the platforms this CI runs the destructor can still block until the transport gives up on its own. The real bound for this suite is that every scenario is offline-deterministic (loopback refusal, RFC 6761 .invalid NXDOMAIN, local files), so completion is fast by construction; the 30s wait and ctest --timeout are backstops. Making Abort() interrupt the Apple/curl transports is part of the follow-up chain this PR feeds (fetch AbortSignal wiring in JsRuntimeHost).
| uint16_t AcquireClosedPort() | ||
| { | ||
| const int fd = ::socket(AF_INET, SOCK_STREAM, 0); | ||
| EXPECT_GE(fd, 0); | ||
|
|
||
| sockaddr_in address{}; | ||
| address.sin_family = AF_INET; | ||
| address.sin_addr.s_addr = htonl(INADDR_LOOPBACK); | ||
| address.sin_port = 0; | ||
| EXPECT_EQ(::bind(fd, reinterpret_cast<const sockaddr*>(&address), sizeof(address)), 0); | ||
|
|
||
| socklen_t addressLength = sizeof(address); | ||
| EXPECT_EQ(::getsockname(fd, reinterpret_cast<sockaddr*>(&address), &addressLength), 0); | ||
| const uint16_t port = ntohs(address.sin_port); | ||
|
|
||
| ::close(fd); | ||
| return port; | ||
| } |
There was a problem hiding this comment.
Both points adopted in 06b5287, with one twist based on my on-device testing. The helper is now a move-only RAII RefusingPort whose acquisition returns std::optional with ASSERT_TRUE at call sites, so setup failure stops the test instead of continuing on a bad fd.
On the hold-the-bind suggestion: it turns out to be platform-dependent. On Linux a SYN to a bound-but-not-listening port is refused with RST, so there the socket is kept bound for the test's lifetime exactly as you describe — race-free. On Darwin, though, that same SYN is silently dropped: the connect attempt times out (NSURLErrorTimedOut(-1001)) instead of being refused (-1004) — verified by running it. So on Apple the port number is reserved via bind/getsockname and then closed before connecting, accepting the microseconds-wide reuse window. The class documents the split.
| std::filesystem::path WriteTempFile(const char* name, const char* contents) | ||
| { | ||
| const auto path = std::filesystem::temp_directory_path() / name; | ||
| std::ofstream stream{path, std::ios::trunc}; | ||
| stream << contents; | ||
| return path; | ||
| } |
There was a problem hiding this comment.
Adopted in 06b5287 as an RAII TempFile that suffixes the process id (so parallel jobs sharing a temp directory get distinct paths) and removes the file in its destructor; the missing-file test reuses the same fixture with null contents for a guaranteed-absent path. One note on the suggested API: std::filesystem::unique_path is Boost-only — it never made it into std::filesystem — hence the pid-suffixed naming.
…imeout - RefusingPort: race-free held-bind refusal on Linux; on Darwin a SYN to a bound-but-not-listening port is dropped (times out, -1001) rather than refused, so close after reserving the port number there. optional<> + ASSERT at call sites instead of EXPECT inside the helper. - TempFile: pid-suffixed unique names, removed on destruction. - SendAndWait: 30s wait, Abort() on timeout (currently only effective on the Windows backend; the offline-deterministic scenarios are the real bound).
Replace the single-file matrix workflow with the family pattern used by both sibling repos: a ci.yml orchestrator calling reusable per-platform workflow_call files (build-linux.yml, build-macos.yml), checkout@v5, timeout-minutes, Ninja + explicit gcc/clang jobs on Linux, pinned Xcode selection with -G Xcode on macOS, Build/<platform> dirs, and running the test binary directly. Verified the macOS steps locally (Xcode generator, RelWithDebInfo, 7/7 tests).
|
CI evidence while first-time-workflow approval is pending here: this branch is mirrored as a draft PR on our fork (rebeckerspecialties#1), where the workflows ran green across all three jobs — That includes the first real execution of the libcurl error path (previously only syntax-checked): the Linux jobs pass the exact-symbol assertions — Also note the CI was restructured in 4ccfaba to match the JsRuntimeHost/BabylonNative conventions: a |
build-win32.yml mirrors JsRuntimeHost's: windows-2022 (VS2022 generator pin), -A platform input, RelWithDebInfo with /m, crash-dump registry staging and artifact upload on failure. Test fixtures now compile on MSVC: Winsock variant of RefusingPort (WSAStartup, SOCKET/closesocket), _getpid, and file:///C:/... URL formation. The five assertions that depend on transport-error detail GTEST_SKIP on Windows with an explicit message, since the Windows backend does not populate the new accessors yet -- the gap stays visible in test output while the suite (and the local-file success case, which runs for real) stays green.
|
Win32 is now in the CI matrix per Microsoft-project table stakes: All four jobs green on the fork mirror: https://github.com/rebeckerspecialties/UrlLib/actions/runs/27312871316 On Windows the suite builds and runs (Winsock fixture variant, |
The GitHub Actions setup now lives in its own PR so the infrastructure can be reviewed independently; it depends on this PR's UrlLibTests target and lands after it.
|
Repo-infrastructure split: the CI workflows have been moved out of this PR into #32 so they can be reviewed on their own merits — this PR is now scoped to the error-reporting code, tests, and docs (the Landing order: this PR first, then #32. #32's The fork mirror (rebeckerspecialties#1) now validates the combined state of both PRs — it targets the #32 branch as its base with this branch as the head, so every push to either PR re-runs the full Win32/macOS/Linux matrix against exactly what |
|
Correction to the mirror note above: the validation mirror is now rebeckerspecialties#2 (a fork-only branch combining this PR with #32's workflows — base-side-only workflow files turn out not to trigger |
Closes the transport-error information gap on the Apple and libcurl backends: failures like DNS errors, refused connections, TLS problems, and missing local files previously completed with
StatusCode() == 0and no other information. The Apple backend discarded theNSError(the existing// TODO: Consider logging or otherwise exposing the error message...), and the curl backend caught and dropped its own failure exception (necessarily, to protect itsstd::thread), so consumers like JsRuntimeHost'sfetch/XMLHttpRequestpolyfills can only say "network request failed". For applications that embed Babylon Native and live off field crash reports, every failure class is currently indistinguishable.API (purely additive)
SendAsync()still completes normally with status 0 on transport failure, so existing consumers are completely unaffected until they read the new accessors (same opt-in shape asStatusText()in #30).Normalized, greppable shape
<domain>and<symbol>are stable ASCII tokens, so observability pipelines (Splunk etc.) can filter classes of failure with exact substrings like"curl:CURLE_COULDNT_RESOLVE_HOST"or"nsurl:NSURLErrorTimedOut";<detail>carries the platform's human-readable message including host/port/path specifics. Real output from this branch (macOS):and on Linux/curl,
CURLOPT_ERRORBUFFERdetail is preferred over genericcurl_easy_strerrortext:Notes:
NSURLErrorDomaincodes map to stable symbols; other domains pass through verbatim with a synthesized symbol. The underlying-NSErrorchain is walked (bounded, deduped by code) so POSIX-level specifics surface when present. The<detail>text is OS-localized on Apple; domain/symbol/code are the stable filterable parts.app:///-resource case (probably the most common asset-loading failure in shipped apps) is now distinguishable from a network failure viaurllib:AppResourceNotFound(0).Open()throw on Apple now includes the offending URL string.Scope
Apple (
NSURLSession) and Linux (libcurl) backends per the priority consumers; Windows and Android backends keep reporting empty/zero through the shared base (theSetErrorplumbing inUrlRequest_Base.his ready for them as a follow-up).Tests
Tests/UrlRequestErrorReporting.cppadds 7 offline-deterministic gtest cases: local-file success (no error), missing local file, connection refused (ephemeral bound-then-closed loopback port), NXDOMAIN (RFC 6761.invalidTLD), grammar-shape regex, request-reuse clears prior error, and missingapp:///resource (Apple). Tests are built only when UrlLib is the top-level project (URLLIB_TESTS, default ON for desktop top-level builds, OFF when consumed via FetchContent), so JsRuntimeHost et al. see no change. I looked at how bun, nodejs, and WPT tests around this area.CI for this repo is added separately in #32 (Win32 / macOS / Linux, mirroring the JsRuntimeHost workflow conventions). Landing order: this PR first, then #32 — its
Run Testssteps execute theUrlLibTeststarget introduced here.Verified: built and ran the suite on macOS arm64 (7/7 passing); curl TU compiles clean with
-Wall -Wextra(clang); Linux (gcc + clang), macOS, and Win32 execution via the #32 workflows running against this branch on the fork mirror (rebeckerspecialties#1): https://github.com/rebeckerspecialties/UrlLib/actions/runs/27312871316Context / follow-ups
This is step 1 of improving fetch-failure telemetry / crash reporting for BabylonNative/JsRuntimeHost (see the fetch polyfill added in BabylonJS/JsRuntimeHost#188, whose transport-failure rejection is currently the constant string
"fetch: network request failed"). To have actionable support information for triaging/fixing, these errors need to have the non-PII fidelity and reach the top-level abort/reject handler that Sentry/bugsnag/etc hook.Follow-up in JsRuntimeHost as the next hop in the exception/reject accuracy chain: reject with a
TypeErrorcarrying a stable message pluscause/codeproperties populated from these accessors, mirroring how #30'sStatusText()was consumed by BabylonJS/JsRuntimeHost#195.Noticed while here (not addressed in this PR):
UrlRequest::Abort()only has an effect on the Windows backend today, and the curl backend rejectsPOSTwith "Not implemented".