Add UDS/RPC bridge benchmark suite with standalone app #55

Open
gmaclennan wants to merge 36 commits into main from claude/benchmark-uds-rpc-bridge-1Zahz

Conversation

Member

@gmaclennan gmaclennan commented May 1, 2026

Summary

Adds a UDS/RPC bridge benchmark suite as a standalone Expo app under apps/benchmark/. Exercises the same RN → native → nodejs-mobile path real users hit, but with @comapeo/core stripped out so framing / IPC / JSON-RPC overhead measures cleanly. Runs locally via Maestro and on a curated BrowserStack device sweep; results land as NDJSON spans plus a refreshed RESULTS.md.

Production-code surface for the bench app is intentionally narrow — see "Module override surface" below.

What lands

Bench app (apps/benchmark/)

  • backend/ — minimal nodejs-mobile entry that reuses the production state machine (pre-listening → started → ready) and path-imports the framing helpers (server-helper.js, simple-rpc.js, message-port.js) from the production backend/lib/ so the wire framing is bit-identical. Drops @comapeo/core entirely. BenchRpcServer registers echo, payload(sizeBytes), and ingestSpans methods; rolled up to a single dist/index.bench.mjs.
  • App.tsx — RN-side bench client. Talks to the bench backend via unstable_messagePort (see below). Runs warmup + steady-state sweep across 64 B / 1 KB / 64 KB / 1 MB payload classes (10 + 100 iterations per size), records per-RPC RTT, renders an on-screen p50/p95/p99 panel, exports NDJSON via the share sheet.
  • backend/lib/telemetry-sink.js — pluggable sink interface with LogSink (default; one BENCH_SPAN <json> stdout line per span — surfaces in Android logcat and iOS device console), JsonFileSink (NDJSON to <app sandbox>/Documents/comapeo-bench/<runId>.ndjson), and NoopSink.
  • backend/lib/boot-spans.js — wraps listen-control / init / construct boot phases with Sentry-shaped spans so the eventual Sentry adapter (§7.4.2 of the Sentry plan) can adopt the same call sites.
  • .maestro/ — local-run flows (one per payload class plus a sweep), plus workspace config.yaml.
  • plugins/with-comapeo-bench/ — Expo config plugin (see "Module override surface" for what it sets). Idempotent across expo prebuild reruns.
  • scripts/build-ipa.sh — builds a Development-export IPA (com.comapeo.core.benchmark bundle id) for BrowserStack; auto-resigned on upload.
  • README.md + RESULTS.md — architecture, run instructions, current measured baseline.
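The warmup plus steady-state sweep described for App.tsx could look roughly like the sketch below. `RpcCall`, `runSweep`, and the injectable clock are hypothetical names used for illustration only; the real client speaks over unstable_messagePort and its code is not reproduced here. The payload sizes and iteration counts are the ones listed above.

```typescript
// Hypothetical: one RPC round trip for a payload of the given size.
type RpcCall = (sizeBytes: number) => Promise<unknown>;

interface SizeResult {
  sizeBytes: number;
  rttsMs: number[];
}

// 64 B / 1 KB / 64 KB / 1 MB payload classes from the description.
const PAYLOAD_SIZES = [64, 1024, 64 * 1024, 1024 * 1024];

async function runSweep(
  call: RpcCall,
  sizes: number[] = PAYLOAD_SIZES,
  warmup = 10,
  iterations = 100,
  now: () => number = Date.now,
): Promise<SizeResult[]> {
  const results: SizeResult[] = [];
  for (const sizeBytes of sizes) {
    // Warmup round trips are issued but their timings are discarded.
    for (let i = 0; i < warmup; i++) await call(sizeBytes);
    const rttsMs: number[] = [];
    for (let i = 0; i < iterations; i++) {
      const t0 = now();
      await call(sizeBytes);
      rttsMs.push(now() - t0);
    }
    results.push({ sizeBytes, rttsMs });
  }
  return results;
}
```

`Date.now` is only a placeholder default with millisecond resolution; sub-millisecond RTTs like the ones reported for the emulator would need a finer clock passed in as `now`.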

Module override surface (production code)

The bench app drops a sibling entry file (index.bench.mjs) into the consumer's nodejs-project/ and tells the module's loader to run it instead of the production index.mjs. Single override:

  • comapeoEntryFile — Gradle property → BuildConfig.COMAPEO_ENTRY_FILE on Android; ComapeoEntryFile Info.plist key on iOS. Defaults to index.mjs. AGP merges the bench file with the library bundle on Android; an Xcode Run Script build phase copies it into <App>.app/nodejs-project/ after CocoaPods' resource-copy phase on iOS.

Two adjacent additions that aren't bench-only:

  • --device=<MANUFACTURER MODEL (Android REL)> / --device=Apple <model> (<systemName> <systemVersion>) appended to the nodejs-mobile argv unconditionally. Production backend ignores unknown positional flags; the bench backend reads it for span attribution; Sentry tagging will read it once that lands. Pure no-op for current production consumers.
  • backend/lib/message-port.js hardening: SocketMessagePort.postMessage now drops writes after close(), and a socket-level error listener swallows the ERR_STREAM_WRITE_AFTER_END race that otherwise surfaces as uncaughtException during graceful shutdown. Both are real production fixes (the race exists today; the bench shutdown sequence just makes it routine).
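The "drop writes after close()" part of the hardening can be illustrated with a small state check. This is a sketch under assumptions, not the actual SocketMessagePort code: `FrameWriter` and `GuardedPort` are hypothetical stand-ins for the framed socket and the port.

```typescript
// Hypothetical minimal writable; stands in for the framed AF_UNIX socket.
interface FrameWriter {
  write(frame: string): void;
  end(): void;
}

class GuardedPort {
  private closed = false;
  private readonly writer: FrameWriter;

  constructor(writer: FrameWriter) {
    this.writer = writer;
  }

  // Returns true when the message was written, false when it was dropped
  // because the port had already been closed.
  postMessage(message: unknown): boolean {
    if (this.closed) return false;
    this.writer.write(JSON.stringify(message));
    return true;
  }

  // Idempotent: a second close() does not end the writer again.
  close(): void {
    if (this.closed) return;
    this.closed = true;
    this.writer.end();
  }
}
```

Dropping silently is reasonable here because a message posted after close() is addressed to a peer that is already being torn down.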

iOS-only opt-in (off by default for production):

  • ComapeoStdoutToOsLog Info.plist BOOL — when true, NodeMobileBridge.mm dup2s nodejs-mobile's stdout/stderr onto a pipe and forwards each line to os_log under the com.comapeo.nodejs subsystem. Lets BrowserStack capture BENCH_SPAN lines from the device console. Production consumers leave it unset and inherit the legacy routing (so unredacted JS log lines stay out of the unified log).

unstable_messagePort export (src/ComapeoCoreModule.ts)

Raw CoreMessagePort singleton — escape hatch for consumers that need to bypass the MapeoClient request/response machinery and speak directly to whatever backend bundle they've wired in. The bench app uses it. unstable_ prefix follows the React/RN convention for surfaces whose shape may change without notice; production consumers should keep using comapeo.

Host-side runner

  • scripts/run-on-browserstack.ts (npm run bench:browserstack). Queries BrowserStack /plan.json for parallel + queued cap, chunks a curated 10-device Android sweep into builds that fit, dispatches each, polls until terminal, pulls per-device logcat, greps BENCH_SPAN lines into one NDJSON per device under apps/benchmark/results/. No BrowserStackLocal tunnel, no host-side receiver process, no cleartext-traffic config.
  • scripts/bench-summarize.ts (npm run bench:summarize). Refreshes apps/benchmark/RESULTS.md from the pulled NDJSONs (per-device p50/p95/p99 per payload class, plus rttSide:"backend" vs rttSide:"rn" columns for the bridge-overhead diff).
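The "grep BENCH_SPAN lines into NDJSON" step can be sketched as a pure function over raw device-log text. `extractSpans` is a hypothetical helper, not the runner's actual code; skipping lines whose JSON does not parse is an assumption about how truncated log lines might be handled.

```typescript
const MARKER = "BENCH_SPAN ";

// Returns one NDJSON line per well-formed span found in the log text.
function extractSpans(logText: string): string[] {
  const out: string[] = [];
  for (const line of logText.split(/\r?\n/)) {
    const at = line.indexOf(MARKER);
    if (at === -1) continue; // Not a span line.
    const json = line.slice(at + MARKER.length).trim();
    try {
      // Round-trip through JSON to validate and normalize the payload.
      out.push(JSON.stringify(JSON.parse(json)));
    } catch {
      // Truncated or interleaved log line (assumed possible); skip it.
    }
  }
  return out;
}
```

Joining the returned array with newlines gives the content of one per-device NDJSON file.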

CI plan (docs/bench-ci-plan.md)

Scaffolding for a manual-trigger benchmark workflow that posts artifacts back to GitHub. Implementation lands in a follow-up PR.

Notable design choices

  • Wire framing is bit-identical to production. The bench backend imports server-helper.js, simple-rpc.js, and message-port.js from the production backend/lib/ via path-relative imports so any divergence in framing would invalidate the benchmark's premise. Rollup inlines them at bundle time.
  • One single ESM bundle for both platforms. The bench code never imports @comapeo/core (no iOS maps-plugin stub needed) and never loads native addons (no per-platform __loadAddon banner needed).
  • No HTTP transport, no receiver, no tunnel. Span transport is console.log("BENCH_SPAN " + JSON.stringify(span)) from the backend. Android picks up via logcat; iOS picks up via the opt-in os_log redirect; the BS runner pulls device logs after each build and greps. RN-side spans round-trip through the bench RPC's ingestSpans method so they emit through the same path (iOS release builds suppress JS console.log via RCTLog's level filter, so RN-direct logging doesn't reach the device console).
  • payload cache. Pre-allocates and caches synthesized payloads per size (capped at 4 MiB resident) so a mixed-size sweep doesn't spend its time in String.repeat.
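A minimal sketch of a per-size payload cache with a resident cap, assuming ASCII payloads and lazy fill. `PayloadCache` is a hypothetical name, and the policy for what happens once the 4 MiB cap is reached (synthesize without caching) is an assumption; the description only states the cap.

```typescript
const MAX_RESIDENT_BYTES = 4 * 1024 * 1024; // 4 MiB cap from the description.

class PayloadCache {
  private readonly cache = new Map<number, string>();
  private residentBytes = 0;

  // Returns a payload of exactly sizeBytes ASCII characters.
  get(sizeBytes: number): string {
    const hit = this.cache.get(sizeBytes);
    if (hit !== undefined) return hit;
    const payload = "x".repeat(sizeBytes);
    // Assumed policy: cache only while the total stays under the cap;
    // otherwise hand back an uncached payload.
    if (this.residentBytes + sizeBytes <= MAX_RESIDENT_BYTES) {
      this.cache.set(sizeBytes, payload);
      this.residentBytes += sizeBytes;
    }
    return payload;
  }

  get resident(): number {
    return this.residentBytes;
  }
}
```

With the four payload classes in the sweep the cache holds about 1.06 MiB, so the cap only matters if larger sizes are requested.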

Sentry alignment

Boot phases (boot.listen-control, boot.init, boot.construct) and per-call RPC spans (rpc.echo, rpc.payload) follow the Sentry-shaped span taxonomy in §7.4.2 of the Sentry plan. The eventual SentryAdapterSink (Phase 5 in the bench README) implements the same surface, so the call sites stay unchanged when the production loader adopts shared instrumentation.
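The "same surface" idea can be sketched as a small sink interface that LogSink, NoopSink, and a later Sentry adapter could all implement. The `Span` fields and the `withSpan` helper are hypothetical; only the sink names and the `BENCH_SPAN <json>` line format come from this PR.

```typescript
// Hypothetical span shape; the real field names are not shown in this PR.
interface Span {
  name: string;
  startMs: number;
  durationMs: number;
}

interface TelemetrySink {
  emit(span: Span): void;
}

// Writes one "BENCH_SPAN <json>" line per span. The writer is injectable so
// the sketch can be exercised without touching stdout.
class LogSink implements TelemetrySink {
  private readonly write: (line: string) => void;

  constructor(write: (line: string) => void = (line) => console.log(line)) {
    this.write = write;
  }

  emit(span: Span): void {
    this.write("BENCH_SPAN " + JSON.stringify(span));
  }
}

class NoopSink implements TelemetrySink {
  emit(_span: Span): void {}
}

// Times a synchronous phase and reports it to whichever sink is configured.
function withSpan<T>(
  sink: TelemetrySink,
  name: string,
  fn: () => T,
  now: () => number = Date.now,
): T {
  const startMs = now();
  try {
    return fn();
  } finally {
    sink.emit({ name, startMs, durationMs: now() - startMs });
  }
}
```

Because call sites such as `withSpan(sink, "boot.init", ...)` depend only on `TelemetrySink`, swapping in a different sink would not require touching them.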

Dependency

  • Built on top of #57 (fix(android): drop setUnlockedDeviceRequired from rootkey wrapper key), which has landed on main. Without that, BrowserStack's stock no-screen-lock fleet would fail wrapper-key generation at FGS startup; an earlier iteration of this branch carried a comapeoStubRootKey opt-out hook to work around it, which is no longer needed and has been removed.

Test plan

  • cd apps/benchmark && npm install && npm run prebuild && npm run android — bench app reaches STARTED, run-benchmark tap produces a results panel.
  • Same on iOS via npm run ios.
  • npm run --prefix apps/benchmark/backend build produces dist/index.bench.mjs (and sourcemap).
  • apps/example/ (production consumer) builds + runs unchanged — comapeoEntryFile defaults to index.mjs, no Info.plist keys set, FGS reaches STARTED, RPC works.
  • npm run bench:browserstack -- --app-android <apk> --app-ios <ipa> (with credentials in .env) dispatches against the curated sweep, pulls NDJSONs into apps/benchmark/results/.
  • npm run bench:summarize refreshes apps/benchmark/RESULTS.md.
  • Manual smoke on a no-screen-lock device confirms wrapper-key generation succeeds (relies on #57, fix(android): drop setUnlockedDeviceRequired from rootkey wrapper key).

🤖 Generated with Claude Code

claude and others added 4 commits May 1, 2026 00:08
Plans an opt-in, host-app-driven Sentry integration covering:
- error capture across backend (Node), JS/RN, and native layers
- RPC tracing via @comapeo/ipc onRequestHook (mirrors comapeo-mobile)
- forwarding @comapeo/core OpenTelemetry spans (PR digidem/comapeo-core#1051)
- app-specific gating so non-CoMapeo consumers ship no Sentry traffic

https://claude.ai/code/session_01EcVXzczA1TVkhEkgUg9DKX
Closes the FGS-cold-start gap where the prior draft required RN to
be alive before backend Sentry could initialize:

- §4 reworked: Expo config plugin writes DSN/environment/release
  into Android manifest meta-data and iOS Info.plist at prebuild
  time. Native reads those at process start, no JS round-trip,
  before booting @sentry/node and @sentry/android.

- §7.4 added: native telemetry data design mapped onto Sentry
  primitives (breadcrumbs for state transitions, transaction +
  spans for boot/shutdown phases, captureMessage for timeouts,
  tags/contexts for cross-process attribution). Categorizes
  captures as essential vs opt-in and documents a hard
  never-capture list for PII.

- §9 added: persisted "capture application data" toggle with
  restart-to-activate semantics. Snapshot read at boot, embedded
  in the init frame; gates per-RPC spans, sync-session
  transactions, memory checkpoints, and storage-size sampling.
  Never unlocks the never-capture list.

- §10 phasing and §13 file-change list updated. New open
  questions added for release tagging, plugin no-op behavior,
  toggle UI, and boot sample rate.

https://claude.ai/code/session_01EcVXzczA1TVkhEkgUg9DKX
Adds a stripped bench backend (`backend/index.bench.js` + bench RPC
server with echo / payload methods) and a sibling `apps/benchmark/`
app that drives it through the same RN→native→Node UDS path as
production, isolating the framing / IPC / RPC bridge from
@comapeo/core init noise.

Consumer isolation is enforced three ways:
- the bench bundle lands at sibling paths (`android/src/bench/assets/`,
  `ios/nodejs-project-bench/`) the production flavor / podspec don't
  reference;
- a new Android `bench` productFlavor + iOS `ENV['COMAPEO_BENCH']`
  podspec toggle is opt-in only;
- `package.json` files array negates both bench paths so they cannot
  leak via `npm publish`.

`apps/benchmark/` does not check in `android/` or `ios/` — the new
`with-comapeo-bench` Expo config plugin re-applies the variant /
env-var / Xcode rename build phase wiring on every `expo prebuild`.

Standalone-runnable: NDJSON sink + on-screen p50/p95/p99 work without
any host-side infrastructure. Optional HTTP toggle posts spans to the
bundled `bench-receiver.ts` for orchestrated BrowserStack runs.
Maestro flows (bench-rpc + per-payload-size variants) drive the bench
end-to-end.

See `docs/uds-rpc-bridge-benchmark-plan.md` for the full design.

https://claude.ai/code/session_01SC1Sc9AvULHQkQSoQ2SMzJ
Three fixes surfaced when running the bench app end-to-end on a
Pixel 7a API 29 emulator:

- **Replace Android productFlavor with a project property.** The
  `bench` / `production` flavor dimension on the lib triggered AGP /
  Gradle 9 strict variant ambiguity in consuming Expo apps that don't
  declare matching flavors of their own (apps/expo#18315 etc.):
  `missingDimensionStrategy` + `matchingFallbacks` weren't enough to
  disambiguate `benchDebugApiElements` vs. `productionDebugApiElements`.
  The lib now reads `rootProject.findProperty('comapeoBench')` and
  swaps `assets.srcDirs` with `=` (assignment, not `srcDirs '<...>'`
  which AGP treats as additive). Also empties `src/debug/assets` when
  bench is active so the production debug bundle doesn't overlay
  bench in debug builds. The `with-comapeo-bench` config plugin
  switches from `withAppBuildGradle` to `withGradleProperties` and
  writes `comapeoBench=true` into the consuming app's
  `android/gradle.properties`.

- **Pin Expo modules to SDK 55.** `expo-file-system@19.0.18` and
  `expo-sharing@14.0.7` (the latest npm versions) are SDK-incompatible
  with Expo 55 and crashed the JS app at launch with a
  `NoClassDefFoundError: FilePermissionModuleInterface` autolinking
  failure. `npx expo install` resolves them to `~55.0.17` /
  `~55.0.18` which match the rest of the SDK.

- **Add `bench-rpc-ios.yaml` Maestro flow.** The Android flow's
  `clearState: true` triggers a deep-link confirmation dialog on iOS
  that blocks the rest of the run. The iOS flow drops `clearState`
  and dismisses the dialog with a guarded `runFlow.when` block.

Validation results on Pixel 7a API 29 emulator (debug build, RN-thread
RTT in ms, 100 iterations after 10-iteration warmup):

  size  n    p50   p95   p99
  64B   100  1.65  2.56  7.34
  1KB   100  1.68  2.76  4.45
  64KB  100  2.48  4.70  6.29

iOS run blocked by a pre-existing lifecycle issue
(`AppLifecycleDelegate.applicationDidBecomeActive` doesn't fire under
scene-based app lifecycle, so `NodeJSService.start()` is never called)
— same code path the example app uses, so this is not a bench
regression. Tracked separately.

https://claude.ai/code/session_01SC1Sc9AvULHQkQSoQ2SMzJ
Contributor

Copilot AI left a comment

Pull request overview

Adds an isolated UDS/RPC bridge benchmarking setup (standalone Expo app + minimal bench backend bundle) plus host-side span collection, with build-time gating to prevent benchmark artefacts from leaking into normal consumer apps/packages.

Changes:

  • Exposes benchMessagePort from @comapeo/core-react-native and adds a bench-only backend entrypoint (backend/index.bench.js) with minimal RPC + span instrumentation helpers.
  • Introduces a standalone Expo benchmark app (apps/benchmark/) and Maestro flows to automate benchmark runs across payload sizes.
  • Adds build/packaging plumbing for a separate bench bundle output tree and a host-side HTTP receiver to collate NDJSON + CSV summaries.

Reviewed changes

Copilot reviewed 28 out of 34 changed files in this pull request and generated 9 comments.

File Description
src/index.ts Re-exports benchMessagePort from the module surface.
src/ComapeoCoreModule.ts Exposes the raw CoreMessagePort singleton as benchMessagePort.
scripts/lib/bench-receiver.ts Adds a localhost HTTP receiver that persists spans and rewrites a CSV summary.
scripts/build-backend.ts Adds --bench mode to build only the bench JS bundle into bench-specific output paths.
package.json Updates files allowlist to exclude bench output paths from publishing.
ios/ComapeoCore.podspec Adds ENV['COMAPEO_BENCH'] conditional resource selection for bench bundle.
e2e/.maestro/bench-rpc.yaml Maestro flow for the default benchmark sweep on Android.
e2e/.maestro/bench-rpc-ios.yaml iOS-specific Maestro flow variant (handles the “Open” dialog and avoids clearState).
e2e/.maestro/bench-payload-64KB.yaml Maestro flow for a 64KB-only payload run.
e2e/.maestro/bench-payload-64B.yaml Maestro flow for a 64B-only payload run.
e2e/.maestro/bench-payload-1MB.yaml Maestro flow for a 1MB-only payload run.
e2e/.maestro/bench-payload-1KB.yaml Maestro flow for a 1KB-only payload run.
docs/uds-rpc-bridge-benchmark-plan.md Adds a design/verification plan for the benchmark suite and consumer isolation.
backend/rollup.config.ts Adds BENCH=1 rollup mode and trims static assets for the bench bundle.
backend/lib/telemetry-sink.js Adds pluggable telemetry sinks and span helpers (startSpan).
backend/lib/boot-spans.js Adds startBootSpan helper with a fixed boot-phase taxonomy.
backend/lib/bench-rpc.js Adds a minimal bench RPC server (echo/payload) with payload caching and span emission.
backend/index.bench.js Adds bench-only node entrypoint reusing the lifecycle framing but skipping @comapeo/core.
apps/benchmark/tsconfig.json Bench app TS config + local path mapping to the working tree module source.
apps/benchmark/plugins/with-comapeo-bench/index.js Expo config plugin to opt an app into bench resources (Gradle property + Podfile env + Xcode rename script).
apps/benchmark/package.json Benchmark app package manifest + dependencies and run scripts.
apps/benchmark/metro.config.js Metro config (mirrors example) for monorepo-style dev and avoiding duplicate peers.
apps/benchmark/index.ts Bench app entrypoint registering root component.
apps/benchmark/babel.config.js Bench app Babel config.
apps/benchmark/assets/splash-icon.png Bench app splash asset.
apps/benchmark/assets/icon.png Bench app icon asset.
apps/benchmark/assets/favicon.png Bench app favicon asset.
apps/benchmark/assets/adaptive-icon.png Bench app adaptive icon asset.
apps/benchmark/app.json Bench app Expo config + plugin wiring.
apps/benchmark/App.tsx Bench UI + RPC client + NDJSON writing + optional POST-to-receiver flow.
apps/benchmark/.gitignore Ignores generated native folders and local Expo/Metro artifacts for the bench app.
android/build.gradle Adds comapeoBench property gate to swap module asset source dirs for bench bundle selection.
.gitignore Ignores bench bundle output dirs alongside the existing production bundle outputs.

Comment thread apps/benchmark/App.tsx Outdated
} from "@comapeo/core-react-native";
import { Directory, File, Paths } from "expo-file-system";
import * as Sharing from "expo-sharing";
import React, { useCallback, useEffect, useMemo, useRef, useState } from "react";
Copilot AI May 1, 2026

useMemo is imported but never used in this file. This will trip lint/TS unused import checks; please remove it or use it.

Suggested change
- import React, { useCallback, useEffect, useMemo, useRef, useState } from "react";
+ import React, { useCallback, useEffect, useRef, useState } from "react";

Copilot uses AI. Check for mistakes.
Comment thread apps/benchmark/App.tsx Outdated
Comment on lines +113 to +118
// Linear interpolation between closest ranks. For our sample sizes
// (~100), `Math.floor((n-1) * p)` is good enough and avoids the
// off-by-one trap of `Math.floor(n * p)` (which would index past the
// end at p=1).
const idx = Math.floor((sortedAsc.length - 1) * p);
return sortedAsc[idx]!;
Copilot AI May 1, 2026

percentile() claims linear interpolation, but the implementation returns a nearest-rank element (Math.floor((n-1)*p)). Either implement the stated interpolation (so p95/p99 match the documented method) or adjust the comment/docs to match the actual behavior.

Suggested change
- // Linear interpolation between closest ranks. For our sample sizes
- // (~100), `Math.floor((n-1) * p)` is good enough and avoids the
- // off-by-one trap of `Math.floor(n * p)` (which would index past the
- // end at p=1).
- const idx = Math.floor((sortedAsc.length - 1) * p);
- return sortedAsc[idx]!;
+ // Linear interpolation between closest ranks.
+ const position = (sortedAsc.length - 1) * p;
+ const lowerIdx = Math.floor(position);
+ const upperIdx = Math.ceil(position);
+ if (lowerIdx === upperIdx) return sortedAsc[lowerIdx]!;
+ const lower = sortedAsc[lowerIdx]!;
+ const upper = sortedAsc[upperIdx]!;
+ const weight = position - lowerIdx;
+ return lower + (upper - lower) * weight;

Comment thread apps/benchmark/App.tsx Outdated
Comment thread scripts/lib/bench-receiver.ts Outdated
Comment on lines +77 to +80
function percentile(sortedAsc: number[], p: number): number {
if (sortedAsc.length === 0) return Number.NaN;
return sortedAsc[Math.floor((sortedAsc.length - 1) * p)]!;
}
Copilot AI May 1, 2026

percentile() currently uses a nearest-rank lookup (Math.floor((n-1)*p)). The PR description/plan mention linear interpolation for p50/p95/p99; if that's the intended definition, this summary CSV will not match it. Either implement the intended interpolation here or document that the receiver uses nearest-rank percentiles.
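To make the difference between the two definitions concrete, here is a small self-contained comparison. The function names are hypothetical; the nearest-rank variant mirrors the reviewed snippet, and the interpolated variant follows the usual "linear interpolation between closest ranks" definition.

```typescript
// Nearest-rank lookup, as in the reviewed snippet.
function percentileNearestRank(sortedAsc: number[], p: number): number {
  if (sortedAsc.length === 0) return Number.NaN;
  return sortedAsc[Math.floor((sortedAsc.length - 1) * p)] as number;
}

// Linear interpolation between the two closest ranks.
function percentileInterpolated(sortedAsc: number[], p: number): number {
  if (sortedAsc.length === 0) return Number.NaN;
  const position = (sortedAsc.length - 1) * p;
  const lowerIdx = Math.floor(position);
  const upperIdx = Math.ceil(position);
  const lower = sortedAsc[lowerIdx] as number;
  const upper = sortedAsc[upperIdx] as number;
  // When position is an integer, lower === upper and the weight is 0.
  return lower + (upper - lower) * (position - lowerIdx);
}

const samples = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
console.log(percentileNearestRank(samples, 0.5)); // 5
console.log(percentileInterpolated(samples, 0.5)); // 5.5
```

The two agree only when `(n - 1) * p` lands exactly on an index; at other percentiles the nearest-rank variant returns the lower neighbor, so it never reports more than the interpolated value.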

Comment thread docs/uds-rpc-bridge-benchmark-plan.md Outdated
Comment on lines +98 to +104
- Android: the plugin uses `withAppBuildGradle` to append
`flavorDimensions += "comapeo"` and
`missingDimensionStrategy 'comapeo', 'bench'` to the bench app's
`android/app/build.gradle` `defaultConfig`. The module's own
`android/build.gradle` declares the `bench` flavor + sourceSet;
consumers that don't activate it (`apps/example/`, third-party
apps) get the default flavor and never see `src/bench/`.
Copilot AI May 1, 2026

This doc describes Android consumer isolation in terms of a bench productFlavor and missingDimensionStrategy, but the implementation in this PR uses a comapeoBench=true Gradle property (set via withGradleProperties) to swap assets.srcDirs instead. Please update this section so the plan matches what actually shipped (otherwise it's misleading for anyone following it).

Comment thread apps/benchmark/App.tsx Outdated
Comment on lines +23 to +26
* stripped `backend/index.bench.js` (via the `bench` Android
* productFlavor / `ENV['COMAPEO_BENCH']` iOS opt-in) — so timings
* isolate the framing / IPC / JSON-RPC bridge from `@comapeo/core` init
* noise. See `docs/uds-rpc-bridge-benchmark-plan.md`.
Copilot AI May 1, 2026

The header comment says the bench backend is selected via an Android bench productFlavor, but the actual wiring in this PR uses the comapeoBench=true Gradle property (see android/build.gradle). Please update the comment so it matches the implementation.

Suggested change
- * stripped `backend/index.bench.js` (via the `bench` Android
- * productFlavor / `ENV['COMAPEO_BENCH']` iOS opt-in) — so timings
- * isolate the framing / IPC / JSON-RPC bridge from `@comapeo/core` init
- * noise. See `docs/uds-rpc-bridge-benchmark-plan.md`.
+ * stripped `backend/index.bench.js` (via the `comapeoBench=true`
+ * Gradle property on Android / `ENV['COMAPEO_BENCH']` iOS opt-in)
+ * so timings isolate the framing / IPC / JSON-RPC bridge from
+ * `@comapeo/core` init noise. See
+ * `docs/uds-rpc-bridge-benchmark-plan.md`.

Comment thread apps/benchmark/App.tsx Outdated
* noise. See `docs/uds-rpc-bridge-benchmark-plan.md`.
*
* UI surface:
* - boot status (state observer): waits for "READY" before enabling
Copilot AI May 1, 2026

The UI comment says the app waits for "READY" before enabling the run button, but the code gates on serviceState === "STARTED" (and ComapeoState doesn't include "READY"). Please update the comment to avoid confusion about which state is required for RPC.

Suggested change
- * - boot status (state observer): waits for "READY" before enabling
+ * - boot status (state observer): waits for "STARTED" before enabling

Comment thread scripts/build-backend.ts Outdated
Comment on lines +46 to +49
// Bench bundle output. Lives under `src/bench/assets/` so AGP's
// per-flavor sourceSet merging picks it up only when the consuming app
// has activated the `bench` productFlavor (see android/build.gradle —
// `apps/benchmark/` activates this; `apps/example/` does not).
Copilot AI May 1, 2026

This comment says the bench Android assets are picked up via a bench productFlavor/sourceSet merge, but the module now switches assets via the comapeoBench Gradle property (see android/build.gradle). Please update the comment so it matches the current mechanism.

Comment thread backend/rollup.config.ts Outdated
Comment on lines +44 to +45
* - `android/src/bench/assets/nodejs-project/` (overlaid by the
* `bench` Android productFlavor — see android/build.gradle)
Copilot AI May 1, 2026

The bench-mode comment refers to the Android bench productFlavor for asset overlay, but Android selection is now controlled by the comapeoBench Gradle property (not flavors). Please update the comment to avoid sending readers to a mechanism that no longer exists.

Suggested change
- * - `android/src/bench/assets/nodejs-project/` (overlaid by the
- * `bench` Android productFlavor — see android/build.gradle)
+ * - `android/src/bench/assets/nodejs-project/` (selected by the
+ * Android build when the `comapeoBench` Gradle property is enabled;
+ * this is no longer controlled by an Android productFlavor)

claude and others added 2 commits May 1, 2026 17:10
BLOCKER (iOS rename ordering): the previous design added an Xcode Run
Script build phase via the config plugin's `withXcodeProject`, but
CocoaPods 1.x doesn't reliably position user script phases after
`[CP] Copy Pods Resources` — the rename ran before the bench files
were on disk and silently no-op'd, leaving bench builds with no
`<App>.app/nodejs-project/` and a non-bootable runtime. Switch to
pod-install-time staging in `ComapeoCore.podspec`: when COMAPEO_BENCH=1
the podspec stages a copy of `nodejs-project-bench/` to
`.bench-staging/nodejs-project/` and adds it to `s.resources`
ALONGSIDE the production `nodejs-project/`. CocoaPods rsyncs both into
`<App>.app/nodejs-project/` in declaration order, with the bench
overlay landing on top — no script phase, no ordering footgun.

MAJOR (iOS resource fallback): previous design REPLACED `nodejs-project`
with `nodejs-project-bench`, so any rename failure left the app
non-bootable. New shape ships both: bench overlays prod, but if the
bench bundle is missing (forgot to run `--bench`) the prod bundle
remains as fallback.

MAJOR (shutdown race): an in-flight `SocketMessagePort.postMessage`
landing in streamx's deferred microtask after the AF_UNIX socket has
been ended raises `ERR_STREAM_WRITE_AFTER_END` past every listener.
The race is benign (the message was already destined for a torn-down
peer). Add a state-check + underlying-socket error listener in
`message-port.js`, and a targeted `uncaughtException` /
`unhandledRejection` filter in `index.bench.js` that swallows the
specific code while a graceful shutdown is in progress. Smoke test
now exits 0 with all spans + responses recorded; previous run hit
`fatal during runtime` and exit 1.
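
The targeted filter described above can be sketched as a pure predicate. `isBenignShutdownError` is a hypothetical name and the `shuttingDown` flag is assumed to be tracked by the caller; only the error code comes from the commit message.

```typescript
// Decides whether an error raised during teardown is the known-benign
// write-after-end race. Anything else must still be treated as fatal.
function isBenignShutdownError(err: unknown, shuttingDown: boolean): boolean {
  // Outside a graceful shutdown the same code indicates a real bug.
  if (!shuttingDown) return false;
  if (typeof err !== "object" || err === null) return false;
  return (err as { code?: unknown }).code === "ERR_STREAM_WRITE_AFTER_END";
}
```

An `uncaughtException` or `unhandledRejection` handler could consult this predicate and return early when it is true, falling through to the existing fatal path otherwise. Keeping the check this narrow avoids masking unrelated errors that happen to occur during shutdown.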

Copilot review feedback addressed:
- App.tsx: drop unused `useMemo`; replace nearest-rank percentile
  with linear-interpolation (matches PR description); add 30s
  per-request timeout + pending-map cleanup so a lost frame doesn't
  hang the run; update stale "READY" comment to "STARTED".
- bench-receiver.ts: same linear-interpolation fix so on-device and
  host-side numbers agree.
- Stale productFlavor / withXcodeProject references in App.tsx,
  scripts/build-backend.ts, backend/rollup.config.ts, and the plan
  doc updated to describe the actual `comapeoBench` Gradle property
  + podspec staging mechanism.
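
The per-request timeout with pending-map cleanup mentioned above could be sketched as follows. `PendingRequests` is a hypothetical helper, not the App.tsx code; the 30 s default is the value stated in the list.

```typescript
type PendingEntry = {
  resolve: (value: unknown) => void;
  timer: ReturnType<typeof setTimeout>;
};

class PendingRequests {
  private readonly pending = new Map<number, PendingEntry>();
  private nextId = 1;
  private readonly timeoutMs: number;

  constructor(timeoutMs = 30000) {
    this.timeoutMs = timeoutMs;
  }

  // Registers a request; the promise settles on response or on timeout.
  create(): { id: number; promise: Promise<unknown> } {
    const id = this.nextId++;
    const promise = new Promise<unknown>((resolve, reject) => {
      const timer = setTimeout(() => {
        // Remove the entry so a lost frame cannot leak or hang the run.
        this.pending.delete(id);
        reject(new Error(`request ${id} timed out after ${this.timeoutMs} ms`));
      }, this.timeoutMs);
      this.pending.set(id, { resolve, timer });
    });
    return { id, promise };
  }

  // Called when a response frame arrives; unknown or expired ids are ignored.
  settle(id: number, result: unknown): boolean {
    const entry = this.pending.get(id);
    if (!entry) return false;
    clearTimeout(entry.timer);
    this.pending.delete(id);
    entry.resolve(result);
    return true;
  }

  get size(): number {
    return this.pending.size;
  }
}
```

A late response for a request that already timed out finds no entry and is dropped, which keeps the map bounded by the number of requests in flight.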

https://claude.ai/code/session_01SC1Sc9AvULHQkQSoQ2SMzJ
…rpc-bridge-1Zahz

* origin/main:
  fix(android): fold waitForFile into connect retry loop (#52)
@gmaclennan gmaclennan force-pushed the claude/benchmark-uds-rpc-bridge-1Zahz branch from 47024a2 to 5acd807 Compare May 5, 2026 10:56
@gmaclennan gmaclennan changed the base branch from claude/plan-sentry-integration-9dt0T to main May 5, 2026 10:59
gmaclennan and others added 19 commits May 5, 2026 12:45
Adds a generic config knob for consumers that ship their own backend
JS bundle: `comapeoBackendDir` Gradle property → BuildConfig field on
Android, `ComapeoBackendDir` Info.plist key on iOS. Default is
`nodejs-project` so behavior is unchanged for current consumers.

This unblocks moving bench-specific wiring out of the module: the bench
app can now ship its bundle in a sibling directory and just flip this
override, instead of relying on an in-module `comapeoBench=true` toggle
that swaps Android sourceSets and runs an iOS pod-install staging copy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Moves all bench-only backend source (`index.bench.js`, `bench-rpc.js`,
`boot-spans.js`, `telemetry-sink.js`) and its rollup config out of the
production module and into `apps/benchmark/backend/`. The bench bundle
is built from there with its own simplified rollup config: one ESM
output, no per-platform split, no native-addon banner (the bench code
imports no addons). Shared framing helpers (server-helper.js,
simple-rpc.js, message-port.js) stay in the module's `backend/lib/`
and are path-imported from the bench source so wire framing stays
bit-identical to production.

Rewrites `with-comapeo-bench` plugin against the new
`comapeoBackendDir` override hook: drops `comapeoBench=true` Gradle
toggle, drops `ENV['COMAPEO_BENCH']` Podfile mutation, drops the iOS
`.bench-staging` rsync trick. Now sets the override property/Info.plist
key and copies the bench bundle into the consumer app's own native
asset/resource trees (Android assets dir + iOS folder reference). Same
shape as `expo-asset`'s plugin, minus its file-extension allowlist and
flat-structure constraints which don't fit a JS bundle.

Strips `BENCH=1` mode from the module's rollup.config.ts and `--bench`
mode from scripts/build-backend.ts. Dead bench wiring still in the
module (`android/src/bench/`, `ios/nodejs-project-bench/`, podspec env
branch) is removed in the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
With the bench app moved to apps/benchmark/ and using the new
comapeoBackendDir override hook, the production module no longer
needs:

- comapeoBench Gradle property + conditional sourceSet swap in
  android/build.gradle (sourceSets revert to AGP defaults)
- ENV['COMAPEO_BENCH'] branch + .bench-staging rsync in
  ios/ComapeoCore.podspec (s.resources is just ['nodejs-project'])
- !android/src/bench/ and !ios/nodejs-project-bench/ exclusions in
  package.json files (those dirs no longer exist in the module)
- Bench-specific .gitignore entries

Also removes the (build-artifact, gitignored) android/src/bench/ and
ios/nodejs-project-bench/ directories, and updates two stale comments
in retained source files plus a header note in the planning doc
pointing at the v2 implementation in apps/benchmark/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The export was always misnamed: it isn't a benchmark-specific API,
it's the raw `MessagePort`-shaped escape hatch one level below the
`comapeo` client. Anything paired with a custom backend bundle (the
bench app being the canonical example) goes through this port.

`unstable_` matches React's `unstable_batchedUpdates` /
`unstable_setExceptionDecorator` convention — signals "may change
without notice" without burning the API on a name like
`INTERNAL_messagePort` that implies stronger guarantees about
internal-only access. Lowercase because it's an instance, not a class.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The earlier-in-branch edits parameterized `copyStaticAssetsPlugin` and
renamed `sharedInput → prodInput` to support a `BENCH=1` mode that was
since deleted. With the bench bundle owning its own rollup config in
apps/benchmark/, none of those scaffolding changes are needed —
restoring the file to main reduces the diff and keeps the production
config minimal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Sentry plan was committed here only because this branch was
originally cut off the sentry-plan tip (fd33ffc) so the bench design
could reference it during planning. Now that the bench refactor is
self-contained, the doc shouldn't ship via this PR — it'll land on
main from the dedicated Sentry branch instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`pbxProject.addResourceFile` unconditionally calls
`correctForResourcesPath`, which dereferences `pbxGroupByName('Resources')`
without a null check. Default Expo prebuild output for an Expo SDK 55
app has no top-level `Resources` group, so the call crashed with
`Cannot read properties of null (reading 'path')`.

Fix: call `IOSConfig.XcodeUtils.ensureGroupRecursively(project, 'Resources')`
before `addResourceFile`. The group itself has no `.path`, so the
prefix-strip in `correctForResourcesPath` is a no-op, and
`addToResourcesPbxGroup` correctly attaches the file ref under it.

Verified end-to-end on iPhone 16 sim (iOS 26.2) and Pixel 7a API 29
emulator: bench app reaches STARTED state, runs the bench RPC, and
renders 100-sample 64B p50/p95/p99 results.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The UDS / RPC bridge benchmark plan is now implemented as Phase 3
shipped, and the doc itself describes an earlier iteration (the
`comapeoBench=true` toggle and `ENV['COMAPEO_BENCH']` Podfile
mutation) that has since been refactored into the generic
`comapeoBackendDir` override. Keeping it would only mislead.

Refreshes App.tsx's header comment to reflect the current wiring and
points at the new `apps/benchmark/README.md` (added in the next
commit) instead of the deleted doc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the deleted plan doc with a focused per-app README covering
what the bench measures, how the override hook + plugin + bundle
wiring works end-to-end, run instructions for sims/emulators + Maestro
flows, and the sink/receiver model. Phase 4/5 status sections leave
hooks for the upcoming BrowserStack work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the Phase 4 plumbing for orchestrated BrowserStack runs:

- `scripts/bench-receiver.ts` — minimal localhost HTTP server (no
  deps, pure node:http). POST /spans appends each span to
  apps/benchmark/results/<runId>.ndjson; runId is path-traversal-
  guarded against the regex App.tsx generates. GET /health for
  tunnel verification.

- `scripts/run-on-browserstack.ts` — uploads APK / IPA via the
  Maestro v2 App Automate REST API, zips the bench-*.yaml flows
  under the `flows/` parent dir BrowserStack requires, uploads the
  test suite, triggers a build per platform (default device per
  platform configurable via flags), prints the dashboard URL.
  Auth and the bench-flows zip are deduplicated via custom_id so
  re-running with byte-identical artefacts is cheap. Lazy env
  resolution so `--help` and arg-validation errors don't require
  credentials.

- `e2e/.maestro/bench-rpc-receiver.yaml` — sibling of bench-rpc.yaml
  that flips the "POST spans" toggle before tapping run, so spans
  fire to localhost:8787 (reachable from BS devices via
  BrowserStackLocal). bench-rpc.yaml's stale comment about the
  removed `comapeoBench` flavor toggle is also refreshed here.

- `.env.example` + `.gitignore` updates: credentials live in `.env`
  (gitignored), receiver output in apps/benchmark/results/
  (gitignored), BrowserStackLocal's default log files
  (browserstack.{err,log}, local.log) gitignored too.

- npm scripts: `bench:receiver` and `bench:browserstack` for the
  per-run workflow documented in apps/benchmark/README.md.
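
The runId guard described for the receiver can be sketched as an allowlist check applied before the runId is joined into a filename. This is a minimal sketch, not the receiver's actual code: the pattern below is an assumption (the real one is whatever App.tsx generates), and `RUN_ID_PATTERN` / `resultsPathForRunId` are hypothetical names.

```typescript
// Hypothetical allowlist; the real pattern mirrors what App.tsx generates.
const RUN_ID_PATTERN = /^[A-Za-z0-9_-]{1,64}$/;

/** Returns the relative results path for a runId, or null if it is unsafe. */
function resultsPathForRunId(runId: string): string | null {
  // Dots and path separators fall outside the allowlist, so a value
  // like "../x" can never escape the results directory.
  if (!RUN_ID_PATTERN.test(runId)) return null;
  return `apps/benchmark/results/${runId}.ndjson`;
}
```

Validating against an allowlist, rather than stripping `..` after the fact, keeps the check independent of how the path is later resolved.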

Verified offline: receiver accepts valid spans, blocks path-traversal
runIds, rejects malformed JSON; runner --help and arg-validation
paths render without credentials. Online verification (real upload +
build trigger) blocked on BrowserStack account access.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`scripts/run-on-browserstack.ts` reads the project name from
`BENCH_BROWSERSTACK_PROJECT` (in `.env`) when --project isn't
passed. Required for org accounts where the access key can't create
new projects — they need to attach builds to an existing project
(verified end-to-end via `GET /app-automate/projects.json`).

`apps/benchmark/RESULTS.md` is the curated summary destination
agreed for Phase 4 results. Includes a template run section that
new runs copy from, plus column documentation. Raw NDJSON spans
remain gitignored under `apps/benchmark/results/`; a future
summarizer script can read those and rewrite a generated section
of this file.

Online dispatch verified up to the build trigger: app + test-suite
uploads succeed and return bs:// URLs; build trigger blocked on
BrowserStack org-side permissions ("You do not have the necessary
permissions to create builds in this project. Please contact your
organization admin.") — needs admin to grant build-creation rights
in the existing CoMapeo project.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The default code path is to send no `project` field so BrowserStack
auto-creates one from the uploaded app's bundle ID — which is what
we want. The env var is only relevant when a key can't auto-create
and a pre-existing project must be reused. Reword the example to
not imply CoMapeo (the org's existing app project) is the right
target for the bench.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two fixes to scripts/run-on-browserstack.ts found while smoke-testing
the dispatch end-to-end against the real BrowserStack API:

1. `execute` path doubled `flows/`. BrowserStack auto-prepends the
   zip's parent dir at extraction time, so `execute: ["flows/<flow>"]`
   resolved to `<extract>/flows/flows/<flow>` and dry-run logs
   reported "Flow path does not exist". Drop the prefix; BS prepends
   it.

2. `deviceLogs` and `networkLogs` are off by default per BS docs, and
   the only way to triage a failing build's app logcat is via the
   device log endpoint. Default them on for the bench workflow —
   retention is 60 days each, the bench is debug-oriented, and the
   dispatch is human-driven not bulk-CI.
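
The first fix amounts to normalising each flow path before it goes into the `execute` array. A minimal sketch, assuming the zip's parent directory is `flows`; `toExecutePath` is a hypothetical helper, not a function from the script.

```typescript
/** Strips the zip's parent dir so BrowserStack can prepend it itself. */
function toExecutePath(flowPath: string, zipParentDir: string = "flows"): string {
  const prefix = `${zipParentDir}/`;
  // Only strip a leading prefix; a path that is already relative
  // to the parent dir passes through unchanged.
  return flowPath.startsWith(prefix) ? flowPath.slice(prefix.length) : flowPath;
}
```

For example, `toExecutePath("flows/bench-rpc.yaml")` yields `bench-rpc.yaml`, which BrowserStack then resolves under the extracted `flows/` directory.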

Verified end-to-end: builds dispatched after these fixes correctly
target a single bench-*.yaml flow, and `device_log` URLs surface in
the per-test response so failed runs can be diagnosed without
guessing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a paired Android/iOS hook to skip the keystore-backed rootkey
load and ship a deterministic 16-zero-byte stub on the init frame
instead. Off by default; production consumers MUST leave it off so
real identity material stays encrypted at rest.

Why: `RootKeyStore.createOrLoadWrapperKey` (Android) sets
`setUnlockedDeviceRequired(true)` on the wrapper key, which since
Android 12 requires the device's user ECDH key to be initialised
(pinned to a real screen lock setup). BrowserStack's stock fleet
ships without a screen lock, so any real-device run of the bench app
on BS hit:

    KeyStoreException: System error (code 4)
      In handle_super_encryption_on_key_init: User ECDH key missing.

The bench backend doesn't construct a `MapeoManager` and never
reads the rootkey value — so swapping the keystore path for a
zeroed stub is safe by construction for the benchmark, while a real
production deploy stays on the keystore path and on a real device
that has the prerequisites.

Wiring matches the existing `comapeoBackendDir` shape:
- Android: gradle property `comapeoStubRootKey` →
  `BuildConfig.COMAPEO_STUB_ROOTKEY` boolean → branched in
  `NodeJSService.sendInitFrame`.
- iOS: `ComapeoStubRootKey` Info.plist boolean → branched in
  `AppLifecycleDelegate`'s `rootKeyProvider` closure.
- Bench plugin (`apps/benchmark/plugins/with-comapeo-bench/`) sets
  both. README documents it alongside `comapeoBackendDir`.

Verified end-to-end: bench app build with the stub flag passes the
full `bench-rpc.yaml` Maestro flow on BrowserStack Samsung Galaxy
S23 Ultra (Android 13) — same APK that previously failed at
`STARTING -> ERROR` from the keystore step now reaches STARTED.
Local emulator path also confirmed unaffected (bypasses keystore as
expected, no regression).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two fixes needed for end-to-end span aggregation on BrowserStack:

- `local: true` on the build trigger payload. Without it, the BS
  device's `localhost` resolves to its own loopback (not our host's
  via the BrowserStackLocal tunnel) and the receiver POSTs vanish.

- `usesCleartextTraffic="true"` on the bench app's release manifest
  (via `withAndroidManifest` in the bench plugin). Expo prebuild
  only sets this on debug variants; release variants on Android 14
  (targetSdk=36) silently block cleartext-to-localhost fetches by
  default. App.tsx's POST has `.catch(() => {})` so the failure was
  invisible — the bench would complete and assert results visible
  while every span POST quietly dropped.

Both confirmed by a clean run on Samsung Galaxy S23 Ultra
(Android 13): receiver collected 300 spans (3 sizes × 100 samples),
sub-ms p50 across small payloads, expected scaling at 64 KB.
RESULTS.md filled in with this first real run as the format
exemplar.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three threads of work in one go, plus the first 19-device sweep's
data committed:

- `comapeoBackendArgs` Gradle property → `BuildConfig.COMAPEO_BACKEND_ARGS`
  → appended to nodejs-mobile argv. Native loader also derives a
  `--device=<MANUFACTURER MODEL (Android REL)>` arg so the bench
  backend tags spans without an extra round-trip. Telemetry sink
  takes per-process defaults that lift `runId` to top-level (matching
  the receiver's wire format) and tuck device into `attrs.device`.
  Bench plugin sets `comapeoBackendArgs=--telemetry=http://localhost:8787/spans`.

- App.tsx attaches `attrs.device` to every RN-side rpc span via
  `Platform.constants` so the summarizer can group across the
  cross-device-runId noise.

- `scripts/run-on-browserstack.ts` accepts CSV via
  `--devices-android` / `--devices-ios` and submits the array in a
  single build.

- `scripts/bench-summarize.ts` reads NDJSON files and rewrites a
  marker-delimited section in `apps/benchmark/RESULTS.md`. Curated
  commentary outside the markers is preserved across re-runs.
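
The marker-delimited rewrite the summarizer performs can be sketched as a pure string transform. The marker strings and the function name here are assumptions for illustration; the real markers are defined in `scripts/bench-summarize.ts`.

```typescript
// Hypothetical marker strings; the real ones live in bench-summarize.ts.
const BEGIN_MARKER = "<!-- BENCH:BEGIN -->";
const END_MARKER = "<!-- BENCH:END -->";

/** Replaces everything between the markers, keeping the text outside them. */
function replaceGeneratedSection(doc: string, generated: string): string {
  const start = doc.indexOf(BEGIN_MARKER);
  const end = doc.indexOf(END_MARKER);
  if (start === -1 || end === -1 || end < start) {
    throw new Error("markers missing or out of order");
  }
  const head = doc.slice(0, start + BEGIN_MARKER.length); // up to and including BEGIN
  const tail = doc.slice(end); // from END onward
  return `${head}\n${generated}\n${tail}`;
}
```

Because only the span between the markers is replaced, re-running with the same generated text is idempotent and curated commentary outside the markers survives.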

First 19-device sweep landed: 14 non-Samsung non-Pixel Android
devices in the BS catalog + 3 Samsung + 2 Pixel, dispatched as two
batches (5 parallel + 5 queued cap). All 19 sessions passed.
RESULTS.md gains a variance-analysis section that walks through the
p99/p50 spread (typical 2–6× ratio) and roots it in scheduler
preemption, GC pauses, CPU frequency scaling, and tail interrupt
events. Boot-phase capture is a known gap — BS Local doesn't
appear to tunnel nodejs-mobile libuv socket traffic the way it
tunnels RN's fetch; documented inline for follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bench-*.yaml flows belong with the benchmark app, not in the module's
generic e2e/.maestro/ directory. Moves all five bench flows + the
iOS-flavoured variant into apps/benchmark/.maestro/, adds a minimal
config.yaml so Maestro CLI runs only the bench-* discoveries here, and
points the BS dispatch script's FLOWS_SRC_DIR at the new location.

Also drops bench-rpc-receiver.yaml — the receiver/tunnel transport is
about to be replaced by logcat-based reporting (next commit) where the
post-spans toggle is no longer relevant.

Module-level e2e flows (app-launch, ipc-roundtrip, state-transitions,
send-multiple-rounds, node-process-starts) stay in e2e/.maestro/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The receiver+tunnel approach worked for RPC spans but had a long tail
of fragile bits: BrowserStackLocal had to be running before dispatch,
the consumer app needed `usesCleartextTraffic="true"` (Expo only sets
that on debug variants), and nodejs-mobile's libuv sockets bypassed
the BS Local intercept entirely so boot spans never landed.

Logs sidestep all of it. BS captures Android logcat verbatim when the
build sets `deviceLogs: true`, with 60-day retention and a REST
endpoint to pull post-build. Both span sources (RN bridge,
nodejs-mobile) just `console.log("BENCH_SPAN " + JSON)` and BS picks
them up under their respective log tags.
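
On the host side, recovering spans from a pulled device log reduces to finding the `BENCH_SPAN ` prefix on each line and parsing what follows as JSON. A minimal sketch under that assumption; `parseBenchSpans` and `toNdjson` are hypothetical names, and skipping malformed lines is a design choice of this sketch rather than documented behaviour.

```typescript
const BENCH_SPAN_PREFIX = "BENCH_SPAN ";

/** Extracts span objects from raw device-log text, skipping malformed lines. */
function parseBenchSpans(log: string): unknown[] {
  const spans: unknown[] = [];
  for (const line of log.split(/\r?\n/)) {
    const at = line.indexOf(BENCH_SPAN_PREFIX);
    if (at === -1) continue; // not a span line
    try {
      spans.push(JSON.parse(line.slice(at + BENCH_SPAN_PREFIX.length)));
    } catch {
      // Truncated or interleaved log line: drop it rather than abort.
    }
  }
  return spans;
}

/** Serialises spans as NDJSON: one JSON object per line. */
function toNdjson(spans: unknown[]): string {
  return spans.map((span) => JSON.stringify(span)).join("\n");
}
```

Searching for the prefix anywhere in the line, rather than at column zero, tolerates whatever timestamp and tag the platform log puts in front of the message.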

Removed:
- `scripts/bench-receiver.ts` and its `bench:receiver` npm script
- POST-spans toggle + receiver URL TextInput + RECEIVER_DEFAULT_URL
  in App.tsx (no longer relevant on the device)
- `withAndroidManifest`-driven `usesCleartextTraffic` in the bench
  plugin (was only there for cleartext-localhost POSTs)
- `comapeoBackendArgs=--telemetry=http://localhost:8787/spans` from
  the plugin (LogSink is now the default; comapeoBackendArgs stays
  as an empty escape hatch for non-default sinks)
- BrowserStackLocal default log file gitignores (no longer needed in
  the bench workflow)

Added:
- `LogSink` in apps/benchmark/backend/lib/telemetry-sink.js — writes
  one stdout line per span with the `BENCH_SPAN ` prefix and the
  same `mergeDefaults` field-lifting (runId / device) the other
  sinks already had.
- `LogSink` is now the default returned by `createSinkFromArg` when
  no `--telemetry=` arg is passed.
- The `--device=<MANUFACTURER MODEL (Android REL)>` arg is always
  appended to the nodejs-mobile argv (was conditional on
  `comapeoBackendArgs` being non-empty); the production backend
  ignores the unknown flag.
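
The field-lifting that `LogSink` shares with the other sinks can be sketched as follows. This is an illustrative TypeScript sketch, not the contents of `telemetry-sink.js`: the `Span` and `SinkDefaults` shapes and `formatLogLine` are assumptions, and only the `BENCH_SPAN ` prefix, the top-level `runId`, and `attrs.device` come from the description above.

```typescript
type Span = Record<string, unknown> & { attrs?: Record<string, unknown> };

interface SinkDefaults {
  runId?: string;
  device?: string;
}

/** Lifts runId to the top level and tucks device into attrs.device. */
function mergeDefaults(span: Span, defaults: SinkDefaults): Span {
  return {
    ...span,
    // A runId already on the span wins over the per-process default.
    runId: span.runId ?? defaults.runId,
    // Span-level attrs override the default device tag.
    attrs: { device: defaults.device, ...span.attrs },
  };
}

/** Formats the single stdout line a log-based sink would write per span. */
function formatLogLine(span: Span, defaults: SinkDefaults): string {
  return `BENCH_SPAN ${JSON.stringify(mergeDefaults(span, defaults))}`;
}
```

A sink built on this would pass each formatted line to `console.log`, which is what lets logcat (and the iOS device console) act as the transport.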

The dispatch script's log-pull + parsing change comes in the next
commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…default

Five distinct improvements packed into one runner rewrite:

1. **Maestro version pin**. Adds `maestroVersion: "2.0.7"` to the
   build trigger. BS supports `latest` / `2.0.7` / `1.39.13` (the
   default 1.39.13 is older). 2.0.7 has the runner-side `http`
   client and perf fixes; pinning rather than `latest` avoids
   surprise version bumps.

2. **Auto-batch via plan capacity**. Reads
   `/app-automate/plan.json` for `parallel_sessions_max_allowed +
   queued_sessions_max_allowed` and chunks `--devices-android`
   into batches that fit. Was: human had to hand-split a list of
   19 into two dispatches.

3. **Log-pull and parse**. After each batch reaches a terminal
   status, walks `/builds/<id>/sessions/<sid>` for per-test
   `device_log` URLs, fetches each, greps `BENCH_SPAN ` lines, and
   writes one NDJSON file per device under
   `apps/benchmark/results/<device-slug>-<session-prefix>.ndjson`.
   Replaces the receiver+tunnel transport entirely.

4. **Test R&A organization**. Switches `buildName` (heuristic-
   stripped) for `customBuildName` (static, default
   `comapeo-bench`) plus `buildIdentifier` (per-run, default ISO
   timestamp). Optional `--build-tag` for free-form filtering on
   the dashboard. Both flags are exposed via CLI.

5. **10-device curated default**. `CURATED_ANDROID_DEVICES` spans
   Android 9–16 across 6 brands and the variance spectrum (S26
   Ultra → P30); fits one BS 5+5 plan dispatch. Was: single
   `Samsung Galaxy S23 Ultra-13.0` default. Override singly with
   `--device-android` or via CSV with `--devices-android`.
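
The auto-batching in item 2 is a fixed-size chunking of the device list, with the chunk size taken from the plan's parallel and queued limits. A minimal sketch; `batchDevices` is a hypothetical name, and the two limits are passed in directly rather than read from the plan endpoint.

```typescript
/** Splits devices into batches no larger than parallel + queued capacity. */
function batchDevices(
  devices: string[],
  parallelMax: number,
  queuedMax: number,
): string[][] {
  const capacity = parallelMax + queuedMax;
  if (capacity <= 0) throw new Error("plan has no session capacity");
  const batches: string[][] = [];
  for (let i = 0; i < devices.length; i += capacity) {
    batches.push(devices.slice(i, i + capacity));
  }
  return batches;
}
```

With a 5 + 5 plan, a 19-device list splits into batches of 10 and 9, and the 10-device curated default fits in a single dispatch.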

Also drops `local: true` (no tunnel needed for log-based
transport) and tightens the polling loop's status reporting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gmaclennan and others added 3 commits May 6, 2026 11:34
README: drops the receiver/tunnel workflow, shows the logcat path,
points at the new per-build NDJSON output and curated 10-device
default. Maestro-flow paths updated to apps/benchmark/.maestro/.
Phase 4 marked complete.

RESULTS: replaces the "boot-phase wiring caveat" with a "resolved"
note pointing at the new logcat-based transport. Phase 4 boot spans
now flow.

Skill at ~/.claude/skills/browserstack-app-automate-maestro/SKILL.md
was rewritten in tandem (not committed here — it lives outside the
repo). The skill puts logs first as the recommended transport and
keeps BS Local + Maestro runScript as documented fallbacks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…N device tag

Two-part iOS parity work for the logcat-based span transport.

1. Pipe nodejs-mobile stdout/stderr to `os_log`. On Android the
   libnode build pipes Node's stdio into logcat so `console.log`
   from the rolled-up backend lands under the `Comapeo:NodeJS`
   tag automatically. On iOS no equivalent piping exists — by
   default the writes hit fds inherited from the parent process
   (/dev/null on a release build, Xcode console on debug),
   neither of which the iOS unified log subsystem captures.

   `NodeMobileBridge.mm` now sets up a one-shot pipe via
   `pthread_once`: dup2 stdout/stderr onto the write end of a
   pipe, spawn a detached pthread that reads line-by-line and
   forwards each line to `os_log` under `com.comapeo.nodejs:stdout`.
   Process-wide redirect, so RN's console.log lands in the same
   subsystem too — handy for the bench app, which has BENCH_SPAN
   emitters on both sides.

2. Fix RN-side device tag derivation on iOS. App.tsx was reading
   `Platform.constants.systemVersion` (doesn't exist) and
   `.model` (doesn't exist), producing `"Apple device (iOS ?)"`.
   The right keys on iOS are `osVersion` and `interfaceIdiom`
   ("phone" / "pad" / "tv"). RN-side now produces
   `"Apple iPhone (iOS 26.2)"` to exactly match the backend tag
   that NodeJSService.swift derives from `UIDevice.current.model`
   + `systemName` + `systemVersion`. The summarizer's group-by-
   `attrs.device` is reliable as a result.
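
The RN-side tag derivation in item 2 can be sketched as a pure function over the iOS `Platform.constants` fields named above. The idiom-to-model mapping and the `device` / `?` fallbacks are assumptions of this sketch; `IosConstants` and `iosDeviceTag` are hypothetical names, and the real code in App.tsx may differ.

```typescript
// Subset of the iOS Platform.constants fields the tag depends on.
interface IosConstants {
  osVersion?: string;
  interfaceIdiom?: string; // "phone" / "pad" / "tv"
}

/** Builds a tag shaped like the backend's, e.g. "Apple iPhone (iOS 26.2)". */
function iosDeviceTag(constants: IosConstants): string {
  // Assumed mapping from interface idiom to a model name.
  const model =
    constants.interfaceIdiom === "phone"
      ? "iPhone"
      : constants.interfaceIdiom === "pad"
        ? "iPad"
        : "device";
  return `Apple ${model} (iOS ${constants.osVersion ?? "?"})`;
}
```

Keeping the derivation pure makes it straightforward to check that both emitters produce the same string for a given device.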

Also: NodeJSService.swift now appends `--device=<tag>` and the
optional `ComapeoBackendArgs` Info.plist value to the nodejs-mobile
argv (mirrors the Android `comapeoBackendArgs` Gradle property).
The bench plugin sets the Info.plist key to empty by default;
override it per-build to pass e.g. `--telemetry=file:/tmp/x.ndjson`.

Verified end-to-end on iPhone 16 simulator (iOS 26.2) running
`bench-rpc-ios.yaml`: 3 boot spans + 330 RPC spans land in
`com.comapeo.nodejs:stdout` with consistent device tagging across
both emitters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`apps/benchmark/scripts/build-ipa.sh` (exposed as `npm run
ios:archive`) wraps `xcodebuild archive` + `-exportArchive` with a
Development export method, the team id read from
`APPLE_DEVELOPMENT_TEAM_ID` in `.env`, and a generated `ExportOptions.plist`
so no per-developer plist needs to be checked in.

The path BrowserStack accepts:

- BS auto-resigns iOS apps on upload, replacing the consumer
  provisioning profile with theirs. Distribution / App Store
  signing isn't required — Development export works.
- The bundle id (`com.comapeo.core.benchmark`) only needs to exist
  as an Identifier under the developer team. No Capabilities are
  needed (the bench app sets `comapeoStubRootKey: true` so it
  doesn't touch Keychain). No App Store Connect record needed.
- `xcodebuild` runs with `-allowProvisioningUpdates` so Xcode can
  auto-create the Development cert + provisioning profile on the
  first archive without forcing manual portal setup.

Output lands at `apps/benchmark/ios-build/ipa/<scheme>.ipa`,
gitignored. Consumed by `npm run bench:browserstack -- --app-ios`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@socket-security

socket-security Bot commented May 6, 2026

gmaclennan and others added 8 commits May 6, 2026 14:35
Discovered during the first cross-platform BS dispatch that iOS
release builds suppress JS `console.log` (RCTLog level filter
defaults to WARN), so RN-side spans never reached the device log
even though the `pipe + dup2 → os_log` redirect was capturing
nodejs-mobile output cleanly. The backend boot phases came through;
the RTT samples didn't.

Three coupled changes:

1. `bench-rpc.js`: new `ingestSpans` RPC method that takes
   `{spans: [...]}` and re-emits each via `console.log` (which on
   the backend side IS captured — Android logcat directly, iOS via
   the bridge's pipe redirect). Single batched call after the bench
   loop completes, so the round-trip cost doesn't pollute the RTT
   samples.

2. `App.tsx`: dropped the per-iteration `console.log("BENCH_SPAN ...")`
   in favour of `client.request("ingestSpans", { spans: allSpans })`
   after measurement is done. Span data still gets serialised to
   the on-device NDJSON file as before.

3. `bench-summarize.ts`: filter on `attrs.rttSide === "rn"` for the
   RPC throughput table. Without this, `op:"rpc"` spans from the
   backend's per-handler tracing (sub-ms by design, mostly bench-
   rpc.js internal) get aggregated together with the user-facing
   RN-thread RTT samples, pulling p50 toward zero.
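
The effect of the `rttSide` filter in item 3 can be illustrated with a small percentile helper. This is a sketch, not the summarizer's code: `durationMs` is an assumed field name, the nearest-rank percentile method is one common choice and may not be the one the script uses, and `percentile` / `rnRttStats` are hypothetical names.

```typescript
interface BenchSpan {
  op: string;
  durationMs: number; // assumed field name for the span's duration
  attrs?: { rttSide?: string };
}

/** Nearest-rank percentile over an ascending-sorted array. */
function percentile(sortedAsc: number[], p: number): number {
  if (sortedAsc.length === 0) return NaN;
  const rank = Math.ceil((p * sortedAsc.length) / 100);
  const index = Math.min(sortedAsc.length - 1, Math.max(0, rank - 1));
  return sortedAsc[index] ?? NaN;
}

/** p50/p95/p99 over RN-side RPC spans only; backend handler spans are excluded. */
function rnRttStats(spans: BenchSpan[]): { p50: number; p95: number; p99: number } {
  const rtts = spans
    .filter((span) => span.op === "rpc" && span.attrs?.rttSide === "rn")
    .map((span) => span.durationMs)
    .sort((a, b) => a - b);
  return {
    p50: percentile(rtts, 50),
    p95: percentile(rtts, 95),
    p99: percentile(rtts, 99),
  };
}
```

If the backend's sub-millisecond handler spans were left in the same sample set, they would occupy the low end of the sorted array and drag the reported p50 down toward the backend values.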

Verified end-to-end: dispatched the 10-device Android sweep + 1 iOS
device. All 11 sessions passed, 6542 spans collected, autosummary
table now shows realistic numbers across all devices including
iPhone (iOS 17.3 → 64B p50=0.19ms p99=0.75ms). RESULTS.md gains a
new run entry referencing the two BS build URLs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comments left over from the receiver+tunnel transport before the logcat
pivot. Updates: bench-rpc.yaml drops the "sibling bench-rpc-receiver.yaml"
paragraph; bench-summarize.ts header points at the real upstream
(run-on-browserstack.ts logcat parser, not the deleted bench-receiver);
android/build.gradle's comapeoBackendArgs doc no longer claims the
bench wires HttpSink there; RESULTS.md template uses the real flow name.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three pieces orphaned by the receiver+tunnel → logcat pivot:

- HttpSink + the http(s):// branch in createSinkFromArg: nothing
  constructs it now that the bench plugin sets comapeoBackendArgs
  empty and LogSink is the default.
- scripts/lib/bench-receiver.ts: no consumer; spans flow via logcat
  pulled by run-on-browserstack.ts.
- apps/benchmark/.maestro/bench-rpc-ios.yaml: bench-rpc.yaml works
  for both platforms (the recent 11-device sweep included an iPhone
  that passed clean), so the iOS-specific variant is dead weight.

README repo-layout table and architecture paragraph updated to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Backend handler spans were being emitted, captured into NDJSON, then
silently dropped by the summarizer's rttSide-rn filter — pure noise in
the logcat budget. Now they carry attrs.rttSide:"backend" and the
summarizer renders a second table beneath the RN-side one. The diff
against the RN row is approximately the JSI + framing + UDS overhead,
which is the most actionable diagnostic when a regression appears.

Also drops the span emit for the `ingestSpans` housekeeping RPC, whose
body is the bulk span flush itself (one big outlier per run is not
useful percentile data), and narrows the BootPhase typedef to the
three server-side phases the bench actually emits — the three native
phases the prior typedef listed are out of scope for this process.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pipe + dup2 → os_log redirect was running for every consumer of
this module, not just the bench app. Two production concerns: the
%{public}s formatter on the os_log call deliberately defeats the
unified log's PII redaction (any future identity-bearing log line
would land in the device's persistent log, retrievable via sysdiagnose),
and the always-on reader pthread is overhead production apps don't
otherwise pay.

Now opt-in via the Info.plist BOOL `ComapeoStdoutToOsLog`. Production
consumers leave it unset and inherit iOS's default stdout routing.
The bench app's `with-comapeo-bench` config plugin sets it true so
BrowserStack can pull `BENCH_SPAN <json>` lines out of the device
console as before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step-by-step plan for a workflow_dispatch-only GitHub Actions workflow
that builds the bench APK + IPA, dispatches to BrowserStack, pulls
device logs, and uploads NDJSON + RESULTS.md as workflow artefacts.
Defers regression detection, automated triggers, and PR comments to
later iterations — this slice is just the manual pipeline.

Covers the iOS keychain bootstrap (the only meaningfully new piece
beyond what already runs locally), the secrets surface, and an
implementation order.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reduces the production-code touch points exposed for non-production
consumers (the bench app being the only one) down to a single override
on each platform plus the existing nodejs-mobile stdout-redirect gate.

- Drop `comapeoBackendArgs` (Gradle property + BuildConfig field +
  Kotlin parsing on Android; Info.plist key + Swift parsing on iOS).
  Was speculative surface for future telemetry-sink overrides; nothing
  in this PR populates it. The `--device=<tag>` argv injection the
  native loader does unconditionally is unaffected — production
  backend ignores unknown flags and Sentry tagging will read it.

- Rename `comapeoBackendDir` → `comapeoEntryFile`. Override is now a
  filename inside `nodejs-project/` rather than a sibling directory.
  Bench plugin drops the bench entry into the consumer's
  `nodejs-project/` and lets AGP's asset merge (Android) / a
  Run Script Phase (iOS) co-locate it with the production bundle's
  `index.mjs`. Bench bundle's rollup output is renamed to
  `index.bench.mjs` and no longer ships a `package.json` (the
  production bundle's already does, in the same directory).

- Drop `comapeoStubRootKey` end-to-end now that #57 (drop
  setUnlockedDeviceRequired from rootkey wrapper key) has landed on
  main. The stub existed only to work around BrowserStack stock
  no-screen-lock devices failing key generation; the real keystore
  path now succeeds for them, the bench backend's relaxed init
  handler ignores the rootkey bytes it receives, and the production
  branch in the FGS loader simplifies back to a single
  RootKeyStore.loadOrInitialize() call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trim verbose explanations down to non-obvious why-only. Cut:
- Restatements of what the code does (the code is right there).
- Multi-paragraph rationale better suited to PR description / README.
- ASCII state-machine diagrams and historical narration.

Keep load-bearing rationale: hidden constraints (libuv contiguous
argv, AGP asset merge, RCTLog level filter, undici WASM init,
streamx-microtask write-after-end race), security gates (`%{public}s`
defeats os_log redaction), and protocol invariants (`stopping`-before-
close so native distinguishes graceful from crash).

Net: -473 lines, no behavior change. Bench bundle still builds; plugin
still loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

3 participants