cross platform dev? #1531

Draft
daniel-noland wants to merge 23 commits into main from pr/daniel-noland/cross-platform

Conversation

@daniel-noland
Collaborator

trivial idea struck me and I'm just testing it

daniel-noland changed the base branch from main to pr/daniel-noland/everything-is-wasm May 12, 2026 19:20
daniel-noland added the ci:+cross (run cross compile jobs) and ci:+cross/full labels May 13, 2026
Base automatically changed from pr/daniel-noland/everything-is-wasm to main May 13, 2026 09:03

Lift the two embedded lookup tables out of nix/platforms.nix into
sibling files:

  - nix/hardware.nix carries per-hardware SOC tuning (arch, march,
    mcpu, NUMA hints, stdenv override) for x86-64-v3/v4, zen3/4/5,
    aarch64, bluefield2/3, and wasm32-wasip1.

  - nix/triples.nix is a 3-level lookup keyed by (arch, kernel, libc)
    producing the matching rust target triple plus the nix-internal
    machine/nixarch tags.

platforms.nix becomes a thin assembler: it composes a hardware record
with its triple info and the existing DPDK name override.  Behaviour
for single-platform callers is unchanged; the split lets other
consumers (e.g. an all-platforms cross-target enumeration) reuse the
maps without paying for the parameterised composition.

Adds an "every target triple" build mode.  Passing platform=all (via
justfile or directly to nix) flips two paths:

  - default.nix gains an extra-platforms list: one platforms.nix
    record per (arch, kernel, libc) combination from triples.nix,
    using a canonical-hardware-per-arch lookup (x86-64-v3 for x86_64,
    aarch64 for aarch64, wasm32-wasip1 for wasm32).  SOC-specific
    march tuning is intentionally not enumerated -- the rust target
    triple is arch-determined, so the SOC variants would duplicate
    target rows.  To exercise SOC tuning, call platform=<name>
    directly.

  - The llvm overlay reads extra-platforms.*.info.target and extends
    the rust-toolchain's target list with each entry.  Result: with
    platform=all the toolchain in the dev shell / build environment
    knows about all five triples (x86_64-{gnu,musl}, aarch64-{gnu,musl},
    wasm32-wasip1), so cargo can cross-compile to any of them in one
    shot.
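
The canonical-hardware-per-arch lookup described above can be modelled
outside nix as a rough sketch.  This is not the default.nix code: the
function names are hypothetical, and the target names are the
abbreviated forms used in this message rather than full rust triples.

```shell
# Hypothetical model of the canonical-hardware-per-arch lookup.
canonical_hardware() {
  case "$1" in
    x86_64)  echo x86-64-v3 ;;
    aarch64) echo aarch64 ;;
    wasm32)  echo wasm32-wasip1 ;;
    *)       return 1 ;;   # unknown arch: no canonical hardware
  esac
}

# One row per target: "<abbreviated target> <canonical hardware>".
extra_platforms() {
  for target in x86_64-gnu x86_64-musl aarch64-gnu aarch64-musl wasm32-wasip1; do
    arch="${target%%-*}"   # text before the first '-' is the arch
    printf '%s %s\n' "$target" "$(canonical_hardware "$arch")"
  done
}

extra_platforms
```

Each SOC variant (zen4, bluefield3, ...) would map to a row already
present, which is why the enumeration keeps one hardware per arch.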

The single-platform path is unchanged: when platform is set to a
concrete hardware name, extra-platforms is the empty list and the
toolchain targets are exactly what they were before.

Justfile gains an all-platforms toggle (default false) that's plumbed
through --argstr to the relevant recipes (build, setup-roots, shell)
as a no-op for now; nix's CLI is lenient about unknown argstr args, so
this is forward-compatible for a future full-SOC matrix toggle without
breaking current invocations.

Cargo derivations rebuild on every source revision: their inputs
include the workspace src closure, so a cache hit across revisions is
impossible.  Pushing them anyway is net-negative -- upload cost with
no chance of a substitution hit, plus storage bloat on cachix.  C-level
builds (DPDK, hwloc, libbsd, FRR, rust toolchain) remain cached as before.

Two halves:

  - default.nix tags every cargo derivation, dataplane.tar, and the
    cargo-bearing OCI containers (dataplane, dataplane-debugger,
    frr.dataplane) with allowSubstitutes=false + preferLocalBuild=true
    so cold runners don't waste a roundtrip asking cachix for a path
    cachix could not possibly have.  The cargo derivations also get a
    distinctive name = "dataplane-cargo-..." so the push filter can
    catch them with one regex.  Containers can't take these attrs as
    buildLayeredImage args, so the override goes through .overrideAttrs.

  - .github/actions/nix-shell/action.yml adds a pushFilter regex on
    the cachix-action step.  Two alternation arms: `-dataplane-cargo-`
    for the renamed cargo derivations + tar, and
    `-(dataplane|debugger|frr)\.tar\.gz` for the cargo-bearing
    containers (dockerTools.buildLayeredImage strips the registry
    prefix from the OCI name when computing the store-path basename).

frr.host and debug-tools are unchanged: they bundle no cargo outputs
and continue to benefit from caching.
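
A quick way to sanity-check the two alternation arms is to run them
against a few store-path basenames.  The sketch below is illustrative
only: the store paths are made up (real ones carry a 32-character
hash), the helper name is hypothetical, and it assumes the filter is
applied as an unanchored extended regex, which is how `grep -E`
behaves.

```shell
# Regex assembled from the two alternation arms described above.
push_filter='-dataplane-cargo-|-(dataplane|debugger|frr)\.tar\.gz'

# is_excluded PATH: succeeds when PATH matches the filter,
# i.e. when the path would be kept out of the cache push.
is_excluded() {
  printf '%s\n' "$1" | grep -Eq -e "$push_filter"
}

# Hypothetical store paths, for illustration only.
for p in \
  /nix/store/aaaa-dataplane-cargo-build \
  /nix/store/bbbb-dataplane.tar.gz \
  /nix/store/cccc-hwloc-2.0
do
  if is_excluded "$p"; then
    echo "skip $p"
  else
    echo "push $p"
  fi
done
```

The `-e` flag matters here: the pattern starts with `-`, so without it
grep would try to parse the regex as an option.
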

Adds a `cross-matrix` attribute at the top of default.nix with three
presets that the GH Actions workflow can consume via
`nix eval --json -f default.nix 'cross-matrix.<preset>'`:

  - default: aarch64 + bluefield3 × gnu + musl (4 entries).
    Matches today's hardcoded cross-job matrix; today's PR/push CI
    behaviour is unchanged once the workflow side is wired up.

  - full: every entry in hardware.nix × every valid libc for that
    arch in triples.nix (16 entries with the current data files).
    Wasm32 is intentionally omitted -- the `wasm` CI job covers it
    and the build path is materially different.

  - skip: empty list, for explicit opt-out from a workflow_dispatch
    run that doesn't want any cross legs.

Each entry is `{ platform, libc }` -- kernel is implied by arch and
the workflow side adds the recipe fan-out (build-container dataplane,
build-container frr.dataplane).

`full` is computed by joining hardware.nix with triples.nix at eval
time rather than enumerated by hand, so adding a new arch/hardware/libc
in those files automatically picks it up in the rare full-suite CI
run.
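
The 16-entry figure for `full` can be reproduced by hand.  The sketch
below is not the nix join itself; it hard-codes the hardware names
listed for hardware.nix earlier in this PR and assumes every non-wasm
arch has exactly two valid libcs (gnu and musl).  The function name is
hypothetical.

```shell
# Hypothetical model of the `full` preset: hardware x valid libc,
# with wasm32 left out.
full_matrix() {
  for hw in x86-64-v3 x86-64-v4 zen3 zen4 zen5 \
            aarch64 bluefield2 bluefield3 wasm32-wasip1
  do
    case "$hw" in
      wasm32-*) continue ;;   # covered by the separate wasm CI job
    esac
    for libc in gnu musl; do   # assumed valid libcs per non-wasm arch
      printf '%s %s\n' "$hw" "$libc"
    done
  done
}

full_matrix | wc -l   # 8 hardware entries x 2 libcs = 16 rows
```
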

Replaces the hand-rolled `cross` matrix (aarch64 + bluefield3 × gnu +
musl × two recipes) with a computed matrix sourced from
`cross-matrix.<scope>` in default.nix.  Three scopes are now selectable:

  - default: today's behaviour (4 platform/libc pairs).
  - full:    every hardware × valid-libc in hardware.nix / triples.nix
             (16 platform/libc pairs).
  - skip:    no cross legs at all.

Scope is resolved by event:

  - workflow_dispatch: `cross_scope` input (default/full/skip), default
    is "default".  Surfaces in the manual-run UI so a full audit run
    is a button press.
  - pull_request with `ci:+full-cross` label: full.
  - pull_request with `ci:+cross` label (without the full one): default.
    Matches today's PR behaviour.
  - pull_request without either label: skip.  Matches today's PR
    behaviour (cross only ran when ci:+cross was set).
  - push, merge_group: default.  Matches today's behaviour.
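
The event-to-scope rules above amount to a small decision function.
The following is a sketch of that logic, not the workflow's actual
step: `resolve_scope` and `has_label` are hypothetical names, labels
are assumed to arrive as a newline-separated list, and the handling of
any other event type is an assumption.  The full-scope label name is
kept in a variable because this PR spells it both `ci:+full-cross` and
`ci:+cross/full`.

```shell
FULL_LABEL='ci:+full-cross'   # spelling taken from the list above
CROSS_LABEL='ci:+cross'

# has_label LABELS NAME: exact match against a newline-separated list.
has_label() {
  printf '%s\n' "$1" | grep -Fxq -e "$2"
}

# resolve_scope EVENT LABELS [DISPATCH_INPUT]
resolve_scope() {
  rs_event="$1"
  rs_labels="${2:-}"
  rs_input="${3:-default}"   # cross_scope input defaults to "default"
  case "$rs_event" in
    workflow_dispatch)
      printf '%s\n' "$rs_input" ;;
    pull_request)
      if has_label "$rs_labels" "$FULL_LABEL"; then
        echo full
      elif has_label "$rs_labels" "$CROSS_LABEL"; then
        echo default
      else
        echo skip
      fi ;;
    push|merge_group)
      echo default ;;
    *)
      # Assumption: anything else is treated as an error.
      echo "unhandled event: $rs_event" >&2
      return 1 ;;
  esac
}

resolve_scope pull_request "ci:+cross"
```

Checking the full label first is what makes a PR carrying both labels
resolve to `full`.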

Mechanics:

  - New `cross_matrix` job runs `nix eval --json -f default.nix
    'cross-matrix.<scope>'`, then expands each (platform, libc) row with
    `jq` into one matrix entry per recipe variant (build-container
    dataplane, build-container frr.dataplane).  Output JSON feeds
    `cross.strategy.matrix.include` via fromJSON.
  - `cross_matrix` also outputs `has_legs`; the `cross` job gates on
    `has_legs == 'true'` so the empty-matrix case (skip / unlabeled
    PR) skips cleanly rather than failing a zero-leg job.
  - `summary` picks up `cross_matrix` as a dependency so a broken
    matrix-resolution step surfaces as a CI error rather than
    silently dropping every cross leg.

The cross job remains build-only (`just build-container`); no runtime
execution, so the expanded matrix exercises more cross-compile/link
combinations but no additional qemu-user paths.

fix(cargo): use explicit triples for the host-arch test runners

Replaces the cfg-pattern runner entries added in 1f74b46 with
explicit target-triple entries.  Reason: `cargo miri` injects
`--config target.cfg(all()).runner=cargo-miri-wrapped` on every
invocation, and cargo refuses to disambiguate between two cfg-pattern
matches, so `just miri` errored with "several matching instances of
target.'cfg(..)'.runner".  Cargo treats explicit-triple entries as
more specific than cfg(all()) and selects them without competing, so
the conflict disappears.

Also adds the matching -musl variants for x86_64 and aarch64 so the
qemu-user wrapper handles both libc flavours when running tests on a
non-matching host (the existing wrapper script is libc-agnostic).

The default `just miri` target is powerpc64-unknown-linux-gnu, which
matches none of the explicit triples below; only cargo-miri's
cfg(all()) injection applies and miri runs correctly.  A
miri::cpu=x86_64 (or aarch64) invocation would silently take the
explicit-triple runner instead of cargo-miri-wrapped and bypass miri;
a comment in .cargo/config.toml flags the limitation.

The previous fixup landed explicit-triple runner entries in
.cargo/config.toml to avoid cargo's "several matching instances"
error against cargo-miri's `cfg(all())` injection.  That fixed the
default `just miri` (cpu=powerpc64, where no triple matches) but
left a foot-gun for `just miri::cpu=x86_64 miri` (or aarch64):
cargo prefers explicit-triple entries over cfg(all()), so the
qemu-wrapping runner would have won and silently bypassed miri.

scripts/test-runner.sh now detects miri context via MIRI_SYSROOT
(set by cargo-miri's internals; not the same as MIRIFLAGS which the
miri recipe also sets) and re-delegates to .cargo-miri-wrapped,
which sits next to the invoking cargo and is reachable via the
$CARGO env var.  The wrapper-exists guard (-x) keeps a
misconfigured toolchain from exec-failing with an opaque error.
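
The detection step can be sketched roughly as follows.  This is not
the contents of scripts/test-runner.sh: the function names and the
printed markers are hypothetical, and the sketch only reports which
runner would be chosen instead of exec'ing it.

```shell
# Path of the miri wrapper, assumed to sit next to the invoking cargo.
miri_wrapper_path() {
  printf '%s/.cargo-miri-wrapped\n' "$(dirname "${CARGO:-cargo}")"
}

# choose_runner: print "miri:<path>" under miri, "target-runner" otherwise.
choose_runner() {
  if [ -n "${MIRI_SYSROOT:-}" ]; then
    cr_wrapper="$(miri_wrapper_path)"
    if [ -x "$cr_wrapper" ]; then
      printf 'miri:%s\n' "$cr_wrapper"
      return 0
    fi
    # The -x guard: report clearly rather than exec a missing file.
    echo "MIRI_SYSROOT is set but $cr_wrapper is not executable" >&2
    return 1
  fi
  # Not under miri: fall through to the usual qemu-user/native path.
  printf 'target-runner\n'
}
```

Keying on MIRI_SYSROOT rather than MIRIFLAGS matters because the miri
recipe sets MIRIFLAGS itself, so MIRIFLAGS alone does not show that
cargo-miri is the caller.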

Verified both paths:

  - just miri (cpu=powerpc64): no triple match, cargo-miri's
    cfg(all()) runner is selected directly.
  - just miri::cpu=x86_64 miri: triple match wins, script runs,
    MIRI_SYSROOT detected, .cargo-miri-wrapped is exec'd; miri
    sysroot is prepared and tests compile against it.

Also drops the now-obsolete caveat comment in .cargo/config.toml
(the script handles the case it warned about).

The new cross_matrix job was missing `env: USER: runner` that every
other dev.yml job sets, so cachix-action's `cachix use` step failed
with `$USER must be set. If running in a container, try setting
USER=root.` (the GitHub `lab` runner doesn't export USER by default).

Add the same env block as the other jobs.

Extend cross to optionally run `just test` under qemu-user before the
build-container steps, gated to scope=full only.

Changes:

  - cross_matrix:
      * jq filter collapses to one row per (platform, libc, profile)
        instead of fanning out per (recipe_name, recipe_args).  The
        per-recipe fan-out moves into the cross job's step list.
      * Adds a `scope` output (alongside `matrix` and `has_legs`) so
        downstream jobs can gate on the resolved scope.
      * Outputs are written via a brace-grouped redirect to
        $GITHUB_OUTPUT (shellcheck SC2129).
  - cross:
      * Strategy matrix consumes the new flat shape directly.  Job
        name drops the recipe portion since it's no longer in the
        row.
      * JUST_VARS moves from per-step to job-level env -- all three
        steps want the same platform/libc/profile triple.
      * Steps:
          1. `just test`  -- only when scope==full.  Builds the
             cross-target test archive via nix and runs `cargo
             nextest run` against it; binaries dispatch through
             scripts/test-runner.sh (qemu-user for non-native).
          2. `just build-container dataplane`
          3. `just build-container frr.dataplane`
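
For reference, the brace-grouped redirect mentioned under cross_matrix
looks roughly like this.  It is a sketch only: `write_outputs` is a
hypothetical helper, the output names are the three listed above, and
the matrix is assumed to be compact single-line JSON (multi-line
values need GitHub's delimiter syntax instead).

```shell
# Outside GitHub Actions, fall back to a scratch file for the demo.
GITHUB_OUTPUT="${GITHUB_OUTPUT:-$(mktemp)}"

# write_outputs MATRIX_JSON HAS_LEGS SCOPE
write_outputs() {
  # One redirect for the whole group, instead of three separate
  # `>> "$GITHUB_OUTPUT"` appends (the pattern SC2129 flags).
  {
    printf 'matrix=%s\n' "$1"
    printf 'has_legs=%s\n' "$2"
    printf 'scope=%s\n' "$3"
  } >> "$GITHUB_OUTPUT"
}

write_outputs '[]' false skip
cat "$GITHUB_OUTPUT"
```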

Rationale for scope-gating the test step:

Default scope runs on every PR with ci:+cross and on push/merge_group,
so total cost matters.  qemu-user signal value is limited for our
workload (ISA emulation only, no hardware semantics), and the test
build is a full from-scratch cargo compile per leg because the cargo
cache-exclusion already forbids substitutes for source-volatile
artifacts.  Keeping default scope build-only preserves today's
behaviour and runtime.  Full scope is the rare opt-in
(ci:+cross/full label or cross_scope=full dispatch) where the
comprehensive sweep is the whole point; tests under emulation pay
their cost there.

Leg counts after the matrix collapse:
  - default: 4 legs (was 8) -- same total work (2 build-containers
    per leg, serial via max-parallel: 1).
  - full:   16 legs (was 32) -- more total work per leg due to the
    added test step, but fewer parallel slots needed.
  - skip:    0 legs (unchanged).

daniel-noland force-pushed the pr/daniel-noland/cross-platform branch from 0e5f687 to eeba121 May 14, 2026 03:56
daniel-noland removed the ci:+cross (run cross compile jobs) and ci:+cross/full labels May 14, 2026
daniel-noland force-pushed the pr/daniel-noland/cross-platform branch from eeba121 to e3113f7 May 14, 2026 03:58
daniel-noland force-pushed the pr/daniel-noland/cross-platform branch from e3113f7 to 652206d May 14, 2026 04:26
daniel-noland force-pushed the pr/daniel-noland/cross-platform branch from 3e1b127 to c8fa532 May 14, 2026 05:27
daniel-noland added the ci:+vlab (Enable VLAB tests) label May 14, 2026