cross platform dev?#1531
Draft
daniel-noland wants to merge 23 commits into
Draft
Conversation
Lift the two embedded lookup tables out of nix/platforms.nix into
sibling files:
- nix/hardware.nix carries per-hardware SOC tuning (arch, march,
mcpu, NUMA hints, stdenv override) for x86-64-v3/v4, zen3/4/5,
aarch64, bluefield2/3, and wasm32-wasip1.
- nix/triples.nix is a 3-level lookup keyed by (arch, kernel, libc)
producing the matching rust target triple plus the nix-internal
machine/nixarch tags.
platforms.nix becomes a thin assembler: it composes a hardware record
with its triple info and the existing DPDK name override. Behaviour
for single-platform callers is unchanged; the split lets other
consumers (e.g. an all-platforms cross-target enumeration) reuse the
maps without paying the parameterised composition.
Adds an "every target triple" build mode. Passing platform=all (via
justfile or directly to nix) flips two paths:
- default.nix gains an extra-platforms list: one platforms.nix
record per (arch, kernel, libc) combination from triples.nix,
using a canonical-hardware-per-arch lookup (x86-64-v3 for x86_64,
aarch64 for aarch64, wasm32-wasip1 for wasm32). SOC-specific
march tuning is intentionally not enumerated -- the rust target
triple is arch-determined, so the SOC variants would duplicate
target rows. To exercise SOC tuning, call platform=<name>
directly.
- The llvm overlay reads extra-platforms.*.info.target and extends
the rust-toolchain's target list with each entry. Result: with
platform=all the toolchain in the dev shell / build environment
knows about all five triples (x86_64-{gnu,musl}, aarch64-{gnu,musl},
wasm32-wasip1), so cargo can cross-compile to any of them in one
shot.
The single-platform path is unchanged: when platform is set to a
concrete hardware name, extra-platforms is the empty list and the
toolchain targets are exactly what they were before.
Justfile gains an all-platforms toggle (default false) that's plumbed
through --argstr to the relevant recipes (build, setup-roots, shell)
as a no-op for now; nix's CLI is lenient about unknown argstr args, so
this is forward-compatible for a future full-SOC matrix toggle without
breaking current invocations.
Cargo derivations rebuild on every source revision; their inputs
include the workspace src closure, so a cache hit is impossible.
Pushing them anyway is net-negative -- upload cost without ever
hitting on substitute, and storage bloat on cachix. C-level builds
(DPDK, hwloc, libbsd, FRR, rust toolchain) remain cached as before.
Two halves:
- default.nix tags every cargo derivation, dataplane.tar, and the
cargo-bearing OCI containers (dataplane, dataplane-debugger,
frr.dataplane) with allowSubstitutes=false + preferLocalBuild=true
so cold runners don't waste a roundtrip asking cachix for a path
cachix could not possibly have. The cargo derivations also get a
distinctive name = "dataplane-cargo-..." so the push filter can
catch them with one regex. Containers can't take these attrs as
buildLayeredImage args, so the override goes through .overrideAttrs.
- .github/actions/nix-shell/action.yml adds a pushFilter regex on
the cachix-action step. Two alternation arms: `-dataplane-cargo-`
for the renamed cargo derivations + tar, and
`-(dataplane|debugger|frr)\.tar\.gz` for the cargo-bearing
containers (dockerTools.buildLayeredImage strips the registry
prefix from the OCI name when computing the store-path basename).
frr.host and debug-tools are unchanged: they bundle no cargo outputs
and continue to benefit from caching.
Adds a `cross-matrix` attribute at the top of default.nix with three
presets that the GH Actions workflow can consume via
`nix eval --json -f default.nix 'cross-matrix.<preset>'`:
- default: aarch64 + bluefield3 × gnu + musl (4 entries).
Matches today's hardcoded cross-job matrix; today's PR/push CI
behaviour is unchanged once the workflow side is wired up.
- full: every entry in hardware.nix × every valid libc for that
arch in triples.nix (16 entries with the current data files).
Wasm32 is intentionally omitted -- the `wasm` CI job covers it
and the build path is materially different.
- skip: empty list, for explicit opt-out from a workflow_dispatch
run that doesn't want any cross legs.
Each entry is `{ platform, libc }` -- kernel is implied by arch and
the workflow side adds the recipe fan-out (build-container dataplane,
build-container frr.dataplane).
`full` is computed by joining hardware.nix with triples.nix at eval
time rather than enumerated by hand, so adding a new arch/hardware/libc
in those files automatically picks it up in the rare full-suite CI
run.
Replaces the hand-rolled `cross` matrix (aarch64 + bluefield3 × gnu +
musl × two recipes) with a computed matrix sourced from
`cross-matrix.<scope>` in default.nix. Three scopes are now selectable:
- default: today's behaviour (4 platform/libc pairs).
- full: every hardware × valid-libc in hardware.nix / triples.nix
(16 platform/libc pairs).
- skip: no cross legs at all.
Scope is resolved by event:
- workflow_dispatch: `cross_scope` input (default/full/skip), default
is "default". Surfaces in the manual-run UI so a full audit run
is a button press.
- pull_request with `ci:+full-cross` label: full.
- pull_request with `ci:+cross` label (without the full one): default.
Matches today's PR behaviour.
- pull_request without either label: skip. Matches today's PR
behaviour (cross only ran when ci:+cross was set).
- push, merge_group: default. Matches today's behaviour.
Mechanics:
- New `cross_matrix` job runs `nix eval --json -f default.nix
'cross-matrix.<scope>'`, then expands each (platform, libc) row with
`jq` into one matrix entry per recipe variant (build-container
dataplane, build-container frr.dataplane). Output JSON feeds
`cross.strategy.matrix.include` via fromJSON.
- `cross_matrix` also outputs `has_legs`; the `cross` job gates on
`has_legs == 'true'` so the empty-matrix case (skip / unlabeled
PR) skips cleanly rather than failing a zero-leg job.
- `summary` picks up `cross_matrix` as a dependency so a broken
matrix-resolution step surfaces as a CI error rather than
silently dropping every cross leg.
The cross job remains build-only (`just build-container`); no runtime
execution, so the expanded matrix exercises more cross-compile/link
combinations but no additional qemu-user paths.
fix(cargo): use explicit triples for the host-arch test runners Replaces the cfg-pattern runner entries added in 1f74b46 with explicit target-triple entries. Reason: `cargo miri` injects `--config target.cfg(all()).runner=cargo-miri-wrapped` on every invocation, and cargo refuses to disambiguate between two cfg-pattern matches, so `just miri` errored with "several matching instances of target.'cfg(..)'.runner". Cargo treats explicit-triple entries as more specific than cfg(all()) and selects them without competing, so the conflict disappears. Also adds the matching -musl variants for x86_64 and aarch64 so the qemu-user wrapper handles both libc flavours when running tests on a non-matching host (the existing wrapper script is libc-agnostic). The default `just miri` target is powerpc64-unknown-linux-gnu, which matches none of the explicit triples below; only cargo-miri's cfg(all()) injection applies and miri runs correctly. A miri::cpu=x86_64 (or aarch64) invocation would silently take the explicit-triple runner instead of cargo-miri-wrapped and bypass miri; a comment in .cargo/config.toml flags the limitation.
The previous fixup landed explicit-triple runner entries in
.cargo/config.toml to avoid cargo's "several matching instances"
error against cargo-miri's `cfg(all())` injection. That fixed the
default `just miri` (cpu=powerpc64, where no triple matches) but
left a foot-gun for `just miri miri::cpu=x86_64` (or aarch64):
cargo prefers explicit-triple entries over cfg(all()), so the
qemu-wrapping runner would have won and silently bypassed miri.
scripts/test-runner.sh now detects miri context via MIRI_SYSROOT
(set by cargo-miri's internals; not the same as MIRIFLAGS which the
miri recipe also sets) and re-delegates to .cargo-miri-wrapped,
which sits next to the invoking cargo and is reachable via the
$CARGO env var. The wrapper-exists guard (-x) keeps a
misconfigured toolchain from exec-failing with an opaque error.
Verified both paths:
- just miri (cpu=powerpc64): no triple match, cargo-miri's
cfg(all()) runner is selected directly.
- just miri::cpu=x86_64 miri: triple match wins, script runs,
MIRI_SYSROOT detected, .cargo-miri-wrapped is exec'd; miri
sysroot is prepared and tests compile against it.
Also drops the now-obsolete caveat comment in .cargo/config.toml
(the script handles the case it warned about).
The new cross_matrix job was missing `env: USER: runner` that every other dev.yml job sets, so cachix-action's `cachix use` step failed with `$USER must be set. If running in a container, try setting USER=root.` (the GitHub `lab` runner doesn't export USER by default). Add the same env block as the other jobs.
Extend cross to optionally run `just test` under qemu-user before the
build-container steps, gated to scope=full only.
Changes:
- cross_matrix:
* jq filter collapses to one row per (platform, libc, profile)
instead of fanning out per (recipe_name, recipe_args). The
per-recipe fan-out moves into the cross job's step list.
* Adds a `scope` output (alongside `matrix` and `has_legs`) so
downstream jobs can gate on the resolved scope.
* Outputs are written via a brace-grouped redirect to
$GITHUB_OUTPUT (shellcheck SC2129).
- cross:
* Strategy matrix consumes the new flat shape directly. Job
name drops the recipe portion since it's no longer in the
row.
* JUST_VARS moves from per-step to job-level env -- all three
steps want the same platform/libc/profile triple.
* Steps:
1. `just test` -- only when scope==full. Builds the
cross-target test archive via nix and runs `cargo
nextest run` against it; binaries dispatch through
scripts/test-runner.sh (qemu-user for non-native).
2. `just build-container dataplane`
3. `just build-container frr.dataplane`
Rationale for scope-gating the test step:
Default scope runs on every PR with ci:+cross and on push/merge_group,
so total cost matters. qemu-user signal value is limited for our
workload (ISA emulation only, no hardware semantics), and the test
build is a full from-scratch cargo compile per leg because the cargo
cache-exclusion already forbids substitutes for source-volatile
artifacts. Keeping default scope build-only preserves today's
behaviour and runtime. Full scope is the rare opt-in
(ci:+cross/full label or cross_scope=full dispatch) where the
comprehensive sweep is the whole point; tests under emulation pay
their cost there.
Leg counts after the matrix collapse:
- default: 4 legs (was 8) -- same total work (2 build-containers
per leg, serial via max-parallel: 1).
- full: 16 legs (was 32) -- more total work per leg due to the
added test step, but fewer parallel slots needed.
- skip: 0 legs (unchanged).
0e5f687 to
eeba121
Compare
eeba121 to
e3113f7
Compare
e3113f7 to
652206d
Compare
3e1b127 to
c8fa532
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
trivial idea struck me and I'm just testing it