From 7757b4fd31a8e9f2368373d293b37253bee7c5fd Mon Sep 17 00:00:00 2001 From: Ralf Anton Beier Date: Wed, 10 Jun 2026 06:18:48 +0200 Subject: [PATCH 1/4] =?UTF-8?q?ci(miri):=20bump=20timeout=2030=E2=86=9245?= =?UTF-8?q?=20so=20lean-mem=20contention=20doesn't=20fail=20green=20runs?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The release-commit CI run went red solely because Miri timed out at 30 min on the self-hosted lean-mem pool while it was contended clearing the #509 outage backlog — every required gate (Format/Clippy/Test/Docs/YAML/Traceability) was green, and Rocq proofs passed. Miri's own comment already anticipated this ("Revisit once we have a few green runs to set the budget closer to actual"; it was previously bumped 15→30 for the same lean-mem slowness). 45 min gives headroom on the lean-mem class without masking a real hang (a genuine UB loop still trips the budget). Does not touch the deliberate self-hosted runner choice. Refs: #509 Trace: skip --- .github/workflows/ci.yml | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 6425f5d..a027615 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -440,9 +440,13 @@ jobs: # 15 min with the last printed test at the 11-min mark (i.e., # the slow tests at the tail just ran past the budget on # smithy's lean-mem class). Hosted may have been fine because - # of different tail-test perf characteristics. Revisit once - # we have a few green runs to set the budget closer to actual. - timeout-minutes: 30 + # of different tail-test perf characteristics. + # Bumped 30→45 (2026-06-10): under self-hosted-pool contention + # (clearing the #509 outage backlog) Miri ran past 30 min and + # timed out, failing the run while every required gate was green. + # 45 gives headroom on the lean-mem class without masking a real + # hang (a genuine UB loop still trips the budget). + timeout-minutes: 45 env: MIRIFLAGS: "-Zmiri-disable-isolation -Zmiri-tree-borrows" From 7d55f0c2ee2e4e137a0a708d6f778fa2b3719c72 Mon Sep 17 00:00:00 2001 From: Ralf Anton Beier Date: Wed, 10 Jun 2026 06:35:03 +0200 Subject: [PATCH 2/4] ci(verus): run on ubuntu-latest with cachix Nix, mirroring the green rocq job Verus Proofs was perma-red at the "Install Nix" step: the no-sudo daemonless DeterminateSystems installer (crafted for the self-hosted NoNewPrivileges runner) broke on a version bump and escalated to sudo, which fails there. The proofs never even ran. Rocq does the identical Bazel+Nix work on ubuntu-latest with cachix/install-nix-action@v30 and is green. Move Verus to the same config: ubuntu-latest + cachix installer + nix_path. Fixes the install, and removes the job's dependency on the self-hosted pool (also a #509 resilience win). 16 GB hosted RAM is ample for the rivet Verus specs; the lean-mem RAM headroom is no longer needed. Stays continue-on-error (advisory) like rocq. Refs: #509 Trace: skip --- .github/workflows/ci.yml | 37 +++++++++++++++---------------------- 1 file changed, 15 insertions(+), 22 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index a027615..25f294c 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -738,8 +738,13 @@ jobs: verus: name: Verus Proofs needs: [test] - # lean-mem: Verus solver work benefits from RAM headroom. - runs-on: [self-hosted, linux, x64, lean-mem] + # ubuntu-latest (GitHub-hosted): mirrors the green `rocq` job — full sudo, + # no NoNewPrivileges, so the standard cachix Nix installer works. Moved off + # the self-hosted lean-mem pool (#509): there the no-sudo daemonless + # nix-installer broke on a version bump and chronically failed at install, + # so this advisory job was perma-red regardless of the proofs. 16 GB on the + # hosted runner is ample for the rivet Verus specs. + runs-on: ubuntu-latest continue-on-error: true timeout-minutes: 20 steps: @@ -756,27 +761,15 @@ jobs: # of which target is being built, so Nix must be on PATH even for # Verus-only invocations. # - # The self-hosted runners run with `NoNewPrivileges=true` (systemd - # hardening), which makes `sudo` inside `cachix/install-nix-action` - # fail with "sudo: 'no new privileges' flag is set". Use the - # DeterminateSystems installer which supports a daemonless - # single-user mode that does not need sudo. + # ubuntu-latest is GitHub-hosted (full sudo, no NoNewPrivileges), so the + # standard cachix multi-user installer works — matching the green `rocq` + # job. The DeterminateSystems daemonless variant was only needed for the + # self-hosted NoNewPrivileges runner, and an installer version bump broke + # its no-sudo path (it escalated to sudo and failed at install); #509. - name: Install Nix - # Pinned by SHA + `determinate: false`. The action's `determinate` - # input default flipped false->true between v17 and v22, so the - # former `@main` pin started installing Determinate Nix, whose - # `determinate-nixd` daemon ignores `init: none` and fails on - # no-sudo / NoNewPrivileges runners (missing daemon socket, - # root-owned store lock). `determinate: false` restores the plain - # upstream daemonless install this job was written for. - uses: DeterminateSystems/nix-installer-action@ef8a148080ab6020fd15196c2084a2eea5ff2d25 # v22 - with: - determinate: false - init: none - extra-conf: | - experimental-features = nix-command flakes - - name: Add Nix to PATH - run: echo "$HOME/.nix-profile/bin" >> "$GITHUB_PATH" + uses: cachix/install-nix-action@v30 + with: + nix_path: nixpkgs=channel:nixos-unstable - name: Verify Verus specs run: bazel test //verus:rivet_specs_verify - name: Upload Verus test log From 774a94dd640e474c28033999c9c733c752861ffa Mon Sep 17 00:00:00 2001 From: Ralf Anton Beier Date: Wed, 10 Jun 2026 06:38:13 +0200 Subject: [PATCH 3/4] =?UTF-8?q?docs(artifacts):=20file=20REQ-215=20?= =?UTF-8?q?=E2=80=94=20CI-reliability=20fix=20(Verus=20ubuntu-latest,=20Mi?= =?UTF-8?q?ri=20timeout)=20as=20v0.17.0=20scope?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Tracks the Miri+Verus CI-reliability fixes in this PR as the first v0.17.0 scope item (baseline v0.17.0-track), traced to REQ-051 (CI-enforced gates) and the #509 resilience theme. Trace: skip --- artifacts/requirements.yaml | 42 +++++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) diff --git a/artifacts/requirements.yaml b/artifacts/requirements.yaml index 22eaff0..434e36f 100644 --- a/artifacts/requirements.yaml +++ b/artifacts/requirements.yaml @@ -6754,3 +6754,45 @@ artifacts: created-by: ai-assisted model: claude-opus-4-8 timestamp: 2026-06-09T09:30:00Z + + - id: REQ-215 + type: requirement + title: "Proof CI jobs (Verus, Miri) must not be perma-red from self-hosted-pool infra; run them on a config that actually completes (v0.17.0, #509)" + status: implemented + description: | + Finding (v0.16.0 post-release CI). The required gates were all green and + v0.16.0 shipped, yet the main CI run was red from two advisory/proof jobs + failing on self-hosted-pool infrastructure, not on code: + - Verus Proofs: perma-red at "Install Nix". The no-sudo daemonless + DeterminateSystems installer (crafted for the self-hosted + NoNewPrivileges runner) broke on a version bump and escalated to + sudo, which fails there — the proofs never ran. + - Miri: timed out at 30 min on the slow `lean-mem` self-hosted runner + while the pool was contended clearing the #509 outage backlog. + + Fix (first v0.17.0 CI-reliability work, #509 theme): + - Move Verus to `ubuntu-latest` + `cachix/install-nix-action@v30`, + mirroring the green `rocq` job exactly. Fixes the install and removes + the self-hosted-pool dependency. Stays continue-on-error (advisory). + - Bump Miri `timeout-minutes` 30 → 45 for headroom on the lean-mem + class under contention, without masking a real hang. + + Acceptance: + - The `Verus Proofs` job runs on `ubuntu-latest` with + `cachix/install-nix-action` and its check goes GREEN (the proofs run, + no Install-Nix failure). + - The `Miri` job `timeout-minutes` is 45. + - `actionlint` clean (modulo pre-existing custom runner-label notes). + - A subsequent main CI run is green on all non-`continue-on-error` jobs. + tags: [ci, reliability, verus, miri, nix, dogfooding] + fields: + priority: should + category: non-functional + baseline: v0.17.0-track + links: + - type: traces-to + target: REQ-051 + provenance: + created-by: ai-assisted + model: claude-opus-4-8 + timestamp: 2026-06-10T08:00:00Z From b8355f152dff1c837483dd4b79b907a918d8cd5e Mon Sep 17 00:00:00 2001 From: Ralf Anton Beier Date: Wed, 10 Jun 2026 06:46:49 +0200 Subject: [PATCH 4/4] ci(verus): decouple from test job so it runs on the available hosted runner Mirrors rocq (#299): with needs:[test] the ubuntu-latest Verus job sat blocked behind the contended self-hosted rust-cpu test job. Decoupling lets it run immediately on the always-available hosted runner. Refs: #509 Trace: skip --- .github/workflows/ci.yml | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 25f294c..22ae788 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -737,7 +737,11 @@ jobs: # are green under the unified workspace. verus: name: Verus Proofs - needs: [test] + # Decoupled from `test` (mirrors `rocq`, #299): Verus runs the proof specs + # via Bazel, independent of cargo-test status. With `needs: [test]` this + # ubuntu-latest (always-available) job sat blocked behind the contended + # self-hosted rust-cpu `test` job — the exact saturation Rocq decoupled to + # avoid (#509). # ubuntu-latest (GitHub-hosted): mirrors the green `rocq` job — full sudo, # no NoNewPrivileges, so the standard cachix Nix installer works. Moved off # the self-hosted lean-mem pool (#509): there the no-sudo daemonless