concurrency facade part 2 #1557
Open
daniel-noland wants to merge 13 commits into
Pull request overview
This PR continues the workspace-wide migration to the dataplane-concurrency facade so synchronization + threading primitives can be swapped across production, loom, and shuttle backends, while also tightening lint/CI enforcement around avoiding direct std::sync imports.
Changes:
- Switch many crates/modules from `std::sync::*` and `std::thread::*` to `concurrency::sync::*` / `concurrency::thread::*`, and add the needed crate dependencies/features (loom/shuttle/shuttle_dfs propagation).
- Rework the shuttle backend runner to use a `PortfolioRunner` (Random + PCT, optionally DFS) and adjust several concurrency-sensitive tests to run under the new model-checking setup.
- Add/extend enforcement tooling (clippy `disallowed-types`, semgrep rule, opengrep workflow wiring, nextest filtering tweaks).
Reviewed changes
Copilot reviewed 65 out of 66 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tracectl/src/control.rs | Switch to concurrency::sync primitives and simplify locking API usage. |
| tracectl/Cargo.toml | Add concurrency dependency. |
| sysfs/src/lib.rs | Use concurrency::sync::LazyLock. |
| sysfs/Cargo.toml | Add concurrency dependency. |
| routing/src/router/rio.rs | Use concurrency::sync::Arc. |
| routing/src/router/ctl.rs | Use concurrency::sync::Arc. |
| routing/src/fib/test.rs | Add model-checker-friendly concurrency tests and move threading/sync to the facade. |
| routing/src/cli/handler.rs | Use concurrency::sync::Arc. |
| routing/Cargo.toml | Propagate loom/shuttle features to concurrency. |
| nat/src/stateful/apalloc/test_alloc.rs | Refactor allocator concurrency tests to #[concurrency::test] and document loom exclusions. |
| nat/src/stateful/apalloc/port_alloc.rs | Route thread IDs and per-backend shuffling through the facade/backends. |
| nat/src/lib.rs | Gate loom builds with arbitrary_self_types for self: Arc<Self> receivers. |
| nat/Cargo.toml | Adjust feature flags (add loom; simplify shuttle variants). |
| mgmt/tests/reconcile.rs | Use concurrency::sync::Arc and document unwind-safety rationale for using std::sync::Mutex locally. |
| mgmt/src/vpc_manager/mod.rs | Use concurrency::sync::Arc. |
| mgmt/src/processor/mgmt_client.rs | Use concurrency::sync::Arc. |
| mgmt/src/processor/k8s_less_client.rs | Use concurrency::sync::Arc. |
| mgmt/src/processor/k8s_client.rs | Use concurrency::sync::Arc. |
| mgmt/src/processor/gwconfigdb.rs | Use concurrency::sync::Arc. |
| mgmt/src/processor/confbuild/router.rs | Use concurrency::sync::Arc. |
| left-right-tlcache/src/lib.rs | Use facade Mutex in tests and remove unwrap()-style locking. |
| left-right-tlcache/Cargo.toml | Add concurrency as a dev-dependency. |
| k8s-intf/src/client.rs | Use concurrency::sync::Arc. |
| k8s-intf/Cargo.toml | Make concurrency an optional dep tied to the client feature. |
| justfile | Update nextest filtering logic for shuttle and loom runs. |
| interface-manager/src/lib.rs | Use concurrency::sync::Arc. |
| interface-manager/Cargo.toml | Add concurrency dependency. |
| flow-entry/src/flow_table/table.rs | Restructure shuttle-gated concurrency tests to use #[concurrency::test] bodies directly. |
| flow-entry/Cargo.toml | Add loom feature and simplify shuttle variants. |
| dpdk/src/acl/mod.rs | Use facade Arc/thread in concurrency-oriented test. |
| dataplane/src/main.rs | Gate the binary out of loom builds and move runtime logic into a non-loom module. |
| dataplane/src/drivers/dpdk.rs | Fix CmdArgs import path. |
| dataplane/Cargo.toml | Add a workspace-resolving loom feature propagating to concurrency/loom. |
| config/src/gwconfig.rs | Replace ArcSwap metadata storage with concurrency::slot::Slot. |
| config/Cargo.toml | Add concurrency dependency. |
| concurrency/tests/thread_scope.rs | Adjust shuttle gating and allow clippy disallowed-types where needed for the facade. |
| concurrency/tests/stress_dispatch.rs | Update dispatch-test docs/gating to match new shuttle behavior. |
| concurrency/tests/scope_property.rs | Allow clippy disallowed-types for facade-driven tests. |
| concurrency/tests/quiescent_shuttle.rs | Allow clippy disallowed-types for facade-driven tests. |
| concurrency/tests/quiescent_protocol.rs | Allow clippy disallowed-types for facade-driven tests. |
| concurrency/tests/quiescent_properties.rs | Allow clippy disallowed-types for facade-driven tests. |
| concurrency/tests/quiescent_model.rs | Update shuttle docs/gating and allow clippy disallowed-types. |
| concurrency/tests/arc_weak.rs | Update shuttle opt-out rationale and allow clippy disallowed-types. |
| concurrency/src/thread/mod.rs | Refine loom exports, add loom spawn/sleep shims, and add BuilderExt::spawn_scoped. |
| concurrency/src/thread/loom_scope.rs | Route loom scope spawning through the new loom spawn shim. |
| concurrency/src/sync/std_backend.rs | Re-export LazyLock and allow disallowed-types internally. |
| concurrency/src/sync/shuttle_backend.rs | Re-export LazyLock/OnceLock from std with lint allowances. |
| concurrency/src/sync/parking_lot_backend.rs | Re-export LazyLock and allow disallowed-types internally. |
| concurrency/src/sync/mod.rs | Update docs to describe the new shuttle portfolio approach. |
| concurrency/src/sync/loom_backend.rs | Re-export LazyLock from std with lint allowance. |
| concurrency/src/stress.rs | Implement loom stack workaround and shuttle PortfolioRunner dispatch; adjust budgets. |
| concurrency/src/slot.rs | Add load() convenience APIs for both ArcSwap and mutex-fallback implementations. |
| concurrency/src/quiescent.rs | Allow clippy disallowed-types (facade plumbing). |
| concurrency/Cargo.toml | Simplify shuttle feature lattice (remove shuttle_pct) and document portfolio behavior. |
| concurrency-macros/src/lib.rs | Update #[concurrency::test] expansion/docs for the new shuttle feature shape. |
| common/src/cliprovider.rs | Use Slot::load() instead of load_full(). |
| clippy.toml | Disallow direct std::sync::{Mutex,RwLock,...} types. |
| cli/Cargo.toml | Add concurrency dependency. |
| cli/bin/terminal.rs | Use concurrency::sync::Arc. |
| cli/bin/main.rs | Use concurrency::sync::Arc. |
| cli/bin/completions.rs | Use concurrency::sync::Arc. |
| Cargo.lock | Record new dependency edges on dataplane-concurrency. |
| .semgrep/rules/no-std-sync-direct.yaml | Add semgrep rules to forbid direct `use std::sync::*` imports (outside allowed paths). |
| .github/workflows/lint-opengrep.yml | Configure opengrep to run with auto plus local semgrep rules. |
| .github/workflows/dev.yml | Remove the separate shuttle_pct CI step. |
Comments suppressed due to low confidence (1)
nat/src/lib.rs:15
confidence: 8
tags: [other]
Enabling `#![feature(arbitrary_self_types)]` under `feature = "loom"` makes `nat --features loom` require a nightly toolchain. If the goal is only to support loom model-checking on stable, consider refactoring the `self: Arc<Self>` receivers into regular functions that take `Arc<Self>` as an explicit first argument (or another stable pattern) so loom builds don’t depend on unstable language features.
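The refactor suggested in this comment can be sketched roughly as follows. This is a minimal illustration, not the crate's code: `AllocatedIp` here is a hypothetical stand-in with an invented `addr` field, and it uses `std::sync::Arc`, where both forms compile on stable. The point is only the call-site shape of the explicit-argument form.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a type that has `self: Arc<Self>` receivers.
pub struct AllocatedIp {
    addr: u32, // invented field, for illustration only
}

impl AllocatedIp {
    /// Receiver form. Stable with `std::sync::Arc`; per the comment above,
    /// it needs `arbitrary_self_types` once `Arc` is a facade newtype.
    pub fn describe_method(self: Arc<Self>) -> String {
        format!("ip {}", self.addr)
    }

    /// Explicit-argument form: an associated function, so it does not
    /// depend on `Arc` being an accepted receiver type.
    pub fn describe(this: Arc<Self>) -> String {
        format!("ip {}", this.addr)
    }
}

/// Calls both forms on the same value and returns both results.
pub fn demo() -> (String, String) {
    let ip = Arc::new(AllocatedIp { addr: 7 });
    let via_method = ip.clone().describe_method();
    let via_function = AllocatedIp::describe(ip);
    (via_method, via_function)
}
```

The trade-off is ergonomic: call sites change from `ip.describe()` to `AllocatedIp::describe(ip)`, in exchange for not requiring an unstable feature.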
Sweep direct `use std::sync::{Arc, Mutex, RwLock, atomic::*}` imports
across the workspace to `concurrency::sync` so loom/shuttle test builds
can route through instrumented primitives via one feature flip.
Two enforcement layers:
* `clippy.toml` extends `disallowed-types` for the lock primitives.
parking_lot's lock types are distinct concrete types, so clippy
sees through the `concurrency::sync` re-export without flagging
legitimate uses.
* `.semgrep/rules/no-std-sync-direct.yaml` covers the rest (`Arc`,
`Weak`, atomics, `LazyLock`, `OnceLock`, `Once`, `Barrier`,
`Condvar`) where clippy's alias resolution can't distinguish the
facade re-export from `std::sync`. The `concurrency` crate and its
tests are exempt by path.
`mgmt/tests/reconcile.rs` keeps a direct `std::sync::Mutex` because
bolero's `catch_unwind` needs `RefUnwindSafe`, which parking_lot's
`Mutex` doesn't impl. Documented inline with `clippy::disallowed_types`
allow + `nosemgrep` annotation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
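As a rough sketch of what the import sweep amounts to at a call site: the inline `concurrency` module below is a stand-in written for this example (the real crate selects loom, shuttle, or parking_lot types behind features), and `bump_shared_counter` is a hypothetical consumer. The stand-in re-exports std types, so `lock()` still needs `unwrap()` here, unlike the parking_lot-backed facade described above.

```rust
/// Stand-in for the facade crate, assumed shape only. The real crate picks
/// a backend per feature instead of always re-exporting std.
#[allow(unexpected_cfgs)]
pub mod concurrency {
    pub mod sync {
        #[cfg(not(feature = "loom"))]
        pub use std::sync::{Arc, Mutex};
        // Under `--features loom` the re-export would instead look like:
        // pub use loom::sync::{Arc, Mutex};
    }
}

// Consumer code imports from the facade, never from `std::sync` directly.
use concurrency::sync::{Arc, Mutex};

/// Hypothetical consumer: increments a shared counter `times` times.
pub fn bump_shared_counter(times: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    for _ in 0..times {
        let handle = Arc::clone(&counter);
        *handle.lock().unwrap() += 1;
    }
    let total = *counter.lock().unwrap();
    total
}
```

Because only the `use` line names the backend, flipping the feature swaps every primitive in the consumer without touching its body.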
`std::thread::Builder::spawn_scoped` is inherent on std but missing on
the loom and shuttle Builders; both ship `Scope::spawn` instead. Add a
`concurrency::thread::BuilderExt` trait with one method:
* std: forwards to the inherent `Builder::spawn_scoped` via
fully-qualified call (Rust's method resolution prefers the
inherent, so the trait impl is dead but kept for symmetry).
* shuttle / loom: discards advisory Builder config, delegates to
`Scope::spawn`, wraps the infallible return in `Ok` to match
std's `io::Result` signature.
`use concurrency::thread::BuilderExt;` lets call sites write
`builder.spawn_scoped(scope, f)` under every backend. Used by the
kernel driver's named scoped threads in a follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
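The std arm of such a trait might look like the sketch below. It is written against `std::thread` only and mirrors the signature of the inherent `Builder::spawn_scoped`; the actual trait in `concurrency::thread` may differ in bounds or naming, and `sum_on_named_thread` is a hypothetical call site.

```rust
use std::io;
use std::thread::{self, Builder, Scope, ScopedJoinHandle};

/// Sketch of the extension trait (std backend only).
pub trait BuilderExt {
    fn spawn_scoped<'scope, 'env, F, T>(
        self,
        scope: &'scope Scope<'scope, 'env>,
        f: F,
    ) -> io::Result<ScopedJoinHandle<'scope, T>>
    where
        F: FnOnce() -> T + Send + 'scope,
        T: Send + 'scope;
}

impl BuilderExt for Builder {
    fn spawn_scoped<'scope, 'env, F, T>(
        self,
        scope: &'scope Scope<'scope, 'env>,
        f: F,
    ) -> io::Result<ScopedJoinHandle<'scope, T>>
    where
        F: FnOnce() -> T + Send + 'scope,
        T: Send + 'scope,
    {
        // The type-qualified path resolves to the inherent method first,
        // so this forwards to std rather than recursing into the trait.
        Builder::spawn_scoped(self, scope, f)
    }
}

/// Hypothetical call site: sums a slice on a named scoped thread.
pub fn sum_on_named_thread(data: &[u64]) -> u64 {
    thread::scope(|s| {
        let handle = Builder::new()
            .name("summer".to_string())
            .spawn_scoped(s, || data.iter().sum::<u64>())
            .expect("failed to spawn scoped thread");
        handle.join().expect("scoped thread panicked")
    })
}
```

On the loom and shuttle arms, the same method body would instead drop the builder configuration and return `Ok(scope.spawn(f))`, as the commit message describes.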
Collapse the chained `shuttle_dfs -> shuttle_pct -> shuttle` features
into a single `shuttle` feature backed by `shuttle::PortfolioRunner`.
The runner drives `RandomScheduler` and `PctScheduler` in parallel; any
scheduler finding a counterexample fails the test
(`stop_on_first_failure = true`). `shuttle_dfs` becomes an additive
opt-in that adds `DfsScheduler` to the same portfolio.
`stress.rs` now has one shuttle arm instead of three, and
`#[concurrency::test]` emits one leaf per backend (`loom` / `shuttle`)
instead of three shuttle variants. Workspace consumers (`nat`,
`flow-entry`) and CI (`dev.yml`) drop the `shuttle_pct` step; the
existing `shuttle` step covers Random + PCT in one pass.
Tests previously gated `not(feature = "shuttle_pct")` to opt out of
single-threaded bodies that PCT panics on are rewritten to
`not(feature = "shuttle")` since PCT now runs in every shuttle build.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Tier A of the std::thread sweep that complements the `concurrency::sync`
facade migration. Swap `use std::thread` to `use concurrency::thread`
in two test modules whose tests are candidates for
`#[concurrency::test]` conversion:
* `routing/src/fib/test.rs` -- prerequisite for the FIB race-test
conversion later in the stack.
* `dpdk/src/acl/mod.rs::classify_concurrent_arc_shared` -- import
swap only; the test runs under real DPDK EAL so the macro
conversion is deferred.
Production threading sites (`dpdk/src/lcore.rs`, `mgmt/src/processor/
launch.rs`, `routing/src/router/rio.rs`, `dataplane/src/statistics`,
`test-utils`) are left alone -- they need real OS threads and never
compile under loom/shuttle. A later sweep adds clippy/semgrep
enforcement once production sites are also routed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Tier B of the std::thread sweep. `ThreadPortMap` keys a per-thread
`RwLock<HashMap<ThreadId, _>>` by `std::thread::current().id()`. Each
backend ships its own `ThreadId`, so a std-typed map would silently
work in production while loom/shuttle key the table by their own thread
identity. Route the import and call sites through `concurrency::thread`
so the key tracks the active backend.
No behavioural change under the default backend. Prerequisite for any
future loom/shuttle exercise of the NAT allocator.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
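A simplified picture of the keying pattern described here, under stated assumptions: the `ThreadPortMap` below is a hypothetical reduction (the real type's value type and methods are not shown in this commit), and `std::thread` stands in for `concurrency::thread` so the sketch compiles on its own.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
// In the workspace this import would come from the facade
// (`concurrency::thread`); std stands in to keep the sketch self-contained.
use std::thread::{self, ThreadId};

/// Hypothetical, simplified stand-in for `ThreadPortMap`.
#[derive(Default)]
pub struct ThreadPortMap {
    inner: RwLock<HashMap<ThreadId, u16>>,
}

impl ThreadPortMap {
    /// Stores a value under the calling thread's id.
    pub fn set_for_current(&self, port: u16) {
        self.inner
            .write()
            .unwrap()
            .insert(thread::current().id(), port);
    }

    /// Reads back the value stored by the calling thread, if any.
    pub fn get_for_current(&self) -> Option<u16> {
        self.inner
            .read()
            .unwrap()
            .get(&thread::current().id())
            .copied()
    }

    pub fn len(&self) -> usize {
        self.inner.read().unwrap().len()
    }
}

/// Two workers each write their own entry; returns the number of keys.
pub fn distinct_entries_per_thread() -> usize {
    let map = Arc::new(ThreadPortMap::default());
    let workers: Vec<_> = (0..2u16)
        .map(|i| {
            let map = Arc::clone(&map);
            thread::spawn(move || {
                map.set_for_current(1000 + i);
                assert_eq!(map.get_for_current(), Some(1000 + i));
            })
        })
        .collect();
    for worker in workers {
        worker.join().unwrap();
    }
    map.len()
}
```

The only backend-sensitive pieces are the `ThreadId` type and the `thread::current()` call, which is why routing the single import through the facade is enough.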
Replace the hand-rolled `shuttle::check_random(... 100)` wrappers in
`shuttle_tests` with `#[concurrency::test]`, which routes bodies through
`concurrency::stress` (loom `model`, shuttle `PortfolioRunner`). The
module is renamed `concurrency_tests` and gated to
`cfg(any(feature = "shuttle", feature = "loom"))`.
`FlowTable::insert` spawns a tokio task for the flow timer, which would
panic without a running runtime; the existing `start_timer` bypass
under shuttle is extended to loom. Tokio-driven coverage of `insert`
stays in `std_tests`.
`test_flow_table_timeout` is dropped from the model-checker mod: it's
single-threaded (PCT rejects), and `std_tests` already has the
authoritative `#[tokio::test(start_paused = true)]` version.
Adds `loom = ["concurrency/loom"]` to `flow-entry/Cargo.toml` so the
macro-emitted cfg arm resolves to a known feature.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Move `test_fib_removals` and `test_leftright_destroy_race_simple` out
of the `#[concurrency_mode(std)]` block into a sibling
`concurrency_tests` module that runs through `#[concurrency::test]`:
default backend smoke run, `loom::model` under loom, shuttle's
PortfolioRunner under shuttle. The heavy fuzz loops
(`test_concurrency_fib` / `test_concurrency_fibtable`) stay on std --
their 100k+ packet iteration counts are TSAN-calibrated, not for
per-iteration model-checking cost.
Iteration counts are tuned per backend via `cfg_select!`: 5 rounds
under loom/shuttle (vs 1000 on std), with a fixed reader/worker budget
under the model checkers so unbounded poll loops don't trip shuttle's
`max_steps` ceiling. `test_packet` is inlined because the original
lives in the std-gated `mod tests` and is invisible under loom/shuttle.
Add `loom`, `shuttle`, `shuttle_dfs` features to `routing/Cargo.toml`
so the macro-emitted cfg arms resolve to known features.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
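The per-backend iteration budget could be expressed along these lines. This sketch uses the stable `cfg!` macro rather than `cfg_select!`, and the `ROUNDS` constant and `run_rounds` helper are names invented for illustration; only the feature names and the 5 / 1000 figures come from the commit message.

```rust
/// Rounds per test body: small under a model checker, large for the
/// default-backend smoke run.
#[allow(unexpected_cfgs)]
pub const ROUNDS: usize = if cfg!(any(feature = "loom", feature = "shuttle")) {
    5
} else {
    1000
};

/// Hypothetical helper: runs `body` once per round, returns the round count.
pub fn run_rounds(mut body: impl FnMut(usize)) -> usize {
    for round in 0..ROUNDS {
        body(round);
    }
    ROUNDS
}
```

Keeping the budget in one constant means a test body is written once and the cost scales with whichever backend the build selected.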
Bump `concurrency::stress`'s shuttle `Config::stack_size` from the
default 32 KiB to 4 MiB. Shuttle wraps each atomic/lock primitive with
bookkeeping that pushes per-instance size into the 100-byte range (an
`AtomicBool` is ~100 bytes under shuttle), so any non-trivial
atomic-heavy body blows through the default.
The historical workaround was per-call `shuttle::Config` overrides at
1 MiB (notably in NAT's allocator tests). One number in the dispatcher
kills the per-test knob. 4 MiB carries the heaviest workspace consumer
(NAT allocator's per-block atomic arrays) with headroom; the cost is
`N workers * 4 MiB` per stress iteration, well below CI memory
pressure.
`shuttle::Config` is `#[non_exhaustive]`, so written as a mutation of
`Config::default()`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Now that `concurrency::stress` carries a 4 MiB shuttle stack and the
PortfolioRunner runs Random + PCT in parallel, the custom
`shuttle_config` / `run_shuttle_random` / `run_shuttle_pct`
scaffolding in `tests_shuttle` is redundant. Replace with a single
`mod concurrency_tests` that flips each test to `#[concurrency::test]`:
* `test_concurrent_allocations_two_ips` (was
`..._without_shuttle`) -- two threads against distinct source IPs;
smoke run on default backend, full coverage on
`--features shuttle`.
* `test_concurrent_allocations_three_workers` (was
`..._shuttle_random` + `_pct`, collapsed) -- portfolio runs both
schedulers in one invocation.
* `test_ensure_shuttle_works` -- gated to model-checker backends
only; the deliberate race only reaches the failing schedule under
a real scheduler, the default backend's one-shot run is
non-deterministic.
Drops the helpers plus the `Arc` / `thread` imports in `std_tests`
that they pulled in. Adds `loom = ["concurrency/loom"]` to
`nat/Cargo.toml` so the macro-emitted cfg arm resolves.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Make `just features=loom test` build and run end-to-end across the
workspace. Earlier commits had loom working only on the `concurrency`
crate; everything else failed to compile or crashed at runtime with
stack overflow, Arc leak, or DashMap destructor panics. Bundled into
one commit because each fix was discovered by running the previous
one.
## concurrency
* `fn sleep(_: Duration)` shim under loom (loom 0.7 doesn't model
time; yields to the scheduler so the call still acts as a
schedule point).
* Enumerate `loom::thread` re-exports and shadow `spawn` with a
4 MiB default stack. Loom 0.7's default coroutine stack is 4 KiB,
which overflows trivially under atomic-heavy `concurrency::sync`
types.
* `stress` under loom wraps the body in a 4 MiB
`Builder::spawn` so the main 4 KiB coroutine just spawns and
joins. Costs one of loom's five thread slots.
* `loom_scope::Scope::spawn` routes through `super::spawn` for the
same default.
* `Slot::load()` / `SlotOption::load()` helpers used by
`common::cliprovider` to drop a redundant `load_full()` clone.
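The "spawn on a bigger stack and join" workaround from the `stress` bullet has a direct analogue in `std::thread`, sketched below. This is not the loom code path: `run_with_big_stack` and `stack_heavy_sum` are hypothetical names, and std's `Builder::stack_size` stands in for the loom builder.

```rust
use std::thread;

const STACK_BYTES: usize = 4 * 1024 * 1024; // 4 MiB, as in the commit

/// Runs `body` on a freshly spawned thread with a 4 MiB stack and waits
/// for it, so the caller's own (possibly small) stack only spawns and joins.
pub fn run_with_big_stack<T, F>(body: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(STACK_BYTES)
        .spawn(body)
        .expect("failed to spawn worker")
        .join()
        .expect("worker panicked")
}

/// A body with a 256 KiB stack buffer, far more than a 4 KiB stack holds.
pub fn stack_heavy_sum() -> u64 {
    let buf = [1u8; 256 * 1024];
    buf.iter().map(|&b| u64::from(b)).sum()
}
```

The cost noted in the commit follows from this shape: the wrapper thread is a real participant, so it consumes one of the model checker's thread slots.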
## nat
* `cfg_attr`-gate `#![feature(arbitrary_self_types)]`: loom wraps
`Arc<T>` in a facade newtype that isn't a blessed self-receiver,
so `self: Arc<Self>` methods on `AllocatedIp` /
`AllocatedPortBlock` need the unstable feature there.
* `#[concurrency_mode(loom)]` no-op `shuffle_slice` (loom needs
determinism for replay; shuffle is allocation-order heuristic,
not correctness).
* Gate `concurrency_tests` off loom: the facade's `Weak` shim holds
a strong clone of the `Arc`, so the allocator's
`Weak::upgrade().is_none()` liveness signal never fires and
loom's `Arc leaked` assertion catches it.
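The last bullet can be illustrated with plain `std::sync` types. In this sketch `StrongBackedWeak` is a hypothetical model of "a `Weak` that holds a strong clone", written for this example; it is not the facade's actual shim, only the property the commit attributes to it.

```rust
use std::sync::{Arc, Weak};

/// Liveness probe in the style the commit describes: the resource counts
/// as released once the weak handle can no longer be upgraded.
pub fn is_released<T>(handle: &Weak<T>) -> bool {
    handle.upgrade().is_none()
}

/// Hypothetical model of a weak handle backed by a strong clone.
pub struct StrongBackedWeak<T>(Arc<T>);

impl<T> StrongBackedWeak<T> {
    pub fn downgrade(arc: &Arc<T>) -> Self {
        StrongBackedWeak(Arc::clone(arc))
    }

    /// Always succeeds, because the handle itself keeps the value alive.
    pub fn upgrade(&self) -> Option<Arc<T>> {
        Some(Arc::clone(&self.0))
    }
}

/// Returns (released with a real Weak, released with the strong-backed one).
pub fn demo() -> (bool, bool) {
    let real = Arc::new(1u32);
    let weak = Arc::downgrade(&real);
    drop(real);
    let real_released = is_released(&weak);

    let shimmed = Arc::new(2u32);
    let shim = StrongBackedWeak::downgrade(&shimmed);
    drop(shimmed);
    let shim_released = shim.upgrade().is_none();

    (real_released, shim_released)
}
```

With a true `Weak` the probe fires as soon as the last `Arc` drops; with the strong-backed handle it never does, which matches the leak the commit reports under loom.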
## flow-entry
* Gate `concurrency_tests` to shuttle only: `FlowTable`'s internal
`DashMap` panics in loom's end-of-execution cleanup (sharded
`RwLock`s don't fit loom's strict lifecycle accounting).
## routing
* Gate `fib::test::concurrency_tests` off loom: the `left_right`
epoch state space is too large for exhaustive search to terminate
in reasonable time.
## dataplane
* The binary builds an `Arc<dyn Fn(...) ...>` trait-object closure
in `packet_processor::setup_internal`, which needs `CoerceUnsized`
on the concrete `Arc`. Loom 0.7's `Arc` doesn't carry that trait,
and the facade newtype can't add it. Gate the bin out of loom
builds: extract the body to `dataplane/src/runtime.rs` and leave
`main.rs` with a stub `main` under loom that panics if invoked.
Library crates still get loom coverage through feature
propagation.
* `dataplane/src/drivers/dpdk.rs` switches `use crate::CmdArgs` to
`use args::CmdArgs;` to follow the new module layout.
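The `CoerceUnsized` limitation in the first bullet can be seen with a small sketch. `FacadeArc` below is a hypothetical newtype invented for illustration, and the closure signature is simplified; it does not reproduce `packet_processor::setup_internal`.

```rust
use std::sync::Arc;

type Handler = Arc<dyn Fn(u32) -> u32 + Send + Sync>;

/// `std::sync::Arc` supports unsized coercion, so an `Arc` of a concrete
/// closure converts to an `Arc` of a trait object at the return type.
pub fn make_handler(offset: u32) -> Handler {
    Arc::new(move |x: u32| x + offset)
}

/// Hypothetical facade-style newtype around `Arc`.
pub struct FacadeArc<T: ?Sized>(pub Arc<T>);

pub fn make_facade_handler(offset: u32) -> FacadeArc<dyn Fn(u32) -> u32 + Send + Sync> {
    // The coercion has to happen on the inner std `Arc` before wrapping.
    // A `FacadeArc<Closure>` value cannot itself be coerced to
    // `FacadeArc<dyn Fn..>` on stable Rust, because implementing
    // `CoerceUnsized` for a user-defined type is unstable.
    let inner: Handler = Arc::new(move |x: u32| x + offset);
    FacadeArc(inner)
}
```

When the wrapped pointer type does not support the coercion either, as the commit says of loom 0.7's `Arc`, there is no inner step to lean on, which is consistent with gating the binary out rather than patching the call site.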
CI's loom step in `.github/workflows/dev.yml` still scopes to
`--package=dataplane-concurrency` because the loom-incompatible tests
across the workspace are now cfg-gated rather than package-filtered;
the package scope is no longer load-bearing for the test invocation,
only for cargo's feature unification.
Default and `--features shuttle` builds unaffected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Leftover from the superseded dis-guard exploration; nothing references
it. Public-API drift in `concurrency::slot` for no benefit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
The import-only enforcement let
`static X: std::sync::LazyLock<T> = std::sync::LazyLock::new(...)` slip
through, and one such site already existed in
`config/src/external/overlay/vpcpeering.rs`. Extend the rule with a
multi-line regex backstop that matches the facade-managed type names in
any expression position, with a leading-comment lookahead so rustdoc
intra-doc links (`/// [std::sync::Arc]`) don't false-positive. Also
expand the grouped-import regex to span multiple lines.
Convert the offending FQN to `concurrency::sync::LazyLock`. Move the
deliberate `mgmt/tests/reconcile.rs` `nosemgrep:` onto the same line as
the `std::sync::Mutex::new` it suppresses, and annotate the intentional
`std::sync::Arc` in `concurrency::stress` (shared *across* `loom::model`
invocations, so it must remain a std Arc).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
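For concreteness, the gap being closed looks roughly like this. The inline `concurrency` module is a stand-in assumed to re-export the std type on the default backend, and the `DIRECT` / `VIA_FACADE` statics are invented for illustration rather than copied from `vpcpeering.rs`.

```rust
/// Stand-in facade; assumed to re-export the std type on the default backend.
pub mod concurrency {
    pub mod sync {
        pub use std::sync::LazyLock;
    }
}

// Slips past an import-only rule: there is no `use std::sync::...` line,
// yet the std type is named directly by its fully-qualified path.
pub static DIRECT: std::sync::LazyLock<Vec<u32>> =
    std::sync::LazyLock::new(|| vec![1, 2, 3]);

// The converted form names the facade path instead.
pub static VIA_FACADE: concurrency::sync::LazyLock<Vec<u32>> =
    concurrency::sync::LazyLock::new(|| vec![1, 2, 3]);
```

Both statics behave identically on the default backend, which is why only a textual rule (rather than a type-based lint) can tell them apart.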
The previous loom step ran `just test concurrency` only, which left the
workspace-wide loom compile (the whole point of the facade's local
`Weak<T>` shim and `Arc::downgrade`) unprotected. Add a
`cargo check`-equivalent step ahead of the test run so a regression in
any consumer crate fails CI directly.
Tests stay scoped to the concurrency crate -- model-checking the whole
workspace under loom is intractable. Update the inline comment to
reflect the new reality.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Daniel Noland <daniel@githedgehog.com>
Part 2 of the concurrency facade rollout (follows #1544). Migrates workspace consumers to `concurrency::sync` so loom/shuttle test builds route through their instrumented primitives via one feature flip, adds enforcement so future drift is caught at lint time, and makes `just features=loom test` build and run end-to-end across the workspace.
The migration is mechanical (one swap of `std::sync` -> `concurrency::sync` per consumer). What's load-bearing is the enforcement pair: `clippy.toml`'s `disallowed-types` catches the lock primitives (parking_lot's concrete types let clippy see through the re-export); a semgrep rule catches the rest (`Arc`, `Weak`, atomics, `LazyLock`, ...) where clippy's alias resolution can't distinguish `concurrency::sync::Arc` from `std::sync::Arc`. One documented exception in `mgmt/tests/reconcile.rs` (bolero needs `RefUnwindSafe`, which parking_lot's `Mutex` doesn't impl).
While the workspace is being touched, three side cleanups land in the same stack: collapse the `shuttle_dfs -> shuttle_pct -> shuttle` feature chain to a single `shuttle` feature backed by `PortfolioRunner` (Random + PCT in parallel, DFS additive); bump the shuttle stack in `concurrency::stress` to 4 MiB so per-test `shuttle::Config` overrides go away; convert the existing shuttle tests in `nat`, `flow-entry`, and `routing` to `#[concurrency::test]` so they pick up the same backend matrix.
The top commit is `feat(loom): workspace-wide loom compile + runtime support`, bundled because each fix in it was discovered by running the previous one (stack overflow on loom's 4 KiB coroutines, `Arc leaked` from the facade's `Weak`-as-strong-clone shim hitting NAT's liveness pattern, `DashMap` destructor panics in loom's end-of-execution cleanup). Cohesive final state is what reviewers should evaluate.