East-west agent: identity MMDB, declared-edge microseg, behavioral + direction fields #410
Closed
pigri wants to merge 14 commits into
…ral + direction fields
Adds the agent half of the east-west / lateral-movement detection chain.
- Direction + behavior: populate ids.dst_home_net/dst_external_net, ids.src_pod_net/dst_pod_net and the flow.* behavioral group (unique_dst_ports, flows_per_min, dst_port_entropy, ...) at BOTH eval sites (kernel_pump XDP and the AF_PACKET fallback). POD_NETS/is_pod_ip distinguishes true pod-to-pod from node-to-node-over-public-IP. New BlockSource::Microseg/Behavioral.
- Workload identity: IdentityMmdbWorker pulls identity.mmdb on the threat-MMDB rails (push-aware via config SSE + interval-poll fallback); a security::identity lookup module resolves src/dst IP -> workload/namespace, surfaced as id.* fields.
- Declared-edge microsegmentation: security::edge_set consumes the policy-edges allow-list; edge.declared / edge.policy_violation evaluated per flow.
- Per-rule `log` smart-firewall action (records a notice, installs no kernel block) for staged alert-first rollout.
- Overlay VXLAN/Geneve decap in the IDS inspect path so pod-to-pod overlay frames decode.
Requires the matching amygdala scheme fields (ids.dst_*, ids.*_pod_net, flow.*, id.*, edge.*, Action::Log). The amygdala dependency + Cargo.lock need bumping once that publishes; build is blocked until then.
Claude-Session: https://claude.ai/code/session_01MAY62VuZHhkeXgQ9feF3fJ
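The overlay decap step can be illustrated with a small std-only sketch. It is not the agent's code: `decap_overlay` is a hypothetical name, it assumes the outer UDP header has already been parsed, and it only recognizes the IANA ports (4789 for VXLAN, 6081 for Geneve). It returns the inner Ethernet frame so the normal decoder can run on it; Geneve payloads whose Protocol Type is not Ethernet are skipped.

```rust
/// IANA-assigned UDP destination ports for the two overlay encapsulations.
const VXLAN_PORT: u16 = 4789;
const GENEVE_PORT: u16 = 6081;
/// Geneve Protocol Type for an inner Ethernet frame (Transparent Ethernet Bridging).
const ETH_P_TEB: u16 = 0x6558;

/// Hypothetical helper: given the outer UDP destination port and the UDP
/// payload, return the inner Ethernet frame for VXLAN/Geneve, else None.
fn decap_overlay(udp_dst_port: u16, payload: &[u8]) -> Option<&[u8]> {
    match udp_dst_port {
        VXLAN_PORT => {
            // VXLAN header is a fixed 8 bytes; the I flag (0x08 in byte 0)
            // marks the 24-bit VNI as valid.
            if payload.len() < 8 || (payload[0] & 0x08) == 0 {
                return None;
            }
            Some(&payload[8..])
        }
        GENEVE_PORT => {
            // Geneve base header is 8 bytes. Byte 0 = Ver (2 bits) | Opt Len
            // (6 bits, counted in 4-byte words); bytes 2..4 = Protocol Type.
            if payload.len() < 8 {
                return None;
            }
            let version = payload[0] >> 6;
            let opt_len = usize::from(payload[0] & 0x3f) * 4;
            let proto = u16::from_be_bytes([payload[2], payload[3]]);
            if version != 0 || proto != ETH_P_TEB {
                return None; // unknown version, or not an Ethernet payload
            }
            // Skip the variable-length options; None if they overrun the buffer.
            payload.get(8 + opt_len..)
        }
        _ => None,
    }
}
```

Returning a borrowed slice keeps the decap zero-copy: the inner frame is just a view into the captured buffer.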
…path scans
is_pod_ip and is_protected_ip each took an RwLock read and walked a Vec<IpNet>
linearly, so the capture hot path paid four locked O(N) scans per flow
(src/dst x pod/protected).
Add a precompiled SegmentTable (`classify_ip` -> {protected, pod}) returning both
memberships in one pass, behind an ArcSwap rebuilt on every HOME_NET / POD_NET
mutation — reads are now lock-free and a hot-reload is a single pointer swap.
is_pod_ip / is_protected_ip delegate to it (API unchanged).
Representation is adaptive: a flat scan at or below 16 CIDRs (the common k8s
handful, where hashing overhead would lose), a prefix-length-bucketed hash above
it (O(distinct prefix lengths), for the many-subnet case). std-only, no new
lookup crate. Measured ~7.7x on a ~200-CIDR set; parity on small sets.
At both enrichment sites (kernel_pump XDP + AF_PACKET fallback), fuse the four
is_*_ip calls into two classify_ip calls, one per src/dst address.
Tests cover both representations, equivalence vs the legacy linear scan, IPv6,
and hot-reload; an ignored micro-bench records the crossover.
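A prefix-length-bucketed table like the one described above can be sketched in std-only Rust. Assumptions: IPv4 only, the `Segment`/`BucketedTable` names are hypothetical, and both the ArcSwap wrapper and the flat-scan path for small sets are omitted. `classify_ip` here only mirrors the commit's single-pass {protected, pod} result, with one hash probe per distinct prefix length.

```rust
use std::collections::HashMap;
use std::net::Ipv4Addr;

/// Membership flags for one lookup (mirrors the commit's {protected, pod}).
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct Segment {
    protected: bool,
    pod: bool,
}

/// Netmask for a prefix length in 0..=32.
fn mask(len: u8) -> u32 {
    if len == 0 { 0 } else { u32::MAX << (32 - len) }
}

/// One hash map per distinct prefix length, keyed by the masked network.
/// Lookup cost grows with the number of distinct lengths, not CIDRs.
struct BucketedTable {
    buckets: Vec<(u8, HashMap<u32, Segment>)>,
}

impl BucketedTable {
    fn new(cidrs: &[(Ipv4Addr, u8, Segment)]) -> Self {
        let mut by_len: HashMap<u8, HashMap<u32, Segment>> = HashMap::new();
        for &(net, len, seg) in cidrs {
            let key = u32::from(net) & mask(len);
            let slot = by_len.entry(len).or_default().entry(key).or_default();
            // A CIDR listed in both HOME_NET and POD_NETS merges its flags.
            slot.protected |= seg.protected;
            slot.pod |= seg.pod;
        }
        let mut buckets: Vec<_> = by_len.into_iter().collect();
        buckets.sort_by_key(|(len, _)| *len);
        BucketedTable { buckets }
    }

    /// Both memberships in a single pass over the prefix-length buckets.
    fn classify_ip(&self, ip: Ipv4Addr) -> Segment {
        let addr = u32::from(ip);
        let mut out = Segment::default();
        for (len, map) in &self.buckets {
            if let Some(seg) = map.get(&(addr & mask(*len))) {
                out.protected |= seg.protected;
                out.pod |= seg.pod;
            }
        }
        out
    }
}
```

In the commit the table is rebuilt on every HOME_NET / POD_NET mutation and published behind an ArcSwap, so readers never see a half-built table.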
Claude-Session: https://claude.ai/code/session_01MAY62VuZHhkeXgQ9feF3fJ
The per-source flow accumulator used SipHash everywhere: the global DashMap<IpAddr, Entry> plus the per-entry dst_port_counts / src_ports maps, all touched on the capture hot path. Switch to ahash. The outer map is keyed by source IP (attacker-influenceable), so it takes a per-process random seed (RandomState::new) for hash-flood resistance; the inner port maps are bounded by PORT_CAP, so a default seed is fine. Public API and behaviour unchanged — the existing aggregation / entropy / TTL / port-cap tests cover it. Claude-Session: https://claude.ai/code/session_01MAY62VuZHhkeXgQ9feF3fJ
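The seeding split described above (a random seed on the attacker-keyed outer map, a fixed seed on the capped inner maps) can be illustrated with std types only. This sketch swaps in std's SipHash-based `RandomState` and a fixed-key `DefaultHasher` in place of ahash, and a plain `HashMap` in place of `DashMap`, so it shows the structure, not the performance. `PORT_CAP = 1024` and treating `dst_port_entropy` as Shannon entropy in bits are assumptions.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::collections::HashMap;
use std::hash::BuildHasherDefault;
use std::net::IpAddr;

/// Hypothetical cap on distinct destination ports tracked per source.
const PORT_CAP: usize = 1024;

/// Fixed-key hasher: acceptable for the inner map because PORT_CAP bounds it.
type FixedState = BuildHasherDefault<DefaultHasher>;

#[derive(Default)]
struct Entry {
    dst_port_counts: HashMap<u16, u64, FixedState>,
}

impl Entry {
    fn record(&mut self, dst_port: u16) {
        if let Some(count) = self.dst_port_counts.get_mut(&dst_port) {
            *count += 1;
        } else if self.dst_port_counts.len() < PORT_CAP {
            // New ports beyond the cap are dropped, so the map cannot grow unbounded.
            self.dst_port_counts.insert(dst_port, 1);
        }
    }

    fn unique_dst_ports(&self) -> usize {
        self.dst_port_counts.len()
    }

    /// Shannon entropy, in bits, of the destination-port distribution.
    fn dst_port_entropy(&self) -> f64 {
        let total: u64 = self.dst_port_counts.values().sum();
        if total == 0 {
            return 0.0;
        }
        self.dst_port_counts
            .values()
            .map(|&c| {
                let p = c as f64 / total as f64;
                -p * p.log2()
            })
            .sum()
    }
}

struct Accumulator {
    // Keyed by attacker-influenceable source IP, so it gets a randomly keyed hasher.
    by_src: HashMap<IpAddr, Entry, RandomState>,
}

impl Accumulator {
    fn new() -> Self {
        Accumulator { by_src: HashMap::with_hasher(RandomState::new()) }
    }

    fn observe(&mut self, src: IpAddr, dst_port: u16) {
        self.by_src.entry(src).or_default().record(dst_port);
    }
}
```

A source spraying many distinct ports pushes both `unique_dst_ports` and the entropy up, which is the scan signal the `flow.*` fields expose.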
…rary
The IdentityInfo/IdentityClient model + MMDB reader and the EdgeSet parser/evaluator move to the hippocampus crate; synapse-core keeps the process-global client singletons + capability registration as thin wrappers that re-export the library types. Public API (lookup/evaluate/init_*/refresh_*/get_version_cache, IdentityInfo/EdgeVerdict) is unchanged, so the workers and enrichment call sites are untouched. hippocampus is a dev path dependency for now; switch to the gen0sec registry once it is published.
Every src/dst identity lookup did a memmap B-tree walk + maxminddb decode (three String allocations), and the same pod/node IPs recur across every connection, so the walk was repeated constantly for a small stable set of IPs. Add an IP-keyed cache (positive and negative results) in front of the walk, keyed with a per-process-seeded ahash map. Identity is stable for an IP between MMDB refreshes; the cache is cleared wholesale on every refresh so a re-labelled IP is never served stale. Bounded by a coarse cap (clear-on-full) so a many-IP scan can't grow it without limit. Public `lookup` API unchanged. Tested with a counting fake resolver: resolves once per IP, caches negatives, re-resolves after clear (refresh), and stays bounded. Claude-Session: https://claude.ai/code/session_01MAY62VuZHhkeXgQ9feF3fJ
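A single-threaded sketch of the cache semantics above: positive and negative entries, a wholesale clear on refresh, and a clear-on-full cap. The `IdentityCache` name, the closure standing in for the MMDB walk, and the fields given to `IdentityInfo` here are illustrative assumptions; the real cache is process-global, thread-safe, and ahash-keyed, none of which is modeled.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Illustrative fields only; the real IdentityInfo lives in hippocampus.
#[derive(Clone, Debug, PartialEq, Eq)]
struct IdentityInfo {
    namespace: String,
    workload: String,
}

/// IP-keyed cache in front of an expensive resolver. `None` values are
/// cached too, so repeated misses also skip the resolver.
struct IdentityCache<F: FnMut(IpAddr) -> Option<IdentityInfo>> {
    resolve: F,
    entries: HashMap<IpAddr, Option<IdentityInfo>>,
    cap: usize,
}

impl<F: FnMut(IpAddr) -> Option<IdentityInfo>> IdentityCache<F> {
    fn new(resolve: F, cap: usize) -> Self {
        IdentityCache { resolve, entries: HashMap::new(), cap }
    }

    fn lookup(&mut self, ip: IpAddr) -> Option<IdentityInfo> {
        if let Some(hit) = self.entries.get(&ip) {
            return hit.clone();
        }
        let value = (self.resolve)(ip);
        // Coarse bound: wipe everything when full instead of tracking recency.
        if self.entries.len() >= self.cap {
            self.entries.clear();
        }
        self.entries.insert(ip, value.clone());
        value
    }

    /// Called on every MMDB refresh so a re-labelled IP is never served stale.
    fn on_refresh(&mut self) {
        self.entries.clear();
    }
}
```

Clear-on-full trades a brief burst of re-resolves after a wide scan for never needing per-entry recency bookkeeping on the hot path.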
The default release profile already sets lto="thin" + codegen-units=1. Add an
opt-in release-lto profile (inherits release, upgrades to fat LTO) for the
shipping image build only — fat LTO is materially slower to build, so dev/CI
keep using the thin default.
Measured on the synapse agent binary (default features):
- size: 52 MiB (thin) -> 46 MiB (fat), -5.2 MiB / -10.3% (better cross-crate
dead-code elimination)
- build: ~196s -> ~354s for the binary
- compatibility unchanged: no target-cpu/target-feature is set, output stays
baseline x86-64 (ELF for GNU/Linux 3.2.0), so it runs on every node; the
SSL-uprobe symbol table is preserved (strip="debuginfo" inherited). LTO is
link-time only and does not change the target ISA.
Build the shipping image with `cargo build --profile release-lto`.
Claude-Session: https://claude.ai/code/session_01MAY62VuZHhkeXgQ9feF3fJ
…istic bench)
A realistic east-west benchmark (bench_realistic_enrichment: pod/node/external mix, 32-CIDR HOME_NET) showed the original LINEAR_THRESHOLD=16 was a net ~1.6x REGRESSION: a real HOME_NET (broad pod /16 + service /12 + node /32s) tipped over 16 entries into the Bucketed hash, which pays a fixed per-prefix-length probe count and cannot exploit the short-circuit that real pod traffic gets on the broad CIDR. The earlier 7.7x was on adversarial all-miss input, not real traffic.
Fix: raise the threshold to 64 (above any realistic cluster's node/LB count) and sort the Linear representation broad-prefix-first so the scan short-circuits on the broad pod/service CIDR regardless of config order.
The real win is the lock-free ArcSwap read + 4->2 call fusion, not the data structure. Bucketed is kept only for pathological large no-broad-prefix configs.
Full-path realistic result now: OLD RwLock+4 linear 64 ns/pkt -> NEW ArcSwap+2 classify 49 ns/pkt = ~1.3x; large disjoint sets still ~7.9x; small parity.
Claude-Session: https://claude.ai/code/session_01MAY62VuZHhkeXgQ9feF3fJ
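The revised linear path can be sketched as follows: sort broad prefixes first, scan, and stop once both memberships are known, with the 64-entry threshold deciding when the linear form is used. This is IPv4 only. `LinearTable`, `Cidr`, and `use_linear` are hypothetical names, and the early exit shown (break once both `protected` and `pod` are set) is one plausible reading of the short-circuit. It pays off for pod traffic when the broad pod CIDR is listed in both HOME_NET and POD_NETS.

```rust
use std::net::Ipv4Addr;

/// Threshold from the commit: at or below this many CIDRs, use the flat scan.
const LINEAR_THRESHOLD: usize = 64;

#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct Segment {
    protected: bool,
    pod: bool,
}

#[derive(Clone, Copy)]
struct Cidr {
    net: u32,
    len: u8,
    seg: Segment,
}

impl Cidr {
    fn contains(&self, addr: u32) -> bool {
        let mask = if self.len == 0 { 0 } else { u32::MAX << (32 - self.len) };
        (addr & mask) == (self.net & mask)
    }
}

struct LinearTable {
    cidrs: Vec<Cidr>,
}

impl LinearTable {
    fn new(mut cidrs: Vec<Cidr>) -> Self {
        // Broadest prefixes (shortest length) first, independent of config order.
        cidrs.sort_by_key(|c| c.len);
        LinearTable { cidrs }
    }

    fn classify_ip(&self, ip: Ipv4Addr) -> Segment {
        let addr = u32::from(ip);
        let mut out = Segment::default();
        for c in &self.cidrs {
            if c.contains(addr) {
                out.protected |= c.seg.protected;
                out.pod |= c.seg.pod;
                if out.protected && out.pod {
                    break; // both memberships known: the rest cannot add anything
                }
            }
        }
        out
    }
}

/// Pick the flat scan for realistic cluster sizes; larger sets go bucketed.
fn use_linear(cidr_count: usize) -> bool {
    cidr_count <= LINEAR_THRESHOLD
}
```

The sort makes the fast path independent of how operators happen to order HOME_NET entries.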
hippocampus is published, so synapse-core depends on it via the registry instead of a local path. A commented [patch.gen0sec] entry keeps the local-path dev override available, matching the amygdala/cortex pattern.
amygdala 0.1.6 publishes the id./edge./flow./ids. wirefilter enrichment fields this branch depends on; synapse-app now builds against the registry crate.
Adds bench_concurrent_classify: N threads hammering the lookup, old RwLock read+scan vs the new lock-free ArcSwap classify_ip, measuring how aggregate throughput scales with thread count. Result (8 threads): the RwLock baseline does not scale (25.7 -> 16.6 -> ~28 Mops/s for 1/2/8 threads; concurrent readers bounce the lock's reader-count cache line), while ArcSwap scales near-linearly (23 -> 45 -> 86 -> 174 Mops/s), 6.3x the RwLock at 8 threads. This is the real system-level justification for the lock-free change: per-packet single-thread cost is a wash, but a multi-core sharded capture path serializes on the old RwLock and not on ArcSwap.
…lpm-segment-fusion
# Conflicts:
#   crates/synapse-core/src/security/identity/mod.rs
access-rules: lock-free concurrency-scaling segment classifier + ahash, identity cache, opt-in fat-LTO
# Conflicts:
#   crates/synapse-app/src/kernel_pump.rs
Draft — blocked on the amygdala scheme fields (see depends-on below).
Adds the agent half of the east-west / lateral-movement detection chain: make internal pod-to-pod traffic visible, enrich it with direction + behavior + workload identity, and express microsegmentation as a declared-edge allow-list.
What's here
- Direction + behavioral fields: populates `ids.dst_home_net`/`ids.dst_external_net`, `ids.src_pod_net`/`ids.dst_pod_net`, and the `flow.*` behavioral group (`unique_dst_ports`, `flows_per_min`, `dst_port_entropy`, …) at both eval sites, `kernel_pump` (XDP) and the AF_PACKET fallback, so enrichment never silently no-ops on fallback nodes. `POD_NETS`/`is_pod_ip` (in `synapse-access-rules`) distinguishes true pod-to-pod from node-to-node-over-public-IP traffic (both are in HOME_NET, so HOME_NET alone can't). New `BlockSource::Microseg`/`Behavioral` for source attribution.
- Workload identity (pod IP → workload/namespace): `IdentityMmdbWorker` pulls `identity.mmdb` on the existing threat-MMDB rails (push-aware via the config SSE channel + interval-poll fallback). A `security::identity` lookup module resolves src/dst IPs, surfaced as `id.*` fields. No in-cluster RBAC on the agent: it's a downloaded artifact like the threat MMDB.
- Declared-edge microsegmentation: `security::edge_set` consumes a `policy-edges` allow-list; `edge.declared`/`edge.policy_violation` are evaluated per flow, so a rule like `edge.policy_violation == 1` enforces microseg at the wire.
- Per-rule `log` smart-firewall action: matches and records a notice but installs no kernel block, regardless of `enforce_block`. This lets a single new rule dry-run while others stay enforcing (staged alert-first rollout).
- Overlay VXLAN/Geneve decap in the IDS inspect path, so pod-to-pod overlay frames decode to their inner Ethernet frame.
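A minimal sketch of declared-edge evaluation, assuming edges are directional (namespace, workload) pairs and that a flow with an unidentified endpoint is neither declared nor a violation. Those semantics, the `EdgeAllowList`/`Workload`/`Verdict` names, and the field layout are assumptions for illustration; the allow-list format consumed by `security::edge_set` is not shown in this PR.

```rust
use std::collections::HashSet;

/// A workload as resolved from the identity MMDB (illustrative fields).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Workload {
    namespace: String,
    name: String,
}

fn wl(namespace: &str, name: &str) -> Workload {
    Workload { namespace: namespace.to_string(), name: name.to_string() }
}

/// Loosely mirrors the `edge.declared` / `edge.policy_violation` fields.
#[derive(Debug, PartialEq, Eq)]
struct Verdict {
    declared: bool,
    policy_violation: bool,
}

/// Directional allow-list of (source workload, destination workload) edges.
struct EdgeAllowList {
    allowed: HashSet<(Workload, Workload)>,
}

impl EdgeAllowList {
    /// Each tuple is (src namespace, src workload, dst namespace, dst workload).
    fn from_edges(edges: &[(&str, &str, &str, &str)]) -> Self {
        let allowed = edges
            .iter()
            .map(|&(sn, sw, dn, dw)| (wl(sn, sw), wl(dn, dw)))
            .collect();
        EdgeAllowList { allowed }
    }

    /// Evaluate one flow; `None` means the identity lookup found no workload.
    fn evaluate(&self, src: Option<&Workload>, dst: Option<&Workload>) -> Verdict {
        match (src, dst) {
            (Some(s), Some(d)) => {
                let declared = self.allowed.contains(&(s.clone(), d.clone()));
                Verdict { declared, policy_violation: !declared }
            }
            // Assumption: an unidentified endpoint is never flagged.
            _ => Verdict { declared: false, policy_violation: false },
        }
    }
}
```

Expressing microseg as an allow-list keeps the rule itself trivial (`edge.policy_violation == 1`); all policy lives in the edge data.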
Depends on
Requires the matching amygdala scheme fields: `ids.dst_*`, `ids.*_pod_net`, `flow.*`, `id.*`, `edge.*`, and `Action::Log` (gen0sec/amygdala#9). Until that merges and publishes, this won't build; the amygdala dependency version + `Cargo.lock` need bumping at that point. Kept as a draft for that reason.
Notes for review
- `main` has since advanced (0.7.4 + the classifier HOME_NET skip), so rebase onto `main` before un-drafting.
- `Cargo.lock` is intentionally unchanged; regenerate after the amygdala bump.
- `enforce_block` is flipped; the `is_protected_ip` never-ban guard still wraps every block.
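The interplay between the per-rule `log` action, `enforce_block`, and the never-ban guard can be sketched as a small decision function. `RuleAction`, `Outcome`, and `dispatch` are hypothetical names, and the assumption that a `Block` rule still records a notice when `enforce_block` is off is not stated in the PR.

```rust
use std::net::IpAddr;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RuleAction {
    Log,
    Block,
}

#[derive(Debug, PartialEq, Eq)]
struct Outcome {
    notice: bool,
    kernel_block: bool,
}

fn dispatch(
    action: RuleAction,
    enforce_block: bool,
    src: IpAddr,
    is_protected_ip: impl Fn(IpAddr) -> bool,
) -> Outcome {
    match action {
        // `log`: record a notice and never install a kernel block, whatever
        // enforce_block says, so one new rule can dry-run beside enforcing ones.
        RuleAction::Log => Outcome { notice: true, kernel_block: false },
        // Block only when enforcement is on AND the never-ban guard allows it.
        RuleAction::Block => Outcome {
            notice: true,
            kernel_block: enforce_block && !is_protected_ip(src),
        },
    }
}
```

Keeping the guard inside the single place that decides `kernel_block` means no rule action, new or old, can bypass it.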