feat: east-west operator — service graph, per-pod identity + pod labels, declared-edge delta #50
Closed
pigri wants to merge 10 commits into
…vice-graph-api
A leader-elected producer (`--graph-producer`) that uploads workload identities + NetworkPolicy declared edges to service-graph-api over REST, which writes them into the Apache AGE service_graph. The operator stays REST-only (no DB), mirroring the Identity/Edge producers. Extracts the NetworkPolicy walk into a shared walkPolicyEdges helper so buildEdgeDoc (agent allow-list) and the graph producer derive edges from one interpretation (buildEdgeDoc output unchanged). Each upload is a full snapshot, gated on a content hash so unchanged state is not re-sent; stdlib-only (no new deps).
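A minimal sketch of a content-hash gate like the one described above, assuming the snapshot is a list of workload refs with IPs plus declared edges. The types `graphSnapshot`, `snapshotWorkload`, `snapshotEdge`, and `uploadGate` are hypothetical illustrations, not the operator's code, and hashing a sorted JSON encoding with SHA-256 is an assumed choice. Like the producer, it uses only the standard library.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"sort"
)

// Hypothetical snapshot shape: workload refs with their (already deduped,
// sorted) IPs, plus declared edges. Not the operator's real types.
type snapshotWorkload struct {
	Ref string   `json:"ref"`
	IPs []string `json:"ips"`
}

type snapshotEdge struct {
	Src  string `json:"src"`
	Dst  string `json:"dst"`
	Port string `json:"port"`
}

type graphSnapshot struct {
	Workloads []snapshotWorkload `json:"workloads"`
	Edges     []snapshotEdge     `json:"edges"`
}

// snapshotHash sorts copies of both lists before encoding, so the same
// cluster state yields the same SHA-256 digest regardless of list order.
func snapshotHash(s graphSnapshot) (string, error) {
	ws := append([]snapshotWorkload(nil), s.Workloads...)
	sort.Slice(ws, func(i, j int) bool { return ws[i].Ref < ws[j].Ref })
	es := append([]snapshotEdge(nil), s.Edges...)
	sort.Slice(es, func(i, j int) bool {
		if es[i].Src != es[j].Src {
			return es[i].Src < es[j].Src
		}
		if es[i].Dst != es[j].Dst {
			return es[i].Dst < es[j].Dst
		}
		return es[i].Port < es[j].Port
	})
	b, err := json.Marshal(graphSnapshot{Workloads: ws, Edges: es})
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

// uploadGate remembers the hash of the last successful upload.
type uploadGate struct{ last string }

// changed reports whether s differs from the last uploaded snapshot.
func (g *uploadGate) changed(s graphSnapshot) (string, bool, error) {
	h, err := snapshotHash(s)
	if err != nil {
		return "", false, err
	}
	return h, h != g.last, nil
}

// markUploaded records h only after the upload succeeded.
func (g *uploadGate) markUploaded(h string) { g.last = h }
```

Recording the hash only after a successful upload means a failed REST call is retried on the next cycle instead of being skipped as "unchanged".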
…ontrol-plane)
The GraphProducer now derives node classification from pure Kubernetes state and sends it on each workload vertex:
- internet_exposed: pod selected by a LoadBalancer/NodePort Service, or by a Service referenced from an Ingress backend
- control_plane: kube-system, or a named core component (apiserver/etcd/...)
- role: derived (control-plane > internet-exposed > internal)
Lists Services + Ingresses (read-only RBAC added); failures degrade gracefully (identities + declared edges still upload). Unit-tested.
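The exposure and role rules above can be sketched as pure functions. This assumes the Services passed in are already filtered to the pod's namespace and that Ingress backend Service names have been collected into a set; `service`, `selects`, `internetExposed`, and `deriveRole` are hypothetical names, and the role strings simply mirror the commit text.

```go
package main

// service is a hypothetical, trimmed-down view of a Kubernetes Service
// (the real code would read corev1.Service).
type service struct {
	Name     string
	Type     string            // "ClusterIP", "NodePort", "LoadBalancer", ...
	Selector map[string]string // Services without a selector select no pods
}

// selects reports whether every selector key/value pair is present on the pod.
func selects(sel, podLabels map[string]string) bool {
	if len(sel) == 0 {
		return false
	}
	for k, v := range sel {
		if pv, ok := podLabels[k]; !ok || pv != v {
			return false
		}
	}
	return true
}

// internetExposed: the pod is selected by a LoadBalancer/NodePort Service, or by
// a Service named as an Ingress backend. svcs must be in the pod's namespace.
func internetExposed(podLabels map[string]string, svcs []service, ingressBackends map[string]bool) bool {
	for _, s := range svcs {
		if !selects(s.Selector, podLabels) {
			continue
		}
		if s.Type == "LoadBalancer" || s.Type == "NodePort" || ingressBackends[s.Name] {
			return true
		}
	}
	return false
}

// deriveRole applies the precedence control-plane > internet-exposed > internal.
func deriveRole(controlPlane, exposed bool) string {
	switch {
	case controlPlane:
		return "control-plane"
	case exposed:
		return "internet-exposed"
	default:
		return "internal"
	}
}
```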
…nents
Was painting all of kube-system as control-plane. Now matches only the actual control plane by name (kube-apiserver / etcd / kube-controller-manager / kube-scheduler / cloud-controller-manager / ccm); add-ons such as cert-manager, CNI, autoscalers, DNS, and kube-proxy are classified internal.
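A sketch of the narrowed name match, assuming a workload counts as control-plane when its name equals one of the listed components or starts with `<component>-` (as static pods such as `kube-apiserver-<node>` do). The prefix rule and the names `controlPlaneComponents` and `isControlPlane` are assumptions for illustration.

```go
package main

import "strings"

// controlPlaneComponents are the core components named in the commit above.
// Add-ons (cert-manager, CNI, autoscalers, DNS, kube-proxy) are deliberately absent.
var controlPlaneComponents = []string{
	"kube-apiserver",
	"etcd",
	"kube-controller-manager",
	"kube-scheduler",
	"cloud-controller-manager",
	"ccm",
}

// isControlPlane matches a workload name against the core components.
// Assumption: exact match or a "<component>-" prefix (static pods are
// typically named <component>-<node>).
func isControlPlane(name string) bool {
	for _, c := range controlPlaneComponents {
		if name == c || strings.HasPrefix(name, c+"-") {
			return true
		}
	}
	return false
}
```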
The service graph's observed layer is produced by the agent, keyed by raw IP (the agent has no notion of workload identity, which keeps it usable on-prem). The operator is the only producer that knows the IP -> workload mapping, so the graph producer now ships each workload's deduped, sorted pod IPs. service-graph-api builds an IP index from these and resolves observed traffic to the workload ref. IPs are included in the change-detection hash, so IP churn triggers a re-upload.
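The operator-side normalization and the service-graph-api IP index might look like the sketch below. `dedupSortIPs` and `buildIPIndex` are hypothetical names; parsing with `net/netip` (numeric ordering, skipping the empty IP of a pod that has none yet) and resolving an IP claimed by two refs with a first-owner-wins rule are assumptions, not behavior confirmed by the PR.

```go
package main

import (
	"net/netip"
	"sort"
)

// dedupSortIPs parses, dedupes, and numerically sorts pod IPs so the
// change-detection hash is stable across pod list order. Empty or invalid
// entries (e.g. a Pending pod with no IP yet) are skipped.
func dedupSortIPs(ips []string) []string {
	seen := make(map[netip.Addr]bool)
	var addrs []netip.Addr
	for _, s := range ips {
		a, err := netip.ParseAddr(s)
		if err != nil || seen[a] {
			continue
		}
		seen[a] = true
		addrs = append(addrs, a)
	}
	sort.Slice(addrs, func(i, j int) bool { return addrs[i].Less(addrs[j]) })
	out := make([]string, len(addrs))
	for i, a := range addrs {
		out[i] = a.String()
	}
	return out
}

// buildIPIndex is a receiving-side sketch: map each IP to the workload ref
// that owns it. Refs are visited in sorted order and the first owner wins,
// so an IP briefly claimed by two refs resolves deterministically.
func buildIPIndex(workloadIPs map[string][]string) map[string]string {
	refs := make([]string, 0, len(workloadIPs))
	for ref := range workloadIPs {
		refs = append(refs, ref)
	}
	sort.Strings(refs)
	idx := make(map[string]string)
	for _, ref := range refs {
		for _, ip := range workloadIPs[ref] {
			if _, taken := idx[ip]; !taken {
				idx[ip] = ref
			}
		}
	}
	return idx
}
```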
On an overlay cluster the host-network agent observes node-level traffic (pod-to-pod is VXLAN-encapsulated), so the observed layer is keyed by node IPs. Declare each Node as a `node/<name>` vertex carrying its Internal + External IPs (deduped, sorted) so service-graph-api resolves node-level observed edges to named nodes instead of host/<ip>. Control-plane nodes are flagged via the standard role labels. Node IPs feed the same change-detection hash and IP index as pod workloads.
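A sketch of turning a Node into a `node/<name>` vertex, assuming the "standard role labels" are `node-role.kubernetes.io/control-plane` and the legacy `node-role.kubernetes.io/master`. `nodeAddress` mirrors the shape of `corev1.NodeAddress`; `nodeVertex` and `buildNodeVertex` are hypothetical. IPs are sorted lexicographically here, which is enough for a stable hash.

```go
package main

import "sort"

// nodeAddress mirrors the shape of corev1.NodeAddress (Type + Address).
type nodeAddress struct {
	Type    string // "InternalIP", "ExternalIP", "Hostname", ...
	Address string
}

// nodeVertex is a hypothetical graph vertex for a Node.
type nodeVertex struct {
	Ref          string
	IPs          []string
	ControlPlane bool
}

// buildNodeVertex keeps only Internal/External IPs (deduped, sorted) and flags
// control-plane nodes by the presence of a role label (its value is ignored).
func buildNodeVertex(name string, labels map[string]string, addrs []nodeAddress) nodeVertex {
	seen := make(map[string]bool)
	var ips []string
	for _, a := range addrs {
		if (a.Type == "InternalIP" || a.Type == "ExternalIP") && a.Address != "" && !seen[a.Address] {
			seen[a.Address] = true
			ips = append(ips, a.Address)
		}
	}
	sort.Strings(ips)
	_, cp := labels["node-role.kubernetes.io/control-plane"]
	_, legacy := labels["node-role.kubernetes.io/master"]
	return nodeVertex{Ref: "node/" + name, IPs: ips, ControlPlane: cp || legacy}
}
```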
StatefulSet replicas (postgres core-0/1/2, kafka brokers) have stable per-ordinal identities and talk to each OTHER — DB streaming replication, inter-broker traffic. Keying them by the set name collapsed all replicas to one ref, so that traffic became a self-loop and was invisible in the graph. Key StatefulSet pods per-pod (core-0/core-1/core-2) so replicas are distinct vertices and their mutual edges are real. Deployments stay aggregated.
…PodSet) Kafka brokers are owned by Strimzi's StrimziPodSet, not a StatefulSet, so the StatefulSet-only check missed them. Key per-pod for ANY controller whose pods are <controller>-<ordinal> (StatefulSet, StrimziPodSet, ...), so kafka brokers (core-kafka-0/1/2) also become distinct vertices. Deployment pods (random RS suffix) still aggregate.
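The `<controller>-<ordinal>` rule can be sketched as below, assuming the pod's top-level controller name (Deployment `api`, StatefulSet `core`, StrimziPodSet `core-kafka`) has already been resolved from its owner references. `workloadRef`, `isOrdinal`, and the `namespace/name` ref format are hypothetical.

```go
package main

import "strings"

// isOrdinal reports whether s is a non-empty run of ASCII digits.
func isOrdinal(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// workloadRef keys a pod per-pod when its name is <controller>-<ordinal>
// (StatefulSet, StrimziPodSet, ...) and aggregates it on the controller otherwise.
// controller is the pod's top-level controller name, resolved elsewhere.
func workloadRef(namespace, podName, controller string) string {
	prefix := controller + "-"
	if strings.HasPrefix(podName, prefix) && isOrdinal(strings.TrimPrefix(podName, prefix)) {
		return namespace + "/" + podName // replicas stay distinct vertices
	}
	return namespace + "/" + controller // all replicas share one vertex
}
```

Matching against the top-level controller name rather than the ReplicaSet keeps Deployment pods aggregated even if the random pod suffix happens to be all digits, because the remainder after `api-` still contains the ReplicaSet hash and a dash.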
collectDeclaredEdges dropped the ports walkPolicyEdges already provides, so
graph DECLARED_EDGEs had no port. Emit one edge per (src,dst,port) ('*' when
the rule has no port restriction); declaredEdge gains Port and the version hash
includes it.
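A sketch of the per-port expansion, assuming the ports from `walkPolicyEdges` arrive as strings and that an empty list means no port restriction. `portEdge`, `expandEdges`, and `hashKey` are hypothetical names, not the PR's `declaredEdge` type.

```go
package main

import "sort"

// portEdge is a hypothetical stand-in for a declared edge with a port.
type portEdge struct {
	Src, Dst, Port string
}

// expandEdges emits one edge per distinct (src, dst, port). An empty port list
// means the policy rule places no port restriction, encoded as a single "*" edge.
func expandEdges(src, dst string, ports []string) []portEdge {
	if len(ports) == 0 {
		return []portEdge{{Src: src, Dst: dst, Port: "*"}}
	}
	uniq := make(map[string]bool, len(ports))
	for _, p := range ports {
		uniq[p] = true
	}
	sorted := make([]string, 0, len(uniq))
	for p := range uniq {
		sorted = append(sorted, p)
	}
	sort.Strings(sorted)
	out := make([]portEdge, 0, len(sorted))
	for _, p := range sorted {
		out = append(out, portEdge{Src: src, Dst: dst, Port: p})
	}
	return out
}

// hashKey includes the port, so a port-only policy change alters the version hash.
func (e portEdge) hashKey() string { return e.Src + "|" + e.Dst + "|" + e.Port }
```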
Attach each pod's labels (dropping k8s-internal churn keys like
pod-template-hash / controller-revision-hash) to the identity record — written
as a nested map in the MMDB baseline and carried in the delta upserts — so the
agent can evaluate identity.k8s.{src,dst}_label["<key>"] rules.
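A sketch of the label filtering, assuming the two keys named in the commit are the whole drop list (the commit says "like", so the real list may be longer). `churnLabels` and `identityLabels` are hypothetical names.

```go
package main

// churnLabels are controller-managed keys whose values change on every rollout,
// so keeping them would churn the identity record without adding meaning.
var churnLabels = map[string]bool{
	"pod-template-hash":        true,
	"controller-revision-hash": true,
}

// identityLabels copies a pod's labels minus the churn keys.
func identityLabels(podLabels map[string]string) map[string]string {
	out := make(map[string]string, len(podLabels))
	for k, v := range podLabels {
		if !churnLabels[k] {
			out[k] = v
		}
	}
	return out
}
```

The resulting map is the kind of key/value set that a rule such as `identity.k8s.src_label["app"]` would index into.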
Mirror the identity delta producer for declared edges. The EdgeProducer becomes informer-driven: on NetworkPolicy/Pod/Namespace change it emits an incremental EdgeDelta (allow-list lines added/removed, tagged with epoch/seq for gap detection) to download-api, alongside a periodic full allow-list baseline for cold-start/resync. Factor the cluster-state reads into edgeInputs(), shared by the baseline build (buildAndUpload) and the delta flush (flushDelta). main.go wires the informer event handlers; tests cover the delta emission.
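The added/removed diff with epoch/seq tagging can be sketched as follows. It assumes allow-list lines are opaque strings, that a new full baseline starts a new epoch with the sequence reset to 0, and that a consumer accepts a delta only when it carries the same epoch and the next sequence number. `edgeDelta`, `deltaProducer`, and `inOrder` are hypothetical, not the download-api wire format.

```go
package main

import "sort"

// edgeDelta is a hypothetical incremental update: allow-list lines added and
// removed since the previous delta (or baseline) in the same epoch.
type edgeDelta struct {
	Epoch   string
	Seq     uint64
	Added   []string
	Removed []string
}

// deltaProducer keeps the last emitted allow-list as a set.
type deltaProducer struct {
	epoch string
	seq   uint64
	prev  map[string]bool
}

// resetBaseline records a full baseline and starts a new epoch.
func (p *deltaProducer) resetBaseline(epoch string, lines []string) {
	p.epoch = epoch
	p.seq = 0
	p.prev = make(map[string]bool, len(lines))
	for _, l := range lines {
		p.prev[l] = true
	}
}

// diff compares the freshly computed allow-list with the previous one. It
// returns false (and consumes no sequence number) when nothing changed.
func (p *deltaProducer) diff(next []string) (edgeDelta, bool) {
	cur := make(map[string]bool, len(next))
	for _, l := range next {
		cur[l] = true
	}
	var added, removed []string
	for l := range cur {
		if !p.prev[l] {
			added = append(added, l)
		}
	}
	for l := range p.prev {
		if !cur[l] {
			removed = append(removed, l)
		}
	}
	p.prev = cur
	if len(added) == 0 && len(removed) == 0 {
		return edgeDelta{}, false
	}
	sort.Strings(added)
	sort.Strings(removed)
	p.seq++
	return edgeDelta{Epoch: p.epoch, Seq: p.seq, Added: added, Removed: removed}, true
}

// inOrder is the consumer-side gap check: a delta applies only if it belongs
// to the current epoch and is exactly the next sequence number.
func inOrder(epoch string, lastSeq uint64, d edgeDelta) bool {
	return d.Epoch == epoch && d.Seq == lastSeq+1
}
```

On a gap (`inOrder` returns false), a consumer would fall back to the next full baseline rather than apply deltas out of order.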
Branch updated from 3d5d83f to 30e64f0.
Comment by pigri (PR author):
Superseded by #47. Per the architecture decision, #47 adopts the context-only / agent-built-graph direction and now grafts in this PR's per-pod identity (StatefulSet/StrimziPodSet), pod labels, and event-driven edge-delta producer. #50's operator-side GraphProducer path is intentionally dropped (the agent builds the graph). Closing in favor of #47.
Operator-side support for the east-west service graph + identity-aware microsegmentation. The operator is the Kubernetes context provider: it resolves cluster state (Pods, NetworkPolicies, Nodes) into the identity, declared-edge, and graph artifacts the agent consumes.
Service graph (GraphProducer)
Workload identity
…`identity.k8s.{src,dst}_label` rules.
Declared-edge delta producer
…`EdgeDelta` (allow-list lines added/removed, tagged with epoch/seq for gap detection) to download-api, plus a periodic full baseline for cold-start / resync. Mirrors the identity delta producer.
Rebased onto current main.
`go build ./...`, `go vet ./...`, and `go test ./...` all pass.