
feat: east-west operator — service graph, per-pod identity + pod labels, declared-edge delta#50

Closed
pigri wants to merge 10 commits into main from feat/service-graph-workload-ips


pigri commented Jun 30, 2026

Contributor

Operator-side support for the east-west service graph and identity-aware microsegmentation. The operator is the Kubernetes context provider: it resolves cluster state (Pods, NetworkPolicies, Nodes) into the identity, declared-edge, and graph artifacts that the agent consumes.

Service graph (GraphProducer)

  • Emits workload pod IPs and declares Kubernetes nodes as graph vertices.
  • Classifies workloads (role / internet-exposed / control-plane), narrowed to true control-plane components.
  • Uploads the east-west service graph to service-graph-api.

Workload identity

  • Per-pod identity for StatefulSet pods, generalized to any ordinal controller (e.g. StrimziPodSet).
  • Emits pod labels (k8s-internal churn keys filtered out) in the identity MMDB + delta, for identity.k8s.{src,dst}_label rules.

Declared-edge delta producer

  • Declared edges carry the NetworkPolicy port.
  • Event-driven EdgeProducer: on NetworkPolicy / Pod / Namespace change it emits an incremental EdgeDelta (allow-list lines added/removed, tagged with epoch/seq for gap detection) to download-api, plus a periodic full baseline for cold-start / resync. Mirrors the identity delta producer.
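As an illustration of what epoch/seq gap detection could look like on the consuming side, here is a minimal sketch. The `edgeDelta` and `consumer` types, the string-typed epoch, and the "seq increases by exactly one within an epoch" rule are all assumptions made for this example; the PR does not spell out the wire format.

```go
package main

import "fmt"

// edgeDelta is a hypothetical shape for the incremental payload.
type edgeDelta struct {
	Epoch   string   // assumed: changes whenever the producer starts a new baseline
	Seq     uint64   // assumed: increases by one per delta within an epoch
	Added   []string // allow-list lines added
	Removed []string // allow-list lines removed
}

// consumer tracks the last applied position in the delta stream.
type consumer struct {
	epoch  string
	seq    uint64
	synced bool
}

// loadBaseline records the position of a freshly loaded full baseline.
func (c *consumer) loadBaseline(epoch string, seq uint64) {
	c.epoch, c.seq, c.synced = epoch, seq, true
}

// accept reports whether d directly follows the last applied delta.
// A false result means a gap was detected and the caller should resync
// from the periodic full baseline.
func (c *consumer) accept(d edgeDelta) bool {
	if !c.synced || d.Epoch != c.epoch || d.Seq != c.seq+1 {
		return false
	}
	c.seq = d.Seq
	return true
}

func main() {
	c := &consumer{}
	c.loadBaseline("e1", 10)
	fmt.Println(c.accept(edgeDelta{Epoch: "e1", Seq: 11})) // in order
	fmt.Println(c.accept(edgeDelta{Epoch: "e1", Seq: 13})) // seq 12 was missed
	fmt.Println(c.accept(edgeDelta{Epoch: "e2", Seq: 1}))  // epoch changed
}
```

The periodic full baseline is what makes this cheap: on any gap the consumer can discard deltas and reload, rather than trying to repair the stream.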

Rebased onto current main. go build ./..., go vet ./..., and go test ./... all pass.

pigri added 10 commits July 3, 2026 08:04
…vice-graph-api

A leader-elected producer (--graph-producer) that uploads workload identities +
NetworkPolicy declared edges to service-graph-api over REST, which writes them
into the Apache AGE service_graph. The operator stays REST-only (no DB),
mirroring the Identity/Edge producers.

Extracts the NetworkPolicy walk into a shared walkPolicyEdges helper so
buildEdgeDoc (agent allow-list) and the graph producer derive edges from one
interpretation (buildEdgeDoc output unchanged). Content-hash gated full
snapshot; stdlib-only (no new deps).
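A content-hash gate of the kind described above might look roughly like the following stdlib-only sketch. The `snapshot` and `uploader` types and the choice of SHA-256 over a JSON encoding are assumptions for illustration, not the operator's actual code; the REST call is replaced by a counter.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// snapshot is a hypothetical stand-in for the full graph payload.
// Slices are assumed to be sorted by the caller so that equal content
// always produces the same encoding.
type snapshot struct {
	Workloads []string `json:"workloads"`
	Edges     []string `json:"edges"`
}

// contentHash returns the hex SHA-256 of the snapshot's JSON encoding.
func contentHash(s snapshot) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

// uploader remembers the hash of the last uploaded snapshot.
type uploader struct {
	lastHash string
	uploads  int
}

// maybeUpload uploads only when the content hash changed and reports
// whether an upload happened.
func (u *uploader) maybeUpload(s snapshot) (bool, error) {
	h, err := contentHash(s)
	if err != nil {
		return false, err
	}
	if h == u.lastHash {
		return false, nil // unchanged: skip the upload
	}
	u.uploads++ // a real producer would issue the REST request here
	u.lastHash = h
	return true, nil
}

func main() {
	u := &uploader{}
	s := snapshot{Workloads: []string{"default/web"}, Edges: []string{"default/web->default/db"}}
	first, _ := u.maybeUpload(s)
	second, _ := u.maybeUpload(s)
	fmt.Println(first, second, u.uploads)
}
```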
…ontrol-plane)

The GraphProducer now derives node classification from pure Kubernetes state and
sends it on each workload vertex:
- internet_exposed: pod selected by a LoadBalancer/NodePort Service, or by a
  Service referenced from an Ingress backend
- control_plane: kube-system, or a named core component (apiserver/etcd/...)
- role: derived (control-plane > internet-exposed > internal)

Lists Services + Ingresses (RBAC added, read-only); failures degrade gracefully
(identities + declared edges still upload). Unit-tested.
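The classification rules above can be sketched with plain structs. The `service` and `classification` types are hypothetical, namespace matching is omitted for brevity, and the real producer presumably works on typed Kubernetes objects.

```go
package main

import "fmt"

// service is a simplified, hypothetical view of a Kubernetes Service.
type service struct {
	Type        string            // "ClusterIP", "NodePort" or "LoadBalancer"
	Selector    map[string]string // label selector
	FromIngress bool              // referenced by some Ingress backend
}

// classification holds the two booleans sent on a workload vertex.
type classification struct {
	ControlPlane    bool
	InternetExposed bool
}

// selects reports whether every selector pair is present on the pod.
// An empty selector selects nothing.
func selects(selector, podLabels map[string]string) bool {
	if len(selector) == 0 {
		return false
	}
	for k, v := range selector {
		if got, ok := podLabels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// internetExposed is true when an externally reachable Service selects the pod.
func internetExposed(podLabels map[string]string, svcs []service) bool {
	for _, s := range svcs {
		external := s.Type == "LoadBalancer" || s.Type == "NodePort" || s.FromIngress
		if external && selects(s.Selector, podLabels) {
			return true
		}
	}
	return false
}

// deriveRole applies the precedence control-plane > internet-exposed > internal.
func deriveRole(c classification) string {
	switch {
	case c.ControlPlane:
		return "control-plane"
	case c.InternetExposed:
		return "internet-exposed"
	default:
		return "internal"
	}
}

func main() {
	labels := map[string]string{"app": "web"}
	svcs := []service{{Type: "LoadBalancer", Selector: map[string]string{"app": "web"}}}
	c := classification{InternetExposed: internetExposed(labels, svcs)}
	fmt.Println(deriveRole(c))
}
```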
…nents

Previously all of kube-system was classified as control-plane. Now matches only the actual
control plane by name (kube-apiserver / etcd / kube-controller-manager /
kube-scheduler / cloud-controller-manager / ccm); add-ons like cert-manager,
CNI, autoscalers, DNS, kube-proxy are internal.
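One way to express the narrowed name match is shown below. The component list comes from the commit message; the matching rule (exact name, or the name followed by a `-` suffix as static pods usually have) is an assumption for illustration, and `isControlPlaneName` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// controlPlaneNames lists the components named in the commit message.
var controlPlaneNames = []string{
	"kube-apiserver",
	"etcd",
	"kube-controller-manager",
	"kube-scheduler",
	"cloud-controller-manager",
	"ccm",
}

// isControlPlaneName reports whether name is one of the core components,
// either exactly or with a "-<suffix>" (assumed matching rule).
func isControlPlaneName(name string) bool {
	for _, n := range controlPlaneNames {
		if name == n || strings.HasPrefix(name, n+"-") {
			return true
		}
	}
	return false
}

func main() {
	for _, name := range []string{"kube-apiserver-node1", "etcd", "cert-manager", "kube-proxy", "coredns"} {
		fmt.Println(name, isControlPlaneName(name))
	}
}
```

A prefix rule like this one is deliberately loose and would also match unrelated names that happen to start with `etcd-`, so a real implementation may need a stricter check.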
The agent produces the service graph's observed layer keyed by raw IP (it has
no workload identity, which is what lets it work on-prem). The operator
is the only producer that knows the IP -> workload mapping, so the graph
producer now ships each workload's deduped, sorted pod IPs. service-graph-api
builds an IP index from these and resolves observed traffic to the workload
ref. IPs are included in the change-detection hash so IP churn triggers a
re-upload.
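The dedupe-and-sort step and the IP index it enables could be sketched as follows. Both helpers are hypothetical; the index is built by service-graph-api in the described design and is shown here only to make the data flow concrete. Sorting is lexicographic, which is enough for a stable hash input.

```go
package main

import (
	"fmt"
	"sort"
)

// dedupeSorted drops empty and duplicate IPs and returns them sorted,
// so the same pod set always yields the same list.
func dedupeSorted(ips []string) []string {
	seen := make(map[string]struct{}, len(ips))
	out := make([]string, 0, len(ips))
	for _, ip := range ips {
		if ip == "" {
			continue
		}
		if _, ok := seen[ip]; ok {
			continue
		}
		seen[ip] = struct{}{}
		out = append(out, ip)
	}
	sort.Strings(out)
	return out
}

// buildIPIndex maps each IP to its workload ref so observed traffic
// keyed by raw IP can be resolved to a workload.
func buildIPIndex(workloads map[string][]string) map[string]string {
	idx := make(map[string]string)
	for ref, ips := range workloads {
		for _, ip := range dedupeSorted(ips) {
			idx[ip] = ref
		}
	}
	return idx
}

func main() {
	workloads := map[string][]string{
		"default/web": {"10.0.0.7", "10.0.0.3", "10.0.0.7", ""},
		"default/db":  {"10.0.1.2"},
	}
	fmt.Println(dedupeSorted(workloads["default/web"]))
	fmt.Println(buildIPIndex(workloads)["10.0.1.2"])
}
```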
On an overlay cluster the host-network agent observes node-level traffic
(pod-to-pod is VXLAN-encapsulated), so the observed layer is keyed by node
IPs. Declare each Node as a `node/<name>` vertex carrying its Internal +
External IPs (deduped, sorted) so service-graph-api resolves node-level
observed edges to named nodes instead of host/<ip>. Control-plane nodes are
flagged via the standard role labels. Node IPs feed the same change-detection
hash and IP index as pod workloads.
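A node vertex along these lines might be built as in the sketch below. The `node`, `nodeAddress` and `vertex` types are simplified stand-ins for the Kubernetes objects; the two role labels checked are the well-known `node-role.kubernetes.io/control-plane` and its legacy `master` variant, which is an assumption about what "standard role labels" covers here.

```go
package main

import (
	"fmt"
	"sort"
)

// nodeAddress and node are simplified stand-ins for the Kubernetes types.
type nodeAddress struct {
	Type    string // e.g. "InternalIP", "ExternalIP", "Hostname"
	Address string
}

type node struct {
	Name      string
	Labels    map[string]string
	Addresses []nodeAddress
}

// vertex is a hypothetical graph vertex payload.
type vertex struct {
	Ref          string
	IPs          []string
	ControlPlane bool
}

// nodeVertex builds a node/<name> vertex carrying the node's internal and
// external IPs (deduped, sorted) and a control-plane flag from role labels.
func nodeVertex(n node) vertex {
	seen := map[string]bool{}
	var ips []string
	for _, a := range n.Addresses {
		if a.Type != "InternalIP" && a.Type != "ExternalIP" {
			continue
		}
		if a.Address == "" || seen[a.Address] {
			continue
		}
		seen[a.Address] = true
		ips = append(ips, a.Address)
	}
	sort.Strings(ips)
	_, cp := n.Labels["node-role.kubernetes.io/control-plane"]
	_, legacy := n.Labels["node-role.kubernetes.io/master"]
	return vertex{Ref: "node/" + n.Name, IPs: ips, ControlPlane: cp || legacy}
}

func main() {
	n := node{
		Name:   "cp-1",
		Labels: map[string]string{"node-role.kubernetes.io/control-plane": ""},
		Addresses: []nodeAddress{
			{Type: "InternalIP", Address: "192.168.1.10"},
			{Type: "Hostname", Address: "cp-1"},
			{Type: "ExternalIP", Address: "203.0.113.5"},
		},
	}
	fmt.Printf("%+v\n", nodeVertex(n))
}
```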
StatefulSet replicas (postgres core-0/1/2, kafka brokers) have stable
per-ordinal identities and talk to each other (DB streaming replication,
inter-broker traffic). Keying them by the set name collapsed all replicas to
one ref, so that traffic became a self-loop and was invisible in the graph.
Key StatefulSet pods per-pod (core-0/core-1/core-2) so replicas are distinct
vertices and their mutual edges are real. Deployments stay aggregated.
…PodSet)

Kafka brokers are owned by Strimzi's StrimziPodSet, not a StatefulSet, so the
StatefulSet-only check missed them. Key per-pod for ANY controller whose pods
are <controller>-<ordinal> (StatefulSet, StrimziPodSet, ...), so kafka brokers
(core-kafka-0/1/2) also become distinct vertices. Deployment pods (random RS
suffix) still aggregate.
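The `<controller>-<ordinal>` rule can be illustrated with a small helper. `workloadKey` is a hypothetical name, and the sketch assumes the caller has already resolved the top-level controller name (for example the Deployment rather than its ReplicaSet).

```go
package main

import (
	"fmt"
	"strings"
)

// workloadKey returns the pod name when the pod is named
// <controller>-<ordinal> (per-pod identity), and the controller name
// otherwise (aggregated identity).
func workloadKey(podName, controllerName string) string {
	prefix := controllerName + "-"
	if controllerName == "" || !strings.HasPrefix(podName, prefix) {
		return controllerName
	}
	suffix := strings.TrimPrefix(podName, prefix)
	if suffix == "" {
		return controllerName
	}
	for _, r := range suffix {
		if r < '0' || r > '9' {
			return controllerName // not a pure ordinal, e.g. a random RS suffix
		}
	}
	return podName
}

func main() {
	fmt.Println(workloadKey("core-0", "core"))             // StatefulSet replica
	fmt.Println(workloadKey("core-kafka-2", "core-kafka")) // StrimziPodSet broker
	fmt.Println(workloadKey("web-7d4b9c6f5-x2kqp", "web")) // Deployment pod
}
```

Checking the name shape instead of the owner kind is what lets controllers such as StrimziPodSet work without being listed explicitly.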
collectDeclaredEdges dropped the ports walkPolicyEdges already provides, so
graph DECLARED_EDGEs had no port. Emit one edge per (src,dst,port) ('*' when
the rule has no port restriction); declaredEdge gains Port and the version hash
includes it.
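A minimal sketch of the per-port expansion and a port-aware version hash follows. The `declaredEdge` fields mirror the commit message, but `expandEdges`, `versionHash` and the `src|dst|port` line encoding are assumptions made for this example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// declaredEdge is one (src, dst, port) tuple; Port is "*" when the
// NetworkPolicy rule has no port restriction.
type declaredEdge struct {
	Src, Dst, Port string
}

// expandEdges emits one edge per distinct port, or a single "*" edge
// when the rule lists no ports.
func expandEdges(src, dst string, ports []string) []declaredEdge {
	if len(ports) == 0 {
		return []declaredEdge{{Src: src, Dst: dst, Port: "*"}}
	}
	seen := map[string]bool{}
	var out []declaredEdge
	for _, p := range ports {
		if seen[p] {
			continue
		}
		seen[p] = true
		out = append(out, declaredEdge{Src: src, Dst: dst, Port: p})
	}
	return out
}

// versionHash hashes the sorted edge lines, including the port, so a
// port-only change still produces a new version.
func versionHash(edges []declaredEdge) string {
	lines := make([]string, 0, len(edges))
	for _, e := range edges {
		lines = append(lines, e.Src+"|"+e.Dst+"|"+e.Port)
	}
	sort.Strings(lines)
	h := sha256.New()
	for _, l := range lines {
		h.Write([]byte(l))
		h.Write([]byte{'\n'})
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := expandEdges("default/web", "default/db", []string{"5432"})
	b := expandEdges("default/web", "default/db", nil)
	fmt.Println(a, b)
	fmt.Println(versionHash(a) != versionHash(b))
}
```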
Attach each pod's labels (dropping k8s-internal churn keys like
pod-template-hash / controller-revision-hash) to the identity record — written
as a nested map in the MMDB baseline and carried in the delta upserts — so the
agent can evaluate identity.k8s.{src,dst}_label["<key>"] rules.
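Filtering churn keys before attaching labels could look like this sketch. Only the two keys named in the commit message are listed; the real filter may cover more, and `stableLabels` is a hypothetical helper.

```go
package main

import "fmt"

// churnKeys are labels that change on every rollout and carry no
// identity meaning. Only the keys named in the commit message are listed.
var churnKeys = map[string]struct{}{
	"pod-template-hash":        {},
	"controller-revision-hash": {},
}

// stableLabels returns a copy of in without the churn keys.
func stableLabels(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		if _, drop := churnKeys[k]; drop {
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	labels := map[string]string{
		"app":               "web",
		"tier":              "frontend",
		"pod-template-hash": "7d4b9c6f5",
	}
	fmt.Println(stableLabels(labels))
}
```

Dropping these keys keeps a rollout from changing every identity record when nothing a rule could match on has changed.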
Mirror the identity delta producer for declared edges. The EdgeProducer becomes
informer-driven: on NetworkPolicy/Pod/Namespace change it emits an incremental
EdgeDelta (allow-list lines added/removed, tagged with epoch/seq for gap
detection) to download-api, alongside a periodic full allow-list baseline for
cold-start/resync. Factor the cluster-state reads into edgeInputs(), shared by
the baseline build (buildAndUpload) and the delta flush (flushDelta).

main.go wires the informer event handlers; tests cover the delta emission.
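The delta flush can be pictured as a set difference between the previous and current allow-list. In this sketch `edgeProducer`, `flush` and the opaque line strings are hypothetical; the real `flushDelta` and its line format are not shown in this PR page.

```go
package main

import (
	"fmt"
	"sort"
)

// edgeDelta is a hypothetical shape for the incremental payload.
type edgeDelta struct {
	Epoch   string
	Seq     uint64
	Added   []string
	Removed []string
}

// edgeProducer remembers the last emitted allow-list.
type edgeProducer struct {
	epoch string
	seq   uint64
	last  map[string]struct{}
}

// flush diffs the current allow-list lines against the last emitted set
// and returns a delta only when something changed.
func (p *edgeProducer) flush(current []string) (edgeDelta, bool) {
	next := make(map[string]struct{}, len(current))
	for _, l := range current {
		next[l] = struct{}{}
	}
	var added, removed []string
	for l := range next {
		if _, ok := p.last[l]; !ok {
			added = append(added, l)
		}
	}
	for l := range p.last {
		if _, ok := next[l]; !ok {
			removed = append(removed, l)
		}
	}
	p.last = next
	if len(added) == 0 && len(removed) == 0 {
		return edgeDelta{}, false
	}
	sort.Strings(added)
	sort.Strings(removed)
	p.seq++ // seq advances only when a delta is actually emitted
	return edgeDelta{Epoch: p.epoch, Seq: p.seq, Added: added, Removed: removed}, true
}

func main() {
	p := &edgeProducer{epoch: "e1"}
	p.flush([]string{"ns/a -> ns/b:5432"})
	d, ok := p.flush([]string{"ns/a -> ns/b:5432", "ns/a -> ns/c:*"})
	fmt.Println(ok, d.Seq, d.Added, d.Removed)
}
```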
pigri force-pushed the feat/service-graph-workload-ips branch from 3d5d83f to 30e64f0 on July 3, 2026 06:05
pigri changed the title from "feat: operator context layer for the east-west service graph" to "feat: east-west operator — service graph, per-pod identity + pod labels, declared-edge delta" on Jul 3, 2026
pigri commented Jul 3, 2026

Contributor Author

Superseded by #47. Per the architecture decision, #47 adopts the context-only / agent-built-graph direction and now grafts in this PR's per-pod identity (StatefulSet/StrimziPodSet), pod labels, and event-driven edge-delta producer. #50's operator-side GraphProducer path is intentionally dropped (the agent builds the graph). Closing in favor of #47.
