Extract variant branches into a generic extension mechanism#4871
Extract variant branches into a generic extension mechanism#4871caseydavenport wants to merge 29 commits into
Conversation
Add WithContext/ComponentHandlerOption to NewComponentHandler (variadic, backward-compatible) and call operator.ApplyPatches in CreateOrUpdateOrDelete for components implementing render.Named.
Pulls the enterprise RBAC extra-rules and MULTI_INTERFACE_MODE env branches out of pkg/render/typha.go into a new pkg/enterprise package. The enterprise package registers a patch via operator.Patch on startup; pkg/render/typha.go now has zero IsEnterprise branches.
Calls enterprise.Register() at startup so the typha modifier is wired in. Builds an operator.Context in the installation reconciler and passes it to the component handler so registered modifiers receive reconcile-derived state.
Extracts the image override registry into a leaf pkg/imageoverride package (no render/operator transitive deps) to avoid the render→operator import cycle. operator.OverrideImage/ResolveImage now delegate there. Registers the enterprise node image override in pkg/enterprise. Removes the IsEnterprise image switch from render/node.go; FIPS handling is preserved via a post-resolve check.
…sion The OSS installation controller no longer directly creates the node-prometheus keypair or fetches the prometheus/esgw certs. Those are now handled by a registered InstallationExtension in pkg/enterprise. Port value derivation and the kube-controller TLS block remain in the OSS controller unchanged.
Moves the calico-node-metrics Service out of OSS node render and into the enterprise node modifier, where it derives ports from ctx.FelixConfiguration. Also exports NodeBGPReporterPort so the modifier can reference it.
# Conflicts: # pkg/controller/installation/core_controller_test.go # pkg/controller/utils/component.go # pkg/render/node.go
The registry package is renamed to extensions. The installation controller builds the render context through a registered factory, and the componentHandler applies registered modifiers to component output. The node and typha variant branches now live in enterprise modifiers, and the calico log directory is mounted for both variants.
Drop the functional-options builder for Inputs (one call site, all fields always set) in favor of a plain struct literal, and replace the single-method RenderContextFactory interface with a registered builder func. All three extension seams now register a func.
Register modifiers, image overrides, and the render context builder per variant. The registries now gate on the installation variant, so the enterprise funcs drop their self-gate guards (the IsEnterprise checks the PR set out to remove) and the image override drops its decline bool - it only runs for its own variant. Move the node prometheus reporter keypair mounting (volume, mount, cert-management init container, pod hash annotation) into the node modifier, and remove NodeConfiguration.PrometheusServerTLS along with the round-trip through the installation controller. Core node render no longer carries a prometheus mount; in calico the keypair is never created. Rename Extensible.Name() to ModifierKey() so an unrelated Name() method can't make a component modifier-eligible by accident.
Merge the per-component image override and modifier into a single
Extension{Image, Modify} registered once per (variant, component) via
extensions.Register, so all of a component's variance lives in one place.
The image half still lands in the imageoverride leaf so render resolves it
without an import cycle; the fan-out is internal to Register.
Rename the variant-level render context builder to Setup
(RegisterSetup/RunSetup). That names the two phases a reader has to hold:
Setup is the controller-side work that builds the RenderContext baton, and
Extension hooks are the pure render-time funcs. Three registries with three
key schemes become two concepts split by when they run.
modCtx read like "modifier context"; the value is an extensions.RenderContext, so name it for what it is.
Add a package doc that lays out the two-phase model (Setup vs Extension) so the whole seam is legible from `go doc`. Fix two comments that still called the setup a render context builder.
Add a per-component context channel: a component implements render.ExtensionContextProvider to hand its modifier config a modifier can't derive from the shared RenderContext (config only the component's controller has). The componentHandler reads it into RenderContext.Component before applying the modifier. node's setup-produced keypair keeps its own field; this is for component-config-derived inputs. Move windows's enterprise branches into a pkg/enterprise extension: the two windows image overrides, the node-metrics Service, the calico log volume (swapped in for the OSS cni-log mount), the enterprise felix env, the trusted DNS servers for openshift/rke2, and the prometheus reporter keypair mount. The windows component exposes its reporter port, keypair, and trusted bundle via ExtensionContext; the windows controller wires the render context into its handler. Core windows render is now OSS-only.
Move guardian's enterprise branches out of core render into a pkg/enterprise extension: the secrets Role/RoleBinding and default UI settings, the elasticsearch/kibana service ports, the management-cluster-request cluster role rules (which wholly replace the OSS rules, plus impersonation and the OpenShift SCC), and the CA bundle env vars. rulesForManagementClusterRequests moves to enterprise; the UI-settings helpers are exported from the manager render so the modifier can build them. The guardian component exposes its OpenShift flag, impersonation config, and trusted bundle path via ExtensionContext; the clusterconnection controller wires the render context into its handler. Core guardian render is now OSS-only except the network policy, which is the next commit.
Move the enterprise guardian network policy out of core render. GuardianPolicy is now an extensible component that renders the OSS policy; the enterprise modifier replaces it with the management-cluster policy, built entirely from the component's ExtensionContext (URL, pod proxies, OpenShift, egress-policy flag). Building the egress rules can fail on proxy URL parsing - the modifier drops the policy on failure, matching the old behavior of omitting it rather than installing a partial one. guardian.go now carries no IsEnterprise branches.
The apiserver was the last render component with IsEnterprise branches. Its enterprise behavior (query server container, audit logging, the tigera-* RBAC, UI settings roles, managed cluster access) moves into pkg/enterprise as the apiserver modifier, leaving render with only the shared objects. The deployment-existence check no longer keys off the variant. The controller sets RequiresQueryServer (true for Enterprise) and render branches on RequiresAggregationServer || RequiresQueryServer, so the variant decision lives in the controller rather than render. The apiserver also needs to delete enterprise objects when switching to Calico, and to drop the management-cluster RBAC when not a management cluster, so Modifier now takes and returns both the create and delete lists. The other modifiers pass the delete list through unchanged.
…sions Route the new cni-plugins init container image through the override registry instead of an IsEnterprise branch, so node stays de-varianted. Carry master's Network RBAC additions into the relocated enterprise apiserver cluster roles.
| // Wire the in-repo Calico Enterprise extensions (the render context factory, | ||
| // modifiers, and image overrides) into the operator registries. After the | ||
| // monorepo split this call moves to calico-private's main. | ||
| enterprise.Register() |
There was a problem hiding this comment.
I think I'd prefer a pattern where we pass in the registration funcs as an argument / using an options pattern to the constructor for the operator.
o := operator.New(operator.WithExtensions(enterprise.New()))^ Pseudocode, but something like that
| // The render context carries the installation so registered modifiers gate on | ||
| // variant; the guardian component supplies its own per-component context (the | ||
| // impersonation config, OpenShift, and CA bundle path) via ExtensionContext. |
There was a problem hiding this comment.
| // The render context carries the installation so registered modifiers gate on | |
| // variant; the guardian component supplies its own per-component context (the | |
| // impersonation config, OpenShift, and CA bundle path) via ExtensionContext. |
Overcommenting.
| // The render context carries the installation so registered modifiers gate on | ||
| // variant; the guardian component supplies its own per-component context (the | ||
| // impersonation config, OpenShift, and CA bundle path) via ExtensionContext. | ||
| ch := utils.NewComponentHandler(log, r.cli, r.scheme, managementClusterConnection, utils.WithRenderContext(extensions.RenderContext{Installation: installationSpec})) |
There was a problem hiding this comment.
Split this out so each arg is on its own line - it's a bit too long as is.
| logf.SetLogger(zap.New(zap.WriteTo(ginkgo.GinkgoWriter))) | ||
| gomega.RegisterFailHandler(ginkgo.Fail) | ||
|
|
||
| // Wire the enterprise extensions so the guardian modifier runs the way it |
There was a problem hiding this comment.
I think as part of this, we should refactor the tests so that we have separate *_enterprise_test.go files.
There was a problem hiding this comment.
And perhaps specific *_extensions_test.go files that exercise generic extension mechanism machinery.
| // This is separate from the calico/node prometheus metrics port, which is user configurable. | ||
| // defaultNodeReporterPort is the default port calico/node uses to report Calico | ||
| // Enterprise internal metrics. The Linux node path derives this in the | ||
| // enterprise node modifier; this copy serves the Windows controller, which |
There was a problem hiding this comment.
We shouldn't have any enterprise references in the non-enterprise code. Including here, and the implied windows controller.
| // that flow straight into the RenderContext and the side-effecting dependencies | ||
| // (Client, CertificateManager) a setup needs to produce controller-side | ||
| // artifacts. | ||
| type Inputs struct { |
There was a problem hiding this comment.
I'm not entirely sure why we need a seaprate Inputs and RenderContext struct - they carry most of the same fields?
| // Register stores fn under key for the given variant. The key is the render | ||
| // component's image identifier (e.g. "node"). | ||
| func Register(variant operatorv1.ProductVariant, key string, fn Override) { | ||
| registry[overrideKey{variant, key}] = fn |
There was a problem hiding this comment.
Same here - instead of a global map, we can pass an image "registry" as input to the constructor.
|
|
||
| // Component names used as keys into the extension modifier registry. Keep these | ||
| // in sync with the ModifierKey() methods that return them. | ||
| const ( |
There was a problem hiding this comment.
Feel like this string key pattern is a bit inelegant. But perhaps nothing else we can do here that's better.
| g, err := render.GuardianPolicy(cfg) | ||
| Expect(err).NotTo(HaveOccurred()) | ||
| resources, _ = g.Objects() | ||
| objs, _ := g.Objects() |
There was a problem hiding this comment.
Yeah, enterprise tests in their own file would be nice. Probably even belongs in their own package, tbh.
| // - 1 Service to expose calico/node metrics. | ||
| // - 1 Service to expose Windows calico/node metrics. | ||
| // For this scenario, we expect the basic resources plus the following for Tigera Secure. | ||
| // The calico/node and Windows calico/node metrics Services are added by the |
There was a problem hiding this comment.
No need for references to enterprise in OSS code.
The goal here is to make a clean split between the enterprise and OSS code paths so they are fully decoupled.
Trim over-commenting on the component handler calls and expand the ones threading a render context to one arg per line. Rename RunSetup to BuildContext, drop the nodePrometheusTLS temp var, and log a BUG on a failed per-component context assertion instead of failing silently.
The variant modifiers, image overrides, and setups lived in package-level registries that enterprise populated via an init-style Register() at startup. Replace those with an extensions.Set value that main builds (enterprise.New()) and hands to controllers through ControllerOptions. Each controller threads it into its component handler and the node/windows render configs, so nothing is wired by import side effect and the core operator can run with no Set. A handler that renders an extensible component without being given the Set logs a BUG, so a missing wiring is loud rather than a silent no-op. Rename imageoverride.Registry to imageoverride.Overrides so it doesn't read as the container image registry.
The installation, windows, and clusterconnection reconcilers copied a handful of ControllerOptions values into their own struct fields. Drop those and read them off r.opts so the options live in one place. Also removes the dead kubernetesVersion field on the installation reconciler.
This is phase 1 of prepping the operator for the monorepo merge. The goal is to get the variant branches (the
IsEnterprise()checks) out of the core operator render code and into a generic extension mechanism, so enterprise can self-register its modifications instead of core code having to know about them.Description
Adds the extension primitives in
pkg/extensions:pkg/imageoverrideleaf package to avoid an import cycle;pkg/extensionsdelegates to it.RenderContexthanded to modifiers through a registered factory, which also does the per-controller side-effecting work (cert creation, bundle assembly, validation). The core operator default does nothing variant-specific.Enterprise wiring lives in
pkg/enterprise, which self-registers all of this throughenterprise.Register().De-varianting progress
The metric is zero
IsEnterprise()branches left in the core render/controller code for each component - they all move intopkg/enterprise:typha and node are done. Image selection routes through the override registry, and all of node's variant branches (RBAC for node and the CNI plugin, volumes, CNI env, Felix env, readiness probe, the metrics service, and the prometheus cert/bundle setup) now live in enterprise modifiers. The calico log directory is mounted for both variants so the modifier doesn't have to special-case it.
Still deferred
Behavior is preserved. The test gate is the union of the existing core tests plus the relocated enterprise assertions, all passing.