Skip to content

Extract variant branches into a generic extension mechanism#4871

Open
caseydavenport wants to merge 29 commits into
tigera:masterfrom
caseydavenport:casey-variant-extensions
Open

Extract variant branches into a generic extension mechanism#4871
caseydavenport wants to merge 29 commits into
tigera:masterfrom
caseydavenport:casey-variant-extensions

Conversation

@caseydavenport

@caseydavenport caseydavenport commented May 29, 2026

Copy link
Copy Markdown
Member

This is phase 1 of prepping the operator for the monorepo merge. The goal is to get the variant branches (the IsEnterprise() checks) out of the core operator render code and into a generic extension mechanism, so enterprise can self-register its modifications instead of core code having to know about them.

Description

Adds the extension primitives in pkg/extensions:

  • A modifier registry - enterprise registers a function keyed by component name, applied at the componentHandler chokepoint after a component's objects are built.
  • An image-override registry - lets enterprise swap the image a component resolves without core branching. It lives in a small pkg/imageoverride leaf package to avoid an import cycle; pkg/extensions delegates to it.
  • A render context factory - the installation controller builds the RenderContext handed to modifiers through a registered factory, which also does the per-controller side-effecting work (cert creation, bundle assembly, validation). The core operator default does nothing variant-specific.

Enterprise wiring lives in pkg/enterprise, which self-registers all of this through enterprise.Register().

De-varianting progress

The metric is zero IsEnterprise() branches left in the core render/controller code for each component - they all move into pkg/enterprise:

  • typha
  • node
  • guardian
  • windows
  • apiserver

typha and node are done. Image selection routes through the override registry, and all of node's variant branches (RBAC for node and the CNI plugin, volumes, CNI env, Felix env, readiness probe, the metrics service, and the prometheus cert/bundle setup) now live in enterprise modifiers. The calico log directory is mounted for both variants so the modifier doesn't have to special-case it.

Still deferred

  • The kube-controllers TLS setup and the remaining value-selection branches in the installation controller.

Behavior is preserved. The test gate is the union of the existing core tests plus the relocated enterprise assertions, all passing.

None

Add WithContext/ComponentHandlerOption to NewComponentHandler (variadic,
backward-compatible) and call operator.ApplyPatches in
CreateOrUpdateOrDelete for components implementing render.Named.
Pulls the enterprise RBAC extra-rules and MULTI_INTERFACE_MODE env branches out of pkg/render/typha.go into a new pkg/enterprise package. The enterprise package registers a patch via operator.Patch on startup; pkg/render/typha.go now has zero IsEnterprise branches.
Calls enterprise.Register() at startup so the typha modifier is wired in.
Builds an operator.Context in the installation reconciler and passes it to
the component handler so registered modifiers receive reconcile-derived state.
Extracts the image override registry into a leaf pkg/imageoverride
package (no render/operator transitive deps) to avoid the render→operator
import cycle. operator.OverrideImage/ResolveImage now delegate there.
Registers the enterprise node image override in pkg/enterprise. Removes
the IsEnterprise image switch from render/node.go; FIPS handling is
preserved via a post-resolve check.
…sion

The OSS installation controller no longer directly creates the node-prometheus
keypair or fetches the prometheus/esgw certs. Those are now handled by a
registered InstallationExtension in pkg/enterprise. Port value derivation and
the kube-controller TLS block remain in the OSS controller unchanged.
Moves the calico-node-metrics Service out of OSS node render and into
the enterprise node modifier, where it derives ports from
ctx.FelixConfiguration. Also exports NodeBGPReporterPort so the modifier
can reference it.
# Conflicts:
#	pkg/controller/installation/core_controller_test.go
#	pkg/controller/utils/component.go
#	pkg/render/node.go
Comment thread pkg/controller/utils/component.go Outdated
Comment thread pkg/enterprise/installation.go Outdated
Comment thread pkg/operator/context.go Outdated
Comment thread pkg/operator/extension.go Outdated
Comment thread pkg/operator/extension.go Outdated
Comment thread pkg/enterprise/installation.go Outdated
Comment thread pkg/operator/patch.go Outdated
Comment thread pkg/enterprise/node.go
Comment thread pkg/render/enterprise_setup_test.go Outdated
Comment thread cmd/main.go Outdated
The registry package is renamed to extensions. The installation controller builds the render context through a registered factory, and the componentHandler applies registered modifiers to component output. The node and typha variant branches now live in enterprise modifiers, and the calico log directory is mounted for both variants.
Drop the functional-options builder for Inputs (one call site, all
fields always set) in favor of a plain struct literal, and replace the
single-method RenderContextFactory interface with a registered builder
func. All three extension seams now register a func.
Register modifiers, image overrides, and the render context builder per
variant. The registries now gate on the installation variant, so the
enterprise funcs drop their self-gate guards (the IsEnterprise checks the
PR set out to remove) and the image override drops its decline bool - it
only runs for its own variant.

Move the node prometheus reporter keypair mounting (volume, mount,
cert-management init container, pod hash annotation) into the node
modifier, and remove NodeConfiguration.PrometheusServerTLS along with the
round-trip through the installation controller. Core node render no longer
carries a prometheus mount; in calico the keypair is never created.

Rename Extensible.Name() to ModifierKey() so an unrelated Name() method
can't make a component modifier-eligible by accident.
Merge the per-component image override and modifier into a single
Extension{Image, Modify} registered once per (variant, component) via
extensions.Register, so all of a component's variance lives in one place.
The image half still lands in the imageoverride leaf so render resolves it
without an import cycle; the fan-out is internal to Register.

Rename the variant-level render context builder to Setup
(RegisterSetup/RunSetup). That names the two phases a reader has to hold:
Setup is the controller-side work that builds the RenderContext baton, and
Extension hooks are the pure render-time funcs. Three registries with three
key schemes become two concepts split by when they run.
modCtx read like "modifier context"; the value is an extensions.RenderContext,
so name it for what it is.
Add a package doc that lays out the two-phase model (Setup vs Extension)
so the whole seam is legible from `go doc`. Fix two comments that still
called the setup a render context builder.
Add a per-component context channel: a component implements
render.ExtensionContextProvider to hand its modifier config a modifier
can't derive from the shared RenderContext (config only the component's
controller has). The componentHandler reads it into RenderContext.Component
before applying the modifier. node's setup-produced keypair keeps its own
field; this is for component-config-derived inputs.

Move windows's enterprise branches into a pkg/enterprise extension: the two
windows image overrides, the node-metrics Service, the calico log volume
(swapped in for the OSS cni-log mount), the enterprise felix env, the
trusted DNS servers for openshift/rke2, and the prometheus reporter keypair
mount. The windows component exposes its reporter port, keypair, and trusted
bundle via ExtensionContext; the windows controller wires the render context
into its handler. Core windows render is now OSS-only.
Move guardian's enterprise branches out of core render into a pkg/enterprise
extension: the secrets Role/RoleBinding and default UI settings, the
elasticsearch/kibana service ports, the management-cluster-request cluster
role rules (which wholly replace the OSS rules, plus impersonation and the
OpenShift SCC), and the CA bundle env vars. rulesForManagementClusterRequests
moves to enterprise; the UI-settings helpers are exported from the manager
render so the modifier can build them.

The guardian component exposes its OpenShift flag, impersonation config, and
trusted bundle path via ExtensionContext; the clusterconnection controller
wires the render context into its handler. Core guardian render is now
OSS-only except the network policy, which is the next commit.
Move the enterprise guardian network policy out of core render. GuardianPolicy
is now an extensible component that renders the OSS policy; the enterprise
modifier replaces it with the management-cluster policy, built entirely from the
component's ExtensionContext (URL, pod proxies, OpenShift, egress-policy flag).
Building the egress rules can fail on proxy URL parsing - the modifier drops the
policy on failure, matching the old behavior of omitting it rather than
installing a partial one. guardian.go now carries no IsEnterprise branches.
The apiserver was the last render component with IsEnterprise branches. Its
enterprise behavior (query server container, audit logging, the tigera-* RBAC,
UI settings roles, managed cluster access) moves into pkg/enterprise as the
apiserver modifier, leaving render with only the shared objects.

The deployment-existence check no longer keys off the variant. The controller
sets RequiresQueryServer (true for Enterprise) and render branches on
RequiresAggregationServer || RequiresQueryServer, so the variant decision lives
in the controller rather than render.

The apiserver also needs to delete enterprise objects when switching to Calico,
and to drop the management-cluster RBAC when not a management cluster, so
Modifier now takes and returns both the create and delete lists. The other
modifiers pass the delete list through unchanged.
@radTuti radTuti modified the milestones: v1.43.0, v1.44.0 Jun 12, 2026
@caseydavenport caseydavenport marked this pull request as ready for review June 17, 2026 14:35
@caseydavenport caseydavenport requested a review from a team as a code owner June 17, 2026 14:35
…sions

Route the new cni-plugins init container image through the override
registry instead of an IsEnterprise branch, so node stays de-varianted.
Carry master's Network RBAC additions into the relocated enterprise
apiserver cluster roles.
Comment thread cmd/main.go Outdated
// Wire the in-repo Calico Enterprise extensions (the render context factory,
// modifiers, and image overrides) into the operator registries. After the
// monorepo split this call moves to calico-private's main.
enterprise.Register()

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer a pattern where we pass in the registration funcs as an argument / using an options pattern to the constructor for the operator.

o := operator.New(operator.WithExtensions(enterprise.New()))

^ Pseudocode, but something like that

Comment on lines +447 to +449
// The render context carries the installation so registered modifiers gate on
// variant; the guardian component supplies its own per-component context (the
// impersonation config, OpenShift, and CA bundle path) via ExtensionContext.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// The render context carries the installation so registered modifiers gate on
// variant; the guardian component supplies its own per-component context (the
// impersonation config, OpenShift, and CA bundle path) via ExtensionContext.

Overcommenting.

// The render context carries the installation so registered modifiers gate on
// variant; the guardian component supplies its own per-component context (the
// impersonation config, OpenShift, and CA bundle path) via ExtensionContext.
ch := utils.NewComponentHandler(log, r.cli, r.scheme, managementClusterConnection, utils.WithRenderContext(extensions.RenderContext{Installation: installationSpec}))

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split this out so each arg is on its own line - it's a bit too long as is.

logf.SetLogger(zap.New(zap.WriteTo(ginkgo.GinkgoWriter)))
gomega.RegisterFailHandler(ginkgo.Fail)

// Wire the enterprise extensions so the guardian modifier runs the way it

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as part of this, we should refactor the tests so that we have separate *_enterprise_test.go files.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And perhaps specific *_extensions_test.go files that exercise generic extension mechanism machinery.

// This is separate from the calico/node prometheus metrics port, which is user configurable.
// defaultNodeReporterPort is the default port calico/node uses to report Calico
// Enterprise internal metrics. The Linux node path derives this in the
// enterprise node modifier; this copy serves the Windows controller, which

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't have any enterprise references in the non-enterprise code. Including here, and the implied windows controller.

Comment thread pkg/extensions/setup.go
// that flow straight into the RenderContext and the side-effecting dependencies
// (Client, CertificateManager) a setup needs to produce controller-side
// artifacts.
type Inputs struct {

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure why we need a seaprate Inputs and RenderContext struct - they carry most of the same fields?

Comment thread pkg/imageoverride/imageoverride.go Outdated
// Register stores fn under key for the given variant. The key is the render
// component's image identifier (e.g. "node").
func Register(variant operatorv1.ProductVariant, key string, fn Override) {
registry[overrideKey{variant, key}] = fn

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here - instead of a global map, we can pass an image "registry" as input to the constructor.

Comment thread pkg/render/component.go

// Component names used as keys into the extension modifier registry. Keep these
// in sync with the ModifierKey() methods that return them.
const (

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel like this string key pattern is a bit inelegant. But perhaps nothing else we can do here that's better.

g, err := render.GuardianPolicy(cfg)
Expect(err).NotTo(HaveOccurred())
resources, _ = g.Objects()
objs, _ := g.Objects()

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, enterprise tests in their own file would be nice. Probably even belongs in their own package, tbh.

Comment thread pkg/render/render_test.go
// - 1 Service to expose calico/node metrics.
// - 1 Service to expose Windows calico/node metrics.
// For this scenario, we expect the basic resources plus the following for Tigera Secure.
// The calico/node and Windows calico/node metrics Services are added by the

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for references to enterprise in OSS code.

The goal here is to make a clean split between the enterprise and OSS code paths so they are fully decoupled.

Trim over-commenting on the component handler calls and expand the ones
threading a render context to one arg per line. Rename RunSetup to
BuildContext, drop the nodePrometheusTLS temp var, and log a BUG on a
failed per-component context assertion instead of failing silently.
The variant modifiers, image overrides, and setups lived in package-level
registries that enterprise populated via an init-style Register() at startup.
Replace those with an extensions.Set value that main builds (enterprise.New())
and hands to controllers through ControllerOptions. Each controller threads it
into its component handler and the node/windows render configs, so nothing is
wired by import side effect and the core operator can run with no Set.

A handler that renders an extensible component without being given the Set logs
a BUG, so a missing wiring is loud rather than a silent no-op.

Rename imageoverride.Registry to imageoverride.Overrides so it doesn't read as
the container image registry.
The installation, windows, and clusterconnection reconcilers copied a handful
of ControllerOptions values into their own struct fields. Drop those and read
them off r.opts so the options live in one place. Also removes the dead
kubernetesVersion field on the installation reconciler.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants