feat(applicationlayer): WAF v3 (Coraza WASM) render — kube-controllers plumbing + admission webhook#4821
Nit: It would be possible to cut down on the code comments, especially where a comment doesn't add any information you couldn't get directly from reading the code.
variant: enterpriseVariant,
}
{{- end }}
{{ with index .Components "coraza-wasm" }}
Can you remind me whether we could put this in an existing image and then change the run command when it is used? What was the motivation for an extra image again?
We needed this extra image because it is an OCI image that the Envoy Gateway controller consumes and distributes to the gateway proxies. It is a scratch OCI image with just one file, a .wasm file that contains the Coraza engine and rules. It needs to be small so that it gets distributed quickly. I think we set our benchmark at roughly 30 MB, and I believe we're well under that.
Adding extra images adds maintenance for both us and the customer. We could bake the WASM into an existing image such as envoyproxy and eliminate the extra pull altogether, which would result in better performance. File loading is also supported, so we could even point to a file path rather than an image path.
…rollers component
Review feedback (rene-dekker, tigera#4821):
- Move the webhook Service + ValidatingWebhookConfiguration out of the core controller passthrough into the kube-controllers component, so the objects are emitted as objectsToDelete when the WAF extension is disabled or the GatewayAPI CR is removed. Add deletion test coverage.
- Export WAFWebhookContainerPort from pkg/render/applicationlayer and use it for the container port and NetworkPolicy ingress rule instead of duplicated 9443 constants.
- Use gatewayapi.GetGatewayAPI for the WAF gate so the legacy tigera-secure CR name is handled (and a default/tigera-secure duplicate degrades).
- Drop the nil-guard around the WAF webhook KeyPairOption (the certificate-management render skips nil key pairs).
- Log when multiple imagePullSecrets are configured and only the first is used for the WAF wasm OCI pull.
Review feedback (rene-dekker, tigera#4821): nothing consumes the constant, and the calico-private side has since moved from the ingress-gateway-addons feature gate to a binary license-validity check (kube-controllers applicationlayer LicenseGate), so the feature string has no remaining consumer. Also take the iff->if comment suggestion.
…AF wasm pull secret Review feedback (rene-dekker, tigera#4821): rather than copying only the first Installation pull secret into tigera-waf-pull-secret, merge the registry auths of every Installation pull secret into it. The EnvoyExtensionPolicy image source takes a single pullSecretRef, so a merged secret is the only way to honor multiple pull secrets for the Coraza wasm OCI pull (e.g. the Tigera pull secret plus credentials for a private registry mirror). First secret in Installation order wins on duplicate registry entries; the merged map marshals with sorted keys so the rendered bytes are deterministic across reconciles. Unparseable secrets are skipped and logged rather than failing the reconcile. Legacy dockercfg-type secrets are supported.
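The merge described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the operator's code: `mergeRegistryAuths` and `dockerConfig` are hypothetical names, the sketch handles only the `.dockerconfigjson` layout (the legacy dockercfg layout mentioned above is left out), and it reports skipped payloads by index instead of logging them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dockerConfig mirrors the layout of a .dockerconfigjson payload:
// {"auths": {"<registry>": {...}}}. Hypothetical type for this sketch.
type dockerConfig struct {
	Auths map[string]json.RawMessage `json:"auths"`
}

// mergeRegistryAuths merges the "auths" maps of several payloads.
// The first payload to mention a registry wins. Payloads that fail to
// parse are skipped, and their indexes are returned so a caller can log them.
func mergeRegistryAuths(payloads [][]byte) (merged []byte, skipped []int, err error) {
	out := dockerConfig{Auths: map[string]json.RawMessage{}}
	for i, p := range payloads {
		var cfg dockerConfig
		if uerr := json.Unmarshal(p, &cfg); uerr != nil {
			skipped = append(skipped, i)
			continue
		}
		for registry, auth := range cfg.Auths {
			if _, exists := out.Auths[registry]; !exists {
				out.Auths[registry] = auth
			}
		}
	}
	// encoding/json writes map keys in sorted order, so the bytes are stable
	// across calls regardless of Go's randomized map iteration.
	merged, err = json.Marshal(out)
	return merged, skipped, err
}

func main() {
	payloads := [][]byte{
		[]byte(`{"auths":{"quay.io":{"auth":"Zmlyc3Q="}}}`),
		[]byte(`not json`),
		[]byte(`{"auths":{"quay.io":{"auth":"c2Vjb25k"},"mirror.example.com":{"auth":"bWlycm9y"}}}`),
	}
	merged, skipped, err := mergeRegistryAuths(payloads)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(merged))
	fmt.Println("skipped:", skipped)
}
```

In this example the second payload is skipped, and the `quay.io` entry from the first payload is kept over the one in the third.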
Add spec.extensions.waf.state (+ IsWAFGatewayExtensionEnabled helper) to the GatewayAPI CR to gate the WAF v3 (Gateway API add-on) surface, default-off. Regenerate deepcopy + CRD manifest. Refs EV-6657
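A default-off gate like this one might look roughly like the sketch below. Only the field path `spec.extensions.waf.state`, the values Enabled/Disabled, and the helper name `IsWAFGatewayExtensionEnabled` come from the commit message; the Go type names, the pointer-based optional fields, and the helper's signature are assumptions made for illustration.

```go
package main

import "fmt"

// WAFState and the struct types below are hypothetical stand-ins for the
// GatewayAPI CR types; the real API types may be shaped differently.
type WAFState string

const (
	WAFEnabled  WAFState = "Enabled"
	WAFDisabled WAFState = "Disabled"
)

type WAFExtension struct {
	State *WAFState
}

type Extensions struct {
	WAF *WAFExtension
}

type GatewayAPISpec struct {
	Extensions *Extensions
}

type GatewayAPI struct {
	Spec GatewayAPISpec
}

// IsWAFGatewayExtensionEnabled reports whether the WAF extension is switched
// on. Any missing level (nil CR, no extensions, no waf block, no state) is
// treated as disabled, which gives the default-off behavior.
func IsWAFGatewayExtensionEnabled(g *GatewayAPI) bool {
	if g == nil || g.Spec.Extensions == nil || g.Spec.Extensions.WAF == nil {
		return false
	}
	state := g.Spec.Extensions.WAF.State
	return state != nil && *state == WAFEnabled
}

func main() {
	enabled := WAFEnabled
	on := &GatewayAPI{Spec: GatewayAPISpec{
		Extensions: &Extensions{WAF: &WAFExtension{State: &enabled}},
	}}
	fmt.Println(IsWAFGatewayExtensionEnabled(on))            // true
	fmt.Println(IsWAFGatewayExtensionEnabled(&GatewayAPI{})) // false
	fmt.Println(IsWAFGatewayExtensionEnabled(nil))           // false
}
```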
Render the WAF v3 (Coraza WASM) surface on calico-kube-controllers, gated on the GatewayAPI WAF extension:
- WASM_IMAGE/WASM_PULL_SECRET/WASM_CA_CERT env, ENABLED_CONTROLLERS, reconciler RBAC (wafpolicies/plugins, EnvoyExtensionPolicy, events, secret replication), coraza-wasm component (config/enterprise_versions.yml + gen-versions template + generated enterprise.go) + GatewayAddonsFeature constant.
- In-process WAF SecLang validating admission webhook: a Service fronting the kube-controllers Pod + ValidatingWebhookConfiguration (wafplugins/wafpolicies, /validate-waf, FailurePolicy=Fail, caBundle=operator CA); the serving-cert mount + WAF_WEBHOOK_CERT_DIR env + container port 9443; and namespaces patch/update RBAC for the waf-id-range annotation.
Refs EV-6657
…n controller Gate on GatewayAPI.spec.extensions.waf.state, issue the webhook serving cert for the tigera-waf-webhook Service DNS (materialized into calico-system via the existing CertificateManagement render), thread it into the kube-controllers config, and render the webhook Service + ValidatingWebhookConfiguration. Refs EV-6657
…k :9443 (EV-6386) The WAF admission webhook serves on :9443 in kube-controllers, but the calico-system kube-controllers NetworkPolicy only allowed ingress on :9094, so default-deny dropped the apiserver -> webhook request and the ValidatingWebhook timed out (failurePolicy=Fail) -- blocking all WAFPolicy/WAFPlugin writes. Add a :9443 ingress rule, gated on the GatewayAPI WAF extension being enabled.
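The shape of this fix can be sketched as a port list that only grows when the WAF gate is on. The function and constant names are hypothetical, and the real render builds NetworkPolicy ingress rules rather than a plain slice of ports.

```go
package main

import "fmt"

const (
	existingIngressPort = 9094 // the port the policy already allowed
	wafWebhookPort      = 9443 // the port the WAF admission webhook serves on
)

// allowedIngressPorts returns the TCP ports the kube-controllers policy
// admits. The webhook port is added only when the WAF extension is enabled,
// so WAF-disabled installs keep the original rule set.
func allowedIngressPorts(wafEnabled bool) []int {
	ports := []int{existingIngressPort}
	if wafEnabled {
		ports = append(ports, wafWebhookPort)
	}
	return ports
}

func main() {
	fmt.Println(allowedIngressPorts(false)) // [9094]
	fmt.Println(allowedIngressPorts(true))  // [9094 9443]
}
```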
…ull (EV-6386) The WAF reconciler replicated WASM_PULL_SECRET (the install pull secret, tigera-pull-secret) into tenant namespaces, but the GatewayAPI render also copies tigera-pull-secret there (operator-managed) so the replica conflicts (ReplicaUnmanaged) and WAFPolicies are blocked. Provision + replicate a dedicated tigera-waf-pull-secret (renamed copy of the install pull secret) instead, avoiding the clash.
… (EV-6386) Symmetric to the tigera-waf-pull-secret fix: WASM_CA_CERT pointed at the operator-managed tigera-ca-bundle, which the GatewayAPI render also copies into tenant namespaces, so the WAF reconciler's replica clashed (ReplicaUnmanaged). Provision + replicate a dedicated tigera-waf-ca-bundle copy instead.
…nfigMap (EV-6386) The WAF reconciler replicates the WASM_CA_CERT ConfigMap (tigera-waf-ca-bundle) into tenant namespaces for the Coraza wasm registry TLS check, but the source copy was never created (left as a TODO), so reconcile failed with 'source configmap calico-system/tigera-waf-ca-bundle not found'. Provision it in the core controller as a renamed copy of the trusted CA bundle -- the full TrustedBundle is available there, unlike the read-only interface the kube-controllers render sees. Gate WASM_CA_CERT on the provisioned ConfigMap.
…proxy image (EV-6386) The Coraza WAF wasm is baked into the gateway envoy-proxy image (its final layer), so there is no separate coraza-wasm image to ship. Resolve WASM_IMAGE from ComponentGatewayAPIEnvoyProxy -- the same image the gateway data plane already runs -- and drop the standalone ComponentCorazaWASM component and its enterprise_versions.yml pin. Addresses review feedback on op#4821.
…(EV-6386) Completes the standalone coraza-wasm removal: the gen-versions template still defined ComponentCorazaWASM, so gen-versions regenerated it into enterprise.go and validate-gen-versions/dirty-check failed. The wasm now ships baked into the envoy-proxy image, resolved via ComponentGatewayAPIEnvoyProxy.
Summary
Operator-side render for WAF v3 (Coraza WASM) on calico-kube-controllers. Pairs with the merged reconcilers (tigera/calico-private#11834) and the in-process SecLang validating admission webhook (tigera/calico-private#12141, EV-6657). Design: tigera/designs#25 (PMREQ-384).
Review scope: 3 commits, ~470 lines:
- feat(api): add GatewayAPI WAF extension gating field. GatewayAPI.spec.extensions.waf.state (Enabled/Disabled, default off) + deepcopy + CRD.
- feat(applicationlayer): render WAF v3 + in-process admission webhook. gateway_waf.go (webhook Service + VWC), kube-controllers env/RBAC/cert/port, coraza-wasm component.
- feat(applicationlayer): wire WAF v3 render + webhook into installation controller.
Everything is gated on GatewayAPI.spec.extensions.waf.state == Enabled: non-WAF / OSS / WAF-disabled installs render a byte-identical kube-controllers Deployment.
WAF SecLang admission webhook: in-process (calico-kube-controllers)
The webhook runs in-process inside the existing calico-kube-controllers Pod (the applicationlayer manager's webhook server), not as a standalone Deployment. pkg/render/applicationlayer/gateway_waf.go renders only:
- a Service (tigera-waf-webhook) fronting the kube-controllers Pod (:443 → :9443), and
- a ValidatingWebhookConfiguration intercepting CREATE/UPDATE on wafplugins + wafpolicies at /validate-waf, with failurePolicy: Fail (the reconciler backstop is status-only, so the webhook is the hard admission gate) and caBundle = the operator CA.
No dedicated ServiceAccount/ClusterRole/ClusterRoleBinding/Deployment: the webhook reuses the kube-controllers ServiceAccount + ClusterRole.
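The webhook contract described above can be written down as data, which is roughly what the render tests check. The sketch below uses simplified stand-in structs so that it stays self-contained; the actual render presumably builds Kubernetes Service and ValidatingWebhookConfiguration objects. All type and function names here are hypothetical; the values come from the description.

```go
package main

import "fmt"

// Simplified stand-ins for the Service and ValidatingWebhookConfiguration
// fields that the description names. Not the Kubernetes API types.
type servicePort struct {
	Port       int32 // port exposed by the Service
	TargetPort int32 // container port on the kube-controllers Pod
}

type webhookRule struct {
	Operations []string
	Resources  []string
}

type webhookContract struct {
	ServiceName      string
	ServiceNamespace string
	ServicePort      servicePort
	Path             string
	FailurePolicy    string
	Rules            []webhookRule
	CABundle         []byte
}

// wafWebhookContract returns the contract described in the PR summary.
func wafWebhookContract(caBundle []byte) webhookContract {
	return webhookContract{
		ServiceName:      "tigera-waf-webhook",
		ServiceNamespace: "calico-system",
		ServicePort:      servicePort{Port: 443, TargetPort: 9443},
		Path:             "/validate-waf",
		FailurePolicy:    "Fail",
		Rules: []webhookRule{{
			Operations: []string{"CREATE", "UPDATE"},
			Resources:  []string{"wafplugins", "wafpolicies"},
		}},
		CABundle: caBundle,
	}
}

// servingCertDNSName is the in-cluster name the API server dials, so the
// serving certificate has to be issued for it.
func (c webhookContract) servingCertDNSName() string {
	return c.ServiceName + "." + c.ServiceNamespace + ".svc"
}

func main() {
	c := wafWebhookContract([]byte("operator-ca-pem"))
	fmt.Println(c.servingCertDNSName()) // tigera-waf-webhook.calico-system.svc
	fmt.Println(c.ServicePort.Port, "->", c.ServicePort.TargetPort)
	fmt.Println(c.Path, c.FailurePolicy, c.Rules[0].Resources)
}
```

Because failurePolicy is Fail, a mismatch between the Service name and the name on the serving cert would block every WAFPolicy/WAFPlugin write, so deriving both from one place is a reasonable design choice.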
kube-controllers plumbing (gated on GatewayAPI.spec.extensions.waf.state == Enabled)
- WASM_IMAGE / WASM_PULL_SECRET / WASM_CA_CERT env + ENABLED_CONTROLLERS=applicationlayer (names verified against the merged reconcilers' manager.go).
- RBAC: applicationlayer.projectcalico.org resources + /status + /finalizers, EnvoyExtensionPolicy CRUD, events (core + events.k8s.io), secret/ConfigMap replication, Gateway/HTTPRoute reads for targetRef validation, namespaces patch/update for the upcoming per-namespace waf-id-range annotation.
- Serving cert (WAFWebhookServerTLS, issued for tigera-waf-webhook.calico-system.svc) mounted into the Pod; WAF_WEBHOOK_CERT_DIR env; container port 9443.
Controller wiring (installation core controller)
Issues the serving cert via CertificateManager, materializes it into calico-system through the existing CertificateManagement render, threads it into the kube-controllers config, and renders the webhook Service + ValidatingWebhookConfiguration, all behind the WAF-enabled gate.
Test plan
- go test ./pkg/render/applicationlayer/... ./pkg/render/kubecontrollers/... covers the webhook contract (resources/path/failurePolicy/caBundle), kube-controllers cert mount + env + port, and RBAC.
- go build ./..., go vet, gofmt clean; full Operator CI green (2424 tests).
- */finalizers for OCP OwnerReferencesPermissionEnforcement).
Release Note
Linked
- tigera/designs#25 (PMREQ-384)
- tigera/calico-private#11834 (merged)
- tigera/calico-private#12141 (EV-6657, open; must merge before this PR)
- tigera/calico-private#12215 (merged)