OCPBUGS-84321: ClusterAPIMachineManagement: capi-controllers: excessive restarts of during cluster installation by sg00dwin · Pull Request #614 · openshift/cluster-capi-operator

sg00dwin · 2026-07-02T20:46:30Z

Summary

Move the capi-controllers Deployment from CVO static manifests to the operator's manifests-gen pipeline at installOrder 30
This ensures all provider CRDs are installed and confirmed Established before capi-controllers starts, eliminating ~10 crash-restart cycles during cluster installation
Add a CVO tombstone to delete the old Deployment on upgrade

Problem

During cluster installation, CVO deploys the capi-controllers Deployment simultaneously with the capi-operator installer. The installer needs time to install provider CRDs (e.g. AWSCluster, Cluster), but capi-controllers starts immediately and crashes because those CRDs don't exist yet. Both containers (capi-controllers and machine-api-migration) exit with os.Exit(1) and restart ~10 times before the CRDs are eventually available.

Solution

The operator's Boxcutter installer uses probe-gated sequencing — components at a higher installOrder only deploy after earlier components are confirmed ready. By moving the Deployment to installOrder 30 (after core CRDs at 10 and providers at 20), the controllers only start once all required CRDs exist.

No changes to controller logic. The fix is purely deployment ordering.

Changes

File	Change
`admission-policies/default/capi-controllers-deployment.yaml`	New — Deployment YAML as manifests-gen kustomize input
`admission-policies/default/kustomization.yaml`	Added the new resource
`Makefile`	Added `--self-image-ref` for runtime image substitution
`capi-operator-manifests/default/manifests.yaml`	Regenerated — now includes the Deployment
`capi-operator-manifests/default/metadata.yaml`	Regenerated — now includes selfImageRef
`manifests/0000_30_cluster-api_17_deployment.yaml`	Deleted — CVO no longer manages this
`manifests/0000_30_cluster-api_00_tombstones-5.0.yaml`	New — CVO tombstone for upgrade cleanup

Upgrade behavior

On upgrade from a release where CVO owns the Deployment:

CVO processes the tombstone and deletes its old copy
The operator creates the new operator-managed copy at installOrder 30

Brief pod downtime during ownership transfer is expected and acceptable.

Assisted-by: Claude (Opus 4.6)

Summary by CodeRabbit

New Features
- Added the capi-controllers controller Deployment to the default install manifests.
- Updated manifest generation so the controller Deployment uses the correct self image reference and is packaged with the release.
Bug Fixes
- Improved the controller-related release/admission configuration by updating the capi-controllers deployment admission policy and aligning controller metadata/behavior for install and uninstall consistency.

…during cluster installation Move capi-controllers Deployment from CVO to operator The capi-controllers Deployment was deployed by CVO as a static manifest, starting simultaneously with the installer. Since the installer needs time to install provider CRDs, capi-controllers would crash repeatedly until they became available. Move the Deployment to the operator's manifests-gen pipeline at installOrder 30, where probe-gated sequencing ensures all CRDs are Established before the Deployment is created. Add a tombstone so CVO deletes its old copy on upgrade. Assisted-by: Claude (Opus 4.6)

openshift-merge-bot · 2026-07-02T20:46:33Z

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

openshift-ci-robot · 2026-07-02T20:46:37Z

@sg00dwin: This pull request references Jira Issue OCPBUGS-84321, which is invalid:

expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Summary

Move the capi-controllers Deployment from CVO static manifests to the operator's manifests-gen pipeline at installOrder 30

This ensures all provider CRDs are installed and confirmed Established before capi-controllers starts, eliminating ~10 crash-restart cycles during cluster installation

Add a CVO tombstone to delete the old Deployment on upgrade

Problem

During cluster installation, CVO deploys the capi-controllers Deployment simultaneously with the capi-operator installer. The installer needs time to install provider CRDs (e.g. AWSCluster, Cluster), but capi-controllers starts immediately and crashes because those CRDs don't exist yet. Both containers (capi-controllers and machine-api-migration) exit with os.Exit(1) and restart ~10 times before the CRDs are eventually available.

Solution

The operator's Boxcutter installer uses probe-gated sequencing — components at a higher installOrder only deploy after earlier components are confirmed ready. By moving the Deployment to installOrder 30 (after core CRDs at 10 and providers at 20), the controllers only start once all required CRDs exist.

No changes to controller logic. The fix is purely deployment ordering.

Changes

File Change

admission-policies/default/capi-controllers-deployment.yaml New — Deployment YAML as manifests-gen kustomize input

admission-policies/default/kustomization.yaml Added the new resource

Makefile Added --self-image-ref for runtime image substitution

capi-operator-manifests/default/manifests.yaml Regenerated — now includes the Deployment

capi-operator-manifests/default/metadata.yaml Regenerated — now includes selfImageRef

manifests/0000_30_cluster-api_17_deployment.yaml Deleted — CVO no longer manages this

manifests/0000_30_cluster-api_00_tombstones-5.0.yaml New — CVO tombstone for upgrade cleanup

Upgrade behavior

On upgrade from a release where CVO owns the Deployment:

CVO processes the tombstone and deletes its old copy

The operator creates the new operator-managed copy at installOrder 30

Brief pod downtime during ownership transfer is expected and acceptable.

Assisted-by: Claude (Opus 4.6)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai · 2026-07-02T20:46:56Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 5c2e12cd-0b52-461d-b772-cd246f638e51

📥 Commits

Reviewing files that changed from the base of the PR and between 2cd7b9f and 20cd738.

📒 Files selected for processing (1)

pkg/admissionpolicy/testutils/util.go

Walkthrough

This PR updates capi-controllers manifest generation to use a self image reference, adds the generated deployment and tombstone manifests, and narrows admission-policy profile loading to validating policy objects.

Changes

capi-controllers manifest migration

Layer / File(s)	Summary
Self image wiring `Makefile`, `capi-operator-manifests/default/metadata.yaml`	`manifests-gen` now receives `--self-image-ref`, and default metadata sets `selfImageRef` to the cluster-capi-operator image.
Generated deployment and tombstone `capi-operator-manifests/default/manifests.yaml`, `manifests/0000_30_cluster-api_00_tombstones-5.0.yaml`	A generated `capi-controllers` Deployment is appended, and a tombstone manifest defines the migrated Deployment with release annotations.
Admission policy manifest support `admission-policies/default/capi-controllers-deployment.yaml`, `admission-policies/default/kustomization.yaml`, `pkg/admissionpolicy/testutils/util.go`	The deployment manifest is added to the admission-policy kustomization, its annotations are removed, and profile loading now keeps only validating policy and binding objects.

Estimated code review effort: 2 (Simple) | ~15 minutes

🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Topology-Aware Scheduling Compatibility	⚠️ Warning	New capi-controllers manifests hardcode a control-plane nodeSelector and master/control-plane tolerations, with no ControlPlaneTopology gating; this can fail on HyperShift.	Gate scheduling by ControlPlaneTopology or exclude external/hosted clusters; avoid unconditional control-plane selectors in the generated manifests.

✅ Passed checks (14 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title matches the main fix: reducing capi-controllers restarts during cluster installation.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	No Ginkgo test titles were added or modified; the only changed file is a helper util with no It/Describe/Context/When declarations.
Test Structure And Quality	✅ Passed	No Ginkgo test blocks were changed; only a test helper’s object filtering was updated, so the listed test-quality checks aren’t applicable here.
Microshift Test Compatibility	✅ Passed	No new Ginkgo e2e tests were added; the PR diff is manifests plus a test utility helper, with no It/Describe/Context/When additions or MicroShift-sensitive test code.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	No new Ginkgo e2e tests were added; the only code change is a test utility parser, so there’s no SNO-specific test assumption to flag.
Ote Binary Stdout Contract	✅ Passed	The only changed Go file just narrows decoded object types; it adds no main/init/TestMain/BeforeSuite/RunSpecs code or stdout writes.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	No new Ginkgo e2e tests were added; the only code change is a test-helper filter update, and the manifest changes are non-test assets.
No-Weak-Crypto	✅ Passed	HEAD only changes admission policy loader filtering; scans found no MD5/SHA1/DES/RC4/3DES/Blowfish/ECB or secret-comparison code in the patch.
Container-Privileges	✅ Passed	No privileged, hostPID/Network/IPC, allowPrivilegeEscalation, SYS_ADMIN, or root-run settings were added in the touched manifests.
No-Sensitive-Data-In-Logs	✅ Passed	The diff only filters admission-policy objects; I գտnot see any new logging/print paths or sensitive-value emission in the touched files.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands.}

openshift-ci · 2026-07-02T20:47:12Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign racheljpg for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

capi-operator-manifests/default/manifests.yaml (1)
1032-1032: 🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick win

Consider setting automountServiceAccountToken: false if the API token isn't needed by this pod, or confirm it is required.

serviceAccountName: capi-controllers is set but automountServiceAccountToken isn't specified, defaulting to token mounting.

As per path instructions, "**/*.{yaml,yml}: ... automountServiceAccountToken: false unless needed".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@capi-operator-manifests/default/manifests.yaml` at line 1032, The pod spec
using serviceAccountName capi-controllers is implicitly mounting a service
account token, which should be disabled unless this workload needs API access.
Update the manifest where the pod spec is defined to set
automountServiceAccountToken to false if the token is not required, or otherwise
confirm and keep the default behavior; use the serviceAccountName setting to
locate the relevant pod template.
Source: Path instructions

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@capi-operator-manifests/default/manifests.yaml`:
- Around line 990-993: The container resource configuration only sets
resources.requests and is missing matching resources.limits for cpu and memory.
Update the container specs in manifests.yaml, including both affected resource
blocks, to add explicit limits alongside the existing requests so each container
has bounded resource usage.
- Around line 970-1027: Add livenessProbe and readinessProbe to both containers
in the manifest: the capi-controllers and machine-api-migration container specs
already expose dedicated healthz ports (healthz-o on 9440 and healthz-m on
9441), so wire the probes to those endpoints using the existing container names
and ports. Ensure each probe is defined alongside the current ports/args for the
corresponding container and points to the appropriate health check path on the
matching healthz port.
- Around line 970-1027: Both container specs in the manifest lack a
securityContext, so they still run with root-capable defaults; add a
securityContext to the capi-controllers and machine-api-migration container
definitions. Set runAsNonRoot and readOnlyRootFilesystem, disable
allowPrivilegeEscalation, and drop ALL capabilities (only add back anything
explicitly required). Use the existing container blocks with names
capi-controllers and machine-api-migration to place the same hardened settings
on both.
- Around line 948-1049: The capi-controllers Deployment manifest is missing
required hardening settings, so update the source manifest for the
capi-controllers Deployment to include a pod/container securityContext, add
resource limits alongside the existing requests, and define liveness and
readiness probes for both containers where appropriate. Make sure these
additions are placed in the same Deployment spec that defines the
capi-controllers and machine-api-migration containers so the rendered bundle
stays in sync.

---

Nitpick comments:
In `@capi-operator-manifests/default/manifests.yaml`:
- Line 1032: The pod spec using serviceAccountName capi-controllers is
implicitly mounting a service account token, which should be disabled unless
this workload needs API access. Update the manifest where the pod spec is
defined to set automountServiceAccountToken to false if the token is not
required, or otherwise confirm and keep the default behavior; use the
serviceAccountName setting to locate the relevant pod template.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: c6e49887-bf0f-40ca-be91-6bde2079827a

📥 Commits

Reviewing files that changed from the base of the PR and between 3dffe24 and 2cd7b9f.

📒 Files selected for processing (6)

Makefile
admission-policies/default/capi-controllers-deployment.yaml
admission-policies/default/kustomization.yaml
capi-operator-manifests/default/manifests.yaml
capi-operator-manifests/default/metadata.yaml
manifests/0000_30_cluster-api_00_tombstones-5.0.yaml

💤 Files with no reviewable changes (1)

admission-policies/default/capi-controllers-deployment.yaml

coderabbitai · 2026-07-02T20:52:36Z

+      containers:
+      - args:
+        - --diagnostics-address=:8443
+        command:
+        - /capi-controllers
+        env:
+        - name: RELEASE_VERSION
+          value: 0.0.1-snapshot
+        image: registry.ci.openshift.org/openshift:cluster-capi-operator
+        name: capi-controllers
+        ports:
+        - containerPort: 9443
+          name: webhook-server
+          protocol: TCP
+        - containerPort: 8443
+          name: diagnostics-o
+          protocol: TCP
+        - containerPort: 9440
+          name: healthz-o
+          protocol: TCP
+        resources:
+          requests:
+            cpu: 10m
+            memory: 50Mi
+        terminationMessagePolicy: FallbackToLogsOnError
+        volumeMounts:
+        - mountPath: /tmp/k8s-webhook-server/serving-certs
+          name: cert
+          readOnly: true
+        - mountPath: /tmp/k8s-metrics-server/serving-certs
+          name: metrics-cert
+          readOnly: true
+      - args:
+        - --diagnostics-address=:8442
+        - --health-addr=:9441
+        command:
+        - /machine-api-migration
+        env:
+        - name: RELEASE_VERSION
+          value: 0.0.1-snapshot
+        image: registry.ci.openshift.org/openshift:cluster-capi-operator
+        name: machine-api-migration
+        ports:
+        - containerPort: 8442
+          name: diagnostics-m
+          protocol: TCP
+        - containerPort: 9441
+          name: healthz-m
+          protocol: TCP
+        resources:
+          requests:
+            cpu: 10m
+            memory: 50Mi
+        terminationMessagePolicy: FallbackToLogsOnError
+        volumeMounts:
+        - mountPath: /tmp/k8s-metrics-server/serving-certs
+          name: metrics-cert
+          readOnly: true


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Add liveness and readiness probes.

Neither container defines livenessProbe/readinessProbe, despite both exposing dedicated healthz-* ports (9440, 9441) presumably intended for this purpose.

As per path instructions, "**/*.{yaml,yml}: ... Liveness + readiness probes defined".

🧰 Tools

🪛 Trivy (0.69.3)

[error] 978-1003: Root file system is not read-only

Container 'machine-api-migration' of Deployment 'capi-controllers' should set 'securityContext.readOnlyRootFilesystem' to true

Rule: KSV-0014

Learn more

(IaC/Kubernetes)

[error] 978-1003: Default security context configured

container capi-controllers in openshift-cluster-api namespace is using the default security context

Rule: KSV-0118

Learn more

(IaC/Kubernetes)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@capi-operator-manifests/default/manifests.yaml` around lines 970 - 1027, Add livenessProbe and readinessProbe to both containers in the manifest: the capi-controllers and machine-api-migration container specs already expose dedicated healthz ports (healthz-o on 9440 and healthz-m on 9441), so wire the probes to those endpoints using the existing container names and ports. Ensure each probe is defined alongside the current ports/args for the corresponding container and points to the appropriate health check path on the matching healthz port.

Source: Path instructions

🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Add securityContext (runAsNonRoot, readOnlyRootFilesystem, allowPrivilegeEscalation, dropped capabilities) to both containers.

Neither capi-controllers nor machine-api-migration containers define a securityContext, so they run with the platform default (root-capable, writable root filesystem, privilege escalation allowed). Static analysis flags this (Checkov CKV_K8S_20/CKV_K8S_23, Trivy KSV-0014/KSV-0118).

🔒 Proposed securityContext additions

containers: - args: - --diagnostics-address=:8443 command: - /capi-controllers + securityContext: + allowPrivilegeEscalation: false + readOnlyRootFilesystem: true + runAsNonRoot: true + capabilities: + drop: + - ALL env:

Apply the equivalent block to the machine-api-migration container as well.

As per path instructions, "**/*.{yaml,yml}: If this is a Kubernetes/OpenShift manifest or Helm template: securityContext: runAsNonRoot, readOnlyRootFilesystem, allowPrivilegeEscalation: false; Drop ALL capabilities, add only what is required".

🧰 Tools

🪛 Trivy (0.69.3)

[error] 978-1003: Root file system is not read-only

Container 'machine-api-migration' of Deployment 'capi-controllers' should set 'securityContext.readOnlyRootFilesystem' to true

Rule: KSV-0014

Learn more

(IaC/Kubernetes)

[error] 978-1003: Default security context configured

container capi-controllers in openshift-cluster-api namespace is using the default security context

Rule: KSV-0118

Learn more

(IaC/Kubernetes)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@capi-operator-manifests/default/manifests.yaml` around lines 970 - 1027, Both container specs in the manifest lack a securityContext, so they still run with root-capable defaults; add a securityContext to the capi-controllers and machine-api-migration container definitions. Set runAsNonRoot and readOnlyRootFilesystem, disable allowPrivilegeEscalation, and drop ALL capabilities (only add back anything explicitly required). Use the existing container blocks with names capi-controllers and machine-api-migration to place the same hardened settings on both.

Source: Path instructions

Probes: The healthz ports (9440, 9441) are declared in the Deployment manifest but neither binary actually configures a health endpoint — there's no HealthProbeBindAddress in cmd/capi-controllers/main.go or cmd/machine-api-migration/main.go. Adding probes against those ports would cause the pods to fail health checks and crash-loop. The original Deployment on main has no probes for the same reason.

securityContext: Same as the other comment — the original Deployment on main has no securityContext. This PR moves the Deployment between deployment mechanisms without changing its spec. Hardening would be a separate effort.

@sg00dwin, thanks for the clarification — that makes sense on both counts. Since the binaries don't wire up HealthProbeBindAddress for those healthz-* ports, adding probes now would indeed cause crash-looping, and since this PR is purely a deployment-mechanism migration (CVO static manifest → operator manifests-gen) without any spec changes, carrying over the existing (lack of) securityContext is consistent with keeping the diff scoped to ordering only. I'll withdraw both suggestions.

✏️ Learnings added

Learnt from: sg00dwin Repo: openshift/cluster-capi-operator PR: 614 File: capi-operator-manifests/default/manifests.yaml:970-1027 Timestamp: 2026-07-02T21:04:22.725Z Learning: In the openshift/cluster-capi-operator repository, the `capi-controllers` Deployment (in capi-operator-manifests/default/manifests.yaml) exposes `healthz-o` (port 9440) and `healthz-m` (port 9441) container ports, but neither `cmd/capi-controllers/main.go` nor `cmd/machine-api-migration/main.go` configures a `HealthProbeBindAddress`, so these ports do not serve actual health endpoints. Do not suggest adding livenessProbe/readinessProbe against these ports unless the binaries are updated to implement them. Additionally, this Deployment intentionally has no securityContext defined, matching the pre-existing static manifest deployed by CVO on main; PRs that migrate the Deployment between deployment mechanisms (e.g., from CVO static manifests to the operator's manifests-gen pipeline) without changing the Deployment spec should not be flagged for missing securityContext hardening, as that would be a separate effort.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

sg00dwin · 2026-07-02T20:54:14Z

/jira refresh

openshift-ci-robot · 2026-07-02T20:54:20Z

@sg00dwin: This pull request references Jira Issue OCPBUGS-84321, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (5.0.0) matches configured target version for branch (5.0.0)
bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

sg00dwin · 2026-07-03T03:08:26Z

/test unit

LoadAdmissionPolicyProfiles() deserialized every object from manifests.yaml without type filtering. Adding the capi-controllers Deployment to that file broke 68 admission policy tests. Restrict the loader to ValidatingAdmissionPolicy and ValidatingAdmissionPolicyBinding types only. Assisted-by: Claude (Opus 4.6)

openshift-ci-robot · 2026-07-03T12:19:08Z

@sg00dwin: This pull request references Jira Issue OCPBUGS-84321, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (5.0.0) matches configured target version for branch (5.0.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Details

In response to this:

Summary

Move the capi-controllers Deployment from CVO static manifests to the operator's manifests-gen pipeline at installOrder 30

This ensures all provider CRDs are installed and confirmed Established before capi-controllers starts, eliminating ~10 crash-restart cycles during cluster installation

Add a CVO tombstone to delete the old Deployment on upgrade

Problem

During cluster installation, CVO deploys the capi-controllers Deployment simultaneously with the capi-operator installer. The installer needs time to install provider CRDs (e.g. AWSCluster, Cluster), but capi-controllers starts immediately and crashes because those CRDs don't exist yet. Both containers (capi-controllers and machine-api-migration) exit with os.Exit(1) and restart ~10 times before the CRDs are eventually available.

Solution

The operator's Boxcutter installer uses probe-gated sequencing — components at a higher installOrder only deploy after earlier components are confirmed ready. By moving the Deployment to installOrder 30 (after core CRDs at 10 and providers at 20), the controllers only start once all required CRDs exist.

No changes to controller logic. The fix is purely deployment ordering.

Changes

File Change

admission-policies/default/capi-controllers-deployment.yaml New — Deployment YAML as manifests-gen kustomize input

admission-policies/default/kustomization.yaml Added the new resource

Makefile Added --self-image-ref for runtime image substitution

capi-operator-manifests/default/manifests.yaml Regenerated — now includes the Deployment

capi-operator-manifests/default/metadata.yaml Regenerated — now includes selfImageRef

manifests/0000_30_cluster-api_17_deployment.yaml Deleted — CVO no longer manages this

manifests/0000_30_cluster-api_00_tombstones-5.0.yaml New — CVO tombstone for upgrade cleanup

Upgrade behavior

On upgrade from a release where CVO owns the Deployment:

CVO processes the tombstone and deletes its old copy

The operator creates the new operator-managed copy at installOrder 30

Brief pod downtime during ownership transfer is expected and acceptable.

Assisted-by: Claude (Opus 4.6)

Summary by CodeRabbit

New Features

Added the capi-controllers controller Deployment to the default install manifests.

Updated manifest generation so the controller Deployment uses the correct self image reference and is packaged with the release.

Bug Fixes

Improved the controller-related release/admission configuration by updating the capi-controllers deployment admission policy and aligning controller metadata/behavior for install and uninstall consistency.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2026-07-03T12:23:15Z

@sg00dwin: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci Bot requested review from racheljpg and theobarberbany July 2, 2026 20:47

coderabbitai Bot reviewed Jul 2, 2026

View reviewed changes

openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Jul 2, 2026

openshift-ci-robot removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Jul 2, 2026

Uh oh!

Conversation

sg00dwin commented Jul 2, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Changes

Upgrade behavior

Summary by CodeRabbit

Uh oh!

openshift-merge-bot Bot commented Jul 2, 2026

Uh oh!

openshift-ci-robot commented Jul 2, 2026

Summary

Problem

Solution

Changes

Upgrade behavior

Uh oh!

coderabbitai Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

openshift-ci Bot commented Jul 2, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sg00dwin Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sg00dwin commented Jul 2, 2026

Uh oh!

openshift-ci-robot commented Jul 2, 2026

Uh oh!

sg00dwin commented Jul 3, 2026

Uh oh!

openshift-ci-robot commented Jul 3, 2026

Summary

Problem

Solution

Changes

Upgrade behavior

Summary by CodeRabbit

Uh oh!

openshift-ci Bot commented Jul 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sg00dwin commented Jul 2, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jul 2, 2026 •

edited

Loading

coderabbitai Bot Jul 2, 2026 •

edited

Loading