Fix NodeSelector nil-vs-pointer reconcile loop in all service reconcilers #1907

Open
lmiccini wants to merge 1 commit into openstack-k8s-operators:main from lmiccini:fix-nodeselector-reconcile-loop

Conversation

@lmiccini
Contributor

When the OSCP has no nodeSelector defined, instance.Spec.NodeSelector is a nil map. Taking its address (&instance.Spec.NodeSelector) creates a non-nil pointer to a nil map, which is structurally different from a nil pointer. controllerutil.CreateOrPatch detects this as a diff via reflect.DeepEqual and sends an update patch on every reconcile, bumping the sub-CR's Generation. The ObservedGeneration readiness check then reports the sub-CR as "in progress", triggering another OSCP reconcile and creating an infinite loop (~1 update/second).
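The nil-pointer versus pointer-to-nil-map distinction can be reproduced with the standard library alone. The sketch below is a minimal illustration, not the operator's code: `parentSpec`, `subSpec`, `inheritUnguarded`, and `specsDiffer` are hypothetical stand-ins, and the "stored" value is assumed to hold a nil pointer.

```go
package main

import (
	"fmt"
	"reflect"
)

// parentSpec is a hypothetical stand-in for the OSCP spec.
type parentSpec struct {
	NodeSelector map[string]string
}

// subSpec is a hypothetical stand-in for a sub-CR spec, where the
// selector is an optional pointer.
type subSpec struct {
	NodeSelector *map[string]string
}

// inheritUnguarded mirrors the unguarded pattern described above: it always
// takes the address of the parent's map, even when that map is nil.
func inheritUnguarded(parent *parentSpec) subSpec {
	return subSpec{NodeSelector: &parent.NodeSelector}
}

// specsDiffer reports whether reflect.DeepEqual sees the two specs as different.
func specsDiffer(a, b subSpec) bool {
	return !reflect.DeepEqual(a, b)
}

func main() {
	parent := &parentSpec{} // nodeSelector omitted, so the map is nil
	stored := subSpec{}     // assumed stored state: nil pointer
	desired := inheritUnguarded(parent)

	fmt.Println(desired.NodeSelector != nil)  // true: the pointer itself is non-nil
	fmt.Println(*desired.NodeSelector == nil) // true: the map it points to is nil
	fmt.Println(specsDiffer(stored, desired)) // true: DeepEqual reports a difference
}
```

Because the two values never compare equal, any comparison of this kind would report a change on every pass, which matches the repeated patching described above.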

Guard all 25 NodeSelector inheritance assignments with len(instance.Spec.NodeSelector) > 0 so the assignment is skipped when the OSCP nodeSelector is nil or empty, avoiding the spurious diff.
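A minimal sketch of the guard, assuming the same hypothetical `parentSpec` and `subSpec` stand-ins rather than the real operator types; `inheritGuarded` is an illustrative name, and the actual assignments in the reconcilers may carry additional conditions.

```go
package main

import "fmt"

// parentSpec is a hypothetical stand-in for the OSCP spec.
type parentSpec struct {
	NodeSelector map[string]string
}

// subSpec is a hypothetical stand-in for a sub-CR spec.
type subSpec struct {
	NodeSelector *map[string]string
}

// inheritGuarded copies the parent's selector into the sub-CR spec only when
// the parent actually defines one. len() of a nil map is 0, so the nil and
// empty cases are both skipped and the pointer stays nil.
func inheritGuarded(parent *parentSpec, sub *subSpec) {
	if len(parent.NodeSelector) > 0 {
		sub.NodeSelector = &parent.NodeSelector
	}
}

func main() {
	var sub subSpec

	inheritGuarded(&parentSpec{}, &sub)
	fmt.Println(sub.NodeSelector == nil) // true: nothing assigned for a nil map

	inheritGuarded(&parentSpec{NodeSelector: map[string]string{}}, &sub)
	fmt.Println(sub.NodeSelector == nil) // true: nothing assigned for an empty map

	inheritGuarded(&parentSpec{NodeSelector: map[string]string{"zone": "a"}}, &sub)
	fmt.Println((*sub.NodeSelector)["zone"]) // a
}
```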

@openshift-ci openshift-ci Bot requested review from dprince and slagle April 29, 2026 15:51
@openshift-ci
Contributor

openshift-ci Bot commented Apr 29, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: lmiccini
Once this PR has been reviewed and has the lgtm label, please assign rebtoor for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@github-actions

github-actions Bot commented Apr 29, 2026

OpenStackControlPlane CRD Size Report

| Metric | Value |
| --- | --- |
| CRD JSON size | 322464 bytes (315KB) |
| Base branch size | 322464 bytes |
| Change | +0.00% |
| Status | yellow (growing) |

Threshold reference

| Color | Range | Meaning |
| --- | --- | --- |
| 🟢 green | < 300KB | Comfortable |
| 🟡 yellow | 300–400KB | Growing |
| 🟠 orange | 400–750KB | Concerning |
| 🔴 red | > 750KB | Approaching 1.5MB etcd limit (cut in half to allow space for update) |

@centosinfra-prod-github-app

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdo/buildset/13ac44a8c41f48ce9d28e146d1dc1e4c

openstack-k8s-operators-content-provider NODE_FAILURE Node(set) request 100-0000083077 failed in 0s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ openstack-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ openstack-operator-edpm-baremetal-minor-update SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@nhicher

nhicher commented Apr 29, 2026

recheck

…lers

When the OSCP has no nodeSelector defined, instance.Spec.NodeSelector is
a nil map. Taking its address (&instance.Spec.NodeSelector) creates a
non-nil pointer to a nil map, which is structurally different from a nil
pointer. controllerutil.CreateOrPatch detects this as a diff via
reflect.DeepEqual and sends an update patch on every reconcile, bumping
the sub-CR's Generation. The ObservedGeneration readiness check then
reports the sub-CR as "in progress", triggering another OSCP reconcile
and creating an infinite loop (~1 update/second).

Guard all 25 NodeSelector inheritance assignments with
len(instance.Spec.NodeSelector) > 0 so the assignment is skipped when
the OSCP nodeSelector is nil or empty, avoiding the spurious diff.

Two new functional tests verify the fix:
- "does not set nodeSelector on sub-CRs": NodeSelector stays nil on
  Memcached, Galera, and RabbitMQ when the OSCP omits nodeSelector.
- "does not cause spurious updates": sub-CR generation remains stable
  over 5 seconds (Consistently), catching the ~1/second spec mutation.
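
The stability check in the second test can be approximated without a test framework. The sketch below is a standard-library stand-in for a Consistently-style assertion, not the PR's test code: `consistently`, `window`, and `poll` are hypothetical names, and the window is shortened from 5 seconds to keep the example quick.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical polling parameters, shortened from the 5-second window
// mentioned in the commit message.
const (
	window = 50 * time.Millisecond
	poll   = 10 * time.Millisecond
)

// consistently polls get at the given interval for the whole duration and
// reports whether it returned want on every poll.
func consistently(get func() int64, want int64, duration, interval time.Duration) bool {
	deadline := time.Now().Add(duration)
	for {
		if get() != want {
			return false
		}
		if !time.Now().Before(deadline) {
			return true
		}
		time.Sleep(interval)
	}
}

func main() {
	// A generation that never changes passes the check.
	stable := func() int64 { return 1 }
	fmt.Println(consistently(stable, 1, window, poll)) // true

	// A generation bumped on every read simulates the spurious spec updates.
	gen := int64(1)
	bumping := func() int64 { gen++; return gen }
	fmt.Println(consistently(bumping, 1, window, poll)) // false
}
```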

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@lmiccini lmiccini force-pushed the fix-nodeselector-reconcile-loop branch from 3cf0bfe to 7f2da76 Compare April 29, 2026 16:11
@centosinfra-prod-github-app

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdo/buildset/3e6760895c344f26842147ccd94bca5f

openstack-k8s-operators-content-provider RETRY_LIMIT Host unreachable in 5m 46s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ openstack-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ openstack-operator-edpm-baremetal-minor-update SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
