
feat: Add selector and bootstrap observability metrics #286

Open
rawadhossain wants to merge 1 commit into kubernetes-sigs:main from rawadhossain:feat/add-observability-metrics


@rawadhossain

Contributor

Description

This PR adds three new metrics. Two metrics are fully implemented, while the third is left with TODOs pending discussion.

What each metric does

node_readiness_selector_matched_nodes_total

Tracks how many nodes currently match a rule's NodeSelector. If the selector matches no nodes, the controller performs no work and produces no other signal. This metric makes such misconfigurations immediately visible.
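
To make the intended value concrete, here is a minimal sketch of the count this gauge would report for one rule. It is not the code in this PR: the `Node` type, `matchesSelector`, and `countMatchingNodes` are hypothetical stand-ins, only equality-based label matching is shown, and no Kubernetes or Prometheus libraries are used.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for a Kubernetes node; only labels matter here.
type Node struct {
	Name   string
	Labels map[string]string
}

// matchesSelector reports whether every key/value pair in selector is present
// in labels. Only equality-based matching is sketched; set-based expressions
// are omitted.
func matchesSelector(labels, selector map[string]string) bool {
	for k, want := range selector {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// countMatchingNodes returns the value the gauge would be set to for one rule.
func countMatchingNodes(nodes []Node, selector map[string]string) int {
	n := 0
	for _, node := range nodes {
		if matchesSelector(node.Labels, selector) {
			n++
		}
	}
	return n
}

func main() {
	nodes := []Node{
		{Name: "a", Labels: map[string]string{"pool": "gpu"}},
		{Name: "b", Labels: map[string]string{"pool": "cpu"}},
	}
	// A correct selector matches one node; a typo in the value matches none,
	// which is the silent misconfiguration the metric is meant to expose.
	fmt.Println(countMatchingNodes(nodes, map[string]string{"pool": "gpu"}))
	fmt.Println(countMatchingNodes(nodes, map[string]string{"pool": "gpus"}))
}
```

A value of 0 for a rule that is expected to cover nodes is the signal to alert on.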

node_readiness_bootstrap_completion_errors_total

Counts failures to write the bootstrap completion annotation. If this write fails, the node continues to be re-evaluated even though bootstrap has completed. This metric makes those failures visible.
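
The counting pattern can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `markBootstrapComplete`, the `write` callback, and the two helper writers are hypothetical names, and an `atomic.Int64` stands in for the Prometheus counter.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// bootstrapCompletionErrors stands in for the counter metric.
var bootstrapCompletionErrors atomic.Int64

// alwaysFail and alwaysOK are hypothetical writers used only for the demo.
func alwaysFail(string) error { return errors.New("simulated write failure") }
func alwaysOK(string) error   { return nil }

// markBootstrapComplete calls write, a hypothetical function that persists the
// completion annotation, and increments the counter when the write fails.
// The error is still returned so the caller can retry the node.
func markBootstrapComplete(nodeName string, write func(node string) error) error {
	if err := write(nodeName); err != nil {
		bootstrapCompletionErrors.Add(1)
		return fmt.Errorf("writing bootstrap completion annotation for node %q: %w", nodeName, err)
	}
	return nil
}

func main() {
	// The first attempt fails and is counted; the retry succeeds and is not.
	_ = markBootstrapComplete("node-a", alwaysFail)
	_ = markBootstrapComplete("node-a", alwaysOK)
	fmt.Println(bootstrapCompletionErrors.Load())
}
```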

node_readiness_bootstrap_nrc_duration_seconds

Measures only the time NRC itself held a node, from the first taint until bootstrap completion. Excludes pre-NRC boot time.

The metric is registered, but the recording logic is deferred pending discussion of the timestamp anchor.
The two approaches I see are:

Option A:

  • Write readiness.k8s.io/taint-applied-<rule> to node.ObjectMeta when the taint is first applied, and record the duration from that timestamp to bootstrap completion.

Option B:

  • Write a dedicated node status condition and use its API-server-generated lastTransitionTime as the start timestamp.
  • Requires a separate Status().Patch() call and nodes/status write permissions.
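
To ground the discussion, Option A's duration calculation could look roughly like the sketch below. The assumptions are mine: the timestamp is stored as an RFC 3339 string under the per-rule key, the rule name `cni-ready` is made up, and `nrcHoldDuration` and `mustParse` are hypothetical helpers. Option B would differ mainly in where the start time is read from (the condition's lastTransitionTime instead of node metadata).

```go
package main

import (
	"fmt"
	"time"
)

// taintAppliedKey builds the per-rule key proposed in Option A.
func taintAppliedKey(rule string) string {
	return "readiness.k8s.io/taint-applied-" + rule
}

// mustParse parses an RFC 3339 timestamp and panics on malformed input.
// It exists only to keep the demo short.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

// nrcHoldDuration returns the time between the recorded taint-applied
// timestamp and completedAt. It reports false when the key is missing, the
// value is malformed, or completion precedes the recorded start (for example
// because of clock skew), so that no bogus sample is observed.
func nrcHoldDuration(meta map[string]string, rule string, completedAt time.Time) (time.Duration, bool) {
	raw, found := meta[taintAppliedKey(rule)]
	if !found {
		return 0, false
	}
	start, err := time.Parse(time.RFC3339, raw)
	if err != nil || completedAt.Before(start) {
		return 0, false
	}
	return completedAt.Sub(start), true
}

func main() {
	meta := map[string]string{
		taintAppliedKey("cni-ready"): "2026-07-02T10:00:00Z",
	}
	d, ok := nrcHoldDuration(meta, "cni-ready", mustParse("2026-07-02T10:01:30Z"))
	if ok {
		// A histogram would observe d.Seconds() at this point.
		fmt.Println(d.Seconds())
	}
}
```

Because the start timestamp in this variant comes from the controller's clock rather than the API server, the sketch drops negative durations instead of recording them.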

I left TODOs so that the implementation can be completed once we agree on the approach.

Related to Issue #182

Type of Change

/kind feature

Testing

  • Added test coverage.

Checklist

  • make test passes
  • make lint passes

Signed-off-by: Rawad Hossain <rawad.hossain00@gmail.com>
@kubernetes-prow kubernetes-prow Bot added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 2, 2026
@netlify

netlify Bot commented Jul 2, 2026


Deploy Preview for node-readiness-controller canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 406dde6 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/node-readiness-controller/deploys/6a468cd1461bcc000824619a |

@kubernetes-prow


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rawadhossain
Once this PR has been reviewed and has the lgtm label, please assign ajaysundark for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubernetes-prow


Hi @rawadhossain. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@kubernetes-prow kubernetes-prow Bot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 2, 2026
