STOR-3030: feat(operator): detect GCP Dedicated and use hyperdisk-balanced StorageClass by jsafrane · Pull Request #573 · openshift/csi-operator

jsafrane · 2026-07-03T11:35:42Z

At operator startup, fetch the Infrastructure CR (with retry for up to 1 minute during early cluster bootstrap) and check if the GCP region starts with "u-". If so, the cluster runs on GCP Dedicated where only hyperdisk-balanced disks are supported — create only the hyperdisk-balanced StorageClass as default.

On regular GCP, the existing standard-csi and ssd-csi StorageClasses are used as before.

…geClass At operator startup, fetch the Infrastructure CR (with retry for up to 1 minute during early cluster bootstrap) and check if the GCP region starts with "u-". If so, the cluster runs on GCP Dedicated where only hyperdisk-balanced disks are supported — create only the hyperdisk-balanced StorageClass as default. On regular GCP, the existing standard-csi and ssd-csi StorageClasses are used as before. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

openshift-ci-robot · 2026-07-03T11:35:45Z

@jsafrane: This pull request references STOR-3030 which is a valid jira issue.

Details

In response to this:

At operator startup, fetch the Infrastructure CR (with retry for up to 1 minute during early cluster bootstrap) and check if the GCP region starts with "u-". If so, the cluster runs on GCP Dedicated where only hyperdisk-balanced disks are supported — create only the hyperdisk-balanced StorageClass as default.

On regular GCP, the existing standard-csi and ssd-csi StorageClasses are used as before.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai · 2026-07-03T11:36:01Z

📝 Walkthrough

Walkthrough

This change modifies the GCP PD CSI driver operator's startup logic to select StorageClass manifest files based on the cluster's GCP region. A new helper function retrieves the Infrastructure custom resource, polling for up to one minute, and inspects the region to determine whether it is a GCP Dedicated region (prefixed with "u-"). Dedicated regions receive a hyperdisk-balanced-only StorageClass, while regular regions receive standard and SSD StorageClasses. Startup now propagates errors if the Infrastructure CR cannot be retrieved. New unit tests validate the selection logic across various region configurations and error handling when the Infrastructure CR is absent.

Changes

Area	Change
Operator startup	Added region-aware StorageClass file selection via new helper and wired it into controller setup with error propagation
Tests	Added tests covering region-based selection scenarios and missing Infrastructure CR error handling

Sequence Diagram(s)

Included in the hidden review stack artifact above.

Related Issues: None referenced in the provided context.

Related PRs: None referenced in the provided context.

Suggested labels: area/gcp-pd-csi-driver, ok-to-test

Suggested reviewers: None determinable from the provided context.

Poem
A rabbit hops through region names,
Seeking "u-" among the claims,
Hyperdisk for dedicated ground,
SSD and standard all around,
One minute waits, then off it bounds. 🐇

🚥 Pre-merge checks | ✅ 15

✅ Passed checks (15 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: detecting GCP Dedicated and switching to hyperdisk-balanced StorageClass.
Description check	✅ Passed	The description accurately describes the startup retry logic and region-based StorageClass selection in the changeset.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	Added tests use static t.Run names only; no dynamic values, timestamps, or generated identifiers appear in titles.
Test Structure And Quality	✅ Passed	PASS: The added tests are table-driven, isolated, use fake clients, bound execution with context timeouts, and include clear failure messages; no cleanup or Ginkgo waits are needed.
Microshift Test Compatibility	✅ Passed	The added tests are plain Go unit tests (t.Run), not Ginkgo e2e tests, and no MicroShift-unsupported OpenShift APIs/features or guards are involved.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	PASS: The added tests are plain `testing` unit tests, not Ginkgo e2e tests, and they only exercise config-client logic with no node/topology assumptions.
Topology-Aware Scheduling Compatibility	✅ Passed	Only storage-class selection at startup changed; no new node selectors, affinities, replica math, tolerations, or spread/PDB constraints were added.
Ote Binary Stdout Contract	✅ Passed	No stdout writes appear in main/init/TestMain/suite setup; the PR only adds klog logging inside RunOperator/helper code, which the check excludes.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	No new Ginkgo e2e tests were added; the only test file is a Go unit test with no IPv4-only or external connectivity assumptions.
No-Weak-Crypto	✅ Passed	Changed files only add StorageClass selection logic/tests; no MD5/SHA1/DES/RC4/3DES/Blowfish/ECB, custom crypto, or secret comparisons were introduced.
Container-Privileges	✅ Passed	No privileged/root/hostNetwork/hostIPC/allowPrivilegeEscalation/SYS_ADMIN settings were added in the changed files; only Go startup logic and tests changed.
No-Sensitive-Data-In-Logs	✅ Passed	New logs only emit the GCP region and generic Infrastructure lookup status; no passwords, tokens, PII, or customer data are logged.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands.}

openshift-ci · 2026-07-03T11:36:20Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jsafrane

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~legacy/gcp-pd-csi-driver-operator/OWNERS~~ [jsafrane]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai

🧹 Nitpick comments (2)

legacy/gcp-pd-csi-driver-operator/pkg/operator/starter.go (1)
251-267: 🩺 Stability & Availability | 🔵 Trivial

Cumulative startup latency now up to ~2 minutes.

This adds up to 1 more minute of blocking retry on top of the existing FeatureGate wait timeout (case <-time.After(1 * time.Minute)), so worst-case operator startup latency roughly doubles. Confirm the operator's liveness/readiness probe initialDelaySeconds/failureThreshold tolerate this before the pod is restarted mid-bootstrap.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@legacy/gcp-pd-csi-driver-operator/pkg/operator/starter.go` around lines 251 -
267, The new blocking retry in starter.go can extend operator bootstrap by about
another minute on top of the existing FeatureGate wait path, so review the
startup flow around getStorageClassFiles and
csiControllerSet.WithStorageClassController to avoid exceeding probe tolerance.
Either shorten or remove the extra blocking wait, or update the operator’s
liveness/readiness probe settings (initialDelaySeconds and failureThreshold) so
the pod is not restarted while startup is still in progress.
legacy/gcp-pd-csi-driver-operator/pkg/operator/storageclass_test.go (1)
119-129: 🚀 Performance & Scalability | 🔵 Trivial | 💤 Low value

Consider shortening the context timeout for the missing-Infrastructure test.

Since the Infrastructure CR never appears in this test, the retry loop inside getStorageClassFiles will keep polling until the context deadline is reached, meaning this test will consistently take the full 10 seconds to complete. A much shorter timeout (e.g. 1-2s) would likely still exercise at least one poll iteration and let the test fail fast, keeping the suite quick.
♻️ Suggested fix
 func TestGetStorageClassFilesNoInfrastructure(t *testing.T) {
 	configClient := fakeconfig.NewSimpleClientset()
-	// Use a short timeout so the retry loop finishes quickly in tests.
-	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+	// Use a short timeout so the retry loop finishes quickly in tests.
+	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
 	defer cancel()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@legacy/gcp-pd-csi-driver-operator/pkg/operator/storageclass_test.go` around
lines 119 - 129, The missing-Infrastructure test in
TestGetStorageClassFilesNoInfrastructure uses a much longer context timeout than
needed, so it waits for the full retry period before failing. Shorten the
timeout in the test’s context.WithTimeout setup to a small value that still
allows at least one poll in getStorageClassFiles, and keep the existing
assertion that an error is returned. This change should be made in the
TestGetStorageClassFilesNoInfrastructure function only.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@legacy/gcp-pd-csi-driver-operator/pkg/operator/starter.go`:
- Around line 251-267: The new blocking retry in starter.go can extend operator
bootstrap by about another minute on top of the existing FeatureGate wait path,
so review the startup flow around getStorageClassFiles and
csiControllerSet.WithStorageClassController to avoid exceeding probe tolerance.
Either shorten or remove the extra blocking wait, or update the operator’s
liveness/readiness probe settings (initialDelaySeconds and failureThreshold) so
the pod is not restarted while startup is still in progress.

In `@legacy/gcp-pd-csi-driver-operator/pkg/operator/storageclass_test.go`:
- Around line 119-129: The missing-Infrastructure test in
TestGetStorageClassFilesNoInfrastructure uses a much longer context timeout than
needed, so it waits for the full retry period before failing. Shorten the
timeout in the test’s context.WithTimeout setup to a small value that still
allows at least one poll in getStorageClassFiles, and keep the existing
assertion that an error is returned. This change should be made in the
TestGetStorageClassFilesNoInfrastructure function only.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 49560999-d65b-4560-8fc9-5e90f7e2c29c

📥 Commits

Reviewing files that changed from the base of the PR and between 2fc1e7d and 723e2b4.

📒 Files selected for processing (2)

legacy/gcp-pd-csi-driver-operator/pkg/operator/starter.go
legacy/gcp-pd-csi-driver-operator/pkg/operator/storageclass_test.go

openshift-ci · 2026-07-03T15:06:31Z

@jsafrane: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-gcp-csi	`723e2b4`	link	true	`/test e2e-gcp-csi`
ci/prow/aws-efs-operator-e2e	`723e2b4`	link	true	`/test aws-efs-operator-e2e`
ci/prow/e2e-azure	`723e2b4`	link	true	`/test e2e-azure`
ci/prow/e2e-gcp-csi-manual-oidc	`723e2b4`	link	false	`/test e2e-gcp-csi-manual-oidc`
ci/prow/e2e-aws-csi	`723e2b4`	link	true	`/test e2e-aws-csi`
ci/prow/e2e-aws-ovn-upgrade	`723e2b4`	link	true	`/test e2e-aws-ovn-upgrade`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jul 3, 2026

jsafrane mentioned this pull request Jul 3, 2026

STOR-3030: Use hyperdisk-balanced StorageClass on GCP Dedicated openshift/gcp-pd-csi-driver-operator#196

Closed

4 tasks

openshift-ci Bot requested a review from dobsonj July 3, 2026 11:36

openshift-ci Bot requested a review from mpatlasov July 3, 2026 11:36

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 3, 2026

coderabbitai Bot reviewed Jul 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

STOR-3030: feat(operator): detect GCP Dedicated and use hyperdisk-balanced StorageClass#573

STOR-3030: feat(operator): detect GCP Dedicated and use hyperdisk-balanced StorageClass#573
jsafrane wants to merge 1 commit into
openshift:mainfrom
jsafrane:fix-STOR-3030

jsafrane commented Jul 3, 2026

Uh oh!

openshift-ci-robot commented Jul 3, 2026 •

edited by openshift-ci Bot

Loading

Uh oh!

coderabbitai Bot commented Jul 3, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Uh oh!

openshift-ci Bot commented Jul 3, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

openshift-ci Bot commented Jul 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

jsafrane commented Jul 3, 2026

Uh oh!

openshift-ci-robot commented Jul 3, 2026 • edited by openshift-ci Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Jul 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Uh oh!

openshift-ci Bot commented Jul 3, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

openshift-ci Bot commented Jul 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

openshift-ci-robot commented Jul 3, 2026 •

edited by openshift-ci Bot

Loading

coderabbitai Bot commented Jul 3, 2026 •

edited

Loading