Skip to content

Add CPMS e2e-presubmit test stage role for shiftstack-qa#9

Open
tusharjadhav3302 wants to merge 9 commits into
mainfrom
support_cpms_test_role_tj
Open

Add CPMS e2e-presubmit test stage role for shiftstack-qa#9
tusharjadhav3302 wants to merge 9 commits into
mainfrom
support_cpms_test_role_tj

Conversation

@tusharjadhav3302

@tusharjadhav3302 tusharjadhav3302 commented May 20, 2026

Copy link
Copy Markdown
Contributor

Summary

Add a new cpms_test stage role that runs the upstream cluster-control-plane-machine-set-operator e2e-presubmit tests, and include the cpms_replace_attrs day2ops procedure (unwired, for future use).

What this PR includes

New cpms_test stage role (collection/stages/roles/cpms_test/):

  • Clones the upstream operator repo at the matching release-X.Y branch
  • Runs make e2e-presubmit (rolls 1 master, ~30 min) with OPENSHIFT_CI=true for JUnit output
  • Post-processes results via the shared post_openshift_tests.yml (XML tag modification, HTML conversion, copy to report directory)
  • Defaults to e2e-presubmit only via cpms_test_suites variable — e2e-periodic can be enabled by overriding this list in a dedicated job

New cpms_replace_attrs day2ops procedure (unwired):

  • Ported from openshift-ir-plugin — patches CPMS with modified failure domains, networks, and security groups
  • Validates reconciliation and restores original state
  • Not wired into any job definition in this PR — will be enabled in a follow-up with a dedicated periodic job

Wiring:

  • cpms_test stage added to osp_verification.yaml (runs in the existing Zuul periodic job)
  • New sub-play playbooks/plays/cpms_test.yaml and import in ocp_testing.yaml

What this PR does NOT include

  • e2e-periodic is not enabled by default (5h timeout, rolls all 3 masters — exceeds the 8h Zuul job budget)
  • cpms_replace_attrs is not wired into any day2ops_procedures list (no cpms_replacements defaults — those create real OpenStack resources and require explicit configuration)
  • Both will be enabled in a dedicated periodic job in a follow-up PR

Test plan

  • Manual run on serval70 — automation executed end-to-end, JUnit XML produced for both suites
  • Zuul periodic job pass with e2e-presubmit completing within the 8h budget
  • Collect timing numbers for presubmit on a clean environment

@tusharjadhav3302 tusharjadhav3302 added the enhancement New feature or request label May 20, 2026

@imatza-rh imatza-rh left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good port from the IR-plugin. Nice improvements: OpenStack resource cleanup in always, must-gather on test failure. Nit: PR description says "branch fallback" but prepare_openshift_tests.yml has no fallback - same as all other roles, works fine.

Please run ./gate.sh (ansible-lint + pre-commit inside the shiftstack-client container) - no CI checks are configured on GitHub PRs. yamllint passes locally, but ansible-lint needs the container.

Comment thread jobs_definitions/4.17_ovnkubernetes_ipi.yaml Outdated
Comment thread collection/stages/roles/day2ops/tasks/procedures/cpms_replace_attrs.yml Outdated
@tusharjadhav3302 tusharjadhav3302 force-pushed the support_cpms_test_role_tj branch from f88cbb5 to cce04e6 Compare May 28, 2026 13:59

@imatza-rh imatza-rh left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three concerns before merging:

1. Timing budget - The osp_verification Zuul job currently runs 7-8.6h against an 8h timeout (recent SUCCESS builds). The CPMS e2e-periodic target has a 5h ginkgo timeout and rolls all 3 masters. Adding both presubmit + periodic would push the total well past the timeout. Consider running only e2e-presubmit initially, or running CPMS tests in a separate periodic job. Has this been tested on a live environment? Duration numbers would help.

2. Suggest splitting the binary download rewrite - The tools_get_openshift_release rewrite (commits bda356e-8548bec, merged from the use-oc-adm-release-extract-for-binaries branch) is independent of the CPMS test role. It changes a shared role used by all job definitions and introduces a hard dependency on rhoso_kubeconfig for pull secret extraction - the old code just did an HTTP GET with no cluster access needed. Splitting into a separate PR would let the CPMS test merge independently and give the shared role change its own review cycle.

3. cpms_replace_attrs wiring - The procedure is added but not wired into any day2ops_procedures list (it was removed from 4.17_ovnkubernetes_ipi.yaml per previous review). The cpms_replacements variable definition was also removed in that commit. Worth adding defaults and wiring it in osp_verification.yaml if it's intended to run, or noting in the description that it'll be wired separately.

Comment thread collection/stages/roles/cpms_test/tasks/run_cpms_test.yml Outdated
Comment thread collection/stages/roles/cpms_test/tasks/run_cpms_test.yml Outdated
@tusharjadhav3302 tusharjadhav3302 force-pushed the support_cpms_test_role_tj branch from c18853c to c867cb4 Compare June 29, 2026 10:47
@tusharjadhav3302

tusharjadhav3302 commented Jun 29, 2026

Copy link
Copy Markdown
Contributor Author

Three concerns before merging:

1. Timing budget - The osp_verification Zuul job currently runs 7-8.6h against an 8h timeout (recent SUCCESS builds). The CPMS e2e-periodic target has a 5h ginkgo timeout and rolls all 3 masters. Adding both presubmit + periodic would push the total well past the timeout. Consider running only e2e-presubmit initially, or running CPMS tests in a separate periodic job. Has this been tested on a live environment? Duration numbers would help.

2. Suggest splitting the binary download rewrite - The tools_get_openshift_release rewrite (commits bda356e-8548bec, merged from the use-oc-adm-release-extract-for-binaries branch) is independent of the CPMS test role. It changes a shared role used by all job definitions and introduces a hard dependency on rhoso_kubeconfig for pull secret extraction - the old code just did an HTTP GET with no cluster access needed. Splitting into a separate PR would let the CPMS test merge independently and give the shared role change its own review cycle.

3. cpms_replace_attrs wiring - The procedure is added but not wired into any day2ops_procedures list (it was removed from 4.17_ovnkubernetes_ipi.yaml per previous review). The cpms_replacements variable definition was also removed in that commit. Worth adding defaults and wiring it in osp_verification.yaml if it's intended to run, or noting in the description that it'll be wired separately.

@imatza-rh
For the timing budget: I'm thinking of defaulting cpms_test_suites to ["e2e-presubmit"] only in defaults/main.yml, which rolls 1 master (~30 min). Since the role defaults are what the Zuul job picks up (no variable overrides are passed from ci-framework), this ensures only presubmit runs in the periodic job. This should fit within the 8h budget.

For e2e-periodic: it's available in the code but won't run unless someone explicitly adds it to cpms_test_suites in a job definition. We can enable it in a dedicated periodic job in a follow-up once we have confirmed timing numbers.

For cpms_replace_attrs: both roll all 3 masters (~1-2h) and has no dependency on presubmit — just needs a settled cluster (Progressing=False). Should I keep the procedure file in this PR (as code-only, unwired, with a note that it'll be enabled in a follow-up)? Or would you prefer I remove it entirely and submit it as a separate PR when we're ready to wire it with a dedicated job? Let me know which is cleaner for review.

For the binary download rewrite: use-oc-adm-release-extract-for-binaries has already been merged to main via PR #15. I've rebased this branch on top of current main and force-pushed — those commits are no longer part of this PR's diff.

So this PR would ship:

  • e2e-presubmit wired into osp_verification.yaml via role default (runs in the existing Zuul periodic job)
  • e2e-periodic in the code but not enabled by default
  • cpms_replace_attrs — pending your preference above

Does this approach work?

@imatza-rh

Copy link
Copy Markdown
Contributor

Presubmit-only default sounds right. Worth noting main.yml still hardcodes both targets - might want to add the cpms_test_suites gating before merging.

Keeping cpms_replace_attrs unwired is fine. Probably better without defaults for cpms_replacements since those create real OpenStack resources.

Binary rewrite confirmed out of this diff.

@tusharjadhav3302 tusharjadhav3302 changed the title Support CPMS test role in shiftstack-qa automation Add CPMS e2e-presubmit test stage role for shiftstack-qa Jun 29, 2026
@tusharjadhav3302 tusharjadhav3302 force-pushed the support_cpms_test_role_tj branch from 546a490 to 43e4123 Compare June 29, 2026 14:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

2 participants