PRODENG-3443: add upgrade smoke test (MCR stable-25.0/MKE 3.8.8 → stable-29.2/MKE 3.9.2) #629

Open
james-nesbitt wants to merge 1 commit into main from upgrade-smoke-test-plan

@james-nesbitt
Collaborator

Jira: https://mirantis.jira.com/browse/PRODENG-3443

Summary

Adds an automated upgrade smoke test that provisions a 6-node Linux cluster (RHEL8/Rocky8/Ubuntu22), installs MCR stable-25.0 / MKE 3.8.8, then upgrades in place to MCR stable-29.2 / MKE 3.9.2 using a second Apply() call on the same infrastructure.

This is the first run; it is expected to surface MCR 25→29 upgrade issues so they can be fixed iteratively.

Changes

  • test/smoke/upgrade_test.go: upgradeConfig struct, runUpgradeTest(), bumpVersions() YAML mutator, TestUpgradeLegacyToModern
  • Makefile: make smoke-upgrade (90m timeout)
  • .github/workflows/smoke-tests.yaml: smoke-upgrade CI job, gated by smoke-upgrade or smoke-test label

Design

bumpVersions() unmarshals the Terraform-generated launchpad_yaml output, updates spec.mcr.channel and spec.mke.version, and re-marshals — preserving host addresses, SANs, LB names, and install flags verbatim. Two sequential Apply() calls on the same infrastructure handle install then upgrade.
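For reviewers, the mutation step can be sketched roughly as below. This is not the code in this PR: setNested and bumpVersionsInPlace are hypothetical names, and the YAML unmarshal/re-marshal is left out so the sketch stays stdlib-only. It assumes the YAML library decodes the document into a generic map[string]any; the hosts and installFlags keys are illustrative sample data. Only the two version fields are overwritten, and everything else in the map is left untouched.

```go
package main

import "fmt"

// setNested walks doc along path and overwrites the final key with value.
// Every intermediate key must already exist and be a mapping; nothing is
// created, so a wrong path fails loudly instead of adding a stray key.
func setNested(doc map[string]any, value any, path ...string) error {
	if len(path) == 0 {
		return fmt.Errorf("empty path")
	}
	cur := doc
	for _, key := range path[:len(path)-1] {
		next, ok := cur[key].(map[string]any)
		if !ok {
			return fmt.Errorf("key %q is missing or not a mapping", key)
		}
		cur = next
	}
	cur[path[len(path)-1]] = value
	return nil
}

// bumpVersionsInPlace is a hypothetical stand-in for the mutation that
// bumpVersions() performs between unmarshal and re-marshal.
func bumpVersionsInPlace(doc map[string]any, mcrChannel, mkeVersion string) error {
	if err := setNested(doc, mcrChannel, "spec", "mcr", "channel"); err != nil {
		return err
	}
	return setNested(doc, mkeVersion, "spec", "mke", "version")
}

func main() {
	// Illustrative document shape; a real one comes from the Terraform output.
	doc := map[string]any{
		"spec": map[string]any{
			"hosts": []any{map[string]any{"role": "manager"}},
			"mcr":   map[string]any{"channel": "stable-25.0"},
			"mke": map[string]any{
				"version":      "3.8.8",
				"installFlags": []any{"--san=lb.example.test"},
			},
		},
	}
	if err := bumpVersionsInPlace(doc, "stable-29.2", "3.9.2"); err != nil {
		panic(err)
	}
	spec := doc["spec"].(map[string]any)
	fmt.Println(spec["mcr"].(map[string]any)["channel"]) // stable-29.2
	fmt.Println(spec["mke"].(map[string]any)["version"]) // 3.9.2
}
```

Refusing to create missing keys is a deliberate choice in this sketch: if the Terraform output ever changes shape, the test should fail at the mutation step rather than apply a config with the old versions still in place.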

Known concerns (MCR 25→29)

See PRODENG-3443 for the full list. Key risks: package repo transition, storage driver compatibility, kernel version floor (MCR 29 requires ≥5.4), and MKE 3.8 compatibility on MCR 29 as an intermediate state.
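To make the kernel floor concrete, here is a minimal sketch of a check against the 5.4 minimum stated above. kernelAtLeast is a hypothetical helper, not part of this PR, and the release strings in main are illustrative `uname -r` style inputs rather than values captured from the test hosts.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// kernelAtLeast reports whether a `uname -r` style release string is at or
// above the given major.minor version. Only the first two numeric fields are
// compared; patch level and distro suffixes are ignored.
func kernelAtLeast(release string, major, minor int) (bool, error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected kernel release %q", release)
	}
	maj, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("bad major version in %q: %w", release, err)
	}
	// The minor field may carry a suffix such as "8-rc1"; keep leading digits.
	minorField := parts[1]
	if i := strings.IndexFunc(minorField, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		minorField = minorField[:i]
	}
	mn, err := strconv.Atoi(minorField)
	if err != nil {
		return false, fmt.Errorf("bad minor version in %q: %w", release, err)
	}
	return maj > major || (maj == major && mn >= minor), nil
}

func main() {
	// Illustrative inputs only.
	for _, rel := range []string{"4.18.0-553.el8_10.x86_64", "5.15.0-91-generic"} {
		ok, err := kernelAtLeast(rel, 5, 4)
		fmt.Println(rel, ok, err)
	}
}
```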

TestUpgradeLegacyToModern provisions a 6-node legacy Linux cluster
(RHEL8/Rocky8/Ubuntu22, MCR stable-25.0, MKE 3.8.8), then upgrades
it in place to MCR stable-29.2 / MKE 3.9.2 using a second Apply()
call on the same Terraform infrastructure.

Design:
- runUpgradeTest() follows the same infra pattern as runSmokeTest()
  (defer terraform.Destroy first, tagged resources, temp SSH dir).
- bumpVersions() unmarshals the Terraform-generated launchpad YAML,
  updates spec.mcr.channel and spec.mke.version, and re-marshals —
  preserving host addresses, SANs, LB names, and install flags verbatim.
- Two sequential Apply() calls: base install then upgrade. Launchpad's
  UpgradeMCR and UpgradeMKE phases handle the delta automatically.
- Reset() is best-effort (same rationale as the other smoke tests).

Infra:
- 90m go test timeout (install ~20min + upgrade ~20min + reset + buffer).
- smoke-upgrade CI job gated by smoke-upgrade or smoke-test PR label.
- Makefile target: make smoke-upgrade
@james-nesbitt added the smoke-upgrade label (Run smoke-upgrade CI job) on May 12, 2026