
[Debug] Use larger runner for (most) integration test suites #1032

Draft
cpuguy83 wants to merge 4 commits into project-dalec:main from cpuguy83:use_larger_runners

cpuguy83 (Collaborator) commented Apr 9, 2026

No description provided.

cpuguy83 force-pushed the use_larger_runners branch 8 times, most recently from 4ee4703 to 52d00ea (April 9, 2026 18:19)
cpuguy83 marked this pull request as ready for review April 9, 2026 18:20
Copilot AI review requested due to automatic review settings April 9, 2026 18:20
Comment thread .github/workflows/ci.yml
 - name: Setup source policy
   if: inputs.source_policy
   uses: ./.github/actions/setup-source-policy
-- name: Aggressive cleanup

cpuguy83 (Collaborator, Author):

The updated runners have larger disks, and this step alone takes 3 minutes to run, so it's just not worth it anymore.

Copilot AI (Contributor) left a comment:

Pull request overview

Adjusts the CI integration-test job to use a larger GitHub Actions runner for most suites and improves docker/containerd restart diagnostics during CI setup.

Changes:

  • Switch integration job runner selection to a conditional matrix-based runner (larger runner for most suites).
  • Add a Docker diagnostics step and tighten docker/containerd restart handling during OTEL tracing setup.
  • Update composite actions to stop/start Docker with clearer failure reporting.
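A Docker diagnostics step like the one mentioned in the overview can be very small. The following is a sketch, not the step from this PR; the step name and the exact commands are assumptions.

```yaml
# Hypothetical diagnostics step; name and commands are illustrative only.
- name: Docker info
  run: |
    # Print daemon configuration (storage driver, cgroup driver, etc.).
    docker info
    # Show the systemd unit state as well. `systemctl status` exits non-zero
    # for inactive units, and run steps default to `bash -e`, so "|| true"
    # keeps this purely informational.
    sudo systemctl status docker --no-pager || true
```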

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

Files reviewed:

  • .github/workflows/ci.yml: Conditional runner selection for integration suites; adds Docker info; changes docker/containerd lifecycle around tracing setup; removes aggressive disk cleanup.
  • .github/actions/enable-containerd/action.yml: Stops/starts Docker when enabling the containerd snapshotter and emits logs on failure.
  • .github/actions/dns-spoof-ubuntu-archive/action.yml: Stops/starts Docker after writing the daemon DNS config and emits logs on failure.
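Both composite actions listed above follow a stop, reconfigure, start pattern that dumps logs when dockerd fails to come back. A minimal sketch of that pattern for the containerd snapshotter case follows; it is not the PR's actual action. It assumes `jq` is available on the runner and that the feature flag should be merged into any existing `/etc/docker/daemon.json` rather than overwriting it.

```yaml
# Hypothetical action.yml sketch; not the PR's actual file.
name: enable-containerd (sketch)
description: Enable the containerd snapshotter in dockerd and restart it.
runs:
  using: composite
  steps:
    - name: Enable containerd snapshotter
      shell: bash
      run: |
        set -euo pipefail
        cfg=/etc/docker/daemon.json
        tmp="$(mktemp)"
        # Merge the feature flag into the existing config (or start from {}).
        { sudo cat "$cfg" 2>/dev/null || echo '{}'; } \
          | jq '.features["containerd-snapshotter"] = true' > "$tmp"

        # Stop the socket unit too, so an API call cannot socket-activate
        # dockerd while the config file is being replaced.
        sudo systemctl stop docker.socket docker
        sudo install -m 0644 "$tmp" "$cfg"

        if ! sudo systemctl start docker; then
          # Report why dockerd did not come up instead of failing silently.
          sudo systemctl status docker --no-pager || true
          sudo journalctl -u docker --no-pager -n 200 || true
          exit 1
        fi
```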

Comment thread .github/workflows/ci.yml

 integration:
-  runs-on: ubuntu-22.04
+  runs-on: ${{ matrix.suite == 'other' && 'ubuntu-22.04' || 'ubuntu-latest-4-cores' }}

cpuguy83 (Collaborator, Author):

The runners we have access to are all named ubuntu-latest-<n>-cores.
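The `cond && a || b` expression in `runs-on` acts like a ternary only because `'ubuntu-22.04'` is a non-empty, truthy string. An alternative that avoids that subtlety and scales to more than two runner sizes is to carry the runner label in the matrix itself. This is a sketch only: every suite name other than `other` is a hypothetical placeholder, and the steps are illustrative.

```yaml
jobs:
  integration:
    strategy:
      matrix:
        include:
          # The "other" suite stays on the standard runner, as in the PR.
          - suite: other
            runner: ubuntu-22.04
          # Hypothetical suite names; the real matrix lists the project's suites.
          - suite: suite-a
            runner: ubuntu-latest-4-cores
          - suite: suite-b
            runner: ubuntu-latest-4-cores
    # The matrix context is available in runs-on, so no expression logic is needed.
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v4
      - run: echo "running suite ${{ matrix.suite }} on ${{ matrix.runner }}"
```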

cpuguy83 force-pushed the use_larger_runners branch from 52d00ea to 86fbcc1 (April 9, 2026 20:47)
cpuguy83 marked this pull request as draft April 9, 2026 20:48
cpuguy83 force-pushed the use_larger_runners branch from 5288397 to 8409f8e (April 10, 2026 20:28)
cpuguy83 self-assigned this Apr 13, 2026
cpuguy83 changed the title "Use larger runner for (most) integration test suites" to "[Debug] Use larger runner for (most) integration test suites" Apr 13, 2026
cpuguy83 force-pushed the use_larger_runners branch from 8409f8e to 04038b1 (May 22, 2026 23:32)
cpuguy83 added 4 commits June 15, 2026 15:48
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
After switching to larger runners, CI had 3 jobs where dockerd would
not start, seemingly because we restart docker (for config updates)
so quickly that systemd refuses to restart it.

This change resets systemd's failure counter if docker fails to
restart, then tries again.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
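The retry described in this commit can be sketched as a single shell step. systemd rate-limits unit starts (`StartLimitIntervalSec=` / `StartLimitBurst=`); once a unit hits that limit, further starts are refused until `systemctl reset-failed` flushes the counter. This sketch assumes Docker runs as the `docker` systemd unit and is not the PR's exact code.

```yaml
# Hypothetical step; not the PR's exact implementation.
- name: Restart Docker
  shell: bash
  run: |
    set -euo pipefail
    if ! sudo systemctl restart docker; then
      # Restarting in quick succession can trip systemd's start rate limit,
      # after which it refuses to start the unit. reset-failed clears the
      # failed state and the rate-limit counter so one more attempt can run.
      sudo systemctl reset-failed docker
      # With `set -e`, a second failure fails the step.
      sudo systemctl restart docker
    fi
```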
Add timeout signaling from test2json2gha to GITHUB_OUTPUT so subsequent
CI steps can detect when tests timed out. On timeout, the dump logs step
now collects goroutine stacks, a binary heap profile, and the dockerd
binary from the runner for offline analysis with go tool pprof.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
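A timeout flag written to `GITHUB_OUTPUT` is read by later steps through the `steps.<id>.outputs` context. In the sketch below, the step id `test`, the output name `timeout`, the way test2json2gha is invoked, the dump paths, and the availability of dockerd's `/debug/pprof` endpoints on the API socket are all assumptions. Sending `SIGUSR1` to dockerd to get a goroutine stack dump is documented Docker behavior.

```yaml
# Hypothetical steps; names, paths, and the test2json2gha invocation are illustrative.
- name: Run integration tests
  id: test
  run: |
    # Assumed: test2json2gha reads `go test -json` on stdin and appends
    # `timeout=true` to "$GITHUB_OUTPUT" when the run times out.
    go test -json ./test/... | test2json2gha

- name: Dump logs
  if: always()
  run: |
    mkdir -p /tmp/dump
    sudo journalctl -u docker --no-pager > /tmp/dump/dockerd.log || true
    if [ "${{ steps.test.outputs.timeout }}" = "true" ]; then
      # SIGUSR1 makes dockerd write its goroutine stacks to a file under its
      # exec root (typically /var/run/docker) and log the file path.
      sudo kill -USR1 "$(pidof dockerd)"
      sleep 2
      sudo sh -c 'cp /var/run/docker/goroutine-stacks-*.log /tmp/dump/' || true
      # Assumption: dockerd serves pprof handlers on its API socket.
      sudo curl -sS --unix-socket /var/run/docker.sock \
        -o /tmp/dump/heap.pprof http://localhost/debug/pprof/heap || true
      # Keep the exact binary so the profile can be symbolized offline.
      cp "$(command -v dockerd)" /tmp/dump/
    fi

- uses: actions/upload-artifact@v4
  if: always()
  with:
    name: dump-${{ matrix.suite }}
    path: /tmp/dump
```

After downloading the artifact, `go tool pprof dockerd heap.pprof` loads the heap profile against the saved binary.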
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
cpuguy83 force-pushed the use_larger_runners branch from 0b81a6b to 5e705ef (June 15, 2026 22:50)

2 participants