Skip to content

Remove 1024 CPUs limit#5343

Open
kolyshkin wants to merge 2 commits into
opencontainers:mainfrom
kolyshkin:cpu-aff-dyn
Open

Remove 1024 CPUs limit#5343
kolyshkin wants to merge 2 commits into
opencontainers:mainfrom
kolyshkin:cpu-aff-dyn

Conversation

@kolyshkin

@kolyshkin kolyshkin commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Replace the fixed size unix.CPUMask with CPUMaskDynamic (see CL 735380).

Modify ToCPUSet to return a dynamic set.

This is backward compatible with reading older state.json: both the old *unix.CPUSet (fixed array) and the new CPUSetDynamic (slice) marshal to/from the same JSON array-of-numbers representation.

This was introduced in https://go-review.googlesource.com/c/sys/+/735380
to overcome the 1024 CPUs limit.

Let's use it and drop our own wrapper (note we're losing retry-on-eintr
but I don't think it's a problem).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
ToCPUSet was limited to unix.CPUSet (a fixed-size mask capped at 1024
CPUs). Switch CPUAffinity, LinuxMemoryPolicy.Nodes, ToCPUSet, and
SetMempolicy to unix.CPUSetDynamic, which can represent arbitrary CPU
and NUMA node IDs.

ToCPUSet now parses the input in a first pass to find the maximum value,
then allocates a mask large enough to hold it (out-of-range Set calls on
a dynamic mask are silently ignored). An arbitrary sanity cap is kept to
avoid huge allocations on bogus input.

This is backward compatible with reading older state.json: both the old
*unix.CPUSet (a fixed array) and the new CPUSetDynamic (a slice)
marshal to and from the same JSON array-of-numbers representation.

Update TestToCPUSet accordingly, adding coverage for values beyond the
old non-dynamic limit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@kolyshkin kolyshkin marked this pull request as ready for review June 24, 2026 21:56
@kolyshkin kolyshkin requested review from cyphar, lifubang and rata and removed request for cyphar June 24, 2026 21:56
@kolyshkin kolyshkin added this to the 1.6.0-rc.1 milestone Jun 24, 2026
@kolyshkin

Copy link
Copy Markdown
Contributor Author

Cc @askervin

return nil, fmt.Errorf("no members found in set %q", str)
}

s := unix.NewCPUSet(maxCPU + 1)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL! But I see that's something you contributed (golang/sys@690c91f)

I should probably check if we have similar assumptions elsewhere (assuming 1024 is the limit)

@thaJeztah thaJeztah left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

second commit is missing a DCO sign-off; only has

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Comment thread CHANGELOG.md
Comment on lines +26 to +27
- The `cpuAffinity` and NUMA `memoryPolicy` settings are no longer limited to
1024 CPUs/nodes, as runc now uses a dynamically-sized CPU mask. (#5343)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we mention the new (arbitrary) limit, or too high to reasonably be concerned about for users?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's too high, and the error message is clear so once someone will bump into it it's an easy fix.

AFAIK the kernel supports 8192 CPUs by default (unless specifically configured to handle more CPUs).

Comment thread internal/linux/linux.go

// SchedSetaffinity wraps sched_setaffinity syscall without unix.CPUSet size limitation.
func SchedSetaffinity(pid int, buf []byte) error {
err := retryOnEINTR(func() error {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you mentioned it;

note we're losing retry-on-eintr but I don't think it's a problem

Is this something we should still add in golang.org/x/sys ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By definition, x/sys/unix is a low-level thing so I guess they will not accept it.

I also think (but not sure) that sched_setaffinity is safe wrt EINTR, as it's not a blocking call (like a syscall doing I/O).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, in principle almost any syscall can EINTR? It seems like we could just replace the function we're wrapping here (i.e., just replace unix.Syscall with the new unix.SchedSetAffinityDynamic) but I don't really mind too much and we were already calling into unix before...

@kolyshkin kolyshkin requested a review from thaJeztah June 25, 2026 00:22
Comment thread CHANGELOG.md
@@ -6,6 +6,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to forward-port the 1.5 changelog, these entries are being mixed with stuff we already released.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#5345 is the changelog update PR.

Comment on lines +12 to +17
maxCPU := 0
for _, cpu := range cpus {
if cpu > maxCPU {
maxCPU = cpu
}
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Aren't we on a new enough Go version to be able to just use max here?

@rata rata left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left a nit, but I'm fine if it's not fixed as part of this PR :)

Comment on lines -326 to +327
if ret >= maxCPU {
return 0, fmt.Errorf("values larger than %d are not supported", maxCPU-1)
if ret > maxCPUSetCPU {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are losing the "=" here. I guess because we use the "+1" below as the maxCPUSetCPU is a valid CPU to set. But just asking to make sure I'm following correctly

Comment on lines +333 to +339
// First pass: parse all ranges and find the maximum CPU value, so the
// resulting mask can be allocated large enough to hold it.
type cpuRange struct{ start, end int }
var (
ranges []cpuRange
maxCPU int
)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd separate the parsing in another function from this one. Not very related to this PR, but if you don't mind doing it... Otherwise it's fine, we can tackle it in another PR

@thaJeztah

Copy link
Copy Markdown
Member

second commit is missing a DCO sign-off

Can you fix the DCO, Kir?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants