Back off on DescribeWorkerDeployment ResourceExhausted error#291

Merged
carlydf merged 3 commits into main from backoff-on-resource-exhausted2
Apr 24, 2026

Conversation

Collaborator

@carlydf carlydf commented Apr 23, 2026

What was changed and why

Problem

Namespaces with many TemporalWorkerDeployment objects can trigger Temporal's per-namespace DescribeWorkerDeployment rate limit (frontend.globalNamespaceWorkerDeploymentReadRPS, default 50 RPS). When this happened, the reconciler returned the error immediately, so the workqueue requeued the object with exponential backoff starting at ~5ms. In effect this is a tight retry loop that makes the rate-limit problem worse.

The error comes back as *serviceerror.ResourceExhausted (not a standard gRPC codes.ResourceExhausted status), so it must be detected with errors.As rather than grpcstatus.FromError.

Changes

  • internal/controller/worker_controller.go: detect *serviceerror.ResourceExhausted from GetWorkerDeploymentState and return RequeueAfter: 30s instead of an immediate error requeue. Sets ConditionProgressing=False with ReasonTemporalStateFetchFailed and a "Rate limited" message so the condition is visible to users.
  • internal/tests/internal/rate_limit_integration_test.go: new integration test that creates 10 TWDs against a 1 RPS limit, confirming the error surfaces with the expected condition reason and message.
  • go.mod / go.sum / go.work: bump Temporal server dependency to v1.31.0-154.2, which is the first version that enforces globalNamespaceWorkerDeploymentReadRPS.

Checklist

  1. Closes [Bug] Worker Deployment Read API rate limit still exceeded in v1.4.0 #278

  2. How was this tested:

  • KUBEBUILDER_ASSETS=.../bin/k8s/1.27.1-darwin-arm64 go test -tags test_dep ./internal/tests/internal -run "TestIntegration/rate-limit" -timeout 120s -v passes
  • Same test fails when the errors.As block is removed (reverts to immediate retry with generic message)
  • Full integration suite passes: go test -tags test_dep ./internal/tests/internal -run TestIntegration -timeout 600s
  3. Any docs updates needed?

@carlydf carlydf marked this pull request as ready for review April 23, 2026 03:49
@carlydf carlydf requested review from a team and jlegrone as code owners April 23, 2026 03:49
@carlydf carlydf marked this pull request as draft April 23, 2026 04:32
@carlydf carlydf changed the title Back off on DescribeWorkerDeployment ResourceExhausted error [BLOCKED on #290 but ready for review] Back off on DescribeWorkerDeployment ResourceExhausted error Apr 23, 2026
Collaborator Author

carlydf commented Apr 23, 2026

integration test will pass once I pull in #290

@carlydf carlydf marked this pull request as ready for review April 23, 2026 04:33
Contributor

@jaypipes jaypipes left a comment

Very good.

Comment thread internal/tests/internal/rate_limit_integration_test.go
@carlydf carlydf changed the title [BLOCKED on #290 but ready for review] Back off on DescribeWorkerDeployment ResourceExhausted error Back off on DescribeWorkerDeployment ResourceExhausted error Apr 24, 2026
@carlydf carlydf enabled auto-merge (squash) April 24, 2026 19:05
@carlydf carlydf merged commit c09c580 into main Apr 24, 2026
17 checks passed
@carlydf carlydf deleted the backoff-on-resource-exhausted2 branch April 24, 2026 19:13
Development

Successfully merging this pull request may close these issues.

[Bug] Worker Deployment Read API rate limit still exceeded in v1.4.0

3 participants