[Cosmos] Replace per-client schedulers with shared CosmosSchedulers to fix thread scaling#49062
Open
xinlian12 wants to merge 2 commits into Azure:main from xinlian12/fix/shared-schedulers-thread-scaling
Conversation
Force-pushed from db98eb9 to c42cfe8
xinlian12 (Member, Author): @sdkReviewAgent

Copilot (Contributor):
Pull request overview
This PR addresses a thread-scaling issue in the Cosmos Java SDK where per-client Schedulers.newSingle() usage caused background thread counts to grow linearly with the number of client instances. It introduces shared schedulers in CosmosSchedulers and updates background work to run on those shared schedulers instead of allocating dedicated per-client threads.
Changes:
- Added shared bounded-elastic schedulers to `CosmosSchedulers` for Global Endpoint Manager refresh and per-partition availability checks.
- Updated `GlobalEndpointManager` background refresh to use the shared scheduler and removed per-instance scheduler disposal.
- Updated `GlobalPartitionEndpointManagerForPerPartitionCircuitBreaker` to use the shared scheduler and removed per-instance scheduler disposal.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/CosmosSchedulers.java | Adds shared bounded-elastic schedulers for endpoint refresh and partition availability checks. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/GlobalEndpointManager.java | Switches background location refresh work from a per-instance single scheduler to the shared bounded-elastic scheduler. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/perPartitionCircuitBreaker/GlobalPartitionEndpointManagerForPerPartitionCircuitBreaker.java | Switches staleness check work from a per-instance single scheduler to the shared bounded-elastic scheduler. |
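Per the commit message, the staleness-check change also guards against duplicate background tasks when `init()` is called concurrently, using `compareAndSet` on an `isPartitionRecoveryTaskRunning` flag. A minimal stand-alone sketch of that guard (class and field names here are illustrative, not the SDK's actual code):

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class RecoveryTaskGuard {

    private final AtomicBoolean isPartitionRecoveryTaskRunning = new AtomicBoolean(false);

    // Visible for the demo: how many background tasks were actually started.
    final AtomicInteger tasksStarted = new AtomicInteger();

    // Only the caller that flips the flag false -> true schedules the background
    // staleness check; every concurrent init() sees the CAS fail and returns.
    public void init() {
        if (isPartitionRecoveryTaskRunning.compareAndSet(false, true)) {
            tasksStarted.incrementAndGet(); // stand-in for scheduling the recovery task
        }
    }

    public static void main(String[] args) throws InterruptedException {
        RecoveryTaskGuard guard = new RecoveryTaskGuard();
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(guard::init);
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        System.out.println("tasks started: " + guard.tasksStarted.get()); // prints 1
    }
}
```

The CAS makes the check-then-schedule step a single atomic operation, so no lock is needed on the init path.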
xinlian12 (Member, Author): ✅ Review complete (47:35) No new comments — existing review coverage is sufficient. Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage
…ead scaling

Thread count for 'global-ep-mgr' and 'partition-availability-staleness-check' threads was scaling linearly with tenant/client count because both GlobalEndpointManager and GlobalPartitionEndpointManagerForPerPartitionCircuitBreaker created per-instance Schedulers.newSingle() schedulers.

Changes:
- Add GLOBAL_ENDPOINT_MANAGER_BOUNDED_ELASTIC and PARTITION_AVAILABILITY_CHECK_BOUNDED_ELASTIC shared schedulers to CosmosSchedulers
- GlobalEndpointManager: Replace per-instance scheduler with shared scheduler, track background refresh Disposable via AtomicReference for immediate cleanup on close(). Use getAndSet() to atomically dispose old subscriptions on reschedule.
- GlobalPartitionEndpointManagerForPerPartitionCircuitBreaker: Replace per-instance scheduler with shared scheduler, track recovery Disposable via AtomicReference for immediate cleanup on close(). Use compareAndSet on isPartitionRecoveryTaskRunning to prevent duplicate background tasks under concurrent init() calls.

Shared BoundedElastic schedulers reuse threads with 60s TTL, preventing thread count from growing with client count while still supporting concurrent background tasks from multiple clients.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
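The Disposable-tracking pattern the commit message describes can be sketched with JDK types alone; here `Future` stands in for Reactor's `Disposable` and a shared `ScheduledExecutorService` for the shared bounded-elastic scheduler. Class and field names are illustrative, not the SDK's actual code:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

public class RefreshSubscriptionTracker {

    // Stand-in for the shared bounded-elastic scheduler: one pool serving
    // every manager instance, never disposed by an individual close().
    static final ScheduledExecutorService SHARED =
        Executors.newScheduledThreadPool(2, r -> {
            Thread t = new Thread(r, "shared-refresh-scheduler");
            t.setDaemon(true);
            return t;
        });

    private final AtomicReference<Future<?>> refreshTask = new AtomicReference<>();
    private final AtomicBoolean isClosed = new AtomicBoolean(false);

    // Reschedules the background refresh. getAndSet atomically swaps in the new
    // task and hands back the previous one so it can be cancelled, which keeps
    // concurrent reschedules from leaking subscriptions.
    public Future<?> startRefreshLocationTimerAsync(Runnable refresh) {
        if (isClosed.get()) {
            return null; // post-close guard
        }
        Future<?> next = SHARED.schedule(refresh, 5, TimeUnit.SECONDS);
        Future<?> previous = refreshTask.getAndSet(next);
        if (previous != null) {
            previous.cancel(false);
        }
        return next;
    }

    // close() cancels the tracked task; the shared scheduler itself stays alive
    // for other client instances.
    public void close() {
        isClosed.set(true);
        Future<?> current = refreshTask.getAndSet(null);
        if (current != null) {
            current.cancel(false);
        }
    }

    public boolean hasPendingTask() {
        Future<?> f = refreshTask.get();
        return f != null && !f.isCancelled() && !f.isDone();
    }

    public static void main(String[] args) {
        RefreshSubscriptionTracker tracker = new RefreshSubscriptionTracker();
        Future<?> first = tracker.startRefreshLocationTimerAsync(() -> { });
        tracker.startRefreshLocationTimerAsync(() -> { });
        System.out.println("first cancelled after reschedule: " + first.isCancelled()); // prints true
        tracker.close();
        System.out.println("pending after close: " + tracker.hasPendingTask()); // prints false
    }
}
```

Cancelling the tracked subscription rather than disposing a scheduler is what makes the shared pool safe: each client cleans up only its own work.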
Force-pushed from c42cfe8 to 02b230e
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
xinlian12 (Member, Author): /azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).
Problem
PR testing revealed that `global-ep-mgr` and `partition-availability-staleness-check` thread counts increase linearly with tenant/client count because both `GlobalEndpointManager` and `GlobalPartitionEndpointManagerForPerPartitionCircuitBreaker` create per-instance `Schedulers.newSingle()` schedulers. With N clients -> N dedicated threads per component -> 2N extra threads just for background location refresh and circuit breaker staleness checks.
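The 2N growth can be illustrated with a self-contained JDK sketch, where `ScheduledExecutorService` stands in for Reactor's `Schedulers.newSingle()`; the class and counter names are hypothetical, not SDK code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class PerClientThreadScaling {

    // Counts every background thread created across all simulated clients.
    static final AtomicInteger THREADS_CREATED = new AtomicInteger();

    static ThreadFactory countingDaemonFactory(String name) {
        return r -> {
            Thread t = new Thread(r, name + "-" + THREADS_CREATED.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
    }

    // Each "client" allocates two dedicated single-thread schedulers, mirroring
    // the per-instance Schedulers.newSingle() calls made by the endpoint manager
    // and the circuit breaker before this PR.
    static class Client {
        final ScheduledExecutorService endpointRefresh =
            Executors.newSingleThreadScheduledExecutor(countingDaemonFactory("global-ep-mgr"));
        final ScheduledExecutorService stalenessCheck =
            Executors.newSingleThreadScheduledExecutor(countingDaemonFactory("partition-availability-staleness-check"));

        void start() {
            // The first submitted task forces each executor to spawn its worker thread.
            endpointRefresh.submit(() -> { });
            stalenessCheck.submit(() -> { });
        }
    }

    public static int threadsFor(int clientCount) {
        THREADS_CREATED.set(0);
        List<Client> clients = new ArrayList<>();
        for (int i = 0; i < clientCount; i++) {
            Client c = new Client();
            c.start();
            clients.add(c);
        }
        for (Client c : clients) {
            c.endpointRefresh.shutdownNow();
            c.stalenessCheck.shutdownNow();
        }
        return THREADS_CREATED.get();
    }

    public static void main(String[] args) {
        // Thread count grows linearly: 2 dedicated threads per client.
        System.out.println("threads for 4 clients: " + threadsFor(4)); // prints 8
    }
}
```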
Solution
Replace per-instance `Schedulers.newSingle()` with shared static `BoundedElastic` schedulers in `CosmosSchedulers`, following the existing pattern used for `COSMOS_PARALLEL`, `TRANSPORT_RESPONSE_BOUNDED_ELASTIC`, etc.

Changes

CosmosSchedulers.java
- Add `GLOBAL_ENDPOINT_MANAGER_BOUNDED_ELASTIC` shared scheduler
- Add `PARTITION_AVAILABILITY_CHECK_BOUNDED_ELASTIC` shared scheduler

GlobalEndpointManager.java
- Replace `Schedulers.newSingle(CosmosDaemonThreadFactory)` with `CosmosSchedulers.GLOBAL_ENDPOINT_MANAGER_BOUNDED_ELASTIC`
- Track the background refresh `Disposable` via `AtomicReference`, using `getAndSet()` to atomically clean up old subscriptions on concurrent calls
- `close()` cancels the tracked subscription instead of disposing the scheduler

GlobalPartitionEndpointManagerForPerPartitionCircuitBreaker.java
- Replace `Schedulers.newSingle("partition-availability-staleness-check")` with `CosmosSchedulers.PARTITION_AVAILABILITY_CHECK_BOUNDED_ELASTIC`
- Track the recovery `Disposable` via `AtomicReference` for consistent cleanup on `close()`

Design Decisions
- `close()` cancels the tracked subscription, so no stale work survives even if `startRefreshLocationTimerAsync()` is called concurrently
- `isClosed` guards in both classes provide additional protection against post-close work

Benchmark Results: Thread Scaling Fix Validation
Config: Gateway mode, ReadThroughput, concurrency=64, 10min per run, accounts lx1-lx28 (cycling modulo 28), host-pinned
Branches: `upstream/main` vs `xinlian12/fix/shared-schedulers-thread-scaling`

1. Throughput: main vs fix (H1 ReadThroughput, steady-state)
H2:
2. Thread Scaling (PEAK, H1 ReadThroughput)
3. Thread Pool Breakdown (PEAK, H1 ReadThroughput)
4. Key Findings
- `global-ep-mgr` eliminated: 0 per-client threads across all tenant counts (was 1:1 on main)
- `partition-avail` capped at ~160: shared BoundedElastic pool reuses threads with 60s TTL (was 1:1 on main)
- `global-endpoint-manager-bounded-elastic` capped at ~160: replacement shared pool
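The ~160 cap is consistent with Reactor's boundedElastic defaults: idle threads expire after a 60s TTL, and the pool is capped at 10 times the available processors (160 on a 16-core host). The capped-pool behavior can be approximated with a JDK `ThreadPoolExecutor` as a rough analogy; the cap of 8 and task count of 100 below are illustrative numbers, not the SDK's configuration:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedSharedPool {

    // Bounded max pool size plays the role of boundedElastic's thread cap;
    // the 60s keep-alive (with core threads allowed to time out) mirrors its
    // idle-thread TTL, so the pool shrinks back when background work is quiet.
    public static ThreadPoolExecutor newBoundedPool(int cap) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            cap, cap, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    // Submits far more tasks than the cap and reports the largest pool size
    // reached: extra work queues instead of spawning extra threads.
    public static int peakThreads(int cap, int tasks) throws InterruptedException {
        ThreadPoolExecutor pool = newBoundedPool(cap);
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                try {
                    Thread.sleep(50);
                } catch (InterruptedException ignored) {
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return pool.getLargestPoolSize();
    }

    public static void main(String[] args) throws InterruptedException {
        // 100 tasks never push the pool past its cap of 8 threads.
        System.out.println("peak threads: " + peakThreads(8, 100)); // prints 8
    }
}
```

This is why thread counts plateau instead of growing with client count: all clients draw from one bounded pool rather than each holding a dedicated thread.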