
PHOENIX-7799 Coalesce splits by region server to avoid hotspotting from concurrent mappers#2411

Open
rahulLiving wants to merge 50 commits into apache:master from rahulLiving:PHOENIX-7751_perf

Conversation

@rahulLiving (Contributor) commented Apr 16, 2026

https://issues.apache.org/jira/browse/PHOENIX-7799

PhoenixSyncTableTool creates one MapReduce InputSplit per HBase region, causing each split to spawn its own mapper task. When multiple regions reside on the same RegionServer, concurrent mappers hit the same server simultaneously, leading to hotspotting and resource contention (noisy neighbor problem). This degrades performance and can cause timeouts.

Implement locality-aware split coalescing that groups all InputSplits from the same RegionServer into a single coalesced split. This ensures only one mapper processes regions from each server sequentially, eliminating concurrent requests and hotspotting. The feature will be controlled by the configuration property phoenix.sync.table.split.coalescing (default: false). For a table with 100 regions across 5 RegionServers, this reduces mapper count from 100 to 5, eliminating server-side contention while maintaining data locality.
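As a rough sketch of the grouping step described above (the class and method names here are illustrative, not the actual Phoenix implementation):

```java
// Hypothetical sketch of locality-aware split coalescing: group per-region
// splits by their hosting RegionServer so each server's regions are handled
// by a single mapper. RegionSplit and coalesceByServer are illustrative
// names, not Phoenix APIs.
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SplitCoalescer {
    // A per-region split, reduced to the fields that matter for coalescing.
    public static final class RegionSplit {
        final String startKey;
        final String regionServer; // location reported by the region locator
        public RegionSplit(String startKey, String regionServer) {
            this.startKey = startKey;
            this.regionServer = regionServer;
        }
    }

    // Group splits by RegionServer: one coalesced split (a list of regions)
    // per server, which a single mapper then walks sequentially.
    public static Map<String, List<RegionSplit>> coalesceByServer(List<RegionSplit> splits) {
        Map<String, List<RegionSplit>> byServer = new LinkedHashMap<>();
        for (RegionSplit s : splits) {
            byServer.computeIfAbsent(s.regionServer, k -> new ArrayList<>()).add(s);
        }
        return byServer;
    }

    public static void main(String[] args) {
        List<RegionSplit> splits = new ArrayList<>();
        // 6 regions spread across 2 servers -> only 2 coalesced splits.
        for (int i = 0; i < 6; i++) {
            splits.add(new RegionSplit("k" + i, "rs" + (i % 2)));
        }
        Map<String, List<RegionSplit>> coalesced = coalesceByServer(splits);
        System.out.println(coalesced.size() + " mappers instead of " + splits.size());
    }
}
```

Under this grouping, the mapper count equals the number of distinct RegionServers, matching the 100-regions-to-5-mappers reduction described above.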

@haridsv (Contributor) left a comment

Currently, the mapper doesn't report meaningful progress during execution; it effectively shows 0% until completion and then jumps to 100%, doesn't it? With coalesced splits the issue gets somewhat worse, since each mapper runs for much longer. Before, there was at least some visibility into progress based on the percentage of mappers completed. It would be helpful to report progress as each range is completed. One idea is to make PhoenixNoOpSingleRecordReader more sophisticated: return one dummy record for each range and have getProgress return a ratio based on how many ranges have completed. You would of course rename it so it no longer implies "SingleRecord".
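The suggestion above could be sketched roughly as follows; the class name PhoenixMultiRangeRecordReader and its shape are illustrative stand-ins (a real implementation would extend Hadoop's RecordReader and be driven by the framework):

```java
// Hypothetical sketch of a reader that emits one dummy record per coalesced
// range and reports progress as the ratio of ranges completed. Names are
// illustrative, not actual Phoenix classes.
public class PhoenixMultiRangeRecordReader {
    private final int totalRanges;
    private int completedRanges = 0;

    public PhoenixMultiRangeRecordReader(int totalRanges) {
        this.totalRanges = totalRanges;
    }

    // Analogous to RecordReader.nextKeyValue(): returns true once per range,
    // advancing the completed count so the framework can poll progress
    // between ranges instead of seeing 0% until the end.
    public boolean nextKeyValue() {
        if (completedRanges >= totalRanges) {
            return false; // all ranges processed
        }
        completedRanges++; // one dummy record per finished range
        return true;
    }

    // Analogous to RecordReader.getProgress(): fraction of ranges completed.
    public float getProgress() {
        return totalRanges == 0 ? 1.0f : (float) completedRanges / totalRanges;
    }

    public static void main(String[] args) {
        PhoenixMultiRangeRecordReader reader = new PhoenixMultiRangeRecordReader(4);
        while (reader.nextKeyValue()) {
            System.out.println("progress: " + reader.getProgress());
        }
    }
}
```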

if (regionLocation == null) {
    throw new IOException("Could not determine region location for key: "
            + Bytes.toStringBinary(keyRange.getLowerRange()));
}
Contributor

I am thinking that this should never be null if all regions are fully online and the client cache is up to date. However (per what AI says), if the region the key maps to is currently offline due to a region-in-transition (RIT) event (say, in the middle of a split), then the return value will be null. Shouldn't we retry to ride over such RITs, just as hbase-client does on the data path, such as scans? Otherwise, there is a high chance of hitting a RIT and treating it as an error, correct?
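A bounded retry loop along these lines could ride over transient RITs before giving up; the Locator interface, attempt count, and backoff here are illustrative assumptions, not the hbase-client API:

```java
// Hypothetical sketch: retry a null region location a few times with backoff
// to ride over a region-in-transition, and only fail (as the original check
// does) once the retries are exhausted.
import java.io.IOException;

public class LocationRetry {
    // Minimal stand-in for a locator that may transiently return null during a RIT.
    public interface Locator {
        String locate(String key);
    }

    public static String locateWithRetry(Locator locator, String key,
                                         int maxAttempts, long backoffMillis) throws IOException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String location = locator.locate(key);
            if (location != null) {
                return location; // region is online and located
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMillis * attempt); // linear backoff between retries
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IOException("Interrupted while waiting for region location", e);
                }
            }
        }
        // Retries exhausted: surface the error, mirroring the original check.
        throw new IOException("Could not determine region location for key: " + key);
    }
}
```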

