lib: utils/fdt: Cache CPU intc phandle->hartid lookups #9
Merged
CLINT/PLIC/PLMT/PLICSW probing walks `interrupts-extended` and, for each entry, resolves the CPU intc phandle to a hartid via `fdt_node_offset_by_phandle()` + `fdt_parent_offset()` + `fdt_parse_hart_id()`. Both libfdt helpers (`fdt_node_offset_by_phandle()` and `fdt_parent_offset()`) are O(FDT_size) linear scans of the structure block, so on an N-hart system each driver pays O(N * FDT_size), and the same handful of intc phandles is re-resolved across multiple drivers (PLIC + MSWI + MTIMER + ...).

Build a small cache by walking `/cpus` once: for each cpu node, record its child intc node's phandle paired with the parsed hartid. Subsequent lookups become an O(harts) linear scan over the cache instead of two full FDT walks per entry. The cache is keyed on the FDT pointer, so a new FDT invalidates it implicitly.

Also move the hwirq filter ahead of the phandle resolution at each callsite, so non-matching `interrupts-extended` entries skip the lookup entirely.

Measured on an 8-hart system (release build, mtime @ 25 MHz, 1M ticks = 40 ms):

- `sbi_irqchip_init`: 15.08M -> 3.64M (~4.15x; 603 -> 146 ms)
- `sbi_ipi_init`: 14.75M -> 2.82M (~5.23x; 590 -> 113 ms)
- `sbi_timer_init`: 14.86M -> 2.94M (~5.06x; 594 -> 118 ms)
- combined: 44.69M -> 9.39M (~4.76x; 1788 -> 376 ms)

Signed-off-by: Chen Pei <cp0613@linux.alibaba.com>
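The millisecond figures follow from the 25 MHz mtime rate. One tick is 40 ns, so 1M ticks is 40 ms. For example, `sbi_irqchip_init` before the change:

```math
t_{\text{ms}} = \frac{\text{ticks}}{f_{\text{mtime}}} \times 10^{3}
             = \frac{15.08 \times 10^{6}}{25 \times 10^{6}\ \text{Hz}} \times 10^{3}
             = 603.2 \approx 603\ \text{ms}
```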