Refactor(a2a3): decouple profiling from runtime, own it in platform #714

Merged
ChaoWao merged 1 commit into hw-native-sys:main from ChaoZheng109:a2a3/aicore-profiling-state on May 8, 2026

@ChaoZheng109 ChaoZheng109 commented May 7, 2026

Summary

Profiling becomes a platform-layer concern instead of part of the runtime/AICore handshake contract. enable_profiling_flag and l2_perf_aicore_ring_addr move out of the runtime's Handshake struct and into KernelArgs plus a platform-owned per-core state surface (set_aicore_profiling_flag / set_aicore_l2_perf_ring, with matching getters). Mirrors the AICPU-side set_l2_swimlane_enabled / set_pmu_enabled pattern.

After this change:

  • The runtime/AICore handshake carries only synchronization + identity fields. Adding a new profiling sub-feature no longer touches Handshake or aicore_execute's signature.
  • Profiling lifetime is fully owned by the platform: the AICore kernel entry indexes its per-core L2PerfAicoreRing* from KernelArgs::aicore_ring_addr and publishes it via the setters; AICore code reads via the getters; the runtime never sees the storage.
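The flow in the two bullets above can be sketched as follows. This is a minimal illustration, not the code in `aicore_profiling_state.h`: the struct layouts, the getter names, the `uint32_t` flag type, and the helpers `publish_profiling_state` and `record_timestamp` are all assumptions. `thread_local` stands in for the per-core backing store, loosely modeling the sim build; the onboard `[[block_local]]` storage is not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; the real types live in the platform headers.
struct L2PerfAicoreRing {
    uint64_t last_timestamp = 0;  // placeholder field, for illustration only
};

struct KernelArgs {
    uint64_t aicore_ring_addr = 0;       // address of a uint64_t[num_aicore] table
    uint32_t enable_profiling_flag = 0;  // the real bit layout is not modeled here
};

// Platform-owned per-core state. One thread per simulated core is assumed,
// so thread_local gives each core its own slot.
namespace {
thread_local uint32_t g_profiling_flag = 0;
thread_local L2PerfAicoreRing* g_l2_perf_ring = nullptr;
}  // namespace

void set_aicore_profiling_flag(uint32_t flag) { g_profiling_flag = flag; }
uint32_t get_aicore_profiling_flag() { return g_profiling_flag; }
void set_aicore_l2_perf_ring(L2PerfAicoreRing* ring) { g_l2_perf_ring = ring; }
L2PerfAicoreRing* get_aicore_l2_perf_ring() { return g_l2_perf_ring; }

// Hypothetical kernel-entry step: look up this core's ring in the table that
// KernelArgs points at, then publish flag and ring through the setters.
void publish_profiling_state(const KernelArgs& args, uint32_t core_id) {
    set_aicore_profiling_flag(args.enable_profiling_flag);
    L2PerfAicoreRing* ring = nullptr;
    if (args.enable_profiling_flag != 0 && args.aicore_ring_addr != 0) {
        const uint64_t* table = reinterpret_cast<const uint64_t*>(
            static_cast<uintptr_t>(args.aicore_ring_addr));
        ring = reinterpret_cast<L2PerfAicoreRing*>(
            static_cast<uintptr_t>(table[core_id]));
    }
    set_aicore_l2_perf_ring(ring);
}

// Hypothetical AICore-side consumer: it reads only through the getters and
// never sees KernelArgs or any runtime handshake structure.
bool record_timestamp(uint64_t ts) {
    if (get_aicore_profiling_flag() == 0) {
        return false;
    }
    L2PerfAicoreRing* ring = get_aicore_l2_perf_ring();
    assert(ring != nullptr && "profiling flag set without a published ring");
    ring->last_timestamp = ts;
    return true;
}
```

In this sketch the consumer depends only on the getter pair, which is the property the description relies on: adding another profiling sub-feature would add state behind the platform surface rather than a field in `Handshake`.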

Key changes

  • New header: src/a2a3/platform/include/aicore/aicore_profiling_state.h — set/get for the per-core profiling flag and L2Perf ring pointer
  • Onboard backing: [[block_local]] statics in onboard/aicore/kernel.cpp; setters/getters use weak linkage to dedup across the AIC + AIV compilation units linked into one AICore binary
  • Sim backing: pthread TLS in sim/aicore/kernel.cpp; the sim launch ABI gains enable_profiling_flag + aicore_ring_addr so the wrapper can populate slots before aicore_execute
  • KernelArgs gains aicore_ring_addr (device ptr to a uint64_t[num_aicore] table of per-core L2PerfAicoreRing*) + enable_profiling_flag; the bit-layout doc moves here from runtime.h
  • Host L2PerfCollector publishes the per-core ring address table that KernelArgs::aicore_ring_addr points at
  • Runtime Handshake shrinks in both host_build_graph/runtime/runtime.h and tensormap_and_ringbuffer/runtime/runtime.h: l2_perf_aicore_ring_addr and enable_profiling_flag removed
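On the host side, the per-core table that `KernelArgs::aicore_ring_addr` points at could look roughly like the sketch below. `RingTableSketch` is a hypothetical helper, not the real `L2PerfCollector`, and it uses host memory in place of device allocations, so the "device pointers" here are ordinary host addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-core staging ring.
struct L2PerfAicoreRing {
    uint64_t last_timestamp = 0;  // placeholder field, for illustration only
};

// Hypothetical host-side helper: owns one ring per AICore plus a
// uint64_t[num_aicore] table holding each ring's address.
class RingTableSketch {
public:
    explicit RingTableSketch(std::size_t num_aicore)
        : rings_(num_aicore), table_(num_aicore) {
        for (std::size_t i = 0; i < num_aicore; ++i) {
            table_[i] = static_cast<uint64_t>(
                reinterpret_cast<uintptr_t>(&rings_[i]));
        }
    }

    // Copying would leave table_ pointing at the source object's rings.
    RingTableSketch(const RingTableSketch&) = delete;
    RingTableSketch& operator=(const RingTableSketch&) = delete;

    // The value that would be stored in KernelArgs::aicore_ring_addr.
    uint64_t table_addr() const {
        return static_cast<uint64_t>(
            reinterpret_cast<uintptr_t>(table_.data()));
    }

    // Ring address published for one core.
    uint64_t entry(std::size_t core_id) const {
        assert(core_id < table_.size());
        return table_[core_id];
    }

    std::size_t num_aicore() const { return table_.size(); }

private:
    std::vector<L2PerfAicoreRing> rings_;
    std::vector<uint64_t> table_;
};
```

Storing addresses as `uint64_t` rather than typed pointers keeps the table layout identical for host and device code regardless of each side's pointer width, which is presumably why the description specifies a `uint64_t[num_aicore]` table.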

Merge order

This PR is the third in a chain. Please merge in this order:

  1. #705: Refactor: unify PMU/L2Perf/TensorDump collectors on shared profiling framework
  2. #709: Refactor: decouple AICore L2Perf writes via stable per-core staging ring
  3. This PR (#714): Refactor(a2a3): decouple profiling from runtime, own it in platform

Currently this PR is opened against main. After #705 and #709 land, I will rebase this branch onto the new main so the diff reflects only the runtime → platform decoupling.

Testing

  • Simulation tests pass
  • Hardware tests pass

@ChaoZheng109 ChaoZheng109 force-pushed the a2a3/aicore-profiling-state branch from 4764678 to 13a6e50 May 7, 2026 01:47
@ChaoZheng109 ChaoZheng109 changed the title from "Refactor: route AICore profiling state through KernelArgs, drop from Handshake" to "Refactor(a2a3): decouple profiling from runtime, own it in platform" May 7, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements a unified host-side profiling framework and integrates it with L2 Swimlane, PMU, and Tensor Dump subsystems across multiple architectures. It introduces stable per-core staging rings to decouple AICore timing writes from AICPU buffer management and consolidates profiling enablement signals within KernelArgs. Feedback identifies a portability issue in the logging logic of the L2 and PMU collectors, where casting 64-bit addresses to unsigned long may lead to truncation on certain platforms; it is recommended to use the %p format specifier or PRIx64 macro instead.
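The portability point can be illustrated with a small sketch. `format_device_addr` is a hypothetical helper, not a function from the collectors; it shows the `PRIx64` form the review recommends. Casting to `unsigned long` and printing with `%lx` is lossy wherever `unsigned long` is 32 bits wide (for example, LLP64 targets such as 64-bit Windows).

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a 64-bit device address without depending on
// the width of unsigned long. PRIx64 expands to the correct length modifier
// for uint64_t on the target platform.
std::string format_device_addr(uint64_t addr) {
    char buf[32];  // "0x" + 16 hex digits + NUL fits comfortably
    const int n = std::snprintf(buf, sizeof(buf), "0x%" PRIx64, addr);
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof(buf));
    (void)n;  // silence unused-variable warnings when NDEBUG is defined
    return std::string(buf);
}
```

`%p` is the alternative when the value is an actual host pointer; for device addresses carried as `uint64_t`, the `PRIx64` form avoids a cast altogether.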

Comment thread src/a2a3/platform/src/host/l2_perf_collector.cpp
Comment thread src/a2a3/platform/src/host/pmu_collector.cpp
Profiling becomes a platform-layer concern instead of part of the
runtime/AICore handshake contract. `enable_profiling_flag` and
`l2_perf_aicore_ring_addr` move out of the runtime's `Handshake`
struct and into `KernelArgs` + a platform-owned per-core state
surface (`set_aicore_profiling_flag` / `set_aicore_l2_perf_ring`
with matching getters), mirroring the AICPU-side
`set_l2_swimlane_enabled` / `set_pmu_enabled` pattern.

Effect:
- The runtime/AICore handshake carries only synchronization +
  identity fields. Adding a new profiling sub-feature no longer
  touches `Handshake` or `aicore_execute`'s signature.
- Profiling lifetime is fully owned by the platform: AICore kernel
  entry indexes its per-core `L2PerfAicoreRing*` from
  `KernelArgs::aicore_ring_addr` and publishes it via the setters;
  AICore code reads via the getters; runtime never sees the
  storage.

- Add `aicore/aicore_profiling_state.h` (set/get for flag + ring)
- Onboard backing: `[[block_local]]` statics in onboard
  aicore/kernel.cpp (weak symbols dedup across AIC/AIV)
- Sim backing: pthread TLS in sim aicore/kernel.cpp; sim launch
  ABI extended with `enable_profiling_flag` + `aicore_ring_addr`
- KernelArgs gains `aicore_ring_addr` + `enable_profiling_flag`;
  bit layout doc moves here from runtime.h
- Host `L2PerfCollector` publishes the per-core ring table that
  KernelArgs forwards
- Runtime `Handshake` shrinks to just sync + identity fields in
  both runtimes
@ChaoZheng109 ChaoZheng109 force-pushed the a2a3/aicore-profiling-state branch from 13a6e50 to 118b44b May 7, 2026 03:54
@ChaoWao ChaoWao merged commit 8a723f7 into hw-native-sys:main May 8, 2026
13 checks passed
2 participants