Summary
k_sem_give() is marked K_ISR_SAFE (i.e. IRAM_ATTR) so it can be called
from ESP-IDF interrupts that may run while the flash cache is disabled. However,
on its hot path it calls the static helper z_sem_pop_waiter()
(components/zkernel/src/k_sem.c), which is not K_ISR_SAFE and therefore
lands in flash-mapped (cached) text. When k_sem_give() is invoked from an IRAM
ISR during a window where the cache is disabled (e.g. a concurrent SPI
flash/NVS write or erase), fetching z_sem_pop_waiter() faults with a cache
access error and the chip panics.
Why this matters
The inline comment on k_sem_give() documents the exact contract this breaks:
it is K_ISR_SAFE so that an interrupt allocated with ESP_INTR_FLAG_IRAM can
give a semaphore safely while the cache is off. A very common path that hits
this is an IRAM-registered GPIO interrupt whose handler calls k_work_submit()
-- work submission wakes the target work queue via k_sem_give(). The boreas
GPIO driver installs its ISR service with ESP_INTR_FLAG_IRAM
(components/zdevice/src/gpio_dt.c), so any such handler that submits work and
happens to fire during a flash operation will fault.
Observed
The panic is a Guru Meditation Error: ... Cache error / Cache access error.
Symbolized backtrace (innermost first):
    z_sem_pop_waiter              k_sem.c    <-- faulted here (cache error)
    k_sem_give                    k_sem.c
    k_work_submit_internal        k_work.c
    k_work_submit_to_queue        k_work.c
    k_work_submit                 k_work.c
    <IRAM GPIO ISR handler> -> k_work_submit
    gpio_esp32_isr                gpio_dt.c
    <esp_driver_gpio ISR dispatch>
MEPC resolves into the flash-mapped region (0x4200_0000+) at
z_sem_pop_waiter; every other frame on the stack is in IRAM (0x4080_0000+).
The fault is intermittent by nature -- it requires the interrupt to land inside
a cache-disabled flash window.
Root cause
z_sem_pop_waiter() (the wake-target popper called on k_sem_give()'s hot
path, and also by k_sem_reset()) lacks K_ISR_SAFE, so it is flash-resident
while its ISR-safe caller is in IRAM. Code that can run while the cache is
disabled may only call IRAM-resident code; this one link in the
k_sem_give() call graph violates that.
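For reviewers who want the shape of the bug in isolation, here is a minimal sketch. It is not the zkernel source: the demo_ names and the struct layout are hypothetical, and K_ISR_SAFE is defined as an empty stand-in so the snippet builds on a host. On target the macro is described above as equivalent to IRAM_ATTR.

```c
/* Off-target stand-in so the sketch compiles on a host. On target,
 * K_ISR_SAFE is described in this report as equivalent to IRAM_ATTR. */
#ifndef K_ISR_SAFE
#define K_ISR_SAFE
#endif

/* Hypothetical type, for illustration only. */
struct demo_sem {
    unsigned int count;   /* available tokens */
    unsigned int waiters; /* threads blocked on the semaphore */
};

/* No K_ISR_SAFE here: when the compiler emits this as a separate
 * function, it is placed in flash-mapped text. */
static int demo_pop_waiter(struct demo_sem *sem)
{
    if (sem->waiters == 0) {
        return 0;
    }
    sem->waiters--;
    return 1;
}

/* IRAM-resident entry point. The call below is the problem: with the
 * cache disabled, fetching demo_pop_waiter() from flash faults. */
K_ISR_SAFE void demo_sem_give(struct demo_sem *sem)
{
    if (!demo_pop_waiter(sem)) {
        sem->count++; /* nobody waiting: bank the token */
    }
}
```

On a host the two functions behave identically with or without the attribute; the difference is only where the linker places them on target, which is why this class of bug passes host tests.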
Proposed fix
Mark z_sem_pop_waiter() K_ISR_SAFE. The function is pure list-walking over
the caller-owned waiter list with no FreeRTOS calls, so it is safe to place in
IRAM. Its other caller, k_sem_reset(), is flash-resident, but flash code
calling an IRAM function is fine.
static K_ISR_SAFE struct z_sem_waiter *z_sem_pop_waiter(struct k_sem *sem)
Verified on an esp32c5 target: the symbol relocates from the flash-mapped
region into IRAM, the cache-error panic no longer reproduces, the host test
suite is unaffected, and clang-format stays clean. IRAM cost is a single
small list-walk function (negligible).
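To illustrate why a helper of this kind is safe to move, here is a sketch of a pop-the-oldest-waiter function that carries the attribute. The body, the demo_ names, the struct fields, and the FIFO ordering are all assumptions made for illustration; the real z_sem_pop_waiter() body is not reproduced in this report. K_ISR_SAFE is again an empty stand-in off-target.

```c
#include <stddef.h>

/* Off-target stand-in; on target this is said to map to IRAM_ATTR. */
#ifndef K_ISR_SAFE
#define K_ISR_SAFE
#endif

/* Hypothetical layouts, not the real zkernel structures. */
struct demo_waiter {
    struct demo_waiter *next;
};

struct demo_waitq {
    struct demo_waiter *head; /* oldest waiter, popped first (assumed FIFO) */
    struct demo_waiter *tail; /* newest waiter */
};

/* Detach and return the oldest waiter, or NULL if the list is empty.
 * Touches only the caller-owned list: no RTOS calls and no calls into
 * other functions, so nothing else has to move into IRAM with it. */
static K_ISR_SAFE struct demo_waiter *demo_pop_waiter(struct demo_waitq *q)
{
    struct demo_waiter *w = q->head;

    if (w == NULL) {
        return NULL;
    }
    q->head = w->next;
    if (q->head == NULL) {
        q->tail = NULL; /* list is now empty */
    }
    w->next = NULL;
    return w;
}
```

The attribute sits between `static` and the return type, matching the one-line change proposed above.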
Suggested follow-up
Audit the rest of the ISR-reachable call graph from K_ISR_SAFE entry points
(k_sem_give, k_sem_take with K_NO_WAIT, the k_work_submit chain) for any
other static helpers that are flash-resident. The same class of bug -- an
IRAM_ATTR function calling a non-IRAM_ATTR helper -- would be latent
anywhere a helper was factored out without carrying the attribute.
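One possible way to mechanize part of that audit is to scan the symbol table of the linked ELF and flag text symbols in the flash-mapped region. The sketch below assumes the default `nm` line format (`<hex address> <type> <name>`) and uses only the lower bound quoted in the Observed section (0x4200_0000); the function name is hypothetical, and the exact region limits should come from the target's linker script.

```c
#include <stdbool.h>
#include <stdio.h>

/* Lower bound of flash-mapped text as quoted in this report. The upper
 * bound is not modeled; take it from the target's linker script. */
#define FLASH_TEXT_BASE 0x42000000UL

/* Parse one line of default `nm` output and report whether it names a
 * text symbol at or above the flash-mapped base. Lines that do not
 * parse (for example undefined symbols, which have no address) and
 * non-text symbols return false. */
static bool nm_line_is_flash_text(const char *line)
{
    unsigned long addr;
    char type;
    char name[128];

    if (sscanf(line, "%lx %c %127s", &addr, &type, name) != 3) {
        return false;
    }
    /* 't' and 'T' mark local and global symbols in a text section. */
    if (type != 't' && type != 'T') {
        return false;
    }
    return addr >= FLASH_TEXT_BASE;
}
```

Feeding it the toolchain's `nm` output for the firmware ELF and intersecting the flagged names with the functions reachable from K_ISR_SAFE entry points would give a first list of candidates to review. It does not build the call graph itself, so that list still has to be supplied by hand or by another tool.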