[Audit][High] ECS query iterator has data race and staleness bug in next()

## 🔍 Module Scanned
(automated audit scan)

## 📝 Summary
The ECS query iterator's `next()` function has two critical concurrency and correctness bugs: (1) the epoch validation check at line 90 is not protected by any lock, creating a data race with concurrent structural modifications, and (2) when the epoch changes, `next()` returns `null` without resetting the query's saved state, causing the query to become permanently stale and skip all remaining entities.

## 📍 Location
- **File:**
- **Function/Scope:** `next()` function within the ECS query iterator

## 🔴 Severity: High
- **Critical:** Crashes, data corruption, security vulnerabilities, GPU device loss
- **High:** Memory leaks, race conditions, incorrect rendering, broken features
- **Medium:** Performance degradation, missing error handling, suboptimal patterns
- **Low:** Code style, dead code, minor improvements

## 💥 Impact
When multiple threads use the ECS registry concurrently (e.g., a physics system updating on one thread while a render system queries on another), the query iterator can return stale or null results even when valid entities exist. Specifically:

1. **Data Race**: The epoch check at line 90 reads the registry's epoch counter without synchronization, while other threads may be writing to it through structural modifications. This is a plain data race in Zig (a concurrent read and write to the same memory).

2. **Permanent Staleness**: When the epoch check fails, `next()` returns `null` but does not reset the query's saved state. On the next call to `next()`, the same check fails again (the epoch still differs), and the query is permanently stuck returning `null`, skipping all remaining entities. This breaks the contract that a query should either return valid rows or indicate completion.

## 🔎 Evidence

The epoch check at line 90 is a data race because:
- The epoch counter is read without any lock or atomic operation
- The epoch counter is written by three structural modification functions (lines 40, 48, and 56)
- These writes can happen from any thread using the registry

The staleness bug occurs because, when the epoch has changed, the function returns `null` immediately without resetting the query's saved state. If `next()` is called again (e.g., after the modifying thread completes), the same epoch check fails again, and the query is stuck.

## 🛠️ Proposed Fix

1. **Fix the staleness bug** by resetting the query's saved state when the epoch changes so the query can be re-iterated properly.

2. **Address the data race** by either:
   a. Documenting that queries must be completed before any structural modifications (single-threaded iteration), or
   b. Using an atomic load for the epoch check, or
   c. Having the caller explicitly check the epoch before and after iteration

For the data race, the cleanest fix is to make the epoch counter an atomic value and use an atomic load for the check. Alternatively, document that queries are not thread-safe and must be completed before any concurrent structural modification calls.

## ✅ Acceptance Criteria
- [ ] The `next()` function correctly resets the query's saved state when the epoch changes, allowing re-iteration
- [ ] No data race exists between epoch reads in `next()` and epoch writes in the structural modification functions
- [ ] Existing ECS tests pass
- [ ] A test verifying query behavior after structural modification is added, or existing tests cover this case

## 📚 References
- Zig language reference on data races: two or more concurrent accesses to the same memory location, at least one being a write
- Related existing issue: None found covering ECS query staleness
- The component storage is also not thread-safe for concurrent writes and reads
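
### Illustrative sketch of the proposed fix

The two fixes proposed above (resetting the query's state on an epoch change, and reading the epoch atomically) might look roughly like the following. This is a minimal sketch, not the engine's actual code: `Registry`, `Query`, `markStructuralChange`, `seen_epoch`, and `cursor` are hypothetical names, the entity list is a plain slice standing in for the real storage, and which fields make up the "saved state" is an assumption. It assumes a Zig version that provides `std.atomic.Value` (0.12 or later).

```zig
const std = @import("std");

// Hypothetical stand-in for the ECS registry; the real type is not shown in this issue.
const Registry = struct {
    // Epoch counter, bumped on every structural modification.
    // Stored as an atomic so readers on other threads do not race on it.
    epoch: std.atomic.Value(u64) = std.atomic.Value(u64).init(0),
    // Placeholder for the real component storage.
    entities: []const u32,

    // Hypothetical hook that every structural modification would call.
    fn markStructuralChange(self: *Registry) void {
        _ = self.epoch.fetchAdd(1, .release);
    }
};

// Hypothetical stand-in for the query iterator.
const Query = struct {
    registry: *const Registry,
    seen_epoch: u64,
    cursor: usize = 0,

    fn init(registry: *const Registry) Query {
        return .{
            .registry = registry,
            .seen_epoch = registry.epoch.load(.acquire),
        };
    }

    fn next(self: *Query) ?u32 {
        const current = self.registry.epoch.load(.acquire);
        if (current != self.seen_epoch) {
            // Assumed recovery policy: report the invalidation once,
            // then restart from the beginning instead of staying stuck.
            self.seen_epoch = current;
            self.cursor = 0;
            return null;
        }
        if (self.cursor >= self.registry.entities.len) return null;
        const id = self.registry.entities[self.cursor];
        self.cursor += 1;
        return id;
    }
};
```

With this policy, a caller that sees `null` after a structural change can call `next()` again and iteration restarts from the first entity. The atomic load only removes the race on the epoch counter itself; the component storage is still not safe for concurrent writes and reads, so option (a) or external locking would still be needed for truly concurrent use.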

[Audit][High] ECS query iterator has data race and staleness bug in next() #732

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

[Audit][High] ECS query iterator has data race and staleness bug in next() #732

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions