Drain in-flight graphics tasks before NativeEngine teardown #1746

Open
bkaradzic-microsoft wants to merge 2 commits into BabylonJS:master from bkaradzic-microsoft:fix/native-engine-async-teardown-drain

@bkaradzic-microsoft

Problem

NativeEngine::Dispose only called m_cancellationSource->cancel(). arcana cancellation short-circuits work at scheduling boundaries, but a task body that is already running on a threadpool thread keeps going to completion.

The graphics-loading entry points (loadProgram, loadTexture, loadCubeTexture) schedule their work as threadpool tasks that capture a raw Graphics::Texture* or call into bgfx. On app teardown, if the suite finishes while one of these bodies is mid-flight, Dispose cancels and teardown proceeds to free the texture or tear down the bgfx context. The still-running task then touches freed resources, causing an access violation:

Access violation. Exception Code c0000005
  2: Plugins/NativeEngine/Source/NativeEngine.cpp  LoadTextureFromImage
  3: arcana .../internal_task.h  output_wrapper<...>::invoke<...LoadTexture...lambda...>
  ...
  6: cppwinrt .../Windows.System.Threading.h  WorkItemHandler::Invoke

This reproduces at shutdown, after the test run reports it finished.
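
To illustrate the gap described above, here is a minimal sketch that uses only the C++ standard library. It does not use arcana; the atomic flag is a stand-in for a cancellation check that happens once at a scheduling boundary, and the function and variable names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <thread>

// Hypothetical stand-in: a flag checked once before the body starts, similar in
// spirit to cancellation that only short-circuits at scheduling boundaries.
bool RunningBodyIgnoresLateCancel()
{
    std::atomic<bool> cancelled{false};
    std::atomic<bool> bodyRanToCompletion{false};
    std::promise<void> started;
    std::promise<void> proceed;
    auto startedFuture = started.get_future();
    auto proceedFuture = proceed.get_future();

    std::thread worker{[&] {
        // Scheduling boundary: the only place the flag is consulted.
        if (cancelled.load())
        {
            return;
        }
        started.set_value();
        // The body is now in flight; wait until the main thread has cancelled.
        proceedFuture.wait();
        // Nothing re-checks the flag here, so the body keeps going.
        bodyRanToCompletion.store(true);
    }};

    startedFuture.wait();   // the body has passed its cancellation check
    cancelled.store(true);  // teardown cancels while the body is mid-flight
    proceed.set_value();
    worker.join();
    return bodyRanToCompletion.load();
}
```

In this sketch the body completes even though cancellation was requested while it was running. If the body touched a resource that teardown had freed in the meantime, that would be the use-after-free shown in the trace.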

Fix

Add a small AsyncTaskTracker (mutex + condition_variable + counter) whose RAII Scope token is captured into the four graphics-touching threadpool lambdas (the loadProgram body, the loadTexture body, and the two LoadCubeTextureFromImages inline continuations). The CPU-only image-parse bodies are intentionally not tracked.

Dispose now cancels and then waits for those tokens to drain before returning, so teardown can't free graphics resources out from under a running task. The token decrements on the threadpool thread (never on a JS-thread .then continuation), so Dispose — which runs on the JS thread — can never deadlock waiting on its own continuations. Counting is bound to the token's lifetime, so the increment/decrement pair can't be split by an exception.
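
A minimal sketch of how such a tracker could look, assuming only what the description states (mutex, condition_variable, counter, RAII Scope). The method names Track and WaitForDrain and the demo function are hypothetical and may differ from the PR's actual code.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <utility>

class AsyncTaskTracker
{
public:
    // RAII token: increments on construction, decrements on destruction, so the
    // pair cannot be split by an exception thrown from the task body.
    class Scope
    {
    public:
        explicit Scope(AsyncTaskTracker& tracker) : m_tracker{&tracker} { m_tracker->Increment(); }
        Scope(Scope&& other) noexcept : m_tracker{other.m_tracker} { other.m_tracker = nullptr; }
        Scope(const Scope&) = delete;
        Scope& operator=(const Scope&) = delete;
        Scope& operator=(Scope&&) = delete;
        ~Scope() { if (m_tracker != nullptr) { m_tracker->Decrement(); } }

    private:
        AsyncTaskTracker* m_tracker;
    };

    Scope Track() { return Scope{*this}; }  // hypothetical name

    // Blocks until every outstanding Scope has been destroyed.
    void WaitForDrain()  // hypothetical name
    {
        std::unique_lock<std::mutex> lock{m_mutex};
        m_condition.wait(lock, [this] { return m_count == 0; });
    }

private:
    void Increment() { std::lock_guard<std::mutex> lock{m_mutex}; ++m_count; }
    void Decrement()
    {
        // Notify while holding the lock so a waiter cannot destroy the tracker
        // between the decrement and the notification.
        std::lock_guard<std::mutex> lock{m_mutex};
        --m_count;
        m_condition.notify_all();
    }

    std::mutex m_mutex;
    std::condition_variable m_condition;
    std::size_t m_count{0};
};

// Demo: the "Dispose" side waits until the in-flight body has finished.
bool DrainWaitsForInFlightTask()
{
    AsyncTaskTracker tracker;
    std::atomic<bool> finished{false};
    // The token is created on the scheduling thread, before the task starts.
    std::thread worker{[scope = tracker.Track(), &finished]() mutable {
        // Move the token into the body so it is released on this worker thread.
        AsyncTaskTracker::Scope local{std::move(scope)};
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        finished.store(true);
    }};
    tracker.WaitForDrain();
    const bool finishedBeforeReturn = finished.load();
    worker.join();
    return finishedBeforeReturn;
}
```

Because the token is taken before the task is scheduled, the counter is already non-zero when the wait begins, and because it is released at the end of the body on the worker thread, the wait does not depend on any continuation that would have to run on the waiting thread.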

A defensive if (cancellationSource->cancelled()) return; early-out is also added at the top of the two threadpool load bodies, so once teardown has begun they skip touching graphics resources entirely.
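
The early-out can be sketched as follows. StubCancellationSource and StubGraphics are hypothetical stand-ins written for this example; they mirror the cancel() and cancelled() calls quoted above but are not arcana's or bgfx's types.

```cpp
#include <atomic>
#include <cassert>
#include <memory>

// Hypothetical stand-in for the engine's cancellation source.
class StubCancellationSource
{
public:
    void cancel() { m_cancelled.store(true); }
    bool cancelled() const { return m_cancelled.load(); }

private:
    std::atomic<bool> m_cancelled{false};
};

// Hypothetical stand-in for graphics state; counts how often a body touched it.
struct StubGraphics
{
    int touchCount = 0;
};

// Shape of a threadpool load body with the defensive check at the top.
void LoadBody(const std::shared_ptr<StubCancellationSource>& cancellationSource, StubGraphics& graphics)
{
    if (cancellationSource->cancelled())
    {
        return;  // teardown has begun: skip touching graphics resources
    }
    ++graphics.touchCount;
}

int CountTouchesAroundCancel()
{
    auto cancellationSource = std::make_shared<StubCancellationSource>();
    StubGraphics graphics;
    LoadBody(cancellationSource, graphics);  // runs: not cancelled yet
    cancellationSource->cancel();
    LoadBody(cancellationSource, graphics);  // skipped: cancelled
    return graphics.touchCount;
}
```

A check like this only narrows the window: a body can pass the check just before cancel() is called. That is presumably why the description calls it defensive and relies on the drain in Dispose for the actual guarantee.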

Testing

  • NativeEngine and Playground build clean (RelWithDebInfo, x64, D3D11).
  • Ran the previously-crashing Sprites test (loadTexture path) and a cubemap test (LoadCubeTextureFromImages path) headless — both exercise the load paths and tear down promptly with no hang and no crash.

NativeEngine::Dispose only cancelled the cancellation source, which
short-circuits task scheduling but does not stop a threadpool task
already executing its body. On teardown an in-flight loadTexture,
loadProgram, or loadCubeTexture body could touch a texture or bgfx
context that had already been freed, causing an access violation.

Add an AsyncTaskTracker whose RAII Scope token is captured into the four
graphics-touching threadpool lambdas. Dispose now cancels and then waits
for those tokens to drain before returning, so teardown can't free
resources out from under a running task. The token decrements on the
threadpool thread, so Dispose (JS thread) never deadlocks on its own
continuations. Also add a cancellation early-out to the load bodies to
skip work once teardown has begun.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 10, 2026 15:43

Copilot AI left a comment

Pull request overview

This PR hardens NativeEngine shutdown by ensuring any in-flight threadpool work that touches bgfx/graphics resources has completed before teardown proceeds, preventing use-after-free crashes during app/test shutdown.

Changes:

  • Added an AsyncTaskTracker (counter + condition_variable) and TrackAsyncTask() RAII token to track graphics-touching async work.
  • Updated NativeEngine::Dispose() to cancel work and then block until tracked tasks drain.
  • Captured tracking tokens into loadProgram, loadTexture, and cube-texture inline continuations; added cancellation early-outs for the two threadpool bodies.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| Plugins/NativeEngine/Source/NativeEngine.h | Introduces AsyncTaskTracker + TrackAsyncTask() and stores the tracker on NativeEngine. |
| Plugins/NativeEngine/Source/NativeEngine.cpp | Implements tracker logic, waits during Dispose(), and wires tracking/cancellation checks into async graphics load paths. |

Adds a deterministic UnitTests regression test (Tests.NativeEngine.Teardown.cpp)
for the loadTexture async-teardown race. NativeEngine gains a test-only,
BABYLON_NATIVE_BUILD_APPS-gated JS method (_disposeDrainTestSchedule) that
schedules a tracked threadpool task -- the same TrackAsyncTask mechanism the
async texture/shader loaders use -- which signals start, sleeps, then signals
finish without touching any graphics resources.

The test lets that task get in flight, disposes the engine, and asserts the
task finished by the time Dispose returned. With the drain in place Dispose
blocks until the task completes (pass); without it Dispose returns early
(fail). Because the task touches nothing, the test is deterministic and never
relies on undefined behaviour.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
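
The shape of the regression test described in the commit above can be sketched with the standard library alone. This is not the code in Tests.NativeEngine.Teardown.cpp: StubEngine is hypothetical, and a std::future stands in for the tracker so the example stays short.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical engine stub. The task signals start, sleeps, then signals finish,
// and touches nothing else, so observing it never relies on undefined behaviour.
struct StubEngine
{
    std::atomic<bool> started{false};
    std::atomic<bool> finished{false};
    std::future<void> task;  // declared last so it is destroyed (and joined) first

    void ScheduleTrackedTask()
    {
        task = std::async(std::launch::async, [this] {
            started.store(true);
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
            finished.store(true);
        });
    }

    // With drain == true, Dispose blocks until the in-flight task completes.
    void Dispose(bool drain)
    {
        if (drain && task.valid())
        {
            task.wait();
        }
    }
};

bool TaskFinishedWhenDisposeReturns(bool drain)
{
    StubEngine engine;
    engine.ScheduleTrackedTask();
    while (!engine.started.load())
    {
        std::this_thread::yield();  // let the task get in flight
    }
    engine.Dispose(drain);
    return engine.finished.load();  // the value the test asserts on
}
```

With drain enabled the function returns true, because Dispose cannot return before the task has finished. With drain disabled, Dispose returns while the task is still sleeping, so the same check is expected to report false, which is the failure the regression test is designed to catch.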