cuda.core: graph slot table for node attachment lifetimes (draft)#2280
Draft
Andy-Jost wants to merge 2 commits into
Completes step 3 of NVIDIA#1330 by exposing the captured graph as an explicit `GraphDefinition` view that shares ownership of the underlying `CUgraph`. The handle-layer plumbing landed in PR NVIDIA#2008; this commit wires up the user-facing surface and locks in the state-guard rules.

State semantics:

- PRIMARY builder: only valid after `end_building()`. Before `begin_building()` no graph exists; during capture the driver is the sole writer, so explicit access is unsafe.
- CONDITIONAL_BODY builder: valid both before `begin_building()` (the body graph is allocated at conditional-node creation time) and after `end_building()`. This enables a hybrid flow where a conditional body is populated entirely via the explicit API, with no capture at all.
- FORKED builder: never valid. Forked builders share the primary's graph; access through the primary instead.

Tests cover the happy path, both hybrid flows on conditional bodies (populate-via-explicit-API and capture-then-augment), the three error states (forked, capturing, primary pre-capture), and the shared-ownership guarantee (the `GraphDefinition` survives the builder's `close()`).

Co-authored-by: Cursor <cursoragent@cursor.com>
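The state-guard rules above can be read as a small truth table. The following is a minimal sketch of that table in plain Python, not the code in this PR: `BuilderKind`, `BuildState`, and `graph_definition_allowed` are hypothetical names introduced only for illustration, and the sketch assumes that a CONDITIONAL_BODY builder is also rejected while it is capturing (the commit message lists "capturing" as an error state and names only "before" and "after" as valid).

```python
from enum import Enum, auto


class BuilderKind(Enum):
    # Hypothetical labels mirroring the three builder kinds named above.
    PRIMARY = auto()
    CONDITIONAL_BODY = auto()
    FORKED = auto()


class BuildState(Enum):
    # Hypothetical labels for where a builder is in its lifecycle.
    NOT_STARTED = auto()  # before begin_building()
    CAPTURING = auto()    # between begin_building() and end_building()
    ENDED = auto()        # after end_building()


def graph_definition_allowed(kind: BuilderKind, state: BuildState) -> bool:
    """Return True if explicit graph access is valid for this kind and state."""
    if kind is BuilderKind.FORKED:
        # Forked builders share the primary's graph; go through the primary.
        return False
    if state is BuildState.CAPTURING:
        # During capture the driver is the sole writer.
        return False
    if kind is BuilderKind.PRIMARY:
        # No graph exists before begin_building().
        return state is BuildState.ENDED
    # CONDITIONAL_BODY: the body graph exists from node creation onward.
    return True
```

Under these assumptions, the three error states in the test list correspond to the three early `False` branches, and the hybrid flow on a conditional body corresponds to the final `True` for `NOT_STARTED`.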
Introduce a CUgraph user-object slot table that retains kernel arguments, events, host callbacks, and memcpy/memset operands through graph instantiation and launch. Accept Buffer operands with explicit dst_owner/src_owner overrides, split host-callback wiring into _host_callback, and add lifetime regression tests.
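To illustrate the retention idea only, here is a toy model in pure Python. It does not call CUDA and is not the slot table implemented in this PR: `ToyGraph`, `set_slot`, `Resource`, and `demo` are hypothetical names, and the "graph" is just an object holding strong references keyed by node and slot. The point is the lifetime pattern: a resource attached to a node stays alive after the caller drops its own reference, and is released once the graph itself goes away.

```python
import gc
import weakref


class Resource:
    """Stand-in for a buffer, event, or callback payload (hypothetical)."""

    def __init__(self, name):
        self.name = name


class ToyGraph:
    """Hypothetical graph that owns a table of strong references."""

    def __init__(self):
        self._slots = {}

    def set_slot(self, node, slot, obj):
        # Storing obj keeps it alive for as long as the graph lives.
        # Overwriting a slot drops the reference that was held before.
        self._slots[(node, slot)] = obj


def demo():
    graph = ToyGraph()
    buf = Resource("dst")
    probe = weakref.ref(buf)  # observe the lifetime without extending it

    graph.set_slot("memcpy0", "dst", buf)
    del buf  # the caller lets go; only the slot holds the resource now
    gc.collect()
    alive_while_graph_lives = probe() is not None

    del graph  # dropping the graph drops its slots
    gc.collect()
    alive_after_graph = probe() is not None
    return alive_while_graph_lives, alive_after_graph
```

The weak reference plays the same role here that a lifetime-observing test helper would play in a regression test: it lets the test check when the resource is actually released without keeping it alive.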
/ok to test
Summary
Draft PR for self-review of the graph slot-table work that was held back while #2008 and #2026 landed.
This branch is rebased onto current `main` and currently includes the `GraphBuilder.graph_definition` commit from #2026 as well (intentional duplication for review). After #2026 merges, I plan to rebase again to drop that overlap before marking this ready for review.

The slot table uses CUDA graph user objects to retain resources attached to graph nodes through instantiation and launch:

- `Buffer` inputs with optional `dst_owner`/`src_owner` overrides

Changes
- `resource_handles` (`graph_set_slot`, `make_opaque_py`, `make_opaque_malloc`).
- `_graph_node.pyx` to populate slots.
- `_host_callback.pyx` (replacing `_utils.pyx`).
- `_weak_handles.pyx` test helper for observing `DevicePtrHandle` lifetimes.
- `test_graph_definition_lifetime.py` with retention and capture-path coverage.

Test plan
- `pip install -v .` in `cuda_core`
- `pytest cuda_core/tests/graph/test_graph_definition_lifetime.py -v`
- `pytest cuda_core/tests/graph/ -v` on a GPU machine

Related
graph_definition).