cuda.core: graph slot table for node attachment lifetimes (draft) #2280

Draft: Andy-Jost wants to merge 2 commits into NVIDIA:main from Andy-Jost:ajost/graph-slots

@Andy-Jost


Summary

Draft PR for self-review of the graph slot-table work that was held back while #2008 and #2026 landed.

This branch is rebased onto current main and currently also includes the `GraphBuilder.graph_definition` commit from #2026 (intentional duplication for review). After #2026 merges, I plan to rebase again to drop that overlap before marking this ready for review.

The slot table uses CUDA graph user objects to retain resources attached to graph nodes through instantiation and launch:

  • Kernel launch arguments and kernel handles
  • Event record/wait nodes
  • Host callbacks (capture and explicit construction paths)
  • Memcpy/memset operands, including `Buffer` inputs with optional `dst_owner` / `src_owner` overrides
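
The retention idea above can be illustrated with a small pure-Python model. This is only a sketch under stated assumptions: `ToyUserObject`, `ToyGraph`, `set_slot`, and `instantiate` are hypothetical names invented for illustration, not the cuda.core or CUDA driver API, and the rule that an instantiated graph holds its own reference to each attachment is assumed here to match the "through instantiation and launch" wording.

```python
class ToyUserObject:
    """Refcounted holder whose destructor runs when the last reference goes away."""

    def __init__(self, payload, destructor):
        self.payload = payload
        self._destructor = destructor
        self._refs = 1  # the creator's reference

    def retain(self):
        self._refs += 1

    def release(self):
        self._refs -= 1
        if self._refs == 0:
            self._destructor(self.payload)
            self.payload = None


class ToyGraph:
    """Hypothetical stand-in for a graph (or instantiated graph) owning references."""

    def __init__(self):
        self._slots = {}  # (node_id, slot_name) -> ToyUserObject

    def set_slot(self, node_id, slot_name, payload, destructor):
        key = (node_id, slot_name)
        old = self._slots.pop(key, None)
        # The graph takes over the creator's reference to the new object.
        self._slots[key] = ToyUserObject(payload, destructor)
        if old is not None:
            old.release()  # overwriting a slot drops the previous attachment

    def instantiate(self):
        exe = ToyGraph()
        for key, obj in self._slots.items():
            obj.retain()  # the executable copy holds its own reference
            exe._slots[key] = obj
        return exe

    def destroy(self):
        for obj in self._slots.values():
            obj.release()
        self._slots.clear()


released = []
graph = ToyGraph()
graph.set_slot(0, "kernel_args", payload=b"\x01\x02", destructor=released.append)
exe = graph.instantiate()

graph.destroy()
still_alive_after_graph_destroy = released == []  # exe still holds a reference

exe.destroy()  # last reference released, destructor runs
```

In this model the attachment outlives the definition graph as long as an instantiated copy exists, which is the property the slot table is meant to provide for real resources.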

Changes

  • Add slot-table infrastructure in `resource_handles` (`graph_set_slot`, `make_opaque_py`, `make_opaque_malloc`).
  • Wire graph node creation paths in `_graph_node.pyx` to populate slots.
  • Split host-callback attachment into `_host_callback.pyx` (replacing `_utils.pyx`).
  • Add `_weak_handles.pyx` test helper for observing `DevicePtrHandle` lifetimes.
  • Expand `test_graph_definition_lifetime.py` with retention and capture-path coverage.
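
As a rough illustration of how a lifetime regression test can observe retention without extending it, the sketch below uses Python's standard `weakref` module. `FakeBuffer` and `FakeGraph` are hypothetical stand-ins; the actual `_weak_handles.pyx` helper and the tests in this PR may work differently.

```python
import gc
import weakref


class FakeBuffer:
    """Hypothetical stand-in for a device allocation; supports weak references."""


class FakeGraph:
    """Hypothetical graph that keeps strong references to attached operands."""

    def __init__(self):
        self._attachments = []

    def attach(self, resource):
        self._attachments.append(resource)


def check_retention():
    graph = FakeGraph()
    buf = FakeBuffer()
    observer = weakref.ref(buf)  # observes the lifetime without extending it
    graph.attach(buf)

    del buf  # drop the caller's only strong reference
    gc.collect()
    alive_while_graph_exists = observer() is not None

    del graph  # dropping the graph releases its attachments
    gc.collect()
    freed_after_graph = observer() is None
    return alive_while_graph_exists, freed_after_graph


result = check_retention()
```

The two booleans correspond to the two halves of a retention test: the resource must survive while the graph holds it, and must be released once the graph is gone.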

Test plan

  • Local `pip install -v .` in `cuda_core`
  • `pytest cuda_core/tests/graph/test_graph_definition_lifetime.py -v`
  • Full `pytest cuda_core/tests/graph/ -v` on a GPU machine

Related

Andy-Jost and others added 2 commits June 29, 2026 16:15
Completes step 3 of NVIDIA#1330 by exposing the captured graph as an explicit
`GraphDefinition` view that shares ownership of the underlying `CUgraph`.
The handle-layer plumbing landed in PR NVIDIA#2008; this commit wires up the
user-facing surface and locks in the state-guard rules.

State semantics:

- PRIMARY builder: only valid after `end_building()`. Before
  `begin_building()` no graph exists; during capture the driver is the
  sole writer, so explicit access is unsafe.
- CONDITIONAL_BODY builder: valid both before `begin_building()` (the
  body graph is allocated at conditional-node creation time) and after
  `end_building()`. This enables a hybrid flow where a conditional body
  is populated entirely via the explicit API, with no capture at all.
- FORKED builder: never valid. Forked builders share the primary's
  graph; access through the primary instead.
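
The state rules above can be summarized as a small predicate. This is a minimal sketch, assuming hypothetical names (`BuilderKind`, `BuildState`, `graph_definition_allowed`) that are not taken from cuda.core; only the rules themselves come from the list above.

```python
from enum import Enum, auto


class BuilderKind(Enum):
    PRIMARY = auto()
    CONDITIONAL_BODY = auto()
    FORKED = auto()


class BuildState(Enum):
    NOT_STARTED = auto()  # before begin_building()
    CAPTURING = auto()    # between begin_building() and end_building()
    ENDED = auto()        # after end_building()


def graph_definition_allowed(kind, state):
    """Return True if explicit graph access is valid for this builder and state."""
    if kind is BuilderKind.FORKED:
        return False  # forked builders share the primary's graph
    if state is BuildState.CAPTURING:
        return False  # the driver is the sole writer during capture
    if kind is BuilderKind.PRIMARY:
        return state is BuildState.ENDED  # no graph exists before capture
    return True  # CONDITIONAL_BODY: valid before and after capture
```

For example, `graph_definition_allowed(BuilderKind.CONDITIONAL_BODY, BuildState.NOT_STARTED)` is true, which is what permits populating a conditional body entirely through the explicit API.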

Tests cover the happy path, both hybrid flows on conditional bodies
(populate-via-explicit-API and capture-then-augment), the three error
states (forked, capturing, primary pre-capture), and the
shared-ownership guarantee (the `GraphDefinition` survives the
builder's `close()`).

Co-authored-by: Cursor <cursoragent@cursor.com>

Introduce a CUgraph user-object slot table that retains kernel arguments,
events, host callbacks, and memcpy/memset operands through graph
instantiation and launch. Accept Buffer operands with explicit
dst_owner/src_owner overrides, split host-callback wiring into
_host_callback, and add lifetime regression tests.
Andy-Jost added this to the cuda.core next milestone on Jun 30, 2026
Andy-Jost added the P0 (High priority - Must do!), feature (New feature or request), and cuda.core (Everything related to the cuda.core module) labels on Jun 30, 2026
Andy-Jost self-assigned this on Jun 30, 2026
@copy-pr-bot

copy-pr-bot Bot commented Jun 30, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@Andy-Jost

/ok to test

