Use shared_ptr for NVML event_set resource management by mdboom · Pull Request #2240 · NVIDIA/cuda-python

mdboom · 2026-06-22T14:20:44Z

This is a proof-of-concept of the proposed instructions in #2234. I asked Claude to use those instructions to port the management of NVML event sets to use shared_ptr. I didn't edit its output in any way. We should review this and point out any mistakes, and then use that to improve the instructions in #2234.

copy-pr-bot · 2026-06-22T14:20:48Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

mdboom · 2026-06-22T15:04:49Z

/ok to test

mdboom · 2026-06-22T17:13:49Z

/ok to test

github-actions · 2026-06-22T17:34:18Z

Doc Preview CI
🚀 View preview at https://nvidia.github.io/cuda-python/pr-preview/pr-2240/
https://nvidia.github.io/cuda-python/pr-preview/pr-2240/cuda-core/
https://nvidia.github.io/cuda-python/pr-preview/pr-2240/cuda-bindings/
https://nvidia.github.io/cuda-python/pr-preview/pr-2240/cuda-pathfinder/
Preview will be ready when the GitHub Pages deployment is complete.

mdboom · 2026-06-23T15:05:34Z

/ok to test

mdboom · 2026-06-23T15:43:13Z

/ok to test

Andy-Jost · 2026-06-23T20:26:37Z

+};
+}  // namespace
+
+NvmlEventSetHandle create_nvml_event_set_handle(intptr_t handle) {


An open design question: some create_* functions take an already-created handle (as this one does), meaning allocation is done in Cython. Others take construction arguments and create the handle inside.

I have not been able to settle on one approach, but it seems like something we could standardize.

Here, if p_nvmlEventSetFree is NULL, or new fails, the handle would be leaked unless the caller guards it (granted, those seem like remote possibilities). Construction parameters would resolve that and also enable possible caching or recycling.

On the other hand, accepting construction parameters makes for a fatter interface, potentially with many functions that just wrap existing CUDA APIs.

I don't have a coherent analysis, just a few disconnected thoughts like this.
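
A minimal C++ sketch of the two styles makes the leak window concrete. All names here (`FakeEventSet`, `fake_create`, `fake_free`, `adopt_handle`, `make_handle`) are hypothetical stand-ins, not the cuda.core or NVML API. `adopt_handle` follows the PR's approach of wrapping an already-created handle and closes the "`new` fails" gap by freeing the raw value itself. As the comment says, a null free function still leaks. `make_handle` creates the resource internally, and only after it has an owner ready.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-ins for an NVML-style create/free pair. g_live counts
// resources that have been created but not yet freed.
struct FakeEventSet { int id; };
static int g_live = 0;

static std::intptr_t fake_create() {
    ++g_live;
    return reinterpret_cast<std::intptr_t>(new FakeEventSet{42});
}

static void fake_free(std::intptr_t raw) {
    delete reinterpret_cast<FakeEventSet*>(raw);
    --g_live;
}

using FreeFn = void (*)(std::intptr_t);
using Handle = std::shared_ptr<std::intptr_t>;

// Style 1 (as in the PR): adopt a raw handle that was created elsewhere.
Handle adopt_handle(std::intptr_t raw, FreeFn free_fn) {
    std::intptr_t* box = nullptr;
    try {
        box = new std::intptr_t(raw);
    } catch (...) {
        // Nobody owns `raw` yet, so free it here instead of leaking it.
        if (free_fn && raw) free_fn(raw);
        throw;
    }
    // If the control-block allocation throws, shared_ptr calls the deleter on
    // `box` itself. If free_fn is null, `raw` can never be released; only the
    // caller can guard against that case.
    return Handle(box, [free_fn](std::intptr_t* p) {
        if (free_fn && *p) free_fn(*p);
        delete p;
    });
}

// Style 2: create the resource inside (a real version would forward the
// construction arguments), refusing up front if it could never be freed.
Handle make_handle(FreeFn free_fn) {
    if (!free_fn) return nullptr;
    auto box = std::make_unique<std::intptr_t>(0);  // allocate the owner first
    *box = fake_create();                           // then create the resource
    return Handle(box.release(), [free_fn](std::intptr_t* p) {
        if (*p) free_fn(*p);
        delete p;
    });
}
```

When choosing between the styles, note that the `std::shared_ptr(p, d)` constructor already calls `d(p)` if allocating its control block throws. In the adopt style, the only unguarded step is therefore the `new` of the box itself.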

Andy-Jost · 2026-06-23T20:42:12Z

+        new NvmlEventSetBox{{handle}},
+        [](NvmlEventSetBox* b) {
+            if (p_nvmlEventSetFree && b->resource.raw) {
+                p_nvmlEventSetFree(reinterpret_cast<void*>(b->resource.raw));


Missing GILReleaseGuard here.

Andy-Jost · 2026-06-23T20:43:29Z

+                NvmlSysEventSetFreeRequest req;
+                req.set = reinterpret_cast<void*>(b->resource.raw);
+                req.version = (unsigned int)(sizeof(NvmlSysEventSetFreeRequest) | (1u << 24u));
+                p_nvmlSysEventSetFree(&req);


ditto regarding GILReleaseGuard
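
The following sketch shows the shape of a deleter with the missing guard. A deleter runs wherever the last reference is dropped, which from Python usually means the GIL is held. The PR does not show how GILReleaseGuard is implemented, so this sketch uses a hypothetical `ScopedGilRelease` and a plain boolean in place of the real GIL, which lets it compile without Python headers. A real guard would presumably go through the Python C API (for example `PyEval_SaveThread` / `PyEval_RestoreThread`). The key point is that the GIL is dropped only for the duration of the C free call.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// A plain flag stands in for "this thread holds the GIL" so the sketch
// compiles without Python headers.
static bool g_gil_held = true;

// Hypothetical RAII guard in the spirit of GILReleaseGuard: release on
// construction, restore the previous state on destruction.
class ScopedGilRelease {
public:
    ScopedGilRelease() : was_held_(g_gil_held) { g_gil_held = false; }
    ~ScopedGilRelease() { g_gil_held = was_held_; }
    ScopedGilRelease(const ScopedGilRelease&) = delete;
    ScopedGilRelease& operator=(const ScopedGilRelease&) = delete;

private:
    bool was_held_;
};

using FreeFn = void (*)(std::intptr_t);
static FreeFn p_free = nullptr;  // registered elsewhere; may stay null
static bool g_freed_without_gil = false;

// Stand-in for the NVML free call; records whether the GIL was dropped.
static void fake_free(std::intptr_t) { g_freed_without_gil = !g_gil_held; }

struct Box { std::intptr_t raw; };

std::shared_ptr<Box> make_handle(std::intptr_t raw) {
    return std::shared_ptr<Box>(new Box{raw}, [](Box* b) {
        if (p_free && b->raw) {
            ScopedGilRelease nogil;  // GIL is released only around the C call
            p_free(b->raw);
        }
        delete b;
    });
}
```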

Andy-Jost · 2026-06-23T20:59:23Z

+NvmlEventSetFreeFn p_nvmlEventSetFree = nullptr;
+NvmlSysEventSetFreeFn p_nvmlSysEventSetFree = nullptr;
+
+void register_nvml_event_set_fn_pointers(intptr_t event_set_free_fn,
+                                         intptr_t sys_event_set_free_fn) noexcept {
+    p_nvmlEventSetFree = reinterpret_cast<NvmlEventSetFreeFn>(event_set_free_fn);
+    p_nvmlSysEventSetFree = reinterpret_cast<NvmlSysEventSetFreeFn>(sys_event_set_free_fn);
+}


Can/should the registration of these NVML symbols be made to match the driver pattern, where p_* symbols are resolved via cuda.bindings in _resource_handles.pyx? The NVML initialization adds a wrinkle so it's not clear to me what's best.
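
Whichever module ends up performing the registration, the C++ side of the pattern in the diff comes down to two steps: store an address received as an integer, and check it before every use. The following is a minimal sketch of that pattern. `EventSetFreeFn`, `register_fn_pointers`, `try_free`, and `fake_event_set_free` are all hypothetical names. The real NVML free functions return an `nvmlReturn_t` status, which the `int` here stands in for.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical signature: takes the event set, returns a status (0 = success).
using EventSetFreeFn = int (*)(void*);

static EventSetFreeFn p_event_set_free = nullptr;
static int g_free_calls = 0;

// Stand-in for the NVML function whose address the Python side would supply.
static int fake_event_set_free(void*) {
    ++g_free_calls;
    return 0;
}

// Store an address that arrives as an integer, as the PR's register function does.
void register_fn_pointers(std::intptr_t event_set_free_fn) noexcept {
    p_event_set_free = reinterpret_cast<EventSetFreeFn>(event_set_free_fn);
}

// Every use has to tolerate "never registered", e.g. if NVML was not set up.
bool try_free(void* set) {
    if (!p_event_set_free || !set) return false;
    return p_event_set_free(set) == 0;
}
```

Moving registration into `_resource_handles.pyx` would presumably change only which code calls the registration function. The null check at the point of use would still be needed as long as NVML can be uninitialized.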

Andy-Jost · 2026-06-23T21:08:49Z

+inline std::intptr_t as_intptr(const NvmlEventSetHandle& h) noexcept {
+    return h ? h->raw : 0;
+}
+
+inline std::intptr_t as_intptr(const NvmlSysEventSetHandle& h) noexcept {
+    return h ? h->raw : 0;
+}
+


I was expecting as_cu, as_intptr, and as_py to always come as a set. The NVML bindings don't appear to follow that pattern. When all three definitions collapse to the same thing, I wonder if it's better to still define each of the accessor functions. That's a genuine question: I don't know the answer.
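
To illustrate what "collapsing" means for NVML, here is a sketch of two of the three accessors for a handle whose stored value is already an integer. `OpaqueEventSet`/`EventSetPtr` stand in for NVML's opaque `nvmlEventSet_t`, and `NvmlEventSetBox` is a simplified, hypothetical layout rather than the PR's. `as_cu` returns the typed pointer a C caller would pass, and `as_intptr` returns the integer used for hashing and printing. `as_py` is omitted because it needs the Python C API; for NVML it would presumably just box the same integer.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Stand-in for NVML's opaque event-set type (a pointer to an incomplete struct).
struct OpaqueEventSet;
using EventSetPtr = OpaqueEventSet*;

// Simplified, hypothetical storage: the handle owns just the raw integer.
struct NvmlEventSetBox { std::intptr_t raw; };
using NvmlEventSetHandle = std::shared_ptr<const NvmlEventSetBox>;

// as_cu: the typed value a C or Cython caller would pass to the C API.
inline EventSetPtr as_cu(const NvmlEventSetHandle& h) noexcept {
    return h ? reinterpret_cast<EventSetPtr>(h->raw) : nullptr;
}

// as_intptr: an integer for hashing, printing, and repr.
inline std::intptr_t as_intptr(const NvmlEventSetHandle& h) noexcept {
    return h ? h->raw : 0;
}
```

The two accessors differ only in type. Keeping the full set costs a few one-line functions and lets every call site follow the same rule. Dropping them saves code but makes NVML a special case.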

Andy-Jost · 2026-06-23T21:11:22Z

+        # If device_register_events raises, create_nvml_event_set_handle already
+        # owns the handle and its shared_ptr deleter will free it.
+        self._h_event_set = create_nvml_event_set_handle(raw_set)
+        nvml.device_register_events(self._device_handle, event_bitmask, raw_set)


This is where I would expect to see as_py(self._h_event_set) in place of raw_set. Not because it changes anything functionally but because it promotes a consistent rule: "to dereference a handle for passing to a Python function, use as_py."
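
The ordering in the quoted hunk (take ownership first, then make the fallible call) can be sketched in C++ as shown below. `fake_create`, `fake_free`, `fake_register`, and `open_event_set` are hypothetical stand-ins for NVML's create call, the deleter, and `nvml.device_register_events`. The fallible call reads the value back through an accessor on the handle instead of reusing a separate local, which is the habit the comment suggests. In the Cython code that accessor would be `as_py`; here `as_intptr` plays its role.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <stdexcept>

static int g_live = 0;  // event sets created but not yet freed

// Hypothetical stand-ins: create/free, plus a registration step that can fail.
static std::intptr_t fake_create() { ++g_live; return 0x2000; }
static void fake_free(std::intptr_t) { --g_live; }
static void fake_register(std::intptr_t raw, bool fail) {
    if (fail || raw == 0) throw std::runtime_error("register failed");
}

struct Box { std::intptr_t raw; };
using Handle = std::shared_ptr<Box>;

inline std::intptr_t as_intptr(const Handle& h) noexcept { return h ? h->raw : 0; }

Handle open_event_set(bool fail) {
    // Ownership is taken before anything else can fail...
    Handle h(new Box{fake_create()}, [](Box* b) {
        fake_free(b->raw);
        delete b;
    });
    // ...so if registration throws, unwinding drops h and its deleter frees the set.
    fake_register(as_intptr(h), fail);
    return h;
}
```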

Andy-Jost · 2026-06-23T21:20:36Z

            If the GPU has fallen off the bus or is otherwise inaccessible.
        """
-        return EventData(nvml.event_set_wait_v2(self._event_set, timeout_ms))
+        return EventData(nvml.event_set_wait_v2(as_intptr(self._h_event_set), timeout_ms))


Similar to the above comment. My mental model:

for hash, printf, repr, etc. --> as_intptr

for passing to Python or returning to a user --> as_py

for passing to Cython --> as_cu

Maybe it makes sense to do something different for NVML; or, is it better to have consistency at every call site?

github-actions added the cuda.core label ("Everything related to the cuda.core module") Jun 22, 2026

mdboom mentioned this pull request Jun 22, 2026 in #2234, "cuda.core: document the canonical resource-lifetime pattern in AGENTS.md" (open)

mdboom requested a review from Andy-Jost June 22, 2026 15:04

mdboom force-pushed the use-shared-ptr-for-event-set branch from b5904e5 to 80b0d60 June 22, 2026 16:47

Use shared_ptr for event sets

a41bf7c

mdboom force-pushed the use-shared-ptr-for-event-set branch from 80b0d60 to a41bf7c June 23, 2026 13:36

Merge branch 'main' into use-shared-ptr-for-event-set

342d011

Andy-Jost reviewed Jun 23, 2026