Skip to content

Use shared_ptr for NVML event_set resource management#2240

Draft
mdboom wants to merge 2 commits into
NVIDIA:mainfrom
mdboom:use-shared-ptr-for-event-set
Draft

Use shared_ptr for NVML event_set resource management#2240
mdboom wants to merge 2 commits into
NVIDIA:mainfrom
mdboom:use-shared-ptr-for-event-set

Conversation

@mdboom

@mdboom mdboom commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

This is a proof-of-concept of the proposed instructions in #2234. I asked Claude to use those instructions to port the management of NVML event sets to use shared_ptr. I didn't edit its output in any way. We should review this and point out any mistakes, and then use that to improve the instructions in #2234.

@copy-pr-bot

copy-pr-bot Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions Bot added the cuda.core Everything related to the cuda.core module label Jun 22, 2026
@mdboom

mdboom commented Jun 22, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test

@mdboom mdboom requested a review from Andy-Jost June 22, 2026 15:04
@mdboom mdboom force-pushed the use-shared-ptr-for-event-set branch from b5904e5 to 80b0d60 Compare June 22, 2026 16:47
@mdboom

mdboom commented Jun 22, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test

@github-actions

Copy link
Copy Markdown

@mdboom mdboom force-pushed the use-shared-ptr-for-event-set branch from 80b0d60 to a41bf7c Compare June 23, 2026 13:36
@mdboom

mdboom commented Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test

@mdboom

mdboom commented Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test

};
} // namespace

NvmlEventSetHandle create_nvml_event_set_handle(intptr_t handle) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An open design question: some create_* functions take an already-created handle (as this one does), meaning allocation is done in Cython. Others take construction arguments and create the handle inside.

I have not been able to settle on one approach, but it seems like something we could standardize.

Here, if p_nvmlEventSetFree is NULL, or new fails, the handle would be leaked unless the caller guards it (granted, those seem like remote possibilities). Construction parameters would resolve that and also enable possible caching or recycling.

On the other hand, accepting construction parameters makes a fatter interface, with potentially many functions that just wrap existing CUDA APIs,

I don't have a coherent analysis, just a few disconnected thoughts like this.

new NvmlEventSetBox{{handle}},
[](NvmlEventSetBox* b) {
if (p_nvmlEventSetFree && b->resource.raw) {
p_nvmlEventSetFree(reinterpret_cast<void*>(b->resource.raw));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing GILReleaseGuard here.

NvmlSysEventSetFreeRequest req;
req.set = reinterpret_cast<void*>(b->resource.raw);
req.version = (unsigned int)(sizeof(NvmlSysEventSetFreeRequest) | (1u << 24u));
p_nvmlSysEventSetFree(&req);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto regarding GILReleaseGuard

Comment on lines +1344 to +1351
NvmlEventSetFreeFn p_nvmlEventSetFree = nullptr;
NvmlSysEventSetFreeFn p_nvmlSysEventSetFree = nullptr;

void register_nvml_event_set_fn_pointers(intptr_t event_set_free_fn,
intptr_t sys_event_set_free_fn) noexcept {
p_nvmlEventSetFree = reinterpret_cast<NvmlEventSetFreeFn>(event_set_free_fn);
p_nvmlSysEventSetFree = reinterpret_cast<NvmlSysEventSetFreeFn>(sys_event_set_free_fn);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can/should the registration of these NVML symbols be made to match the driver pattern, where p_* symbols are resolved via cuda.bindings in _resource_handles.pyx? The NVML initialization adds a wrinkle so it's not clear to me what's best.

Comment on lines +718 to +725
inline std::intptr_t as_intptr(const NvmlEventSetHandle& h) noexcept {
return h ? h->raw : 0;
}

inline std::intptr_t as_intptr(const NvmlSysEventSetHandle& h) noexcept {
return h ? h->raw : 0;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting as_cu, as_intptr, and as_py to always come as a set. The NVML bindings don't appear to follow that pattern. When all three definitions collapse to the same thing, I wonder if it's better to still define each of the accessor functions. That's a genuine question: I don't know the answer.

# If device_register_events raises, create_nvml_event_set_handle already
# owns the handle and its shared_ptr deleter will free it.
self._h_event_set = create_nvml_event_set_handle(raw_set)
nvml.device_register_events(self._device_handle, event_bitmask, raw_set)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where I would expect to see as_py(self._h_event_set) in place of raw_set. Not because it changes anything functionally but because it promotes a consistent rule: "to dereference a handle for passing to a Python function, use as_py."

If the GPU has fallen off the bus or is otherwise inaccessible.
"""
return EventData(nvml.event_set_wait_v2(self._event_set, timeout_ms))
return EventData(nvml.event_set_wait_v2(as_intptr(self._h_event_set), timeout_ms))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the above comment. My mental model:

  • for hash, printf, repr, etc. --> as_intptr
  • for passing to Python or returning to a user --> as_py
  • for passing to Cython --> as_cu

Maybe it makes sense to do something different for NVML; or, is it better to have consistency at every call site?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuda.core Everything related to the cuda.core module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants