Fix/system monitor #875
Open
jaagut wants to merge 3 commits into
Enhance GPU monitoring by integrating NVIDIA and AMD detection, updating collection methods, and adding support for nvidia-ml-py package
Force-pushed from `b051716` to `33366a8`.
Pull request overview
Improves the ROS2 system_monitor package to collect and publish more robust system workload metrics (notably GPU stats) across different hardware backends, and ensures the required Workload message is generated in bitbots_msgs.
Changes:
- Add `Workload.msg` to `bitbots_msgs` interface generation.
- Refactor GPU monitoring to auto-detect NVIDIA (NVML), Jetson (sysfs), and AMD (pyamdgpuinfo) backends; tighten type consistency in collectors.
- Adjust sampling behavior (CPU smoothing + lower default update frequency) and add `nvidia-ml-py` to the Pixi environment.
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/bitbots_msgs/CMakeLists.txt | Adds Workload.msg to rosidl-generated interfaces so downstream nodes can publish/subscribe it. |
| src/bitbots_misc/system_monitor/system_monitor/network_interfaces.py | Adds return type annotations for interface collection helpers. |
| src/bitbots_misc/system_monitor/system_monitor/monitor.py | Updates GPU collector call signature and aligns default “disabled” tuple types; minor comment grammar fix. |
| src/bitbots_misc/system_monitor/system_monitor/memory.py | Adds a typed return annotation for memory stats collection. |
| src/bitbots_misc/system_monitor/system_monitor/gpu.py | Replaces single-backend AMD logic with auto-detected NVIDIA/Jetson/AMD backends and improved error handling/logging. |
| src/bitbots_misc/system_monitor/system_monitor/cpus.py | Adds EMA smoothing for CPU usage values and updates return/type annotations. |
| src/bitbots_misc/system_monitor/config/config.yaml | Lowers default update frequency from 10 Hz to 2 Hz. |
| pixi.toml | Adds nvidia-ml-py dependency for NVML-based monitoring. |
| pixi.lock | Locks nvidia-ml-py into all environments. |
| .vscode/settings.json | Adds dictionary words related to new GPU monitoring terms. |
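The backend auto-detection described for `gpu.py` can be sketched as an ordered list of probes where the first one that succeeds wins. This is a minimal sketch, not the PR's code: the probe order simply follows the overview's listing (NVIDIA, Jetson, AMD), and the names `detect_gpu_backend`, `JETSON_LOAD_PATH` and the sysfs path itself are assumptions. The NVIDIA probe initializes NVML and counts devices instead of only checking that `pynvml` imports, because `pixi.lock` installs `nvidia-ml-py` into all environments, so an importable package says nothing about whether an NVIDIA GPU is present. The AMD probe only checks that `pyamdgpuinfo` is installed.

```python
from __future__ import annotations

from importlib.util import find_spec
from pathlib import Path
from typing import Callable, Optional, Sequence, Tuple

# Assumed sysfs location of the Jetson GPU load file (varies by module/L4T release).
JETSON_LOAD_PATH = Path("/sys/devices/gpu.0/load")

Probe = Tuple[str, Callable[[], bool]]


def _nvml_usable() -> bool:
    """True only if NVML initializes and reports at least one NVIDIA device."""
    try:
        import pynvml
    except ImportError:
        return False
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return False  # package installed, but no NVIDIA driver/library
    try:
        return pynvml.nvmlDeviceGetCount() > 0
    finally:
        pynvml.nvmlShutdown()


def _amd_package_installed() -> bool:
    # Only checks that pyamdgpuinfo is importable, not that an AMD GPU exists.
    return find_spec("pyamdgpuinfo") is not None


DEFAULT_PROBES: Sequence[Probe] = (
    ("nvidia", _nvml_usable),
    ("jetson", JETSON_LOAD_PATH.exists),
    ("amd", _amd_package_installed),
)


def detect_gpu_backend(probes: Sequence[Probe] = DEFAULT_PROBES) -> Optional[str]:
    """Return the name of the first backend whose probe succeeds, else None."""
    for name, probe in probes:
        try:
            if probe():
                return name
        except Exception:
            # A probe that raises (e.g. a driver error) just means "not this backend".
            continue
    return None
```

Because probes are plain callables, the detection order can be tested without hardware: `detect_gpu_backend([("jetson", lambda: True)])` returns `"jetson"`.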
Comment on lines +79 to +85
```python
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
load = float(pynvml.nvmlDeviceGetUtilizationRates(handle).gpu)
mem_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
vram_used = mem_info.used
vram_total = mem_info.total
temperature = float(pynvml.nvmlDeviceGetTemperature(handle, 0))
return (load, vram_used, vram_total, temperature)
```

```python
if raw_load is None:
    continue
# Jetson reports GPU load in permille on current L4T kernels.
load = raw_load / 10.0
```
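The Jetson fragment converts a permille reading to percent. A self-contained sketch of reading and converting that value, assuming the load comes from a single sysfs file; the default path `/sys/devices/gpu.0/load` is an assumption (the location differs between Jetson modules and L4T releases), and both function names are hypothetical:

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional

# Assumed path; the GPU load file lives elsewhere on some Jetson modules/L4T releases.
DEFAULT_JETSON_LOAD_FILE = Path("/sys/devices/gpu.0/load")


def parse_jetson_load(raw: str) -> Optional[float]:
    """Convert the raw sysfs text (permille, 0..1000) to a percentage."""
    try:
        permille = int(raw.strip())
    except ValueError:
        return None
    return permille / 10.0


def read_jetson_gpu_load(path: Path = DEFAULT_JETSON_LOAD_FILE) -> Optional[float]:
    """Return GPU load in percent, or None if the file cannot be read or parsed."""
    try:
        raw = path.read_text()
    except OSError:
        return None  # not a Jetson, or the file is at a different path
    return parse_jetson_load(raw)
```

Keeping parsing separate from file access lets the permille conversion be checked without Jetson hardware, e.g. `parse_jetson_load("1000")` gives `100.0`.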
Comment on lines +76 to +82
```python
# smooth short-term sampling noise with exponential moving average
prev = _prev_usage[cpu_num]
if prev == 0.0:
    smoothed = float(round(raw_usage, 2))
else:
    smoothed = float(round((raw_usage * _EMA_ALPHA) + (prev * (1.0 - _EMA_ALPHA)), 2))
```
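As written, `prev == 0.0` doubles as the "no previous sample" marker, so a CPU whose smoothed usage really is 0.0 (fully idle) has its next reading published unsmoothed. A sketch that keeps initialization state separate from the value, assuming per-CPU state can live in a dictionary; `CpuUsageSmoother` and the default `alpha` of 0.3 are hypothetical, since the PR's `_EMA_ALPHA` value is not shown in the excerpt:

```python
from __future__ import annotations

from typing import Dict, Optional

# Assumed smoothing factor; the PR's actual _EMA_ALPHA is not part of the excerpt.
EMA_ALPHA = 0.3


class CpuUsageSmoother:
    """Per-CPU exponential moving average with an explicit 'no sample yet' state."""

    def __init__(self, alpha: float = EMA_ALPHA) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self._alpha = alpha
        self._prev: Dict[int, float] = {}  # a missing key means "not initialized yet"

    def update(self, cpu_num: int, raw_usage: float) -> float:
        prev: Optional[float] = self._prev.get(cpu_num)
        if prev is None:
            smoothed = raw_usage  # the first sample seeds the average
        else:
            smoothed = raw_usage * self._alpha + prev * (1.0 - self._alpha)
        # Store full precision; round only the value that gets published.
        self._prev[cpu_num] = smoothed
        return round(smoothed, 2)
```

With `alpha=0.5`, an idle reading of `0.0` followed by `100.0` yields `50.0` instead of jumping straight to `100.0`. Rounding only the returned value also keeps rounding error from accumulating in the stored average.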
Comment on lines +72 to +90
```python
def _collect_nvidia(node: Node) -> tuple[float, int, int, float]:
    """Collect GPU metrics from NVIDIA GPU using pynvml."""
    try:
        import pynvml

        pynvml.nvmlInit()
        try:
            handle = pynvml.nvmlDeviceGetHandleByIndex(0)
            load = float(pynvml.nvmlDeviceGetUtilizationRates(handle).gpu)
            mem_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
            vram_used = mem_info.used
            vram_total = mem_info.total
            temperature = float(pynvml.nvmlDeviceGetTemperature(handle, 0))
            return (load, vram_used, vram_total, temperature)
        finally:
            try:
                pynvml.nvmlShutdown()
            except Exception:
                pass
    # (the outer try's handler follows outside the commented lines)
```
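The excerpt calls `nvmlInit()` and `nvmlShutdown()` around every sample. NVML initialization can be comparatively expensive, so one alternative is to initialize once and reuse the device handle for the collector's lifetime. A minimal sketch under that assumption; `NvmlCollector` is a hypothetical name, only the first device (index 0) is sampled as in the excerpt, and `pynvml.NVML_TEMPERATURE_GPU` is the named constant for the bare `0` passed to `nvmlDeviceGetTemperature`:

```python
from __future__ import annotations

import importlib
from typing import Optional, Tuple


class NvmlCollector:
    """Keeps NVML initialized between samples instead of re-initializing per call."""

    def __init__(self, device_index: int = 0) -> None:
        self._nvml = None
        self._handle = None
        try:
            pynvml = importlib.import_module("pynvml")
        except ImportError:
            return  # nvidia-ml-py not installed: backend unavailable
        try:
            pynvml.nvmlInit()
        except pynvml.NVMLError:
            return  # no NVIDIA driver/library on this machine
        try:
            self._handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        except pynvml.NVMLError:
            pynvml.nvmlShutdown()  # undo the init if the device is missing
            self._handle = None
            return
        self._nvml = pynvml

    @property
    def available(self) -> bool:
        return self._nvml is not None

    def sample(self) -> Optional[Tuple[float, int, int, float]]:
        """Return (load %, VRAM used, VRAM total, temperature °C) or None."""
        if self._nvml is None:
            return None
        nvml = self._nvml
        util = nvml.nvmlDeviceGetUtilizationRates(self._handle)
        mem = nvml.nvmlDeviceGetMemoryInfo(self._handle)
        temp = nvml.nvmlDeviceGetTemperature(self._handle, nvml.NVML_TEMPERATURE_GPU)
        return (float(util.gpu), int(mem.used), int(mem.total), float(temp))

    def close(self) -> None:
        """Shut NVML down once; later samples return None."""
        if self._nvml is not None:
            try:
                self._nvml.nvmlShutdown()
            except self._nvml.NVMLError:
                pass
        self._nvml = None
        self._handle = None
```

In a ROS node, `close()` would typically be called from the node's shutdown path; sampling after `close()` returns `None`, the same as on a machine without an NVIDIA GPU.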
Comment on lines +186 to +188
```
If `node` is provided the ROS node's logger will be used for messages.

node: ROS node for logging (required for backend detection and error logging)
```
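The docstring describes `node` both as optional ("If `node` is provided…") and as required. If the optional reading is the intended one, a fallback logger keeps error reporting working without a node. A sketch, assuming only that a node exposes `get_logger()` as rclpy's `Node` does; `resolve_logger` and the logger name `system_monitor.gpu` are hypothetical:

```python
from __future__ import annotations

import logging
from typing import Any, Optional


def resolve_logger(node: Optional[Any] = None) -> Any:
    """Return the ROS node's logger when a node is given, else a stdlib logger.

    Only relies on `node.get_logger()`, so rclpy does not need to be imported here.
    """
    if node is not None:
        return node.get_logger()
    return logging.getLogger("system_monitor.gpu")
```

Both rclpy's node logger and `logging.Logger` provide `info`, `warning` and `error`, so call sites can stay the same either way.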
Summary
Proposed changes
Related issues
Checklist
pixi run build