diff --git a/.agents/skills/README.md b/.agents/skills/README.md
index fcb6560..df36a24 100644
--- a/.agents/skills/README.md
+++ b/.agents/skills/README.md
@@ -35,6 +35,7 @@ Agent skills for performance engineering across the NVIDIA Omniverse stack (Isaa
 |---|---|
 | [diagnose-perf](diagnose-perf/) | Quick triage — identify bottleneck category without profiling tools |
 | [perf-tuning](perf-tuning/) | Apply specific fixes for known performance issues |
+| [kit-app-streaming-debug](kit-app-streaming-debug/) | Debug Kit app livestream bottlenecks using StreamSDK logs and browser WebRTC stats |
 
 ## Workflow
 
@@ -54,6 +55,9 @@ Specialized profiling:
 - **tracy-memory** — when investigating memory leaks or allocation hotspots
 - **nvtx-python** — when profiling Python in standalone Isaac Lab (no Kit runtime)
 
+Specialized streaming:
+- **kit-app-streaming-debug** — when WebRTC/native app streaming lags, freezes, disconnects, drops frames, or logs StreamSDK warnings
+
 Routing boundaries:
 - Use **benchmark-*** skills to run benchmark scripts and read benchmark outputs, not to diagnose or fix bottlenecks.
 - Use **diagnose-perf** for first-pass bottleneck triage before full profiling.
diff --git a/.agents/skills/kit-app-streaming-debug/SKILL.md b/.agents/skills/kit-app-streaming-debug/SKILL.md
new file mode 100644
index 0000000..14e2bd6
--- /dev/null
+++ b/.agents/skills/kit-app-streaming-debug/SKILL.md
@@ -0,0 +1,100 @@
+---
+name: kit-app-streaming-debug
+description: Use when investigating Kit app livestream performance bottlenecks, WebRTC/native StreamSDK lag, freezes, dropped frames, browser WebRTC stats, copy fence timeouts, NVST_R_BUSY, disconnects, or resolution mismatch warnings in omni.kit.livestream.
+---
+
+# Kit App Streaming Debug
+
+Debug Kit livestream performance end-to-end before jumping into generic profiling. Always collect both server-side Kit logs and browser-side WebRTC evidence, then correlate by timestamp around the lag/freeze event.
+
+## Collect Kit Evidence
+
+Run the Kit app with StreamSDK event tracing and verbose file logs. Keep `--no-window` for app-stream tests when resolution mismatch is suspected.
+
+```bash
+# Windows
+.\repo.bat launch -- --no-window --/exts/omni.kit.livestream.app/primaryStream/enableEventTracing=true --/log/channels/omni.kit.livestream.streamsdk=verbose --/log/fileLogLevel=verbose
+
+# Linux
+./repo.sh launch -- --no-window --/exts/omni.kit.livestream.app/primaryStream/enableEventTracing=true --/log/channels/omni.kit.livestream.streamsdk=verbose --/log/fileLogLevel=verbose
+```
+
+If extension context is needed, also enable:
+
+```bash
+--/log/channels/omni.kit.livestream.webrtc=verbose
+--/log/channels/omni.kit.livestream.app=verbose
+```
+
+Extract the known streaming failure markers:
+
+```bash
+rg -n -C 3 "Timeout of [0-9]+ms exceeded waiting for copy fence|Cannot stream video frame because the video stream is not connected|Still cannot stream video frame|Cannot stream video frame with resolution|NVST_R_BUSY|Client disconnected from WebRTC server" path/to/kit.log
+```
+
+Interpret them this way:
+
+| Marker | Likely cause | Next check |
+|---|---|---|
+| `Timeout of 1000ms exceeded waiting for copy fence` | Renderer/GPU stall before the framebuffer copy completes | Correlate with frame time, GPU utilization, and Tracy/NSight render zones |
+| `Cannot stream video frame because the video stream is not connected` | WebRTC session has not established, or video stream dropped | Check browser ICE state, connection events, and nearby disconnect logs |
+| `Still cannot stream video frame...` | Connection did not recover inside the wait timeout, frame dropped | Treat as stream/session instability, not render cost by itself |
+| `Cannot stream video frame with resolution ... differs...` | OS/window resize or AOV/render resolution changed after connect | Try `--no-window`, fixed resolution, disable viewport fill behavior, or use dynamic resize only if required |
+| `NVST_R_BUSY` | StreamSDK is not accepting frames fast enough | If not adjacent to disconnect/reconnect, suspect encoder, network, or client backpressure |
+| `Client disconnected from WebRTC server` | Client/session dropped | Align with browser ICE/NACK/PLI/freeze events |
+
+## Collect Browser Evidence
+
+Open `chrome://webrtc-internals` before connecting to the stream. After a lag event, use **Create dump** and inspect packet loss, jitter, frames dropped, decode time, bitrate, freezes, NACK/PLI, and the selected ICE candidate pair.
+
+For `~/proj/ov-web-rtc`, run the dev client with `npm run dev` and capture console logs. Add or inspect an `onStreamStats` callback like:
+
+```ts
+onStreamStats: (message: StatsEvent) => {
+  const s = message.data.stats;
+  console.table({
+    fps: s.fps,
+    rtdMs: s.rtd,
+    decodeMs: s.avgDecodeTime,
+    frameLoss: s.frameLoss,
+    packetLoss: s.packetLoss,
+    bandwidthMbps: s.totalBandwidth,
+    bitrateMbps: s.currentBitrate,
+    utilizedPct: s.utilizedBandwidth,
+    resolution: `${s.streamingResolutionWidth}x${s.streamingResolutionHeight}`,
+    codec: s.codec,
+  });
+}
+```
+
+Browser-side interpretation:
+
+| Symptom | Likely bottleneck |
+|---|---|
+| High `avgDecodeTime`, low packet loss | Client decode/GPU/display bottleneck |
+| High `packetLoss`, `frameLoss`, NACK/PLI, jitter, or freezes | Network or congestion problem |
+| `utilizedBandwidth` near 100%, bitrate near/above available bandwidth | Bandwidth cap or congestion control |
+| Low browser FPS with clean browser stats but Kit copy-fence/NVST warnings | Server render/copy/encoder/backpressure |
+| Browser resolution differs from expected Kit stream resolution | Resize/configuration mismatch |
+
+## Profiling Follow-Up
+
+Use Tracy only after logs indicate the server side is involved. Add the profiling args to the same Kit run:
+
+```bash
+--/app/profilerBackend=tracy --/app/profileFromStart=true --/profiler/gpu/tracyInject/enabled=true --/app/profilerMask=1 --/plugins/carb.profiler-tracy.plugin/fibersAsThreads=false --/profiler/channels/carb.events/enabled=false --/profiler/channels/carb.tasking/enabled=false --/rtx/addTileGpuAnnotations=true --/profiler/enabled=true --/profiler/gpu=true --enable omni.kit.profiler.window
+```
+
+Use the `profiling` skill for capture mechanics and `nsys-analyze` for trace analysis. Do not promise StreamSDK-internal zones when StreamSDK source is unavailable; correlate log timestamps with surrounding Kit zones such as `SharedFrameBuffer::waitForCopy`, `SharedFrameBuffer::streamBuffer`, render zones, and frame markers.
+
+## Report Shape
+
+Summarize:
+
+1. Kit run args and log path.
+2. Browser artifacts: console stats and `webrtc-internals` dump.
+3. Timeline around the lag event.
+4. Matched markers and likely bottleneck category: renderer/GPU, session/network, resize/config, StreamSDK/encoder backpressure, or client decode.
+5. Next action: config fix, network/client investigation, or server-side profiling.
+
+Source references in this workspace: `~/proj/kit-livestream` for `omni.kit.livestream.app` and `omni.kit.livestream.webrtc`; `~/proj/ov-web-rtc` for browser stats and dev client logging.
diff --git a/README.md b/README.md
index 116f884..0cc6274 100644
--- a/README.md
+++ b/README.md
@@ -94,6 +94,7 @@ This repo ships a set of [Claude Code agent skills](.agents/skills/) for working
 - [nvtx-python](.agents/skills/nvtx-python/SKILL.md) — trace Python functions with NVTX outside Kit/Carbonite
 - [diagnose-perf](.agents/skills/diagnose-perf/SKILL.md) — first-responder triage for slow FPS, stutter, or latency
 - [perf-tuning](.agents/skills/perf-tuning/SKILL.md) — apply guide-backed fixes for known performance bottlenecks
+- [kit-app-streaming-debug](.agents/skills/kit-app-streaming-debug/SKILL.md) — debug Kit app livestream bottlenecks using StreamSDK logs and browser WebRTC stats
 
 ## Security
 