[OvPhysX] Snapshot env_0 stage pre-clone to speed up OvPhysX warm-up#5679
[OvPhysX] Snapshot env_0 stage pre-clone to speed up OvPhysX warm-up#5679AntoineRichard wants to merge 9 commits into
Conversation
Add the ovphysx entry to the FrameView factory and teach _get_backend to recognise OvPhysxManager so calls to FrameView(...) return the OVPhysX-backed view instead of falling through to FabricFrameView.
Implement a Warp-native batched-prim view that mirrors NewtonSiteFrameView's site-resolution approach against the OVPhysX scene data provider's body_q array. Each prim resolves at init to a (body_idx, site_local) pair via USD ancestor walk; world poses are computed on GPU as body_q[bid] * site_local. Scales and visibility delegate to an internal UsdFrameView. The view defers initialization to PhysicsEvent.PHYSICS_READY when the SDP is not yet built. Adds contract-suite coverage via the shared frame_view_contract_utils plus OVPhysX-specific tests for factory dispatch, deferred-init guarding, and the requires_newton_model error path.
Read body poses directly from an OVPhysX RIGID_BODY_POSE tensor binding
(mirroring the ContactSensor data path) instead of going through the
scene data provider's Newton state. Scenes that don't declare
requires_newton_model=True (e.g. headless training without a Newton-style
visualizer) can now use OvPhysxFrameView -- the previous design raised at
PHYSICS_READY because the SDP exposes body_q only when the Newton model
build is requested.
Site discovery handles both InteractiveScene modes:
* clone_usd=True: every env has USD prims, one per env matches the
pattern and resolves directly to its rigid-body ancestor.
* clone_usd=False: only env_0 has authored USD prims; env_1..N are
physics-layer clones. The binding row count becomes the authoritative
site count, and per-env site paths are synthesized from the env_0
template prim with env_0 replaced by each row's env_id.
The Warp kernels are unchanged. count, prim_paths, and prims honor the
clone_usd=False expansion. Tests updated to drive _pose_buf directly
(detach the binding after one warm-up read) instead of mutating SDP
body_q, and the now-obsolete requires_newton_model error test is removed.
Under OVPhysX's clone_usd=False scenes (the default for InteractiveScene), USD only holds env_0 -- env_1..N are physics-layer clones via physx.clone(). SensorBase.__init__ derives _num_envs from len(find_matching_prims(...)) so any FrameView-using sensor (RayCaster, MultiMeshRayCaster, Camera) sees _num_envs=1 even when the scene has many envs. The sensor's _reset_mask_torch is then sized 1, and the first reset(env_ids=[0..N-1]) triggers a CUDA assert from out-of-bounds indexing. Walk the call stack at construction time to capture the sensor that owns this view, then at the end of _initialize_impl re-allocate its env-sized buffers (_ALL_ENV_MASK, _reset_mask + torch view, _is_outdated, _timestamp, _timestamp_last_update) to match the OVPhysX RIGID_BODY_POSE binding's row count. Duck-typed -- works for any SensorBase subclass without an isaaclab.sensors import dependency. Mirrors the local fix the OVPhysX ContactSensor already applies to itself at contact_sensor.py:240-248. This is a hack confined to the OVPhysX backend. A cleaner long-term fix would source _num_envs from InteractiveScene.num_envs (or move the under-cloning behaviour to be opt-in for rendering rather than backend- keyed), but both are larger changes that touch core IsaacLab. Verified: Anymal-D rough velocity env with presets=ovphysx + 64 envs now initializes past env.reset() without the CUDA assert.
InteractiveScene now USD-replicates the per-env asset subtree for every backend including OVPhysX (clone_usd=True). OVPhysX's add_usd + physx.clone tolerates the extra USD content at runtime, so the per-env USD prims layer cleanly on top of the physics-side replicas. Sensors that discover _num_envs via find_matching_prims now see N prims under OVPhysX too, matching the behaviour they already had under PhysX and Newton. Removes the OvPhysxFrameView workaround that the previous clone_usd=False behaviour had forced: * OvPhysxFrameView no longer walks the call stack to patch the caller sensor's _num_envs. _num_envs now reaches the binding row count naturally. FrameView parity fixes inspired by a PhysX/Newton review: * OvPhysxFrameView._initialize_impl rejects prim_paths that resolve to a rigid body with a clear ValueError, mirroring the NewtonSiteFrameView guard. FrameView is for non-physics sensor frames; callers should use RigidObject/Articulation for body-level control. * OvPhysxFrameView.get_scales / set_scales docstrings now make explicit that these read/write the static USD xformOp:scale attribute and do not propagate to PhysX collision-shape scale (unlike Newton's shape_scale path).
InteractiveScene now USD-replicates every env's asset subtree for all backends (clone_usd=True from the previous commit), but OvPhysxManager was still handing the full N-env stage to physx.add_usd. The wheel loaded all N USD-defined bodies as independent prims, so the subsequent physx.clone() ran onto already-populated targets and never produced the clone-lineage that the wheel's create_tensor_binding fast path expects. At 4096 envs this turned every binding-creation call into a multi-second USD enumeration -- the hang in articulation init. Re-shape _warmup_and_load to export an env_0-scoped USD file: 1. Export the full stage to disk (existing flatten-and-write). 2. Re-open the exported file as an Sdf.Layer. 3. Delete every /World/envs/env_<i> prim spec for i != 0 from the layer. 4. Re-export. The live USD stage held by SimulationContext is untouched -- sensors (RayCaster, Camera, ContactSensor discovery) still see N envs and discover _num_envs = N correctly. Only the file passed to the wheel is scoped to env_0 + globals. physx.clone() then repopulates env_1..N at the physics layer with proper lineage, and create_tensor_binding walks a 1-USD-path result that auto-extends across the N clones -- the fast path that clone_usd=False used to give us implicitly. Net effect: keeps the previous commit's clone_usd=True flip (sensor parity across backends) while restoring OVPhysX's per-env scaling. No test changes required; the FrameView/ContactSensor suite stays at 27/27 pass on CPU.
Adds a Limitations section to _export_env0_only_stage covering the three assumptions the workaround makes: * Homogeneous envs -- per-env USD-authored physics overrides (mass, friction, collision filters under env_<i!=0>) are dropped from the file handed to physx.add_usd. Sensors and visualizers still see them in the live stage, so a divergence is possible. Per-env state has to be written via the runtime APIs instead. * Global path convention -- physics-relevant prims must live outside /World/envs (or under env_0) to survive the export. * Static topology -- envs added/removed after warmup require a re-warmup with a re-exported stage.
Mirrors the same fix applied to the contact sensor cfg/test; develop's isaaclab.utils no longer re-exports configclass as a callable, so the test's @configclass decorator needs the explicit module import.
OvPhysxManager._warmup_and_load previously called sim_stage.Export() on the live USD stage and then stripped /World/envs/env_<i!=0> from the exported file. With clone_usd=True, the live stage carries every env's authored subtree, so the export step flattens 4096 envs to disk -- a 31s cost at scale on Anymal-D Rough even though the eventual wheel input is env_0-only. ovphysx_replicate runs from InteractiveScene.clone_environments BEFORE cloner.usd_replicate inflates the stage. At that point the live stage holds only env_0's authored content (plus globals), and a stage.Export of the un-cloned stage is cheap. Register that snapshot on OvPhysxManager and have _warmup_and_load consume it directly, falling back to the old export-and-strip path for callers that don't flow through InteractiveScene.clone_environments (tests, single-env scenes).
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot — Code Review Summary
Overview
This PR introduces a significant performance optimization (~31s speedup at 4096 envs) by snapshotting the USD stage before clone inflation, along with a new OvPhysxFrameView for GPU-native pose queries.
✅ Strengths
-
Sound Architecture: The pre-clone snapshot approach correctly identifies that
ovphysx_replicateruns beforeusd_replicate, making it the ideal interception point. -
Defensive Fallback: The
_export_env0_only_stagefallback preserves compatibility for callers that bypassInteractiveScene.clone_environments. -
Well-Documented Limitations: The docstring for
_export_env0_only_stageclearly outlines the homogeneous-env assumption, global path convention, and static topology constraints. -
Comprehensive OvPhysxFrameView: The new FrameView implementation includes proper handling of both
clone_usd=Trueandclone_usd=Falsemodes with synthetic path expansion. -
Solid Test Coverage: Contract-suite integration and specific deferred-init tests are included.
🔴 Critical Issues
None identified — the implementation is sound.
🟡 Important Suggestions
-
State Reset Completeness (
ovphysx_manager.py)In
initialize(),_pre_clone_stage_pathis reset toNone, but if a previous run's temp file still exists (e.g., the process didn't clean up), and_tmp_diris reused, there could be stale file confusion. Consider explicitly cleaning the snapshot file ininitialize():if cls._pre_clone_stage_path and os.path.exists(cls._pre_clone_stage_path): try: os.remove(cls._pre_clone_stage_path) except OSError: pass cls._pre_clone_stage_path = None
-
Idempotency Edge Case (
register_pre_clone_stage_snapshot)The method is documented as idempotent, but the guard
if cls._pre_clone_stage_path is not None: returndoesn't verify the file still exists. If the file was deleted between calls, subsequent warmup would fail. Consider:if cls._pre_clone_stage_path is not None: if os.path.exists(cls._pre_clone_stage_path): return # File was deleted; allow re-snapshot cls._pre_clone_stage_path = None
-
Layer Caching Clarification (
_export_env0_only_stage)Using
Sdf.Layer.FindOrOpenmay return a cached layer from a previous session. Consider adding a comment explaining this is intentional, or useSdf.Layer.OpenAsAnonymousif cache isolation is preferred.
🟢 Minor Notes
-
Import Style: The
import osinside_ensure_physx_schemas_registered()is redundant sinceosis already imported at module level. Consider removing the local import. -
Logging Consistency:
logger.infois used for snapshot creation butlogger.debugfor the "no /World/envs" case. Consider consistent log levels for similar operational messages. -
Magic Constant:
WORLD_BODY_INDEX = -1is well-named but could benefit from a brief docstring explaining it's used in Warp kernels to indicate world-attached sites. -
Test Fixture Clarity: The test file comment about detaching the binding for synthetic pose injection is helpful — consider adding a similar note in the OvPhysxFrameView class docstring about the binding lifecycle.
📋 Test Coverage Assessment
Covered:
- Factory dispatch to OvPhysxFrameView ✓
- Deferred initialization guard ✓
- Shared FrameView contract suite ✓
Gaps to Consider (non-blocking):
- No explicit test for the pre-clone snapshot path vs. fallback path
- No test for
_export_env0_only_stagewith edge cases (single env, missing/World/envs) - No benchmark test validating the 31s speedup claim
Verdict: Approve ✅
This is a well-designed performance optimization with proper fallback handling and good test coverage. The suggestions above are minor improvements that don't block merging. The ~31s speedup at scale is a meaningful improvement for training workflows.
Greptile SummaryThis PR cuts ~31 s from OVPhysX warmup at large env counts by snapshotting the USD stage inside
Confidence Score: 3/5The warmup snapshot logic is sound for the happy path, but two defects in the fallback and kitless code paths need fixing before merge. The new ovphysx_manager.py (call site for Important Files Changed
Sequence DiagramsequenceDiagram
participant IS as InteractiveScene
participant OR as ovphysx_replicate
participant UR as usd_replicate
participant OM as OvPhysxManager
participant Stage as USD Stage
IS->>OR: "physics_clone_fn() [stage = env_0 only]"
OR->>OM: register_pre_clone_stage_snapshot(stage)
OM->>Stage: stage.Export(scene_env0.usda)
Note over OM: _pre_clone_stage_path set
OR->>OM: register_clone(source, targets, positions)
IS->>UR: usd_replicate() [inflates stage to env_0..N]
Note over Stage: Stage now holds env_0..N USD prims
IS->>OM: "reset() -> _warmup_and_load()"
alt fast path
OM-->>OM: "stage_file = scene_env0.usda"
else fallback
OM->>Stage: Export full stage
OM->>OM: _export_env0_only_stage() strip env_1..N
end
OM->>OM: physx.add_usd(stage_file)
loop pending_clones
OM->>OM: physx.clone(source, targets)
end
OM->>OM: physx.warmup_gpu()
OM-->>IS: PhysicsEvent.PHYSICS_READY
|
| _physx_schemas_registered: ClassVar[bool] = False | ||
|
|
||
| @classmethod | ||
| def _ensure_physx_schemas_registered(cls) -> None: | ||
| """Register the ``PhysxSchema`` USD plugin shipped with the ovphysx wheel. | ||
|
|
||
| In Kit-based runs ``omni.physx`` registers the schema; in kitless | ||
| runs it must be registered manually before the wheel can match | ||
| ``PhysxContactReportAPI`` and friends on the stage. The wheel | ||
| bundles the plugin under ``ovphysx/plugins/usd/PhysxSchema``. This | ||
| method is idempotent — :meth:`pxr.Plug.Registry.RegisterPlugins` | ||
| is a no-op once the plugin is registered. | ||
| """ | ||
| if cls._physx_schemas_registered: | ||
| return | ||
| try: | ||
| import os # noqa: PLC0415 | ||
|
|
||
| import ovphysx # noqa: PLC0415 | ||
|
|
||
| from pxr import Plug # noqa: PLC0415 | ||
| except Exception: | ||
| return | ||
| plugin_root = os.path.join(os.path.dirname(ovphysx.__file__), "plugins", "usd") | ||
| for sub in ("PhysxSchema/resources", "PhysxSchemaAddition/resources"): | ||
| path = os.path.join(plugin_root, sub) | ||
| if os.path.isdir(path): | ||
| Plug.Registry().RegisterPlugins(path) | ||
| cls._physx_schemas_registered = True |
There was a problem hiding this comment.
_ensure_physx_schemas_registered has no call site
This method was added in this PR but is never called anywhere in the codebase (confirmed via repo-wide search). Its docstring explicitly states that in kitless runs "it must be registered manually before the wheel can match PhysxContactReportAPI and friends on the stage." Without a call in initialize() or _warmup_and_load(), kitless runs would silently skip schema registration and any USD stage with PhysxContactReportAPI, PhysxJointAPI, etc. would not resolve those schemas — causing silent failures when loading physics prims.
| layer = Sdf.Layer.FindOrOpen(target_file) | ||
| if layer is None: |
There was a problem hiding this comment.
After
sim_stage.Export(target_file), using Sdf.Layer.FindOrOpen(target_file) looks up the path in the Sdf layer registry first. On a re-warmup in the same process (where _tmp_dir is reused and the path is identical), the registry would return the cached layer with old content rather than the freshly exported file. Sdf.Layer.Reload() must be called on the found layer if it was already open, or the file should be opened anonymously to bypass the registry entirely.
| layer = Sdf.Layer.FindOrOpen(target_file) | |
| if layer is None: | |
| # Use FindOrOpen and unconditionally reload so that a re-warmup in the | |
| # same process (reusing the same tmp path) sees the freshly-exported file | |
| # rather than a stale cached layer from the Sdf registry. | |
| layer = Sdf.Layer.FindOrOpen(target_file) | |
| if layer is not None: | |
| layer.Reload() | |
| if layer is None: |
| try: | ||
| import os # noqa: PLC0415 | ||
|
|
||
| import ovphysx # noqa: PLC0415 | ||
|
|
||
| from pxr import Plug # noqa: PLC0415 | ||
| except Exception: | ||
| return |
There was a problem hiding this comment.
import os is a standard-library module that will never fail to import. Including it inside the broad except Exception: return block means any ImportError from ovphysx or pxr.Plug silently suppresses schema registration, but the root cause (missing wheel or missing pxr) is completely hidden. Move import os outside the try block.
| try: | |
| import os # noqa: PLC0415 | |
| import ovphysx # noqa: PLC0415 | |
| from pxr import Plug # noqa: PLC0415 | |
| except Exception: | |
| return | |
| try: | |
| import ovphysx # noqa: PLC0415 | |
| from pxr import Plug # noqa: PLC0415 | |
| except Exception: | |
| return |
Description
Cuts ~31 seconds off OVPhysX warmup at 4096 envs.
The OvPhysX warmup path in :meth:
~isaaclab_ovphysx.physics.OvPhysxManager._warmup_and_loadpreviously calledstage.Export(target_file)on the live USD stage and then opened the resulting file as an :class:Sdf.Layerto delete/World/envs/env_<i!=0>from it. Withclone_usd=True(introduced by PR #5678), the live stage carries every per-env subtree authored, so the export step had to flatten 4096 envs to disk before the strip could discard most of them. Measured cost onIsaac-Velocity-Rough-Anymal-D-v0at 4096 envs: ~31 s of pure setup.:func:
~isaaclab_ovphysx.cloner.ovphysx_replicateruns from :meth:~isaaclab.scene.InteractiveScene.clone_environmentsbefore :func:cloner.usd_replicateinflates the stage. At that moment the live stage holds onlyenv_0's authored content (plus globals), andstage.Exportis cheap. This PR snapshots the stage there and has :meth:_warmup_and_loadconsume the snapshot directly. The old export-and-strip path is preserved as a fallback for callers that bypassInteractiveScene.clone_environments(single-env tests, etc.).Depends on PR #5678 (FrameView) for the existing
_export_env0_only_stagefallback path.Fixes # (issue)
Type of change
Screenshots
N/A — backend init perf, no UI surface.
Checklist
pre-commitchecks with./isaaclab.sh --formatsource/<pkg>/changelog.d/for every touched package (do not editCHANGELOG.rstor bumpextension.toml— CI handles that)CONTRIBUTORS.mdor my name already exists there