feat: Fabric-accelerated get/set_local_poses via indexedfabricarray#5677
feat: Fabric-accelerated get/set_local_poses via indexedfabricarray#5677pv-nvidia wants to merge 7 commits into
Conversation
There was a problem hiding this comment.
🔍 Code Review Update
Review of new commits: PR was rebased and expanded since last review.
Previously reviewed:b15d6235| Now reviewing:80838c00
Summary
The PR has been expanded from 5 to 6 commits, adding Fabric-accelerated get/set_local_poses:
- Service locator infrastructure on
SimulationContext - Service locator tests + changelog
- Indexed Fabric transform kernels in
isaaclab.utils.warp.fabric FabricStageCacheas a shared hierarchy handle- (NEW) Merge commit to consolidate branches
- (NEW)
FabricFrameViewrewrite withget/set_local_poses+ dirty tracking
✅ New Additions Since Last Review
- Fabric-accelerated local poses:
set_local_poses/get_local_posesnow usewp.indexedfabricarrayto read/writeomni:fabric:localMatrixdirectly on the GPU - Bidirectional world↔local sync:
set_world_poses→ recomputes localMatrix via_sync_local_from_world()set_local_poses→ marks_world_dirty, world recomputed on nextget_world_poses
- Per-view dirty tracking:
_world_dirtyflag is instance-scoped, so concurrent views on the same stage don't clear each other's flag - Parent matrix handling:
_build_parent_indexed_array()+_compute_parent_fabric_indices()for parent world matrix lookups - Topology-adaptive:
PrepareForReuse()calls +_rebuild_trans_ro_arrays()for automatic recovery - Comprehensive tests: 13 new integration tests covering local/world consistency, rotated/scaled parents, multi-view isolation
🔧 Remaining Observations
[Minor] Index array dtype mismatch still present
The kernels declare indices: ArrayUInt32, but _compute_fabric_indices() returns dtype=wp.int32:
# fabric_frame_view.py
return wp.array(indices, dtype=wp.int32, device=self._device)Warp will silently cast, so this works in practice. Suggestion: switch to dtype=wp.uint32 for consistency with the kernel signatures. Not blocking.
[Minor] Undefined buffer references in get_local_poses
get_local_poses references self._fabric_local_translations_buf and self._fabric_local_orientations_buf:
if use_cached:
translations_wp = self._fabric_local_translations_buf
orientations_wp = self._fabric_local_orientations_bufThese don't appear to be initialized in _initialize_fabric(). Verify these buffers are created alongside the existing world-pose buffers.
📋 Architecture Notes
The world↔local propagation design is clean:
- Write world → update local:
_sync_local_from_world()runsupdate_indexed_local_matrix_from_worldkernel immediately after world writes - Write local → lazy world update:
_world_dirtyflag defersupdate_indexed_world_matrix_from_localuntil the next world read
This asymmetry makes sense: world writes are typically followed by physics steps (which don't need locals), while local writes are often followed by world reads for rendering.
📋 Verdict
LGTM — the new local-pose acceleration is a significant feature addition. The bidirectional sync logic is well-designed, and the test coverage is comprehensive (33 test functions across all new modules). The minor dtype observation from the previous review remains, plus one potential undefined buffer issue to double-check.
Automated review by isaaclab-review-bot • Reviewed at 80838c00
40b5aeb to
80838c0
Compare
…pagation Add Warp kernels that operate on wp.indexedfabricarray for direct local↔world matrix propagation without round-tripping through USD: - decompose_indexed_fabric_transforms: extract pos/quat/scale from ifa - compose_indexed_fabric_transforms: write pos/quat/scale into ifa - update_indexed_local_matrix_from_world: local = inv(parent) * world - update_indexed_world_matrix_from_local: world = parent * local Also refactor existing kernels to use wp.where (branchless) instead of if/else for broadcast index selection. These kernels are the foundation for Fabric-accelerated get/set_local_poses in FabricFrameView.
…factory
FabricFrameView had an internal _use_fabric flag that fell back to
UsdFrameView when Fabric was disabled or the device was unsupported.
This violated single-responsibility: FabricFrameView pretended to be
one class but sometimes behaved as another.
Now the FrameView factory handles all dispatch:
- PhysX + Fabric enabled + supported device → FabricFrameView
- PhysX without Fabric (or unsupported device) → UsdFrameView
- Newton → NewtonSiteFrameView
FabricFrameView no longer checks _use_fabric or _fabric_supported_devices.
It assumes Fabric is available (the factory guarantees this).
UsdFrameView is eagerly registered on the factory since it lives in
isaaclab (not a backend package), so FactoryBase's dynamic import
(isaaclab_{backend}.sim.views) can't discover it.
Extract _compose_fabric_transform() to deduplicate the kernel-launch logic shared by set_world_poses and set_scales. The initial USD->Fabric sync now composes position, orientation, and scale in one call, so PrepareForReuse is invoked exactly once per logical update. Also replace assert with RuntimeError in _rebuild_fabric_arrays so the topology-change guard survives python -O.
Introduces FabricStageCache — a lightweight cache for the usdrt stage attachment and IFabricHierarchy handles, registered as a service on SimulationContext via the service locator (set_service/get_service). Multiple FabricFrameView instances now share a single hierarchy handle instead of each creating its own. The cache is automatically closed on SimulationContext.clear_instance(). Also replaces the assert device check with a proper RuntimeError and removes the now-unused isaaclab.sim import from fabric_frame_view.py.
# Conflicts: # source/isaaclab_physx/isaaclab_physx/sim/views/fabric_frame_view.py
- get_local_poses / set_local_poses via indexed fabric kernels - get_scales / set_scales with full Fabric acceleration - Per-view dirty tracking with automatic re-propagation - Topology-change recovery via _rebuild_fabric_arrays - 13 new integration tests (49 total pass)
80838c0 to
9ff3155
Compare
There was a problem hiding this comment.
🔍 Code Review Update
Reviewing new commits: PR rebased and updated since last review.
Previously reviewed:eb5582ec| Now reviewing:9ff3155
Summary
This PR delivers a well-architected feature: GPU-accelerated local-pose operations for FabricFrameView. The implementation is comprehensive and addresses the core limitation where local poses previously fell back to USD round-trips.
✅ Strengths
1. Clean Architecture
FabricStageCacheprovides shared hierarchy handles via the service locator pattern, avoiding per-view duplication- Three persistent selections (
trans_sel_ro,world_sel_rw,local_sel_rw) cleanly separate read vs. write access patterns - Factory dispatch in
FrameView._get_backend()correctly routes toUsdFrameViewwhen Fabric is unavailable
2. Robust World↔Local Consistency
- Bidirectional dirty tracking:
set_local_posesmarks_world_dirty, deferred until next world read _sync_local_from_world()/_sync_world_from_local_if_dirty()keep matrices consistent- Per-view dirty flags prevent concurrent views from clearing each other's pending syncs
3. Topology-Adaptive Design
PrepareForReuse()+ lazy array rebuild in_get_*_array()handles Fabric memory layout changes_rebuild_trans_ro_arrays()consolidates index and indexed-array refresh
4. Excellent Test Coverage
- 13+ new integration tests covering rotated parents, scaled parents, multi-view isolation
test_set_local_then_get_world_with_rotated_parentvalidates transpose-convention correctnesstest_multi_view_per_view_dirty_isolationcatches per-stage vs. per-view flag bugs
🔧 Minor Observations
[Minor] Index array dtype mismatch
_compute_fabric_indices() returns dtype=wp.int32, but kernels declare indices: ArrayUInt32:
return wp.array(indices, dtype=wp.int32, device=self._device)Warp silently casts, so this works. Suggestion: use dtype=wp.uint32 for consistency with kernel signatures. Not blocking.
[Nit] Docstring transpose-convention note
The docstrings for update_indexed_local_matrix_from_world and update_indexed_world_matrix_from_local explain the transpose identity well. Consider adding a brief note that this relies on Fabric's row-major storage convention for future maintainers.
[Style] Empty sentinel shape
_fabric_empty_2d_array_sentinel uses shape (0, 0):
self._fabric_empty_2d_array_sentinel = wp.zeros((0, 0), dtype=wp.float32, device=self._device)This is fine since the kernels gate on shape[0] > 0, but (0, 3) or (0, 4) might be slightly more self-documenting for unused position/quaternion slots.
📋 Architecture Notes
The asymmetric sync strategy is well-reasoned:
- Write world → sync local immediately:
_sync_local_from_world()runs right after world writes because downstream code (e.g., rendering) typically reads locals soon after - Write local → lazy world sync:
_world_dirtyflag defers theworld = parent * localkernel until the next world read, avoiding unnecessary computation when multiple local writes occur before a world read
📋 CI Status
CI checks are currently pending. The pre-commit and changelog checks have passed.
📋 Verdict
LGTM — This is a significant feature addition that completes the Fabric acceleration story for FabricFrameView. The bidirectional sync logic is sound, test coverage is thorough, and the codebase is well-documented. The minor dtype observation is non-blocking.
Automated review by isaaclab-review-bot • Reviewed at 9ff31550
Motivation
Resolves the core feature request: Fabric-accelerated local-pose operations for
FabricFrameView. Previously, only world poses were GPU-accelerated via Fabric — local poses fell back to USD round-trips, defeating the purpose of Fabric for parent-child hierarchies.Changes
New operations on
FabricFrameView:get_local_poses()omni:fabric:localMatrixvia indexed kernelsset_local_poses()get_scales()set_scales()Dirty propagation:
set_local_poses()marks world matrix dirty → nextget_world_poses()re-propagatesset_world_poses()marks local matrix dirty → nextget_local_poses()re-derivesTopology recovery:
_rebuild_fabric_arrays(): whenPrepareForReuse()detects topology changes (prims added/removed), all selections and indexed arrays are rebuilt automatically instead of crashingInfrastructure used:
FabricStageCacheservice (feat: add FabricStageCache service for shared hierarchy handles #5676) for shared hierarchy handleswp.indexedfabricarrayfor zero-copy access to Fabric dataTests
49 tests pass (36 existing + 13 new):
Dependencies
Merge Order
This PR is part of a stacked series for Fabric-accelerated local poses: