Skip to content

feat: Fabric-accelerated get/set_local_poses via indexedfabricarray#5677

Draft
pv-nvidia wants to merge 7 commits into
isaac-sim:developfrom
pv-nvidia:pv/fabric-local-poses
Draft

feat: Fabric-accelerated get/set_local_poses via indexedfabricarray#5677
pv-nvidia wants to merge 7 commits into
isaac-sim:developfrom
pv-nvidia:pv/fabric-local-poses

Conversation

@pv-nvidia
Copy link
Copy Markdown
Contributor

Motivation

Resolves the core feature request: Fabric-accelerated local-pose operations for FabricFrameView. Previously, only world poses were GPU-accelerated via Fabric — local poses fell back to USD round-trips, defeating the purpose of Fabric for parent-child hierarchies.

Changes

New operations on FabricFrameView:

Method Description
get_local_poses() Read local pos/quat from omni:fabric:localMatrix via indexed kernels
set_local_poses() Write local pos/quat → propagate to world matrix
get_scales() Read scale from Fabric world matrix
set_scales() Write scale → recompose world matrix

Dirty propagation:

  • set_local_poses() marks world matrix dirty → next get_world_poses() re-propagates
  • set_world_poses() marks local matrix dirty → next get_local_poses() re-derives
  • Per-view tracking: multiple views don't interfere with each other's dirty state

Topology recovery:

  • _rebuild_fabric_arrays(): when PrepareForReuse() detects topology changes (prims added/removed), all selections and indexed arrays are rebuilt automatically instead of crashing

Infrastructure used:

Tests

49 tests pass (36 existing + 13 new):

  • Local-pose round-trips (set→get identity)
  • World↔local propagation correctness
  • Per-view dirty isolation
  • Topology-change recovery
  • Scaled parent/child combinations

Dependencies


Merge Order

This PR is part of a stacked series for Fabric-accelerated local poses:

  1. feat: add typed service locator to SimulationContext #5672 — service locator
  2. feat: add FabricStageCache service for shared hierarchy handles #5676 — FabricStageCache
  3. feat: add indexed fabric transform kernels for local/world matrix propagation #5675 — indexed fabric kernels
  4. refactor: move Fabric/USD dispatch from FabricFrameView to FrameView factory #5673 — factory dispatch (independent refactor)
  5. refactor: fuse set_world_poses/set_scales into single _compose_fabric_transform #5674 — fused compose (independent refactor)
  6. feat: Fabric-accelerated get/set_local_poses via indexedfabricarray #5677 — Fabric local poses ← this PR (depends on feat: add typed service locator to SimulationContext #5672, feat: add FabricStageCache service for shared hierarchy handles #5676, feat: add indexed fabric transform kernels for local/world matrix propagation #5675)

@github-actions github-actions Bot added documentation Improvements or additions to documentation isaac-lab Related to Isaac Lab team labels May 18, 2026
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 Code Review Update

Review of new commits: PR was rebased and expanded since last review.
Previously reviewed: b15d6235 | Now reviewing: 80838c00

Summary

The PR has been expanded from 5 to 6 commits, adding Fabric-accelerated get/set_local_poses:

  1. Service locator infrastructure on SimulationContext
  2. Service locator tests + changelog
  3. Indexed Fabric transform kernels in isaaclab.utils.warp.fabric
  4. FabricStageCache as a shared hierarchy handle
  5. (NEW) Merge commit to consolidate branches
  6. (NEW) FabricFrameView rewrite with get/set_local_poses + dirty tracking

✅ New Additions Since Last Review

  • Fabric-accelerated local poses: set_local_poses / get_local_poses now use wp.indexedfabricarray to read/write omni:fabric:localMatrix directly on the GPU
  • Bidirectional world↔local sync:
    • set_world_poses → recomputes localMatrix via _sync_local_from_world()
    • set_local_poses → marks _world_dirty, world recomputed on next get_world_poses
  • Per-view dirty tracking: _world_dirty flag is instance-scoped, so concurrent views on the same stage don't clear each other's flag
  • Parent matrix handling: _build_parent_indexed_array() + _compute_parent_fabric_indices() for parent world matrix lookups
  • Topology-adaptive: PrepareForReuse() calls + _rebuild_trans_ro_arrays() for automatic recovery
  • Comprehensive tests: 13 new integration tests covering local/world consistency, rotated/scaled parents, multi-view isolation

🔧 Remaining Observations

[Minor] Index array dtype mismatch still present

The kernels declare indices: ArrayUInt32, but _compute_fabric_indices() returns dtype=wp.int32:

# fabric_frame_view.py
return wp.array(indices, dtype=wp.int32, device=self._device)

Warp will silently cast, so this works in practice. Suggestion: switch to dtype=wp.uint32 for consistency with the kernel signatures. Not blocking.

[Minor] Undefined buffer references in get_local_poses

get_local_poses references self._fabric_local_translations_buf and self._fabric_local_orientations_buf:

if use_cached:
    translations_wp = self._fabric_local_translations_buf
    orientations_wp = self._fabric_local_orientations_buf

These don't appear to be initialized in _initialize_fabric(). Verify these buffers are created alongside the existing world-pose buffers.

📋 Architecture Notes

The world↔local propagation design is clean:

  • Write world → update local: _sync_local_from_world() runs update_indexed_local_matrix_from_world kernel immediately after world writes
  • Write local → lazy world update: _world_dirty flag defers update_indexed_world_matrix_from_local until the next world read

This asymmetry makes sense: world writes are typically followed by physics steps (which don't need locals), while local writes are often followed by world reads for rendering.

📋 Verdict

LGTM — the new local-pose acceleration is a significant feature addition. The bidirectional sync logic is well-designed, and the test coverage is comprehensive (33 test functions across all new modules). The minor dtype observation from the previous review remains, plus one potential undefined buffer issue to double-check.


Automated review by isaaclab-review-bot • Reviewed at 80838c00

@pv-nvidia pv-nvidia force-pushed the pv/fabric-local-poses branch 3 times, most recently from 40b5aeb to 80838c0 Compare May 18, 2026 17:10
pv-nvidia added 7 commits May 20, 2026 13:52
…pagation

Add Warp kernels that operate on wp.indexedfabricarray for direct
local↔world matrix propagation without round-tripping through USD:

- decompose_indexed_fabric_transforms: extract pos/quat/scale from ifa
- compose_indexed_fabric_transforms: write pos/quat/scale into ifa
- update_indexed_local_matrix_from_world: local = inv(parent) * world
- update_indexed_world_matrix_from_local: world = parent * local

Also refactor existing kernels to use wp.where (branchless) instead of
if/else for broadcast index selection.

These kernels are the foundation for Fabric-accelerated get/set_local_poses
in FabricFrameView.
…factory

FabricFrameView had an internal _use_fabric flag that fell back to
UsdFrameView when Fabric was disabled or the device was unsupported.
This violated single-responsibility: FabricFrameView pretended to be
one class but sometimes behaved as another.

Now the FrameView factory handles all dispatch:
- PhysX + Fabric enabled + supported device → FabricFrameView
- PhysX without Fabric (or unsupported device) → UsdFrameView
- Newton → NewtonSiteFrameView

FabricFrameView no longer checks _use_fabric or _fabric_supported_devices.
It assumes Fabric is available (the factory guarantees this).

UsdFrameView is eagerly registered on the factory since it lives in
isaaclab (not a backend package), so FactoryBase's dynamic import
(isaaclab_{backend}.sim.views) can't discover it.
Extract _compose_fabric_transform() to deduplicate the kernel-launch
logic shared by set_world_poses and set_scales. The initial USD->Fabric
sync now composes position, orientation, and scale in one call, so
PrepareForReuse is invoked exactly once per logical update.

Also replace assert with RuntimeError in _rebuild_fabric_arrays so the
topology-change guard survives python -O.
Introduces FabricStageCache — a lightweight cache for the usdrt stage
attachment and IFabricHierarchy handles, registered as a service on
SimulationContext via the service locator (set_service/get_service).

Multiple FabricFrameView instances now share a single hierarchy handle
instead of each creating its own. The cache is automatically closed on
SimulationContext.clear_instance().

Also replaces the assert device check with a proper RuntimeError and
removes the now-unused isaaclab.sim import from fabric_frame_view.py.
# Conflicts:
#	source/isaaclab_physx/isaaclab_physx/sim/views/fabric_frame_view.py
- get_local_poses / set_local_poses via indexed fabric kernels
- get_scales / set_scales with full Fabric acceleration
- Per-view dirty tracking with automatic re-propagation
- Topology-change recovery via _rebuild_fabric_arrays
- 13 new integration tests (49 total pass)
@pv-nvidia pv-nvidia force-pushed the pv/fabric-local-poses branch from 80838c0 to 9ff3155 Compare May 20, 2026 14:11
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 Code Review Update

Reviewing new commits: PR rebased and updated since last review.
Previously reviewed: eb5582ec | Now reviewing: 9ff3155

Summary

This PR delivers a well-architected feature: GPU-accelerated local-pose operations for FabricFrameView. The implementation is comprehensive and addresses the core limitation where local poses previously fell back to USD round-trips.

✅ Strengths

1. Clean Architecture

  • FabricStageCache provides shared hierarchy handles via the service locator pattern, avoiding per-view duplication
  • Three persistent selections (trans_sel_ro, world_sel_rw, local_sel_rw) cleanly separate read vs. write access patterns
  • Factory dispatch in FrameView._get_backend() correctly routes to UsdFrameView when Fabric is unavailable

2. Robust World↔Local Consistency

  • Bidirectional dirty tracking: set_local_poses marks _world_dirty, deferred until next world read
  • _sync_local_from_world() / _sync_world_from_local_if_dirty() keep matrices consistent
  • Per-view dirty flags prevent concurrent views from clearing each other's pending syncs

3. Topology-Adaptive Design

  • PrepareForReuse() + lazy array rebuild in _get_*_array() handles Fabric memory layout changes
  • _rebuild_trans_ro_arrays() consolidates index and indexed-array refresh

4. Excellent Test Coverage

  • 13+ new integration tests covering rotated parents, scaled parents, multi-view isolation
  • test_set_local_then_get_world_with_rotated_parent validates transpose-convention correctness
  • test_multi_view_per_view_dirty_isolation catches per-stage vs. per-view flag bugs

🔧 Minor Observations

[Minor] Index array dtype mismatch

_compute_fabric_indices() returns dtype=wp.int32, but kernels declare indices: ArrayUInt32:

return wp.array(indices, dtype=wp.int32, device=self._device)

Warp silently casts, so this works. Suggestion: use dtype=wp.uint32 for consistency with kernel signatures. Not blocking.

[Nit] Docstring transpose-convention note

The docstrings for update_indexed_local_matrix_from_world and update_indexed_world_matrix_from_local explain the transpose identity well. Consider adding a brief note that this relies on Fabric's row-major storage convention for future maintainers.

[Style] Empty sentinel shape

_fabric_empty_2d_array_sentinel uses shape (0, 0):

self._fabric_empty_2d_array_sentinel = wp.zeros((0, 0), dtype=wp.float32, device=self._device)

This is fine since the kernels gate on shape[0] > 0, but (0, 3) or (0, 4) might be slightly more self-documenting for unused position/quaternion slots.

📋 Architecture Notes

The asymmetric sync strategy is well-reasoned:

  • Write world → sync local immediately: _sync_local_from_world() runs right after world writes because downstream code (e.g., rendering) typically reads locals soon after
  • Write local → lazy world sync: _world_dirty flag defers the world = parent * local kernel until the next world read, avoiding unnecessary computation when multiple local writes occur before a world read

📋 CI Status

CI checks are currently pending. The pre-commit and changelog checks have passed.

📋 Verdict

LGTM — This is a significant feature addition that completes the Fabric acceleration story for FabricFrameView. The bidirectional sync logic is sound, test coverage is thorough, and the codebase is well-documented. The minor dtype observation is non-blocking.


Automated review by isaaclab-review-bot • Reviewed at 9ff31550

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant