[Fix] Exclude numpy 2.3.5 from every IsaacLab install path#5656
Conversation
There was a problem hiding this comment.
Automated Review Summary
PR #5656: [Fix] Force numpy>=2.4.1 in --install to escape OpenBLAS atfork SIGSEGV
✅ Overview
This PR addresses a critical SIGSEGV crash caused by numpy 2.3.5's vendored OpenBLAS (libscipy_openblas64_-fdde5778.so) during SimulationApp startup. The fix forces numpy >= 2.4.1 after the pin-pink installation step.
🔍 Key Findings
✅ Strengths
-
Correct Root Cause Identification: The PR correctly identifies that
pip install --upgrade --force-reinstall pin pin-pink==3.1.0 daqp==0.8.5resolves numpy independently, landing on 2.3.5 due to cmeel-boost'snumpy<2.4cap. -
Strategic Placement: Adding the numpy upgrade after
_ensure_pink_ik_dependencies_installedensures it's the final pip operation, preventing subsequent steps from downgrading numpy. -
Graceful Error Handling: Using
check=Falsewith a clear warning allows the installation to complete even if the numpy upgrade fails, while informing users of the risk. -
Good Documentation: The inline comment references upstream issues (
numpy/numpy#30092,OpenMathLib/OpenBLAS#5520) and internal tracking (OMPE-92261). -
Comprehensive Testing: 54/54 Pink IK unit tests verified locally against numpy 2.4.5.
⚠️ Minor Observations
-
Early Return Addition (line 226): The
returnstatement added after the pin-pink install failure is correct but represents a subtle behavior change - previously the function would continue (doing nothing useful) whereas now it explicitly returns. This is actually an improvement. -
Dependency Warning: Users may see a pip resolver warning about cmeel-boost's
numpy<2.4constraint. This is documented in the PR description as cosmetic and expected.
📋 Files Changed
| File | Changes | Assessment |
|---|---|---|
install.py |
+18 lines | ✅ Clean implementation |
jichuanh-force-numpy-install.rst |
+13 lines | ✅ Proper changelog format |
🧪 CI Status
Pre-commit and wheel build passing. Installation tests pending.
📌 Verdict
LGTM - This is a well-reasoned fix for a critical crash. The implementation follows existing patterns, handles errors gracefully, and is well-documented. The approach of forcing numpy >= 2.4.1 as the final pip step is appropriate given the constraint resolution complexities.
🤖 Review generated by isaaclab-review-bot
Update (1499d08): Reviewed new changes since beab776:
📁 New/Updated Files
| File | Status | Assessment |
|---|---|---|
docker/Dockerfile.curobo |
Added numpy upgrade | ✅ Applies same fix to cuRobo Docker image |
changelog.d/jichuanh-force-numpy-install.rst |
Added | ✅ Comprehensive changelog entry |
install.py |
Enhanced | ✅ Better documentation with upstream issue refs |
🔍 Changes Analysis
-
Docker cuRobo Fix: Adds
RUN ${ISAACLAB_PATH}/isaaclab.sh -p -m pip install --upgrade "numpy>=2.4.1"after curobo and isaaclab_teleop installs. This is necessary becausenvidia-curoboandisaaclab_teleoppull index-retargeting→pin→ cmeel-boost, which would otherwise drag numpy back to 2.3.5. -
Changelog Entry: Well-written documentation covering both the base
--installfix and the Dockerfile.curobo fix. Explains the root cause clearly. -
install.py Enhancement: The
_ensure_numpy_above_openblas_atfork_bugfunction now has comprehensive docstring explaining the technical details and rationale.
✅ Verdict
Changes look good. The cuRobo Docker fix is a necessary addition to ensure the numpy upgrade is applied consistently across all installation paths. LGTM.
🤖 Update by isaaclab-review-bot
Update (6c1ad76): Reviewed new changes since 1499d08:
📁 New/Updated Files
| File | Status | Assessment |
|---|---|---|
source/isaaclab/setup.py |
Updated | ✅ Added numpy>=2,!=2.3.5 constraint |
source/isaaclab_rl/setup.py |
Updated | ✅ Added numpy>=2,!=2.3.5 constraint |
source/isaaclab_tasks/setup.py |
Updated | ✅ Added numpy>=2,!=2.3.5 constraint |
source/isaaclab_visualizers/setup.py |
Updated | ✅ Added numpy>=2,!=2.3.5 constraint |
isaaclab_rl/changelog.d/... |
Added | ✅ Changelog entry |
isaaclab_tasks/changelog.d/... |
Added | ✅ Changelog entry |
isaaclab_visualizers/changelog.d/... |
Added | ✅ Changelog entry |
🔍 Changes Analysis
-
Defense in Depth: The
!=2.3.5exclusion in all setup.py files adds another layer of protection by explicitly blocking the broken numpy version from pip's intermediate resolutions, not just forcing an upgrade at the end. -
Consistent Constraint Pattern: All subpackages now use consistent
numpy>=2,!=2.3.5specification with comments pointing to the main explanation insource/isaaclab/setup.py. -
Comprehensive Changelogs: Each affected package has its own changelog entry documenting the fix.
✅ Verdict
Excellent additions. This defense-in-depth approach is a good improvement over relying solely on the post-install upgrade. By excluding numpy 2.3.5 from the package requirements, pip will actively avoid it during resolution rather than potentially installing it and then upgrading. This makes the fix more robust across different installation scenarios. LGTM.
🤖 Update by isaaclab-review-bot
Update (eae5a01): Reviewed changes since 6c1ad76:
🔄 Strategy Change: From Forced Upgrade to Exclusion
The approach has been simplified from forcing numpy>=2.4.1 upgrade to exclusively using numpy!=2.3.5 exclusions across all install paths.
📁 Changes Summary
| File | Change | Assessment |
|---|---|---|
docker/Dockerfile.base |
ARM pre-install: numpy → numpy!=2.3.5 |
✅ Consistent exclusion |
docker/Dockerfile.curobo |
Removed post-install upgrade, added numpy!=2.3.5 to curobo install |
✅ Cleaner approach |
source/isaaclab/install.py |
Passes numpy!=2.3.5 in pip commands instead of forcing upgrade |
✅ Less aggressive |
source/isaaclab_mimic/setup.py |
Added numpy!=2.3.5 |
✅ New coverage |
source/isaaclab_teleop/setup.py |
Added numpy!=2.3.5 |
✅ New coverage |
source/isaaclab/changelog.d/... |
Updated with comprehensive list of exclusion sites | ✅ Thorough docs |
🔍 Analysis
This is a cleaner approach compared to the previous force-upgrade strategy:
- Less Intrusive: Exclusion (
!=2.3.5) lets pip choose any valid numpy version rather than forcing a specific minimum (>=2.4.1) - Better Compatibility: Won't conflict with other constraints that might cap numpy below 2.4
- Consistent Pattern: All sites now use the same
numpy!=2.3.5pattern - Comprehensive Coverage: Added exclusions to previously-missed packages (
isaaclab_mimic,isaaclab_teleop)
✅ Verdict
LGTM - The simplified exclusion-only approach is more robust and less likely to cause resolver conflicts. Good decision to pivot from the upgrade-forcing strategy. All install paths are now covered with the !=2.3.5 exclusion.
🤖 Update by isaaclab-review-bot
beab776 to
1499d08
Compare
1499d08 to
6c1ad76
Compare
numpy 2.3.5 ships a vendored OpenBLAS (libscipy_openblas64_-fdde5778.so) whose pthread_atfork handler crashes Kit's libomni.platforminfo fork() during SimulationApp startup. The release is excluded at every site that pulls numpy directly or transitively, so no pip resolve during isaaclab.sh --install or any Docker image build can land on it -- even transiently: source/isaaclab/setup.py source/isaaclab_tasks/setup.py source/isaaclab_rl/setup.py source/isaaclab_visualizers/setup.py source/isaaclab_teleop/setup.py (transitive via dex-retargeting) source/isaaclab_mimic/setup.py (transitive via h5py) isaaclab.cli.commands.install._ensure_pink_ik_dependencies_installed isaaclab.cli.commands.install._maybe_preinstall_arm_nlopt docker/Dockerfile.base (ARM nlopt prep) docker/Dockerfile.curobo (ARM nlopt prep + nvidia-curobo install) Each touchpoint adds only the ``!=2.3.5`` exclusion; no other version constraints are introduced. Validated: - env_isaaclab_test smoke test (numpy 2.4.5 + cmeel pinocchio + pink + daqp + qpsolvers all import; toy IK solve OK). - IsaacLab Pink IK unit tests: 54/54 pass against numpy 2.4.5. - PR isaac-sim#5655 worst-case run (diagnostic imports numpy before pytest spawns Kit, the order that originally crashed): 36 pass / 0 fail. The isaaclab_physx surface gripper SIGSEGV is gone. Related: numpy/numpy#30092, OpenMathLib/OpenBLAS#5520
6c1ad76 to
eae5a01
Compare
Greptile SummaryThis PR adds
Confidence Score: 5/5Safe to merge. All ten install sites are consistently updated with the !=2.3.5 exclusion, and the final resolved numpy version (2.3.4) is validated by CI and Pink IK unit tests. The change is narrow and surgical — it adds a single version exclusion string at every install site that can pull numpy. The pre-existing state of some setup.py files (no >=2 lower bound on isaaclab_rl, isaaclab_visualizers, isaaclab_mimic, isaaclab_teleop) is unchanged and not introduced by this PR; those packages are always co-installed with isaaclab which enforces numpy>=2. The force-reinstall path in _ensure_pink_ik_dependencies_installed is correct — the combined constraint from pin-pink / cmeel-boost (numpy<2.4) and the new !=2.3.5 exclusion pins resolution to 2.3.4. All CI checks are green and 54/54 Pink IK tests pass. No files require special attention. All ten changed install sites apply the exclusion consistently. Important Files Changed
Reviews (1): Last reviewed commit: "[Fix] Exclude numpy 2.3.5 from every Isa..." | Re-trigger Greptile |
There was a problem hiding this comment.
Automated Review Summary
PR #5656: [Fix] Exclude numpy 2.3.5 from every IsaacLab install path
Commit: eb676ca
✅ Overview
This PR addresses a critical SIGSEGV crash caused by numpy 2.3.5's vendored OpenBLAS (libscipy_openblas64_-fdde5778.so) registering a buggy pthread_atfork handler that crashes Kit's fork() during SimulationApp startup. The fix adds numpy!=2.3.5 exclusions at every install site that pulls numpy.
🔍 Key Findings
✅ Strengths
-
Comprehensive Coverage: All 10 install paths are addressed:
- 6
setup.pyfiles (isaaclab, isaaclab_tasks, isaaclab_rl, isaaclab_visualizers, isaaclab_teleop, isaaclab_mimic) - 2
install.pylocations (_maybe_preinstall_arm_nlopt,_ensure_pink_ik_dependencies_installed) - 2 Dockerfile sites (Dockerfile.base ARM prep, Dockerfile.curobo ARM + cuRobo install)
- 6
-
Correct Design Rationale: The PR correctly identifies that pip's resolver runs per-invocation, meaning constraints from already-installed packages are not enforced in subsequent resolves. The exclusion must appear at every numpy-pulling site.
-
Minimal Invasiveness: Using
!=2.3.5exclusion rather than forcing>=2.4avoids conflicts with cmeel-boost'snumpy<2.4cap, letting pip land on 2.3.4 naturally. -
Thorough Documentation:
- Comprehensive PR description explaining the root cause and design decisions
- Inline comments at each modification site referencing the main rationale
- Proper changelog entries for each affected package
-
Validated Approach: 54/54 Pink IK tests pass locally, Docker build logs audited to confirm no 2.3.5 installation, CI green.
⚠️ Acknowledged Maintenance Considerations
The PR honestly documents these ongoing maintenance requirements:
- New IsaacLab submodules with numpy deps must include the exclusion
- New Dockerfile pip lines pulling numpy need the constraint
- If numpy 2.3.6+ introduces a new regression in the 2.3.x line, additional exclusions would be needed
These are acceptable trade-offs for a targeted, surgical fix.
📋 Files Review
| Component | Files | Assessment |
|---|---|---|
| Core isaaclab | setup.py, install.py |
✅ Correct constraint placement with detailed comments |
| Sub-packages | 5 additional setup.py |
✅ Consistent pattern, appropriate cross-references |
| Docker | Dockerfile.base, Dockerfile.curobo |
✅ Mirrors Python install paths correctly |
| Changelogs | 6 .rst files |
✅ Proper format, clear explanations |
🧪 Test Coverage Assessment
Existing validation:
- Local Pink IK unit tests: 54/54 pass against numpy 2.4.5 (demonstrates ABI compatibility)
- Docker build audit: numpy 2.3.5 never installed, final state is 2.3.4
- PR #5655 diagnostic run: confirms the atfork crash is resolved
- CI: pre-commit, wheel build passing
Risk mitigation: The crash is immediately visible (SIGSEGV at startup), so any regression would be caught quickly rather than silently degrading functionality.
📌 Verdict
LGTM — This is a well-executed fix for a critical crash. The exclusion-based approach is appropriately surgical, the coverage is comprehensive across all install paths, and the documentation is thorough. The maintenance burden acknowledged by the PR is an acceptable trade-off for avoiding conflicts with upstream dependency caps.
🤖 Review generated by isaaclab-review-bot
There was a problem hiding this comment.
Automated Review Summary
PR #5656: [Fix] Exclude numpy 2.3.5 from every IsaacLab install path
Commit: b04f0da
✅ Overview
This PR addresses a critical SIGSEGV crash caused by numpy 2.3.5's vendored OpenBLAS (libscipy_openblas64_-fdde5778.so) registering a buggy pthread_atfork handler that crashes Kit's fork() during SimulationApp startup. The fix adds numpy!=2.3.5 exclusions at every install site that pulls numpy.
🔍 Key Findings
✅ Strengths
-
Comprehensive Coverage: All 10 install paths are addressed:
- 6
setup.pyfiles (isaaclab, isaaclab_tasks, isaaclab_rl, isaaclab_visualizers, isaaclab_teleop, isaaclab_mimic) - 2
install.pylocations (_maybe_preinstall_arm_nlopt,_ensure_pink_ik_dependencies_installed) - 2 Dockerfile sites (Dockerfile.base ARM prep, Dockerfile.curobo ARM + cuRobo install)
- 6
-
Correct Design Rationale: The PR correctly identifies that pip's resolver runs per-invocation, meaning constraints from already-installed packages are not enforced in subsequent resolves. The exclusion must appear at every numpy-pulling site.
-
Minimal Invasiveness: Using
!=2.3.5exclusion rather than forcing>=2.4avoids conflicts with cmeel-boost'snumpy<2.4cap, letting pip land on 2.3.4 naturally. -
Thorough Documentation:
- Comprehensive PR description explaining the root cause and design decisions
- Inline comments at each modification site referencing the main rationale
- Proper changelog entries for each affected package
-
Validated Approach: 54/54 Pink IK tests pass locally, Docker build logs audited to confirm no 2.3.5 installation, CI green.
⚠️ Observations
-
Maintenance Considerations (minor): The PR honestly documents ongoing maintenance requirements—new submodules must include the exclusion. This is an acceptable trade-off for a targeted fix.
-
No Automated Version Guard (nit): There's no automated CI test asserting numpy != 2.3.5 at runtime. The PR's §8.1 proposes such a test for a future numpy>=2.4 upgrade but defers it as out of scope. Given that the crash is immediately visible (SIGSEGV), this is acceptable for now.
-
Transitive Dependency Paths Well-Mapped (positive): The PR correctly traces transitive paths—isaaclab_teleop via dex-retargeting→pin→cmeel-boost, isaaclab_mimic via h5py—and adds the exclusion at each terminal.
📋 Files Review
| Component | Files | Assessment |
|---|---|---|
| Core isaaclab | setup.py, install.py |
✅ Correct constraint placement with detailed comments |
| Sub-packages | 5 additional setup.py |
✅ Consistent pattern, appropriate cross-references |
| Docker | Dockerfile.base, Dockerfile.curobo |
✅ Mirrors Python install paths correctly |
| Changelogs | 6 .rst files |
✅ Proper format, clear explanations |
🧪 Test Coverage Assessment
Existing validation:
- Local Pink IK unit tests: 54/54 pass against numpy 2.4.5 (demonstrates ABI compatibility)
- Docker build audit: numpy 2.3.5 never installed, final state is 2.3.4
- PR #5655 diagnostic run: confirms the atfork crash is resolved
- CI: pre-commit, wheel build, changelog checks passing
Risk mitigation: The crash is immediately visible (SIGSEGV at startup), so any regression would be caught quickly rather than silently degrading functionality.
📌 Verdict
LGTM — This is a well-executed fix for a critical crash. The exclusion-based approach is appropriately surgical, the coverage is comprehensive across all install paths, and the documentation is thorough. The maintenance burden acknowledged by the PR is an acceptable trade-off for avoiding conflicts with upstream dependency caps.
🤖 Review generated by isaaclab-review-bot
1. Summary
Adds
numpy!=2.3.5at every install site that pulls numpy (10 touchpoints: 6setup.py, 2install.pypip argvs, 2 Dockerfile pip lines) so the broken release whose vendored OpenBLAS atfork handler crashes Kit'slibomni.platforminfofork()never lands in any pip resolve. Final image ends onnumpy 2.3.4; supersedes #5642.2. Sites covered
!=2.3.5is the only delta added at each site — no>=2constraints introduced where none existed.source/isaaclab/setup.pyINSTALL_REQUIRESsource/isaaclab_tasks/setup.pyINSTALL_REQUIRESsource/isaaclab_rl/setup.pyINSTALL_REQUIRESsource/isaaclab_visualizers/setup.pyINSTALL_REQUIRESsource/isaaclab_teleop/setup.pyINSTALL_REQUIRESentrydex-retargeting→pin→ cmeel-boostsource/isaaclab_mimic/setup.pyINSTALL_REQUIRESentryh5pysource/isaaclab/.../install.py_ensure_pink_ik_dependencies_installedpip argvpin pin-pink daqpsource/isaaclab/.../install.py_maybe_preinstall_arm_nloptpip argvdocker/Dockerfile.baseRUN pip install_maybe_preinstall_arm_nloptdocker/Dockerfile.curobonvidia-curoboinstallisaaclab_teleop's post---installreinstall inDockerfile.curobois covered by the teleopsetup.pyentry — no separate Dockerfile argv needed.3. Design notes
3.1 Why declared at every site, not just
source/isaaclab/setup.pypip's resolver runs per-invocation.
pip install -e <pkg>collects only<pkg>'s direct + transitive deps for that resolve — it does not re-enforce constraints declared by other already-installed packages. Sonumpy!=2.3.5inisaaclab/setup.pyis honored whenpip install -e source/isaaclabis the target, but ignored duringpip install -e source/isaaclab_teleop(which pulls cmeel-boost) and during the pin-pink force-reinstall. The!=2.3.5constraint has to appear in every numpy-pulling resolve.3.2 Why this lands on
2.3.4, not>=2.4pin-pinkpullspin→ cmeel-boost, whoseRequires-Dist: numpy >=2.3, <2.4is a hard upper bound. Combined with!=2.3.5, pip picks2.3.4. The atfork bug is exclusive to2.3.5's vendored OpenBLAS hash, so2.3.4is safe.3.3 Path to
numpy >= 2.4(out of scope here — see §8)Adding
numpy>=2.4in the same pip resolve conflicts with cmeel-boost's<2.4cap →ResolutionImpossible. The only way past it is a separatepip install --upgrade numpy>=2.4.1invocation that runs after cmeel-boost is already installed: pip prints one resolver warning about cmeel-boost's cap, then installsnumpy 2.4.5anyway (verified locally; numpy's stable C ABI since 2.0 keeps cmeel's compiled extensions working). That step is fragile in a refactor — see §8 for the design with safety nets if it's wanted.4. Validation
Single-variable A/B between #5655 (negative arm, no fix) and #5684 (positive arm, fix cherry-picked)
Both branched from develop (
a9b62101ca6) with an identical 1-filesource/conftest.pydiagnostic that prints the resolved numpy version + bundled OpenBLAS hash at pytest session start. The only difference between the two branches is the 10-site exclusion in this PR —git diffbetween them equals exactly this PR's patch.dep-manifest in every CI job confirms which numpy bundle actually landed:
numpy 2.3.5 + libscipy_openblas64_-fdde5778.so(the broken bundle).numpy 2.3.4 + libscipy_openblas64_-8fb3d286.so(the safe bundle).Result: 4 jobs flipped FAIL → PASS — direct causal evidence of the atfork bug being resolved by this PR:
isaaclab (core) [3/3]isaaclab_mimicisaaclab_teleoptest-skillgenisaaclab_physxlibomni.platforminfo.plugin.so!std::mersenne_twister_engine, signal 11)test_contact_sensor.pyassertion (develop-side regression, likely from ContactSensor #5422)isaaclab_newtonModuleNotFoundError: 'isaaclab_newton.test.mock_interfaces'(develop-side test packaging issue from #4722)Negative-arm crash backtrace example (from
isaaclab_physx):Positive-arm same job: dep-manifest shows
numpy 2.3.4 + -8fb3d286, all tests run to completion, no SIGSEGV anywhere in the log.Cross-arm fail-on-both jobs (
isaaclab_tasks [1/3],[2/3],[3/3],test-curobo) fail for develop-side reasons independent of OpenBLAS — preset / rendering / curobo test regressions that need separate triage outside this PR's scope.Earlier local + per-version evidence (kept for completeness)
env_isaaclab_test):numpy 2.4.5smoke + 54/54 IsaacLab Pink IK unit tests pass (test_pink_ik_components.py21/21,test_local_frame_task.py24/24,test_null_space_posture_task.py9/9). The 2.3.4 surface is structurally the same — only the bundled openblas hash differs.numpyinstall/uninstall line confirmed; final statenumpy 2.3.4; noSuccessfully installed ... numpy-2.3.5anywhere.libcarb.tasking.plugin.so!make_fcontextflake).5. Risk
2.3.4is point-in-time safe. If numpy 2.3.6 ships a fresh regression in the 2.3.x line, another exclusion would need to be added at every site.!=2.3.5. Same risk as any per-package constraint.pip installline in a Dockerfile that pulls numpy needs the argv too.6. Dependencies
Single-variable A/B that validates this PR
Two paired diagnostic PRs branching off the same develop SHA, where the only delta between them is the 10-site exclusion in this PR:
source/conftest.pydep-manifest dump. Expected: numpy 2.3.5 +libscipy_openblas64_-fdde5778.so;isaaclab_physx/isaaclab_newton/isaaclab_rlSIGSEGV inlibomni.platforminfofork().eae5a01c+ identicalsource/conftest.py. Expected: numpy 2.3.4 +libscipy_openblas64_-8fb3d286.so; the three canary jobs pass cleanly.git diffbetween the two arms equals exactly this PR's 10-site exclusion patch — no other code, action.yml, or install overrides. The canary FAIL → PASS flip with only that delta is the causal proof.Both diagnostic PRs are do-not-merge; close after evidence is captured.
7. Test plan
./isaaclab.sh -f) cleannumpy 2.3.5never appears8. Out of scope / follow-ups
8.1 Upgrade to
numpy >= 2.4A defense-in-depth bump to
numpy 2.4.5(which ships the upstream OpenBLAS atfork fix and is forward-compatible with future 2.3.x regressions) requires a separate post-install upgrade step, because the cmeel-boost cap blocks>=2.4in any single pip resolve. Recommended shape if/when it's wanted:_ensure_numpy_above_openblas_atfork_bug()ininstall.py, called fromcommand_installafter_ensure_pink_ik_dependencies_installedRUN ... pip install --upgrade "numpy>=2.4.1"at end ofDockerfile.curobo(after teleop + curobo installs)--installand would otherwise drop backsource/isaaclab/test/install_ci/test_numpy_version.pyassertingnumpy.__version__ >= "2.4.1"The CI assertion is the key piece: without it, the post-install upgrade is "separate install that can easily get lost." With it, any regression is loud.
8.2 Upstream relaxation of cmeel-boost cap
Cleanest long-term fix: file an issue with cmake-wheel/cmeel-boost asking for
Requires-Dist: numpy >=2.3(drop the<2.4upper bound). numpy 2.x ABI has been stable since 2.0, so the cap is over-conservative. Once relaxed upstream, IsaacLab can resolve tonumpy >= 2.4natively without the post-install dance.References
Checklist
pre-commitchecks with./isaaclab.sh --format