[24.04_linux-nvidia-6.17-next] NVIDIA: VR: SAUCE: cxl: guard unlinked endpoints by nirmoy · Pull Request #465 · NVIDIA/NV-Kernels

nirmoy · 2026-06-15T14:46:20Z

Summary

Fix NVBug 6274048 for 24.04_linux-nvidia-6.17-next.

Launchpad: https://bugs.launchpad.net/bugs/2143032

cxlmd->endpoint starts as ERR_PTR(-ENXIO) until endpoint port registration completes. Guard CXL helper paths with IS_ERR_OR_NULL() before dereferencing it.
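The guard described above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the patch itself: `ERR_PTR()` and `IS_ERR_OR_NULL()` are re-implemented here to mirror the kernel's `include/linux/err.h` helpers, the two structs are stripped-down stand-ins for the kernel types, and `memdev_endpoint_depth()` is a hypothetical helper invented for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space stand-ins for the kernel's err.h helpers (assumption:
 * same convention, the top 4095 addresses encode negative errnos). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Minimal stand-ins for the kernel structures; only the fields the
 * example needs are modelled. */
struct cxl_port {
	int depth;
};

struct cxl_memdev {
	struct cxl_port *endpoint;	/* ERR_PTR(-ENXIO) until linked */
};

/* Hypothetical helper: treat NULL and error pointers alike as
 * "endpoint not linked" before touching the port. */
static int memdev_endpoint_depth(const struct cxl_memdev *cxlmd)
{
	if (IS_ERR_OR_NULL(cxlmd->endpoint))
		return -ENXIO;

	return cxlmd->endpoint->depth;
}
```

A plain NULL check would not be enough in this model, because the not-yet-linked state is an error pointer rather than NULL; checking both keeps the helper safe whichever sentinel a caller sees.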

BOS note

I did not find evidence that this 6.17-next boot/probe NULL dereference has reproduced on BOS. BOS CXL Type-2/reset coverage is tracked separately:

PR #448, with BOS remote-branch commits 47dcd72db344a (CONFIG_VFIO_CXL_CORE) and 07fa68eb25751 (vfio_cxl_reset()).
PR #440, with BOS remote-branch commit 5eaafc5097618 (PCI/CXL: Hide SBR from reset_methods if masked by CXL).

Those SHAs are on upstream/26.04_linux-nvidia-bos. If the same early cxlmd->endpoint access is reproduced on BOS, backport this guard there too.

Testing

git diff --check
scripts/checkpatch.pl --strict --no-tree: clean
Focused CXL object build passed in a clean temp worktree with CXL options enabled; CONFIG_WERROR was disabled for an existing unrelated enum cxl_regloc_type warning.

nirmoy · 2026-06-15T14:53:51Z

BaseOS Kernel Review

Summary

No issues found across the reviewed commits.

Findings: no problems found

Latest watcher review: open review

Kernel deb build: failed (failure log, build artifacts)

Head: 2be47e06e588

This comment is maintained by nv-pr-bot. It is updated when the GitHub watcher publishes a newer review.

github-actions · 2026-06-15T14:56:46Z

PR Validation Report

Patchscan ✅ No Missing Fixes

All cherry-picked commits checked — no missing upstream fixes found.

PR Lint ✅ All checks passed

Details

Checking 1 commits...

Cherry-pick digest:
┌──────────────┬──────────────────────────────────────────────────────────────────┬────────────┬─────────┬───────────────────────────┐
│ Local        │ Referenced upstream / Patch subject                              │ Patch-ID   │ Subject │ SoB chain                 │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 2be47e06e588 │ [SAUCE] cxl: guard unlinked memdev endpoints                     │ N/A        │ N/A     │ nirmoyd                   │
└──────────────┴──────────────────────────────────────────────────────────────────┴────────────┴─────────┴───────────────────────────┘

Lint: all checks passed.

cxlmd->endpoint starts as ERR_PTR(-ENXIO) until endpoint port registration links the memdev to a real cxl_port. Treat NULL and error pointers as "endpoint not linked" before dereferencing cxlmd->endpoint in CXL helper paths.

Fixes: eb61834 ("cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation")
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>

nvmochs · 2026-06-25T19:03:15Z

No issues with this patch.

Acked-by: Matthew R. Ochs <mochs@nvidia.com>

clsotog

Acked-by: Carol L Soto <csoto@nvidia.com>

sforshee

Patch looks good.

Acked-by: Seth Forshee <sforshee@nvidia.com>

nvmochs · 2026-06-25T23:01:20Z

Merged, closing PR.

aff4ccee4530 NVIDIA: VR: SAUCE: cxl: Guard unlinked memdev endpoints

The current code updates the tail call counter (TCC) using a pre-increment approach, it stores the incremented value back to memory before performing any boundary or target validation checks. This causes two major issues:

1. When a tail call fails because the target program is NULL, the TCC is incorrectly incremented and saved in memory anyway.
2. This dummy increment implicitly consumes one slot of the allowed tail call budget. As a result, the subsequent loop reaches the maximum limit prematurely, leading to a test failure where the actual loop count is 32 instead of the expected 33.

Fix this by deferring the counter update. Change the branch condition to BPF_JSGE (greater or equal) so that we check the boundary first. The TCC is only incremented and stored back to memory after the boundary check and the NULL-target check both pass.

Before:

$ sudo ./test_progs -t tailcalls/tailcall_3
...
test_tailcall_count:FAIL:tailcall count unexpected tailcall count: actual 32 != expected 33
...
#465/3 tailcalls/tailcall_3:FAIL
#465 tailcalls:FAIL

After:

$ sudo ./test_progs -t tailcalls/tailcall_3
#465/3 tailcalls/tailcall_3:OK
#465 tailcalls:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED

Fixes: c0fcc95 ("LoongArch: BPF: Fix the tailcall hierarchy")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
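The budget effect described in this commit message can be reproduced with a toy model in plain C. This is only a sketch under assumptions, not the LoongArch JIT code: the limit of 33 is taken from the expected count in the test output, the function names and `struct prog` are hypothetical, and the exact comparison used by the old code is an assumption. The only difference between the two variants is whether the counter is stored before or after the checks.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TAIL_CALL_CNT 33	/* assumed limit, matching "expected 33" */

struct prog {
	int id;			/* placeholder for a BPF program */
};

/* Pre-increment model: the bumped counter is stored before any check,
 * so a failed call on a NULL target still consumes budget. */
static bool tail_call_eager(int *tcc, const struct prog *target)
{
	*tcc += 1;
	if (*tcc > MAX_TAIL_CALL_CNT)
		return false;
	if (!target)
		return false;
	return true;
}

/* Deferred model: check the bound (greater or equal) and the target
 * first, and only then increment and store the counter. */
static bool tail_call_deferred(int *tcc, const struct prog *target)
{
	if (*tcc >= MAX_TAIL_CALL_CNT)
		return false;
	if (!target)
		return false;
	*tcc += 1;
	return true;
}

/* One failed attempt on an empty slot, then loop until the limit. */
static int count_calls(bool (*tail_call)(int *, const struct prog *))
{
	struct prog self = { 0 };
	int tcc = 0;
	int taken = 0;

	tail_call(&tcc, NULL);
	while (tail_call(&tcc, &self))
		taken++;

	return taken;
}
```

In this model `count_calls(tail_call_eager)` yields 32 and `count_calls(tail_call_deferred)` yields 33, which matches the before and after counts reported above.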

nirmoy force-pushed the codex/nvbug6274048-cxl-endpoint-guard-6.17-next branch from 5dd7359 to a945fdb Compare June 15, 2026 15:07

nirmoy changed the title ~~[24.04_linux-nvidia-6.17-next] cxl: guard unlinked memdev endpoints~~ [24.04_linux-nvidia-6.17-next] NVIDIA: VR: SAUCE: cxl: guard unlinked endpoints Jun 15, 2026

nvidia-bfigg force-pushed the 24.04_linux-nvidia-6.17-next branch 2 times, most recently from 7a62271 to 51267da Compare June 19, 2026 12:02

nvidia-bfigg force-pushed the 24.04_linux-nvidia-6.17-next branch from 4a7f97e to 2333f65 Compare June 25, 2026 12:07

nirmoy force-pushed the codex/nvbug6274048-cxl-endpoint-guard-6.17-next branch from a945fdb to 2be47e0 Compare June 25, 2026 16:18

nirmoy marked this pull request as ready for review June 25, 2026 16:34

nirmoy requested review from clsotog and nvmochs June 25, 2026 16:35

nirmoy added the help wanted label Jun 25, 2026

nirmoy added the has_1_ack label Jun 25, 2026

clsotog approved these changes Jun 25, 2026


nirmoy added the has_2_acks label and removed the help wanted and has_1_ack labels Jun 25, 2026

sforshee approved these changes Jun 25, 2026


nvmochs closed this Jun 25, 2026

nirmoy wants to merge 1 commit into NVIDIA:24.04_linux-nvidia-6.17-next from nirmoy:codex/nvbug6274048-cxl-endpoint-guard-6.17-next
