[24.04_linux-nvidia-6.17-next] NVIDIA: VR: SAUCE: cxl: guard unlinked endpoints #465
Closed
nirmoy wants to merge 1 commit into
BaseOS Kernel Review
Summary
No issues found across the reviewed commits.
Findings: no problems found
Latest watcher review: open review
Kernel deb build: failed (failure log, build artifacts)
Head:
This comment is maintained by nv-pr-bot. It is updated when the GitHub watcher publishes a newer review.
Contributor
PR Validation Report
Patchscan ✅ No Missing Fixes
All cherry-picked commits checked; no missing upstream fixes found.
PR Lint ✅ All checks passed
Checking 1 commits...
Cherry-pick digest:
┌──────────────┬──────────────────────────────────────────────────────────────────┬────────────┬─────────┬───────────────────────────┐
│ Local        │ Referenced upstream / Patch subject                              │ Patch-ID   │ Subject │ SoB chain                 │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 2be47e06e588 │ [SAUCE] cxl: guard unlinked memdev endpoints                     │ N/A        │ N/A     │ nirmoyd                   │
└──────────────┴──────────────────────────────────────────────────────────────────┴────────────┴─────────┴───────────────────────────┘
Lint: all checks passed.
5dd7359 to a945fdb
7a62271 to 51267da
4a7f97e to 2333f65
cxlmd->endpoint starts as ERR_PTR(-ENXIO) until endpoint port registration links the memdev to a real cxl_port. Treat NULL and error pointers as "endpoint not linked" before dereferencing cxlmd->endpoint in CXL helper paths.

Fixes: eb61834 ("cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation")
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
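
The guard described in the commit message can be sketched in userspace C as follows. This is an illustration only, not the patch itself: `ERR_PTR()` and `IS_ERR_OR_NULL()` are simplified stand-ins for the kernel's `<linux/err.h>` helpers, the two structs are reduced to a single field each, and `endpoint_depth()` is a hypothetical helper invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace stand-ins for the kernel's <linux/err.h> helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline int IS_ERR_OR_NULL(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO values of the address space. */
	return !ptr || (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Reduced, hypothetical versions of the real CXL structures. */
struct cxl_port {
	int depth;
};

struct cxl_memdev {
	struct cxl_port *endpoint; /* ERR_PTR(-ENXIO) until the endpoint is linked */
};

/* Hypothetical helper: only dereference the endpoint once it is linked. */
static int endpoint_depth(const struct cxl_memdev *cxlmd)
{
	assert(cxlmd != NULL);

	if (IS_ERR_OR_NULL(cxlmd->endpoint))
		return -ENXIO; /* endpoint not linked */

	return cxlmd->endpoint->depth;
}
```

With this sketch, a memdev whose `endpoint` is still `ERR_PTR(-ENXIO)` or `NULL` yields `-ENXIO` from `endpoint_depth()` instead of faulting, and a linked endpoint is read normally.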
a945fdb to 2be47e0
Collaborator
No issues with this patch.
clsotog approved these changes Jun 25, 2026
Collaborator
Acked-by: Carol L Soto <csoto@nvidia.com>
sforshee approved these changes Jun 25, 2026
Collaborator
Patch looks good.
Acked-by: Seth Forshee <sforshee@nvidia.com>
Collaborator
Merged, closing PR.
nvidia-bfigg pushed a commit that referenced this pull request Jun 27, 2026

The current code updates the tail call counter (TCC) using a pre-increment approach: it stores the incremented value back to memory before performing any boundary or target validation checks. This causes two major issues:

1. When a tail call fails because the target program is NULL, the TCC is incorrectly incremented and saved in memory anyway.
2. This dummy increment implicitly consumes one slot of the allowed tail call budget. As a result, the subsequent loop reaches the maximum limit prematurely, leading to a test failure where the actual loop count is 32 instead of the expected 33.

Fix this by deferring the counter update. Change the branch condition to BPF_JSGE (greater or equal) so that we check the boundary first. The TCC is only incremented and stored back to memory after the boundary check and the NULL-target check both pass.

Before:

  $ sudo ./test_progs -t tailcalls/tailcall_3
  ...
  test_tailcall_count:FAIL:tailcall count unexpected tailcall count: actual 32 != expected 33
  ...
  #465/3 tailcalls/tailcall_3:FAIL
  #465 tailcalls:FAIL

After:

  $ sudo ./test_progs -t tailcalls/tailcall_3
  #465/3 tailcalls/tailcall_3:OK
  #465 tailcalls:OK
  Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED

Fixes: c0fcc95 ("LoongArch: BPF: Fix the tailcall hierarchy")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
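
The off-by-one described in this commit message can be modelled in plain C. This is a simplified sketch, not the LoongArch JIT code: the limit of 33 is taken from the expected count in the test output, the exact comparison in the pre-increment variant is an assumption, and all function names are hypothetical. It simulates one failed tail call to a NULL target followed by a loop of successful tail calls.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed budget, matching the "expected 33" in the test output. */
#define MAX_TAIL_CALL_CNT 33

/*
 * Pre-increment model: the counter is bumped and stored before any
 * check, so a failed attempt still consumes one slot of the budget.
 * Returns 1 if the tail call is taken, 0 otherwise.
 */
static int tail_call_pre(int *tcc, const void *target)
{
	*tcc += 1;
	if (*tcc > MAX_TAIL_CALL_CNT)
		return 0;
	if (!target)
		return 0;
	return 1;
}

/*
 * Deferred model: check the boundary (greater or equal) and the target
 * first, and only then increment and store the counter.
 */
static int tail_call_deferred(int *tcc, const void *target)
{
	if (*tcc >= MAX_TAIL_CALL_CNT)
		return 0;
	if (!target)
		return 0;
	*tcc += 1;
	return 1;
}

/* One attempt on a NULL target, then tail-call a valid target until refused. */
static int count_after_null_miss(int (*tail_call)(int *, const void *))
{
	int tcc = 0, valid_target = 0, taken = 0;

	assert(tail_call != NULL);

	tail_call(&tcc, NULL); /* fails: target program is NULL */
	while (tail_call(&tcc, &valid_target))
		taken++;

	return taken;
}
```

Under these assumptions, `count_after_null_miss(tail_call_pre)` yields 32 and `count_after_null_miss(tail_call_deferred)` yields 33, which mirrors the before and after counts quoted in the commit message.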
Summary
Fix NVBug 6274048 for 24.04_linux-nvidia-6.17-next.
Launchpad: https://bugs.launchpad.net/bugs/2143032
cxlmd->endpoint starts as ERR_PTR(-ENXIO) until endpoint port registration completes. Guard CXL helper paths with IS_ERR_OR_NULL() before dereferencing it.

BOS note
I did not find evidence that this 6.17-next boot/probe NULL dereference has reproduced on BOS. BOS CXL Type-2/reset coverage is tracked separately:
CONFIG_VFIO_CXL_CORE) and 07fa68eb25751 (vfio_cxl_reset()).
PCI/CXL: Hide SBR from reset_methods if masked by CXL).
Those SHAs are on upstream/26.04_linux-nvidia-bos. If the same early cxlmd->endpoint access is reproduced on BOS, backport this guard there too.

Testing
git diff --check
scripts/checkpatch.pl --strict --no-tree: clean
CONFIG_WERROR was disabled for an existing unrelated enum cxl_regloc_type warning.