
perftest: add support for rocm dmabuf for additional operating systems #392

Open
gigabyte132 wants to merge 1 commit into linux-rdma:master from gigabyte132:master

gigabyte132 commented Jun 9, 2026

Some operating systems, especially those designed for running containers such as RHEL bootc or Fedora CoreOS, don't expose "/boot/config-%s" and instead ship the kernel config at paths like "/lib/modules/%s/config". This has already been fixed upstream in ROCm; see: https://github.com/ROCm/rocm-systems/blob/develop/projects/rccl/src/misc/rocmwrap.cc#L282

This PR adds DMA-BUF support to rdma-perftest on additional operating systems by looping over the common paths where the "/boot/config-%s" equivalent lives.
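
A minimal sketch of such a path loop, in C since perftest is written in C. Everything here is hypothetical and only illustrates the idea: the function names, the exact list of candidate paths, and the option string "CONFIG_DMABUF_MOVE_NOTIFY=y" are assumptions, not necessarily what this PR or ROCm upstream checks.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical candidate locations of the kernel build config (assumed list).
 * "%s" is replaced by the running kernel release, as reported by uname -r. */
static const char *const kconfig_templates[] = {
    "/boot/config-%s",
    "/lib/modules/%s/config",
    "/usr/lib/modules/%s/config",
};

/* Return true if the file at `path` has a line starting with `option`
 * (e.g. "CONFIG_DMABUF_MOVE_NOTIFY=y"). A missing file yields false. */
static bool config_file_has_option(const char *path, const char *option)
{
    char line[512];
    size_t len = strlen(option);
    FILE *f = fopen(path, "r");

    if (!f)
        return false;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, option, len) == 0) {
            fclose(f);
            return true;
        }
    }
    fclose(f);
    return false;
}

/* Expand each template with the running kernel release and return true
 * on the first existing config file that contains `option`. */
static bool kernel_config_has_option(const char *option)
{
    struct utsname uts;
    char path[512];
    size_t i;

    if (uname(&uts) != 0)
        return false;
    for (i = 0; i < sizeof(kconfig_templates) / sizeof(kconfig_templates[0]); i++) {
        snprintf(path, sizeof(path), kconfig_templates[i], uts.release);
        if (config_file_has_option(path, option))
            return true;
    }
    return false;
}
```

A caller would use something like kernel_config_has_option("CONFIG_DMABUF_MOVE_NOTIFY=y") and fall back to another check only when no candidate file matched. Keeping the paths in a table makes it cheap to add further distribution-specific locations later.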

Additionally, if none of these files exists or matches, it checks /proc/kallsyms for the exported kernel symbols, which is especially useful in containerized environments. This also follows the implementation in ROCm upstream: https://github.com/ROCm/rocm-systems/blob/develop/projects/rccl/src/misc/rocmwrap.cc#L367

Signed-off-by: Raulian-Ionut Chiorescu <raulian-ionut.chiorescu@cern.ch>
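
The /proc/kallsyms fallback can be sketched the same way. This is again a hypothetical illustration: the function name is made up, and the symbol one would search for (for example dma_buf_move_notify) is an assumption; the PR and ROCm upstream may look for different symbols. It relies on the usual kallsyms line layout "<address> <type> <name> [module]".

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Scan a kallsyms-format file and return true if a symbol whose name is
 * exactly `symbol` is listed. In normal use `path` is "/proc/kallsyms". */
static bool kallsyms_has_symbol(const char *path, const char *symbol)
{
    char line[512];
    char name[256];
    FILE *f = fopen(path, "r");

    if (!f)
        return false;
    while (fgets(line, sizeof(line), f)) {
        /* Skip the address and type columns, then read the symbol name. */
        if (sscanf(line, "%*s %*s %255s", name) == 1 &&
            strcmp(name, symbol) == 0) {
            fclose(f);
            return true;
        }
    }
    fclose(f);
    return false;
}
```

Comparing the whole name with strcmp avoids false positives from longer symbols that merely share a prefix, which a plain strstr on the line would accept. Only the name column is used, so the check still works when the addresses are shown as zeros to an unprivileged reader.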

gigabyte132 commented Jun 10, 2026

Validated perftest in a containerized environment with AMD Radeon Pro W7900 GPUs and 200G RoCEv2-capable NICs:

./ib_write_bw -d bnxt_re0 --use_rocm=0 -a -x 3 --report_gbits -F --use_rocm_dmabuf

Using ROCm Device with ID: 0, Name: AMD Radeon Pro W7900, PCI Bus ID: 0x43, GCN Arch: gfx1100
using DMA-BUF for GPU buffer address at 0x7f11f6800000 aligned at 0x7f11f6800000 with aligned size 16777216
dmabuf export addr 0x7f11f6800000 16777216 to dmabuf fd 8 offset 0
allocated 16777216 bytes of GPU buffer at 0x7f11f6800000
Calling ibv_reg_dmabuf_mr(offset=0, size=16777216, addr=0x7f11f6800000, fd=8) for QP #0
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : bnxt_re0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON           Lock-free      : OFF
 ibv_wr* API     : OFF          Using Enhanced Reorder      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 CQE Poll Batch  : Dynamic
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 3
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Use ROCm memory : ON
 Data ex. method : Ethernet
 NUMA node       : 1
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x2c01 PSN 0x86d7e9 RKey 0x2000511 VAddr 0x007f11f7000000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:188:185:190:157
 remote address: LID 0000 QPN 0x2c01 PSN 0x6095ee RKey 0x2000413 VAddr 0x007f8626000000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:188:185:190:156
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 2          5000           0.053549            0.053235            3.327170
 4          5000             0.11               0.11                 3.283084
 8          5000             0.21               0.21                 3.316507
 16         5000             0.43               0.43                 3.323857
 32         5000             0.86               0.85                 3.323319
 64         5000             1.72               1.67                 3.270231
 128        5000             3.47               3.43                 3.346796
 256        5000             6.89               6.85                 3.343944
 512        5000             13.58              13.40                3.271659
 1024       5000             27.38              26.89                3.282913
 2048       5000             54.32              54.05                3.299028
 4096       5000             109.08             104.07               3.176014
 8192       5000             117.11             113.05               1.725054
 16384      5000             127.53             119.73               0.913483
 32768      5000             181.48             138.25               0.527373
 65536      5000             193.12             160.56               0.306241
 131072     5000             193.88             176.58               0.168398
 262144     5000             194.08             185.45               0.088429
 524288     5000             194.16             190.31               0.045373
 1048576    5000             194.19             192.74               0.022976
 2097152    5000             194.20             194.03               0.011565
 4194304    5000             194.68             194.67               0.005802
 8388608    5000             195.00             195.00               0.002906
---------------------------------------------------------------------------------------
deallocating GPU buffer 0x7f11f6800000
