Bug 2050979

Summary: [RHEL8.6] when qperf tests are run on ALL ROCE & QEDE IW devices, segfault core dumps generated in the server host
Product: Red Hat Enterprise Linux 8
Reporter: Brian Chae <bchae>
Component: qperf
Assignee: Nobody <nobody>
Status: CLOSED WONTFIX
QA Contact: Infiniband QE <infiniband-qe>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 8.6
CC: akarlsso, dledford, hwkernel-mgr, rdma-dev-team
Target Milestone: beta
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1950272
Environment:
Last Closed: 2023-08-05 07:28:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1950272
Bug Blocks: 1903942, 2049647

Comment 1 Brian Chae 2022-02-05 14:00:20 UTC
Additional information on the RHEL 8.6.0 qperf test results: all of the tests passed, unlike in RHEL 8.4.

Test results for qperf on rdma-virt-01:
4.18.0-363.el8.x86_64, rdma-core-37.2-1.el8, mlx4, roce.45, & mlx4_1
    Result | Status | Test
  ---------+--------+------------------------------------
      PASS |      0 | ping server
      PASS |      0 | conf
      PASS |      0 | rc_bi_bw
      PASS |      0 | rc_bw
      PASS |      0 | rc_lat
      PASS |      0 | rc_rdma_read_bw
      PASS |      0 | rc_rdma_read_lat
      PASS |      0 | rc_rdma_write_bw
      PASS |      0 | rc_rdma_write_lat
      PASS |      0 | rc_rdma_write_poll_lat
      PASS |      0 | rc_compare_swap_mr
      PASS |      0 | rc_fetch_add_mr
      PASS |      0 | ver_rc_compare_swap
      PASS |      0 | ver_rc_fetch_add
      PASS |      0 | quit

Checking for failures and known issues:
  no test failures


Yet the same segfaults occurred on the server host, as stated in the problem description.

Comment 4 Brian Chae 2023-06-05 10:58:57 UTC
As of RHEL-8.9, qperf also dumps core files when the qperf test is run on an iRDMA iWARP device.


TIME                            PID   UID   GID SIG COREFILE  EXE
Sun 2023-06-04 13:57:53 EDT  104747     0     0  11 none      /usr/bin/qperf
Sun 2023-06-04 13:57:56 EDT  104750     0     0  11 none      /usr/bin/qperf
Sun 2023-06-04 13:57:58 EDT  104762     0     0  11 none      /usr/bin/qperf
Sun 2023-06-04 13:58:02 EDT  104778     0     0  11 none      /usr/bin/qperf
Sun 2023-06-04 13:58:04 EDT  104781     0     0  11 none      /usr/bin/qperf
total 0
Red Hat Enterprise Linux release 8.9 Beta (Ootpa)
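The listing above is systemd's core-dump journal. A minimal sketch of how to inspect it, assuming systemd-coredump is the collector (the default on RHEL 8) and that gdb is available for the backtrace step:

```shell
# Signal 11 in the SIG column is SIGSEGV (segmentation fault):
kill -l 11

# List core-dump records for qperf (requires systemd-coredump):
command -v coredumpctl >/dev/null && coredumpctl list /usr/bin/qperf || true

# COREFILE "none" means the core itself was not kept. With
# Storage=external set in /etc/systemd/coredump.conf, a backtrace can be
# pulled from a stored core with: coredumpctl gdb /usr/bin/qperf
```

With a stored core, `coredumpctl gdb` would give the faulting stack trace needed to debug the qperf server crash.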


Clients: rdma-qe-38
Servers: rdma-qe-39

DISTRO=RHEL-8.9.0-20230521.41

+ [23-06-04 13:56:08] cat /etc/redhat-release
Red Hat Enterprise Linux release 8.9 Beta (Ootpa)

+ [23-06-04 13:56:08] uname -a
Linux rdma-qe-39.rdma.lab.eng.rdu2.redhat.com 4.18.0-492.el8.x86_64 #1 SMP Tue May 9 14:50:21 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux

+ [23-06-04 13:56:08] cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-492.el8.x86_64 root=UUID=429d9f1d-500d-4006-a3b5-2455ab53eebb ro crashkernel=auto resume=UUID=f4509d89-d3e0-454b-8fe1-f347d8d3cbf9 console=ttyS0,115200n81

+ [23-06-04 13:56:08] rpm -q rdma-core linux-firmware
rdma-core-44.0-2.el8.1.x86_64
linux-firmware-20230515-115.gitd1962891.el8.noarch

+ [23-06-04 13:56:08] tail /sys/class/infiniband/irdma0/fw_ver /sys/class/infiniband/irdma1/fw_ver
==> /sys/class/infiniband/irdma0/fw_ver <==
1.57

==> /sys/class/infiniband/irdma1/fw_ver <==
1.57

+ [23-06-04 13:56:08] lspci
+ [23-06-04 13:56:08] grep -i -e ethernet -e infiniband -e omni -e ConnectX
41:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
41:00.1 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
c1:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
c1:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe

Comment 6 RHEL Program Management 2023-08-05 07:28:19 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.