Bug 2212555 - [RHEL-9.3] all test cases in perftest failed with "Unexpected CM event bl blka 7" error on iRDMA RoCE devices
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: perftest
Version: 9.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Kamal Heib
QA Contact: Brian Chae
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-06-05 21:11 UTC by Brian Chae
Modified: 2023-06-26 14:23 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-06-26 14:23:26 UTC
Type: Bug
Target Upstream Version:
Embargoed:


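Links
  System                | ID              | Private | Priority | Status | Summary | Last Updated
  ----------------------+-----------------+---------+----------+--------+---------+------------------------
  Red Hat Issue Tracker | RHELPLAN-159027 | 0       | None     | None   | None    | 2023-06-06 02:33:44 UTC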

Description Brian Chae 2023-06-05 21:11:09 UTC
Description of problem:

All perftest test cases failed with the following error:

Unexpected CM event bl blka 7
 Unable to perform rdma_client function
 Unable to init the socket connection

Each command exited with a return code of 1.
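
For triage, it may help to translate the number in "Unexpected CM event bl blka 7" into a librdmacm event name. The sketch below assumes the printed number is an enum rdma_cm_event_type value and that the enum order matches <rdma/rdma_cma.h> as I recall it; check the installed header before relying on it. cm_event_name is a hypothetical helper, not part of perftest or rdma-core.

```shell
#!/bin/sh
# Map a numeric rdma_cm event code to its symbolic name.
# ASSUMPTION: the order follows enum rdma_cm_event_type in
# <rdma/rdma_cma.h>; verify against the header on the test host.
cm_event_name() {
    case "$1" in
        0)  echo RDMA_CM_EVENT_ADDR_RESOLVED ;;
        1)  echo RDMA_CM_EVENT_ADDR_ERROR ;;
        2)  echo RDMA_CM_EVENT_ROUTE_RESOLVED ;;
        3)  echo RDMA_CM_EVENT_ROUTE_ERROR ;;
        4)  echo RDMA_CM_EVENT_CONNECT_REQUEST ;;
        5)  echo RDMA_CM_EVENT_CONNECT_RESPONSE ;;
        6)  echo RDMA_CM_EVENT_CONNECT_ERROR ;;
        7)  echo RDMA_CM_EVENT_UNREACHABLE ;;
        8)  echo RDMA_CM_EVENT_REJECTED ;;
        9)  echo RDMA_CM_EVENT_ESTABLISHED ;;
        10) echo RDMA_CM_EVENT_DISCONNECTED ;;
        11) echo RDMA_CM_EVENT_DEVICE_REMOVAL ;;
        12) echo RDMA_CM_EVENT_MULTICAST_JOIN ;;
        13) echo RDMA_CM_EVENT_MULTICAST_ERROR ;;
        14) echo RDMA_CM_EVENT_ADDR_CHANGE ;;
        15) echo RDMA_CM_EVENT_TIMEWAIT_EXIT ;;
        *)  echo "unknown($1)" ;;
    esac
}

# Event number taken from the error message in this report.
cm_event_name 7
```

Under that assumption, event 7 would correspond to RDMA_CM_EVENT_UNREACHABLE, which suggests the client's connection attempt did not complete. The report itself does not confirm this reading.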


perftest test results on rdma-qe-38/rdma-qe-39 & Beaker job J:7886455:
5.14.0-316.el9.x86_64, rdma-core-44.0-2.el9, i40e, roce.45, E810-XXV & irdma0
    Result | Status | Test
  ---------+--------+------------------------------------
      FAIL |      1 | ib_read_bw RC
      FAIL |      1 | ib_read_lat RC
      FAIL |      1 | ib_send_bw RC
      FAIL |      1 | ib_send_lat RC
      FAIL |      1 | ib_write_bw RC
      FAIL |      1 | ib_write_lat RC

This is a regression from RHEL-9.2.0-20230228.28.



Version-Release number of selected component (if applicable):

Clients: rdma-qe-39
Servers: rdma-qe-38

DISTRO=RHEL-9.3.0-20230521.45

+ [23-05-25 12:23:55] cat /etc/redhat-release
Red Hat Enterprise Linux release 9.3 Beta (Plow)

+ [23-05-25 12:23:55] uname -a
Linux rdma-qe-39.rdma.lab.eng.rdu2.redhat.com 5.14.0-316.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Fri May 19 13:18:40 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux

+ [23-05-25 12:23:55] cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.14.0-316.el9.x86_64 root=UUID=d310605f-8ec9-46e6-ae94-14a823879fce ro crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=UUID=8089ef9b-fb2f-45c9-861e-a5781cbf8740 console=ttyS0,115200n81

+ [23-05-25 12:23:55] rpm -q rdma-core linux-firmware
rdma-core-44.0-2.el9.x86_64
linux-firmware-20230404-134.el9.noarch

+ [23-05-25 12:23:55] tail /sys/class/infiniband/irdma0/fw_ver /sys/class/infiniband/irdma1/fw_ver
==> /sys/class/infiniband/irdma0/fw_ver <==
1.57

==> /sys/class/infiniband/irdma1/fw_ver <==
1.57

+ [23-05-25 12:23:55] lspci
+ [23-05-25 12:23:55] grep -i -e ethernet -e infiniband -e omni -e ConnectX
41:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
41:00.1 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
c1:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
c1:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe

+ [23-05-25 12:23:55] rpm -q perftest
perftest-4.5.0.20-4.el9.x86_64


How reproducible:

100%

Steps to Reproduce:
1. On the server host, issue the following perftest commands:

timeout 3m ib_read_bw -a -c RC -d irdma0 -i 1 -F -R
timeout 3m ib_read_lat -a -c RC -d irdma0 -i 1 -F -R
timeout 3m ib_send_bw -a -c RC -d irdma0 -i 1 -F -R 
timeout 3m ib_send_lat -a -c RC -d irdma0 -i 1 -F -R 
timeout 3m ib_write_bw -a -c RC -d irdma0 -i 1 -F -R 
timeout 3m ib_write_lat -a -c RC -d irdma0 -i 1 -F -R 

2. On the client host, issue the following perftest commands:

timeout 3m ib_read_bw -a -c RC -d irdma0 -i 1 -F -R 172.31.45.38
timeout 3m ib_read_lat -a -c RC -d irdma0 -i 1 -F -R 172.31.45.38
timeout 3m ib_send_bw -a -c RC -d irdma0 -i 1 -F -R 172.31.45.38
timeout 3m ib_send_lat -a -c RC -d irdma0 -i 1 -F -R 172.31.45.38
timeout 3m ib_write_bw -a -c RC -d irdma0 -i 1 -F -R 172.31.45.38
timeout 3m ib_write_lat -a -c RC -d irdma0 -i 1 -F -R 172.31.45.38
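
The two steps above can be driven from one loop. This is a minimal sketch, not the QE test script: the variable names (DEVICE, PORT, SERVER_IP, EXTRA_OPTS, RUN) and the build_cmd helper are hypothetical. EXTRA_OPTS is a placeholder for any additional perftest options; this report does not say which option the later "limit inline data size" script update used, so none is filled in here. By default the script only prints the commands.

```shell
#!/bin/sh
# Print (default) or run the six perftest invocations from the steps above.
# All variable names here are illustrative, not taken from the QE script.
DEVICE=${DEVICE:-irdma0}
PORT=${PORT:-1}
SERVER_IP=${SERVER_IP:-}     # leave empty on the server, set on the client
EXTRA_OPTS=${EXTRA_OPTS:-}   # placeholder for extra options (none assumed)
RUN=${RUN:-0}                # set RUN=1 to execute instead of printing

TESTS="ib_read_bw ib_read_lat ib_send_bw ib_send_lat ib_write_bw ib_write_lat"

# build_cmd TEST: print the full command line for one test binary.
build_cmd() {
    cmd="timeout 3m $1 -a -c RC -d $DEVICE -i $PORT -F -R"
    if [ -n "$EXTRA_OPTS" ]; then
        cmd="$cmd $EXTRA_OPTS"
    fi
    if [ -n "$SERVER_IP" ]; then
        cmd="$cmd $SERVER_IP"
    fi
    echo "$cmd"
}

for t in $TESTS; do
    line=$(build_cmd "$t")
    if [ "$RUN" = 1 ]; then
        # Intentional word splitting: $line holds a command plus arguments.
        $line
        echo "$t RC: exit status $?"
    else
        echo "$line"
    fi
done
```

On the client, something like `SERVER_IP=172.31.45.38 RUN=1 sh ./script.sh` would run the same sequence as step 2, assuming the server side has been started first for each test.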

Actual results:

All of the above perftest commands resulted in the same errors:

+ [23-05-25 12:23:59] timeout 3m ib_read_bw -a -c RC -d irdma0 -i 1 -F -R 172.31.45.38
Unexpected CM event bl blka 7
 Unable to perform rdma_client function
 Unable to init the socket connection
+ [23-05-25 12:23:59] RQA_check_result -r 1 -t 'ib_read_bw RC'


+ [23-05-25 12:27:30] timeout 3m ib_read_lat -a -c RC -d irdma0 -i 1 -F -R 172.31.45.38
Unexpected CM event bl blka 7
 Unable to perform rdma_client function
 Unable to init the socket connection
+ [23-05-25 12:27:30] RQA_check_result -r 1 -t 'ib_read_lat RC'



+ [23-05-25 12:29:59] timeout 3m ib_send_bw -a -c RC -d irdma0 -i 1 -F -R 172.31.45.38
Unexpected CM event bl blka 7
 Unable to perform rdma_client function
 Unable to init the socket connection
+ [23-05-25 12:29:59] RQA_check_result -r 1 -t 'ib_send_bw RC'


The remaining tests failed the same way, down to the last one in the test sequence:


+ [23-05-25 12:39:30] timeout 3m ib_write_lat -a -c RC -d irdma0 -i 1 -F -R 172.31.45.38
Unexpected CM event bl blka 7
 Unable to perform rdma_client function
 Unable to init the socket connection
+ [23-05-25 12:39:30] RQA_check_result -r 1 -t 'ib_write_lat RC'



Expected results:



perftest test results on rdma-dev-30/rdma-dev-31 & Beaker job J:7586502:
5.14.0-283.el9.x86_64, rdma-core-44.0-2.el9, i40e, iw, E810-C & irdma1
    Result | Status | Test
  ---------+--------+------------------------------------
      PASS |      0 | ib_read_bw RC
      PASS |      0 | ib_read_lat RC
      PASS |      0 | ib_send_bw RC
      FAIL |    135 | ib_send_lat RC
      PASS |      0 | ib_write_bw RC
      FAIL |    135 | ib_write_lat RC



https://beaker-archive.hosts.prod.psi.bos.redhat.com/beaker-logs/2023/03/75865/7586502/13484472/157019398/results_perftest.txt
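
Result tables like the ones in this report can be checked mechanically. The sketch below assumes the "Result | Status | Test" layout shown here and counts rows whose first column is FAIL; count_failures is a hypothetical helper and not the check the QE harness performs.

```shell
#!/bin/sh
# Count FAIL rows in a perftest result table read from stdin.
# ASSUMPTION: data rows have exactly three "|"-separated columns and the
# first column is PASS or FAIL, as in the tables in this report.
count_failures() {
    awk -F'|' '
        NF == 3 {
            gsub(/[ \t]/, "", $1)      # strip padding around the result
            if ($1 == "FAIL") n++
        }
        END { print n + 0 }            # print 0 when no row matched
    '
}

# Example input: the RHEL-9.2 table from rdma-dev-30/rdma-dev-31 above.
count_failures <<'EOF'
    Result | Status | Test
  ---------+--------+------------------------------------
      PASS |      0 | ib_read_bw RC
      PASS |      0 | ib_read_lat RC
      PASS |      0 | ib_send_bw RC
      FAIL |    135 | ib_send_lat RC
      PASS |      0 | ib_write_bw RC
      FAIL |    135 | ib_write_lat RC
EOF
```

For that table the count is 2 (ib_send_lat and ib_write_lat), so even the reference run was not fully clean.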


Additional info:

Comment 2 Brian Chae 2023-06-26 14:23:26 UTC
Per Afom,

Hello all,

Similar to RHEL-8.9, RHEL-9.3 perftest on iRDMA with perftest-23.04.0.0.23-1.el9 is also passing with the "limit inline data size" update to our test script. I am going to update the BZ with this, and we can probably close the BZ.
perftest test results on rdma-qe-38/rdma-qe-39 & Beaker job J:7981566:
5.14.0-327.el9.x86_64, rdma-core-46.0-1.el9, i40e, iw, E810-XXV & irdma1
    Result | Status | Test
  ---------+--------+------------------------------------
      PASS |      0 | ib_read_bw RC
      PASS |      0 | ib_read_lat RC
      PASS |      0 | ib_send_bw RC
      PASS |      0 | ib_send_lat RC
      PASS |      0 | ib_write_bw RC
      PASS |      0 | ib_write_lat RC
Checking for failures and known issues:
  no test failures

perftest test results on rdma-qe-38/rdma-qe-39 & Beaker job J:7981566:
5.14.0-327.el9.x86_64, rdma-core-46.0-1.el9, i40e, roce.45, E810-XXV & irdma0
    Result | Status | Test
  ---------+--------+------------------------------------
      PASS |      0 | ib_read_bw RC
      PASS |      0 | ib_read_lat RC
      PASS |      0 | ib_send_bw RC
      PASS |      0 | ib_send_lat RC
      PASS |      0 | ib_write_bw RC
      PASS |      0 | ib_write_lat RC
Checking for failures and known issues:
  no test failures
Thanks,
Afom

So, we can close this bug...

