Bug 2212215 - [RHEL8.9] all test cases in perftest failed with "Unexpected CM event bl blka 7" error on iRDMA RoCE devices
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: perftest
Version: 8.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Kamal Heib
QA Contact: Infiniband QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-06-04 20:24 UTC by Brian Chae
Modified: 2023-06-26 14:08 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-06-19 02:24:31 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Links
System                | ID              | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | RHELPLAN-158937 | 0       | None     | None   | None    | 2023-06-04 20:29:44 UTC

Description Brian Chae 2023-06-04 20:24:20 UTC
Description of problem:

All perftest test cases failed with a return code of 1 and the following error:

Unexpected CM event bl blka 7
 Unable to perform rdma_client function
 Unable to init the socket connection
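
The trailing number in the error message can be decoded, assuming perftest prints the raw librdmacm event code there (an assumption, not confirmed in this report). The sketch below maps codes 0 through 15 to the names in enum rdma_cm_event_type from <rdma/rdma_cma.h>. The helper names cm_event_from_log and cm_event_name are hypothetical.

```shell
# Extract the numeric event code from a perftest "Unexpected CM event" line.
# Reads log text on stdin and prints one code per matching line.
cm_event_from_log() {
    sed -n 's/^Unexpected CM event bl blka \([0-9][0-9]*\).*/\1/p'
}

# Map a numeric code to its enum rdma_cm_event_type name.
# Only codes 0-15 are listed; anything else prints "unknown".
cm_event_name() {
    case "$1" in
        0)  echo RDMA_CM_EVENT_ADDR_RESOLVED ;;
        1)  echo RDMA_CM_EVENT_ADDR_ERROR ;;
        2)  echo RDMA_CM_EVENT_ROUTE_RESOLVED ;;
        3)  echo RDMA_CM_EVENT_ROUTE_ERROR ;;
        4)  echo RDMA_CM_EVENT_CONNECT_REQUEST ;;
        5)  echo RDMA_CM_EVENT_CONNECT_RESPONSE ;;
        6)  echo RDMA_CM_EVENT_CONNECT_ERROR ;;
        7)  echo RDMA_CM_EVENT_UNREACHABLE ;;
        8)  echo RDMA_CM_EVENT_REJECTED ;;
        9)  echo RDMA_CM_EVENT_ESTABLISHED ;;
        10) echo RDMA_CM_EVENT_DISCONNECTED ;;
        11) echo RDMA_CM_EVENT_DEVICE_REMOVAL ;;
        12) echo RDMA_CM_EVENT_MULTICAST_JOIN ;;
        13) echo RDMA_CM_EVENT_MULTICAST_ERROR ;;
        14) echo RDMA_CM_EVENT_ADDR_CHANGE ;;
        15) echo RDMA_CM_EVENT_TIMEWAIT_EXIT ;;
        *)  echo unknown ;;
    esac
}
```

Under that assumption, event 7 would correspond to RDMA_CM_EVENT_UNREACHABLE, which suggests the client could not reach a listening peer at the given address.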


perftest test results on rdma-qe-38/rdma-qe-39 & Beaker job J:7883970:
4.18.0-492.el8.x86_64, rdma-core-44.0-2.el8.1, i40e, roce.45, E810-XXV & irdma0
    Result | Status | Test
  ---------+--------+------------------------------------
      FAIL |      1 | ib_read_bw RC
      FAIL |      1 | ib_read_lat RC
      FAIL |      1 | ib_send_bw RC
      FAIL |      1 | ib_send_lat RC
      FAIL |      1 | ib_write_bw RC
      FAIL |      1 | ib_write_lat RC
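
A result table in this layout can be tallied with a short filter. This is a minimal sketch that assumes the three-column "Result | Status | Test" format shown above; the function name summarize_results is hypothetical and is not part of the test harness.

```shell
# Count PASS and FAIL rows in a "Result | Status | Test" table read from stdin.
# The header row and the "-----+-----" separator are skipped because their
# first field is neither PASS nor FAIL.
summarize_results() {
    awk -F'|' '
        NF == 3 {
            r = $1
            gsub(/[ \t]/, "", r)      # strip the column padding
            if (r == "PASS") pass++
            else if (r == "FAIL") fail++
        }
        END { printf "pass=%d fail=%d\n", pass, fail }
    '
}
```

Feeding it the table above should report six failures and no passes.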

This is a regression from RHEL-8.8.0-20230228.22.




Version-Release number of selected component (if applicable):

Clients: rdma-qe-39
Servers: rdma-qe-38

DISTRO=RHEL-8.9.0-20230521.41

+ [23-05-24 12:44:25] cat /etc/redhat-release
Red Hat Enterprise Linux release 8.9 Beta (Ootpa)

+ [23-05-24 12:44:25] uname -a
Linux rdma-qe-39.rdma.lab.eng.rdu2.redhat.com 4.18.0-492.el8.x86_64 #1 SMP Tue May 9 14:50:21 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux

+ [23-05-24 12:44:25] cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-492.el8.x86_64 root=UUID=94414d8d-4218-4f56-85b5-9b558923a596 ro crashkernel=auto resume=UUID=bdafcf70-7355-4bad-b6ae-07711eee4ce1 console=ttyS0,115200n81

+ [23-05-24 12:44:25] rpm -q rdma-core linux-firmware
rdma-core-44.0-2.el8.1.x86_64
linux-firmware-20230515-115.gitd1962891.el8.noarch

+ [23-05-24 12:44:25] tail /sys/class/infiniband/irdma0/fw_ver /sys/class/infiniband/irdma1/fw_ver
==> /sys/class/infiniband/irdma0/fw_ver <==
1.57

==> /sys/class/infiniband/irdma1/fw_ver <==
1.57

+ [23-05-24 12:44:25] lspci
+ [23-05-24 12:44:25] grep -i -e ethernet -e infiniband -e omni -e ConnectX
41:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
41:00.1 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
c1:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
c1:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe


+ [23-05-24 12:44:25] rpm -q perftest
perftest-4.5.0.20-4.el8.x86_64



How reproducible:

100%



Steps to Reproduce:
1. On the server host, issue the following perftest commands:

timeout 3m ib_read_bw -a -c RC -d irdma0 -i 1 -F -R
timeout 3m ib_read_lat -a -c RC -d irdma0 -i 1 -F -R
timeout 3m ib_send_bw -a -c RC -d irdma0 -i 1 -F -R 
timeout 3m ib_send_lat -a -c RC -d irdma0 -i 1 -F -R 
timeout 3m ib_write_bw -a -c RC -d irdma0 -i 1 -F -R 
timeout 3m ib_write_lat -a -c RC -d irdma0 -i 1 -F -R 

2. On the client host, issue the following perftest commands:

timeout 3m ib_read_bw -a -c RC -d irdma0 -i 1 -F -R 172.31.45.39
timeout 3m ib_read_lat -a -c RC -d irdma0 -i 1 -F -R 172.31.45.39
timeout 3m ib_send_bw -a -c RC -d irdma0 -i 1 -F -R 172.31.45.39
timeout 3m ib_send_lat -a -c RC -d irdma0 -i 1 -F -R 172.31.45.39
timeout 3m ib_write_bw -a -c RC -d irdma0 -i 1 -F -R 172.31.45.39
timeout 3m ib_write_lat -a -c RC -d irdma0 -i 1 -F -R 172.31.45.39
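
The two command lists above differ only in the trailing server address, so they can be generated from one helper. This is a sketch for illustration; perftest_cmds is a hypothetical name, and the function only prints the command lines rather than running them.

```shell
# Print the six perftest command lines used in the steps above.
#   $1 = RDMA device name (for example irdma0)
#   $2 = server address; leave empty to get the server-side commands
perftest_cmds() {
    pc_dev=$1
    pc_server=$2
    for pc_test in ib_read_bw ib_read_lat ib_send_bw ib_send_lat ib_write_bw ib_write_lat; do
        # ${pc_server:+ $pc_server} appends the address only when one was given
        echo "timeout 3m $pc_test -a -c RC -d $pc_dev -i 1 -F -R${pc_server:+ $pc_server}"
    done
}
```

For example, `perftest_cmds irdma0` prints the server-side list and `perftest_cmds irdma0 172.31.45.39` prints the client-side list. Each server instance has to be listening before its matching client command is started.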








Actual results:

All of the above perftest commands resulted in the same errors:

+ [23-05-24 12:44:29] timeout 3m ib_read_bw -a -c RC -d irdma0 -i 1 -F -R 172.31.45.39
Unexpected CM event bl blka 7                  <<<==================
 Unable to perform rdma_client function        <<<==================
 Unable to init the socket connection          <<<==================
+ [23-05-24 12:44:29] RQA_check_result -r 1 -t 'ib_read_bw RC'




+ [23-05-24 12:48:01] timeout 3m ib_read_lat -a -c RC -d irdma0 -i 1 -F -R 172.31.45.39
Unexpected CM event bl blka 7                  <<<==================
 Unable to perform rdma_client function        <<<==================
 Unable to init the socket connection          <<<==================
+ [23-05-24 12:48:01] RQA_check_result -r 1 -t 'ib_read_lat RC'


The same errors repeat through the last test in the sequence:

+ [23-05-24 13:00:01] timeout 3m ib_write_lat -a -c RC -d irdma0 -i 1 -F -R 172.31.45.39
Unexpected CM event bl blka 7                  <<<==================
 Unable to perform rdma_client function        <<<==================
 Unable to init the socket connection          <<<==================
+ [23-05-24 13:00:01] RQA_check_result -r 1 -t 'ib_write_lat RC'


Expected results:

With the RHEL-8.8.0-20230531.2 build, the following results are expected:


perftest test results on rdma-qe-38/rdma-qe-39 & Beaker job J:7927291:
4.18.0-477.10.1.el8_8.x86_64, rdma-core-44.0-2.el8.1, i40e, roce.45, E810-XXV & irdma0
    Result | Status | Test
  ---------+--------+------------------------------------
      PASS |      0 | ib_read_bw RC
      PASS |      0 | ib_read_lat RC
      PASS |      0 | ib_send_bw RC
      FAIL |    135 | ib_send_lat RC
      PASS |      0 | ib_write_bw RC
      FAIL |    135 | ib_write_lat RC
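
The Status column can be read with the usual shell conventions, assuming it records the raw exit status of the `timeout 3m ...` command (an assumption about the harness, not confirmed here). The helper name describe_status is hypothetical.

```shell
# Describe an exit status using common shell/GNU timeout conventions:
#   0     -> success
#   124   -> GNU timeout reports that the time limit was hit
#   >128  -> the command died from signal (status - 128)
#   other -> the command's own non-zero exit code
describe_status() {
    ds_status=$1
    if [ "$ds_status" -eq 0 ]; then
        echo "ok"
    elif [ "$ds_status" -eq 124 ]; then
        echo "timed out"
    elif [ "$ds_status" -gt 128 ]; then
        echo "signal $((ds_status - 128))"
    else
        echo "exit $ds_status"
    fi
}
```

Under that reading, status 135 would mean the test was terminated by signal 7; running `kill -l 7` on the test host names the signal (SIGBUS on Linux x86_64).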


Refer to the following beaker test job ID:

https://beaker.engineering.redhat.com/jobs/7927291

perftest log: https://beaker-archive.hosts.prod.psi.bos.redhat.com/beaker-logs/2023/06/79272/7927291/14023642/161221433/754135404/resultoutputfile.log



Additional info:

Comment 2 Afom T. Michael 2023-06-17 15:31:39 UTC
With the inline data size limited to 96 bytes (-I 96), the tests pass consistently with perftest-23.04.0.0.23-1.el8.

Example client-side test commands:
  $ timeout 3m ib_send_bw -a -c RC -d irdma1 -i 1 -F -R -I 96 172.31.50.38
  $ timeout 3m ib_write_lat -a -c RC -d irdma1 -i 1 -F -R -I 96 172.31.50.38
  $ timeout 3m ib_send_bw -a -c RC -d irdma0 -i 1 -F -R -I 96 172.31.45.38
  $ timeout 3m ib_send_lat -a -c RC -d irdma0 -i 1 -F -R -I 96 172.31.45.38

perftest test results on rdma-qe-38/rdma-qe-39 & Beaker job J:7983292:
4.18.0-497.el8.x86_64, rdma-core-46.0-1.el8.1, i40e, iw, E810-XXV & irdma1
    Result | Status | Test
  ---------+--------+------------------------------------
      PASS |      0 | ib_read_bw RC
      PASS |      0 | ib_read_lat RC
      PASS |      0 | ib_send_bw RC
      PASS |      0 | ib_send_lat RC
      PASS |      0 | ib_write_bw RC
      PASS |      0 | ib_write_lat RC
Checking for failures and known issues:
  no test failures

perftest test results on rdma-qe-38/rdma-qe-39 & Beaker job J:7983292:
4.18.0-497.el8.x86_64, rdma-core-46.0-1.el8.1, i40e, roce.45, E810-XXV & irdma0
    Result | Status | Test
  ---------+--------+------------------------------------
      PASS |      0 | ib_read_bw RC
      PASS |      0 | ib_read_lat RC
      PASS |      0 | ib_send_bw RC
      PASS |      0 | ib_send_lat RC
      PASS |      0 | ib_write_bw RC
      PASS |      0 | ib_write_lat RC
Checking for failures and known issues:
  no test failures

Comment 3 Kamal Heib 2023-06-19 02:24:31 UTC
Closing this bz as NOTABUG.

