Bug 2212215

Summary: [RHEL8.9] all test cases in perftest failed with "Unexpected CM event bl blka 7" error on iRDMA RoCE devices
Product: Red Hat Enterprise Linux 8
Reporter: Brian Chae <bchae>
Component: perftest
Assignee: Kamal Heib <kheib>
Status: CLOSED NOTABUG
QA Contact: Infiniband QE <infiniband-qe>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 8.9
CC: dacampbe, dledford, hwkernel-mgr, rdma-dev-team, tmichael
Target Milestone: rc
Keywords: Regression
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-06-19 02:24:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Brian Chae 2023-06-04 20:24:20 UTC
Description of problem:

All perftest test cases failed with the following error:

Unexpected CM event bl blka 7
 Unable to perform rdma_client function
 Unable to init the socket connection

and a return code of 1.


perftest test results on rdma-qe-38/rdma-qe-39 & Beaker job J:7883970:
4.18.0-492.el8.x86_64, rdma-core-44.0-2.el8.1, i40e, roce.45, E810-XXV & irdma0
    Result | Status | Test
  ---------+--------+------------------------------------
      FAIL |      1 | ib_read_bw RC
      FAIL |      1 | ib_read_lat RC
      FAIL |      1 | ib_send_bw RC
      FAIL |      1 | ib_send_lat RC
      FAIL |      1 | ib_write_bw RC
      FAIL |      1 | ib_write_lat RC

This is a regression from RHEL-8.8.0-20230228.22.




Version-Release number of selected component (if applicable):

Clients: rdma-qe-39
Servers: rdma-qe-38

DISTRO=RHEL-8.9.0-20230521.41

+ [23-05-24 12:44:25] cat /etc/redhat-release
Red Hat Enterprise Linux release 8.9 Beta (Ootpa)

+ [23-05-24 12:44:25] uname -a
Linux rdma-qe-39.rdma.lab.eng.rdu2.redhat.com 4.18.0-492.el8.x86_64 #1 SMP Tue May 9 14:50:21 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux

+ [23-05-24 12:44:25] cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-492.el8.x86_64 root=UUID=94414d8d-4218-4f56-85b5-9b558923a596 ro crashkernel=auto resume=UUID=bdafcf70-7355-4bad-b6ae-07711eee4ce1 console=ttyS0,115200n81

+ [23-05-24 12:44:25] rpm -q rdma-core linux-firmware
rdma-core-44.0-2.el8.1.x86_64
linux-firmware-20230515-115.gitd1962891.el8.noarch

+ [23-05-24 12:44:25] tail /sys/class/infiniband/irdma0/fw_ver /sys/class/infiniband/irdma1/fw_ver
==> /sys/class/infiniband/irdma0/fw_ver <==
1.57

==> /sys/class/infiniband/irdma1/fw_ver <==
1.57

+ [23-05-24 12:44:25] lspci
+ [23-05-24 12:44:25] grep -i -e ethernet -e infiniband -e omni -e ConnectX
41:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
41:00.1 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
c1:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
c1:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe


+ [23-05-24 12:44:25] rpm -q perftest
perftest-4.5.0.20-4.el8.x86_64



How reproducible:

100%



Steps to Reproduce:
1. On the server host, issue the following perftest commands:

timeout 3m ib_read_bw -a -c RC -d irdma0 -i 1 -F -R
timeout 3m ib_read_lat -a -c RC -d irdma0 -i 1 -F -R
timeout 3m ib_send_bw -a -c RC -d irdma0 -i 1 -F -R 
timeout 3m ib_send_lat -a -c RC -d irdma0 -i 1 -F -R 
timeout 3m ib_write_bw -a -c RC -d irdma0 -i 1 -F -R 
timeout 3m ib_write_lat -a -c RC -d irdma0 -i 1 -F -R 

2. On the client host, issue the following perftest commands:

timeout 3m ib_read_bw -a -c RC -d irdma0 -i 1 -F -R 172.31.45.39
timeout 3m ib_read_lat -a -c RC -d irdma0 -i 1 -F -R 172.31.45.39
timeout 3m ib_send_bw -a -c RC -d irdma0 -i 1 -F -R 172.31.45.39
timeout 3m ib_send_lat -a -c RC -d irdma0 -i 1 -F -R 172.31.45.39
timeout 3m ib_write_bw -a -c RC -d irdma0 -i 1 -F -R 172.31.45.39
timeout 3m ib_write_lat -a -c RC -d irdma0 -i 1 -F -R 172.31.45.39
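
The client-side commands in step 2 differ only in the tool name, so they can be generated in a loop. The following is a minimal sketch and not part of the original test harness: the function name perftest_client_cmds is hypothetical, and it only prints the commands (a dry run) rather than executing them.

```shell
#!/bin/sh
# Hypothetical helper: print the six client-side perftest commands from step 2.
# Arguments: $1 = RDMA device (e.g. irdma0), $2 = server IP address.
perftest_client_cmds() {
    dev=$1
    server=$2
    for tool in ib_read_bw ib_read_lat ib_send_bw ib_send_lat ib_write_bw ib_write_lat; do
        # Same flags as in the report: -a -c RC -d <dev> -i 1 -F -R
        echo "timeout 3m $tool -a -c RC -d $dev -i 1 -F -R $server"
    done
}

# Dry run with the device and server address used in this report.
perftest_client_cmds irdma0 172.31.45.39
```

Piping the output to sh would run the commands in sequence; the matching server-side command has to be started on the server host before each client command.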








Actual results:

All of the above perftest commands resulted in the same errors:

+ [23-05-24 12:44:29] timeout 3m ib_read_bw -a -c RC -d irdma0 -i 1 -F -R 172.31.45.39
Unexpected CM event bl blka 7                  <<<==================
 Unable to perform rdma_client function        <<<==================
 Unable to init the socket connection          <<<==================
+ [23-05-24 12:44:29] RQA_check_result -r 1 -t 'ib_read_bw RC'




+ [23-05-24 12:48:01] timeout 3m ib_read_lat -a -c RC -d irdma0 -i 1 -F -R 172.31.45.39
Unexpected CM event bl blka 7                  <<<==================
 Unable to perform rdma_client function        <<<==================
 Unable to init the socket connection          <<<==================
+ [23-05-24 12:48:01] RQA_check_result -r 1 -t 'ib_read_lat RC'


The same failure repeats through the last test in the sequence:

+ [23-05-24 13:00:01] timeout 3m ib_write_lat -a -c RC -d irdma0 -i 1 -F -R 172.31.45.39
Unexpected CM event bl blka 7                  <<<==================
 Unable to perform rdma_client function        <<<==================
 Unable to init the socket connection          <<<==================
+ [23-05-24 13:00:01] RQA_check_result -r 1 -t 'ib_write_lat RC'
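
The trailing 7 in the error appears to be the numeric rdma_cm_event_type that perftest received instead of the event it was waiting for. Assuming the enum ordering in librdmacm's rdma/rdma_cma.h, value 7 corresponds to RDMA_CM_EVENT_UNREACHABLE. The sketch below decodes the number; the function name cm_event_name is hypothetical, and the table should be checked against the installed header.

```shell
#!/bin/sh
# Hypothetical helper: map a numeric rdma_cm_event_type to its symbolic name.
# Assumption: the ordering follows enum rdma_cm_event_type in rdma/rdma_cma.h.
cm_event_name() {
    case "$1" in
        0)  echo RDMA_CM_EVENT_ADDR_RESOLVED ;;
        1)  echo RDMA_CM_EVENT_ADDR_ERROR ;;
        2)  echo RDMA_CM_EVENT_ROUTE_RESOLVED ;;
        3)  echo RDMA_CM_EVENT_ROUTE_ERROR ;;
        4)  echo RDMA_CM_EVENT_CONNECT_REQUEST ;;
        5)  echo RDMA_CM_EVENT_CONNECT_RESPONSE ;;
        6)  echo RDMA_CM_EVENT_CONNECT_ERROR ;;
        7)  echo RDMA_CM_EVENT_UNREACHABLE ;;
        8)  echo RDMA_CM_EVENT_REJECTED ;;
        9)  echo RDMA_CM_EVENT_ESTABLISHED ;;
        10) echo RDMA_CM_EVENT_DISCONNECTED ;;
        *)  echo "unknown ($1)" ;;   # values beyond this sketch's table
    esac
}

# Decode the event number from the error message in this report.
cm_event_name 7
```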


Expected results:

With the RHEL-8.8.0-20230531.2 build, the following results are expected.


perftest test results on rdma-qe-38/rdma-qe-39 & Beaker job J:7927291:
4.18.0-477.10.1.el8_8.x86_64, rdma-core-44.0-2.el8.1, i40e, roce.45, E810-XXV & irdma0
    Result | Status | Test
  ---------+--------+------------------------------------
      PASS |      0 | ib_read_bw RC
      PASS |      0 | ib_read_lat RC
      PASS |      0 | ib_send_bw RC
      FAIL |    135 | ib_send_lat RC
      PASS |      0 | ib_write_bw RC
      FAIL |    135 | ib_write_lat RC


Refer to the following Beaker job:

https://beaker.engineering.redhat.com/jobs/7927291

perftest log: https://beaker-archive.hosts.prod.psi.bos.redhat.com/beaker-logs/2023/06/79272/7927291/14023642/161221433/754135404/resultoutputfile.log



Additional info:

Comment 2 Afom T. Michael 2023-06-17 15:31:39 UTC
With the inline data size limited to 96 (-I 96), the tests pass consistently with perftest-23.04.0.0.23-1.el8.

Example client-side test commands:
  $ timeout 3m ib_send_bw -a -c RC -d irdma1 -i 1 -F -R -I 96 172.31.50.38
  $ timeout 3m ib_write_lat -a -c RC -d irdma1 -i 1 -F -R -I 96 172.31.50.38
  $ timeout 3m ib_send_bw -a -c RC -d irdma0 -i 1 -F -R -I 96 172.31.45.38
  $ timeout 3m ib_send_lat -a -c RC -d irdma0 -i 1 -F -R -I 96 172.31.45.38
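
The examples above are the original client commands with -I 96 added before the server address. As a sketch only (the helper name with_inline_limit is hypothetical, and it assumes the server address is the last argument), an existing command line can be rewritten this way:

```shell
#!/bin/sh
# Hypothetical helper: insert "-I <size>" before the last argument of a
# perftest client command line. Assumes the last argument is the server address.
# Arguments: $1 = inline size limit, remaining args = the original command.
with_inline_limit() {
    size=$1
    shift
    cmd=$*                 # original command as one space-separated string
    server=${cmd##* }      # last word: the server address
    prefix=${cmd% *}       # everything before the last word
    echo "$prefix -I $size $server"
}

# Dry run: print the rewritten command without executing it.
with_inline_limit 96 timeout 3m ib_send_bw -a -c RC -d irdma0 -i 1 -F -R 172.31.45.38
```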

perftest test results on rdma-qe-38/rdma-qe-39 & Beaker job J:7983292:
4.18.0-497.el8.x86_64, rdma-core-46.0-1.el8.1, i40e, iw, E810-XXV & irdma1
    Result | Status | Test
  ---------+--------+------------------------------------
      PASS |      0 | ib_read_bw RC
      PASS |      0 | ib_read_lat RC
      PASS |      0 | ib_send_bw RC
      PASS |      0 | ib_send_lat RC
      PASS |      0 | ib_write_bw RC
      PASS |      0 | ib_write_lat RC
Checking for failures and known issues:
  no test failures

perftest test results on rdma-qe-38/rdma-qe-39 & Beaker job J:7983292:
4.18.0-497.el8.x86_64, rdma-core-46.0-1.el8.1, i40e, roce.45, E810-XXV & irdma0
    Result | Status | Test
  ---------+--------+------------------------------------
      PASS |      0 | ib_read_bw RC
      PASS |      0 | ib_read_lat RC
      PASS |      0 | ib_send_bw RC
      PASS |      0 | ib_send_lat RC
      PASS |      0 | ib_write_bw RC
      PASS |      0 | ib_write_lat RC
Checking for failures and known issues:
  no test failures

Comment 3 Kamal Heib 2023-06-19 02:24:31 UTC
Closing this bz as NOTABUG.