Bug 2211665 - [RHEL8.9] perftest fails on "ib_write_lat RC" test when tested on bnxt RoCE, BCM57504
Summary: [RHEL8.9] perftest fails on "ib_write_lat RC" test when tested on bnxt RoCE, ...
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: perftest
Version: 8.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Kamal Heib
QA Contact: Brian Chae
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-06-01 11:41 UTC by Brian Chae
Modified: 2023-06-28 19:34 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-158798 0 None None None 2023-06-01 11:44:09 UTC

Description Brian Chae 2023-06-01 11:41:30 UTC
Description of problem:

The "ib_write_lat RC" test fails consistently when perftest is run on BCM57504. The command is killed by the 3-minute timeout (exit status 124), which suggests that the test stalled or ran far slower than the same test on other bnxt RoCE devices, such as BCM57414.


Version-Release number of selected component (if applicable):

Clients: rdma-qe-35
Servers: rdma-qe-34

DISTRO=RHEL-8.9.0-20230521.41

+ [23-05-27 15:56:21] cat /etc/redhat-release
Red Hat Enterprise Linux release 8.9 Beta (Ootpa)

+ [23-05-27 15:56:21] uname -a
Linux rdma-qe-35.rdma.lab.eng.rdu2.redhat.com 4.18.0-492.el8.x86_64 #1 SMP Tue May 9 14:50:21 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux

+ [23-05-27 15:56:21] cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-492.el8.x86_64 root=UUID=aeafc44d-6857-41fb-b528-f3a2697fa426 ro crashkernel=auto resume=UUID=e29086f9-4a66-4501-b566-8a60ef6e209a console=ttyS0,115200n81

+ [23-05-27 15:56:21] rpm -q rdma-core linux-firmware
rdma-core-44.0-2.el8.1.x86_64
linux-firmware-20230515-115.gitd1962891.el8.noarch

+ [23-05-27 15:56:21] tail /sys/class/infiniband/bnxt_re0/fw_ver /sys/class/infiniband/bnxt_re1/fw_ver /sys/class/infiniband/bnxt_re2/fw_ver /sys/class/infiniband/bnxt_re3/fw_ver
==> /sys/class/infiniband/bnxt_re0/fw_ver <==
216.4.16.0

==> /sys/class/infiniband/bnxt_re1/fw_ver <==
216.4.16.0

==> /sys/class/infiniband/bnxt_re2/fw_ver <==
216.4.16.0

==> /sys/class/infiniband/bnxt_re3/fw_ver <==
216.4.16.0

+ [23-05-27 15:56:21] lspci
+ [23-05-27 15:56:21] grep -i -e ethernet -e infiniband -e omni -e ConnectX
19:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
19:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
19:00.2 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
19:00.3 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
5e:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57504 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet (rev 11)
5e:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57504 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet (rev 11)
5e:00.2 Ethernet controller: Broadcom Inc. and subsidiaries BCM57504 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet (rev 11)
5e:00.3 Ethernet controller: Broadcom Inc. and subsidiaries BCM57504 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet (rev 11)

+ [23-05-27 15:56:21] rpm -q perftest
perftest-4.5.0.20-4.el8.x86_64
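
The firmware check above names each bnxt_re device explicitly. A minimal sketch of the same check that walks every RDMA device instead is shown here. The function name list_fw_versions and its optional root-directory argument are assumptions made for illustration; they are not part of the test harness.

```shell
# Sketch: print "<device> <firmware version>" for every RDMA device.
# list_fw_versions and its optional root argument are hypothetical;
# the default path is the sysfs location used in the log above.
list_fw_versions() {
    local root=${1:-/sys/class/infiniband}
    local f dev
    for f in "$root"/*/fw_ver; do
        [ -r "$f" ] || continue              # glob matched nothing readable
        dev=$(basename "$(dirname "$f")")    # e.g. bnxt_re0
        printf '%s %s\n' "$dev" "$(cat "$f")"
    done
}
```

Called with no argument on the client host, it would be expected to print one line per bnxt_re device.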


How reproducible:
100%


Steps to Reproduce:
1. On the server host, issue

timeout 3m ib_write_lat -a -c RC -d bnxt_re1 -i 1 -F -R


2. On the client host, issue

+ [23-05-27 15:58:05] timeout 3m ib_write_lat -a -c RC -d bnxt_re1 -i 1 -F -R 172.31.45.248


Actual results:

/usr/bin/rhts_sync_block -s ib_write_lat-RC-ready_-roce.45-0 rdma-qe-34 -- Blocking state(s) =  14_ib_write_lat-RC-ready_-roce.45-0
+ [23-05-27 15:58:04] sleep 1
+ [23-05-27 15:58:05] timeout 3m ib_write_lat -a -c RC -d bnxt_re1 -i 1 -F -R 172.31.45.248
+ [23-05-27 16:01:05] RQA_check_result -r 124 -t 'ib_write_lat RC'
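
The status 124 passed to RQA_check_result above is the value GNU timeout returns when it has to kill the command, so the failure is a timeout rather than an error reported by ib_write_lat itself. A minimal sketch of how a wrapper could tell these cases apart follows. The helper name classify_timeout_status and its labels are hypothetical and are not part of the RDMA QA harness.

```shell
# Sketch: map the exit status of "timeout 3m <cmd>" to a short label.
# classify_timeout_status is a hypothetical helper; the status values are
# the ones documented for GNU coreutils timeout.
classify_timeout_status() {
    case "$1" in
        0)   echo "pass" ;;
        124) echo "timed-out" ;;        # command was still running at the limit
        125) echo "timeout-error" ;;    # timeout itself failed
        126) echo "not-executable" ;;   # command found but could not be invoked
        127) echo "not-found" ;;        # command not found
        *)   echo "failed" ;;           # the command's own non-zero status
    esac
}

# Possible usage on the client host (not executed here):
#   timeout 3m ib_write_lat -a -c RC -d bnxt_re1 -i 1 -F -R 172.31.45.248
#   classify_timeout_status $?
```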


Refer to beaker test job: https://beaker.engineering.redhat.com/jobs/7898077

T:160852234
+00:37:33
/kernel/infiniband/perftest 




Expected results:



When tested on BCM57414, the same test completes in about 21 seconds (16:05:55 to 16:06:16) and exits with status 0:


+ [23-05-27 16:05:55] timeout 3m ib_write_lat -a -c RC -d bnxt_re3 -i 1 -F -R 172.31.45.24
---------------------------------------------------------------------------------------
                    RDMA_Write Latency Test
 Dual-port       : OFF		Device         : bnxt_re3
 Number of qps   : 1		Transport type : IB
 Connection type : RC		Using SRQ      : OFF
 PCIe relax order: OFF
 ibv_wr* API     : OFF
 TX depth        : 1
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 3
 Max inline data : 96[B]
 rdma_cm QPs	 : ON
 Data ex. method : rdma_cm
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x01d2 PSN 0x2a0fc6
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:172:31:40:25
 remote address: LID 0000 QPN 0x00d2 PSN 0x9723a5
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:172:31:40:24
---------------------------------------------------------------------------------------
 #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]    t_avg[usec]    t_stdev[usec]   99% percentile[usec]   99.9% percentile[usec] 
 2       1000          4.27           9.00         4.31     	       4.32        	0.06   		4.69    		9.00   
 4       1000          4.21           8.83         4.24     	       4.25        	0.03   		4.60    		8.83   
 8       1000          4.22           7.04         4.26     	       4.27        	0.00   		4.60    		7.04   
 16      1000          4.13           4.77         4.27     	       4.28        	0.00   		4.46    		4.77   
 32      1000          4.30           5.24         4.34     	       4.35        	0.00   		4.68    		5.24   
 64      1000          4.35           5.17         4.38     	       4.40        	0.00   		4.72    		5.17   
 128     1000          5.06           6.10         5.09     	       5.10        	0.00   		5.45    		6.10   
 256     1000          5.18           6.00         5.21     	       5.22        	0.00   		5.56    		6.00   
 512     1000          5.29           9.44         5.33     	       5.34        	0.03   		5.72    		9.44   
 1024    1000          5.60           6.43         5.64     	       5.67        	0.00   		6.06    		6.43   
 2048    1000          6.23           11.60        6.29     	       6.31        	0.03   		6.56    		11.60  
 4096    1000          7.52           13.62        7.57     	       7.60        	0.11   		7.92    		13.62  
 8192    1000          8.74           9.46         8.80     	       8.84        	0.00   		9.11    		9.46   
 16384   1000          11.44          12.50        11.64    	       11.67       	0.00   		12.14   		12.50  
 32768   1000          16.79          18.06        16.88    	       16.95       	0.00   		17.40   		18.06  
 65536   1000          27.50          28.33        27.56    	       27.64       	0.00   		28.17   		28.33  
 131072  1000          48.89          49.79        48.99    	       49.06       	0.00   		49.58   		49.79  
 262144  1000          91.74          96.25        91.83    	       91.89       	0.03   		92.28   		96.25  
 524288  1000          177.37         178.39       177.46   	       177.52      	0.00   		178.03  		178.39 
 1048576 1000          348.66         350.79       348.77   	       348.84      	0.04   		349.33  		350.79 
 2097152 1000          691.26         692.64       691.36   	       691.43      	0.04   		692.02  		692.64 
 4194304 1000          1376.42        1378.20      1376.51  	       1376.60     	0.11   		1377.72 		1378.20
 8388608 1000          2746.77        2749.07      2746.87  	       2746.98     	0.13   		2748.31 		2749.07
---------------------------------------------------------------------------------------
+ [23-05-27 16:06:16] RQA_check_result -r 0 -t 'ib_write_lat RC'
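
To compare a passing run like the one above against a run on another device, the per-size average latency can be pulled out of the table. The following is a minimal sketch, assuming the nine-column layout shown above. The function name extract_avg_latency is hypothetical.

```shell
# Sketch: print "<bytes> <t_avg usec>" for each data row of ib_write_lat
# output. Assumes the nine-column layout shown above, where t_avg is the
# sixth column. extract_avg_latency is a hypothetical helper name.
extract_avg_latency() {
    # Data rows start with a plain integer; the "#bytes" header, the
    # separator lines and the address lines do not match.
    awk 'NF == 9 && $1 ~ /^[0-9]+$/ { print $1, $6 }' "$@"
}
```

It reads the files named as arguments, or standard input when none are given, so saved output from two hosts can be processed separately and then compared side by side.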

Refer to Beaker job https://beaker.engineering.redhat.com/jobs/7898076, task /kernel/infiniband/perftest.


Additional info:

Comment 1 Brian Chae 2023-06-01 11:47:50 UTC
On rdma-qe-30/31, another host pair with BCM57414, "ib_write_lat RC" also passed.

perftest test results on rdma-qe-30/rdma-qe-31 & Beaker job J:7915423:
4.18.0-492.el8.x86_64, rdma-core-44.0-2.el8.1, bnxt_en, roce.45, BCM57414 & bnxt_re0
    Result | Status | Test
  ---------+--------+------------------------------------
      PASS |      0 | ib_read_bw RC
      PASS |      0 | ib_read_lat RC
      PASS |      0 | ib_send_bw RC
      PASS |      0 | ib_send_lat RC
      PASS |      0 | ib_write_bw RC
      PASS |      0 | ib_write_lat RC   <<<=============
Checking for failures and known issues:
  no test failures

