Bug 1866984

Summary: perftest ib_read_bw command fails on host with bnxt_re
Product: Red Hat Enterprise Linux 8
Reporter: Afom T. Michael <tmichael>
Component: perftest
Assignee: Selvin Xavier (Broadcom) <sxavier>
Status: CLOSED WONTFIX
QA Contact: Infiniband QE <infiniband-qe>
Severity: unspecified
Priority: unspecified
Version: 8.3
CC: dledford, hwkernel-mgr, linville, rdma-dev-team
Target Milestone: rc
Target Release: 8.0
Hardware: x86_64
OS: Linux
Last Closed: 2021-01-26 20:01:31 UTC
Type: Bug

Description Afom T. Michael 2020-08-07 03:30:45 UTC
Description of problem:
On RHEL-8.3, running perftest's ib_read_bw on a host with a bnxt_re device fails with "Completion with error at client", as shown below.

On the server:
$ ib_read_bw -a -c RC -F -d bnxt_re0 -p 1 -F -R

On the client:
$ ib_read_bw -a -c RC -F -d bnxt_re0 -p 1 -F -R 172.31.45.125
---------------------------------------------------------------------------------------
                    RDMA_Read BW Test
 Dual-port       : OFF		Device         : bnxt_re0
 Number of qps   : 1		Transport type : IB
 Connection type : RC		Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 7
 Outstand reads  : 126
 rdma_cm QPs	 : ON
 Data ex. method : rdma_cm
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x104c PSN 0x51e3c
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:172:31:45:126
 remote address: LID 0000 QPN 0x104b PSN 0xe0099b
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:172:31:43:125
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 2          1000             0.00               0.00   		   0.000023
 4          1000             16.84              16.82  		   4.408122
 8          1000             34.12              34.08  		   4.467469
 16         1000             68.04              67.94  		   4.452546
 32         1000             132.92             132.81 		   4.351879
 64         1000             271.76             271.64 		   4.450570
 128        1000             530.90             528.35 		   4.328280
 256        1000             1034.76            1032.94		   4.230926
 512        1000             1880.65            1879.10		   3.848392
 1024       1000             2877.39            2877.29		   2.946343
 2048       1000             4003.16            4001.18		   2.048602
 4096       1000             4300.25            4299.94		   1.100786
 8192       1000             4377.41            4377.31		   0.560295
 16384      1000             4348.94            4348.75		   0.278320
 32768      1000             4359.67            4359.59		   0.139507
 65536      1000             4276.04            4276.01		   0.068416
 131072     1000             4332.24            4332.23		   0.034658
 262144     1000             4328.03            4328.01		   0.017312
 524288     1000             4325.76            4325.31		   0.008651
 1048576    1000             4326.06            4325.42		   0.004325
 2097152    1000             1016.48            1015.40		   0.000508 Completion with error at client
 Failed status 12: wr_id 0 syndrom 0xc845fc50
scnt=228, ccnt=100

+ [20-08-06 22:53:59] RQA_check_result -r 17 -t 'ib_read_bw RC'
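The "Failed status 12" line reports a raw `ibv_wc_status` value from libibverbs. Assuming the enum order in rdma-core's `<infiniband/verbs.h>`, 12 is `IBV_WC_RETRY_EXC_ERR`, which `ibv_wc_status_str()` renders as "transport retry counter exceeded". This typically means the requester exhausted its transport retries while waiting for a response from the remote side. The sketch below decodes the value from a log line; `wc_status_name` is a hypothetical helper whose table covers only values 0 to 13, and on a host with rdma-core installed `ibv_wc_status_str()` remains the authoritative decoder.

```bash
#!/usr/bin/env bash
# wc_status_name: map a numeric ibv_wc_status value (as printed in perftest's
# "Failed status N" line) to its libibverbs enum name. Hypothetical helper;
# the table covers only values 0-13 of enum ibv_wc_status.
wc_status_name() {
    case "$1" in
        0)  echo IBV_WC_SUCCESS ;;
        1)  echo IBV_WC_LOC_LEN_ERR ;;
        2)  echo IBV_WC_LOC_QP_OP_ERR ;;
        3)  echo IBV_WC_LOC_EEC_OP_ERR ;;
        4)  echo IBV_WC_LOC_PROT_ERR ;;
        5)  echo IBV_WC_WR_FLUSH_ERR ;;
        6)  echo IBV_WC_MW_BIND_ERR ;;
        7)  echo IBV_WC_BAD_RESP_ERR ;;
        8)  echo IBV_WC_LOC_ACCESS_ERR ;;
        9)  echo IBV_WC_REM_INV_REQ_ERR ;;
        10) echo IBV_WC_REM_ACCESS_ERR ;;
        11) echo IBV_WC_REM_OP_ERR ;;
        12) echo IBV_WC_RETRY_EXC_ERR ;;
        13) echo IBV_WC_RNR_RETRY_EXC_ERR ;;
        *)  echo "unknown ($1)"; return 1 ;;
    esac
}

# Pull the status number out of a perftest failure line and decode it.
line=' Failed status 12: wr_id 0 syndrom 0xc845fc50'
status=$(sed -n 's/.*Failed status \([0-9]*\):.*/\1/p' <<< "$line")
wc_status_name "$status"    # prints IBV_WC_RETRY_EXC_ERR
```

Both failing runs in this report print status 12, so both decode to the same retry-exceeded condition.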


Version-Release number of selected component (if applicable):
[root@rdma-dev-26 ~]$ cat /etc/redhat-release 
Red Hat Enterprise Linux release 8.3 Beta (Ootpa)
[root@rdma-dev-26 ~]$ uname -r
4.18.0-221.el8.x86_64
[root@rdma-dev-26 ~]$ rpm -qa | grep -E "rdma|ibverbs|perftest|infiniband-diags"
perftest-4.4-2.el8.x86_64
infiniband-diags-29.0-3.el8.x86_64
rdma-core-devel-29.0-3.el8.x86_64
librdmacm-utils-29.0-3.el8.x86_64
rdma-core-29.0-3.el8.x86_64
libibverbs-29.0-3.el8.x86_64
kernel-kernel-infiniband-libibverbs-utils-0.1-38.noarch
librdmacm-29.0-3.el8.x86_64
kernel-kernel-infiniband-perftest-1.1-57.noarch
libibverbs-utils-29.0-3.el8.x86_64
[root@rdma-dev-26 ~]$ ibstatus 
Infiniband device 'bnxt_re0' port 1 status:
	default gid:	 fe80:0000:0000:0000:020a:f7ff:feea:cd90
	base lid:	 0x0
	sm lid:		 0x0
	state:		 4: ACTIVE
	phys state:	 5: LinkUp
	rate:		 100 Gb/sec (4X EDR)
	link_layer:	 Ethernet

[root@rdma-dev-26 ~]$ ibv_devinfo 
hca_id:	bnxt_re0
	transport:			InfiniBand (0)
	fw_ver:				214.0.194.0
	node_guid:			020a:f7ff:feea:cd90
	sys_image_guid:			020a:f7ff:feea:cd90
	vendor_id:			0x14e4
	vendor_part_id:			5652
	hw_ver:				0x4540
	phys_port_cnt:			1
		port:	1
			state:			PORT_ACTIVE (4)
			max_mtu:		4096 (5)
			active_mtu:		4096 (5)
			sm_lid:			0
			port_lid:		0
			port_lmc:		0x00
			link_layer:		Ethernet

[root@rdma-dev-26 ~]$
[root@rdma-dev-26 ~]$ lspci | grep Broadcom
[...snip...]
04:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57454 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb Ethernet (rev 01)
[root@rdma-dev-26 ~]$ 

How reproducible:
Always

Steps to Reproduce:
1. Install RHEL-8.3 Beta on two hosts with BCM57454 adapters.
2. Start the ib_read_bw server on one host and the client on the other, using the commands shown above.

Actual results:
[root@rdma-dev-26 perftest]$ timeout 3m ib_read_bw -a -c RC -F -d bnxt_re0 -p 1 -F -R 172.31.45.125 -u 16
---------------------------------------------------------------------------------------
                    RDMA_Read BW Test
 Dual-port       : OFF		Device         : bnxt_re0
 Number of qps   : 1		Transport type : IB
 Connection type : RC		Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 7
 Outstand reads  : 126
 rdma_cm QPs	 : ON
 Data ex. method : rdma_cm
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x105a PSN 0x582620
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:172:31:45:126
 remote address: LID 0000 QPN 0x1059 PSN 0xc8d8b4
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:172:31:43:125
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 2          1000             0.02               0.00   		   0.000016
 4          1000             16.91              16.88  		   4.423967
 8          1000             33.97              33.92  		   4.446124
 16         1000             67.94              67.87  		   4.448101
 32         1000             132.53             132.47 		   4.340876
 64         1000             270.56             270.40 		   4.430310
 128        1000             538.72             538.00 		   4.407301
 256        1000             1083.81            1082.59		   4.434300
 512        1000             2084.26            2066.57		   4.232326
 1024       1000             4104.21            4099.25		   4.197633
 2048       1000             7225.43            7224.39		   3.698888
 4096       1000             10460.52            10454.78		   2.676423
 8192       1000             11464.48            11462.09		   1.467147
 16384      1000             11577.87            11577.25		   0.740944
 32768      1000             11623.93            11623.84		   0.371963
 65536      1000             11649.89            11649.37		   0.186390
 131072     1000             11662.95            11662.86		   0.093303
 262144     1000             132.29             128.97 		   0.000516
 524288     1000             332.95             324.95 		   0.000650
 1048576    1000             315.29             312.10 		   0.000312
 2097152    1000             267.52             251.04 		   0.000126
 Completion with error at client
 Failed status 12: wr_id 0 syndrom 0x7e8a0060
scnt=328, ccnt=200
[root@rdma-dev-26 perftest]$ echo $?
17
[root@rdma-dev-26 perftest]$
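Both runs have the same shape: average bandwidth plateaus (about 4300 MB/sec in the first run, about 11600 MB/sec with "-u 16") and then collapses at one message size before the completion error appears (2097152 bytes in the first run, 262144 bytes in the second). The sketch below is a hypothetical awk filter, `find_bw_collapse`, that flags the first result row whose average bandwidth falls below half of the best average seen at smaller sizes; the 50% threshold is an arbitrary assumption, not something perftest reports.

```bash
#!/usr/bin/env bash
# find_bw_collapse: read perftest output on stdin and print
# "<#bytes> <avg BW> <best avg BW so far>" for the first row whose average
# bandwidth drops below half of the best average seen earlier.
# Hypothetical helper; the 50% threshold is an assumption.
find_bw_collapse() {
    awk '
        # Result rows start with a numeric message size and have at least
        # 5 fields: #bytes #iterations BW-peak BW-average MsgRate
        $1 ~ /^[0-9]+$/ && NF >= 5 {
            avg = $4 + 0
            if (best > 0 && avg < best / 2) {
                printf "%s %.2f %.2f\n", $1, avg, best
                exit
            }
            if (avg > best) best = avg
        }
    '
}
```

Fed the Actual results session above (for example `find_bw_collapse < client.log`), it prints `262144 128.97 11662.86`, which pinpoints the size at which throughput collapsed before the error was reported.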

Expected results:
The command completes for all message sizes without a completion error.

Additional info:
As requested in https://bugzilla.redhat.com/show_bug.cgi?id=1832709#c11, "-u 16" was added to the client command (see Actual results above), but the test still failed.
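For context on why "-u 16" was worth trying: perftest's `-u` (`--qp-timeout`) sets the QP's local ACK timeout exponent, which perftest's help text lists with a default of 14. Assuming the standard InfiniBand encoding of 4.096 µs × 2^exponent, "-u 16" stretches each retry wait from roughly 67 ms to roughly 268 ms, a factor of 4. The sketch below only does that arithmetic; `local_ack_timeout_us` is a hypothetical helper and does not query the device or perftest.

```bash
#!/usr/bin/env bash
# local_ack_timeout_us: InfiniBand local ACK timeout for a QP timeout
# exponent, 4.096 us * 2^exp. Computed in integer nanoseconds
# (4096 ns << exp) and truncated to whole microseconds. An exponent of 0
# means "infinite" (no timeout) per the IB specification; the field is
# 5 bits wide, so valid exponents are 0-31.
local_ack_timeout_us() {
    local exp=$1
    if (( exp == 0 )); then
        echo infinite
    elif (( exp < 0 || exp > 31 )); then
        echo "invalid exponent: $exp" >&2
        return 1
    else
        echo $(( (4096 << exp) / 1000 ))
    fi
}

# Compare the default listed in perftest's help (14) with the -u 16 retry.
for exp in 14 16; do
    printf 'timeout exponent %d -> %s us\n' "$exp" "$(local_ack_timeout_us "$exp")"
done
```

Since a fourfold longer timeout still ended in the same retry-exceeded status, a too-short timeout alone does not appear to explain the failure.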

Comment 1 Afom T. Michael 2020-08-07 03:32:46 UTC
Per https://bugzilla.redhat.com/show_bug.cgi?id=1832709#c11, assigning to Selvin.

Comment 3 John W. Linville 2021-01-26 20:01:31 UTC
Given the age of this bug and the lack of progress so far, a fix seems unlikely in the near term.