Bug 1468996 - ib_write_bw failed over ConnectX-4 Lx/ROCE
Status: NEW
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: perftest
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Jarod Wilson
QA Contact: Infiniband QE
Depends On:
Blocks:
Reported: 2017-07-10 04:23 EDT by zguo
Modified: 2017-07-10 15:01 EDT

Type: Bug


Attachments: None
Description zguo 2017-07-10 04:23:44 EDT
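Description of problem:
ib_write_bw with an RC connection between rdma-virt-02 (server) and rdma-virt-03 (client) over the ConnectX-4 Lx RoCE device mlx5_2 fails. The client reports a completion error and the server fails to read the remote address.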
[root@rdma-virt-02 ~]$ ib_write_bw -c RC -d mlx5_2

************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF		Device         : mlx5_2
 Number of qps   : 1		Transport type : IB
 Connection type : RC		Using SRQ      : OFF
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 2
 Max inline data : 0[B]
 rdma_cm QPs	 : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x02ff PSN 0xf115ef RKey 0x008458 VAddr 0x002b5eb80d2000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:172:31:43:92
 remote address: LID 0000 QPN 0x0301 PSN 0xd1ed0b RKey 0x00641b VAddr 0x002b766ae6f000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:172:31:40:93
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
ethernet_read_keys: Couldn't read remote address
 Unable to read to socket/rdam_cm
 Failed to exchange data between server and clients
[root@rdma-virt-03 ~]$ timeout 3m ib_write_bw 172.31.40.92 -c RC -d mlx5_2
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF		Device         : mlx5_2
 Number of qps   : 1		Transport type : IB
 Connection type : RC		Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 2
 Max inline data : 0[B]
 rdma_cm QPs	 : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x0301 PSN 0xd1ed0b RKey 0x00641b VAddr 0x002b766ae6f000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:172:31:40:93
 remote address: LID 0000 QPN 0x02ff PSN 0xf115ef RKey 0x008458 VAddr 0x002b5eb80d2000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:172:31:43:92
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 Completion with error at client
 Failed status 12: wr_id 0 syndrom 0x81
scnt=128, ccnt=0
 Failed to complete run_iter_bw function successfully
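
For triage, the GIDs printed above can be turned back into IP addresses. perftest prints each GID as 16 colon-separated decimal bytes, and a RoCE GID derived from an IPv4 address uses the IPv4-mapped form. The following is only a sketch; the helper name gid_to_ip is hypothetical and not part of perftest.

```python
import ipaddress

def gid_to_ip(gid_text):
    """Convert a perftest-style GID (16 colon-separated decimal bytes) to an IP address.

    Hypothetical helper for reading the logs in this report.
    """
    parts = [int(p) for p in gid_text.split(":")]
    if len(parts) != 16 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError("expected 16 byte values in the range 0..255")
    addr = ipaddress.IPv6Address(bytes(parts))
    # IPv4-based RoCE GIDs look like ::ffff:a.b.c.d; return the IPv4 part if present.
    return addr.ipv4_mapped or addr

# GIDs copied from the server (rdma-virt-02) and client (rdma-virt-03) output.
server = gid_to_ip("00:00:00:00:00:00:00:00:00:00:255:255:172:31:43:92")
client = gid_to_ip("00:00:00:00:00:00:00:00:00:00:255:255:172:31:40:93")
print(server, client)
```

The server's GID at index 2 decodes to 172.31.43.92 and the client's to 172.31.40.93, while the client command targets 172.31.40.92 for the out-of-band exchange. Whether the two RoCE addresses share a subnet depends on the netmask, which this report does not give, so this may be worth checking.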

Version-Release number of selected component (if applicable):
[root@rdma-virt-02 ~]$ ethtool -i mlx5_roce
driver: mlx5_core
version: 3.0-1 (January 2015)
firmware-version: 14.18.1000
expansion-rom-version: 
bus-info: 0000:05:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
[root@rdma-virt-02 ~]$ ibstat mlx5_2
CA 'mlx5_2'
	CA type: MT4117
	Number of ports: 1
	Firmware version: 14.18.1000
	Hardware version: 0
	Node GUID: 0xe41d2d0300fda72a
	System image GUID: 0xe41d2d0300fda72a
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 40
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x04010000
		Port GUID: 0xe61d2dfffefda72a
		Link layer: Ethernet
[root@rdma-virt-02 ~]$ lspci | grep Mell
04:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
04:00.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
05:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
05:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
[root@rdma-virt-02 ~]$ rpm -q perftest
perftest-3.4-1.el7.x86_64 

[root@rdma-virt-03 ~]$ uname -r
3.10.0-693.el7.x86_64
[root@rdma-virt-03 ~]$ ethtool -i mlx5_roce
driver: mlx5_core
version: 3.0-1 (January 2015)
firmware-version: 14.18.1000
expansion-rom-version: 
bus-info: 0000:05:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
[root@rdma-virt-03 ~]$ ibstat mlx5_2
CA 'mlx5_2'
	CA type: MT4117
	Number of ports: 1
	Firmware version: 14.18.1000
	Hardware version: 0
	Node GUID: 0xe41d2d0300fda736
	System image GUID: 0xe41d2d0300fda736
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 40
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x04010000
		Port GUID: 0xe61d2dfffefda736
		Link layer: Ethernet

[root@rdma-virt-03 ~]$ lspci | grep Mell
04:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
04:00.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
05:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
05:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]


How reproducible:


Steps to Reproduce:
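1. On rdma-virt-02 (server), run: ib_write_bw -c RC -d mlx5_2
2. On rdma-virt-03 (client), run: timeout 3m ib_write_bw 172.31.40.92 -c RC -d mlx5_2
3. Check the output on both hosts.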

Actual results:
 Completion with error at client
 Failed status 12: wr_id 0 syndrom 0x81
scnt=128, ccnt=0
 Failed to complete run_iter_bw function successfully
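
The numeric status in the failure line can be mapped to its libibverbs name. The sketch below assumes the values of enum ibv_wc_status from <infiniband/verbs.h> (only the first 14 are listed); the function name decode_failure and the regular expression are hypothetical, written just to parse the line format seen in this report.

```python
import re

# Subset of enum ibv_wc_status from libibverbs' <infiniband/verbs.h>.
IBV_WC_STATUS = {
    0: "IBV_WC_SUCCESS",
    1: "IBV_WC_LOC_LEN_ERR",
    2: "IBV_WC_LOC_QP_OP_ERR",
    3: "IBV_WC_LOC_EEC_OP_ERR",
    4: "IBV_WC_LOC_PROT_ERR",
    5: "IBV_WC_WR_FLUSH_ERR",
    6: "IBV_WC_MW_BIND_ERR",
    7: "IBV_WC_BAD_RESP_ERR",
    8: "IBV_WC_LOC_ACCESS_ERR",
    9: "IBV_WC_REM_INV_REQ_ERR",
    10: "IBV_WC_REM_ACCESS_ERR",
    11: "IBV_WC_REM_OP_ERR",
    12: "IBV_WC_RETRY_EXC_ERR",
    13: "IBV_WC_RNR_RETRY_EXC_ERR",
}

# Matches the perftest failure line exactly as it is spelled in the output.
FAILURE_RE = re.compile(r"Failed status (\d+): wr_id (\d+) syndrom (0x[0-9a-fA-F]+)")

def decode_failure(line):
    """Parse a perftest failure line into its fields (hypothetical helper)."""
    m = FAILURE_RE.search(line)
    if m is None:
        raise ValueError("not a perftest failure line")
    status = int(m.group(1))
    return {
        "status": status,
        "name": IBV_WC_STATUS.get(status, "unknown"),
        "wr_id": int(m.group(2)),
        # Device-specific value; this sketch does not try to interpret it.
        "vendor_syndrome": int(m.group(3), 16),
    }

result = decode_failure(" Failed status 12: wr_id 0 syndrom 0x81")
print(result["name"])
```

Status 12 is IBV_WC_RETRY_EXC_ERR, meaning the transport retry counter was exceeded because the requester received no acknowledgement from the responder. Together with scnt=128, ccnt=0 (128 work requests posted, none completed), this suggests the RDMA traffic never reached the peer or its replies never came back, although it does not identify the cause.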

Expected results:
ib_write_bw runs to completion and reports bandwidth results.

Additional info:
The issue can also be reproduced on RHEL 7.3 with kernel 3.10.0-514.el7.x86_64.
