Bug 1902855 - [RHEL8.4] performance degradation with "ib_send_lat RC" test when tested on mlx5 MT27700 CX-4 ROCE device
Summary: [RHEL8.4] performance degradation with "ib_send_lat RC" test when tested on mlx5 MT27700 CX-4 ROCE device
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: perftest
Version: 8.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.4
Assignee: Honggang LI
QA Contact: Brian Chae
URL:
Whiteboard:
Depends On:
Blocks: 1903942
 
Reported: 2020-11-30 19:57 UTC by Brian Chae
Modified: 2022-03-24 15:38 UTC
CC List: 5 users

Fixed In Version: perftest-4.4-8.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-18 14:45:12 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Description Brian Chae 2020-11-30 19:57:08 UTC
Description of problem:

The "ib_send_lat RC" perftest at the 4194304-byte and 8388608-byte data sizes showed a latency increase of roughly 250-fold compared with the same test on other Mellanox RoCE devices, such as the ConnectX-3 (mlx4). The same degradation appears when the results are compared with the "ib_send_lat RC" test on the same mlx5 ConnectX-4 device under RHEL-8.3. The RDMA lab hosts showing this issue are rdma-dev-21 and rdma-dev-22.

This is a performance degradation in RHEL-8.4 and a regression from RHEL-8.3.



Version-Release number of selected component (if applicable):


DISTRO=RHEL-8.4.0-20201128.n.0

+ [20-11-30 13:49:07] cat /etc/redhat-release
Red Hat Enterprise Linux release 8.4 Beta (Ootpa)

+ [20-11-30 13:49:07] uname -a
Linux rdma-dev-22.lab.bos.redhat.com 4.18.0-254.el8.x86_64 #1 SMP Thu Nov 26 08:47:50 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
+ [20-11-30 13:49:07] cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-254.el8.x86_64 root=/dev/mapper/rhel_rdma--dev--22-root ro intel_idle.max_cstate=0 processor.max_cstate=0 intel_iommu=on iommu=on console=tty0 rd_NO_PLYMOUTH crashkernel=auto resume=/dev/mapper/rhel_rdma--dev--22-swap rd.lvm.lv=rhel_rdma-dev-22/root rd.lvm.lv=rhel_rdma-dev-22/swap console=ttyS1,115200n81

+ [20-11-30 13:49:07] rpm -q rdma-core linux-firmware
rdma-core-32.0-1.el8.x86_64
linux-firmware-20201022-100.gitdae4b4cd.el8.noarch
+ [20-11-30 13:49:07] tail /sys/class/infiniband/mlx5_0/fw_ver /sys/class/infiniband/mlx5_1/fw_ver /sys/class/infiniband/mlx5_2/fw_ver
==> /sys/class/infiniband/mlx5_0/fw_ver <==
12.28.1002

==> /sys/class/infiniband/mlx5_1/fw_ver <==
12.28.1002

==> /sys/class/infiniband/mlx5_2/fw_ver <==
12.28.1002
+ [20-11-30 13:49:07] lspci
+ [20-11-30 13:49:07] grep -i -e ethernet -e infiniband -e omni -e ConnectX
01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
02:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
02:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
04:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
82:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
82:00.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
+ [20-11-30 13:49:07] lscpu


rdma-core-32.0-1.el8



How reproducible:

100%

Steps to Reproduce:

Device info
============

CA 'mlx5_0'
	CA type: MT4115
	Number of ports: 1
	Firmware version: 12.28.1002
	Hardware version: 0
	Node GUID: 0x248a07030056b834
	System image GUID: 0x248a07030056b834
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 100
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x00010000
		Port GUID: 0x268a07fffe56b834
		Link layer: Ethernet


6: mlx5_roce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
    link/ether 24:8a:07:56:b8:34 brd ff:ff:ff:ff:ff:ff
    inet 172.31.40.122/24 brd 172.31.40.255 scope global dynamic noprefixroute mlx5_roce
       valid_lft 3427sec preferred_lft 3427sec
    inet6 fe80::268a:7ff:fe56:b834/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
23: mlx5_roce.45@mlx5_roce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    link/ether 24:8a:07:56:b8:34 brd ff:ff:ff:ff:ff:ff
    inet 172.31.45.122/24 brd 172.31.45.255 scope global dynamic noprefixroute mlx5_roce.45
       valid_lft 3427sec preferred_lft 3427sec
    inet6 fe80::268a:7ff:fe56:b834/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
24: mlx5_roce.43@mlx5_roce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    link/ether 24:8a:07:56:b8:34 brd ff:ff:ff:ff:ff:ff
    inet 172.31.43.122/24 brd 172.31.43.255 scope global dynamic noprefixroute mlx5_roce.43
       valid_lft 3427sec preferred_lft 3427sec
    inet6 fe80::268a:7ff:fe56:b834/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever




1. Bring up two RDMA hosts with the above software/build.
2. Issue the following perftest command on the server host:

[root@rdma-dev-21 ~]$ timeout 10m ib_send_lat -a -c RC -d mlx5_0 -i 1 -F -R

3. Issue the following perftest command on the client host (a combined sketch of both steps follows):

[root@rdma-dev-22 ~]$ timeout 10m ib_send_lat -a -c RC -d mlx5_0 -i 1 -F -R 172.31.45.121
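
For convenience, the two steps can be driven from one small wrapper; this is a minimal sketch (the script name and the ROLE argument are illustrative; the device, flags, timeout, and server address are taken from the commands above):

#!/bin/bash
# repro.sh (hypothetical wrapper) - run "repro.sh server" on rdma-dev-21 first,
# then "repro.sh client" on rdma-dev-22.
ROLE=${1:?usage: repro.sh server|client}
DEV=mlx5_0
SERVER_IP=172.31.45.121   # address the client step above connects to
if [ "$ROLE" = "server" ]; then
    timeout 10m ib_send_lat -a -c RC -d "$DEV" -i 1 -F -R
else
    timeout 10m ib_send_lat -a -c RC -d "$DEV" -i 1 -F -R "$SERVER_IP"
fi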


Actual results:

[root@rdma-dev-22 ~]$ timeout 10m ib_send_lat -a -c RC -d mlx5_0 -i 1 -F -R 172.31.45.121
---------------------------------------------------------------------------------------
                    Send Latency Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 1
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 7
 Max inline data : 236[B]
 rdma_cm QPs     : ON
 Data ex. method : rdma_cm
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x011c PSN 0x2abeb5
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:172:31:40:122
 remote address: LID 0000 QPN 0x011c PSN 0x2e2cd7
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:172:31:40:121
---------------------------------------------------------------------------------------
 #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]    t_avg[usec]    t_stdev[usec]   99% percentile[usec]   99.9% percentile[usec] 
 2       1000          1.18           3.09         1.23                1.23             0.04            1.31                  3.09   
 4       1000          1.18           2.41         1.22                1.22             0.05            1.31                  2.41   
 8       1000          1.18           2.22         1.22                1.22             0.05            1.28                  2.22   
 16      1000          1.18           2.00         1.22                1.22             0.04            1.28                  2.00   
 32      1000          1.18           3.23         1.22                1.23             0.06            1.31                  3.23   
 64      1000          1.26           2.25         1.29                1.29             0.04            1.36                  2.25   
 128     1000          1.28           3.31         1.32                1.32             0.07            1.39                  3.31   
 256     1000          1.64           3.27         1.68                1.68             0.04            1.83                  3.27   
 512     1000          1.71           2.88         1.75                1.77             0.06            1.93                  2.88   
 1024    1000          1.84           2.37         1.89                1.92             0.07            2.06                  2.37   
 2048    1000          2.09           3.00         2.13                2.15             0.05            2.34                  3.00   
 4096    1000          2.58           3.48         2.62                2.63             0.04            2.82                  3.48   
 8192    1000          3.00           3.29         3.04                3.06             0.05            3.25                  3.29   
 16384   1000          3.77           4.64         3.89                3.90             0.10            4.18                  4.64   
 32768   1000          5.33           7.43         5.42                5.48             0.13            5.85                  7.43   
 65536   1000          8.47           10.06        8.71                8.72             0.15            9.02                  10.06  
 131072  1000          18.81          19.86        19.09               19.11            0.14            19.51                 19.86  
 262144  1000          31.27          32.49        31.80               31.81            0.21            32.29                 32.49  
 524288  1000          56.04          57.71        56.63               56.65            0.27            57.50                 57.71  
 1048576 1000          105.53         107.63       106.12              106.15           0.29            107.10                107.63 
 2097152 1000          204.46         206.53       205.12              205.16           0.33            206.09                206.53 
 4194304 1000          402.73         2150785.13      493.85           97111.29         427875.91            2147478.09               2150785.13
 8388608 1000          950.45         2137753.19      1112.66          39726.68         274756.53            2100862.37               2137753.19
---------------------------------------------------------------------------------------

Normally the above test completes in less than 3 minutes, but with the severe degradation at the 4194304- and 8388608-byte data sizes the run time increased to about 6 minutes (see the comparison after the expected results below).


Expected results:


+ [20-11-27 07:54:07] timeout 3m ib_send_lat -a -c RC -d mlx5_0 -i 1 -F -R 172.31.45.121
---------------------------------------------------------------------------------------
                    Send Latency Test
 Dual-port       : OFF		Device         : mlx5_0
 Number of qps   : 1		Transport type : IB
 Connection type : RC		Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 1
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 7
 Max inline data : 236[B]
 rdma_cm QPs	 : ON
 Data ex. method : rdma_cm
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x0112 PSN 0x1e5128
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:172:31:45:122
 remote address: LID 0000 QPN 0x0112 PSN 0x7854d
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:172:31:45:121
---------------------------------------------------------------------------------------
 #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]    t_avg[usec]    t_stdev[usec]   99% percentile[usec]   99.9% percentile[usec] 
 2       1000          1.18           2.86         1.23     	       1.23        	0.05   		1.31    		2.86   
 4       1000          1.18           2.70         1.22     	       1.22        	0.04   		1.28    		2.70   
 8       1000          1.18           2.43         1.23     	       1.23        	0.04   		1.28    		2.43   
 16      1000          1.19           2.16         1.22     	       1.22        	0.04   		1.28    		2.16   
 32      1000          1.18           2.07         1.23     	       1.23        	0.04   		1.29    		2.07   
 64      1000          1.26           2.37         1.29     	       1.31        	0.07   		1.55    		2.37   
 128     1000          1.27           2.14         1.31     	       1.31        	0.03   		1.37    		2.14   
 256     1000          1.63           3.32         1.68     	       1.68        	0.04   		1.82    		3.32   
 512     1000          1.71           2.85         1.76     	       1.77        	0.05   		1.92    		2.85   
 1024    1000          1.83           2.74         1.89     	       1.92        	0.08   		2.08    		2.74   
 2048    1000          2.07           2.96         2.12     	       2.13        	0.05   		2.31    		2.96   
 4096    1000          2.53           3.59         2.59     	       2.60        	0.05   		2.74    		3.59   
 8192    1000          2.90           3.41         2.96     	       2.98        	0.07   		3.20    		3.41   
 16384   1000          3.57           4.69         3.68     	       3.73        	0.13   		4.16    		4.69   
 32768   1000          4.92           6.43         5.06     	       5.12        	0.17   		5.63    		6.43   
 65536   1000          8.02           9.59         8.26     	       8.26        	0.15   		8.57    		9.59   
 131072  1000          17.87          19.14        18.18    	       18.20       	0.12   		18.51   		19.14  
 262144  1000          28.60          30.65        29.20    	       29.19       	0.28   		29.74   		30.65  
 524288  1000          50.06          54.90        50.69    	       51.15       	1.14   		54.12   		54.90  
 1048576 1000          92.81          99.26        95.02    	       95.22       	1.70   		98.74   		99.26  
 2097152 1000          178.71         184.45       183.33   	       183.32      	0.48   		184.19  		184.45 
 4194304 1000          349.79         355.46       351.75   	       351.83      	1.29   		355.16  		355.46 
 8388608 1000          692.37         699.42       694.63   	       694.71      	1.23   		698.25  		699.42 
---------------------------------------------------------------------------------------
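
As a rough cross-check of the magnitude, comparing the t_avg column of the degraded run against the expected run above (both figures are taken from the two tables; the factors are approximate):

 4194304 bytes: 97111.29 usec / 351.83 usec ~= 276-fold increase in average latency
 8388608 bytes: 39726.68 usec / 694.71 usec ~=  57-fold increase in average latency

The multi-second t_max values (over 2.1 s) in the degraded run suggest occasional very long stalls rather than a uniform slowdown.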


Additional info:

Comment 11 Honggang LI 2020-12-31 09:18:06 UTC
[root@rdma-dev-21 ~]$ lspci -nn | grep  8086:6f0
00:00.0 Host bridge [0600]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DMI2 [8086:6f00] (rev 01)
00:01.0 PCI bridge [0604]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 1 [8086:6f02] (rev 01)
00:02.0 PCI bridge [0604]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 2 [8086:6f04] (rev 01)
00:03.0 PCI bridge [0604]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 3 [8086:6f08] (rev 01)
00:03.1 PCI bridge [0604]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 3 [8086:6f09] (rev 01)
80:01.0 PCI bridge [0604]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 1 [8086:6f02] (rev 01)
80:03.0 PCI bridge [0604]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 3 [8086:6f08] (rev 01)

https://lore.kernel.org/patchwork/patch/820922/

The CPU of rdma-dev-21/22 is not PCIe Relaxed Ordering compliant, so please run perftest with '--disable_pcie_relaxed'.

[root@rdma-dev-21 ~]$ ib_send_lat --disable_pcie_relaxed  -a -c RC -d mlx5_0 -i 1 -F -R
<snip>
 PCIe relax order: OFF  <====
<snip>

[root@rdma-dev-22 ~]$ ib_send_lat --disable_pcie_relaxed -a -c RC -d mlx5_0 -i 1 -F -R  172.31.45.121
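
A minimal sketch of applying the workaround conditionally, assuming the 8086:6f0x host-bridge check shown above is an adequate heuristic for these lab hosts (it is not presented as a general detection method):

EXTRA_OPTS=""
# Hosts whose Intel host bridge matches 8086:6f0x (see the lspci output above)
# have a CPU that is not PCIe Relaxed Ordering compliant, so disable it in perftest.
if lspci -nn | grep -q '8086:6f0'; then
    EXTRA_OPTS="--disable_pcie_relaxed"
fi
# server (rdma-dev-21):
ib_send_lat $EXTRA_OPTS -a -c RC -d mlx5_0 -i 1 -F -R
# client (rdma-dev-22):
ib_send_lat $EXTRA_OPTS -a -c RC -d mlx5_0 -i 1 -F -R 172.31.45.121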

Comment 14 Honggang LI 2021-01-29 11:27:31 UTC
https://github.com/linux-rdma/perftest/pull/117

Comment 18 Brian Chae 2021-02-05 12:21:13 UTC
The perftest was re-tested with the latest build, RHEL-8.4.0-20210205.n.0, on mlx5 MT27700 CX-4 ROCE device.

o RDMA lab hosts
  rdma-dev-21 (server) / rdma-dev-22 (client) host pair

o Build info

DISTRO=RHEL-8.4.0-20210205.n.0

+ [21-02-05 06:32:51] cat /etc/redhat-release
Red Hat Enterprise Linux release 8.4 Beta (Ootpa)

+ [21-02-05 06:32:51] uname -a
Linux rdma-dev-22.lab.bos.redhat.com 4.18.0-282.el8.x86_64 #1 SMP Tue Feb 2 14:09:52 EST 2021 x86_64 x86_64 x86_64 GNU/Linux
+ [21-02-05 06:32:51] cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-282.el8.x86_64 root=/dev/mapper/rhel_rdma--dev--22-root ro intel_idle.max_cstate=0 processor.max_cstate=0 intel_iommu=on iommu=on console=tty0 rd_NO_PLYMOUTH crashkernel=auto resume=/dev/mapper/rhel_rdma--dev--22-swap rd.lvm.lv=rhel_rdma-dev-22/root rd.lvm.lv=rhel_rdma-dev-22/swap console=ttyS1,115200n81

+ [21-02-05 06:32:51] rpm -q rdma-core linux-firmware
rdma-core-32.0-4.el8.x86_64
linux-firmware-20201218-102.git05789708.el8.noarch
+ [21-02-05 06:32:51] tail /sys/class/infiniband/mlx5_0/fw_ver /sys/class/infiniband/mlx5_1/fw_ver /sys/class/infiniband/mlx5_2/fw_ver
==> /sys/class/infiniband/mlx5_0/fw_ver <==
12.28.1002

==> /sys/class/infiniband/mlx5_1/fw_ver <==
12.28.1002

==> /sys/class/infiniband/mlx5_2/fw_ver <==
12.28.1002

+ [21-02-05 06:32:51] lspci
+ [21-02-05 06:32:51] grep -i -e ethernet -e infiniband -e omni -e ConnectX
01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
02:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
02:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
04:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
82:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
82:00.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]

o Test result

Test results for perftest on rdma-dev-22:
4.18.0-282.el8.x86_64, rdma-core-32.0-4.el8, mlx5, roce.45, & mlx5_0
    Result | Status | Test
  ---------+--------+------------------------------------
      PASS |      0 | ib_atomic_bw RC
      PASS |      0 | ib_atomic_lat RC
      PASS |      0 | ib_read_bw RC
      PASS |      0 | ib_read_lat RC
      PASS |      0 | ib_send_bw RC
      PASS |      0 | ib_send_lat RC
      PASS |      0 | ib_write_bw RC
      PASS |      0 | ib_write_lat RC
      PASS |      0 | raw_ethernet_bw RC
      PASS |      0 | raw_ethernet_lat RC

Checking for failures and known issues:
  no test failures

o ib_send_lat perftest result, showing the performance data

+ [21-02-05 06:34:37] timeout 3m ib_send_lat -a -c RC -d mlx5_0 -i 1 -F -R 172.31.45.121 <<<============= 
---------------------------------------------------------------------------------------
                    Send Latency Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: Unsupported
 ibv_wr* API     : ON
 TX depth        : 1
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 7
 Max inline data : 236[B]
 rdma_cm QPs     : ON
 Data ex. method : rdma_cm
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x0111 PSN 0x1be678
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:172:31:45:122
 remote address: LID 0000 QPN 0x0111 PSN 0x88219a
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:172:31:40:121
---------------------------------------------------------------------------------------
 #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]    t_avg[usec]    t_stdev[usec]   99% percentile[usec]   99.9% percentile[usec]
 2       1000          1.18           1.97         1.22                1.22             0.02            1.31                    1.97
 4       1000          1.18           2.09         1.22                1.22             0.04            1.27                    2.09
 8       1000          1.18           2.14         1.22                1.22             0.04            1.27                    2.14
 16      1000          1.17           2.05         1.22                1.22             0.04            1.27                    2.05
 32      1000          1.18           3.22         1.22                1.22             0.07            1.28                    3.22
 64      1000          1.25           2.43         1.28                1.29             0.04            1.38                    2.43
 128     1000          1.26           2.25         1.30                1.31             0.04            1.37                    2.25
 256     1000          1.63           2.91         1.67                1.68             0.06            1.83                    2.91
 512     1000          1.70           3.00         1.75                1.76             0.06            1.95                    3.00
 1024    1000          1.82           3.19         1.88                1.91             0.08            2.08                    3.19
 2048    1000          2.05           2.38         2.11                2.12             0.04            2.30                    2.38
 4096    1000          2.53           3.21         2.58                2.60             0.06            2.74                    3.21
 8192    1000          2.89           4.05         2.95                2.97             0.07            3.17                    4.05
 16384   1000          3.56           5.01         3.65                3.72             0.13            4.15                    5.01
 32768   1000          4.91           6.11         5.02                5.09             0.16            5.59                    6.11
 65536   1000          8.01           9.37         8.26                8.28             0.13            8.65                    9.37
 131072  1000          17.81          19.05        18.17               18.19            0.13            18.51                   19.05
 262144  1000          28.55          29.96        29.18               29.20            0.32            29.85                   29.96
 524288  1000          50.20          55.86        52.25               52.61            1.03            55.51                   55.86
 1048576 1000          92.93          97.86        94.78               94.62            0.79            96.69                   97.86
 2097152 1000          178.64         184.00       180.88              181.31           1.38            183.81                  184.00
 4194304 1000          349.77         356.88       350.70              351.28           1.41            356.23                  356.88 <<<============ 
 8388608 1000          692.43         699.28       694.66              694.76           1.25            698.49                  699.28 <<<============
---------------------------------------------------------------------------------------

The above "ib_send_lat" perftest result now shows performance on par with the RHEL 8.3 perftest results.

Comment 20 errata-xmlrpc 2021-05-18 14:45:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RDMA stack bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1594

Comment 21 Karsten Weiss 2022-03-24 15:38:22 UTC
Question:

So, if I get this right, --disable_pcie_relaxed is an option that (manually) works
around the issue in perftest(!) on affected platforms.

However, what about other Infiniband-using software with similar traffic patterns?
Is every other program supposed to introduce such an option, too?

I would appreciate it if someone could explain the situation.

