Bug 2034948
| Summary: | [RHEL9.0.0] ib_send_lat RC and ib_write_lat RC tests on BNXT RoCE device always end with error - failed status 2 | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Brian Chae <bchae> |
| Component: | rdma-core | Assignee: | Nobody <nobody> |
| Status: | CLOSED ERRATA | QA Contact: | Brian Chae <bchae> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 9.0 | CC: | anantha.subramanyam, dledford, rdma-dev-team, selvin.xavier, zguo |
| Target Milestone: | rc | Keywords: | Regression, Triaged |
| Target Release: | 9.0 | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | rdma-core-37.2-1.el9 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | 2024865 | Environment: | |
| Last Closed: | 2022-05-17 15:53:58 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 2024865 | ||
| Bug Blocks: | 2026666, 2027125 | ||
Description
Brian Chae
2021-12-22 14:41:25 UTC
bad rdma-core commit 66aba73d4a7a025689154676048d34e8915bd74b.
commit 66aba73d4a7a025689154676048d34e8915bd74b
Author: Selvin Xavier <selvin.xavier>
Date: Mon Aug 2 10:03:07 2021 -0700
bnxt_re/lib: Move hardware queue to 16B aligned indices
Move SQ and RQ indices from WQE boundary to
16B boundary alignment. Changing the SQ-wqe posting
algorithm accordingly. The new alignment needs to pull
a 16B slot from the hardware queue and initialize the
current 16B into the hardware buffer. Depending on the
max possible wqe size supported by hardware, the number
of 16B slots are calculated and pulled for initialization.
Currently 128B wqe is supported and it requires 8 slots.
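The slot arithmetic in the commit message can be checked with a minimal sketch. This is not the libbnxt_re code; the function name `slots_for_wqe` and the constant `SLOT_SIZE` are hypothetical, and the only facts taken from the commit message are the 16B slot size and the 128B maximum WQE size.

```python
# Illustrative arithmetic only; names here are hypothetical, not from rdma-core.
SLOT_SIZE = 16  # bytes per hardware queue slot, as stated in the commit message


def slots_for_wqe(wqe_size_bytes: int, slot_size: int = SLOT_SIZE) -> int:
    """Return how many slots a WQE of the given size occupies (rounded up)."""
    if wqe_size_bytes <= 0 or slot_size <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division: a partially filled slot still consumes a whole slot.
    return -(-wqe_size_bytes // slot_size)


if __name__ == "__main__":
    # The commit message's case: a 128B WQE needs 8 slots of 16B each.
    print(slots_for_wqe(128))  # 8
```

Under this reading, queue indices advance in units of slots rather than whole WQEs, which is why the posting algorithm had to change along with the alignment.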
*** Bug 2029137 has been marked as a duplicate of this bug. ***

Hi Selvin, this one is a serious libbnxt_re regression. It impacts perftest, openmpi, libfabric/fabtests, and librdmacm.

Created an rdma-core pull request for this regression. We need to pick up this patch once the fix is merged to rdma-core.
https://github.com/linux-rdma/rdma-core/pull/1120
Thanks,
Selvin

Honggang,
The patch is merged to rdma-core. Can you pull this patch to 9.0?
Thanks,
Selvin

> The patch is merged to rdma-core. Can you pull this patch to 9.0?
yes. thanks
Verification was conducted as follows:
1. Build and packages
DISTRO=RHEL-9.0.0-20220128.1
+ [22-01-30 08:40:07] cat /etc/redhat-release
Red Hat Enterprise Linux release 9.0 Beta (Plow)
+ [22-01-30 08:40:07] uname -a
Linux rdma-qe-25.rdma.lab.eng.rdu2.redhat.com 5.14.0-48.el9.x86_64 #1 SMP PREEMPT Mon Jan 24 22:40:42 EST 2022 x86_64 x86_64 x86_64 GNU/Linux
+ [22-01-30 08:40:07] cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.14.0-48.el9.x86_64 root=/dev/mapper/rhel_rdma--qe--25-root ro crashkernel=1G-4G:192M,4G-64G:256M,64G-102400T:512M resume=/dev/mapper/rhel_rdma--qe--25-swap rd.lvm.lv=rhel_rdma-qe-25/root rd.lvm.lv=rhel_rdma-qe-25/swap console=ttyS0,115200n81
+ [22-01-30 08:40:07] rpm -q rdma-core linux-firmware
rdma-core-37.2-1.el9.x86_64
linux-firmware-20211027-123.el9.noarch
+ [22-01-30 08:40:07] tail /sys/class/infiniband/bnxt_re0/fw_ver /sys/class/infiniband/bnxt_re1/fw_ver /sys/class/infiniband/bnxt_re2/fw_ver /sys/class/infiniband/bnxt_re3/fw_ver
==> /sys/class/infiniband/bnxt_re0/fw_ver <==
20.8.30.0
==> /sys/class/infiniband/bnxt_re1/fw_ver <==
20.8.30.0
==> /sys/class/infiniband/bnxt_re2/fw_ver <==
216.0.51.0
==> /sys/class/infiniband/bnxt_re3/fw_ver <==
216.0.51.0
+ [22-01-30 08:40:07] lspci
+ [22-01-30 08:40:07] grep -i -e ethernet -e infiniband -e omni -e ConnectX
01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
1a:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
1a:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
5e:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01)
5e:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01)
+ [22-01-30 08:40:09] rpm -q perftest
perftest-4.5-12.el9.x86_64
2. Result
Test results for perftest on rdma-qe-25:
5.14.0-48.el9.x86_64, rdma-core-37.2-1.el9, bnxt_en, roce.45, & bnxt_re3
Result | Status | Test
---------+--------+------------------------------------
PASS | 0 | ib_read_bw RC
PASS | 0 | ib_read_lat RC
PASS | 0 | ib_send_bw RC
PASS | 0 | ib_send_lat RC
PASS | 0 | ib_write_bw RC
PASS | 0 | ib_write_lat RC
- successful
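A results table in this layout can be checked mechanically. The following is a minimal sketch, assuming the `Result | Status | Test` format shown above; the helper names `parse_results` and `failures` are hypothetical and are not part of perftest or the QE harness.

```python
# Sketch only: parses the "Result | Status | Test" table shown in this report.
RESULTS = """\
Result | Status | Test
---------+--------+------------------------------------
PASS | 0 | ib_read_bw RC
PASS | 0 | ib_read_lat RC
PASS | 0 | ib_send_bw RC
PASS | 0 | ib_send_lat RC
PASS | 0 | ib_write_bw RC
PASS | 0 | ib_write_lat RC
"""


def parse_results(text):
    """Return (result, status, test) tuples, skipping header and rule lines."""
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Data rows have exactly three fields with an integer status.
        if len(parts) != 3 or not parts[1].lstrip("-").isdigit():
            continue
        rows.append((parts[0], int(parts[1]), parts[2]))
    return rows


def failures(rows):
    """Return rows that are not PASS or that carry a non-zero status."""
    return [r for r in rows if r[0] != "PASS" or r[1] != 0]


if __name__ == "__main__":
    rows = parse_results(RESULTS)
    print(len(rows), "tests,", len(failures(rows)), "failures")
```

For the table above, all six RC tests parse as PASS with status 0, including the `ib_send_lat RC` and `ib_write_lat RC` tests named in the bug summary.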
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (new packages: RDMA stack), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2022:3950