Bug 2211495

Summary: [RHEL-8.9] most of qperf tests failed when tested on QEDR iWARP device
Product: Red Hat Enterprise Linux 8
Reporter: Brian Chae <bchae>
Component: qperf
Assignee: Nobody <nobody>
Status: CLOSED MIGRATED
QA Contact: Infiniband QE <infiniband-qe>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 8.9
CC: dledford, hwkernel-mgr, rdma-dev-team
Target Milestone: rc
Keywords: MigratedToJIRA, Regression
Target Release: ---
Flags: pm-rhel: mirror+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-09-21 13:40:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Brian Chae 2023-05-31 19:07:15 UTC
Description of problem:

The following qperf tests failed when tested on QEDR iWARP.

      FAIL |      1 | rc_bi_bw
      FAIL |      1 | rc_bw
      FAIL |      1 | rc_lat
      FAIL |      1 | rc_rdma_read_bw
      FAIL |      1 | rc_compare_swap_mr
      FAIL |      1 | rc_fetch_add_mr
      FAIL |      1 | ver_rc_compare_swap
      FAIL |      1 | ver_rc_fetch_add

These are regressions from RHEL-8.8.0-20230228.22.
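
For triage, the summary table above can be turned into a list of failed test names. The following is a minimal Python sketch, not part of the original test harness. It assumes the middle column is the test's return code (consistent with the "RQA_check_result -r 1" lines later in this report) and that passing rows would use the word PASS, which does not appear in this report.

```python
import re

# Matches summary rows such as "      FAIL |      1 | rc_bi_bw".
# Assumption: column 2 is the return code; "PASS" is a guessed status word.
ROW = re.compile(r"^\s*(PASS|FAIL)\s*\|\s*(\d+)\s*\|\s*(\S+)\s*$")


def parse_summary(text):
    """Return (status, return_code, test_name) tuples from a summary table."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((m.group(1), int(m.group(2)), m.group(3)))
    return rows


# Rows copied from the description above.
SUMMARY = """\
      FAIL |      1 | rc_bi_bw
      FAIL |      1 | rc_bw
      FAIL |      1 | rc_lat
      FAIL |      1 | rc_rdma_read_bw
      FAIL |      1 | rc_compare_swap_mr
      FAIL |      1 | rc_fetch_add_mr
      FAIL |      1 | ver_rc_compare_swap
      FAIL |      1 | ver_rc_fetch_add
"""

failed = [name for status, _, name in parse_summary(SUMMARY) if status == "FAIL"]
print(failed)
```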




Version-Release number of selected component (if applicable):

Clients: rdma-dev-03
Servers: rdma-dev-02

DISTRO=RHEL-8.9.0-20230531.26

+ [23-05-31 13:53:08] cat /etc/redhat-release
Red Hat Enterprise Linux release 8.9 Beta (Ootpa)

+ [23-05-31 13:53:08] uname -a
Linux rdma-dev-03.rdma.lab.eng.rdu2.redhat.com 4.18.0-494.el8.x86_64 #1 SMP Mon May 22 11:16:32 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux

+ [23-05-31 13:53:08] cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-494.el8.x86_64 root=UUID=4c97e3ba-2618-4d9b-80d0-a5c87ef7d0a5 ro console=tty0 rd_NO_PLYMOUTH intel_iommu=on iommu=on crashkernel=auto resume=UUID=b7fec9d8-7600-49eb-b89a-ad19248be0d0 console=ttyS1,115200

+ [23-05-31 13:53:08] rpm -q rdma-core linux-firmware
rdma-core-44.0-2.el8.1.x86_64
linux-firmware-20230515-115.gitd1962891.el8.noarch

+ [23-05-31 13:53:08] tail /sys/class/infiniband/qedr0/fw_ver /sys/class/infiniband/qedr1/fw_ver
==> /sys/class/infiniband/qedr0/fw_ver <==
8. 59. 1. 0

==> /sys/class/infiniband/qedr1/fw_ver <==
8. 59. 1. 0

+ [23-05-31 13:53:08] lspci
+ [23-05-31 13:53:08] grep -i -e ethernet -e infiniband -e omni -e ConnectX
02:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
02:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
08:00.0 Ethernet controller: QLogic Corp. FastLinQ QL45000 Series 25GbE Controller (rev 10)
08:00.1 Ethernet controller: QLogic Corp. FastLinQ QL45000 Series 25GbE Controller (rev 10)


+ [23-05-31 13:53:08] rpm -q qperf
qperf-0.4.11-3.el8.x86_64



How reproducible:
100%


Steps to Reproduce:
1. On the RDMA server host, issue
   qperf
2. On the RDMA client host, issue

qperf -v -i qedr1:1 -cm 1 172.31.50.102 rc_bi_bw
qperf -v -i qedr1:1 -cm 1 172.31.50.102 rc_bw
qperf -v -i qedr1:1 -cm 1 172.31.50.102 rc_lat
qperf -v -i qedr1:1 -cm 1 172.31.50.102 rc_rdma_read_bw
qperf -v -i qedr1:1 -cm 1 172.31.50.102 rc_compare_swap_mr
qperf -v -i qedr1:1 -cm 1 172.31.50.102 rc_fetch_add_mr
qperf -v -i qedr1:1 -cm 1 172.31.50.102 ver_rc_compare_swap
qperf -v -i qedr1:1 -cm 1 172.31.50.102 ver_rc_fetch_add
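
The client-side command lines above differ only in the test name, so they can be generated rather than typed. The sketch below is illustrative only: the function name build_commands is hypothetical, and the only qperf options it uses are the ones shown in the steps above (-v, -i, -cm). It prints the commands instead of running them; actually running them requires a qperf server already listening on the target host.

```python
import shlex

# Test names taken from the reproduction steps above.
TESTS = [
    "rc_bi_bw",
    "rc_bw",
    "rc_lat",
    "rc_rdma_read_bw",
    "rc_compare_swap_mr",
    "rc_fetch_add_mr",
    "ver_rc_compare_swap",
    "ver_rc_fetch_add",
]


def build_commands(device, server, tests=TESTS):
    """Build one qperf argv list per test, mirroring the commands in this report."""
    return [["qperf", "-v", "-i", device, "-cm", "1", server, t] for t in tests]


commands = build_commands("qedr1:1", "172.31.50.102")
for argv in commands:
    # shlex.join renders the argv list as a copy-pasteable shell command.
    print(shlex.join(argv))
```

Each argv list could be passed to subprocess.run(argv) on the client host to execute the test and collect its exit status.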




Actual results:

+ [23-05-31 13:53:15] qperf -v -i qedr1:1 -cm 1 172.31.50.102 rc_bi_bw
server: 
rc_bi_bw:
warning: -i set but not used in test rc_bi_bw
rc_bi_bw failed: WR flush failure
+ [23-05-31 13:53:15] RQA_check_result -r 1 -t rc_bi_bw


+ [23-05-31 13:53:15] qperf -v -i qedr1:1 -cm 1 172.31.50.102 rc_bw
server: 
rc_bw:
warning: -i set but not used in test rc_bw
+ [23-05-31 13:53:15] RQA_check_result -r 1 -t rc_bw


+ [23-05-31 13:53:15] qperf -v -i qedr1:1 -cm 1 172.31.50.102 rc_lat
server: 
rc_lat:
warning: -i set but not used in test rc_lat
+ [23-05-31 13:53:15] RQA_check_result -r 1 -t rc_lat


qperf -v -i qedr1:1 -cm 1 172.31.50.102 rc_rdma_read_bw
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
rc_rdma_read_bw:
warning: -i set but not used in test rc_rdma_read_bw
rc_rdma_read_bw failed: WR flush failure
+ [23-05-31 13:53:15] RQA_check_result -r 1 -t rc_rdma_read_bw



+ [23-05-31 13:53:20] qperf -v -i qedr1:1 -cm 1 172.31.50.102 rc_compare_swap_mr
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
rc_compare_swap_mr:
warning: -i set but not used in test rc_compare_swap_mr
rc_compare_swap_mr failed: WR flush failure
+ [23-05-31 13:53:20] RQA_check_result -r 1 -t rc_compare_swap_mr



+ [23-05-31 13:53:20] qperf -v -i qedr1:1 -cm 1 172.31.50.102 rc_fetch_add_mr
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
rc_fetch_add_mr:
warning: -i set but not used in test rc_fetch_add_mr
rc_fetch_add_mr failed: WR flush failure
+ [23-05-31 13:53:20] RQA_check_result -r 1 -t rc_fetch_add_mr


+ [23-05-31 13:53:20] qperf -v -i qedr1:1 -cm 1 172.31.50.102 ver_rc_compare_swap
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
ver_rc_compare_swap:
warning: -i set but not used in test ver_rc_compare_swap
ver_rc_compare_swap failed: WR flush failure
+ [23-05-31 13:53:20] RQA_check_result -r 1 -t ver_rc_compare_swap


+ [23-05-31 13:53:20] qperf -v -i qedr1:1 -cm 1 172.31.50.102 ver_rc_fetch_add
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
ver_rc_fetch_add:
warning: -i set but not used in test ver_rc_fetch_add
ver_rc_fetch_add failed: WR flush failure
+ [23-05-31 13:53:20] RQA_check_result -r 1 -t ver_rc_fetch_add
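
The log above mixes three kinds of evidence per test: flushed-CQE error lines, an optional "<test> failed: <reason>" line, and the RQA_check_result line. The following Python sketch groups them per test. It is an illustration under assumptions, not the harness's own logic: it assumes the "-r" argument to RQA_check_result carries the qperf exit status, which this report does not state explicitly. The sample input is two excerpts copied from the log above.

```python
import re

CMD = re.compile(r"qperf -v .* (\S+)$")           # a qperf invocation; last word is the test
FAILED = re.compile(r"^(\S+) failed: (.+)$")      # e.g. "rc_bw failed: WR flush failure"
RESULT = re.compile(r"RQA_check_result -r (\d+) -t (\S+)$")
FLUSH = "ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR"


def summarize(log):
    """Map test name -> flushed-CQE count, failure reason, and reported return code."""
    tests = {}
    current = None
    for raw in log.splitlines():
        line = raw.strip()
        m = CMD.search(line)
        if m:
            current = m.group(1)
            tests[current] = {"cq_errors": 0, "reason": None, "rc": None}
        elif current is None:
            continue
        elif FLUSH in line:
            tests[current]["cq_errors"] += 1
        elif (m := FAILED.match(line)):
            tests[current]["reason"] = m.group(2)
        elif (m := RESULT.search(line)):
            # Assumption: -r is the exit status that the harness checks.
            tests[current]["rc"] = int(m.group(1))
    return tests


LOG = """\
+ [23-05-31 13:53:15] qperf -v -i qedr1:1 -cm 1 172.31.50.102 rc_bw
server:
rc_bw:
warning: -i set but not used in test rc_bw
+ [23-05-31 13:53:15] RQA_check_result -r 1 -t rc_bw
qperf -v -i qedr1:1 -cm 1 172.31.50.102 rc_rdma_read_bw
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
[qelr_poll_cq_req:2145]Error: POLL CQ with ROCE_CQE_REQ_STS_WORK_REQUEST_FLUSHED_ERR. QP icid=0x290
rc_rdma_read_bw:
warning: -i set but not used in test rc_rdma_read_bw
rc_rdma_read_bw failed: WR flush failure
+ [23-05-31 13:53:15] RQA_check_result -r 1 -t rc_rdma_read_bw
"""

report = summarize(LOG)
for name, info in report.items():
    print(name, info)
```

A summary like this makes it visible that rc_bw and rc_lat returned a non-zero status without printing any failure reason, while the other tests report "WR flush failure".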







Expected results:



All of the tests should pass, as they did on RHEL-8.8.0-20230228.22. The qperf test results from that build follow.




+ [23-03-02 17:15:58] qperf -v -i qedr1:1 -cm 1 172.31.50.103 rc_bi_bw
rc_bi_bw:
warning: -i set but not used in test rc_bi_bw
    bw             =  5.03 GB/sec
    msg_rate       =  76.7 K/sec
    use_cm         =     1 
    loc_cpus_used  =    22 % cpus
    rem_cpus_used  =    23 % cpus
+ [23-03-02 17:16:00] RQA_check_result -r 0 -t rc_bi_bw


+ [23-03-02 17:16:00] qperf -v -i qedr1:1 -cm 1 172.31.50.103 rc_bw
rc_bw:
warning: -i set but not used in test rc_bw
    bw              =  2.74 GB/sec
    msg_rate        =  41.9 K/sec
    use_cm          =     1 
    send_cost       =  30.6 ms/GB
    recv_cost       =  63.8 ms/GB
    send_cpus_used  =   8.5 % cpus
    recv_cpus_used  =  17.5 % cpus
+ [23-03-02 17:16:02] RQA_check_result -r 0 -t rc_bw


+ [23-03-02 17:16:02] qperf -v -i qedr1:1 -cm 1 172.31.50.103 rc_lat
rc_lat:
warning: -i set but not used in test rc_lat
    latency        =  11.8 us
    msg_rate       =  84.9 K/sec
    use_cm         =     1 
    loc_cpus_used  =    36 % cpus
    rem_cpus_used  =  48.5 % cpus
+ [23-03-02 17:16:08] RQA_check_result -r 0 -t rc_lat


+ [23-03-02 17:16:08] qperf -v -i qedr1:1 -cm 1 172.31.50.103 rc_rdma_read_bw
rc_rdma_read_bw:
warning: -i set but not used in test rc_rdma_read_bw
    bw              =  1.85 GB/sec
    msg_rate        =  28.3 K/sec
    use_cm          =     1 
    recv_cost       =  56.6 ms/GB
    recv_cpus_used  =  10.5 % cpus
+ [23-03-02 17:16:10] RQA_check_result -r 0 -t rc_rdma_read_bw


+ [23-03-02 17:16:23] qperf -v -i qedr1:1 -cm 1 172.31.50.103 rc_compare_swap_mr
rc_compare_swap_mr:
warning: -i set but not used in test rc_compare_swap_mr
    msg_rate        =  79.1 K/sec
    use_cm          =     1 
    send_cost       =   432 sec/GB
    recv_cost       =   7.9 sec/GB
    send_cpus_used  =  27.5 % cpus
    recv_cpus_used  =   0.5 % cpus
+ [23-03-02 17:16:25] RQA_check_result -r 0 -t rc_compare_swap_mr



+ [23-03-02 17:16:25] qperf -v -i qedr1:1 -cm 1 172.31.50.103 rc_fetch_add_mr
rc_fetch_add_mr:
warning: -i set but not used in test rc_fetch_add_mr
    msg_rate        =  79.1 K/sec
    use_cm          =     1 
    send_cost       =   503 sec/GB
    recv_cost       =  15.8 sec/GB
    send_cpus_used  =    32 % cpus
    recv_cpus_used  =     1 % cpus
+ [23-03-02 17:16:27] RQA_check_result -r 0 -t rc_fetch_add_mr




+ [23-03-02 17:16:27] qperf -v -i qedr1:1 -cm 1 172.31.50.103 ver_rc_compare_swap
ver_rc_compare_swap:
warning: -i set but not used in test ver_rc_compare_swap
    msg_rate        =  79.1 K/sec
    use_cm          =     1 
    send_cost       =   432 sec/GB
    recv_cost       =  15.8 sec/GB
    send_cpus_used  =  27.5 % cpus
    recv_cpus_used  =     1 % cpus
+ [23-03-02 17:16:29] RQA_check_result -r 0 -t ver_rc_compare_swap




+ [23-03-02 17:16:29] qperf -v -i qedr1:1 -cm 1 172.31.50.103 ver_rc_fetch_add
ver_rc_fetch_add:
warning: -i set but not used in test ver_rc_fetch_add
    msg_rate        =  79.1 K/sec
    use_cm          =     1 
    send_cost       =   463 sec/GB
    recv_cost       =  31.6 sec/GB
    send_cpus_used  =  29.5 % cpus
    recv_cpus_used  =     2 % cpus
+ [23-03-02 17:16:31] RQA_check_result -r 0 -t ver_rc_fetch_add
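
To compare a passing baseline against a later run, the "name = value unit" lines that qperf prints can be parsed into a dictionary. This is a minimal Python sketch assuming the output layout shown above; the function name parse_metrics is hypothetical, and the sample is the rc_bi_bw block from the RHEL-8.8 run.

```python
import re

# Matches lines such as "    bw             =  5.03 GB/sec".
# The unit part is optional (for example "use_cm = 1" has none).
METRIC = re.compile(r"^\s*(\w+)\s+=\s+([\d.]+)\s*(.*?)\s*$")


def parse_metrics(block):
    """Return {metric_name: (value, unit)} for one qperf result block."""
    metrics = {}
    for line in block.splitlines():
        m = METRIC.match(line)
        if m:
            metrics[m.group(1)] = (float(m.group(2)), m.group(3))
    return metrics


# Copied from the rc_bi_bw result above.
SAMPLE = """\
rc_bi_bw:
warning: -i set but not used in test rc_bi_bw
    bw             =  5.03 GB/sec
    msg_rate       =  76.7 K/sec
    use_cm         =     1
    loc_cpus_used  =    22 % cpus
    rem_cpus_used  =    23 % cpus
"""

baseline = parse_metrics(SAMPLE)
for name, (value, unit) in baseline.items():
    print(f"{name}: {value} {unit}".rstrip())
```

An empty dictionary from a run, as would result from the failing RHEL-8.9 output, is itself a useful signal that the test produced no measurements.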




Additional info:

Comment 1 RHEL Program Management 2023-09-21 13:39:56 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 2 RHEL Program Management 2023-09-21 13:40:11 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.