Bug 1964126

Summary: [RHEL8.5] pyverbs-tests fail over roce-mlx5-cx5-27800 HCAs with error on test_tx_packet_reformat test
Product: Red Hat Enterprise Linux 8
Reporter: Brian Chae <bchae>
Component: rdma-core
Assignee: Nobody <nobody>
Status: CLOSED WONTFIX
QA Contact: Brian Chae <bchae>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 8.5
CC: hwkernel-mgr, rdma-dev-team
Target Milestone: beta
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-11-24 07:27:47 UTC
Type: Bug

Description Brian Chae 2021-05-24 18:55:50 UTC
Description of problem:

When run on roce-mlx5-cx5-27800 HCAs, pyverbs-tests fail with the following error.

test_tx_packet_reformat (tests.test_mlx5_flow.Mlx5MatcherTest) ... ERROR
Traceback (most recent call last):
  File "mlx5dv_flow.pyx", line 122, in pyverbs.providers.mlx5.mlx5dv_flow.Mlx5FlowMatcher.close
  File "base.pyx", line 42, in pyverbs.base.close_weakrefs
  File "flow.pyx", line 121, in pyverbs.flow.Flow.close
  File "flow.pyx", line 126, in pyverbs.flow.Flow.close
pyverbs.pyverbs_error.PyverbsRDMAError: Failed to destroy Flow. Errno: 9, Bad file descriptor
Exception ignored in: 'pyverbs.providers.mlx5.mlx5dv_flow.Mlx5FlowMatcher.__dealloc__'
Traceback (most recent call last):
  File "mlx5dv_flow.pyx", line 122, in pyverbs.providers.mlx5.mlx5dv_flow.Mlx5FlowMatcher.close
  File "base.pyx", line 42, in pyverbs.base.close_weakrefs
  File "flow.pyx", line 121, in pyverbs.flow.Flow.close
  File "flow.pyx", line 126, in pyverbs.flow.Flow.close
pyverbs.pyverbs_error.PyverbsRDMAError: Failed to destroy Flow. Errno: 9, Bad file descriptor
Traceback (most recent call last):
  File "mlx5dv_flow.pyx", line 125, in pyverbs.providers.mlx5.mlx5dv_flow.Mlx5FlowMatcher.close
pyverbs.pyverbs_error.PyverbsRDMAError: Destroy matcher failed.. Errno: 9, Bad file descriptor
Exception ignored in: 'pyverbs.providers.mlx5.mlx5dv_flow.Mlx5FlowMatcher.__dealloc__'
Traceback (most recent call last):
  File "mlx5dv_flow.pyx", line 125, in pyverbs.providers.mlx5.mlx5dv_flow.Mlx5FlowMatcher.close



Version-Release number of selected component (if applicable):


DISTRO=RHEL-8.5.0-20210521.n.1

Red Hat Enterprise Linux release 8.5 Beta (Ootpa)

Linux rdma-perf-03.lab.bos.redhat.com 4.18.0-305.8.el8.x86_64 #1 SMP Mon May 17 14:15:59 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux

BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-305.8.el8.x86_64 root=/dev/mapper/rhel_rdma--perf--03-root ro intel_idle.max_cstate=0 processor.max_cstate=0 intel_iommu=on iommu=on console=tty0 rd_NO_PLYMOUTH intel_idle.max_cstate=0 intremap=no_x2apic_optout processor.max_cstate=0 reboot=acpi crashkernel=auto resume=/dev/mapper/rhel_rdma--perf--03-swap rd.lvm.lv=rhel_rdma-perf-03/root rd.lvm.lv=rhel_rdma-perf-03/swap console=ttyS1,115200n81

rdma-core-35.0-1.el8.x86_64

linux-firmware-20201218-102.git05789708.el8.noarch
==> /sys/class/infiniband/mlx5_0/fw_ver <==
16.28.2006

==> /sys/class/infiniband/mlx5_1/fw_ver <==
16.28.2006
03:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
03:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
03:00.2 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
03:00.3 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
07:00.0 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]
07:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]

RDMA hosts tested: [ roce-mlx5-cx5-27800 HCAs ]

Clients: rdma-perf-03
Servers: rdma-perf-02


Installed:
  python3-pyverbs-35.0-1.el8.x86_64        

How reproducible:

100%

Steps to Reproduce:
1. With the above RHEL8.5 build, install the packages listed under "Installed" above on both the server and client hosts.

2. Execute the pyverbs tests, with HCA_ID set to mlx5_1:

    ./run_tests.py -v --dev $HCA_ID

Actual results:

======================================================================
ERROR: test_tx_packet_reformat (tests.test_mlx5_flow.Mlx5MatcherTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/tmp.UNa4FXk0HC/rdma-core/tests/test_mlx5_flow.py", line 54, in func_wrapper
    return func(instance)
  File "/tmp/tmp.UNa4FXk0HC/rdma-core/tests/test_mlx5_flow.py", line 189, in test_tx_packet_reformat
    skip_idxs=ipv4_id_idx + ipv4_chksum_idx + udp_chksum_idx)
  File "/tmp/tmp.UNa4FXk0HC/rdma-core/tests/utils.py", line 836, in raw_traffic
    expected_packet if expected_packet else msg, skip_idxs)
  File "/tmp/tmp.UNa4FXk0HC/rdma-core/tests/utils.py", line 802, in validate_raw
    raise PyverbsError(err_msg)
pyverbs.pyverbs_error.PyverbsError: Data validation failure:
expected b'\x01PV\x19 \xa7$\x8a\x07\xa5(\xc8\x08\x00E\x00\x04$\x00\x00@\x00@\x11\x00\x00\x01\x01\x01\x01\x02\x02\x02\x02\x04\xd2\x12\xb5\x04\x10\x00\x00\x08\x00\x00\x00v\xad\xf1\x00\x01PV\x19 \xa7$\x8a\x07\xa5(\xc8\x08\x00E\x00\x03\xf2\x00\x00@\x00@\x11\x00\x00\x01\x01\x01\x01\x02\x02\x02\x02\x04\xd2\x16.\x03\xde\x00\x00aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'

received b'\x01PV\x19 \xa7$\x8a\x07\xa5(\xc8\x08\x00E\x00\x04$/\x8f@\x00@\x11\x015\x01\x01\x01\x01\x02\x02\x02\x02\x04\xd2\x16.\x04\x10\x00\x00aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'

----------------------------------------------------------------------
Ran 183 tests in 25.330s

FAILED (errors=1, skipped=39)
Traceback (most recent call last):
  File "flow.pyx", line 126, in pyverbs.flow.Flow.close
pyverbs.pyverbs_error.PyverbsRDMAError: Failed to destroy Flow. Errno: 95, Operation not supported
Exception ignored in: 'pyverbs.flow.Flow.__dealloc__'
Traceback (most recent call last):
  File "flow.pyx", line 126, in pyverbs.flow.Flow.close
pyverbs.pyverbs_error.PyverbsRDMAError: Failed to destroy Flow. Errno: 95, Operation not supported
---
- TEST RESULT FOR rdma-core
-   Test:   Run pyverbs tests
-   Result: FAIL
-   Return: 1
---
/mnt/tests/kernel/infiniband/pyverbs-tests
---
- TEST RESULT FOR pyverbs-tests
-   Test:   Remove temp directory
-   Result: PASS
-   Return: 0
---
      FAIL |      1 | Run pyverbs tests
** /kernel/infiniband/pyverbs-tests/standalone FAIL Score:0
Uploading resultoutputfile.log 
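
Triage note: the expected and received dumps in the traceback above can be compared byte by byte to see where validation first fails. The sketch below is an illustration only, not part of the rdma-core test suite. The helper name first_mismatch is hypothetical, and the skip offsets are an assumption derived from a standard 14-byte Ethernet header followed by a 20-byte IPv4 header without options (IPv4 ID at 18-19, IPv4 checksum at 24-25, UDP checksum at 40-41); the actual values of ipv4_id_idx, ipv4_chksum_idx and udp_chksum_idx are not shown in this report.

```python
# Leading 50 bytes copied from the "expected" and "received" dumps above.
expected = (b'\x01PV\x19 \xa7$\x8a\x07\xa5(\xc8\x08\x00E\x00\x04$\x00\x00@\x00@\x11'
            b'\x00\x00\x01\x01\x01\x01\x02\x02\x02\x02\x04\xd2\x12\xb5\x04\x10\x00\x00'
            b'\x08\x00\x00\x00v\xad\xf1\x00')
received = (b'\x01PV\x19 \xa7$\x8a\x07\xa5(\xc8\x08\x00E\x00\x04$/\x8f@\x00@\x11'
            b'\x015\x01\x01\x01\x01\x02\x02\x02\x02\x04\xd2\x16.\x04\x10\x00\x00'
            b'aaaaaaaa')

# Assumed offsets of the fields the test skips (IPv4 ID, IPv4 checksum,
# UDP checksum), based on standard header sizes, not on the test source.
SKIP = {18, 19, 24, 25, 40, 41}


def first_mismatch(exp, got, skip=()):
    """Return the first offset where exp and got differ, ignoring skip."""
    skip = set(skip)
    for i, (e, g) in enumerate(zip(exp, got)):
        if i not in skip and e != g:
            return i
    return None


offset = first_mismatch(expected, received, SKIP)
print("first mismatch at byte", offset)                                 # 36
print("expected UDP dport", int.from_bytes(expected[36:38], "big"))     # 4789
print("received UDP dport", int.from_bytes(received[36:38], "big"))     # 5678
```

Under these assumptions the first difference outside the skipped fields is the UDP destination port: 4789 (the IANA-assigned VXLAN port) was expected and 5678 was received. This is consistent with the packet arriving without the outer encapsulation header the test expected, although the log alone does not establish the cause.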


Expected results:

Either

test_tx_packet_reformat (tests.test_mlx5_flow.Mlx5MatcherTest) ... skipped 'NIC flow table does not support reformat'

OR

test_tx_packet_reformat (tests.test_mlx5_flow.Mlx5MatcherTest) ... ok
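
The two acceptable outcomes above can be checked mechanically against the verbose output of run_tests.py. The following is a minimal sketch, assuming the unittest-style result lines quoted in this report; the names RESULT_RE, status_of and is_acceptable are hypothetical and do not exist in rdma-core.

```python
import re

# Matches verbose unittest result lines such as:
#   test_name (package.module.Class) ... ok
RESULT_RE = re.compile(
    r"^(?P<test>\w+) \((?P<case>[\w.]+)\) \.\.\. "
    r"(?P<status>ok|ERROR|FAIL|skipped)\b"
)

# Outcomes this report treats as expected results.
ACCEPTABLE = {"ok", "skipped"}


def status_of(log_text, test_name):
    """Return the status recorded for test_name, or None if not found."""
    for line in log_text.splitlines():
        m = RESULT_RE.match(line)
        if m and m.group("test") == test_name:
            return m.group("status")
    return None


def is_acceptable(log_text, test_name):
    """True if the test was reported as ok or skipped."""
    return status_of(log_text, test_name) in ACCEPTABLE


log = "test_tx_packet_reformat (tests.test_mlx5_flow.Mlx5MatcherTest) ... ERROR\n"
print(status_of(log, "test_tx_packet_reformat"))      # ERROR
print(is_acceptable(log, "test_tx_packet_reformat"))  # False
```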


Additional info:

The same issue was also observed with the RHEL-8.4.0-20210409.0 build on rdma-perf-02 and rdma-perf-03 for RoCE.

Comment 4 RHEL Program Management 2022-11-24 07:27:47 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.