Bug 2055531 - bnxt_en card: add dpdk port to ovs bridge failed with ovs2.17
Summary: bnxt_en card: add dpdk port to ovs bridge failed with ovs2.17
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: openvswitch2.17
Version: FDP 22.A
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: David Marchand
QA Contact: liting
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-02-17 07:26 UTC by liting
Modified: 2024-02-09 04:25 UTC (History)
CC: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-10-11 07:42:53 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Issue Tracker   ID: FD-1780   Private: 0   Priority: None   Status: None   Summary: None   Last Updated: 2022-02-17 07:27:56 UTC

Description liting 2022-02-17 07:26:33 UTC
Description of problem:


Version-Release number of selected component (if applicable):
[root@netqe22 perf]# rpm -qa|grep openvs
kernel-kernel-networking-openvswitch-perf-1.0-235.noarch
openvswitch-selinux-extra-policy-1.0-29.el9fdp.noarch
openvswitch2.17-2.17.0-0.2.el9fdp.x86_64

[root@netqe22 perf]# ethtool -i enp130s0f0np0
driver: bnxt_en
version: 5.14.0-52.el9.x86_64
firmware-version: 20.6.143.0/pkg 20.06.04.06
expansion-rom-version: 
bus-info: 0000:82:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

[root@netqe22 perf]# uname -r
5.14.0-52.el9.x86_64


How reproducible:


Steps to Reproduce:
1. Add ovs bridge 
ovs-vsctl set Open_vSwitch . other_config={}
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="0,4098"
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=800000800000
ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
2. Bind bnxt_en card to dpdk
[root@netqe22 perf]# driverctl -v set-override 0000:82:00.0 vfio-pci
driverctl: setting driver override for 0000:82:00.0: vfio-pci
driverctl: loading driver vfio-pci
driverctl: unbinding previous driver bnxt_en
driverctl: reprobing driver for 0000:82:00.0
driverctl: saving driver override for 0000:82:00.0

3. Add dpdk port to ovsbr0
[root@netqe22 perf]# ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk type=dpdk options:dpdk-devargs=0000:82:00.0
ovs-vsctl: Error detected while setting up 'dpdk0': Error attaching device '0000:82:00.0' to DPDK.  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch".
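(For reference, a quick sanity check before step 3 can confirm that DPDK initialised and that the vfio-pci override from step 2 took effect; a minimal sketch, assuming the stock ovs-vsctl and driverctl tooling:
ovs-vsctl get Open_vSwitch . dpdk_initialized      # expected to print "true"
driverctl list-overrides                           # expected to list 0000:82:00.0 vfio-pci
)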

Actual results:
Add dpdk port to ovs bridge failed

[root@netqe22 ~]# tail -f /var/log/openvswitch/ovs-vswitchd.log
2022-02-17T07:16:14.737Z|00083|dpdk|INFO|EAL: Using IOMMU type 1 (Type 1)
2022-02-17T07:16:14.998Z|00084|dpdk|INFO|EAL: Probe PCI driver: net_bnxt (14e4:16d7) device: 0000:82:00.0 (socket 1)
2022-02-17T07:16:14.998Z|00085|dpdk|ERR|ethdev initialisation failed
2022-02-17T07:16:14.998Z|00086|dpdk|INFO|EAL: Releasing PCI mapped resource for 0000:82:00.0
2022-02-17T07:16:14.998Z|00087|dpdk|INFO|EAL: Calling pci_unmap_resource for 0000:82:00.0 at 0x4201000000
2022-02-17T07:16:14.998Z|00088|dpdk|INFO|EAL: Calling pci_unmap_resource for 0000:82:00.0 at 0x4201010000
2022-02-17T07:16:14.998Z|00089|dpdk|INFO|EAL: Calling pci_unmap_resource for 0000:82:00.0 at 0x4201110000
2022-02-17T07:16:15.198Z|00090|dpdk|ERR|EAL: Driver cannot attach the device (0000:82:00.0)
2022-02-17T07:16:15.198Z|00091|dpdk|ERR|EAL: Failed to attach device on primary process
2022-02-17T07:16:15.198Z|00092|netdev_dpdk|WARN|Error attaching device '0000:82:00.0' to DPDK
2022-02-17T07:16:15.198Z|00093|netdev|WARN|dpdk0: could not set configuration (Invalid argument)
2022-02-17T07:16:15.198Z|00094|dpdk|ERR|Invalid port_id=128
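(The per-interface failure is also recorded in OVSDB and can be read back after the attach error; a sketch, assuming the port name dpdk0 used above:
ovs-vsctl get Interface dpdk0 error
grep -iE 'bnxt|attach' /var/log/openvswitch/ovs-vswitchd.log
)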

Expected results:
Add dpdk port to ovs bridge successfully.

Additional info:
https://beaker.engineering.redhat.com/jobs/6311938

Comment 1 David Marchand 2022-02-17 12:02:11 UTC
Thanks, this seems like a regression in dpdk v21.11, introduced by:
3972281f47b2 ("net/bnxt: fix device readiness check")

Reverting this patch makes init pass fine for me.
I scheduled a scratch build for you to test (but brew seems to take a long time..).
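For anyone rebuilding locally, reverting that commit in a DPDK v21.11 tree would look roughly like this (a sketch only, not the scratch build itself):
git revert 3972281f47b2   # "net/bnxt: fix device readiness check"
followed by rebuilding openvswitch2.17 against the reverted DPDK.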


Has this problem been reproduced on RHEL8?

Comment 9 liting 2022-03-21 09:20:32 UTC
I updated the firmware and it works well with openvswitch2.17-2.17.0-0.2.el9fdp.x86_64.
[root@netqe22 ~]# ethtool -i enp130s0f0np0
driver: bnxt_en
version: 4.18.0-369.el8.x86_64
firmware-version: 220.0.59.0/pkg 220.0.83.0
expansion-rom-version: 
bus-info: 0000:82:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

Comment 10 liting 2022-03-22 03:49:08 UTC
After updating the firmware, I ran the OVS DPDK PVP performance job for ovs2.17 again. It still did not produce a result.
https://beaker.engineering.redhat.com/jobs/6418733

From the test log, the dpdk port does not seem to receive any packets.
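(A way to confirm the rx side directly on the DUT while traffic is running would be to watch the port counters; a sketch, assuming bridge ovsbr0 and port dpdk0 as in the reproduction steps:
ovs-ofctl dump-ports ovsbr0 dpdk0
ovs-vsctl get Interface dpdk0 statistics
)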

Comment 11 liting 2022-03-22 06:16:04 UTC
I ran the OVS DPDK PVP performance job for openvswitch2.17-2.17.0-0.4.el9fdp. It also got a result of 0.
https://beaker.engineering.redhat.com/jobs/6421187

It works well with openvswitch2.15-2.15.0-42.el9fdp.
https://beaker.engineering.redhat.com/jobs/6421135

Comment 12 David Marchand 2022-03-22 08:55:25 UTC
I cannot do anything with the beaker logs: I can see no DPDK or OVS commands, so it's hard to tell what is what. There are no usable logs.

To save time, please provide an environment and a way to reproduce the issue, and I will have a look.

Comment 16 liting 2022-03-28 12:39:31 UTC
I upgraded the i40e firmware of netqe22, but the case still does not work. You can continue to access the machines for debugging, thanks.
[root@netqe32 ~]# ethtool -i ens3f0
driver: i40e
version: 4.18.0-305.el8.x86_64
firmware-version: 8.50 0x8000b6ed 1.3082.0
expansion-rom-version: 
bus-info: 0000:5e:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

Comment 17 David Marchand 2022-03-28 13:35:09 UTC
Links are not even coming up with kernel netdevs.
I logged on to the systems and I see the following.

On the dut, I see that the kernel netdev won't stay up:

[ 6402.875392] bnxt_en 0000:82:00.0 enp130s0f0np0: renamed from eth0
[ 6402.886050] bnxt_en 0000:82:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[ 6404.113651] bnxt_en 0000:82:00.1 eth0: Broadcom BCM57414 NetXtreme-E 10Gb/25Gb Ethernet found at mem ca200000, node addr 00:0a:f7:b7:09:51
[ 6404.116904] bnxt_en 0000:82:00.1 enp130s0f1np1: renamed from eth0
[ 6404.127553] bnxt_en 0000:82:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[ 9932.291741] bnxt_en 0000:82:00.0 enp130s0f0np0: unsupported speed!
[11172.415616] bnxt_en 0000:82:00.1 enp130s0f1np1: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
[11172.428238] bnxt_en 0000:82:00.1 enp130s0f1np1: FEC autoneg off encoding: None
[11172.436318] IPv6: ADDRCONF(NETDEV_CHANGE): enp130s0f1np1: link becomes ready
[11225.435240] bnxt_en 0000:82:00.1 enp130s0f1np1: NIC Link is Down
[11267.953958] bnxt_en 0000:82:00.1 enp130s0f1np1: unsupported speed!
[11617.480252] bnxt_en 0000:82:00.0 eth0: Broadcom BCM57414 NetXtreme-E 10Gb/25Gb Ethernet found at mem ca210000, node addr 00:0a:f7:b7:09:50
[11617.483667] bnxt_en 0000:82:00.0 enp130s0f0np0: renamed from eth0
[11617.494142] bnxt_en 0000:82:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[11617.525693] bnxt_en 0000:82:00.1 eth0: Broadcom BCM57414 NetXtreme-E 10Gb/25Gb Ethernet found at mem ca200000, node addr 00:0a:f7:b7:09:51
[11617.529025] bnxt_en 0000:82:00.1 enp130s0f1np1: renamed from eth0
[11617.539590] bnxt_en 0000:82:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[11734.009748] bnxt_en 0000:82:00.1 enp130s0f1np1: unsupported speed!
[14286.494915] bnxt_en 0000:82:00.0 eth0: Broadcom BCM57414 NetXtreme-E 10Gb/25Gb Ethernet found at mem ca210000, node addr 00:0a:f7:b7:09:50
[14286.497770] bnxt_en 0000:82:00.0 enp130s0f0np0: renamed from eth0
[14286.508805] bnxt_en 0000:82:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[14286.540300] bnxt_en 0000:82:00.1 eth0: Broadcom BCM57414 NetXtreme-E 10Gb/25Gb Ethernet found at mem ca200000, node addr 00:0a:f7:b7:09:51
[14286.543631] bnxt_en 0000:82:00.1 enp130s0f1np1: renamed from eth0
[14286.554194] bnxt_en 0000:82:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)


On the tester side, there are warnings:

[    9.565143] i40e: Intel(R) Ethernet Connection XL710 Network Driver
[    9.565144] i40e: Copyright (c) 2013 - 2019 Intel Corporation.
[    9.586715] i40e 0000:5e:00.0: fw 8.5.67516 api 1.15 nvm 8.50 0x8000b6ed 1.3082.0 [8086:1572] [8086:0007]
[    9.596288] i40e 0000:5e:00.0: The driver for the device detected a newer version of the NVM image v1.15 than expected v1.9. Please install the most recent version of the network driver.
[    9.923572] i40e 0000:5e:00.0: MAC address: 3c:fd:fe:ad:7b:4c
[    9.929457] i40e 0000:5e:00.0: FW LLDP is enabled
[    9.936525] i40e 0000:5e:00.0: Query for DCB configuration failed, err I40E_ERR_NOT_READY aq_err OK
[    9.945565] i40e 0000:5e:00.0: DCB init failed -63, disabled
[   10.012278] i40e 0000:5e:00.0: PCI-Express: Speed 8.0GT/s Width x8
[   10.018876] i40e 0000:5e:00.0: Features: PF-id[0] VFs: 64 VSIs: 66 QP: 56 RSS FD_ATR FD_SB NTUPLE VxLAN Geneve PTP VEPA
[   10.311717] i40e 0000:5e:00.1: fw 8.5.67516 api 1.15 nvm 8.50 0x8000b6ed 1.3082.0 [8086:1572] [8086:0000]
[   10.311719] i40e 0000:5e:00.1: The driver for the device detected a newer version of the NVM image v1.15 than expected v1.9. Please install the most recent version of the network driver.
[   10.879484] i40e 0000:5e:00.1: MAC address: 3c:fd:fe:ad:7b:4d
[   10.879744] i40e 0000:5e:00.1: FW LLDP is enabled
[   10.880783] i40e 0000:5e:00.1: Query for DCB configuration failed, err I40E_ERR_NOT_READY aq_err OK
[   10.880784] i40e 0000:5e:00.1: DCB init failed -63, disabled
[   10.898577] i40e 0000:5e:00.1: PCI-Express: Speed 8.0GT/s Width x8
[   10.898996] i40e 0000:5e:00.1: Features: PF-id[1] VFs: 64 VSIs: 66 QP: 56 RSS FD_ATR FD_SB NTUPLE VxLAN Geneve PTP VEPA


I logged into the TRex console, and I see:
owner      |              root |              root |                   
link       |              DOWN |              DOWN |                   


You mentioned that no cable was changed, OK, but the issue could be an SFP or a cable that broke recently.
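(If the SFP or cable is suspected, the module EEPROM and link state can usually be read from the kernel driver side as a first check; a sketch, assuming the driver exposes module data:
ethtool -m enp130s0f0np0
ethtool enp130s0f0np0 | grep -E 'Speed|Link detected'
)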

Comment 18 liting 2022-03-29 01:10:55 UTC
It's strange, as I didn't make any changes to the cable. I will try running the ovs2.15 job.

Comment 19 liting 2022-03-30 03:53:08 UTC
I ran the ovs2.15 and ovs2.17 jobs again. The only difference between the two jobs is the OVS version. ovs2.15 works well, and ovs2.17 does not. The netqe22/netqe32 has no
ovs2.15 job:
https://beaker.engineering.redhat.com/jobs/6447129
https://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2022/03/64471/6447129/11700996/141928037/bnxt_10.html
ovs2.17 job:
https://beaker.engineering.redhat.com/jobs/6447257
https://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2022/03/64472/6447257/11701198/141930043/bnxt_10.html
And I also changed the traffic sender from TRex to Xena, and it also could not get a result.
https://beaker.engineering.redhat.com/jobs/6448932
Do you still need the test environment for testing? I'll prepare it if necessary.

Comment 20 liting 2022-04-12 03:45:38 UTC
For fdp22c (openvswitch2.17-2.17.0-7.el9fdp), this issue still exists.
https://beaker.engineering.redhat.com/jobs/6490673

Comment 21 liting 2022-05-16 02:25:29 UTC
For fdp22d (openvswitch2.17-2.17.0-15.el9fdp.x86_64.rpm), I changed the traffic sender to Xena; it still has this issue.
https://beaker.engineering.redhat.com/jobs/6614285

Comment 22 liting 2022-06-23 00:55:55 UTC
For openvswitch2.17-2.17.0-18.el9fdp.x86_64.rpm, this issue still exists.
https://beaker.engineering.redhat.com/jobs/6743770

Comment 23 liting 2022-08-01 07:35:02 UTC
For fdp22f (openvswitch2.17-2.17.0-30.el9fdp.x86_64), this issue still exists.
https://beaker.engineering.redhat.com/jobs/6871236

Comment 24 liting 2022-08-02 09:42:20 UTC
The issue does not exist on the 25g bnxt_en card. It works well on the 730-52 25g bnxt_en card. It only exists on the netqe22 10g bnxt_en card. The firmware is the same between the 10g and 25g cards.
https://beaker.engineering.redhat.com/jobs/6875119
And it works well with ovs2.15 and ovs2.16 on the 10g bnxt_en card. It only fails with ovs2.17 on the 10g bnxt_en card.
https://beaker.engineering.redhat.com/jobs/6875032
https://beaker.engineering.redhat.com/jobs/6872604
And I checked that the netqe22 10g card has "LLDP nearest bridge" disabled in the BIOS settings.
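(To pin down exactly which bnxt_en variant each machine carries, comparing the PCI device id and firmware strings may help; a sketch, using the 10g port on netqe22 as an example:
lspci -nn -s 82:00.0
ethtool -i enp130s0f0np0
)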

Comment 30 liting 2023-10-11 07:42:53 UTC
There is no issue with openvswitch2.17-2.17.0-70.el9fdp.x86_64.rpm. The following is the job on another 25g bnxt_en card, so I am closing this bug.
https://beaker.engineering.redhat.com/jobs/8328149

Comment 31 Red Hat Bugzilla 2024-02-09 04:25:09 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

