Bug 2055531

Summary: bnxt_en card: add dpdk port to ovs bridge failed with ovs2.17
Product: Red Hat Enterprise Linux Fast Datapath
Component: openvswitch2.17
Version: FDP 22.A
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Reporter: liting <tli>
Assignee: David Marchand <dmarchan>
QA Contact: liting <tli>
CC: ansaini, cgoncalves, ctrautma, dmarchan, jhsiao, jpradhan, ralongi
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2023-10-11 07:42:53 UTC
Type: Bug

Description liting 2022-02-17 07:26:33 UTC
Description of problem:


Version-Release number of selected component (if applicable):
[root@netqe22 perf]# rpm -qa|grep openvs
kernel-kernel-networking-openvswitch-perf-1.0-235.noarch
openvswitch-selinux-extra-policy-1.0-29.el9fdp.noarch
openvswitch2.17-2.17.0-0.2.el9fdp.x86_64

[root@netqe22 perf]# ethtool -i enp130s0f0np0
driver: bnxt_en
version: 5.14.0-52.el9.x86_64
firmware-version: 20.6.143.0/pkg 20.06.04.06
expansion-rom-version: 
bus-info: 0000:82:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

[root@netqe22 perf]# uname -r
5.14.0-52.el9.x86_64


How reproducible:


Steps to Reproduce:
1. Add ovs bridge 
ovs-vsctl set Open_vSwitch . other_config={}
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="0,4098"
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=800000800000
ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
2. Bind bnxt_en card to dpdk
[root@netqe22 perf]# driverctl -v set-override 0000:82:00.0 vfio-pci
driverctl: setting driver override for 0000:82:00.0: vfio-pci
driverctl: loading driver vfio-pci
driverctl: unbinding previous driver bnxt_en
driverctl: reprobing driver for 0000:82:00.0
driverctl: saving driver override for 0000:82:00.0

3. Add dpdk port to ovsbr0
[root@netqe22 perf]# ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk type=dpdk options:dpdk-devargs=0000:82:00.0
ovs-vsctl: Error detected while setting up 'dpdk0': Error attaching device '0000:82:00.0' to DPDK.  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch".
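
Before reading the log, a few checks can confirm that steps 1 and 2 took effect (a minimal sanity-check sketch, not part of the original report; expected outputs are illustrative):
# DPDK initialized inside ovs-vswitchd
ovs-vsctl get Open_vSwitch . dpdk_initialized        # expect: true
# hugepages allocated (per-NUMA detail lives under /sys/devices/system/node/)
grep -i huge /proc/meminfo
# vfio-pci override active on the device
driverctl list-overrides                             # expect: 0000:82:00.0 vfio-pci
lspci -ks 82:00.0 | grep 'Kernel driver in use'      # expect: vfio-pci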

Actual results:
Add dpdk port to ovs bridge failed

[root@netqe22 ~]# tail -f /var/log/openvswitch/ovs-vswitchd.log
2022-02-17T07:16:14.737Z|00083|dpdk|INFO|EAL: Using IOMMU type 1 (Type 1)
2022-02-17T07:16:14.998Z|00084|dpdk|INFO|EAL: Probe PCI driver: net_bnxt (14e4:16d7) device: 0000:82:00.0 (socket 1)
2022-02-17T07:16:14.998Z|00085|dpdk|ERR|ethdev initialisation failed
2022-02-17T07:16:14.998Z|00086|dpdk|INFO|EAL: Releasing PCI mapped resource for 0000:82:00.0
2022-02-17T07:16:14.998Z|00087|dpdk|INFO|EAL: Calling pci_unmap_resource for 0000:82:00.0 at 0x4201000000
2022-02-17T07:16:14.998Z|00088|dpdk|INFO|EAL: Calling pci_unmap_resource for 0000:82:00.0 at 0x4201010000
2022-02-17T07:16:14.998Z|00089|dpdk|INFO|EAL: Calling pci_unmap_resource for 0000:82:00.0 at 0x4201110000
2022-02-17T07:16:15.198Z|00090|dpdk|ERR|EAL: Driver cannot attach the device (0000:82:00.0)
2022-02-17T07:16:15.198Z|00091|dpdk|ERR|EAL: Failed to attach device on primary process
2022-02-17T07:16:15.198Z|00092|netdev_dpdk|WARN|Error attaching device '0000:82:00.0' to DPDK
2022-02-17T07:16:15.198Z|00093|netdev|WARN|dpdk0: could not set configuration (Invalid argument)
2022-02-17T07:16:15.198Z|00094|dpdk|ERR|Invalid port_id=128
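
If more EAL/driver detail is needed, log verbosity can be raised (a hedged sketch; the module names are the ones appearing in the log above, and dpdk-extra only takes effect when ovs-vswitchd restarts):
ovs-appctl vlog/set dpdk:file:dbg
ovs-appctl vlog/set netdev_dpdk:file:dbg
ovs-vsctl set Open_vSwitch . other_config:dpdk-extra="--log-level=pmd.net.bnxt:debug"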

Expected results:
Add dpdk port to ovs bridge successfully.

Additional info:
https://beaker.engineering.redhat.com/jobs/6311938

Comment 1 David Marchand 2022-02-17 12:02:11 UTC
Thanks, this seems like a regression in dpdk v21.11, introduced by:
3972281f47b2 ("net/bnxt: fix device readiness check")

Reverting this patch makes init pass fine for me.
I scheduled a scratch build for you to test (but brew seems to be taking a long time...).
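
For reference, trying the revert locally would look roughly like this (a sketch, assuming a DPDK v21.11 source tree built with meson; not the exact scratch-build procedure):
git clone https://dpdk.org/git/dpdk && cd dpdk
git checkout v21.11
git revert 3972281f47b2    # "net/bnxt: fix device readiness check"
meson setup build && ninja -C build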


Has this problem been reproduced on RHEL8?

Comment 9 liting 2022-03-21 09:20:32 UTC
I updated the firmware and it works well with openvswitch2.17-2.17.0-0.2.el9fdp.x86_64.
[root@netqe22 ~]# ethtool -i enp130s0f0np0
driver: bnxt_en
version: 4.18.0-369.el8.x86_64
firmware-version: 220.0.59.0/pkg 220.0.83.0
expansion-rom-version: 
bus-info: 0000:82:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

Comment 10 liting 2022-03-22 03:49:08 UTC
After updating the firmware, I ran the ovs dpdk pvp performance job for ovs2.17 again. It still did not get a result.
https://beaker.engineering.redhat.com/jobs/6418733

From the test log, the dpdk port seems unable to receive any packets.
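
One way to confirm from the OVS side whether the dpdk port receives anything (illustrative; bridge and port names as in the reproduction steps above):
ovs-ofctl dump-ports ovsbr0 dpdk0          # rx packet counters should grow under traffic
ovs-vsctl get Interface dpdk0 statistics   # driver-level counters reported by the PMD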

Comment 11 liting 2022-03-22 06:16:04 UTC
I ran the ovs dpdk pvp performance job with openvswitch2.17-2.17.0-0.4.el9fdp. It also got a 0 result.
https://beaker.engineering.redhat.com/jobs/6421187

It works well with openvswitch2.15-2.15.0-42.el9fdp.
https://beaker.engineering.redhat.com/jobs/6421135

Comment 12 David Marchand 2022-03-22 08:55:25 UTC
I can do nothing with the beaker logs: I see no DPDK or OVS commands, so it is hard to tell what is what. There are no usable logs.

To save time, please provide an environment and a way to reproduce the issue, and I will have a look.

Comment 16 liting 2022-03-28 12:39:31 UTC
I upgraded the i40e firmware on netqe22; the case still does not work well. You can continue to access the machines for debugging. Thanks.
[root@netqe32 ~]# ethtool -i ens3f0
driver: i40e
version: 4.18.0-305.el8.x86_64
firmware-version: 8.50 0x8000b6ed 1.3082.0
expansion-rom-version: 
bus-info: 0000:5e:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

Comment 17 David Marchand 2022-03-28 13:35:09 UTC
Links are not even coming up with kernel netdevs.
I logged in to the systems and I see the following.

On the DUT, I see that the kernel netdev won't stay up:

[ 6402.875392] bnxt_en 0000:82:00.0 enp130s0f0np0: renamed from eth0
[ 6402.886050] bnxt_en 0000:82:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[ 6404.113651] bnxt_en 0000:82:00.1 eth0: Broadcom BCM57414 NetXtreme-E 10Gb/25Gb Ethernet found at mem ca200000, node addr 00:0a:f7:b7:09:51
[ 6404.116904] bnxt_en 0000:82:00.1 enp130s0f1np1: renamed from eth0
[ 6404.127553] bnxt_en 0000:82:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[ 9932.291741] bnxt_en 0000:82:00.0 enp130s0f0np0: unsupported speed!
[11172.415616] bnxt_en 0000:82:00.1 enp130s0f1np1: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
[11172.428238] bnxt_en 0000:82:00.1 enp130s0f1np1: FEC autoneg off encoding: None
[11172.436318] IPv6: ADDRCONF(NETDEV_CHANGE): enp130s0f1np1: link becomes ready
[11225.435240] bnxt_en 0000:82:00.1 enp130s0f1np1: NIC Link is Down
[11267.953958] bnxt_en 0000:82:00.1 enp130s0f1np1: unsupported speed!
[11617.480252] bnxt_en 0000:82:00.0 eth0: Broadcom BCM57414 NetXtreme-E 10Gb/25Gb Ethernet found at mem ca210000, node addr 00:0a:f7:b7:09:50
[11617.483667] bnxt_en 0000:82:00.0 enp130s0f0np0: renamed from eth0
[11617.494142] bnxt_en 0000:82:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[11617.525693] bnxt_en 0000:82:00.1 eth0: Broadcom BCM57414 NetXtreme-E 10Gb/25Gb Ethernet found at mem ca200000, node addr 00:0a:f7:b7:09:51
[11617.529025] bnxt_en 0000:82:00.1 enp130s0f1np1: renamed from eth0
[11617.539590] bnxt_en 0000:82:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[11734.009748] bnxt_en 0000:82:00.1 enp130s0f1np1: unsupported speed!
[14286.494915] bnxt_en 0000:82:00.0 eth0: Broadcom BCM57414 NetXtreme-E 10Gb/25Gb Ethernet found at mem ca210000, node addr 00:0a:f7:b7:09:50
[14286.497770] bnxt_en 0000:82:00.0 enp130s0f0np0: renamed from eth0
[14286.508805] bnxt_en 0000:82:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[14286.540300] bnxt_en 0000:82:00.1 eth0: Broadcom BCM57414 NetXtreme-E 10Gb/25Gb Ethernet found at mem ca200000, node addr 00:0a:f7:b7:09:51
[14286.543631] bnxt_en 0000:82:00.1 enp130s0f1np1: renamed from eth0
[14286.554194] bnxt_en 0000:82:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)


On the tester side, there are warnings:

[    9.565143] i40e: Intel(R) Ethernet Connection XL710 Network Driver
[    9.565144] i40e: Copyright (c) 2013 - 2019 Intel Corporation.
[    9.586715] i40e 0000:5e:00.0: fw 8.5.67516 api 1.15 nvm 8.50 0x8000b6ed 1.3082.0 [8086:1572] [8086:0007]
[    9.596288] i40e 0000:5e:00.0: The driver for the device detected a newer version of the NVM image v1.15 than expected v1.9. Please install the most recent version of the network driver.
[    9.923572] i40e 0000:5e:00.0: MAC address: 3c:fd:fe:ad:7b:4c
[    9.929457] i40e 0000:5e:00.0: FW LLDP is enabled
[    9.936525] i40e 0000:5e:00.0: Query for DCB configuration failed, err I40E_ERR_NOT_READY aq_err OK
[    9.945565] i40e 0000:5e:00.0: DCB init failed -63, disabled
[   10.012278] i40e 0000:5e:00.0: PCI-Express: Speed 8.0GT/s Width x8
[   10.018876] i40e 0000:5e:00.0: Features: PF-id[0] VFs: 64 VSIs: 66 QP: 56 RSS FD_ATR FD_SB NTUPLE VxLAN Geneve PTP VEPA
[   10.311717] i40e 0000:5e:00.1: fw 8.5.67516 api 1.15 nvm 8.50 0x8000b6ed 1.3082.0 [8086:1572] [8086:0000]
[   10.311719] i40e 0000:5e:00.1: The driver for the device detected a newer version of the NVM image v1.15 than expected v1.9. Please install the most recent version of the network driver.
[   10.879484] i40e 0000:5e:00.1: MAC address: 3c:fd:fe:ad:7b:4d
[   10.879744] i40e 0000:5e:00.1: FW LLDP is enabled
[   10.880783] i40e 0000:5e:00.1: Query for DCB configuration failed, err I40E_ERR_NOT_READY aq_err OK
[   10.880784] i40e 0000:5e:00.1: DCB init failed -63, disabled
[   10.898577] i40e 0000:5e:00.1: PCI-Express: Speed 8.0GT/s Width x8
[   10.898996] i40e 0000:5e:00.1: Features: PF-id[1] VFs: 64 VSIs: 66 QP: 56 RSS FD_ATR FD_SB NTUPLE VxLAN Geneve PTP VEPA


I logged into the trex console, and I see:
owner      |              root |              root |                   
link       |              DOWN |              DOWN |                   


You mentioned that no cable was changed, OK, but the issue could be an SFP or a cable that broke recently.
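
To help rule that out, link state can be compared on both ends with standard tools (illustrative commands; the interface name is the DUT's from the dmesg output above):
ip -br link show enp130s0f0np0
ethtool enp130s0f0np0 | grep -E 'Speed|Duplex|Link detected'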

Comment 18 liting 2022-03-29 01:10:55 UTC
It's strange, as I didn't make any changes to the cable. I will try running the ovs2.15 job.

Comment 19 liting 2022-03-30 03:53:08 UTC
I ran the ovs2.15 and ovs2.17 jobs again. The only difference between the two jobs is the OVS version. The ovs2.15 job works well, and the ovs2.17 job does not. The netqe22/netqe32 has no
ovs2.15 job:
https://beaker.engineering.redhat.com/jobs/6447129
https://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2022/03/64471/6447129/11700996/141928037/bnxt_10.html
ovs2.17 job:
https://beaker.engineering.redhat.com/jobs/6447257
https://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2022/03/64472/6447257/11701198/141930043/bnxt_10.html
I also changed the traffic sender from T-Rex to Xena, and it still could not get a result.
https://beaker.engineering.redhat.com/jobs/6448932
Do you still need the test environment? I'll prepare it if necessary.

Comment 20 liting 2022-04-12 03:45:38 UTC
For fdp22c (openvswitch2.17-2.17.0-7.el9fdp), this issue still exists.
https://beaker.engineering.redhat.com/jobs/6490673

Comment 21 liting 2022-05-16 02:25:29 UTC
For fdp22d (openvswitch2.17-2.17.0-15.el9fdp.x86_64.rpm), with the traffic sender changed to Xena, this issue still exists.
https://beaker.engineering.redhat.com/jobs/6614285

Comment 22 liting 2022-06-23 00:55:55 UTC
For openvswitch2.17-2.17.0-18.el9fdp.x86_64.rpm, this issue still exists.
https://beaker.engineering.redhat.com/jobs/6743770

Comment 23 liting 2022-08-01 07:35:02 UTC
For fdp22f (openvswitch2.17-2.17.0-30.el9fdp.x86_64), this issue still exists.
https://beaker.engineering.redhat.com/jobs/6871236

Comment 24 liting 2022-08-02 09:42:20 UTC
The issue does not exist on the 25g bnxt_en card: it works well on the 730-52 25g bnxt_en card and only appears on the netqe22 10g bnxt_en card. The firmware is the same on the 10g and 25g cards.
https://beaker.engineering.redhat.com/jobs/6875119
It works well with ovs2.15 and ovs2.16 on the 10g bnxt_en card; it only fails with ovs2.17 on the 10g bnxt_en card.
https://beaker.engineering.redhat.com/jobs/6875032 
https://beaker.engineering.redhat.com/jobs/6872604
I also checked that the netqe22 10g card has "LLDP nearest bridge" disabled in the BIOS settings.

Comment 30 liting 2023-10-11 07:42:53 UTC
There is no issue with openvswitch2.17-2.17.0-70.el9fdp.x86_64.rpm. Following is the job on another 25g bnxt_en card, so I am closing this bug.
https://beaker.engineering.redhat.com/jobs/8328149

Comment 31 Red Hat Bugzilla 2024-02-09 04:25:09 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days