Bug 2217867 - [OSP-17.1][Mellanox-Cx6] OVN-HWOL: LLDP flows cause performance regression [NEEDINFO]
Summary: [OSP-17.1][Mellanox-Cx6] OVN-HWOL: LLDP flows cause performance regression
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch
Version: 17.1 (Wallaby)
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: z3
Target Release: 17.1
Assignee: RHOSP:NFV_Eng
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks: 2172622
 
Reported: 2023-06-27 10:13 UTC by Pradipta Kumar Sahoo
Modified: 2023-08-09 18:16 UTC
CC List: 25 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
There is currently a known issue on Nvidia ConnectX-5 and ConnectX-6 NICs, when using hardware offload, where some offloaded flows on a PF can cause transient performance issues on the associated VFs. This issue is specifically observed with LLDP and VRRP traffic.
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:
rjarry: needinfo? (bnemeth)
ifrangs: needinfo? (rhosp-nfv-int)


Attachments: none

Links
Red Hat Issue Tracker OSP-26110 (last updated 2023-06-27 10:15:58 UTC)

Description Pradipta Kumar Sahoo 2023-06-27 10:13:08 UTC
Description of problem:
With small frame sizes (64 and 128 bytes), OVN-HWOL underperforms: we noticed a significant performance drop compared to the regular SR-IOV test.

Version-Release number of selected component (if applicable):
- RHOS-17.1-RHEL-9-20230511.n.1

How reproducible: 100% reproducible in the NFV PerfScale lab on an Ice Lake compute node with a 100G Mellanox ConnectX-6 card.

Topology
--------
Traffic direction  ----->
Trex Port 1----- Switch Port 1 ---- | Switch Port2 -----DUT Port1 | dpdk-testpmd Port1 (io mode forwarding)
Trex Port 2----- Switch Port 2 ---- | Switch Port3 -----DUT Port2 | dpdk-testpmd Port2 (io mode forwarding)
Traffic direction  ----->


Steps to Reproduce:
1. I used a homogeneous configuration for both the SR-IOV and HWOL performance tests.
2. I used the same traffic profile for both scenarios: 1,000 flows with varying source and destination IP addresses.
3. In the PVP scenario, the dpdk-testpmd application is used with a single queue for RX/TX.
4. Security groups are disabled on the Neutron ports for both SR-IOV and OVN-HWOL.
5. Tried the test with two CQE_COMPRESSION settings: BALANCED (default) and AGGRESSIVE (see the sketch after this list).
6. With HWOL, specifically with AGGRESSIVE, there is a significant performance drop in OVN-HWOL.
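
For reference, this is roughly how the CQE_COMPRESSION setting can be inspected and changed with NVIDIA's mlxconfig tool (a minimal sketch, assuming the MFT tools are installed; the PCI address 0000:18:00.0 is borrowed from a later comment and will differ per system):

# query the current setting (0 = BALANCED, the default; 1 = AGGRESSIVE)
mlxconfig -d 0000:18:00.0 query CQE_COMPRESSION
# switch to AGGRESSIVE; the change takes effect after a firmware reset or reboot
mlxconfig -d 0000:18:00.0 set CQE_COMPRESSION=1
mlxfwreset -d 0000:18:00.0 reset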

Additional info: the result summary is shared in a following private comment.

Comment 2 Gurpreet Singh 2023-06-28 15:30:54 UTC
Pradipta,


Hmm. I thought HWOL performance would at best match SR-IOV, and in general be lower.

Do you see an anomaly here compared to the results with 16.2?

Regards
Gurpreet

Comment 3 Pradipta Kumar Sahoo 2023-07-03 02:26:12 UTC
Gurpreet,

I didn't observe any major performance gap in the 16.2 HWOL test on the same hardware.

Result sheet:
https://docs.google.com/spreadsheets/d/1GF1fPqcxjQGCnyY6qmtZ-ILngQkLLZOYpO2aUDpAZmw/edit?usp=sharing

Comment 4 Pradipta Kumar Sahoo 2023-07-18 13:31:03 UTC
Traffic profile details used in Tgen:
-------------------------------------
Protocol: UDP
Flow Modification: src-ip, dst-ip
Number of flows: 1024
Search time and final validation time: 30 s and 300 s
Loss Scenarios: 0.002
Traffic direction: Bidirectional
Frame Sizes: 64, 128, 256, 512, 1024, 1500, 9000

Comment 5 Miguel Angel Nieto 2023-07-20 10:40:08 UTC
Can you show flows?
ovs-appctl dpctl/dump-flows -m type=offloaded

I have seen that any offloaded drop flow in there decreases performance.
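
Offloaded drop flows are easy to spot in the datapath dump; for example (standard ovs-appctl usage, same command as above):

ovs-appctl dpctl/dump-flows -m type=offloaded | grep 'actions:drop'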

In an ML2/OVS scenario, I had some VRRP packets arriving at the compute node and being dropped. After disabling HA on the router, I stopped receiving those VRRP packets and performance was good. I opened this bz:
https://bugzilla.redhat.com/show_bug.cgi?id=2221922

ufid:39c661b9-2bf6-428e-9bdc-1974e90acbce, skb_priority(0/0),tunnel(tun_id=0xb70e,src=10.10.141.157,dst=10.10.141.136,ttl=0/0,tp_dst=4789,flags(+key)),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(vxlan_sys_4789),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:148, bytes:7992, used:0.680s, offloaded:yes, dp:tc, actions:drop

In an OVN scenario, I had some LLDP packets arriving at the compute node and being dropped. After disabling LLDP on the switch, I stopped receiving those LLDP packets and performance was good:
ufid:00b82f41-503f-4a47-8f41-9dd0197f18f9, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(mx-bond),packet_type(ns=0/0,id=0/0),eth(src=f4:52:14:25:28:74,dst=01:80:c2:00:00:0e),eth_type(0x88cc), packets:0, bytes:0, used:never, offloaded:yes, dp:tc, actions:drop

Comment 6 Pradipta Kumar Sahoo 2023-07-20 11:47:54 UTC
For reference, all the test logs are shared in comment #1.

OVN-HWOL Test log:
http://storage.scalelab.redhat.com/psahoo/PerfTaskLog/OSP17.1/nfv_hwol/trafficgen--2023-06-21_11%3A41%3A28_UTC--8291a5c3-740d-4ec8-a437-81a9c02265be.tar.xz

My test topology uses a Neutron router. The test is a P-V-P scenario on a VLAN provider network.

Sample datapath flow output during the test:

ufid:4e38023e-37d6-48b3-9429-cf2ed7a5a35e, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens1f1np1_2),packet_type(ns=0/0,id=0/0),eth(src=3c:fd:fe:ee:4a:2c,dst=3c:fd:fe:ee:4a:2d),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:16273681, bytes:24407582184, used:0.220s, offloaded:yes, dp:tc, actions:push_vlan(vid=178,pcp=0),ens1f1np1
ufid:9e8eaf02-64d7-4f1e-a498-27107f4f0223, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens1f1np1_2),packet_type(ns=0/0,id=0/0),eth(src=3c:fd:fe:ee:47:0c,dst=3c:fd:fe:ee:47:0d),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:16264148, bytes:24393281302, used:0.220s, offloaded:yes, dp:tc, actions:push_vlan(vid=178,pcp=0),ens1f1np1
ufid:39364176-9664-4061-8529-114e4d81b114, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens1f1np1_2),packet_type(ns=0/0,id=0/0),eth(src=3c:fd:fe:ee:4c:50,dst=3c:fd:fe:ee:4c:51),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:16256199, bytes:24381359310, used:0.220s, offloaded:yes, dp:tc, actions:push_vlan(vid=178,pcp=0),ens1f1np1
ufid:e2af17d5-4539-4064-bf38-c4cd9638b9c8, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens1f1np1_2),packet_type(ns=0/0,id=0/0),eth(src=3c:fd:fe:ee:4c:34,dst=3c:fd:fe:ee:4c:35),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:16249698, bytes:24371607882, used:0.220s, offloaded:yes, dp:tc, actions:push_vlan(vid=178,pcp=0),ens1f1np1
ufid:3745c18c-a008-44d9-bddc-ed6ffc5b09dd, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens1f0np0_3),packet_type(ns=0/0,id=0/0),eth(src=3c:fd:fe:ee:4a:2d,dst=3c:fd:fe:ee:4a:2c),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:16657233, bytes:24982909990, used:0.030s, offloaded:yes, dp:tc, actions:push_vlan(vid=177,pcp=0),ens1f0np0
ufid:c01e044d-429f-4f70-b4ba-b0c4ceef5e81, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens1f0np0_3),packet_type(ns=0/0,id=0/0),eth(src=3c:fd:fe:ee:47:0d,dst=3c:fd:fe:ee:47:0c),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:16647589, bytes:24968444008, used:0.030s, offloaded:yes, dp:tc, actions:push_vlan(vid=177,pcp=0),ens1f0np0
ufid:4c032c08-01e1-409d-87be-8391fdb5e960, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens1f0np0_3),packet_type(ns=0/0,id=0/0),eth(src=3c:fd:fe:ee:4c:51,dst=3c:fd:fe:ee:4c:50),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:16640747, bytes:24958182444, used:0.030s, offloaded:yes, dp:tc, actions:push_vlan(vid=177,pcp=0),ens1f0np0
ufid:3ad1be3d-8ae5-4cb7-87bd-4520736fba99, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens1f0np0_3),packet_type(ns=0/0,id=0/0),eth(src=3c:fd:fe:ee:4c:35,dst=3c:fd:fe:ee:4c:34),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:16630009, bytes:24942075444, used:0.030s, offloaded:yes, dp:tc, actions:push_vlan(vid=177,pcp=0),ens1f0np0
...
ufid:c1ec11dd-7005-43b7-ab9a-b69071814698, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens1f1np1),packet_type(ns=0/0,id=0/0),eth(src=3c:fd:fe:ee:4a:2d,dst=3c:fd:fe:ee:4a:2c),eth_type(0x8100),vlan(vid=178,pcp=0),encap(eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no)), packets:16270689, bytes:24338011238, used:0.220s, offloaded:yes, dp:tc, actions:pop_vlan,ens1f1np1_2
ufid:09135c27-ee57-4dba-b39d-f5d9d9bc2922, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens1f1np1),packet_type(ns=0/0,id=0/0),eth(src=3c:fd:fe:ee:47:0d,dst=3c:fd:fe:ee:47:0c),eth_type(0x8100),vlan(vid=178,pcp=0),encap(eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no)), packets:16261071, bytes:24323622724, used:0.220s, offloaded:yes, dp:tc, actions:pop_vlan,ens1f1np1_2
ufid:3214a6d1-021d-4fff-b477-55b3bc7a2ab5, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens1f1np1),packet_type(ns=0/0,id=0/0),eth(src=3c:fd:fe:ee:4c:51,dst=3c:fd:fe:ee:4c:50),eth_type(0x8100),vlan(vid=178,pcp=0),encap(eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no)), packets:16254223, bytes:24313379552, used:0.220s, offloaded:yes, dp:tc, actions:pop_vlan,ens1f1np1_2
ufid:ff221669-2619-427c-bc3b-6c64173ba9ec, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens1f1np1),packet_type(ns=0/0,id=0/0),eth(src=3c:fd:fe:ee:4c:35,dst=3c:fd:fe:ee:4c:34),eth_type(0x8100),vlan(vid=178,pcp=0),encap(eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no)), packets:16243467, bytes:24297288576, used:0.220s, offloaded:yes, dp:tc, actions:pop_vlan,ens1f1np1_2
ufid:56f2f088-eea1-4603-91be-bbf1a87489d4, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens1f0np0),packet_type(ns=0/0,id=0/0),eth(src=c8:fe:6a:f1:d6:5b,dst=01:80:c2:00:00:0e),eth_type(0x88cc), packets:0, bytes:0, used:never, offloaded:yes, dp:tc, actions:drop
ufid:3114804a-6a37-4a26-affe-9522ac3471ac, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens1f0np0),packet_type(ns=0/0,id=0/0),eth(src=3c:fd:fe:ee:4a:2c,dst=3c:fd:fe:ee:4a:2d),eth_type(0x8100),vlan(vid=177,pcp=0),encap(eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no)), packets:16660338, bytes:24920926612, used:0.030s, offloaded:yes, dp:tc, actions:pop_vlan,ens1f0np0_3
ufid:56a9b2ea-b9a8-455f-97b3-9bbcb446c463, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens1f0np0),packet_type(ns=0/0,id=0/0),eth(src=3c:fd:fe:ee:47:0c,dst=3c:fd:fe:ee:47:0d),eth_type(0x8100),vlan(vid=177,pcp=0),encap(eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no)), packets:16650780, bytes:24906627886, used:0.030s, offloaded:yes, dp:tc, actions:pop_vlan,ens1f0np0_3
ufid:28331545-925d-4e3e-8908-7d8276aa4038, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens1f0np0),packet_type(ns=0/0,id=0/0),eth(src=3c:fd:fe:ee:4c:50,dst=3c:fd:fe:ee:4c:51),eth_type(0x8100),vlan(vid=177,pcp=0),encap(eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no)), packets:16642836, bytes:24894743718, used:0.030s, offloaded:yes, dp:tc, actions:pop_vlan,ens1f0np0_3
ufid:380fb46d-52e3-46d5-b49f-c8bd748e3d1a, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens1f0np0),packet_type(ns=0/0,id=0/0),eth(src=3c:fd:fe:ee:4c:34,dst=3c:fd:fe:ee:4c:35),eth_type(0x8100),vlan(vid=177,pcp=0),encap(eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no)), packets:16636355, bytes:24885048198, used:0.030s, offloaded:yes, dp:tc, actions:pop_vlan,ens1f0np0_3

Comment 7 Miguel Angel Nieto 2023-07-20 12:11:11 UTC
I think this flow is causing the performance drop. It looks like LLDP. I would try to stop that traffic and check (see the hedged example after the flow below).

ufid:56f2f088-eea1-4603-91be-bbf1a87489d4, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens1f0np0),packet_type(ns=0/0,id=0/0),eth(src=c8:fe:6a:f1:d6:5b,dst=01:80:c2:00:00:0e),eth_type(0x88cc), packets:0, bytes:0, used:never, offloaded:yes, dp:tc, actions:drop
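
As a quick check, LLDP can be silenced per port on the Junos switch used here (a hedged sketch; the interface name xe-0/0/1 is hypothetical and must be replaced with the port facing the DUT):

# on the Juniper switch, stop LLDP advertisements toward the DUT port
configure
set protocols lldp interface xe-0/0/1 disable
commit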

Comment 8 Pradipta Kumar Sahoo 2023-07-20 15:51:24 UTC
As discussed with Andrew and the net-perf team, I wanted to share some insights regarding LLDP traffic and its interaction with OVS.

LLDP traffic typically has very low bandwidth. However, when an LLDP packet arrives on a host with OVS, it will likely result in a 'miss' in OVS unless the same packet (with the same header) arrives more frequently than the datapath-flow idle-time expiration.
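
For context, that idle-time expiration is controlled by other_config:max-idle (in milliseconds, 10000 by default); it can be inspected and tuned with standard ovs-vsctl commands, the same knob used later in comment #12:

# show the current value (the command errors out if the key is unset,
# in which case the 10000 ms default applies)
ovs-vsctl get Open_vSwitch . other_config:max-idle
# raise it to 60 s so that short-lived flows such as LLDP stay installed longer
ovs-vsctl set Open_vSwitch . other_config:max-idle=60000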

During my test scenarios, I didn't observe frequent LLDP flows in the datapath layer. That said, I believe enabling LLDP is standard practice in customer environments.

Hence it has been left enabled on the switch:

root@juniper-nfv1> show lldp 

LLDP                      : Enabled
Advertisement interval    : 30 seconds
Transmit delay            : 2 seconds
Hold timer                : 120 seconds
Notification interval     : 5 Second(s)
Config Trap Interval      : 0 seconds
Connection Hold timer     : 300 seconds

LLDP MED                  : Enabled
MED fast start count      : 3 Packets

Port ID TLV subtype       : interface-name
Port Description TLV type : interface-alias (ifAlias)

Interface      Parent Interface    LLDP        LLDP-MED       Power Negotiation
all            -                   Enabled     Enabled        Enabled

Comment 9 Gurpreet Singh 2023-07-20 23:28:35 UTC
Pradipta

Is the miss for LLDP traffic impacting the performance of the other workload traffic? I assume TRex is not generating any LLDP traffic and that the performance results are based only on TRex-generated traffic.

Regards
Gurpreet

Comment 10 Miguel Angel Nieto 2023-07-21 10:46:19 UTC
Yes, in my test I saw that LLDP traffic generated by the switch reduces the throughput of data traffic by around 10% to 12%.

Miguel

Comment 11 Robin Jarry 2023-07-21 10:46:57 UTC
It looks like bz 2221922 is related to the same issue.

After some debugging, I found a direct correlation between LLDP packets received and rx_discards_phy increases on the PF interface. The VFs are assigned to a VM running testpmd, and when an LLDP packet arrives on the host, the RX rate drops (the stable rate is 14 Mpps per port; after receiving an LLDP packet, it drops to 13 Mpps per port for a few seconds before returning to normal).

See below:

[root@computehwoffload-r740 ~]# for i in mx-bond ens6f0np0 ens6f1np1 ens6f1np1_1 ens6f1np1_8; do ip -d link show $i; done
20: mx-bond: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether 98:03:9b:9d:73:00 brd ff:ff:ff:ff:ff:ff promiscuity 1 minmtu 68 maxmtu 65535 
    bond mode active-backup active_slave ens6f0np0 miimon 0 updelay 0 downdelay 0 peer_notify_delay 0 use_carrier 1 arp_interval 0 arp_missed_max 2 arp_validate none arp_all_targets any primary_reselect always fail_over_mac none xmit_hash_policy layer2 resend_igmp 1 num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1 packets_per_slave 1 lacp_active on lacp_rate slow ad_select stable tlb_dynamic_lb 1 
    openvswitch_slave addrgenmode eui64 numtxqueues 16 numrxqueues 16 gso_max_size 65536 gso_max_segs 65535 tso_max_size 524280 tso_max_segs 65535 gro_max_size 65536 
6: ens6f0np0: <BROADCAST,MULTICAST,PROMISC,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master mx-bond state UP mode DEFAULT group default qlen 1000
    link/ether 98:03:9b:9d:73:00 brd ff:ff:ff:ff:ff:ff promiscuity 2 minmtu 68 maxmtu 9978 
    bond_slave state ACTIVE mii_status UP link_failure_count 0 perm_hwaddr 98:03:9b:9d:73:00 queue_id 0 addrgenmode eui64 numtxqueues 576 numrxqueues 80 gso_max_size 65536 gso_max_segs 65535 tso_max_size 524280 tso_max_segs 65535 gro_max_size 65536 portname p0 switchid 00739d00039b0398 parentbus pci parentdev 0000:18:00.0 
    vf 0     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 2     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 3     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 4     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 5     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 6     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 7     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 8     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 9     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    altname enp24s0f0np0
8: ens6f1np1: <BROADCAST,MULTICAST,PROMISC,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master mx-bond state UP mode DEFAULT group default qlen 1000
    link/ether 98:03:9b:9d:73:00 brd ff:ff:ff:ff:ff:ff permaddr 98:03:9b:9d:73:01 promiscuity 1 minmtu 68 maxmtu 9978 
    bond_slave state BACKUP mii_status UP link_failure_count 0 perm_hwaddr 98:03:9b:9d:73:01 queue_id 0 addrgenmode eui64 numtxqueues 576 numrxqueues 80 gso_max_size 65536 gso_max_segs 65535 tso_max_size 524280 tso_max_segs 65535 gro_max_size 65536 portname p1 switchid 00739d00039b0398 parentbus pci parentdev 0000:18:00.1 
    vf 0     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 1     link/ether fa:16:3e:95:28:f9 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 2     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 3     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 4     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 5     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 6     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 7     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 8     link/ether fa:16:3e:0e:a3:7a brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 9     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    altname enp24s0f1np1
52: ens6f1np1_1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether 12:6f:37:7e:71:48 brd ff:ff:ff:ff:ff:ff promiscuity 1 minmtu 68 maxmtu 9978 
    openvswitch_slave addrgenmode eui64 numtxqueues 40 numrxqueues 40 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 portname pf1vf1 switchid 00739d00039b0398 parentbus pci parentdev 0000:18:00.1 
    altname enp24s0f1npf1vf1
    altname ens6f1npf1vf1
59: ens6f1np1_8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether d2:f0:2d:d7:68:a4 brd ff:ff:ff:ff:ff:ff promiscuity 1 minmtu 68 maxmtu 9978 
    openvswitch_slave addrgenmode eui64 numtxqueues 40 numrxqueues 40 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 portname pf1vf8 switchid 00739d00039b0398 parentbus pci parentdev 0000:18:00.1 
    altname enp24s0f1npf1vf8
    altname ens6f1npf1vf8


[root@computehwoffload-r740 ~]# cat /proc/net/bonding/mx-bond 
Ethernet Channel Bonding Driver: v5.14.0-284.23.1.el9_2.x86_64

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: ens6f0np0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

Slave Interface: ens6f0np0
MII Status: up
Speed: 40000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 98:03:9b:9d:73:00
Slave queue ID: 0

Slave Interface: ens6f1np1
MII Status: up
Speed: 40000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 98:03:9b:9d:73:01
Slave queue ID: 0


[root@computehwoffload-r740 ~]# tcpdump -nnei mx-bond ether proto 0x88cc  2>/dev/null & while true; do echo $(date +%H:%M:%S.%N) ens6f0np0 $(ethtool -S ens6f0np0 | grep -e rx_discards_phy); sleep 3; done
[1] 355250
10:26:04.115486390 ens6f0np0 rx_discards_phy: 3817918797
10:26:07.128211256 ens6f0np0 rx_discards_phy: 3817918797
10:26:10.138754931 ens6f0np0 rx_discards_phy: 3817918797
10:26:13.149529768 ens6f0np0 rx_discards_phy: 3817918797
10:26:16.159410485 ens6f0np0 rx_discards_phy: 3817918797
10:26:16.851660 f4:52:14:25:28:7a > 01:80:c2:00:00:0e, ethertype LLDP (0x88cc), length 211: LLDP, length 197: nfv-private-sw05
10:26:19.169313826 ens6f0np0 rx_discards_phy: 3821575293
10:26:22.179908957 ens6f0np0 rx_discards_phy: 3826320710
10:26:25.188932723 ens6f0np0 rx_discards_phy: 3831025468
10:26:28.197987965 ens6f0np0 rx_discards_phy: 3834404055
10:26:31.207463084 ens6f0np0 rx_discards_phy: 3834404055
10:26:34.217173200 ens6f0np0 rx_discards_phy: 3834404055
10:26:37.226648204 ens6f0np0 rx_discards_phy: 3834404055
10:26:40.236542661 ens6f0np0 rx_discards_phy: 3834404055
10:26:43.245958602 ens6f0np0 rx_discards_phy: 3834404055
10:26:46.257063966 ens6f0np0 rx_discards_phy: 3834404055
10:26:46.917407 f4:52:14:25:28:7a > 01:80:c2:00:00:0e, ethertype LLDP (0x88cc), length 211: LLDP, length 197: nfv-private-sw05
10:26:49.266472441 ens6f0np0 rx_discards_phy: 3838136857
10:26:52.276672696 ens6f0np0 rx_discards_phy: 3842907826
10:26:55.287084080 ens6f0np0 rx_discards_phy: 3847663971
10:26:58.296209675 ens6f0np0 rx_discards_phy: 3850935565
10:27:01.305549687 ens6f0np0 rx_discards_phy: 3850935565
10:27:04.316570355 ens6f0np0 rx_discards_phy: 3850935565
10:27:07.325294825 ens6f0np0 rx_discards_phy: 3850935565
10:27:10.462213557 ens6f0np0 rx_discards_phy: 3850935565
10:27:13.472317477 ens6f0np0 rx_discards_phy: 3850935565
10:27:16.551827288 ens6f0np0 rx_discards_phy: 3850935565
10:27:16.963882 f4:52:14:25:28:7a > 01:80:c2:00:00:0e, ethertype LLDP (0x88cc), length 211: LLDP, length 197: nfv-private-sw05
10:27:19.565699568 ens6f0np0 rx_discards_phy: 3855069645
10:27:22.575045649 ens6f0np0 rx_discards_phy: 3859840105
10:27:25.585629566 ens6f0np0 rx_discards_phy: 3864608229
10:27:28.596472921 ens6f0np0 rx_discards_phy: 3866940652
10:27:31.605664812 ens6f0np0 rx_discards_phy: 3866940652
10:27:34.614802869 ens6f0np0 rx_discards_phy: 3866940652
10:27:37.624192348 ens6f0np0 rx_discards_phy: 3866940652
10:27:40.633373542 ens6f0np0 rx_discards_phy: 3866940652
10:27:43.642362008 ens6f0np0 rx_discards_phy: 3866940652
^C


[root@computehwoffload-r740 ~]# tcpdump -nnei mx-bond ether proto 0x88cc  2>/dev/null & while true; do date +%H:%M:%S.%N; ovs-appctl dpctl/dump-flows --names type=offloaded | sort; sleep 5; done
[1] 379393
10:31:27.365671792
ct_mark(0/0x2),recirc_id(0),in_port(ens6f1np1_1),eth(src=fa:16:3e:95:28:f9,dst=f8:f2:1e:03:c8:c4),eth_type(0x0800),ipv4(frag=no), packets:74609919722, bytes:4775033718524, used:0.850s, actions:push_vlan(vid=148,pcp=0),mx-bond
ct_mark(0/0x2),recirc_id(0),in_port(ens6f1np1_8),eth(src=fa:16:3e:0e:a3:7a,dst=f8:f2:1e:03:c8:c6),eth_type(0x0800),ipv4(frag=no), packets:74785507282, bytes:4786271309440, used:0.850s, actions:push_vlan(vid=149,pcp=0),mx-bond
ct_mark(0/0x2),recirc_id(0),in_port(mx-bond),eth(src=f8:f2:1e:03:c8:c4,dst=fa:16:3e:95:28:f9),eth_type(0x8100),vlan(vid=148,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:74785616824, bytes:4487136539530, used:0.850s, actions:pop_vlan,ens6f1np1_1
ct_mark(0/0x2),recirc_id(0),in_port(mx-bond),eth(src=f8:f2:1e:03:c8:c6,dst=fa:16:3e:0e:a3:7a),eth_type(0x8100),vlan(vid=149,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:74610329061, bytes:4476619309814, used:0.060s, actions:pop_vlan,ens6f1np1_8
10:31:32.385077338
ct_mark(0/0x2),recirc_id(0),in_port(ens6f1np1_1),eth(src=fa:16:3e:95:28:f9,dst=f8:f2:1e:03:c8:c4),eth_type(0x0800),ipv4(frag=no), packets:74676944871, bytes:4779323328060, used:0.990s, actions:push_vlan(vid=148,pcp=0),mx-bond
ct_mark(0/0x2),recirc_id(0),in_port(ens6f1np1_8),eth(src=fa:16:3e:0e:a3:7a,dst=f8:f2:1e:03:c8:c6),eth_type(0x0800),ipv4(frag=no), packets:74852628463, bytes:4790567065024, used:0.990s, actions:push_vlan(vid=149,pcp=0),mx-bond
ct_mark(0/0x2),recirc_id(0),in_port(mx-bond),eth(src=f8:f2:1e:03:c8:c4,dst=fa:16:3e:95:28:f9),eth_type(0x8100),vlan(vid=148,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:74852738028, bytes:4491163811770, used:0.990s, actions:pop_vlan,ens6f1np1_1
ct_mark(0/0x2),recirc_id(0),in_port(mx-bond),eth(src=f8:f2:1e:03:c8:c6,dst=fa:16:3e:0e:a3:7a),eth_type(0x8100),vlan(vid=149,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:74677354242, bytes:4480640820674, used:0.990s, actions:pop_vlan,ens6f1np1_8
10:31:37.395068949
ct_mark(0/0x2),recirc_id(0),in_port(ens6f1np1_1),eth(src=fa:16:3e:95:28:f9,dst=f8:f2:1e:03:c8:c4),eth_type(0x0800),ipv4(frag=no), packets:74748609952, bytes:4783909893244, used:0.880s, actions:push_vlan(vid=148,pcp=0),mx-bond
ct_mark(0/0x2),recirc_id(0),in_port(ens6f1np1_8),eth(src=fa:16:3e:0e:a3:7a,dst=f8:f2:1e:03:c8:c6),eth_type(0x0800),ipv4(frag=no), packets:74924293720, bytes:4795153641472, used:0.880s, actions:push_vlan(vid=149,pcp=0),mx-bond
ct_mark(0/0x2),recirc_id(0),in_port(mx-bond),eth(src=f8:f2:1e:03:c8:c4,dst=fa:16:3e:95:28:f9),eth_type(0x8100),vlan(vid=148,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:74924403300, bytes:4495463728090, used:0.880s, actions:pop_vlan,ens6f1np1_1
ct_mark(0/0x2),recirc_id(0),in_port(mx-bond),eth(src=f8:f2:1e:03:c8:c6,dst=fa:16:3e:0e:a3:7a),eth_type(0x8100),vlan(vid=149,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:74749019328, bytes:4484940725834, used:0.880s, actions:pop_vlan,ens6f1np1_8
10:31:42.404690194
ct_mark(0/0x2),recirc_id(0),in_port(ens6f1np1_1),eth(src=fa:16:3e:95:28:f9,dst=f8:f2:1e:03:c8:c4),eth_type(0x0800),ipv4(frag=no), packets:74820293001, bytes:4788497608380, used:0.770s, actions:push_vlan(vid=148,pcp=0),mx-bond
ct_mark(0/0x2),recirc_id(0),in_port(ens6f1np1_8),eth(src=fa:16:3e:0e:a3:7a,dst=f8:f2:1e:03:c8:c6),eth_type(0x0800),ipv4(frag=no), packets:74995976783, bytes:4799741357504, used:0.770s, actions:push_vlan(vid=149,pcp=0),mx-bond
ct_mark(0/0x2),recirc_id(0),in_port(mx-bond),eth(src=f8:f2:1e:03:c8:c4,dst=fa:16:3e:95:28:f9),eth_type(0x8100),vlan(vid=148,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:74996086316, bytes:4499764709050, used:0.770s, actions:pop_vlan,ens6f1np1_1
ct_mark(0/0x2),recirc_id(0),in_port(mx-bond),eth(src=f8:f2:1e:03:c8:c6,dst=fa:16:3e:0e:a3:7a),eth_type(0x8100),vlan(vid=149,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:74820702361, bytes:4489241707814, used:0.770s, actions:pop_vlan,ens6f1np1_8
10:31:47.415092157
ct_mark(0/0x2),recirc_id(0),in_port(ens6f1np1_1),eth(src=fa:16:3e:95:28:f9,dst=f8:f2:1e:03:c8:c4),eth_type(0x0800),ipv4(frag=no), packets:74891948649, bytes:4793083569852, used:0.660s, actions:push_vlan(vid=148,pcp=0),mx-bond
ct_mark(0/0x2),recirc_id(0),in_port(ens6f1np1_8),eth(src=fa:16:3e:0e:a3:7a,dst=f8:f2:1e:03:c8:c6),eth_type(0x0800),ipv4(frag=no), packets:75067632619, bytes:4804327331008, used:0.660s, actions:push_vlan(vid=149,pcp=0),mx-bond
ct_mark(0/0x2),recirc_id(0),in_port(mx-bond),eth(src=f8:f2:1e:03:c8:c4,dst=fa:16:3e:95:28:f9),eth_type(0x8100),vlan(vid=148,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:75067742169, bytes:4504064060230, used:0.660s, actions:pop_vlan,ens6f1np1_1
ct_mark(0/0x2),recirc_id(0),in_port(mx-bond),eth(src=f8:f2:1e:03:c8:c6,dst=fa:16:3e:0e:a3:7a),eth_type(0x8100),vlan(vid=149,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:74892358011, bytes:4493541046814, used:0.660s, actions:pop_vlan,ens6f1np1_8
recirc_id(0),in_port(mx-bond),eth(src=f4:52:14:25:28:7a,dst=01:80:c2:00:00:0e),eth_type(0x88cc), packets:0, bytes:0, used:never, actions:drop
10:31:47.352709 f4:52:14:25:28:7a > 01:80:c2:00:00:0e, ethertype LLDP (0x88cc), length 211: LLDP, length 197: nfv-private-sw05
10:31:52.425718380
ct_mark(0/0x2),recirc_id(0),in_port(ens6f1np1_1),eth(src=fa:16:3e:95:28:f9,dst=f8:f2:1e:03:c8:c4),eth_type(0x0800),ipv4(frag=no), packets:74953495176, bytes:4797022547580, used:0.870s, actions:push_vlan(vid=148,pcp=0),mx-bond
ct_mark(0/0x2),recirc_id(0),in_port(ens6f1np1_8),eth(src=fa:16:3e:0e:a3:7a,dst=f8:f2:1e:03:c8:c6),eth_type(0x0800),ipv4(frag=no), packets:75129682080, bytes:4808298496512, used:0.870s, actions:push_vlan(vid=149,pcp=0),mx-bond
ct_mark(0/0x2),recirc_id(0),in_port(mx-bond),eth(src=f8:f2:1e:03:c8:c4,dst=fa:16:3e:95:28:f9),eth_type(0x8100),vlan(vid=148,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:75129791603, bytes:4507787026270, used:0.870s, actions:pop_vlan,ens6f1np1_1
ct_mark(0/0x2),recirc_id(0),in_port(mx-bond),eth(src=f8:f2:1e:03:c8:c6,dst=fa:16:3e:0e:a3:7a),eth_type(0x8100),vlan(vid=149,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:74953904510, bytes:4497233836754, used:0.870s, actions:pop_vlan,ens6f1np1_8
recirc_id(0),in_port(mx-bond),eth(src=f4:52:14:25:28:7a,dst=01:80:c2:00:00:0e),eth_type(0x88cc), packets:0, bytes:0, used:never, actions:drop
10:31:57.435854645
ct_mark(0/0x2),recirc_id(0),in_port(ens6f1np1_1),eth(src=fa:16:3e:95:28:f9,dst=f8:f2:1e:03:c8:c4),eth_type(0x0800),ipv4(frag=no), packets:75024789018, bytes:4801585353468, used:0.250s, actions:push_vlan(vid=148,pcp=0),mx-bond
ct_mark(0/0x2),recirc_id(0),in_port(ens6f1np1_8),eth(src=fa:16:3e:0e:a3:7a,dst=f8:f2:1e:03:c8:c6),eth_type(0x0800),ipv4(frag=no), packets:75201652911, bytes:4812904629696, used:0.250s, actions:push_vlan(vid=149,pcp=0),mx-bond
ct_mark(0/0x2),recirc_id(0),in_port(mx-bond),eth(src=f8:f2:1e:03:c8:c4,dst=fa:16:3e:95:28:f9),eth_type(0x8100),vlan(vid=148,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:75201762440, bytes:4512105276490, used:0.250s, actions:pop_vlan,ens6f1np1_1
ct_mark(0/0x2),recirc_id(0),in_port(mx-bond),eth(src=f8:f2:1e:03:c8:c6,dst=fa:16:3e:0e:a3:7a),eth_type(0x8100),vlan(vid=149,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:75025198372, bytes:4501511468474, used:0.250s, actions:pop_vlan,ens6f1np1_8
recirc_id(0),in_port(ens6f1np1_1),eth(src=fa:16:3e:95:28:f9,dst=f8:f2:1e:03:c8:c4),eth_type(0x0806), packets:0, bytes:0, used:never, actions:push_vlan(vid=148,pcp=0),mx-bond
recirc_id(0),in_port(ens6f1np1_8),eth(src=fa:16:3e:0e:a3:7a,dst=f8:f2:1e:03:c8:c6),eth_type(0x0806), packets:0, bytes:0, used:never, actions:push_vlan(vid=149,pcp=0),mx-bond
recirc_id(0),in_port(mx-bond),eth(src=f4:52:14:25:28:7a,dst=01:80:c2:00:00:0e),eth_type(0x88cc), packets:0, bytes:0, used:10.080s, actions:drop
10:32:02.447256298
ct_mark(0/0x2),recirc_id(0),in_port(ens6f1np1_1),eth(src=fa:16:3e:95:28:f9,dst=f8:f2:1e:03:c8:c4),eth_type(0x0800),ipv4(frag=no), packets:75090206558, bytes:4805772076028, used:0.530s, actions:push_vlan(vid=148,pcp=0),mx-bond
ct_mark(0/0x2),recirc_id(0),in_port(ens6f1np1_8),eth(src=fa:16:3e:0e:a3:7a,dst=f8:f2:1e:03:c8:c6),eth_type(0x0800),ipv4(frag=no), packets:75267148882, bytes:4817096371840, used:0.530s, actions:push_vlan(vid=149,pcp=0),mx-bond
ct_mark(0/0x2),recirc_id(0),in_port(mx-bond),eth(src=f8:f2:1e:03:c8:c4,dst=fa:16:3e:95:28:f9),eth_type(0x8100),vlan(vid=148,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:75267258435, bytes:4516035036190, used:0.530s, actions:pop_vlan,ens6f1np1_1
ct_mark(0/0x2),recirc_id(0),in_port(mx-bond),eth(src=f8:f2:1e:03:c8:c6,dst=fa:16:3e:0e:a3:7a),eth_type(0x8100),vlan(vid=149,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:75090615916, bytes:4505436521114, used:0.530s, actions:pop_vlan,ens6f1np1_8
recirc_id(0),in_port(ens6f1np1_1),eth(src=fa:16:3e:95:28:f9,dst=f8:f2:1e:03:c8:c4),eth_type(0x0806), packets:0, bytes:0, used:never, actions:push_vlan(vid=148,pcp=0),mx-bond
recirc_id(0),in_port(ens6f1np1_8),eth(src=fa:16:3e:0e:a3:7a,dst=f8:f2:1e:03:c8:c6),eth_type(0x0806), packets:0, bytes:0, used:never, actions:push_vlan(vid=149,pcp=0),mx-bond
10:32:07.457799792
ct_mark(0/0x2),recirc_id(0),in_port(ens6f1np1_1),eth(src=fa:16:3e:95:28:f9,dst=f8:f2:1e:03:c8:c4),eth_type(0x0800),ipv4(frag=no), packets:75167237138, bytes:4810702033148, used:0.030s, actions:push_vlan(vid=148,pcp=0),mx-bond
ct_mark(0/0x2),recirc_id(0),in_port(ens6f1np1_8),eth(src=fa:16:3e:0e:a3:7a,dst=f8:f2:1e:03:c8:c6),eth_type(0x0800),ipv4(frag=no), packets:75344179568, bytes:4822026335744, used:0.030s, actions:push_vlan(vid=149,pcp=0),mx-bond
ct_mark(0/0x2),recirc_id(0),in_port(mx-bond),eth(src=f8:f2:1e:03:c8:c4,dst=fa:16:3e:95:28:f9),eth_type(0x8100),vlan(vid=148,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:75344289085, bytes:4520656875190, used:0.030s, actions:pop_vlan,ens6f1np1_1
ct_mark(0/0x2),recirc_id(0),in_port(mx-bond),eth(src=f8:f2:1e:03:c8:c6,dst=fa:16:3e:0e:a3:7a),eth_type(0x8100),vlan(vid=149,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:75167646446, bytes:4510058352914, used:0.030s, actions:pop_vlan,ens6f1np1_8
10:32:12.485211959
ct_mark(0/0x2),recirc_id(0),in_port(ens6f1np1_1),eth(src=fa:16:3e:95:28:f9,dst=f8:f2:1e:03:c8:c4),eth_type(0x0800),ipv4(frag=no), packets:75224579847, bytes:4814371966524, used:0.980s, actions:push_vlan(vid=148,pcp=0),mx-bond
ct_mark(0/0x2),recirc_id(0),in_port(ens6f1np1_8),eth(src=fa:16:3e:0e:a3:7a,dst=f8:f2:1e:03:c8:c6),eth_type(0x0800),ipv4(frag=no), packets:75401522303, bytes:4825696270784, used:0.980s, actions:push_vlan(vid=149,pcp=0),mx-bond
ct_mark(0/0x2),recirc_id(0),in_port(mx-bond),eth(src=f8:f2:1e:03:c8:c4,dst=fa:16:3e:95:28:f9),eth_type(0x8100),vlan(vid=148,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:75401631879, bytes:4524097442830, used:0.980s, actions:pop_vlan,ens6f1np1_1
ct_mark(0/0x2),recirc_id(0),in_port(mx-bond),eth(src=f8:f2:1e:03:c8:c6,dst=fa:16:3e:0e:a3:7a),eth_type(0x8100),vlan(vid=149,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:75224989222, bytes:4513498919474, used:0.980s, actions:pop_vlan,ens6f1np1_8
10:32:17.408220 f4:52:14:25:28:7a > 01:80:c2:00:00:0e, ethertype LLDP (0x88cc), length 211: LLDP, length 197: nfv-private-sw05
10:32:17.517194539
ct_mark(0/0x2),recirc_id(0),in_port(ens6f1np1_1),eth(src=fa:16:3e:95:28:f9,dst=f8:f2:1e:03:c8:c4),eth_type(0x0800),ipv4(frag=no), packets:75296249192, bytes:4818958804604, used:0.880s, actions:push_vlan(vid=148,pcp=0),mx-bond
ct_mark(0/0x2),recirc_id(0),in_port(ens6f1np1_8),eth(src=fa:16:3e:0e:a3:7a,dst=f8:f2:1e:03:c8:c6),eth_type(0x0800),ipv4(frag=no), packets:75473191844, bytes:4830283121408, used:0.880s, actions:push_vlan(vid=149,pcp=0),mx-bond
ct_mark(0/0x2),recirc_id(0),in_port(mx-bond),eth(src=f8:f2:1e:03:c8:c4,dst=fa:16:3e:95:28:f9),eth_type(0x8100),vlan(vid=148,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:75473301401, bytes:4528397614150, used:0.880s, actions:pop_vlan,ens6f1np1_1
ct_mark(0/0x2),recirc_id(0),in_port(mx-bond),eth(src=f8:f2:1e:03:c8:c6,dst=fa:16:3e:0e:a3:7a),eth_type(0x8100),vlan(vid=149,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:75296658539, bytes:4517799078494, used:0.880s, actions:pop_vlan,ens6f1np1_8
recirc_id(0),in_port(mx-bond),eth(src=f4:52:14:25:28:7a,dst=01:80:c2:00:00:0e),eth_type(0x88cc), packets:0, bytes:0, used:never, actions:drop
^C


[root@computehwoffload-r740 ~]# tcpdump -nnei mx-bond ether proto 0x88cc  2>/dev/null & ssh -i test_keypair.key cloud-user.228.38 sudo python3 /root/dpdk-port-stats.py -s /run/dpdk/rte/dpdk_telemetry.v2 -t 5
[1] 441749
The authenticity of host '10.46.228.38 (10.46.228.38)' can't be established.
ED25519 key fingerprint is SHA256:BQhMC4Vzv4ygXHu13liV98FHEGOC5YZlgs63zEpCo0E.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '10.46.228.38' (ED25519) to the list of known hosts.
---
0: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
1: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
---
0: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
1: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
10:44:48.492633 f4:52:14:25:28:7a > 01:80:c2:00:00:0e, ethertype LLDP (0x88cc), length 211: LLDP, length 197: nfv-private-sw05
---
0: RX=13.3M pkt/s DROP=0.0 pkt/s TX=13.3M pkt/s
1: RX=13.3M pkt/s DROP=0.0 pkt/s TX=13.3M pkt/s
---
0: RX=12.8M pkt/s DROP=0.0 pkt/s TX=12.7M pkt/s
1: RX=12.7M pkt/s DROP=0.0 pkt/s TX=12.8M pkt/s
---
0: RX=13.5M pkt/s DROP=0.0 pkt/s TX=13.4M pkt/s
1: RX=13.4M pkt/s DROP=0.0 pkt/s TX=13.5M pkt/s
---
0: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
1: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
---
0: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
1: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
---
0: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
1: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
10:45:18.545548 f4:52:14:25:28:7a > 01:80:c2:00:00:0e, ethertype LLDP (0x88cc), length 211: LLDP, length 197: nfv-private-sw05
---
0: RX=13.3M pkt/s DROP=0.0 pkt/s TX=13.3M pkt/s
1: RX=13.3M pkt/s DROP=0.0 pkt/s TX=13.3M pkt/s
---
0: RX=12.8M pkt/s DROP=0.0 pkt/s TX=12.7M pkt/s
1: RX=12.7M pkt/s DROP=0.0 pkt/s TX=12.8M pkt/s
---
0: RX=13.3M pkt/s DROP=0.0 pkt/s TX=13.3M pkt/s
1: RX=13.3M pkt/s DROP=0.0 pkt/s TX=13.3M pkt/s
---
0: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
1: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
---
0: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
1: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s

Comment 12 Robin Jarry 2023-07-21 12:04:55 UTC
Additional info:

1) Increasing other_config:max-idle to 60 s, so that the offloaded LLDP drop flow remains installed, makes the packet drop constant. tcpdump no longer sees the LLDP packets, as they are dropped in hardware by the offloaded flow.

[root@computehwoffload-r740 ~]# devlink dev param show  pci/0000:18:00.0 name flow_steering_mode
pci/0000:18:00.0:
  name flow_steering_mode type driver-specific
    values:
      cmode runtime value smfs
[root@computehwoffload-r740 ~]# devlink dev param show  pci/0000:18:00.1 name flow_steering_mode
pci/0000:18:00.1:
  name flow_steering_mode type driver-specific
    values:
      cmode runtime value smfs

[root@computehwoffload-r740 ~]# ovs-vsctl set o . other_config:max-idle=60000

[root@computehwoffload-r740 ~]# ovs-appctl dpctl/dump-flows --names type=offloaded
ct_mark(0/0x2),recirc_id(0),in_port(ens6f1np1_1),eth(src=fa:16:3e:95:28:f9,dst=f8:f2:1e:03:c8:c4),eth_type(0x0800),ipv4(frag=no), packets:11852789113, bytes:758577366370, used:0.610s, actions:push_vlan(vid=148,pcp=0),mx-bond
ct_mark(0/0x2),recirc_id(0),in_port(ens6f1np1_8),eth(src=fa:16:3e:0e:a3:7a,dst=f8:f2:1e:03:c8:c6),eth_type(0x0800),ipv4(frag=no), packets:11852870204, bytes:758582539004, used:0.610s, actions:push_vlan(vid=149,pcp=0),mx-bond
ct_mark(0/0x2),recirc_id(0),in_port(mx-bond),eth(src=f8:f2:1e:03:c8:c4,dst=fa:16:3e:95:28:f9),eth_type(0x8100),vlan(vid=148,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:11853410853, bytes:711204182404, used:0.610s, actions:pop_vlan,ens6f1np1_1
ct_mark(0/0x2),recirc_id(0),in_port(mx-bond),eth(src=f8:f2:1e:03:c8:c6,dst=fa:16:3e:0e:a3:7a),eth_type(0x8100),vlan(vid=149,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:11853294719, bytes:711197251940, used:0.611s, actions:pop_vlan,ens6f1np1_8
recirc_id(0),in_port(ens6f1np1_1),eth(src=fa:16:3e:95:28:f9,dst=f8:f2:1e:03:c8:c4),eth_type(0x0806), packets:0, bytes:0, used:never, actions:push_vlan(vid=148,pcp=0),mx-bond
recirc_id(0),in_port(ens6f1np1_8),eth(src=fa:16:3e:0e:a3:7a,dst=f8:f2:1e:03:c8:c6),eth_type(0x0806), packets:0, bytes:0, used:never, actions:push_vlan(vid=149,pcp=0),mx-bond
recirc_id(0),in_port(mx-bond),eth(src=f4:52:14:25:28:7a,dst=01:80:c2:00:00:0e),eth_type(0x88cc), packets:3, bytes:633, used:18.010s, actions:drop

[root@computehwoffload-r740 ~]# ssh -i test_keypair.key cloud-user.228.38 sudo python3 /root/dpdk-port-stats.py -s /run/dpdk/rte/dpdk_telemetry.v2 -t 5
---
0: RX=12.8M pkt/s DROP=0.0 pkt/s TX=12.7M pkt/s
1: RX=12.7M pkt/s DROP=0.0 pkt/s TX=12.8M pkt/s
---
0: RX=12.8M pkt/s DROP=0.0 pkt/s TX=12.7M pkt/s
1: RX=12.7M pkt/s DROP=0.0 pkt/s TX=12.8M pkt/s
---
0: RX=12.8M pkt/s DROP=0.0 pkt/s TX=12.7M pkt/s
1: RX=12.7M pkt/s DROP=0.0 pkt/s TX=12.8M pkt/s
---
0: RX=12.8M pkt/s DROP=0.0 pkt/s TX=12.7M pkt/s
1: RX=12.7M pkt/s DROP=0.0 pkt/s TX=12.8M pkt/s
---
0: RX=12.8M pkt/s DROP=0.0 pkt/s TX=12.7M pkt/s
1: RX=12.7M pkt/s DROP=0.0 pkt/s TX=12.8M pkt/s
---
0: RX=12.8M pkt/s DROP=0.0 pkt/s TX=12.7M pkt/s
1: RX=12.7M pkt/s DROP=0.0 pkt/s TX=12.8M pkt/s
---
0: RX=12.8M pkt/s DROP=0.0 pkt/s TX=12.7M pkt/s
1: RX=12.7M pkt/s DROP=0.0 pkt/s TX=12.8M pkt/s
---
0: RX=12.8M pkt/s DROP=0.0 pkt/s TX=12.7M pkt/s
1: RX=12.7M pkt/s DROP=0.0 pkt/s TX=12.8M pkt/s
---
0: RX=12.8M pkt/s DROP=0.0 pkt/s TX=12.7M pkt/s
1: RX=12.7M pkt/s DROP=0.0 pkt/s TX=12.8M pkt/s
---
0: RX=12.8M pkt/s DROP=0.0 pkt/s TX=12.7M pkt/s
1: RX=12.7M pkt/s DROP=0.0 pkt/s TX=12.8M pkt/s
---
0: RX=12.8M pkt/s DROP=0.0 pkt/s TX=12.7M pkt/s
1: RX=12.7M pkt/s DROP=0.0 pkt/s TX=12.8M pkt/s
---
0: RX=12.8M pkt/s DROP=0.0 pkt/s TX=12.7M pkt/s
1: RX=12.7M pkt/s DROP=0.0 pkt/s TX=12.8M pkt/s
---
0: RX=12.8M pkt/s DROP=0.0 pkt/s TX=12.7M pkt/s
1: RX=12.7M pkt/s DROP=0.0 pkt/s TX=12.8M pkt/s
---
0: RX=12.8M pkt/s DROP=0.0 pkt/s TX=12.7M pkt/s
1: RX=12.7M pkt/s DROP=0.0 pkt/s TX=12.8M pkt/s
---
0: RX=12.8M pkt/s DROP=0.0 pkt/s TX=12.7M pkt/s
1: RX=12.7M pkt/s DROP=0.0 pkt/s TX=12.8M pkt/s
---
0: RX=12.8M pkt/s DROP=0.0 pkt/s TX=12.7M pkt/s
1: RX=12.7M pkt/s DROP=0.0 pkt/s TX=12.8M pkt/s
---
0: RX=12.8M pkt/s DROP=0.0 pkt/s TX=12.7M pkt/s
1: RX=12.7M pkt/s DROP=0.0 pkt/s TX=12.8M pkt/s
---
0: RX=12.8M pkt/s DROP=0.0 pkt/s TX=12.7M pkt/s
1: RX=12.7M pkt/s DROP=0.0 pkt/s TX=12.8M pkt/s
---
0: RX=12.8M pkt/s DROP=0.0 pkt/s TX=12.7M pkt/s
1: RX=12.7M pkt/s DROP=0.0 pkt/s TX=12.8M pkt/s
---
0: RX=12.8M pkt/s DROP=0.0 pkt/s TX=12.7M pkt/s
1: RX=12.7M pkt/s DROP=0.0 pkt/s TX=12.8M pkt/s


2) Changing flow_steering_mode to dmfs fixes the packet drop.

[root@computehwoffload-r740 ~]# ovs-vsctl set o . other_config:max-idle=10000
[root@computehwoffload-r740 ~]# devlink dev param set pci/0000:18:00.0 name flow_steering_mode value dmfs cmode runtime
[root@computehwoffload-r740 ~]# devlink dev param set pci/0000:18:00.1 name flow_steering_mode value dmfs cmode runtime
[root@computehwoffload-r740 ~]# systemctl restart openvswitch
[root@computehwoffload-r740 ~]# devlink dev param show  pci/0000:18:00.0 name flow_steering_mode
pci/0000:18:00.0:
  name flow_steering_mode type driver-specific
    values:
      cmode runtime value dmfs
[root@computehwoffload-r740 ~]# devlink dev param show  pci/0000:18:00.1 name flow_steering_mode
pci/0000:18:00.1:
  name flow_steering_mode type driver-specific
    values:
      cmode runtime value dmfs

[root@computehwoffload-r740 ~]# ssh -i test_keypair.key cloud-user.228.38 sudo python3 /root/dpdk-port-stats.py -s /run/dpdk/rte/dpdk_telemetry.v2 -t 5
---
0: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
1: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
---
0: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
1: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
11:53:24.801714 f4:52:14:25:28:7a > 01:80:c2:00:00:0e, ethertype LLDP (0x88cc), length 211: LLDP, length 197: nfv-private-sw05
---
0: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
1: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
---
0: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
1: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
---
0: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
1: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
---
0: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
1: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
---
0: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
1: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
---
0: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
1: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
11:53:54.825009 f4:52:14:25:28:7a > 01:80:c2:00:00:0e, ethertype LLDP (0x88cc), length 211: LLDP, length 197: nfv-private-sw05
---
0: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s
1: RX=14.0M pkt/s DROP=0.0 pkt/s TX=14.0M pkt/s

Comment 13 Marcelo Ricardo Leitner 2023-07-21 20:12:37 UTC
Robin, I'm not sure if you're planning to post a new comment or what, but this bz could use a fresh summary.
It's not clear from the comments above where the performance issue lies, especially after comment #12 compared smfs and dmfs with no recorded drops.
Please also mention the VM's forced context switches triggered by the LLDP packets. That is actually key to this bz.

Comment 14 Robin Jarry 2023-07-25 13:18:12 UTC
Hi Marcelo,

I don't have access to the platform anymore and cannot provide traces. However, here are my observations:

With comment #12 step 1 (smfs + other_config:max-idle=60000), the rx_discards_phy counter was increasing constantly, which means the packets are dropped at reception by the CX-5 ports; testpmd running in the VM only sees a lower RX rate.

With comment #12 step 2 (dmfs + other_config:max-idle=10000), the rx_discards_phy counter remains constant, and testpmd running in the VM sees the rate at which the traffic generator is sending.

I think this should be easy to reproduce without OpenStack (a rough sketch follows).
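
A minimal sketch of such a standalone reproducer, under stated assumptions (ConnectX-5/6 PF at the hypothetical PCI address 0000:18:00.0 with netdev ens6f0np0 and VF representor ens6f0np0_0; all names must be adapted to the system):

# create a VF and switch the eswitch to switchdev mode
echo 1 > /sys/class/net/ens6f0np0/device/sriov_numvfs
devlink dev eswitch set pci/0000:18:00.0 mode switchdev
# enable OVS hardware offload and bridge the PF with the VF representor
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
systemctl restart openvswitch
ovs-vsctl add-br br0 -- add-port br0 ens6f0np0 -- add-port br0 ens6f0np0_0
# run testpmd on the VF at line rate, then inject LLDP frames on the link
# toward the PF and watch for hardware drops:
ethtool -S ens6f0np0 | grep rx_discards_phy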

Comment 15 Robin Jarry 2023-07-25 13:20:17 UTC
I have updated the summary. I'm not certain about the formulation. Will try to refine it later on.

Comment 16 Robin Jarry 2023-08-02 11:54:28 UTC
@bnemeth @wizhao do you have any idea what could be causing this perf regression?

Comment 17 William Zhao 2023-08-02 16:21:41 UTC
(In reply to Robin Jarry from comment #16)
> @bnemeth @wizhao do you have any idea what could be
> causing this perf regression?

I have sent it to NVIDIA to take a look. We have also seen regressions with SMFS when compared with DMFS. NVIDIA was looking into this a few months back, but maybe they were chasing a red herring in their setup. I have told them that we run LLDP in our labs and this might be a big hint for them to reproduce it on their end.

Unfortunately, DMFS/SMFS is NVIDIA proprietary, so we can't open it up to see what might be going wrong. However, NVIDIA has strongly urged us to switch to SMFS, since it is the only mode they will support moving forward.

I see that you are invited to Thursday's 8 AM EST meeting with NVIDIA. We need to bring this topic up with them.

