Bug 1720935 - RFE: Add option to allow packets coming from inactive interface of a non LACP bond.
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: openvswitch
Version: FDP 19.C
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Adrián Moreno
QA Contact: Hekai Wang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-06-16 16:52 UTC by Christophe Fontaine
Modified: 2022-09-13 07:15 UTC
CC: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-13 07:15:13 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FD-116 0 None None None 2022-02-22 05:54:28 UTC

Description Christophe Fontaine 2019-06-16 16:52:50 UTC
Description of problem:
Due to TOR switch limitation, we can't enable LACP on multiple pairs of VFs.

But, we can enable LACP (which is mandatory for MLAG and/or for balance-tcp) on 1 pair, and configure the other pairs as 'balance-slb' for ovs-dpdk bonds or xor for linux bonds.

Yet, the other OVS bonds discard broadcast packets arriving on their 'inactive' interfaces.

From ofproto/bond.c, around line 829:
```
enum bond_verdict
bond_check_admissibility(struct bond *bond, const void *slave_,
                         const struct eth_addr eth_dst)
{
    struct bond_slave *slave = bond_slave_lookup(bond, slave_);
    /* ... */

    /* Drop all multicast packets on inactive slaves. */
    if (eth_addr_is_multicast(eth_dst)) {
        if (bond->active_slave != slave) {
            goto out;
        }
    }
    /* ... */
```

However, this slave isn't actually inactive (we're in balance-slb), and the TOR switch sends the multicast frame to only one slave based on the hash, which happens to be the secondary interface.
That packet is dropped (in my case an ARP request frame, but this applies to any broadcast/multicast frame).



Version-Release number of selected component (if applicable):
openvswitch-2.9.0-103.el7fdp.x86_64
The code is the same in ovs 2.11

How reproducible:
Always

Steps to Reproduce:
1. Create 2 pairs of VFs with the PFs connected to the same TOR switch
2. Create linux bond with LACP enabled and another dpdk bond as balance-slb
3. Send a broadcast frame built so it will be sent to the 'inactive' interface
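The OVS side of step 2 can be sketched as follows (a minimal sketch: the bridge, bond, device names and PCI addresses are placeholders for this setup; the Linux LACP bond is configured separately):

```shell
# Create a userspace (DPDK) bridge and a two-member bond in
# balance-slb mode, i.e. without LACP.
ovs-vsctl add-br br-link0 -- set bridge br-link0 datapath_type=netdev
ovs-vsctl add-bond br-link0 dpdkbond0 dpdk0 dpdk1 \
    bond_mode=balance-slb \
    -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:05:02.0 \
    -- set Interface dpdk1 type=dpdk options:dpdk-devargs=0000:05:02.1

# Inspect which member is currently "active":
ovs-appctl bond/show dpdkbond0
```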

The behaviour can also be reproduced with ovs-appctl ofproto/trace:
[root@overcloud-computeovsdpdksriov-0 heat-admin]# ovs-appctl ofproto/trace br-link0 in_port=2,dl_dst=ff:ff:ff:ff:ff:ff
Flow: in_port=2,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0x0000

bridge("br-link0")
------------------
0. priority 0
   NORMAL
    -> no learned MAC for destination, flooding

Final flow: unchanged

But if we send the same frame on the inactive slave:

[root@overcloud-computeovsdpdksriov-0 heat-admin]# ovs-appctl ofproto/trace br-link0 in_port=1,dl_dst=ff:ff:ff:ff:ff:ff
Flow: in_port=1,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0x0000

bridge("br-link0")
------------------
0. priority 0
   NORMAL
    -> bonding refused admissibility, dropping


Actual results:
Broadcast packets arriving on the secondary interface are dropped.

Expected results:
With a config option, broadcast packets arriving on the secondary interface should be accepted and processed.

Additional info:

Comment 4 Flavio Leitner 2019-09-13 20:22:19 UTC
Hi,

I think there is some confusion, so let's get on the same page before moving on to the problem itself.

A TOR switch or a host can have one or multiple ports. Those ports are by default independent of each other.
That means we expect each port to be part of MAC learning and all that. Broadcast packets or multicast (snooping off) packets are expected to be flooded to all ports.

Then we have LACP, which bundles two or more physical ports into a single logical one. The protocol exchanges LACPDUs to make sure each link is available and usable.
Therefore, from the point of view of MAC learning and broadcast/multicast flooding, all ports negotiated as part of a single LACP trunk are seen as one big link, which means only one copy of those packets is sent, to a single physical port, chosen by some hashing algorithm.

Now, talking about bond modes: if the Linux bond has LACP enabled, then, as discussed above, the TOR switch is guaranteed to send only a single copy of any multicast/broadcast packet, and that's why the Linux bond allows it to be received.

In the case of balance-slb, yes, you're correct that more than a single port is active sending traffic, but since it does not use LACP, we expect the TOR switch to send as many copies as the number of ports in the bond to the host. That means we cannot allow all ports to receive the packet, otherwise the host will see dups all the time. That's the reason that function bond_check_admissibility() drops packets from the inactive *receiving* slaves.

Still in bond_check_admissibility(), if you look just above that code, there are LACP-related checks confirming that if you have a negotiated trunk, the packet is accepted.

In summary, we should not enable an inactive receiving slave without LACP to receive such traffic.

The problem here is that you are trying to use bonding together with SR-IOV, which is not supported:
https://access.redhat.com/documentation/en-us/reference_architectures/2017/html/deploying_mobile_networks_using_red_hat_openstack_platform_10/high_availability
https://access.redhat.com/solutions/355853

Unfortunately the TOR switch has no way to know if the ports are PFs or VFs in the host, so the setup in this case is half broken by definition.

fbl

Comment 5 Christophe Fontaine 2019-09-13 22:21:22 UTC
Hi Flavio,

I'm not using SR-IOV for virtual machines, but in the context of NIC partitioning, i.e. to reduce the number of interfaces needed for an OpenStack deployment.
Instead of needing a minimum of 4 ports (2 for the control plane, 2 for the data plane), using VFs on top of 25 Gb (or faster 40/50/100 Gb) ports is a nice feature.

Working with active/backup bond is good, but LACP is desirable as well.
For instance, we could split the 25 Gb as follows (using the min_qos feature if available, otherwise enforcing the max rates with max_qos):
- 5 Gb for Linux mgmt, VM migration, ...
- 5 Gb for storage (Ceph)
- 15 Gb for dataplane

Yet, all these networks have different SLAs: 10 Gb would be guaranteed for user traffic (i.e. no more than 15 Gb will ever go through this link), but it would be nice to use the spare capacity to speed up storage or VM migration.
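A bandwidth split like the one above could be sketched with OVS's linux-htb QoS, which supports per-queue min-rate/max-rate guarantees (port and queue names here are placeholders, rates are in bits per second; this is only an illustration of the idea, not the exact deployment config):

```shell
# Guarantee ~5 Gb/s to the storage bond's traffic while letting it
# burst up to the full 25 Gb/s link when idle.
ovs-vsctl set port bond-storage qos=@q -- \
  --id=@q create qos type=linux-htb \
      other_config:max-rate=25000000000 queues:0=@q0 -- \
  --id=@q0 create queue \
      other_config:min-rate=5000000000 \
      other_config:max-rate=25000000000
```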

So, in this test, I have 3 bonds on top of 3 pairs of VFs:
- 1 dedicated to Linux (OpenStack control plane)
- 1 dedicated to Storage
- 1 dedicated to ovs-dpdk

Enabling LACP with ovs-dpdk is an option (which works on my deployment), but if ovs-dpdk goes down (e.g. during an update), the LACP bond will be down as well, without any notification to the Linux bonds (the switch won't bring the port down, as it will keep trying to bring the LACP interface back up).

So, the appropriate configuration (until we have a VLAN-aware LACP bonding switch) is to enable LACP on top of a Linux-managed bond, and to add an option to accept all packets on the "inactive" slave.
If I'm not mistaken, this would be the equivalent of the option "all_slaves_active" for linux bonds. [1]
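For reference, the Linux bonding flag mentioned above can be toggled per bond via sysfs (assuming a bond named bond0; with the flag set, frames received on inactive slaves are delivered instead of dropped):

```shell
# Enable delivery of frames received on inactive slaves.
echo 1 > /sys/class/net/bond0/bonding/all_slaves_active

# Verify the current setting (0 = drop, 1 = deliver).
cat /sys/class/net/bond0/bonding/all_slaves_active
```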

This whole request comes from the way switches build bonds and add VLANs on top of them, whereas I wish to create VLANs (2 VFs each) and create an LACP bond on top of each of them.
If we could have one LACP negotiation per VLAN, the issue would be resolved, as we could enable LACP for all bonds.

Christophe

[1] https://github.com/torvalds/linux/blob/master/Documentation/networking/bonding.txt

Comment 6 Flavio Leitner 2019-09-16 14:02:19 UTC
Hi Christophe,

Let me elaborate on the half broken thing. I agree with you that it "works" if we add the suggested option, but it's not robust for production use and will not provide the fault tolerance required.

For example, what happens while LACP is still negotiating the trunk? The other bond (slb/balance-tcp/active-backup) could try to transmit, and that might trigger alarms on the switch side because it's not allowed, or the traffic might just be silently dropped. We don't expect that to happen when using bonds.

Another example: what happens if one LACP LAG goes down but the physical link does not? The other bond (slb/balance-tcp/active-backup) might be using that LAG and will continue sending traffic since the physical link is up, though the switch will again raise an alarm or drop packets.

Also, running two LACP instances for the same trunk is outside the spec, which assumes 1:1, not 1:N, unless you have some way to split them per VLAN or something.

fbl

