Description of problem:

OSP 8 environment. This environment uses Arista's VARP, which periodically floods unicast ARP packets associated with the virtual router's MAC. At times, OVS retransmits these packets on the incoming port/bond, which causes problems for the upstream switches: they incorrectly learn the source MAC from the connected OVS switch. Neither the source nor the destination MAC is local to OVS; both are learned only from the connected switch. The packets should not be retransmitted on the incoming port.

OVS uses balance-slb bonding with two physical interfaces. This environment previously ran in a non-bonded mode and did not experience this issue. Disabling a single bond member did not resolve the issue.

The OVS fdb was monitored with debug logging (ovs-appctl vlog/set ofproto_dpif_xlate:file:dbg), and neither the source nor the destination MAC was ever learned on any port other than the external bond.

To mitigate the issue, flow rules were added to the OVS bridge to drop all inbound ARP packets with this specific destination MAC address.

# ovs-ofctl dump-flows br-ex table=0
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1117.317s, table=0, n_packets=359, n_bytes=22976, idle_age=8, arp,in_port=1,dl_dst=00:11:22:33:44:55 actions=drop
 cookie=0x0, duration=1116.538s, table=0, n_packets=362, n_bytes=23168, idle_age=7, arp,in_port=2,dl_dst=00:11:22:33:44:55 actions=drop
 cookie=0x0, duration=776901.860s, table=0, n_packets=6431978383, n_bytes=2713476631096, idle_age=0, hard_age=65534, priority=0 actions=NORMAL

I'll add more specific information about this environment in a private update.

Version-Release number of selected component (if applicable):
python-openvswitch-2.5.1-1.el7_4.noarch (customer build)
However it is reproducible on openvswitch-2.5.0-16.git20160727.el7ost.x86_64

How reproducible:
Only in this specific environment. Unable to reproduce.

Steps to Reproduce:
1.
2.
3.
Actual results:

Expected results:

Additional info:
Will add more environment specific information
As a trial mitigation on the br-ex ports that map to the physical uplink interfaces, the customer added flows that drop traffic when it ingresses on one of those ports, is ARP, and has the destination MAC 00:11:22:33:44:55:

# ovs-ofctl dump-flows br-ex table=0
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1117.317s, table=0, n_packets=359, n_bytes=22976, idle_age=8, arp,in_port=1,dl_dst=00:11:22:33:44:55 actions=drop
 cookie=0x0, duration=1116.538s, table=0, n_packets=362, n_bytes=23168, idle_age=7, arp,in_port=2,dl_dst=00:11:22:33:44:55 actions=drop
 cookie=0x0, duration=776901.860s, table=0, n_packets=6431978383, n_bytes=2713476631096, idle_age=0, hard_age=65534, priority=0 actions=NORMAL
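For anyone wanting to reproduce the mitigation, the following is a minimal sketch that builds the `ovs-ofctl add-flow` command lines matching the two drop flows in the dump above. The exact commands the customer ran are not recorded in this report, so the helper name `drop_arp_flow_commands` and the choice to omit an explicit priority (the dump shows none on the drop flows) are assumptions.

```python
import shlex


def drop_arp_flow_commands(bridge, in_ports, dst_mac):
    """Build one `ovs-ofctl add-flow` argument list per ingress port.

    Hypothetical helper: it only constructs the commands, it does not run them.
    """
    commands = []
    for port in in_ports:
        # Match ARP arriving on this port with the given destination MAC and drop it.
        flow = f"arp,in_port={port},dl_dst={dst_mac},actions=drop"
        commands.append(["ovs-ofctl", "add-flow", bridge, flow])
    return commands


if __name__ == "__main__":
    # Ports 1 and 2 and the MAC are the values shown in the flow dump.
    for cmd in drop_arp_flow_commands("br-ex", [1, 2], "00:11:22:33:44:55"):
        print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch safe to run on a host without OVS; the output can be reviewed and pasted into a root shell on the affected node.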
Hi,

Since br-ex is using actions=NORMAL, OVS will look at which port the packet is received on (xlate_normal) and check its admissibility there. If the port is a bond, as in this case, and the bond mode is SLB, OVS will drop all packets for which it has learned a different input port (VLAN considered). That is what avoids echoing the packet back. Therefore, the packet is accepted if either MAC learning does not know the MAC yet or the (MAC, VLAN) pair matches the previously learned input port; otherwise the (MAC, VLAN) pair is known on another input port and the packet is blocked (loop detected). An exception is made for gratuitous ARPs because they are used when VMs are migrated, but there is a "lockdown" of 5 seconds. That does not seem to be the case here, as the src/dst are unicast.

Then OVS updates MAC learning with the input port, and if the ARP destination MAC address is unknown, it broadcasts the packet; otherwise it sends the packet to the port registered in the fdb. At this point OVS knows that (VLAN, 52:54...) comes from the bond. However, if during the broadcast the host networking loops the packet back, OVS will learn that (VLAN, 52:54...) is coming from another input port and will send it back to the switch.

To verify that, we need to monitor the br-ex fdb and see if the MAC is learned on another port. This seems to have been done, according to comment#0.

Looking at the traffic dump from comment#2, the ARP is repeating every 20ms. Is that correct, or does that indicate a problem?

Another question: does the ARP come in on both bond slaves or just one? Do you see the FDB flapping for that MAC between bond slaves?

Thanks,
fbl
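The logic described in the comment above can be sketched as a small model. This is a simplified illustration of the description, not the OVS source: the class and method names, the port names, the placeholder MACs, and the rule for when the gratuitous ARP lock is set (here: when a gratuitous ARP is learned on a non-bond port) are all assumptions made for the example.

```python
from dataclasses import dataclass, field

GRAT_ARP_LOCK_SECS = 5  # the 5 second "lockdown" mentioned in the comment


@dataclass
class FdbEntry:
    port: str
    grat_arp_lock_until: float = 0.0


@dataclass
class Bridge:
    ports: list                               # all ports, bonds counted as one port
    slb_bonds: set                            # names of ports that are balance-slb bonds
    fdb: dict = field(default_factory=dict)   # (mac, vlan) -> FdbEntry

    def admissible(self, in_port, src, vlan, is_grat_arp, now):
        """SLB check: reject a source MAC already learned on a different port."""
        if in_port not in self.slb_bonds:
            return True
        entry = self.fdb.get((src, vlan))
        if entry is None or entry.port == in_port:
            return True
        # Learned elsewhere: only a gratuitous ARP outside the lock window passes.
        return is_grat_arp and now >= entry.grat_arp_lock_until

    def normal(self, in_port, src, dst, vlan, is_grat_arp=False, now=0.0):
        """Return the list of output ports; an empty list means dropped."""
        if not self.admissible(in_port, src, vlan, is_grat_arp, now):
            return []
        # Learn (or move) the source MAC to the input port.
        entry = FdbEntry(in_port)
        if is_grat_arp and in_port not in self.slb_bonds:
            entry.grat_arp_lock_until = now + GRAT_ARP_LOCK_SECS  # assumed rule
        self.fdb[(src, vlan)] = entry
        dst_entry = self.fdb.get((dst, vlan))
        if dst_entry is None:
            # Unknown destination: flood to every port except the input port.
            return [p for p in self.ports if p != in_port]
        # Known destination: never send back out of the input port.
        return [] if dst_entry.port == in_port else [dst_entry.port]


if __name__ == "__main__":
    br = Bridge(ports=["bond1", "vnet0", "vnet1"], slb_bonds={"bond1"})
    src, dst = "aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02"  # placeholder MACs
    print(br.normal("bond1", src, dst, vlan=0))  # flooded away from the bond
    print(br.normal("vnet0", src, dst, vlan=0))  # looped copy: flooded back to bond1
    print(br.normal("bond1", src, dst, vlan=0))  # now rejected by the SLB check
```

In this model the packet only returns to the bond if a copy re-enters the bridge on some other port, which moves the fdb entry for the source MAC. That is why watching the fdb for the MAC appearing on a non-bond port is the relevant check.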
We ran 'ovs-appctl fdb/show br-ex' in a loop for two hours and did not observe the MAC moving between ports. Also debug logging ('ovs-appctl vlog/set ofproto_dpif_xlate:file:dbg') was enabled to record MAC learning. There were no log entries indicating the MAC was bouncing.
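A monitoring loop of the kind described above could look like the sketch below. It is not the script that was actually used. It assumes `ovs-appctl fdb/show` prints a four-column port/VLAN/MAC/Age table, and the function names `parse_fdb`, `moved`, and `watch` are hypothetical.

```python
import subprocess
import time


def parse_fdb(text):
    """Parse `ovs-appctl fdb/show` output into {(vlan, mac): port}."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        # Expect "port VLAN MAC Age"; skip the header and anything unexpected.
        if len(fields) != 4 or not fields[1].isdigit():
            continue
        port, vlan, mac, _age = fields
        table[(int(vlan), mac.lower())] = port
    return table


def moved(prev, curr):
    """List (key, old_port, new_port) for entries whose port changed."""
    return [
        (key, prev[key], curr[key])
        for key in sorted(curr)
        if key in prev and prev[key] != curr[key]
    ]


def watch(bridge="br-ex", interval=1.0):
    """Poll the fdb and report any (VLAN, MAC) that changes port. Runs until interrupted."""
    prev = {}
    while True:
        out = subprocess.run(
            ["ovs-appctl", "fdb/show", bridge],
            capture_output=True, text=True, check=True,
        ).stdout
        curr = parse_fdb(out)
        for (vlan, mac), old, new in moved(prev, curr):
            print(f"{time.strftime('%H:%M:%S')} vlan {vlan} {mac}: port {old} -> {new}")
        prev = curr
        time.sleep(interval)
```

Polling can miss a move that happens and reverts between two samples, so a short interval combined with the xlate debug log gives better coverage than either one alone.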
@Matteo, comment 34 has a needinfo from me. Are you all set with it? If not, let me know how I can help.
Hi Anis, it was for a lab setup that is no longer needed.