Bug 1557405 - Openvswitch retransmits ARP unicast packets on incoming port - Arista VARP
Summary: Openvswitch retransmits ARP unicast packets on incoming port - Arista VARP
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch
Version: 8.0 (Liberty)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Matteo Croce
QA Contact: Ofer Blaut
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-03-16 14:21 UTC by Matt Flusche
Modified: 2022-03-13 15:13 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-05-01 20:40:03 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-8877 0 None None None 2022-03-13 15:13:17 UTC

Description Matt Flusche 2018-03-16 14:21:22 UTC
Description of problem:

OSP 8 environment.

This environment uses Arista's VARP, which periodically floods unicast ARP packets associated with the virtual router's MAC. At times, OVS retransmits these packets on the incoming port/bond, which causes problems for the upstream switches: they incorrectly learn the source MAC from the connected OVS switch. Neither the source nor the destination MAC is local to OVS; both are learned only from the connected switch. The packets should not be retransmitted on the incoming port.

OVS uses balance-slb bonding with two physical interfaces. This environment previously ran in a non-bonded mode and did not experience this issue. Disabling a single bond member did not resolve the issue.

The OVS fdb was monitored with debug logging, and neither the source nor the destination MAC was ever learned on any port other than the external bond. (ovs-appctl vlog/set ofproto_dpif_xlate:file:dbg)

To mitigate the issue, flow rules were added to the OVS bridge to drop all inbound ARP packets with this specific destination MAC address.

# ovs-ofctl dump-flows br-ex table=0
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1117.317s, table=0, n_packets=359, n_bytes=22976, idle_age=8, arp,in_port=1,dl_dst=00:11:22:33:44:55 actions=drop
 cookie=0x0, duration=1116.538s, table=0, n_packets=362, n_bytes=23168, idle_age=7, arp,in_port=2,dl_dst=00:11:22:33:44:55 actions=drop
 cookie=0x0, duration=776901.860s, table=0, n_packets=6431978383, n_bytes=2713476631096, idle_age=0, hard_age=65534, priority=0 actions=NORMAL
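
To confirm that the mitigation rules are the ones absorbing the VARP traffic, the n_packets counters of the drop flows can be compared between two dumps. The following is a minimal sketch, not part of the original report, that extracts those counters from saved `ovs-ofctl dump-flows` text. The function name `drop_counters` is hypothetical, and the sample is copied from the output above.

```python
import re

# Sample text copied from the dump-flows output in this report.
SAMPLE = """\
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1117.317s, table=0, n_packets=359, n_bytes=22976, idle_age=8, arp,in_port=1,dl_dst=00:11:22:33:44:55 actions=drop
 cookie=0x0, duration=1116.538s, table=0, n_packets=362, n_bytes=23168, idle_age=7, arp,in_port=2,dl_dst=00:11:22:33:44:55 actions=drop
 cookie=0x0, duration=776901.860s, table=0, n_packets=6431978383, n_bytes=2713476631096, idle_age=0, hard_age=65534, priority=0 actions=NORMAL
"""


def drop_counters(dump_text):
    """Map in_port -> n_packets for every flow whose action is drop."""
    counters = {}
    for line in dump_text.splitlines():
        if "actions=drop" not in line:
            continue  # skip the reply header and non-drop flows
        packets = re.search(r"n_packets=(\d+)", line)
        port = re.search(r"in_port=(\w+)", line)
        if packets and port:
            counters[port.group(1)] = int(packets.group(1))
    return counters


if __name__ == "__main__":
    print(drop_counters(SAMPLE))
```

Running it on two dumps taken some time apart and subtracting the counters gives the number of packets each rule dropped in that interval.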


I'll add more specific information about this environment in a private update.       

Version-Release number of selected component (if applicable):

python-openvswitch-2.5.1-1.el7_4.noarch (customer build)
However, it is reproducible on openvswitch-2.5.0-16.git20160727.el7ost.x86_64



How reproducible:
Only in this specific environment.  Unable to reproduce.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Will add more environment-specific information.

Comment 5 Andreas Karis 2018-03-19 21:31:05 UTC
As a trial mitigation, the customer added flows on the br-ex ports that map to the physical uplink interfaces. Each flow matches ARP traffic that ingresses on one of those ports with destination MAC 00:11:22:33:44:55 and drops it:
# ovs-ofctl dump-flows br-ex table=0
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1117.317s, table=0, n_packets=359, n_bytes=22976, idle_age=8, arp,in_port=1,dl_dst=00:11:22:33:44:55 actions=drop
 cookie=0x0, duration=1116.538s, table=0, n_packets=362, n_bytes=23168, idle_age=7, arp,in_port=2,dl_dst=00:11:22:33:44:55 actions=drop
 cookie=0x0, duration=776901.860s, table=0, n_packets=6431978383, n_bytes=2713476631096, idle_age=0, hard_age=65534, priority=0 actions=NORMAL

Comment 14 Flavio Leitner 2018-04-19 20:14:30 UTC
Hi,

Since br-ex is using actions=NORMAL, OVS will look at which port the packet is received on (xlate_normal) and check its admissibility there.

If it is a bond, as in this case, and since this is SLB, OVS will drop all packets for which it has learned a different input port (VLAN considered). That is what avoids echoing the packet back. Therefore, the packet is accepted if either MAC learning doesn't know the MAC yet or (MAC, VLAN) matches the previously learned input port; otherwise, (MAC, VLAN) is known on another input port and the packet is blocked (loop detected).

An exception is made for gratuitous ARPs because they are used when VMs are migrated, but there is a "lockdown" of 5 seconds. This doesn't seem to be the case here, as the src/dst are unicast.

Then OVS updates MAC learning with the input port. If the ARP dst MAC address is unknown, it will broadcast the packet; otherwise it will send it to the port registered in the fdb.

At this point OVS knows (VLAN, 52:54...) comes from bond.

However, if the host networking loops the packet back during the broadcast, OVS will learn that (VLAN, 52:54...) is coming from another input port and will send it back to the switch. To verify that, we need to monitor the br-ex fdb and see if the MAC is learned on another port. This seems to have been done, according to comment#0.
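
The sequence described above can be sketched as a toy model. This is a simplified illustration of the behaviour as described in this comment, not the actual OVS code: the class `NormalBridge`, the port names and the MAC addresses are all hypothetical, the bond is treated as a single logical port, and the gratuitous ARP exception with its 5 second lock is left out.

```python
from dataclasses import dataclass, field


@dataclass
class NormalBridge:
    """Toy model of a bridge using actions=NORMAL with one SLB bond."""
    ports: list   # all port names on the bridge (hypothetical names)
    bond: str     # name of the SLB bond port
    fdb: dict = field(default_factory=dict)  # (vlan, mac) -> learned port

    def admissible(self, in_port, vlan, src):
        """On the bond, reject a source MAC already learned on another port."""
        learned = self.fdb.get((vlan, src))
        if in_port == self.bond and learned is not None and learned != in_port:
            return False  # loop detected
        return True

    def receive(self, in_port, vlan, src, dst):
        """Return the list of ports the packet is sent out of."""
        if not self.admissible(in_port, vlan, src):
            return []
        self.fdb[(vlan, src)] = in_port           # MAC learning
        out = self.fdb.get((vlan, dst))
        if out is None:                           # unknown destination: flood
            return [p for p in self.ports if p != in_port]
        return [] if out == in_port else [out]    # never echo on the input port


if __name__ == "__main__":
    SRC, DST = "aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02"  # placeholder MACs
    br = NormalBridge(ports=["bond-ex", "patch-int"], bond="bond-ex")
    first = br.receive("bond-ex", 0, SRC, DST)     # arrives from the switch
    looped = br.receive("patch-int", 0, SRC, DST)  # same frame looped back
    print(first, looped, br.fdb[(0, SRC)])
```

In this model the first packet is never sent back out of the bond. Only when the same frame re-enters on another port does the fdb entry move and the flood include the bond, which is why watching the fdb for the MAC appearing on a second port is the relevant check.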

Looking at the traffic dump from comment#2, the ARP is repeating every 20ms. Is that correct, or does that indicate a problem?

Another question: does the ARP come in on both bond slaves or just one?

Do you see the FDB flapping for that MAC between bond slaves?

Thanks,
fbl

Comment 15 Matt Flusche 2018-04-20 14:13:50 UTC
We ran 'ovs-appctl fdb/show br-ex' in a loop for two hours and did not observe the MAC moving between ports.

Also, debug logging ('ovs-appctl vlog/set ofproto_dpif_xlate:file:dbg') was enabled to record MAC learning. There were no log entries indicating the MAC was bouncing.
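
For reference, a check of this kind can be automated roughly as follows. This is a minimal sketch, not the script that was actually used: it assumes the snapshots were saved as text in the usual four-column `fdb/show` layout (port, VLAN, MAC, Age), and the function names and sample values are hypothetical.

```python
def parse_fdb(text):
    """Parse one saved fdb/show snapshot into {(vlan, mac): port}."""
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a 4-column entry.
        if len(fields) != 4 or fields[0] == "port" or not fields[1].isdigit():
            continue
        port, vlan, mac, _age = fields
        entries[(int(vlan), mac.lower())] = port
    return entries


def ports_seen(snapshots, mac, vlan=0):
    """Return the set of ports a (vlan, mac) pair was learned on over time."""
    key = (vlan, mac.lower())
    seen = set()
    for snapshot in snapshots:
        entries = parse_fdb(snapshot)
        if key in entries:
            seen.add(entries[key])
    return seen


# Hypothetical snapshots with a placeholder MAC address.
SNAP_A = """\
 port  VLAN  MAC                Age
    1     0  aa:aa:aa:aa:aa:01    3
"""
SNAP_B = """\
 port  VLAN  MAC                Age
    5     0  aa:aa:aa:aa:aa:01    0
"""

if __name__ == "__main__":
    print(ports_seen([SNAP_A, SNAP_B], "aa:aa:aa:aa:aa:01"))
```

A result containing more than one port would mean the MAC moved between ports during the capture. Polling can miss a move that is corrected between two snapshots, so the debug log remains the more reliable evidence.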

Comment 41 Aniss Loughlam 2018-06-25 19:46:30 UTC
@Matteo, comment 34 has a needinfo from me. Are you all set with it? If not, let me know how I can help.

Comment 42 Matteo Croce 2018-06-26 12:11:15 UTC
Hi Aniss,

It was for a lab setup that is no longer needed.

