Bug 1626217
| Summary: | OVN support for deterministic MAC addresses | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Dan Winship <danw> |
| Component: | openvswitch | Assignee: | lorenzo bianconi <lorenzo.bianconi> |
| Status: | CLOSED ERRATA | QA Contact: | haidong li <haili> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.7 | CC: | atelang, atragler, ctrautma, dalvarez, danken, haili, igkioka, ovs-qe, qding, tredaelli |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | openvswitch-2.9.0-81.el7fdn | Doc Type: | Enhancement |
| Doc Text: | This update introduces a deterministic relationship between the IP and MAC addresses dynamically allocated by OVN. As a result, a pod remains reachable even if it gets a new IP address from OVN. | Story Points: | --- |
| Clone Of: | | | |
| : | 1648272 (view as bug list) | Environment: | |
| Last Closed: | 2019-01-02 17:54:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1648272 | | |
Perhaps this has to do with old MAC_Binding entries in the SB DB. In OpenStack I sent a patch to work around the issue and delete them: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047604.html

No, it's the pods that are caching old MAC addresses in this case, not OVN. Though the bug you pointed out might cause additional OVN-level problems on top of the pod-level problems too, I guess. (We haven't actually gotten to the point in OVN testing where we've encountered this issue yet.)

Yeah, got it now, thanks Dan! It can perhaps cause additional OVN problems, as you point out. Something to keep in mind now is that if we generate the MAC address deterministically based on the IP address, then stale MAC_Binding entries have to be removed when updating/upgrading OVS :) (a cleanup sketch follows after these comments).

Upstream patches (not applied yet):

- https://mail.openvswitch.org/pipermail/ovs-dev/2018-October/353327.html
- https://mail.openvswitch.org/pipermail/ovs-dev/2018-October/353328.html

Verified on the latest version:

[root@hp-dl388g8-19 ovn]# uname -a
Linux hp-dl388g8-19.rhts.eng.pek2.redhat.com 3.10.0-957.el7.x86_64 #1 SMP Thu Oct 4 20:48:51 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@hp-dl388g8-19 ovn]# rpm -qa | grep openvswitch
openvswitch-2.9.0-81.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-1.0-108.noarch
openvswitch-ovn-common-2.9.0-81.el7fdp.x86_64
openvswitch-ovn-host-2.9.0-81.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-8.el7fdp.noarch
openvswitch-ovn-central-2.9.0-81.el7fdp.x86_64
[root@hp-dl388g8-19 ovn]# ovn-nbctl ls-add sw6
[root@hp-dl388g8-19 ovn]# ovn-nbctl set NB_Global . options:mac_prefix="00:11:22:33:44:55"
[root@hp-dl388g8-19 ovn]# ovn-nbctl set Logical-Switch sw6 other_config:subnet=192.168.100.0/24
[root@hp-dl388g8-19 ovn]# ovn-nbctl lsp-add sw6 p6 -- lsp-set-addresses p6 dynamic
[root@hp-dl388g8-19 ovn]# ovn-nbctl get Logical-Switch-Port p6 dynamic_addresses
"00:11:22:a8:64:03 192.168.100.2"
[root@hp-dl388g8-19 ovn]# ovn-nbctl lsp-add sw6 p7 -- lsp-set-addresses p7 dynamic
[root@hp-dl388g8-19 ovn]# ovn-nbctl get Logical-Switch-Port p7 dynamic_addresses
"00:11:22:a8:64:04 192.168.100.3"

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0014
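As a hedged aside on the MAC_Binding cleanup mentioned in the comments above: removing stale rows from the southbound DB could look roughly like the sketch below, using ovn-sbctl's generic database commands. The `<uuid>` is a placeholder, and whether to delete individual rows or flush the whole table depends on the deployment; this is an illustration, not a documented upgrade step.

```sh
# Inspect the learned MAC_Binding rows (logical port, IP, MAC per row).
ovn-sbctl --columns=_uuid,logical_port,ip,mac list MAC_Binding

# Remove a single stale row by its UUID (placeholder shown).
ovn-sbctl destroy MAC_Binding <uuid>

# Or flush every MAC_Binding row and let OVN relearn bindings as needed.
ovn-sbctl --columns=_uuid list MAC_Binding \
    | awk '/_uuid/ {print $3}' \
    | xargs -r -n1 ovn-sbctl destroy MAC_Binding
```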
In OpenShift we've seen a problem where, when pods are being created and destroyed at a high rate, you eventually end up with a scenario where:

- pod A is talking to pod B, which has, say, IP 10.0.1.5 and MAC bb:bb:bb:bb:bb:bb
- pod A ends up with an entry 10.0.1.5 -> bb:bb:bb:bb:bb:bb in its ARP cache
- pod B exits / is destroyed
- around 255 other pods on pod B's node are created/destroyed in a short amount of time, and the IP address assignment range wraps around back to the beginning again
- pod C is created and gets assigned IP 10.0.1.5 and MAC cc:cc:cc:cc:cc:cc
- pod A tries to talk to pod C, finds that it already has an ARP cache entry for 10.0.1.5, and so tries to send packets to IP 10.0.1.5, MAC bb:bb:bb:bb:bb:bb
- these packets go nowhere because nobody currently has that MAC
- pod A's attempt to talk to pod C eventually times out, and things start failing

(This is not a problem in a VM-based world because of a combination of (a) VMs come and go less quickly than containers, so other VMs are less likely to still have stale ARP cache mappings when an IP gets reused, and (b) VMs, like bare-metal hosts, tend to have startup scripts that send out gratuitous ARPs when they bring up their network connection, so anyone who did have a stale ARP cache entry would get it fixed.)

In OpenShift SDN, our fix for this was to just assign pods deterministic MAC addresses based on their IPs; specifically, each pod gets 0a:58:ww:xx:yy:zz, where ww:xx:yy:zz is the IP converted to hex (a minimal sketch of this mapping follows at the end of this description). The code for this comes from CNI and is used by some other plugins as well; I don't know who chose the prefix "0a:58" or why.

With ovn-kubernetes, we will need to either:

1. also have deterministic IP-to-MAC mappings, OR
2. send out ARP announcements whenever a pod is created

The latter would be possible, but is less efficient if lots of pods are being created, especially if they are attached to logical switches that are spread across multiple hosts.

We don't handle IPv6 yet, and I'm not sure what the situation is there; in theory the kernel automatically handles the "announcement" part, so there might not be a problem, unless the announcements get sent out before OVN is ready to forward them to other ports, which might be the case. Also, even if the announcements do get sent out and do work, it would still be more efficient *not* to forward them if they are known to be unnecessary.
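A minimal sketch of the deterministic IP-to-MAC mapping described above (the CNI-style `0a:58` prefix followed by the four IPv4 octets written in hex); the `ip_to_mac` helper name and the sample address are illustrative only and are not taken from the CNI source:

```sh
#!/bin/bash
# Map an IPv4 address to the deterministic MAC 0a:58:ww:xx:yy:zz,
# where ww:xx:yy:zz are the address's four octets in hex.
ip_to_mac() {
    # Split the dotted quad into four decimal octets and print them as hex.
    printf '0a:58:%02x:%02x:%02x:%02x\n' $(echo "$1" | tr '.' ' ')
}

ip_to_mac 10.0.1.5   # prints 0a:58:0a:00:01:05
```

For comparison, the second option (explicit announcements) would roughly correspond to each new pod issuing a gratuitous ARP on startup, e.g. something like `arping -U -c 1 -I eth0 <pod-ip>` with iputils (interface and address illustrative); as argued above, that traffic scales with pod churn and with the number of hosts a logical switch spans, which is why the deterministic mapping is the preferred approach here.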