Bug 1626217
| Summary: | OVN support for deterministic MAC addresses | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Dan Winship <danw> |
| Component: | openvswitch | Assignee: | lorenzo bianconi <lorenzo.bianconi> |
| Status: | CLOSED ERRATA | QA Contact: | haidong li <haili> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.7 | CC: | atelang, atragler, ctrautma, dalvarez, danken, haili, igkioka, ovs-qe, qding, tredaelli |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | openvswitch-2.9.0-81.el7fdn | Doc Type: | Enhancement |
| Doc Text: | This update introduces a deterministic relationship between the IP and MAC addresses dynamically allocated by OVN. As a result, a pod remains reachable even if it gets a new IP address from OVN. | Story Points: | --- |
| Clone Of: | | | |
| : | 1648272 (view as bug list) | Environment: | |
| Last Closed: | 2019-01-02 17:54:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1648272 | | |
Perhaps this has to do with old MAC_Binding entries in the SB DB. In OpenStack I sent a patch to work around the issue and delete them: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047604.html

No, it's the pods that are caching old MAC addresses in this case, not OVN. Though the bug you pointed out might cause additional OVN-level problems on top of the pod-level problems too, I guess. (We haven't actually gotten to the point in OVN testing where we've encountered this issue yet.)

Yeah, got it now, thanks Dan! It can perhaps cause additional OVN problems, as you point out. Something to keep in mind now is that if we generate the MAC address deterministically based on the IP address, then stale MAC_Binding entries have to be removed when updating/upgrading OVS :) (a cleanup sketch follows after these comments).

Upstream patches (not applied yet):

- https://mail.openvswitch.org/pipermail/ovs-dev/2018-October/353327.html
- https://mail.openvswitch.org/pipermail/ovs-dev/2018-October/353328.html

Verified on the latest version:

[root@hp-dl388g8-19 ovn]# uname -a
Linux hp-dl388g8-19.rhts.eng.pek2.redhat.com 3.10.0-957.el7.x86_64 #1 SMP Thu Oct 4 20:48:51 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@hp-dl388g8-19 ovn]# rpm -qa | grep openvswitch
openvswitch-2.9.0-81.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-1.0-108.noarch
openvswitch-ovn-common-2.9.0-81.el7fdp.x86_64
openvswitch-ovn-host-2.9.0-81.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-8.el7fdp.noarch
openvswitch-ovn-central-2.9.0-81.el7fdp.x86_64
[root@hp-dl388g8-19 ovn]# ovn-nbctl ls-add sw6
[root@hp-dl388g8-19 ovn]# ovn-nbctl set NB_Global . options:mac_prefix="00:11:22:33:44:55"
[root@hp-dl388g8-19 ovn]# ovn-nbctl set Logical-Switch sw6 other_config:subnet=192.168.100.0/24
[root@hp-dl388g8-19 ovn]# ovn-nbctl lsp-add sw6 p6 -- lsp-set-addresses p6 dynamic
[root@hp-dl388g8-19 ovn]# ovn-nbctl get Logical-Switch-Port p6 dynamic_addresses
"00:11:22:a8:64:03 192.168.100.2"
[root@hp-dl388g8-19 ovn]# ovn-nbctl lsp-add sw6 p7 -- lsp-set-addresses p7 dynamic
[root@hp-dl388g8-19 ovn]# ovn-nbctl get Logical-Switch-Port p7 dynamic_addresses
"00:11:22:a8:64:04 192.168.100.3"

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0014
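As a hedged aside on the MAC_Binding cleanup mentioned in the comments above: removing stale rows from the southbound DB could look roughly like the sketch below, using ovn-sbctl's generic database commands. The `<uuid>` is a placeholder, and whether to delete individual rows or flush the whole table depends on the deployment; this is an illustration, not a documented upgrade step.

```sh
# Inspect the learned MAC_Binding rows (logical port, IP, MAC per row).
ovn-sbctl --columns=_uuid,logical_port,ip,mac list MAC_Binding

# Remove a single stale row by its UUID (placeholder shown).
ovn-sbctl destroy MAC_Binding <uuid>

# Or flush every MAC_Binding row and let OVN relearn bindings as needed.
ovn-sbctl --columns=_uuid list MAC_Binding \
    | awk '/_uuid/ {print $3}' \
    | xargs -r -n1 ovn-sbctl destroy MAC_Binding
```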
In OpenShift we've seen a problem where, when pods are being created and destroyed at a high rate, you eventually end up with a scenario where:

- pod A is talking to pod B, which has, say, IP 10.0.1.5 and MAC bb:bb:bb:bb:bb:bb
- pod A ends up with an entry 10.0.1.5 -> bb:bb:bb:bb:bb:bb in its ARP cache
- pod B exits / is destroyed
- around 255 other pods on pod B's node are created/destroyed in a short amount of time, and the IP address assignment range wraps around back to the beginning again
- pod C is created and gets assigned IP 10.0.1.5 and MAC cc:cc:cc:cc:cc:cc
- pod A tries to talk to pod C, finds that it already has an ARP cache entry for 10.0.1.5, and so tries to send packets to IP 10.0.1.5, MAC bb:bb:bb:bb:bb:bb
- these packets go nowhere because nobody currently has that MAC
- pod A's attempt to talk to pod C eventually times out, and things start failing

(This is not a problem in a VM-based world because of a combination of (a) VMs come and go less quickly than containers, so other VMs are less likely to still have stale ARP cache mappings when an IP gets reused, and (b) VMs, like bare-metal hosts, tend to have startup scripts that send out gratuitous ARPs when they bring up their network connection, so anyone who did have a stale ARP cache entry would get it fixed.)

In OpenShift SDN, our fix for this was to just assign pods deterministic MAC addresses based on their IPs; specifically, each pod gets 0a:58:ww:xx:yy:zz, where ww:xx:yy:zz is the IP converted to hex (a minimal sketch of this mapping follows at the end of this description). The code for this comes from CNI and is used by some other plugins as well; I don't know who chose the prefix "0a:58" or why.

With ovn-kubernetes, we will need to either:

1. also have deterministic IP-to-MAC mappings, OR
2. send out ARP announcements whenever a pod is created

The latter would be possible, but is less efficient if lots of pods are being created, especially if they are attached to logical switches that are spread across multiple hosts.

We don't handle IPv6 yet, and I'm not sure what the situation is there; in theory the kernel automatically handles the "announcement" part, so there might not be a problem, unless the announcements get sent out before OVN is ready to forward them to other ports, which might be the case. Also, even if the announcements do get sent out and do work, it would still be more efficient *not* to forward them if they are known to be unnecessary.
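A minimal sketch of the deterministic IP-to-MAC mapping described above (the CNI-style `0a:58` prefix followed by the four IPv4 octets written in hex); the `ip_to_mac` helper name and the sample address are illustrative only and are not taken from the CNI source:

```sh
#!/bin/bash
# Map an IPv4 address to the deterministic MAC 0a:58:ww:xx:yy:zz,
# where ww:xx:yy:zz are the address's four octets in hex.
ip_to_mac() {
    # Split the dotted quad into four decimal octets and print them as hex.
    printf '0a:58:%02x:%02x:%02x:%02x\n' $(echo "$1" | tr '.' ' ')
}

ip_to_mac 10.0.1.5   # prints 0a:58:0a:00:01:05
```

For comparison, the second option (explicit announcements) would roughly correspond to each new pod issuing a gratuitous ARP on startup, e.g. something like `arping -U -c 1 -I eth0 <pod-ip>` with iputils (interface and address illustrative); as argued above, that traffic scales with pod churn and with the number of hosts a logical switch spans, which is why the deterministic mapping is the preferred approach here.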