Red Hat Bugzilla – Bug 1450205
Gratuitous ARP updates received within a 2-3 second time frame are all ignored
Last modified: 2018-04-10 16:03:06 EDT
The OpenStack Neutron L3 agent sends 3 gratuitous ARP replies when moving a "floating" IP address from one device to another. When the Linux kernel receives the first one, it usually processes it correctly, updating the existing ARP entry with the new lladdr. The second reply is then usually ignored because it arrives within the locktime interval after the previous update. The third one is also ignored: even though the second reply was ignored, it still bumped the neigh->updated field, which is what the kernel uses to determine whether an ARP frame falls within the locktime interval. Since the first reply was honoured, this is not a problem.

The problem happens when the first gratuitous ARP is ignored because it also falls within the locktime interval. This may happen either because another ARP reply arrived just before the gratuitous one (correct behaviour), or because the kernel transitioned the ARP table entry to another state just before the reply was received. Any state transition bumps neigh->updated. If such a state transition happens, all of the gratuitous updates are ignored, and the ARP entry is left with the old, wrong MAC address.

After the delay time (5s by default), the kernel will usually issue an ARP probe that will hopefully heal the ARP entry. While that's mostly OK, we have just lost 5s of service availability plus an ARP probe round-trip. The problem is aggravated by the fact that, in some unfortunate scenarios, the kernel perpetuates incorrect ARP entries without issuing a single probe; for example, see: https://bugzilla.redhat.com/show_bug.cgi?id=1450203

Version-Release number of selected component (if applicable):
3.10.0-514.22.1.el7

How reproducible:
Always.

Steps to Reproduce:
Issue 3 gratuitous ARP replies right after the corresponding ARP entry has transitioned STALE->DELAY. Observe that not a single reply is honoured.

Actual results:
Not a single subsequent gratuitous ARP triggers an update of the local ARP table if the first one arrives just after the entry state changed.
Expected results:
Ideally, the first reply is honoured, because no ARP reply has been received before it, so locktime should not apply. At the very least, the second reply should be honoured.

Additional info:
I posted a fix upstream: https://patchwork.ozlabs.org/patch/760372/
This bug and https://bugzilla.redhat.com/show_bug.cgi?id=1450203 together cause OpenStack CI failures: https://bugzilla.redhat.com/show_bug.cgi?id=1438662
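The locktime interaction described above can be made concrete with a small Python model of a single ARP entry. This is a simplified sketch, not kernel code: the class and function names are hypothetical, time is measured in abstract ticks, and the 100-tick locktime and 90-tick spacing between gratuitous ARPs are arbitrary values chosen so that each reply lands inside the lock window. Only the neigh->updated bookkeeping from the description is modelled. The `bump_on_noop` flag switches between the reported behaviour (ignored updates still bump `updated`) and one where only effective updates bump it.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical, simplified model of one ARP (neighbour) cache entry. The field
# names loosely follow the kernel's struct neighbour, but this is NOT kernel code.
@dataclass
class NeighModel:
    lladdr: str             # cached MAC address
    state: str = "STALE"    # NUD state name
    updated: int = 0        # time of the last "update", in abstract ticks
    locktime: int = 100     # length of the lock window, in the same ticks


def time_after(a: int, b: int) -> bool:
    """True if tick a is strictly later than tick b (wraparound ignored)."""
    return a > b


def state_transition(n: NeighModel, new_state: str, now: int) -> None:
    """A state change such as STALE -> DELAY also bumps 'updated'."""
    n.state = new_state
    n.updated = now


def gratuitous_arp(n: NeighModel, mac: str, now: int, bump_on_noop: bool) -> bool:
    """Process one gratuitous ARP reply; return True if lladdr was changed.

    bump_on_noop=True: an ignored update still moves 'updated' (the behaviour
    described in this report). bump_on_noop=False: only an effective update
    moves it.
    """
    override = time_after(now, n.updated + n.locktime)
    effective = override and mac != n.lladdr
    if effective:
        n.lladdr = mac
        n.state = "STALE"   # broadcast replies do not confirm reachability
    if effective or bump_on_noop:
        n.updated = now
    return effective


def replay(bump_on_noop: bool) -> Tuple[int, str]:
    """STALE -> DELAY at t=1000, then 3 gARPs with a new MAC, 90 ticks apart."""
    n = NeighModel(lladdr="6e:03:b7:8d:e1:11")
    state_transition(n, "DELAY", now=1000)
    honoured = sum(
        gratuitous_arp(n, "6e:03:b7:8d:e1:12", now=1050 + 90 * i,
                       bump_on_noop=bump_on_noop)
        for i in range(3)
    )
    return honoured, n.lladdr


if __name__ == "__main__":
    print("ignored updates bump 'updated':", replay(bump_on_noop=True))
    print("only effective updates bump it:", replay(bump_on_noop=False))
```

`replay(True)` returns `(0, '6e:03:b7:8d:e1:11')`: every reply is ignored and the old MAC survives. `replay(False)` returns `(1, '6e:03:b7:8d:e1:12')`: the first reply is still ignored because it falls within locktime of the STALE->DELAY bump, but since it no longer moves the window, the second reply is honoured, which is the "at the very least" outcome from Expected results.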
Marked the bug for the OpenStack layered product since it affects OpenStack CI. I also asked to target 7.3, because we probably can't wait for 7.5 (?) to fix OpenStack CI (we have some OpenStack-side workarounds, but they are very fragile).
Note: the patch was accepted by David Miller, see: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=77d7123342dcf6442341b67816321d71da8b2b16
Ihar, is there any reason not to also include your follow-up series?

776ee323ddf1 ("Merge branch 'arp-always-override-existing-neigh-entries-with-gratuitous-ARP'")
7d472a59c0e5 ("arp: always override existing neigh entries with gratuitous ARP")
d9ef2e7bf99f ("arp: postpone addr_type calculation to as late as possible")
6fd05633bdaf ("arp: decompose is_garp logic into a separate function")
34eb5fe07831 ("arp: fixed error in a comment")
Eric, I think it's a good idea, but the bug is mostly fixed by the other patch, and I am not sure whether RHEL kernel policy allows backporting nice-to-haves. The series would definitely help with handling gARPs, or at least with handling them more efficiently.
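The follow-up series targets a different part of the decision. Going by the commit titles, it lets gratuitous ARPs override existing entries regardless of locktime, moves the gARP check into its own helper, and delays the address-type lookup. The Python sketch below illustrates that idea under our own assumptions, which are not stated in this report: a packet counts as a gARP when its sender and target IPs are equal, a reply must additionally carry target MAC == sender MAC, and the sender IP must be unicast. `ArpPacket`, `is_garp`, `may_override` and the `is_unicast` callback are hypothetical names, not kernel code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical view of the ARP fields that matter here; not a kernel type.
@dataclass(frozen=True)
class ArpPacket:
    op: str               # "request" or "reply"
    sha: str              # sender hardware (MAC) address
    sip: str              # sender IPv4 address
    tha: Optional[str]    # target hardware address (None if absent)
    tip: str              # target IPv4 address


def is_garp(pkt: ArpPacket, is_unicast: Callable[[str], bool]) -> bool:
    """Assumed gratuitous-ARP test (see the lead-in for the assumptions)."""
    garp = pkt.sip == pkt.tip
    if garp and pkt.op == "reply":
        # Gratuitous replies must also carry target MAC == sender MAC.
        garp = pkt.tha is not None and pkt.tha == pkt.sha
    # 'and' short-circuits, so the address-type lookup only runs once the
    # cheap comparisons above have passed.
    return garp and is_unicast(pkt.sip)


def may_override(now: int, updated: int, locktime: int,
                 pkt: ArpPacket, is_unicast: Callable[[str], bool]) -> bool:
    """Replace the cached MAC if outside the locktime window OR the frame is a gARP."""
    return now > updated + locktime or is_garp(pkt, is_unicast)
```

With this rule, a gARP such as `ArpPacket("reply", "e2:6d:81:cc:30:13", "192.168.0.2", "e2:6d:81:cc:30:13", "192.168.0.2")` may override the entry even inside the locktime window, so the STALE->DELAY bump can no longer swallow it, while ordinary replies keep the "first reply wins" behaviour that locktime exists for.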
Steps to reproduce are presented in the description; setting ack+.
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing
Patch(es) available on kernel-3.10.0-764.el7
test topo:
#          br0
#  Host A --|-- Host B
#    0.1          0.2
#  2000::1      2000::2

test setting:
# increase locktime to make the issue easier to reproduce
ip netns exec ha sysctl -w net.ipv4.neigh.ha_veth0.locktime=10000

reproduced on 3.10.0-760:
[root@ibm-x3650m4-01-vm-05 ~]# uname -a
Linux ibm-x3650m4-01-vm-05.lab.eng.bos.redhat.com 3.10.0-760.el7.x86_64 #1 SMP Fri Oct 27 07:23:03 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

1. confirmed that neigh on ha is stale:
[root@ibm-x3650m4-01-vm-05 ~]# ip netns exec ha ip neigh show
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 STALE

2. change mac on hb:
[root@ibm-x3650m4-01-vm-05 ~]# ip netns exec hb ip link set dev hb_veth0 address 6e:03:b7:8d:e1:12

3. ping hb on ha:
[root@ibm-x3650m4-01-vm-05 ~]# ip netns exec ha ping 192.168.0.2 -c 1
PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data.

--- 192.168.0.2 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

4. send gratuitous ARP on hb when neigh state changed to DELAY:
[root@ibm-x3650m4-01-vm-05 ~]# ip netns exec hb arping -A -c 3 -I hb_veth0 -U 192.168.0.2
ARPING 192.168.0.2 from 192.168.0.2 hb_veth0
Sent 3 probes (3 broadcast(s))
Received 0 response(s)

5. watch the neigh state on ha:
[root@ibm-x3650m4-01-vm-05 ~]# while :; do ip netns exec ha ip neigh show; sleep 1; done
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 STALE
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 STALE
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 STALE
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 STALE
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 STALE <=== becomes DELAY
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 DELAY
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 DELAY
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 DELAY
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 DELAY
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 DELAY <=== DELAY to PROBE, gratuitous ARP not honored
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 PROBE
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 PROBE
192.168.0.2 dev ha_veth0 FAILED

verified on 3.10.0-766:
[root@ibm-x3650m4-01-vm-05 ~]# uname -a
Linux ibm-x3650m4-01-vm-05.lab.eng.bos.redhat.com 3.10.0-766.el7.x86_64 #1 SMP Wed Nov 1 07:08:44 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

1. confirmed that neigh on ha is stale:
[root@ibm-x3650m4-01-vm-05 ~]# ip netns exec ha ip neigh show
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE

2. change mac on hb:
[root@ibm-x3650m4-01-vm-05 ~]# ip netns exec hb ip link set dev hb_veth0 address e2:6d:81:cc:30:13

3. ping hb on ha:
[root@ibm-x3650m4-01-vm-05 ~]# ip netns exec ha ping 192.168.0.2 -c 1
PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data.

--- 192.168.0.2 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

4. send gratuitous ARP on hb when neigh state changed to DELAY:
[root@ibm-x3650m4-01-vm-05 ~]# ip netns exec hb arping -A -c 3 -I hb_veth0 -U 192.168.0.2
ARPING 192.168.0.2 from 192.168.0.2 hb_veth0
Sent 3 probes (3 broadcast(s))
Received 0 response(s)

5. watch the neigh state on ha:
[root@ibm-x3650m4-01-vm-05 ~]# while :; do ip netns exec ha ip neigh show; sleep 1; done
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE <=== to DELAY
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 DELAY
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 DELAY
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:13 STALE <=== updated after receiving gratuitous ARP
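Reading these once-per-second polling logs by eye is error-prone, so here is a small Python helper that collapses such a log into its distinct (lladdr, state) steps. It is a sketch that assumes the `ip neigh show` line format seen in the transcripts; the function names are hypothetical. `wait_for_state` is an optional convenience for step 4 (sending the gARPs once the entry on ha is in DELAY): it shells out to `ip netns exec`, so it needs root and the ha/hb namespaces from the test setup.

```python
import re
import subprocess
import time
from typing import Iterable, List, Optional, Tuple

# Matches 'ip neigh show' lines as they appear in the logs, e.g.
#   192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE <=== to DELAY
#   192.168.0.2 dev ha_veth0 FAILED
# Anything after the state word (such as the "<===" notes) is ignored.
NEIGH_RE = re.compile(
    r"^(?P<ip>\S+) dev (?P<dev>\S+)(?: lladdr (?P<mac>[0-9a-f:]+))? (?P<state>[A-Z]+)\b"
)

Entry = Tuple[str, Optional[str], str]  # (ip, lladdr or None, state)


def parse_neigh(line: str) -> Optional[Entry]:
    """Parse one 'ip neigh show' line; return None if it doesn't match."""
    m = NEIGH_RE.match(line.strip())
    if m is None:
        return None
    return m.group("ip"), m.group("mac"), m.group("state")


def transitions(lines: Iterable[str]) -> List[Tuple[Optional[str], str]]:
    """Collapse a polling log into its distinct consecutive (lladdr, state) steps."""
    steps: List[Tuple[Optional[str], str]] = []
    for line in lines:
        entry = parse_neigh(line)
        if entry is None:
            continue
        step = (entry[1], entry[2])
        if not steps or steps[-1] != step:
            steps.append(step)
    return steps


def wait_for_state(ns: str, ip: str, state: str, timeout: float = 30.0) -> bool:
    """Poll the neighbour table inside netns NS until IP reaches STATE.

    Requires root and an existing namespace NS; returns False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        res = subprocess.run(["ip", "netns", "exec", ns, "ip", "neigh", "show"],
                             capture_output=True, text=True, check=False)
        for line in res.stdout.splitlines():
            entry = parse_neigh(line)
            if entry is not None and entry[0] == ip and entry[2] == state:
                return True
        time.sleep(0.1)
    return False
```

Fed the 3.10.0-766 log, `transitions()` yields `[('e2:6d:81:cc:30:12', 'STALE'), ('e2:6d:81:cc:30:12', 'DELAY'), ('e2:6d:81:cc:30:13', 'STALE')]`; fed the 3.10.0-760 log, the MAC never changes and the sequence ends in `(None, 'FAILED')`. To automate step 4, one could call `wait_for_state("ha", "192.168.0.2", "DELAY")` right after the ping and then run the same `ip netns exec hb arping -A -c 3 -I hb_veth0 -U 192.168.0.2` command.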
reproducer failed on RHEL-7.4: https://beaker.engineering.redhat.com/jobs/2122415 passed on 3.10.0-766: https://beaker.engineering.redhat.com/jobs/2122416
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:1062