Bug 1450205 - Gratuitous ARP updates received in span of 2-3 seconds time frame are all ignored
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 7.3
Assigned To: Eric Garver
QA Contact: Jianlin Shi
Blocks: 1438662
Reported: 2017-05-11 15:55 EDT by Ihar Hrachyshka
Modified: 2018-04-10 16:03 EDT

Fixed In Version: kernel-3.10.0-764.el7
Clones: 1554608
Last Closed: 2018-04-10 16:01:25 EDT
Type: Bug

External Trackers
Tracker                 ID              Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHSA-2018:1062  None      None    None     2018-04-10 16:03 EDT

Description Ihar Hrachyshka 2017-05-11 15:55:31 EDT
The OpenStack Neutron L3 agent sends 3 gratuitous ARP replies when moving a "floating" IP address from one device to another. When the Linux kernel receives the first reply, it usually processes it correctly, updating the existing ARP entry with the new lladdr. The second reply is usually ignored because it arrives within the locktime interval of the previous update. The third one is also ignored because the second reply, although ignored, still bumps the neigh->updated field that is used to determine whether an ARP frame falls within the locktime interval. Since the first reply was honoured, this is not a problem.

The problem happens when the first gratuitous ARP is also ignored because it falls within the locktime interval. This may happen either because another ARP reply arrived just before the gratuitous one (which is correct behaviour) or because the kernel transitioned the ARP table entry to another state just before the reply was received. Any state transition bumps neigh->updated. If such a state transition happens, all of the gratuitous updates are ignored, and the ARP entry is left with the wrong old MAC address. After the delay time (5s by default), the kernel will usually issue an ARP probe that will hopefully heal the ARP entry. While that is mostly OK, we have just wasted 5s of service availability plus the ARP probe round-trip.
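
The timing argument above can be illustrated with a small toy model. This is only a sketch of the behaviour described in this report, not kernel code: the NeighEntry class, the receive_arp and simulate functions, the millisecond timestamps, and the 800 ms spacing between replies are all hypothetical. The bump_when_ignored=False variant assumes that the fix amounts to leaving the timestamp alone when a reply is ignored.

```python
from dataclasses import dataclass


@dataclass
class NeighEntry:
    """Toy stand-in for a neighbour (ARP) table entry; not the kernel struct."""
    lladdr: str
    updated_ms: int  # models neigh->updated, in milliseconds


def receive_arp(entry, now_ms, new_lladdr, locktime_ms, bump_when_ignored):
    """Process one ARP reply; return True if the entry's lladdr was overridden."""
    if now_ms > entry.updated_ms + locktime_ms:
        # Outside the locktime interval: the reply is honoured.
        entry.lladdr = new_lladdr
        entry.updated_ms = now_ms
        return True
    if bump_when_ignored:
        # Behaviour described in the report: an ignored reply still bumps the timestamp.
        entry.updated_ms = now_ms
    return False


def simulate(bump_when_ignored):
    """Replay three gratuitous ARPs that start 100 ms after a state transition."""
    locktime_ms = 1000
    # A state transition at t=10000 ms bumped the timestamp just before the first reply.
    entry = NeighEntry(lladdr="old-mac", updated_ms=10_000)
    results = [
        receive_arp(entry, t, "new-mac", locktime_ms, bump_when_ignored)
        for t in (10_100, 10_900, 11_700)  # hypothetical arrival times
    ]
    return results, entry.lladdr


print(simulate(bump_when_ignored=True))   # ([False, False, False], 'old-mac')
print(simulate(bump_when_ignored=False))  # ([False, False, True], 'new-mac')
```

With the timestamp bumped on every ignored reply, each reply keeps pushing the locktime window forward and none is honoured. Without the bump, the window stays anchored at the state transition and the third reply gets through.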

The problem is aggravated by the fact that, in an unfortunate scenario, the kernel sometimes proliferates incorrect ARP entries without issuing a single probe; see e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1450203

Version-Release number of selected component (if applicable): 3.10.0-514.22.1.el7

How reproducible: always.

Steps to Reproduce:
Issue 3 gratuitous ARP replies right after the corresponding ARP entry has transitioned STALE->DELAY. Observe that not a single reply is honored.

Actual results: not a single subsequent gratuitous ARP triggers an update in the local ARP table if the first one arrives just after the entry state changed.

Expected results: Ideally, the first reply is honoured, because we haven't received any ARP reply before, so locktime should not be in effect. At the very least, the second reply should be honoured.

Additional info:

I posted a fix upstream: https://patchwork.ozlabs.org/patch/760372/
This bug + https://bugzilla.redhat.com/show_bug.cgi?id=1450203 are causes for OpenStack CI failures: https://bugzilla.redhat.com/show_bug.cgi?id=1438662
Comment 2 Ihar Hrachyshka 2017-05-16 12:15:02 EDT
Marked the bug for OpenStack layered product since it affects OpenStack CI. I also asked to target for 7.3 because we probably can't wait for 7.5 (?) to fix OpenStack CI (we have some OpenStack side workarounds but they are very fragile).
Comment 3 Ihar Hrachyshka 2017-05-17 13:15:42 EDT
Note: the patch was accepted by David Miller, see: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=77d7123342dcf6442341b67816321d71da8b2b16
Comment 4 Eric Garver 2017-09-21 14:54:03 EDT
Ihar,

Is there any reason to not also include your follow up series?

776ee323ddf1 ("Merge branch 'arp-always-override-existing-neigh-entries-with-gratuitous-ARP'")
7d472a59c0e5 ("arp: always override existing neigh entries with gratuitous ARP")
d9ef2e7bf99f ("arp: postpone addr_type calculation to as late as possible")
6fd05633bdaf ("arp: decompose is_garp logic into a separate function")
34eb5fe07831 ("arp: fixed error in a comment")
Comment 5 Ihar Hrachyshka 2017-09-26 09:55:07 EDT
Eric, I think it's a good idea, but the bug would be mostly fixed by the other patch, and I am not sure if rhel kernel policy allows backporting nice-to-haves. It will definitely help dealing with gARPs, at least to do it more efficiently.
Comment 6 Jianlin Shi 2017-10-17 22:24:00 EDT
steps to reproduce presented in description and set ack+
Comment 7 Rafael Aquini 2017-10-31 08:32:43 EDT
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing
Comment 9 Rafael Aquini 2017-11-01 06:34:37 EDT
Patch(es) available on kernel-3.10.0-764.el7
Comment 11 Jianlin Shi 2017-11-01 22:49:14 EDT
test topo:
#         br0
# Host A --|-- Host B
#   0.1          0.2
# 2000::1     2000::2

test setting:

# increase locktime to make the issue easier to reproduce
ip netns exec ha sysctl -w net.ipv4.neigh.ha_veth0.locktime=10000
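
The comment does not show how the topology itself was built. The following is one possible sketch, assuming two network namespaces (ha, hb) attached to a bridge br0 through veth pairs. The namespace and interface names ha_veth0 and hb_veth0 come from the commands in this comment; the bridge-side peer names, the /24 and /64 prefix lengths, and the topology_commands helper are assumptions. The helper only builds the list of ip(8) commands and does not run them, since running them requires root.

```python
def topology_commands(hosts=(("ha", "192.168.0.1/24", "2000::1/64"),
                             ("hb", "192.168.0.2/24", "2000::2/64")),
                      bridge="br0"):
    """Return the ip(8) commands for the sketch; nothing is executed here."""
    cmds = [f"ip link add {bridge} type bridge", f"ip link set {bridge} up"]
    for ns, v4, v6 in hosts:
        veth = f"{ns}_veth0"
        peer = f"{ns}_br0"  # bridge-side peer name is hypothetical
        cmds += [
            f"ip netns add {ns}",
            f"ip link add {veth} type veth peer name {peer}",
            f"ip link set {veth} netns {ns}",
            f"ip link set {peer} master {bridge}",
            f"ip link set {peer} up",
            f"ip netns exec {ns} ip addr add {v4} dev {veth}",
            f"ip netns exec {ns} ip addr add {v6} dev {veth}",
            f"ip netns exec {ns} ip link set {veth} up",
        ]
    return cmds


for cmd in topology_commands():
    print(cmd)
```

The printed commands could be reviewed and then run by hand as root before applying the locktime setting above.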


reproduced on 3.10.0-760:

[root@ibm-x3650m4-01-vm-05 ~]# uname -a
Linux ibm-x3650m4-01-vm-05.lab.eng.bos.redhat.com 3.10.0-760.el7.x86_64 #1 SMP Fri Oct 27 07:23:03 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

1. confirmed that neigh on ha is stale:
[root@ibm-x3650m4-01-vm-05 ~]# ip netns exec ha ip neigh show
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 STALE

2. change mac on hb:
[root@ibm-x3650m4-01-vm-05 ~]# ip netns exec hb ip link set dev hb_veth0 address 6e:03:b7:8d:e1:12

3. ping hb on ha:
[root@ibm-x3650m4-01-vm-05 ~]# ip netns exec ha ping 192.168.0.2 -c 1                         
PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data.

--- 192.168.0.2 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

4. send gratuitous ARP on hb when neigh state changed to DELAY:
[root@ibm-x3650m4-01-vm-05 ~]# ip netns exec hb arping -A -c 3 -I hb_veth0 -U 192.168.0.2
ARPING 192.168.0.2 from 192.168.0.2 hb_veth0
Sent 3 probes (3 broadcast(s))
Received 0 response(s)

5. watch the neigh state on ha
[root@ibm-x3650m4-01-vm-05 ~]# while :; do ip netns exec ha ip neigh show; sleep 1; done
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 STALE
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 STALE
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 STALE
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 STALE
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 STALE

<=== become DELAY

192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 DELAY
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 DELAY
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 DELAY
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 DELAY
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 DELAY

<=== DELAY to PROBE, gratuitous ARP not honored

192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 PROBE
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 PROBE
192.168.0.2 dev ha_veth0  FAILED


verified on 3.10.0-766:

[root@ibm-x3650m4-01-vm-05 ~]# uname -a
Linux ibm-x3650m4-01-vm-05.lab.eng.bos.redhat.com 3.10.0-766.el7.x86_64 #1 SMP Wed Nov 1 07:08:44 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

1. confirmed that neigh on ha is stale:
[root@ibm-x3650m4-01-vm-05 ~]# ip netns exec ha ip neigh show
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE

2. change mac on hb:
[root@ibm-x3650m4-01-vm-05 ~]# ip netns exec hb ip link set dev hb_veth0 address e2:6d:81:cc:30:13

3. ping hb on ha:
[root@ibm-x3650m4-01-vm-05 ~]# ip netns exec ha ping 192.168.0.2 -c 1
PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data.

--- 192.168.0.2 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

4. send gratuitous ARP on hb when neigh state changed to DELAY:

[root@ibm-x3650m4-01-vm-05 ~]# ip netns exec hb arping -A -c 3 -I hb_veth0 -U 192.168.0.2
ARPING 192.168.0.2 from 192.168.0.2 hb_veth0
Sent 3 probes (3 broadcast(s))
Received 0 response(s)

5. watch the neigh state on ha

[root@ibm-x3650m4-01-vm-05 ~]# while :; do ip netns exec ha ip neigh show; sleep 1; done       
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE

<=== to DELAY

192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 DELAY
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 DELAY
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:13 STALE

<=== updated after receiving gratuitous ARP
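
The visual check in step 5 could also be automated. The sketch below assumes the `ip neigh show` output has the shape seen in the transcripts above and reports every sample where the lladdr differs from the previous one. The helper names parse_neigh_line and lladdr_changes are hypothetical.

```python
def parse_neigh_line(line):
    """Split one `ip neigh show` line into (ip, dev, lladdr or None, state)."""
    fields = line.split()
    ip, state = fields[0], fields[-1]
    dev = fields[fields.index("dev") + 1]
    # FAILED entries carry no lladdr field.
    lladdr = fields[fields.index("lladdr") + 1] if "lladdr" in fields else None
    return ip, dev, lladdr, state


def lladdr_changes(lines):
    """Return (old, new, state) for each sample whose lladdr differs from the previous one."""
    changes = []
    have_previous = False
    previous = None
    for line in lines:
        if not line.strip() or line.lstrip().startswith("<==="):
            continue  # skip blank lines and hand-written annotations
        _, _, lladdr, state = parse_neigh_line(line)
        if have_previous and lladdr != previous:
            changes.append((previous, lladdr, state))
        previous, have_previous = lladdr, True
    return changes


sample = [
    "192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 DELAY",
    "192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:13 STALE",
]
print(lladdr_changes(sample))
# [('e2:6d:81:cc:30:12', 'e2:6d:81:cc:30:13', 'STALE')]
```

On the 3.10.0-760 transcript the only reported change would be the transition to FAILED (new lladdr None), while on 3.10.0-766 the change to the new MAC address is reported.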
Comment 12 Jianlin Shi 2017-11-02 03:50:06 EDT
reproducer failed on RHEL-7.4:
https://beaker.engineering.redhat.com/jobs/2122415

passed on 3.10.0-766:
https://beaker.engineering.redhat.com/jobs/2122416
Comment 13 errata-xmlrpc 2018-04-10 16:01:25 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1062
