Bug 1450205

Summary: Gratuitous ARP updates received in span of 2-3 seconds time frame are all ignored
Product: Red Hat Enterprise Linux 7 Reporter: Ihar Hrachyshka <ihrachys>
Component: kernel Assignee: Eric Garver <egarver>
kernel sub component: arp/icmp QA Contact: Jianlin Shi <jishi>
Status: CLOSED ERRATA
Severity: medium    
Priority: medium CC: aloughla, atragler, ihrachys, jishi, rkhan, sukulkar
Version: 7.3   
Target Milestone: rc   
Target Release: 7.3   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: kernel-3.10.0-764.el7
Clones: 1554608
Last Closed: 2018-04-10 20:01:25 UTC Type: Bug
Bug Blocks: 1438662    

Description Ihar Hrachyshka 2017-05-11 19:55:31 UTC
The OpenStack Neutron L3 agent sends 3 gratuitous ARP replies when it moves a "floating" IP address from one device to another. When the Linux kernel receives the first one, it usually processes it correctly and updates the existing ARP entry with the new lladdr. The second reply is then usually ignored because it arrives within the locktime interval since the previous update. The third one is ignored too. Although the second reply was ignored, it still bumped the neigh->updated field, and the kernel uses that field to decide whether an ARP frame falls within the locktime interval. Since the first reply was honoured, this causes no problem.

The problem happens when the first gratuitous ARP is also ignored because it, too, falls within the locktime interval. This can happen in two ways:

- Another ARP reply arrived just before the gratuitous one. In this case ignoring it is correct.
- The kernel transitioned the ARP table entry to another state just before the reply was received, because any state transition bumps neigh->updated.

If such a state transition happens, all of the gratuitous updates are ignored and the ARP entry is left with the old, wrong MAC address. After the delay time (5 s by default), the kernel will usually issue an ARP probe that will hopefully heal the entry. While that's mostly OK, we have just lost 5 s of service availability plus an ARP probe round trip.
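The interplay described in the two paragraphs above can be illustrated with a toy model. This is a hypothetical sketch, not kernel code. The names struct entry, garp_prefix() and send_three_garps() are made up, and times are plain milliseconds rather than jiffies. It also assumes:

- locktime is the 1 s ARP default;
- the GARPs are spaced 900 ms apart;
- entry states are not modelled beyond the neigh->updated bump.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, heavily simplified model of one ARP cache entry.
 * Times are plain milliseconds rather than jiffies. */
struct entry {
    char lladdr[18];   /* cached MAC, "xx:xx:xx:xx:xx:xx" */
    long updated;      /* stands in for neigh->updated */
};

#define LOCKTIME_MS 1000L   /* assumed ARP locktime of 1 s */

/* Pre-fix behaviour (sketch): the override decision compares the arrival
 * time with updated + locktime, and the timestamp is bumped even when the
 * new lladdr is rejected. Returns true if the new lladdr was accepted. */
static bool garp_prefix(struct entry *e, const char *mac, long now)
{
    bool override = now > e->updated + LOCKTIME_MS;  /* like time_after() */

    e->updated = now;                                /* bumped unconditionally */
    if (!override)
        return false;
    strncpy(e->lladdr, mac, sizeof e->lladdr - 1);
    e->lladdr[sizeof e->lladdr - 1] = '\0';
    return true;
}

/* Delivers three GARPs 900 ms apart (hypothetical spacing), starting at
 * `first`. Returns a bitmask of accepted GARPs (bit 0 = first one). */
static int send_three_garps(struct entry *e, const char *mac, long first)
{
    int accepted = 0;

    for (int i = 0; i < 3; i++)
        if (garp_prefix(e, mac, first + 900L * i))
            accepted |= 1 << i;
    return accepted;
}
```

Case 1: the entry was last updated long ago (updated = -10000). Here send_three_garps(&e, "6e:03:b7:8d:e1:12", 0) returns 1, so only the first GARP is taken.

Case 2: updated is set to 0 to mimic a STALE->DELAY transition at t = 0, and the GARPs start at t = 200. Now the function returns 0 and the entry keeps its old MAC. Each rejected GARP pushes the timestamp forward just enough to lock out the next one.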

The problem is aggravated by the fact that, in some unfortunate scenarios, the kernel keeps incorrect ARP entries around without issuing a single probe. See for example: https://bugzilla.redhat.com/show_bug.cgi?id=1450203

Version-Release number of selected component (if applicable): 3.10.0-514.22.1.el7

How reproducible: always.

Steps to Reproduce:
Issue 3 gratuitous ARP replies right after the corresponding ARP entry has transitioned from STALE to DELAY. Observe that not a single reply is honoured.

Actual results: none of the gratuitous ARPs in the series triggers an update of the local ARP table if the first one arrives just after the entry changed state.

Expected results: Ideally, the first reply is honoured. No ARP reply was received before it, so locktime should not apply. At the very least, the second reply should be honoured.

Additional info:

I posted a fix upstream: https://patchwork.ozlabs.org/patch/760372/
This bug and https://bugzilla.redhat.com/show_bug.cgi?id=1450203 cause OpenStack CI failures: https://bugzilla.redhat.com/show_bug.cgi?id=1438662

Comment 2 Ihar Hrachyshka 2017-05-16 16:15:02 UTC
Marked the bug for the OpenStack layered product since it affects OpenStack CI. I also asked to target 7.3 because we probably can't wait for 7.5 (?) to fix OpenStack CI. We have some OpenStack-side workarounds, but they are very fragile.

Comment 3 Ihar Hrachyshka 2017-05-17 17:15:42 UTC
Note: the patch was accepted by David Miller, see: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=77d7123342dcf6442341b67816321d71da8b2b16
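To see why the accepted patch helps, here is a toy model with the timestamp bump moved into the path where the entry actually changes. That is how we read the upstream commit (neigh timestamps updated only when the update is effective). Everything below is a hypothetical simplification, not kernel code:

- struct entry, garp_fixed() and send_three_garps() are made-up names;
- locktime is assumed to be 1 s and the GARPs 900 ms apart;
- the STALE->DELAY transition is modelled as a timestamp bump at t = 0;
- entry states other than the lladdr are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified ARP cache entry; times in milliseconds. */
struct entry {
    char lladdr[18];
    long updated;      /* stands in for neigh->updated */
};

#define LOCKTIME_MS 1000L   /* assumed ARP locktime of 1 s */

/* Sketch of the patched behaviour: the timestamp only moves when the
 * update is effective, so a rejected GARP no longer extends locktime. */
static bool garp_fixed(struct entry *e, const char *mac, long now)
{
    bool override = now > e->updated + LOCKTIME_MS;  /* like time_after() */

    if (!override || strcmp(e->lladdr, mac) == 0)
        return false;                 /* no effective change: keep `updated` */
    strncpy(e->lladdr, mac, sizeof e->lladdr - 1);
    e->lladdr[sizeof e->lladdr - 1] = '\0';
    e->updated = now;                 /* bumped only on an effective update */
    return true;
}

/* Three GARPs 900 ms apart; returns a bitmask of accepted ones. */
static int send_three_garps(struct entry *e, const char *mac, long first)
{
    int accepted = 0;

    for (int i = 0; i < 3; i++)
        if (garp_fixed(e, mac, first + 900L * i))
            accepted |= 1 << i;
    return accepted;
}
```

In this model the entry is bumped at t = 0 and the GARPs arrive at t = 200, 1100 and 2000 ms. The first GARP is still rejected, but it no longer moves the timestamp, so the second one is accepted and send_three_garps returns 2. This matches the "at the very least, the second reply should be honoured" expectation in the description.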

Comment 4 Eric Garver 2017-09-21 18:54:03 UTC
Ihar,

Is there any reason not to also include your follow-up series?

776ee323ddf1 ("Merge branch 'arp-always-override-existing-neigh-entries-with-gratuitous-ARP'")
7d472a59c0e5 ("arp: always override existing neigh entries with gratuitous ARP")
d9ef2e7bf99f ("arp: postpone addr_type calculation to as late as possible")
6fd05633bdaf ("arp: decompose is_garp logic into a separate function")
34eb5fe07831 ("arp: fixed error in a comment")
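The follow-up series is about letting a gratuitous ARP override an existing entry regardless of locktime. Below is a rough sketch of that decision as we understand it. The struct layout, function names and opcode constant names are hypothetical; only the opcode values (1 = request, 2 = reply) come from RFC 826.

The rule sketched is this: a packet counts as a GARP when the sender IP equals the target IP and, for replies, the target hardware address equals the sender's. As far as we recall, the upstream check also requires the sender address to be unicast, which is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { ARPOP_REQ = 1, ARPOP_REP = 2 };   /* ARP opcodes (RFC 826) */

/* Hypothetical flattened view of the ARP payload fields used here. */
struct arp_fields {
    int op;
    uint8_t sha[6];    /* sender hardware address */
    uint32_t sip;      /* sender IPv4 address */
    uint8_t tha[6];    /* target hardware address */
    uint32_t tip;      /* target IPv4 address */
};

/* Gratuitous ARP test (sketch): sender IP equals target IP and, for
 * replies, the target hardware address also equals the sender's. */
static bool is_garp(const struct arp_fields *p)
{
    if (p->sip != p->tip)
        return false;
    if (p->op == ARPOP_REP)
        return memcmp(p->sha, p->tha, sizeof p->sha) == 0;
    return true;
}

/* Override decision (sketch): locktime still applies to ordinary ARP
 * traffic, but a gratuitous ARP always overrides the cached lladdr. */
static bool should_override(const struct arp_fields *p,
                            long now, long updated, long locktime)
{
    return now > updated + locktime || is_garp(p);
}
```

With this rule, the scenario from the description no longer depends on timing. A GARP that arrives 200 ms after a STALE->DELAY transition overrides the entry even though it is inside the locktime window. An ordinary reply at the same moment is still held back.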

Comment 5 Ihar Hrachyshka 2017-09-26 13:55:07 UTC
Eric, I think it's a good idea, but the bug would be mostly fixed by the other patch, and I am not sure whether RHEL kernel policy allows backporting nice-to-haves. The series would definitely help with handling gARPs, at least by making it more efficient.

Comment 6 Jianlin Shi 2017-10-18 02:24:00 UTC
Steps to reproduce are presented in the description; set ack+.

Comment 7 Rafael Aquini 2017-10-31 12:32:43 UTC
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing

Comment 9 Rafael Aquini 2017-11-01 10:34:37 UTC
Patch(es) available on kernel-3.10.0-764.el7

Comment 11 Jianlin Shi 2017-11-02 02:49:14 UTC
test topo:
#         br0
# Host A --|-- Host B
#   0.1          0.2
# 2000::1     2000::2

test setting:

# increase locktime to make the issue easier to reproduce
ip netns exec ha sysctl -w net.ipv4.neigh.ha_veth0.locktime=10000
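A note on the value used above, stated as an assumption. As far as we understand the kernel's neighbour sysctl table, locktime is expressed in USER_HZ ticks, which sysconf(_SC_CLK_TCK) reports (typically 100). On that reading, 10000 corresponds to about 100 s. That is far longer than the default 5 s delay before the first probe, so every gratuitous ARP sent while the entry is in DELAY lands inside the locktime window. The helpers below only do that unit conversion; their names are hypothetical.

```c
#include <assert.h>
#include <unistd.h>

/* Assumption: neigh `locktime` is exposed in USER_HZ ticks. On Linux,
 * sysconf(_SC_CLK_TCK) reports USER_HZ (commonly 100). */
static long user_hz(void)
{
    long hz = sysconf(_SC_CLK_TCK);
    return hz > 0 ? hz : 100;     /* fall back to the usual value */
}

/* Converts a locktime sysctl value to seconds for a given USER_HZ. */
static double locktime_seconds(long sysctl_value, long hz)
{
    return (double)sysctl_value / (double)hz;
}
```

With USER_HZ = 100, locktime_seconds(10000, user_hz()) gives 100 s, compared with the ARP default locktime of 1 s.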


reproduced on 3.10.0-760:

[root@ibm-x3650m4-01-vm-05 ~]# uname -a
Linux ibm-x3650m4-01-vm-05.lab.eng.bos.redhat.com 3.10.0-760.el7.x86_64 #1 SMP Fri Oct 27 07:23:03 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

1. confirmed that neigh on ha is stale:
[root@ibm-x3650m4-01-vm-05 ~]# ip netns exec ha ip neigh show
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 STALE

2. change mac on hb:
[root@ibm-x3650m4-01-vm-05 ~]# ip netns exec hb ip link set dev hb_veth0 address 6e:03:b7:8d:e1:12

3. ping hb on ha:
[root@ibm-x3650m4-01-vm-05 ~]# ip netns exec ha ping 192.168.0.2 -c 1                         
PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data.

--- 192.168.0.2 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

4. send gratuitous ARP on hb when neigh state changed to DELAY:
[root@ibm-x3650m4-01-vm-05 ~]# ip netns exec hb arping -A -c 3 -I hb_veth0 -U 192.168.0.2
ARPING 192.168.0.2 from 192.168.0.2 hb_veth0
Sent 3 probes (3 broadcast(s))
Received 0 response(s)

5. watch the neigh state on ha
[root@ibm-x3650m4-01-vm-05 ~]# while :; do ip netns exec ha ip neigh show; sleep 1; done
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 STALE
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 STALE
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 STALE
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 STALE
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 STALE

<=== become DELAY

192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 DELAY
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 DELAY
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 DELAY
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 DELAY
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 DELAY

<=== DELAY to PROBE, gratuitous ARP not honored

192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 PROBE
192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 PROBE
192.168.0.2 dev ha_veth0  FAILED


verified on 3.10.0-766:

[root@ibm-x3650m4-01-vm-05 ~]# uname -a
Linux ibm-x3650m4-01-vm-05.lab.eng.bos.redhat.com 3.10.0-766.el7.x86_64 #1 SMP Wed Nov 1 07:08:44 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

1. confirmed that neigh on ha is stale:
[root@ibm-x3650m4-01-vm-05 ~]# ip netns exec ha ip neigh show
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE

2. change mac on hb:
[root@ibm-x3650m4-01-vm-05 ~]# ip netns exec hb ip link set dev hb_veth0 address e2:6d:81:cc:30:13

3. ping hb on ha:
[root@ibm-x3650m4-01-vm-05 ~]# ip netns exec ha ping 192.168.0.2 -c 1
PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data.

--- 192.168.0.2 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

4. send gratuitous ARP on hb when neigh state changed to DELAY:

[root@ibm-x3650m4-01-vm-05 ~]# ip netns exec hb arping -A -c 3 -I hb_veth0 -U 192.168.0.2
ARPING 192.168.0.2 from 192.168.0.2 hb_veth0
Sent 3 probes (3 broadcast(s))
Received 0 response(s)

5. watch the neigh state on ha

[root@ibm-x3650m4-01-vm-05 ~]# while :; do ip netns exec ha ip neigh show; sleep 1; done       
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE

<=== to DELAY

192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 DELAY
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 DELAY
192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:13 STALE

<=== updated after receive gratuitous ARP
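Step 5 in both runs comes down to one question: did the lladdr printed by the watch loop ever change? Below is a minimal sketch of a helper that answers it from captured `ip neigh show` lines. The function name, and the idea of capturing the loop's output into an array, are hypothetical and not part of the original test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Returns true if the "lladdr" field differs between any two lines of
 * captured `ip neigh show` output. Lines without an lladdr, such as
 * "192.168.0.2 dev ha_veth0  FAILED", are skipped. */
static bool lladdr_changed(const char *lines[], int n)
{
    char first[18] = "";

    for (int i = 0; i < n; i++) {
        const char *p = strstr(lines[i], " lladdr ");
        char mac[18];

        if (p == NULL || sscanf(p, " lladdr %17s", mac) != 1)
            continue;
        if (first[0] == '\0')
            strcpy(first, mac);           /* remember the first MAC seen */
        else if (strcmp(first, mac) != 0)
            return true;                  /* the entry was updated */
    }
    return false;
}
```

For the 3.10.0-760 watch-loop output it returns false: the entry goes STALE, DELAY, PROBE, FAILED and always shows the old MAC. For the 3.10.0-766 output it returns true as soon as e2:6d:81:cc:30:13 appears.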

Comment 12 Jianlin Shi 2017-11-02 07:50:06 UTC
reproducer failed on RHEL-7.4:
https://beaker.engineering.redhat.com/jobs/2122415

passed on 3.10.0-766:
https://beaker.engineering.redhat.com/jobs/2122416

Comment 13 errata-xmlrpc 2018-04-10 20:01:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1062