Bug 1450205

| Summary: | Gratuitous ARP updates received in span of 2-3 seconds time frame are all ignored | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Ihar Hrachyshka <ihrachys> |
| Component: | kernel | Assignee: | Eric Garver <egarver> |
| kernel sub component: | arp/icmp | QA Contact: | Jianlin Shi <jishi> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | | |
| Priority: | medium | CC: | aloughla, atragler, ihrachys, jishi, rkhan, sukulkar |
| Version: | 7.3 | | |
| Target Milestone: | rc | | |
| Target Release: | 7.3 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | kernel-3.10.0-764.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1554608 | Environment: | |
| Last Closed: | 2018-04-10 20:01:25 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1438662 | | |

Description
Ihar Hrachyshka
2017-05-11 19:55:31 UTC
Marked the bug for OpenStack layered product since it affects OpenStack CI. I also asked to target for 7.3 because we probably can't wait for 7.5 (?) to fix OpenStack CI (we have some OpenStack side workarounds but they are very fragile).

Note: the patch was accepted by David Miller, see: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=77d7123342dcf6442341b67816321d71da8b2b16

Ihar,

Is there any reason to not also include your follow up series?

776ee323ddf1 ("Merge branch 'arp-always-override-existing-neigh-entries-with-gratuitous-ARP'")
7d472a59c0e5 ("arp: always override existing neigh entries with gratuitous ARP")
d9ef2e7bf99f ("arp: postpone addr_type calculation to as late as possible")
6fd05633bdaf ("arp: decompose is_garp logic into a separate function")
34eb5fe07831 ("arp: fixed error in a comment")

Eric, I think it's a good idea, but the bug would be mostly fixed by the other patch, and I am not sure if rhel kernel policy allows backporting nice-to-haves. It will definitely help dealing with gARPs, at least to do it more efficiently.

steps to reproduce presented in description and set ack+

Patch(es) committed on kernel repository and an interim kernel build is undergoing testing

Patch(es) available on kernel-3.10.0-764.el7

test topo:

    #            br0
    # Host A   --|--   Host B
    #  0.1              0.2
    # 2000::1          2000::2

test setting:

    # increase locktime for ease to reproduce
    ip netns exec ha sysctl -w net.ipv4.neigh.ha_veth0.locktime=10000

reproduced on 3.10.0-760:

    [root@ibm-x3650m4-01-vm-05 ~]# uname -a
    Linux ibm-x3650m4-01-vm-05.lab.eng.bos.redhat.com 3.10.0-760.el7.x86_64 #1 SMP Fri Oct 27 07:23:03 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

1. confirmed that neigh on ha is stale:

    [root@ibm-x3650m4-01-vm-05 ~]# ip netns exec ha ip neigh show
    192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 STALE

2. change mac on hb:

    [root@ibm-x3650m4-01-vm-05 ~]# ip netns exec hb ip link set dev hb_veth0 address 6e:03:b7:8d:e1:12

3. ping hb on ha:

    [root@ibm-x3650m4-01-vm-05 ~]# ip netns exec ha ping 192.168.0.2 -c 1
    PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data.

    --- 192.168.0.2 ping statistics ---
    1 packets transmitted, 0 received, 100% packet loss, time 0ms

4. send gratuitous ARP on hb when neigh state changed to DELAY:

    [root@ibm-x3650m4-01-vm-05 ~]# ip netns exec hb arping -A -c 3 -I hb_veth0 -U 192.168.0.2
    ARPING 192.168.0.2 from 192.168.0.2 hb_veth0
    Sent 3 probes (3 broadcast(s))
    Received 0 response(s)

5. watch the neigh state on ha

    [root@ibm-x3650m4-01-vm-05 ~]# while :; do ip netns exec ha ip neigh show; sleep 1; done
    192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 STALE
    192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 STALE
    192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 STALE
    192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 STALE
    192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 STALE    <=== become DELAY
    192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 DELAY
    192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 DELAY
    192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 DELAY
    192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 DELAY
    192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 DELAY    <=== DELAY to PROBE, gratuitous ARP not honored
    192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 PROBE
    192.168.0.2 dev ha_veth0 lladdr 6e:03:b7:8d:e1:11 PROBE
    192.168.0.2 dev ha_veth0  FAILED

verified on 3.10.0-766:

    [root@ibm-x3650m4-01-vm-05 ~]# uname -a
    Linux ibm-x3650m4-01-vm-05.lab.eng.bos.redhat.com 3.10.0-766.el7.x86_64 #1 SMP Wed Nov 1 07:08:44 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

1. confirmed that neigh on ha is stale:

    [root@ibm-x3650m4-01-vm-05 ~]# ip netns exec ha ip neigh show
    192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE

2. change mac on hb:

    [root@ibm-x3650m4-01-vm-05 ~]# ip netns exec hb ip link set dev hb_veth0 address e2:6d:81:cc:30:13

3. ping hb on ha:

    [root@ibm-x3650m4-01-vm-05 ~]# ip netns exec ha ping 192.168.0.2 -c 1
    PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data.

    --- 192.168.0.2 ping statistics ---
    1 packets transmitted, 0 received, 100% packet loss, time 0ms

4. send gratuitous ARP on hb when neigh state changed to DELAY:

    [root@ibm-x3650m4-01-vm-05 ~]# ip netns exec hb arping -A -c 3 -I hb_veth0 -U 192.168.0.2
    ARPING 192.168.0.2 from 192.168.0.2 hb_veth0
    Sent 3 probes (3 broadcast(s))
    Received 0 response(s)

5. watch the neigh state on ha

    [root@ibm-x3650m4-01-vm-05 ~]# while :; do ip netns exec ha ip neigh show; sleep 1; done
    192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
    192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
    192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
    192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
    192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
    192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
    192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE
    192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 STALE    <=== to DELAY
    192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 DELAY
    192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:12 DELAY
    192.168.0.2 dev ha_veth0 lladdr e2:6d:81:cc:30:13 STALE    <=== updated after receive gratuitous ARP

reproducer failed on RHEL-7.4:
https://beaker.engineering.redhat.com/jobs/2122415

passed on 3.10.0-766:
https://beaker.engineering.redhat.com/jobs/2122416

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1062
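
The pass/fail judgement in step 5 of the verification above (did the neighbour entry on ha pick up the new MAC after the gratuitous ARP?) can be checked mechanically instead of by eye. The following is only a rough sketch, not part of the original reproducer: the helper names `neigh_entry` and `garp_honored` are hypothetical, and it assumes the `ip neigh show` output format seen in the logs above (`<ip> dev <if> lladdr <mac> <STATE>`, or no `lladdr` field for a FAILED entry).

```shell
#!/bin/sh
# Hypothetical helpers (not part of iproute2); they only parse text that
# `ip neigh show` printed, read from stdin.

# Print "<lladdr> <state>" for one IP; "-" stands in for a missing lladdr
# (e.g. a FAILED entry). Prints nothing if the IP is not listed.
neigh_entry() {
    awk -v ip="$1" '
        $1 == ip {
            mac = "-"
            for (i = 2; i < NF; i++)
                if ($i == "lladdr") mac = $(i + 1)
            print mac, $NF
            exit
        }'
}

# Succeed only if the entry for IP ($1) carries the expected MAC ($2),
# i.e. the gratuitous ARP was honored.
garp_honored() {
    entry=$(neigh_entry "$1")
    [ "${entry%% *}" = "$2" ]
}
```

With the namespace names from the test above, a check after step 4 could look like `ip netns exec ha ip neigh show | garp_honored 192.168.0.2 e2:6d:81:cc:30:13 && echo PASS`. Because the helpers read from stdin, they can also be tried on captured output without root or network namespaces.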