Bug 1450203
Summary: | Irrelevant upper layer protocol traffic may erroneously "confirm" neigh entries | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Ihar Hrachyshka <ihrachys> |
Component: | kernel | Assignee: | Lance Richardson <lrichard> |
kernel sub component: | arp/icmp | QA Contact: | Jianlin Shi <jishi> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | medium | ||
Priority: | medium | CC: | aloughla, atragler, dgilbert, ealcaniz, ihrachys, jiji, lmiksik, lrichard, oblaut, sukulkar |
Version: | 7.3 | ||
Target Milestone: | rc | ||
Target Release: | 7.4 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2017-08-02 07:31:13 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1438662 |
Description
Ihar Hrachyshka
2017-05-11 19:38:46 UTC
(In reply to Ihar Hrachyshka from comment #0)
> Final note: this situation may render IP address roaming/failover
> ineffective "thanks" to specially crafted traffic that happens to arrive at
> the same node. This raises the question of whether there is a reason to
> track the bug as security related. That's why I am leaving the bug report
> closed to the public for now.

I don't think this makes things any worse. I think the attacker would need to use address spoofing to execute such an attack, and the effect would be similar to that of other address spoofing attacks.

No, you don't need to spoof anything to break connectivity between a node X and a service IP1. Just make sure that node X keeps receiving the kind of traffic that makes it confirm its ARP entry for IP1 over and over (apparently any traffic sent to node X from the same subnet on the same L2 domain will do), and then wait for IP1 to fail over. Once that happens, X can never get out of the confirmation loop and restore connectivity to IP1.

It affects OpenStack CI and customers. We would like to ask for the series to be backported to 7.3, if possible.

Marking the bug as a blocker for 7.4. The rationale is as follows:

1. The bug affects OpenStack CI. We patched the OpenStack Neutron L3 agent to reduce the risk of failure, but that is not a complete solution.
2. The bug affects production environments (see the attached customer case) that do not even rely on Neutron L3 agent gratuitous ARPs.

The effect of the bug is that connectivity between two nodes on the same network segment may break and stay broken indefinitely, which is a big deal; one could even argue that it warrants special security consideration.

I understand we are late in the 7.4 timeframe. As for why the bug is only surfacing now: we set up and started debugging the OpenStack CI jobs that could trigger the failure mode several months ago, and spent a lot of time digging before we realized that this is a kernel issue and not an OpenStack one.
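The confirmation loop described in the reply above can be illustrated with a toy model of node X's ARP entry for IP1. This is a minimal sketch, not kernel code: the state names are the ones `ip neigh show` prints, the function name `simulate_failover` is hypothetical, and the model assumes that X sends traffic to IP1 on every tick, that the entry's reachable time expires on every tick, and that a probe is always answered by IP1's new owner. The rule it borrows from the kernel's neighbour state machine is that an entry in DELAY which has been confirmed goes back to REACHABLE without sending a probe; if unrelated traffic keeps supplying that confirmation, the entry never reaches PROBE and the stale MAC is never replaced.

```bash
#!/bin/bash
# Toy model (not kernel code) of node X's neighbour entry for IP1 after IP1
# has failed over to a host with a different MAC address.
# Assumptions: X sends traffic to IP1 on every tick, the reachable time
# expires on every tick, and a probe is always answered by IP1's new owner.
simulate_failover() {
    local bogus_confirm=$1   # 1 = unrelated traffic keeps "confirming" the entry
    local ticks=$2           # number of state transitions to simulate
    local state=STALE
    local lladdr=old-mac     # MAC address cached before the failover
    local i
    for ((i = 0; i < ticks; i++)); do
        case $state in
            STALE)
                state=DELAY ;;           # X sends a packet to IP1
            DELAY)
                if ((bogus_confirm)); then
                    state=REACHABLE      # confirmed: no probe is sent, MAC kept
                else
                    state=PROBE          # not confirmed: probe IP1
                fi ;;
            PROBE)
                state=REACHABLE
                lladdr=new-mac ;;        # the new owner answers the probe
            REACHABLE)
                state=STALE ;;           # reachable time expired
        esac
    done
    echo "$state $lladdr"
}
```

With bogus confirmations, `simulate_failover 1 12` prints `STALE old-mac`: the entry cycles STALE → DELAY → REACHABLE indefinitely. Without them, `simulate_failover 0 12` prints `STALE new-mac`, because the entry passes through PROBE and picks up the new MAC address.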
Regarding the CI jobs: it was actually a set of issues that, combined, rendered them totally broken. Ideally, we would see the fix in 7.3 too, but 7.4 would at least be a good start. I understand that the patch series has significant impact, so we need to be cautious about the cost-benefit trade-off. That being said, the benefit of production environments not breaking on IP failover sounds significant to me.

Posted: http://post-office.corp.redhat.com/archives/rhkernel-list/2017-May/msg01774.html

Corresponding Brew build: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=13231449

It seems to have been decided that the bug does not warrant special handling from a security standpoint. For this reason, I am opening the description and relevant comments to the public.

set qa_ack based on comment 11

Patch(es) committed on kernel repository and an interim kernel build is undergoing testing

Patch(es) available on kernel-3.10.0-678.el7

Ofer, please advise on how we can test the new kernel in the scope of OSP11 CI.

reproducer:

    [root@ibm-x3650m4-04 arp_test]# cat repo.sh
    #!/bin/bash
    # Create three namespaces, each connected to bridge br0 through a veth pair.
    ip netns add host1
    ip netns add host2
    ip netns add host3
    brctl addbr br0
    ip link add veth1 type veth peer name veth1_br
    ip link add veth2 type veth peer name veth2_br
    ip link add veth3 type veth peer name veth3_br
    ip link set veth1 netns host1
    ip link set veth2 netns host2
    ip link set veth3 netns host3
    brctl addif br0 veth1_br
    brctl addif br0 veth2_br
    brctl addif br0 veth3_br
    # Address the hosts on 192.168.1.0/24 and 2000::/64.
    ip netns exec host1 ip link set lo up
    ip netns exec host1 ip link set veth1 up
    ip netns exec host1 ip addr add 192.168.1.1/24 dev veth1
    ip netns exec host1 ip addr add 2000::1/64 dev veth1
    ip netns exec host2 ip link set lo up
    ip netns exec host2 ip link set veth2 up
    ip netns exec host2 ip addr add 192.168.1.2/24 dev veth2
    ip netns exec host2 ip addr add 2000::2/64 dev veth2
    ip netns exec host3 ip link set lo up
    ip netns exec host3 ip link set veth3 up
    ip netns exec host3 ip addr add 192.168.1.3/24 dev veth3
    ip netns exec host3 ip addr add 2000::3/64 dev veth3
    ip link set br0 up
    ip link set veth1_br up
    ip link set veth2_br up
    ip link set veth3_br up
    # Keep a TCP stream from host1 (pinned to CPU 0) to host3 running in the background.
    ip netns exec host3 nc -l -k 10010 &
    sleep 1
    ip netns exec host1 taskset --cpu-list 0 nc 192.168.1.3 10010 < /dev/zero &
    sleep 5
    echo "host3 neigh setup"
    ip netns exec host1 ip neigh show
    # Connect host1 to host2 so that host1 caches host2's MAC address.
    ip netns exec host2 nc -l 10010 &
    sleep 1
    ip netns exec host1 taskset --cpu-list 0 nc 192.168.1.2 10010 &
    sleep 5
    echo "host2 neigh setup"
    ip netns exec host1 ip neigh show
    # Simulate a failover: take host2 down and move 192.168.1.2 to host3.
    echo "down host2 and change ip to host3"
    ip netns exec host2 ip link set veth2 down
    ip netns exec host3 ip addr add 192.168.1.2/24 dev veth3
    ip netns exec host3 ip addr sh
    sleep 60
    echo "host2 neigh stale"
    ip netns exec host1 ip neigh show
    # Ping 192.168.1.2 from the same CPU as the TCP stream and watch host1's neighbour table.
    echo "send packet to host2 on host1"
    #ip netns exec host1 taskset --cpu-list 1 nc 192.168.1.2 10010 &
    ip netns exec host1 taskset --cpu-list 0 ping 192.168.1.2 -c 1 -W 1 -w 1
    sleep 1
    i=0
    while [ $i -lt 15 ]
    do
        ip netns exec host1 ip neigh show
        echo ""
        ip netns exec host1 taskset --cpu-list 0 ping 192.168.1.2 -c 1 -W 1 -w 1
        let i+=1
    done
    # Clean up background jobs.
    jobs -p | xargs kill -9
    killall -9 nc

reproduced on 3.10.0-675:

    [root@ibm-x3650m4-04 arp_test]# uname -a
    Linux ibm-x3650m4-04.rhts.eng.pek2.redhat.com 3.10.0-675.el7.x86_64 #1 SMP Mon May 29 23:22:32 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
    [root@ibm-x3650m4-04 arp_test]# ./repo.sh
    host3 neigh setup
    192.168.1.3 dev veth1 lladdr c6:1d:84:d1:09:54 REACHABLE
    host2 neigh setup
    192.168.1.2 dev veth1 lladdr 16:07:5a:37:06:2c REACHABLE
    192.168.1.3 dev veth1 lladdr c6:1d:84:d1:09:54 REACHABLE
    down host2 and change ip to host3
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    13: veth3@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
        link/ether c6:1d:84:d1:09:54 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        inet 192.168.1.3/24 scope global veth3
           valid_lft forever preferred_lft forever
        inet 192.168.1.2/24 scope global secondary veth3
           valid_lft forever preferred_lft forever
        inet6 2000::3/64 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::c41d:84ff:fed1:954/64 scope link
           valid_lft forever preferred_lft forever
    host2 neigh stale
    192.168.1.2 dev veth1 lladdr 16:07:5a:37:06:2c STALE
    192.168.1.3 dev veth1 lladdr c6:1d:84:d1:09:54 REACHABLE
    send packet to host2 on host1
    PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
    --- 192.168.1.2 ping statistics ---
    2 packets transmitted, 0 received, 100% packet loss, time 999ms
    192.168.1.2 dev veth1 lladdr 16:07:5a:37:06:2c DELAY
    192.168.1.3 dev veth1 lladdr c6:1d:84:d1:09:54 REACHABLE
    PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
    --- 192.168.1.2 ping statistics ---
    1 packets transmitted, 0 received, 100% packet loss, time 0ms
    192.168.1.2 dev veth1 lladdr 16:07:5a:37:06:2c DELAY
    192.168.1.3 dev veth1 lladdr c6:1d:84:d1:09:54 REACHABLE
    PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
    --- 192.168.1.2 ping statistics ---
    2 packets transmitted, 0 received, 100% packet loss, time 999ms
    192.168.1.2 dev veth1 lladdr 16:07:5a:37:06:2c DELAY
    192.168.1.3 dev veth1 lladdr c6:1d:84:d1:09:54 REACHABLE
    PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
    --- 192.168.1.2 ping statistics ---
    2 packets transmitted, 0 received, 100% packet loss, time 999ms
    192.168.1.2 dev veth1 lladdr 16:07:5a:37:06:2c REACHABLE   <===== mac for 192.168.1.2 not updated
    192.168.1.3 dev veth1 lladdr c6:1d:84:d1:09:54 REACHABLE

Verified on 3.10.0-679:

    [root@ibm-x3650m4-04 arp_test]# uname -a
    Linux ibm-x3650m4-04.rhts.eng.pek2.redhat.com 3.10.0-679.el7.x86_64 #1 SMP Mon Jun 5 23:13:08 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
    [root@ibm-x3650m4-04 arp_test]# ./repo.sh
    host3 neigh setup
    192.168.1.3 dev veth1 lladdr b2:e4:26:21:6f:48 REACHABLE
    host2 neigh setup
    192.168.1.2 dev veth1 lladdr f6:ca:a5:2e:a1:79 REACHABLE
    192.168.1.3 dev veth1 lladdr b2:e4:26:21:6f:48 REACHABLE
    down host2 and change ip to host3
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    83: veth3@if82: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
        link/ether b2:e4:26:21:6f:48 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        inet 192.168.1.3/24 scope global veth3
           valid_lft forever preferred_lft forever
        inet 192.168.1.2/24 scope global secondary veth3
           valid_lft forever preferred_lft forever
        inet6 2000::3/64 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::b0e4:26ff:fe21:6f48/64 scope link
           valid_lft forever preferred_lft forever
    host2 neigh stale
    192.168.1.2 dev veth1 lladdr f6:ca:a5:2e:a1:79 STALE
    192.168.1.3 dev veth1 lladdr b2:e4:26:21:6f:48 REACHABLE
    send packet to host2 on host1
    PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
    --- 192.168.1.2 ping statistics ---
    2 packets transmitted, 0 received, 100% packet loss, time 999ms
    192.168.1.2 dev veth1 lladdr f6:ca:a5:2e:a1:79 DELAY
    192.168.1.3 dev veth1 lladdr b2:e4:26:21:6f:48 REACHABLE
    PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
    --- 192.168.1.2 ping statistics ---
    2 packets transmitted, 0 received, 100% packet loss, time 999ms
    192.168.1.2 dev veth1 lladdr f6:ca:a5:2e:a1:79 DELAY
    192.168.1.3 dev veth1 lladdr b2:e4:26:21:6f:48 REACHABLE
    PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
    --- 192.168.1.2 ping statistics ---
    2 packets transmitted, 0 received, 100% packet loss, time 999ms
    192.168.1.2 dev veth1 lladdr f6:ca:a5:2e:a1:79 DELAY
    192.168.1.3 dev veth1 lladdr b2:e4:26:21:6f:48 REACHABLE
    PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
    --- 192.168.1.2 ping statistics ---
    2 packets transmitted, 0 received, 100% packet loss, time 999ms
    192.168.1.2 dev veth1 lladdr f6:ca:a5:2e:a1:79 PROBE
    192.168.1.3 dev veth1 lladdr b2:e4:26:21:6f:48 REACHABLE
    PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
    --- 192.168.1.2 ping statistics ---
    2 packets transmitted, 0 received, 100% packet loss, time 999ms
    192.168.1.2 dev veth1 lladdr f6:ca:a5:2e:a1:79 PROBE
    192.168.1.3 dev veth1 lladdr b2:e4:26:21:6f:48 REACHABLE
    PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
    --- 192.168.1.2 ping statistics ---
    2 packets transmitted, 0 received, 100% packet loss, time 999ms
    192.168.1.2 dev veth1 lladdr f6:ca:a5:2e:a1:79 PROBE
    192.168.1.3 dev veth1 lladdr b2:e4:26:21:6f:48 REACHABLE
    PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
    64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=0.049 ms
    --- 192.168.1.2 ping statistics ---
    2 packets transmitted, 1 received, 50% packet loss, time 999ms
    rtt min/avg/max/mdev = 0.049/0.049/0.049/0.000 ms
    192.168.1.2 dev veth1 lladdr b2:e4:26:21:6f:48 REACHABLE   <==== mac for 192.168.1.2 updated
    192.168.1.3 dev veth1 lladdr b2:e4:26:21:6f:48 REACHABLE

For the record, we tested the new kernel in an OSP environment, and it fixed the CI issue we had experienced.
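In both runs above, the pass/fail decision was made by reading the final `ip neigh show` output (the `<====` annotations). That check could be scripted and appended to repo.sh. A minimal sketch, assuming the `<ip> dev <ifname> lladdr <mac> <STATE>` line format seen in the transcripts; the function names `neigh_lladdr` and `neigh_updated` are hypothetical and are not part of the original reproducer.

```bash
#!/bin/bash
# Print the link-layer address that an `ip neigh show` listing (read from
# stdin) holds for the IP address in $1; print nothing if there is no entry
# or the entry has no lladdr (e.g. FAILED or INCOMPLETE).
neigh_lladdr() {
    awk -v ip="$1" '$1 == ip {
        for (i = 2; i < NF; i++)
            if ($i == "lladdr") { print $(i + 1); exit }
    }'
}

# Succeed if the listing on stdin maps IP $1 to MAC $2 (i.e. the entry was
# updated after the failover); fail otherwise.
neigh_updated() {
    [ "$(neigh_lladdr "$1")" = "$2" ]
}
```

For example, the end of repo.sh could run `ip netns exec host1 ip neigh show | neigh_updated 192.168.1.2 "$(ip netns exec host3 cat /sys/class/net/veth3/address)" && echo PASS || echo FAIL`. Reading host3's MAC this way relies on `ip netns exec` mounting a sysfs instance for the target namespace. On the -675 output this check would fail (the entry still holds 16:07:5a:37:06:2c), and on the -679 output it would pass.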
The first kernel version with the backported fix for this BZ was 3.10.0-678.el7, which can be found here: http://download-node-02.eng.bos.redhat.com/brewroot/packages/kernel/3.10.0/678.el7/

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:1842
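To tell whether a given RHEL 7 host already runs a kernel with this fix, the statement that the first build carrying the backport was -678 can be turned into a small check of `uname -r`. A minimal sketch: the function name `has_neigh_confirm_fix` is hypothetical, it assumes release strings of the form `3.10.0-<build>.el7...`, and it ignores z-stream kernels, which could carry a backport under a lower build number (this report does not say whether the requested 7.3 backport was made). It returns 0 for build 678 or later, 1 for earlier builds, and 2 for strings it does not recognize.

```bash
#!/bin/bash
# Hypothetical helper: does a RHEL 7 kernel release string such as
# "3.10.0-679.el7.x86_64" come from build 678 or later, the first build
# that carried the backported fix? Z-stream backports are NOT considered.
# Returns 0 (yes), 1 (no) or 2 (unrecognized release string).
has_neigh_confirm_fix() {
    local release=$1 build
    [[ $release =~ ^3\.10\.0-([0-9]+)\. ]] || return 2
    build=${BASH_REMATCH[1]}
    (( 10#$build >= 678 ))    # force base 10 in case of leading zeros
}
```

Usage: `has_neigh_confirm_fix "$(uname -r)" && echo "fix present"`. On the two test machines above, the 3.10.0-675 kernel would return 1 and the 3.10.0-679 kernel would return 0.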