Bug 1450203
| Summary: | Irrelevant upper layer protocol traffic may erroneously "confirm" neigh entries | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Ihar Hrachyshka <ihrachys> |
| Component: | kernel | Assignee: | Lance Richardson <lrichard> |
| kernel sub component: | arp/icmp | QA Contact: | Jianlin Shi <jishi> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | aloughla, atragler, dgilbert, ealcaniz, ihrachys, jiji, lmiksik, lrichard, oblaut, sukulkar |
| Version: | 7.3 | ||
| Target Milestone: | rc | ||
| Target Release: | 7.4 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-08-02 07:31:13 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1438662 | ||
Description
Ihar Hrachyshka
2017-05-11 19:38:46 UTC
(In reply to Ihar Hrachyshka from comment #0)
> Final note: this situation may render IP address roaming/failover not
> effective "thanks" to a specially crafted traffic that happen to arrive the
> same node. Which begs the question whether there is a reason to track the
> bug as security related. That's why I am leaving the bug report closed from
> public for now.

I don't think this makes things any worse. I think the attacker would need to use address spoofing to execute such an attack, and the effect would be similar to that of other address spoofing attacks.

No, you don't need to spoof anything to break connectivity between a node X and a service address IP1. Just make sure that there is traffic to node X that makes it confirm the ARP entry for IP1 over and over (that seems to be any traffic from the same subnet on the same L2 domain to node X), and then wait for a failover of IP1 to occur. Once it occurs, X can never get out of the confirmation loop and restore connectivity to IP1.

It affects OpenStack CI and customers. We would like to ask for the series to be backported to 7.3 if possible.

Marking the bug as a blocker for 7.4. The rationale is as follows:

1. The bug affects OpenStack CI. We patched the OpenStack Neutron L3 agent a bit to reduce the risk of failure, but it is not a complete solution.
2. The bug affects production environments (see attached customer case) that do not even rely on Neutron L3 agent gratuitous ARPs.

The effect of the bug is that connectivity between two nodes on the same network segment may break and stay broken indefinitely, which is a big deal, and one may even argue that it deserves special security consideration.

I understand we are late in the 7.4 timeframe. To explain why the bug pops up only now: several months ago we set up and started debugging OpenStack CI jobs that could trigger the failure mode, and it took a lot of digging to realize that this is a kernel issue and not an OpenStack one. (Actually, it is a set of issues that, combined, render the CI jobs totally broken.) Ideally we would see the fix in 7.3 too, but 7.4 would at least be a good start.

I understand that the patch series has significant impact, so we need to be cautious about cost versus benefit. That being said, production environments not breaking on IP failover sounds like a significant benefit to me.

Posted: http://post-office.corp.redhat.com/archives/rhkernel-list/2017-May/msg01774.html

Corresponding Brew build: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=13231449

There seems to be a decision that the bug does not justify special handling security-wise. For this reason, I am opening the description and relevant comments to the public.

set qa_ack based on comment 11

Patch(es) committed on kernel repository and an interim kernel build is undergoing testing

Patch(es) available on kernel-3.10.0-678.el7

Ofer, please advise on how we can test the new kernel in scope of OSP11 CI.

reproducer:
[root@ibm-x3650m4-04 arp_test]# cat repo.sh
#!/bin/bash
# Three network namespaces stand in for three hosts on one L2 segment.
ip netns add host1
ip netns add host2
ip netns add host3
# Bridge br0 connects the hosts through one veth pair each.
brctl addbr br0
ip link add veth1 type veth peer name veth1_br
ip link add veth2 type veth peer name veth2_br
ip link add veth3 type veth peer name veth3_br
ip link set veth1 netns host1
ip link set veth2 netns host2
ip link set veth3 netns host3
brctl addif br0 veth1_br
brctl addif br0 veth2_br
brctl addif br0 veth3_br
# Bring up the interfaces and assign IPv4/IPv6 addresses in each namespace.
ip netns exec host1 ip link set lo up
ip netns exec host1 ip link set veth1 up
ip netns exec host1 ip addr add 192.168.1.1/24 dev veth1
ip netns exec host1 ip addr add 2000::1/64 dev veth1
ip netns exec host2 ip link set lo up
ip netns exec host2 ip link set veth2 up
ip netns exec host2 ip addr add 192.168.1.2/24 dev veth2
ip netns exec host2 ip addr add 2000::2/64 dev veth2
ip netns exec host3 ip link set lo up
ip netns exec host3 ip link set veth3 up
ip netns exec host3 ip addr add 192.168.1.3/24 dev veth3
ip netns exec host3 ip addr add 2000::3/64 dev veth3
ip link set br0 up
ip link set veth1_br up
ip link set veth2_br up
ip link set veth3_br up
# Keep a continuous TCP stream running from host1 to host3 (the unrelated upper-layer traffic).
ip netns exec host3 nc -l -k 10010 &
sleep 1
ip netns exec host1 taskset --cpu-list 0 nc 192.168.1.3 10010 < /dev/zero &
sleep 5
echo "host3 neigh setup"
ip netns exec host1 ip neigh show
# Connect from host1 to host2 so that host1 learns host2's MAC for 192.168.1.2.
ip netns exec host2 nc -l 10010 &
sleep 1
ip netns exec host1 taskset --cpu-list 0 nc 192.168.1.2 10010 &
sleep 5
echo "host2 neigh setup"
ip netns exec host1 ip neigh show
echo "down host2 and change ip to host3"
# Simulate failover: take host2 down and move 192.168.1.2 to host3.
ip netns exec host2 ip link set veth2 down
ip netns exec host3 ip addr add 192.168.1.2/24 dev veth3
ip netns exec host3 ip addr sh
# Wait long enough for host1's entry for 192.168.1.2 to become STALE.
sleep 60
echo "host2 neigh stale"
ip netns exec host1 ip neigh show
echo "send packet to host2 on host1"
#ip netns exec host1 taskset --cpu-list 1 nc 192.168.1.2 10010 &
ip netns exec host1 taskset --cpu-list 0 ping 192.168.1.2 -c 1 -W 1 -w 1
sleep 1
# Ping the moved address repeatedly and print host1's neighbour table each time.
i=0
while [ $i -lt 15 ]
do
ip netns exec host1 ip neigh show
echo ""
ip netns exec host1 taskset --cpu-list 0 ping 192.168.1.2 -c 1 -W 1 -w 1
let i+=1
done
# Stop the background nc processes.
jobs -p | xargs kill -9
killall -9 nc
reproduced on 3.10.0-675:
[root@ibm-x3650m4-04 arp_test]# uname -a
Linux ibm-x3650m4-04.rhts.eng.pek2.redhat.com 3.10.0-675.el7.x86_64 #1 SMP Mon May 29 23:22:32 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@ibm-x3650m4-04 arp_test]# ./repo.sh
host3 neigh setup
192.168.1.3 dev veth1 lladdr c6:1d:84:d1:09:54 REACHABLE
host2 neigh setup
192.168.1.2 dev veth1 lladdr 16:07:5a:37:06:2c REACHABLE
192.168.1.3 dev veth1 lladdr c6:1d:84:d1:09:54 REACHABLE
down host2 and change ip to host3
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
13: veth3@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
link/ether c6:1d:84:d1:09:54 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 192.168.1.3/24 scope global veth3
valid_lft forever preferred_lft forever
inet 192.168.1.2/24 scope global secondary veth3
valid_lft forever preferred_lft forever
inet6 2000::3/64 scope global
valid_lft forever preferred_lft forever
inet6 fe80::c41d:84ff:fed1:954/64 scope link
valid_lft forever preferred_lft forever
host2 neigh stale
192.168.1.2 dev veth1 lladdr 16:07:5a:37:06:2c STALE
192.168.1.3 dev veth1 lladdr c6:1d:84:d1:09:54 REACHABLE
send packet to host2 on host1
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
--- 192.168.1.2 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms
192.168.1.2 dev veth1 lladdr 16:07:5a:37:06:2c DELAY
192.168.1.3 dev veth1 lladdr c6:1d:84:d1:09:54 REACHABLE
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
--- 192.168.1.2 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
192.168.1.2 dev veth1 lladdr 16:07:5a:37:06:2c DELAY
192.168.1.3 dev veth1 lladdr c6:1d:84:d1:09:54 REACHABLE
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
--- 192.168.1.2 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms
192.168.1.2 dev veth1 lladdr 16:07:5a:37:06:2c DELAY
192.168.1.3 dev veth1 lladdr c6:1d:84:d1:09:54 REACHABLE
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
--- 192.168.1.2 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms
192.168.1.2 dev veth1 lladdr 16:07:5a:37:06:2c REACHABLE
192.168.1.3 dev veth1 lladdr c6:1d:84:d1:09:54 REACHABLE
<===== mac for 192.168.1.2 not updated
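The DELAY to REACHABLE transition with an unchanged lladdr in the log above can be pictured with a toy model of the neighbour state transitions. This is a simplified sketch for illustration, not kernel code: the function names `nud_next` and `run` and the event names `send`, `confirm` and `timeout` are invented here, and real neighbour handling has more states and timers.

```shell
#!/bin/bash
# Toy model of neighbour (NUD) state transitions; NOT kernel code.
# $1: current state, $2: event (send | confirm | timeout)
nud_next() {
    case "$1:$2" in
        STALE:send)        echo DELAY ;;      # first packet sent to a stale entry
        DELAY:confirm)     echo REACHABLE ;;  # upper layer reports forward progress
        DELAY:timeout)     echo PROBE ;;      # no confirmation: start probing
        PROBE:confirm)     echo REACHABLE ;;
        PROBE:timeout)     echo FAILED ;;     # probes unanswered: entry can be re-resolved
        REACHABLE:timeout) echo STALE ;;
        *)                 echo "$1" ;;       # any other event leaves the state unchanged
    esac
}

# Apply a sequence of events to a starting state and print the final state.
run() {
    local state="$1"; shift
    local ev
    for ev in "$@"; do
        state=$(nud_next "$state" "$ev")
    done
    echo "$state"
}

# Unrelated traffic keeps "confirming" the entry, so it is never probed.
run STALE send confirm timeout send confirm
# Without bogus confirmations the entry is probed and eventually fails.
run STALE send timeout timeout
```

In this model the first sequence ends in REACHABLE with the old address still in place, while the second reaches FAILED, after which a fresh resolution can pick up the new MAC.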
Verified on 3.10.0-679:
[root@ibm-x3650m4-04 arp_test]# uname -a
Linux ibm-x3650m4-04.rhts.eng.pek2.redhat.com 3.10.0-679.el7.x86_64 #1 SMP Mon Jun 5 23:13:08 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@ibm-x3650m4-04 arp_test]# ./repo.sh
host3 neigh setup
192.168.1.3 dev veth1 lladdr b2:e4:26:21:6f:48 REACHABLE
host2 neigh setup
192.168.1.2 dev veth1 lladdr f6:ca:a5:2e:a1:79 REACHABLE
192.168.1.3 dev veth1 lladdr b2:e4:26:21:6f:48 REACHABLE
down host2 and change ip to host3
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
83: veth3@if82: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
link/ether b2:e4:26:21:6f:48 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 192.168.1.3/24 scope global veth3
valid_lft forever preferred_lft forever
inet 192.168.1.2/24 scope global secondary veth3
valid_lft forever preferred_lft forever
inet6 2000::3/64 scope global
valid_lft forever preferred_lft forever
inet6 fe80::b0e4:26ff:fe21:6f48/64 scope link
valid_lft forever preferred_lft forever
host2 neigh stale
192.168.1.2 dev veth1 lladdr f6:ca:a5:2e:a1:79 STALE
192.168.1.3 dev veth1 lladdr b2:e4:26:21:6f:48 REACHABLE
send packet to host2 on host1
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
--- 192.168.1.2 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms
192.168.1.2 dev veth1 lladdr f6:ca:a5:2e:a1:79 DELAY
192.168.1.3 dev veth1 lladdr b2:e4:26:21:6f:48 REACHABLE
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
--- 192.168.1.2 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms
192.168.1.2 dev veth1 lladdr f6:ca:a5:2e:a1:79 DELAY
192.168.1.3 dev veth1 lladdr b2:e4:26:21:6f:48 REACHABLE
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
--- 192.168.1.2 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms
192.168.1.2 dev veth1 lladdr f6:ca:a5:2e:a1:79 DELAY
192.168.1.3 dev veth1 lladdr b2:e4:26:21:6f:48 REACHABLE
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
--- 192.168.1.2 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms
192.168.1.2 dev veth1 lladdr f6:ca:a5:2e:a1:79 PROBE
192.168.1.3 dev veth1 lladdr b2:e4:26:21:6f:48 REACHABLE
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
--- 192.168.1.2 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms
192.168.1.2 dev veth1 lladdr f6:ca:a5:2e:a1:79 PROBE
192.168.1.3 dev veth1 lladdr b2:e4:26:21:6f:48 REACHABLE
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
--- 192.168.1.2 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms
192.168.1.2 dev veth1 lladdr f6:ca:a5:2e:a1:79 PROBE
192.168.1.3 dev veth1 lladdr b2:e4:26:21:6f:48 REACHABLE
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=0.049 ms
--- 192.168.1.2 ping statistics ---
2 packets transmitted, 1 received, 50% packet loss, time 999ms
rtt min/avg/max/mdev = 0.049/0.049/0.049/0.000 ms
192.168.1.2 dev veth1 lladdr b2:e4:26:21:6f:48 REACHABLE
192.168.1.3 dev veth1 lladdr b2:e4:26:21:6f:48 REACHABLE
<==== mac for 192.168.1.2 updated
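To check the result without reading the table by eye, the lladdr for an address can be extracted from `ip neigh show` output and compared. A minimal sketch: `neigh_lladdr` is a hypothetical helper that is not part of the reproducer, and the sample text is copied from the 3.10.0-679 log above.

```shell
#!/bin/bash
# Hypothetical helper: print the link-layer address recorded for IP $1
# in `ip neigh show` output read from stdin.
neigh_lladdr() {
    local ip="$1"
    awk -v ip="$ip" '
        $1 == ip {
            for (i = 2; i < NF; i++)
                if ($i == "lladdr") { print $(i + 1); exit }
        }'
}

# Neighbour table of host1 after failover, taken from the verified run.
sample='192.168.1.2 dev veth1 lladdr b2:e4:26:21:6f:48 REACHABLE
192.168.1.3 dev veth1 lladdr b2:e4:26:21:6f:48 REACHABLE'

# After failover both addresses live on host3, so both should map to one MAC.
moved=$(printf '%s\n' "$sample" | neigh_lladdr 192.168.1.2)
owner=$(printf '%s\n' "$sample" | neigh_lladdr 192.168.1.3)
if [ -n "$moved" ] && [ "$moved" = "$owner" ]; then
    echo "failover learned"
else
    echo "stale lladdr"
fi
```

On a live setup the same function could read from `ip netns exec host1 ip neigh show` instead of the captured sample.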
For the record, we tested the new kernel in an OSP environment, and it fixed the CI issue we experienced.

The first kernel version with the backported fix for this BZ was -678, which can be found here: http://download-node-02.eng.bos.redhat.com/brewroot/packages/kernel/3.10.0/678.el7/

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:1842