Description of problem:

We were seeing a weird issue on Fedora Atomic Host where the Kubernetes DNS addon was not properly routing traffic from the kube-dns "service" to the backend pods that host DNS. I narrowed the problem down to a single Atomic Host update, from `6de4ed0b6f63a030d00e65eb986abb39f230c134785ee965ab489884c24f7fa3` to `81bb249f4257fed960510b45c088bb8aa54d2f27873be9cf94c2b7639baa6f96`. Here is the diff between the two:

```
-bash-4.3# rpm-ostree deploy 81bb249f4257fed960510b45c088bb8aa54d2f27873be9cf94c2b7639baa6f96
Validating checksum '81bb249f4257fed960510b45c088bb8aa54d2f27873be9cf94c2b7639baa6f96'
774 metadata, 3207 content objects fetched; 223427 KiB transferred in 149 seconds
Copying /etc changes: 36 modified, 0 removed, 115 added
Transaction complete; bootconfig swap: yes deployment count change: 0
Freed objects: 217.2 MB
Changed:
  bind99-libs 9.9.9-4.P4.fc25 -> 9.9.9-4.P5.fc25
  bind99-license 9.9.9-4.P4.fc25 -> 9.9.9-4.P5.fc25
  ca-certificates 2016.2.10-1.0.fc25 -> 2017.2.11-1.0.fc25
  ceph-common 1:10.2.4-1.fc25 -> 1:10.2.4-2.fc25
  container-selinux 2:1.12.2-5.git8f1975c.fc25 -> 2:2.2-2.fc25
  docker 2:1.12.2-5.git8f1975c.fc25 -> 2:1.12.6-3.git51ef5a8.fc25
  docker-common 2:1.12.2-5.git8f1975c.fc25 -> 2:1.12.6-3.git51ef5a8.fc25
  kernel 4.8.16-300.fc25 -> 4.9.3-200.fc25
  kernel-core 4.8.16-300.fc25 -> 4.9.3-200.fc25
  kernel-modules 4.8.16-300.fc25 -> 4.9.3-200.fc25
  libcephfs1 1:10.2.4-1.fc25 -> 1:10.2.4-2.fc25
  librados2 1:10.2.4-1.fc25 -> 1:10.2.4-2.fc25
  libradosstriper1 1:10.2.4-1.fc25 -> 1:10.2.4-2.fc25
  librbd1 1:10.2.4-1.fc25 -> 1:10.2.4-2.fc25
  librgw2 1:10.2.4-1.fc25 -> 1:10.2.4-2.fc25
  pcre 8.39-6.fc25 -> 8.40-1.fc25
  python-cephfs 1:10.2.4-1.fc25 -> 1:10.2.4-2.fc25
  python-rados 1:10.2.4-1.fc25 -> 1:10.2.4-2.fc25
  python-rbd 1:10.2.4-1.fc25 -> 1:10.2.4-2.fc25
  systemd 231-10.fc25 -> 231-11.fc25
  systemd-container 231-10.fc25 -> 231-11.fc25
  systemd-libs 231-10.fc25 -> 231-11.fc25
  systemd-pam 231-10.fc25 -> 231-11.fc25
  systemd-udev 231-10.fc25 -> 231-11.fc25
  xfsprogs 4.5.0-2.fc25 -> 4.9.0-1.fc25
Removed:
  lz4-1.7.5-1.fc25.x86_64
Run "systemctl reboot" to start a reboot
```

I then narrowed this down to just the kernel by booting the 4.8.16-300.fc25 kernel on the `81bb249f4257fed960510b45c088bb8aa54d2f27873be9cf94c2b7639baa6f96` tree, which leaves everything else equal. With the 4.8 kernel there is no problem; with the 4.9 kernel, kube-dns does not work. I don't see any obvious messages in the logs that indicate what the problem is, so I am hoping some experts can help me narrow it down.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. `rpm-ostree deploy 6de4ed0b6f63a030d00e65eb986abb39f230c134785ee965ab489884c24f7fa3`
2. Run the Kubernetes Ansible playbooks against the host: https://github.com/kubernetes/contrib/tree/master/ansible
3. Verify kube-dns works (a sketch of this check follows at the end of this comment): https://github.com/kubernetes/kubernetes/tree/release-1.2/cluster/addons/dns#how-do-i-test-if-it-is-working
4. Deploy the next commit in the sequence: `rpm-ostree deploy 81bb249f4257fed960510b45c088bb8aa54d2f27873be9cf94c2b7639baa6f96`
5. Reboot and verify DNS is no longer working.

Additional info:
Please find me (dustymabe in #fedora-cloud or #atomic on freenode) to discuss details.
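For step 3, the upstream doc linked above boils down to running a throwaway busybox pod and doing an nslookup against the cluster DNS. The pod name and image below are the doc's example, not anything specific to this cluster; treat this as a sketch of the check rather than part of the original reproduction steps.

```
# Sketch of the kube-dns check from the doc linked in step 3.
# The "busybox" pod name/image come from that doc, not from this cluster.
cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - name: busybox
    image: busybox
    command: ["sleep", "3600"]
EOF

# On a healthy node this resolves via the kube-dns service IP (10.254.0.10 here);
# on the broken 4.9 kernel the query times out.
kubectl exec busybox -- nslookup kubernetes.default
```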
Created attachment 1241888 [details] journal-from-4.8.16.txt.gz
Created attachment 1241889 [details] journal-from-4.9.3.txt.gz
Note that this system has SELinux disabled because of https://bugzilla.redhat.com/show_bug.cgi?id=1414096. Jason Brooks has confirmed that the same behavior occurs on a system running newer Kubernetes (1.5, with a fix for the SELinux issue) with SELinux enforcing.
I tested on F25 with the 4.10.0-0.rc4.git0.1.fc26.x86_64 kernel and Kubernetes 1.5.1, and kube-dns works as expected with SELinux enforcing.
So this was fun, and definitely a kernel regression of some sort. I'll attach the iptables rules from the node; they are exactly the same (modulo the generated chain names) between a working 4.8 and a broken 4.9 kernel.

In the iptables rules I'm about to attach we have one container with IP address 172.16.35.2, from which I run 'dig'. We have another container, 172.16.35.3, which is running DNS. We have a completely virtual IP address/UDP port, 10.254.0.10:53. Any traffic to that virtual IP/port should get DNAT'd to 172.16.35.3:53.

On a 4.9 kernel, listening on the host with `tcpdump -i any`, I see:

17:22:24.273178 IP 172.16.35.2.49994 > 10.254.0.10.domain: 46023+ [1au] A? www.google.com. (43)

Basically I see the traffic from 'dig' to the virtual IP/port, and nothing else.

On a 4.8 kernel with the exact same setup and iptables rules, again with `tcpdump -i any`, I see:

18:21:25.949497 IP 172.16.35.2.42645 > 10.254.0.10.domain: 54717+ [1au] A? www.google.com. (43)
18:21:25.949565 IP 172.16.35.2.42645 > 172.16.35.3.domain: 54717+ [1au] A? www.google.com. (43)
18:21:25.954133 IP 172.16.35.3.domain > 172.16.35.2.42645: 54717 1/0/1 A 216.58.219.68 (59)
18:21:25.954147 IP 10.254.0.10.domain > 172.16.35.2.42645: 54717 1/0/1 A 216.58.219.68 (59)

which is what we'd expect: the client -> VIP packet, a second copy of the packet DNAT'd to the real destination, the reply from the real destination, and the reversal of the DNAT.

An interesting thing I noticed while playing with tcpdump is that the host sees the first packet arriving in a different place on 4.8 vs. 4.9. On 4.8 I can run `tcpdump -i docker0` and see all four expected packets. On 4.9, listening only on docker0 shows NO traffic at all; instead I can only see the single packet with `tcpdump -i vethca4159a`.

docker0 is a Linux bridge:

# brctl show
bridge name     bridge id               STP enabled     interfaces
docker0         8000.0242cb34b484       no              veth8ebe5b8
                                                        vethca4159a

It is as if on 4.9 the frames are not coming off of the bridge: they show up only on the veth directly, and the packets never go through iptables. I can fairly easily set up a reproducer for you, or give you root access to a VM that reproduces the issue.
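Not part of the original debugging session, but since the symptom is bridged frames that never hit iptables, one quick thing worth checking on an affected host is the bridge netfilter state (the br_netfilter module and the net.bridge.bridge-nf-call-iptables sysctl, which control whether bridged traffic is handed to iptables at all). This is only a sketch of a check, not a confirmed explanation of the 4.9 regression:

```
# Sketch only: verify whether bridged traffic is being passed to iptables.
# (br_netfilter and these sysctls are standard; whether they explain this
# particular 4.9 regression is an assumption, not a confirmed finding.)
lsmod | grep br_netfilter || sudo modprobe br_netfilter
sysctl net.bridge.bridge-nf-call-iptables    # should be 1 for DNAT of bridged traffic
sysctl net.bridge.bridge-nf-call-ip6tables

# Watch the NAT PREROUTING counters while running dig from the client container;
# on a working kernel the DNAT rule counters should increment.
sudo iptables -t nat -L PREROUTING -v -n
```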
Created attachment 1241980 [details] full iptables ruleset that is not being matched
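For readers not opening the attachment: the ruleset is generated by kube-proxy, and the part that matters here is the DNAT from the service VIP to the DNS pod. The rule below is a hand-simplified illustration using the addresses from the previous comment, not a line from the attached ruleset (the real rules use generated KUBE-* chain names):

```
# Illustrative only -- a simplified equivalent of what kube-proxy programs,
# not an excerpt from the attachment.
iptables -t nat -A PREROUTING -d 10.254.0.10/32 -p udp -m udp --dport 53 \
         -j DNAT --to-destination 172.16.35.3:53
```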
I confirm that 4.10.0-0.rc4.git0.1.fc26.x86_64 is working for me. Packets are showing up on docker0 and the iptables rules are being applied...
I'm also having success with this 4.9.4-202.rhbz1414068.fc25.x86_64 kernel: https://koji.fedoraproject.org/koji/taskinfo?taskID=17316113
I'll commit the fix to the repository. It should show up in the 4.9.5 kernel, or in another 4.9.4 build if one happens for some reason.
kernel-4.9.5-200.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2017-e6012e74b6
kernel-4.9.5-100.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2017-18ce368ba3
kernel-4.9.5-100.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-18ce368ba3
kernel-4.9.5-200.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-e6012e74b6
kernel-4.9.5-200.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.
kernel-4.9.5-100.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.