Bug 1414068 - 4.9.3-200 kernel causes kubernetes dns to not work
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 25
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1414468
Reported: 2017-01-17 16:30 UTC by Dusty Mabe
Modified: 2019-01-09 12:54 UTC

Fixed In Version: kernel-4.9.5-200.fc25 kernel-4.9.5-100.fc24
Doc Text:
Clone Of:
: 1414468 (view as bug list)
Environment:
Last Closed: 2017-01-24 03:20:34 UTC
Type: Bug


Attachments
journal-from-4.8.16.txt.gz (245.98 KB, application/x-gzip)
2017-01-17 16:35 UTC, Dusty Mabe
journal-from-4.9.3.txt.gz (97.16 KB, application/x-gzip)
2017-01-17 16:35 UTC, Dusty Mabe
full iptables ruleset that are not matching (8.06 KB, text/plain)
2017-01-17 23:30 UTC, Eric Paris

Description Dusty Mabe 2017-01-17 16:30:04 UTC
Description of problem:

We are seeing an issue on Fedora Atomic Host where the Kubernetes DNS addon does not route traffic from the kube-dns "service" to the backend pods that host DNS.

I narrowed this problem down to a single atomic host update from `6de4ed0b6f63a030d00e65eb986abb39f230c134785ee965ab489884c24f7fa3` to `81bb249f4257fed960510b45c088bb8aa54d2f27873be9cf94c2b7639baa6f96`. Here is the diff between the two:

```
-bash-4.3# rpm-ostree deploy 81bb249f4257fed960510b45c088bb8aa54d2f27873be9cf94c2b7639baa6f96 
Validating checksum '81bb249f4257fed960510b45c088bb8aa54d2f27873be9cf94c2b7639baa6f96'

774 metadata, 3207 content objects fetched; 223427 KiB transferred in 149 seconds
Copying /etc changes: 36 modified, 0 removed, 115 added
Transaction complete; bootconfig swap: yes deployment count change: 0
Freed objects: 217.2 MB
Changed:
  bind99-libs 9.9.9-4.P4.fc25 -> 9.9.9-4.P5.fc25
  bind99-license 9.9.9-4.P4.fc25 -> 9.9.9-4.P5.fc25
  ca-certificates 2016.2.10-1.0.fc25 -> 2017.2.11-1.0.fc25
  ceph-common 1:10.2.4-1.fc25 -> 1:10.2.4-2.fc25
  container-selinux 2:1.12.2-5.git8f1975c.fc25 -> 2:2.2-2.fc25
  docker 2:1.12.2-5.git8f1975c.fc25 -> 2:1.12.6-3.git51ef5a8.fc25
  docker-common 2:1.12.2-5.git8f1975c.fc25 -> 2:1.12.6-3.git51ef5a8.fc25
  kernel 4.8.16-300.fc25 -> 4.9.3-200.fc25
  kernel-core 4.8.16-300.fc25 -> 4.9.3-200.fc25
  kernel-modules 4.8.16-300.fc25 -> 4.9.3-200.fc25
  libcephfs1 1:10.2.4-1.fc25 -> 1:10.2.4-2.fc25
  librados2 1:10.2.4-1.fc25 -> 1:10.2.4-2.fc25
  libradosstriper1 1:10.2.4-1.fc25 -> 1:10.2.4-2.fc25
  librbd1 1:10.2.4-1.fc25 -> 1:10.2.4-2.fc25
  librgw2 1:10.2.4-1.fc25 -> 1:10.2.4-2.fc25
  pcre 8.39-6.fc25 -> 8.40-1.fc25
  python-cephfs 1:10.2.4-1.fc25 -> 1:10.2.4-2.fc25
  python-rados 1:10.2.4-1.fc25 -> 1:10.2.4-2.fc25
  python-rbd 1:10.2.4-1.fc25 -> 1:10.2.4-2.fc25
  systemd 231-10.fc25 -> 231-11.fc25
  systemd-container 231-10.fc25 -> 231-11.fc25
  systemd-libs 231-10.fc25 -> 231-11.fc25
  systemd-pam 231-10.fc25 -> 231-11.fc25
  systemd-udev 231-10.fc25 -> 231-11.fc25
  xfsprogs 4.5.0-2.fc25 -> 4.9.0-1.fc25
Removed:
  lz4-1.7.5-1.fc25.x86_64
Run "systemctl reboot" to start a reboot
```


I then narrowed this down to just an issue with the kernel by booting the 4.8.16-300.fc25 kernel on the `81bb249f4257fed960510b45c088bb8aa54d2f27873be9cf94c2b7639baa6f96` tree, which allows us to leave everything else equal. With the 4.8 kernel there is no problem. With the 4.9 kernel, kube-dns does not work.

I don't see any obvious messages in the logs that indicate what the problem is. I am hoping I can get some experts to help me narrow down the problem.


Version-Release number of selected component (if applicable):
kernel-4.9.3-200.fc25

How reproducible:
Always


Steps to Reproduce:
1. `rpm-ostree deploy 6de4ed0b6f63a030d00e65eb986abb39f230c134785ee965ab489884c24f7fa3`

2. Run the Kubernetes Ansible playbooks against the host: https://github.com/kubernetes/contrib/tree/master/ansible

3. Verify kube-dns works: https://github.com/kubernetes/kubernetes/tree/release-1.2/cluster/addons/dns#how-do-i-test-if-it-is-working

4. Deploy the next commit in the sequence: `rpm-ostree deploy 81bb249f4257fed960510b45c088bb8aa54d2f27873be9cf94c2b7639baa6f96`

5. Reboot and verify DNS is not working.


Additional info:

Please find me (dustymabe in #fedora-cloud or #atomic on freenode) to discuss details.

Comment 1 Dusty Mabe 2017-01-17 16:35:18 UTC
Created attachment 1241888
journal-from-4.8.16.txt.gz

Comment 2 Dusty Mabe 2017-01-17 16:35:51 UTC
Created attachment 1241889
journal-from-4.9.3.txt.gz

Comment 3 Dusty Mabe 2017-01-17 18:25:15 UTC
Note that this system has selinux disabled because of https://bugzilla.redhat.com/show_bug.cgi?id=1414096. Jason Brooks has confirmed that the same behavior happens on a system with newer kubernetes (1.5 with a fix for the selinux issue) and selinux enforcing.

Comment 4 Jason Brooks 2017-01-17 18:48:57 UTC
I tested on f25 w/ the 4.10.0-0.rc4.git0.1.fc26.x86_64 kernel and kube 1.5.1, and kube-dns works as expected, w/ selinux enforcing.

Comment 5 Eric Paris 2017-01-17 23:28:38 UTC
So this was fun and definitely a kernel regression of some sort. I'll attach the iptables rules of the node. They are exactly (modulo the generated chain names) the same between a working 4.8 and a broken 4.9 kernel.

In the iptables rules I'm about to attach we have one container with IP address 172.16.35.2. I run 'dig' from that container.

We have another container, 172.16.35.3. It is running DNS.

We have a completely virtual IP address/UDP port, 10.254.0.10:53. Any traffic to the virtual IP/port should get DNAT'd to 172.16.35.3:53.

On a 4.9 kernel, listening on the host with `tcpdump -i any`, I see:

17:22:24.273178 IP 172.16.35.2.49994 > 10.254.0.10.domain: 46023+ [1au] A? www.google.com. (43)

Basically I see traffic from the 'dig' to the virtual ip/port. Nothing else.

On a 4.8 kernel with the exact same setup and iptables rules, again with `tcpdump -i any` I see:

18:21:25.949497 IP 172.16.35.2.42645 > 10.254.0.10.domain: 54717+ [1au] A? www.google.com. (43)
18:21:25.949565 IP 172.16.35.2.42645 > 172.16.35.3.domain: 54717+ [1au] A? www.google.com. (43)
18:21:25.954133 IP 172.16.35.3.domain > 172.16.35.2.42645: 54717 1/0/1 A 216.58.219.68 (59)
18:21:25.954147 IP 10.254.0.10.domain > 172.16.35.2.42645: 54717 1/0/1 A 216.58.219.68 (59)

Which is what we'd expect. I see the client->vip packet. Then I see a second packet that has been DNAT'd to the real destination. I see the return from the real destination and the reversal of the DNAT.
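
The difference between the two captures can be checked mechanically. The following is only a rough sketch, not something used in this investigation: `classify_capture` is a hypothetical helper name, and it assumes the capture is plain `tcpdump` text output where the destination is the fifth field and port 53 is printed either as `domain` or, with `-n`, as `53`.

```shell
#!/bin/sh
# classify_capture VIP BACKEND   (hypothetical helper, not part of any tool)
# Reads tcpdump text output on stdin and prints one word:
#   dnat      a DNS query to VIP and a rewritten query to BACKEND were both seen
#   no-dnat   a DNS query to VIP was seen, but none to BACKEND
#   no-query  no DNS query to VIP was seen at all
classify_capture() {
    awk -v vip="$1" -v backend="$2" '
        # True when the destination field is "<ip>.domain:" or "<ip>.53:".
        function is_dns_to(dst, ip) {
            return dst == (ip ".domain:") || dst == (ip ".53:")
        }
        $2 == "IP" && is_dns_to($5, vip)     { to_vip = 1 }
        $2 == "IP" && is_dns_to($5, backend) { to_backend = 1 }
        END {
            if (to_vip && to_backend) print "dnat"
            else if (to_vip)          print "no-dnat"
            else                      print "no-query"
        }
    '
}
```

Saving a capture to a file and running `classify_capture 10.254.0.10 172.16.35.3 < capture.txt` (172.16.35.3 being the address that answers in the 4.8 capture) should report `dnat` for the 4.8 output above and `no-dnat` for the 4.9 output.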


An interesting thing I noticed when playing with tcpdump is that the host sees the first packet coming from somewhere different in 4.8 vs 4.9.  In 4.8 I can do:

`tcpdump -i docker0` and I see all 4 (expected) packets.

In 4.9 listening only on docker0 shows NO traffic at all. Instead in 4.9 I can only see the single packet using:

`tcpdump -i vethca4159a`

docker0 is a linux bridge:

# brctl show
bridge name	bridge id		STP enabled	interfaces
docker0		8000.0242cb34b484	no		veth8ebe5b8
							vethca4159a



It is as if on 4.9 the frames are not coming off of the bridge and instead are coming directly off the veth and the packets are not going through iptables. I can relatively easily set up a reproducer for you or give you root access to a VM that reproduces the issue.
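
To repeat the per-interface comparison on another node, it helps to list the veth devices attached to the bridge and capture on each in turn. This is a minimal sketch under stated assumptions: `bridge_members` is a hypothetical helper name, and it assumes the `brctl show` layout shown above (one header line, the first member on the bridge's own line, further members on continuation lines).

```shell
#!/bin/sh
# bridge_members BRIDGE   (hypothetical helper, not part of bridge-utils)
# Reads `brctl show` output on stdin and prints the interfaces attached to
# BRIDGE, one per line.
bridge_members() {
    awk -v want="$1" '
        NR == 1 { next }                 # skip the column header line
        NF >= 3 {                        # a line that starts a new bridge
            current = $1
            if (NF >= 4 && current == want) print $4
            next
        }
        # a continuation line carries only one more interface name
        NF == 1 && current == want { print $1 }
    '
}
```

For the output above, `brctl show | bridge_members docker0` should print `veth8ebe5b8` and `vethca4159a`, each of which can then be passed to `tcpdump -i` and compared with what `tcpdump -i docker0` shows.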

Comment 6 Eric Paris 2017-01-17 23:30:32 UTC
Created attachment 1241980
full iptables ruleset that are not matching

Comment 7 Eric Paris 2017-01-17 23:37:18 UTC
I confirm that 4.10.0-0.rc4.git0.1.fc26.x86_64 is working for me. Packets are showing up on docker0 and are having iptables rules applied...

Comment 8 Jason Brooks 2017-01-18 00:18:22 UTC
I'm also having success w/ this 4.9.4-202.rhbz1414068.fc25.x86_64 kernel: https://koji.fedoraproject.org/koji/taskinfo?taskID=17316113

Comment 9 Laura Abbott 2017-01-18 01:57:37 UTC
I'll commit the fix to the repository. This should show up in the 4.9.5 kernel or another 4.9.4 build if that happens for some reason.

Comment 10 Fedora Update System 2017-01-20 22:08:05 UTC
kernel-4.9.5-200.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2017-e6012e74b6

Comment 11 Fedora Update System 2017-01-20 22:10:24 UTC
kernel-4.9.5-100.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2017-18ce368ba3

Comment 12 Fedora Update System 2017-01-21 21:52:54 UTC
kernel-4.9.5-100.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-18ce368ba3

Comment 13 Fedora Update System 2017-01-21 22:25:52 UTC
kernel-4.9.5-200.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-e6012e74b6

Comment 14 Fedora Update System 2017-01-24 03:20:34 UTC
kernel-4.9.5-200.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.

Comment 15 Fedora Update System 2017-01-24 03:48:42 UTC
kernel-4.9.5-100.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.
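
For anyone checking whether a host already runs a kernel at or past the fixed version, here is a small sketch. `kernel_at_least_fixed` is a hypothetical helper name, and it assumes `sort -V` (version sort, as in GNU coreutils) is available. It only compares the upstream version against 4.9.5, so it says nothing about test builds such as the 4.9.4-202.rhbz1414068 kernel, and a "no" does not by itself mean a kernel is broken (4.8.16 did not show the problem).

```shell
#!/bin/sh
# kernel_at_least_fixed RELEASE   (hypothetical helper)
# Exit status 0 if the upstream version in RELEASE (for example the output
# of `uname -r`) is 4.9.5 or later, 1 otherwise.
# Assumption: `sort -V` is available.
kernel_at_least_fixed() {
    fixed="4.9.5"
    upstream="${1%%-*}"    # "4.9.3-200.fc25.x86_64" -> "4.9.3"
    # The smaller of the two versions sorts first; if that is the fixed
    # version, the given kernel is at least as new.
    lowest=$(printf '%s\n' "$fixed" "$upstream" | sort -V | head -n 1)
    [ "$lowest" = "$fixed" ]
}
```

Usage would look like `kernel_at_least_fixed "$(uname -r)" && echo "4.9.5 or later"`.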

