Description of problem:
DNS resolution fails in containers started with 'docker run'.

Version-Release number of selected component (if applicable):
openshift v3.4.0.18+ada983f
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0
Docker version 1.12.1, build fd47464-redhat
docker-common-1.12.1-6.el7.x86_64
We updated openvswitch to openvswitch-2.5.0-14.git20160727.el7fdp.x86_64.

How reproducible:
Always

Steps to Reproduce:
1. Set up an OCP env and ssh into the node.
2. Start a docker container manually:
# docker pull bmeng/hello-openshift
# docker run -td bmeng/hello-openshift
3. Log in to the docker container and check the network:
# docker exec -it <container_id> bash
# nslookup www.redhat.com

Actual results:
bash-4.3# nslookup www.redhat.com
Server:    10.240.0.45
Address 1: 10.240.0.45
nslookup: can't resolve 'www.redhat.com'

Expected results:
/ $ nslookup www.redhat.com
Server:    10.240.0.46
Address 1: 10.240.0.46 qe-dyan1-node-registry-router-2.c.openshift-gce-devel.internal
Name:      www.redhat.com
Address 1: 23.194.78.16 a23-194-78-16.deploy.static.akamaitechnologies.com
Address 2: 2600:1407:9:39a::d44
Address 3: 2600:1407:9:389::d44

Additional info:
It works when creating the container via oc create with https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/pod-for-ping.json
What does your resolv.conf look like in the container? What does it look like outside? Are you running a dnsmasq on the nodes? What does the config look like?
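One way to gather that information (the container id is a placeholder):

# docker exec -it <container_id> cat /etc/resolv.conf   <-- inside the container
# cat /etc/resolv.conf                                  <-- on the node
# systemctl status dnsmasq                              <-- is dnsmasq running on the node?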
Created attachment 1216833 [details] node-config
How did you install that cluster?
worksforme using a dind cluster. Can you provide the output of "ip a", "ip r", and "traceroute 10.240.0.45" (or whatever the DNS ends up being this time) inside the container, and "ip a", "ip r", "brctl show docker0" and "iptables-save" outside the container?
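For convenience, all of that can be collected with something like the following (substitute the actual container id and the DNS server address the container ends up with):

Inside the container:
# docker exec -it <container_id> ip a
# docker exec -it <container_id> ip r
# docker exec -it <container_id> traceroute 10.240.0.45

On the node:
# ip a
# ip r
# brctl show docker0
# iptables-save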
Created attachment 1217212 [details] config info in container and node
OK, the problem seems to be that docker's iptables rules have been deleted. systemctl status shows that docker was started at "2016-11-03 00:02:51 EDT" and iptables.service was started at "2016-11-03 00:03:31 EDT". So that's the problem; iptables.service was started after docker, and deleted its rules. In 3.3, OpenShift would always restart docker after it was started up at boot time (to change its configuration), but that no longer happens, so we depend on docker and iptables.service being started in the right order. Since iptables.service is being enabled by ansible, it seems like the right fix here is for ansible to also install an appropriate systemd unit file that will ensure that docker doesn't get started until after iptables.service does.
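A minimal sketch of such an ordering unit, as a systemd drop-in (the path and filename here are illustrative; the actual ansible change may do this differently):

# /etc/systemd/system/docker.service.d/iptables-order.conf
[Unit]
After=iptables.service
Wants=iptables.service

followed by `systemctl daemon-reload`, then a reboot to verify the boot-time ordering.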
Thanks for the workaround; it works well once docker and iptables.service are started in the right order.
https://github.com/openshift/openshift-ansible/pull/2770
Created attachment 1219297 [details] some info inside container
Created attachment 1219299 [details] iptable rules
Tested with
openshift v3.4.0.24+52fd77b
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

# docker version
Version:         1.12.3
API version:     1.24
Package version: docker-common-1.12.3-4.el7.x86_64
Go version:      go1.6.2
Git commit:      f320458-redhat
Built:           Mon Nov 7 10:15:24 2016
OS/Arch:         linux/amd64

# uname -r
3.10.0-327.36.1.el7.x86_64

The workaround does not work. Attaching some info from inside the container ( attachment 1219297 [details] ) and the iptables rules ( attachment 1219299 [details] ).
Please provide details on how the workaround was applied: for example, the systemd files applied, the order of commands executed (such as `systemctl daemon-reload`), and the output of `systemctl status iptables` and `systemctl status docker`.
According to comment 10, the workaround is to run `systemctl restart iptables` and then `systemctl restart docker`, in that order. Comment 11 confirms the workaround works, while comment 15 says it no longer does. But I can confirm this workaround still works well in my env; you could give it a try in yours.
For comment 11, I confirmed the workaround with
openshift v3.4.0.23+24b1a58
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

# docker version
Version:         1.12.3
API version:     1.24
Package version: docker-common-1.12.3-2.el7.x86_64
Go version:      go1.6.2
Git commit:      81ac282-redhat
Built:           Tue Nov 1 12:01:01 2016
OS/Arch:         linux/amd64
The workaround steps I performed:
1. After the openshift cluster is ready, restart the iptables service:
$ systemctl restart iptables
2. Then restart the docker service:
$ systemctl restart docker

No other extra steps.
This bug is blocked by 1394491.
After manually applying PR#2770, the fix works well on an OpenStack install, but still fails on an AWS install.

Digging more, I found that on an AWS install you have to open UDP port 53 on the node host to allow traffic from containers to the dnsmasq service, while on an OpenStack install you do not need to do that.
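A quick way to confirm the UDP 53 blockage is to query the node's dnsmasq directly from inside the container (the node IP here is a placeholder):

bash-4.3# nslookup www.redhat.com 172.18.11.184

If this times out before the port is opened and succeeds afterward, the firewall rule is the culprit.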
(In reply to Johnny Liu from comment #24)
> After manually applying PR#2770, the fix works well on an OpenStack
> install, but still fails on an AWS install.
>
> Digging more, I found that on an AWS install you have to open UDP port 53
> on the node host to allow traffic from containers to the dnsmasq service,
> while on an OpenStack install you do not need to do that.

Actually, an OpenStack install has the same problem (port 53 needs to be opened in iptables) too. The reason my previous test on OpenStack succeeded is that the installer had also added the office public DNS server to /etc/resolv.conf.

On an AWS install:
# cat /etc/resolv.conf
# Generated by NetworkManager
search ec2.internal
nameserver 172.18.11.184   ---> node's ip
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh

On an OpenStack install:
# cat /etc/resolv.conf
# Generated by NetworkManager
search openstacklocal lab.sjc.redhat.com
nameserver 192.168.2.9     ---> node's ip
nameserver 10.11.5.19      ---> office public ip; after removing this line, the same behavior is seen as on the AWS install
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
Run the following command on the node to open port 53, as a workaround for comment 25:

# iptables -A OS_FIREWALL_ALLOW -p udp -m state --state NEW -m udp --dport 53 -j ACCEPT
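Note that a rule added with `iptables -A` will not survive a restart of iptables.service, which reloads its rules from /etc/sysconfig/iptables. To make it persist (assuming the stock iptables-services setup), the current rules would also need to be saved, e.g.:

# iptables-save > /etc/sysconfig/iptables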
(In reply to Dongbo Yan from comment #22)
> The workaround steps I performed:
> 1. After the openshift cluster is ready, restart the iptables service:
> $ systemctl restart iptables
> 2. Then restart the docker service:
> $ systemctl restart docker
>
> No other extra steps.

If you manually restart iptables, you have to restart both docker and openshift afterward. But you shouldn't be manually restarting iptables anyway; we know how things behave when the services are *manually* restarted. This bug was about making sure that they get started in the right order at boot time. So the test should be: boot the machine, and if things work, then the services were started in the right order and the bug is fixed.
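One way to check the boot-time ordering after a reboot, rather than eyeballing `systemctl status` output, is to compare the start timestamps directly (ExecMainStartTimestamp is a standard systemd service property):

# systemctl show iptables -p ExecMainStartTimestamp
# systemctl show docker -p ExecMainStartTimestamp

iptables should show the earlier timestamp.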
Created attachment 1220860 [details] 3.4 node iptables default rules after also adding "iptables -A INPUT -i docker0 -j ACCEPT"
Ignore comment 27... the comment it was replying to was a red herring. The problem is that since docker traffic is no longer being routed through OVS, we need a rule to accept docker0 traffic.
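For reference, the rule being tested in attachment 1220860 [details], plus a quick check that docker0 traffic is actually hitting it:

# iptables -A INPUT -i docker0 -j ACCEPT
# iptables -L INPUT -v -n | grep docker0

The packet counter on the ACCEPT rule should increase when DNS lookups are run from a container.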
openshift v3.4.0.29+ca980ba
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

Verified
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0066