Bug 1481366
| Summary: | DNS does not resolve from within a container | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Andy MacMurray <amacmurr> |
| Component: | Installer | Assignee: | Scott Dodson <sdodson> |
| Status: | CLOSED ERRATA | QA Contact: | Gan Huang <ghuang> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 3.6.0 | CC: | amacmurr, aos-bugs, javier.ramirez, jokerman, mmccomas, sdodson, ssegal |
| Target Milestone: | --- | ||
| Target Release: | 3.10.0 | ||
| Hardware: | Unspecified | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: |
Previously, dnsmasq was configured to listen on a specific IP address in an effort to avoid binding to 127.0.0.1:53, which is where the node service runs its DNS service. dnsmasq is now configured to bind to all interfaces except lo, which ensures that it works properly on hosts with multiple interfaces.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-01-09 18:47:37 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
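To illustrate the Doc Text above, here is a toy model (not dnsmasq code) of which host addresses answer DNS queries under each policy. The function names are hypothetical, and the interface table is copied from the multi-interface host reported in the comments of this bug.

```python
def served_with_listen_address(interfaces, listen_address):
    """Old behaviour: only the single configured address is served, if the host has it."""
    all_ips = {ip for ips in interfaces.values() for ip in ips}
    return {listen_address} & all_ips


def served_except_lo(interfaces):
    """New behaviour: every address on every interface except lo is served."""
    return {ip for name, ips in interfaces.items() if name != "lo" for ip in ips}


# Interface addresses as reported by `ip addr` in this bug.
interfaces = {
    "lo": ["127.0.0.1"],
    "eth0": ["192.168.2.105"],
    "docker0": ["172.17.0.1"],
    "eth1": ["192.168.3.3"],
}

# resolv.conf on that host pointed at 192.168.2.105.
print("192.168.2.105" in served_with_listen_address(interfaces, "192.168.3.3"))
print("192.168.2.105" in served_except_lo(interfaces))
print("127.0.0.1" in served_except_lo(interfaces))
```

Under the old policy the address named in resolv.conf is not served; under the new one it is, while 127.0.0.1 stays free for the node's own DNS service.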
Andy, if you change /etc/dnsmasq.d/origin-dns.conf to contain the following and restart dnsmasq, does everything start working?
listen-address=172.17.0.61
-- Scott

Removing listen-address= from /etc/dnsmasq.d/origin-dns.conf and letting dnsmasq listen on all IPs will break skydns, because it wants localhost:53.
Setting the IP address to the correct one solves the issue, and DNS resolution works from within containers:
listen-address=172.17.0.61

Tried to reproduce with openshift-ansible-3.6.173.0.5-3.git.0.522a92a.el7.noarch.rpm
# cat inventory
<--snip-->
[nodes]
openshift-145.lab.sjc.redhat.com ansible_user=root ansible_ssh_user=root openshift_public_hostname=openshift-145.lab.sjc.redhat.com openshift_hostname=openshift-145.lab.sjc.redhat.com openshift_node_labels="{'role': 'node','registry': 'enabled','router': 'enabled'}" openshift_schedulable=true openshift_ip=192.168.2.67 openshift_dns_ip=192.168.2.67
<--snip-->
The result was that `listen-address` still pointed to the internal address.
# cat /etc/dnsmasq.d/origin-dns.conf
no-resolv
domain-needed
no-negcache
max-cache-ttl=1
enable-dbus
bind-interfaces
listen-address=192.168.2.67
QE is unable to reproduce the issue with openshift-ansible-3.6.173.0.5-3.git.0.522a92a.el7.noarch.rpm
Any tips for reproducing the bug?
Sasha, can you please assist with reproducing this bug and providing the necessary information? Thanks

The way I tested this was to just add an alias to the interface and set openshift_dns_ip to that IP address. You can add an alias like this; just pick a random subnet that's not in use:
ip address add 192.168.1.1/24 dev eth0

Thanks Scott!
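The advice to "pick a random subnet that's not in use" can be checked before running `ip address add`. A minimal sketch using Python's standard `ipaddress` module; the helper name `subnet_in_use` is hypothetical, and the address list is copied from the `ip addr` output on the QE host in this bug.

```python
import ipaddress


def subnet_in_use(candidate, existing):
    """Return True if the candidate CIDR overlaps the network of any configured address."""
    cand_net = ipaddress.ip_interface(candidate).network
    return any(
        cand_net.overlaps(ipaddress.ip_interface(addr).network)
        for addr in existing
    )


# Addresses already configured on eth0, docker0 and eth1 of the QE host.
existing = ["192.168.2.105/24", "172.17.0.1/16", "192.168.3.3/24"]

print(subnet_in_use("192.168.1.1/24", existing))  # the alias suggested in the comment
print(subnet_in_use("192.168.3.9/24", existing))  # would collide with eth1's subnet
```

This only looks at addresses on the local host; it says nothing about whether the subnet is used elsewhere on the network.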
Tested with openshift-ansible-3.6.173.0.7-2.git.0.340aa2c.el7.noarch.rpm.
The installer failed at `TASK [openshift_node : Install Node package]` because of the DNS resolution issue.
After logging in to the host, I found that DNS resolution failed:
# ping redhat.com
ping: redhat.com: Name or service not known
# cat /etc/resolv.conf
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
# Generated by NetworkManager
search openstacklocal lab.sjc.redhat.com cluster.local
nameserver 192.168.2.105
# cat /etc/dnsmasq.d/origin-dns.conf
no-resolv
domain-needed
no-negcache
max-cache-ttl=1
enable-dbus
bind-interfaces
listen-address=192.168.3.3
# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc pfifo_fast state UP qlen 1000
link/ether fa:16:3e:86:3b:ba brd ff:ff:ff:ff:ff:ff
inet 192.168.2.105/24 brd 192.168.2.255 scope global dynamic eth0
valid_lft 85955sec preferred_lft 85955sec
inet6 fe80::f816:3eff:fe86:3bba/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
link/ether 02:42:7a:8c:10:6e brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 scope global docker0
valid_lft forever preferred_lft forever
4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc pfifo_fast state UP qlen 1000
link/ether fa:16:3e:4b:98:59 brd ff:ff:ff:ff:ff:ff
inet 192.168.3.3/24 brd 192.168.3.255 scope global dynamic eth1
valid_lft 85955sec preferred_lft 85955sec
inet6 fe80::f816:3eff:fe4b:9859/64 scope link
valid_lft forever preferred_lft forever
`192.168.2.105` is my `external` IP and `192.168.3.3` is the internal IP. The default route goes via the `external` IP, which is the address used as `nameserver` in /etc/resolv.conf.
# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.2.1 0.0.0.0 UG 100 0 0 eth0
169.254.169.254 192.168.2.1 255.255.255.255 UGH 100 0 0 eth0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
192.168.2.0 0.0.0.0 255.255.255.0 U 100 0 0 eth0
192.168.3.0 0.0.0.0 255.255.255.0 U 100 0 0 eth1
In this case, it looks like we have to change the value of nameserver in /etc/resolv.conf to `openshift_dns_ip`.
Moving to assigned, please let me know if my steps are wrong. Thanks!
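The mismatch in the output above can be checked mechanically: /etc/resolv.conf names 192.168.2.105, while dnsmasq only listens on 192.168.3.3. A rough sketch with hypothetical helper names that parses the text of both files and reports nameservers that no `listen-address` covers. It models only the `nameserver` and `listen-address` lines shown in this report, not the full syntax of either file, and it assumes (as in this setup) that resolv.conf is meant to point at the node's own dnsmasq.

```python
def nameservers(resolv_conf):
    """Collect the addresses on `nameserver` lines; comment lines are skipped."""
    found = []
    for line in resolv_conf.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            found.append(parts[1])
    return found


def listen_addresses(dnsmasq_conf):
    """Collect the values of `listen-address=` lines."""
    found = []
    for line in dnsmasq_conf.splitlines():
        line = line.strip()
        if line.startswith("listen-address="):
            found.append(line.split("=", 1)[1].strip())
    return found


def unserved_nameservers(resolv_conf, dnsmasq_conf):
    """Nameservers from resolv.conf that dnsmasq is not configured to listen on."""
    listening = set(listen_addresses(dnsmasq_conf))
    return [ns for ns in nameservers(resolv_conf) if ns not in listening]


RESOLV_CONF = """\
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
# Generated by NetworkManager
search openstacklocal lab.sjc.redhat.com cluster.local
nameserver 192.168.2.105
"""

DNSMASQ_CONF = """\
no-resolv
domain-needed
no-negcache
max-cache-ttl=1
enable-dbus
bind-interfaces
listen-address=192.168.3.3
"""

print(unserved_nameservers(RESOLV_CONF, DNSMASQ_CONF))
```

With the two files from this comment, the only nameserver is reported as unserved, which matches the failed `ping redhat.com`.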
Already set openshift_dns_ip=192.168.3.3 for the node.

I install on OpenStack, just use our project OSOOS and not stock installer for this.

https://github.com/openshift/openshift-ansible/pull/5778 should fix this in 3.7

Fixed via https://github.com/openshift/openshift-ansible/pull/5953 which has already merged and been built.

Per https://bugzilla.redhat.com/show_bug.cgi?id=1491850#c7
Verified in openshift-ansible-3.7.0-0.190.0.git.0.129e91a.el7.noarch.rpm

This was fixed in the 3.7 GA release.
Description of problem:

Our provisioning solution, OSOOS, was recently updated and tested to support OCP 3.6 provisioning. During testing a DNS issue was observed and is captured here from an internal email. Also please note that I do not have specific version information. I've requested it and will update this ticket once the information is available.

- Installation of 3.6 finishes successfully, BUT when deploying a test app I noticed a problem with DNS resolution from within a container. The DNS server of the container is set to:

sh-4.2$ cat /etc/resolv.conf
nameserver 172.17.0.61
search default.svc.cluster.local svc.cluster.local cluster.local openstacklocal dbshtest.osepool.centralci.eng.rdu2.redhat.com
options ndots:5

This is correct and works the same as in 3.5. The IP address 172.17.0.61 is set in the advanced config file:

n061.o.internal openshift_node_labels="{'region': 'bagl', 'zone': 'default', 'infrarole': 'router'}" openshift_ip=172.17.0.61 openshift_hostname=n061.o.internal openshift_dns_ip=172.17.0.61

This should be no problem, as per the documentation: https://docs.openshift.com/container-platform/3.6/install_config/install/prerequisites.html (dnsmasq paragraph)

In version 3.6, dnsmasq on the host is configured with a different IP as its listening address:

[root@n061 ~]# cat /etc/dnsmasq.d/origin-dns.conf
no-resolv
domain-needed
no-negcache
max-cache-ttl=1
enable-dbus
bind-interfaces
listen-address=10.11.152.61

So nothing is listening on 172.17.0.61 (we have several IPs on the host; 10.11.152.61 is the external IP, and 172.17.0.61 is the OpenShift cluster traffic IP).

In version 3.5 with the same config, this does not happen. Here are the contents of the same dnsmasq config file on 3.5:

[root@n058 ~]# cat /etc/dnsmasq.d/origin-dns.conf
no-resolv
domain-needed
server=/cluster.local/172.22.0.1
no-negcache
max-cache-ttl=1

This makes dnsmasq on 3.5 listen on *:53, and everything works.
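The core observation in the description is that the inventory requests one DNS address while the generated dnsmasq config listens on another. A small sketch, with a hypothetical helper `host_vars`, that pulls the variables out of the inventory line quoted above using `shlex` and compares `openshift_dns_ip` with the reported `listen-address`. Real Ansible inventory parsing is more involved; this handles only the `key=value` form shown here.

```python
import shlex


def host_vars(inventory_line):
    """Split an INI-style inventory host line into (hostname, {variable: value})."""
    tokens = shlex.split(inventory_line)
    host, pairs = tokens[0], tokens[1:]
    return host, dict(pair.split("=", 1) for pair in pairs)


# Inventory line as quoted in the description.
line = (
    "n061.o.internal "
    "openshift_node_labels=\"{'region': 'bagl', 'zone': 'default', 'infrarole': 'router'}\" "
    "openshift_ip=172.17.0.61 openshift_hostname=n061.o.internal "
    "openshift_dns_ip=172.17.0.61"
)
host, hvars = host_vars(line)

# Value of listen-address in origin-dns.conf on the 3.6 host, per the description.
reported_listen_address = "10.11.152.61"

matches = hvars["openshift_dns_ip"] == reported_listen_address
print(host, hvars["openshift_dns_ip"], reported_listen_address, matches)
```

A `False` in the last column is the condition reported here: containers are told to query 172.17.0.61, but dnsmasq is bound to 10.11.152.61.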
Version-Release number of the following components:
rpm -q openshift-ansible
rpm -q ansible
ansible --version

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

Expected results:

Additional info:
Please attach logs from ansible-playbook with the -vvv flag