Bug 1481366 - DNS does not resolve from within a container
Summary: DNS does not resolve from within a container
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.6.0
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.10.0
Assignee: Scott Dodson
QA Contact: Gan Huang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-08-14 18:17 UTC by Andy MacMurray
Modified: 2018-01-09 18:47 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, dnsmasq was configured to listen on a specific IP address in an effort to avoid binding to 127.0.0.1:53, which is where the node service runs its DNS service. dnsmasq is now configured to bind to all interfaces except lo, which ensures that it works properly on hosts with multiple interfaces.
Clone Of:
Environment:
Last Closed: 2018-01-09 18:47:37 UTC



Description Andy MacMurray 2017-08-14 18:17:25 UTC
Description of problem:

Our provisioning solution, OSOOS, was recently updated and tested to support OCP 3.6 provisioning. During testing, a DNS issue was observed; it is captured here from an internal email.

Please also note that I do not have specific version information. I've requested it and will update this ticket once the information is available.

-

Installation of 3.6 finishes successfully, BUT when deploying a test app I noticed a problem with DNS resolution from within the container.
The container's DNS server is set to:

sh-4.2$ cat /etc/resolv.conf
nameserver 172.17.0.61
search default.svc.cluster.local svc.cluster.local cluster.local openstacklocal dbshtest.osepool.centralci.eng.rdu2.redhat.com
options ndots:5

This is correct and is the same as with 3.5. The IP address 172.17.0.61 is set in the advanced config file:

n061.o.internal openshift_node_labels="{'region': 'bagl', 'zone': 'default', 'infrarole': 'router'}" openshift_ip=172.17.0.61 openshift_hostname=n061.o.internal openshift_dns_ip=172.17.0.61

This should not be a problem, per the documentation:
https://docs.openshift.com/container-platform/3.6/install_config/install/prerequisites.html (dnsmasq paragraph)

In version 3.6, dnsmasq on the host is configured with a different IP as its listen address:

[root@n061 ~]# cat /etc/dnsmasq.d/origin-dns.conf
no-resolv
domain-needed
no-negcache
max-cache-ttl=1
enable-dbus
bind-interfaces
listen-address=10.11.152.61

So nothing is listening on 172.17.0.61. (We have several IPs on the host: 10.11.152.61 is the external IP, and 172.17.0.61 is the OpenShift cluster traffic IP.)

In version 3.5, with the same config, this does not happen. Here are the contents of the same dnsmasq config file on 3.5:

[root@n058 ~]# cat /etc/dnsmasq.d/origin-dns.conf
no-resolv
domain-needed
server=/cluster.local/172.22.0.1
no-negcache
max-cache-ttl=1

This makes dnsmasq on 3.5 listen on *:53, and everything works.
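
The mismatch described above can be checked from the config file alone. The following is a minimal sketch, not part of the installer: the function names are hypothetical, only `listen-address=` lines are considered (other dnsmasq options such as `interface=` are ignored), and a file with no `listen-address` line is treated as listening on all addresses, as in the 3.5 file.

```shell
#!/bin/sh
# Hypothetical helpers; the default path is the file quoted in this report.

# Print the value of every listen-address line in a dnsmasq config file.
dnsmasq_listen_addresses() {
    conf="${1:-/etc/dnsmasq.d/origin-dns.conf}"
    sed -n 's/^listen-address=//p' "$conf"
}

# Succeed if dnsmasq is configured to listen on the given IP.
# Assumption: no listen-address line at all means "all addresses".
dnsmasq_listens_on() {
    ip="$1"
    conf="${2:-/etc/dnsmasq.d/origin-dns.conf}"
    addrs=$(dnsmasq_listen_addresses "$conf")
    [ -z "$addrs" ] && return 0
    # listen-address may hold a comma-separated list; compare whole entries.
    printf '%s\n' "$addrs" | tr ',' '\n' | grep -Fxq "$ip"
}
```

On the 3.6 node from this report, `dnsmasq_listens_on 172.17.0.61` would fail, because the file only names 10.11.152.61.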



Version-Release number of the following components:
rpm -q openshift-ansible
rpm -q ansible
ansible --version

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

Expected results:

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Scott Dodson 2017-08-14 20:40:55 UTC
Andy,

If you change /etc/dnsmasq.d/origin-dns.conf to have the following and restart dnsmasq, does everything start working?

listen-address=172.17.0.61

--
Scott

Comment 2 Scott Dodson 2017-08-14 20:42:43 UTC
https://github.com/openshift/openshift-ansible/pull/5087 untested fix

Comment 3 Sasha Segal 2017-08-15 09:37:49 UTC
Removing listen-address= from /etc/dnsmasq.d/origin-dns.conf and letting dnsmasq listen on all IPs will break skydns, which wants localhost:53.
Setting the IP address to the correct one solves the issue, and DNS resolution works from within containers:
listen-address=172.17.0.61
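
For anyone applying the same workaround by hand, here is a minimal sketch of the edit. The function name is hypothetical and this is not what the installer does; it only rewrites the `listen-address` line and leaves every other option in the file untouched.

```shell
#!/bin/sh
# Hypothetical helper: replace any existing listen-address lines in a
# dnsmasq config file with a single line for the given IP.
set_listen_address() {
    ip="$1"
    conf="${2:-/etc/dnsmasq.d/origin-dns.conf}"
    tmp=$(mktemp) || return 1
    # Copy everything except the old listen-address lines.
    sed '/^listen-address=/d' "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Append the address that pods are told to use as their nameserver.
    printf 'listen-address=%s\n' "$ip" >> "$tmp"
    # Write back through the original file so its ownership and mode are kept.
    cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

After `set_listen_address 172.17.0.61`, dnsmasq still has to be restarted (for example with `systemctl restart dnsmasq`) before the new address takes effect.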

Comment 4 Scott Dodson 2017-08-18 14:14:44 UTC
https://github.com/openshift/openshift-ansible/pull/5137 3.6 backport

Comment 6 Gan Huang 2017-08-23 08:27:24 UTC
Tried to reproduce with openshift-ansible-3.6.173.0.5-3.git.0.522a92a.el7.noarch.rpm

# cat inventory
<--snip-->

[nodes]
openshift-145.lab.sjc.redhat.com ansible_user=root ansible_ssh_user=root  openshift_public_hostname=openshift-145.lab.sjc.redhat.com openshift_hostname=openshift-145.lab.sjc.redhat.com openshift_node_labels="{'role': 'node','registry': 'enabled','router': 'enabled'}" openshift_schedulable=true openshift_ip=192.168.2.67  openshift_dns_ip=192.168.2.67 

<--snip-->

As a result, `listen-address` still pointed to the internal address.
#  cat /etc/dnsmasq.d/origin-dns.conf
no-resolv
domain-needed
no-negcache
max-cache-ttl=1
enable-dbus
bind-interfaces
listen-address=192.168.2.67

QE is unable to reproduce the issue with openshift-ansible-3.6.173.0.5-3.git.0.522a92a.el7.noarch.rpm


Any tips for reproducing the bug?

Comment 7 Andy MacMurray 2017-08-23 10:03:59 UTC
Sasha, can you please assist with reproducing this bug and providing the necessary information? Thanks

Comment 10 Scott Dodson 2017-08-23 15:02:37 UTC
The way I tested this was to just add an alias to the interface and set openshift_dns_ip to that IP address. You can add an alias like this; just pick a random subnet that's not in use:

ip address add 192.168.1.1/24 dev eth0

Comment 11 Gan Huang 2017-08-24 11:11:54 UTC
Thanks Scott! 

Tested with openshift-ansible-3.6.173.0.7-2.git.0.340aa2c.el7.noarch.rpm.

The installer failed at `TASK [openshift_node : Install Node package]` because of the DNS resolution issue.

Logging in to the host, I found that DNS resolution failed:

# ping redhat.com
ping: redhat.com: Name or service not known

# cat /etc/resolv.conf 
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
# Generated by NetworkManager
search openstacklocal lab.sjc.redhat.com cluster.local
nameserver 192.168.2.105

# cat /etc/dnsmasq.d/origin-dns.conf 
no-resolv
domain-needed
no-negcache
max-cache-ttl=1
enable-dbus
bind-interfaces
listen-address=192.168.3.3

# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc pfifo_fast state UP qlen 1000
    link/ether fa:16:3e:86:3b:ba brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.105/24 brd 192.168.2.255 scope global dynamic eth0
       valid_lft 85955sec preferred_lft 85955sec
    inet6 fe80::f816:3eff:fe86:3bba/64 scope link 
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN 
    link/ether 02:42:7a:8c:10:6e brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc pfifo_fast state UP qlen 1000
    link/ether fa:16:3e:4b:98:59 brd ff:ff:ff:ff:ff:ff
    inet 192.168.3.3/24 brd 192.168.3.255 scope global dynamic eth1
       valid_lft 85955sec preferred_lft 85955sec
    inet6 fe80::f816:3eff:fe4b:9859/64 scope link 
       valid_lft forever preferred_lft forever


`192.168.2.105` is my `external` IP, and `192.168.3.3` is the internal IP. The default route goes via the `external` IP, which is the address used as `nameserver` in /etc/resolv.conf.

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.2.1     0.0.0.0         UG    100    0        0 eth0
169.254.169.254 192.168.2.1     255.255.255.255 UGH   100    0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
192.168.2.0     0.0.0.0         255.255.255.0   U     100    0        0 eth0
192.168.3.0     0.0.0.0         255.255.255.0   U     100    0        0 eth1

In this case, it looks like we have to change the value of nameserver in /etc/resolv.conf to `openshift_dns_ip`.

Moving to ASSIGNED; please let me know if my steps are wrong. Thanks!
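
The condition seen on this host (a nameserver in /etc/resolv.conf that dnsmasq is not bound to) can be detected with a small check. This is a minimal sketch with hypothetical function names; it only compares the two files and does not query DNS or inspect open sockets.

```shell
#!/bin/sh
# Print the nameserver IPs from a resolv.conf-style file.
# Comment lines such as "# nameserver updated by ..." are skipped,
# because their first field is "#".
resolv_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Print every nameserver that dnsmasq is NOT configured to listen on.
# Assumption: a config without any listen-address line serves all addresses.
unserved_nameservers() {
    resolv="${1:-/etc/resolv.conf}"
    conf="${2:-/etc/dnsmasq.d/origin-dns.conf}"
    listen=$(sed -n 's/^listen-address=//p' "$conf" | tr ',' '\n')
    [ -z "$listen" ] && return 0
    resolv_nameservers "$resolv" | while read -r ns; do
        printf '%s\n' "$listen" | grep -Fxq "$ns" || printf '%s\n' "$ns"
    done
}
```

With the two files shown in this comment, `unserved_nameservers` prints `192.168.2.105`, the external address that nothing answers on.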

Comment 12 Gan Huang 2017-08-24 11:14:02 UTC
I had already set openshift_dns_ip=192.168.3.3 for the node.

Comment 13 Sasha Segal 2017-08-24 19:27:39 UTC
I install on OpenStack; I just use our project, OSOOS, rather than the stock installer for this.

Comment 16 Scott Dodson 2017-10-18 17:29:34 UTC
https://github.com/openshift/openshift-ansible/pull/5778 should fix this in 3.7

Comment 17 Scott Dodson 2017-11-02 14:17:07 UTC
Fixed via https://github.com/openshift/openshift-ansible/pull/5953 which has already merged and been built.

Comment 18 Gan Huang 2017-11-06 05:25:14 UTC
Per https://bugzilla.redhat.com/show_bug.cgi?id=1491850#c7

Verified in openshift-ansible-3.7.0-0.190.0.git.0.129e91a.el7.noarch.rpm

Comment 20 Scott Dodson 2018-01-09 18:47:37 UTC
This was fixed in the 3.7 GA release.
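
The Doc Text for this bug describes the final behaviour as binding to all interfaces except lo. A minimal sketch of what that could look like, starting from the 3.6 file quoted in the description and assuming dnsmasq's `except-interface` option is used in place of `listen-address`; the file actually generated by the fixed installer may differ:

```conf
# /etc/dnsmasq.d/origin-dns.conf (illustrative only, not copied from the installer)
no-resolv
domain-needed
no-negcache
max-cache-ttl=1
enable-dbus
bind-interfaces
# Listen on every interface except loopback, so 127.0.0.1:53 stays free
# for the node service's own DNS.
except-interface=lo
```

Because no single address is named, this form does not depend on which of the host's IPs openshift_dns_ip points at.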

