Description of problem:

Our provisioning solution, OSOOS, was recently updated and tested to support OCP 3.6 provisioning. During testing, a DNS issue was observed; it is captured here from an internal email. Please note that I do not have specific version information. I've requested it and will update this ticket once the information is available.

- Installation of 3.6 finishes successfully, BUT when deploying a test app I noticed a problem with DNS resolution from within the container.

The DNS server of the container is set to:

sh-4.2$ cat /etc/resolv.conf
nameserver 172.17.0.61
search default.svc.cluster.local svc.cluster.local cluster.local openstacklocal dbshtest.osepool.centralci.eng.rdu2.redhat.com
options ndots:5

This is correct and works the same as in 3.5. The IP address 172.17.0.61 is set in the advanced config file:

n061.o.internal openshift_node_labels="{'region': 'bagl', 'zone': 'default', 'infrarole': 'router'}" openshift_ip=172.17.0.61 openshift_hostname=n061.o.internal openshift_dns_ip=172.17.0.61

This should be no problem, per the documentation:
https://docs.openshift.com/container-platform/3.6/install_config/install/prerequisites.html (dnsmasq paragraph)

In version 3.6, dnsmasq on the host sets a different IP as its listen address:

[root@n061 ~]# cat /etc/dnsmasq.d/origin-dns.conf
no-resolv
domain-needed
no-negcache
max-cache-ttl=1
enable-dbus
bind-interfaces
listen-address=10.11.152.61

So nothing is listening on 172.17.0.61 (we have several IPs on the host: 10.11.152.61 is the external IP, and 172.17.0.61 is the OpenShift cluster traffic IP).

In version 3.5 with the same config this does not happen. Here are the contents of the same dnsmasq config file on 3.5:

[root@n058 ~]# cat /etc/dnsmasq.d/origin-dns.conf
no-resolv
domain-needed
server=/cluster.local/172.22.0.1
no-negcache
max-cache-ttl=1

This makes dnsmasq on 3.5 listen on *:53, and everything works.
Version-Release number of the following components:
rpm -q openshift-ansible
rpm -q ansible
ansible --version

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

Expected results:

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
Andy,

If you change /etc/dnsmasq.d/origin-dns.conf to have the following and restart dnsmasq, does everything start working?

listen-address=172.17.0.61

--
Scott
https://github.com/openshift/openshift-ansible/pull/5087 untested fix
Removing listen-address= from /etc/dnsmasq.d/origin-dns.conf and allowing dnsmasq to listen on all IPs will break skydns, which wants localhost:53.

Setting the IP address to the correct one solves the issue, and DNS resolution works from within containers:

listen-address=172.17.0.61
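For anyone checking other nodes for the same mismatch, here is a minimal Python sketch that reads the text of origin-dns.conf and reports whether a given openshift_dns_ip is covered by a listen-address line. The function names are hypothetical and not part of openshift-ansible. It assumes that a config with no listen-address (and no interface= line) means dnsmasq listens on all addresses, as observed on 3.5 above, and that listen-address may hold a comma-separated list.

```python
import ipaddress


def listen_addresses(conf_text):
    """Collect every IP named on a listen-address line (hypothetical helper)."""
    addrs = set()
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        key, sep, value = line.partition("=")
        if not sep or key.strip() != "listen-address":
            continue
        for item in value.split(","):
            item = item.strip()
            if item:
                addrs.add(ipaddress.ip_address(item))
    return addrs


def dnsmasq_serves(conf_text, dns_ip):
    """True if dns_ip is covered by the config's listen-address lines."""
    addrs = listen_addresses(conf_text)
    if not addrs:
        # Assumption: no listen-address means "all addresses" (the 3.5 case).
        return True
    return ipaddress.ip_address(dns_ip) in addrs


# Config contents copied from the description of this bug.
CONF_36 = """no-resolv
domain-needed
no-negcache
max-cache-ttl=1
enable-dbus
bind-interfaces
listen-address=10.11.152.61
"""

CONF_35 = """no-resolv
domain-needed
server=/cluster.local/172.22.0.1
no-negcache
max-cache-ttl=1
"""

if __name__ == "__main__":
    print(dnsmasq_serves(CONF_36, "172.17.0.61"))  # False: the 3.6 failure
    print(dnsmasq_serves(CONF_35, "172.17.0.61"))  # True: 3.5 listens on *:53
```

This only inspects the config text; it does not confirm what the running dnsmasq process is actually bound to.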
https://github.com/openshift/openshift-ansible/pull/5137 3.6 backport
Tried to reproduce with openshift-ansible-3.6.173.0.5-3.git.0.522a92a.el7.noarch.rpm

# cat inventory
<--snip-->
[nodes]
openshift-145.lab.sjc.redhat.com ansible_user=root ansible_ssh_user=root openshift_public_hostname=openshift-145.lab.sjc.redhat.com openshift_hostname=openshift-145.lab.sjc.redhat.com openshift_node_labels="{'role': 'node','registry': 'enabled','router': 'enabled'}" openshift_schedulable=true openshift_ip=192.168.2.67 openshift_dns_ip=192.168.2.67
<--snip-->

As a result, `listen-address` still pointed to the internal address.

# cat /etc/dnsmasq.d/origin-dns.conf
no-resolv
domain-needed
no-negcache
max-cache-ttl=1
enable-dbus
bind-interfaces
listen-address=192.168.2.67

QE is unable to reproduce the issue with openshift-ansible-3.6.173.0.5-3.git.0.522a92a.el7.noarch.rpm

Any tips for reproducing the bug?
Sasha, can you please assist with reproducing this bug and providing the necessary information? Thanks
The way I tested this was to just add an alias to the interface and set openshift_dns_ip to that IP address. You can add an alias like this; just pick a random subnet that's not in use:

ip address add 192.168.1.1/24 dev eth0
Thanks Scott!

Tested with openshift-ansible-3.6.173.0.7-2.git.0.340aa2c.el7.noarch.rpm. The installer failed at `TASK [openshift_node : Install Node package]` because of the DNS resolution issue.

Logged in to the host and found that DNS resolution failed:

# ping redhat.com
ping: redhat.com: Name or service not known

# cat /etc/resolv.conf
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
# Generated by NetworkManager
search openstacklocal lab.sjc.redhat.com cluster.local
nameserver 192.168.2.105

# cat /etc/dnsmasq.d/origin-dns.conf
no-resolv
domain-needed
no-negcache
max-cache-ttl=1
enable-dbus
bind-interfaces
listen-address=192.168.3.3

# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc pfifo_fast state UP qlen 1000
    link/ether fa:16:3e:86:3b:ba brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.105/24 brd 192.168.2.255 scope global dynamic eth0
       valid_lft 85955sec preferred_lft 85955sec
    inet6 fe80::f816:3eff:fe86:3bba/64 scope link
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:7a:8c:10:6e brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc pfifo_fast state UP qlen 1000
    link/ether fa:16:3e:4b:98:59 brd ff:ff:ff:ff:ff:ff
    inet 192.168.3.3/24 brd 192.168.3.255 scope global dynamic eth1
       valid_lft 85955sec preferred_lft 85955sec
    inet6 fe80::f816:3eff:fe4b:9859/64 scope link
       valid_lft forever preferred_lft forever

`192.168.2.105` is my `external` IP, `192.168.3.3` is the internal IP. The default route is via the `external` IP, which is used as the `nameserver` in /etc/resolv.conf.
# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.2.1     0.0.0.0         UG    100    0        0 eth0
169.254.169.254 192.168.2.1     255.255.255.255 UGH   100    0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
192.168.2.0     0.0.0.0         255.255.255.0   U     100    0        0 eth0
192.168.3.0     0.0.0.0         255.255.255.0   U     100    0        0 eth1

In this case, it looks like we have to change the value of nameserver in /etc/resolv.conf to `openshift_dns_ip`.

Moving to assigned; please let me know if my steps are wrong. Thanks!
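The host-side failure in this comment comes down to /etc/resolv.conf naming an address that dnsmasq is not bound to. A rough Python sketch of that comparison follows, using the values from this comment. The helper names are hypothetical, and the check is a plain string comparison that assumes both files write the address in the same textual form.

```python
def nameservers(resolv_conf_text):
    """Return nameserver addresses from resolv.conf text, in file order."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        if line.startswith("#") or line.startswith(";"):
            continue  # skip comment lines
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def unserved_nameservers(resolv_conf_text, listen_addrs):
    """List nameservers that are not among dnsmasq's listen addresses."""
    return [ns for ns in nameservers(resolv_conf_text) if ns not in listen_addrs]


# Contents of /etc/resolv.conf as reported in this comment.
RESOLV_CONF = """# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
# Generated by NetworkManager
search openstacklocal lab.sjc.redhat.com cluster.local
nameserver 192.168.2.105
"""

if __name__ == "__main__":
    # dnsmasq was configured with listen-address=192.168.3.3 on this host.
    print(unserved_nameservers(RESOLV_CONF, {"192.168.3.3"}))
    # prints ['192.168.2.105']
```

An empty result would mean every nameserver entry points at an address dnsmasq is configured to listen on.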
Already set openshift_dns_ip=192.168.3.3 for the node.
I install on OpenStack and use our project, OSOOS, rather than the stock installer for this.
https://github.com/openshift/openshift-ansible/pull/5778 should fix this in 3.7
Fixed via https://github.com/openshift/openshift-ansible/pull/5953 which has already merged and been built.
Per https://bugzilla.redhat.com/show_bug.cgi?id=1491850#c7 Verified in openshift-ansible-3.7.0-0.190.0.git.0.129e91a.el7.noarch.rpm
This was fixed in the 3.7 GA release.