Bug 1491850

Summary: DNS resolution is broken when installing on host with multiple NICs
Product: OpenShift Container Platform Reporter: Anton Sherkhonov <asherkho>
Component: Installer Assignee: Michael Gugino <mgugino>
Status: CLOSED ERRATA QA Contact: Gan Huang <ghuang>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.6.0 CC: aos-bugs, asherkho, jialiu, jokerman, mmccomas, pportant
Target Milestone: --- Keywords: NeedsTestCase
Target Release: 3.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-11-28 22:10:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Anton Sherkhonov 2017-09-14 20:19:33 UTC
Description of problem:
Similar to Bug 1481366. See that bug for extra info.

The host has 2 NICs (in this case we have 3 NICs, but only 2 matter for this bug).
The 1st NIC is the primary one on the host, connected to the outside world.
The 2nd NIC is dedicated to internal OpenShift traffic.

In the inventory, the hosts are configured as follows to have a separate IP for OpenShift-specific traffic (172.17.11.4 is on the 2nd NIC):
hp60ds-4.o.internal openshift_node_labels="{'region': 'bagl', 'zone': 'default'}"                        openshift_ip=172.17.11.4  openshift_hostname=hp60ds-4.o.internal openshift_dns_ip=172.17.11.4

Version-Release number of the following components:
rpm -q openshift-ansible
openshift-ansible-3.6.173.0.21-2.git.0.44a4038.el7.noarch
rpm -q ansible
ansible-2.3.2.0-2.el7.noarch
ansible --version
ansible 2.3.2.0
  config file = /home/ocp_deployment/OpenShiftCluster-v3/ansible.cfg
  configured module search path = Default w/o overrides
  python version = 2.7.5 (default, May  3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

How reproducible:
always.

Steps to Reproduce:
1.
2.
3.

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated


2017-09-14 17:52:41,826 p=25774 u=root |  TASK [openshift_node : Install Node package] *************************************************************************************************************************************
2017-09-14 17:52:51,207 p=25774 u=root |  fatal: [hp60ds-3.o.internal]: FAILED! => {
    "changed": true,
    "failed": true,
    "rc": 1,
    "results": [
        "Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-\n              : manager, versionlock\nResolving Dependencies\n--> Running transaction check\n---> Package atomic-openshift-node.x86_64 0:3.6.173.0.21-1.git.0.f95b0e7.el7 will be installed\n---> Package tuned-profiles-atomic-openshift-node.x86_64 0:3.6.173.0.21-1.git.0.f95b0e7.el7 will be installed\n--> Finished Dependency Resolution\n\nDependencies Resolved\n\n================================================================================\n Package\n       Arch   Version                          Repository                  Size\n================================================================================\nInstalling:\n atomic-openshift-node\n       x86_64 3.6.173.0.21-1.git.0.f95b0e7.el7 rhel-7-server-ose-3.6-rpms 717 k\n tuned-profiles-atomic-openshift-node\n       x86_64 3.6.173.0.21-1.git.0.f95b0e7.el7 rhel-7-server-ose-3.6-rpms 721 k\n\nTransaction Summary\n================================================================================\nInstall  2 Packages\n\nTotal download size: 1.4 M\nInstalled size: 14 k\nDownloading packages:\n"
    ]
}

MSG:

https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/rhgs-server/3.1/os/repodata/repomd.xml: [Errno 14] curl#6 - "Could not resolve host: cdn.redhat.com; Unknown error"
Trying other mirror.
https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/fast-datapath/os/repodata/repomd.xml: [Errno 14] curl#6 - "Could not resolve host: cdn.redhat.com; Unknown error"
Trying other mirror.
https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/extras/os/repodata/repomd.xml: [Errno 14] curl#6 - "Could not resolve host: cdn.redhat.com; Unknown error"
Trying other mirror.
https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/ose/3.6/os/repodata/repomd.xml: [Errno 14] curl#6 - "Could not resolve host: cdn.redhat.com; Unknown error"
Trying other mirror.
https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/repodata/repomd.xml: [Errno 14] curl#6 - "Could not resolve host: cdn.redhat.com; Unknown error"
Trying other mirror.
https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/ose/3.6/os/Packages/atomic-openshift-node-3.6.173.0.21-1.git.0.f95b0e7.el7.x86_64.rpm: [Errno 14] curl#6 - "Could not resolve host: cdn.redhat.com; Unknown error"
Trying other mirror.
https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/ose/3.6/os/Packages/tuned-profiles-atomic-openshift-node-3.6.173.0.21-1.git.0.f95b0e7.el7.x86_64.rpm: [Errno 14] curl#6 - "Could not resolve host: cdn.redhat.com; Unknown error"
Trying other mirror.


Error downloading packages:
  atomic-openshift-node-3.6.173.0.21-1.git.0.f95b0e7.el7.x86_64: [Errno 256] No more mirrors to try.
  tuned-profiles-atomic-openshift-node-3.6.173.0.21-1.git.0.f95b0e7.el7.x86_64: [Errno 256] No more mirrors to try.


---
As a result, on any one of the nodes I have:

# cat /etc/dnsmasq.d/origin-dns.conf
no-resolv
domain-needed
no-negcache
max-cache-ttl=1
enable-dbus
bind-interfaces
listen-address=172.17.0.58

where 172.17.0.58 is the IP for OpenShift-specific traffic.


Expected results:
1) installation completes successfully
2) hosts outside of the cluster can be resolved from the hosts within the cluster

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

I worked around the issue by adding an extra listen address for dnsmasq:
listen-address={{ ansible_default_ipv4.address }}
to /usr/share/ansible/openshift-ansible/roles/openshift_certificate_expiry/examples/playbooks/roles/openshift_node_dnsmasq/templates/origin-dns.conf.j2
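
To illustrate what the workaround is meant to produce, here is a minimal Python sketch of the resulting listen-address lines. It assumes the template already emits one listen-address line for openshift_dns_ip; the function name render_listen_addresses and the address 192.0.2.10 (standing in for the default interface address) are hypothetical and not part of openshift-ansible.

```python
def render_listen_addresses(openshift_dns_ip, default_ipv4_address):
    """Return dnsmasq listen-address lines for both addresses, without duplicates."""
    lines = []
    for address in (openshift_dns_ip, default_ipv4_address):
        line = "listen-address={}".format(address)
        # On a single-NIC host both values are the same address;
        # emit that line only once.
        if line not in lines:
            lines.append(line)
    return lines


if __name__ == "__main__":
    # 172.17.11.4 is the openshift_dns_ip from the inventory line in this report;
    # 192.0.2.10 is a placeholder for the default interface address.
    print("\n".join(render_listen_addresses("172.17.11.4", "192.0.2.10")))
```

With both lines present, dnsmasq answers on the address that /etc/resolv.conf points at as well as on the OpenShift-specific one.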

Comment 1 Scott Dodson 2017-10-12 19:09:39 UTC
Anton,

Which IP address is in /etc/resolv.conf? I imagine it is not the value of openshift_ip but the address of the default interface? I'm working on a refactor of the dispatcher script so that, instead of listening on a specific address, it listens on all interfaces other than lo. I imagine that would address the issue.

Comment 2 Anton Sherkhonov 2017-10-13 18:26:16 UTC
Scott, yes.
/etc/resolv.conf has one `nameserver <ip>` entry, where <ip> is the IP of the node's default interface.
`openshift_ip` for that node is defined in the inventory; it's the IP of the 2nd NIC.
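
The mismatch described here can be checked mechanically. The following is a rough Python sketch, not part of the installer: it compares the nameserver entries in resolv.conf text against the listen-address lines in dnsmasq config text. All function names are hypothetical, 192.0.2.10 is a placeholder for the default interface IP, and the sketch deliberately ignores interface= and except-interface= directives.

```python
def nameservers(resolv_text):
    """Return the addresses from 'nameserver <ip>' lines, ignoring comments."""
    result = []
    for raw in resolv_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            result.append(fields[1])
    return result


def listen_addresses(dnsmasq_text):
    """Return the values of 'listen-address=<ip>' lines, ignoring comments."""
    result = []
    for raw in dnsmasq_text.splitlines():
        name, _, value = raw.split("#", 1)[0].partition("=")
        if name.strip() == "listen-address" and value.strip():
            result.append(value.strip())
    return result


def unreachable_nameservers(resolv_text, dnsmasq_text):
    """Return nameservers that no listen-address line covers."""
    listening = listen_addresses(dnsmasq_text)
    if not listening:
        # No listen-address lines: this sketch treats dnsmasq as
        # not restricted to specific addresses.
        return []
    return [ns for ns in nameservers(resolv_text) if ns not in listening]


if __name__ == "__main__":
    resolv = "nameserver 192.0.2.10\n"  # placeholder default-interface IP
    dnsmasq = "bind-interfaces\nlisten-address=172.17.0.58\n"
    print(unreachable_nameservers(resolv, dnsmasq))
```

A non-empty result means the resolver is pointed at an address dnsmasq is not bound to, which is the situation reported in this bug.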

Comment 3 Scott Dodson 2017-10-18 17:31:00 UTC
https://github.com/openshift/openshift-ansible/pull/5778 should fix this in 3.7

Comment 4 Scott Dodson 2017-10-31 17:42:24 UTC
https://github.com/openshift/openshift-ansible/pull/5891 was merged to address this

Comment 5 Scott Dodson 2017-10-31 17:54:47 UTC
https://github.com/openshift/openshift-ansible/pull/5953 is probably also necessary, to avoid racing at startup

Comment 6 Scott Dodson 2017-11-02 13:30:20 UTC
In 3.7.0-0.189.0

Comment 7 Gan Huang 2017-11-06 05:23:54 UTC
Verified in openshift-ansible-3.7.0-0.190.0.git.0.129e91a.el7.noarch.rpm

1) ##Spin up instances with two NICs:
# ip addr |grep eth |grep inet
    inet 172.16.120.98/24 brd 172.16.120.255 scope global dynamic eth0
    inet 192.168.33.3/24 brd 192.168.33.255 scope global dynamic eth1

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.16.120.1    0.0.0.0         UG    100    0        0 eth0
10.128.0.0      0.0.0.0         255.252.0.0     U     0      0        0 tun0
169.254.169.254 192.168.33.1    255.255.255.255 UGH   100    0        0 eth1
172.16.120.0    0.0.0.0         255.255.255.0   U     100    0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
172.30.0.0      0.0.0.0         255.255.0.0     U     0      0        0 tun0
192.168.33.0    0.0.0.0         255.255.255.0   U     100    0        0 eth1


2) ##Trigger installation against two NICs of the instances:
# cat inventory_hosts
<--snip-->

[masters]
host-8-240-252.host.centralci.eng.rdu2.redhat.com 

[nodes]
host-8-240-252.host.centralci.eng.rdu2.redhat.com openshift_node_labels="{'role': 'node'}" openshift_ip=192.168.33.3  openshift_dns_ip=192.168.33.3 

host-8-241-27.host.centralci.eng.rdu2.redhat.com  openshift_node_labels="{'role': 'node','registry': 'enabled','router': 'enabled'}" openshift_ip=192.168.33.5  openshift_dns_ip=192.168.33.5

[etcd]
host-8-241-126.host.centralci.eng.rdu2.redhat.com 

[nfs]
host-8-240-252.host.centralci.eng.rdu2.redhat.com

<--snip-->

3) ##Check the configurations
# cat /etc/resolv.conf 
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
# Generated by NetworkManager
search openstacklocal cluster.local
# NOTE: the libc resolver may not support more than 3 nameservers.
# The nameservers listed below may not be recognized.
nameserver 172.16.120.98

# cat /etc/dnsmasq.d/origin-dns.conf 
no-resolv
domain-needed
no-negcache
max-cache-ttl=1
enable-dbus
dns-forward-max=5000
cache-size=5000
bind-dynamic
except-interface=lo
# End of config

# cat /etc/dnsmasq.d/origin-upstream-dns.conf 
server=172.16.120.11
server=172.16.120.2
server=172.16.120.3

4) ##S2I build succeeds
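
For anyone re-verifying, external name resolution from a node can also be spot-checked directly. This is a small Python sketch using only the standard library; the helper name can_resolve is hypothetical, and the resolver parameter exists only so the check can be exercised without network access.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if the system resolver returns at least one address."""
    try:
        return len(resolver(hostname, None)) > 0
    except socket.gaierror:
        # Raised when the name cannot be resolved, comparable to the
        # "Could not resolve host" errors in the original report.
        return False


if __name__ == "__main__":
    # cdn.redhat.com is the host that failed to resolve in the original report.
    print(can_resolve("cdn.redhat.com"))
```

Running it on a node exercises the same path as yum: the libc resolver reads /etc/resolv.conf and queries the local dnsmasq.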

Comment 11 errata-xmlrpc 2017-11-28 22:10:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188