Bug 1491850 - DNS resolution is broken when installing on host with multiple NICs
Summary: DNS resolution is broken when installing on host with multiple NICs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.7.0
Assignee: Michael Gugino
QA Contact: Gan Huang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-09-14 20:19 UTC by Anton Sherkhonov
Modified: 2018-07-30 08:18 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-28 22:10:58 UTC
Target Upstream Version:
Embargoed:



Links
System: Red Hat Product Errata
ID: RHSA-2017:3188
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update
Last Updated: 2017-11-29 02:34:54 UTC

Description Anton Sherkhonov 2017-09-14 20:19:33 UTC
Description of problem:
Similar to Bug 1481366. See that bug for extra info.

The host has 2 NICs (in this case there are 3 NICs, but only 2 matter for this bug).
The 1st NIC is the primary one on the host and is connected to the outside world.
The 2nd NIC is dedicated to internal OpenShift traffic.

In the inventory, the hosts are configured as follows to use a separate IP for OpenShift-specific traffic (172.17.11.4 is on the 2nd NIC):
hp60ds-4.o.internal openshift_node_labels="{'region': 'bagl', 'zone': 'default'}"                        openshift_ip=172.17.11.4  openshift_hostname=hp60ds-4.o.internal openshift_dns_ip=172.17.11.4
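
To make the variables on that inventory line easier to see, here is a minimal Python sketch that splits such a line into a host name and its key=value pairs. It is not Ansible's own inventory parser; the helper name `parse_inventory_line` is made up for this illustration, and the sketch assumes the simple INI-style form shown above.

```python
import shlex


def parse_inventory_line(line):
    """Split an INI-style inventory host line into (hostname, variables).

    Illustrative helper only; Ansible's real parser handles more cases.
    """
    # shlex keeps the double-quoted label dictionary together as one token.
    tokens = shlex.split(line)
    host, pairs = tokens[0], tokens[1:]
    host_vars = dict(pair.split("=", 1) for pair in pairs)
    return host, host_vars


line = (
    "hp60ds-4.o.internal "
    "openshift_node_labels=\"{'region': 'bagl', 'zone': 'default'}\" "
    "openshift_ip=172.17.11.4 openshift_hostname=hp60ds-4.o.internal "
    "openshift_dns_ip=172.17.11.4"
)

host, host_vars = parse_inventory_line(line)
for name in ("openshift_ip", "openshift_dns_ip", "openshift_hostname"):
    print(host, name, host_vars[name])
```

Both `openshift_ip` and `openshift_dns_ip` point at the 2nd NIC here, which is the setup this report is about.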

Version-Release number of the following components:
rpm -q openshift-ansible
openshift-ansible-3.6.173.0.21-2.git.0.44a4038.el7.noarch
rpm -q ansible
ansible-2.3.2.0-2.el7.noarch
ansible --version
ansible 2.3.2.0
  config file = /home/ocp_deployment/OpenShiftCluster-v3/ansible.cfg
  configured module search path = Default w/o overrides
  python version = 2.7.5 (default, May  3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

How reproducible:
Always.

Steps to Reproduce:
1.
2.
3.

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated


2017-09-14 17:52:41,826 p=25774 u=root |  TASK [openshift_node : Install Node package] *************************************************************************************************************************************
2017-09-14 17:52:51,207 p=25774 u=root |  fatal: [hp60ds-3.o.internal]: FAILED! => {
    "changed": true,
    "failed": true,
    "rc": 1,
    "results": [
        "Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-\n              : manager, versionlock\nResolving Dependencies\n--> Running transaction check\n---> Package atomic-openshift-node.x86_64 0:3.6.173.0.21-1.git.0.f95b0e7.el7 will be installed\n---> Package tuned-profiles-atomic-openshift-node.x86_64 0:3.6.173.0.21-1.git.0.f95b0e7.el7 will be installed\n--> Finished Dependency Resolution\n\nDependencies Resolved\n\n================================================================================\n Package\n       Arch   Version                          Repository                  Size\n================================================================================\nInstalling:\n atomic-openshift-node\n       x86_64 3.6.173.0.21-1.git.0.f95b0e7.el7 rhel-7-server-ose-3.6-rpms 717 k\n tuned-profiles-atomic-openshift-node\n       x86_64 3.6.173.0.21-1.git.0.f95b0e7.el7 rhel-7-server-ose-3.6-rpms 721 k\n\nTransaction Summary\n================================================================================\nInstall  2 Packages\n\nTotal download size: 1.4 M\nInstalled size: 14 k\nDownloading packages:\n"
    ]
}

MSG:

https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/rhgs-server/3.1/os/repodata/repomd.xml: [Errno 14] curl#6 - "Could not resolve host: cdn.redhat.com; Unknown error"
Trying other mirror.
https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/fast-datapath/os/repodata/repomd.xml: [Errno 14] curl#6 - "Could not resolve host: cdn.redhat.com; Unknown error"
Trying other mirror.
https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/extras/os/repodata/repomd.xml: [Errno 14] curl#6 - "Could not resolve host: cdn.redhat.com; Unknown error"
Trying other mirror.
https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/ose/3.6/os/repodata/repomd.xml: [Errno 14] curl#6 - "Could not resolve host: cdn.redhat.com; Unknown error"
Trying other mirror.
https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/os/repodata/repomd.xml: [Errno 14] curl#6 - "Could not resolve host: cdn.redhat.com; Unknown error"
Trying other mirror.
https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/ose/3.6/os/Packages/atomic-openshift-node-3.6.173.0.21-1.git.0.f95b0e7.el7.x86_64.rpm: [Errno 14] curl#6 - "Could not resolve host: cdn.redhat.com; Unknown error"
Trying other mirror.
https://cdn.redhat.com/content/dist/rhel/server/7/7Server/x86_64/ose/3.6/os/Packages/tuned-profiles-atomic-openshift-node-3.6.173.0.21-1.git.0.f95b0e7.el7.x86_64.rpm: [Errno 14] curl#6 - "Could not resolve host: cdn.redhat.com; Unknown error"
Trying other mirror.


Error downloading packages:
  atomic-openshift-node-3.6.173.0.21-1.git.0.f95b0e7.el7.x86_64: [Errno 256] No more mirrors to try.
  tuned-profiles-atomic-openshift-node-3.6.173.0.21-1.git.0.f95b0e7.el7.x86_64: [Errno 256] No more mirrors to try.


---
As a result, on any one of the nodes I have:

# cat /etc/dnsmasq.d/origin-dns.conf
no-resolv
domain-needed
no-negcache
max-cache-ttl=1
enable-dbus
bind-interfaces
listen-address=172.17.0.58

where 172.17.0.58 is the IP for OpenShift-specific traffic.


Expected results:
1) The installation completes successfully
2) Hosts outside the cluster can be resolved from hosts within the cluster

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

I worked around the issue by adding an extra listen address for dnsmasq:
listen-address={{ ansible_default_ipv4.address }}
to /usr/share/ansible/openshift-ansible/roles/openshift_certificate_expiry/examples/playbooks/roles/openshift_node_dnsmasq/templates/origin-dns.conf.j2

Comment 1 Scott Dodson 2017-10-12 19:09:39 UTC
Anton,

Which IP address is in /etc/resolv.conf? I imagine it is not the value of openshift_ip but the address of the default interface? I'm working on a refactor of the dispatcher script so that, instead of listening on a specific address, it listens on all interfaces other than lo. I imagine that would address the issue.

Comment 2 Anton Sherkhonov 2017-10-13 18:26:16 UTC
Scott, yes.
/etc/resolv.conf has one `nameserver <ip>` entry, where <ip> is the IP of the node's default interface.
`openshift_ip` for that node is defined in the inventory; it is the IP of the 2nd NIC.
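
The mismatch described here (resolv.conf pointing at the default interface while dnsmasq listens only on the OpenShift IP) can be checked with a rough Python sketch like the one below. The function names are made up for illustration, and the sketch assumes one address per `listen-address` line. The dnsmasq content is taken from the report; the nameserver value 192.0.2.10 is a placeholder from the documentation address range, because the real default-interface IP is not given in this bug.

```python
def listen_addresses(dnsmasq_conf):
    """Collect the addresses named on listen-address= lines."""
    addresses = []
    for line in dnsmasq_conf.splitlines():
        line = line.strip()
        if line.startswith("listen-address="):
            addresses.append(line.split("=", 1)[1].strip())
    return addresses


def nameservers(resolv_conf):
    """Collect the addresses named on nameserver lines (comments are skipped)."""
    servers = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def unserved_nameservers(resolv_conf, dnsmasq_conf):
    """Return nameservers that no listen-address line covers."""
    listening = set(listen_addresses(dnsmasq_conf))
    return [ns for ns in nameservers(resolv_conf) if ns not in listening]


# dnsmasq configuration as shown in the description of this bug.
DNSMASQ_CONF = """\
no-resolv
domain-needed
no-negcache
max-cache-ttl=1
enable-dbus
bind-interfaces
listen-address=172.17.0.58
"""

# Placeholder: stands in for the IP of the node's default interface.
RESOLV_CONF = "nameserver 192.0.2.10\n"

print(unserved_nameservers(RESOLV_CONF, DNSMASQ_CONF))
```

A non-empty result means the resolver is sending queries to an address dnsmasq is not bound to. Adding a second `listen-address` line for the default-interface IP, as in the workaround from the description, makes the list empty.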

Comment 3 Scott Dodson 2017-10-18 17:31:00 UTC
https://github.com/openshift/openshift-ansible/pull/5778 should fix this in 3.7

Comment 4 Scott Dodson 2017-10-31 17:42:24 UTC
https://github.com/openshift/openshift-ansible/pull/5891 was merged to address this

Comment 5 Scott Dodson 2017-10-31 17:54:47 UTC
https://github.com/openshift/openshift-ansible/pull/5953 is probably also necessary to avoid a race at startup

Comment 6 Scott Dodson 2017-11-02 13:30:20 UTC
In 3.7.0-0.189.0

Comment 7 Gan Huang 2017-11-06 05:23:54 UTC
Verified in openshift-ansible-3.7.0-0.190.0.git.0.129e91a.el7.noarch.rpm

1) ##Spin up instances with two NICs:
# ip addr |grep eth |grep inet
    inet 172.16.120.98/24 brd 172.16.120.255 scope global dynamic eth0
    inet 192.168.33.3/24 brd 192.168.33.255 scope global dynamic eth1

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.16.120.1    0.0.0.0         UG    100    0        0 eth0
10.128.0.0      0.0.0.0         255.252.0.0     U     0      0        0 tun0
169.254.169.254 192.168.33.1    255.255.255.255 UGH   100    0        0 eth1
172.16.120.0    0.0.0.0         255.255.255.0   U     100    0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
172.30.0.0      0.0.0.0         255.255.0.0     U     0      0        0 tun0
192.168.33.0    0.0.0.0         255.255.255.0   U     100    0        0 eth1


2) ##Trigger installation against two NICs of the instances:
# cat inventory_hosts
<--snip-->

[masters]
host-8-240-252.host.centralci.eng.rdu2.redhat.com 

[nodes]
host-8-240-252.host.centralci.eng.rdu2.redhat.com openshift_node_labels="{'role': 'node'}" openshift_ip=192.168.33.3  openshift_dns_ip=192.168.33.3 

host-8-241-27.host.centralci.eng.rdu2.redhat.com  openshift_node_labels="{'role': 'node','registry': 'enabled','router': 'enabled'}" openshift_ip=192.168.33.5  openshift_dns_ip=192.168.33.5

[etcd]
host-8-241-126.host.centralci.eng.rdu2.redhat.com 

[nfs]
host-8-240-252.host.centralci.eng.rdu2.redhat.com

<--snip-->

3) ##Check the configurations
# cat /etc/resolv.conf 
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
# Generated by NetworkManager
search openstacklocal cluster.local
# NOTE: the libc resolver may not support more than 3 nameservers.
# The nameservers listed below may not be recognized.
nameserver 172.16.120.98

# cat /etc/dnsmasq.d/origin-dns.conf 
no-resolv
domain-needed
no-negcache
max-cache-ttl=1
enable-dbus
dns-forward-max=5000
cache-size=5000
bind-dynamic
except-interface=lo
# End of config

# cat /etc/dnsmasq.d/origin-upstream-dns.conf 
server=172.16.120.11
server=172.16.120.2
server=172.16.120.3

4) ##S2I build succeeds
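
The verification above can be cross-checked with a small Python sketch. It parses the `ip addr` and `route -n` output quoted in this comment, finds the default interface, and confirms that the nameserver written to /etc/resolv.conf sits on an interface other than lo, which is what the new `except-interface=lo` setting is meant to cover according to Comment 1. The function names are made up for illustration, and the route table is abbreviated to the default route.

```python
IP_ADDR_OUTPUT = """\
    inet 172.16.120.98/24 brd 172.16.120.255 scope global dynamic eth0
    inet 192.168.33.3/24 brd 192.168.33.255 scope global dynamic eth1
"""

ROUTE_OUTPUT = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.16.120.1    0.0.0.0         UG    100    0        0 eth0
192.168.33.0    0.0.0.0         255.255.255.0   U     100    0        0 eth1
"""

NAMESERVER = "172.16.120.98"  # from /etc/resolv.conf in step 3


def interface_addresses(ip_addr_output):
    """Map interface name to IPv4 address from filtered `ip addr` lines."""
    addresses = {}
    for line in ip_addr_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "inet":
            # The interface name is the last field; drop the /prefix length.
            addresses[fields[-1]] = fields[1].split("/")[0]
    return addresses


def default_interface(route_output):
    """Return the interface of the 0.0.0.0 gateway route, or None."""
    for line in route_output.splitlines():
        fields = line.split()
        if len(fields) == 8 and fields[0] == "0.0.0.0" and "G" in fields[3]:
            return fields[7]
    return None


def serving_interface(nameserver, addresses, excluded=("lo",)):
    """Return the non-excluded interface that holds the nameserver address."""
    for iface, addr in addresses.items():
        if addr == nameserver and iface not in excluded:
            return iface
    return None


addresses = interface_addresses(IP_ADDR_OUTPUT)
print("default interface:", default_interface(ROUTE_OUTPUT))
print("nameserver is on:", serving_interface(NAMESERVER, addresses))
```

In this environment the nameserver address belongs to the default interface (eth0) rather than to the `openshift_ip` interface (eth1), so a configuration that listens on every interface except lo answers it.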

Comment 11 errata-xmlrpc 2017-11-28 22:10:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188

