Bug 1401425

Summary:	[3.4] The nameserver in /etc/resolv.conf is reset by NetworkManager to an address other than the node's IP after rebooting a node
Product:	OpenShift Container Platform	Reporter:	Gan Huang <ghuang>
Component:	Installer	Assignee:	Scott Dodson <sdodson>
Status:	CLOSED ERRATA	QA Contact:	Johnny Liu <jialiu>
Severity:	high	Docs Contact:
Priority:	medium
Version:	3.4.0	CC:	aos-bugs, jokerman, mmccomas
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Previously, the NetworkManager dispatcher script did not correctly update /etc/resolv.conf after a host was rebooted. The script now ensures that /etc/resolv.conf is updated on reboot so that dnsmasq is used properly.	Story Points:	---
Clone Of:
Clones:	1401427 1401428		Environment:
Last Closed:	2017-01-18 12:56:53 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1401427, 1401428

Description Gan Huang 2016-12-05 08:52:47 UTC

Description of problem:
After rebooting a node that belongs to the OCP cluster, NetworkManager automatically regenerates /etc/resolv.conf, and the nameserver is no longer the node's IP. We expect the nameserver to remain the node's IP.

Version-Release number of selected component (if applicable):
openshift-ansible-3.4.33-1
NetworkManager-1.4.0-13.el7_3.x86_64

How reproducible:
always

Steps to Reproduce:
1. Install OCP
2. Check /etc/resolv.conf on each node
3. reboot one node after installation
4. Check /etc/resolv.conf on each node

Actual results:

2. 
# cat /etc/resolv.conf 
# Generated by NetworkManager
search c.openshift-gce-devel.internal google.internal
nameserver 10.240.0.35   --> the node's ip
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh

4. 
# cat /etc/resolv.conf 
# Generated by NetworkManager
search c.openshift-gce-devel.internal google.internal
nameserver 169.254.169.254

Expected results:
After the reboot, the nameserver in /etc/resolv.conf is still the node's IP, as in step 2.


Additional info:

Restarting NetworkManager fixes it.

It looks like NetworkManager starts before dnsmasq on a reboot.

# journalctl -u NetworkManager
Dec 05 03:33:00 qe-ghuang-conrhel-master-1 NetworkManager[662]: <info>  [1480926780.5415] device (eth0): Activation: successful, device activated.
Dec 05 03:33:04 qe-ghuang-conrhel-master-1 NetworkManager[662]: <info>  [1480926784.9640] manager: startup complete

# journalctl -u dnsmasq -l
-- Logs begin at Mon 2016-12-05 03:32:44 EST, end at Mon 2016-12-05 03:47:05 EST. --
Dec 05 03:33:05 qe-ghuang-conrhel-master-1 systemd[1]: Started DNS caching server..
Dec 05 03:33:05 qe-ghuang-conrhel-master-1 systemd[1]: Starting DNS caching server....
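
The start order suggested by the logs above could also be checked directly from systemd. This is only a rough sketch: unit_active_since and started_before are hypothetical helper names, and it assumes both units expose the ActiveEnterTimestampMonotonic property (microseconds since boot) through "systemctl show".

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of openshift-ansible.

# Print the monotonic activation time (microseconds since boot) of a unit.
# "systemctl show -p" prints "Property=value", so strip up to the "=".
unit_active_since() {
    local line
    line=$(systemctl show -p ActiveEnterTimestampMonotonic "$1" 2>/dev/null)
    echo "${line#*=}"
}

# Succeed when the first timestamp is strictly earlier than the second.
# Non-numeric input makes the comparison fail quietly.
started_before() {
    [ "$1" -lt "$2" ] 2>/dev/null
}

# Only query systemd where systemctl is actually available.
if command -v systemctl >/dev/null 2>&1; then
    nm=$(unit_active_since NetworkManager.service)
    dm=$(unit_active_since dnsmasq.service)
    if started_before "$nm" "$dm"; then
        echo "NetworkManager became active before dnsmasq"
    fi
fi
```

A unit that has never been active reports 0, so an empty result here does not by itself prove the opposite order.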

Comment 1 Scott Dodson 2016-12-05 14:36:00 UTC

Hmm, the dispatcher script should start dnsmasq if it's not already running. Will look into this today.

Comment 2 Scott Dodson 2016-12-05 16:01:19 UTC

This is a regression introduced in https://github.com/openshift/openshift-ansible/pull/2690

Comment 3 Scott Dodson 2016-12-05 19:51:30 UTC

Proposed fix https://github.com/openshift/openshift-ansible/pull/2915

Comment 4 openshift-github-bot 2016-12-05 20:42:38 UTC

Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/7d9b624ea0707eaf6875a6781aed2f7f387fe349
node_dnsmasq - restart dnsmasq if it's not currently running

Fixes Bug 1401425
Fixes BZ1401425
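
The idea named in the commit title can be sketched roughly as follows. This is not the code from the commit or from 99-origin-dns.sh: the function names ensure_dnsmasq_running and point_resolv_conf_at are hypothetical, and the node IP is assumed to be passed in by the caller.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not the contents of 99-origin-dns.sh.

# Restart dnsmasq only when systemd reports that it is not active.
ensure_dnsmasq_running() {
    if ! systemctl -q is-active dnsmasq.service; then
        systemctl restart dnsmasq.service
    fi
}

# Rewrite every nameserver entry in a resolv.conf-style file so that it
# points at the given node IP. Other lines (search, comments) are kept.
point_resolv_conf_at() {
    local node_ip=$1 resolv_conf=$2 tmp status
    tmp=$(mktemp) || return 1
    # Write through a temporary file, then copy the content back so the
    # original file keeps its ownership and permissions.
    sed "s/^nameserver.*/nameserver ${node_ip}/" "$resolv_conf" > "$tmp" &&
        cat "$tmp" > "$resolv_conf"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example (needs root on a real node, so it is left commented out):
# ensure_dnsmasq_running && point_resolv_conf_at 10.240.0.35 /etc/resolv.conf
```

Checking dnsmasq before touching /etc/resolv.conf matters because a nameserver entry pointing at the node's IP is only useful when dnsmasq is listening there.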

Comment 6 Gan Huang 2016-12-06 09:10:10 UTC

Verified with origin/release-1.4 branch

Will test again once a new puddle is available.

Comment 7 Johnny Liu 2016-12-07 10:36:37 UTC

Verified this bug with openshift-ansible-3.4.35-1.git.0.2e13650.el7; it passes.

After rebooting the node, the nameserver in /etc/resolv.conf is still the node's IP.
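
A check like this one could be scripted so it is easy to repeat on each node. The sketch below is only an illustration: list_nameservers and resolv_conf_uses are hypothetical names, and the expected node IP is assumed to be supplied by whoever runs the check.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of the QE tooling.

# Print the address of every nameserver entry in a resolv.conf-style file.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "$1"
}

# Succeed only when the file lists exactly one nameserver and it is the
# expected address.
resolv_conf_uses() {
    local expected_ip=$1 resolv_conf=$2
    [ "$(list_nameservers "$resolv_conf")" = "$expected_ip" ]
}
```

On the node from the description, "resolv_conf_uses 10.240.0.35 /etc/resolv.conf" would be expected to succeed before the reboot and, with the fix applied, after it as well.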

Comment 9 errata-xmlrpc 2017-01-18 12:56:53 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0066