1903146 – After minor upgrade cloud-init started to erase /etc/resolv.conf contents on overcloud nodes after reboot

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-tripleo-common
Sub Component:
Version:	13.0 (Queens)
Hardware:	All
OS:	All
Priority:	high
Severity:	high
Target Milestone:	async
Target Release:	13.0 (Queens)
Assignee:	Rabi Mishra
QA Contact:	David Rosenfeld
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1933202
Depends On:
Blocks:

Reported:	2020-12-01 13:23 UTC by Alex Stupnikov
Modified:	2024-03-25 17:19 UTC
CC List:	19 users
Fixed In Version:	openstack-tripleo-common-8.7.1-28.el7ost openstack-tripleo-heat-templates-8.4.1-86.el7ost
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-08-02 13:36:19 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
OpenStack gerrit	775555	None	MERGED	Stop NetworkManager from updating resolv.conf	2021-04-08 09:57:11 UTC
OpenStack gerrit	776925	None	MERGED	[queens] Don't use service_facts	2021-04-08 09:56:25 UTC
Red Hat Issue Tracker	OSP-1384	None	None	None	2022-08-30 14:49:13 UTC
Red Hat Knowledge Base (Solution)	894753	None	None	None	2020-12-04 10:34:26 UTC
Red Hat Knowledge Base (Solution)	5837741	None	None	None	2021-09-02 01:37:41 UTC
Red Hat Product Errata	RHBA-2021:2978	None	None	None	2021-08-02 13:36:24 UTC

Description Alex Stupnikov 2020-12-01 13:23:06 UTC

Description of problem:

I am not sure how to properly identify the affected component: the closest one is os-net-config, but it has nothing to do with the reported problem. So I selected openstack-tripleo and hope that it will be re-routed properly.

The fix for bug #1748015 added a dependency between NetworkManager.service and cloud-final.service to address a problem with OpenStack guest instances, where the cloud-init configuration in /etc/resolv.conf was overwritten by NetworkManager.

Unfortunately, this change didn't work well for RHOSP overcloud nodes: cloud-init now runs after network.service and nullifies its configuration in /etc/resolv.conf after reboot.

I am not completely sure how to address this issue: the cloud-init people could say that we are using a custom procedure to configure overcloud nodes and that we should change it, so I would like to ask for a second look from developers.

I also found a workaround [1] (basically, it reverts the fix proposed by the cloud-init people). Could you tell me if it is the correct approach here?

We reproduced this issue on RHOSP 13, but I am not sure whether RHOSP 16.1 is affected: the same cloud-init version is used there.

[1]
--- cloud-final.service 2020-12-01 13:21:16.828434342 +0000
+++ /etc/systemd/system/cloud-init.target.wants/cloud-final.service     2020-12-01 12:59:02.167153009 +0000
@@ -13,7 +13,7 @@
 KillMode=process
 ExecStartPost=/bin/echo "try restart NetworkManager.service"
 # TODO: try-reload-or-restart is available only on systemd >= 229
-ExecStartPost=/usr/bin/systemctl reload-or-try-restart NetworkManager.service
+#ExecStartPost=/usr/bin/systemctl reload-or-try-restart NetworkManager.service
 
 # Output needs to appear in instance console output
 StandardOutput=journal+console
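
To check whether a node still carries the restart hook that the diff above comments out, a minimal Python sketch could look like the following. It only inspects unit file text and assumes the hook looks like the line in the diff; the function name `restarts_network_manager` is hypothetical and not part of cloud-init or TripleO.

```python
def restarts_network_manager(unit_text):
    """Return True if an uncommented ExecStartPost line runs systemctl
    against NetworkManager.service (the hook shown in the diff)."""
    for raw in unit_text.splitlines():
        line = raw.strip()
        # systemd treats lines starting with '#' or ';' as comments
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if (sep and key.strip() == "ExecStartPost"
                and "systemctl" in value
                and "NetworkManager.service" in value):
            return True
    return False


# Illustrative sample: the echo line stays, the systemctl line is commented out.
patched = (
    'ExecStartPost=/bin/echo "try restart NetworkManager.service"\n'
    "#ExecStartPost=/usr/bin/systemctl reload-or-try-restart NetworkManager.service\n"
)
print(restarts_network_manager(patched))  # False
```

On a real node the text would come from reading /etc/systemd/system/cloud-init.target.wants/cloud-final.service, the path that appears in the diff.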

Comment 1 David Rosenfeld 2020-12-01 22:06:00 UTC

Moved to Compute because the customer case says that rebooting a compute node causes /etc/resolv.conf to be rewritten.

Comment 5 ldenny 2021-01-03 22:16:44 UTC

Hi Rabi,

Can this change be backported to OSP13? I see this bug is still in ON_DEV status and just wanted to give the customer an update.

Comment 28 Lon Hohberger 2021-06-17 10:31:04 UTC

According to our records, this should be resolved by openstack-tripleo-common-8.7.1-29.el7ost. This build is available now.

Comment 29 Leander Koornneef 2021-06-23 14:42:25 UTC

In the past few days we have updated two OSP13 environments from Z12 to Z16, and in both cases all the overcloud nodes (controllers, hypervisors) ended up with an empty /etc/resolv.conf after rebooting.
It appears to be the same issue as described by the original reporter of this BZ. Restarting the network service on each node resolves the issue temporarily, at least until the next reboot.

The openstack-tripleo-common package on the directors is in fact the same version as mentioned above:

(undercloud) [stack@nlhrl1vim52-dir2 ~]$ rpm -q openstack-tripleo-common
openstack-tripleo-common-8.7.1-29.el7ost.noarch

Comment 30 Rabi Mishra 2021-06-23 14:55:09 UTC

> The openstack-tripleo-common package on the directors is in fact the same version as mentioned above:

For minor updates you would also need openstack-tripleo-heat-templates-8.4.1-86.el7ost, which I don't think shipped with z16.

Comment 31 Leander Koornneef 2021-06-23 15:03:57 UTC

Ah, indeed, we have a different version of that package:

(undercloud) [stack@nlhrl1vim52-dir2 ~]$ rpm -q openstack-tripleo-heat-templates
openstack-tripleo-heat-templates-8.4.1-85.el7ost.noarch
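
The version check in the last few comments can be scripted. The sketch below compares an `rpm -q` style string against a required build. It is a simplified comparison written for illustration, not RPM's own version comparison algorithm, and the helper names (`split_nvr`, `is_at_least`) and the architecture list are assumptions.

```python
import re

# Assumed list of architecture suffixes that `rpm -q` may append.
KNOWN_ARCHES = ("noarch", "x86_64", "ppc64le", "aarch64")


def split_nvr(pkg):
    """Split 'name-version-release[.arch]' into (name, version, release)."""
    for arch in KNOWN_ARCHES:
        suffix = "." + arch
        if pkg.endswith(suffix):
            pkg = pkg[: -len(suffix)]
            break
    name, version, release = pkg.rsplit("-", 2)
    return name, version, release


def _key(text):
    # Numeric segments compare as integers and sort above alphabetic ones.
    return [(1, int(s)) if s.isdigit() else (0, s)
            for s in re.findall(r"\d+|[A-Za-z]+", text)]


def is_at_least(installed, required):
    """True if the installed build is the required build or newer."""
    i_name, i_ver, i_rel = split_nvr(installed)
    r_name, r_ver, r_rel = split_nvr(required)
    if i_name != r_name:
        raise ValueError("package names differ")
    return (_key(i_ver), _key(i_rel)) >= (_key(r_ver), _key(r_rel))


print(is_at_least(
    "openstack-tripleo-heat-templates-8.4.1-85.el7ost.noarch",
    "openstack-tripleo-heat-templates-8.4.1-86.el7ost",
))  # False
```

With the package strings quoted in this thread, the heat-templates build on the director (-85) is older than the build named in Comment 30 (-86), while the installed openstack-tripleo-common build (-29) passes the same check against itself.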

Comment 36 Jad Haj Yahya 2021-07-18 15:20:12 UTC

Ran job http://staging-jenkins2-qe-playground.usersys.redhat.com/view/DFG/view/upgrades/view/update/job/DFG-upgrades-updates-13-from-z14-HA-ipv4/, which performs a minor update to the latest z-stream and also reboots the overcloud.

Verified that /etc/resolv.conf was not changed.
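
A check of this kind can be sketched in Python as below. This is not the logic of the job above, only an illustration of comparing resolv.conf contents captured before and after a reboot. The function names and the sample address (taken from the 192.0.2.0/24 documentation range) are hypothetical.

```python
def nameservers(resolv_text):
    """Return the nameserver addresses listed in resolv.conf text."""
    servers = []
    for raw in resolv_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments ('#' or ';')
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


def resolv_conf_survived(before, after):
    """True if the post-reboot text still lists the same, non-empty
    set of nameservers as the pre-reboot text."""
    after_ns = nameservers(after)
    return bool(after_ns) and after_ns == nameservers(before)


before = "search example.com\nnameserver 192.0.2.53\n"
print(resolv_conf_survived(before, ""))      # False: the file was emptied
print(resolv_conf_survived(before, before))  # True: entries are unchanged
```

An emptied file, the symptom reported in this bug, fails the check because no nameserver entries remain.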

Comment 40 errata-xmlrpc 2021-08-02 13:36:19 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 13 Bug Fix and Enhancement Advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2978

Comment 44 xiachen 2021-09-02 01:37:42 UTC

*** Bug 1933202 has been marked as a duplicate of this bug. ***

Comment 47 Red Hat Bugzilla 2023-09-15 01:31:27 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days
