Bug 1903146 - After minor upgrade cloud-init started to erase /etc/resolv.conf contents on overcloud nodes after reboot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 13.0 (Queens)
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: async
Target Release: 13.0 (Queens)
Assignee: Rabi Mishra
QA Contact: David Rosenfeld
URL:
Whiteboard:
Duplicates: 1933202
Depends On:
Blocks:
Reported: 2020-12-01 13:23 UTC by Alex Stupnikov
Modified: 2024-03-25 17:19 UTC (History)

Fixed In Version: openstack-tripleo-common-8.7.1-28.el7ost openstack-tripleo-heat-templates-8.4.1-86.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-02 13:36:19 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 775555 0 None MERGED Stop NetworkManager from updating resolv.conf 2021-04-08 09:57:11 UTC
OpenStack gerrit 776925 0 None MERGED [queens] Don't use service_facts 2021-04-08 09:56:25 UTC
Red Hat Issue Tracker OSP-1384 0 None None None 2022-08-30 14:49:13 UTC
Red Hat Knowledge Base (Solution) 894753 0 None None None 2020-12-04 10:34:26 UTC
Red Hat Knowledge Base (Solution) 5837741 0 None None None 2021-09-02 01:37:41 UTC
Red Hat Product Errata RHBA-2021:2978 0 None None None 2021-08-02 13:36:24 UTC

Description Alex Stupnikov 2020-12-01 13:23:06 UTC
Description of problem:

I am not sure which component is affected: the closest match is os-net-config, but it has nothing to do with the reported problem. I selected openstack-tripleo and hope the bug will be re-routed properly.

The fix for bug #1748015 added a dependency between NetworkManager.service and cloud-final.service. It addressed a problem on OpenStack guest instances where NetworkManager overwrote the cloud-init configuration in /etc/resolv.conf.

Unfortunately, this change didn't work well for RHOSP overcloud nodes: cloud-init now runs after network.service and nullifies its configuration in /etc/resolv.conf after reboot.

I am not completely sure how to address this issue: the cloud-init maintainers could say that we use a custom procedure to configure overcloud nodes and should change it, so I would like to ask the developers for a second look.

I also found a workaround [1], which basically reverts the fix made on the cloud-init side. Could you tell me whether it is the correct approach here?

We reproduced this issue on RHOSP 13. I am not sure whether RHOSP 16.1 is affected, but the same cloud-init version is used there.

[1]
--- cloud-final.service 2020-12-01 13:21:16.828434342 +0000
+++ /etc/systemd/system/cloud-init.target.wants/cloud-final.service     2020-12-01 12:59:02.167153009 +0000
@@ -13,7 +13,7 @@
 KillMode=process
 ExecStartPost=/bin/echo "try restart NetworkManager.service"
 # TODO: try-reload-or-restart is available only on systemd >= 229
-ExecStartPost=/usr/bin/systemctl reload-or-try-restart NetworkManager.service
+#ExecStartPost=/usr/bin/systemctl reload-or-try-restart NetworkManager.service
 
 # Output needs to appear in instance console output
 StandardOutput=journal+console
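
To see whether a given node still carries the restart hook that the workaround above comments out, something like the following could be used. This is only a minimal sketch: the function name `nm_restart_hook_active` is made up for illustration and is not part of cloud-init or TripleO, and the unit path in the usage note is simply the one shown in the diff.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): succeed if the given unit file
# has an uncommented ExecStartPost line that calls systemctl on
# NetworkManager.service.
nm_restart_hook_active() {
    unit_file="$1"
    # Anchor at line start so the commented-out "#ExecStartPost=..." form
    # from the workaround is ignored. Requiring "systemctl" skips the
    # harmless "/bin/echo" ExecStartPost line that also names the service.
    grep -Eq '^[[:space:]]*ExecStartPost=.*systemctl.*NetworkManager\.service' "$unit_file"
}
```

For example, `nm_restart_hook_active /etc/systemd/system/cloud-init.target.wants/cloud-final.service && echo "restart hook still active"` would report nodes where the workaround has not been applied.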

Comment 1 David Rosenfeld 2020-12-01 22:06:00 UTC
Moved to Compute because customer case says rebooting compute node causes /etc/resolv.conf to be rewritten.

Comment 5 ldenny 2021-01-03 22:16:44 UTC
Hi Rabi,

Can this change be backported to OSP13? I see this bug is still in ON_DEV status and wanted to give the customer an update.

Comment 28 Lon Hohberger 2021-06-17 10:31:04 UTC
According to our records, this should be resolved by openstack-tripleo-common-8.7.1-29.el7ost. This build is available now.

Comment 29 Leander Koornneef 2021-06-23 14:42:25 UTC
In the past few days we have updated two OSP13 environments from Z12 to Z16 and in both cases all the overcloud nodes (controllers, hypervisors) ended up with an empty /etc/resolv.conf after rebooting.
It appears to be the same issue as described by the original reporter of this BZ. Restarting the network service on each node resolves the issue temporarily, at least until the next reboot. 
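
A quick way to spot affected nodes after a reboot might look like the sketch below. It assumes that "empty" means the file has no `nameserver` entries; the helper name `has_nameserver` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): succeed if the file contains at
# least one "nameserver <address>" entry. Defaults to /etc/resolv.conf.
has_nameserver() {
    resolv_file="${1:-/etc/resolv.conf}"
    grep -Eq '^[[:space:]]*nameserver[[:space:]]+[^[:space:]]+' "$resolv_file"
}
```

Running `has_nameserver || echo "resolv.conf has no nameserver entries"` on each overcloud node would flag the ones that lost their DNS configuration.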

The openstack-tripleo-common package on the directors is in fact the same version as mentioned above:

(undercloud) [stack@nlhrl1vim52-dir2 ~]$ rpm -q openstack-tripleo-common
openstack-tripleo-common-8.7.1-29.el7ost.noarch

Comment 30 Rabi Mishra 2021-06-23 14:55:09 UTC
> The openstack-tripleo-common package on the directors is in fact the same version as mentioned above:

For minor updates you would need openstack-tripleo-heat-templates-8.4.1-86.el7ost, which I don't think shipped with z16.

Comment 31 Leander Koornneef 2021-06-23 15:03:57 UTC
Ah, indeed, we have a different version of that package:

(undercloud) [stack@nlhrl1vim52-dir2 ~]$ rpm -q openstack-tripleo-heat-templates
openstack-tripleo-heat-templates-8.4.1-85.el7ost.noarch
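
The comparison between the installed and the required build can be scripted roughly as follows. This is a sketch under assumptions: `version_at_least` is a hypothetical helper, and it relies on GNU `sort -V`, which approximates RPM's version ordering but is not the same algorithm.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): succeed if the version-release
# string in $1 is greater than or equal to the one in $2.
version_at_least() {
    installed="$1"
    required="$2"
    # With version sort, the smaller of the two strings comes first; the
    # installed build is new enough when that smallest one is the required one.
    lowest=$(printf '%s\n' "$installed" "$required" | sort -V | head -n 1)
    [ "$lowest" = "$required" ]
}
```

The installed string can be obtained with `rpm -q --queryformat '%{VERSION}-%{RELEASE}' openstack-tripleo-heat-templates`. With the versions from this thread, `version_at_least 8.4.1-85.el7ost 8.4.1-86.el7ost` fails, which matches the observation that the fix was not yet installed.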

Comment 36 Jad Haj Yahya 2021-07-18 15:20:12 UTC
Ran job http://staging-jenkins2-qe-playground.usersys.redhat.com/view/DFG/view/upgrades/view/update/job/DFG-upgrades-updates-13-from-z14-HA-ipv4/, which performs a minor update to the latest z-stream and also reboots the overcloud.

Verified that /etc/resolv.conf was not changed.
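
The comment does not say how the check was done. One possible way to confirm that a file is unchanged across a reboot is to compare checksums, as in this sketch; both helper names are hypothetical, and the snapshot file should live somewhere that survives the reboot.

```shell
#!/bin/sh
# Hypothetical helpers (illustration only).

# Record the SHA-256 checksum of file $1 in snapshot file $2.
snapshot_checksum() {
    sha256sum "$1" | cut -d ' ' -f 1 > "$2"
}

# Succeed if file $1 still has the checksum stored in snapshot file $2.
unchanged_since_snapshot() {
    current=$(sha256sum "$1" | cut -d ' ' -f 1)
    [ "$current" = "$(cat "$2")" ]
}
```

Before the update one would run `snapshot_checksum /etc/resolv.conf /root/resolv.conf.sha256`, and after the reboot `unchanged_since_snapshot /etc/resolv.conf /root/resolv.conf.sha256`; the `/root` location is only an example.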

Comment 40 errata-xmlrpc 2021-08-02 13:36:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 13 Bug Fix and Enhancement Advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2978

Comment 44 xiachen 2021-09-02 01:37:42 UTC
*** Bug 1933202 has been marked as a duplicate of this bug. ***

Comment 47 Red Hat Bugzilla 2023-09-15 01:31:27 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days

