Description of problem:

I am not sure how to properly define the affected component: the closest one is os-net-config, but it has nothing to do with the reported problem. So I selected openstack-tripleo and hope that it will be re-routed properly.

The fix for bug #1748015 set a dependency between NetworkManager.service and cloud-final.service to address a problem with OpenStack guest instances where the cloud-init configuration in /etc/resolv.conf was overwritten by NetworkManager. Unfortunately, this change didn't work well for RHOSP overcloud nodes: cloud-init now runs after network.service and nullifies its configuration in /etc/resolv.conf after reboot.

I am not completely sure how to address this issue: the cloud-init people could say that we are using a custom procedure to configure overcloud nodes and that we should change it, so I would like to ask for a second look from developers. I also found the workaround [1] (basically, it reverts the fix proposed by the cloud-init people). Could you tell me if it is the correct approach here?

We reproduced this issue on RHOSP 13, but I am not sure if RHOSP 16.1 is affected: the same cloud-init version is used there.

[1]
--- cloud-final.service 2020-12-01 13:21:16.828434342 +0000
+++ /etc/systemd/system/cloud-init.target.wants/cloud-final.service 2020-12-01 12:59:02.167153009 +0000
@@ -13,7 +13,7 @@
 KillMode=process
 ExecStartPost=/bin/echo "try restart NetworkManager.service"
 # TODO: try-reload-or-restart is available only on systemd >= 229
-ExecStartPost=/usr/bin/systemctl reload-or-try-restart NetworkManager.service
+#ExecStartPost=/usr/bin/systemctl reload-or-try-restart NetworkManager.service
 # Output needs to appear in instance console output
 StandardOutput=journal+console
Moved to Compute because customer case says rebooting compute node causes /etc/resolv.conf to be rewritten.
Hi Rabi,

Can this change be backported to OSP13? I see this bug is still in ON_DEV status; I just wanted to give the customer an update.
According to our records, this should be resolved by openstack-tripleo-common-8.7.1-29.el7ost. This build is available now.
In the past few days we have updated two OSP13 environments from Z12 to Z16, and in both cases all the overcloud nodes (controllers, hypervisors) ended up with an empty /etc/resolv.conf after rebooting. It appears to be the same issue as described by the original reporter of this BZ. Restarting the network service on each node resolves the issue temporarily, at least until the next reboot.

The openstack-tripleo-common package on the directors is in fact the same version as mentioned above:

(undercloud) [stack@nlhrl1vim52-dir2 ~]$ rpm -q openstack-tripleo-common
openstack-tripleo-common-8.7.1-29.el7ost.noarch
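For anyone who wants to check a batch of nodes after a reboot, here is a minimal sketch of a test for the symptom above. It assumes that "empty" means the file has no usable `nameserver` entries (a file containing only comments counts as wiped). The function names `nameservers` and `looks_wiped` are hypothetical helpers, not part of any TripleO or cloud-init tooling.

```python
def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Skip blank lines and comments ('#' or ';' start a comment line).
        if not fields or fields[0].startswith(("#", ";")):
            continue
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


def looks_wiped(resolv_conf_text):
    """Assumption: a node shows the symptom when no nameserver remains."""
    return not nameservers(resolv_conf_text)
```

On a node, the text to pass in would be the contents of /etc/resolv.conf read after the reboot; an affected node should make `looks_wiped` return True until the network service is restarted.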
> The openstack-tripleo-common package on the directors is in fact the same version as mentioned above:

For minor updates you would need openstack-tripleo-heat-templates-8.4.1-86.el7ost, which I don't think shipped with z16.
Ah, indeed, we have a different version of that package:

(undercloud) [stack@nlhrl1vim52-dir2 ~]$ rpm -q openstack-tripleo-heat-templates
openstack-tripleo-heat-templates-8.4.1-85.el7ost.noarch
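The comparison between the installed build (8.4.1-85.el7ost) and the required one (8.4.1-86.el7ost) can be scripted. The sketch below is a simplification: it compares only the numeric runs in version and release and is not RPM's real ordering algorithm, so treat it as an illustration rather than a substitute for rpm's own comparison. `KNOWN_ARCHES`, `parse_nvr` and `is_at_least` are hypothetical names introduced for this example.

```python
import re

# Assumption: only these architecture suffixes need to be stripped.
KNOWN_ARCHES = ("noarch", "x86_64")


def parse_nvr(pkg):
    """Split 'name-version-release[.arch]' into (name, version, release)."""
    for arch in KNOWN_ARCHES:
        if pkg.endswith("." + arch):
            pkg = pkg[: -len(arch) - 1]
            break
    name, version, release = pkg.rsplit("-", 2)
    return name, version, release


def numeric_key(version, release):
    """Numeric runs only; a simplification of RPM version ordering."""
    return [int(n) for n in re.findall(r"\d+", version + "-" + release)]


def is_at_least(installed, required):
    """True if the installed build is the required build or newer."""
    i_name, i_ver, i_rel = parse_nvr(installed)
    r_name, r_ver, r_rel = parse_nvr(required)
    if i_name != r_name:
        raise ValueError("package names differ: %s vs %s" % (i_name, r_name))
    return numeric_key(i_ver, i_rel) >= numeric_key(r_ver, r_rel)
```

Fed the `rpm -q` output above and the build named in the previous comment, `is_at_least` returns False, which matches the observation that this undercloud does not yet have the templates package carrying the fix.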
Ran job http://staging-jenkins2-qe-playground.usersys.redhat.com/view/DFG/view/upgrades/view/update/job/DFG-upgrades-updates-13-from-z14-HA-ipv4/ which performs a minor update to the latest z-stream and also reboots the overcloud.

Verified that /etc/resolv.conf was not changed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 13 Bug Fix and Enhancement Advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2978
*** Bug 1933202 has been marked as a duplicate of this bug. ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days