Bug 2179017

Summary: cloud-init: backport fix for instance id detection on Azure to RHEL9 [rhel-9.3]
Product: Red Hat Enterprise Linux 9 Reporter: Chris Patterson <cpatterson>
Component: cloud-init Assignee: Ani Sinha <anisinha>
Status: CLOSED ERRATA QA Contact: Huijuan Zhao <huzhao>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: unspecified CC: andavis, anisinha, bdas, eesposit, eterrell, huzhao, jgreguske, litian, mrezanin, xiachen, xiliang, xuli, yacao, yuxisun
Target Milestone: rc Keywords: Triaged
Target Release: --- Flags: pm-rhel: mirror+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: cloud-init-23.1.1-1.el9 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-11-07 08:28:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Chris Patterson 2023-03-16 12:59:08 UTC
Description of problem:
cloud-init may reconfigure the VM after falling back to a non-Azure datasource on Azure.

Some error cases will cause the instance ID to fall back to a default value of "iid-datasource".  This will cause cloud-init to reconfigure the VM thinking it is a new instance.
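
To illustrate why a changed instance ID leads to reconfiguration, here is a minimal sketch of a once-per-instance semaphore check. It is not cloud-init's actual code: the function name run_once_per_instance and the temporary base directory are hypothetical, and the directory layout (instances/<instance-id>/sem/<module>) is assumed from the lock paths shown in the logs in this report.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def run_once_per_instance(base: Path, instance_id: str, module: str) -> bool:
    """Hypothetical model: return True if the module runs, False if it already ran.

    A module is considered to have run for an instance when its semaphore
    file exists under <base>/instances/<instance_id>/sem/.
    """
    sem = base / "instances" / instance_id / "sem" / module
    if sem.exists():
        return False
    sem.parent.mkdir(parents=True, exist_ok=True)
    sem.touch()
    return True


def demo() -> list[bool]:
    """Simulate first boot, a normal reboot, and a boot with the fallback ID."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        real_id = "05a707cf-665b-b547-8fd4-a88de9aa6215"
        return [
            run_once_per_instance(base, real_id, "config_users_groups"),
            run_once_per_instance(base, real_id, "config_users_groups"),
            # The fallback ID has no semaphore yet, so the module runs again.
            run_once_per_instance(base, "iid-datasource", "config_users_groups"),
        ]
```

In this model the third call runs the module again because the semaphore directory for "iid-datasource" is empty, which matches the behaviour described above.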

Version-Release number of selected component (if applicable):
All cloud-init versions < 22.4

How reproducible:
The customer cases we have seen have been due to DHCP failures for various reasons (either due to image misconfiguration or platform issues).

Steps to Reproduce:
1. Create a RHEL VM on Azure
2. reboot # clean reboot for demonstration purposes
3. mv /usr/sbin/dhclient /usr/sbin/dhclient.x
4. cp /bin/false /usr/sbin/dhclient
5. reboot
6. grep config-users /var/log/cloud-init.log

We should see something like:

<First boot where provisioning runs config-users-groups>
2023-02-24 16:46:08,014 - handlers.py[DEBUG]: start: init-network/config-users-groups: running config-users-groups with frequency once-per-instance
2023-02-24 16:46:08,017 - helpers.py[DEBUG]: Running config-users-groups using lock (<FileLock using file '/var/lib/cloud/instances/05a707cf-665b-b547-8fd4-a88de9aa6215/sem/config_users_groups'>)
2023-02-24 16:46:09,851 - handlers.py[DEBUG]: finish: init-network/config-users-groups: SUCCESS: config-users-groups ran successfully

<Normal subsequent boot where config-users-groups does not run>
2023-03-15 15:14:03,278 - handlers.py[DEBUG]: start: init-network/config-users-groups: running config-users-groups with frequency once-per-instance
2023-03-15 15:14:03,279 - helpers.py[DEBUG]: config-users-groups already ran (freq=once-per-instance)
2023-03-15 15:14:03,279 - handlers.py[DEBUG]: finish: init-network/config-users-groups: SUCCESS: config-users-groups previously ran

<Boot with /bin/false for dhclient, config-users unexpectedly re-runs>
2023-03-15 17:30:49,484 - handlers.py[DEBUG]: start: init-network/config-users-groups: running config-users-groups with frequency once-per-instance
2023-03-15 17:30:49,487 - helpers.py[DEBUG]: Running config-users-groups using lock (<FileLock using file '/var/lib/cloud/instances/iid-datasource/sem/config_users_groups'>)
2023-03-15 17:30:49,491 - handlers.py[DEBUG]: finish: init-network/config-users-groups: SUCCESS: config-users-groups ran successfully
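
The check above can also be scripted. The following is a rough sketch, not part of cloud-init, that scans log lines for semaphore lock paths and reports which instance IDs were used. The helper names (instance_ids_in_log, used_fallback_id) and the regular expression are assumptions based only on the log format shown in this report.

```python
import re

# Matches lock paths such as
# /var/lib/cloud/instances/<instance-id>/sem/config_users_groups
SEM_RE = re.compile(
    r"/var/lib/cloud/instances/(?P<iid>[^/]+)/sem/(?P<module>[^'>\s]+)"
)
FALLBACK_ID = "iid-datasource"


def instance_ids_in_log(lines):
    """Return instance IDs found in semaphore paths, in order of first appearance."""
    seen = []
    for line in lines:
        match = SEM_RE.search(line)
        if match and match.group("iid") not in seen:
            seen.append(match.group("iid"))
    return seen


def used_fallback_id(lines):
    """Return True if any semaphore path uses the fallback instance ID."""
    return FALLBACK_ID in instance_ids_in_log(lines)
```

For example, passing the lines of /var/log/cloud-init.log (opened with open() and iterated) to used_fallback_id would return True on a VM that hit this bug, assuming the log contains a lock path like the one in the third excerpt.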

The upstream patch that should address this failure:
https://github.com/canonical/cloud-init/commit/b861ea8a5e1fd0eb33096f60f54eeff42d80d3bd

Comment 1 Ani Sinha 2023-03-30 10:04:19 UTC
@huzhao will this be fixed by the cloud-init rebase https://bugzilla.redhat.com/show_bug.cgi?id=2172811 ?

Comment 2 Ani Sinha 2023-03-30 10:09:57 UTC
Seems https://github.com/canonical/cloud-init/commit/b861ea8a5e1fd0eb33096f60f54eeff42d80d3bd is present in cloud-init 23.1.1

Comment 3 Huijuan Zhao 2023-03-30 22:37:32 UTC
(In reply to Ani Sinha from comment #1)
> @huzhao will this be fixed by the cloud-init rebase
> https://bugzilla.redhat.com/show_bug.cgi?id=2172811 ?

Yes, I think so.
Please refer to https://bugzilla.redhat.com/show_bug.cgi?id=2178793#c14 for detailed test results and discussion.

Comment 4 Ani Sinha 2023-04-28 08:20:22 UTC
@huzhao Please feel free to close this bug as the rebase is complete.

Comment 5 Huijuan Zhao 2023-04-28 14:14:11 UTC
(In reply to Ani Sinha from comment #4)
> @huzhao Please feel free to close this bug as the rebase is
> complete.

Thanks Ani. Should we add this BZ to errata or close it directly?

Comment 6 Ani Sinha 2023-04-28 15:04:57 UTC
(In reply to Huijuan Zhao from comment #5)
> (In reply to Ani Sinha from comment #4)
> > @huzhao Please feel free to close this bug as the rebase is
> > complete.
> 
> Thanks Ani. Should we add this BZ to errata or close it directly?

I guess add it to errata

Comment 13 Huijuan Zhao 2023-05-24 09:57:20 UTC
The issue is fixed in cloud-init-23.1.1-4.el9; test results are the same as https://bugzilla.redhat.com/show_bug.cgi?id=2178793#c3

Moving to VERIFIED.

Comment 15 errata-xmlrpc 2023-11-07 08:28:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: cloud-init security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6371