Bug 1958464

Summary: generate new '/etc/machine-id' while launching new instance
Product: Red Hat Enterprise Linux 8
Reporter: Frank Liang <xiliang>
Component: cloud-init
Assignee: Virtualization Maintenance <virt-maint>
Status: CLOSED WONTFIX
QA Contact: Frank Liang <xiliang>
Severity: medium
Priority: medium
Version: 8.5
CC: eterrell, huzhao, jgreguske, leiwang, linl, ribarry, vkuznets, xiachen, ymao
Target Milestone: beta
Keywords: Triaged
Target Release: ---
Hardware: All
OS: Linux
Last Closed: 2021-08-30 11:56:32 UTC
Type: Bug

Description Frank Liang 2021-05-08 06:13:14 UTC
Description of problem:

When building RHEL AMIs, we have "cat /dev/null > /etc/machine-id" in the ks file to empty machine-id. This step ensures that systemd generates a new machine-id when an instance is launched from our official AMIs.
But if we create an AMI from a running instance, the machine-id is not empty, and all instances started from such a customized AMI use the same machine-id.

That situation is a bad idea regardless of sub-man. Many identifiers are generated from machine-id, so there is a very high risk that your systems will conflict.

I would prefer to add the extra steps to the cloud-init package rather than to the AMI's cloud-init cfg, because I guess this is not AWS-specific.

Below is my change.
# tail -3 /etc/cloud/cloud.cfg
runcmd:
 - [ sh, -c, 'cat /dev/null > /etc/machine-id' ]
 - [ systemd-machine-id-setup ]

# rpm -q cloud-init
cloud-init-20.3-10.el8.noarch

Version-Release number of selected components (if applicable):

RHEL Version:
RHEL-8.5(4.18.0-305.1.el8.aarch64)

How reproducible:
100%

Steps to Reproduce:
1. Start a RHEL-8.5 instance.
2. Create a new AMI from the running 8.5 instance.
3. Start a new instance from the new AMI.
4. Check "/etc/machine-id".

Actual results:
All instances started from the customized image have the same machine-id.

Expected results:
The machine-id should not be the same when a new instance is launched.
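
One way to confirm the duplication across several instances, as a minimal sketch under assumptions: the function below reads lines of the form "<instance> <machine-id>" on stdin and prints every machine-id that appears on more than one instance. The function name, the input format, and the instance names are hypothetical and not part of cloud-init or systemd; how the lines are collected (for example, by reading /etc/machine-id on each instance) is left out.

```shell
#!/bin/sh
# Hypothetical helper: report machine-ids shared by more than one instance.
# Input (stdin): one "<instance> <machine-id>" pair per line.
# Output: each duplicated machine-id, printed once.
find_duplicate_machine_ids() {
    awk '{ print $2 }' | sort | uniq -d
}

# Example with made-up instance names and shortened IDs:
printf 'i-aaa 1111\ni-bbb 1111\ni-ccc 2222\n' | find_duplicate_machine_ids
```

An empty result means every listed instance has its own machine-id.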

Additional info:
N/A

Comment 1 xiachen 2021-05-10 02:52:03 UTC
It is left as a wishlist bug upstream.
https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1563951

Comment 2 Eduardo Otubo 2021-08-23 15:00:30 UTC
(In reply to Frank Liang from comment #0)
> Description of problem:
> 
> When building RHEL AMIs, we have "cat /dev/null > /etc/machine-id" in the ks
> file to empty machine-id. This step ensures systemd to generate new
> machine-id when launch instance from our official AMIs.
> But if we create an AMI from a running instance, the machine-id is not empty
> and all instances use the same machine-id if it starts from such a
> customized AMI.
> 
> It is a bad idea to have that situation, regardless of sub-man. Many
> identifiers are generated from machine-id, so there is a very high risk that
> your systems will get conflicts.
> 
> I prefer to add extra steps into the cloud-init pkg instead of in AMI's
> cloud-init cfg. Because I guess, it is not aws specific.
> 
> Below is my change.
> # tail -3 /etc/cloud/cloud.cfg
> runcmd:
>  - [ sh, -c, 'cat /dev/null > /etc/machine-id' ]
>  - [ systemd-machine-id-setup ]

I have only one concern about this solution: do you believe any other service will need the machine-id during boot? The runcmd module is the last thing run by cloud-init.

Comment 3 Eduardo Otubo 2021-08-27 08:56:12 UTC
(In reply to Eduardo Otubo from comment #2)
> I have only one concern about this solution, do you believe any other
> service during boot time will need the machine-id? Runcmd module is the last
> thing run by cloud-init.

The First Boot Semantics section[0] of the systemd documentation states:

   "/etc/machine-id is used to decide whether a boot is the first one. The rules are as follows:
       If /etc/machine-id does not exist, this is a first boot. During early boot, systemd will write "uninitialized\n" to this file and overmount a temporary file which contains the actual machine ID. Later (after first-boot-complete.target has been reached), the real machine ID will be written to disk.
       If /etc/machine-id contains the string "uninitialized", a boot is also considered the first boot. The same mechanism as above applies.
       If /etc/machine-id exists and is empty, a boot is not considered the first boot. systemd will still bind-mount a file containing the actual machine-id over it and later try to commit it to disk (if /etc/ is writable).
       If /etc/machine-id already contains a valid machine-id, this is not a first boot.
   If by any of the above rules, a first boot is detected, units with ConditionFirstBoot=yes will be run."
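
The rules quoted above can be sketched as a small shell check. This is only an illustration of the documented semantics, not how systemd implements them: the function name and the output strings are hypothetical, and "valid" is assumed to mean 32 lowercase hexadecimal characters.

```shell
#!/bin/sh
# Hypothetical helper: classify a machine-id file according to the
# first-boot rules quoted from the systemd documentation.
classify_machine_id() {
    file=$1
    if [ ! -e "$file" ]; then
        echo "first-boot: missing"          # file does not exist
        return 0
    fi
    if [ ! -s "$file" ]; then
        echo "not-first-boot: empty"        # exists but has zero length
        return 0
    fi
    content=$(cat "$file")                  # trailing newline is stripped
    case $content in
        uninitialized)
            echo "first-boot: uninitialized" ;;
        *[!0123456789abcdef]*)
            echo "unknown: invalid content" ;;
        *)
            # Assumption: a valid ID is exactly 32 lowercase hex characters.
            if [ "${#content}" -eq 32 ]; then
                echo "not-first-boot: valid"
            else
                echo "unknown: invalid content"
            fi ;;
    esac
}

classify_machine_id /etc/machine-id
```

Note that a file emptied with "cat /dev/null > /etc/machine-id" falls under the "empty" rule, so a new ID is generated but the boot is not treated as a first boot.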

This means there *might* be some service that needs to know whether this is a first boot, and therefore we can't generate the machine-id at the last stage of the boot sequence (the runcmd module in cloud-init). That being said, I believe the best course of action is to manually remove the file before snapshotting, and to describe this procedure in our documentation. I'll keep the NEEDINFO on Frank to get his opinion on whether this is an acceptable solution. If it is, I believe we can close this as WONTFIX but request a docs update, and also close bz#1994278.

[0] https://freedesktop.org/software/systemd/man/machine-id.html#First%20Boot%20Semantics
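
The manual cleanup before snapshotting could look roughly like the sketch below. It assumes the cleanup is run as root on the instance (or against a mounted image root) right before the image is created. The function name and its two modes are hypothetical: "remove" follows the suggestion in this comment, while "truncate" mirrors the "cat /dev/null > /etc/machine-id" step from the ks file.

```shell
#!/bin/sh
# Hypothetical helper: reset the machine-id under a given root directory
# before creating an image from it.
#   remove   - delete the file; per the quoted rules the next boot is a first boot
#   truncate - empty the file; a new ID is generated, but it is not a first boot
reset_machine_id() {
    root=${1:-/}
    mode=${2:-remove}
    target="${root%/}/etc/machine-id"
    case $mode in
        remove)   rm -f -- "$target" ;;
        truncate) : > "$target" ;;
        *)        echo "unknown mode: $mode" >&2; return 2 ;;
    esac
}

# Usage (as root, just before snapshotting):
#   reset_machine_id / remove
```

Which mode fits depends on whether units with ConditionFirstBoot=yes should run on instances launched from the new image.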

Comment 4 Frank Liang 2021-08-27 09:23:08 UTC
Thanks for your explanation.
My original idea was to handle the duplicated machine-id case in a smart way, since users can easily forget to clean it.

Until there is a perfect way to decide whether to regenerate the machine-id, it is not a bad idea to leave it to customers to decide whether to remove the machine-id when they use their own images.
I am fine with closing it as not a bug.

In addition, we clean it in our images built from a ks file or Image Builder.

Comment 5 Eduardo Otubo 2021-08-30 11:56:32 UTC
I'm closing this bug as WONTFIX (as well as bz#1994278) as per the discussion and the explanation of how systemd works and what is expected of the cloud-init service. If you believe this is wrong, please re-open this BZ and we can keep discussing to reach a new solution. Thanks!