Bug 2073855 - /boot/loader/entries file names need to match machine-id on firstboot
Summary: /boot/loader/entries file names need to match machine-id on firstboot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 17.0 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Target Release: 17.0
Assignee: Steve Baker
QA Contact: James Parker
URL:
Whiteboard:
Depends On:
Blocks: 2035325
Reported: 2022-04-10 22:26 UTC by Steve Baker
Modified: 2022-09-21 12:21 UTC (History)

Fixed In Version: diskimage-builder-3.20.4-0.20220428174017.555cecb.el8ost openstack-tripleo-image-elements-13.1.3-0.20220510162343.6883abc.el8ost openstack-tripleo-common-15.4.1-0.20220510162343.855dcd5.el8ost rhosp-director-images-17.0-20220614.1.test.el9ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-21 12:20:36 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 837251 | 0 | None | MERGED | Set machine-id to uninitialized to trigger first boot | 2022-06-03 06:14:44 UTC
Red Hat Issue Tracker | OSP-14615 | 0 | None | None | None | 2022-04-10 22:29:54 UTC
Red Hat Product Errata | RHEA-2022:6543 | 0 | None | None | None | 2022-09-21 12:21:18 UTC

Description Steve Baker 2022-04-10 22:26:17 UTC
When grub2-mkconfig is called as a day 2 operation (for example, to change the kernelargs), the existing boot values should be taken from the /boot/loader/entries/<machine-id>-<kernel>.conf file. However, this doesn't happen because the machine-id on the deployed node differs from the one in place when the image was built, so the file names no longer match.

To fix the /boot/loader/entries/<machine-id>-<kernel>.conf filename, a systemd unit needs to be included in the image which:

- Runs after first-boot-complete.target
- Has ConditionFirstBoot=yes
- Renames the files in /boot/loader/entries/, similar to [1]
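
As a rough illustration of the rename step, here is a minimal shell sketch modeled on the script in [1]. The function name rename_bls_entries and its two arguments (entries directory, new machine-id) are assumptions made for this example so it can be tried against a scratch directory; the real element script and the systemd unit that runs it may differ.

```shell
# Sketch only: rename BLS entry files so their hex prefix matches a machine-id.
# rename_bls_entries is a hypothetical helper name, not part of any package.
rename_bls_entries() (
    entries_dir="$1"   # e.g. /boot/loader/entries
    machine_id="$2"    # e.g. the contents of /etc/machine-id

    for entry in "$entries_dir"/*.conf; do
        # Skip the literal glob pattern when the directory has no .conf files
        [ -e "$entry" ] || continue
        name=$(basename "$entry")
        # Replace the leading run of hex digits with the new machine-id
        new_name=$(printf '%s\n' "$name" | sed "s/^[a-f0-9]*/$machine_id/")
        if [ "$name" != "$new_name" ]; then
            echo "renaming $name to $new_name for new machine-id"
            mv "$entry" "$entries_dir/$new_name"
        fi
    done
)
```

On a deployed node the equivalent call would be roughly rename_bls_entries /boot/loader/entries "$(cat /etc/machine-id)", run as root. Running it a second time is harmless because entries that already carry the current machine-id are left alone.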

We know the overcloud image will be the only OS installed to disk, but /boot/loader/entries is a multi-OS mechanism. This is why this bug is filed against tripleo-image-elements instead of diskimage-builder.

[1] https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/centos/pre-install.d/03-reset-bls-entries

Comment 1 Steve Baker 2022-05-08 22:50:05 UTC
Downstream backports are now attached; only the diskimage-builder change is in a build so far.

Comment 4 Steve Baker 2022-06-09 02:34:25 UTC
(In reply to James Parker from comment #3)
> @sbaker can you confirm this is ready for QA? Just tried this
> with the latest puddle and I'm still seeing different machine ids in my
> deployment

I deployed the image from this package[1] to baremetal locally, and it did not rename the BLS files. The image should have 'uninitialized' as the contents of /etc/machine-id, but instead the file was empty. When I wrote 'uninitialized' to /etc/machine-id and rebooted, it renamed the BLS entries as expected.

The diskimage-builder package[2] has the change[3] which writes uninitialized to /etc/machine-id.

I can't tell without looking at the overcloud-hardened-uefi-full.qcow2 build logs, but is it possible the image was built with a different version of diskimage-builder than the one in the compose? (setting NEEDINFO to Lon)

[1] https://download.eng.bos.redhat.com/rcm-guest/puddles/OpenStack/17.0-RHEL-9/latest-RHOS-17-RHEL-9/compose/OpenStack/x86_64/os/Packages/rhosp-director-images-uefi-x86_64-17.0-20220603.1.test.el9ost.noarch.rpm
[2] https://download.eng.bos.redhat.com/rcm-guest/puddles/OpenStack/17.0-RHEL-9/latest-RHOS-17-RHEL-9/compose/OpenStack/x86_64/os/Packages/diskimage-builder-3.21.2-0.20220514080754.2f06cbc.el9ost.noarch.rpm
[3] https://review.opendev.org/c/openstack/diskimage-builder/+/837251/2/diskimage_builder/elements/sysprep/finalise.d/99-clear-machine-id#20
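
A quick way to tell these cases apart when inspecting an image is to classify the contents of its machine-id file. The sketch below is only an illustration; the function name machine_id_state and the state labels it prints are made up for this example.

```shell
# Sketch only: report which state a machine-id file is in.
# machine_id_state and its output labels are hypothetical.
machine_id_state() {
    file="$1"
    if [ ! -e "$file" ]; then
        echo "missing"
        return 0
    fi
    # Strip whitespace so a trailing newline does not affect the comparison
    content=$(tr -d '[:space:]' < "$file")
    case "$content" in
        "")            echo "empty" ;;          # the state observed in the bad image
        uninitialized) echo "uninitialized" ;;  # the state the fixed image should have
        *)             echo "initialized" ;;    # a concrete machine-id is already set
    esac
}
```

For a mounted image, something like machine_id_state /mnt/etc/machine-id (the mount point is hypothetical) should print uninitialized if the diskimage-builder change took effect.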

Comment 6 Steve Baker 2022-06-09 05:19:30 UTC
I'm not sure if this is indicative of what ends up in rhosp-director-images-uefi-x86_64-17.0-20220603.1.test.el9ost.noarch.rpm, but this image build log[1] shows the expected diskimage-builder version. The logging level isn't high enough to show exactly what 99-clear-machine-id is running.

[1] https://download-node-02.eng.bos.redhat.com/brewroot/packages/overcloud-hardened-uefi-full-x86_64/17.0/20220603.1.test/data/logs/image/oz-indirection.log

Comment 10 Steve Baker 2022-06-20 21:59:14 UTC
The image build packaging fix is in, so I'm moving this back to MODIFIED

Comment 15 Steve Baker 2022-06-29 04:23:44 UTC
I checked the image in this package[1] and it looks correct. I deployed it and the BLS files got renamed (see the commands run below).

Some possible explanations for what you are seeing:
- the image is being sourced from somewhere other than rhosp-director-images-uefi-x86_64-17.0-20220623.1.test.el9ost.noarch.rpm
- your deployment tooling is modifying the image with guestfish and truncating /etc/machine-id

Can you provide deployment logs which show that neither of these is the case? Running the following will also show whether the rename script actually ran and what it did. If it didn't run at all, then the image was deployed with a modified /etc/machine-id:

cat /var/log/messages |grep reset-bls-entries

[root@localhost ~]# cat /etc/machine-id 
a5112b2cf42e4fe4a088e758c65d2acc
[root@localhost ~]# ls /boot/loader/entries/
a5112b2cf42e4fe4a088e758c65d2acc-0-rescue.conf  a5112b2cf42e4fe4a088e758c65d2acc-5.14.0-70.13.1.el9_0.x86_64.conf
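
The manual comparison above can be scripted. This is a minimal sketch, not part of any shipped tooling; check_bls_entries is a hypothetical helper name, and it takes the entries directory and the machine-id file as arguments so it can be tried on copies rather than on a live /boot.

```shell
# Sketch only: list BLS entries whose name does not start with the machine-id.
# check_bls_entries is a hypothetical helper; returns 1 if any entry mismatches.
check_bls_entries() (
    entries_dir="$1"       # e.g. /boot/loader/entries
    machine_id_file="$2"   # e.g. /etc/machine-id

    machine_id=$(tr -d '[:space:]' < "$machine_id_file")
    status=0
    for entry in "$entries_dir"/*.conf; do
        # Skip the literal glob pattern when the directory has no .conf files
        [ -e "$entry" ] || continue
        name=$(basename "$entry")
        case "$name" in
            "$machine_id"-*) ;;                          # prefix matches, nothing to report
            *) echo "mismatch: $name"; status=1 ;;
        esac
    done
    return "$status"
)
```

On a node like the one shown here, check_bls_entries /boot/loader/entries /etc/machine-id would be expected to print nothing and return 0.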

[root@localhost log]# cat /var/log/messages |grep reset-bls-entries
Jun 29 00:09:24 localhost reset-bls-entries[880]: + pushd /boot/loader/entries
Jun 29 00:09:24 localhost reset-bls-entries[880]: /boot/loader/entries /
Jun 29 00:09:24 localhost reset-bls-entries[880]: + machine_id=a5112b2cf42e4fe4a088e758c65d2acc
Jun 29 00:09:24 localhost reset-bls-entries[880]: + for entry in *.conf
Jun 29 00:09:24 localhost reset-bls-entries[883]: ++ echo 1db7e42fdfe541e4b8e84a89422a4c5b-0-rescue.conf
Jun 29 00:09:24 localhost reset-bls-entries[884]: ++ sed 's/^[a-f0-9]*/a5112b2cf42e4fe4a088e758c65d2acc/'
Jun 29 00:09:24 localhost reset-bls-entries[880]: + new_entry=a5112b2cf42e4fe4a088e758c65d2acc-0-rescue.conf
Jun 29 00:09:24 localhost reset-bls-entries[880]: + [[ 1db7e42fdfe541e4b8e84a89422a4c5b-0-rescue.conf != a5112b2cf42e4fe4a088e758c65d2acc-0-rescue.conf ]]
Jun 29 00:09:24 localhost reset-bls-entries[880]: + echo 'renaming 1db7e42fdfe541e4b8e84a89422a4c5b-0-rescue.conf to a5112b2cf42e4fe4a088e758c65d2acc-0-rescue.conf for new machine-id'
Jun 29 00:09:24 localhost reset-bls-entries[880]: renaming 1db7e42fdfe541e4b8e84a89422a4c5b-0-rescue.conf to a5112b2cf42e4fe4a088e758c65d2acc-0-rescue.conf for new machine-id
Jun 29 00:09:24 localhost reset-bls-entries[880]: + mv 1db7e42fdfe541e4b8e84a89422a4c5b-0-rescue.conf a5112b2cf42e4fe4a088e758c65d2acc-0-rescue.conf
Jun 29 00:09:24 localhost reset-bls-entries[880]: + for entry in *.conf
Jun 29 00:09:24 localhost reset-bls-entries[887]: ++ echo 1db7e42fdfe541e4b8e84a89422a4c5b-5.14.0-70.13.1.el9_0.x86_64.conf
Jun 29 00:09:24 localhost reset-bls-entries[888]: ++ sed 's/^[a-f0-9]*/a5112b2cf42e4fe4a088e758c65d2acc/'
Jun 29 00:09:24 localhost reset-bls-entries[880]: + new_entry=a5112b2cf42e4fe4a088e758c65d2acc-5.14.0-70.13.1.el9_0.x86_64.conf
Jun 29 00:09:24 localhost reset-bls-entries[880]: + [[ 1db7e42fdfe541e4b8e84a89422a4c5b-5.14.0-70.13.1.el9_0.x86_64.conf != a5112b2cf42e4fe4a088e758c65d2acc-5.14.0-70.13.1.el9_0.x86_64.conf ]]
Jun 29 00:09:24 localhost reset-bls-entries[880]: + echo 'renaming 1db7e42fdfe541e4b8e84a89422a4c5b-5.14.0-70.13.1.el9_0.x86_64.conf to a5112b2cf42e4fe4a088e758c65d2acc-5.14.0-70.13.1.el9_0.x86_64.conf for new machine-id'
Jun 29 00:09:24 localhost reset-bls-entries[880]: renaming 1db7e42fdfe541e4b8e84a89422a4c5b-5.14.0-70.13.1.el9_0.x86_64.conf to a5112b2cf42e4fe4a088e758c65d2acc-5.14.0-70.13.1.el9_0.x86_64.conf for new machine-id
Jun 29 00:09:24 localhost reset-bls-entries[880]: + mv 1db7e42fdfe541e4b8e84a89422a4c5b-5.14.0-70.13.1.el9_0.x86_64.conf a5112b2cf42e4fe4a088e758c65d2acc-5.14.0-70.13.1.el9_0.x86_64.conf
Jun 29 00:09:24 localhost reset-bls-entries[880]: + popd
Jun 29 00:09:24 localhost reset-bls-entries[880]: /
Jun 29 00:09:25 localhost kernel: audit: type=1130 audit(1656475765.203:135): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=reset-bls-entries comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'


[1] https://download.eng.bos.redhat.com/rcm-guest/puddles/OpenStack/17.0-RHEL-9/RHOS-17.0-RHEL-9-20220623.n.1/compose/OpenStack/x86_64/os/Packages/rhosp-director-images-uefi-x86_64-17.0-20220623.1.test.el9ost.noarch.rpm

Comment 16 Steve Baker 2022-06-29 04:47:39 UTC
I found a couple of machine-id resets in infrared and posted a change: https://review.gerrithub.io/c/redhat-openstack/infrared/+/540613
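
For tooling that does need to reset the machine-id in an image, a minimal sketch of doing so without defeating the first-boot rename follows. It assumes the image root filesystem is mounted at a path passed as the first argument; reset_machine_id is a hypothetical name, and this is not the content of the infrared change.

```shell
# Sketch only: reset machine-id in a mounted image root so that the next boot
# is treated as a first boot. reset_machine_id is a hypothetical helper.
reset_machine_id() {
    image_root="$1"   # assumed mount point of the image root filesystem
    # Write the literal string rather than truncating the file: in this bug an
    # empty /etc/machine-id did not trigger the BLS rename, 'uninitialized' did.
    printf 'uninitialized\n' > "$image_root/etc/machine-id"
}
```

The point of the design is simply to avoid truncating the file, since an empty /etc/machine-id was the state that left the BLS entries unrenamed.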

Comment 34 errata-xmlrpc 2022-09-21 12:20:36 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543

