Bug 1753157

Summary: grub2 blscfg menu order can become random!
Product: Red Hat Enterprise Linux 8 Reporter: Warren Togami <wtogami>
Component: grub2 Assignee: Bootloader engineering team <bootloader-eng-team>
Status: CLOSED WONTFIX QA Contact: Release Test Team <release-test-team-automation>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 8.0 CC: extras-qa, fmartine, javierm, lkundrak, ngompa13, pablo, pjones
Target Milestone: rc   
Target Release: 8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1753154 Environment:
Last Closed: 2021-03-18 07:31:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Warren Togami 2019-09-18 09:39:06 UTC
Javier says much the same issue as below may need to be fixed in RHEL8.

+++ This bug was initially created as a clone of Bug #1753154 +++

Description of problem:
If you installed your system with Anaconda then /boot/loader/entries/*.conf is named to match /etc/machine-id.

But if you boot from a raw image, the kernel was installed during image creation under a temporary random /etc/machine-id UUID, and that file was then blanked. On the first boot from that image, a new random UUID is generated for /etc/machine-id, which no longer matches the /boot/loader/entries/*.conf filename.

The system seems to boot just fine, but when a new kernel is installed (and /etc/sysconfig/kernel is missing UPDATEDEFAULT=yes), the install fails to explicitly write grubenv's saved_entry=<new kernel's full BLS name>.

The consequence is that headless cloud systems or embedded boards will randomly reboot into either the new or the old kernel. This is because GRUB defaults to the zeroth menu entry, while the entries in /boot/loader/entries are ordered by blscfg with rpmvercmp().

https://fedorapeople.org/~wtogami/rpmvercmp3.py
$ ./rpmvercmp3.py 3a0ec5d722d8490895ed0715bcf68280 61dcccd9652d4a02b08ae324222cb5d4
61dcccd9652d4a02b08ae324222cb5d4 is newer than 3a0ec5d722d8490895ed0715bcf68280

blscfg is comparing two random UUIDs exactly as intended.
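
To illustrate why the comparison above comes out the way it does, here is a minimal Python sketch of an rpmvercmp-style comparison. It is a simplification written for this report, not the code from RPM, GRUB, or the rpmvercmp3.py script linked above: it ignores the special handling of "~" and "^", and the name simple_rpmvercmp is hypothetical. The key point is that the string is split into runs of digits and runs of letters, and a hex machine-id therefore compares on whatever its first run happens to be.

```python
import re

# Runs of ASCII digits or ASCII letters; everything else is a separator.
_SEGMENT = re.compile(r"[0-9]+|[A-Za-z]+")


def simple_rpmvercmp(a: str, b: str) -> int:
    """Return 1 if a sorts as newer, -1 if b does, 0 if they compare equal.

    Simplified sketch: no '~' or '^' handling.
    """
    if a == b:
        return 0
    segs_a = _SEGMENT.findall(a)
    segs_b = _SEGMENT.findall(b)
    for sa, sb in zip(segs_a, segs_b):
        a_num, b_num = sa.isdigit(), sb.isdigit()
        if a_num != b_num:
            # A numeric segment is treated as newer than an alphabetic one.
            return 1 if a_num else -1
        if a_num:
            # Compare numbers by stripping leading zeros, then by length.
            sa, sb = sa.lstrip("0"), sb.lstrip("0")
            if len(sa) != len(sb):
                return 1 if len(sa) > len(sb) else -1
        if sa != sb:
            return 1 if sa > sb else -1
    if len(segs_a) == len(segs_b):
        return 0
    # Whichever string still has segments left is treated as newer.
    return 1 if len(segs_a) > len(segs_b) else -1


id_a = "3a0ec5d722d8490895ed0715bcf68280"
id_b = "61dcccd9652d4a02b08ae324222cb5d4"
print(simple_rpmvercmp(id_a, id_b))  # -1: id_b sorts as newer
```

Here the first segments are "3" and "61", so the second ID wins on a plain numeric comparison that has nothing to do with kernel age.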

Another consequence is removal of the original kernel does not delete the BLS entry file because the name does not match the current machine-id.

Version-Release number of selected component (if applicable):
grub2-efi-aa64-2.02-97.fc31.aarch64
appliance-tools-009.0-7.fc31.noarch
systemd-udev-243-1.fc31.aarch64

Mitigation:
Image creators like appliance-tools and imagefactory should probably write out /etc/sysconfig/kernel. Only after kernel-install is run again does reboot behavior become closer to user expectations. But this only bypasses the random menu ordering; the ordering itself still needs to be fixed.
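
As a rough sketch of that mitigation, an image-creation tool could make sure the image's /etc/sysconfig/kernel contains UPDATEDEFAULT=yes. The helper name ensure_update_default and the idea of passing in a path inside the image root are assumptions for illustration; this is not how appliance-tools or imagefactory currently work.

```python
from pathlib import Path


def ensure_update_default(config_path):
    """Make sure the given sysconfig file sets UPDATEDEFAULT=yes.

    Hypothetical helper: config_path would be something like
    <image root>/etc/sysconfig/kernel. Other settings are preserved.
    """
    path = Path(config_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = path.read_text().splitlines() if path.exists() else []
    # Drop any existing UPDATEDEFAULT line, then append the wanted value.
    kept = [ln for ln in lines if not ln.strip().startswith("UPDATEDEFAULT=")]
    kept.append("UPDATEDEFAULT=yes")
    path.write_text("\n".join(kept) + "\n")
    return kept
```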

Possible Fixes:
* Stop including the machine-id in the /boot/loader/entries/*.conf filenames.
* Images could include a one-time script that runs during initial boot. After /etc/machine-id is written the filename in /boot/loader/entries/ can be renamed to match.
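
The second option could look roughly like the Python sketch below. It assumes entry files are named <32-hex-digit machine-id>-<rest>.conf; the function name rename_bls_entries and its parameters are hypothetical. It only renames files: if grubenv's saved_entry refers to an old entry name, that would presumably need updating as well, which this sketch does not attempt.

```python
import re
from pathlib import Path

# Assumed filename layout: <32 hex digits>-<anything>.conf
_ENTRY = re.compile(r"^([0-9a-f]{32})-(.+\.conf)$")


def rename_bls_entries(entries_dir, machine_id):
    """Rename entries whose machine-id prefix differs from machine_id.

    Returns a list of (old_name, new_name) pairs for the files renamed.
    """
    renamed = []
    for path in sorted(Path(entries_dir).glob("*.conf")):
        match = _ENTRY.match(path.name)
        if not match or match.group(1) == machine_id:
            continue  # not in the assumed layout, or already matching
        target = path.with_name(f"{machine_id}-{match.group(2)}")
        if target.exists():
            continue  # never overwrite an existing entry
        path.rename(target)
        renamed.append((path.name, target.name))
    return renamed
```

On a real system this would be called once after first boot with /boot/loader/entries and the stripped contents of /etc/machine-id.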

RHEL8 also needs to be fixed. BLS Cloud boot can behave in unexpected ways as headless machines can't show the boot menu to the user. This can be very confusing as reboot and grub2-reboot do not do what you expect. It could also prevent a system from rebooting into a new kernel containing a security patch.

Comment 3 RHEL Program Management 2021-03-18 07:31:29 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.