Bug 2305981

Summary: OSP16.2 to OSP17.1 upgrade breaks GRUB and makes it try to boot RHEL7
Product: Red Hat OpenStack
Reporter: Kenny Tordeurs <ktordeur>
Component: openstack-tripleo-heat-templates
Assignee: Juan Badia Payno <jbadiapa>
Status: CLOSED ERRATA
QA Contact: Archana Singh <arcsingh>
Severity: high
Priority: unspecified
Version: 17.1 (Wallaby)
CC: hjensas, jbadiapa, jjoyce, jpretori, kgilliga, mburns, mflusche, pgrist, pweeks, sbaker, tvignaud
Target Milestone: z4
Keywords: Triaged
Target Release: 17.1
Hardware: x86_64
OS: Linux
Fixed In Version: openstack-tripleo-heat-templates-14.3.1-17.1.20240919130753.el9ost openstack-tripleo-heat-templates-14.3.1-17.1.20240919123750.el8ost
Doc Type: Known Issue
Doc Text:
When you upgrade from RHOSP 16.2 to 17.1, during the system upgrade, a known issue causes GRUB to contain RHEL 7 entries instead of RHEL 8 entries. As a result, the hosts cannot reboot. This issue affects environments that previously ran RHOSP 13.0 or earlier. + *Workaround:* See the Red Hat Knowledgebase solution link:https://access.redhat.com/solutions/7096899[Openstack 16 to 17 FFU - During LEAPP upgrade UEFI systems do not boot due to invalid /boot/grub2/grub.cfg].
Last Closed: 2024-11-21 09:30:46 UTC
Type: Bug

Description Kenny Tordeurs 2024-08-20 11:33:26 UTC
Description of problem:
During the LEAPP upgrade phase, LEAPP breaks the GRUB configuration on our servers. We see this on a wide range of hosts when running `openstack overcloud upgrade run --yes --stack openstack07 --tags system_upgrade --limit openstackcontroller`

Afterwards, GRUB contains only RHEL7 entries instead of the correct RHEL8 + upgrade entries.
I can confirm that before the upgrade there were no issues with GRUB: it contained RHEL8 entries and the hosts could be rebooted just fine.

We manually booted the nodes from the grub shell so that the upgrade could continue (very labor intensive).
But even after the upgrade the hosts cannot be rebooted: they always end up in the RHEL7 grub menu. This is most probably a leftover from the time this cluster was running OSP13 years back, which LEAPP for some reason reinstated.


Version-Release number of selected component (if applicable):
17.1

How reproducible:
/

Steps to Reproduce:
/

Actual results:
GRUB configuration is corrupted: it contains only RHEL7 entries and the hosts cannot reboot.

Expected results:
no issues with grub

Additional info:
We hit this RHEL7 boot issue on almost all nodes: controllers, Ceph and computes.

I have just managed to find and fix the issue on the Ceph and compute nodes.
It appears that we were hitting the following known issue: https://access.redhat.com/solutions/7034430
/boot/grub2/grubenv was a regular file and not a symlink to /boot/efi/EFI/redhat/grubenv. After creating this symlink and running grub2-mkconfig, they now all boot fine by themselves.
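
The grubenv condition described above can be checked without changing anything on the host. The following is a minimal sketch, not the procedure from the Knowledgebase article: `grubenv_status` is a hypothetical helper name, and the default paths are simply the two quoted in this report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how a grubenv path relates to the EFI grubenv.
# Defaults are the two paths quoted in this report; nothing is modified.
grubenv_status() {
    local env_path="${1:-/boot/grub2/grubenv}"
    local efi_env="${2:-/boot/efi/EFI/redhat/grubenv}"
    local resolved expected

    if [ -L "$env_path" ]; then
        # Compare fully resolved paths so relative and absolute links both match.
        resolved=$(readlink -f "$env_path" 2>/dev/null || true)
        expected=$(readlink -f "$efi_env" 2>/dev/null || true)
        if [ -n "$resolved" ] && [ "$resolved" = "$expected" ]; then
            echo "symlink-ok"
        else
            echo "symlink-wrong-target"
        fi
    elif [ -e "$env_path" ]; then
        # This is the state found on the Ceph and compute nodes.
        echo "regular-file"
    else
        echo "missing"
    fi
}
```

Calling `grubenv_status` with no arguments checks the paths from this report. A result of `regular-file` would match the state found on the Ceph and compute nodes; the controllers would be expected to report `symlink-ok`, which is why they are a separate case.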

But our 3 controllers are a different case: they already had a correct /boot/grub2/grubenv symlink in place.
Unfortunately we don't have a screenshot of a successful boot before the upgrade, but once it failed for our controller1, I explicitly rebooted controller2 before running LEAPP, and it booted fine and showed only RHEL8.4-related grub entries.

For example, when running grep on those controller nodes:
[root@openstackcontoller ~]# grep -ri menuentry /boot/
grep: /boot/grub2/i386-pc/gfxterm_menu.mod: binary file matches
grep: /boot/grub2/i386-pc/normal.mod: binary file matches
/boot/grub2/i386-pc/command.lst:*menuentry: normal
grep: /boot/grub2/i386-pc/syslinuxcfg.mod: binary file matches
grep: /boot/grub2/i386-pc/legacycfg.mod: binary file matches
/boot/grub2/grub.cfg:if [ x"${feature_menuentry_id}" = xy ]; then
/boot/grub2/grub.cfg:  menuentry_id_option="--id"
/boot/grub2/grub.cfg:  menuentry_id_option=""
/boot/grub2/grub.cfg:export menuentry_id_option
/boot/grub2/grub.cfg:menuentry 'Red Hat Enterprise Linux Server 7.9 Rescue d8a79cee73f84c04aa6da7a494db5c92 (3.10.0-1160.11.1.el7.x86_64)' --class red --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-1160.6.1.el7.x86_64-advanced-4b91c3b4-480b-48a0-94f2-a0c4f19923c2' {
/boot/grub2/grub.cfg:menuentry 'Red Hat Enterprise Linux Server (3.10.0-1160.11.1.el7.x86_64) 7.9 (Maipo)' --class red --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-1160.6.1.el7.x86_64-advanced-4b91c3b4-480b-48a0-94f2-a0c4f19923c2' {
/boot/grub2/grub.cfg:menuentry 'Red Hat Enterprise Linux Server (3.10.0-1160.6.1.el7.x86_64) 7.9 (Maipo)' --class red --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-1160.6.1.el7.x86_64-advanced-4b91c3b4-480b-48a0-94f2-a0c4f19923c2' {
/boot/grub2/grub.cfg:menuentry 'Red Hat Enterprise Linux Server (0-rescue-ba23abcd5d1f469f9a5fd4e16664c6f4) 7.9 (Maipo)' --class red --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-0-rescue-ba23abcd5d1f469f9a5fd4e16664c6f4-advanced-4b91c3b4-480b-48a0-94f2-a0c4f19923c2' {
/boot/efi/EFI/redhat/grub.cfg:if [ x"${feature_menuentry_id}" = xy ]; then
/boot/efi/EFI/redhat/grub.cfg:  menuentry_id_option="--id"
/boot/efi/EFI/redhat/grub.cfg:  menuentry_id_option=""
/boot/efi/EFI/redhat/grub.cfg:export menuentry_id_option
/boot/efi/EFI/redhat/grub.cfg:  menuentry 'UEFI Firmware Settings' $menuentry_id_option 'uefi-firmware' {

As you can see, the two files are drastically different.

When I compare this to, for example, a compute in the same cluster, both files there contain the same entries and not those static ones.
[root@openstackcompute ~]# grep -ri menuentry /boot/
grep: /boot/efi/EFI/redhat/grubx64.efi: binary file matches
/boot/efi/EFI/redhat/grub.cfg:if [ x"${feature_menuentry_id}" = xy ]; then
/boot/efi/EFI/redhat/grub.cfg:  menuentry_id_option="--id"
/boot/efi/EFI/redhat/grub.cfg:  menuentry_id_option=""
/boot/efi/EFI/redhat/grub.cfg:export menuentry_id_option
/boot/efi/EFI/redhat/grub.cfg:  menuentry 'UEFI Firmware Settings' $menuentry_id_option 'uefi-firmware' {
/boot/efi/EFI/BOOT/grub.cfg:if [ x"${feature_menuentry_id}" = xy ]; then
/boot/efi/EFI/BOOT/grub.cfg:  menuentry_id_option="--id"
/boot/efi/EFI/BOOT/grub.cfg:  menuentry_id_option=""
/boot/efi/EFI/BOOT/grub.cfg:export menuentry_id_option
/boot/efi/EFI/BOOT/grub.cfg:menuentry 'System setup' $menuentry_id_option 'uefi-firmware' {
/boot/grub2/grub.cfg:if [ x"${feature_menuentry_id}" = xy ]; then
/boot/grub2/grub.cfg:  menuentry_id_option="--id"
/boot/grub2/grub.cfg:  menuentry_id_option=""
/boot/grub2/grub.cfg:export menuentry_id_option
/boot/grub2/grub.cfg:menuentry 'System setup' $menuentry_id_option 'uefi-firmware' {
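
The difference between the controller and compute outputs above can also be checked mechanically. This is a minimal sketch under one stated assumption: a `menuentry` line whose text contains `.el7.` is treated as a RHEL7 kernel entry. `count_el7_menuentries` is a hypothetical helper, not part of GRUB or LEAPP, and entries that carry no kernel version on the line (such as the `0-rescue` one) are not counted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count menuentry lines in a grub.cfg that reference
# an el7 kernel. Assumption: ".el7." on the line identifies a RHEL7 entry.
count_el7_menuentries() {
    local cfg="$1"

    if [ ! -r "$cfg" ]; then
        echo "cannot read: $cfg" >&2
        return 2
    fi

    # grep -c prints the count but exits 1 when it is zero, so that exit
    # status is masked; leading whitespace before "menuentry" is allowed.
    grep -cE '^[[:space:]]*menuentry .*\.el7\.' "$cfg" || true
}
```

For example, `count_el7_menuentries /boot/grub2/grub.cfg` prints a single number. On a host that is supposed to be running RHEL8, any value above zero suggests stale entries like the ones in the controller output.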

Comment 35 errata-xmlrpc 2024-11-21 09:30:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: RHOSP 17.1.4 (openstack-tripleo-heat-templates) security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:9978

Comment 36 Steve Baker 2024-11-25 21:19:07 UTC
*** Bug 2327390 has been marked as a duplicate of this bug. ***

Comment 38 Red Hat Bugzilla 2025-04-03 04:25:15 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days