Bug 2220957

Summary: Completing an EFI-based system build is impossible; the system always tries to boot from EFI Network after the final reboot
Product: Red Hat Satellite
Component: Compute Resources - VMWare
Reporter: Sayan Das <saydas>
Assignee: Shimon Shtein <sshtein>
Status: CLOSED MIGRATED
QA Contact: Satellite QE Team <sat-qe-bz-list>
Severity: high
Priority: high
Version: 6.12.3
CC: ahumbe, chrobert, gtalreja, lstejska, mhulan, nalfassi, nshaik, rlavi, sshtein
Target Milestone: Unspecified
Keywords: MigratedToJIRA, Triaged
Target Release: Unused
Hardware: x86_64
OS: Linux
Last Closed: 2024-06-06 16:23:52 UTC
Type: Bug
Attachments:
  Boot order settings being reconfigured in VMware console

Description Sayan Das 2023-07-06 16:47:05 UTC
Description of problem:

After submitting a UEFI-based PXE build in a VMware vCenter 7 environment:

* The system boots from the network
* It completes the PXE boot and system build
* It proceeds with the final reboot
* It then boots up again, from EFI Network
* Whether I select "Chainload Grub2 EFI from ESP", "Chainload into BIOS bootloader on first disk", or "Chainload into BIOS bootloader on second disk", none of them boots into the OS.

* "Chainload Grub2 EFI from ESP" leaves the VM online for 2 minutes and then simply halts the system.


I don't know the exact cause, but I believe the issue started with https://bugzilla.redhat.com/show_bug.cgi?id=2112436 and https://github.com/theforeman/foreman/pull/9123 .

It may not be an issue with Satellite/Foreman; I think it is more of an issue with VMware 7 somehow.


Version-Release number of selected component (if applicable):

Red Hat Satellite 6.11/6.12/6.13
VMware vSphere 7


How reproducible:

100%

Steps to Reproduce:
1. Set up Satellite 6.11/6.12/6.13 for PXE booting and building RHEL 7.9 or 9.1 clients

2. Create a VMware compute resource and compute profile (set it up for UEFI).

3. Create a host entry using that compute resource and profile, along with a host parameter (see the CLI sketch after this list):
   Name: efi_bootentry 
   Type: String
   Value: Red Hat Enterprise Linux

   Then submit the host for build

4. Monitor the progress
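
For step 3, a minimal CLI sketch of setting the parameter (this assumes hammer's "host set-parameter" subcommand and options as I recall them; the host name is a placeholder, and the Web UI works just as well):

  # Hypothetical host name; adjust to the actual host entry
  hammer host set-parameter \
    --host client01.example.com \
    --name efi_bootentry \
    --parameter-type string \
    --value "Red Hat Enterprise Linux"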


Actual results:

* The system boots from the network
* It completes the PXE boot and system build
* The console shows that efibootmgr is reconfiguring the boot order.
* It proceeds with the final reboot
* It then boots up again, from EFI Network
* Whether I select "Chainload Grub2 EFI from ESP", "Chainload into BIOS bootloader on first disk", or "Chainload into BIOS bootloader on second disk", none of them boots into the OS.

* "Chainload Grub2 EFI from ESP" leaves the VM online for 2 minutes and then simply halts the system.


Even if I don't use the efi_bootentry host param, the situation remains the same.



Expected results:

No such issues; after reboot:

* The VM should honor the boot priority set by the UEFI Boot Manager (efibootmgr) and boot from the correct device.

* Even if it boots from the network, it should be able to chainload the OS from the HDD correctly.


Additional info:

It is as if the UEFI Boot Manager settings are completely ignored by the firmware, which keeps honoring the very initial boot order.

When we deploy the system using the compute profile, the default boot order is Network,HDD. The VM always honors that, even if we modify the boot order to HDD,Network via efibootmgr.

Workaround:
1. Go to the BIOS of the system after the final reboot and disable the network device in the boot order.

2. As that would not be acceptable to anyone, one hack to fix this issue is to set HDD,Network as the boot device priority in the compute profile and then use it to build the VM.


More details about the investigation will be posted below.

Comment 1 Sayan Das 2023-07-06 16:47:59 UTC
Created attachment 1974328 [details]
Boot order settings being reconfigured in VMware console

Comment 2 Sayan Das 2023-07-06 16:58:47 UTC
Looking at the attachment in Comment 1, we see something like this:

This is the default boot order detected:

BootOrder: 0003,0000,0001,0002
BootCurrent: 0001
Boot0000* EFI Virtual disk (0.0)
Boot0001* EFI Network
Boot0002* EFI Internal Shell (Unsupported option)
Boot0003* Red Hat Enterprise Linux


So here we don't even need to change anything, as 0003 is the RHEL drive, and the expectation is that after reboot the system will boot from 0003. But that does not happen (assuming I haven't used the efi_bootentry host param); it always boots from EFI Network, i.e. 0001.

Now, since I had used efi_bootentry, the following code comes into play: https://github.com/theforeman/foreman/blob/develop/app/views/unattended/provisioning_templates/snippet/efibootmgr_netboot.erb#L31C1-L34C33

0003 was detected as the required boot entry.

The existing boot order was 0003,0000,0001,0002.

We simply prepended 0003 to "0003,0000,0001,0002", so the new boot order becomes "0003,0003,0000,0001,0002".

The duplication happened because the snippet does not remove the value of $id from $current before running "efibootmgr -o ${id},${current}".
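
A minimal shell sketch of the idea (not the actual snippet code; $id and $current are assumed to hold the detected entry and the existing order, as above):

  id=0003
  current=0003,0000,0001,0002
  # Drop $id wherever it already appears in the existing order,
  # then prepend it exactly once
  current=$(echo "$current" | tr ',' '\n' | grep -vx "$id" | paste -sd, -)
  efibootmgr -o "${id}${current:+,$current}"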

But anyway, the situation remains the same.

One may think that having "0003,0003", i.e. the duplicate, is the issue here. So:

* I force-booted the system from the HDD
* Manually reconfigured the boot order, i.e. efibootmgr -o 0003,0000,0001,0002
* Forced the next boot to be 0003, i.e. efibootmgr -n 0003
* Rebooted.

Same situation: the VM still boots from the network instead of the HDD.

I repeated the same with 0000 and the result remains the same.

I thought the BIOS might still have a bad boot order, but it also shows the right boot order. Still, the BIOS seems to honor only the boot order that was set while creating the host/VM, completely ignoring the settings from the UEFI Boot Manager.

I tested this with RHEL 7.9, 8.6, and 9.1 in the VMware 7 infra (without Secure Boot enabled) and the result remains the same.

The only workaround is to disable the EFI Network device in the BIOS, or else select HDD,Network as the boot device priority before submitting the host/VM for build.

Comment 3 Sayan Das 2023-07-06 18:12:42 UTC
I tested on a libvirt infra [Sat 6.13, RHEL 8.8 client VM (TianoCore edk2 UEFI)].

Almost everything remains the same, but here in libvirt:

A) I don't need any manual hack: even if it boots from PXE/EFI Network, it gets the PXE menu and then successfully boots with "Chainload Grub2 EFI from ESP", i.e. it chainloads and boots me into the OS.

   So that's very good news, unlike the VMware case.

B) Even if I do this on the VM:

# efibootmgr 
BootCurrent: 0002
Timeout: 0 seconds
BootOrder: 0002,0003,0004,0005,0007,0001,0000,0006
Boot0000* UiApp
Boot0001* UEFI Misc Device
Boot0002* UEFI PXEv4 (MAC:5254003C3993)
Boot0003* UEFI PXEv6 (MAC:5254003C3993)
Boot0004* UEFI HTTPv4 (MAC:5254003C3993)
Boot0005* UEFI HTTPv6 (MAC:5254003C3993)
Boot0006* EFI Internal Shell
Boot0007* Red Hat Enterprise Linux


# efibootmgr -o 0007,0002,0003,0004,0005,0001,0000,0006
BootCurrent: 0002
Timeout: 0 seconds
BootOrder: 0007,0002,0003,0004,0005,0001,0000,0006
Boot0000* UiApp
Boot0001* UEFI Misc Device
Boot0002* UEFI PXEv4 (MAC:5254003C3993)
Boot0003* UEFI PXEv6 (MAC:5254003C3993)
Boot0004* UEFI HTTPv4 (MAC:5254003C3993)
Boot0005* UEFI HTTPv6 (MAC:5254003C3993)
Boot0006* EFI Internal Shell
Boot0007* Red Hat Enterprise Linux

It does not change the boot order of devices in the firmware (BIOS) setup.

Even if you manually change the boot order, it persists only for the current runtime and then reverts back.

The only way I could change the boot order was to power off the VM, open it in the libvirt console, go to Boot Options, and fix the boot order there.
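
As far as I understand, the boot order libvirt enforces comes from the domain XML rather than from the NVRAM variables that efibootmgr edits, so a sketch of the same fix from the CLI would be (domain name is a placeholder):

  # Edit the persistent domain definition while the VM is powered off,
  # and reorder the <boot> elements under <os>, e.g.:
  #   <boot dev='hd'/>
  #   <boot dev='network'/>
  virsh edit rhel88-client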


So perhaps the bigger question here is why, on VMware, "Chainload Grub2 EFI from ESP" fails to find any bootable device to chainload from, whereas no such issue exists with libvirt.

And the even bigger question is: what in the world does "efibootmgr" do if it cannot even set a boot device selection/priority that the firmware will honor?

Comment 4 Sayan Das 2023-07-07 11:39:25 UTC
Some more information is in https://bugzilla.redhat.com/show_bug.cgi?id=2058037 (which explains why VMware fails to do the chainloading).

https://github.com/theforeman/foreman/pull/9175/files -> This added "connectefi scsi", but it is commented out, i.e. it has no effect by default.

So, just to test whether it would work, I uncommented the "connectefi scsi" line in the already-deployed "/var/lib/tftpboot/grub2/grub.cfg-01-00-50-56-b4-34-19", simply booted up the VM, and it works.

That is, even if the VM boots from the network, it correctly chainloads the system from the HDD and boots into the OS.
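
For anyone repeating the test on an already-deployed config, a one-line sketch (file name taken from above; this assumes the line is commented out with a leading "#"):

  sed -i 's/^\(\s*\)#connectefi scsi/\1connectefi scsi/' /var/lib/tftpboot/grub2/grub.cfg-01-00-50-56-b4-34-19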

For any new system build to work, 

* We need to clone pxegrub2_chainload as "pxegrub2_chainload vmware", uncomment the "connectefi scsi" line, and save it with the correct Org and Location.

* We need to clone "PXEGrub2 default local boot" as "PXEGrub2 default local boot vmware", replace "pxegrub2_chainload" with "pxegrub2_chainload vmware", and save it with the correct Org, Location, and OS.

* In the Administer --> Settings --> Provisioning tab, we need to set "PXEGrub2 default local boot vmware" as the value of "Local boot PXEGrub2 template".

And then a new system build on VMware 7 will work without any issues.

But even though it is one-time work, it is still a lot of work for one small tweak.


Here in https://github.com/theforeman/foreman/blob/df3a0b0d970b229887e2d371568d16c8143c5aae/app/views/unattended/provisioning_templates/snippet/pxegrub2_chainload.erb#L46C2-L46C17 

I propose we change

#connectefi scsi

to

<% if host_param_true?('vmware') -%>
connectefi scsi
<% end -%>


And then we can keep a note in the provisioning guide that users trying a VM build on VMware should add a "vmware" host parameter of type boolean with value "true".
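
Setting that parameter from the CLI would look something like this (assumed hammer syntax; the host name is a placeholder):

  # Hypothetical host name; adjust to the actual host entry
  hammer host set-parameter --host client01.example.com --name vmware --parameter-type boolean --value true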

Comment 5 Eric Helms 2024-05-16 16:17:38 UTC
Upstream bug assigned to sshtein

Comment 6 Eric Helms 2024-05-16 16:17:42 UTC
Upstream bug assigned to sshtein

Comment 7 Shimon Shtein 2024-05-21 14:29:08 UTC
Since https://github.com/theforeman/foreman/commit/7a77b3ce51a2c446c1389d7b742eb0749d821ef2, `connectefi` is no longer commented out.
It should work by default.

Comment 8 Eric Helms 2024-06-06 16:23:52 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "SAT-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.