Red Hat Bugzilla – Bug 1031876
wrong boot order if EFI firmware does not define BootCurrent variable
Last modified: 2013-12-19 00:09:18 EST
Description of problem:
If a system's EFI firmware does not provide the BootCurrent variable, the boot order will end up wrong after installation (and also rhts-reboot will not work properly for the same reason).
Version-Release number of selected component (if applicable):
How reproducible:
always, using the IBM x3250 m4 systems attached to our devel environment
Steps to Reproduce:
1. Set the boot order correctly (Netboot first)
2. Provision the system in Beaker
Actual results:
After the installation, the boot order is wrong (OS first, then Netboot).
Expected results:
Netboot should remain first in the list.
The BootCurrent variable is supposed to indicate which entry in the boot order was selected for the current boot. The rhts_post snippet uses it to find the entry for netboot, so it can move that to the front of the order. rhts-reboot uses it to find the entry for the OS, so it can set that as BootNext.
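As a rough illustration of the logic described above (not the actual rhts_post snippet), this Python sketch parses efibootmgr-style output and moves the BootCurrent entry to the front of the order. The function names and the sample text in the usage note are hypothetical. The sketch assumes the usual "BootCurrent: XXXX" and "BootOrder: XXXX,YYYY" lines and returns None for BootCurrent when the firmware does not report it, which is the case this bug is about.

```python
import re


def parse_efibootmgr(text):
    """Return (boot_current, boot_order) from efibootmgr-style output.

    boot_current is None when no BootCurrent line is present.
    """
    current = None
    order = []
    for line in text.splitlines():
        m = re.match(r"BootCurrent:\s*([0-9A-Fa-f]{4})\s*$", line)
        if m:
            current = m.group(1).upper()
            continue
        m = re.match(r"BootOrder:\s*(\S+)\s*$", line)
        if m:
            order = [e.upper() for e in m.group(1).split(",") if e]
    return current, order


def move_to_front(order, entry):
    """Return a new order with `entry` first and the rest unchanged."""
    return [entry] + [e for e in order if e != entry]
```

For example, given hypothetical output containing "BootCurrent: 0003" and "BootOrder: 0004,0003,0001" after a netboot, move_to_front(order, current) puts 0003 back at the front. When parse_efibootmgr returns None for the current entry, this approach has nothing to work with.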
From reading the UEFI spec it's not clear to me whether the firmware is required to provide the BootCurrent variable, but certainly the x3250 m4 systems which were recently added to our devel environment do not provide it, so Beaker probably needs to handle this case.
On Gerrit: http://gerrit.beaker-project.org/2519
The approach in this patch is to assume that the installer has added a new boot entry for the OS, which efibootmgr always adds to the front of the boot order. The script removes that from the boot order, preserving the rest of the order, and sets the OS entry as BootNext.
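A minimal Python sketch of that fallback, for illustration only (the real patch is on Gerrit and may differ). It assumes the first BootOrder entry is the newly added OS entry, as described above. The function names are hypothetical; the efibootmgr options used are -o (set BootOrder) and -n (set BootNext).

```python
def plan_without_bootcurrent(boot_order):
    """Return (boot_next, new_order) when BootCurrent is unavailable.

    Assumption: the installer put the OS entry at the front of BootOrder.
    """
    if not boot_order:
        raise ValueError("BootOrder is empty")
    os_entry = boot_order[0]
    rest = list(boot_order[1:])  # remaining order is preserved as-is
    return os_entry, rest


def efibootmgr_commands(boot_next, new_order):
    """Build the efibootmgr invocations as argument lists (not executed)."""
    commands = []
    if new_order:
        commands.append(["efibootmgr", "-o", ",".join(new_order)])
    commands.append(["efibootmgr", "-n", boot_next])
    return commands
```

The commands are returned rather than run, so the plan can be inspected or logged before anything touches the firmware variables.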
Setting back to ASSIGNED for now since this will also need a patch for rhts-reboot.
Back to POST as Dan updated the patch.
Steps to verify:
1. Find a system with EFI firmware which does not provide the BootCurrent variable (for example IBM x3250 m4).
2. Submit a recipe for it like this:
<task name="/distribution/install" />
<task name="/distribution/command">
  <params>
    <param name="CMDS_TO_RUN" value="test &quot;$REBOOTCOUNT&quot; -eq 0 &amp;&amp; rhts-reboot || :"/>
  </params>
</task>
<task name="/distribution/reservesys" />
3. Ensure that the system successfully boots back into the operating system after the rhts-reboot command (does not boot from the network) and finishes the recipe successfully.
Testing on beaker-devel (beaker-server-0.14.3-1.git.11.c34034f.el6eng); results depend on system reliability.
Following the steps in comment 8:
1. Scenario 1, on RHEL6:
J:2251: '/distribution/install' kept running and repeatedly hit this error:
'>>Start PXE over IPv4.
PXE-E18: Server response timeout.
Boot Failed. Netboot
Boot Failed. Red Hat Enterprise Linux'
2. Scenario 2, on RHEL7:
J:2254: '/distribution/install' passed, but '/distribution/command' then got stuck for hours at this log output:
'Trying to allocate 1172 pages for VMLINUZ
[Linux-EFI, setup=0x10db, size=0x493690]
[Initrd, addr=0x7dc18000, size=0x1ee459c]'
3. Scenario 3, on RHEL5 (RHEL5 supports EFI on ia64 but not on x86_64/i386):
J:2266: stuck for hours at this log output:
'no config file found on TFTP server in
forcing interactive mode due to config file error(s)
ELILO boot:......................'
(In reply to wangjing from comment #10)
> 2.Scenario2:on rhel7:
> J:2254: '/distribution/install' completed pass, but when running
> '/distribution/command', stucked at log:'
> Trying to allocate 1172 pages for VMLINUZ
> [Linux-EFI, setup=0x10db, size=0x493690]
> [Initrd, addr=0x7dc18000, size=0x1ee459c]' for hours.
In this case rhts-reboot did not work, because I only updated rhts-test-env for RHEL6, not other distros.
> 3. Scenario3:on rhel5: (RHEL5 doesn't support EFI on x86_64/i386, RHEL5
> supports EFI for ia64.)
> J:2266: stucked at log:'
> no config file found on TFTP server in
> forcing interactive mode due to config file error(s)
> ELILO boot:......................' for hours.
In this case the snippet appears to have worked correctly (it set BootNext to the expected value) but the system then booted off the network anyway. My first guess is that the firmware ignored BootNext because the entry was not present in the BootOrder. In any case it means that this new snippet is a regression for ia64. => FailedQA
(In reply to Dan Callaghan from comment #11)
> My first guess is that the firmware ignored BootNext because the entry was not
> present in the BootOrder.
Seems so. The firmware on that system prints:
BmOrderOptions: Removing un-referenced load option: Boot0000
So maybe instead of completely removing the OS entry in %post we can shuffle it to the end.
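A sketch of that alternative in Python, under the same assumption that the OS entry is the first one in BootOrder. The function name is hypothetical. The idea is that the entry stays referenced by BootOrder, so firmware like the one quoted above should not discard it, while Netboot moves back to the front.

```python
def demote_first_entry(boot_order):
    """Return (boot_next, new_order) with the first entry moved to the end.

    Assumption: the first entry is the OS entry added by the installer.
    It is kept in the order (last) instead of being removed.
    """
    if not boot_order:
        raise ValueError("BootOrder is empty")
    os_entry = boot_order[0]
    new_order = list(boot_order[1:]) + [os_entry]
    return os_entry, new_order
```

With a hypothetical order of 0000,0001,0002 this yields BootNext 0000 and a new order of 0001,0002,0000.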
(In reply to wangjing from comment #9)
> testing on
> beaker-devel(beaker-server-0.14.3-1.git.11.c34034f.el6eng)-->depending on
> system reliability.
> 1. Scenario1:on rhel6:
> J:2251: '/distribution/install' still keep running, and fell into the error
Testing on beaker-devel (beaker-server-0.14.3-1.git.15.c195dfb.el6eng.noarch.rpm):
1. Scenario 1, on RHEL6:
J:2284: stuck on the 'system configuration and boot management' page (attached) for a long time.