Bug 1031876

Summary: wrong boot order if EFI firmware does not define BootCurrent variable
Product: [Retired] Beaker
Component: general
Version: 0.14
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Target Milestone: 0.14.4
Hardware: Unspecified
OS: Unspecified
Reporter: Dan Callaghan <dcallagh>
Assignee: Dan Callaghan <dcallagh>
QA Contact: tools-bugs <tools-bugs>
CC: aigao, asaha, dcallagh, jingwang, llim, qwan, rmancy, xjia
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-12-19 05:09:18 UTC

Description Dan Callaghan 2013-11-19 03:37:09 UTC
Description of problem:
If a system's EFI firmware does not provide the BootCurrent variable, the boot order will end up wrong after installation (and also rhts-reboot will not work properly for the same reason).

Version-Release number of selected component (if applicable):
0.14

How reproducible:
always, using the IBM x3250 m4 systems attached to our devel environment

Steps to Reproduce:
1. Set the boot order correctly (Netboot first)
2. Provision the system in Beaker

Actual results:
After the installation, boot order is wrong (OS first, then Netboot).

Expected results:
Netboot should remain first in the list.

Additional info:
The BootCurrent variable is supposed to indicate which entry in the boot order was selected for the current boot. The rhts_post snippet uses it to find the entry for netboot, so it can move that to the front of the order. rhts-reboot uses it to find the entry for the OS, so it can set that as BootNext.
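
To illustrate the dependency on BootCurrent, here is a minimal Python sketch (it is not the actual rhts_post snippet). It parses text in the format that `efibootmgr` prints and shows that without BootCurrent there is nothing to identify the netboot entry. The function names and the sample entry numbers are made up for illustration.

```python
import re

def parse_efibootmgr(output):
    """Parse text in the format printed by `efibootmgr` (no arguments)."""
    info = {"current": None, "order": [], "entries": {}}
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("BootCurrent:"):
            info["current"] = line.split(":", 1)[1].strip()
        elif line.startswith("BootOrder:"):
            value = line.split(":", 1)[1].strip()
            info["order"] = [num for num in value.split(",") if num]
        else:
            # Entry lines look like "Boot0004* Netboot" ("*" marks active).
            match = re.match(r"Boot([0-9A-Fa-f]{4})\*?\s+(.*)", line)
            if match:
                info["entries"][match.group(1)] = match.group(2)
    return info

def netboot_first(info):
    """Move the entry named by BootCurrent to the front of the order.

    During installation the system was booted from the network, so
    BootCurrent is taken to be the netboot entry. Returns None when the
    firmware did not report BootCurrent.
    """
    current = info["current"]
    if current is None:
        return None
    return [current] + [num for num in info["order"] if num != current]

# Hypothetical output after installation: the installer has put the OS first.
WITH_CURRENT = """\
BootCurrent: 0004
BootOrder: 0000,0004
Boot0000* Red Hat Enterprise Linux
Boot0004* Netboot
"""
# The same system if the firmware does not provide BootCurrent.
WITHOUT_CURRENT = WITH_CURRENT.split("\n", 1)[1]

print(netboot_first(parse_efibootmgr(WITH_CURRENT)))     # ['0004', '0000']
print(netboot_first(parse_efibootmgr(WITHOUT_CURRENT)))  # None
```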

From reading the UEFI spec it's not clear to me whether the firmware is required to provide the BootCurrent variable, but certainly the x3250 m4 systems which were recently added to our devel environment do not provide it, so Beaker probably needs to handle this case.

Comment 4 Dan Callaghan 2013-11-21 05:41:44 UTC
On Gerrit: http://gerrit.beaker-project.org/2519

The approach in this patch is to assume that the installer has added a new boot entry for the OS, which efibootmgr always adds to the front of the boot order. The script removes that from the boot order, preserving the rest of the order, and sets the OS entry as BootNext.
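
As a rough sketch of that logic (in Python for readability; this is not the code in the Gerrit patch), the first entry of the boot order is treated as the new OS entry, the remaining entries keep their order, and the OS entry becomes BootNext. The function names and entry numbers are hypothetical; `-o` and `-n` are the efibootmgr options for setting BootOrder and BootNext.

```python
def plan_post_install(boot_order):
    """Split the boot order into the new OS entry and the original order.

    Assumes the installer's efibootmgr call put the new OS entry first.
    """
    if len(boot_order) < 2:
        raise ValueError("expected the OS entry plus at least one other entry")
    os_entry, rest = boot_order[0], boot_order[1:]
    return {"boot_next": os_entry, "boot_order": rest}

def efibootmgr_commands(plan):
    """Build (but do not run) the efibootmgr invocations for a plan."""
    return [
        ["efibootmgr", "-o", ",".join(plan["boot_order"])],  # set BootOrder
        ["efibootmgr", "-n", plan["boot_next"]],             # set BootNext
    ]

# Hypothetical order after installation: OS (0000), Netboot (0004), other (0001).
plan = plan_post_install(["0000", "0004", "0001"])
for cmd in efibootmgr_commands(plan):
    print(" ".join(cmd))
# efibootmgr -o 0004,0001
# efibootmgr -n 0000
```

Note that this never consults BootCurrent, which is the point of the change.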

Comment 5 Dan Callaghan 2013-11-22 06:15:30 UTC
Setting back to ASSIGNED for now since this will also need a patch for rhts-reboot.

Comment 6 Nick Coghlan 2013-11-25 07:41:13 UTC
Back to POST as Dan updated the patch.

Comment 8 Dan Callaghan 2013-11-26 06:31:28 UTC
Steps to verify:

1. Find a system with EFI firmware which does not provide the BootCurrent variable (for example IBM x3250 m4).

2. Submit a recipe for it like this:

<task name="/distribution/install" />
<task name="/distribution/command">
  <params>
    <param name="CMDS_TO_RUN" value="test &quot;$REBOOTCOUNT&quot; -eq 0 &amp;&amp; rhts-reboot || :"/>
  </params>
</task>
<task name="/distribution/reservesys" />

3. Ensure that the system successfully boots back into the operating system after the rhts-reboot command (does not boot from the network) and finishes the recipe successfully.
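
The CMDS_TO_RUN value in step 2 is XML-escaped, which makes it hard to read. The short Python sketch below decodes it with the standard library; the `<recipe>` wrapper element is added here only so that the fragment parses as a single document.

```python
import xml.etree.ElementTree as ET

# The fragment from step 2, wrapped in one root element for parsing.
FRAGMENT = """\
<recipe>
<task name="/distribution/install" />
<task name="/distribution/command">
  <params>
    <param name="CMDS_TO_RUN" value="test &quot;$REBOOTCOUNT&quot; -eq 0 &amp;&amp; rhts-reboot || :"/>
  </params>
</task>
<task name="/distribution/reservesys" />
</recipe>
"""

def cmds_to_run(xml_text):
    """Return the unescaped CMDS_TO_RUN value from a recipe fragment."""
    root = ET.fromstring(xml_text)
    param = root.find("./task/params/param[@name='CMDS_TO_RUN']")
    return param.get("value")

print(cmds_to_run(FRAGMENT))
# test "$REBOOTCOUNT" -eq 0 && rhts-reboot || :
```

As written, the command calls rhts-reboot only while $REBOOTCOUNT is 0, and the trailing `|| :` makes it a successful no-op otherwise, so the task reboots once and then completes.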

Comment 9 wangjing 2013-11-27 07:31:30 UTC
Tested on beaker-devel (beaker-server-0.14.3-1.git.11.c34034f.el6eng); results depend on system reliability.

Following the steps in comment 8:

1. Scenario 1, RHEL6:
J:2251: '/distribution/install' never finishes; the system keeps hitting this error:
  >>Start PXE over IPv4.
    PXE-E18: Server response timeout.
  Boot Failed. Netboot
  Boot Failed. Red Hat Enterprise Linux

Comment 10 wangjing 2013-11-27 10:14:37 UTC
2. Scenario 2, RHEL7:
J:2254: '/distribution/install' passed, but '/distribution/command' was stuck for hours with the log ending at:
  Trying to allocate 1172 pages for VMLINUZ
  [Linux-EFI, setup=0x10db, size=0x493690]
     [Initrd, addr=0x7dc18000, size=0x1ee459c]

3. Scenario 3, RHEL5 (RHEL5 supports EFI only on ia64, not on x86_64/i386):
J:2266: stuck for hours with the log ending at:
  no config file found on TFTP server in
  forcing interactive mode due to config file error(s)

  ELILO boot:......................

Comment 11 Dan Callaghan 2013-11-27 22:34:46 UTC
(In reply to wangjing from comment #10)
> 2. Scenario 2, RHEL7:
> J:2254: '/distribution/install' passed, but '/distribution/command' was
> stuck for hours with the log ending at:
>   Trying to allocate 1172 pages for VMLINUZ
>   [Linux-EFI, setup=0x10db, size=0x493690]
>      [Initrd, addr=0x7dc18000, size=0x1ee459c]

In this case rhts-reboot did not work, because I only updated rhts-test-env for RHEL6, not other distros.

> 3. Scenario 3, RHEL5 (RHEL5 supports EFI only on ia64, not on
> x86_64/i386):
> J:2266: stuck for hours with the log ending at:
>   no config file found on TFTP server in
>   forcing interactive mode due to config file error(s)
>
>   ELILO boot:......................

In this case the snippet appears to have worked correctly (it set BootNext to the expected value) but the system then booted off the network anyway. My first guess is that the firmware ignored BootNext because the entry was not present in the BootOrder. In any case it means that this new snippet is a regression for ia64. => FailedQA

Comment 12 Dan Callaghan 2013-11-27 23:14:54 UTC
(In reply to Dan Callaghan from comment #11)
> My first guess is that the firmware ignored BootNext because the entry was not
> present in the BootOrder.

Seems so. The firmware on that system prints:

BmOrderOptions: Removing un-referenced load option: Boot0000

So maybe instead of completely removing the OS entry in %post we can shuffle it to the end.
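
A minimal Python sketch of that alternative, under the same assumption that the installer put the OS entry first (the function names and entry numbers are hypothetical): the OS entry moves to the end of BootOrder rather than being dropped, so the entry used for BootNext is still referenced by the order.

```python
def plan_os_removed(boot_order):
    """Earlier approach: drop the OS entry from BootOrder entirely."""
    os_entry, rest = boot_order[0], boot_order[1:]
    return {"boot_next": os_entry, "boot_order": rest}

def plan_os_last(boot_order):
    """Alternative: keep the OS entry, but move it to the end of BootOrder."""
    os_entry, rest = boot_order[0], boot_order[1:]
    return {"boot_next": os_entry, "boot_order": rest + [os_entry]}

def boot_next_is_referenced(plan):
    """True if the BootNext entry still appears in BootOrder."""
    return plan["boot_next"] in plan["boot_order"]

# Hypothetical order after installation: OS (0000) first, then Netboot (0004).
order = ["0000", "0004"]
print(plan_os_last(order))
# {'boot_next': '0000', 'boot_order': ['0004', '0000']}
print(boot_next_is_referenced(plan_os_removed(order)))  # False
print(boot_next_is_referenced(plan_os_last(order)))     # True
```

Netboot stays first in both plans; only the second keeps Boot0000 referenced, which should avoid the "un-referenced load option" cleanup seen above.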

Comment 14 wangjing 2013-11-28 03:24:01 UTC
(In reply to wangjing from comment #9)
> Tested on beaker-devel (beaker-server-0.14.3-1.git.11.c34034f.el6eng);
> results depend on system reliability.
> 
> 1. Scenario 1, RHEL6:
> J:2251: '/distribution/install' never finishes; the system keeps hitting
> this error

Tested on beaker-devel (beaker-server-0.14.3-1.git.15.c195dfb.el6eng.noarch.rpm).

1. Scenario 1, RHEL6:
J:2284: stuck on the 'system configuration and boot management' screen (attached) for a long time.