Bug 1323085 - generate bootindex even when <bootmenu enable='yes'/> is specified
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Jiri Denemark
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 1328318
Depends On:
Blocks:
Reported: 2016-04-01 08:30 UTC by Qianqian Zhu
Modified: 2016-11-03 18:41 UTC (History)

Fixed In Version: libvirt-2.0.0-2.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-03 18:41:01 UTC
Target Upstream Version:
Embargoed:


Attachments
bz1323085.log (86.26 KB, text/plain), 2016-04-05 03:14 UTC, Qianqian Zhu
boot from a blank disk (63.53 KB, image/png), 2016-08-25 07:43 UTC, lijuan men
ovmf-q35-secboot.xml (5.10 KB, text/plain), 2016-09-07 08:49 UTC, lijuan men
debug log for comment 24 (107.27 KB, text/plain), 2016-09-07 08:51 UTC, lijuan men


Links
System: Red Hat Product Errata
ID: RHSA-2016:2577
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: libvirt security, bug fix, and enhancement update
Last Updated: 2016-11-03 12:07:06 UTC

Internal Links: 1752838

Description Qianqian Zhu 2016-04-01 08:30:23 UTC
Description of problem:
Changing "-boot order=" has no effect when "menu=on" is set.

Version-Release number of selected component (if applicable):
OVMF-20160202-2.gitd7c0dfa.el7.noarch
qemu-kvm-rhev-2.5.0-2.el7.x86_64
libvirt-1.2.17-13.el7.x86_64
kernel-3.10.0-366.el7.x86_64

How reproducible:
3/3

Steps to Reproduce:
1. Start the guest
# virsh start generic
# ps aux|grep qemu
/usr/libexec/qemu-kvm -name generic -S -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off,vmport=off -cpu SandyBridge -drive file=/usr/share/OVMF/OVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/lib/libvirt/qemu/nvram/generic_VARS.fd,if=pflash,format=raw,unit=1 -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 2f2a507e-9c56-456b-beaf-dbd5fec83fdd -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-generic/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot order=d,menu=on,strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/generic-2.qcow2,if=none,id=drive-ide0-0-0,format=qcow2 -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive file=/home/RHEL-6.7-20150702.0-Server-x86_64-dvd1.iso,if=none,id=drive-ide0-0-1,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -netdev tap,fd=24,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:8d:87:25,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5900,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pci.0,addr=0x2 -device 
intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on

# virsh edit generic
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.2.0'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/generic_VARS.fd</nvram>
    <boot dev='cdrom'/>
    <bootmenu enable='yes'/>
  </os>


2. Shut down the guest and modify the boot order
# virsh shutdown generic
# virsh edit generic
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.2.0'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/generic_VARS.fd</nvram>
    <boot dev='hd'/>
    <bootmenu enable='yes'/>
  </os>

# ps aux|grep qemu
/usr/libexec/qemu-kvm -name generic ... -boot order=c,menu=on,strict=on...

3. Check boot order on menu 'Boot Manager'

Actual results:
The boot order still has 'cdrom' first.

Expected results:
The boot order should change so that 'hard disk' comes first.

Additional info:
Changing the boot order works as expected when '-boot' does not include 'menu=on'.

Comment 2 Laszlo Ersek 2016-04-01 12:47:07 UTC
This symptom is not related to <bootmenu enable='yes'/>.

Instead, the problem you are experiencing is the following. When you remove the <boot dev='hd'/> element, you instruct OVMF to remove (= filter out) the UEFI boot option that would boot grub2 from the virtual hard disk. When you later re-add the <boot dev='hd'/> element, you instruct OVMF to *keep* (= filter in) the UEFI boot option that would boot grub2 from the virtual hard disk.

However, OVMF cannot *re-generate* (= re-create) such boot options. It can only filter them out (permanently), or keep them (if they exist). If you remove a UEFI boot option that points to a UEFI application (like grub2.efi) on *fixed media*, then such a boot option can never be re-generated by any UEFI firmware automatically. (This is not an OVMF limitation but a generic UEFI characteristic.) If you want to re-establish such a boot option, then there are three options:
- you have to create the boot option manually in the OVMF setup TUI, and move it to the top of the boot order
- or else, boot the guest OS manually (e.g. by launching grub2 from the UEFI shell), and re-establish the boot option with "efibootmgr"
- or else, rely on "fallback.efi". (See more about this later.)

Note that removable media behaves differently; which is why the CD-ROM can be booted automatically even after you remove and re-add <boot dev='cdrom'/>.

Peter Jones documented this behavior on his blog, in the post
<http://blog.uncooperative.org/blog/2014/02/06/the-efi-system-partition/>.

In order to confirm my analysis:

(1) Please try to repeat the exact same steps, without <bootmenu enable='yes'/>. I assert that even without <bootmenu enable='yes'/>, if you replace <boot dev='hd'/> with  <boot dev='cdrom'/>, boot the guest, then change it back, you won't be able to boot off the virtual hard disk.

(*Unless* your guest OS installer installed "fallback.efi" as well -- see the blog post. However, if "fallback.efi" is in place on the hard disk, then the grub2.efi / shim.efi boot option should be restored regardless of <bootmenu enable='yes'/>.)

(2) This is a general hint: whenever reporting OVMF bugs, please always capture and attach the OVMF debug log, for every guest boot that is relevant to the issue. You can do this as follows:

- add the following attribute to the root element of the domain XML:

  <domain
   type='kvm'
   xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'  <-- this one
  >

- add the following XML fragment near the end of the domain XML, as a
direct child of the <domain> element:

    <qemu:arg value='-global'/>
    <qemu:arg value='isa-debugcon.iobase=0x402'/>
    <qemu:arg value='-debugcon'/>
    <qemu:arg value='file:/tmp/guest_name.log'/>

This will send the OVMF log to "/tmp/guest_name.log".

-----

If my analysis is confirmed, I will close this bug as NOTABUG (for OVMF), or else reassign it to the "shim" component (which provides "fallback.efi"). Because, re-creating UEFI boot options that point to arbitrary boot loaders on *fixed* media is not the responsibility of the firmware.

BTW, you can easily avoid this issue by not removing <boot dev='hd'/>. Instead, just put <boot dev='cdrom'/> in front of it. Thanks.

Comment 3 Qianqian Zhu 2016-04-05 03:13:16 UTC
Thanks for your detailed explanation.

One additional piece of information: there is no OS on my hard disk; it is blank. I was just trying to confirm the boot order, so I did not install an OS there.

I have tried exactly the same steps again, but without <bootmenu enable='yes'/>: the boot order changed, and the guest booted from the CD-ROM when I replaced <boot dev='hd'/> with <boot dev='cdrom'/>. Without <bootmenu enable='yes'/>, the boot order changes accordingly, whether from hd to cdrom or nic or vice versa, whereas with <bootmenu enable='yes'/>, the boot order does not change to whatever I set it to.

Actually, if I use virt-manager, I don't hit this issue, since virt-manager adds bootindex to the individual devices instead when one modifies the boot order with 'boot menu' on, but libvirt people would like to modify their xml this way, so I filed this bug.

Comment 4 Qianqian Zhu 2016-04-05 03:14:43 UTC
Created attachment 1143663 [details]
bz1323085.log

Comment 5 Qianqian Zhu 2016-04-05 03:26:07 UTC
(In reply to Laszlo Ersek from comment #2)

> (2) This is a general hint: whenever reporting OVMF bugs, please always
> capture and attach the OVMF debug log, for every guest boot that is relevant
> to the issue. You can do this as follows:
> 
> - add the following attribute to the root element of the domain XML:
> 
>   <domain
>    type='kvm'
>    xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'  <-- this one
>   >
> 
> - add the following XML fragment near the end of the domain XML, as a
> direct child of the <domain> element:
> 
>     <qemu:arg value='-global'/>
>     <qemu:arg value='isa-debugcon.iobase=0x402'/>
>     <qemu:arg value='-debugcon'/>
>     <qemu:arg value='file:/tmp/guest_name.log'/>
> 
> This will send the OVMF log to "/tmp/guest_name.log".

For anyone who uses this later: what you add to the end of the domain XML should probably be:
  <qemu:commandline>
    <qemu:arg value='-global'/>
    <qemu:arg value='isa-debugcon.iobase=0x402'/>
    <qemu:arg value='-debugcon'/>
    <qemu:arg value='file:/tmp/guest_name.log'/>
  </qemu:commandline>
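
Putting the two pieces together (the namespace declaration on <domain> and the <qemu:commandline> wrapper), a minimal Python sketch using only the standard library could look like the following. The helper name add_ovmf_debug_log and the sample domain XML are made up for illustration; in practice the same change is made by hand with "virsh edit".

```python
import xml.etree.ElementTree as ET

QEMU_NS = "http://libvirt.org/schemas/domain/qemu/1.0"

def add_ovmf_debug_log(domain_xml, log_path):
    """Return domain XML with a <qemu:commandline> block for the OVMF debug log.

    Hypothetical helper for illustration; not part of libvirt.
    """
    # Make the serializer emit the "qemu" prefix instead of a generated one.
    ET.register_namespace("qemu", QEMU_NS)
    root = ET.fromstring(domain_xml)
    # The block must be a direct child of <domain>.
    cmdline = ET.SubElement(root, "{%s}commandline" % QEMU_NS)
    for value in ("-global", "isa-debugcon.iobase=0x402",
                  "-debugcon", "file:" + log_path):
        ET.SubElement(cmdline, "{%s}arg" % QEMU_NS, {"value": value})
    return ET.tostring(root, encoding="unicode")

# Illustrative input only; a real domain XML has many more elements.
sample = "<domain type='kvm'><name>generic</name></domain>"
result = add_ovmf_debug_log(sample, "/tmp/guest_name.log")
print(result)
```

The serializer declares xmlns:qemu on the root element on its own, which corresponds to the attribute that has to be added to <domain> manually.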

Comment 6 Laszlo Ersek 2016-04-05 08:49:25 UTC
Sorry, I'm confused. If you have no OS installed on your hard disk, then a UEFI boot option, pointing to a file on the EFI system partition of your hard disk, will also not exist.

How do you expect to verify the relative order of this non-existent boot option against other (auto-generated, like CD-ROM) boot options?

In addition, comment 4 does not explain which test case the OVMF debug log belongs to. Either way, looking at the attachment, I can determine that the SetBootOrderFromQemu() function (in "OvmfPkg/Library/QemuBootOrderLib/QemuBootOrderLib.c") has returned very early, because otherwise it would have dumped the "bootorder" fw_cfg file exported by QEMU:

  Status = QemuFwCfgFindFile ("bootorder", &FwCfgItem, &FwCfgSize);
  if (Status != RETURN_SUCCESS) {
    return Status;
  }

  if (FwCfgSize == 0) {
    return RETURN_NOT_FOUND;
  }

  FwCfg = AllocatePool (FwCfgSize);
  if (FwCfg == NULL) {
    return RETURN_OUT_OF_RESOURCES;
  }

  QemuFwCfgSelectItem (FwCfgItem);
  QemuFwCfgReadBytes (FwCfgSize, FwCfg);
  if (FwCfg[FwCfgSize - 1] != '\0') {
    Status = RETURN_INVALID_PARAMETER;
    goto ErrorFreeFwCfg;
  }

  DEBUG ((DEBUG_VERBOSE, "%a: FwCfg:\n", __FUNCTION__));
  DEBUG ((DEBUG_VERBOSE, "%a\n", FwCfg));  <------------------- not reached
  DEBUG ((DEBUG_VERBOSE, "%a: FwCfg: <end>\n", __FUNCTION__));
  FwCfgPtr = FwCfg;

This means that QEMU does not export *any* boot order at all, for OVMF to act upon.

Can you please attach all of your domain XMLs, clearly marking which works (according to your expectation), and which doesn't? Thanks.
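
The early returns in the excerpt above can be modeled in a few lines of Python, to show which outcomes leave OVMF with nothing to act upon. This is only an illustration: read_bootorder, the status strings, and the sample device path are hypothetical, and the model assumes the file holds newline-separated entries followed by a terminating NUL (only the NUL check is visible in the excerpt).

```python
def read_bootorder(fw_cfg_files):
    """Model of the first part of SetBootOrderFromQemu() (illustrative only).

    fw_cfg_files maps fw_cfg file names to their raw bytes.
    Returns (status, entries).
    """
    blob = fw_cfg_files.get("bootorder")
    if blob is None:
        # QemuFwCfgFindFile() failed: QEMU exported no such file.
        return "FIND_FILE_FAILED", []
    if len(blob) == 0:
        return "NOT_FOUND", []
    if blob[-1] != 0:
        # The contents must end with a NUL byte.
        return "INVALID_PARAMETER", []
    # Assumption: entries are separated by newlines.
    text = blob[:-1].decode("ascii")
    return "SUCCESS", [line for line in text.split("\n") if line]

# No "bootorder" file at all, which is what the attached log suggests.
print(read_bootorder({}))
# A made-up single-entry file, for comparison.
print(read_bootorder({"bootorder": b"/hypothetical/device/path@0\n\x00"}))
```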

Comment 7 Laszlo Ersek 2016-04-05 09:02:38 UTC
Stunningly, this bug report is valid. I compared the behavior of the following two domain XML snippets (with an OS installed to the disk):

    <boot dev='hd'/>
    <boot dev='cdrom'/>
    <bootmenu enable='yes'/>


vs.

    <boot dev='hd'/>
    <boot dev='cdrom'/>
    <bootmenu enable='no'/>

And, indeed, the first snippet causes the "bootorder" fw_cfg file to disappear!

(Using the <boot order='N'/> elements, it works all fine, as stated in comment 3 too -- but those translate to -device XXXX,bootindex=N device properties for QEMU, *not* boot order=ZZZ options.)

Let me compare the QEMU command lines, corresponding to the two XML fragments above.

Comment 8 Laszlo Ersek 2016-04-05 09:17:52 UTC
When I flip the @enable attribute of the bootmenu element from no to yes, libvirt generates a different QEMU command line:

> --- bootmenu-no--this-works     2016-04-05 11:10:42.657301532 +0200
> +++ bootmenu-yes--this-breaks   2016-04-05 11:10:42.985295407 +0200
> @@ -16,14 +16,14 @@
>    -no-shutdown \
>    -global PIIX4_PM.disable_s3=0 \
>    -global PIIX4_PM.disable_s4=1 \
> -  -boot menu=off,strict=on \
> +  -boot order=cd,menu=on,strict=on \
>    -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
>    -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 \
>    -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 \
>    -drive file=/mnt/data/virt-images-big/ovmf.rhel6.zimg,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=writeback \
> -  -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 \
> +  -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 \
>    -drive file=/mnt/data/isos/rhel/6.4/RHEL6.4-20130130.0-Server-x86_64-DVD1.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw,cache=writeback \
> -  -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=2 \
> +  -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 \
>    -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 \
>    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:3d:87:cd,bus=pci.0,addr=0x3,romfile= \
>    -chardev pty,id=charserial0 \

That is, when the boot menu is off, then libvirt generates the bootindex device properties *regardless* of the fact that the domain XML does not specify <boot order='N'/> elements. And the bootindex device properties seem to make it work.

Whereas, when the boot menu is on, the bootindex properties are replaced with the "-boot order=cd" option, and that doesn't seem to export any kind of "bootorder" file.

So, this is not an OVMF bug (OVMF cannot do anything about the boot order if it doesn't get the corresponding fw_cfg file). It is either a libvirt or a QEMU bug.

We have to determine (a) why the bootmenu setting in the domain XML switches between the bootindex properties vs. the "-boot order" option on the QEMU command line, and (b) why "-boot order=cd" doesn't generate the "bootorder" fw_cfg file.
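
The difference between the two command lines can be summarized in a small Python model of the behavior observed here. It is not libvirt's code: build_boot_args is a hypothetical name, the model assumes at most one device per <boot dev='...'/> kind, and the letter mapping is the one visible in this report ('hd' to 'c', 'cdrom' to 'd') plus the assumed 'fd' to 'a' and 'network' to 'n'.

```python
# Letters used by "-boot order=". 'c' and 'd' appear in this report;
# 'a' and 'n' are assumptions for the remaining <boot dev> values.
BOOT_LETTER = {"fd": "a", "hd": "c", "cdrom": "d", "network": "n"}

def build_boot_args(boot_devs, bootmenu):
    """Model of the observed behavior (illustrative, not libvirt code).

    boot_devs: list of <boot dev='...'/> values in document order.
    Returns (boot_option, bootindex_by_dev).
    """
    if bootmenu:
        # Boot menu requested: legacy "-boot order=", no bootindex at all.
        order = "".join(BOOT_LETTER[d] for d in boot_devs)
        return "-boot order=%s,menu=on,strict=on" % order, {}
    # Boot menu off: per-device bootindex, numbered from 1.
    indexes = {dev: i for i, dev in enumerate(boot_devs, start=1)}
    return "-boot menu=off,strict=on", indexes

print(build_boot_args(["hd", "cdrom"], bootmenu=True))
print(build_boot_args(["hd", "cdrom"], bootmenu=False))
```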

Comment 9 Laszlo Ersek 2016-04-05 09:34:34 UTC
Re question (a), the libvirt logic dates back to this (very old) commit:

commit c3068d4d2381146ed46051ad636a928edea5c602
Author: Jiri Denemark <jdenemar>
Date:   Thu May 26 17:15:01 2011 +0300

    qemu: Translate boot config into bootindex if possible
    
    Prefer bootindex=N option for -device over the old way -boot ORDER
    possibly accompanied with boot=on option for -drive. This gives us full
    control over which device will actually be used for booting guest OS.
    Moreover, if qemu doesn't support boot=on, this is the only way to boot
    of certain disks in some configurations (such as virtio disks when used
    together IDE disks) without transforming domain XML to use per device
    boot elements.

It adds code / comments such as:

+        /*
+         * We prefer using explicit bootindex=N parameters for predictable
+         * results even though domain XML doesn't use per device boot elements.
+         * However, we can't use bootindex if boot menu was requested.
+         */

I don't understand why bootindex cannot be used when boot menu is requested, but that's beside the point -- this behavior has been there in libvirt forever, whereas handling of -boot has undergone repeated changes in QEMU. So I think QEMU is the more likely culprit. I'll check it out next.

Comment 10 Laszlo Ersek 2016-04-05 10:14:44 UTC
After investigating QEMU as well: QEMU handles the "-boot order=ZZZ" option
inherently differently from the "-device ...,bootindex=N" device properties.
Namely, the latter get added to the "bootorder" fw_cfg file, whereas the
former are *only* stored in the CMOS. The following functions are very
telling, from "hw/i386/pc.c":

> /* convert boot_device letter to something recognizable by the bios */
> static int boot_device2nibble(char boot_device)
> {
>     switch(boot_device) {
>     case 'a':
>     case 'b':
>         return 0x01; /* floppy boot */
>     case 'c':
>         return 0x02; /* hard drive boot */
>     case 'd':
>         return 0x03; /* CD-ROM boot */
>     case 'n':
>         return 0x04; /* Network boot */
>     }
>     return 0;
> }
>

and

> static void set_boot_dev(ISADevice *s, const char *boot_device, Error **errp)
> {
> #define PC_MAX_BOOT_DEVICES 3
>     int nbds, bds[3] = { 0, };
>     int i;
>
>     nbds = strlen(boot_device);
>     if (nbds > PC_MAX_BOOT_DEVICES) {
>         error_setg(errp, "Too many boot devices for PC");
>         return;
>     }
>     for (i = 0; i < nbds; i++) {
>         bds[i] = boot_device2nibble(boot_device[i]);
>         if (bds[i] == 0) {
>             error_setg(errp, "Invalid boot device for PC: '%c'",
>                        boot_device[i]);
>             return;
>         }
>     }
>     rtc_set_memory(s, 0x3d, (bds[1] << 4) | bds[0]);
>     rtc_set_memory(s, 0x38, (bds[2] << 4) | (fd_bootchk ? 0x0 : 0x1));
> }
>

These functions mean that the "-boot order=ZZZ" option *only* sets some kind
of *legacy BIOS* boot order in the CMOS, in registers 0x3d and 0x38.
(SeaBIOS refers to these registers as CMOS_BIOS_BOOTFLAG1 and
CMOS_BIOS_BOOTFLAG2.) This is *also* historical behavior -- the
implementation has gone through several changes in QEMU, but the logic has
always been the same.
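
As a worked example, the two functions can be transliterated to Python to see what "-boot order=cd" leaves in the CMOS. This is a sketch for illustration only; cmos_boot_registers is a made-up name, and fd_bootchk is passed in as a parameter because its value is not shown in the excerpt.

```python
def boot_device2nibble(boot_device):
    """Python port of the quoted QEMU helper; 0 means "not recognized"."""
    return {"a": 0x01, "b": 0x01, "c": 0x02, "d": 0x03, "n": 0x04}.get(boot_device, 0)

def cmos_boot_registers(boot_order, fd_bootchk):
    """Return the values set_boot_dev() would write to registers 0x3d and 0x38."""
    if len(boot_order) > 3:
        raise ValueError("Too many boot devices for PC")
    bds = [0, 0, 0]
    for i, letter in enumerate(boot_order):
        bds[i] = boot_device2nibble(letter)
        if bds[i] == 0:
            raise ValueError("Invalid boot device for PC: %r" % letter)
    return {
        0x3D: (bds[1] << 4) | bds[0],
        0x38: (bds[2] << 4) | (0x0 if fd_bootchk else 0x1),
    }

regs = cmos_boot_registers("cd", fd_bootchk=True)
print({hex(k): hex(v) for k, v in regs.items()})
```

For "cd", register 0x3d holds 0x32: the hard drive class in the low nibble and the CD-ROM class in the high nibble. Only device classes are recorded, never individual devices.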

Of course, it is completely unsupportable with OVMF. Under UEFI you may
easily have seven hard drives, ten CD-ROMs, and five NICs, at the same time.
No guesswork, sorry. (Not more than what OVMF is already being forced to do
for the "bootorder" fw_cfg file anyway.)

Luckily, the QEMU manual points out that the behavior of the "-boot" option
is firmware specific. The libvirt domain XML docs also explain that the
"bootmenu" element behaves in a firmware specific way
<http://libvirt.org/formatdomain.html#elementsOSBIOS>:

> Up till here the BIOS/UEFI configuration knobs are generic enough to be
> implemented by majority (if not all) firmwares out there. However, from
> now on not every single setting makes sense to all firmwares. For
> instance, rebootTimeout doesn't make sense for UEFI, useserial might not
> be usable with a BIOS firmware that doesn't produce any output onto serial
> line, etc. Moreover, firmwares don't usually export their capabilities for
> libvirt (or users) to check. And the set of their capabilities can change
> with every new release. Hence users are advised to try the settings they
> use before relying on them in production.
>
> bootmenu
>
>     Whether or not to enable an interactive boot menu prompt on guest
>     startup. The enable attribute can be either "yes" or "no". If not
>     specified, the hypervisor default is used. Since 0.8.3 Additional
>     attribute timeout takes the number of milliseconds the boot menu
>     should wait until it times out. Allowed values are numbers in range
>     [0, 65535] inclusive and it is ignored unless enable is set to "yes".
>     Since 1.2.8

The domain XML docs also strongly recommend using <boot order='N'/>:

> boot
>
>     The dev attribute takes one of the values "fd", "hd", "cdrom" or
>     "network" and is used to specify the next boot device to consider.

(Note how this maps exactly to the legacy BIOS function boot_device2nibble()
above!)

>     [...] It can be tricky to configure in the desired way, which is why
>     per-device boot elements (see disks, network interfaces, and USB and
>     PCI devices sections below) were introduced and they are the preferred
>     way providing full control over booting order. The boot element and
>     per-device boot elements are mutually exclusive. Since 0.1.3,
>     per-device boot since 0.8.8

Summary:

- if the domain XML says <bootmenu enable='yes'/>,

- and the domain XML uses <boot dev='...'/>, despite the recommendation in
  the domain XML docs,

- then libvirt will generate "-boot order=ZZZ", instead of "-device
  ...,bootindex=N", specifically for legacy BIOS purposes, (see comment 9),

- in turn QEMU will not generate (and has never generated) the "bootorder"
  fw_cfg file; only legacy BIOS bits will be set in some RTC / CMOS
  registers,

- which is unsupportable in OVMF.

Solution:

- always use <boot order='N'/> in the domain XML, same as virt-manager does,

- or else, reassign this bug to QEMU, so that the "-boot order=ZZZ" option
  also generates the "bootorder" fw_cfg file.

For now I'm moving the bug to qemu-kvm-rhev. My advice to the qemu-kvm-rhev
assignee (TBD) is: close this bug as WONTFIX. The justification given in
comment 3 (i.e., "libvirt people would like to modify their xml this way")
is not good enough -- <boot dev='...'/> breaks even with legacy BIOS, if
there are several instances of the same device type. The use case clearly
violates documented recommendations.

Comment 11 Jiri Denemark 2016-04-07 12:11:32 UTC
(In reply to Laszlo Ersek from comment #9)
> I don't understand why bootindex cannot be used when boot menu is requested,
> but that's beside the point -- this behavior has been there in libvirt
> forever, whereas handling of -boot has undergone repeated changes in QEMU.

I believe it was impossible to pass -boot menu=on (i.e., without specifying any device to boot from), but that was ages ago, so I'm not really sure why I did it this way. In any case, if -boot menu=on can be used together with bootindex=N, libvirt should definitely be changed to use that. In other words, we want to use bootindex=N whenever we can.

Comment 12 Laszlo Ersek 2016-04-11 08:04:56 UTC
(In reply to Jiri Denemark from comment #11)

> In any case, if -boot menu=on can be used together with
> bootindex=N,

It definitely works with OVMF (... as much as -boot menu=on can be approximated in OVMF ...), but I guess the question is if it would regress SeaBIOS.

> libvirt should definitely be changed to use that. In other
> words, we want to use bootindex=N whenever we can.

Right. This would also nicely sidestep the QEMU problem (which can be seen in the bug title currently).

Do you think we should move this bug to libvirt (for triaging / evaluating menu+bootindex with SeaBIOS)? Thanks!

Comment 13 Jiri Denemark 2016-04-11 10:44:57 UTC
Yes. Do you want to keep a copy for QEMU, too?

Comment 14 Laszlo Ersek 2016-04-11 11:11:47 UTC
I don't think so, no.

The -boot options are documented as firmware-specific (esp. "menu=on"). So, if libvirt can either switch to bootindex=N globally, or it can refine the existing logic (see below), then that should be fine, both for the configuration supported on RHEL-7 hosts (where libvirt is strictly required), and for upstream QEMU too (where -boot is allowed to work differently with different firmwares).

As "refinement", I have something like this in mind:

- if bootindex=N works globally (regardless of menu=on|off), that's best

- otherwise, we currently seem to have:

  use bootindex=N if menu==off (regardless of <boot dev="..."/>)

  which could be replaced with:

  use bootindex=N if (menu==off || loader type == pflash)
  (regardless of <boot dev="..."/>)

  In other words, if SeaBIOS still doesn't make it possible to use bootindex=N
  with menu=on, then try to determine if the domain uses OVMF/AAVMF (from the
  loader type being "pflash"... which is not equivalent, but a pretty good
  approximation I guess?), and enable bootindex=N for OVMF/AAVMF regardless of
  menu=on.

So, please feel free to move this bug over to libvirt, and to retitle it. Thank you!
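
The refinement proposed above boils down to a single predicate. A minimal Python rendering follows, with the hypothetical name use_bootindex, a plain string standing in for the loader type, and a flag for the "works globally" case:

```python
def use_bootindex(menu_on, loader_type, bootindex_works_with_menu=False):
    """Decide whether to emit bootindex=N (sketch of the proposal above)."""
    if bootindex_works_with_menu:
        # Best case: bootindex=N can be used regardless of menu=on|off.
        return True
    # Fallback: keep the old rule, but exempt pflash loaders (OVMF/AAVMF).
    return (not menu_on) or loader_type == "pflash"

for menu_on in (False, True):
    for loader in ("rom", "pflash"):
        print(menu_on, loader, use_bootindex(menu_on, loader))
```

Only the combination of menu=on with a non-pflash loader still falls back to "-boot order=" in this sketch.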

Comment 15 Jiri Denemark 2016-04-19 08:09:24 UTC
*** Bug 1328318 has been marked as a duplicate of this bug. ***

Comment 16 Jiri Denemark 2016-07-01 13:38:03 UTC
Fixed upstream by v2.0.0-6-g0dd67ac:

commit 0dd67acfa752123a297469ff873f47cefce95435
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Jun 28 22:15:25 2016 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Fri Jul 1 12:20:54 2016 +0200

    qemu: Use bootindex whenever possible

    I'm not sure why our code claimed "-boot menu=on" cannot be used in
    combination with per-device bootindex, but it was proved wrong about
    four years ago by commit 8c952908. Let's always use bootindex when QEMU
    supports it.

    https://bugzilla.redhat.com/show_bug.cgi?id=1323085

    Signed-off-by: Jiri Denemark <jdenemar>
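
In terms of the generated command line, the fix means that the boot menu setting no longer decides how the boot order is passed. A rough Python model of the new behavior, based only on the command lines shown in this report, is given below. The name build_boot_args_fixed is hypothetical, and the model assumes that QEMU supports bootindex and that there is one device per <boot dev='...'/> kind.

```python
def build_boot_args_fixed(boot_devs, bootmenu, timeout_ms=None):
    """Model of the behavior after the fix (illustrative, not libvirt code).

    Returns (boot_option, bootindex_by_dev).
    """
    opts = ["menu=on" if bootmenu else "menu=off"]
    if bootmenu and timeout_ms is not None:
        # <bootmenu timeout='...'/> shows up as splash-time.
        opts.append("splash-time=%d" % timeout_ms)
    opts.append("strict=on")
    # bootindex is assigned whether or not the boot menu is enabled.
    indexes = {dev: i for i, dev in enumerate(boot_devs, start=1)}
    return "-boot " + ",".join(opts), indexes

print(build_boot_args_fixed(["hd"], bootmenu=True, timeout_ms=10000))
```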

Comment 19 lijuan men 2016-08-25 07:41:46 UTC
I have two questions from verifying the bug.
Both are marked in the testing steps below.

version:
libvirt-2.0.0-6.el7.x86_64
qemu-kvm-rhev-2.6.0-21.el7.x86_64
OVMF-20160608-3.git988715a.el7.noarch

the steps to verify the bug:
1.prepare a guest with the xml:
<os>
    <type arch='x86_64' machine='pc-q35-rhel7.3.0'>hvm</type>
    <loader readonly='yes' secure='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.secboot.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/ovmf-q35-secboot_VARS.fd</nvram>
    <boot dev='hd'/>
    <bootmenu enable='yes' timeout='10000'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <vmport state='off'/>
    <smm state='on'/>
  </features>
...
<disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/boot.iso'/>
      <target dev='sdb' bus='sata'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/aa.qcow2'/>
      <target dev='sdc' bus='sata'/>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>
    </disk>
...
 <interface type='network'>
      <mac address='52:54:00:67:df:fb'/>
      <source network='default'/>
      <model type='rtl8139'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
    </interface>
...

2.start the guest
[root@localhost ~]# virsh start ovmf-q35-secboot

[root@localhost ~]# ps -ef | grep ovmf-q35-secboot | grep boot
...
 -boot menu=on,splash-time=10000,strict=on
...
-drive file=/var/lib/libvirt/images/boot.iso,format=raw,if=none,media=cdrom,id=drive-sata0-0-1,readonly=on -device ide-cd,bus=ide.1,drive=drive-sata0-0-1,id=sata0-0-1 -drive file=/var/lib/libvirt/images/aa.qcow2,format=qcow2,if=none,id=drive-sata0-0-2 -device ide-hd,bus=ide.2,drive=drive-sata0-0-2,id=sata0-0-2,****bootindex=1****
...

3.see the boot menu in the guest

the menu will list:
disk
cdrom
NIC           -----> The menu lists all the devices (disk, CD, NIC), and if booting from the disk fails, the guest will try to boot from the CD-ROM. Is this expected?

The guest will boot from the disk.

4.replace <boot dev='hd'/> with <boot dev='cdrom'/> in the guest xml,restart the guest

[root@localhost ~]# virsh destroy ovmf-q35-secboot
[root@localhost ~]# virsh start ovmf-q35-secboot

[root@localhost ~]# ps -ef | grep ovmf-q35-secboot | grep boot
...
 -boot menu=on,splash-time=10000,strict=on
...
-drive file=/var/lib/libvirt/images/boot.iso,format=raw,if=none,media=cdrom,id=drive-sata0-0-1,readonly=on -device ide-cd,bus=ide.1,drive=drive-sata0-0-1,id=sata0-0-1,****bootindex=1**** -drive file=/var/lib/libvirt/images/aa.qcow2,format=qcow2,if=none,id=drive-sata0-0-2 -device ide-hd,bus=ide.2,drive=drive-sata0-0-2,id=sata0-0-2 

5.see the boot menu in the guest
the menu will list:
cdrom
disk
NIC

The guest will boot from the cdrom


****I have another question:****
If the guest tries to boot from a blank disk/cdrom (suppose the guest has no other devices), the guest hangs; the screenshot is in the attachment.
Is this expected?

Comment 20 lijuan men 2016-08-25 07:43:27 UTC
Created attachment 1193890 [details]
boot from a blank disk

Comment 21 Jiri Denemark 2016-09-06 10:34:04 UTC
(In reply to lijuan men from comment #19)
> the menu will list:
> disk
> cdrom
> NIC           ----->The menu lists all the devices(disk,cd,NIC), and if the
> disk boots failed,guest will try to boot from the cdrom. it is
> expected,right?

Since the command line contains "-boot menu=on,splash-time=10000,strict=on", I
think the menu should not list any devices which are not marked as bootable.
But that's more a question for QEMU (and possibly a bug there or in the
firmware).

> If the guest tries to boot from a blank disk/cdrom(suppose the guest doesn't
> have other devices),the guest will hang,the screenshot is in the  attachment.
> Is it expected?

It's really up to the firmware what it does in such a situation. If you think it
should behave differently, you can try filing a bug for it...

Comment 22 Laszlo Ersek 2016-09-06 21:16:30 UTC
(In reply to lijuan men from comment #19)
> I have two questions when I verify the bug.
> The two questions are in the following testing steps.
> 
> version:
> libvirt-2.0.0-6.el7.x86_64
> qemu-kvm-rhev-2.6.0-21.el7.x86_64
> OVMF-20160608-3.git988715a.el7.noarch
> 
> the steps to verify the bug:
> 1.prepare a guest with the xml:
> <os>
>     <type arch='x86_64' machine='pc-q35-rhel7.3.0'>hvm</type>
>     <loader readonly='yes' secure='yes'
> type='pflash'>/usr/share/OVMF/OVMF_CODE.secboot.fd</loader>
>     <nvram>/var/lib/libvirt/qemu/nvram/ovmf-q35-secboot_VARS.fd</nvram>
>     <boot dev='hd'/>
>     <bootmenu enable='yes' timeout='10000'/>
>   </os>
>   <features>
>     <acpi/>
>     <apic/>
>     <vmport state='off'/>
>     <smm state='on'/>
>   </features>
> ...
> <disk type='file' device='cdrom'>
>       <driver name='qemu' type='raw'/>
>       <source file='/var/lib/libvirt/images/boot.iso'/>
>       <target dev='sdb' bus='sata'/>
>       <readonly/>
>       <address type='drive' controller='0' bus='0' target='0' unit='1'/>
>     </disk>
>     <disk type='file' device='disk'>
>       <driver name='qemu' type='qcow2'/>
>       <source file='/var/lib/libvirt/images/aa.qcow2'/>
>       <target dev='sdc' bus='sata'/>
>       <address type='drive' controller='0' bus='0' target='0' unit='2'/>
>     </disk>
> ...
>  <interface type='network'>
>       <mac address='52:54:00:67:df:fb'/>
>       <source network='default'/>
>       <model type='rtl8139'/>
>       <address type='pci' domain='0x0000' bus='0x02' slot='0x01'
> function='0x0'/>
>     </interface>
> ...
> 
> 2.start the guest
> [root@localhost ~]# virsh start ovmf-q35-secboot
> 
> [root@localhost ~]# ps -ef | grep ovmf-q35-secboot | grep boot
> ...
>  -boot menu=on,splash-time=10000,strict=on
> ...
> -drive
> file=/var/lib/libvirt/images/boot.iso,format=raw,if=none,media=cdrom,
> id=drive-sata0-0-1,readonly=on -device
> ide-cd,bus=ide.1,drive=drive-sata0-0-1,id=sata0-0-1 -drive
> file=/var/lib/libvirt/images/aa.qcow2,format=qcow2,if=none,id=drive-sata0-0-
> 2 -device
> ide-hd,bus=ide.2,drive=drive-sata0-0-2,id=sata0-0-2,****bootindex=1****
> ...
> 
> 3.see the boot menu in the guest
> 
> the menu will list:
> disk
> cdrom
> NIC           ----->The menu lists all the devices(disk,cd,NIC), and if the
> disk boots failed,guest will try to boot from the cdrom. it is
> expected,right?

The exact behavior that you see is by design, but it might be unexpected for those who don't know the internals of edk2.

When you set the boot order as you set it, that is, you include only the SATA disk in it, OVMF drops the CD-ROM and the NIC from the UEFI boot order. You can verify this with the OVMF debug log. If you simply let OVMF continue booting at this point, then it will try the disk, and if the disk fails, OVMF will *not* boot the cdrom or the NIC.

However, you interrupted the boot process, and entered the setup TUI. At this point the edk2 BDS automatically regenerates all possible boot options, and tacks them to the end of the boot option list. That's why you see the disk at the top of the menu, but also a bunch of other options below it.

In other words, by virtue of looking at the boot order *in the setup TUI*, you also modified, unwittingly, the boot order. If you just want to look at the boot option list after OVMF's filtering and processing, then you should look at the OVMF debug log. It contains a section that goes like

[Bds]=============Begin Load Options Dumping ...=============
...
[Bds]=============End Load Options Dumping=============

This is the list that the firmware will attempt to boot, in order.
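
As a rough illustration of reading that section out of a captured debug log, here is a minimal Python sketch. The function names are hypothetical, and the "BootXXXX: description  0xNNNN" entry layout is an assumption based on the dump excerpt quoted later in this bug (comment 29); other firmware builds may format the entries differently.

```python
import re

# Marker prefixes as they appear in the OVMF debug log.
BEGIN = "[Bds]=============Begin Load Options Dumping"
END = "[Bds]=============End Load Options Dumping"

# Assumed entry layout: "Boot0002: UEFI QEMU HARDDISK QM00001   0x0001"
BOOT_ENTRY = re.compile(
    r"^\s*(Boot[0-9A-Fa-f]{4}):\s*(.*?)\s+0x[0-9A-Fa-f]+\s*$"
)


def load_option_dump(log_text):
    """Return the lines between the first Begin/End marker pair."""
    lines = []
    inside = False
    for line in log_text.splitlines():
        if line.startswith(BEGIN):
            inside = True
            continue
        if line.startswith(END):
            break
        if inside:
            lines.append(line.rstrip())
    return lines


def boot_options(dump_lines):
    """Return (name, description) pairs for Boot#### entries, in log order."""
    found = []
    for line in dump_lines:
        match = BOOT_ENTRY.match(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found
```

Calling boot_options(load_option_dump(open(path).read())) on the log file then gives the boot options in the order the firmware will try them, without entering the setup TUI.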

> 
> The guest will boot from the disk.
> 
> 4.replace <boot dev='hd'/> with <boot dev='cdrom'/> in the guest xml,restart
> the guest
> 
> [root@localhost ~]# virsh destroy ovmf-q35-secboot
> [root@localhost ~]# virsh start ovmf-q35-secboot
> 
> [root@localhost ~]# ps -ef | grep ovmf-q35-secboot | grep boot
> ...
>  -boot menu=on,splash-time=10000,strict=on
> ...
> -drive
> file=/var/lib/libvirt/images/boot.iso,format=raw,if=none,media=cdrom,
> id=drive-sata0-0-1,readonly=on -device
> ide-cd,bus=ide.1,drive=drive-sata0-0-1,id=sata0-0-1,****bootindex=1****
> -drive
> file=/var/lib/libvirt/images/aa.qcow2,format=qcow2,if=none,id=drive-sata0-0-
> 2 -device ide-hd,bus=ide.2,drive=drive-sata0-0-2,id=sata0-0-2 
> 
> 5.see the boot menu in the guest
> the menu will list:
> cdrom
> disk
> NIC

Ditto. Now that you set bootindex=1 for the cdrom, and no bootindex for anything else, OVMF dropped the disk and the NIC. But you also entered the setup TUI, which action regenerated them, and appended them to the end of the list.

> The guest will boot from the cdrom
> 
> 
> ****I have another question:****
> If the guest tries to boot from a blank disk/cdrom(suppose the guest doesn't
> have other devices),the guest will hang,the screenshot is in the  attachment.
> Is it expected?

After carefully reviewing the generic edk2 code: yes, this is expected.

The BdsEntry() function will execute the boot options in sequence, according to the boot order. If every single one of those boot options fails, then it logs a message to the debug output:

[Bds] Unable to boot!

and then hangs on purpose.

Normally people don't witness this, because usually the built-in UEFI Shell gets added by platform BDS at the very end of the boot order, unconditionally. So if everything else fails, the built-in UEFI shell is launched. However, in our downstream we removed the built-in UEFI shell (because we had been advised that it could be used to circumvent secure boot). So now if all of the boot options fail, then OVMF (more precisely, the generic BDS) will hang on purpose.

Comment 23 lijuan men 2016-09-07 03:12:11 UTC
> > 
> > the menu will list:
> > disk
> > cdrom
> > NIC           ----->The menu lists all the devices(disk,cd,NIC), and if the
> > disk boots failed,guest will try to boot from the cdrom. it is
> > expected,right?
> 
> The exact behavior that you see is by design, but it might be unexpected for
> those who don't know the internals of edk2.
> 
> When you set the boot order as you set it, that is, you include only the
> SATA disk in it, OVMF drops the CD-ROM and the NIC from the UEFI boot order.
> You can verify this with the OVMF debug log. If you simply let OVMF continue
> booting at this point, then it will try the disk, and if the disk fails,
> OVMF will *not* boot the cdrom or the NIC.
> 
> However, you interrupted the boot process, and entered the setup TUI. At
> this point the edk2 BDS automatically regenerates all possible boot options,
> and tacks them to the end of the boot option list. That's why you see the
> disk at the top of the menu, but also a bunch of other options below it.
> 
> In other words, by virtue of looking at the boot order *in the setup TUI*,
> you also modified, unwittingly, the boot order. If you just want to look at
> the boot option list after OVMF's filtering and processing, then you should
> look at the OVMF debug log. It contains a section that goes like
> 

Thanks for your reply! I have tried the scenario again.

I use the xml:
<os>
    <type arch='x86_64' machine='pc-q35-rhel7.3.0'>hvm</type>
    <loader readonly='yes' secure='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.secboot.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/ovmf-q35-secboot_VARS.fd</nvram>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <vmport state='off'/>
    <smm state='on'/>
  </features>
...
<disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/boot.iso'/>
      <target dev='sdb' bus='sata'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/aa.qcow2'/>  --->***not*** a bootable one
      <target dev='sdc' bus='sata'/>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>
    </disk>

Then I started the guest. I did not interrupt the boot process or enter the setup TUI; I just let the VM start by itself.

The VM boots from the ***cdrom***.

The result is not what you described ("If you simply let OVMF continue
booting at this point, then it will try the disk, and if the disk fails,
OVMF will *not* boot the cdrom or the NIC.").

Comment 24 Laszlo Ersek 2016-09-07 08:14:08 UTC
This is a valid concern. Please attach your complete domain XML, and the complete OVMF debug log too.

You can capture the OVMF debug log by adding the following to your domain XML:

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
...
  <qemu:commandline>   
    <qemu:arg value='-global'/>
    <qemu:arg value='isa-debugcon.iobase=0x402'/>
    <qemu:arg value='-debugcon'/>
    <qemu:arg value='file:/tmp/GUEST_NAME.log'/>
  </qemu:commandline>
</domain>  

(Customize GUEST_NAME as appropriate.)

Please note the "xmlns:qemu" attribute in the <domain> element -- it is important; without it, the <qemu:*> elements won't work.
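
To double-check an edited domain XML before starting the guest, something like the following Python sketch could be used. It relies only on the standard xml.etree.ElementTree module; the helper names are hypothetical and this is not part of libvirt. Note that ElementTree rejects an undeclared "qemu:" prefix with a parse error, which is a property of the XML parser here, not a statement about how libvirt itself reports the problem.

```python
import xml.etree.ElementTree as ET

# Namespace URI from the <domain> element shown above.
QEMU_NS = "http://libvirt.org/schemas/domain/qemu/1.0"


def qemu_args(domain_xml):
    """Return the value attributes of <qemu:arg> under <qemu:commandline>.

    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed,
    for example when the qemu: prefix is used without being declared.
    """
    root = ET.fromstring(domain_xml)
    args = root.findall("qemu:commandline/qemu:arg", {"qemu": QEMU_NS})
    return [arg.get("value") for arg in args]


def has_debugcon(domain_xml):
    """True if the debug console arguments from this comment are present."""
    values = qemu_args(domain_xml)
    return "-debugcon" in values and "isa-debugcon.iobase=0x402" in values
```

The XML text can come from "virsh dumpxml GUEST_NAME" saved to a file; has_debugcon() returning False means the passthrough arguments are missing or sit under a different namespace.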

... Normally I would ask you to open a separate OVMF BZ for this -- and we still might end up with that -- however, since the functionality is tied strongly to the libvirt change, I'd like to keep the analysis here for just a little longer. Thanks.

Comment 25 Laszlo Ersek 2016-09-07 08:28:26 UTC
Okay, I know what's going on (but please do upload the files that I asked for).

What happens is exactly what I described just yesterday in bug 1373329 comment 6.

Namely,
- your CD-ROM (which you are *not* selecting for boot) contains a file called \EFI\BOOT\BOOTX64.EFI.
- When the hard disk (which you *do* select for booting) fails to boot, the firmware runs out of UEFI boot options to try.
- Then it tries to run platform recovery. The platform recovery looks for \EFI\BOOT\BOOTX64.EFI on any media at all.
- It happens to match your CD-ROM, so it is booted as part of platform recovery. (which it is entirely inappropriate for, of course).

In other words, your CD-ROM is not booted as a UEFI boot option. It is booted, as platform recovery, after the one (= hard disk) boot option that you specified fails.

This is extremely confusing, but it is straight from the UEFI spec. In other words, you could call this a UEFI spec bug. I'll see what I can do.
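
To make the sequence concrete, here is a toy Python model of the flow as described in this comment and in comment 22. It is only a sketch of the described behavior, not edk2 code; the function name and the data layout (device names, a set of bootable devices, a per-device set of file paths) are made up for illustration.

```python
# Default removable-media path that platform recovery looks for (x86_64).
FALLBACK = r"\EFI\BOOT\BOOTX64.EFI"


def bds_outcome(boot_order, bootable, media_files):
    """Model which device ends up booted, and why.

    boot_order:  device names in UEFI boot order (what bootindex selects)
    bootable:    set of device names whose boot option would succeed
    media_files: dict mapping every attached device to the files on it
    """
    # 1. Try the UEFI boot options in order.
    for device in boot_order:
        if device in bootable:
            return ("boot option", device)
    # 2. All boot options failed: platform recovery scans any media at all.
    for device, files in media_files.items():
        if FALLBACK in files:
            return ("platform recovery", device)
    # 3. Nothing left to try: the firmware hangs on purpose.
    return ("hang", None)
```

With boot_order=["disk"], an unbootable disk, and a CD-ROM that carries the fallback file, the model returns ("platform recovery", "cdrom"), which matches what was observed in comment 23 even though the CD-ROM is not in the boot order.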

For now, please upload the files that I asked for; then I hope to be able to confirm that libvirt is actually doing the right thing. Thanks!

Comment 26 lijuan men 2016-09-07 08:49:31 UTC
Created attachment 1198589 [details]
ovmf-q35-secboot.xml

Comment 27 lijuan men 2016-09-07 08:51:13 UTC
Created attachment 1198590 [details]
debug log for comment 24

Comment 28 lijuan men 2016-09-07 08:55:09 UTC
(In reply to Laszlo Ersek from comment #24)
> This is a valid concern. Please attach your complete domain XML, and the
> complete OVMF debug log too.
> 
> You can capture the OVMF debug log by adding the following to your domain
> XML:
> 
> <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
> ...
>   <qemu:commandline>   
>     <qemu:arg value='-global'/>
>     <qemu:arg value='isa-debugcon.iobase=0x402'/>
>     <qemu:arg value='-debugcon'/>
>     <qemu:arg value='file:/tmp/GUEST_NAME.log'/>
>   </qemu:commandline>
> </domain>  
> 
> (Customize GUEST_NAME as appropriate.)
> 
> Please note the "xmlns:qemu" attribute in the <domain> element -- it is
> important; without it, the <qemu:*> elements won't work.
> 
> ... Normally I would ask you to open a separate OVMF BZ for this -- and we
> still might end up with that -- however, since the functionality is tied
> strongly to the libvirt change, I'd like to keep the analysis here for just
> a little longer. Thanks.

I have uploaded the complete domain XML and the complete OVMF debug log as attachments.

Comment 29 Laszlo Ersek 2016-09-07 09:27:10 UTC
Thanks for the uploads, they confirm my suspicion. Namely, in the OVMF debug log from comment 27, I see:

> SetBootOrderFromQemu: FwCfg:
> /pci@i0cf8/pci8086,2922@1f,2/drive@0/disk@0
> HALT
> SetBootOrderFromQemu: FwCfg: <end>

Which proves that libvirt sets the correct QEMU command line options; the above "bootorder" fw_cfg file identifies the disk, not the CD-ROM:

    //
    // OpenFirmware device path (Q35 SATA disk and CD-ROM):
    //
    //   /pci@i0cf8/pci8086,2922@1f,2/drive@1/disk@0
    //        ^                  ^  ^       ^      ^
    //        |                  |  |       |      device number (fixed 0)
    //        |                  |  |       channel (port) number
    //        |                  PCI slot & function holding SATA HBA
    //        PCI root at system bus port, PIO
    //

and in the domain XML (comment 26) you have:

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/aa.qcow2'/>
      <target dev='sda' bus='sata'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

(See unit='0'.)
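
The comparison above can also be done mechanically. The following Python sketch splits a Q35 SATA OpenFirmware path into the fields named in the diagram; the function name is hypothetical, the numbers are assumed to be hexadecimal as usual in OpenFirmware paths, and the correspondence between the port number and libvirt's unit attribute is inferred from this one example.

```python
import re

# /pci@i0cf8/<hba>@<slot>,<function>/drive@<port>/disk@<device>
OFW_SATA = re.compile(
    r"^/pci@i0cf8/(?P<hba>[^@/]+)"
    r"@(?P<slot>[0-9a-f]+),(?P<function>[0-9a-f]+)"
    r"/drive@(?P<port>[0-9a-f]+)"
    r"/disk@(?P<device>[0-9a-f]+)$"
)


def parse_sata_ofw_path(path):
    """Return the fields of a SATA OpenFirmware path, or None if no match."""
    match = OFW_SATA.match(path.strip())
    if match is None:
        return None
    return {
        "hba": match.group("hba"),
        "slot": int(match.group("slot"), 16),
        "function": int(match.group("function"), 16),
        "port": int(match.group("port"), 16),
        "device": int(match.group("device"), 16),
    }
```

For the "bootorder" entry from the log, the port field comes out as 0, which lines up with unit='0' on the disk in the domain XML.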

Furthermore, from the OVMF debug log,

> [Bds]=============Begin Load Options Dumping ...=============
>   Driver Options:
>   SysPrep Options:
>   Boot Options:
>     Boot0002: UEFI QEMU HARDDISK QM00001  		 0x0001
>     Boot0000: UiApp 		 0x0109
>   PlatformRecovery Options:
>     PlatformRecovery0000: Default PlatformRecovery 		 0x0001
> [Bds]=============End Load Options Dumping=============

this also confirms that libvirt does the right thing.

The CD-ROM is booted as part of platform recovery:

> Process Load Option (PlatformRecovery0000) ...
> ...
> FSOpen: Open '\EFI\BOOT\BOOTX64.EFI' Success
> [Bds] DevicePath expand: \EFI\BOOT\BOOTX64.EFI -> PciRoot(0x0)/Pci(0x1F,0x2)/Sata(0x1,0xFFFF,0x0)/CDROM(0x1,0x21B0C,0x30FC)/\EFI\BOOT\BOOTX64.EFI

which is -- I think -- a problem coming straight from the UEFI spec.
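
For anyone checking their own log for the same symptom, a small Python sketch like this can pull the relevant details out of the "DevicePath expand" line. The function name is hypothetical and the regular expressions are written against the single log line quoted above, so treat the format as an assumption.

```python
import re

# "[Bds] DevicePath expand: <short path> -> <full device path>"
EXPAND = re.compile(r"\[Bds\] DevicePath expand: (?P<short>.+?) -> (?P<full>.+)$")
# First field of the Sata(...) node, taken here to be the port number.
SATA_PORT = re.compile(r"Sata\((0x[0-9A-Fa-f]+),")


def recovery_target(line):
    """Describe where a 'DevicePath expand' log line ended up, or None."""
    match = EXPAND.search(line)
    if match is None:
        return None
    full = match.group("full")
    sata = SATA_PORT.search(full)
    return {
        "file": match.group("short"),
        "sata_port": int(sata.group(1), 16) if sata else None,
        "is_cdrom": "/CDROM(" in full,
    }
```

On the line above this yields SATA port 1 with a CDROM node, that is, a device different from the disk listed in the "bootorder" fw_cfg file.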

So, libvirt is all fine, this bug can be set to VERIFIED. Thanks!

Comment 32 lijuan men 2016-09-09 05:30:35 UTC
Thanks, Laszlo.

Changing the bug status to VERIFIED.

Comment 34 errata-xmlrpc 2016-11-03 18:41:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2577.html

