Bug 1278421

Summary: Cannot PXE boot using VF devices
Product: Red Hat Enterprise Linux 7
Component: libvirt
Reporter: Amador Pahim <asegundo>
Assignee: Laine Stump <laine>
QA Contact: Virtualization Bugs <virt-bugs>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Version: 7.1
CC: alex.williamson, asanders, dyuan, knoel, laine, lersek, mzhan, rbalakri, virt-maint, yalzhang
Target Milestone: rc
Hardware: All
OS: Linux
Fixed In Version: libvirt-1.3.1-1.el7
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-11-03 18:30:14 UTC

Attachments:
OVMF log when the guest can not boot by pxe (flags: none)

Description Amador Pahim 2015-11-05 12:47:46 UTC
Using VFs in a pxeboot environment, the VFs do not present themselves as a bootable item unless the 'bootmenu' is enabled.

Version:
Red Hat Enterprise Linux Server release 7.1 (Maipo)
qemu-kvm-1.5.3-86.el7_1.1.x86_64
seabios-bin-1.7.5-8.el7.noarch

Comment 2 Amador Pahim 2015-11-05 13:05:09 UTC
The first suspicion was that SeaBIOS is too fast and doesn't wait long enough for the hardware to initialize. The delay introduced when bootmenu is enabled could be the time needed for the hardware to initialize.

To prove (or refute) this theory, I added a 2.5 s delay (the same as in bootmenu) to the boot path without bootmenu. With the SeaBIOS build that includes that delay, here are the test results:

- Attempt to reproduce the problem with the current official load (SeaBIOS version seabios-1.7.5-8.el7): Boot ok without bootmenu. Unable to reproduce the issue (!?!)

- Attempt to reproduce the problem with the modified seabios (SeaBIOS version rel-1.7.5.2-0-g0a8c90f-dirty-20150723_112535-rhel7): Boot failed: not a bootable disk.

So, from the latest tests, the issue was not reproducible with the same SeaBIOS version used when the issue was reported in the first place. The modified BIOS, with a hard-coded delay, is now the one presenting the issue.

From the tests I can assume the issue is intermittent and affects both tested versions of seabios.

Comment 6 Laine Stump 2015-12-01 19:15:57 UTC
After some digging, I see the cause of this - even if you use <boot dev='xyz'/> lines to specify the boot order of devices in a domain, libvirt will translate this into individual ",bootindex=n" options for the requested devices. If you turn on the bootmenu though, libvirt must instead use "-boot order='cdn'" on the qemu commandline (because "bootindex" and "-boot order" are incompatible with each other for some reason).

So the network device is being tried when we put "-boot order='cn'" on the qemu commandline, but *not* when we instead add ",bootindex='n'" to each device.

This wasn't happening because libvirt was failing to add the bootindex option for hostdev devices if they were defined in the XML as <interface>.

The following patch remedies this problem:

https://www.redhat.com/archives/libvir-list/2015-December/msg00039.html

Note that, as Alex and Laszlo suggest, I would still recommend using "<boot order='n'/>" in each device's entry in the config rather than <boot dev='blah'/> in the <os> element, as the latter is deprecated and doesn't allow as fine-grained control.
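
A minimal sketch of that recommendation, assuming Python's standard xml.etree module: it removes the <boot dev='...'/> elements from <os> and adds <boot order='N'/> to the first matching device of each class instead. The function names and the class-to-device mapping are illustrative assumptions, not libvirt code, and "first matching device" is itself an assumption about which device the <os>-level setting is meant to select.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping (an assumption for illustration, not libvirt's code):
# which <disk device='...'> value each <boot dev='...'/> class selects.
_DISK_KIND = {"hd": "disk", "cdrom": "cdrom", "fd": "floppy"}


def _matches(dev_class, elem):
    """Return True if a child of <devices> belongs to the given boot class."""
    if dev_class == "network":
        # Covers emulated NICs and <interface type='hostdev'> alike.
        return elem.tag == "interface"
    if elem.tag != "disk":
        return False
    return _DISK_KIND.get(dev_class) == elem.get("device", "disk")


def convert_boot_dev_to_order(domain_xml):
    """Replace <os><boot dev=.../> with per-device <boot order=.../>."""
    root = ET.fromstring(domain_xml)
    os_elem = root.find("os")
    devices = root.find("devices")
    if os_elem is None or devices is None:
        return domain_xml
    order = 0
    for boot in os_elem.findall("boot"):
        os_elem.remove(boot)
        # Assumption: only the first device of each class gets an index.
        for elem in devices:
            if _matches(boot.get("dev"), elem) and elem.find("boot") is None:
                order += 1
                ET.SubElement(elem, "boot", {"order": str(order)})
                break
    return ET.tostring(root, encoding="unicode")
```

The output could then be reviewed by hand before being applied with "virsh edit"; the sketch makes no attempt to preserve whitespace or comments in the original XML.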

Comment 7 Laszlo Ersek 2015-12-02 09:06:31 UTC
(In reply to Laine Stump from comment #6)
> After some digging, I see the cause of this - even if you use <boot
> dev='xyz'/> lines to specify the boot order of devices in a domain, libvirt
> will translate this into individual ",bootindex=n" options for the requested
> devices. If you turn on the bootmenu though, libvirt must instead use "-boot
> order='cdn'" on the qemu commandline (because "bootindex" and "-boot order"
> are incompatible with each other for some reason).

Well, it is understandable that "-boot order=xyz" and "-device foo,bootindex=N" are incompatible: they modify the exact same fw_cfg file (called "bootorder"). It makes perfect sense to use only one UI for the same thing.

However, the boot *menu* question is different: the options

  -boot menu=on[,splash-time=T]

and

  -device foo,bootindex=N

are orthogonal. The latter modifies the "bootorder" fw_cfg file (as before), whereas the former modifies:
- the fw_cfg *key* with value 0x000e ("menu=on")
- the "etc/boot-menu-wait" fw_cfg file ("splash-time=T").

Thus, it is (or should be) possible to use "-device foo,bootindex=N" together with "-boot menu=on[,splash-time=T]".
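
To make the orthogonality concrete, here is a small sketch that assembles both kinds of option independently and concatenates them into one argument list. The helper names and the "netdev=net0" property are hypothetical; only the option spellings ("-boot menu=on,splash-time=T" and "-device foo,bootindex=N") come from the discussion above.

```python
def boot_menu_args(enable, splash_time_ms=None):
    """Build the '-boot menu=...' option; independent of any bootindex."""
    opt = "menu=on" if enable else "menu=off"
    if splash_time_ms is not None:
        opt += f",splash-time={splash_time_ms}"
    return ["-boot", opt]


def device_args(model, bootindex=None, **props):
    """Build a '-device model,prop=value,...[,bootindex=N]' option."""
    parts = [model] + [f"{key}={value}" for key, value in props.items()]
    if bootindex is not None:
        parts.append(f"bootindex={bootindex}")
    return ["-device", ",".join(parts)]


# Both option groups are simply concatenated; neither rewrites the other.
argv = boot_menu_args(True, 1000) + device_args(
    "virtio-net-pci", bootindex=3, netdev="net0"  # "net0" is a made-up id
)
```

Note that no "-boot order=..." appears anywhere in the result, which is the combination that would conflict with bootindex.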

In domain XML terms it means that the following snippet is valid (and, in fact, I use this exact pattern for all of my permanent OVMF guests):

  <os>
    <type arch='x86_64' machine='pc-q35-2.4'>hvm</type>
    <loader readonly='yes' type='pflash'>/home/virt-images/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/ovmf.fedora.q35_VARS.fd</nvram>
    <bootmenu enable='yes' timeout='1000'/>
  </os>
  ...
  <devices>
    <disk type='file' device='disk'>
      ...
      <boot order='1'/>
      ...
    </disk>
    <disk type='file' device='cdrom'>
      ...
      <boot order='2'/>
      ...
    </disk>
    <interface type='network'>
      ...
      <boot order='3'/>
      ...
    </interface>
    ...
  </devices>

Note that there are no "boot" elements (with "dev" attributes) under the "os" element.

> So the network device is being tried when we put "-boot order='cn'" on the
> qemu commandline, but *not* when we instead add ",bootindex='n'" to each
> device.

That sounds very strange to me. Because, the *exact* pattern that you describe works perfectly well for me with OVMF. As I said:
- I enable the boot menu with 1 second splash time (for OVMF this means that
  it will give you 1 second to press a non-Enter key to enter the UEFI setup
  utility, before it starts to process boot options),
- *and* I use the <boot order="N"/> elements exclusively and universally,
  *including* network devices.

On the QEMU command line, this translates e.g. into "-device virtio-net-pci,...,bootindex=3". QEMU's "bootorder" fw_cfg file reflects it perfectly, and with OVMF I can freely place disks, CD-ROMs, and network devices in any order (even multiple network devices).

> The reason this wasn't happening is because libvirt was failing to add the
> bootindex option for hostdev devices if they were defined in the XML as
> <interface>.

Aha! Now *that* makes sense. So the root issue is that "...,bootindex=N" is missing for assigned NICs. I'll admit that I have no such guests.

> The following patch remedies this problem:
> 
> https://www.redhat.com/archives/libvir-list/2015-December/msg00039.html

Interesting, so it looks like "...,bootindex=N" is missing for assigned NICs only when using <boot dev="network"> in <os>. And it is correctly generated when using <boot order="N"/>.

So is this ultimately a *matching* problem in libvirt? Where <boot dev="network"> in <os> wouldn't match an <interface> whose <source> is "passthrough"?

> Note that, as Alex and Laszlo suggest, I would still recommend using "<boot
> order='n'/>" in each device's entry in the config rather than <boot
> dev='blah'/> in the <os> element, as the latter is deprecated and doesn't
> allow as fine-grained control.

Yup.

Comment 10 Laine Stump 2015-12-15 16:34:22 UTC
(In reply to Laszlo Ersek from comment #7)
> 
> Well, it is understandable that "-boot order=xyz" and "-device
> foo,bootindex=N" are incompatible: they modify the exact same fw_cfg file
> (called "bootorder"). It makes perfect sense to use only one UI for the same
> thing.
> 
> However, the boot *menu* question is different: the options
> 
>   -boot menu=on[,splash-time=T]
> 
> and
> 
>   -device foo,bootindex=N
> 
> are orthogonal.

Interesting. Logically that makes perfect sense. My information is based 100% on comments in the libvirt source, which I assumed were written by someone who knew what they were talking about, since it is actually extra work and complexity to switch to using "-boot dev='blah'" just because the bootmenu is enabled. Why go to that trouble for no reason? Was this perhaps a limitation in some older version of qemu that has been removed in later versions? Is there any point to attempting to remove the limitation from libvirt as well?


> Thus, it is (or should be) possible to use "-device foo,bootindex=N"
> together with "-boot menu=on[,splash-time=T]".

Yes, I just tried that and see that it does indeed work (unless you have a kernel > 4.1, in which case enabling the boot menu causes qemu or the virtual machine to go into a busy loop right after printing out "Press F12 for Boot Menu" :-/ ). Fortunately, RHEL7 is not using a 4.1+ kernel (although we will need to be careful to test for this as we backport new changes to the 3.10 kernel in RHEL7).

>
> Interesting, so it looks like "...,bootindex=N" is missing for assigned NICs
> only when using <boot dev="network"> in <os>. And it is correctly generated
> when using <boot order="N"/>.

Correct. 

> 
> So is this ultimately a *matching* problem in libvirt? Where <boot
> dev="network"> in <os> wouldn't match an <interface> whose <source> is
> "passthrough"?

Not exactly. That was the effect, but the reason was that hostdev interfaces' commandlines are constructed in a different place than emulated interfaces, and that "different place" hadn't been set up to notice hostdevs that were network interfaces. That problem is fixed upstream now, though:

commit a8e3247e650fc280920cfd4b0b809d521e161348
Author: Laine Stump <laine>
Date:   Mon Nov 30 17:40:44 2015 -0500

    qemu: add bootindex option to hostdev network interface commandline
    
That patch will be in libvirt 1.3.1.

Still I think that Alex's recommendation is the best solution. It allows more concise specification of boot order for individual devices, and it works *now* with existing released software.

Comment 15 yalzhang@redhat.com 2016-05-04 06:46:33 UTC
Sorry, I made a mistake. I should specify "<rom bar='on' file='/usr/share/ipxe/808610ca.rom'/>" to make the device's ROM visible in the guest's memory map.

Verified with the packages below; the result is as expected.
libvirt-1.3.4-1.el7.x86_64
qemu-kvm-rhev-2.5.0-4.el7.x86_64
seabios-bin-1.9.1-3.el7.noarch

Scenario 1: 
1. bootmenu enable='no' + boot dev='network'
# virsh dumpxml t_R7.2
.........
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.2.0'>hvm</type>
 **   <boot dev='network'/> **
    <boot dev='hd'/>
  **  <bootmenu enable='no'/> **
    <bios useserial='yes' rebootTimeout='3000'/>
  </os>
  <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:d7:3e:2a'/>
      <driver name='vfio'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x03' slot='0x10' function='0x4'/>
      </source>
      <alias name='hostdev0'/>
      <rom bar='on' file='/usr/share/ipxe/808610ca.rom'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
...........

Start the guest and check it can boot by pxe.

2. delete the bootmenu element + boot dev='network'
The guest boots by pxe successfully.

3. bootmenu enable='yes' + boot dev='network'
check the boot menu on the guest:
  1. iPXE (PCI 00:03.0)
  2. Virtio disk PCI:0:7
  3. Legacy option rom
and the guest can boot by pxe with VF.

Scenario 2:
1. boot order specified + bootmenu enable=yes/no
The guest can boot from pxe with vf.

2. multiple hostdev interface with different boot order, check the boot menu and try each option, it works OK as well.

Comment 16 yalzhang@redhat.com 2016-05-05 01:12:18 UTC
Verified OK as described above; changing the status to verified.

Comment 17 yalzhang@redhat.com 2016-09-26 16:23:56 UTC
Hi Laine,

I have tried to test with OVMF, and found the guest can not boot by pxe. Please help to check. I will attach the OVMF log.

Description of problem:
VF passthrough does not support pxe boot with OVMF

Version-Release number of selected component (if applicable):
libvirt-2.0.0-10.el7.x86_64
OVMF-20160608-3.git988715a.el7.noarch
ipxe-roms-qemu-20160127-5.git6366fa7a.el7.noarch
qemu-kvm-rhev-2.6.0-26.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Check the vf can get ip address
# ifconfig enp3s16f3
enp3s16f3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.28  netmask 255.255.255.0  broadcast 10.0.0.255
        inet6 fe80::7490:7ff:fee1:207c  prefixlen 64  scopeid 0x20<link>
        ether 76:90:07:e1:20:7c  txqueuelen 1000  (Ethernet)
        RX packets 48  bytes 3726 (3.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 28  bytes 4765 (4.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

# lspci -nn | grep 03:10.3
03:10.3 Ethernet controller [0200]: Intel Corporation 82576 Virtual Function [8086:10ca] (rev 01)

# virsh dumpxml ipxetest 
...
 <os>
    <type arch='x86_64' machine='pc-q35-rhel7.3.0'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.secboot.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/ipxe-virtio_VARS.fd</nvram>
  </os>
...
    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:53:3b:cf'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x03' slot='0x10' function='0x3'/>
      </source>
      <boot order='1'/>
      <rom bar='on' file='/usr/share/ipxe/808610ca.rom'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x07' function='0x0'/>
    </interface>
...
# virsh start ipxetest
# virt-viewer ipxetest
===> the guest will hang at the first "TianoCore" page

2. vf passthrough support pxe boot with legacy bios
# virsh destroy ipxetest
# virsh edit ipxetest 
====> edit <os/> part to use legacy bios
# virsh dumpxml ipxetest
...
 <os>
    <type arch='x86_64' machine='pc-q35-rhel7.3.0'>hvm</type>
  </os>
...
# virt-viewer ipxetest
====> the guest can boot by pxe

3. OVMF + macvtap with vf passthrough mode, the guest can boot by pxe.

Actual results:
The guest can not pxe boot using VF passthrough with OVMF.

Expected results:
The guest can boot by pxe.

Comment 18 yalzhang@redhat.com 2016-09-26 16:25:16 UTC
Created attachment 1204884 [details]
OVMF log when the guest can not boot by pxe

Comment 19 Laine Stump 2016-09-26 17:09:49 UTC
So PXE boot works when using BIOS but fails when using OVMF? Then I would suggest talking to Laszlo. Laszlo - is there something extra needed? Is this a known problem?

Comment 20 Alex Williamson 2016-09-26 17:26:39 UTC
Does the ROM in use support UEFI?  A PCI option ROM can support multiple images.  If it works with SeaBIOS that means that it contains at least an x86 legacy BIOS image.  That image will only work for OVMF if the CSM is enabled.  To natively support OVMF, the ROM must also contain a UEFI image.  Use a tool like https://github.com/awilliam/rom-parser to list the images within the ROM.  If there's no UEFI image in the ROM and no CSM support enabled in OVMF, then the ROM is invalid for booting an OVMF-based VM.
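
As a rough illustration of what such a tool checks, the sketch below walks the image chain of a PCI option ROM and reports each image's code type. It assumes the standard PCI expansion ROM layout (0x55AA signature, pointer to the "PCIR" data structure at offset 0x18, image length in 512-byte units at PCIR+0x10, code type at PCIR+0x14, last-image flag in bit 7 of PCIR+0x15). The function names are made up for this example, and it is not a substitute for rom-parser.

```python
import struct

# Code type values in the PCI data structure.
CODE_TYPES = {
    0: "x86 PC-AT (legacy BIOS)",
    1: "Open Firmware",
    2: "HP PA-RISC",
    3: "EFI",
}


def list_rom_images(rom):
    """Return one dict per image found in the option ROM bytes."""
    images = []
    offset = 0
    while offset + 0x1A <= len(rom):
        if rom[offset:offset + 2] != b"\x55\xaa":
            break  # no valid ROM signature at this offset
        (pcir_ptr,) = struct.unpack_from("<H", rom, offset + 0x18)
        pcir = offset + pcir_ptr
        if pcir + 0x16 > len(rom) or rom[pcir:pcir + 4] != b"PCIR":
            break  # PCI data structure missing or truncated
        vendor, device = struct.unpack_from("<HH", rom, pcir + 0x04)
        (length_units,) = struct.unpack_from("<H", rom, pcir + 0x10)
        code_type = rom[pcir + 0x14]
        indicator = rom[pcir + 0x15]
        images.append({
            "offset": offset,
            "vendor": vendor,
            "device": device,
            "code_type": code_type,
            "type_name": CODE_TYPES.get(code_type, "unknown"),
        })
        if indicator & 0x80 or length_units == 0:
            break  # bit 7 set marks the last image
        offset += length_units * 512
    return images


def has_uefi_image(rom):
    """True if any image in the ROM has the EFI code type."""
    return any(image["code_type"] == 3 for image in list_rom_images(rom))
```

A ROM for which has_uefi_image() returns False contains no native UEFI driver, so OVMF without a CSM would have nothing to execute from it.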

Comment 21 Laszlo Ersek 2016-09-26 17:27:43 UTC
The option ROM file

  /usr/share/ipxe/808610ca.rom

comes from the package

  ipxe-roms

and it is not a combined oprom, it is a legacy BIOS-only oprom:

./rom-parser /usr/share/ipxe/808610ca.rom

Valid ROM signature found @0h, PCIR offset 1ch
        PCIR: type 0 (x86 PC-AT), vendor: 8086, device: 10ca, class: 000002
        PCIR: revision 3, vendor revision: 1
        Last image

For bug 1084561, we enabled the building of combined (= legacy BIOS + UEFI) oproms for QEMU's virtual network cards, to be included in the "ipxe-roms-qemu" subpackage. See downstream commit a86aa23c44edf.

For bug 1295673, we enabled the building of standalone UEFI boot images, to be included in the "ipxe-bootimgs" subpackage. See downstream commit 53eca82a1fc8.

We've done no such thing for the "ipxe-roms" subpackage.

In general I have no idea what the "ipxe-roms" subpackage is good for. Why are we shipping it at all? Normally such standalone ROM files can only be used after flashing them physically to NICs. Assigned devices are an exception, but I doubt we support assigning all of those 573 NIC models (= the number of *.rom files under /usr/share/ipxe).

If you want to PXE boot under OVMF with an assigned device, for which iPXE provides an option ROM, then that will take an RFE for iPXE -- namely, combined (= legacy+EFI) oproms should be built for *those* NIC models, and included in the "ipxe-roms" subpackage. I cannot guarantee though that we support booting from assigned devices (or from any physical devices at all), i.e., using them while the guest firmware runs.

Even if we did, the hard question would be what NIC models exactly combined oproms should be built for. "All of them" is likely the wrong answer. (The *.rom files already take up 144 MB, and again I don't know why we even ship those 500+ ROM files in the first place.)

Summary:
- The boot doesn't work because the oprom you used had no UEFI image.
- PXE booting off of assigned devices might or might not be supported, I can't tell.
- If we do support such use, then an RFE should be filed for iPXE, and we should figure out what cards we want to cover.

Thanks.

Comment 22 Laszlo Ersek 2016-09-26 17:28:36 UTC
Heh, this was a first for me: my comment conflicted with not just one, but two other comments added meanwhile. :)

Comment 23 Laszlo Ersek 2016-09-26 17:30:50 UTC
(For the record, because Alex mentioned the CSM -- we will not enable the CSM in OVMF.)

Comment 25 errata-xmlrpc 2016-11-03 18:30:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2577.html