Bug 1642135 - No mechanism to disable KVM/qemu (SR-IOV) instances from PXE booting
Summary: No mechanism to disable KVM/qemu (SR-IOV) instances from PXE booting
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: seabios
Version: 8.1
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: 8.2
Assignee: Gerd Hoffmann
QA Contact: Xueqiang Wei
URL:
Whiteboard:
Depends On: 1705212 1793377
Blocks: 1633990
Reported: 2018-10-23 17:18 UTC by Irina Petrova
Modified: 2022-03-13 15:51 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1633990
Environment:
Last Closed: 2021-01-08 16:54:07 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
Screenshot of boot menu when booting with PF and an available romfile (19.93 KB, image/jpeg)
2018-10-24 11:58 UTC, Yanan Fu
Screenshot of boot menu when booting with VF and no available romfile (13.54 KB, image/jpeg)
2018-10-24 12:00 UTC, Yanan Fu
vf_screenshot (3.22 KB, image/png)
2020-02-06 10:58 UTC, Xueqiang Wei
pf_screenshot (4.55 KB, image/png)
2020-02-06 10:59 UTC, Xueqiang Wei

Comment 1 Yanan Fu 2018-10-24 11:56:27 UTC
Failed to reproduce this issue with latest qemu-kvm-rhev build for RHEL7.5.z.


1. I see the component is "qemu-kvm", but we don't officially support SR-IOV (VF) on qemu-kvm; we support it on qemu-kvm-rhev.

2. I tested with the latest RHEL7.5.z qemu-kvm-rhev from the QEMU side and cannot reproduce.

   Version: qemu-kvm-rhev-2.10.0-21.el7_5.7.x86_64
   NIC: Intel XL710

   I failed to build a rom file for the VF (building from git://git.ipxe.org/ipxe.git does not support the VF's device id 8086154c), so I tested with the PF and a rom file for it.

   Test steps with VF:
   1. Enable one VF, and bind it to vfio-pci.
   2. Boot VM with the VF, but no romfile for it.
      -boot menu=on \
      -device vfio-pci,host=04:02.0,id=vf \

   After the guest boots, the boot menu lists the hard disk first and has no PXE entry for the VF. The guest boots from the hard disk by default, as expected.


   Test steps with PF:
   1. Bind PF to vfio-pci.
   2. Boot VM with the PF and the given romfile:
      -boot menu=on \
      -device vfio-pci,host=04:00.1,id=pf,rombar=1,romfile="/root/ipxe/src/bin/80861583.rom" \

   After the guest boots, the boot menu lists the hard disk first, then the legacy option rom, and finally iPXE for the PF. The guest boots from the hard disk by default, as expected.


3. Full qemu command line:
/usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel75-64-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -m 7168  \
    -smp 6,maxcpus=6,cores=3,threads=1,sockets=2  \
    -cpu 'Haswell-noTSX',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -enable-kvm \
    -monitor stdio \
    -device vfio-pci,host=04:00.1,id=pf,rombar=1,romfile="/root/ipxe/src/bin/80861583.rom" \ ----> this is for pf, the one for vf already listed above. 
    -boot menu=on \
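
The VF preparation step above ("Enable one VF, and bind it to vfio-pci") is not spelled out in this comment. A minimal dry-run sketch of one common way to do it through sysfs follows. It assumes a kernel that exposes sriov_numvfs and driver_override; the helper name vf_bind_commands and both full PCI addresses (including the 0000 domain and the choice of parent PF) are illustrative assumptions, not taken from the test setup. The function only prints the commands so they can be reviewed before running them as root.

```shell
# Dry-run sketch: print (do not execute) the commands that enable one VF on a
# PF and bind that VF to vfio-pci. vf_bind_commands is a hypothetical helper;
# both PCI addresses are caller-supplied placeholders.
vf_bind_commands() {
    pf=$1   # full PCI address of the PF (assumed, e.g. 0000:04:00.0)
    vf=$2   # full PCI address of the VF that appears (assumed, e.g. 0000:04:02.0)
    echo "modprobe vfio-pci"
    echo "echo 1 > /sys/bus/pci/devices/$pf/sriov_numvfs"
    echo "echo vfio-pci > /sys/bus/pci/devices/$vf/driver_override"
    # The unbind step only applies if a host driver already claimed the VF.
    echo "echo $vf > /sys/bus/pci/devices/$vf/driver/unbind"
    echo "echo $vf > /sys/bus/pci/drivers_probe"
}

vf_bind_commands 0000:04:00.0 0000:04:02.0
```

The new_id/remove_id approach is an alternative, but it matches every device with the given vendor and device id rather than a single function.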

Comment 2 Yanan Fu 2018-10-24 11:58:01 UTC
Created attachment 1497002 [details]
Screenshot of boot menu when booting with PF and an available romfile

Comment 3 Yanan Fu 2018-10-24 12:00:01 UTC
Created attachment 1497003 [details]
Screenshot of boot menu when booting with VF and no available romfile

Comment 4 Daniel Berrangé 2018-10-25 20:29:09 UTC
Switching to libvirt, since we should be trying to reproduce with libvirt first, not direct QEMU invocation, and it's possible that libvirt is not configuring QEMU correctly.

Comment 5 yalzhang@redhat.com 2018-10-26 08:36:01 UTC
I found that both guest XMLs show the machine type "pc-i440fx-rhel7.5.0",
and "installed-rpms" shows the related package versions below:
libvirt-daemon-3.9.0-14.el7.x86_64
qemu-kvm-1.5.3-156.el7.x86_64

I tried installing libvirt and qemu at the same versions, but the default machine type there is "pc-i440fx-rhel7.0.0", so a guest configured with 7.5.0 cannot start.
# virsh start rh
error: Failed to start domain rh
error: internal error: process exited while connecting to monitor: qemu-kvm: -machine pc-i440fx-rhel7.5.0,accel=kvm,usb=off,dump-guest-core=off: Unsupported machine type
Use -machine help to list supported machines!
# rpm -q qemu-kvm
qemu-kvm-1.5.3-156.el7.x86_64

# /usr/libexec/qemu-kvm  -machine help
Supported machines are:
none                 empty machine
pc                   RHEL 7.0.0 PC (i440FX + PIIX, 1996) (alias of pc-i440fx-rhel7.0.0)
pc-i440fx-rhel7.0.0  RHEL 7.0.0 PC (i440FX + PIIX, 1996) (default)
rhel6.6.0            RHEL 6.6.0 PC
rhel6.5.0            RHEL 6.5.0 PC
rhel6.4.0            RHEL 6.4.0 PC
rhel6.3.0            RHEL 6.3.0 PC
rhel6.2.0            RHEL 6.2.0 PC
rhel6.1.0            RHEL 6.1.0 PC
rhel6.0.0            RHEL 6.0.0 PC

I'm curious how this could happen. I think the customer should use "qemu-kvm-rhev", not qemu-kvm.

Comment 6 Irina Petrova 2018-10-26 18:42:00 UTC
(In reply to yalzhang from comment #5)
> I found both the guest xml shows the machine type is "pc-i440fx-rhel7.5.0".
> And the "installed-rpms" shows the related package version as below:
> libvirt-daemon-3.9.0-14.el7.x86_64
> qemu-kvm-1.5.3-156.el7.x86_64
> 

Hmm... from the customer's sos-report:

$ grep kvm installed-rpms 
libvirt-daemon-kvm-3.9.0-14.el7_5.5.x86_64                  Fri May 18 20:13:00 2018
qemu-kvm-common-rhev-2.10.0-21.el7_5.3.x86_64               Fri May 18 20:09:53 2018
qemu-kvm-rhev-2.10.0-21.el7_5.3.x86_64                      Fri May 18 20:12:51 2018


Where did you pull that info from? Sorry if I'm missing the obvious.

Comment 8 yalzhang@redhat.com 2018-10-29 08:24:17 UTC
Sorry, my bad, the correct versions in the sos report are:
libvirt-3.9.0-14.el7_5.5.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.3.x86_64
ipxe-bootimgs-20170123-1.git4e85b27.el7_4.1.noarch
ipxe-roms-qemu-20170123-1.git4e85b27.el7_4.1.noarch 

In the guest xml there are several hostdev devices, all of them PFs, and no rom file path is specified (check instance-000003a3.xml):
 
<hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x86' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </hostdev>


In my experience, pxe boot via a PF/VF only works when a rom file is specified in the xml, like:

<hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x86' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
  **   <rom file='/usr/share/ipxe/80861570.rom'/> **
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </hostdev>


I have tested with this xml and an x710-2 card, and still cannot reproduce.

Laine, is it possible to pxe boot from a PF without specifying the rom file?

Comment 9 Edu Alcaniz 2018-11-07 08:32:18 UTC
Good morning, can we get an update on this bugzilla, please? It is quite urgent.

Thanks in advance.

Comment 10 Laine Stump 2018-11-09 03:19:26 UTC
First - there is a misleading statement somewhere back in here - the default setting for rom bar was changed not by libvirt, but by qemu, and it happened back in 2011 (first entered rhel in RHEL6.2, libvirt-0.9.4, qemu-0.12), so it's far beyond the date that this would be a new behavior caused by a change in default value.

Second - wow! looking into this has been a trip down memory lane and shows me how much of the last almost-10 years I've completely forgotten about! :-)

For example, take a look at Bug 888635

Since you are assigning a PF, I think you need to take into account that it may have its own ROM on the card, which would explain why seabios is detecting a bootable ROM.


According to Bug 888635, since 2013 libvirt has added "-boot strict=on" to all qemu commandlines, which is supposed to cause it to only attempt booting from devices that have an explicit boot order given (without that, by default  SeaBIOS would *still* attempt to boot from that device if all devices that have a specified priority fail to boot.) As long as the generated qemu commandlines contain -boot strict=on then libvirt is doing all that it can.

However, even if boot strict=on wasn't in the commandline, this shouldn't be a problem as long as there is a higher priority device that *does* boot (even if the only operation once booted is to fail, or simply to reboot, thus jumping back to the top of the list of devices to attempt booting. I have a faint recollection of creating a tiny disk image in the past for exactly this purpose.)

In the screenshot of the boot menu, I see iPXE as the 3rd choice, implying that the SCSI disk will be selected for booting before iPXE is attempted. Is that not happening?

Can you provide the libvirt XML of the guest while it's running, and also the qemu commandline generated from it (tail /var/log/libvirt/$guestname.log)? I'd like to see if -boot strict=on is present.
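
If the log is not available, the same check can be made against the command line of the running process. A rough sketch follows; boot_strict_setting is a hypothetical helper name, and it only inspects the argument that directly follows -boot.

```shell
# Sketch: report whether a QEMU command line passes strict=on to -boot.
# boot_strict_setting is a hypothetical helper, not part of QEMU or libvirt.
boot_strict_setting() {
    # $1: the whole command line as one string
    printf '%s\n' "$1" | awk '{
        for (i = 1; i < NF; i++)
            # Wrap the -boot value in commas so strict=on matches as a whole field.
            if ($i == "-boot" && index("," $(i + 1) ",", ",strict=on,") > 0)
                found = 1
    }
    END { print (found ? "yes" : "no") }'
}

boot_strict_setting "qemu-kvm -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb"
```

For a live guest, the string could come from something like `tr '\0' ' ' < /proc/<pid>/cmdline`, where <pid> is the qemu-kvm process id.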

Comment 11 Edu Alcaniz 2018-11-12 09:49:36 UTC
The qemu logs are empty:

[root@server01 qemu]# ls -arlth instance-000003a1.log*
-rw-------. 1 root root 0 Oct 21 03:14 instance-000003a1.log-20181021
-rw-------. 1 root root 0 Oct 28 03:34 instance-000003a1.log-20181028
-rw-------. 1 root root 0 Nov  4 03:11 instance-000003a1.log-20181104
-rw-------. 1 root root 0 Nov 11 03:46 instance-000003a1.log-20181111
-rw-------. 1 root root 0 Nov 11 03:46 instance-000003a1.log

Looking at the command line (from ps -ef), you can see that "-boot strict=on" is present.

qemu        8271       1 99 Sep24 ?        735-00:18:09 /usr/libexec/qemu-kvm -name guest=instance-000003a1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-instance-000003a1/master-key.aes -machine pc-i440fx-rhel7.5.0,accel=kvm,usb=off,dump-guest-core=off -cpu Skylake-Server-IBRS,ss=on,hypervisor=on,tsc_adjust=on,clflushopt=on,pku=on,stibp=on -m 32768 -realtime mlock=off -smp 16,sockets=16,cores=1,threads=1 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/3-instance-000003a1,share=yes,size=34359738368,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0-15,memdev=ram-node0 -uuid b8f80e76-2b6c-4042-8539-89f53dbc311d -smbios type=1,manufacturer=Red Hat,product=OpenStack Compute,version=14.1.0-22.el7ost,serial=5a9751e3-43c7-4e09-b579-c965068ab7be,uuid=b8f80e76-2b6c-4042-8539-89f53dbc311d,family=Virtual Machine -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-3-instance-000003a1/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/b8f80e76-2b6c-4042-8539-89f53dbc311d/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/var/lib/nova/instances/b8f80e76-2b6c-4042-8539-89f53dbc311d/disk.config,format=raw,if=none,id=drive-ide0-0-0,readonly=on,cache=none -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=29,id=hostnet0,vhost=on,vhostfd=31 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:3e:76:9c,bus=pci.0,addr=0x3 -add-fd set=2,fd=33 -chardev file,id=charserial0,path=/dev/fdset/2,append=on -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 
-device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 172.16.52.53:0 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device vfio-pci,host=3b:00.0,id=hostdev0,bus=pci.0,addr=0x5,rombar=0 -device vfio-pci,host=3b:00.1,id=hostdev1,bus=pci.0,addr=0x6,rombar=0 -device vfio-pci,host=3b:00.2,id=hostdev2,bus=pci.0,addr=0x7,rombar=0 -device vfio-pci,host=3d:00.0,id=hostdev3,bus=pci.0,addr=0x8,rombar=0 -device vfio-pci,host=3d:00.1,id=hostdev4,bus=pci.0,addr=0x9,rombar=0 -device vfio-pci,host=3d:00.2,id=hostdev5,bus=pci.0,addr=0xa,rombar=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xb -msg timestamp=on
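
A command line this long is hard to scan by eye. As a small sketch, the vfio-pci devices and their rombar settings can be pulled out as shown below; list_vfio_rombar is a hypothetical helper, and it assumes device arguments contain no embedded spaces. It prints "unset" when rombar is not given, in which case QEMU's default applies.

```shell
# Sketch: list each vfio-pci device on a QEMU command line with its rombar value.
# list_vfio_rombar is a hypothetical helper; it treats any argument that starts
# with "vfio-pci," as a device definition.
list_vfio_rombar() {
    # $1: the whole command line as one string
    printf '%s\n' "$1" | tr ' ' '\n' | awk -F, '
        $1 == "vfio-pci" {
            host = "?"; rombar = "unset"
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^host=/)   host = substr($i, 6)
                if ($i ~ /^rombar=/) rombar = substr($i, 8)
            }
            print host, rombar
        }'
}

list_vfio_rombar "-device vfio-pci,host=3b:00.0,id=hostdev0,bus=pci.0,addr=0x5,rombar=0 -device virtio-balloon-pci,id=balloon0"
```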

Comment 14 Laine Stump 2018-11-12 14:38:46 UTC
> The qemu logs are empty:

Heh. Looks like the guest hasn't been restarted in a long time, and there is some weekly log rollover script that's being overzealous about gzipping and moving the logs even if they are 0 length. Thanks for grabbing the ps output though - it gives the same info I was looking for.

As you say, -boot strict=on *is* on the commandline, so only the devices specified should be used for booting (as far as I understand boot strict=on anyway). (The example you've given has rom bar=off for the devices, but there is no reason boot strict would change if rombar wasn't specified.)

Also, I'm surprised by the statement that SeaBIOS on this system is attempting to boot from the NIC devices *before* the disk. Even without boot strict that doesn't make sense. Can you verify/confirm that? Also, can you confirm that the behavior was the same when using <boot order='n'/> rather than <boot dev='hd'/>?

(I noticed that the config is using <boot dev='hd'/> in <os> rather than setting <boot order='n'/> in the disk device. I don't know if that should make any difference to boot strict, but I wanted to point it out for Gerd, who is getting the followup question)

Gerd:

Is our long-time understanding of the purpose of "-boot strict=on" correct - that it instructs SeaBIOS to attempt booting *only* from those devices that are explicitly given a boot order, or that are listed in the boot devices (depending on which of the two methods is chosen in the config)?

Comment 15 Gerd Hoffmann 2018-11-13 08:35:38 UTC
  Hi,

> Also, I'm surprised by the statement that SeaBIOS on this system is
> attempting to boot from the NIC devices *before* the disk. Even without boot
> strict that doesn't make sense. Can you verify/confirm that? Also, can you
> confirm that the behavior was the same when using <boot order='n'/> rather
> than <boot dev='hd'/>?

Extending that question:  Does it try to pxeboot before the boot menu shows up?

> Is our long-time understanding of the purpose of "-boot strict=on' correct -
> that it instructs SeaBIOS to attempt booting *only* from those devices that
> are explicitly given a boot order, or that are listed in the boot devices
> (depending on which of the two methods is chosen in the config)?

That is correct.

Typically a boot looks like this:

========== [ cut here ] ==========
SeaBIOS (version rel-1.11.0-50-g14221cd86e-prebuilt.qemu-project.org)

iPXE (http://ipxe.org) 00:03.0 C980 PCI2.10 PnP PMM+07F913B0+07EF13B0 C980
[ note: option rom is loaded here, it should initialize and register a boot
        entry for the nic ]

Press ESC for boot menu.

Booting from Hard Disk...
Boot failed: could not read the boot disk

Booting from ROM...
iPXE (PCI 00:03.0) starting execution...ok
[ note: pxeboot should happen here, more ipxe messages follow ]
========== [ cut here ] ==========

It is possible though that the card's option rom is rude and goes kick the pxeboot right after loading instead of properly registering a boot entry.

Comment 16 Laine Stump 2018-11-13 14:56:21 UTC
> It is possible though that the card's option rom is rude and goes kick the
> pxeboot right after loading instead of properly registering a boot entry.

Are you suggesting that the ROM might directly start up pxeboot when its initialize function is called, thus pre-empting anything else set in the BIOS? If so, that would be worthy of a very strong hand slap :-/ But wouldn't that cause a similar problem when booting the host system?

Also, I've noticed that on my F29 system, even emulated devices show up in the boot menu (in spite of having -boot strict=on in the commandline) - apparently qemu is finding boot roms for them, mapping them into guest memory space, and then SeaBIOS is offering to boot from them. This sounds counter to what you have confirmed as proper behavior when -boot strict is on...

And just to make Gerd's extra question to the BZ reporter more visible, I'll repeat it:

> Does it try to pxeboot before the boot menu shows up?

Comment 17 Gerd Hoffmann 2018-11-13 21:21:22 UTC
(In reply to Laine Stump from comment #16)
> > It is possible though that the card's option rom is rude and goes kick the
> > pxeboot right after loading instead of properly registering a boot entry.
> 
> Are you suggesting that the ROM might directly start up pxeboot when its
> initialize function is called, thus pre-empting anything else set in the
> BIOS?

Yes.

> If so, that would be worthy of a very strong hand slap :-/

Indeed.

> But wouldn't that cause a similar problem when booting the host system?

Maybe PF and VF have different option rom images.

Possibly it is configurable.  Option roms sometimes offer some kind of setup, typically announced with something along the lines of "press <hotkey> for <device> setup" at option rom load time (on the host, or in the guest, or both).  If that exists, it is worth checking whether this behavior can be turned off somewhere in the setup.

Failing that, it is worth checking whether the hardware in question is supported by ipxe and, should that be the case, using the ipxe rom instead of the one provided by the hardware.

> Also, I've noticed that on my F29 system, even emulated devices show up in
> the boot menu (in spite of having -boot strict=on in the commandline) -
> apparently qemu is finding boot roms for them, mapping them into guest
> memory space, and then SeaBIOS is offering to boot from them. This sounds
> counter to what you have confirmed as proper behavior when -boot strict is
> on...

strict=on only affects automatic boot, i.e. seabios will not fall back to devices without a bootindex=x entry (the nic in this case) when it could not boot from a device with a bootindex=x entry (the disk in this case).

Manually picking the nic in the boot menu is always possible, no matter whether strict is on or off.
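
To make the distinction concrete, here is a toy model of the automatic boot order only. It is an illustration, not SeaBIOS code: auto_boot_order and its input format (one "name bootindex" pair per line, "-" for no bootindex) are made up for this sketch, and listing the fallback devices in input order is a simplification.

```shell
# Toy model: which devices are tried automatically, and in what order.
# auto_boot_order is a hypothetical helper; input lines are "name bootindex",
# with "-" meaning the device has no bootindex.
auto_boot_order() {
    strict=$1        # "on" or "off"
    input=$(cat)     # device list from stdin
    # Devices with an explicit bootindex come first, lowest index first.
    printf '%s\n' "$input" | awk '$2 != "-" { print $2, $1 }' | sort -n | awk '{ print $2 }'
    # Only without strict=on are the remaining devices fallback candidates.
    if [ "$strict" != "on" ]; then
        printf '%s\n' "$input" | awk '$2 == "-" { print $1 }'
    fi
}

printf 'nic -\ndisk 1\n' | auto_boot_order on
```

The boot menu is deliberately left out of the model, since manual selection is not affected by strict.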

Comment 21 Gerd Hoffmann 2018-11-14 10:23:07 UTC
Screenshot and video are not that helpful, unfortunately.

Is it possible to enable the boot menu, so the "Press ESC for boot menu." line  shows up on the screen?

libvirt xml for that:

  <os>
    [... ]
    <bootmenu enable='yes'/>
  </os>

Even more helpful would be a seabios logfile.
Can be obtained this way:

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  [ ... ]
  </devices>
  <qemu:commandline>
    <qemu:arg value='-chardev'/>
    <qemu:arg value='file,id=firmwarelog,path=/tmp/qemu-firmware.log'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='isa-debugcon,iobase=0x402,chardev=firmwarelog'/>
  </qemu:commandline>
</domain>

Then attach /tmp/qemu-firmware.log to this bug.
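
Once the log exists, the boot attempts can be pulled out of it quickly. A small sketch, assuming the log contains the same "Booting from ..." lines that seabios prints on the display; boot_attempts is a hypothetical helper name.

```shell
# Sketch: list the boot attempts recorded in a seabios debug log, with line numbers.
# boot_attempts is a hypothetical helper; the log path is supplied by the caller.
boot_attempts() {
    # $1: path to the log written by the isa-debugcon device
    grep -n 'Booting from ' "$1" || echo "no boot attempts logged"
}
```

Usage would be `boot_attempts /tmp/qemu-firmware.log`. If a pxe boot starts without a matching "Booting from ROM..." line, the rom was probably not started through the normal boot order.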

Comment 24 Edu Alcaniz 2018-11-14 13:39:39 UTC
The information requested is attached in the bugzilla

Comment 25 Gerd Hoffmann 2018-11-14 20:38:40 UTC
Hmm.  That creates more questions than it answers.

The log (comment 22) looks fairly normal.  option roms are loaded and initialized.  They register boot entries, as seen in the boot menu (both logfile and display video).  virtio disk is first in the menu, as it should.

And then, according to the log, seabios boots from the hard disk.  Apparently it successfully passed control to whatever it loaded from the disk.  There is no error message and something (some boot loader probably) invokes a bunch of vgabios calls.

That is not consistent with the video though.

The "Booting from Hard Disk..." line which is in the log should have been printed to the vga display too.  But it isn't there.  And a "Booting from ROM..." message which seabios prints before booting via option rom isn't there either (not in the logfile and not in the display video).

So I'm wondering how the pxerom is invoked in the first place ...

What happens after the search for a boot server times out?
Does the guest actually boot from the hard disk then?

What does the tail of qemu-firmware.log look like while the pxerom is running (and waiting for a dhcp response)?

Comment 26 Edu Alcaniz 2018-11-15 13:05:26 UTC
I have attached two recordings where you can see the vm screen and the qemu-firmware log at the same time. In the one with the boot menu I chose the first entry as the boot device.

Comment 28 Gerd Hoffmann 2018-11-16 06:03:59 UTC
Ok, the nic option rom hooks into interrupt 19.

I don't think this is how things are supposed to work.

For legacy option roms hooking into int19 is the way to take over control for boot.

For modern option roms which register a BEV (and show up with a descriptive name in the boot menu) this is not needed because the BEV also contains the entry vector which the bios can call to kick pxe boot.

The intel nic rom does both (register BEV and hook int19) though.

Apparently at least some real hardware has a config option in the bios setup to enable/disable int19 hooks (see https://superuser.com/questions/1000339/interrupt-19-capture-bios-option).  seabios allows this unconditionally.  This nicely explains why we see different behavior on physical hardware.

Comment 29 Gerd Hoffmann 2018-11-16 06:16:02 UTC
So, what are our options?

(a) Add a config option to seabios, similar to real hardware.
    Problem with this is that we have to wire this up through all virt
    management layers so people can actually make use of it.

(b) Try to do something clever in seabios to avoid the need for a config option,
    for example checking whether the option rom registered a BEV and, in case it
    did, not allowing it to also hook int19.

(c) Backport commit "d2063b7693 [intelxl] Add driver for Intel 40 Gigabit
    Ethernet NICs" to our ipxe package, then go for 
       <rom file='/usr/share/ipxe/80861572.rom'/>
    in the libvirt config.

Comment 31 Gerd Hoffmann 2018-11-16 07:19:39 UTC
https://people.redhat.com/ghoffman/bz1642135/
This is a test build for variant (c), the ipxe update.

Comment 32 Laine Stump 2018-11-19 16:25:53 UTC
It might provide useful information, but option (c) doesn't help much in production - they can already eliminate the problem by setting <rom bar='off'/> in the libvirt config, but apparently adding a custom <rom> element to the config is problematic to do in OpenStack.

(Also, note that changing the default for rombar to "off" isn't acceptable either, since its default has been "on" since 2011, through 2 major releases of RHEL. Changing it would create havoc among all those installations that expect it to be on.)

If it's possible to "do something clever" as you suggest in (b), that would be the most useful.

The real problem here AFAICS is that the firmware in this ultra-modern 40Gb NIC is pulling shenanigans that would have been acceptable in 1985, but in this age really aren't. Should we be filing a bug with Intel (where the problem really lies IMO)?

Comment 33 Gerd Hoffmann 2018-11-20 06:46:24 UTC
> If it's possible to "do something clever" as you suggest in (b), that would
> be the most useful.

Ok, can try that.

> The real problem here AFAICS is that the firmware in this ultra-modern 40Gb
> NIC is pulling shenanigans that would have been acceptable in 1985, but in
> this age really aren't. Should we be filing a bug with Intel (where the
> problem really lies IMO)?

Well, it is more like 1995.  I think the BEV mechanism was added in mid-90ies when PCI support showed up in PCs.  More than two decades ago.

But, yes, you have a point here.  It is kind of silly to care about backward compatibility with bios firmware from the 90ies in an option rom for PCI express hardware.

Filing a bug with Intel is worth trying.

Comment 35 Gerd Hoffmann 2018-11-20 07:52:09 UTC
https://people.redhat.com/ghoffman/bz1642135/
Has the seabios update now.

It simply disallows any int19 changes for pnp roms, unconditionally.  It also prints a debug message with some rom info, so we can refine the logic should that be needed.  So, please try this with seabios logfile enabled (see comment 21).
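
As a toy model of that rule (not the actual patch, which is C code inside seabios and may differ in detail): remember the int19 vector before the rom's init runs, and put it back if a pnp rom changed it. The function name and the vector values below are made up for illustration.

```shell
# Toy model of the rule described above; not the seabios implementation.
# int19_after_rom_init is a hypothetical helper and the vectors are made-up
# segment:offset strings.
int19_after_rom_init() {
    before=$1   # int19 vector saved before the rom init ran
    after=$2    # int19 vector found after the rom init returned
    kind=$3     # "pnp" for roms with a pnp header, "legacy" otherwise
    if [ "$kind" = "pnp" ] && [ "$before" != "$after" ]; then
        echo "$before"   # pnp rom hooked int19: revert to the saved vector
    else
        echo "$after"    # legacy rom, or nothing changed: leave it alone
    fi
}

int19_after_rom_init f000:e6f2 c980:0abc pnp
```

In this model a legacy rom keeps its hook, since hooking int19 is its only way to take over the boot.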

Comment 36 Jaroslav Suchanek 2018-11-21 12:22:13 UTC
Seems like there is nothing libvirt should fix. I am moving this bug to seabios component. Please reset it back if you disagree. Thanks.

Comment 37 Gerd Hoffmann 2018-11-22 08:09:07 UTC
ping, any test results with the seabios update?

Comment 39 Irina Petrova 2018-11-26 09:21:29 UTC
(In reply to Gerd Hoffmann from comment #37)
> ping, any test results with the seabios update?


Negative feedback. Please see the following recording:
Screen_Recording_2018-11-25_at_16.25.54.mov

The file can be found in the internal Google Drive created by Edu Alcaniz (c#27).

Comment 40 Gerd Hoffmann 2018-11-27 05:31:31 UTC
> Negative feedback. Please see the following recording:
> Screen_Recording_2018-11-25_at_16.25.54.mov

Why negative?  What is the problem?
From the screen recording it looks like everything works as intended.

Comment 41 Irina Petrova 2018-11-27 10:31:00 UTC
Gerd,

Customer feedback verbatim:

"Thanks for update . I have tried and it seems to be skipping pxe boot attempts. Please find attached screen recording shows the log from vm console and /tmp/qemu-firmware.log."


Ah, wait, I just rewatched the video. It does try to pxe-boot first, it can't find an interface, and then it falls back to booting from Hard Drive. Right?

Comment 42 Gerd Hoffmann 2018-11-27 12:00:06 UTC
(In reply to Irina Petrova from comment #41)
> Gerd,
> 
> Customer feedback verbatim:
> 
> "Thanks for update . I have tried and it seems to be skipping pxe boot
> attempts. Please find attached screen recording shows the log from vm
> console and /tmp/qemu-firmware.log."
> 
> 
> Ah, wait, I just rewatched the video. It does try to pxe-boot first, it
> can't find an interface, and then it falls back to booting from Hard Drive.
> Right?

The pxe roms are loaded (this is where the messages printed come from).
The roms are not started for pxe boot (no attempt to dhcp, compare with the other videos), so boot ordering (hard drive has highest priority) works again.

Comment 44 Laszlo Ersek 2018-11-28 22:23:03 UTC
The idea that appears to emerge on the upstream SeaBIOS list is similar to (c) in comment 29. Generally disabling Int19 hooking for oproms is considered risky (it could regress valid oproms, if I understand correctly). And a blacklist of specific oproms, if it existed, should not be maintained within SeaBIOS. (I hope that I've summarized the discussion more or less faithfully.)

Comment 45 Laine Stump 2018-11-29 16:43:44 UTC
But option (c) requires a config special case for that particular card (and the desire to avoid that was the entire purpose of this BZ being filed). And if you're going to require a config change, you may as well just change the config by disabling the rombar, which already works without needing to coordinate with the backport of a commit in ipxe.

I think everyone agrees that the real culprit is the firmware on the card though, and Alex W. told me yesterday that this particular card (Intel XL710) permits updates to its firmware, so how about suggesting that Intel make a firmware update available for the card that doesn't capture int 19h, then the customer can install that update on their hardware once and be done with it - no config changes needed.

I think at the very least someone who knows how to navigate Intel firmware bug reporting should report this to them - maybe it truly is an oversight (if it wasn't, I would expect this behavior from other Intel SRIOV NIC cards, and this is the only one I've heard of), and they'll just respond with "Oh yeah, how did we miss that?!?" (or maybe they'll respond with "Yes, it does do that, and we had to do it because XYZ motherboard had obscure problem PDQ, and this was the only way to fix it.", but at least then we'll know the reason).

Comment 46 Gerd Hoffmann 2018-11-30 11:46:47 UTC
Upstream discussion is still in progress.

May I ask for an additional test?

With the updated seabios and boot menu enabled (see comment 21), will the pxeboot start correctly if one of the NICs is picked in the boot menu?

Comment 49 Gerd Hoffmann 2018-12-07 09:31:42 UTC
Patch has been updated after some upstream discussions:
https://mail.coreboot.org/pipermail/seabios/2018-December/012669.html

New test packages are available:
http://people.redhat.com/ghoffman/bz1642135/

Can you please test this version too?

Behavior should be identical to the previous version (no pxe boot by default, but pxe boot via boot menu should be possible), except for a slightly changed message text in the debug log when seabios reverts the int19 redirection.

Comment 56 Gerd Hoffmann 2019-05-22 06:03:40 UTC
upstream commit 0932c20560574696cf87ddd12623e8c423ee821b

Comment 57 Gerd Hoffmann 2019-05-22 06:08:20 UTC
A seabios rebase (probably to the not-yet-available 1.13) will pick up the fix.

Comment 61 Ademar Reis 2019-09-23 19:31:03 UTC
Discussing this with Gerd, we should wait for a full rebase of seabios to 1.13 to get the latest patches. This will happen in time for RHEL-AV-8.2.0.

Given this BZ is low/low, deferring it to RHEL-AV-8.2.

Comment 63 Xueqiang Wei 2020-02-06 10:54:46 UTC
Tested with seabios-1.13 and did not hit this issue, so setting status to VERIFIED.


Versions:
Host:
kernel-4.18.0-175.el8.x86_64
qemu-kvm-4.2.0-7.module+el8.2.0+5520+4e5817f3
seabios-1.13.0-1.module+el8.2.0+5520+4e5817f3.x86_64
seabios-bin-1.13.0-1.module+el8.2.0+5520+4e5817f3.noarch
NIC: Intel 82576


Test steps with VF:
   1. Enable one VF, and bind it to vfio-pci.
   2. Boot VM with the VF, but no romfile for it.
      -boot menu=on \
      -device vfio-pci,host=08:00.0,id=vf \

After the guest boots, the boot menu lists the hard disk first and has no PXE entry for the VF. The guest boots from the hard disk by default, as expected.


Test steps with PF:
   1. Bind PF to vfio-pci.
   # echo 0000:08:00.0 > /sys/bus/pci/devices/0000\:08\:00.0/driver/unbind
   # echo "8086 10c9" > /sys/bus/pci/drivers/vfio-pci/new_id
   # echo "8086 10c9" > /sys/bus/pci/drivers/vfio-pci/remove_id
   2. Generate the rom file.
   # git clone https://github.com/ipxe/ipxe.git
   # cd ipxe/src
   # make
   # make bin/808610c9.rom
   3. Boot VM with the PF and the given romfile:
      -boot menu=on \
      -device vfio-pci,host=08:00.0,id=pf,rombar=1,romfile="/home/wei_test/ipxe/src/bin/808610c9.rom" \

After the guest boots, the boot menu lists the hard disk first, then the legacy option rom, and finally iPXE for the PF. The guest boots from the hard disk by default, as expected.

4. Full qemu command line:
/usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine pc-i440fx-rhel7.6.0  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/wei_test/rhel820-64-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -m 7168  \
    -smp 6,maxcpus=6,cores=3,threads=1,sockets=2  \
    -cpu 'Haswell-noTSX',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -enable-kvm \
    -monitor stdio \
    -device vfio-pci,host=08:00.0,id=pf,rombar=1,romfile="/home/wei_test/ipxe/src/bin/808610c9.rom" \
    -boot menu=on \


Please refer to attachment for screenshots.


If I am wrong, please correct me. Thanks.

Comment 64 Xueqiang Wei 2020-02-06 10:58:34 UTC
Created attachment 1658139 [details]
vf_screenshot

Comment 65 Xueqiang Wei 2020-02-06 10:59:58 UTC
Created attachment 1658140 [details]
pf_screenshot

Comment 67 Jeff Nelson 2021-01-08 16:54:07 UTC
Changing this TestOnly BZ to CLOSED CURRENTRELEASE. Please reopen if the issue is not resolved.

