Bug 1412327

Summary: RFE: negotiable broadcast SMI for Q35
Product: Red Hat Enterprise Linux 7
Reporter: Laszlo Ersek <lersek>
Component: qemu-kvm-rhev
Assignee: Laszlo Ersek <lersek>
Status: CLOSED ERRATA
QA Contact: jingzhao <jinzhao>
Severity: unspecified
Docs Contact:
Priority: medium
Version: 7.4
CC: chayang, jinzhao, juzhang, lersek, michen, mrezanin, mtessun, virt-maint
Target Milestone: rc
Keywords: FutureFeature
Target Release: 7.4
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.9.0-1.el7
Doc Type: Rebase: Bug Fixes and Enhancements
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-01 23:42:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1227278
Bug Blocks: 1412313

Description Laszlo Ersek 2017-01-11 18:44:33 UTC
*** Description of problem:

The generic edk2 SMM infrastructure prefers
EFI_SMM_CONTROL2_PROTOCOL.Trigger() to inject an SMI on each processor. If
Trigger() only brings the current processor into SMM, then edk2 handles it
in the following ways:

(1) If Trigger() is executed by the BSP (which is guaranteed before
    ExitBootServices(), but is not necessarily true at runtime), then:

    (a) If edk2 has been configured for "traditional" SMM synchronization,
        then the BSP sends directed SMIs to the APs with APIC delivery,
        bringing them into SMM individually. Then the BSP runs the SMI
        handler / dispatcher.

    (b) If edk2 has been configured for "relaxed" SMM synchronization, then
        the APs that are not already in SMM are not brought in, and the BSP
        runs the SMI handler / dispatcher.

(2) If Trigger() is executed by an AP (which is possible after
    ExitBootServices(), and can be forced e.g. by "taskset -c 1
    efibootmgr"), then the AP in question brings in the BSP with a directed
    SMI, and the BSP runs the SMI handler / dispatcher.

The smaller problem with (1a) and (2) is that the BSP and AP synchronization
is slow. For example, the "taskset -c 1 efibootmgr" command from (2) can
take more than 3 seconds to complete, because efibootmgr accesses
non-volatile UEFI variables intensively.

The larger problem is that QEMU's current behavior diverges from the
behavior usually seen on physical hardware, and that keeps exposing obscure
corner cases, race conditions and other instabilities in edk2, which
generally expects / prefers a software SMI to affect all CPUs at once.

Therefore introduce the "broadcast SMI" feature that causes QEMU to inject
the SMI on all VCPUs.
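
The slowdown described in (1a) and (2) can be observed from inside the guest by timing the same command pinned to each CPU. The following is a minimal sketch and not part of the original report: the helper name time_on_cpu is hypothetical, and it assumes a POSIX shell, GNU date (for %N) and taskset from util-linux.

```shell
# Hypothetical helper: run a command pinned to one CPU and print
# "cpu<N> <elapsed>ms". Assumes GNU date (%N) and util-linux taskset.
time_on_cpu() {
    cpu=$1
    shift
    start=$(date +%s%N)                      # nanoseconds since the epoch
    taskset -c "$cpu" "$@" >/dev/null 2>&1   # output is discarded, only time matters
    status=$?
    end=$(date +%s%N)
    echo "cpu$cpu $(( (end - start) / 1000000 ))ms"
    return "$status"
}
```

For example, `for cpu in 0 1; do time_on_cpu "$cpu" efibootmgr; done` prints one line per CPU. Without broadcast SMI, the run pinned to an AP is the one expected to stand out.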

*** Version-Release number of selected component (if applicable):

qemu-kvm-rhev-2.8.0-1.el7

*** How reproducible:
*** Steps to Reproduce:
*** Actual results:
*** Expected results:

Please refer to bug 1412313.

*** Additional info:
Versions of the patch set posted thus far:

v1:
http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.html

v2:
http://lists.nongnu.org/archive/html/qemu-devel/2016-11/msg02687.html

v3:
http://lists.nongnu.org/archive/html/qemu-devel/2016-11/msg03582.html

v4:
http://lists.nongnu.org/archive/html/qemu-devel/2016-12/msg00129.html

v5 wave 1:
http://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg01897.html

v5 wave 2:
http://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg01902.html

Comment 4 Laszlo Ersek 2017-01-25 12:45:57 UTC
v6 wave 1 commits:

baf2d5bfbac0 fw-cfg: support writeable blobs
e12f3a13e2e1 fw-cfg: turn FW_CFG_FILE_SLOTS into a device property
d580bd4b73f0 pc: Add 2.9 machine-types
a5b3ebfd23bc fw-cfg: bump "x-file-slots" to 0x20 for 2.9+ machine types

Comment 6 Laszlo Ersek 2017-01-30 14:21:33 UTC
v7 wave 2 commits:

50de920b372b hw/isa/lpc_ich9: add SMI feature negotiation via fw_cfg
5ce45c7a2b15 hw/isa/lpc_ich9: add broadcast SMI feature
b8bab8eb6934 hw/isa/lpc_ich9: negotiate SMI broadcast on pc-q35-2.9+ machine
             types
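
The negotiation added by these commits is driven by the firmware through fw_cfg files. As a rough illustration that is not taken from this report, the sketch below decodes bit 0 of a little-endian feature bitmap stored in a file. It assumes that bit 0 denotes the broadcast SMI feature and that the bitmap is exposed under a name such as etc/smi/requested-features; both match upstream QEMU as far as known, but treat them as assumptions here. The helper name smi_broadcast_bit is hypothetical.

```shell
# Hypothetical helper: report whether bit 0 (assumed to be the broadcast
# SMI feature) is set in a little-endian feature bitmap stored in a file.
smi_broadcast_bit() {
    file=$1
    # The first byte of a little-endian bitmap carries bits 0..7.
    first_byte=$(od -An -tu1 -N1 "$file" | tr -d ' \n')
    [ -n "$first_byte" ] || return 2         # empty or unreadable file
    if [ $(( first_byte & 1 )) -eq 1 ]; then
        echo "broadcast SMI: set"
    else
        echo "broadcast SMI: clear"
    fi
}
```

In a Linux guest whose kernel provides the qemu_fw_cfg sysfs driver, the file might be /sys/firmware/qemu_fw_cfg/by_name/etc/smi/requested-features/raw. Whether that driver is available in a given guest is an assumption to verify.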

Comment 7 jingzhao 2017-02-13 05:25:13 UTC
Hi Laszlo,

Could you share how to reproduce this BZ? Based on the comments, I ran the following steps; is that right?

Version:
      qemu-kvm-rhev-2.8.0-3.el7.x86_64

Steps to Reproduce:

1. Boot the guest with the QEMU command line [1].
2. In the guest, run the following commands:

[root@localhost ~]# taskset -c 0 efibootmgr
BootCurrent: 0001
Timeout: 0 seconds
BootOrder: 0001,0000,0002,0003
Boot0000* UiApp
Boot0001* Red Hat Enterprise Linux
Boot0002* UEFI Misc Device
Boot0003* UEFI PXEv4 (MAC:9A6A6B6C6D6E)

[root@localhost ~]# taskset -c 1 efibootmgr
BootCurrent: 0001
Timeout: 0 seconds
BootOrder: 0001,0000,0002,0003
Boot0000* UiApp
Boot0001* Red Hat Enterprise Linux
Boot0002* UEFI Misc Device
Boot0003* UEFI PXEv4 (MAC:9A6A6B6C6D6E)

Actual Result:
"taskset -c 1 efibootmgr" responded slowly.

[1]
/usr/libexec/qemu-kvm \
-M q35 \
-cpu SandyBridge \
-nodefaults -rtc base=utc \
-m 4G \
-smp 4,sockets=4,cores=1,threads=1 \
-enable-kvm \
-uuid 990ea161-6b67-47b2-b803-19fb01d30d12 \
-k en-us \
-nodefaults \
-drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,readonly=on \
-drive file=/home/OVMF_VARS.fd,if=pflash,format=raw,unit=1 \
-serial unix:/tmp/serial0,server,nowait \
-debugcon file:/home/ovmf.log \
-global isa-debugcon.iobase=0x402 \
-boot menu=on \
-qmp tcp:0:6666,server,nowait \
-vga qxl \
-spice port=5932,disable-ticketing \
-drive file=/home/test/ovmf.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0 \
-device virtio-net-pci,netdev=tap10,mac=9a:6a:6b:6c:6d:6e -netdev tap,id=tap10 \
-monitor stdio


Thanks
Jing Zhao

Comment 8 Laszlo Ersek 2017-02-13 17:46:04 UTC
(In reply to jingzhao from comment #7)

> Steps to Reproduce:
> 
> 1. Boot the guest with the QEMU command line [1].
> 2. In the guest, run the following commands:
> 
> [root@localhost ~]# taskset -c 0 efibootmgr
> BootCurrent: 0001
> Timeout: 0 seconds
> BootOrder: 0001,0000,0002,0003
> Boot0000* UiApp
> Boot0001* Red Hat Enterprise Linux
> Boot0002* UEFI Misc Device
> Boot0003* UEFI PXEv4 (MAC:9A6A6B6C6D6E)
> 
> [root@localhost ~]# taskset -c 1 efibootmgr
> BootCurrent: 0001
> Timeout: 0 seconds
> BootOrder: 0001,0000,0002,0003
> Boot0000* UiApp
> Boot0001* Red Hat Enterprise Linux
> Boot0002* UEFI Misc Device
> Boot0003* UEFI PXEv4 (MAC:9A6A6B6C6D6E)
> 
> Actual Result:
> "taskset -c 1 efibootmgr" responded slowly.

Yes, this is correct. It is how you can reproduce it most easily. (See "A.3." in bug 1412313 comment 0.)

However, your QEMU command line is not entirely correct:

> 
> [1]
> /usr/libexec/qemu-kvm \
> -M q35 \
> -cpu SandyBridge \
> -nodefaults -rtc base=utc \
> -m 4G \
> -smp 4,sockets=4,cores=1,threads=1 \
> -enable-kvm \

* Please drop the "-M q35" and "-enable-kvm" options, and add the following one instead:

  -machine q35,smm=on,accel=kvm

* Additionally, please append the following switch:

  -global driver=cfi.pflash01,property=secure,value=on

The rest looks okay.

Thanks!
Laszlo

> -uuid 990ea161-6b67-47b2-b803-19fb01d30d12 \
> -k en-us \
> -nodefaults \
> -drive
> file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,
> readonly=on \
> -drive file=/home/OVMF_VARS.fd,if=pflash,format=raw,unit=1 \
> -serial unix:/tmp/serial0,server,nowait \
> -debugcon file:/home/ovmf.log \
> -global isa-debugcon.iobase=0x402 \
> -boot menu=on \
> -qmp tcp:0:6666,server,nowait \
> -vga qxl \
> -spice port=5932,disable-ticketing \
> -drive
> file=/home/test/ovmf.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,
> cache=none,werror=stop,rerror=stop \
> -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0 \
> -device virtio-net-pci,netdev=tap10,mac=9a:6a:6b:6c:6d:6e -netdev
> tap,id=tap10 \
> -monitor stdio \
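
The two corrections requested in this comment can be checked mechanically before booting the guest. This is a minimal sketch, assuming the whole QEMU command line is available as a single string. The helper name check_smm_cmdline is hypothetical, and the matching is a plain substring test rather than a real option parser.

```shell
# Hypothetical helper: check a QEMU command line (one string) for the two
# options requested above. Prints one "missing:" line per absent option.
check_smm_cmdline() {
    cmdline=$1
    rc=0
    case $cmdline in
        *"-machine q35,"*"smm=on"*) ;;
        *) echo "missing: -machine q35,smm=on,accel=kvm"; rc=1 ;;
    esac
    case $cmdline in
        *"driver=cfi.pflash01,property=secure,value=on"*) ;;
        *) echo "missing: -global driver=cfi.pflash01,property=secure,value=on"; rc=1 ;;
    esac
    return "$rc"
}
```

The function returns 0 only when both options are present, so it can gate a launch script, for example `check_smm_cmdline "$QEMU_ARGS" && echo ok`, where QEMU_ARGS is an illustrative variable name.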

Comment 9 jingzhao 2017-04-24 06:06:31 UTC
Reproduced the issue on qemu-kvm-rhev-2.8.0-6.el7.x86_64.
Verified the fix on qemu-kvm-rhev-2.9.0-1.el7.x86_64.

Details follow.

1. Boot the guest with the QEMU command line [1].
2. In the guest, run the following commands:
[root@localhost ~]# taskset -c 0 efibootmgr 
BootCurrent: 0003
Timeout: 0 seconds
BootOrder: 0003,0000,0001,0002
Boot0000* UiApp
Boot0001* UEFI Misc Device
Boot0002* UEFI PXEv4 (MAC:9A6A6B6C6D6E)
Boot0003* Red Hat Enterprise Linux
[root@localhost ~]# taskset -c 1 efibootmgr 
BootCurrent: 0003
Timeout: 0 seconds
BootOrder: 0003,0000,0001,0002
Boot0000* UiApp
Boot0001* UEFI Misc Device
Boot0002* UEFI PXEv4 (MAC:9A6A6B6C6D6E)
Boot0003* Red Hat Enterprise Linux

3. "taskset -c 1 efibootmgr" responded quickly with qemu-kvm-rhev-2.9.0-1.el7.x86_64.

[1]
/usr/libexec/qemu-kvm \
-machine q35,smm=on,accel=kvm \
-cpu SandyBridge \
-nodefaults -rtc base=utc \
-m 4G \
-smp 4,sockets=4,cores=1,threads=1 \
-uuid 990ea161-6b67-47b2-b803-19fb01d30d12 \
-k en-us \
-nodefaults \
-drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,readonly=on \
-drive file=/home/test/rhel/OVMF_VARS.fd,if=pflash,format=raw,unit=1 \
-global driver=cfi.pflash01,property=secure,value=on \
-serial unix:/tmp/serial0,server,nowait \
-debugcon file:/home/ovmf.log \
-global isa-debugcon.iobase=0x402 \
-boot menu=on \
-qmp tcp:0:6666,server,nowait \
-vga qxl \
-vnc :0 \
-drive file=/home/test/rhel/ovmf-rhel7.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0 \
-device virtio-net-pci,netdev=tap10,mac=9a:6a:6b:6c:6d:6e -netdev tap,id=tap10 \
-monitor stdio

Thanks
Jing

Comment 11 errata-xmlrpc 2017-08-01 23:42:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
