Bug 1412313 - select broadcast SMI if available
Summary: select broadcast SMI if available
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: ovmf
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: rc
Target Release: 7.4
Assignee: Laszlo Ersek
QA Contact: FuXiangChun
URL:
Whiteboard:
Depends On: 1412327, 1416919
Blocks:
 
Reported: 2017-01-11 18:10 UTC by Laszlo Ersek
Modified: 2017-08-01 22:22 UTC
CC List: 10 users

Fixed In Version: ovmf-20170228-1.gitc325e41585e3.el7
Doc Type: Rebase: Bug Fixes and Enhancements
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-01 22:22:15 UTC
Target Upstream Version:


Attachments: none


Links
Red Hat Product Errata RHEA-2017:2056 (SHIPPED_LIVE) -- new packages: OVMF -- last updated 2017-08-01 19:34:11 UTC
TianoCore 230 -- last updated 2019-06-27 19:41:47 UTC

Description Laszlo Ersek 2017-01-11 18:10:27 UTC
*** Description of problem:

When writing to IO port 0xB2 (ICH9_APM_CNT), QEMU by default injects an SMI
only on the VCPU that is writing the port. This has exposed corner cases and
strange behavior with edk2 code, which generally expects a software SMI to
affect all CPUs at once. We've experienced instability despite the fact that
OVMF sets PcdCpuSmmApSyncTimeout and PcdCpuSmmSyncMode differently from the
UefiCpuPkg defaults, such that they match QEMU's unicast SMIs better. (Refer
to edk2 commits 9b1e378811ff and bb0f18b0bce6.)
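
Whether a given QEMU build already offers the broadcast SMI feature can be
checked on the host; this is a sketch only, assuming the upstream property
name "x-smi-broadcast" on the ICH9-LPC device (QEMU 2.9 and later) and the
RHEL binary path:

     /usr/libexec/qemu-kvm -device ICH9-LPC,help 2>&1 | grep -i smi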

There are two known groups of symptoms.

The first is a performance issue: when the SMI is raised on an AP (that is,
VCPU#1, VCPU#2, ..., just not VCPU#0), then the BSP and AP
synchronization is slow; it may take several seconds until all VCPUs are
pulled into SMM.

The second group of symptoms is general instability, which can manifest in
KVM emulation failures, especially during ACPI S3 suspend-resume. It can be
experienced more markedly in Ia32 guests (rather than Ia32X64 guests), and
ranges from crashes / hangs to "lost APs" (that is, after resume, some of
the originally present VCPUs don't exist / are not rebooted).

*** Version-Release number of selected component (if applicable):

ovmf-20160608b-1.git988715a.el7

*** How reproducible:

The first symptom is 100% reproducible.

The second group of symptoms is harder to reproduce, in particular with
Ia32X64 builds. (Note that RHEL7 ships only Ia32X64; it doesn't ship Ia32.)

*** Steps to Reproduce:

For the first symptom:

A.1. Boot an SMM-enabled Q35 guest with OVMF. Use 4 VCPUs. The guest OS
     should be RHEL7, or a recent Fedora release.

A.2. Open a root shell in the guest, and issue the following command:

     time taskset -c 0 efibootmgr

A.3. Issue the following command:

     time taskset -c 1 efibootmgr
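
     Optionally, the same measurement can be looped over every VCPU. This is
     a sketch only; it relies on efibootmgr accessing UEFI variables, which
     enters SMM on SMM-enabled OVMF builds:

     for CPU in 0 1 2 3; do
       echo "VCPU $CPU:"
       time taskset -c "$CPU" efibootmgr > /dev/null
     done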

For the second symptom (NOTE: we don't support this configuration, but for
QE purposes it can be enabled):

B.1. Use the same virtual machine as in (A.1.), but also enable ACPI S3
     suspend-resume, with the following domain XML snippet (see
     <http://libvirt.org/formatdomain.html#elementsPowerManagement>):

     <domain>
       <pm>
         <suspend-to-disk enabled='no'/>
         <suspend-to-mem enabled='yes'/>
       </pm>
     </domain>

     Also, the video card should be QXL or standard VGA.

B.2. Open a root shell in the guest, and issue the following commands:

     X=0
     while :; do
       pm-suspend
       echo -n "iteration=$((X++)) #VCPUs="
       grep -c -i '^processor'  /proc/cpuinfo
     done

B.3. Whenever the guest is suspended (keep an eye on virt-manager to see the
     guest's status), press Enter in the guest's window, to run another
     iteration of the loop.
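
     As an alternative to pressing Enter in the guest's window, the guest
     can be woken up from the host. This is a sketch only; it assumes a
     libvirt-managed domain, named "rhel7.4" here:

     virsh dompmwakeup rhel7.4

     Without libvirt, issuing "system_wakeup" on the QEMU monitor has the
     same effect.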

*** Actual results:

For the first symptom:

The (A.2.) step will complete almost immediately, but the (A.3.) step will
take several seconds.

For the second group of symptoms:

As the iteration counter increases, at some point

- the #VCPUs result will fall from 4 to 3 (or lower),

- or the S3 resume will fail completely. In this case, the status of the VM
  might switch from Running or Suspended to Paused in virt-manager (meaning
  a crash), and the QEMU stderr captured under /var/log/libvirt/qemu may
  report a KVM emulation failure or other guest crash. (It's not a QEMU
  process crash.)

Note that triggering the second group of symptoms might take hundreds of
iterations.

*** Expected results:

For the first symptom:

The (A.3.) step should complete as quickly as the (A.2.) step.

For the second group of symptoms:

No VCPU should be "lost", and no guest crash should occur, during S3 resume.

*** Additional info:

- Upstream tracker: <https://bugzilla.tianocore.org/show_bug.cgi?id=230>

- Solving this in OVMF requires a new QEMU feature. The RHBZ for that
  feature will be filed later, and it will block this bug report.

- The OVMF solution requires several patches, some of which have already
  been committed to edk2. Given that they are scattered over a large time
  range, and that Intel implemented numerous SMM changes in the meantime,
  this RHBZ shall be resolved as part of an OVMF rebase.
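
- As a rough sketch only: once the feature exists, the negotiated SMI
  feature bitmaps could be inspected from inside a Linux guest, assuming
  the qemu_fw_cfg kernel module and the upstream fw_cfg item names
  (etc/smi/supported-features, etc/smi/requested-features,
  etc/smi/features-ok):

      modprobe qemu_fw_cfg
      for F in supported-features requested-features features-ok; do
        P=/sys/firmware/qemu_fw_cfg/by_name/etc/smi/$F/raw
        test -e "$P" && printf '%s: ' "$F" && od -An -tx1 "$P"
      done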

Comment 1 Laszlo Ersek 2017-02-07 12:22:42 UTC
Fixed in upstream commit range 7c609a144b66..a316d7ac91d3.
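
For reference, the individual commits in that range can be listed from an
edk2 checkout, for example:

  git log --oneline 7c609a144b66..a316d7ac91d3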

Comment 5 FuXiangChun 2017-05-03 06:38:31 UTC
1. Reproduced this bug with OVMF-20160608b-1.git988715a.el7.noarch.

A.1. Boot an SMM-enabled Q35 guest with OVMF. Use 4 VCPUs. The guest OS
     should be RHEL7

result:
Boot guest with -M q35,smm=on

A.2. Open a root shell in the guest, and issue the following command:

     time taskset -c 0 efibootmgr

A.3. Issue the following command:

     time taskset -c 1 efibootmgr

result:

# time taskset -c 0 efibootmgr
BootCurrent: 0004
Timeout: 0 seconds
BootOrder: 0001,0005,0004,0003,0000
Boot0000* UiApp
Boot0001* Red Hat Enterprise Linux
Boot0003* UEFI QEMU DVD-ROM QM00011
Boot0004* UEFI QEMU QEMU HARDDISK
Boot0005* UEFI PXEv4 (MAC:089E01C26D6E)

real    0m0.008s
user    0m0.001s
sys     0m0.008s

# time taskset -c 1 efibootmgr
BootCurrent: 0004
Timeout: 0 seconds
BootOrder: 0001,0005,0004,0003,0000
Boot0000* UiApp
Boot0001* Red Hat Enterprise Linux
Boot0003* UEFI QEMU DVD-ROM QM00011
Boot0004* UEFI QEMU QEMU HARDDISK
Boot0005* UEFI PXEv4 (MAC:089E01C26D6E)

real    0m4.816s
user    0m0.000s
sys     0m4.815s


B.1. Use the same virtual machine as in (A.1.), but also enable ACPI S3
     suspend-resume, with the following domain XML snippet (see
     <http://libvirt.org/formatdomain.html#elementsPowerManagement>):

     <domain>
       <pm>
         <suspend-to-disk enabled='no'/>
         <suspend-to-mem enabled='yes'/>
       </pm>
     </domain>

     Also, the video card should be QXL or standard VGA.

result:
-vga qxl -global ICH9-LPC.disable_s3=0 -global ICH9-LPC.disable_s4=1


B.2. Open a root shell in the guest, and issue the following commands:

     X=0
     while :; do
       pm-suspend
       echo -n "iteration=$((X++)) #VCPUs="
       grep -c -i '^processor'  /proc/cpuinfo
     done

B.3. Whenever the guest is suspended (keep an eye on virt-manager to see the
     guest's status), press Enter in the guest's window, to run another
     iteration of the loop.

(qemu) KVM internal error. Suberror: 1
KVM internal error. Suberror: 1
emulation failure
emulation failure
RAX=0000000000000000 RBX=0000000000000000 RCX=000000007ffcd550 RDX=000000007ffcd550
RSI=000000000009e000 RDI=000000007fe797a8 RBP=0000000000000000 RSP=000000007e5cd000
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=000000000009e0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT=     000000007f704000 00000047
IDT=     000000007f704048 00000fff
CR0=e0000011 CR2=0000000000000000 CR3=000000007ff68000 CR4=00000220
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000500
Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffcd550 RDX=000000007ffcd550
RSI=000000000009e000 RDI=000000007fe797d8 RBP=0000000000000000 RSP=000000007e5c5000
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=000000000009e0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT=     000000007f704000 00000047
IDT=     000000007f704048 00000fff
CR0=e0000011 CR2=0000000000000000 CR3=000000007ff68000 CR4=00000220
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000500
Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

2. Verified this bug with OVMF-20170228-4.gitc325e41585e3.el7.noarch

# time taskset -c 0 efibootmgr
BootCurrent: 0004
Timeout: 0 seconds
BootOrder: 0003,0004,0001,0000
Boot0000* UiApp
Boot0001* UEFI QEMU DVD-ROM QM00011 
Boot0003* UEFI PXEv4 (MAC:089E01C26D6E)
Boot0004* Red Hat Enterprise Linux

real	0m0.061s
user	0m0.001s
sys	0m0.012s

# time taskset -c 1 efibootmgr
BootCurrent: 0004
Timeout: 0 seconds
BootOrder: 0003,0004,0001,0000
Boot0000* UiApp
Boot0001* UEFI QEMU DVD-ROM QM00011 
Boot0003* UEFI PXEv4 (MAC:089E01C26D6E)
Boot0004* Red Hat Enterprise Linux

real	0m0.008s
user	0m0.000s
sys	0m0.008s

For the B.2 scenario, the KVM internal error is gone, but the guest cannot resume from S3. I used two methods to test S3:

1) # pm-suspend (in the guest)

2) # echo mem > /sys/power/state

This differs from the expected result. Could you help confirm it? Thanks.


The QEMU command line is as follows:

/usr/libexec/qemu-kvm \
  -M q35,smm=on -cpu Westmere -nodefaults -rtc base=utc \
  -m 2G -smp 4,sockets=2,cores=2,threads=1 -enable-kvm \
  -name rhel7.4 -uuid 990ea161-6b67-47b2-b803-19fb01d30d12 \
  -global driver=cfi.pflash01,property=secure,value=on \
  -drive file=/usr/share/OVMF/UefiShell.iso,if=none,cache=none,snapshot=off,aio=native,media=cdrom,id=cdrom1 \
  -device ide-cd,drive=cdrom1,id=ide-cd1,bootindex=4 \
  -drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0 \
  -drive file=/home/ovmf/guest/OVMF_VARS.fd,if=pflash,format=raw,unit=1 \
  -k en-us \
  -debugcon file:/home/test/ovmf.log -global isa-debugcon.iobase=0x402 \
  -serial unix:/tmp/console,server,nowait \
  -boot menu=on,splash-time=100 \
  -qmp tcp::4446,server,nowait \
  -drive file=/home/ovmf/guest/rhel7.4-ovmf.qcow2,if=none,id=drive0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=threads \
  -device virtio-scsi-pci,id=scsi1,disable-legacy=off,disable-modern=off \
  -device scsi-hd,id=virtio-disk0,drive=drive0,bus=scsi1.0,bootindex=3 \
  -vnc :1 -monitor stdio \
  -device virtio-net-pci,netdev=tap10,mac=08:9e:01:c2:6d:6e,disable-legacy=off,disable-modern=off,bootindex=2 \
  -netdev tap,id=tap10 \
  -smbios type=1,manufacturer=redhat-kvmqe,product=rhel7.4-kvm,version=7.444444,serial=123456789,uuid=4C4C4544-0044-3010-8047-B4C04F313232,sku=fuxc,family=rhel7 \
  -vga qxl -global ICH9-LPC.disable_s3=0 -global ICH9-LPC.disable_s4=1

Comment 6 Laszlo Ersek 2017-05-03 12:33:18 UTC
The reproduction steps (A.1 through A.3, and B.1 through B.3) are correct.

The verification for test case "A" is also correct.

Regarding the S3 resume failure in the verification of test case "B": that is not a definitive problem. S3 resume requires a lot of guest OS cooperation, and the quality of that cooperation has been very uneven over time. This is why we don't officially support S3 in our virtual machines.

The most frequent problem with apparent S3 resume failure is a video driver (or more generic video subsystem) issue in the guest OS. You didn't say what guest OS you used for testing -- for example, with a Fedora guest, the S3 resume experience can vary from kernel update to kernel update. So here's what I think:

- the important test case (A) has been verified; that is sufficient for setting this BZ to VERIFIED

- if you wish to spend more time on checking (B), using this same guest, I recommend the following workarounds:

  - Install a graphical (X11) environment in the guest. And, after you resume
    it from S3 sleep, cycle the virtual consoles between "text console" and
    "GUI" a few times, by sending Ctrl+Alt+F1 <-> Ctrl+Alt+F2 repeatedly.
    Sometimes this is enough to restore video to a working state.

  - Alternatively, try to ping, or ssh into, the VM, or else try to use its
    serial console, after resuming it. This eliminates video from the test;
    a minimal check is sketched right after this list.
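
    A minimal sketch of such a video-independent check, assuming the guest's
    address (GUEST_IP below is a placeholder) is reachable and root ssh
    access is configured:

      ping -c 3 GUEST_IP
      ssh root@GUEST_IP 'grep -c -i "^processor" /proc/cpuinfo'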

But, if S3 resume doesn't work even with the above workarounds, that's fine. You verified the important test case. It's up to you if you'd like to spend more time on the S3 case. Thanks!

Comment 7 FuXiangChun 2017-05-03 13:53:44 UTC
Thanks for the detailed explanation, Laszlo. I will set this bug to VERIFIED.

Comment 8 errata-xmlrpc 2017-08-01 22:22:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2056

