*** Description of problem:

When writing to IO port 0xB2 (ICH9_APM_CNT), QEMU by default injects an SMI only on the VCPU that is writing the port. This has exposed corner cases and strange behavior with edk2 code, which generally expects a software SMI to affect all CPUs at once.

We've experienced instability despite the fact that OVMF sets PcdCpuSmmApSyncTimeout and PcdCpuSmmSyncMode differently from the UefiCpuPkg defaults, such that they match QEMU's unicast SMIs better. (Refer to edk2 commits 9b1e378811ff and bb0f18b0bce6.)

There are two known groups of symptoms.

The first is a performance issue: when the SMI is raised on an AP (that is, VCPU#1, VCPU#2, ..., just not VCPU#0), then the BSP and AP synchronization is slow; it may take several seconds until all VCPUs are pulled into SMM.

The second group of symptoms is general instability, which can manifest in KVM emulation failures, especially during ACPI S3 suspend-resume. It can be experienced more markedly in Ia32 guests (rather than Ia32X64 guests), and ranges from crashes / hangs to "lost APs" (that is, after resume, some of the originally present VCPUs don't exist / are not rebooted).

*** Version-Release number of selected component (if applicable):

ovmf-20160608b-1.git988715a.el7

*** How reproducible:

The first symptom is 100% reproducible. The second group of symptoms is harder to reproduce, in particular with Ia32X64 builds. (Note that RHEL7 ships only Ia32X64; it doesn't ship Ia32.)

*** Steps to Reproduce:

For the first symptom:

A.1. Boot an SMM-enabled Q35 guest with OVMF. Use 4 VCPUs. The guest OS should be RHEL7, or a recent Fedora release.

A.2. Open a root shell in the guest, and issue the following command:

  time taskset -c 0 efibootmgr

A.3. Issue the following command:

  time taskset -c 1 efibootmgr

For the second symptom (NOTE: we don't support this configuration, but for QE purposes it can be enabled):

B.1. Use the same virtual machine as in (A.1.), but also enable ACPI S3 suspend-resume, with the following domain XML snippet (see <http://libvirt.org/formatdomain.html#elementsPowerManagement>):

  <domain>
    <pm>
      <suspend-to-disk enabled='no'/>
      <suspend-to-mem enabled='yes'/>
    </pm>
  </domain>

Also, the video card should be QXL or standard VGA.

B.2. Open a root shell in the guest, and issue the following commands:

  X=0
  while :; do
    pm-suspend
    echo -n "iteration=$((X++)) #VCPUs="
    grep -c -i '^processor' /proc/cpuinfo
  done

B.3. Whenever the guest is suspended (keep an eye on virt-manager to see the guest's status), press Enter in the guest's window, to run another iteration of the loop.

*** Actual results:

For the first symptom: The (A.2.) step will complete almost immediately, but the (A.3.) step will take several seconds.

For the second group of symptoms: As the iteration counter increases, at some point
- the #VCPUs result will fall from 4 to 3 (or lower),
- or the S3 resume will fail completely. In this case, the status of the VM might switch from Running or Suspended to Paused in virt-manager (meaning a crash), and the QEMU stderr captured under /var/log/libvirt/qemu may report a KVM emulation failure or other guest crash. (It's not a QEMU process crash.)

Note that triggering the second group of symptoms might take hundreds of iterations.

*** Expected results:

For the first symptom: The (A.3.) step should complete as quickly as the (A.2.) step.

For the second group of symptoms: No VCPU should be "lost", and no guest crash should occur, during S3 resume.

*** Additional info:

- Upstream tracker: <https://bugzilla.tianocore.org/show_bug.cgi?id=230>

- Solving this in OVMF requires a new QEMU feature. The RHBZ for that feature will be filed later, and it will block this bug report.

- The OVMF solution requires several patches, some of which have already been committed to edk2. Given that they are scattered over a larger time range, and that Intel has implemented numerous SMM changes in the meantime, this RHBZ shall be resolved as part of an OVMF rebase.
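For QE purposes, the output of the loop in (B.2.) could be checked mechanically rather than by eye. The following is only a minimal sketch, assuming the loop output was captured as text in the "iteration=N #VCPUs=M" form that the echo and grep commands produce; the function name, the default of 4 VCPUs (taken from A.1.), and the sample string are illustrative, not part of any existing tool.

```python
import re

# Matches one line of loop output such as "iteration=17 #VCPUs=4".
LINE_RE = re.compile(r"iteration=(\d+)\s+#VCPUs=(\d+)")


def find_lost_vcpus(log_text, expected_vcpus=4):
    """Return (iteration, vcpu_count) pairs where fewer VCPUs than
    expected were reported after an S3 resume.

    expected_vcpus defaults to 4, the count used in step A.1.
    """
    losses = []
    for match in LINE_RE.finditer(log_text):
        iteration = int(match.group(1))
        count = int(match.group(2))
        if count < expected_vcpus:
            losses.append((iteration, count))
    return losses


if __name__ == "__main__":
    # Made-up sample, for illustration only (not captured from a real run).
    sample = (
        "iteration=0 #VCPUs=4\n"
        "iteration=1 #VCPUs=4\n"
        "iteration=2 #VCPUs=3\n"
    )
    for iteration, count in find_lost_vcpus(sample):
        print(f"iteration {iteration}: only {count} VCPUs online")
```

An empty result after several hundred iterations would be consistent with the expected results above; it does not cover the case where the resume fails completely, because then the loop stops producing output.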
Fixed in upstream commit range 7c609a144b66..a316d7ac91d3.
1. Reproduced this bug with OVMF-20160608b-1.git988715a.el7.noarch.

A.1. Boot an SMM-enabled Q35 guest with OVMF. Use 4 VCPUs. The guest OS should be RHEL7.

result: Booted the guest with -M q35,smm=on

A.2. Open a root shell in the guest, and issue the following command:

  time taskset -c 0 efibootmgr

A.3. Issue the following command:

  time taskset -c 1 efibootmgr

result:

# time taskset -c 0 efibootmgr
BootCurrent: 0004
Timeout: 0 seconds
BootOrder: 0001,0005,0004,0003,0000
Boot0000* UiApp
Boot0001* Red Hat Enterprise Linux
Boot0003* UEFI QEMU DVD-ROM QM00011
Boot0004* UEFI QEMU QEMU HARDDISK
Boot0005* UEFI PXEv4 (MAC:089E01C26D6E)

real 0m0.008s
user 0m0.001s
sys 0m0.008s

# time taskset -c 1 efibootmgr
BootCurrent: 0004
Timeout: 0 seconds
BootOrder: 0001,0005,0004,0003,0000
Boot0000* UiApp
Boot0001* Red Hat Enterprise Linux
Boot0003* UEFI QEMU DVD-ROM QM00011
Boot0004* UEFI QEMU QEMU HARDDISK
Boot0005* UEFI PXEv4 (MAC:089E01C26D6E)

real 0m4.816s
user 0m0.000s
sys 0m4.815s

B.1. Use the same virtual machine as in (A.1.), but also enable ACPI S3 suspend-resume, with the following domain XML snippet (see <http://libvirt.org/formatdomain.html#elementsPowerManagement>):

  <domain>
    <pm>
      <suspend-to-disk enabled='no'/>
      <suspend-to-mem enabled='yes'/>
    </pm>
  </domain>

Also, the video card should be QXL or standard VGA.

result: -vga qxl -global ICH9-LPC.disable_s3=0 -global ICH9-LPC.disable_s4=1

B.2. Open a root shell in the guest, and issue the following commands:

  X=0
  while :; do
    pm-suspend
    echo -n "iteration=$((X++)) #VCPUs="
    grep -c -i '^processor' /proc/cpuinfo
  done

B.3. Whenever the guest is suspended (keep an eye on virt-manager to see the guest's status), press Enter in the guest's window, to run another iteration of the loop.

result:

(qemu) KVM internal error. Suberror: 1
KVM internal error. Suberror: 1
emulation failure
emulation failure
RAX=0000000000000000 RBX=0000000000000000 RCX=000000007ffcd550 RDX=000000007ffcd550
RSI=000000000009e000 RDI=000000007fe797a8 RBP=0000000000000000 RSP=000000007e5cd000
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=000000000009e0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT= 000000007f704000 00000047
IDT= 000000007f704048 00000fff
CR0=e0000011 CR2=0000000000000000 CR3=000000007ff68000 CR4=00000220
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000500
Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
RAX=0000000000000002 RBX=0000000000000000 RCX=000000007ffcd550 RDX=000000007ffcd550
RSI=000000000009e000 RDI=000000007fe797d8 RBP=0000000000000000 RSP=000000007e5c5000
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=000000000009e0fd RFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
FS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
GS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT= 000000007f704000 00000047
IDT= 000000007f704048 00000fff
CR0=e0000011 CR2=0000000000000000 CR3=000000007ff68000 CR4=00000220
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000500
Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

2. Verified this bug with OVMF-20170228-4.gitc325e41585e3.el7.noarch.

# time taskset -c 0 efibootmgr
BootCurrent: 0004
Timeout: 0 seconds
BootOrder: 0003,0004,0001,0000
Boot0000* UiApp
Boot0001* UEFI QEMU DVD-ROM QM00011
Boot0003* UEFI PXEv4 (MAC:089E01C26D6E)
Boot0004* Red Hat Enterprise Linux

real 0m0.061s
user 0m0.001s
sys 0m0.012s

# time taskset -c 1 efibootmgr
BootCurrent: 0004
Timeout: 0 seconds
BootOrder: 0003,0004,0001,0000
Boot0000* UiApp
Boot0001* UEFI QEMU DVD-ROM QM00011
Boot0003* UEFI PXEv4 (MAC:089E01C26D6E)
Boot0004* Red Hat Enterprise Linux

real 0m0.008s
user 0m0.000s
sys 0m0.008s

For the B.2 scenario, the KVM internal error is gone, but the guest cannot resume from S3. I used two methods to test S3:

1) # pm-suspend (in the guest)
2) # echo mem > /sys/power/state

This differs from the expected result. Could you help confirm it? Thanks.

The QEMU command line is as follows:

/usr/libexec/qemu-kvm \
  -M q35,smm=on \
  -cpu Westmere \
  -nodefaults \
  -rtc base=utc \
  -m 2G \
  -smp 4,sockets=2,cores=2,threads=1 \
  -enable-kvm \
  -name rhel7.4 \
  -uuid 990ea161-6b67-47b2-b803-19fb01d30d12 \
  -global driver=cfi.pflash01,property=secure,value=on \
  -drive file=/usr/share/OVMF/UefiShell.iso,if=none,cache=none,snapshot=off,aio=native,media=cdrom,id=cdrom1 \
  -device ide-cd,drive=cdrom1,id=ide-cd1,bootindex=4 \
  -drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0 \
  -drive file=/home/ovmf/guest/OVMF_VARS.fd,if=pflash,format=raw,unit=1 \
  -k en-us \
  -debugcon file:/home/test/ovmf.log \
  -global isa-debugcon.iobase=0x402 \
  -serial unix:/tmp/console,server,nowait \
  -boot menu=on,splash-time=100 \
  -qmp tcp::4446,server,nowait \
  -drive file=/home/ovmf/guest/rhel7.4-ovmf.qcow2,if=none,id=drive0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=threads \
  -device virtio-scsi-pci,id=scsi1,disable-legacy=off,disable-modern=off \
  -device scsi-hd,id=virtio-disk0,drive=drive0,bus=scsi1.0,bootindex=3 \
  -vnc :1 \
  -monitor stdio \
  -device virtio-net-pci,netdev=tap10,mac=08:9e:01:c2:6d:6e,disable-legacy=off,disable-modern=off,bootindex=2 \
  -netdev tap,id=tap10 \
  -smbios type=1,manufacturer=redhat-kvmqe,product=rhel7.4-kvm,version=7.444444,serial=123456789,uuid=4C4C4544-0044-3010-8047-B4C04F313232,sku=fuxc,family=rhel7 \
  -vga qxl \
  -global ICH9-LPC.disable_s3=0 \
  -global ICH9-LPC.disable_s4=1
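The comparison between the (A.2.) and (A.3.) timings can also be automated. What follows is a rough sketch, assuming the output of the two "time taskset -c N efibootmgr" runs was captured as text in the bash "real XmY.YYYs" format seen above; the function names and the one-second threshold are assumptions made for illustration, not values taken from any test plan.

```python
import re

# Matches the "real" line printed by the bash "time" keyword, e.g. "real 0m4.816s".
REAL_RE = re.compile(r"real\s+(\d+)m([\d.]+)s")


def real_seconds(time_output):
    """Extract the wall-clock ("real") time, in seconds, from captured output."""
    match = REAL_RE.search(time_output)
    if match is None:
        raise ValueError("no 'real' line found in the captured output")
    minutes = int(match.group(1))
    seconds = float(match.group(2))
    return minutes * 60 + seconds


def ap_is_slow(bsp_output, ap_output, threshold_seconds=1.0):
    """Return True if the run pinned to an AP took noticeably longer than
    the run pinned to VCPU#0.

    threshold_seconds is an arbitrary illustrative cutoff.
    """
    return real_seconds(ap_output) - real_seconds(bsp_output) > threshold_seconds


if __name__ == "__main__":
    # "real" lines copied from the reproduction run with the old build.
    print(ap_is_slow("real 0m0.008s", "real 0m4.816s"))
    # "real" lines copied from the verification run with the new build.
    print(ap_is_slow("real 0m0.061s", "real 0m0.008s"))
```

With the figures from this comment, the first call reports the slowdown seen with the old build and the second reports none for the rebased build.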
The reproduction steps (A.1 through A.3, and B.1 through B.3) are correct.

The verification for test case "A" is also correct.

Regarding the S3 resume failure in the verification of test case "B": that is not necessarily a real problem. S3 resume requires a lot of guest OS cooperation, and the quality of that has been very uneven over time, which is why we don't officially support S3 in our virtual machines.

The most frequent cause of an apparent S3 resume failure is a video driver (or more generic video subsystem) issue in the guest OS. You didn't say what guest OS you used for testing; for example, with a Fedora guest, the S3 resume experience can vary from kernel update to kernel update.

So here's what I think:

- the important test case (A) has been verified; that is sufficient for setting this BZ to VERIFIED

- if you wish to spend more time on checking (B), using this same guest, I recommend the following workarounds:

  - Install a graphical (X11) environment in the guest. After you resume it from S3 sleep, cycle the virtual consoles between "text console" and "GUI" a few times, by sending Ctrl+Alt+F1 <-> Ctrl+Alt+F2 repeatedly. Sometimes this is enough to restore video to a working state.

  - Alternatively, try to ping, or ssh into, the VM, or else try to use its serial console, after resuming it. This eliminates video as a factor.

But, if S3 resume doesn't work even with the above workarounds, that's fine. You verified the important test case. It's up to you if you'd like to spend more time on the S3 case.

Thanks!
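The "ssh into the VM" check can be approximated without a terminal by testing whether the guest accepts TCP connections after resume. This is just a sketch under stated assumptions: the guest runs an SSH server on port 22 and is reachable from the host, and the address used in the example is a documentation placeholder, not the real guest.

```python
import socket


def guest_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    A successful connect only shows that the guest's network stack came
    back after resume; it says nothing about the state of video.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder address (TEST-NET-1); substitute the
    # guest's actual IP address.
    print(guest_reachable("192.0.2.10", timeout=1.0))
```

If this returns True while the guest window stays blank, the resume itself most likely worked and the remaining problem is on the video side.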
Thanks for the detailed explanation, Laszlo. I will set this bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:2056