Bug 1447027
Summary: | Guest cannot boot with 240 or above vcpus when using ovmf
---|---
Product: | Red Hat Enterprise Linux 7
Reporter: | Guo, Zhiyi <zhguo>
Component: | ovmf
Assignee: | Laszlo Ersek <lersek>
Status: | CLOSED ERRATA
QA Contact: | FuXiangChun <xfu>
Severity: | high
Priority: | unspecified
Version: | 7.4
CC: | areis, chayang, juzhang, kraxel, lersek, michen, mrezanin, pbonzini, rkrcmar, Robert.Hu, xfu, zhguo
Target Milestone: | rc
Keywords: | TestOnly
Target Release: | ---
Hardware: | x86_64
OS: | All
Fixed In Version: | ovmf-20171011-1.git92d07e48907f.el7
Doc Type: | If docs needed, set a value
Story Points: | ---
Last Closed: | 2018-04-10 16:28:00 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
Category: | ---
oVirt Team: | ---
Cloudforms Team: | ---
Bug Depends On: | 1469787
Bug Blocks: | 1468526

Description
Guo, Zhiyi
2017-05-01 11:07:08 UTC
Created attachment 1275384 [details]
ovmf logs
guest can boot with 200 vcpus -- good.log
guest cannot boot with 240 vcpus -- bad.log
This is an out-of-SMRAM condition. The Q35 board provides 1MB, 2MB or 8MB of SMRAM in TSEG (configurably); OVMF already picks 8MB.

In bug 1341733, we encountered a stack overflow in SMM, while running EnrollDefaultKeys.efi. That issue was addressed with the upstream commits

    509f8425b75d UefiCpuPkg: change PcdCpuSmmStackGuard default to TRUE
    0d0c245dfb14 OvmfPkg: set SMM stack size to 16KB

    commit 0d0c245dfb147956cb597582bc481579ceb612c0
    Author: Laszlo Ersek <lersek>
    Date: Wed Jun 1 19:59:52 2016 +0200

        OvmfPkg: set SMM stack size to 16KB

        The default stack size (from UefiCpuPkg/UefiCpuPkg.dec) is 8KB,
        which proved too small (i.e., led to stack overflow) across commit
        range 98c2d9610506^..f85d3ce2efc2^, during certificate enrollment
        into "db".

        As the edk2 codebase progresses and OVMF keeps including features,
        the stack demand constantly fluctuates; double the SMM stack size
        for good measure.

Raising the SMM stack size from 8KB to 16KB incurs an extra 8KB SMRAM demand, *per VCPU*. Around 240 VCPUs, this is almost 2MB of SMRAM, from the 8MB total.

The one thing we can try here is an SMM stack size of 12KB, which should recover about 1MB of SMRAM, for a VCPU count of ~240. If that helps here, then bug 1341733 has to be re-verified too.

If lowering the SMM stack size to 12KB does not help, then something way more intrusive will be necessary. (I don't know what.)

Lowering the SMM stack size to 12KB did not help, unfortunately.

I asked on edk2-devel whether we could do anything about this in the short term, by tweaking various build knobs:

https://lists.01.org/pipermail/edk2-devel/2017-May/010371.html

Otherwise, we might have to look into increasing the SMRAM size on Q35. (Later.)

Based on the upstream discussion (comment 10), a larger, firmware-detectable TSEG size appears the preferred solution. IMO, that should be possible for upstream QEMU 2.10.
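The per-VCPU stack arithmetic from the comments above can be sketched in a few lines of Python. This is only an illustration: the helper name `smm_stack_demand` is made up here, and the calculation counts the SMM stacks alone, ignoring every other SMRAM consumer (save state areas, guard pages, SMM drivers), so it does not predict the exact VCPU count at which boot fails.

```python
KIB = 1024
MIB = 1024 * KIB


def smm_stack_demand(vcpus: int, stack_kib: int) -> int:
    """Bytes of SMRAM taken by the per-VCPU SMM stacks alone (hypothetical helper)."""
    return vcpus * stack_kib * KIB


VCPUS = 240          # VCPU count from the bug report
TSEG = 8 * MIB       # largest non-extended TSEG size on Q35

for stack_kib in (8, 12, 16):
    demand = smm_stack_demand(VCPUS, stack_kib)
    print(f"{stack_kib:2d}KB stacks: {demand / MIB:.2f}MB of {TSEG // MIB}MB TSEG")

# Extra demand caused by going from 8KB to 16KB stacks ("almost 2MB").
extra_16_vs_8 = smm_stack_demand(VCPUS, 16) - smm_stack_demand(VCPUS, 8)
# Amount recovered by dropping from 16KB to 12KB stacks ("about 1MB").
saved_16_to_12 = smm_stack_demand(VCPUS, 16) - smm_stack_demand(VCPUS, 12)
print(f"extra for 16KB vs 8KB:  {extra_16_vs_8 / MIB:.4f}MB")
print(f"saved by 16KB -> 12KB: {saved_16_to_12 / MIB:.4f}MB")
```

With 240 VCPUs, the 16KB stacks alone take 3.75MB of the 8MB TSEG, which is consistent with the 12KB experiment recovering too little to matter.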
Posted QEMU RFC patch:

[RFC] q35/mch: implement extended TSEG sizes
http://mid.mail-archive.com/20170530212614.18343-1-lersek@redhat.com

Posted QEMU patch (identical to the RFC in comment 14):

[PATCH] q35/mch: implement extended TSEG sizes
http://mid.mail-archive.com/20170608161013.17920-1-lersek@redhat.com

Posted upstream edk2 series:

[PATCH 0/5] OvmfPkg: recognize an extended TSEG when QEMU offers it
https://lists.01.org/pipermail/edk2-devel/2017-June/011452.html
http://mid.mail-archive.com/20170608171333.17937-1-lersek@redhat.com

Posted qtest patches for upstream QEMU (on top of the patch linked in comment 15):

[PATCH 0/2] tests/q35-test: add TSEG size checks
http://mid.mail-archive.com/20170616112201.24512-1-lersek@redhat.com

QEMU commits:

1 2f295167e0c4 q35/mch: implement extended TSEG sizes
2 8bbf4aa96efb tests/q35-test: push down qtest_start / qtest_end to test case(s)
3 e691ef69911a tests/q35-test: add TSEG size checks

Posted v2 upstream edk2 series:

[PATCH v2 0/8] OvmfPkg: recognize an extended TSEG when QEMU offers it
https://lists.01.org/pipermail/edk2-devel/2017-July/012122.html
http://mid.mail-archive.com/20170704165629.13610-1-lersek@redhat.com

edk2 commits: 253d81c71f67..d04b72c67097

1 5b31f660c92c OvmfPkg: widen PcdQ35TsegMbytes to UINT16
2 23bfb5c0aab6 OvmfPkg/PlatformPei: prepare for PcdQ35TsegMbytes becoming dynamic
3 1372f8d347ab OvmfPkg/SmmAccess: prepare for PcdQ35TsegMbytes becoming dynamic
4 966dbaf40075 OvmfPkg: make PcdQ35TsegMbytes dynamic
5 031e4ce26287 OvmfPkg/IndustryStandard/Q35MchIch9.h: add extended TSEG size macros
6 6812bb7bb5a6 OvmfPkg/SmmAccess: support extended TSEG size
7 d5e064447f8c OvmfPkg/PlatformPei: honor extended TSEG in PcdQ35TsegMbytes if available
8 d04b72c67097 OvmfPkg: mention the extended TSEG near the PcdQ35TsegMbytes declaration

Successful test with 272 VCPUs.
Host: see comment 22
Host kernel: 3.10.0-691.el7.x86_64
libvirt: 3.2.0-14.el7.x86_64
QEMU: upstream v2.9.0-1829-gb113658
OVMF: upstream edk2 built at commit 60e85a39fe49
Domain XML: see attached (ovmf.rhel7.q35.xml.xz)
OVMF boot log: see attached (ovmf.rhel7.q35.boot.log.xz)
Guest OS: "Server with GUI" installed from "RHEL-7.4-20170706.n.0-Server-x86_64-dvd1.iso"
Guest dmesg: see attached (guest-dmesg.txt.xz)
OVMF S3 resume log: see attached (ovmf.rhel7.q35.s3.log.xz)

Relevant OVMF boot log entries (visually compressed here a bit):

> Q35TsegMbytesInitialization: QEMU offers an extended TSEG (16 MB)
> ...
> SmmAccessPeiEntryPoint: SMRAM map follows, 2 entries
> PhysicalStart(0x) PhysicalSize(0x) CpuStart(0x) RegionState(0x)
> 7F000000 1000 7F000000 1A
> 7F001000 FFF000 7F001000 A

Tests performed: see <https://da.gd/Wt1K>:

* 'Confirm "simple" multiprocessing during boot'

Note: requires enabling nested virt; and msr-tools is available from EPEL7. I'm unsure we support nested virt on RHEL-7.4 hosts.

> [root@ovmf-rhel7-q35 ~]# rdmsr -a 0x3a
> ... 272 lines of "5" printed ...

* 'UEFI variable access test'

> [root@ovmf-rhel7-q35 ~]# time taskset -c 0 efibootmgr
> BootCurrent: 0004
> Timeout: 0 seconds
> BootOrder: 0001,0004,0002,0000,0006
> Boot0000* UiApp
> Boot0001* UEFI QEMU QEMU CD-ROM
> Boot0002* UEFI QEMU QEMU CD-ROM 2
> Boot0004* Red Hat Enterprise Linux
> Boot0006* EFI Internal Shell
>
> real 0m1.466s
> user 0m0.085s
> sys 0m1.381s

> [root@ovmf-rhel7-q35 ~]# time taskset -c 1 efibootmgr
> BootCurrent: 0004
> Timeout: 0 seconds
> BootOrder: 0001,0004,0002,0000,0006
> Boot0000* UiApp
> Boot0001* UEFI QEMU QEMU CD-ROM
> Boot0002* UEFI QEMU QEMU CD-ROM 2
> Boot0004* Red Hat Enterprise Linux
> Boot0006* EFI Internal Shell
>
> real 0m1.559s
> user 0m0.077s
> sys 0m1.480s

* 'ACPI S3 suspend/resume loop'

Note: because I was using upstream QEMU and edk2, I enabled S3 in the domain XML, and I tested it too.
(After resume, the screen can be a bit messy; send Ctrl+Alt+F1 and then Ctrl+Alt+F2 to switch back to the character terminal where the script below is run.) Remember that this is not supported on RHEL7.

> [root@ovmf-rhel7-q35 ~]# cat > sleep-cycle && chmod +x sleep-cycle
>
> X=0
> while read -p "about to suspend"; do
>   systemctl suspend
>   echo -n "iteration=$((X++)) #VCPUs="
>   grep -c -i '^processor' /proc/cpuinfo
> done
> ^D

> [root@ovmf-rhel7-q35 ~]# ./sleep-cycle
> about to suspend
> iteration=0 #VCPUs=272
> about to suspend
> iteration=1 #VCPUs=272
> about to suspend
> iteration=2 #VCPUs=272
> about to suspend
> iteration=3 #VCPUs=272
> about to suspend
> iteration=4 #VCPUs=272
> about to suspend^C

Then repeat 'Confirm "simple" multiprocessing during boot' and 'UEFI variable access test'.

Created attachment 1295496 [details]
Domain XML for comment 24

Created attachment 1295497 [details]
OVMF boot log from comment 24

Created attachment 1295498 [details]
Guest dmesg from comment 24

Created attachment 1295499 [details]
OVMF S3 resume log from comment 24

Created attachment 1295500 [details]
guest kernel S3 messages on ttyS0
guest kernel cmdline parameters were:
ignore_loglevel no_console_suspend console=ttyS0 console=tty
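
As a cross-check of the SMRAM map quoted from the OVMF boot log in comment 24, the two regions should be back to back and add up to the 16 MB extended TSEG that QEMU offered. A minimal Python sketch, with the start and size values copied from that log and with helper names that are made up for this illustration:

```python
MIB = 1024 * 1024

# (PhysicalStart, PhysicalSize) pairs copied from the quoted SMRAM map.
SMRAM_MAP = [
    (0x7F000000, 0x1000),
    (0x7F001000, 0xFFF000),
]


def total_size(regions):
    """Sum of the region sizes, in bytes."""
    return sum(size for _start, size in regions)


def is_contiguous(regions):
    """True if every region starts exactly where the previous one ends."""
    return all(
        start + size == next_start
        for (start, size), (next_start, _next_size) in zip(regions, regions[1:])
    )


print(f"total SMRAM: {total_size(SMRAM_MAP) // MIB} MB")
print(f"contiguous:  {is_contiguous(SMRAM_MAP)}")
```

The first 4KB region plus the 0xFFF000-byte region cover exactly 0x1000000 bytes, that is, the full 16 MB.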
QE re-tested this bug with this version.

# rpm -qa|grep qemu
qemu-kvm-rhev-2.10.0-10.el7.x86_64
# rpm -qa|grep OVMF
OVMF-20171011-3.git92d07e48907f.el7.noarch

smp 255 --> works.
smp 256 --> fail

/usr/libexec/qemu-kvm -enable-kvm -M q35 -nodefaults -smp 256 -m 4096 -name vm1 \
  -drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,readonly=on \
  -drive file=/usr/share/OVMF/OVMF_VARS.fd,if=pflash,format=raw,unit=1 \
  -debugcon file:/home/test/ovmf.log \
  -device intel-iommu,intremap=on,eim=on

qemu-kvm: -device intel-iommu,intremap=on,eim=on: Intel Interrupt Remapping cannot work with kernel-irqchip=on, please use 'split|off'.

Is this a new issue?

(CC Radim)

Hello FuXiangChun,

(In reply to FuXiangChun from comment #31)
> QE re-tested this bug with this version.
>
> # rpm -qa|grep qemu
> qemu-kvm-rhev-2.10.0-10.el7.x86_64
> # rpm -qa|grep OVMF
> OVMF-20171011-3.git92d07e48907f.el7.noarch
>
> smp 255-->works.
> smp 256 -->fail
>
> /usr/libexec/qemu-kvm -enable-kvm -M q35 -nodefaults -smp 256 -m 4096 -name
> vm1 -drive
> file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,
> readonly=on -drive
> file=/usr/share/OVMF/OVMF_VARS.fd,if=pflash,format=raw,unit=1 -debugcon
> file:/home/test/ovmf.log -device intel-iommu,intremap=on,eim=on
>
> qemu-kvm: -device intel-iommu,intremap=on,eim=on: Intel Interrupt Remapping
> cannot work with kernel-irqchip=on, please use 'split|off'.
>
> Is this a new issue?

Yes, this is known and expected. References:

(1) https://libvirt.org/formatdomain.html#elementsIommu

"The eim attribute (with possible values on and off) can be used to configure Extended Interrupt Mode. A q35 domain with split I/O APIC (as described in hypervisor features), and both interrupt remapping and EIM turned on for the IOMMU, will be able to use more than 255 vCPUs."

So, there are three requirements, and your command line only satisfies two.
(2) https://bugzilla.redhat.com/show_bug.cgi?id=1289151#c6

"Please note that you should need '-machine kernel_irqchip=split' and '-device intel-iommu,intremap=on,eim=on' parameters for qemu-kvm in order to run > 255."

Ehh, I meant, "this is known and expected, and therefore NOT a new issue". Sorry about the confusion.

Thanks for Laszlo's explanation. Following comment 32, I booted a RHEL7.5 guest with '-smp 384'. The guest works well, and all vcpus can be found inside the guest. This is the key qemu-kvm command line:

/usr/libexec/qemu-kvm -enable-kvm -M q35 -nodefaults \
  -smp 384,cores=4,threads=4,sockets=24 -m 8192 -name vm1 \
  -drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,readonly=on \
  -drive file=/usr/share/OVMF/OVMF_VARS.fd,if=pflash,format=raw,unit=1 \
  -debugcon file:/home/test/ovmf.log \
  -drive file=/usr/share/OVMF/UefiShell.iso,if=none,cache=none,snapshot=off,aio=native,media=cdrom,id=cdrom1 \
  -device ahci,id=ahci0 \
  -device ide-cd,drive=cdrom1,id=ide-cd1,bus=ahci0.1 \
  -global isa-debugcon.iobase=0x402 \
  -drive file=/home/rhel7.5-secureboot.qcow2,if=none,id=guest-img,format=qcow2,werror=stop,rerror=stop \
  -device ide-hd,drive=guest-img,bus=ide.0,unit=0,id=os-disk,bootindex=1 \
  -spice port=5931,disable-ticketing -vga qxl \
  -monitor stdio -qmp tcp:0:6666,server,nowait \
  -boot menu=on,reboot-timeout=8,strict=on \
  -machine kernel_irqchip=split \
  -device intel-iommu,intremap=on,eim=on

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0902
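
For anyone re-running the >255 VCPU test, the three requirements from comment 32 (split irqchip, interrupt remapping, EIM) can be checked against a command line before launching. The Python sketch below is an assumption-laden illustration: the function name `check_large_smp_requirements` is invented here, and it only does naive string matching on `-M`/`-machine` and `-device intel-iommu` options; it is not how QEMU itself parses or validates them.

```python
import shlex


def check_large_smp_requirements(cmdline: str) -> dict:
    """Report which of the three >255 VCPU requirements a command line meets.

    Hypothetical helper: plain string matching, not QEMU's option parser.
    """
    args = shlex.split(cmdline)
    machine_opts = []
    iommu_opts = []
    for flag, value in zip(args, args[1:]):
        opts = value.split(",")
        if flag in ("-M", "-machine"):
            # Treat "kernel-irqchip" and "kernel_irqchip" as the same key.
            machine_opts.extend(opt.replace("-", "_") for opt in opts)
        elif flag == "-device" and opts[0] == "intel-iommu":
            iommu_opts.extend(opts[1:])
    return {
        "split_irqchip": "kernel_irqchip=split" in machine_opts,
        "intremap": "intremap=on" in iommu_opts,
        "eim": "eim=on" in iommu_opts,
    }


# Abbreviated forms of the two command lines from this bug.
failing = "qemu-kvm -M q35 -smp 256 -device intel-iommu,intremap=on,eim=on"
working = failing + " -machine kernel_irqchip=split"

print(check_large_smp_requirements(failing))
print(check_large_smp_requirements(working))
```

On the abbreviated comment 31 command line only the interrupt remapping and EIM checks pass, which matches the "satisfies two of three" diagnosis; adding `-machine kernel_irqchip=split`, as in the 384-VCPU run, makes all three pass.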