Bug 1447027 - Guest cannot boot with 240 or above vcpus when using ovmf
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: ovmf
Version: 7.4
Hardware: x86_64
OS: All
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Laszlo Ersek
QA Contact: FuXiangChun
Keywords: TestOnly
Depends On: ovmf-rebase-rhel-7.5
Blocks: 1468526
Reported: 2017-05-01 11:07 UTC by Guo, Zhiyi
Modified: 2018-04-10 16:30 UTC
Last Closed: 2018-04-10 16:28:00 UTC


Attachments
ovmf logs (72.97 KB, application/zip)
2017-05-01 11:11 UTC, Guo, Zhiyi
Domain XML for comment 24 (1.63 KB, application/x-xz)
2017-07-08 16:41 UTC, Laszlo Ersek
OVMF boot log from comment 24 (21.43 KB, application/x-xz)
2017-07-08 16:42 UTC, Laszlo Ersek
Guest dmesg from comment 24 (19.23 KB, application/x-xz)
2017-07-08 16:43 UTC, Laszlo Ersek
OVMF S3 resume log from comment 24 (3.12 KB, application/x-xz)
2017-07-08 16:44 UTC, Laszlo Ersek
guest kernel S3 messages on ttyS0 (10.71 KB, application/x-xz)
2017-07-08 17:34 UTC, Laszlo Ersek


External Trackers
Tracker                 ID              Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2018:0902  None      None    None     2018-04-10 16:30 UTC
Red Hat Bugzilla        1289151         None      None    None     2019-04-01 12:35 UTC
Red Hat Bugzilla        1433956         None      None    None     2019-04-01 12:35 UTC
Red Hat Bugzilla        1469338         None      CLOSED  RFE: expose Q35 extended TSEG size in domain XML element or attribute  2019-04-01 12:35 UTC

Internal Trackers: 1289151 1433956 1469338

Description Guo, Zhiyi 2017-05-01 11:07:08 UTC
Description of problem:
Guest cannot boot with 240 or above vcpus when using ovmf

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.9.0-2.el7.x86_64
OVMF-20170228-4.gitc325e41585e3.el7.noarch
3.10.0-657.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot a UEFI guest with the following command line:
/usr/libexec/qemu-kvm -name intel74 -m 32G \
        -cpu IvyBridge,enforce \
        -machine q35,kernel-irqchip=split \
        -smp 240 \
        -monitor stdio \
        -qmp tcp:0:4444,server,nowait \
        -vga std \
        -vnc :0 \
        -serial unix:/tmp/console,server,nowait \
        -drive file=rhel74-0428.qcow2,if=none,id=drive-scsi-disk0,format=qcow2,cache=none,werror=stop,rerror=stop \
        -device virtio-scsi-pci,id=scsi0,addr=04 \
        -device scsi-hd,drive=drive-scsi-disk0,bus=scsi0.0,scsi-id=0,lun=0,id=scsi-disk0,bootindex=1 \
        -netdev tap,id=idinWyYp,vhost=on \
        -device virtio-net-pci,mac=42:ce:a9:d2:4d:d7,id=idlbq7eA,netdev=idinWyYp \
        -device intel-iommu,intremap=on,eim=on \
        -drive file=/usr/share/OVMF/rhel7/OVMF_CODE.secboot.fd,if=pflash,format=raw,readonly=on,unit=0 \
        -drive file=/usr/share/OVMF/rhel7/OVMF_VARS.fd,if=pflash,format=raw,unit=1 \
        -debugcon file:debug.log -global isa-debugcon.iobase=0x402


Actual results:
The guest cannot boot, and the OVMF log shows this error:
ASSERT /builddir/build/BUILD/ovmf-c325e41585e3/UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c(210): PageDirectoryEntry != ((void *) 0)

Expected results:
Guest can boot 

Additional info:
The guest can boot if the firmware is switched to SeaBIOS.

Comment 2 Guo, Zhiyi 2017-05-01 11:11 UTC
Created attachment 1275384 [details]
ovmf logs

guest can boot with 200 vcpus -- good.log
guest cannot boot with 240 vcpus -- bad.log

Comment 4 Laszlo Ersek 2017-05-01 13:39:59 UTC
This is an out-of-SMRAM condition.

The Q35 board provides 1MB, 2MB or 8MB of SMRAM in TSEG (configurable); OVMF already picks 8MB.

In bug 1341733, we encountered a stack overflow in SMM, while running EnrollDefaultKeys.efi. That issue was addressed with the upstream commits

509f8425b75d UefiCpuPkg: change PcdCpuSmmStackGuard default to TRUE
0d0c245dfb14 OvmfPkg: set SMM stack size to 16KB

commit 0d0c245dfb147956cb597582bc481579ceb612c0
Author: Laszlo Ersek <lersek@redhat.com>
Date:   Wed Jun 1 19:59:52 2016 +0200

    OvmfPkg: set SMM stack size to 16KB
    
    The default stack size (from UefiCpuPkg/UefiCpuPkg.dec) is 8KB, which
    proved too small (i.e., led to stack overflow) across commit range
    98c2d9610506^..f85d3ce2efc2^, during certificate enrollment into "db".
    
    As the edk2 codebase progresses and OVMF keeps including features, the
    stack demand constantly fluctuates; double the SMM stack size for good
    measure.

Raising the SMM stack size from 8KB to 16KB incurs an extra 8KB SMRAM demand, *per VCPU*. At around 240 VCPUs, this amounts to almost 2MB of SMRAM out of the 8MB total.

The one thing we can try here is an SMM stack size of 12KB, which should recover about 1MB of SMRAM, for a VCPU count of ~240.
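
As a rough check of the figures above, the per-VCPU stack arithmetic can be sketched in shell. The helper name smm_stack_kb is made up for this illustration, and only the stacks themselves are counted; any other per-VCPU SMRAM consumers are ignored, so this is an estimate rather than an exact SMRAM budget.

```shell
# Hypothetical helper: total SMM stack demand in KB for a given VCPU count
# and per-VCPU stack size in KB. Only the stacks are counted here.
smm_stack_kb() {
    echo $(( $1 * $2 ))
}

vcpus=240

# Extra demand caused by going from 8KB to 16KB stacks.
extra_kb=$(( $(smm_stack_kb "$vcpus" 16) - $(smm_stack_kb "$vcpus" 8) ))

# Amount recovered by going from 16KB down to 12KB stacks.
saved_kb=$(( $(smm_stack_kb "$vcpus" 16) - $(smm_stack_kb "$vcpus" 12) ))

echo "8KB -> 16KB stacks cost an extra ${extra_kb} KB"    # 1920 KB, just under 2MB
echo "16KB -> 12KB stacks would recover ${saved_kb} KB"   # 960 KB, about 1MB
```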

If that helps here, then bug 1341733 has to be re-verified too.

If lowering the SMM stack size to 12KB does not help, then something way more intrusive will be necessary. (I don't know what.)

Comment 10 Laszlo Ersek 2017-05-02 18:19:40 UTC
Lowering the SMM stack size to 12KB did not help, unfortunately.

I asked on edk2-devel whether we could do anything about this in the short term, by tweaking various build knobs:

https://lists.01.org/pipermail/edk2-devel/2017-May/010371.html

Otherwise, we might have to look into increasing the SMRAM size on Q35. (Later.)

Comment 12 Laszlo Ersek 2017-05-03 13:01:48 UTC
Based on the upstream discussion (comment 10), a larger, firmware-detectable TSEG size appears to be the preferred solution. IMO, that should be possible for upstream QEMU 2.10.

Comment 14 Laszlo Ersek 2017-05-30 21:28:22 UTC
Posted QEMU RFC patch:
[RFC] q35/mch: implement extended TSEG sizes
http://mid.mail-archive.com/20170530212614.18343-1-lersek@redhat.com

Comment 15 Laszlo Ersek 2017-06-08 16:12:59 UTC
Posted QEMU patch (identical to the RFC in comment 14):
[PATCH] q35/mch: implement extended TSEG sizes
http://mid.mail-archive.com/20170608161013.17920-1-lersek@redhat.com

Comment 16 Laszlo Ersek 2017-06-08 17:14:54 UTC
Posted upstream edk2 series:
[PATCH 0/5] OvmfPkg: recognize an extended TSEG when QEMU offers it
https://lists.01.org/pipermail/edk2-devel/2017-June/011452.html
http://mid.mail-archive.com/20170608171333.17937-1-lersek@redhat.com

Comment 17 Laszlo Ersek 2017-06-16 11:26:29 UTC
Posted qtest patches for upstream QEMU (on top of the patch linked in comment 15):
[PATCH 0/2] tests/q35-test: add TSEG size checks
http://mid.mail-archive.com/20170616112201.24512-1-lersek@redhat.com

Comment 18 Laszlo Ersek 2017-06-21 06:21:41 UTC
QEMU commits:
1 2f295167e0c4 q35/mch: implement extended TSEG sizes
2 8bbf4aa96efb tests/q35-test: push down qtest_start / qtest_end to test
               case(s)
3 e691ef69911a tests/q35-test: add TSEG size checks

Comment 19 Laszlo Ersek 2017-07-04 16:58:01 UTC
Posted v2 upstream edk2 series:
[PATCH v2 0/8] OvmfPkg: recognize an extended TSEG when QEMU offers it
https://lists.01.org/pipermail/edk2-devel/2017-July/012122.html
http://mid.mail-archive.com/20170704165629.13610-1-lersek@redhat.com

Comment 20 Laszlo Ersek 2017-07-05 20:48:02 UTC
edk2 commits: 253d81c71f67..d04b72c67097

1 5b31f660c92c OvmfPkg: widen PcdQ35TsegMbytes to UINT16
2 23bfb5c0aab6 OvmfPkg/PlatformPei: prepare for PcdQ35TsegMbytes becoming
               dynamic
3 1372f8d347ab OvmfPkg/SmmAccess: prepare for PcdQ35TsegMbytes becoming
               dynamic
4 966dbaf40075 OvmfPkg: make PcdQ35TsegMbytes dynamic
5 031e4ce26287 OvmfPkg/IndustryStandard/Q35MchIch9.h: add extended TSEG size
               macros
6 6812bb7bb5a6 OvmfPkg/SmmAccess: support extended TSEG size
7 d5e064447f8c OvmfPkg/PlatformPei: honor extended TSEG in PcdQ35TsegMbytes
               if available
8 d04b72c67097 OvmfPkg: mention the extended TSEG near the PcdQ35TsegMbytes
               declaration

Comment 24 Laszlo Ersek 2017-07-08 16:39:07 UTC
Successful test with 272 VCPUs.

Host:               see comment 22
Host kernel:        3.10.0-691.el7.x86_64
libvirt:            3.2.0-14.el7.x86_64
QEMU:               upstream v2.9.0-1829-gb113658
OVMF:               upstream edk2 built at commit 60e85a39fe49
Domain XML:         see attached (ovmf.rhel7.q35.xml.xz)
OVMF boot log:      see attached (ovmf.rhel7.q35.boot.log.xz)
Guest OS:           "Server with GUI" installed from
                    "RHEL-7.4-20170706.n.0-Server-x86_64-dvd1.iso"
Guest dmesg:        see attached (guest-dmesg.txt.xz)
OVMF S3 resume log: see attached (ovmf.rhel7.q35.s3.log.xz)

Relevant OVMF boot log entries (visually compressed here a bit):

> Q35TsegMbytesInitialization: QEMU offers an extended TSEG (16 MB)
> ...
> SmmAccessPeiEntryPoint: SMRAM map follows, 2 entries
>    PhysicalStart(0x)  PhysicalSize(0x)  CpuStart(0x)  RegionState(0x)
>             7F000000              1000      7F000000               1A
>             7F001000            FFF000      7F001000                A
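
The two SMRAM map entries can be cross-checked against the "16 MB" message with a little shell arithmetic. This is only a sanity check on the numbers logged above; the variable names are made up for the illustration.

```shell
# Values copied from the PhysicalStart(0x) and PhysicalSize(0x) columns above.
base=$(( 0x7F000000 ))
total=$(( 0x1000 + 0xFFF000 ))

# Sum of both regions, in bytes and in MB.
printf 'total SMRAM: 0x%X bytes = %d MB\n' "$total" $(( total / 1024 / 1024 ))

# First address past the end of the second region.
printf 'end of SMRAM: 0x%X\n' $(( base + total ))
```

The two regions are contiguous (0x7F000000 + 0x1000 = 0x7F001000), and together they add up to 0x1000000 bytes, that is, the 16 MB extended TSEG.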

Tests performed: see <https://da.gd/Wt1K>:

* 'Confirm "simple" multiprocessing during boot'

  Note: this requires enabling nested virt; msr-tools is available from
  EPEL7. I'm unsure whether we support nested virt on RHEL-7.4 hosts.

> [root@ovmf-rhel7-q35 ~]# rdmsr -a 0x3a
> ... 272 lines of "5" printed ...

* 'UEFI variable access test'

> [root@ovmf-rhel7-q35 ~]# time taskset -c 0 efibootmgr
> BootCurrent: 0004
> Timeout: 0 seconds
> BootOrder: 0001,0004,0002,0000,0006
> Boot0000* UiApp
> Boot0001* UEFI QEMU QEMU CD-ROM
> Boot0002* UEFI QEMU QEMU CD-ROM  2
> Boot0004* Red Hat Enterprise Linux
> Boot0006* EFI Internal Shell
>
> real    0m1.466s
> user    0m0.085s
> sys     0m1.381s

> [root@ovmf-rhel7-q35 ~]# time taskset -c 1 efibootmgr
> BootCurrent: 0004
> Timeout: 0 seconds
> BootOrder: 0001,0004,0002,0000,0006
> Boot0000* UiApp
> Boot0001* UEFI QEMU QEMU CD-ROM
> Boot0002* UEFI QEMU QEMU CD-ROM  2
> Boot0004* Red Hat Enterprise Linux
> Boot0006* EFI Internal Shell
>
> real    0m1.559s
> user    0m0.077s
> sys     0m1.480s

* 'ACPI S3 suspend/resume loop'

  Note: because I was using upstream QEMU and edk2, I enabled S3 in the
  domain XML, and I tested it too. (After resume, the screen can be a
  bit messy; send Ctrl+Alt+F1 and then Ctrl+Alt+F2 to switch back to
  the character terminal where the script below is run.) Remember that
  this is not supported on RHEL7.

> [root@ovmf-rhel7-q35 ~]# cat > sleep-cycle && chmod +x sleep-cycle
>
> X=0
> while read -p "about to suspend"; do
>   systemctl suspend
>   echo -n "iteration=$((X++)) #VCPUs="
>   grep -c -i '^processor' /proc/cpuinfo
> done
> ^D

> [root@ovmf-rhel7-q35 ~]# ./sleep-cycle
> about to suspend
> iteration=0 #VCPUs=272
> about to suspend
> iteration=1 #VCPUs=272
> about to suspend
> iteration=2 #VCPUs=272
> about to suspend
> iteration=3 #VCPUs=272
> about to suspend
> iteration=4 #VCPUs=272
> about to suspend^C

  Then repeat 'Confirm "simple" multiprocessing during boot' and 'UEFI
  variable access test'.

Comment 25 Laszlo Ersek 2017-07-08 16:41 UTC
Created attachment 1295496 [details]
Domain XML for comment 24

Comment 26 Laszlo Ersek 2017-07-08 16:42 UTC
Created attachment 1295497 [details]
OVMF boot log from comment 24

Comment 27 Laszlo Ersek 2017-07-08 16:43 UTC
Created attachment 1295498 [details]
Guest dmesg from comment 24

Comment 28 Laszlo Ersek 2017-07-08 16:44 UTC
Created attachment 1295499 [details]
OVMF S3 resume log from comment 24

Comment 29 Laszlo Ersek 2017-07-08 17:34 UTC
Created attachment 1295500 [details]
guest kernel S3 messages on ttyS0

guest kernel cmdline parameters were:
ignore_loglevel no_console_suspend console=ttyS0 console=tty

Comment 31 FuXiangChun 2017-12-05 16:04:51 UTC
QE re-tested this bug with this version.

# rpm -qa|grep qemu
qemu-kvm-rhev-2.10.0-10.el7.x86_64
# rpm -qa|grep OVMF
OVMF-20171011-3.git92d07e48907f.el7.noarch

smp 255-->works.
smp 256 -->fail

/usr/libexec/qemu-kvm -enable-kvm -M q35 -nodefaults -smp 256 -m 4096 -name vm1 -drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/usr/share/OVMF/OVMF_VARS.fd,if=pflash,format=raw,unit=1 -debugcon file:/home/test/ovmf.log -device intel-iommu,intremap=on,eim=on

qemu-kvm: -device intel-iommu,intremap=on,eim=on: Intel Interrupt Remapping cannot work with kernel-irqchip=on, please use 'split|off'.

Is this a new issue?

Comment 32 Laszlo Ersek 2017-12-05 20:56:59 UTC
(CC Radim)

Hello FuXiangChun,

(In reply to FuXiangChun from comment #31)
> QE re-tested this bug with this version.
> 
> # rpm -qa|grep qemu
> qemu-kvm-rhev-2.10.0-10.el7.x86_64
> # rpm -qa|grep OVMF
> OVMF-20171011-3.git92d07e48907f.el7.noarch
> 
> smp 255-->works.
> smp 256 -->fail
> 
> /usr/libexec/qemu-kvm -enable-kvm -M q35 -nodefaults -smp 256 -m 4096 -name
> vm1 -drive
> file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,
> readonly=on -drive
> file=/usr/share/OVMF/OVMF_VARS.fd,if=pflash,format=raw,unit=1 -debugcon
> file:/home/test/ovmf.log -device intel-iommu,intremap=on,eim=on
> 
> qemu-kvm: -device intel-iommu,intremap=on,eim=on: Intel Interrupt Remapping
> cannot work with kernel-irqchip=on, please use 'split|off'.
> 
> Is this a new issue?

Yes, this is known and expected. References:

(1) https://libvirt.org/formatdomain.html#elementsIommu

"The eim attribute (with possible values on and off) can be used to configure Extended Interrupt Mode. A q35 domain with split I/O APIC (as described in hypervisor features), and both interrupt remapping and EIM turned on for the IOMMU, will be able to use more than 255 vCPUs."

So, there are three requirements, and your command line only satisfies two.

(2) https://bugzilla.redhat.com/show_bug.cgi?id=1289151#c6

"Please note that you should need '-machine kernel_irqchip=split' and '-device intel-iommu,intremap=on,eim=on' parameters for qemu-kvm in order to run > 255."
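
For illustration only, the three requirements can be checked against a command line with a few lines of shell. The function check_vcpu_requirements is a made-up helper, not something QEMU or libvirt provides, and the matching is plain substring matching on the option spellings that appear in this bug, so treat it as a sketch rather than a real option parser.

```shell
# Hypothetical helper: report which of the three requirements for more than
# 255 VCPUs are missing from a qemu-kvm command line passed as one string.
check_vcpu_requirements() {
    cmdline=$1
    missing=""

    # Requirement 1: split I/O APIC (both spellings are used in this bug).
    case $cmdline in
        *kernel-irqchip=split*|*kernel_irqchip=split*) ;;
        *) missing="$missing kernel_irqchip=split" ;;
    esac

    # Requirement 2: interrupt remapping on the intel-iommu device.
    case $cmdline in
        *intel-iommu*intremap=on*) ;;
        *) missing="$missing intremap=on" ;;
    esac

    # Requirement 3: Extended Interrupt Mode on the intel-iommu device.
    case $cmdline in
        *intel-iommu*eim=on*) ;;
        *) missing="$missing eim=on" ;;
    esac

    if [ -z "$missing" ]; then
        echo "ok"
    else
        echo "missing:$missing"
    fi
}

# Shortened form of the command line from comment 31.
check_vcpu_requirements '-M q35 -smp 256 -device intel-iommu,intremap=on,eim=on'
```

For the shortened comment 31 command line, this reports that kernel_irqchip=split is missing, which matches the QEMU error message quoted above.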

Comment 33 Laszlo Ersek 2017-12-05 20:58:02 UTC
Ehh, I meant, "this is known and expected, and therefore NOT a new issue". Sorry about the confusion.

Comment 34 FuXiangChun 2017-12-06 03:09:43 UTC
Thanks to Laszlo for the explanation. Following comment 32, I booted a RHEL 7.5 guest with '-smp 384'; the guest works well, and all vCPUs are visible inside the guest. This is the key qemu-kvm command line:

/usr/libexec/qemu-kvm -enable-kvm -M q35 -nodefaults -smp 384,cores=4,threads=4,sockets=24 -m 8192 -name vm1 -drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/usr/share/OVMF/OVMF_VARS.fd,if=pflash,format=raw,unit=1 -debugcon file:/home/test/ovmf.log -drive file=/usr/share/OVMF/UefiShell.iso,if=none,cache=none,snapshot=off,aio=native,media=cdrom,id=cdrom1 -device ahci,id=ahci0 -device ide-cd,drive=cdrom1,id=ide-cd1,bus=ahci0.1 -global isa-debugcon.iobase=0x402 -drive file=/home/rhel7.5-secureboot.qcow2,if=none,id=guest-img,format=qcow2,werror=stop,rerror=stop -device ide-hd,drive=guest-img,bus=ide.0,unit=0,id=os-disk,bootindex=1 -spice port=5931,disable-ticketing -vga qxl -monitor stdio -qmp tcp:0:6666,server,nowait -boot menu=on,reboot-timeout=8,strict=on -machine kernel_irqchip=split -device intel-iommu,intremap=on,eim=on

Comment 35 FuXiangChun 2017-12-06 03:10:45 UTC
Based on comment 32 through comment 34, setting this bug as verified.

Comment 38 errata-xmlrpc 2018-04-10 16:28:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0902

