Bug 1468526
Summary: >1TB RAM support

Product: Red Hat Enterprise Linux 7
Component: ovmf
Version: 7.5
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Dr. David Alan Gilbert <dgilbert>
Assignee: Laszlo Ersek <lersek>
QA Contact: FuXiangChun <xfu>
CC: chayang, jinzhao, juzhang, lersek, michen, mrezanin, mtessun, xfu, yduan
Target Milestone: rc
Keywords: FutureFeature
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ovmf-20171011-1.git92d07e48907f.el7
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2018-04-10 16:28:00 UTC
Bug Depends On: 1447027, 1469787
Description
Dr. David Alan Gilbert
2017-07-07 10:26:21 UTC
(1) Reproducing the problem:

- qemu-kvm-rhev-2.9.0-16.el7.x86_64
- machine type: pc-q35-rhel7.4.0
- OVMF-20170228-5.gitc325e41585e3.el7.noarch

(1a) Specifying 1026 GB guest RAM, the problem is triggered. Q35 puts 2 GB of RAM in the 32-bit address space, and 1024 GB above it. This means that the CMOS would have to express 0x100_0000 64KB chunks for the high 1024 GB, which cannot be represented in the 24-bit (= 6-nibble) CMOS register that OVMF reads (and that upstream QEMU sets, incidentally). The MEMMAP command of the UEFI shell reports the total RAM size as 2 GB, because all six nibbles read from the CMOS are 0.

(1b) If we decrease the guest RAM size by 1 MB, to 1026*1024-1 MB == 1,050,623 MB, the number of 64KB chunks becomes 0xFF_FFF0. OVMF reads this correctly from the CMOS. However, this triggers SMRAM exhaustion in PiSmmCpuDxeSmm.efi (using the above package versions, we have 8 MB of SMRAM):

> 1GPageTableSupport - 0x0
> PcdCpuSmmStaticPageTable - 0x1
> PhysicalAddressBits - 0x29
> ASSERT /builddir/build/BUILD/ovmf-c325e41585e3/UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c(210): PageDirectoryEntry != ((void *) 0)

(2) Excursion: while the subject of this BZ is problem (1a), we have to mitigate (1b) first, so that we can go above 1026 GB guest RAM and verify the fix for problem (1a).

(2a) We can determine the SMRAM footprint needed for such large memory amounts from two sources:

- The commit message on <https://github.com/tianocore/edk2/commit/28b020b5de1e>, in which Jiewen provided some SMRAM footprint examples back then, at my request.

- The code fingered by the failed ASSERT itself. With 1026 GB RAM (and, by default, 32 GB of 64-bit PCI MMIO aperture), we're looking at an address width of 41 bits. The SetStaticPageTable() function -- which runs out of SMRAM above -- maps the entire address space using 2MB pages (if the guest supports 1GB pages, then those are used and less SMRAM is needed, but we should make a pessimistic estimate). A 2MB page covers 21 bits, and the remaining (41-21)=20 bits are subdivided (from least to most significant) 9+9+2:

  - On the lowest level, a 4KB page is needed for a page directory covering 9 bits (512 PDEs).
  - On the middle level, a 4KB page is needed for a page directory pointer table, covering 9 bits (512 PDPTEs). Meaning, up to and including the middle level, we need 4KB (for the PDPT) plus 512 * 4KB (for the pointed-to PDs).
  - On the top level, a 4KB page is needed for the single PML4 table, from which we use 4 entries (of the 512 possible) for covering the remaining 2 bits. This means that up to and including the top level, we need 4KB + 4 * (4KB + 512 * 4KB) == 8,409,088 bytes (a bit more than 8 MB).

- The Customer Portal article "Virtualization limits for Red Hat Enterprise Virtualization" at <https://access.redhat.com/articles/906543> (last modified: April 10 2017 at 9:08 AM) states that under RHV4, "Maximum memory in virtualized guest" is 4 TB. For every further 1 TB beyond the initial 1 TB used in the calculation above, we need 4 more entries in the PML4 table, meaning 4 * (4KB + 512 * 4KB) = 8,404,992 additional bytes of SMRAM for paging structures. We can roughly say: adding 1 TB of guest RAM requires 8 MB more SMRAM.
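For concreteness, the arithmetic from points (1a), (1b) and (2a) can be reproduced with a small standalone program. This is purely an illustration; the macros and variable names below are invented here and are not taken from OVMF:

/* Sketch only -- reproduces the CMOS-width and SMRAM-footprint arithmetic
 * from points (1a), (1b) and (2a) above; it is not OVMF code. */
#include <stdio.h>
#include <stdint.h>

#define KIB       1024ULL
#define MIB       (1024 * KIB)
#define GIB       (1024 * MIB)
#define CHUNK_64K (64 * KIB)
#define PAGE_4K   (4 * KIB)

int main(void)
{
  /* (1a): 1026 GB total guest RAM; Q35 keeps 2 GB below 4 GB,
   * so 1024 GB sit above 4 GB. */
  uint64_t high_1a = 1024 * GIB;
  /* (1b): one MB less, i.e. 1,050,623 MB total, 1,048,575 MB high. */
  uint64_t high_1b = 1048575 * MIB;

  printf("(1a) 64KB chunks = 0x%llX (fits in 24 bits: %s)\n",
         (unsigned long long)(high_1a / CHUNK_64K),
         high_1a / CHUNK_64K <= 0xFFFFFF ? "yes" : "no");
  printf("(1b) 64KB chunks = 0x%llX (fits in 24 bits: %s)\n",
         (unsigned long long)(high_1b / CHUNK_64K),
         high_1b / CHUNK_64K <= 0xFFFFFF ? "yes" : "no");

  /* (2a): static SMM page tables with 2MB pages for a 41-bit address
   * space: 1 PML4 page, plus 4 PDPT pages, plus 4 * 512 PD pages. */
  uint64_t smram_41bit  = PAGE_4K + 4 * (PAGE_4K + 512 * PAGE_4K);
  /* Each further TB adds 4 PML4 entries' worth of PDPT + PD pages. */
  uint64_t smram_per_tb = 4 * (PAGE_4K + 512 * PAGE_4K);

  printf("(2a) page tables for a 41-bit space: %llu bytes\n",
         (unsigned long long)smram_41bit);   /* 8,409,088 */
  printf("(2a) extra page tables per further TB: %llu bytes\n",
         (unsigned long long)smram_per_tb);  /* 8,404,992 */
  return 0;
}

The printed values match the figures above: 0x1000000 chunks for case (1a) (one too many for the 24-bit CMOS field), 0xFFFFF0 for case (1b), and a bit more than 8 MB of paging structures per TB.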
(2b) Given that bug 1447027 is now fixed in upstream QEMU and in upstream edk2 as well, we can experiment with the SMRAM sizes in practice. For this, the following snippet is needed in the domain XML:

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <qemu:commandline>
    <qemu:arg value='-global'/>
    <qemu:arg value='mch.extended-tseg-mbytes=N'/>
  </qemu:commandline>
</domain>

For the required upstream QEMU and OVMF commits, refer to bug 1447027. The required machine type is "pc-q35-2.10". Using said components, the extended TSEG defaults to 16 MB (double the earlier maximum, which is 8 MB).

- Starting the domain with such a TSEG and 1026 GB of RAM, we get an iPXE splat (note "1af41000.efidrv"):

> !!!! X64 Exception Type - 0E(#PF - Page-Fault) CPU Apic ID - 00000000 !!!!
> ExceptionData - 000000000000000B I:0 R:1 U:0 W:1 P:1 PK:0 S:0
> RIP  - 000000007D0B1C74, CS  - 0000000000000038, RFLAGS - 0000000000010206
> RAX  - 0000000000000000, RCX - 0000000000000014, RDX - 000001080000C014
> RBX  - 000000007D0C1670, RSP - 000000007EEC66E8, RBP - 000000007D0C1680
> RSI  - 000000007D0C1680, RDI - 000000007D0C1670
> R8   - 0000000000000000, R9  - 0000000000000000, R10 - 000000007D0BE680
> R11  - 000000007D0BE940, R12 - 000000007D0C1660, R13 - 0000000000000060
> R14  - 0000000000000084, R15 - 0000000000000070
> DS   - 0000000000000030, ES  - 0000000000000030, FS  - 0000000000000030
> GS   - 0000000000000030, SS  - 0000000000000030
> CR0  - 0000000080010033, CR2 - 000001080000C014, CR3 - 000000007E6A2000
> CR4  - 0000000000000668, CR8 - 0000000000000000
> DR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
> DR3  - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
> GDTR - 000000007E68FA98 0000000000000047, LDTR - 0000000000000000
> IDTR - 000000007DEC5018 0000000000000FFF,  TR  - 0000000000000000
> FXSAVE_STATE - 000000007EEC6340
> !!!! Find image 1af41000.efidrv (ImageBase=000000007D0A0000, EntryPoint=000000007D0A7005) !!!!

Debugging this issue is left as an exercise to the reader; for now I've disabled the iPXE oprom with <rom bar='off'/> under <interface>, and let the built-in VirtioNetDxe bind the virtio-net NIC. That way, the UEFI shell is reached fine, and the MEMMAP shell command reports approx. 1025.9 GB of free memory.

I checked 8MB, 9MB, 10MB, 11MB and 12MB extended TSEG sizes individually; 12MB is the first that succeeds. The smaller ones all prevent the firmware from booting due to SMRAM exhaustion at different places (the exhaustion progresses to later and later points as the SMRAM size grows). Therefore problem (1b) has been mitigated.

(3) Testing the candidate patch for problem (1a) -- which prefers the "etc/e820" fw_cfg file over the CMOS -- in parallel with the growing SMRAM footprint (a rough sketch of the e820-based scan follows after point (3a)):

(3a) Specifying 2 TB of guest RAM, 16 MB of SMRAM is insufficient. I "bisected" the 16..32 MB range, and the first SMRAM size that allowed the firmware to boot to the UEFI shell was 20 MB. This confirms the calculation in (2a) -- we went from 1 TB to 2 TB guest RAM, and had to use 20 MB of TSEG rather than 12 MB. Regarding the subject of this BZ, problem (1a), the UEFI shell reports approx. 2047.97 GB total memory. The candidate patch seems to work.
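To illustrate the e820-based approach, here is a rough standalone sketch of scanning QEMU's "etc/e820" fw_cfg entries (64-bit base, 64-bit length, 32-bit type, packed and little-endian) for the highest exclusive RAM address. The function and structure names are invented for illustration; they are not the actual OvmfPkg identifiers, which can be seen in the boot log quoted in comment 9 below:

/* Hypothetical sketch of scanning the "etc/e820" fw_cfg table for RAM
 * above the CMOS limit -- the approach of the candidate patch, but NOT
 * the OvmfPkg code; names and the standalone layout are made up. */
#include <stdio.h>
#include <stdint.h>

#define E820_TYPE_RAM 1u

#pragma pack(1)
typedef struct {
  uint64_t Base;
  uint64_t Length;
  uint32_t Type;
} E820_ENTRY;
#pragma pack()

static uint64_t
FindHighestExclusiveRamAddress(const E820_ENTRY *Map, size_t Count)
{
  uint64_t Max = 0;

  for (size_t Idx = 0; Idx < Count; Idx++) {
    uint64_t End = Map[Idx].Base + Map[Idx].Length;

    if (Map[Idx].Type == E820_TYPE_RAM && End > Max) {
      Max = End;
    }
  }
  return Max;
}

int main(void)
{
  /* Values copied from the boot log quoted in comment 9 below. */
  E820_ENTRY Map[] = {
    { 0xFEFFC000ULL,  0x4000ULL,        2 },
    { 0x0ULL,         0x80000000ULL,    1 },
    { 0x100000000ULL, 0x10000000000ULL, 1 },
  };

  /* Expect 0x10100000000 for the 1026 GB configuration. */
  printf("MaxAddress=0x%llX\n",
         (unsigned long long)
         FindHighestExclusiveRamAddress(Map, sizeof Map / sizeof Map[0]));
  return 0;
}

Because each entry carries a full 64-bit base and length, this enumeration has no 24-bit ceiling, unlike the CMOS field discussed in (1a).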
(3b) I couldn't test 4 TB of guest RAM. When I tried that, still using Dave's trick from comment 5, the host (having 24 GB of physical RAM) seemed to lock up. (Dave said in comment 7 that he couldn't get more than 2 TB to work.) After a while I started seeing "task XXXX:PID blocked for more than 120 seconds" messages from the host kernel, with stack traces indicating swap activity. I forcefully rebooted the host.

(3c) On this host, I cannot actually install a guest OS with 1026 GB or more RAM. This host only has 40 physical address bits, and that's not enough for more than 1 TB of address space -- EPT just stops working, and guest Linux immediately hits that issue. (The UEFI shell is reached fine because it doesn't try to massage the missing physical address bits.) Disabling EPT (and going with the less performant shadow paging in KVM) can work around this, speaking from past (SMM-less) experience. However, SMM emulation doesn't work without EPT at the moment, see bug 1348092. So, for installing an actual guest OS with >=1 TB of address space, I'd need a box with at least 41 physical address bits.

(4) Further SMRAM size considerations

(4a) The SMRAM footprint grows with both VCPU count (see bug 1447027) and guest RAM size (see this bug). (These needs add up rather than multiply -- my 2 TB testing in (3a) was indifferent to using 4 vs. 16 VCPUs.) Providing sane defaults is a hard question here, especially if we consider 1GB paging as well. I think we'll need a libvirt bug for exposing "-global mch.extended-tseg-mbytes=N", and then separate documentation for tweaking the value as necessary. For everyday purposes, the default 16 MB extended TSEG (with pc-q35-2.10) should be plenty; it accommodates 272 VCPUs (tested with 5 GB of guest RAM, in bug 1447027). For the currently published RHV4 limits (see the link above, 240 VCPUs and 4 TB guest RAM), 16 MB of SMRAM for the VCPUs and 4*8MB=32MB of SMRAM for the 4 TB of guest RAM should suffice (48 MB SMRAM total).

(4b) Edk2 has a knob called "PcdCpuSmmStaticPageTable". From "UefiCpuPkg/UefiCpuPkg.dec":

> ## Indicates if SMM uses static page table.
> #  If enabled, SMM will not use on-demand paging. SMM will build static
> #  page table for all memory.<BR><BR>
> #  This flag only impacts X64 build, because SMM alway builds static
> #  page table for IA32.
> #  TRUE  - SMM uses static page table for all memory.<BR>
> #  FALSE - SMM uses static page table for below 4G memory and use
> #          on-demand paging for above 4G memory.<BR>
> # @Prompt Use static page table for all memory in SMM.
> gUefiCpuPkgTokenSpaceGuid.PcdCpuSmmStaticPageTable|TRUE|BOOLEAN|0x3213210D

We should not disable this PCD (i.e., we shouldn't opt for on-demand paging):

- The savings are negligible. (Again, the impact of static paging, without 1GB paging support -- which is the worst case -- is ~8 MB of TSEG needed per 1 TB of guest RAM, and the TSEG is chipped away from guest RAM.)

- When Jiewen was working on SMM memory protection and was about to add this knob, I asked him to describe its effects. He wrote in <http://mid.mail-archive.com/74D8A39837DF1E4DA445A8C0B3885C50386BD98A@shsmsx102.ccr.corp.intel.com>:

> If static page is supported, page table is RO. [...] If we use dynamic
> paging, we can still provide *partial* protection. And hope page table is
> not modified by other component.

I don't think we should weaken any such protection for relatively negligible memory savings.

Update to point (3c) from comment 8:

I got access to the host mentioned in comment 4. That box has 46 physical address bits (amazing!) and enough disk space for a ~4.3 TB swap file. (So I did create a real, non-sparse swap file.) I tried installing a domain with 4 TB of RAM, but as soon as the guest OS was booted, it actually hit the swap file. Not wanting to wait for weeks, I power-cycled the machine. I lowered the RAM size to 1026 GB (see point (1a) in comment 8). This way the guest installed fine.
At the end of the installation, there was 11 GB of swap space in use, with QEMU at 1.010 TB VIRT and 0.021 TB RES.

Host: see comment 4
Host kernel: 3.10.0-693.el7.x86_64
libvirt: 3.2.0-14.el7.x86_64
QEMU: upstream v2.9.0-1880-g94c5665
OVMF: upstream edk2 built at commit 60e85a39fe49 *plus* the candidate patch for this BZ
Domain XML: see attached (ovmf.rhel7.q35.xml.xz)
OVMF boot log: see attached (ovmf.rhel7.q35.boot.log.xz)
Guest OS: "Minimal install" from "RHEL-7.4-20170630.1-Server-x86_64-dvd1.iso" (via a symlink called "RHEL-7-Server-x86_64-dvd1.iso")
Guest dmesg: see attached (guest-dmesg.txt.xz)
OVMF S3 resume log: see attached (ovmf.rhel7.q35.s3.log.xz)
Guest kernel S3 log: see attached (guest-s3-dmesg.txt.xz)
(NOTE: S3 is not supported on RHEL7 hosts; this was just for upstream testing.)

Relevant OVMF boot log entries (visually compressed here a bit):

(In reply to Laszlo Ersek from comment #9)
> Relevant OVMF boot log entries (visually compressed here a bit):
> E820HighRamIterate: Base=0xFEFFC000 Length=0x4000 Type=2
> E820HighRamIterate: Base=0x0 Length=0x80000000 Type=1
> E820HighRamIterate: Base=0x100000000 Length=0x10000000000 Type=1
> E820HighRamFindHighestExclusiveAddress: MaxAddress=0x10100000000
> GetFirstNonAddress: Pci64Base=0x10800000000 Pci64Size=0x800000000
> MaxCpuCountInitialization: QEMU reports 48 processor(s)
> Q35TsegMbytesInitialization: QEMU offers an extended TSEG (16 MB)
> PublishPeiMemory: mPhysMemAddressWidth=41 PeiMemoryCap=73748 KB
> PeiInstallPeiMemory MemoryBegin 0x7A5FB000, MemoryLength 0x4805000
> E820HighRamIterate: Base=0xFEFFC000 Length=0x4000 Type=2
> E820HighRamIterate: Base=0x0 Length=0x80000000 Type=1
> E820HighRamIterate: Base=0x100000000 Length=0x10000000000 Type=1
> E820HighRamAddMemoryHob: [0x100000000, 0x10100000000)

And, for this test, the default 16 MB extended TSEG size was used.

Created attachment 1296032 [details] Domain XML for comment 9
Created attachment 1296033 [details] OVMF boot log for comment 9
Created attachment 1296034 [details] Guest dmesg for comment 9
Created attachment 1296035 [details] OVMF S3 resume log from comment 9
Created attachment 1296036 [details]
guest kernel S3 log
(In reply to Laszlo Ersek from comment #15)
> Created attachment 1296036 [details]
> guest kernel S3 log

... for comment 9

Posted the upstream patch (called the "candidate patch" in comment 8 and comment 9):

[edk2] [PATCH 0/1] OvmfPkg/PlatformPei: support >=1TB high RAM, and discontiguous high RAM
Message-Id: <20170711032231.29280-1-lersek>
https://lists.01.org/pipermail/edk2-devel/2017-July/012304.html

Posted upstream v2:

[edk2] [PATCH v2 0/1] OvmfPkg/PlatformPei: support >=1TB high RAM, and discontiguous high RAM
Message-Id: <20170804230043.12977-1-lersek>
https://lists.01.org/pipermail/edk2-devel/2017-August/012942.html

Upstream commit 1fceaddb12b5 ("OvmfPkg/PlatformPei: support >=1TB high RAM, and discontiguous high RAM", 2017-07-08).

QE tested with 1T of memory; the RHEL7.5 guest works well. However, the RHEL7.5 guest fails to boot with 2T of memory. The OVMF log is uploaded as an attachment. In addition, host memory is sufficient, as shown below.

# free -g
              total        used        free      shared  buff/cache   available
Mem:          12094          84       12004           0           5       12006
Swap:             3           0           3

# rpm -qa|grep qemu
qemu-kvm-rhev-2.10.0-10.el7.x86_64
# rpm -qa|grep OVMF
OVMF-20171011-3.git92d07e48907f.el7.noarch

qemu command:

/usr/libexec/qemu-kvm -enable-kvm -M q35 -nodefaults -smp 16,cores=2,threads=8,sockets=1 -m 2T -name vm1 -drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/usr/share/OVMF/OVMF_VARS.fd,if=pflash,format=raw,unit=1 -debugcon file:/home/test/ovmf.log -drive file=/usr/share/OVMF/UefiShell.iso,if=none,cache=none,snapshot=off,aio=native,media=cdrom,id=cdrom1

Created attachment 1363251 [details]
2T memory ovmf log
Hello FuXiangChun, thanks for the log. While building the static SMM page tables (for the whole of guest RAM), the SetStaticPageTable() function in "UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c" runs out of SMRAM:

211   PageDirectoryEntry = AllocatePageTableMemory (1);
212   ASSERT(PageDirectoryEntry != NULL);

* This is discussed at length in (2a), in comment 8 -- the summary is, "We can roughly say, adding 1TB of guest RAM requires 8MB more SMRAM."

* In comment 8, bullet (3a), I specifically tested 2TB and stated that the default 16MB SMRAM size is insufficient.

* Furthermore, please refer to the following test case in the RHEL-7.5 OVMF test plan (bug 1505265): RHEL7-110151. It carries the following Note:

> If boot guest with a very large guest RAM size(>=4T) and a high VCPU
> count(>272), then need add this option to qemu command
>
> -global mch.extended-tseg-mbytes=48

Based on the above references, please *either* append

  -global mch.extended-tseg-mbytes=24

to the QEMU command line (because you added 1TB of RAM, and 16 + 8 = 24), *or else* append

  -global mch.extended-tseg-mbytes=48

(which is the value given under the RHEL7-110151 test case, and should be sufficient up to 4TB -- namely, 16 + 4*8 = 16 + 32 = 48). Thank you!

Thanks Laszlo. Re-tested this bug with 3.10.0-693.5.2.el7.x86_64 & OVMF-20171011-3.git92d07e48907f.el7.noarch & qemu-kvm-rhev-2.10.0-10.el7.x86_64.

Key qemu command:

/usr/libexec/qemu-kvm -enable-kvm -M q35 -nodefaults -smp 384,cores=8,threads=24,sockets=2 -m 4T -name vm1 -drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/usr/share/OVMF/OVMF_VARS.fd,if=pflash,format=raw,unit=1 -debugcon file:/home/test/ovmf.log -drive file=/usr/share/OVMF/UefiShell.iso,if=none,cache=none,snapshot=off,aio=native,media=cdrom,id=cdrom1 -device ahci,id=ahci0 -device ide-cd,drive=cdrom1,id=ide-cd1,bus=ahci0.1 -global isa-debugcon.iobase=0x402 -drive file=/home/rhel7.5-secureboot.qcow2,if=none,id=guest-img,format=qcow2,werror=stop,rerror=stop -device ide-hd,drive=guest-img,bus=ide.0,unit=0,id=os-disk,bootindex=1 -spice port=5931,disable-ticketing -vga qxl -monitor stdio -qmp tcp:0:6666,server,nowait -boot menu=on,reboot-timeout=8,strict=on -device pcie-root-port,bus=pcie.0,id=root.0,slot=0,io-reserve=0 -device e1000,netdev=tap0,mac=9a:6a:6b:6c:6d:50,bus=root.0 -netdev tap,id=tap0 -machine kernel_irqchip=split -device intel-iommu,intremap=on,eim=on -global mch.extended-tseg-mbytes=48 -serial unix:/tmp/console,server,nowait -vnc :1

Result:
4T of memory can be found inside the guest, and the guest works well.

I need to confirm 2 small problems with you.

Q1). If I reboot the 7.5 guest with 4T, it takes ~50 minutes from sending the reboot command to loading the OVMF UI, and then ~40 minutes during boot. Is that normal?

Q2) This big machine cannot install a RHEL7.5 host (it always kernel panics), so I use a RHEL7.4 host to test this bug, but qemu-kvm-rhev and OVMF are the latest versions, and the guest is a RHEL7.5 guest. Can this be used as a valid result to verify this bug? Thanks.

(In reply to FuXiangChun from comment #24)

> Result:
> 4T size memory can be found inside guest. and guest works well.

Thanks!

> I need to confirm 2 small problems with you.
>
> Q1). If I reboot 7.5 guest with 4T. It will take ~50 minutes from the send
> reboot command to load ovmf UI. Then it will take ~40 minutes during
> booting. Is it normal?

I have absolutely no idea. I've never used a 4TB guest.

* Can the slowness be related to swap space usage on the host, perhaps?
* How does a 4TB SeaBIOS/RHEL-7.5 guest behave across a reboot?

* How does the physical host -- with the multi-terabyte RAM -- behave across a RHEL-7.5 reboot?

> Q2) As this big machine can not install RHEL7.5 host(always kernel panic).

Oh, wow. That sort of answers my last question above -- "it does not boot at all", namely. So:

* Do we have an RHBZ about this, for the host kernel?

> I use RHEL7.4 host to test this bug. But qemu-kvm-rhev and OVMF are the
> latest version. and Guest is RHEL7.5 guest. Can it be used as valid
> results to verify this bug? Thanks.

I have two thoughts -- Dave, please comment if you can:

(1) Apparently, the RHEL-7.5 kernel does not boot at all on a similarly large physical machine. That makes me wonder whether we can use the RHEL-7.5 kernel for testing guest functionality at all!

- What is the reboot behavior of a 4TB OVMF/RHEL-7.4 guest (using OVMF from RHEL-7.5 on a RHEL-7.4 host)?

(2) I *think* it could be OK to use a RHEL-7.4 host, with only OVMF upgraded to 7.5, but I'm not entirely sure. The only practical scenario where I can imagine such a setup is the following:

(a) You start the guest on a RHEL-7.5 host (including OVMF from the 7.5 host).

(b) You use the pc-q35-rhel7.4.0 machine type.

(c) You migrate the guest *down* to a RHEL-7.4 host.

(d) You reboot the migrated guest on the target host.

Because the firmware is migrated (in memory / flash) together with the guest, the reboot will effectively execute OVMF from RHEL-7.5 on the RHEL-7.4 host. However, I'm unsure if we support backwards migration from RHEL-7.5 to RHEL-7.4 hosts. (I think backward migration is supported on a case-by-case basis only. I could be wrong.)

Summary:

- I think your host setup is fine.

- Please use a kernel in the guest that is known to work well on large hosts too (that is, RHEL-7.4.z).

- If the RHEL-7.4.z guest takes very long to reboot as well, with OVMF from RHEL-7.5, then please repeat the test with SeaBIOS as well (preserving all other details of the OVMF test).

Thanks!

I'm not sure -- I've not used anything this big either. I agree with Laszlo's suggestions; let's find the BZ for the reason the 7.5 kernel crashes on the host, and let's see if SeaBIOS takes that long as well. A ~50-minute reboot sounds like a bug somewhere.

(In reply to Laszlo Ersek from comment #25)
> (In reply to FuXiangChun from comment #24)
>
> > Result:
> > 4T size memory can be found inside guest. and guest works well.
>
> Thanks!
>
> > I need to confirm 2 small problems with you.
> >
> > Q1). If I reboot 7.5 guest with 4T. It will take ~50 minutes from the send
> > reboot command to load ovmf UI. Then it will take ~40 minutes during
> > booting. Is it normal?
>
> I have absolutely no idea. I've never used a 4TB guest.
>
> * Can the slowness be related to swap space usage on the host, perhaps?

# free -g
              total        used        free      shared  buff/cache   available
Mem:          12094         182       11897           0          14       11907
Swap:             3           0           3

So, the host doesn't use swap.

> * How does a 4TB SeaBIOS/RHEL-7.5 guest behave across a reboot?

I installed a RHEL7.5 guest with SeaBIOS and 4T of memory. Rebooting that guest only needs ~2 minutes.

> * How does the physical host -- with the multi-terabyte RAM -- behave across
> a RHEL-7.5 reboot?

Rebooting the host takes ~20-30 minutes.

> > Q2) As this big machine can not install RHEL7.5 host(always kernel panic).
>
> Oh, wow. That sort of answers my last question above -- "it does not boot at
> all", namely. So:
>
> * Do we have an RHBZ about this, for the host kernel?
https://bugzilla.redhat.com/show_bug.cgi?id=1446771

> > I use RHEL7.4 host to test this bug. But qemu-kvm-rhev and OVMF are the
> > latest version. and Guest is RHEL7.5 guest. Can it be used as valid
> > results to verify this bug? Thanks.
>
> I have two thoughts -- Dave, please comment if you can:
>
> (1) Apparently, the RHEL-7.5 kernel does not boot at all on a similarly
> large physical machine. That makes me wonder if we can at all use the
> RHEL-7.5 kernel for testing guest functionality!
>
> - What is the reboot behavior of a 4TB OVMF/RHEL-7.4 guest (using OVMF
> from RHEL-7.5 on a RHEL-7.4 host)?

Sorry, I need to correct the host and guest versions:
Host is RHEL7.4.z (3.10.0-693.5.2.el7.x86_64)
Guest is RHEL7.4 (3.10.0-693.el7.x86_64)

> (2) I *think* it could be OK to use a RHEL-7.4 host, with only OVMF upgraded
> to 7.5, but I'm not entirely sure. The only practical scenario where I
> can imagine such a setup is the following:
>
> (a) You start the guest on a RHEL-7.5 host (including OVMF from the 7.5
> host).

As per bug 1446771, I cannot install a fresh RHEL-7.5 host; it always fails.

> (b) You use the pc-q35-rhel7.4.0 machine type.

I tested pc-q35-rhel7.4.0 and pc-q35-rhel7.5.0; both give the same result.

> (c) You migrate the guest *down* to a RHEL-7.4 host.

Sorry, I only found one machine with this much memory in beaker, so I cannot do the migration.

> (d) You reboot the migrated guest on the target host.
>
> Because the firmware is migrated (in memory / flash) together with the
> guest, the reboot will effectively execute OVMF from RHEL-7.5 on the
> RHEL-7.4 host.
>
> However, I'm unsure if we support backwards migration from RHEL-7.5 to
> RHEL-7.4 hosts. (I think backward migration is supported on a
> case-by-case basis only. I could be wrong.)

I'm sorry, I reported an inaccurate reboot time for the guest. I re-tested it with 4T of memory and fewer vcpus (32). Booting the RHEL7.4 guest takes ~17 minutes on the RHEL7.4.z host. (Before, I used 384 vcpus and 4T of memory, which takes more time.)

> Summary:
>
> - I think your host setup is fine.
>
> - Please use a kernel in the guest that is known to work well on large hosts
> too (that is, RHEL-7.4.z).
>
> - If the RHEL-7.4.z guest takes very long to reboot as well, with OVMF from
> RHEL-7.5, then please repeat the test with SeaBIOS as well (preserving all
> other details of the OVMF test).
>
> Thanks!

Summary of my testing for OVMF and SeaBIOS:

1. For SeaBIOS: it takes ~2 minutes to boot and ~3 minutes to reboot.

Version:
host is RHEL7.4.z (3.10.0-693.5.2.el7.x86_64)
guest is RHEL7.5 (3.10.0-799.el7.x86_64)
qemu-kvm-rhev-2.10.0-10.el7.x86_64

qemu command:

/usr/libexec/qemu-kvm -enable-kvm -M q35 -nodefaults -smp 32,cores=4,threads=4,sockets=2 -m 4T -name vm1 -global isa-debugcon.iobase=0x402 -drive file=rhel7.5-seabios.qcow2,if=none,id=guest-img,format=qcow2,werror=stop,rerror=stop -device ide-hd,drive=guest-img,bus=ide.0,unit=0,id=os-disk -spice port=5931,disable-ticketing -vga qxl -monitor stdio -qmp tcp:0:6666,server,nowait -boot menu=on,reboot-timeout=8,strict=on -device pcie-root-port,bus=pcie.0,id=root.0,slot=0,io-reserve=0 -device e1000,netdev=tap0,mac=9a:6a:6b:6c:6d:50,bus=root.0 -netdev tap,id=tap0 -vnc :1

2. For OVMF: it takes ~17 minutes to boot the guest and ~18 minutes to reboot it.
Version:
host is RHEL7.4.z (3.10.0-693.5.2.el7.x86_64)
guest is RHEL7.4 (3.10.0-693.el7.x86_64)
qemu-kvm-rhev-2.10.0-10.el7.x86_64
OVMF-20171011-3.git92d07e48907f.el7.noarch

qemu command:

/usr/libexec/qemu-kvm -enable-kvm -M pc-q35-rhel7.5.0 -nodefaults -smp 32,cores=4,threads=4,sockets=2 -m 4T -name vm1 -drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/usr/share/OVMF/OVMF_VARS.fd,if=pflash,format=raw,unit=1 -debugcon file:/home/test/ovmf.log -drive file=/usr/share/OVMF/UefiShell.iso,if=none,cache=none,snapshot=off,aio=native,media=cdrom,id=cdrom1 -device ahci,id=ahci0 -device ide-cd,drive=cdrom1,id=ide-cd1,bus=ahci0.1 -global isa-debugcon.iobase=0x402 -drive file=/home/rhel7.5-secureboot.qcow2,if=none,id=guest-img,format=qcow2,werror=stop,rerror=stop -device ide-hd,drive=guest-img,bus=ide.0,unit=0,id=os-disk,bootindex=1 -spice port=5931,disable-ticketing -vga qxl -monitor stdio -qmp tcp:0:6666,server,nowait -boot menu=on,reboot-timeout=8,strict=on -device pcie-root-port,bus=pcie.0,id=root.0,slot=0,io-reserve=0 -device e1000,netdev=tap0,mac=9a:6a:6b:6c:6d:50,bus=root.0 -netdev tap,id=tap0 -machine kernel_irqchip=split -device intel-iommu,intremap=on,eim=on -global mch.extended-tseg-mbytes=48 -serial unix:/tmp/console,server,nowait -vnc :1

In addition, this host will be returned to beaker tomorrow; because of the existing bug, I cannot install a RHEL7.5 host on it. Do I need to do any other tests?

Created attachment 1363791 [details]
ovmf-4T log
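As a side note, the "-global mch.extended-tseg-mbytes=48" value used in the 4TB test above follows the generous sizing guidance from comment 8 and from the reply to the earlier 2TB failure (roughly a 16 MB baseline plus ~8 MB of SMRAM per TB of guest RAM). A minimal sketch of that rule of thumb; the helper name is invented for illustration and is not a QEMU or OVMF interface:

/* Back-of-the-envelope sizing for "-global mch.extended-tseg-mbytes",
 * following the generous reading used in the RHEL7-110151 test-plan note:
 * 16 MB baseline (enough for the VCPU-related footprint) plus ~8 MB per
 * TB of guest RAM.  Hypothetical helper, for illustration only. */
#include <stdio.h>

static unsigned SuggestedTsegMbytes(unsigned GuestRamTb)
{
  return 16 + 8 * GuestRamTb;
}

int main(void)
{
  /* A 4 TB guest yields 48 MB, matching the value used in the test above.
   * Measured minima can be lower: per comment 8, a 20 MB TSEG was enough
   * to boot a 2 TB guest, and 24 MB was suggested for the failed 2 TB run. */
  for (unsigned Tb = 1; Tb <= 4; Tb++) {
    printf("%u TB guest RAM -> mch.extended-tseg-mbytes=%u\n",
           Tb, SuggestedTsegMbytes(Tb));
  }
  return 0;
}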
FuXiangChun, your feedback is extremely helpful, thank you for that. Yes, I would like to ask you for one more test with OVMF. Let me analyze the newest information below; there are two important facts:

(1) The boot time of OVMF is consistent with the reboot time of OVMF (17 mins vs. 18 mins -- this is from your summary in comment 27). That's great.

(2) Your command line for the OVMF testing, from comment 27, uses

OVMF-20171011-3.git92d07e48907f.el7.noarch

and specifies

-debugcon file:/home/test/ovmf.log \
-global isa-debugcon.iobase=0x402

In turn, the boot produces the *absolutely hugest* OVMF debug log (comment 28) I've ever seen. Its uncompressed size is 133M, containing 2,112,575 lines.

Why is this relevant? Because:

- in ovmf-20171011-2.git92d07e48907f.el7, Paolo fixed bug 1488247, such that the debug log is written to the QEMU debug port *if and only if* the debug console is actually enabled with the "-debugcon" switch;

- producing large amounts of debug log impacts performance;

- the log from comment 28 consists overwhelmingly of lines that say

> ConvertPageEntryAttribute 0x800000007F0FB067->0x800000007F0FB065

and the number of such lines is linearly proportional to the guest RAM size. (If you remove these lines from the log file, only 224,043 bytes are left; or, put differently, 3685 lines. That's ~0.17% of the full line count.)

So, the one test that I would like to request in addition is just this: please repeat your last OVMF test (from the end of comment 27), but *remove* the following options:

-debugcon file:/home/test/ovmf.log \
-global isa-debugcon.iobase=0x402

and measure the boot time like this. (Functionally, you already confirmed in comment 24, "4T size memory can be found inside guest. and guest works well". So this is now only about the performance.)

I expect that the boot (and reboot) will be sped up quite a bit. If it does not catch up with SeaBIOS, that's not a problem; a similarly sized host reboot takes ~20-30 minutes as well, according to comment 27. Thank you!

Thanks Laszlo. I re-tested this problem without '-debugcon file:/home/test/ovmf.log' and '-global isa-debugcon.iobase=0x402'. The guest only took 3 minutes to boot or reboot, and it works well; all memory and vcpus can be found inside the guest. Based on this test result, can I set this bug to VERIFIED?

FuXiangChun, those are awesome results; many thanks for your continued thorough work! Yes, please set this BZ to VERIFIED status. Cheers!

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0902