
Bug 1528259

Summary: [Q35][OVMF] Boot guest failed with 8T memory
Product: Red Hat Enterprise Linux 7
Reporter: yduan
Component: ovmf
Assignee: Laszlo Ersek <lersek>
Status: CLOSED NOTABUG
QA Contact: FuXiangChun <xfu>
Severity: low
Docs Contact:
Priority: low
Version: 7.5
CC: chayang, jinzhao, juzhang, michen, xfu, yduan
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-01-04 12:42:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
rhel75-q35-ovmf.log (flags: none)

Description yduan 2017-12-21 11:14:54 UTC
Created attachment 1370831
rhel75-q35-ovmf.log

Description of problem:
Booting a guest with 8T of memory fails.

Version-Release number of selected component (if applicable):
Host: hp-bl920gen8-01.khw.lab.eng.bos.redhat.com
# uname -r
3.10.0-693.5.2.el7.x86_64
# rpm -q qemu-kvm-rhev
qemu-kvm-rhev-2.10.0-13.el7.x86_64
# rpm -q OVMF
OVMF-20171011-4.git92d07e48907f.el7.noarch

Guest:
# uname -r
3.10.0-823.el7.x86_64

How reproducible:
3/3

Steps to Reproduce:
1. On the host:
# free -h
              total        used        free      shared  buff/cache   available
Mem:            11T        114G         11T         26M         24G         11T
Swap:          4.0G          0B        4.0G

2. Boot the guest with 8T of memory:
/usr/libexec/qemu-kvm \
 -S \
 -name 'RHEL7.5-1' \
 -machine q35,kernel-irqchip=split \
 -device intel-iommu,intremap=on,eim=on \
 -m 8T \
 -smp 384,maxcpus=384,sockets=2,cores=96,threads=2 \
 -cpu SandyBridge,enforce \
 -rtc base=localtime,clock=host,driftfix=slew \
 -nodefaults \
 -device AC97 \
 -vga qxl \
 -drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,readonly=on \
 -drive file=OVMF_VARS.fd,if=pflash,format=raw,unit=1 \
 -serial unix:/tmp/serial0,server,nowait \
 -debugcon file:test/rhel75-q35-ovmf.log \
 -global isa-debugcon.iobase=0x402 \
 -global mch.extended-tseg-mbytes=48 \
 -device usb-ehci,id=usb1 \
 -device usb-tablet,id=usb-tablet1 \
 -boot menu=on \
 -enable-kvm \
 -monitor stdio \
 -device pcie-root-port,id=root1,chassis=1 \
 -netdev tap,id=netdev0,vhost=on \
 -device virtio-net-pci,mac=BA:BC:13:83:3F:1D,id=net0,netdev=netdev0,status=on \
 -spice port=5800,disable-ticketing \
 -qmp tcp:0:8888,server,nowait \
 -drive file=images/rhel75-ovmf-virtio.qcow2,format=qcow2,id=drive_sysdisk,if=none,cache=none,aio=native,werror=stop,rerror=stop \
 -device virtio-blk-pci,drive=drive_sysdisk,id=device_sysdisk,bus=root1,bootindex=1


Actual results:
The guest fails to boot.
OVMF log:
......
mXdSupported - 0x1
One Semaphore Size    = 0x40
Total Semaphores Size = 0x12540
1GPageTableSupport - 0x0
PcdCpuSmmStaticPageTable - 0x1
PhysicalAddressBits - 0x2C
ASSERT /builddir/build/BUILD/ovmf-92d07e48907f/UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c(212): PageDirectoryEntry != ((void *) 0)

Expected results:
The guest boots successfully.

Additional info:
1. It cannot be reproduced with seabios.
# rpm -q seabios
seabios-1.11.0-1.el7.x86_64

2. Installing RHEL7.5 on this host ("hp-bl920gen8-01.khw.lab.eng.bos.redhat.com") failed.

3. The OVMF log is attached.

Comment 2 Laszlo Ersek 2018-01-03 09:56:44 UTC
Hello Yanbin,

this guest configuration has huge SMRAM requirements (384 VCPUs and 8TB RAM), so the 48MB SMRAM size may not be enough. I have two suggestions / requests:


(1) Please keep increasing the

  -global mch.extended-tseg-mbytes=N

value until the guest boots OK -- it's hard to tell the exact N value in advance.


(2) I see another thing from the error message. Namely, the following source code location:

  UefiCpuPkg/PiSmmCpuDxeSmm/X64/PageTbl.c(212)

implies that the VM configuration does not support 1GB pages. With 1GB pages enabled, the memory footprint of the SMRAM page tables would be much smaller.
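
A rough back-of-the-envelope sketch of why 1GB pages matter here. It assumes that the SMM page tables identity-map the whole address space reported in the OVMF log ("PhysicalAddressBits - 0x2C", i.e. 44 bits, with "PcdCpuSmmStaticPageTable - 0x1"), and it uses the standard x86-64 4-level paging layout (512 eight-byte entries per 4 KiB table). The helper name is made up for illustration; the real allocation logic in PiSmmCpuDxeSmm may differ in detail.

```python
ENTRY_SIZE = 8            # bytes per x86-64 page table entry
ENTRIES_PER_TABLE = 512   # entries in one 4 KiB table
TABLE_SIZE = ENTRY_SIZE * ENTRIES_PER_TABLE


def static_page_table_bytes(phys_addr_bits, use_1g_pages):
    """Estimate bytes needed to identity-map 2**phys_addr_bits bytes.

    Hypothetical helper: counts PML4, PDPT and (for 2 MiB pages) PD tables.
    """
    address_space = 1 << phys_addr_bits
    pml4_tables = 1
    # Each PDPT covers 512 GiB (512 entries of 1 GiB each).
    pdpt_tables = max(1, address_space >> 39)
    if use_1g_pages:
        pd_tables = 0  # PDPT entries map 1 GiB pages directly
    else:
        # Each page directory covers 1 GiB (512 entries of 2 MiB each).
        pd_tables = max(1, address_space >> 30)
    return (pml4_tables + pdpt_tables + pd_tables) * TABLE_SIZE


bits = 0x2C  # PhysicalAddressBits from the OVMF log
for use_1g in (False, True):
    size = static_page_table_bytes(bits, use_1g)
    print(f"1GB pages={use_1g}: {size / (1 << 20):.2f} MiB of page tables")
```

Under these assumptions, mapping 44 bits of address space with 2MB pages takes roughly 64 MiB of page tables alone, which is more than the 48MB extended TSEG on the command line, while the 1GB-page layout needs well under 1 MiB.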

The QEMU source code calls this CPU model feature "CPUID_EXT2_PDPE1GB". On the QEMU command line, it is called "pdpe1gb". Only the following CPU models seem to enable it by default:

- phenom
- Skylake-Server
- Opteron_G4
- Opteron_G5
- EPYC

Your current command line says

  -cpu SandyBridge,enforce

Please try Skylake-Server instead of SandyBridge:

  -cpu Skylake-Server,enforce

Or else, keep SandyBridge, but add "pdpe1gb" explicitly:

  -cpu SandyBridge,+pdpe1gb,enforce

(It's entirely possible that QEMU will not launch with these options at all, if the host CPU does not support 1GB pages. In that case, only option (1) remains viable.)
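
Before trying these options, it may help to check whether the host CPU advertises 1GB pages at all. The following is a minimal sketch, assuming a Linux x86 host where /proc/cpuinfo lists the feature as "pdpe1gb" on its "flags" line; the function names are illustrative only. A positive result does not by itself guarantee that QEMU/KVM will accept the option.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def host_has_1g_pages(cpuinfo_path="/proc/cpuinfo"):
    """True if the host CPU advertises 1 GiB pages (the 'pdpe1gb' flag)."""
    with open(cpuinfo_path) as f:
        return "pdpe1gb" in cpu_flags(f.read())


if __name__ == "__main__":
    try:
        print("pdpe1gb supported:", host_has_1g_pages())
    except OSError as exc:
        print("could not read /proc/cpuinfo:", exc)
```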


Either way, this does not look like an OVMF bug; it's a domain tuning question. I'll await your response and then I'll likely suggest closing this BZ as NOTABUG. Thanks!

Comment 3 Laszlo Ersek 2018-01-03 10:01:55 UTC
... To clarify, options (1) and (2) in comment 2 are alternatives -- please do one or the other, but both at the same time shouldn't be necessary.

Comment 4 Laszlo Ersek 2018-01-03 10:02:30 UTC
setting NEEDINFO for comment 2 / comment 3

Comment 6 yduan 2018-01-04 03:15:51 UTC
Hi Laszlo,

1. Yes, you're right. The guest works well after I add '+pdpe1gb'.
  -cpu SandyBridge,+pdpe1gb,enforce

2. Then I tried increasing the memory to 8.2T, and qemu dumped core.

(qemu) kvm_set_phys_mem: error registering slot: Invalid argument
rhel-q35-ovmf.sh: line 31: 331557 Aborted                 (core dumped) /usr/libexec/qemu-kvm -S -name 'RHEL7.5-1' -machine q35,kernel-irqchip=split -device intel-iommu,intremap=on,eim=on -m 8.2T -smp 384,maxcpus=384,sockets=2,cores=96,threads=2 -cpu SandyBridge,+pdpe1gb,enforce -rtc base=localtime,clock=host,driftfix=slew -nodefaults -device AC97 -vga qxl -drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/mnt/OVMF_VARS.fd,if=pflash,format=raw,unit=1 -serial unix:/tmp/serial0,server,nowait -debugcon file:/mnt/rhel75-q35-ovmf.log -global isa-debugcon.iobase=0x402 -device usb-ehci,id=usb1 -device usb-tablet,id=usb-tablet1 -boot menu=on -enable-kvm -monitor stdio -monitor unix:/tmp/monitor2,server,nowait -device pcie-root-port,id=root1,chassis=1 -netdev tap,id=netdev0,vhost=on -device virtio-net-pci,mac=BA:BC:13:83:3F:1D,id=net0,netdev=netdev0,status=on -spice port=5900,disable-ticketing -qmp tcp:0:9999,server,nowait -drive file=/mnt/rhel75-ovmf-virtio.qcow2,format=qcow2,id=drive_sysdisk,if=none,cache=none,aio=native,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive_sysdisk,id=device_sysdisk,bus=root1,bootindex=1

(gdb) bt
#0  0x00007fffdf4821f7 in raise () from /lib64/libc.so.6
#1  0x00007fffdf4838e8 in abort () from /lib64/libc.so.6
#2  0x00005555557f71b0 in kvm_set_phys_mem (kml=0x555556df10a0, 
    section=0x7fffffffd560, add=true)
    at /usr/src/debug/qemu-2.10.0/accel/kvm/kvm-all.c:786
#3  0x00005555557e84f1 in address_space_update_topology_pass (
    as=as@entry=0x55555607f620 <address_space_memory>, 
    adding=adding@entry=true, new_view=0x55555a03ca80, 
    new_view=0x55555a03ca80, old_view=<optimized out>, 
    old_view=<optimized out>) at /usr/src/debug/qemu-2.10.0/memory.c:962
#4  0x00005555557e88a4 in address_space_set_flatview (
    as=as@entry=0x55555607f620 <address_space_memory>)
    at /usr/src/debug/qemu-2.10.0/memory.c:1037
#5  0x00005555557ea630 in memory_region_transaction_commit ()
    at /usr/src/debug/qemu-2.10.0/memory.c:1089
#6  0x000055555583a128 in pc_memory_init (pcms=pcms@entry=0x555556d80380, 
    system_memory=0x555556d3e780, rom_memory=rom_memory@entry=0x555556d3eb40, 
    ram_memory=ram_memory@entry=0x7fffffffd728)
    at /usr/src/debug/qemu-2.10.0/hw/i386/pc.c:1386
#7  0x000055555583ce30 in pc_q35_init (machine=0x555556d80380)
    at /usr/src/debug/qemu-2.10.0/hw/i386/pc_q35.c:148
#8  0x000055555590a4d8 in machine_run_board_init (machine=0x555556d80380)
    at hw/core/machine.c:760
#9  0x000055555579a6ef in main (argc=<optimized out>, argv=<optimized out>, 
    envp=<optimized out>) at vl.c:4645

Thanks!
yduan

Comment 7 Laszlo Ersek 2018-01-04 12:42:42 UTC
Hello Yanbin,

based on your +pdpe1gb result (and my earlier comments), I'm closing this as NOTABUG (for OVMF).

--*--

Regarding the QEMU crash with 8.2TB guest RAM -- it is an intentional abort() on QEMU's part:

    err = kvm_set_user_memory_region(kml, mem);
    if (err) {
        fprintf(stderr, "%s: error registering slot: %s\n", __func__,
                strerror(-err));
        abort();
    }
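
To connect this snippet with the message in comment 6: the code prints the function name followed by the strerror() text for the negated return value. A small sketch of the same formatting, assuming the return value was -EINVAL (on glibc, EINVAL is the errno whose text is "Invalid argument"); the helper name is hypothetical.

```python
import errno
import os


def slot_error_message(func_name, err):
    """Mimic the fprintf() above: err is a negative errno value."""
    return f"{func_name}: error registering slot: {os.strerror(-err)}"


# -EINVAL is the assumed return value behind "Invalid argument" in comment 6.
print(slot_error_message("kvm_set_phys_mem", -errno.EINVAL))
```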

If we wanted to investigate the error here, then a new BZ should be filed for qemu-kvm-rhev. However, the 8TB guest RAM size (which you successfully tested) is already way above the limit that RHV4 supports:

https://access.redhat.com/articles/906543
(Updated April 10 2017 at 9:08 AM)
- Maximum memory in virtualized guest: 4 TB

So, personally I don't think a new qemu-kvm-rhev RHBZ is necessary either.

Thanks!
Laszlo

Comment 8 yduan 2018-01-05 02:15:20 UTC
Hi Laszlo,

  Thanks for your detailed explanation.
  I will then file a new bug for qemu-kvm-rhev with low priority.
  It's valuable to track not only the fully supported memory (4T) but also the actual internal maximum memory from QE's perspective.

BR,
yduan

Comment 9 yduan 2018-01-05 02:27:54 UTC
I think it has the same root cause as bz1528149, so there is no need to file a new bug.

Comment 10 Laszlo Ersek 2018-01-05 10:58:40 UTC
(In reply to yduan from comment #8)
>   It's valuable to track not only the fully supported memory (4T) but also the
> actual internal maximum memory from QE's perspective.

Makes sense.

(In reply to yduan from comment #9)
> I think it has the same root cause as bz1528149, so there is no need to file
> a new bug.

Right, it seems to be the same.

Thanks!