Bug 1789335

Summary: VM with edk2 can't boot when setting memory with '-m 2001'
Product: Red Hat Enterprise Linux 8
Reporter: Xueqiang Wei <xuwei>
Component: edk2
Assignee: Laszlo Ersek <lersek>
Status: CLOSED ERRATA
QA Contact: Xueqiang Wei <xuwei>
Severity: high
Priority: high
Version: 8.2
CC: berrange, chayang, coli, jinzhao, juzhang, kraxel, lersek, pbonzini, philmd
Target Milestone: rc
Keywords: Regression
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: edk2-20190829git37eef91017ad-6.el8
Last Closed: 2020-04-28 16:02:22 UTC
Type: Bug
Attachments: xml file for guest (flags: none)

Description Xueqiang Wei 2020-01-09 11:43:26 UTC
Created attachment 1650942
xml file for guest

Description of problem:


If the memory is set to 2048576 KiB, the guest can't enter the BIOS setup and can't boot up.

e.g.
 <memory unit='KiB'>2048576</memory>
 <currentMemory unit='KiB'>2048576</currentMemory>

If the memory is set to 2097152 KiB, it works well.

e.g.
 <memory unit='KiB'>2097152</memory>
 <currentMemory unit='KiB'>2097152</currentMemory>
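The failing and working sizes are easier to compare in the MiB units that QEMU's "-m" option uses. A minimal sketch, assuming libvirt rounds the KiB value up to a whole MiB when it builds "-m" (this matches the "-m 2001" in the QEMU command line below, but the rounding rule is an assumption here); kib_to_qemu_mib is a hypothetical helper name:

```python
# Convert the libvirt <memory unit='KiB'> values from this report into the
# MiB count passed to QEMU's -m option.
# ASSUMPTION: the KiB size is rounded UP to a whole MiB.

KIB_PER_MIB = 1024


def kib_to_qemu_mib(kib: int) -> int:
    """Round a size in KiB up to whole MiB (ceiling division)."""
    return -(-kib // KIB_PER_MIB)


for kib in (2048576, 2097152):
    exact = kib / KIB_PER_MIB
    print(f"{kib} KiB = {exact} MiB -> -m {kib_to_qemu_mib(kib)}")
```

It prints "2048576 KiB = 2000.5625 MiB -> -m 2001" for the failing value and "2097152 KiB = 2048.0 MiB -> -m 2048" for the working one. Since 2001 is odd, the failing size is not a multiple of 2 MiB (the x86-64 large-page size), whereas 2048 MiB is.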



Version-Release number of selected component (if applicable):
Host:
kernel-4.18.0-165.el8.x86_64
qemu-kvm-4.2.0-4.module+el8.2.0+5220+e82621dc
edk2-ovmf-20190829git37eef91017ad-4.el8.noarch


How reproducible:
5/5

Steps to Reproduce:
1. Start a guest with net-client.xml (see the attachment).

# virsh define net-client.xml
Domain net-client defined from net-client.xml

# virsh start net-client
Domain net-client started

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 7     net-server                     running
 21    net-client                     running


2. Connect to the guest:
# remote-viewer vnc://10.66.8.143:5901


Actual results:
After step 2, the guest can't enter the BIOS setup and can't boot up.

# remote-viewer vnc://10.66.8.143:5901
Guest has not initialized the display (yet).


The log shows:
SMM exception at access (0x7C101010)
It is invoked from the instruction before IP(0x7D0F6F4F) in module (/builddir/build/BUILD/edk2-37eef91017ad/Build/Ovmf3264/DEBUG_GCC5/X64/MdeModulePkg/Core/PiSmmCore/PiSmmCore/DEBUG/PiSmmCore.dll)
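To relate the faulting address to the page-table levels, a small sketch can decompose 0x7C101010 into x86-64 4-level paging indices, treating it as a linear address. The helper name split_va is hypothetical; the bit ranges are the architectural ones (PML4 bits 47-39, PDPT 38-30, PD 29-21, PT 20-12).

```python
# Decompose an x86-64 linear address into 4-level paging indices.
# Illustrative only: split_va() is a hypothetical helper, not part of edk2.


def split_va(addr: int) -> dict:
    """Return PML4/PDPT/PD/PT indices plus 4 KiB and 2 MiB page offsets."""
    return {
        "pml4": (addr >> 39) & 0x1FF,
        "pdpt": (addr >> 30) & 0x1FF,
        "pd": (addr >> 21) & 0x1FF,
        "pt": (addr >> 12) & 0x1FF,
        "offset_4k": addr & 0xFFF,       # offset within a 4 KiB page
        "offset_2m": addr & 0x1FFFFF,    # offset within a 2 MiB page
        "base_2m": addr & ~0x1FFFFF,     # start of the enclosing 2 MiB page
    }


if __name__ == "__main__":
    fault = 0x7C101010  # address from the SMM exception log above
    for key, value in split_va(fault).items():
        print(f"{key:10} {value:#x}")
```

For 0x7C101010 this gives PDPT index 0x1, PD index 0x1e0, PT index 0x101 and a 2 MiB page base of 0x7c000000, so the address lies 0x101010 bytes into its 2 MiB page.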


Expected results:
The guest boots up normally.



Additional info:

Host memory:

# free -m
              total        used        free      shared  buff/cache   available
Mem:           7684        2586        4782          37         315        4800
Swap:          7975         240        7735


QEMU command line:
# cp /usr/share/edk2/ovmf/OVMF_VARS.fd /home/kvm_autotest_root/images/rhel820-64-virtio-scsi.qcow2.fd

/usr/libexec/qemu-kvm \
        -name guest=net-client,debug-threads=on \
        -S \
        -machine pc-q35-rhel8.1.0,accel=kvm,usb=off,dump-guest-core=off \
        -cpu IvyBridge-IBRS,ss=on,pcid=on,hypervisor=on,arat=on,tsc-adjust=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaveopt=on,ibpb=on,skip-l1dfl-vmentry=on \
        -m 2001 \
        -smp 4,sockets=1,cores=4,threads=1 \
        -uuid 6151ce83-f196-462c-995b-a87ef8c31f6c \
        -no-user-config \
        -nodefaults \
        -rtc base=utc,driftfix=slew \
        -global kvm-pit.lost_tick_policy=delay \
        -no-hpet \
        -no-shutdown \
        -global ICH9-LPC.disable_s3=1 \
        -global ICH9-LPC.disable_s4=1 \
        -boot menu=on,splash-time=5000,strict=on \
        -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
        -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
        -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
        -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
        -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
        -device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
        -device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
        -device pcie-root-port,port=0x17,chassis=8,id=pci.8,bus=pcie.0,addr=0x2.0x7 \
        -global driver=cfi.pflash01,property=secure,value=on \
        -blockdev node-name=file_ovmf_code,driver=file,read-only=on,filename=/usr/share/OVMF/OVMF_CODE.secboot.fd \
        -blockdev node-name=file_ovmf_vars,driver=file,filename=/home/kvm_autotest_root/images/rhel820-64-virtio-scsi.qcow2.fd \
        -machine pflash0=file_ovmf_code,pflash1=file_ovmf_vars \
        -device virtio-scsi-pci,id=scsi0,bus=pci.2,addr=0x0 \
        -blockdev node-name=file_image1,driver=file,aio=native,filename=/home/kvm_autotest_root/images/net-client.qcow2,cache.direct=on,cache.no-flush=off \
        -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
        -device scsi-hd,id=image1,drive=drive_image1,write-cache=on,bus=scsi0.0,channel=0,scsi-id=0,lun=0 \
        -device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.3,addr=0x0 \
        -device usb-tablet,id=input0,bus=usb.0,port=1 \
        -vnc 0.0.0.0:1 \
        -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pcie.0,addr=0x1 \
        -device virtio-balloon-pci,id=balloon0,bus=pci.6,addr=0x0 \
        -object rng-random,id=objrng0,filename=/dev/urandom \
        -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.7,addr=0x0 \
        -global isa-debugcon.iobase=0x402 \
        -debugcon file:/tmp/ovmf_test.log \
        -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
        -msg timestamp=on \
        -monitor stdio

Comment 1 Laszlo Ersek 2020-01-09 23:28:06 UTC
I can reproduce the issue.

My debugging & analysis indicates it is a regression from upstream
commit 4eee0cc7cc0d ("UefiCpuPkg/PiSmmCpu: Enable 5 level paging when
CPU supports", 2019-07-12). That commit was released as part of upstream
tag edk2-stable201908.

For downstream, it means this problem is a regression from BZ#1748180 /
"edk2-20190829git37eef91017ad-2.el8" -- that's where we rebased
downstream to edk2-stable201908.

Using "edk2-20190308git89910a39dcfd-6.el8" (the last official
development build in Brew, from before BZ#1748180), the symptom
disappears.

I've posted the upstream fix now:

* [edk2-devel] [PATCH]
  UefiCpuPkg/PiSmmCpuDxeSmm: fix 2M->4K page splitting regression for PDEs

  https://edk2.groups.io/g/devel/message/53098
  http://mid.mail-archive.com/20200109232512.11022-1-lersek@redhat.com
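The fix title refers to splitting a 2 MiB page-directory entry (PDE) into 4 KiB pages. As a conceptual illustration only (not the edk2 PiSmmCpuDxeSmm code, and not a model of the actual bug or fix), a 2 MiB mapping is replaced by a page table of 512 PTEs covering the same range, with the PS bit (bit 7) cleared. The function name split_2m_pde and the tuple representation of entries are hypothetical, and the PAT bit is ignored.

```python
# Conceptual model of splitting one 2 MiB page-directory entry (PDE) into a
# page table of 512 4 KiB entries (PTEs). Hypothetical sketch, NOT the edk2
# PiSmmCpuDxeSmm implementation. Entries are (physical_address, flags) tuples;
# the flag values are the architectural x86-64 bit positions. The PAT bit
# (bit 12 in a 2 MiB PDE, bit 7 in a PTE) is ignored: callers are assumed
# not to set it.

PRESENT = 1 << 0
READ_WRITE = 1 << 1
PAGE_SIZE = 1 << 7  # PS bit: set in a PDE that maps a 2 MiB page directly

SIZE_4K = 0x1000
SIZE_2M = 0x200000
ENTRIES_PER_TABLE = SIZE_2M // SIZE_4K  # 512


def split_2m_pde(base_2m: int, flags: int) -> list:
    """Return 512 PTEs that map the same 2 MiB range as the original PDE."""
    if base_2m % SIZE_2M:
        raise ValueError("a 2 MiB page must start on a 2 MiB boundary")
    # Do not copy the PS bit: in a PTE, bit 7 has a different meaning.
    pte_flags = flags & ~PAGE_SIZE
    return [(base_2m + i * SIZE_4K, pte_flags) for i in range(ENTRIES_PER_TABLE)]


if __name__ == "__main__":
    # The 2 MiB page that contains the faulting address 0x7C101010 from the log.
    ptes = split_2m_pde(0x7C000000, PRESENT | READ_WRITE | PAGE_SIZE)
    print(len(ptes), hex(ptes[0][0]), hex(ptes[-1][0]))  # 512 0x7c000000 0x7c1ff000
```

Splitting is typically needed when different 4 KiB pieces of one 2 MiB region must get different attributes, for example when only part of the region may be accessed.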

Comment 2 Laszlo Ersek 2020-01-17 09:50:59 UTC
(In reply to Laszlo Ersek from comment #1)

> I've posted the upstream fix now:
> 
> * [edk2-devel] [PATCH]
>   UefiCpuPkg/PiSmmCpuDxeSmm: fix 2M->4K page splitting regression for PDEs
> 
>   https://edk2.groups.io/g/devel/message/53098
>   http://mid.mail-archive.com/20200109232512.11022-1-lersek@redhat.com

The fix has been merged upstream as commit a52355624440.

Comment 8 Xueqiang Wei 2020-02-04 05:07:49 UTC
Tested on edk2-20190829git37eef91017ad-6.el8 and did not hit this issue, so setting the status to VERIFIED.

Details:

Host:
kernel-4.18.0-175.el8.x86_64
qemu-kvm-4.2.0-7.module+el8.2.0+5520+4e5817f3
edk2-ovmf-20190829git37eef91017ad-6.el8.noarch

Guest:
kernel-4.18.0-147.el8.x86_64



1. Boot the guest with "-m 2001":

/usr/libexec/qemu-kvm \
        -name guest=net-client,debug-threads=on \
        -S \
        -machine pc-q35-rhel8.1.0,accel=kvm,usb=off,dump-guest-core=off \
        -cpu 'Opteron_G5',+kvm_pv_unhalt  \
        -m 2001 \
        -smp 4,sockets=1,cores=4,threads=1 \
        -uuid 6151ce83-f196-462c-995b-a87ef8c31f6c \
        -no-user-config \
        -nodefaults \
        -rtc base=utc,driftfix=slew \
        -global kvm-pit.lost_tick_policy=delay \
        -no-hpet \
        -no-shutdown \
        -global ICH9-LPC.disable_s3=1 \
        -global ICH9-LPC.disable_s4=1 \
        -boot menu=on,splash-time=5000,strict=on \
        -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
        -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
        -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
        -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
        -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
        -device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
        -device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
        -device pcie-root-port,port=0x17,chassis=8,id=pci.8,bus=pcie.0,addr=0x2.0x7 \
        -global driver=cfi.pflash01,property=secure,value=on \
        -blockdev node-name=file_ovmf_code,driver=file,read-only=on,filename=/usr/share/OVMF/OVMF_CODE.secboot.fd \
        -blockdev node-name=file_ovmf_vars,driver=file,filename=/home/kvm_autotest_root/images/rhel810-64-virtio-scsi.qcow2.fd \
        -machine pflash0=file_ovmf_code,pflash1=file_ovmf_vars \
        -device virtio-scsi-pci,id=scsi0,bus=pci.2,addr=0x0 \
        -blockdev node-name=file_image1,driver=file,aio=native,filename=/home/kvm_autotest_root/images/rhel810-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
        -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
        -device scsi-hd,id=image1,drive=drive_image1,write-cache=on,bus=scsi0.0,channel=0,scsi-id=0,lun=0 \
        -device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.3,addr=0x0 \
        -device usb-tablet,id=input0,bus=usb.0,port=1 \
        -vnc 0.0.0.0:1 \
        -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pcie.0,addr=0x1 \
        -device virtio-balloon-pci,id=balloon0,bus=pci.6,addr=0x0 \
        -object rng-random,id=objrng0,filename=/dev/urandom \
        -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.7,addr=0x0 \
        -global isa-debugcon.iobase=0x402 \
        -debugcon file:/tmp/ovmf_test.log \
        -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
        -msg timestamp=on \
        -monitor stdio


After step 1, the guest boots up successfully.


Setting the memory to 2048, 4001, 4096 and 15360 MiB also works well.


Also tested on the slow train (qemu-kvm-2.12.0-97.module+el8.2.0+5545+14c6799f); it works well too.

Comment 10 errata-xmlrpc 2020-04-28 16:02:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1712