Bug 1666941 - UEFI guest cannot boot into os when setting some special memory size
Summary: UEFI guest cannot boot into os when setting some special memory size
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: edk2
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: rc
Target Release: 8.0
Assignee: Laszlo Ersek
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-01-17 03:00 UTC by liuzi
Modified: 2020-11-14 16:54 UTC (History)
CC List: 14 users

Fixed In Version: edk2-20190308git89910a39dcfd-3.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-05 20:44:44 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
uefi guest cannot boot into os (18.59 KB, image/png)
2019-01-17 03:00 UTC, liuzi
reorder the 32-bit PCI hole vs. the PCIEXBAR on q35 (tarball of INSUFFICIENT edk2 patches) (3.37 KB, application/octet-stream)
2019-01-22 21:36 UTC, Laszlo Ersek


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHSA-2019:3338  0        None      None    None     2019-11-05 20:45:46 UTC
TianoCore               1814            0        None      None    None     2019-06-24 20:19:19 UTC
TianoCore               1859            0        None      None    None     2019-06-24 20:19:19 UTC

Internal Links: 1701710

Description liuzi 2019-01-17 03:00:01 UTC
Created attachment 1521179 [details]
uefi guest cannot boot into os

Description of problem:
RHEL UEFI guest cannot boot into os when setting some special memory size

Version-Release number of selected component (if applicable):
kernel-4.18.0-56.el8.x86_64
virt-manager-2.0.0-2.el8.noarch
edk2-ovmf-20180508gitee3198e672e2-8.el8.noarch
qemu-kvm-2.12.0-45.module+el8+2313+d65431a0.x86_64
libvirt-4.5.0-18.module+el8+2691+dc742e5d.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Install a RHEL UEFI guest in virt-manager and make sure the guest boots into the OS normally.
2. Change the guest memory size (for example, to 2049M or 1025M).
3. Reboot the guest after changing the memory size. The guest cannot boot into the OS normally and an error message pops up; please refer to the attached screenshot.

Actual results:
As described above.

Expected results:
Can boot into os normally after changing memory size to 1025 or 2049.

Additional info:
1. Cannot reproduce on a RHEL non-UEFI guest.

Comment 1 Laszlo Ersek 2019-01-17 14:01:58 UTC
(Gerd, can you please help?)

The root cause is an interface contract that went wrong between QEMU and
OVMF.

- In November 2015, a user reported the ticket "no pcie hotplug support for
  q35/ovmf", in the then-bug-tracker for edk2; namely at
  <https://github.com/tianocore/edk2/issues/32>.

- That bug tracker was torn down soon after, but I mirrored all OVMF bugs
  faithfully (and manually) to the new bug tracker, so you can still read
  through it at <https://bugzilla.tianocore.org/show_bug.cgi?id=75>.

- The relevant upstream thread is linked in
  <https://bugzilla.tianocore.org/show_bug.cgi?id=75#c25>, namely
  <http://thread.gmane.org/gmane.comp.bios.edk2.devel/8707>.

- Obviously, GMANE too has died since, so let me give you a working link to
  that thread:

  [edk2] [PATCH 0/5] OvmfPkg: enable PCIe on Q35
  Message-Id: <1457102794-25499-1-git-send-email-lersek>
  https://www.mail-archive.com/edk2-devel@lists.01.org/msg08570.html

- Of particular interest for the interface contract is the discussion in the
  sub-thread

  Re: [edk2] [PATCH 2/5] OvmfPkg: PlatformPei: enable PCIEXBAR (aka MMCONFIG / ECAM) on Q35
  Message-Id: <1457340448.25423.43.camel>
  https://www.mail-archive.com/edk2-devel@lists.01.org/msg08682.html

  In that thread, we agreed that on Q35, QEMU would not map any RAM into the
  [2GB, 3GB) GPA range. OVMF would program the EXBAR at 2GB, and the 32-bit
  PCI MMIO aperture would start at 2GB+256MB.

- As a result, I posted the v2 series:

  https://bugzilla.tianocore.org/show_bug.cgi?id=75#c27
  [edk2] [PATCH v2 0/6] OvmfPkg: enable PCIe on Q35
  Message-Id: <1457446804-18892-1-git-send-email-lersek>
  https://www.mail-archive.com/edk2-devel@lists.01.org/msg08743.html

  which was then merged as commit range 7e869eeb15b0..7daf2401d420.

- The relevant commits, which rely on the interface contract, are:

  OvmfPkg: PlatformPei: lower the 32-bit PCI MMIO base to 2GB on Q35
  https://github.com/tianocore/edk2/commit/b01acf6ea7e7

  OvmfPkg: PlatformPei: enable PCIEXBAR (aka MMCONFIG / ECAM) on Q35
  https://github.com/tianocore/edk2/commit/7b8fe63561b4

- The problem is that in reality, QEMU does not adhere to the interface
  contract. It does place RAM into the [2GB, 3GB) GPA range. See
  pc_q35_init() in "hw/i386/pc_q35.c":

>     /* Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory
>      * and 256 Mbytes for PCI Express Enhanced Configuration Access Mapping
>      * also known as MMCFG).
>      * If it doesn't, we need to split it in chunks below and above 4G.
>      * In any case, try to make sure that guest addresses aligned at
>      * 1G boundaries get mapped to host addresses aligned at 1G boundaries.
>      */
>     if (machine->ram_size >= 0xb0000000) {
>         lowmem = 0x80000000;
>     } else {
>         lowmem = 0xb0000000;
>     }
>
>     /* Handle the machine opt max-ram-below-4g.  It is basically doing
>      * min(qemu limit, user limit).
>      */
>     [...]
>
>     if (machine->ram_size >= lowmem) {
>         pcms->above_4g_mem_size = machine->ram_size - lowmem;
>         pcms->below_4g_mem_size = lowmem;
>     } else {
>         pcms->above_4g_mem_size = 0;
>         pcms->below_4g_mem_size = machine->ram_size;
>     }


If the full RAM size is smaller than 2816 MB (0xb0000000), then it is
allowed to extend up to the [2048MB, 2816MB) GPA range: the second branch
will be taken in the last "if" above.

And that breaks the interface contract. OVMF maps the 256MB PCIEXBAR, and
the start of the 32-bit PCI MMIO aperture, over the [2048MB, 2816MB) range,
and Bad Things Happen (TM).
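
To make the split concrete, here is a minimal Python sketch of the quoted pc_q35_init() logic. It is only a model under stated assumptions: the max-ram-below-4g handling that the excerpt elides is not reproduced, and the names q35_ram_split and EXBAR_BASE are made up for illustration. Only the constants come from the excerpt and from the 2GB EXBAR placement described above.

```python
MB = 1 << 20
GB = 1 << 30

def q35_ram_split(ram_size):
    """Return (below_4g, above_4g) following the quoted pc_q35_init() excerpt.

    Simplification: the max-ram-below-4g handling that the excerpt elides
    with [...] is not modeled here.
    """
    lowmem = 0x80000000 if ram_size >= 0xB0000000 else 0xB0000000
    if ram_size >= lowmem:
        return lowmem, ram_size - lowmem
    return ram_size, 0

# PCIEXBAR base that OVMF programs according to the contract (2GB).
EXBAR_BASE = 2 * GB

for size_mb in (1024, 2048, 2049, 2815, 2816, 4096):
    below, above = q35_ram_split(size_mb * MB)
    clash = below > EXBAR_BASE
    print(f"{size_mb:5d} MB: below_4g={below // MB:4d} MB, "
          f"above_4g={above // MB:4d} MB, RAM overlaps EXBAR: {clash}")
```

In this model, only sizes strictly between 2048 MB and 2816 MB place RAM above the 2GB EXBAR base; from 2816 MB upward the low-RAM part is capped at 2048 MB again.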

Unfortunately, I have no idea how to fix this. The Q35 memory split code is
very complex, and I'm not sure what the firmware should do in the first
place. I was happy that the contract was so simple. :(

Comment 2 Gerd Hoffmann 2019-01-17 15:19:42 UTC
> - The problem is that in reality, QEMU does not adhere to the interface
>   contract. It does place RAM into the [2GB, 3GB) GPA range. See
>   pc_q35_init() in "hw/i386/pc_q35.c":
> 
> >     /* Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory
> >      * and 256 Mbytes for PCI Express Enhanced Configuration Access Mapping
> >      * also known as MMCFG).
> >      * If it doesn't, we need to split it in chunks below and above 4G.
> >      * In any case, try to make sure that guest addresses aligned at
> >      * 1G boundaries get mapped to host addresses aligned at 1G boundaries.
> >      */
> >     if (machine->ram_size >= 0xb0000000) {
> >         lowmem = 0x80000000;
> >     } else {
> >         lowmem = 0xb0000000;
> >     }

Whoops.  Totally forgot this little detail.
I'm kinda surprised that it took *that* long to show up.
Seems nobody uses VMs with odd memory sizes.

> >     /* Handle the machine opt max-ram-below-4g.  It is basically doing
> >      * min(qemu limit, user limit).
> >      */

Setting max-ram-below-4g=2G probably helps?

> Unfortunately, I have no idea how to fix this. The Q35 memory split code is
> very complex, and I'm not sure what the firmware should do in the first
> place. I was happy that the contract was so simple. :(

Hmm, I think we should fix qemu to stop using lowmem = 0xb0000000.  Unfortunately
that'll work for new machine types only, for live migration compatibility reasons.
So we could fix the rhel-8 q35 machine types but rhel-7 q35 machine types would
stay broken.  Given that odd memory sizes seem to be used rarely in practice and
we have a workaround (max-ram-below-4g=2G) this might still be good enough though.

If we want to fix things on the edk2 side I'd suggest moving MMCFG as high as possible,
i.e. use the 0xe0000000 -> 0xefffffff range instead of the 0x80000000 -> 0x8fffffff range.
Advantage: we can stick to a fixed location.  Drawback: we'll have two windows for
io then (end-of-ram -> 0xdfffffff and 0xf0000000 -> ioapic-base), but I think edk2
resource management can deal with that.  I hope the qemu acpi generator can handle
it too.

Comment 4 Laszlo Ersek 2019-01-18 23:46:52 UTC
Hi Gerd, thank you for the help!

Unfortunately, the edk2 PCI infrastructure cannot currently deal with discontiguous MMIO apertures. The interface through which a platform expresses the MMIO apertures for a specific root bridge is

  MdeModulePkg/Include/Library/PciHostBridgeLib.h

and it only offers (under 4GB):

  PCI_ROOT_BRIDGE_APERTURE Mem;                   ///< MMIO aperture below 4GB which can be used by the root bridge.
  PCI_ROOT_BRIDGE_APERTURE PMem;                  ///< Prefetchable MMIO aperture below 4GB which can be used by the root bridge.

in the PCI_ROOT_BRIDGE structure.

The platform is supposed to:
- implement a PciHostBridgeLib instance in the platform-specific way,
- build "MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciHostBridgeDxe.inf" in the platform DSC file,
- hook the platform's PciHostBridgeLib instance into PciHostBridgeDxe.

Then PciHostBridgeDxe calls into the platform's PciHostBridgeLib instance (the PciHostBridgeGetRootBridges() function), and gets the root bridges with their apertures from platform-specific code.

In OVMF, the lib instance lives at "OvmfPkg/Library/PciHostBridgeLib", and it relies on dynamic PCDs that OvmfPkg/PlatformPei sets during the PEI phase. All root bridges share the same apertures (one per bitness).

... The reason that PlatformPei handles this information, as the very first agent in the firmware, is that the MMIO apertures affect the layout / size of the full address space, and dealing with that belongs in PlatformPei.

Anyway, here's what I suggest, as "action items":
- We should fix this as soon as we can, for the next available machine type. Honestly, I'd like to punt this part to you. :)
- I should test whether max-ram-below-4g=2G helps. Setting needinfo on myself for that.
- If it helps, we should update the RHEL documentation and explain that these memory sizes should not be used -- add a Known Issue. In practice, I think their impact is nil; nobody really uses such memory sizes.

Sound good?

Comment 6 Laszlo Ersek 2019-01-19 00:12:36 UTC
(In reply to Laszlo Ersek from comment #4)
> - I should test whether max-ram-below-4g=2G helps. Setting needinfo on
> myself for that.
> - If it helps, we should update the RHEL documentation and explain that
> these memory sizes should not be used -- add a Known Issue. In practice, I
> think their impact is nil; nobody really uses such memory sizes.

Sorry, I was unclear. I meant that we should document the Known Issue either way, and if "max-ram-below-4g=2G" works around it, then document the workaround as well.

However... we only support QEMU usage through libvirt, and I don't think libvirt exposes this knob. And hooking it in with <qemu:arg> taints the domain and makes it unsupportable again. So I guess we can't really fix it for old machine types.

Comment 7 Laszlo Ersek 2019-01-21 14:56:55 UTC
Gerd, how about this:

- Move the exbar base to fixed 0xe000_0000.
- Write off the GPA range [0xf000_0000, 0xfc00_0000), as unusable for 32-bit PCI MMIO aperture.
- Use the [TopOfLowRam, 0xe000_0000) range as 32-bit PCI MMIO aperture. (This would change the relative order of the 32-bit aperture and the exbar.)

Comparing the cases when the 32-bit window is largest:
- Currently OVMF exposes [0x9000_0000, 0xfc00_0000); that is, 0x6c00_0000 == 1728 MB.
- With the suggested change, OVMF would expose [0x8000_0000, 0xe000_0000) == 1536 MB (192 MB less).

In other words, we would trade
- 192MB of the 32-bit aperture, in the common "much RAM" case,
- for arbitrary RAM size support, in the uncommon "not much RAM" case.

Is it worth it?
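
The arithmetic behind that trade-off can be checked with a short Python sketch. The helper name size_mb is invented for illustration; the ranges are the ones listed above, all treated as half-open intervals.

```python
MB = 1 << 20

def size_mb(start, end):
    """Size in MB of the half-open GPA range [start, end)."""
    return (end - start) // MB

# 32-bit aperture above an EXBAR placed at 2GB (current layout).
current = size_mb(0x9000_0000, 0xFC00_0000)
# 32-bit aperture below an EXBAR placed at 0xE000_0000 (suggested layout).
proposed = size_mb(0x8000_0000, 0xE000_0000)
# Range above the relocated EXBAR that would be written off.
written_off = size_mb(0xF000_0000, 0xFC00_0000)

print(current, proposed, current - proposed, written_off)
```

The 192 MB difference is exactly the size of the written-off [0xf000_0000, 0xfc00_0000) range.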

Also, the change would be guest visible, if a guest were forward-migrated and then rebooted on the target host (which had a more recent OVMF).

Thanks.

Comment 9 Laszlo Ersek 2019-01-21 21:06:20 UTC
(In reply to Laszlo Ersek from comment #4)

> - I should test whether max-ram-below-4g=2G helps. Setting needinfo on
> myself for that.

I accidentally cleared the needinfo on myself; in order to avoid future instances of the same, let me re-route this question to QE :)

Michael, can you please repeat the test with the following addition:

  -machine max-ram-below-4g=2G

? Thanks!

Comment 10 Michael 2019-01-22 02:18:17 UTC
(In reply to Laszlo Ersek from comment #9)
> (In reply to Laszlo Ersek from comment #4)
> 
> > - I should test whether max-ram-below-4g=2G helps. Setting needinfo on
> > myself for that.
> 
> I accidentally cleared the needinfo on myself; in order to avoid future
> instances of the same, let me re-route this question to QE :)
> 
> Michael, can you please repeat the test with the following addition:
> 
>   -machine max-ram-below-4g=2G
> 
> ? Thanks!

Hi Laszlo:

Version-Release number:
kernel:4.18.0-60.el8.x86_64
qemu-kvm-3.1.0-4.module+el8+2681+819ab34d.x86_64


I tried 3 scenarios as follows:
[1] #/usr/libexec/qemu-kvm -machine max-ram-below-4g=2G
VNC server running on ::1:5900

==> Kernel and qemu can fully support this machine type. 

[2] If I use virtio-blk-pci, 
#/usr/libexec/qemu-kvm -enable-kvm -machine max-ram-below-4g=2G  -cpu SandyBridge \
-nodefaults -smp 4,cores=2,threads=2,sockets=1 -m 2049 -name win-OVMF \
-global driver=cfi.pflash01,property=secure,value=on  \
-drive file=/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,readonly=on \
-drive file=/tmp/win-OVMF/OVMF_VARS.fd,if=pflash,format=raw,unit=1,readonly=off \
-debugcon file:/home/win-OVMF.log -global isa-debugcon.iobase=0x402 \
-vnc :3 -vga qxl -monitor stdio \
-drive file=OVMF-win2016-virtio-win.qcow2,if=none,id=guest-img,format=qcow2,werror=stop,rerror=stop -device virtio-blk-pci,drive=guest-img,id=os-disk,bootindex=1 \
-netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:84:ed:01:00:09 \
-boot menu=on,splash-time=5000

==> The result is the same as in comment #0. 


[3] If I add device pcie-root-port, 
/usr/libexec/qemu-kvm -machine max-ram-below-4g=2G  -cpu SandyBridge -enable-kvm -m 2048 -smp 4 -nodefaults \
-global driver=cfi.pflash01,property=secure,value=on \
-drive file=/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,readonly=on \
-drive file=/tmp/rhel8/OVMF_VARS.fd,if=pflash,format=raw,unit=1,readonly=off \
-device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
-device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
-device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
-device pcie-root-port,id=pcie.0-root-port-5,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
-device virtio-scsi-pci,id=scsi0,bus=pcie.0-root-port-2,addr=0x0 \
-object secret,id=sec0,data=redhat \
-blockdev driver=luks,cache.direct=off,cache.no-flush=on,file.filename=OVMF-rhel8.luks,node-name=my_disk,file.driver=file,key-secret=sec0 \
-device scsi-hd,bus=scsi0.0,drive=my_disk \
-device virtio-net-pci,mac=24:be:05:15:d1:90,id=netdev1,vectors=4,netdev=net1,bus=pcie.0-root-port-3 -netdev tap,id=net1,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-vnc :1 -monitor stdio -vga qxl -boot menu=o

==> qemu-kvm cannot find the 'pcie.0' bus. 


I hope this information helps. 

Thanks

Comment 11 Gerd Hoffmann 2019-01-22 07:25:20 UTC
(In reply to Laszlo Ersek from comment #7)
> Gerd, how about this:
> 
> - Move the exbar base to fixed 0xe000_0000.
> - Write off the GPA range [0xf000_0000, 0xfc00_0000), as unusable for 32-bit
> PCI MMIO aperture.
> - Use the [TopOfLowRam, 0xe000_0000) range as 32-bit PCI MMIO aperture.
> (This would change the relative order of the 32-bit aperture and the exbar.)
> 
> Comparing the cases when the 32-bit window is largest:
> - Currently OVMF exposes [0x9000_0000, 0xfc00_0000); that is, 0x6c00_0000 ==
> 1728 MB.
> - With the suggested change, OVMF would expose [0x8000_0000, 0xe000_0000) ==
> 1536 MB (192 MB less).

Note that the [0x8000_0000, 0xe000_0000) window alignment is better.  For example
it is possible to fit a 1G pci bar.  Not sure how much of an advantage that is in
practice, such large bars are typically 64bit bars anyway so we don't have to fit
them into 32bit address space.

> In other words, we would trade
> - 192MB of the 32-bit aperture, in the common "much RAM" case,
> - for arbitrary RAM size support, in the uncommon "not much RAM" case.
> 
> Is it worth it?

If it works without qemu changes...
Have you tried?  Any acpi errors in the guest kernel log?  Does /proc/iomem look sane?

> Also, the change would be guest visible, if a guest were forward-migrated
> and then rebooted on the target host (which had a more recent OVMF).
> 
> Thanks.

Comment 12 Laszlo Ersek 2019-01-22 17:46:49 UTC
Hi Michael,

(In reply to Michael from comment #10)
> (In reply to Laszlo Ersek from comment #9)

> > Michael, can you please repeat the test with the following addition:
> > 
> >   -machine max-ram-below-4g=2G

> I tried 3 scenarios as follow:
> [...]

sorry, my request was not clear enough. I wrote above, "with the following addition".

I meant that you should please *add* the option to everything else that you already have on the command line.

In particular, the addition was not supposed to *replace* the Q35 machine type; it was supposed to set a machine property *for* the Q35 machine type.

In all three of your scenarios, QEMU ended up using the "pc" machine type.

In scenario [2], you get a black screen because OVMF simply doesn't boot on the "pc" (i440fx) machine type. Although you wrote "The result is same with comments#0", in reality that only applies to the end-user-visible symptom ("black screen"); internally, the problem is entirely different.

Similarly, in scenario [3], QEMU complains about not finding the "pcie.0" bus because i440fx has none.

Can you please repeat the test with the "max-ram-below-4g=2G" machine property added to everything else on the command line? For this, it is possible to

(a) use two -machine options, such as:

  -machine q35,[other options you usually have here] \
  -machine max-ram-below-4g=2G \

(b) or identically, use a single merged -machine option, such as:

  -machine q35,[other options you usually have here],max-ram-below-4g=2G

The reason that I couldn't immediately recommend a merged "-machine" option was that comment 0 doesn't contain a QEMU command line, so I couldn't modify it. I could only suggest to add (= append) "-machine max-ram-below-4g=2G".
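
As a rough illustration of why (a) and (b) are equivalent, and why a lone "-machine max-ram-below-4g=2G" ends up on the "pc" machine type, here is a simplified Python model. It is not QEMU's option parser: the function name merge_machine_opts is invented, the "pc" default merely mirrors what was observed in the three scenarios above, and tokens without "=" are treated as the machine type only.

```python
def merge_machine_opts(option_values, default_type="pc"):
    """Merge the values of repeated -machine options into one property dict.

    Simplified model: a token without '=' names the machine type, and later
    occurrences add to or override earlier ones. default_type reflects the
    fallback seen in the scenarios above; it is an assumption of this sketch.
    """
    merged = {"type": default_type}
    for value in option_values:
        for token in value.split(","):
            if not token:
                continue
            key, sep, val = token.partition("=")
            if sep:
                merged[key] = val
            else:
                merged["type"] = key
    return merged

two_options = merge_machine_opts(["q35", "max-ram-below-4g=2G"])   # form (a)
one_option = merge_machine_opts(["q35,max-ram-below-4g=2G"])       # form (b)
property_only = merge_machine_opts(["max-ram-below-4g=2G"])        # comment 10

print(two_options == one_option, property_only["type"])
```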

Thanks!

Comment 13 Laszlo Ersek 2019-01-22 21:32:44 UTC
Unfortunately, I think we *really* cannot fix this. Not the "PCIEXBAR
vs. low RAM" mapping issue specifically, but the general "failure to
boot OVMF with arbitrary small RAM sizes" issue.

I've now written a small series to address the first issue (i.e., for
moving the EXBAR to 0xE000_0000, and to move the 32-bit PCI hole below
it). I wrote a super-detailed commit message for the last patch (the
main one), and then I prepared to start testing all the boundary
conditions that I described in that patch. That was when I
(re-)discovered the real mess.

Note that comment #0 does not carry an OVMF debug log, so we can't
precisely tell what the issue was. The scenario we've discussed thus far
is valid, and a RAM size strictly larger than 2048MB and strictly
smaller than 2816MB *may* indeed fail the following assertion:

      ASSERT (TopOfLowRam <= PciExBarBase);

in MemMapInitialization() [OvmfPkg/PlatformPei/Platform.c].

However, comment #0 also mentions RAM size 1025MB as a trigger, not just
2049MB. And RAM size 1025MB falls entirely outside of the discussion
thus far; it does not interfere with the PCIEXBAR at 2GB. Instead, we
have a different assertion failure in that case:

> ASSERT_EFI_ERROR (Status = Out of Resources)
> ASSERT OvmfPkg/PlatformPei/MemDetect.c(702): !EFI_ERROR (Status)

which refers to the following code in QemuInitializeRam():

>     //
>     // Set memory range from the "top of lower RAM" (RAM below 4GB) to 4GB as
>     // uncacheable
>     //
>     Status = MtrrSetMemoryAttribute (LowerMemorySize,
>                SIZE_4GB - LowerMemorySize, CacheUncacheable);
>     ASSERT_EFI_ERROR (Status);

The MTRR library in edk2 (UefiCpuPkg/Library/MtrrLib/) implements a
complex algorithm for determining optimal utilization for the fixed and
variable memory type range registers. In some cases however, there
simply aren't enough registers available for splitting the specified
address space into power-of-two sized masks, such that the requested
coverage be produced precisely. In those cases, MtrrSetMemoryAttribute()
fails.
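
To give a feel for the register pressure, the following Python sketch tiles the uncacheable range [LowerMemorySize, 4GB) with naturally aligned power-of-two blocks, which is the shape a single variable MTRR can describe. This is a naive greedy model and not the MtrrLib algorithm, which is considerably smarter. The function name pow2_cover is invented, and the count of 8 variable MTRRs is an assumption about a typical x86 configuration, not something stated in this report.

```python
MB = 1 << 20
GB = 1 << 30

def pow2_cover(base, end):
    """Greedily split [base, end) into naturally aligned power-of-two blocks.

    Each (start, size) block is what one variable MTRR (base + mask) can
    describe. Naive model only; MtrrLib uses a more elaborate algorithm.
    """
    blocks = []
    while base < end:
        # Largest power of two that base is aligned to ...
        size = base & -base if base else 1 << (end.bit_length() - 1)
        # ... shrunk until the block fits into what is left.
        while size > end - base:
            size >>= 1
        blocks.append((base, size))
        base += size
    return blocks

VARIABLE_MTRRS = 8  # assumption: typical count, not taken from this report

for lower_mb in (1024, 1025, 2048, 2049):
    needed = len(pow2_cover(lower_mb * MB, 4 * GB))
    verdict = "fits" if needed <= VARIABLE_MTRRS else "does not fit"
    print(f"top of low RAM {lower_mb} MB: {needed} block(s), {verdict}")
```

In this model a single extra megabyte on top of 1024 MB or 2048 MB turns a one- or two-block range into one that needs a block for every clear bit up to the next large boundary.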

And this code in OVMF is already the result of a painful struggle;
please see:

  https://github.com/tianocore/edk2/commit/79d274b8b6b1

In that commit, we ensured that any particular *large* guest RAM size
would be well covered by MTRR (which was in turn required by the Linux
guest!), however, in exchange, we had to give up coverage for many small
RAM sizes.

And, now I believe that said general theme applies to the prior
discussion too. Large RAM sizes are more likely, and in those cases,
QEMU will also set the 32-bit RAM split at 2GB, and things will just
work.

If customers care much for small guests, we should likely determine a
limited set of well-working "small instance sizes", and document those.
For example, the following all work for me:
- 512 MB
- 768 MB
- 1024 MB
- 1536 MB
- 2048 MB
- 3072 MB
- 4096 MB

It pains me, but I have to close this BZ now as CANTFIX. I'm really
sorry. If necessary, please open a new BZ for the virt docs.

Comment 14 Laszlo Ersek 2019-01-22 21:36:01 UTC
Created attachment 1522513 [details]
reorder the 32-bit PCI hole vs. the  PCIEXBAR on q35 (tarball of INSUFFICIENT edk2 patches)

For posterity, I'm attaching the patches I've written for the PCIEXBAR <-> TopOfLowRam issue. They apply on upstream edk2 commit 8f470eb4768f. As explained above, they do not solve the general issue. There is no change to the CANTFIX resolution.

Comment 15 Michael 2019-01-23 09:13:45 UTC
(In reply to Laszlo Ersek from comment #12)
> Hi Michael,
> 
> (In reply to Michael from comment #10)
> > (In reply to Laszlo Ersek from comment #9)
> 
> > > Michael, can you please repeat the test with the following addition:
> > > 
> > >   -machine max-ram-below-4g=2G
> 
> > I tried 3 scenarios as follow:
> > [...]
> 
> 
> (a) use two -machine options, such as:
> 
>   -machine q35,[other options you usually have here] \
>   -machine max-ram-below-4g=2G \
> 
> (b) or identically, use a single merged -machine option, such as:
> 
>   -machine q35,[other options you usually have here],max-ram-below-4g=2G
> 
> The reason that I couldn't immediately recommend a merged "-machine" option
> was that comment 0 doesn't contain a QEMU command line, so I couldn't modify
> it. I could only suggest to add (= append) "-machine max-ram-below-4g=2G".
> 
> Thanks!


Hi Laszlo:

Sorry for the misunderstanding. I repeated the test with both (a) and (b). The guest works well in both scenarios. 
/usr/libexec/qemu-kvm \
-M q35 -M max-ram-below-4g=2G \
-cpu SandyBridge -enable-kvm -m 2049 -smp 4 -nodefaults \
-global driver=cfi.pflash01,property=secure,value=on \
-drive file=/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,readonly=on \
-drive file=/tmp/new-edk2-version/OVMF_VARS.fd,if=pflash,format=raw,unit=1,readonly=off \
-debugcon file:/home/win-OVMF.log -global isa-debugcon.iobase=0x402 \
-device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
-device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
-device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
-device pcie-root-port,id=pcie.0-root-port-5,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
-device virtio-scsi-pci,id=scsi0,bus=pcie.0-root-port-2,addr=0x0 \
-object secret,id=sec0,data=redhat \
-blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/mnt/test/personal/choma/guest-nfs/rhel8-edk2-23-Jan.luks,node-name=protocol-node \
-blockdev node-name=format-node,driver=luks,file=protocol-node,key-secret=sec0 \
-device scsi-hd,bus=scsi0.0,drive=format-node \
-drive file=/usr/share/edk2/ovmf/UefiShell.iso,if=none,cache=none,snapshot=off,aio=native,media=cdrom,id=cdrom1 -device ahci,id=ahci0 -device ide-cd,drive=cdrom1,id=ide-cd1 \
-device virtio-net-pci,mac=24:be:15:15:d1:91,id=netdev1,vectors=4,netdev=net1,bus=pcie.0-root-port-3 -netdev tap,id=net1,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-vnc :1 -monitor stdio -vga qxl -boot menu=on,splash-time=5000


I hope this information helps.

Comment 16 Laszlo Ersek 2019-01-23 10:36:48 UTC
Great, thank you. If it were really necessary, in the future we could expose the "max-ram-below-4g" machine property via libvirt, to solve at least a subset of the cases. (It wouldn't solve 1025MB, for example.)

Comment 17 Michael 2019-01-24 05:12:52 UTC
(In reply to Laszlo Ersek from comment #16)
> Great, thank you. If it were really necessary, in the future we could expose
> the "max-ram-below-4g" machine property via libvirt, to solve at least a
> subset of the cases. (It wouldn't solve 1025MB, for example.)

Hi Laszlo:

One more thing I want to confirm with you: does QE need to add the "max-ram-below-4g" parameter in future tests, or is this parameter only suitable for this situation? 

Thanks

Comment 18 Laszlo Ersek 2019-01-24 08:27:13 UTC
Hi Michael,

no, please don't include the "max-ram-below-4g" machine property in the QE test plans. It would only help in a subset of the problematic cases. There are multiple issues when the VM RAM size is smaller than 2816 MB, and the machine property in question would help with only some of them. In addition, you would be testing something that libvirt doesn't currently expose.

Instead, please document this BZ as a known issue in your test plans, and whenever you need a "small guest", use one of the RAM sizes from the end of comment 13.

Thanks!
Laszlo

Comment 19 Michael 2019-01-24 08:57:12 UTC
(In reply to Laszlo Ersek from comment #18)
> Hi Michael,
> 
> no, please don't include the "max-ram-below-4g" machine property in the QE
> test plans. It would only help in a subset of the problematic cases. There
> are multiple issues when the VM RAM size is smaller than 2816 MB, and the
> machine property in question would help with only some of them. In addition,
> you would be testing something that libvirt doesn't currently expose.
> 
> Instead, please document this BZ as a known issue in your test plans, and
> whenever you need a "small guest", use one of the RAM sizes from the end of
> comment 13.
> 
> Thanks!
> Laszlo


OK, makes sense. Thank you very much.

Comment 20 Laszlo Ersek 2019-05-03 16:04:10 UTC
The mtrr issue might be fixable after all, by lifting an idea from SeaBIOS that I'd have myself considered invalid. It seems to work though.

Comment 21 Laszlo Ersek 2019-05-04 00:10:42 UTC
Posted upstream patches:

[edk2-devel] [PATCH 0/4]
OvmfPkg/PlatformPei: fix two assertion failures with weird RAM sizes

http://mid.mail-archive.com/20190504000716.7525-1-lersek@redhat.com
https://edk2.groups.io/g/devel/message/39965
https://www.redhat.com/archives/edk2-devel-archive/2019-May/msg00080.html

Comment 22 Laszlo Ersek 2019-05-16 19:35:50 UTC
Upstream commits:

$ git log --oneline --reverse 3b7a897cd8e3..39b9a5ffe661 | cat -n
     1  60e95bf5094f OvmfPkg/PlatformPei: assign PciSize on both i440fx/q35 branches explicitly
     2  9a2e8d7c65ef OvmfPkg/PlatformPei: hoist PciBase assignment above the i440fx/q35 branching
     3  75136b29541b OvmfPkg/PlatformPei: reorder the 32-bit PCI window vs. the PCIEXBAR on q35
     4  39b9a5ffe661 OvmfPkg/PlatformPei: fix MTRR for low-RAM sizes that have many bits clear

Comment 26 Laszlo Ersek 2019-06-03 18:12:58 UTC
The upstream series for TianoCore#1859 consists of reverting the patches listed in comment 22, then adding two new (replacement) patches:

$ git log --oneline --reverse f03859ea6c8f..49edde15230a | cat -n
     1  305cd4f783fe Revert "OvmfPkg/PlatformPei: fix MTRR for low-RAM sizes that have many bits clear"
     2  eb4d62b0779c Revert "OvmfPkg/PlatformPei: reorder the 32-bit PCI window vs. the PCIEXBAR on q35"
     3  753d3d6f43b2 Revert "OvmfPkg/PlatformPei: hoist PciBase assignment above the i440fx/q35 branching"
     4  d45349841113 Revert "OvmfPkg/PlatformPei: assign PciSize on both i440fx/q35 branches explicitly"
     5  b07de0974b65 OvmfPkg: raise the PCIEXBAR base to 2816 MB on Q35
     6  49edde15230a OvmfPkg/PlatformPei: set 32-bit UC area at PciBase / PciExBarBase (pc/q35)
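
The effect of the two replacement patches can be sanity-checked with a small Python model. It rests on assumptions: the QEMU split is the one quoted in comment 1 without max-ram-below-4g, the "UC area starts at PciExBarBase on q35" reading is inferred from the commit title only, and the helper names are invented for illustration.

```python
MB = 1 << 20
GB = 1 << 30

def q35_top_of_low_ram(ram_size):
    """RAM below 4GB per the pc_q35_init() excerpt in comment 1.

    Simplification: max-ram-below-4g is not modeled.
    """
    lowmem = 0x80000000 if ram_size >= 0xB0000000 else 0xB0000000
    return min(ram_size, lowmem)

def pow2_blocks(base, end):
    """Count naturally aligned power-of-two blocks tiling [base, end), base > 0."""
    count = 0
    while base < end:
        size = base & -base
        while size > end - base:
            size >>= 1
        base += size
        count += 1
    return count

OLD_EXBAR = 2048 * MB  # EXBAR base before the fix
NEW_EXBAR = 2816 * MB  # EXBAR base per commit b07de0974b65

# RAM sizes (in MB) for which "TopOfLowRam <= PciExBarBase" would not hold.
sizes = [mb * MB for mb in range(1, 8193)]
old_failures = [s // MB for s in sizes if q35_top_of_low_ram(s) > OLD_EXBAR]
new_failures = [s // MB for s in sizes if q35_top_of_low_ram(s) > NEW_EXBAR]

# Blocks needed for a UC area that starts at the new EXBAR base (assumed reading).
uc_blocks = pow2_blocks(NEW_EXBAR, 4 * GB)

print(old_failures[0], old_failures[-1], len(new_failures), uc_blocks)
```

Under these assumptions no RAM size can reach the new EXBAR base, and a UC area anchored at a fixed base needs the same small number of blocks regardless of the guest RAM size.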

Comment 30 Miroslav Rezanina 2019-06-11 04:14:24 UTC
Fix included in edk2-20190308git89910a39dcfd-3.el8

Comment 32 Michael 2019-06-26 06:36:55 UTC
Hi all:

I am verifying this Bug. 

First of all, I can reproduce this Bug using 'edk2-ovmf-20190308git89910a39dcfd-2.el8.noarch'. 

Boot the guest with special memory size.

/usr/libexec/qemu-kvm -M q35 -cpu SandyBridge -enable-kvm -m 2048/1025/4097 -smp 4 -nodefaults \
... ...


I got a black screen. 



Then I updated the edk2 package. 

kernel:4.18.0-107.el8.x86_64
qemu-kvm-4.0.0-4.module+el8.1.0+3356+cda7f1ee.x86_64
*edk2-ovmf-20190308git89910a39dcfd-4.el8.noarch*

I booted the same guest using the same command line. The guest works well. I did *not* use the following option:
         
     -M max-ram-below-4g=2G. 


Thus, I mark this Bug as verified. 



Thanks.

Comment 35 errata-xmlrpc 2019-11-05 20:44:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:3338

