Bug 1838633

Summary: UEFI HTTP out of memory error when booting larger LiveCD
Product: [Fedora] Fedora
Component: grub2
Version: 32
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Reporter: Lukas Zapletal <lzap>
Assignee: Peter Jones <pjones>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: fmartine, lkundrak, pjones
Fixed In Version: grub2-2.04-19.fc32
Last Closed: 2020-05-29 04:09:24 UTC
Type: Bug
Attachments:
  OOM grub error (flags: none)
  OOM kernel error (flags: none)
  Screenshot from MEN server (Intel Atom) (flags: none)
  Screen from libvirt with debug=all (flags: none)

Description Lukas Zapletal 2020-05-21 13:22:36 UTC
Created attachment 1690644
OOM grub error

Hello,

We have a customer who would like to do UEFI HTTP Boot over a tagged VLAN. Unfortunately, this does not work. GRUB prints an "Out of memory" error for a moment (see the attachment), and then the kernel prints an error about not being able to open the root device.

For the record, this is booting a LiveCD over UEFI HTTP Boot in a libvirt VM, with VLAN ID 13 set in the EFI firmware.

  menuentry 'Foreman Discovery Image EFI' --id discovery {
    linuxefi boot/fdi-image/vmlinuz0 rootflags=loop root=live:/fdi.iso rootfstype=auto ro rd.live.image acpi=force rd.luks=0 rd.md=0 rd.dm=0 rd.lvm=0 rd.bootif=0 rd.neednet=0 nokaslr nomodeset proxy.url=https://sat68.nat.lan proxy.type=foreman BOOTIF=01-$net_default_mac fdi.vlan=13
    initrdefi boot/fdi-image/initrd0.img
  }

I am using the latest grub2 from Fedora:

https://koji.fedoraproject.org/koji/buildinfo?buildID=1509008

Comment 1 Lukas Zapletal 2020-05-21 13:23:30 UTC
Created attachment 1690645
OOM kernel error

Comment 2 Lukas Zapletal 2020-05-21 13:30:29 UTC
I am seeing the same behaviour on a native (untagged) network as well.

Comment 3 Javier Martinez Canillas 2020-05-21 14:19:20 UTC
Hello Lukas,

Could you please set debug=all to get more information on where this out of memory is happening?

Comment 4 Lukas Zapletal 2020-05-25 10:32:36 UTC
Hello Javier,

I was on a call with a customer who tried the same on their hardware, with the same result. We tried with

set debug="http,efinet,net"

Unfortunately, not much is logged. I am attaching the screenshot from their hardware, but it reads the same. When we tried debug=all, the output never stopped and kept scrolling for an hour. Then the error appears, and "Press a key to continue" actually just waits a few seconds before trying to boot the system. So we are not able to reliably capture anything for you. Please advise how to capture the debug output; maybe I could record a video and then seek through the recording. Or maybe you can give me an option to prevent GRUB from booting when this error appears.

You can probably reproduce this in libvirt too; you just need to boot a big enough live CD. You can use Fedora or, in our case, the Discovery Image, which is a 300 MB RHEL7/CentOS7 image created with livecd-creator: http://downloads.theforeman.org/discovery/nightly/fdi-image-latest.tar

Comment 5 Lukas Zapletal 2020-05-25 10:43:29 UTC
Created attachment 1691896
Screenshot from MEN server (Intel Atom)

Comment 6 Lukas Zapletal 2020-05-25 10:49:24 UTC
Created attachment 1691898
Screen from libvirt with debug=all

Comment 7 Javier Martinez Canillas 2020-05-25 16:28:30 UTC
I'm changing this to Fedora 32 since it is a regression in GRUB 2.04. It doesn't affect RHEL7 and RHEL8.

Comment 8 Javier Martinez Canillas 2020-05-25 16:35:34 UTC
The memory allocation failure happens in the verifiers framework, because it reads each file to be verified as a single chunk before passing it to the verifier modules (e.g. the tpm module). So the issue happens when an initrd image is large (in Lukas' test the initrd size is 237 MiB) and GRUB tries to verify it.

By default, GRUB requests a quarter of the available system memory for its heap, which is enough for most cases but not when using large initrd images and the verifiers framework. One option could be to change GRUB's default to request a larger share of the available memory for the heap.

Comment 9 Javier Martinez Canillas 2020-05-25 18:17:33 UTC
I found the issue: now that the tpm module is built into the EFI binary, GRUB allocates two buffers to read the initrd image.

One buffer is allocated in the Linux EFI loader and used as a bounce buffer, because some machines aren't able to DMA above 4 GB during EFI. The other buffer is allocated by the verifiers framework, as mentioned in Comment 8, to read the file and pass it as a single chunk to the tpm module and other modules using the verifiers API.

Since the initrd image is quite big, there isn't enough memory in the heap to allocate two buffers of that size. But when using the verifiers framework, the read operation is just a memory copy from the buffer that was used to read the file in the verifiers open handler, so there is no need for a bounce buffer in the Linux EFI loader anymore.
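
To illustrate the arithmetic behind Comments 8 and 9, here is a rough Python sketch of the heap budget. It is only a back-of-the-envelope model, not GRUB code: it assumes the heap is exactly a quarter of system memory and that nothing else occupies it, and the 1 GiB guest size and all function names are hypothetical. Only the 237 MiB initrd size and the quarter-of-memory default come from the comments above.

```python
# Simplified model of GRUB's default heap budget (not actual GRUB code).
# Assumption: heap == total memory / 4, with no other allocations and no
# fragmentation. Real behaviour will be less favourable than this.

MIB = 1024 * 1024
INITRD_SIZE = 237 * MIB  # initrd size reported in Comment 8


def heap_size(total_memory: int) -> int:
    """Heap requested by default: a quarter of the available memory."""
    return total_memory // 4


def fits(total_memory: int, buffers: int, initrd_size: int = INITRD_SIZE) -> bool:
    """True if `buffers` full-size copies of the initrd fit in the heap."""
    return buffers * initrd_size <= heap_size(total_memory)


def min_memory(buffers: int, initrd_size: int = INITRD_SIZE) -> int:
    """Smallest total memory whose quarter-size heap holds all the copies."""
    return 4 * buffers * initrd_size


if __name__ == "__main__":
    vm_memory = 1024 * MIB  # hypothetical 1 GiB guest
    print("verifiers buffer only fits:", fits(vm_memory, buffers=1))
    print("verifiers + bounce buffer fit:", fits(vm_memory, buffers=2))
    print("MiB of RAM needed for two buffers:", min_memory(2) // MIB)
```

Under these assumptions, a single 237 MiB buffer fits in the 256 MiB heap of a 1 GiB machine, while two buffers would need at least 1896 MiB of RAM. That is why dropping the bounce buffer is enough to make the allocation succeed on smaller machines.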

Comment 10 Fedora Update System 2020-05-26 16:44:27 UTC
FEDORA-2020-193b04db8e has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-193b04db8e

Comment 11 Fedora Update System 2020-05-27 02:21:27 UTC
FEDORA-2020-193b04db8e has been pushed to the Fedora 32 testing repository.
In a short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-193b04db8e`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-193b04db8e

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 12 Fedora Update System 2020-05-29 04:09:24 UTC
FEDORA-2020-193b04db8e has been pushed to the Fedora 32 stable repository.
If the problem still persists, please make a note of it in this bug report.