Bug 2125069

Summary: Please update grub2 to support PE/COFF decompressor
Product: Red Hat Enterprise Linux 9 Reporter: Jeremy Linton (ARM) <jlinton>
Component: grub2 Assignee: Bootloader engineering team <bootloader-eng-team>
Status: CLOSED MIGRATED QA Contact: Release Test Team <release-test-team-automation>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 9.4 CC: jaredz, lersek, mlewando, pbrobinson, raravind
Target Milestone: rc Keywords: MigratedToJIRA, OtherQA, Triaged
Target Release: ---   
Hardware: aarch64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-09-16 17:08:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2162369    
Bug Blocks: 2103803    

Description Jeremy Linton (ARM) 2022-09-07 21:35:30 UTC
Description of problem: There is a compressed aarch64 kernel patch set on the list: https://lore.kernel.org/linux-arm-kernel/20220827083850.2702465-2-ardb@kernel.org/T/

That set removes the compromise in which one either ships an uncompressed, signed EFI PE/COFF image or gzips it, which forces bootloaders to decompress the image and ties the supported decompression algorithms to the boot loader. Neither choice is good.

Given the above set, it's possible to have a signed EFI PE/COFF image that self-decompresses and places the kernel correctly in memory.

The only problem is that grub2 in Fedora and RHEL has deviated significantly from upstream, which continues to use the EFI boot services to load and execute the image on arm64 platforms. This causes two problems. First, the arm64 magic number that identifies a PE+raw kernel image is no longer in the image, which simply identifies itself as a correct PE/COFF image, so the grub-specific check for that magic number needs to be dropped.
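To illustrate the first problem, here is a minimal sketch (not the grub code) of the difference between a check that insists on the arm64 magic and one that accepts any well-formed PE/COFF image. The offset 56 and the value "ARM\x64" come from the kernel's arm64 image header documentation; the function names and the `require_arm64_magic` switch are hypothetical and exist only for this example.

```python
import struct

# Raw arm64 Image header: magic "ARM\x64" at byte offset 56 (per the
# kernel's arm64 booting documentation).
ARM64_MAGIC_OFFSET = 56
ARM64_IMAGE_MAGIC = b"ARM\x64"
# DOS header field (e_lfanew) holding the offset of the "PE\0\0" signature.
PE_OFFSET_FIELD = 0x3C


def is_pe_coff(image: bytes) -> bool:
    """True if the buffer starts with an MZ header pointing at a PE signature."""
    if len(image) < 0x40 or image[:2] != b"MZ":
        return False
    (pe_off,) = struct.unpack_from("<I", image, PE_OFFSET_FIELD)
    return image[pe_off:pe_off + 4] == b"PE\0\0"


def has_arm64_magic(image: bytes) -> bool:
    """True if the arm64 raw-image magic is present at its usual offset."""
    end = ARM64_MAGIC_OFFSET + len(ARM64_IMAGE_MAGIC)
    return image[ARM64_MAGIC_OFFSET:end] == ARM64_IMAGE_MAGIC


def accept_kernel(image: bytes, require_arm64_magic: bool = False) -> bool:
    """Hypothetical loader gate: with require_arm64_magic=True it mimics a
    strict check that rejects a plain PE/COFF image lacking the arm64 magic."""
    if not is_pe_coff(image):
        return False
    return has_arm64_magic(image) or not require_arm64_magic
```

With `require_arm64_magic=True`, an image that is valid PE/COFF but carries no arm64 magic is rejected, which is the failure mode described above; relaxing the gate to the PE/COFF check alone accepts it.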

Secondly, grub2 assumes that the file length equals header + text + initialized data and that the COFF image size equals file length + BSS, and it simply memsets the difference as a replacement for the boot services LoadImage/StartImage. This latter assumption works fine only if the raw and virtual sizes of the text, data, etc. are all equal, or there isn't any data following the sections described by the PE/COFF header. Ideally, the boot services should be used to place the PE/COFF sections in memory, but as an alternative, it's possible to further "hack" the existing boot flow by computing the raw/on-disk image length from the section data, which also seems to fix the general problem.

My understanding is that this image decompression/signature compromise is part of the problem with actually getting a secure boot path signed on arm, although after looking at the grub code, I'm a bit skeptical that it's the only issue.
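The section-based length calculation can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: it reads the standard PE/COFF section table, takes the largest `PointerToRawData + SizeOfRawData` as the on-disk length, and copies each section to its virtual address in a zero-filled buffer of `SizeOfImage` bytes. The names `parse_pe`, `raw_length`, and `load_image` are hypothetical.

```python
import struct
from collections import namedtuple

Section = namedtuple("Section", "name vsize vaddr raw_size raw_ptr")


def parse_pe(image: bytes):
    """Return (SizeOfImage, SizeOfHeaders, sections) from a PE/COFF image."""
    if len(image) < 0x40 or image[:2] != b"MZ":
        raise ValueError("not an MZ image")
    (pe_off,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_off + 4                       # 20-byte COFF file header
    (nsections,) = struct.unpack_from("<H", image, coff + 2)
    (opt_size,) = struct.unpack_from("<H", image, coff + 16)
    opt = coff + 20                         # optional header
    # SizeOfImage and SizeOfHeaders sit at offsets 56 and 60 of the
    # optional header for both PE32 and PE32+.
    size_of_image, size_of_headers = struct.unpack_from("<II", image, opt + 56)
    table = opt + opt_size                  # section table, 40 bytes per entry
    sections = [
        Section(*struct.unpack_from("<8sIIII", image, table + 40 * i))
        for i in range(nsections)
    ]
    return size_of_image, size_of_headers, sections


def raw_length(size_of_headers: int, sections) -> int:
    """On-disk length covered by the headers and section raw data."""
    end = size_of_headers
    for s in sections:
        if s.raw_size:
            end = max(end, s.raw_ptr + s.raw_size)
    return end


def load_image(image: bytes) -> bytes:
    """Place headers and sections at their virtual addresses; the rest
    (BSS, padding) stays zero because the buffer starts zero-filled."""
    size_of_image, size_of_headers, sections = parse_pe(image)
    mem = bytearray(size_of_image)
    mem[:size_of_headers] = image[:size_of_headers]
    for s in sections:
        n = min(s.raw_size, s.vsize)        # never copy past the virtual size
        if s.raw_ptr + n > len(image) or s.vaddr + n > size_of_image:
            raise ValueError("section outside file or image bounds")
        mem[s.vaddr:s.vaddr + n] = image[s.raw_ptr:s.raw_ptr + n]
    return bytes(mem)
```

Anything in the file beyond `raw_length(...)`, such as data appended after the last section, is never copied, so the loaded image no longer depends on the file length matching the header, text, and initialized data.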

Version-Release number of selected component (if applicable): 9.0+


How reproducible: 100% boot failures, due to malloc_ptr not being zeroed properly, when Ard's ZBOOT patches are applied.


Steps to Reproduce:
1. Apply the linked patches and attempt to boot the generated/signed kernel using RH-based grub.

Expected results:
It boots in all three modes: non-compressed EFI-stubbed kernel, gzipped EFI-stubbed kernel, and the new EFI-stubbed kernel that self-decompresses.

Additional info:

So, I have a PR open against the Fedora sources here: https://src.fedoraproject.org/rpms/grub2/pull-request/20#

It fixes this and can be moved to GitHub/etc. if needed, although I tend to think the latter PE/COFF section parsing code is maybe closer to an RFC.

Comment 1 Jeremy Linton (ARM) 2022-09-08 12:51:54 UTC
Updated PR location: https://github.com/rhboot/grub2/pull/110

Comment 2 Robbie Harwood 2022-09-08 20:50:56 UTC
While there's nothing anyone else can do about how you feel about the state of grub code, as you've observed, patches are welcome :)

Applied to Fedora rawhide.

Comment 3 Jared Dominguez 2022-10-11 18:43:48 UTC
(In reply to Jeremy Linton (ARM) from comment #0)
> Description of problem: There is a compressed aarch64 kernel patch on the
> list
> https://lore.kernel.org/linux-arm-kernel/20220827083850.2702465-2-
> ardb/T/

Just to confirm, have you filed a BZ for this?

Comment 4 Jeremy Linton 2022-10-17 18:34:54 UTC
No, IIRC I didn't open a bug for the compressed kernel bits in RH, just the grub fix.

PS: My grub fix: https://github.com/rhinstaller/anaconda/pull/4368/files (lol)

Comment 5 Jeremy Linton 2022-10-17 18:42:50 UTC
More seriously, and maybe it deserves its own defect: https://github.com/rhboot/grub2/pull/107 was one of the things I had just looked at right before posting that. But, along those lines, we really shouldn't be trying to duplicate firmware calls in grub like that. These two PRs are just an example of "hacking" an EFI StartImage call in while ignoring much of the spec. It works until someone decides to use some other part of the spec not handled by grub.

Comment 8 Marta Lewandowska 2023-01-20 08:55:13 UTC
Hi Jeremy,
Would you be willing to test that this is working as expected with compressed and self-decompressing kernels once we have a RHEL grub build for you?

Comment 9 Jeremy Linton 2023-01-20 21:53:26 UTC
Yes, I can swap it into a couple of machines I have. Let me know where the build is, and I can pull it from an internal source if that is easier.

Comment 10 Marta Lewandowska 2023-01-23 08:03:41 UTC
Great, thank you. I'll be in touch once we have a build. :)

Comment 11 Laszlo Ersek 2023-03-27 09:44:15 UTC
(In reply to Robbie Harwood from comment #2)
> While there's nothing anyone else can do about how you feel about the state
> of grub code, as you've observed, patches are welcome :)
> 
> Applied to Fedora rawhide.

This fixed what is now F37 (thanks!), but F36 is still affected (bug 2181825); can you please apply the patch to F36 too? Thanks!

Comment 18 RHEL Program Management 2023-09-16 17:08:10 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 19 RHEL Program Management 2023-09-16 17:08:29 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to the Jira issue's "Watchers" field to continue receiving updates, and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.