Bug 1453133
| Summary: | arm64: remove useless .reloc section (backport gnu-efi 8581a58e5ba8 or kernel effc7b027aac) | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Pavel Holica <pholica> | |
| Component: | gnu-efi | Assignee: | Peter Jones <pjones> | |
| Status: | CLOSED ERRATA | QA Contact: | Release Test Team <release-test-team-automation> | |
| Severity: | unspecified | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 7.4 | CC: | abologna, bhsharma, jbastian, jstodola, lersek, lmiksik, pholica, rjones, rrichter, shuali | |
| Target Milestone: | rc | |||
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | gnu-efi-3.0.8-2.el7 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1486376 (view as bug list) | Environment: | ||
| Last Closed: | 2018-10-30 10:55:23 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1174832, 1392968, 1486376 | |||
|
Description
Pavel Holica
2017-05-22 08:58:09 UTC
Ok, so it seems, that the fallback is there, but named as fbaa64.efi in case of aarch64. If I execute that one directly, everything works as should. So, it seems, that BOOTAA64.EFI is trying to execute wrong file. This bug affects x86_64 systems running in UEFI mode and all aarch64 systems. This bug occurs in situations where there is missing boot entry and system tries to use "fallback" boot method using file located at known location BOOT/BOOTX64.EFI (in case of x86_64) or BOOT/BOOTAA64.EFI (in case of aarch64) which in turn should start BOOT/FBX64.EFI or BOOT/FBAA64.EFI, but that doesn't happen. The "fallback" method is used in following occations (sorted with respect to their impact): * Booting from disk image (created by livemedia creator) written on HDD * Booting on system with HDD taken from another system (e.g. HW replacement while keeping HDD with OS) * Booting on system which had some firmware disaster and where boot entries were lost * Booting on system where RHEL boot entry was manually removed Fix of this bug should be imho very easy. > Fix of this bug should be imho very easy.
I don't know how you can possibly say that - there's nothing here that even characterizes what the bug is:
- is it looking for the wrong file? If you name it fallback.efi does it work?
- is fallback.efi built wrong and actually has a bogus size for section 0?
- is the shim loader code incorrectly rejecting the fallback binary?
*** Bug 1485809 has been marked as a duplicate of this bug. *** dppath: \efi\boot\bootaa64.efi path: bootaa64.efi FSOpen: Open '\EFI\BOOT\fbaa64.efi' Success FSOpen: Open '\efi\boot\fbaa64.efi' Success Section 0 has negative size Failed to load image: Unsupported start_image() returned Unsupported I think I know what's wrong. First let me whine a bit about how incredibly annoying it is that there is no easily available PE dumper utility for Fedora / RHEL users. Google found "pev" for me <http://pev.sourceforge.net/>, but it doesn't seem to be packaged for Fedora. So I had to resort to using a web application (shudder), namely <http://pedump.me/>. (Whoever operates pedump.me: thank you.) So I uploaded "fbaa64.efi" to this site, and it lists two sections: name va vsize raw size flags ------ ----- ------ -------- --------------------- .reloc 0 0 0 R-- IDATA DISCARDABLE .text 0x148 0xe130 0xe130 RWX CODE The .reloc section has virtual size (vsize) zero. The shim code in turn reads as follows: > /* > * Copy the executable's sections to their desired offsets > */ > Section = context.FirstSection; > for (i = 0; i < context.NumberOfSections; i++, Section++) { > base = ImageAddress (buffer, context.ImageSize, > Section->VirtualAddress); > end = ImageAddress (buffer, context.ImageSize, > Section->VirtualAddress > + Section->Misc.VirtualSize - 1); > > if (end < base) { > perror(L"Section %d has negative size\n", i); The ImageAddress() function: > /* > * Perform basic bounds checking of the intra-image pointers > */ > static void *ImageAddress (void *image, unsigned int size, unsigned int address) > { > if (address > size) > return NULL; > > return image + address; > } ImageAddress() looks like a correct function -- aside from doing arithmetic on a pointer-to-void, which is undefined behavior --; it can return a pointer pointing one past the last byte in image (as given by "size"). Evaluating such a pointer is *valid* C; dereferencing it is not. In handle_image(): The way "base" is calculated for each section in handle_image() looks correct. The way "end" is calculated for each section in handle_image() looks incorrect. I believe "end" was meant as an inclusive "end" (hence the -1 at the end), but no such thing exists for sections that have zero size. The expression Section->VirtualAddress + Section->Misc.VirtualSize - 1 is evaluated in "unsigned int" (UINT32). The first two addends have this type, and the subtrahend has type "int", but then it is converted to "unsigned int". So basically we have 0u + 0u - 1u Therefore the last argument to ImageAddress() is 0xFFFF_FFFFu. ImageAddress() sees that this is out of bounds, and returns NULL. Then the comparison (end < base) invokes undefined behavior, because C permits relops between pointers only if both pointers point at elements, or one past the last element, in the same array. "end" (==NULL) doesn't point into any array, so the behavior is undefined. Either way, the expression evaluates to 1, and we get the error message. I cannot immediately recommend a patch because I don't understand why "end" is supposed to be an inclusive end. "base" offsets are usually inclusive, while "end" offsets (like sizes) are usually exclusive. Any progress on this? All RHEL 7.5 ALT aarch64 guests are still affected as of the last compose; RHEL 7.5 x86_64 guests don't seem to be affected, despite what was reported in Comment 2. Unless I'm mistaken, this will also affect bare metal aarch64 systems which, starting with RHEL 7.4 ALT, are fully supported. (In reply to Andrea Bolognani from comment #8) > All RHEL 7.5 ALT aarch64 guests are still affected as of the last > compose; RHEL 7.5 x86_64 guests don't seem to be affected, despite > what was reported in Comment 2. Ah, that's very interesting! This should let us compare section lists! Could you please upload both fbaa64.efi and fbx64.efi to <http://pedump.me/>, and see what (different) section lists are printed? (pls. refer to comment 7.) Thanks! (In reply to Laszlo Ersek from comment #9) > (In reply to Andrea Bolognani from comment #8) > > > All RHEL 7.5 ALT aarch64 guests are still affected as of the last > > compose; RHEL 7.5 x86_64 guests don't seem to be affected, despite > > what was reported in Comment 2. > > Ah, that's very interesting! This should let us compare section lists! First, a small note: I used RHEL 7.4 for both architectures instead of 7.5 and 7.5 ALT, so that the divergence would at least in theory be even smaller than last time around. I also made sure the issue reproduced on aarch64 and not on x86_64, as I claimed previously, and indeed that's still the case. One last thing: on x86_64, OVMF_CODE.secboot.fd is used whereas on aarch64 the firmware is AAVMF_CODE.fd, without Secure Boot support. Not sure whether it's relevant at all, but there you have it. > Could you please upload both fbaa64.efi and fbx64.efi to > <http://pedump.me/>, and see what (different) section lists are printed? > (pls. refer to comment 7.) Thanks! Just my luck: pedump.me seems to be having backend issues at the moment, so I couldn't use the Web version :) However, the page also hints at a Ruby version of the tools being available as a gem, so I installed that - in a guest, of course! The section list is quite different: [ fbaa64.efi ] === SECTIONS === NAME RVA VSZ RAW_SZ RAW_PTR nREL REL_PTR nLINE LINE_PTR FLAGS .reloc 0 0 0 0 0 0 0 0 42100040 R-- IDATA DISCARDABLE .text 148 e130 e130 148 0 0 0 0 e0500020 RWX CODE [ fbx64.efi ] === SECTIONS === NAME RVA VSZ RAW_SZ RAW_PTR nREL REL_PTR nLINE LINE_PTR FLAGS "/4" 1000 1f88 2000 400 0 0 0 0 40400040 R-- IDATA .text 3000 820d 8400 2400 0 0 0 0 60500020 R-X CODE .reloc c000 a 200 a800 0 0 0 0 42100040 R-- IDATA DISCARDABLE .data e000 2618 2800 aa00 0 0 0 0 c0600040 RW- IDATA .dynamic 11000 f0 200 d200 0 0 0 0 c0400040 RW- IDATA .rela 12000 ea0 1000 d400 0 0 0 0 40400040 R-- IDATA .dynsym 13000 18c0 1a00 e400 0 0 0 0 40400040 R-- IDATA I can't make head or tails of it, of course, but hopefully it means something to you :) Let me know if I can provide any more information. I promise I'll do my best to provide it in less than a month, next time. Thank you, Andrea! So, this is the key part: > [ fbaa64.efi ] > === SECTIONS === > > NAME RVA VSZ RAW_SZ RAW_PTR nREL REL_PTR nLINE LINE_PTR FLAGS > .reloc 0 0 0 0 0 0 0 0 42100040 R-- IDATA DISCARDABLE This comes from the aarch64 platform enablement in "gnu-efi", which is the EFI foundation for "fbaa64.efi" (the shim project is built with gnu-efi and not edk2). In gnu-efi, there's a file called "gnuefi/crt0-efi-aarch64.S". It's entire history basically consists of two commits: * commit 1525190354f5; "Add support for 64-bit ARM (AArch64)", dated 2014-08-08; part of gnu-efi 3.0.1 * commit 8581a58e5ba8; "emit the code as split .text/.data with R-X/RW- permissions" -- paraphrased from the subject-less message --, dated 2017-02-15; part of gnu-efi 3.0.6 Commit 1525190354f5 introduced aa64 support with code like this: > +section_table: > + > + /* > + * The EFI application loader requires a relocation section > + * because EFI applications must be relocatable. This is a > + * dummy section as far as we are concerned. > + */ > + .ascii ".reloc" > + .byte 0 > + .byte 0 // end of 0 padding of section name > + .long 0 > + .long 0 > + .long 0 // SizeOfRawData > + .long 0 // PointerToRawData > + .long 0 // PointerToRelocations > + .long 0 // PointerToLineNumbers > + .short 0 // NumberOfRelocations > + .short 0 // NumberOfLineNumbers > + .long 0x42100040 // Characteristics (section flags) Note how this matches exactly the reloc section dumped from "fbaa64.efi" above. Referring back to comment 7, this entry is what causes the sum "Section->VirtualAddress + Section->Misc.VirtualSize" to evaluate to zero. We subtract 1 from that, and get the underflow. So basically the gnu-efi platform code for aarch64 is incompatible with the handle_image() function in "shim.c". The second gnu-efi commit, 8581a58e5ba8, changed things a little. The stated purpose was: > For compatibility with an upcoming EDK2 feature that maps UEFI apps > using strict permissions, emit the code as split .text/.data with > R-X/RW- permissions, respectively. However, as a side effect, it *also* removed the .reloc section altogether: the { .reloc, .text } *set* of sections -- which corresponds to the dump in comment 10 -- was replaced by the { .text, .data } *set* of sections. In short, this problem should all go away by rebuilding the shim package against gnu-efi 3.0.6 (which contains gnu-efi commit 8581a58e5ba8) or later. Now, the latest official (development) build of gnu-efi in *Brew*, for RHEL-7, is "gnu-efi-3.0.5-9.el7". This corresponds to dist-git commit a45db2efe71d ("Just don't build the .i686 package at all.", 2017-03-30); on the "rhel-7.5" branch. At said point, the tree does not seem to contain a backport of upstream gnu-efi commit 8581a58e5ba8; hence the bug. I have now also checked Fedora's dist-git repo for gnu-efi. Somewhat incredibly :) , the latest commit on the "master" branch is: * 253342666fc3 ("Don't make .reloc sections on Aarch64 binaries.", 2017-08-24) -- built into "gnu-efi-3.0.5-11.fc27" Unexpectedly, this Fedora-only gnu-efi patch ports a *kernel* patch -- namely effc7b027aac ("arm64: efi: remove pointless dummy .reloc section", 2017-04-04; part of Linux v4.12) -- to Fedora's gnu-efi. It solves the problem differently from upstream gnu-efi commit 8581a58e5ba8: it simply removes the ".reloc" section. In my opinion, *either* approach can work fine: - we can either apply *kernel* commit effc7b027aac to RHEL7 gnu-efi too, and simply kill the .reloc section (which results in a downstream-only gnu-efi patch for RHEL7, imitating Fedora Rawhide), - or else we can specifically backport upstream gnu-efi commit 8581a58e5ba8, which replaces { .reloc, .text } with { .text, .data }, similarly eliminating the .reloc section in the process. This would keep gnu-efi closer to upstream (in RHEL7 anyway). I'm changing the component of this BZ to gnu-efi, updating the title, and requesting "blocker" status. ... Actually, because shim is built statically against gnu-efi, I think this BZ (with the root cause) should be fixed in gnu-efi, but we'll also need a clone BZ for shim, dependent on this BZ. I'll leave the cloning to Peter. Thanks. There's no way it's practical to do this at this point for RHEL 7.5 - any fix would involve rebuilding shim and shim-signed, which means waiting 2+ weeks for signatures on the x86_64 side. This issue shouldn't be a part of normal workflow, it only effects cases where the boot variables aren't set correctly, and there's a straightforward workaround: boot the rescue image and add the variable with efibootmgr. With that in mind, and since this has no acks whatsoever, I'm moving the flag to rhel-7.6.0? and clearing rhel-7.5.0, and adding my devel_ack+ with that change in ind. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:3185 |