Description of problem:
On UEFI computers, grubx64.efi searches for grub.cfg only in /boot/efi/EFI/fedora, which is a single point of failure. Additionally, upstream has said it should go in /boot/grub on both BIOS and UEFI computers, and the inconsistency is confusing to users for no apparent benefit.
Fixing this enables booting even after a device failure when /boot and rootfs are on e.g. raid1. The grub.cfg also needs to be on raid1 to permit booting; otherwise it is a single point of boot failure.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
The grubx64.efi prefix is set to the ESP, not /boot, so it's not going to find a /boot/grub2/grub.cfg.
Boot fails on an otherwise bootable system.
grub.cfg should be in the same location regardless of firmware and arch.
Well, yes, upstream uses /boot/grub and grub-install. Fedora has for all practical purposes forked grub and does it differently for EFI. One benefit of the Fedora way is that it is possible to ship signed boot loaders, which makes it "secure" (for the "secure boot" definition of secure). Granted, it mostly works.
ESP is inherently a single point of failure on EFI systems. Placing a part of the bootloader elsewhere or on raid will not make it any more resistant to failure.
Remember also that the Fedora fork of grub requires different grub.cfg for EFI and bios. The upstream idea of a generic grub.cfg is thus not applicable.
The Fedora fork is regularly rebased to upstream versions but patches are not upstreamed - there seems to have been some kind of falling-out in the past, probably personality issues and disagreement on how secure boot should be handled.
I guess that the only way to make it move in the direction you ask for is to make sure upstream gets a clearly superior solution that conflicts with the Fedora changes.
So because the Fedora grubx64.efi is signed, it can't be changed to point to custom locations, which in effect /boot is because it's specified by UUID. The signed grub can only be static by pointing to a static location which is to search for grub.cfg in the same location grubx64.efi is already in? Is that it?
So one workaround is for grubby to update the grub.cfg found on all disks; that way any disk could boot the system. So this is maybe a grubby RFE instead?
Yes, that is pretty much it.
I don't know which problem you see, but I think grubby should be updating both /etc/grub2.cfg and /etc/grub2-efi.cfg if they are found.
The problem I see is that on BIOS hardware, /boot on any kind of raid1/5/6 (md, LVM2, or Btrfs based) always has a /boot/grub2/grub.cfg that core.img can find and boot the system even with a failed disk. This works today.
On EFI this doesn't work because Fedora grub places grub.cfg at /boot/efi/EFI/fedora/ and only one ESP is mounted at /boot/efi. I'm not sure how grubby could be taught to find and mount all other ESPs to modify their grub.cfg; it seems heavy-handed and possibly prone to failing. Placing the grub.cfg on the ESP breaks or at least complicates booting from degraded raid.
I brought this up on grub-devel: http://lists.gnu.org/archive/html/grub-devel/2014-01/msg00029.html
On my Fedora 20 BIOS systems, /etc/grub2.cfg is a link to /boot/grub2/grub.cfg and on EFI /etc/grub2-efi.cfg links to /boot/efi/EFI/fedora/grub.cfg. If I have both /boot/efi/EFI/fedora/grub.cfg and /boot/grub2/grub.cfg, both are correctly edited by grubby. But then both are on mounted file systems. I don't think it works on unmounted file systems.
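To see which files grubby would end up editing on a given system, it's enough to resolve those two symlinks. A minimal sketch; the function name and the optional alternate-root argument are my own, added only so it can be tried against a scratch directory:

```shell
# Print where each grub config symlink points, skipping any that are missing.
# show_grub_cfg_links and its optional root argument are illustrative names.
show_grub_cfg_links() {
    root="${1:-}"
    for link in "$root/etc/grub2.cfg" "$root/etc/grub2-efi.cfg"; do
        if [ -L "$link" ]; then
            # Strip the alternate root so the output shows the in-system path.
            printf '%s -> %s\n' "${link#"$root"}" "$(readlink "$link")"
        fi
    done
}
```

Called with no argument it inspects the running system's /etc.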
I should better qualify "doesn't work on EFI". It doesn't work by default with Fedora on EFI because its signed grubx64.efi doesn't look to /boot/grub/; rather, it looks to the ESP. Running grub2-install creates a grubx64.efi that points to /boot/grub, but then it isn't signed.
Without Secure Boot, it can be made to work manually; on Secure Boot it's a problem.
Yes, with bios and software raid you can manually install the bootloader on MBR on several disks, and for some kinds of disk errors the bios will fall back from one to the other.
With efi and fedora grub you can manually copy the necessary files to alternative ESPs, and for some kinds of disk errors the firmware will fall back from one to the other. (If you actually do have efi firmware that can do that?)
I don't think the difference is big. With efi you don't get your grub.cfg updated automatically when you install kernels, but instead you get a simpler setup where (almost) the whole boot loader is stored on the same simple filesystem that you rely on anyway.
If you want to use the same grub.cfg from multiple ESPs then I assume you can put a minimal EFI/fedora/grub.cfg with a simple configfile command on the ESPs and let configfile, /etc/grub2-efi.cfg and grub2-mkconfig all point at the same location. (I would have suggested that anaconda always configured the system that way and always used /boot/grub2/grub.cfg ... but the unfortunate linuxefi and initrdefi commands mean that it would be a bit of a kludge anyway.)
Further thought on this, we definitely need a generic grub.cfg on the ESP or baked into the core.img/grubx64.efi that points to /boot/grub2.
Bootable md raid1 is broken on UEFI when the ESP contains a custom grub.cfg because grubby doesn't go looking for all of the unmounted ESPs to update their grub.cfg - how would it know what to update? This works on BIOS/MBR computers because the core.img in the MBR gap points to /boot/grub2 which is on an md raid1 device so it's a bootable system even in a one drive failure.
Additionally, md raid10, Btrfs raid1 and raid10 are also not resiliently bootable in the face of a single drive failure; whereas this works on BIOS/MBR.
See related bug 1060576. Either we need grubx64.efi's prefix to search the current disk for /boot/grub2, or we need a generic grub.cfg that can do that search, or we need anaconda to create short forwarder grub.cfg's per physical device, that point to the "real" grub.cfg on the raid at /boot/grub2/grub.cfg.
(In reply to Chris Murphy from comment #7)
> Further thought on this, we definitely need a generic grub.cfg on the ESP or
> baked into the core.img/grubx64.efi that points to /boot/grub2.
That seems to me like a premature conclusion. It sounds like a demand for a specific solution instead of a clear description of the problem you would like to solve.
An attempt at describing the case:
First, it is assumed that EFI systems can be made resilient to disk failures without using hardware raid. It is assumed that it is possible to put ESP on multiple drives, and if the primary drive dies it will use the ESP from another drive. (I don't know to which extent that is the case - and which kinds of drive failure it can handle. Especially the interaction with the efi variables sounds tricky. Some examples and statistics-ish numbers could help.)
Upstream grub kind of supports this by placing /boot on software raid and somehow running grub-install for installing custom built boot loaders on multiple ESPs, hardcoded to load grub.cfg from the /boot partition.
The fedora grub2 also makes it tricky to install on multiple ESPs, and in addition it does not have a grubby-compatible way of maintaining the grub.cfg that is placed on the ESP.
> Bootable md raid1 is broken on UEFI when the ESP contains a custom grub.cfg
> because grubby doesn't go looking for all of the unmounted ESPs to update
> their grub.cfg - how would it know what to update?
The most obvious answer to that would be to let grubby look for other /etc/grub2-efi*.cfg symlinks.
Not saying that it would be the best solution ... but it would be very simple.
> Additionally, md raid10, Btrfs raid1 and raid10 are also not resiliently
> bootable in the face of a single drive failure; whereas this works on
> BIOS/MBR.
In what way is that "additionally"? Isn't that the main "problem" you are trying to solve?
(But I think the best way to solve this problem would be to work upstream on improving "secure boot" support so there was a supported way distributions could ship signed EFIs.)
Further assume single boot OS, not multiboot. All bets are off for multiboot.
At least with vbox UEFI, I've already successfully reproduced what I'm suggesting and with the existing shim and grub packages including fallback capability, NVRAM entries aren't even needed. The firmware finds any disk, loads bootx64.efi, and from there grub finds boot files on whatever file systems it supports. I haven't tested a Secure Boot system but I'm working on the premise that it's best to come up with one solution that can work with and without Secure Boot enabled.
I don't agree that specifically Fedora grub2 makes installation onto multiple ESPs tricky. I see it as a flawed design to even have the ESP mounted, which is common to Fedora and upstream grub2's. And grub-install depends on a mounted ESP to install to, likewise flawed. All information is available for grub-install to figure out which physical devices the current system uses, and therefore which ones need bootloader files installed on their ESPs but this isn't how it presently works.
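To illustrate the point that the information is already there: on GPT disks an ESP can be identified by its partition type GUID without being mounted. This is only a sketch, not how grub-install works today. filter_esps is a hypothetical helper, it assumes lsblk output with NAME and PARTTYPE columns, and it ignores MBR-partitioned disks.

```shell
# GPT partition type GUID for an EFI System Partition.
ESP_GUID="c12a7328-f81f-11d2-ba4b-00a0c93ec93b"

# Read "NAME PARTTYPE" pairs on stdin and print the device path of every ESP.
# filter_esps is a hypothetical helper name, not part of grub or util-linux.
filter_esps() {
    awk -v guid="$ESP_GUID" 'tolower($2) == guid { print "/dev/" $1 }'
}

# Typical use on a live system (needs util-linux lsblk):
#   lsblk -rno NAME,PARTTYPE | filter_esps
```

Mapping those ESPs back to the disks that hold the raid members is a further step this doesn't attempt.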
As for grubby, again by convention only one ESP is mounted at /boot/efi and that's the only grub.cfg that exists on Fedora EFI installs, therefore only one grub.cfg is updated. However, if I replace that grub.cfg with a simple forwarding grub.cfg, to the real one at /boot/grub2/grub.cfg on md raid1, 10, 5, 6, etc, this does work and remains bootable in the face of a single device failure. And grubby reliably updates that single grub.cfg instance which is all that really needs to be updated.
The additionally part is to acknowledge there are other use cases than just booting from md raid1. It could be md raid10, 5, 6, Btrfs raid, etc. - anything that grub supports. It's sort of a superfluous comment.
I agree with the comment about Secure Boot and upstream. I've posted an inquiry on how they imagine resilient (degraded) booting of raided systems in a UEFI Secure Boot context. If it works for Secure Boot, it ought to work without as well.
Another point. Any implementation that depends on regularly writing to the ESP is flawed. It really should be modified as little as possible, considering it's FAT32 and rather prone to breakage in the event of a crash or power failure while being written to. It doesn't support barriers, and doesn't have a journal.
So it makes more sense for the regularly modified grub.cfg to be anywhere other than on the ESP. That could mean either a prebaked/signed grubx64.efi that can smartly locate that grub.cfg, or a simple immutable grub.cfg on the ESP that merely forwards to the regularly modified one (using configfile).
In fact Windows and OS X don't even keep the ESP mounted read only, let alone read write. I think depending on the ESP being persistently mounted at /boot/efi is inappropriate design.
Note: grub2 will boot off multi-device BTRFS IF, and only if, all devices defined for the btrfs filesystem are present and working!
There appears to be no reliable way for core.img to point to the real grub.cfg in a location other than its own directory. Therefore I suggest that anaconda create a /boot/efi/EFI/fedora/grub.cfg placeholder file that causes grub to search for the boot volume by UUID and uses configfile to load the /boot/grub2/grub.cfg file there.
That way regardless of BIOS or UEFI, the grub.cfg is in the same location. And I'd expect the same thing for aarch64.
An example grub.cfg that does this:
# DO NOT EDIT THIS FILE
# THIS FILE FORWARDS TO ACTUAL grub.cfg
search --no-floppy --fs-uuid --set=root --hint-bios=hd0,gpt3 --hint-efi=hd0,gpt3 --hint-baremetal=ahci0,gpt3 d372e5d1-386f-460c-b036-611469e0155e
configfile /boot/grub2/grub.cfg
The hints aren't necessary (and in this example are wrong anyway, as created by grub2-mkconfig). What's important is that the search by volume UUID works, and then the real grub.cfg is found and loaded. Since it's the same volume the kernel is found on, core.img is guaranteed to be able to read and locate that file system.
I've tested this on ext4, XFS and Btrfs (boot directory on root subvolume, and also boot subvolume).
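Since the placeholder only varies by UUID and by where the real grub.cfg sits on that volume, it could be generated rather than written by hand. A rough sketch under my own assumptions: make_forwarder is a hypothetical helper (nothing anaconda ships), the hints are dropped since they aren't needed, and the path is relative to the volume found by the search, so it would be /grub2/grub.cfg when /boot is its own partition.

```shell
# Emit a forwarding grub.cfg for the volume with the given filesystem UUID.
# make_forwarder is a hypothetical helper; the second argument is the path of
# the real grub.cfg relative to that volume (default: /boot/grub2/grub.cfg).
make_forwarder() {
    uuid="$1"
    cfg="${2:-/boot/grub2/grub.cfg}"
    printf '%s\n' \
        '# DO NOT EDIT THIS FILE' \
        '# THIS FILE FORWARDS TO ACTUAL grub.cfg' \
        "search --no-floppy --fs-uuid --set=root $uuid" \
        "configfile $cfg"
}

# Example: make_forwarder d372e5d1-386f-460c-b036-611469e0155e > grub.cfg
```

The output is identical for every ESP on the system, so the same file can be placed on each one.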
This also means that anaconda should put the real grub.cfg at /boot/grub2 not at /boot/efi/EFI/fedora/ on UEFI systems; i.e. the grub2-mkconfig -o command that anaconda uses should point to the same place on UEFI and BIOS systems.
The rationale is that there's no good reason for us to deviate from upstream on this issue, and no good reason for the grub.cfg files to be in different locations just because the firmware types differ.
The advantages of what I describe in comment 13:
1. The exact same files go on each ESP in a multiple disk installation, ensuring resilient bootable raid on UEFI just like on BIOS, without having to come up with some overcomplicated way to sync the ESPs.
2. We'd have separate grub.cfg files for each Fedora install. Right now a Fedora Rawhide install obliterates the Fedora 20 grub.cfg, and the resulting combined grub.cfg is mangled (it works, it's just I have no idea what kernel I'm booting for the F20 system).
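For point 1, because the files are identical on every ESP, populating additional ESPs is a one-time copy rather than an ongoing sync. A sketch only: copy_esp_files is a hypothetical helper, it assumes the ESPs involved are temporarily mounted, and it only handles the EFI/fedora directory.

```shell
# Copy the EFI/fedora directory from one ESP mount point to others.
# copy_esp_files is a hypothetical helper; it assumes every ESP involved is
# already mounted and that the caller passes the mount points explicitly.
copy_esp_files() {
    src="$1"
    shift
    for dest in "$@"; do
        mkdir -p "$dest/EFI/fedora" || return 1
        # "src/." copies the directory contents, not the directory itself.
        cp -R "$src/EFI/fedora/." "$dest/EFI/fedora/" || return 1
    done
}

# Example (mount points are placeholders):
#   copy_esp_files /boot/efi /mnt/esp2
```

After that, nothing on the ESPs needs to change when kernels are installed, since only /boot/grub2/grub.cfg gets updated.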
I seriously doubt that two OS's sharing one grub.cfg, managed by either grub2-mkconfig or grubby, is a good idea, let alone supportable. The alternative is strictly no support for dual installed Fedoras.
This has essentially been accomplished by the BLS by default feature, starting with Fedora 30.