Red Hat Bugzilla – Bug 864198
grubby fatal error updating grub.cfg when /boot is btrfs
Last modified: 2017-08-08 14:30:48 EDT
Description of problem:
If /boot is on btrfs (as tested, specifically a btrfs subvolume mounted on /boot), grubby fails to update grub.cfg.
Version-Release number of selected component (if applicable):
100% so far
Steps to Reproduce:
1. F18 TC1 anaconda creates root and home subvols, while boot goes on ext4.
2. Create a boot subvol on the existing btrfs volume.
3. cp -a contents of ext4 boot to the btrfs boot subvol.
4. Update fstab, update grub.cfg with grub2-mkconfig; on one attempt I had to relabel with restorecon.
[at this point there is a working bootable system]
5. yum update that includes a kernel update
Error message within yum updating:
Installing : kernel-3.6.0-3.fc18.x86_64 96/190
grubby fatal error: unable to find a suitable template
The resulting grub.cfg lacks an entry for this new kernel.
dlehman has said /boot as a btrfs subvol should be possible for F18, instead of only allowing ext4; in which case I'd expect grubby can handle updating grub.cfg.
Otherwise, this is somewhat cosmetic because I can manually run grub2-mkconfig which takes care of the problem.
I have not regressed to /boot being btrfs on its own partition, sans any subvolumes, but it seems either grubby should be able to update grub.cfg for /boot on a btrfs subvol, or it should call grub2-mkconfig which does work.
Bug 530108 might be related.
Please attach your grub.cfg . Also, if you could, run "rpm -q --scripts -p $KERNEL_PACKAGE", and manually run each of the grubby commands from %post and %posttrans with "--debug" until it shows the failure. Then attach the output from that to this bug.
Created attachment 632180 [details]
grubby debug output
# bash -x /sbin/new-kernel-pkg --package kernel --install 3.6.2-2.fc18.x86_64
produces this line resulting in the error:
/sbin/grubby --grub2 -c /boot/grub2/grub.cfg --add-kernel=/boot/vmlinuz-3.6.2-2.fc18.x86_64 --copy-default --make-default --title 'Fedora (3.6.2-2.fc18.x86_64)' '--args=root=UUID=780b8553-4097-4136-92a4-c6fd48779b0c ' '--remove-kernel=TITLE=Fedora (3.6.2-2.fc18.x86_64)'
Attached is the result of the above with --debug added.
Created attachment 632183 [details]
Huh. So it's trying to find kernels in /boot/boot/ . Can you show me /proc/mounts and the output of "stat /" and "stat /boot"?
Created attachment 632845 [details]
info requested from comment 5
Summary from cat /proc/mounts
/dev/sda1 / btrfs rw,seclabel,relatime,space_cache 0 0
/dev/sda1 /home btrfs rw,seclabel,relatime,space_cache 0 0
/dev/sda1 /boot btrfs rw,seclabel,relatime,space_cache 0 0
Size: 152 Blocks: 0 IO Block: 4096 directory
Device: 1fh/31d Inode: 256 Links: 1
Access: (0555/dr-xr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Size: 1470 Blocks: 0 IO Block: 4096 directory
Device: 26h/38d Inode: 256 Links: 1
Access: (0555/dr-xr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
subvols show up in mountinfo
# cat /proc/self/mountinfo | grep btrfs
34 1 0:29 /root / rw,relatime shared:1 - btrfs /dev/sda1 rw,seclabel,space_cache
41 34 0:29 /home /home rw,relatime shared:26 - btrfs /dev/sda1 rw,seclabel,space_cache
43 34 0:29 /boot /boot rw,relatime shared:27 - btrfs /dev/sda1 rw,seclabel,space_cache
The fstab also shows subvol boot is mounted at /boot. But for some reason with grubby it's imagining the btrfs volume mounted at its default subvol? And then following the boot subvol as if it were a folder? In that case it would be /boot/boot.
UUID=780b8553-4097-4136-92a4-c6fd48779b0c / btrfs subvol=root 1 1
UUID=780b8553-4097-4136-92a4-c6fd48779b0c /boot btrfs subvol=boot 1 2
UUID=780b8553-4097-4136-92a4-c6fd48779b0c /home btrfs subvol=home 1 2
I'm not sure how grubby looks for subvolumes, if at all? But there was a change in subvolume list format, which affected anaconda in Bug 868468.
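To make the mountinfo output above easier to read: field 4 of each line is the path of the mounted subtree inside the filesystem (for btrfs, the subvolume path) and field 5 is the mount point. The following is a minimal Python sketch of that mapping, not grubby's code; the function names are hypothetical, and the thread does not confirm where the doubled /boot/boot prefix actually comes from. It only shows that a path under the /boot mount point already corresponds to /boot/... inside the btrfs volume, so a tool that prepends the subvolume name a second time would end up looking in /boot/boot.

```python
from typing import List, NamedTuple, Optional

# Lines copied from the /proc/self/mountinfo output quoted in this report.
SAMPLE_MOUNTINFO = """\
34 1 0:29 /root / rw,relatime shared:1 - btrfs /dev/sda1 rw,seclabel,space_cache
41 34 0:29 /home /home rw,relatime shared:26 - btrfs /dev/sda1 rw,seclabel,space_cache
43 34 0:29 /boot /boot rw,relatime shared:27 - btrfs /dev/sda1 rw,seclabel,space_cache
"""


class Mount(NamedTuple):
    fs_root: str      # field 4: mounted subtree inside the filesystem
    mount_point: str  # field 5: where that subtree appears in the tree
    fstype: str       # first field after the " - " separator
    source: str       # second field after the " - " separator


def parse_mountinfo(text: str) -> List[Mount]:
    """Parse mountinfo text into Mount records (octal escapes are ignored)."""
    mounts = []
    for line in text.splitlines():
        if " - " not in line:
            continue
        left, right = line.split(" - ", 1)
        lf, rf = left.split(), right.split()
        if len(lf) < 5 or len(rf) < 2:
            continue
        mounts.append(Mount(lf[3], lf[4], rf[0], rf[1]))
    return mounts


def path_inside_fs(mounts: List[Mount], path: str) -> Optional[str]:
    """Translate an absolute path into a path relative to the filesystem top."""
    best = None
    for m in mounts:
        prefix = m.mount_point.rstrip("/")
        if path == m.mount_point or path.startswith(prefix + "/"):
            # The longest matching mount point is the one that owns the path.
            if best is None or len(m.mount_point) > len(best.mount_point):
                best = m
    if best is None:
        return None
    rest = path[len(best.mount_point.rstrip("/")):]
    return best.fs_root.rstrip("/") + rest


if __name__ == "__main__":
    mounts = parse_mountinfo(SAMPLE_MOUNTINFO)
    print(path_inside_fs(mounts, "/boot/vmlinuz-3.6.2-2.fc18.x86_64"))
    print(path_inside_fs(mounts, "/etc/fstab"))
```

With the sample above, the kernel path maps to /boot/vmlinuz-3.6.2-2.fc18.x86_64 inside the volume, while a file on the root subvolume maps to /root/etc/fstab.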
This bug is still alive, and anaconda will let the user create a layout as described in comment 7; meaning post install users who produce this layout will need to manually run grub2-mkconfig to update grub.cfg after any kernel updates.
Bug 895606 seems like a DUP of this one.
I have the same issue, though my grubby seems to be trying to access /root/boot/*:
DBG: Image entry failed: access to /root/boot/vmlinuz-3.7.9-205.fc18.x86_64 failed
When I ran anaconda, I used the default btrfs partitioning scheme iirc, which gives me the following fstab line (I added compression later):
UUID=b67f524b-ae87-4409-ba81-4c1d1ede4c64 / btrfs subvol=root,compress=lzo 1 1
(In reply to comment #11)
default btrfs partition scheme puts / on btrfs and /boot on ext4. So your install is custom, which is why you experience this problem also. FWIW if you use one of these, you'll get a correct grub.cfg that includes the new kernel:
grub2-mkconfig -o /boot/grub2/grub.cfg
grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg
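Which of the two commands applies depends on whether the machine boots via BIOS or UEFI. A minimal Python sketch of that choice follows; the helper names are hypothetical, and it assumes that the presence of /sys/firmware/efi indicates a UEFI boot. It only builds the command line and does not run anything.

```python
import os
from typing import List, Optional

# The two grub.cfg locations quoted in the workaround above.
BIOS_CFG = "/boot/grub2/grub.cfg"
UEFI_CFG = "/boot/efi/EFI/fedora/grub.cfg"


def booted_via_uefi(sysfs_efi_dir: str = "/sys/firmware/efi") -> bool:
    """Assumption: this sysfs directory exists only on UEFI-booted systems."""
    return os.path.isdir(sysfs_efi_dir)


def mkconfig_command(uefi: Optional[bool] = None) -> List[str]:
    """Return the grub2-mkconfig argument list for this firmware type."""
    if uefi is None:
        uefi = booted_via_uefi()
    return ["grub2-mkconfig", "-o", UEFI_CFG if uefi else BIOS_CFG]


if __name__ == "__main__":
    # Print the command instead of executing it; run it as root by hand.
    print(" ".join(mkconfig_command()))
```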
(In reply to comment #12)
> (In reply to comment #11)
> default btrfs partition scheme puts / on btrfs and /boot on ext4. So your
> install is custom, which is why you experience this problem also. FWIW if
> you use one of these, you'll get a correct grub.cfg that includes the new
> grub2-mkconfig -o /boot/grub2/grub.cfg
> grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg
If this is indeed the default with respect to Btrfs management, can it be concluded that Fedora does not really support the Btrfs in the true sense of the word?
(In reply to comment #13)
Fedora is a project, it's ever changing and evolving, so I don't know what you mean by "support". I also don't know what you mean by "this" or "management". So if you want to try to write a more specific and coherent question, I'll try to answer it.
There are some valid reasons for putting /boot on ext4 by default, even though GRUB2 can handle /boot on btrfs or as a btrfs subvol. As for the grub.cfg being in different locations depending on BIOS vs UEFI, I think that's a legitimate gripe, and I'd like to know more why there's a difference in location, as it's inconsistent with upstream GRUB.
(In reply to comment #14)
I do understand, the Fedora is limited by the upstream tools capabilities.
At the moment of playfulness I forgot what it's all about. :)
Besides even Extlinux can't cope with multi device btrfs.
Syslinux 5.10 2013 - syslinux-5.10-pre2:
Booting from Hard Disk...
SYSLINUX 4.05 EDD 2011-12-09 Copyright (C) 1994-2011 H. Peter Anvin et al
warning: only support single device btrfs
EDD: Error 0100 reading sector <lotto numbers?> :)
(In reply to comment #15)
> I do understand, the Fedora is limited by the upstream tools capabilities.
> At the moment of playfulness I forgot what it's all about. :)
> Besides even Extlinux can't cope with multi device btrfs.
For this reason, I'm only talking about ext2/3/4 for Extlinux in Fedora 19.
Still reproducible in F19, both with /boot on its own mount point set to Btrfs device type, and also with just a / mount point (which is likely what happened in comment 11).
1. Installation destination > Choose a new blank disk.
2. Installation options > set partition scheme to Btrfs, choose to modify/review.
3. Manual partitioning > create a new mount point /. By default this will be Btrfs device type, and / will be installed to a subvolume named root.
4. Proceed with the installation.
6. yum update kernel
grubby fatal error: unable to find a suitable template
Setting version to 19.
Still a bug in F20 alpha and also with rawhide kernels, so I'm flipping version to rawhide.
Proposing as a Fedora 20 beta blocker because it's the actual cause of bug 1012646. "A system installed without a graphical package set must boot to a working login prompt without any unintended user intervention, and all virtual consoles intended to provide a working login prompt must do so." And also "No part of any release-blocking desktop's panel (or equivalent) configuration may crash on startup or be entirely non-functional."
Brief summary: The installer allows the user to place boot on Btrfs subvolumes, but because of this bug grubby doesn't add the necessary initrd line to the installer created grub.cfg, so on reboot from a successfully installed system, it kernel panics.
There are a number of "small" but significant problems that make using btrfs difficult (this one included). They really need to be addressed sooner rather than later. While some folks might not want to use BTRFS because it is still experimental, others will be precluded from using it because they cannot get their system installed on it.
Adding to btrfs tracker.
Discussed this in the 2013-10-02 Blocker Review Meeting . This has been voted a RejectedBlocker. This does not clearly violate any F20 beta criteria and is a corner case. Therefore, it does not qualify as a release blocking bug for f20 beta.
While it is not a blocker, I believe that fixing it should be given a higher priority than it obviously has.
Fortunately, you can run grub2-mkconfig -o /boot/grub2/grub.cfg (or the UEFI version) after updating the kernel and have everything still work properly.
I'm uncertain how this doesn't clearly violate the F20 beta criterion that the system needs to boot to either a desktop or a working login prompt. Proposing as a final blocker for the same reasoning; it is also a regression from F19, the system as installed is bootable there.
Rejecting blocker on the basis that boot on btrfs is unsupported is a better explanation; but I haven't heard anyone clearly say that boot on btrfs is unsupported.
Chris: "I'm uncertain how this doesn't clearly violate the F20 beta criterion that the system needs to boot to either a desktop or a working login prompt."
We can't, practically speaking, accept absolutely any bug that can cause some extremely obscure configuration not to boot as a blocker bug. There are always going to be obscure bugs in kernel drivers and things.
There is some tension between that note and this language in the criteria right before the 'Expected installed system boot behavior' section:
"Except where otherwise specified, each of these requirements applies to all supported configurations described above."
which we should resolve a bit more clearly, but in practice, it has always been the case that blocker status is not guaranteed for very configuration-dependent bugs.
I do think this wasn't a complete slam-dunk 'non blocker' for Beta, it could possibly have been taken as one, since we do have some reasonably clear requirements for custom partitioning to work at Beta. But rejecting it as a blocker was acceptable under the current policy. I could see it possibly being accepted as a Final blocker, though. It is a pretty subjective call with the way things are currently written. It would be good if we could make it clearer, but it's very very hard to come up with an 'objective' written policy that covers every possible scenario sensibly :/
We should definitely document this at least, noting that before I forget.
(In reply to Adam Williamson from comment #25)
The problem with the locality exception is that this is universal; it's not at all analogous to the video card example. It always happens when /boot is on btrfs for everyone, and the installer makes it easy to arrive at this configuration. How is this configuration obscure?
If this were ext4 or XFS, the controversy would be it taking more than 12 hours to fix, rather than 12 months. The real issue isn't the configuration, which clearly has legitimate technical/resource limitations to make work, but whether btrfs is to have problem resolution parity with other file systems or if there's an exception.
*** Bug 1012646 has been marked as a duplicate of this bug. ***
Chris: it is dependent upon a specific configuration: the configuration that /boot is on a btrfs volume. The determination is entirely based on how common/important that configuration is considered to be.
OK, I cannot help myself.
Anything involving BTRFS seems to take a low priority. Yes, if you remember to run grub2-mkconfig after the kernel update or you have /boot on a regular partition or even a logical volume, then things work. Why is there reluctance to addressing this problem? Twelve months is a little silly isn't it?
Another example is https://bugzilla.redhat.com/show_bug.cgi?id=892747 where the existing anaconda code simply ignores any existing btrfs subvolumes if they are specified in kickstart. You can add some code to your post-install script to fixup your /etc/fstab but you should not have to do that.
Anyway, concerning grubby, if someone has the time, I suggest they take a look at the code and propose a fix.
Adam: The bug prevents the configuration from being common or important. The workaround is esoteric. Therefore the determination is entirely based on circular reasoning. And that is a self-justification recipe for any btrfs bug being put on an infinite back burner with no plan of how to move forward.
Chris, you're exaggerating drastically. A bug is not 'on an infinite back burner' because it was rejected as a release blocker. I'm only concerned with the release blocker determination, here. What priority the bug is assigned is up to the anaconda team. Obviously they kind of have to give blocker bugs a high priority, but the converse is not true: they are not required to give *non* blocker bugs a *low* priority.
Adam: Except it's a grubby bug. And except you're confused about my complaint, which is usage of a logical fallacy to justify the rejection. And I don't want it applied to either this or future similar bugs: oh well, it's a rare configuration, so we won't block because of that, even though the bug is what causes it to be a rare configuration and it otherwise meets the requirements for blocker status.
I don't find that a convincing argument for any bug, let alone this one.
*** Bug 895606 has been marked as a duplicate of this bug. ***
This was rejected as a beta blocker and re-proposed as a final blocker, changing whiteboard and blocks field to reflect this
I followed the steps in comment 17 in a VM and it resulted in a functioning system. The updated kernel doesn't show as a boot option, but the system boots fine.
Running: grub2-mkconfig -o /boot/grub2/grub.cfg
Fixes the boot options and results in a working system with the F20 Beta DVD image.
"The updated kernel doesn't show as a boot option, but the system boots fine. "
That would be the expected outcome, yes.
I haven't tested it, but you could wind up in trouble after several kernel updates, when yum starts 'cleaning up' ones more than three releases old - though I believe it would never 'clean' the currently-running kernel.
Oh, sorry, I forgot that the bug as reported is that grubby creates a new entry but doesn't add the initrd line.
Discussed at 2013-11-13 blocker review meeting - http://meetbot.fedoraproject.org/fedora-blocker-review/2013-11-13/f20-final-blocker-review-1.2013-11-13-17.01.log.txt . Accepted as a blocker as a conditional violation of criterion "The installed system must be able to download and install updates with the default console package manager." - the update is not correctly 'installed' in the case of /boot being on btrfs (the updated kernel is not bootable), and this may have security implications (user believes a kernel that fixes a security bug has been installed and should be in use, but this is not the case).
summary: this bug has two consequences, a.) Fedora 20 installation when /boot is on Btrfs fails to boot, kernel panics, due to lack of initrd entry in the grub.cfg; b.) Fedora 20 kernel updates fail to be added to the grub.cfg.
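Consequence (a) can be checked mechanically by scanning grub.cfg for menu entries that have a kernel line but no initrd line. The following Python sketch does that; it is a simplified line-based scan written for illustration, not a full grub.cfg parser. The sample configuration is made up (only the kernel version numbers are borrowed from this thread), and the linux16/linuxefi and initrd16/initrdefi variants are included on the assumption that a Fedora-generated grub.cfg may use them.

```python
import re
from typing import List

MENUENTRY_RE = re.compile(r"^\s*menuentry\s+(['\"])(.*?)\1")
LINUX_RE = re.compile(r"^\s*linux(16|efi)?\s+(\S+)")
INITRD_RE = re.compile(r"^\s*initrd(16|efi)?\s+(\S+)")

# Made-up example configuration, for illustration only.
SAMPLE_CFG = """\
menuentry 'Fedora (3.11.8)' --class fedora {
    linux /vmlinuz-3.11.8 root=UUID=example ro
}
menuentry 'Fedora (3.11.6)' --class fedora {
    linux /vmlinuz-3.11.6 root=UUID=example ro
    initrd /initramfs-3.11.6.img
}
"""


def entries_missing_initrd(cfg_text: str) -> List[str]:
    """Return titles of menu entries that load a kernel but no initrd."""
    entries = []
    current = None
    for line in cfg_text.splitlines():
        m = MENUENTRY_RE.match(line)
        if m:
            # Start collecting kernel/initrd lines for a new entry.
            current = {"title": m.group(2), "linux": None, "initrd": None}
            entries.append(current)
            continue
        if current is None:
            continue
        m = LINUX_RE.match(line)
        if m:
            current["linux"] = m.group(2)
            continue
        m = INITRD_RE.match(line)
        if m:
            current["initrd"] = m.group(2)
    return [e["title"] for e in entries if e["linux"] and not e["initrd"]]


if __name__ == "__main__":
    for title in entries_missing_initrd(SAMPLE_CFG):
        print("no initrd line:", title)
```

On the sample, only the 'Fedora (3.11.8)' entry is reported.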
I do not know what the status is for "fixing" grubby. Nothing has been posted to the grubby git repository so I must assume, "not much." I have a patch for grubby itself which corrects what I (anyway) consider a potential security problem: in findTemplate() a candidate may be found but is never used; the patch corrects that. Also, the code at the end of findTemplate() merrily goes through the entire grub.cfg file looking for a candidate ... including the os-prober menuentry and 40_custom and any others ... I added a limit of two.
The second patch is a hack: it "hurts" to use grubby if btrfs is the root file system. OK, so do not use grubby if the root file system is btrfs AND this is being done for grub2. The "hack" is to the bash script new-kernel-pkg.
I have done manual testing so far and will be doing more extensive testing which I will report on later. See commit comments for more info.
Created attachment 827180 [details]
address potential security problem in grubby findTemplate()
Created attachment 827181 [details]
if grub2 and btrfs, use grub2-mkconfig
See commit comments
When/if we have full support in the tools for /boot on btrfs subvolume feel free to open a bug against python-blivet to get automatic partitioning changed accordingly.
Either I will or Chris will. This is really silly because you already support installing /boot on btrfs ... just not in a separate subvolume. I just completed an install and reboot of a two partition system: 1 is swap and the second is a BTRFS volume with two subvolumes: root ("/") and home ("/home"). Installed with no problems and works fine.
I have been kickstart installing on single-device and multi-device btrfs volumes with /boot on an ext4 partition, with /boot on a subvol, and with /boot simply as a directory on the root subvol. They all install fine [except for that UUID thing that I would like to see sooner rather than later].
No, most of the problems are not with the installer (anaconda and friends) but with some of the support tools such as grubby, grub2, and os-prober. There are "fixes" now available to address all the problems.
I don't think the 'hack' is the way to go, here. we just need to bug pjones hard enough to fix grubby.
Adam, have you ever looked at that code? I do not envy pjones trying to fix this problem inside of grubby itself. IMHO, a new problem could be caused just as easily as fixing this one.
For example, just a quick look at findTemplate() resulted in the attached patch. Another example is that the new-kernel-pkg --rminitrd flag does not work ... it does nothing but, because apparently rpm cleans up /boot for all files with the version id of the kernel being removed, there is no problem.
Now, I was just about to report some test results. Using the patches attached to this bugzilla report, I created an updated grubby-8.28-1 rpm. I then installed using the Fedora-20-Beta DVD on single and dual device BTRFS volumes with /boot on a subvol and /boot being a directory on the root subvol.
For each of the test system situations:
1. Running on the 3.11.6 kernel
2. apply updated rpms for grubby, grub2, grub2-tools, and os-prober (the last three have btrfs support fixes applied)
3. edit /bin/kernel-install to add "-v" to new-kernel-pkg so they are more verbose
4. with yum, update (install) kernel 3.11.8 and reboot to run 3.11.8
5. OK, it runs, reboot back to 3.11.6
6. running on 3.11.6, yum remove the 3.11.8 kernel
Everything works. There are some differences since grub2-mkconfig can be somewhat verbose.
Created attachment 827953 [details]
when rootfs is btrfs, use grub2-mkconfig instead of grubby
This version has been through additional testing. Also, unless "-v" (verbose) is specified for new-kernel-pkg, the messages from grub2-mkconfig are redirected to /dev/null and all of the debug messages os-prober sends to syslog are disabled. Therefore, there is little to see that grub2-mkconfig is used.
And, best of all, it works.
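The rule the attachment describes ("when rootfs is btrfs, use grub2-mkconfig instead of grubby") can be restated as a small decision function. The actual patch is a change to the bash script new-kernel-pkg; the Python below is only an illustrative sketch with hypothetical function names, using the /proc/mounts summary quoted earlier in this report as sample input.

```python
from typing import Optional

# Lines copied from the /proc/mounts summary quoted earlier in this report.
SAMPLE_MOUNTS = """\
/dev/sda1 / btrfs rw,seclabel,relatime,space_cache 0 0
/dev/sda1 /home btrfs rw,seclabel,relatime,space_cache 0 0
/dev/sda1 /boot btrfs rw,seclabel,relatime,space_cache 0 0
"""


def root_fstype(mounts_text: str) -> Optional[str]:
    """Return the filesystem type mounted on "/" in /proc/mounts-style text."""
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == "/":
            fstype = fields[2]  # a later entry for "/" shadows an earlier one
    return fstype


def choose_updater(mounts_text: str, uses_grub2: bool) -> str:
    """Pick the tool that should update the bootloader configuration."""
    if uses_grub2 and root_fstype(mounts_text) == "btrfs":
        return "grub2-mkconfig"
    return "grubby"


if __name__ == "__main__":
    print(choose_updater(SAMPLE_MOUNTS, uses_grub2=True))
```

Keying the decision on the root filesystem rather than on /boot alone also covers the case where /boot is just a directory on a btrfs root subvolume.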
After doing a lot more research, I have come to the conclusion that this is a lost cause. I am not sure where (or if) a plan is laid out but the direction seems to be systemd with kernel-install and the bootloader spec:
Although Fedora does not use it yet, the code is starting to get added so that kernel-install will handle dracut creating the initramfs file and depmod getting run. The question in my mind was, OK, so what. The bootloader configuration still needed updating and then I found BootLoaderSpec.
A patch is in grub2 to support this and I read that an initial attempt has been made for zipl (s390). Something for extlinux is still needed. I can sort-of see the goal but the path getting there is not so clear.
There are still a lot of questions in my mind as to how this is going to work. For example, I have a system with a rootfs mounted on "/". Then I have the new, special, common to everyone and writable by all systems /boot partition which is either ext2/3/4 or vfat. Then (most likely) I have another partition/LVM/btrfs-subvol mounted on a directory in /boot (directory named my machine-id).
I also wonder what release is the target: F21? F22?
My suggestion for grubby and btrfs: at this point, either implement something like I did or state the restriction that /boot cannot be on a BTRFS filesystem (and anaconda needs to stop supporting it). The last option is consistent with the future kernel-install/BootLoaderSpec "plans" as I now understand them.
As dlehman said, this needs to be fixed if we want to enable installing to such a configuration, but right now we don't.
I'm all for keeping this open so we can track it and fix it, but this shouldn't be an F20 blocker at all.
After having done some research, I believe that pjones and I are now in agreement. This should not be a blocker and the release notes should say that /boot in BTRFS is not currently supported. If you want consistency, say to put it on a separate ext234 partition like extlinux requires.
That said, there really needs to be some documentation which lays out the plan for kernel updating and bootloader configuration. If it is going to be systemd/kernel-install/BootLoaderSpec then that should be spelled out.
Also, why is anaconda installing into an unsupported configuration?
Apparently pjones wasn't aware that anaconda allows this configuration. I've confirmed Gene's report that it does: custom part happily allowed me to create a layout with btrfs /boot , btrfs / , and swap, no warnings or anything.
So, it sounds like pjones considers this an unsupported configuration and it's not realistic to fix it in F20 time frame, so the alternative is simply to fix anaconda not to allow it. We should be able to do that in time.
Adam, you put swap on BTRFS??? Now that is something that I believe will not work and I believe that the BTRFS-folks have no plans on ever supporting swap on their filesystem.
Now, if anaconda lets you put swap on btrfs, that is an anaconda bug ... fortunately one that I believe needs to be fixed maybe before F21.
And again, let me say that if the direction is systemd/BootLoaderSpec, then maybe we should force a separate partition either ext234 or vfat.
gene: no, I didn't. I wrote "btrfs /boot , btrfs / , and swap". Note the word 'btrfs' occurs next to '/boot' and again next to '/', but not next to 'swap'.
Take is good! However, I am going one step further and installing with just btrfs / --subvol --name=root seven
You can install onto a system with no swap space defined and it even runs. I did that accidentally and don't really know what would happen with no swap.
off-topic, this isn't a disk layout discussion forum. bug's 56 comments long already, please help keep it focused. thanks.
Bureaucracy note: as changing anaconda not to allow this layout doesn't really "fix" this bug, we should keep it open but drop the blocker status when we get an anaconda build that doesn't allow the layout any more.
Patch to disallow /boot on btrfs subvol has been posted for review.
anaconda-20.25.14-1.fc20 has been submitted as an update for Fedora 20.
Package anaconda-20.25.14-1.fc20, python-blivet-0.23.8-1.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing anaconda-20.25.14-1.fc20 python-blivet-0.23.8-1.fc20'
as soon as you are able to.
Please go to the following url:
then log in and leave karma (feedback).
The 'fix' here is the big-hammer-workaround of 'no /boot allowed on btrfs', for the record, so take that into account in testing. The grubby bug is not fixed, but you should not be able to hit it on a clean install of F20 because we won't let you put /boot on btrfs.
We shouldn't close the bug, because it still exists, but we can drop blocker status, assuming the 'fix' works, when anaconda goes stable.
I tested TC5 and tried to put /boot on btrfs.
I tried many things, to see if i found a way (or a crash)... but no.
So the patch to prevent /boot on btrfs does work and i did not find any crash or weird stuff. So it looks stable. (Anaconda 20.25.14-1)
As per confirmation in c#62, the blocker issue is now addressed, so dropping blocker status.
Dropping commonbugs as this is unlikely to be common any more, since we don't allow it.
*** Bug 1045790 has been marked as a duplicate of this bug. ***
(In reply to Peter Jones from comment #50)
> As dlehman said, this needs to be fixed if we want to enable installing to
> such a configuration, but right now we don't.
Why don't we want to enable installing to such a configuration? Ubuntu, openSUSE, Debian, Arch, all support /boot on Btrfs. The Fedora installer is leading in this area, but grubby is seriously holding Fedora well behind the capability of other distributions.
is there actually someone working on solving this?
This is still the long-pole-in-the-tent with respect to btrfs support on Fedora.
My fix/patch is attached. The package maintainer does not like my solution of doing the fix in bash-shell code and using grub2-mkconfig to do the updating.
As I understand it, his reasoning is that grub2-mkconfig rebuilds all of grub.cfg whereas grubby updates grub.cfg leaving any user edits to grub.cfg in place. HOWEVER, there is a longstanding statement by grub2 that the grub.cfg file should never be user modified.
that line's in grub.cfg from upstream, on the assumption that the distro will call grub-mkconfig on kernel installs. it doesn't entirely apply to fedora's design...
Adam, this is the first time that someone mentioned that Fedora's design is different. In the interest of moving this forward, I see two possibilities:
1. fix grubby.c ... having examined this code, this is no small task!
2. An alternate package to grubby. The problem here is that this package would need to be specified, installed, and used by anaconda. I am skeptical that this would be accepted even though I like this option.
As mentioned by cmurph, all the other major distributions support /boot on btrfs. Why is Fedora (and RHEL) so resistant to fixing this? [Other than digging into grubby.c looks to be a bit of a nightmare]
One other question: Why is it Fedora's design to allow direct user modifications to grub.cfg?
As I recall, one of Fedora's goals is to keep packages as close to upstream as possible.
Yet another question: Where is it articulated that it is a Fedora design goal to allow/support direct user modifications to grub.cfg (to preserve those changes over the event of a new kernel install)?
by 'different design', I simply mean the use of grubby. it's not really that we intend to allow user changes to grub.cfg, it's just a side effect.
"Why is Fedora (and RHEL) so resistant to fixing this?"
we aren't, at all. we just don't consider it a particularly high-priority issue. and pjones won't take a fix he doesn't consider correct/appropriate, and no-one else feels inclined to overrule him.
There are some folks (such as myself and Chris Murphy) who believe that the priority of addressing this "problem" should be raised. I propose that this discussion should be moved to the devel mailing list to get comments from more individuals. Comments?
BTW, maybe this would raise the profile of grubby.c so that someone would step up to making it work correctly or, perhaps, there would be a consensus that using bash-script code in place of grubby.c is acceptable after all.
Since many folks supporting Fedora (and Red Hat) products seem to be in love with python, would it be acceptable to replace grubby.c with python code?
discussions should be on mailing lists, yes.
Any update on this issue? What is the conclusion of the mailing list discussion? Despite the possible workaround using grub2-mkconfig directly, I wonder why nothing is happening here, despite the willingness of Gene to fix this and the fact that restricting the installer is not the most clever idea (the installer supports subvolumes that trigger this problem too, forbidding that, would be similar to forbidding btrfs at all imho.).
A bit too late for Fedora 21. Peter (pjones) is working on it for Fedora 22.
If you need the update, take a look at http://czarc.org
Fedora 21 installer permits /boot on btrfs subvolume. However fedup upgrade of Fedora 21 to Fedora 22 will fail in this case due to this bug. Therefore Fedora 21 installations where /boot is on Btrfs can't be fedup'd unless there's a backport to grubby.fc21.
grubby-8.35-8.fc21.x86_64 still has this behavior but now the failure is silent.
Proposed as a Blocker for 22-beta by Fedora user chrismurphy using the blocker tracking app because:
Same reasoning as in comment 39. What's changed is that Anaconda permits /boot on Btrfs by working around this bug at install time by calling grub2-mkconfig twice:
And tested patches to fix this in grubby have been available, just not yet merged.
Discussed at 2015-03-23 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2015-03-23/f22-blocker-review.2015-03-23-16.02.log.txt .
This is a long and complex bug and we're not clear on what exactly is being proposed for blocker status. Chris, can you please clarify exactly what aspect of this bug you'd like to make a blocker, what its practical impact is, and what the expected resolution is?
dnf/yum updates, gnome-software+systemd offline updates, and fedup upgrades that include new kernels don't have an updated grub menu entry at all due to this bug.
Recommending blocker under the same logic in comment 39 that made this bug a blocker back then: "The installed system must be able to download and install updates with the default console package manager" the update is not completely installed, and can also have security implications since the newer kernel isn't being booted.
What's new since the previous blocker was reverted:
1. Anaconda chose to fix bug 1200539 (a variant on this bug that affects live installs being unbootable) by running grub2-mkconfig twice, rather than what they've done in the past which is to prevent the user from putting /boot on Btrfs (comment 58).
2. Gene Czarcinski has supplied tested patches (tested by him, AdamW, and me at least), and last we hear is that Peter is working on this for Fedora 22, that is in comment Nov 3.
Basically pjones needs to merge Gene's patches into grubby.
The alternatives are: back to David's comment 58 patch, ergo regress back to Fedora 19/20 state where /boot on Btrfs isn't allowed; or do nothing and accept this bug as-is indefinitely.
Beta criteria includes Btrfs support without exceptions for /boot; and it says the default graphical package manager must install updates, which as comment 39 correctly argued, doesn't happen.
However, a further detail. Using the default graphical package manager, this error is silent. The user only sees "grubby fatal error" with CLI updates.
I think any "not a blocker" suggestions require some explanation; or a challenge (proposed revision) to the release criteria.
So the executive summary is that anaconda now allows /boot on btrfs again, but grubby was never fixed for its known bugs in this case. OK.
+1 blocker: we should either fix grubby or disallow /boot-on-btrfs once again.
Discussed at 2015-03-30 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2015-03-30/f22-blocker-review.2015-03-30-16.04.log.txt . Accepted as a blocker as a violation of criterion "The installed system must be able to download and install updates with the default console package manager."
As before, we'll note that disallowing /boot-on-btrfs is a valid way to make this bug no longer a blocker.
*** Bug 1207390 has been marked as a duplicate of this bug. ***
https://github.com/rhinstaller/anaconda/pull/63/files is an anaconda PR which disables /boot-on-btrfs again. That would 'address' this blocker (without resolving the bug).
With btrfs being the proposed default filesystem for F23, wouldn't it be wiser, if this bug can't be fixed, to just forbid /boot from sitting on a subvolume of a btrfs tree?
This bug isn't an issue if we create a btrfs filesystem for / and /home, and another btrfs filesystem just for /boot, is it?
This would kind of break the idea of a single snapshot containing an entirely functioning system, but wouldn't prevent the use of snapshots and other btrfs features on /boot.
Just a thought.
I've verified that this is 'addressed' in https://admin.fedoraproject.org/updates/libblockdev-0.9-1.fc22,python-blivet-1.0.7-1.fc22,anaconda-22.20.9-1.fc22 , by /boot-on-btrfs being disallowed.
Should existing F21 users with /boot on a btrfs subvolume then change the partition layout in order to perform an upgrade to F22?
AFAIK this bug is no different from F21, grubby is in the same state in both. So if you've got things working in F21 somehow, then you should be fine with F22.
That's the issue: I can't update to F22 because grubby isn't able to write the fedup entry on grub (also, kernel updates aren't possible either).
Yes yes, I know how to do it manually, but there should be an official set of steps to transition from a /boot btrfs subvolume to a standalone non-btrfs partition, shouldn't there?
Meaning: this HOWTO should be in the F22 release notes.
(In reply to Bernardo Donadio from comment #92)
> That's the issue: I can't update to F22 because grubby isn't able to write
> the fedup entry on grub (also, kernel updates aren't possible either).
Yes, see comment 78. The behavior you describe is also a blocker.
(In reply to email@example.com from comment #89)
> I've verified that this is 'addressed' in
> 1.0.7-1.fc22,anaconda-22.20.9-1.fc22 , by /boot-on-btrfs being disallowed.
That only addresses new Fedora 22 installs. It doesn't address Fedora 21 updates or upgrades, so that cat's already out of the bag. The only way to actually address this for all three failure types is to fix grubby. And as we know, there are tested patches available that have just not been merged, for reasons unknown.
Fixing grubby in F21 is not an F22 release blocker.
"The installed system must be able to download and install updates with the default console package manager."
With this bug, a system with a valid F21 configuration is unable to update to F22 through normal means.
Should the update process be broken, forcing a very disruptive change (as any repartition is) or a complete reinstall onto systems to be updated?
IMHO, this is a blocker per the definition above.
The more appropriate criterion is the one for upgrades "must be possible to successfully complete an upgrade". This has been used to block version n in order to compel fixes in n-1 components necessary to complete the upgrade. This just happened with bug 1185604.
We do that when upgrade is completely broken, not conditional failures. Upgrade's never been 100%, and we've never considered it as such.
21 Final went out with this broken, no-one died. 22 Beta is going to go out with it less broken. I don't see the value in blocking the Beta release for this.
I don't see the value in holding up the merging of a tested fix without comment, and no alternative mechanism to achieve that goal has been presented.
So I checked with pjones, and the reason he's not taking the patches is because he thinks they are a bad approach. The correct way to divine information about the filesystem is not to parse path names, but to ask the filesystem. (I suspect the bit of the patch that's *explicitly labelled as being a big hack* might also be a problem.)
Created attachment 1013654 [details]
Suggestion for grubby (based on 031447907d40441e7c8778bb4b9feb496659a632)
Previous suggestions included changing new-kernel-pkg or calling grub2-mkconfig directly. I have taken a look at the grubby code. I'm not too familiar with grubby, I hope I'm not repeating something.
One affected system is a default Fedora 21 installation (except that btrfs was selected), so the root filesystem is on a subvolume called "root". I get the same errors: "access to /root/boot/vmlinuz... failed" / "unable to find a suitable template"
So grubby is checking if the old image exists, which is a path starting with the volume name (/root/...) rather than a volume-relative path because grub sees the whole btrfs array, in which each volume is a top-level directory.
So I have written a wrapper function for access() that tries to find out if the image path is on a subvolume if it's not directly accessible. It cuts the path into pieces, checks if the first piece is a subvolume and if the second piece exists, and returns true if both conditions are met. In the case of "/root/boot/vmlinuz", it cuts out "root", checks if / is a subvolume and if /boot/vmlinuz exists. On my affected system, both conditions are met, so the old image has been found and grubby does not abort. Also, it remembers (yes, it's ugly, but it's a hack) that volume name "root", which is later prepended to the kernel path. I haven't checked other paths like that of initramfs, but with those modifications, the error message is gone and the full image path is written to the config file.
This is by no means a complete patch, I just wanted to see if I could easily get rid of the error (and I did not want to call /usr/sbin/btrfs and parse output). For example, I don't check if the path, which might be a btrfs subvolume, is a btrfs filesystem at all (that might be necessary). Remembering the root volume when checking an image path is bad. And I'm sure there are more parts of the program that have to be changed as well.
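For readers who want the idea without opening the attachment, the check described above could be sketched roughly like this in Python. This is an illustration only, not the attached diff: the names find_image and looks_like_subvolume are made up, and the inode-256 test is a heuristic that relies on btrfs giving every subvolume root inode number 256. As noted above, it does not verify that the filesystem is btrfs at all.

```python
import os

# Assumption: on btrfs, the root directory of every subvolume has inode 256.
BTRFS_SUBVOL_INODE = 256


def looks_like_subvolume(mount_point):
    """Heuristic only: does not confirm that the filesystem is btrfs."""
    try:
        return os.stat(mount_point).st_ino == BTRFS_SUBVOL_INODE
    except OSError:
        return False


def find_image(path, exists=os.path.exists, is_subvolume=looks_like_subvolume):
    """Return (subvolume_name, local_path) for a bootloader-style path.

    Returns None if the image cannot be found either directly or with the
    first path component treated as a subvolume name.
    """
    if exists(path):
        # Directly accessible: no subvolume prefix involved.
        return ("", path)

    # Cut "/root/boot/vmlinuz" into "root" and "/boot/vmlinuz".
    parts = path.lstrip("/").split("/", 1)
    if len(parts) != 2:
        return None
    subvol, rest = parts[0], "/" + parts[1]

    # Accept only if / is a subvolume and the remainder exists locally.
    if is_subvolume("/") and exists(rest):
        return (subvol, rest)
    return None
```

The two predicates are parameters so the logic can be exercised without a real btrfs mount; returning the subvolume name instead of remembering it in a global avoids the "remembering" hack mentioned above.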
Anyway, I'm putting my test diff here, hoping it will be helpful for solving this bug.
Build call to get more debug output:
./grubby --debug --grub2 -c /boot/grub2/grub.cfg --add-kernel=/boot/vmlinuz-3.19.1-201.fc21.x86_64 --copy-default --make-default --title "Fedora ($(date +%T)) 21 (Twenty One)" --args=root=UUID=8cbc4d66-0b24-4e2d-835f-6145446e82da --remove-kernel="TITLE=Fedora (3.19.1-201.fc21.x86_64) 21 (Twenty One)"
Debug output if image found on subvolume:
image /root/boot/vmlinuz-3.19.1-201.fc21.x86_64 exists in subvolume / (root)
Discussed at today's blocker review meeting .
We agreed at the meeting that disabling /boot on btrfs is enough to address this blocker. We are just waiting on confirmation that it is disabled on RC1/RC2.
Confirmed that in RC1 /boot is not allowed to be on a btrfs subvol. Per the decision of Ze Committee, dropping blocker status.
I get that this was not a blocker for F22, but it's quite aggravating that after all this time, we *STILL* can't have /boot as a btrfs subvolume. openSUSE is able to pull it off, and has been doing so for over a year and a half.
How are we going to better test out btrfs in Fedora and get the ball rolling on integrating it better if we can't even do stuff like that?
I'm not saying it wouldn't be handy, but is it really that important? You can test btrfs just fine by using it for your actual data, is there any benefit to using it for the boot partition?
Yes, actually. Being able to snapshot /boot (whether it's a subvolume or not) is useful if you want to be able to actually do full rollback capabilities. Without being able to boot to a Btrfs volume, it's not possible to do something like SUSE's "boot to older snapshot in read-only mode" feature, which comes in quite handy when things get accidentally torched beyond normal levels of screwed up.
I use full system snapshot/rollback, so it is useful. In order to achieve the setup I wanted I made it from scratch.
Neal, In order to get something close to SUSE you might want to follow the steps I did (for Fedora 22):
This bug appears to have been reported against 'rawhide' during the Fedora 24 development cycle.
Changing version to '24'.
More information and reason for this action is here:
I'm also affected on Fedora 23.
I have been using the following partition setup on Ubuntu for a number of years:
- /dev/sda1: VFAT EFI partition.
- /dev/sda4: BTRFS Linux Boot partition.
- /dev/sda5: LUKS Encrypted Disk
Inside /dev/sda5 is an LVM volume group with two logical volumes:
- /dev/mapper/vg-root: A BTRFS partition.
- /dev/mapper/vg-swap: A swap partition.
I have two subvolumes on /dev/mapper/vg-root:
- @: my root subvolume, which will be mounted at /.
- @home: my home subvolume, which will be mounted at /home.
I'm not able to use the BTRFS filesystem at /dev/mapper/vg-root, subvolume @, as a root partition; it will not accept it.
Also, I'm using rEFInd as a boot manager, not GRUB, so I think that there should be a way to install without a boot loader and let it pass.
(In reply to Neal Gompa from comment #105)
> Yes, actually. Being able to snapshot /boot (whether it's a subvolume or
> not) is useful if you want to be able to actually do full rollback
> capabilities. Without being able to boot to a Btrfs volume, it's not
> possible to do something like SUSE's "boot to older snapshot in read-only
> mode" feature, which comes in quite handy when things get accidentally
> torched beyond normal levels of screwed up.
Agree with this! It really defeats the potential virtues of BTRFS to have to maintain a separate restoration process for /boot when an update set that includes a new kernel hoses your system.
Issue still exists 5 years later (testing f26 alpha).
I also use rEFInd to boot. If this issue is grub related, then an option to install without a bootloader (and allow /boot on btrfs) might be a nice workaround.
You already can install without a bootloader (it's on the 'advanced' screen you can find from a text link on INSTALLATION DESTINATION, IIRC) but I dunno if it disables the /boot-on-btrfs-subvol check.
Indeed, it doesn't.
Consider fixing that issue?
a. Bug is not GRUB related. It's strictly a grubby bug. As I understand it, grubby doesn't parse the file system from the top level of the file system, rather it goes looking based on what's mounted; and what's mounted on Fedora systems is not the top level of the file system but rather each subvolume (boot or root in this case), so it never sees the top level, and can't find the true path to the kernel and initramfs.
b. The existence of the bootloader UI for UEFI computers is nonsensical; the bootloader is installed in any case, all unticking that option will do is prevent grub.cfg from being created, and an NVRAM entry from being set.
c. I think the place to get this fixed is the upstream bug; there's no activity happening on this here. The upstream bug has current debug info.
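To illustrate point (a), here is a rough Python sketch of how a tool could ask the kernel where a mounted path lives relative to the top level of the filesystem, instead of guessing from path names. It is not grubby code. It assumes the /proc/self/mountinfo format, where the fourth field is the directory inside the filesystem that forms the root of the mount and the fifth is the mount point, and it assumes the bootloader resolves paths from the top level of the btrfs volume, as described earlier in this bug. The function names are hypothetical, and escaped characters in mount points and a non-default btrfs default subvolume are not handled.

```python
def parse_mountinfo(text):
    """Map each mount point to (root_within_filesystem, fstype)."""
    mounts = {}
    for line in text.splitlines():
        # Fields before " - " are per-mount; fields after it describe the fs.
        left, sep, right = line.partition(" - ")
        if not sep:
            continue
        fields = left.split()
        fstype = right.split()[0]
        mounts[fields[4]] = (fields[3], fstype)
    return mounts


def bootloader_path(path, mounts):
    """Translate a path as seen on the running system into a path relative
    to the top level of the filesystem that contains it."""
    candidates = [
        m for m in mounts
        if path == m or path.startswith(m.rstrip("/") + "/")
    ]
    if not candidates:
        return path
    # The longest matching mount point is the one that contains the path.
    mount_point = max(candidates, key=len)
    fs_root, _fstype = mounts[mount_point]
    relative = path[len(mount_point.rstrip("/")):]
    return (fs_root.rstrip("/") + relative) or "/"
```

On a running system the input would be the contents of /proc/self/mountinfo. With only a "root" subvolume mounted at /, a kernel at /boot/vmlinuz-... maps to /root/boot/vmlinuz-..., which is the form of path seen in the grubby debug output in this bug.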
This bug appears to have been reported against 'rawhide' during the Fedora 26 development cycle.
Changing version to '26'.
Is it likely this bug is going to exist for another 5 years?
I'd really love to run F26, but the installer needs to work...
Is the F26 dev cycle locked off now? I guess this didn't make it?
Any chance this might get looked at for F27?