I've just done a heap of updates from F29 -> F30 as Xen DomU guests. If GRUB_ENABLE_BLSCFG=true is set in /etc/default/grub, then none of the systems can boot after the upgrade. Setting GRUB_ENABLE_BLSCFG=false restores the boot menu - however GRUB_DEFAULT=0 does not seem to correctly apply when generating the grub.cfg. If there are two boot entries - 1) the kernel, 2) the rescue image, even with GRUB_DEFAULT=0, the second entry (the rescue image) becomes the default boot target.

Current F30 grub packages installed:

# rpm -qa | grep grub | sort
grub2-common-2.02-78.fc30.noarch
grub2-pc-2.02-78.fc30.x86_64
grub2-pc-modules-2.02-78.fc30.noarch
grub2-tools-2.02-78.fc30.x86_64
grub2-tools-efi-2.02-78.fc30.x86_64
grub2-tools-extra-2.02-78.fc30.x86_64
grub2-tools-minimal-2.02-78.fc30.x86_64
grubby-8.40-30.fc30.x86_64
Sorry, I can't edit the comment - but forgot one part: After setting GRUB_ENABLE_BLSCFG=false, you then need to run grub2-mkconfig -o /boot/grub2/grub.cfg. After doing that and rebooting, the default entry (GRUB_DEFAULT=0) is not the first entry in the list.
this is not limited to Xen, looks like GRUB_ENABLE_BLSCFG=true causes issues on real hardware systems also... see my comment here https://bugzilla.redhat.com/show_bug.cgi?id=1652806#c64
Peter: this bug was specifically filed to be about the Xen case, not about any others. BLS is a big change in F30, it is entirely possible for there to be multiple different bugs in it (in fact there have been at least a dozen different ones so far). Just because you both have a case where the system fails to boot and it seems to be BLS-related does not mean you are hitting the same bug. Please either follow up on https://bugzilla.redhat.com/show_bug.cgi?id=1652806 or file a new bug, but unless Javier determines that you and Steven are actually hitting the same problem, let's not assume you are...
Thinking about this further - and noticing it being referenced on xen-devel mailing list, I would like to suggest the following - which may have been overlooked right now... If the grub %post scripting checked to see if it was installing / upgrading in a Xen DomU, it could set 'GRUB_ENABLE_BLSCFG=false' in /etc/default/grub automatically. This would fix both new installs and upgrades. The final fix would be figuring out why pygrub currently boots the *second* entry in the resulting grub.cfg - unlike how F29 worked. This may be either a fix on the grub2-mkconfig or pygrub side - I'm not quite sure yet. This would likely restore functionality completely. At least until something else more suitable is done?
For what it's worth, newer kernels still don't appear in the grub menu. I'm required to run 'grub2-mkconfig -o /boot/grub2/grub.cfg' manually every time a new kernel is installed. I have also tried with the grubby-deprecated package installed, with no resolution.
(In reply to Steven Haigh from comment #5) > For what it's worth, newer kernels still don't appear in the grub menu. > > I'm required to run 'grub2-mkconfig -o /boot/grub2/grub.cfg' manually every > time a new kernel is installed. > > I have also tried with the grubby-deprecated package installed, with no > resolution. And did you disable BLS (set GRUB_ENABLE_BLSCFG=false in /etc/default/grub and re-generate your grub.cfg with grub2-mkconfig) when installing the grubby-deprecated package?
Yes. I set: GRUB_ENABLE_BLSCFG=false I've had to recover many VMs that fail to boot because of a new kernel install - but after finding an old kernel that is still present on the disk, a manual run of grub2-mkconfig causes things to be fine again. Until the next kernel update.
As further reference, even an upgrade to kernel 5.1.18 across the board has had me run grub2-mkconfig on several machines that fail to boot. Most were last booted with 5.1.16 - which still appeared in the menu - but either didn't have 5.1.18, or the boot failed. Booting into 5.1.16, running grub2-mkconfig and then rebooting allows a successful boot into kernel 5.1.18. Any suggestions would be good, as this is a royal pain in the butt.
# rpm -qa | grep grub | sort
grub2-common-2.02-81.fc30.noarch
grub2-pc-2.02-81.fc30.x86_64
grub2-pc-modules-2.02-81.fc30.noarch
grub2-tools-2.02-81.fc30.x86_64
grub2-tools-efi-2.02-81.fc30.x86_64
grub2-tools-extra-2.02-81.fc30.x86_64
grub2-tools-minimal-2.02-81.fc30.x86_64
grubby-8.40-31.fc30.x86_64
grubby-deprecated-8.40-31.fc30.x86_64

# cat /etc/default/grub
GRUB_TIMEOUT=1
GRUB_DEFAULT=0
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="audit=0 selinux=0 console=hvc0"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=false
(In reply to Steven Haigh from comment #8) > As further reference, even an upgrade to kernel 5.1.18 across the board has > had me run grub2-mkconfig on several machines that fail to boot. > > Most were last booted with 5.1.16 - which still appeared in the menu - but > either didn't have 5.1.18, or the boot failed. > > Booting into 5.1.16, running grub2-mkconfig and then rebooting allows a > successful boot into kernel 5.1.18. > > Any suggestions would be good, as this is a royal pain in the butt. Do you have the grubby-deprecated package installed?
> Do you have the grubby-deprecated package installed? Affirm. I added config + installed packages in Comment #9.
(In reply to Steven Haigh from comment #11) > > Do you have the grubby-deprecated package installed? > > Affirm. I added config + installed packages in Comment #9. I see. Then new entries should be added to your grub.cfg by the old grubby tool (that's installed by the grubby-deprecated package). Can you please share your grub.cfg ?
Does this need to be from a machine that is 'faulty' or after running grub2-mkconfig?
Created attachment 1601869 [details] grub.cfg which was not updated with new kernel packages.
Created attachment 1601870 [details] grub.cfg after booting into the only working kernel and running grub2-mkconfig -o /boot/grub/grub.cfg
Looking into this further as it's still an issue.... I have removed everything to do with grubby, as looking at the kernel scripts, we run:

posttrans scriptlet (using /bin/sh):
/bin/kernel-install add 5.2.14-200.fc30.x86_64 /lib/modules/5.2.14-200.fc30.x86_64/vmlinuz || exit $?

Running kernel-install manually with the verbose step:

# /bin/kernel-install --verbose add 5.2.14-200.fc30.x86_64 /lib/modules/5.2.14-200.fc30.x86_64/vmlinuz
+/usr/lib/kernel/install.d/00-entry-directory.install add 5.2.14-200.fc30.x86_64 /boot/ebdacf59978342fdb2b3d376662bb059/5.2.14-200.fc30.x86_64 /lib/modules/5.2.14-200.fc30.x86_64/vmlinuz
+/usr/lib/kernel/install.d/20-grub.install add 5.2.14-200.fc30.x86_64 /boot/ebdacf59978342fdb2b3d376662bb059/5.2.14-200.fc30.x86_64 /lib/modules/5.2.14-200.fc30.x86_64/vmlinuz
+/usr/lib/kernel/install.d/20-grubby.install add 5.2.14-200.fc30.x86_64 /boot/ebdacf59978342fdb2b3d376662bb059/5.2.14-200.fc30.x86_64 /lib/modules/5.2.14-200.fc30.x86_64/vmlinuz
+/usr/lib/kernel/install.d/50-depmod.install add 5.2.14-200.fc30.x86_64 /boot/ebdacf59978342fdb2b3d376662bb059/5.2.14-200.fc30.x86_64 /lib/modules/5.2.14-200.fc30.x86_64/vmlinuz
Running depmod -a 5.2.14-200.fc30.x86_64
+/usr/lib/kernel/install.d/50-dracut.install add 5.2.14-200.fc30.x86_64 /boot/ebdacf59978342fdb2b3d376662bb059/5.2.14-200.fc30.x86_64 /lib/modules/5.2.14-200.fc30.x86_64/vmlinuz
+/usr/lib/kernel/install.d/51-dracut-rescue.install add 5.2.14-200.fc30.x86_64 /boot/ebdacf59978342fdb2b3d376662bb059/5.2.14-200.fc30.x86_64 /lib/modules/5.2.14-200.fc30.x86_64/vmlinuz
+/usr/lib/kernel/install.d/90-loaderentry.install add 5.2.14-200.fc30.x86_64 /boot/ebdacf59978342fdb2b3d376662bb059/5.2.14-200.fc30.x86_64 /lib/modules/5.2.14-200.fc30.x86_64/vmlinuz
+/usr/lib/kernel/install.d/99-grub-mkconfig.install add 5.2.14-200.fc30.x86_64 /boot/ebdacf59978342fdb2b3d376662bb059/5.2.14-200.fc30.x86_64 /lib/modules/5.2.14-200.fc30.x86_64/vmlinuz

The resulting grub entry has the following:

menuentry 'Fedora (5.2.14-200.fc30.x86_64) 30 (Thirty)' --class fedora --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-5.2.13-200.fc30.x86_64-advanced-a22c8698-28c0-44c9-87d7-58d7c88c0ea2' {
	load_video
	set gfxpayload=keep
	insmod gzio
	insmod part_msdos
	insmod ext2
	if [ x$feature_platform_search_hint = xy ]; then
	  search --no-floppy --fs-uuid --set=root a22c8698-28c0-44c9-87d7-58d7c88c0ea2
	else
	  search --no-floppy --fs-uuid --set=root a22c8698-28c0-44c9-87d7-58d7c88c0ea2
	fi
	linux //boot/vmlinuz-5.2.14-200.fc30.x86_64 root=UUID=a22c8698-28c0-44c9-87d7-58d7c88c0ea2 ro audit=0 selinux=0 console=hvc0 xen_blkfront.max_indirect_segments=128 LANG=en_AU.UTF-8
	initrd //boot/initramfs-5.2.14-200.fc30.x86_64.img
}

A manual run of grub2-mkconfig -o /boot/grub2/grub.cfg results in the following:

menuentry 'Fedora (5.2.14-200.fc30.x86_64) 30 (Thirty)' --class fedora --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-5.2.14-200.fc30.x86_64-advanced-a22c8698-28c0-44c9-87d7-58d7c88c0ea2' {
	load_video
	set gfxpayload=keep
	insmod gzio
	insmod part_msdos
	insmod ext2
	if [ x$feature_platform_search_hint = xy ]; then
	  search --no-floppy --fs-uuid --set=root a22c8698-28c0-44c9-87d7-58d7c88c0ea2
	else
	  search --no-floppy --fs-uuid --set=root a22c8698-28c0-44c9-87d7-58d7c88c0ea2
	fi
	linux /boot/vmlinuz-5.2.14-200.fc30.x86_64 root=UUID=a22c8698-28c0-44c9-87d7-58d7c88c0ea2 ro audit=0 selinux=0 console=hvc0 xen_blkfront.max_indirect_segments=128
	initrd /boot/initramfs-5.2.14-200.fc30.x86_64.img
}
It seems this bit of code kills the running of grub2-mkconfig:

# Is only needed for ppc64* since we can't assume a BLS capable bootloader there
if [[ $ARCH != "ppc64" && $ARCH != "ppc64le" ]]; then
    exit 0
fi

As such - grub2-mkconfig never gets run.
Sorry, the above code is in the file: /usr/lib/kernel/install.d/99-grub-mkconfig.install Without running grub2-mkconfig, the menu entry is *almost* correct, but the // at the start causes parsing of the grub.cfg file to fail - as the file //boot/vmlinuz-5.2.14-200.fc30.x86_64 is not found. Running grub2-mkconfig fixes this. Up for debate is if this should be fixed in the earlier scripts, or just let grub2-mkconfig fix the bug...
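For anyone wanting to check whether a guest is affected by the malformed-path variant, here is a minimal sketch that scans a grub.cfg for linux / initrd lines whose path starts with a doubled slash. The function name find_double_slash_entries is made up for illustration, and the default path is only an assumption based on the paths used in this bug.

```shell
# Hypothetical helper (not part of any grub2 package): print the grub.cfg
# lines whose kernel or initrd path begins with '//'.
# Usage: find_double_slash_entries [path-to-grub.cfg]
find_double_slash_entries() {
    cfg="${1:-/boot/grub2/grub.cfg}"
    # Match a 'linux' or 'initrd' command (optionally with a 16/efi suffix),
    # then whitespace, then a path that starts with two slashes.
    # Output is 'line-number:line'; exit status is 1 when nothing matches.
    grep -nE '^[[:space:]]*(linux|initrd)(16|efi)?[[:space:]]+//' "$cfg"
}
```

If it prints anything, those entries would be expected to fail in the way described above until grub.cfg is regenerated.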
For avoidance of doubt, here are the currently installed packages:

# rpm -qa | grep grub | sort
grub2-common-2.02-81.fc30.noarch
grub2-pc-2.02-81.fc30.x86_64
grub2-pc-modules-2.02-81.fc30.noarch
grub2-tools-2.02-81.fc30.x86_64
grub2-tools-efi-2.02-81.fc30.x86_64
grub2-tools-extra-2.02-81.fc30.x86_64
grub2-tools-minimal-2.02-81.fc30.x86_64
I'm super confused by this bug because clearly the leading '//' is going to confuse anything, whether GRUB or pygrub. And that leading extra '/' as well as the trailing 'LANG=en_AU.UTF-8' strikes me as very much like the old real grubby being involved. But you're saying all traces of grubby are gone? So...maybe there's some post install script in the kernel package now editing grub.cfg's? I find that hard to believe. Peter, Javier, what do you think about changing the GRUB_ENABLE_BLSCFG=false to just run 'grub2-mkconfig' making it more like upstream and other distros? That paradigm has never cared about the historic value in grub.cfg, it's always been about obliterating it in favor of the current truth - whatever that is. I don't really see how maintainable this is otherwise. And also it's decently likely, in the near term at least, that pygrub is going to learn how to parse grub.cfg+grubenv+bls snippets, and the sane legacy approach is to just use grub2-mkconfig after every kernel update.
There is still the file: /usr/lib/kernel/install.d/20-grubby.install That comes from: systemd-udev-241-12.git1e19bcd.fc30.x86_64 : Rule-based device node and kernel event manager Repo : updates Matched from: Filename : /usr/lib/kernel/install.d/20-grubby.install Looking at the code in that file however, I'm not exactly sure how that could result in what we're seeing. Yes, a fix would be to just run grub2-mkconfig and overwrite any previous entries generated by any other script. I would be happy with this.
Proposed as a Blocker for 31-final by Fedora user crcinau using the blocker tracking app because: The grub.cfg generated for kernels includes a double / on the initrd / vmlinuz lines which causes the entry to be unbootable. Hopefully a quick fix - but depends on further investigation / fixes.
After testing with downgrading the kernel in F31, I ended up with the following differences between grub.cfg and the installed packages:

menuentry 'Fedora (5.3.1-300.fc31.x86_64) 31 (Thirty One)' --class fedora --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-5.3.1-300.fc31.x86_64-advanced-e2f94071-1c3b-4b45-b6fb-22e3f952d4ae' {
menuentry 'Fedora (5.2.14-200.fc30.x86_64) 31 (Thirty One)' --class fedora --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-5.2.14-200.fc30.x86_64-advanced-e2f94071-1c3b-4b45-b6fb-22e3f952d4ae' {

# rpm -qa | grep kernel | sort
kernel-5.3.0-1.fc31.x86_64
kernel-5.3.1-300.fc31.x86_64
kernel-core-5.3.0-1.fc31.x86_64
kernel-core-5.3.1-300.fc31.x86_64
kernel-headers-5.3.1-100.fc31.x86_64
kernel-modules-5.3.0-1.fc31.x86_64
kernel-modules-5.3.1-300.fc31.x86_64

Fixing via:

# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.3.1-300.fc31.x86_64
Found initrd image: /boot/initramfs-5.3.1-300.fc31.x86_64.img
Found linux image: /boot/vmlinuz-5.3.0-1.fc31.x86_64
Found initrd image: /boot/initramfs-5.3.0-1.fc31.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-46e72612de204d5d8d6a9fe68e255ba3
Found initrd image: /boot/initramfs-0-rescue-46e72612de204d5d8d6a9fe68e255ba3.img
done

Generated entries are now correct:

menuentry 'Fedora (5.3.1-300.fc31.x86_64) 31 (Thirty One)' --class fedora --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-5.3.1-300.fc31.x86_64-advanced-e2f94071-1c3b-4b45-b6fb-22e3f952d4ae' {
menuentry 'Fedora (5.3.0-1.fc31.x86_64) 31 (Thirty One)' --class fedora --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-5.3.0-1.fc31.x86_64-advanced-e2f94071-1c3b-4b45-b6fb-22e3f952d4ae' {
For the sake of the review, this issue seems to break in one of two ways:

1) The grub.cfg file is not updated at all (as per comment 23); or
2) The entries for initrd / kernel lines have // at the start of the path (as per comment 20).

I have been unable to replicate which specific actions trigger which specific method of failure. In all cases, running 'grub2-mkconfig -o /boot/grub2/grub.cfg' will fix the problem.

In this configuration, the file /etc/default/grub contains something similar to:

GRUB_TIMEOUT=1
GRUB_DEFAULT=0
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="audit=0 selinux=0 console=hvc0"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=false

There are no grubby packages installed.
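The first failure mode (grub.cfg not updated at all) can be spotted by comparing the kernel images on disk against what grub.cfg references. The following is only a rough sketch: the function name missing_kernel_entries and the default paths are assumptions, and it relies on a plain substring match, so it will not notice entries that exist but carry a broken path.

```shell
# Hypothetical helper: list kernel images in the boot directory that are
# not referenced anywhere in grub.cfg.
# Usage: missing_kernel_entries [boot-dir] [path-to-grub.cfg]
missing_kernel_entries() {
    boot_dir="${1:-/boot}"
    cfg="${2:-/boot/grub2/grub.cfg}"
    for image in "$boot_dir"/vmlinuz-*; do
        # Skip the literal pattern when the glob matched nothing.
        [ -e "$image" ] || continue
        name="${image##*/}"
        # Fixed-string search for '/vmlinuz-<version>' anywhere in the file;
        # print the image name when grub.cfg never mentions it.
        grep -qF -- "/$name" "$cfg" || printf '%s\n' "$name"
    done
}
```

Any name it prints is a kernel that is installed but would not show up in the menu until grub2-mkconfig is run again.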
Discussed during the 2019-09-30 blocker review meeting: [0] The decision to delay the classification of this as a blocker bug was made as the details of the issue here are not yet entirely clear so it's hard to decide if it's a blocker, also we're not sure yet if we intend to complete the criterion change to block only on ec2 rather than all xen guest functionality. [0] https://meetbot.fedoraproject.org/fedora-blocker-review/2019-09-30/f31-blocker-review.2019-09-30-16.00.txt
Basic criterion "The installed system must be able appropriately to install, remove, and update software with the default console tool for the relevant software type (e.g. default console package manager). This includes downloading of packages to be installed/updated." There's a note about "New kernels not default (and similar cases)" Final criterion "The release must boot successfully as Xen DomU with releases providing a functional, supported Xen Dom0 and widely used cloud providers utilizing Xen." 1. It does boot successfully on installation 2. kernel (critical path package) update fails to update the bootloader configuration: whether this is done by post-install script or by grubby-deprecated, I don't know, but I also don't think it matters per criterion, the resulting system fails to boot again 3. upgrades and updates are supposed to work 4. the latest kernel installed is expected to be used by default I'm +1 final blocker. Any contra-arguments?
The way we deploy any Fedora edition/spin/remix, since version 30, is without "real" grubby (a.k.a. grubby-deprecated). That means out of the box, kernel updates on Xen DomUs have not been possible (they're incomplete and the system is left unbootable). It requires rather substantial post-install work before the first kernel update: a. remove grubby, install grubby-deprecated b. edit /etc/default/grub such that GRUB_ENABLE_BLSCFG=false c. grub2-mkconfig -o /boot/grub2/grub.cfg to apply the change in b. That arguably constitutes a separate bug, and I think it's also a blocker per the release criteria. But it wasn't caught during the Fedora 30 development process that Xen DomUs would break as a part of the BLS by default feature. Ergo, even if *this* bug is fixed via a grubby-deprecated update, it doesn't solve the out of the box problem above. And I'm not sure what to do about that in the given time frame.
While I've gone as far as I can think of in troubleshooting this, I'll share my current workaround - as this applies to both F30 and F31 in their current states. In the installation kickstart, I run the following:

-----------------------------
sed -i 's/GRUB_ENABLE_BLSCFG=true/GRUB_ENABLE_BLSCFG=false/' /etc/default/grub
grub2-mkconfig -o /boot/grub2/grub.cfg

## Hot patch for screwed up grub config scripts...
cat << 'EOF' > /usr/lib/kernel/install.d/99-xx-force-grub2-mkconfig.install
#!/bin/bash

[[ -f /etc/default/grub ]] && . /etc/default/grub

COMMAND="$1"

case "$COMMAND" in
    add|remove)
        grub2-mkconfig --no-grubenv-update -o /boot/grub2/grub.cfg >& /dev/null
        ;;
    *)
        ;;
esac
EOF
chmod +x /usr/lib/kernel/install.d/99-xx-force-grub2-mkconfig.install
-----------------------------

I do not have any grubby* packages installed - and I didn't see them assist or make the problem worse in any way. In the last week of testing, this seems to have worked properly - but I need a few more kernel updates to be pushed out to test if this is a complete workaround.

I don't pretend to think that this is a solution - as it could well have other effects in other environments... I did think that it may be useful to use 'virt-what' in the installer - and then disable BLS if a Xen DomU is detected... but then there's more edge cases that might be caused by this - ie what if you use HVM (which works with BLS) but switch to PV or PVH etc...
For the record, I want to note that this bug was brought up in the discussion about dropping the Xen criterion *back in May*, and Lars Kurth promised to get something done about it then. Full quote: "== On [B1] / grub2-switch-to-blscfg == This issue is about Fedora _domU_ and breaks the release criterion. And looks like, it wasn't tested at all. "blscfg is okay in _dom0_ - it looks like the xen setup still gets put in non-blscfg format, and doesn't seem to matter in HVM _domU_." "The big issue is _domU_ in PV which would need a fair amount of work in pygrub to fix properly, including reading variables from grubenv and extracting details from the loader files. This is really something to be fixed on the Xen side ... I do keep intending to have a look at it myself though I may not get around to it." Instead of fixing pygrub, it would be better, more future proof and easier to "use pvgrub2 instead. To be honest, its very unclear to me why would anyone want to use pygrub, when pvgrub2 exists. pygrub is much more fragile (as it needs to re-implement a parser for 3rd-party configuration format, without stable specification) and less secure - it does that in dom0, including mounting domU controlled disk. That said, the pvgrub2 option also requires some work, because: - Fedora grub2 packages do not include the "xen" target platform - Non-Fedora grub2 package don't have blscfg support - If we'd talk about PVH (which isn't the case here), it requires grub 2.04, which is at RC1 and isn't packaged for Fedora yet" That would be much simpler, if blscfg was upstreamed into grub2 by Fedora community members. Do you know whether the Fedora has plans to do this? In any case, I have taken an action to get this resolved (aka find someone to do the work)." However, it doesn't seem like he did find anyone to "do the work", as the bug is still sitting here.
Discussed during the 2019-10-07 blocker review meeting: [0] The decision to delay the classification of this as a blocker bug was made as we are going to pull the xen folks in on the bug this week and see if any progress can be made before finally deciding what to do next week. [0] https://meetbot.fedoraproject.org/fedora-blocker-review/2019-10-07/f31-blocker-review.2019-10-07-16.02.txt
Hello Chris, (In reply to Chris Murphy from comment #20) > I'm super confused by this bug because clearly the leading '//' is going to > confuse anything, whether GRUB or pygrub. And that leading extra '/' as well > as the trailing 'LANG=en_AU.UTF-8' strikes me as very much like the old real > grubby being involved. But you're saying all traces of grubby are gone? > So...maybe there's some post install script in the kernel package now > editing grub.cfg's? I find that hard to believe. > > Peter, Javier, what do you think about changing the GRUB_ENABLE_BLSCFG=false > to just run 'grub2-mkconfig' making it more like upstream and other distros? > That paradigm has never cared about the historic value in grub.cfg, it's > always been about obliterating it in favor of the current truth - whatever > that is. I don't really see how maintainable this is otherwise. And also > it's decently likely, in the near term at least, that pygrub is going to > learn how to parse grub.cfg+grubenv+bls snippets, and the sane legacy > approach is to just use grub2-mkconfig after every kernel update. Yes, agreed. As Steven mentioned we also force to re-generate a grub.cfg with grub2-mkconfig for ppc64le in /usr/lib/kernel/install.d/99-grub-mkconfig.install (we did that because even when there's BLS support in Petitboot since 1.8.0, we couldn't ensure that all the ppc64le OPAL machines wouldn't have an older Petitboot without BLS support). So I think that a workaround could be what you are proposing, to extend the /usr/lib/kernel/install.d/99-grub-mkconfig.install to not only cover ppc64le machines but also Xen VMs running as DomU guests. And also add what Steven did in Comment 28, to set GRUB_ENABLE_BLSCFG=false before re-generating the grub.cfg with grub2-mkconfig. Steven, is there a way for user-space to check if the machine is running as a DomU guest? I read that this could be achieved by checking if /sys/hypervisor/uuid exists and the UUID is not all zeros. Is that correct?
Looking at the proposal for checking /sys/hypervisor/uuid, I can confirm the following: Xen PVH DomU: 09fc5229-191a-42ae-a3d1-f3bad5ba6836 Xen Domain-0: 00000000-0000-0000-0000-000000000000 Xen HVM DomU: 2ec9fc0b-15e6-4b97-bd0b-8789b3c93234 This is probably bad - as a HVM (fully emulated) host can use BLS - as it loads grub from the boot sector. If we want to look at values in /sys/hypervisor/, I would suggest: 1) Check that /sys/hypervisor/type contains 'xen'; and 2) Check that /sys/hypervisor/guest_type contains 'PVH'. If these two conditions are met, BLS will fail. Extending this logic further, it may be good to also put this conditional logic in the grub logic that enables BLS in the first place. Maybe set GRUB_ENABLE_BLSCFG=false in /etc/default/grub if the above two conditions are also met... I have not been able to test this using pvgrub bootloader with Xen - as Fedora doesn't build grub with xen options to create said bootloader.
Just had a further thought here... When checking /sys/hypervisor/guest_type for the Domain-0, it returns PV. I guess it is also a valid use case for PV - which will also fail under BLS - however the Domain-0 *can* use BLS. Another option may be to use the 'virt-what' command. While this may mean adding a dep on it - it would resolve a few of these issues as:

Domain-0: xen xen-dom0
Xen HVM: xen xen-hvm
Xen PVH: xen xen-domU

.... Or I guess figure out how virt-what knows the difference between the above types and implement similar logic...
I figure I've got a lot of assumed knowledge - so to remove all doubt, I'll clarify the situation:

Xen Domain-0:
* the Xen host
* Can run BLS
* /sys/hypervisor/uuid = 00000000-0000-0000-0000-000000000000
* /sys/hypervisor/guest_type = PV

Xen PVH Domain:
* PVH guest
* Cannot run BLS
* /sys/hypervisor/uuid = non-zero
* /sys/hypervisor/guest_type = PVH

Xen HVM Domain:
* HVM Guest
* Can run BLS
* /sys/hypervisor/uuid = non-zero
* /sys/hypervisor/guest_type = HVM

Xen PV Domain:
* PV Guest
* Cannot run BLS
* /sys/hypervisor/uuid = non-zero
* /sys/hypervisor/guest_type = PV

Both PVH and PV domains (except Domain-0) are normally used with pygrub as the bootloader - which doesn't support BLS at all (yet - unknown future ETA).

I guess a valid scenario would be to check:
* If /sys/hypervisor/type == xen and /sys/hypervisor/guest_type == PV* (PV or PVH) and /sys/hypervisor/uuid != 00000000-0000-0000-0000-000000000000 - then run grub2-mkconfig with BLS disabled.

This would lead to being able to use BLS on both Xen HVMs or the Xen Domain-0 - which in theory should work fine. It does leave the edge case of someone changing a Xen HVM config to PVH - which is also a valid thing some people do - but I'm not sure how that would be resolved other than just disabling BLS for anything Xen until pygrub catches up (if / when?) and allows BLS booting.
Sounds like /sys/hypervisor/guest_type alone can be relied upon, and if it contains PVH, HVM, PV - then set BLSCFG=false. If this could be done in Anaconda that would be badass and solve the problem entirely just by avoiding it in the first place. That HVM could support BLS, I suggest ignoring in favor of consistency, and avoiding end user confusion why some VMs use BLS and others use traditional grub.cfg. If there's anything else uniquely Xen available in sysfs, it might be useful to check for that too, mostly as just a sanity test. What if what's in /sys/hypervisor/guest_type isn't guaranteed to be unique to xen? But this sort of deconfliction is not my expertise, I'm just throwing it out there.
Just realized I missed this from comment 34: Xen Domain-0 * /sys/hypervisor/guest_type = PV Xen PV Domain: * /sys/hypervisor/guest_type = PV Is there another way to distinguish between them? Otherwise it suggests the Dom0 also needs BLSCFG=false, which is not the end of the world but does cause a fragmentation on baremetal (some use BLS and some don't).
Ahh OK so maybe ignore guest_type, and only check type and UUID. type=xen + UUID=zeros = Dom0 and thus BLS OK type=xen + UUID=nonzero = guest and thus BLS not OK (one type is OK for BLS but ignore it and just step on grub.cfg anyway for consistency).
Have been talking about this matter with cmurf on #fedora-qa and debating options... Currently, it seems that we can deduce the following two scenarios: in /sys/hypervisor: 1) type == xen && uuid == all zeros, then this is BLS safe (the Domain-0). 2) type == xen && uuid != all zeros, then this is BLS *unsafe* (covers PV, HVM and PVH guests). This may be the most sane / simple test to do for the moment...
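To make the two scenarios concrete, here is a minimal sketch of the type + uuid test as a shell function. The function name is_xen_guest_by_uuid is made up, and the optional directory argument (standing in for /sys/hypervisor) is only there so the logic can be exercised on a machine that isn't a Xen guest. Treating a missing uuid file as "not a guest" is my assumption, not something discussed here.

```shell
# Hypothetical helper: succeed (exit 0) when sysfs says "Xen, and not the
# Domain-0", i.e. the BLS-unsafe case in scenario 2.
# $1: optional directory standing in for /sys/hypervisor (for testing).
is_xen_guest_by_uuid() {
    hv_dir="${1:-/sys/hypervisor}"
    hv_type=""
    hv_uuid=""
    if [ -r "$hv_dir/type" ]; then read -r hv_type < "$hv_dir/type" || true; fi
    if [ -r "$hv_dir/uuid" ]; then read -r hv_uuid < "$hv_dir/uuid" || true; fi

    # Not running under Xen at all.
    [ "$hv_type" = "xen" ] || return 1
    # Assumption: without a readable uuid we cannot decide, so say "no".
    [ -n "$hv_uuid" ] || return 1
    # An all-zero uuid identifies the Domain-0; anything else is a guest.
    [ "$hv_uuid" != "00000000-0000-0000-0000-000000000000" ]
}
```

Something like `is_xen_guest_by_uuid && echo "BLS unsafe"` would then cover PV, HVM and PVH guests alike.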
Posted the question to the xen-devel mailing list to see if we've missed any other combinations / obvious matters. https://lists.xen.org/archives/html/xen-devel/2019-10/msg00697.html
Created attachment 1623769 [details] [PATCH] 99-grub-mkconfig: Disable BLS usage for Xen DomU guests Thanks a lot for the comments Steven and Chris, I've attached a (untested) patch that does what we discussed in this bz and over irc. Please let me know if I'm missing anything. I also did a scratch grub2 build that contains the attached patch for you to test: https://koji.fedoraproject.org/koji/taskinfo?taskID=38168862 If this works correctly for you then I will also backport the patch for F30.
(In reply to Adam Williamson from comment #29) > For the record, I want to note that this bug was brought up in the > discussion about dropping the Xen criterion *back in May*, and Lars Kurth > promised to get something done about it then. Full quote: I thought I had replied earlier in the week, but that didn't come through. I did promise and there has been some progress: however it was not as quick as I hoped. There is a patch posted which should fix the immediate issue through a workaround which will hopefully make it into Xen 4.13 (and if not should be back-ported to 4.13.1). And there is a rough plan in place to change pygrub to support BLS. But all of this would have to be backported to supported versions of Xen. Even if we had a fix we still need to deal with versions of Xen that are out there in the wild. I parked any of the testing related stuff which was discussed as the underlying issue has to be addressed first. We will discuss this bug (and the related stuff) in tomorrow's Xen community call. Regards Lars
Created attachment 1623842 [details] [PATCH] 99-grub-mkconfig: Disable BLS usage for Xen DomU guests After a conversation with the Xen folks it was concluded that the best approach to test if a machine is a Xen Dom0 host or a DomU guest is by checking if /sys/hypervisor/type is set to xen and /proc/xen/capabilities contains the control_d string. I've attached the latest patch that was tested by Steven and did a grub2-2.02-99.fc31 build including this fix.
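For reference, the check described above could look roughly like the sketch below. This is not the attached patch, just an illustration of the same test; the function name is_xen_domu and the two optional file arguments (standing in for /sys/hypervisor/type and /proc/xen/capabilities) are made up so it can be tried outside a Xen machine.

```shell
# Hypothetical helper: succeed (exit 0) when the machine looks like a Xen
# DomU guest, i.e. hypervisor type is xen and the capabilities file does
# not contain control_d (which marks the Dom0).
# $1: optional stand-in for /sys/hypervisor/type
# $2: optional stand-in for /proc/xen/capabilities
is_xen_domu() {
    type_file="${1:-/sys/hypervisor/type}"
    caps_file="${2:-/proc/xen/capabilities}"
    hv_type=""
    if [ -r "$type_file" ]; then read -r hv_type < "$type_file" || true; fi

    # Not running under Xen at all.
    [ "$hv_type" = "xen" ] || return 1
    # control_d present means this is the Dom0 host, not a guest.
    if [ -r "$caps_file" ] && grep -q "control_d" "$caps_file"; then
        return 1
    fi
    return 0
}
```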
FEDORA-2019-591c552fba has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2019-591c552fba
grub2-2.02-99.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-591c552fba
I've noticed an issue here... I can confirm that GRUB_ENABLE_BLSCFG does get set to false in the correct conditions, but the ARCH / Xen check fails - meaning the exit 0 runs and grub2-mkconfig never gets called.

Modified script: https://paste.centos.org/view/1124f1ed

When running via the CLI, I get:

# KERNEL_INSTALL_MACHINE_ID=BLAH ./99-grub-mkconfig.install add
hv_type = xen and XEN_DOM0 =
Exiting on ARCH / HV_TYPE / XEN_DOM0 check

Adding 'set -x' to the top of the script results in:

# KERNEL_INSTALL_MACHINE_ID=BLAH ./99-grub-mkconfig.install add
+ [[ -n BLAH ]]
+ [[ -e /sys/hypervisor/type ]]
+ read HV_TYPE
+ [[ -e /proc/xen/capabilities ]]
+ [[ xen = \x\e\n ]]
+ [[ '' != \t\r\u\e ]]
+ grep -q '^GRUB_ENABLE_BLSCFG="*true"*\s*$' /etc/default/grub
++ uname -m
+ ARCH=x86_64
+ echo 'hv_type = xen and XEN_DOM0 = '
hv_type = xen and XEN_DOM0 =
+ [[ x86_64 != \p\p\c\6\4 ]]
+ [[ x86_64 != \p\p\c\6\4\l\e ]]
+ echo 'Exiting on ARCH / HV_TYPE / XEN_DOM0 check'
Exiting on ARCH / HV_TYPE / XEN_DOM0 check
+ exit 0

As such, we never get to the Xen checks. Hate to return to sender on this one, but seems to be still buggy.
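To spell out what I'd expect the decision to look like: the early exit should only happen when the machine is neither ppc64* nor a Xen DomU. A minimal sketch of that condition follows - the function name needs_mkconfig is made up, the argument names simply mirror the ARCH / HV_TYPE / XEN_DOM0 variables seen in the trace, and this is not the code from any released grub2 build.

```shell
# Hypothetical sketch: succeed (exit 0) when grub2-mkconfig should be run
# after a kernel install, fail (exit 1) when the script may exit early.
# $1: ARCH (e.g. output of `uname -m`)
# $2: HV_TYPE (contents of /sys/hypervisor/type, or empty)
# $3: XEN_DOM0 ("true" on the Dom0, otherwise empty)
needs_mkconfig() {
    arch="$1"
    hv_type="$2"
    xen_dom0="$3"

    # ppc64* can't assume a BLS capable bootloader.
    case "$arch" in
        ppc64|ppc64le) return 0 ;;
    esac
    # Xen DomU guests: the hypervisor is xen and this is not the Dom0.
    if [ "$hv_type" = "xen" ] && [ "$xen_dom0" != "true" ]; then
        return 0
    fi
    return 1
}
```

With the values from the trace above (x86_64, xen, empty XEN_DOM0) this returns success, so the script would carry on to grub2-mkconfig instead of hitting exit 0.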
FEDORA-2019-1265db97c0 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2019-1265db97c0
FEDORA-2019-ad706bc4b9 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-ad706bc4b9
grub2-2.02-100.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-1265db97c0
Given we have a fix for this now, I'm at least +1 FE, let's get it in. Other votes?
Yeah, +1 FE
+1 FE
grub2-2.02-83.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-ad706bc4b9
That's +4, marking accepted FE.
grub2-2.02-100.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.
*** Bug 1679759 has been marked as a duplicate of this bug. ***
grub2-2.02-83.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.
pbrobinson caught that we regressed this on F31: we pushed an *older* grub2 (-98) stable for another FE bug, it got pushed over the -100 that fixed this :( It seems the -98 update was never unpushed or obsoleted... mboddu, we need to re-push -100 to fix this.
-100 has been re-tagged and should show up in the next composes, so closing again.
so...I don't know if I'm missing something here, but I ran an install of RC-1.9 in a Xen guest and it won't boot after install, with /var/log/xen/bootloader logs indicating that pygrub is "Unable to find partition containing kernel". Inspecting the image with guestfish, it seems to have been installed with BLS active - there's a populated /boot/loader/entries and /boot/grub2/grub.cfg doesn't contain any 'kernel' or 'initrd' lines. RC-1.9 does have grub2 -100 in it...
grubby-deprecated is not installed, /etc/default/grub says GRUB_ENABLE_BLSCFG=true, and running grub2-mkconfig -o /boot/grub2/grub.cfg still writes a BLS-y config.
So with Javier's anaconda PR: https://github.com/rhinstaller/anaconda/pull/2201 I seem to get a non-BLS installed system - /boot/grub2/grub.cfg contains actual boot entries - but the VM still fails to run with the same error from pygrub logged in a /var/log/xen/bootloader.N.log file: "Unable to find partition containing kernel". Not really sure what's going on there. It'd be good if someone more Xen expert than me could test, both with and without the updates image...
(In reply to Adam Williamson from comment #62) > So with Javier's anaconda PR: > > https://github.com/rhinstaller/anaconda/pull/2201 > > I seem to get a non-BLS installed system - /boot/grub2/grub.cfg contains > actual boot entries - but the VM still fails to run with the same error from > pygrub logged in a /var/log/xen/bootloader.N.log file: "Unable to find > partition containing kernel". Not really sure what's going on there. It'd be > good if someone more Xen expert than me could test, both with and without > the updates image... You could try running pygrub directly, eg /usr/libexec/xen/bin/pygrub --debug /path/to/vm to see if it gives any more information.
That gets me:

Traceback (most recent call last):
  File "/usr/libexec/xen/bin/pygrub", line 902, in <module>
    fs = xenfsimage.open(file, offset, bootfsoptions)
OSError: [Errno 95] Operation not supported
Traceback (most recent call last):
  File "/usr/libexec/xen/bin/pygrub", line 902, in <module>
    fs = xenfsimage.open(file, offset, bootfsoptions)
OSError: [Errno 95] Operation not supported
Traceback (most recent call last):
  File "/usr/libexec/xen/bin/pygrub", line 931, in <module>
    raise RuntimeError("Unable to find partition containing kernel")
RuntimeError: Unable to find partition containing kernel

Trying to do `xenfsimage.open('/var/lib/libvirt/images/guest.img')` in a Python shell fails the same way, but I'm not sure why.
Created attachment 1628930 [details] strace of the pygrub attempt Here's strace output for the pygrub attempt.