Bug 2118287 - Updating kernel arguments with grubby --update-kernel=ALL results in kernel not booting after kernel upgrade
Summary: Updating kernel arguments with grubby --update-kernel=ALL results in kernel n...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: grubby
Version: 36
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: Peter Jones
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
: 2120053 2123827 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2022-08-15 11:51 UTC by grumpey0
Modified: 2022-09-23 20:51 UTC (History)
11 users (show)

Fixed In Version: grubby-8.40-67.fc36
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-01 09:39:50 UTC
Type: Bug


Attachments (Terms of Use)
Patch (1.45 KB, patch)
2022-08-19 22:46 UTC, grumpey0
no flags Details | Diff

Description grumpey0 2022-08-15 11:51:33 UTC
Description of problem:
Updating kernel arguments with grubby --update-kernel=ALL results in incorrect kernel arguments that no longer boot upon kernel upgrade. 

Version-Release number of selected component (if applicable):
grub2-common.noarch                       1:2.06-47.fc36                    @updates               
grub2-efi-x64.x86_64                      1:2.06-47.fc36                    @updates               
grub2-tools.x86_64                        1:2.06-47.fc36                    @updates               
grub2-tools-minimal.x86_64                1:2.06-47.fc36                    @updates               
grubby.x86_64                             8.40-64.fc36                      @updates

How reproducible:
each kernel update after change is made. 


Steps to Reproduce:
1. Modify kernel arguments with grubby
- sudo grubby --args="preempt=full" --update-kernel=ALL
- check /etc/kernel/cmdline verify ro rootflags=subvol=root is missing
2. Update kernel 
3. Verify kernel no longer has ro rootflags=subvol=root in arguments
- sudo grubby --info=ALL

The same happens for --remove-args=

Actual results:
Update kernel does not have args "ro rootflags=subvol=root" and will not boot. 

Expected results:
kernel arguments are maintained 

Additional info:
running grub2-mkconfig /boot/grub2/grub.cfg appears to correct boot entries. 
If /etc/kernel/cmdline is removed, it appears to regenerate it with double entries for rhgb quiet: ro rootflags=subvol=root rhgb quiet preempt=full  rhgb quiet

Comment 1 Robbie Harwood 2022-08-16 18:30:39 UTC
Where are these arguments set?

Comment 2 Bojan Smojver 2022-08-17 02:29:50 UTC
On F36 AMI, for example, this is in /boot/loader/entries/*.conf file. I can also see it in /boot/efi/EFI/fedora/grub.cfg.rpmsave file. Not in /etc/default/grub. Not seeing it anywhere else either.

Comment 3 Fedora Update System 2022-08-17 17:28:46 UTC
FEDORA-2022-a3480ad0d3 has been submitted as an update to Fedora 36. https://bodhi.fedoraproject.org/updates/FEDORA-2022-a3480ad0d3

Comment 4 Fedora Update System 2022-08-18 02:55:43 UTC
FEDORA-2022-a3480ad0d3 has been pushed to the Fedora 36 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2022-a3480ad0d3`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-a3480ad0d3

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 5 Fedora Update System 2022-08-19 01:12:38 UTC
FEDORA-2022-a3480ad0d3 has been pushed to the Fedora 36 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2022-a3480ad0d3`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-a3480ad0d3

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 6 grumpey0 2022-08-19 03:27:39 UTC
Installed versions: 
grub2-common.noarch 1:2.06-52.fc36
grub2-efi-ia32.x86_64 1:2.06-52.fc36
grub2-efi-ia32-cdboot.x86_64 1:2.06-52.fc36
grub2-efi-x64.x86_64 1:2.06-52.fc36
grub2-efi-x64-cdboot.x86_64 1:2.06-52.fc36
grub2-pc.x86_64 1:2.06-52.fc36
grub2-pc-modules.noarch 1:2.06-52.fc36
grub2-tools.x86_64 1:2.06-52.fc36
grub2-tools-efi.x86_64 1:2.06-52.fc36
grub2-tools-extra.x86_64 1:2.06-52.fc36
grub2-tools-minimal.x86_64 1:2.06-52.fc36
grubby.x86_64 8.40-66.fc36

---- 
- This only occurs if /etc/kernel/cmdline exists, sorry for not catching that earlier. 
On a fresh F36 install. 
- sudo grub2-mkconfig -o /boot/grub2/grub.cfg
* This creates /etc/kernel/cmdline, it is correct at this point. 
cat /etc/kernel/cmdline 
root=UUID=e1b2eceb-4054-4380-8da4-a9a8993a8db5 ro rootflags=subvol=root rhgb quiet

- sudo grubby --args="preempt=full" --update-kernel=ALL
cat /etc/kernel/cmdline 
root=UUID=e1b2eceb-4054-4380-8da4-a9a8993a8db5 rhgb quiet preempt=full

After the next kernel update: (options from /boot/loader/entries)
801a1bb85314d45879eb5a35e8ea189-0-rescue.conf:options root=UUID=e1b2eceb-4054-4380-8da4-a9a8993a8db5 ro rootflags=subvol=root rhgb quiet preempt=full
4801a1bb85314d45879eb5a35e8ea189-5.17.5-300.fc36.x86_64.conf:options root=UUID=e1b2eceb-4054-4380-8da4-a9a8993a8db5 ro rootflags=subvol=root rhgb quiet preempt=full
4801a1bb85314d45879eb5a35e8ea189-5.18.17-200.fc36.x86_64.conf:options root=UUID=e1b2eceb-4054-4380-8da4-a9a8993a8db5 ro rootflags=subvol=root rhgb quiet preempt=full
4801a1bb85314d45879eb5a35e8ea189-5.18.18-200.fc36.x86_64.conf:options root=UUID=e1b2eceb-4054-4380-8da4-a9a8993a8db5 rhgb quiet preempt=full

Comment 7 Bojan Smojver 2022-08-19 07:18:06 UTC
Yeah, I bumped into a very similar thing as described in comment #6, with the latest update. Essentially, reinstalling kernels on ARM64 F36 AMI left me with the broken VM.

Comment 8 grumpey0 2022-08-19 15:00:30 UTC
This appears to be start with 507: sed -i "s/\(root=[^ ]*\) .*/\1 ${opts}/" /etc/kernel/cmdline

It matches the root=UUID=e1b2eceb-4054-4380-8da4-a9a8993a8db5 but not ro rootflags=subvol=root

Comment 9 grumpey0 2022-08-19 22:46:47 UTC
Created attachment 1906607 [details]
Patch

This appears to resolve the issue but probably needs more testing.

Comment 10 Bojan Smojver 2022-08-22 02:35:05 UTC
Tested the patch on F36:

- patched grubby-bls
- removed /etc/kernel/cmdline
- ran grubby with --args and with --remove-args
- /etc/kernel/cmdline still contained correct command line (i.e. both ro and rootflags were there)
- reinstalled Rawhide kernel from koji, boot entry still had both ro and rootflags

The patch looks good to me.

Comment 11 Benjamin Herrenschmidt 2022-08-22 07:19:39 UTC
I'm not fan of this "fix", it perpetuates the mess we are in where the is no canonical location for the command line and we are in an extremely fragile setup where you rely on the existing entries to be able to build new ones (or whatever is in /proc/cmdline), which is all very wrong IMHO and will break in different, new and unexpected ways.

We used to live in a world where /etc/default/grub had the "canonical" version of the command line, and grub2-mkconfig would "splat"
over anything that was done with grubby. Grubby itself was always "wrong" in so many ways though but at least grub2-mkconfig was always
the safe bet.

Now we have introduced /etc/kernel/cmdline. Either we make it the new "canonical" location, in which case we need to stop consuming
the command line from /etc/default/grub completely, or we make it a subsidiary and keep it in sync. All good.

But either way, it doesn't make sense if what it contains (and by extension /etc/default/grub which we use to create it) doesn't
contain everything necessary to successfully boot a kernel.

BLS kind of changes this as grub2-mkconfig will not re-generate the BLS, but it does hack their command line. It's gross, but here
too, we are trying to keep things such that user expectations as to where the source of truth resides don't change.

Comment 12 Robbie Harwood 2022-08-22 21:53:37 UTC
*** Bug 2120053 has been marked as a duplicate of this bug. ***

Comment 13 Robbie Harwood 2022-08-22 22:13:06 UTC
@bojan thanks for the suggestion - I think it looks good too.  Was trying to be too clever :)  Please retest.

@benh.org As you've pointed out elsewhere, the /etc/default/grub workflow is very entrenched - I doubt people will be willing to walk away from it anytime soon.  I'm not sure what you mean by "it doesn't make sense if what it contains (and by extension /etc/default/grub which we use to create it) doesn't contain everything necessary to successfully boot a kernel" - maybe you have it backwards?  The contents of /etc/default/grub are insufficient (missing e.g. root device), while /etc/kernel/cmdline matches what's in BLS, which is closer to what eventually shows up as the cmdline (and the BLS entry is itself basically sufficient).

Comment 14 Benjamin Herrenschmidt 2022-08-22 23:47:24 UTC
Yes /etc/default/grub lacks the root device which grub-mkconfig adds by running some device discovery. Then grub tries to dupdate the command lines in the BLS, which is fragile as heck...

It means we still have no canonical source of truth.

I would love to make /etc/kernel/cmdline the canonical one instead and ensure it contains everything needed, though the question remains, what generates it, especially where is the root=<....> coming from and where are the rootflags coming from ? If they are established by anaconda and kept forever, then fine, but traditionally grub-mkconfig was capable of figuring out the necessary root= arguments. This got broken with btrfs, maybe that's what need to be fixed ?

Otherwise, if we move to making /etc/kernel/cmdline the "real thing"  then we should probably deprecate /etc/default/grub  command line, putting some fat comment in there or a macro that makes it source /etc/kernel/cmdline

Otherwise we are back with a mix match of stuff coming from different places that may or may not result in functional BLS entries and definitely a whole lot of confusion

Comment 15 Benjamin Herrenschmidt 2022-08-22 23:52:36 UTC
To be more specific, I think the real bug here is that the grub-probe dance done by grub2-mkconfig, which generates "root=" need to be extended to do the right thing for btrfs with subvol at least, and possibly "ro" as well though I think that's more policy and probably should have been in /etc/default/grub

Comment 16 Fedora Update System 2022-08-23 01:15:58 UTC
FEDORA-2022-a3480ad0d3 has been pushed to the Fedora 36 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2022-a3480ad0d3`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-a3480ad0d3

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 17 Fedora Update System 2022-09-01 09:39:50 UTC
FEDORA-2022-a3480ad0d3 has been pushed to the Fedora 36 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 18 Robbie Harwood 2022-09-07 17:40:05 UTC
*** Bug 2123827 has been marked as a duplicate of this bug. ***

Comment 19 Chris Murphy 2022-09-23 20:51:17 UTC
>If they are established by anaconda and kept forever, then fine, but traditionally grub-mkconfig was capable of figuring out the necessary root= arguments. This got broken with btrfs, maybe that's what need to be fixed ?

No. Both grub-mkconfig (upstream) and grub2-mkconfig (Fedora) can figure out both root= and rootflags=subvol arguments, for btrfs, since a long time.

Before BLS, grub2-mkconfig never touched the grub.cfg of other distros on the system. Recently, grub2-mkconfig not only wipes the current distro's BLS snippet's kernel command line, but also steps on all BLS snippets making it fundamentally incompatible with the Boot Loader Spec. And hence bug 2120845.

In a multiboot or multiroot context, grub2-mkconfig, grubby, and kernel-install are all incapable of determining the root file system. And it's not worth extending it because tracking snapshots and rollbacks is not something the bootloader should be doing anyway. So we need a different tool to handle the BLS snippets separate from the tool that creates the GRUB specific configuration - i.e. splitting out menu entry creation or modification from grub2-mkconfig, because it's not fail safe, but rather fail danger. i.e. it *will* cause system-level data loss for multiboot or multiroot workflows, and render the system unbootable. We've basically cornered ourselves, we can't eat our cake and have it too. We either have to remove BLS modification from grub2-mkconfig, or we have to remove grub2-mkconfig - and find another way to create the static grub.cfg in a harmless way.


Note You need to log in before you can comment on or make changes to this bug.