2243068 – kdump is enabled by default on desktops

Summary: kdump is enabled by default on desktops

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kexec-tools
Sub Component:
Version:	39
Hardware:	Unspecified
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Assignee:	Coiby
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:	AcceptedBlocker
Depends On:
Blocks:	F39FinalBlocker

Reported:	2023-10-10 14:26 UTC by Chris Murphy
Modified:	2023-10-25 15:40 UTC
CC List:	14 users
Fixed In Version:	kexec-tools-2.0.27-2.fc39
Clone Of:
Environment:
Last Closed:	2023-10-22 08:24:35 UTC
Type:	---
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Fedora Package Sources	kexec-tools pull-request 18	0	None	None	None	2023-10-10 18:55:42 UTC
Red Hat Issue Tracker	FC-996	0	None	None	None	2023-10-10 14:29:39 UTC

Description Chris Murphy 2023-10-10 14:26:04 UTC

During a `dnf update` that included a kernel update, I saw this message:
```
kdump: For kernel=/boot/vmlinuz-6.5.6-300.fc39.x86_64, crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M now. Please reboot the system for the change to take effet. Note if you don't want kexec-tools to manage the crashkernel kernel parameter, please set auto_reset_crashkernel=no in /etc/kdump.conf.
```
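
The `crashkernel=` value in that message uses the kernel's range syntax: each comma-separated entry is `start-[end]:size`, with `start` inclusive and `end` exclusive, and the first range that contains the machine's total RAM decides how much memory is reserved. A minimal sketch of that lookup follows; the function names are made up for illustration and are not part of kexec-tools or the kernel.

```shell
# Convert a size such as 192M, 4G or 512K to bytes (suffixes are powers of 1024).
to_bytes() {
    num=${1%[KMGkmg]}
    case $1 in
        *[Kk]) echo $((num * 1024)) ;;
        *[Mm]) echo $((num * 1024 * 1024)) ;;
        *[Gg]) echo $((num * 1024 * 1024 * 1024)) ;;
        *)     echo "$num" ;;
    esac
}

# Print the reservation selected for a given amount of RAM, or nothing if
# no range matches. Usage: crashkernel_for_ram <spec> <ram-size>
crashkernel_for_ram() {
    ram=$(to_bytes "$2")
    old_ifs=$IFS
    IFS=,                      # split the spec on commas
    for entry in $1; do
        range=${entry%%:*}     # e.g. "4G-64G" or "64G-"
        size=${entry#*:}       # e.g. "256M"
        start=$(to_bytes "${range%%-*}")
        end=${range#*-}        # empty means "no upper bound"
        if [ "$ram" -ge "$start" ] &&
           { [ -z "$end" ] || [ "$ram" -lt "$(to_bytes "$end")" ]; }; then
            echo "$size"
            break
        fi
    done
    IFS=$old_ifs
}

crashkernel_for_ram '1G-4G:192M,4G-64G:256M,64G-:512M' 8G
```

With the value from the message, a machine with 8G of RAM falls into the `4G-64G` range, while a machine with less than 1G matches no range and reserves nothing.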

And the new kernel (only) BLS drop-in snippet does have this modification to the kernel command line, and it's added persistently to /etc/kernel/cmdline. I think this is unintended, but even if intentional it's not an announced change and Workstation WG requests it be reverted.

Reproducible: Always

Steps to Reproduce:
1. Clean install Fedora-Workstation-Live-x86_64-39-20231009.n.0.iso
2. Reboot and `dnf update`

Actual Results:  
A new kernel package (6.5.6) is installed and a `crashkernel=` boot parameter is added to its BLS snippet, thereby configuring kdump after installation (i.e. not out of the box: the originally installed kernel doesn't have this addition, but it does get enabled without the user opting in).

Expected Results:  
kdump shouldn't be enabled, or become enabled, without the user expressly opting in.

Comment 1 Fedora Blocker Bugs Application 2023-10-10 14:33:42 UTC

Proposed as a Freeze Exception for 39-final by Fedora user chrismurphy using the blocker tracking app because:

 Probably not a blocker, mainly because I can't find an applicable criterion. 

While it can be fixed with an update, the update to kexec-tools would need to be applied before the first kernel is installed.

The consequence of not fixing it is excessive memory consumption, which might negatively affect some users and workloads more than others, possibly causing them to fail.

The workaround would be to remove the crashkernel entry (using grubby) and follow the script's recommendation to disable the autoconfiguration in /etc/kdump.conf
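
The second half of that workaround (disabling the autoconfiguration) could be scripted roughly as below. This is only a sketch: `set_auto_reset_crashkernel` is a hypothetical helper, and it assumes the `auto_reset_crashkernel <yes|no>` line format quoted elsewhere in this report.

```shell
# Set auto_reset_crashkernel to "yes" or "no" in a kdump.conf-style file.
# Usage: set_auto_reset_crashkernel <yes|no> [conf-file]
set_auto_reset_crashkernel() {
    value=$1
    conf=${2:-/etc/kdump.conf}
    case $value in
        yes|no) ;;
        *) echo "value must be yes or no" >&2; return 1 ;;
    esac
    if grep -q '^auto_reset_crashkernel[[:space:]]' "$conf"; then
        # Rewrite the existing line; a temporary file avoids non-portable sed -i,
        # and writing back with cat keeps the original file's permissions.
        sed "s/^auto_reset_crashkernel[[:space:]].*/auto_reset_crashkernel $value/" \
            "$conf" > "$conf.tmp" &&
            cat "$conf.tmp" > "$conf" &&
            rm -f "$conf.tmp"
    else
        # No such line yet: append one.
        echo "auto_reset_crashkernel $value" >> "$conf"
    fi
}
```

Run as root, `set_auto_reset_crashkernel no` would edit `/etc/kdump.conf`; passing a second argument points it at a copy for a dry run.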

Comment 2 Chris Murphy 2023-10-10 15:22:41 UTC

Reverting to a Fedora 38 (btrfs snapshot) I see /etc/kdump.conf likewise contains `auto_reset_crashkernel yes` but I've never seen kernel updates result in BLS snippets having crashkernel= boot parameter added until Fedora 39...

I'm confused what suppressed this on Fedora 38, and/or what's activating it on Fedora 39.

Comment 3 Chris Murphy 2023-10-10 15:43:08 UTC

On Fedora 39 Workstation, toggling the /etc/kdump.conf setting auto_reset_crashkernel yes/no, followed by kernel reinstall, does have the expected behavior. 'yes' results in crashkernel parameter being added to the BLS snippet, whereas 'no' results in the parameter not being added.

I guess regardless of why this is happening in Fedora 39 and not 38, we need /etc/kdump.conf to contain `auto_reset_crashkernel no` on the installation media. I think this is blocker worthy but it lacks a criterion.

Comment 4 Adam Williamson 2023-10-10 17:32:48 UTC

There's been a whole ton of changes to the implementation of this in kexec-tools spec over the last several months, any one of which could've caused this. If you diff the rawhide branch against the f38 branch you can see it's hugely different.

I had noticed this but for some reason thought it was an intended change...can't find any Change for it, though.

I don't think `auto_reset_crashkernel no` is actually the right fix, as I understand the intended design here. The point of that argument is to say "leave my custom crashkernel arg as it is, please". The correct fix should be for the scripts to not *add* a crashkernel parameter if there wasn't one there already...
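
The rule described above (update an existing `crashkernel=` argument, never add a new one) can be illustrated on a plain command-line string. This is a sketch of the idea only, not the code from the kexec-tools scripts; `reset_crashkernel_if_present` is a hypothetical name.

```shell
# Print <cmdline> with its crashkernel= value replaced by <new-value>,
# but only when a crashkernel= argument is already present.
# Simplification: assumes no quoted arguments containing spaces or glob
# characters.
# Usage: reset_crashkernel_if_present <cmdline> <new-value>
reset_crashkernel_if_present() {
    out=
    for arg in $1; do
        case $arg in
            crashkernel=*) arg="crashkernel=$2" ;;
        esac
        out="${out:+$out }$arg"
    done
    printf '%s\n' "$out"
}
```

A command line without `crashkernel=` passes through unchanged, which is the behavior a system that never opted in to kdump would want.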

Comment 5 Adam Williamson 2023-10-10 18:54:02 UTC

https://src.fedoraproject.org/rpms/kexec-tools/pull-request/18 is my best effort to diagnose and fix this.

Honestly, this whole setup seems *wildly* overengineered, but hey, bootloaders!

Comment 6 Chris Murphy 2023-10-10 19:19:11 UTC

draft commonbugs: If you installed or upgraded to a pre-release version of Fedora 39, you will have a crashkernel boot parameter set persistently, even after the fix. The fix will prevent the parameter from being added if it doesn't already exist. But it won't remove it once it exists. To remove it, run: sudo grubby --remove-args="crashkernel" --update=ALL
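
Before running that command, it may help to check whether a system is affected at all. A small sketch, where `has_crashkernel` is a hypothetical helper:

```shell
# Succeed if the given kernel command line contains a crashkernel= argument.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use against the running kernel:
#   has_crashkernel "$(cat /proc/cmdline)" && echo "crashkernel= is set"
```

The same function can be pointed at the contents of `/etc/kernel/cmdline` to see whether the parameter has been added persistently.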

Comment 7 Adam Williamson 2023-10-10 21:35:45 UTC

+3 in https://pagure.io/fedora-qa/blocker-review/issue/1398 , marking accepted.

Comment 8 Philipp Rudo 2023-10-12 18:43:03 UTC

Hi all,

I took a look and must say this is a total mess.

First let me give you some background. We added 5b31b09 ("Simplify the
management of the kernel parameter crashkernel") because the old way caused too
many problems. The change in there is intentional and I want to keep it this
way. I admit that we should have documented this change better. And maybe we
should have changed the default of auto_reset_crashkernel to 'no' but even that
is not as straightforward as you might imagine.

kexec-tools was designed with the goal to provide a kdump setup that "just work
out of the box". But trying to start kdump without having the crashkernel
parameter on the kernel command line will result in a failure. That means
kexec-tools needs to set a reasonable crashkernel value to work. The reasoning
behind this is that by installing the kexec-tools a user signals that he/she/it
wants to use kdump and automatically opts in to the package managing the
crashkernel value. If a user does not want the package to manage the parameter
he/she/it can set auto_reset_crashkernel to 'no' and set the parameter to
whatever he/she/it likes.

So the big question for me is _why_ are the kexec-tools installed per default
in the first place?

I ran some install tests and if I'm not mistaken there are two* packages
installed per default that have a dependency on kexec-tools 1) abrt and 2)
anaconda.

Of these, abrt (more precisely abrt-addon-vmcore) has a hard dependency on
kexec-tools in its spec file and on kdump.service in abrt-vmcore.service.
This means that if kdump.service fails to start, abrt-vmcore.service also
fails to start, so abrt requires the crashkernel parameter to be set in order
to work as expected!

And for anaconda I'm totally puzzled. I fully understand why anaconda (more
precisely the anaconda-install-env-deps) has the dependency on kexec-tools. But
why the hell is anaconda installed at all? I could even start the installer on
my laptop where I do my daily work! (BTW I wouldn't recommend doing this as the
installer reboots when you try to close it killing everything that's currently
running). That looks totally broken to me.

Anyway, if desired we can switch the default to 'auto_reset_crashkernel no' at
the risk that abrt no longer works properly. But IMHO that will only hide
most of the problem.

Thanks
Philipp

* For the desktop edition, the server edition only has abrt.

Comment 9 Philipp Rudo 2023-10-12 18:47:11 UTC

Just for completeness this is what I get when I try to remove kexec-tools from a freshly installed F38 workstation (identical for F39). 
Having all those packages with a (indirect) dependency on kexec-tools is completely broken IMHO.



$ sudo dnf remove kexec-tools
[sudo] password for prudo: 
Dependencies resolved.
==============================================================================================================================================================
 Package                                          Architecture                Version                                    Repository                      Size
==============================================================================================================================================================
Removing:
 kexec-tools                                      x86_64                      2.0.26-3.fc38                              @anaconda                      1.3 M
Removing dependent packages:
 abrt-cli                                         x86_64                      2.16.1-1.fc38                              @anaconda                        0  
 abrt-desktop                                     x86_64                      2.16.1-1.fc38                              @anaconda                        0  
 anaconda-install-env-deps                        x86_64                      38.23.4-2.fc38                             @anaconda                        0  
Removing unused dependencies:
 abrt-addon-ccpp                                  x86_64                      2.16.1-1.fc38                              @anaconda                      296 k
 abrt-addon-kerneloops                            x86_64                      2.16.1-1.fc38                              @anaconda                       89 k
 abrt-addon-pstoreoops                            x86_64                      2.16.1-1.fc38                              @anaconda                       19 k
 abrt-addon-vmcore                                x86_64                      2.16.1-1.fc38                              @anaconda                       37 k
 abrt-addon-xorg                                  x86_64                      2.16.1-1.fc38                              @anaconda                       71 k
 abrt-gui                                         x86_64                      2.16.1-1.fc38                              @anaconda                       99 k
 abrt-gui-libs                                    x86_64                      2.16.1-1.fc38                              @anaconda                       36 k
 abrt-plugin-bodhi                                x86_64                      2.16.1-1.fc38                              @anaconda                       40 k
 abrt-tui                                         noarch                      2.16.1-1.fc38                              @anaconda                       84 k
 bcache-tools                                     x86_64                      1.1-4.fc38                                 @anaconda                      157 k
 boost-regex                                      x86_64                      1.78.0-11.fc38                             @anaconda                      281 k
 ctags                                            x86_64                      6.0.0-1.fc38                               @anaconda                      2.1 M
 cxl-libs                                         x86_64                      76.1-1.fc38                                @anaconda                      128 k
 dracut-squash                                    x86_64                      059-2.fc38                                 @anaconda                      3.0 k
 elfutils                                         x86_64                      0.189-1.fc38                               @anaconda                      2.7 M
 gdb                                              x86_64                      13.1-2.fc38                                @anaconda                      421 k
 gdb-headless                                     x86_64                      13.1-2.fc38                                @anaconda                       15 M
 gnome-abrt                                       x86_64                      1.4.2-4.fc38                               @anaconda                      540 k
 iniparser                                        x86_64                      4.1-11.fc38                                @anaconda                       27 k
 isomd5sum                                        x86_64                      1:1.2.3-19.fc38                            @anaconda                       59 k
 libbabeltrace                                    x86_64                      1.5.11-2.fc38                              @anaconda                      520 k
 libblockdev-kbd                                  x86_64                      2.28-5.fc38                                @anaconda                       36 k
 libblockdev-lvm-dbus                             x86_64                      2.28-5.fc38                                @anaconda                       77 k
 libblockdev-nvdimm                               x86_64                      2.28-5.fc38                                @anaconda                       24 k
 libblockdev-plugins-all                          x86_64                      2.28-5.fc38                                @anaconda                        0  
 libipt                                           x86_64                      2.0.5-3.fc38                               @anaconda                      115 k
 libreport-fedora                                 x86_64                      2.17.9-1.fc38                              @anaconda                       53 k
 libreport-plugin-kerneloops                      x86_64                      2.17.9-1.fc38                              @anaconda                       44 k
 libreport-plugin-logger                          x86_64                      2.17.9-1.fc38                              @anaconda                       52 k
 lvm2-dbusd                                       noarch                      2.03.18-2.fc38                             @anaconda                      645 k
 ndctl                                            x86_64                      76.1-1.fc38                                @anaconda                      347 k
 python3-abrt-addon                               noarch                      2.16.1-1.fc38                              @anaconda                       19 k
 python3-argcomplete                              noarch                      2.0.0-6.fc38                               @anaconda                      271 k
 python3-beautifulsoup4                           noarch                      4.12.0-1.fc38                              @anaconda                      1.3 M
 python3-humanize                                 noarch                      3.13.1-6.fc38                              @anaconda                      188 k
 python3-lxml                                     x86_64                      4.9.2-2.fc38                               @anaconda                      5.0 M
 python3-soupsieve                                noarch                      2.3.2.post1-8.fc38                         @anaconda                      317 k
 source-highlight                                 x86_64                      3.1.9-16.fc38                              @anaconda                      3.1 M
 squashfs-tools                                   x86_64                      4.5.1-3.fc38                               @anaconda                      574 k
 tmux                                             x86_64                      3.3a-3.fc38                                @anaconda                      1.2 M
 udisks2-iscsi                                    x86_64                      2.9.4-6.fc38                               @anaconda                       56 k

Transaction Summary
==============================================================================================================================================================
Remove  45 Packages

Freed space: 37 M
Is this ok [y/N]: n
Operation aborted.

Comment 10 Adam Williamson 2023-10-12 21:16:38 UTC

All install environment dependencies will be installed on any system deployed from a live image, because that's how live installs have to be. Live installs just dump the live environment image onto the target filesystem(s), essentially. There is no package management happening.

We have to include all install environment requirements in the live environment because the live environment *is* an install environment. So if kexec-tools has to be in the install environment - and the reason for that is that you can configure *kexec* through the installer, not kdump - then that means we will include it in the live environment, and thus it will be on the installed system.

To me it's a problem that you say "kexec-tools was designed with the goal to provide a kdump setup that "just work out of the box"" - but the package is k*exec*-tools, not k*dump*-tools. anaconda doesn't want it for any reason to do with kdump. It wants it to back this code:

https://github.com/rhinstaller/anaconda/blob/master/pyanaconda/kexec.py

that's all about kexec. Nothing to do with kdump. It's a feature that lets you pass an option to have anaconda reboot after install using kexec, to bypass the regular system startup process:

https://github.com/rhinstaller/anaconda/commit/1eeae684ea0ea29a2bf7c36ec0ebbda1f0adddb3

There *is* some kdump support written for anaconda, but that is shipped in a separate addon package - kdump-anaconda-addon - which we could potentially leave off live images if we didn't want to pull kexec-tools into them. But we can't really drop the anaconda-install-env-deps requirement for kexec-tools as things stand, because the kexec feature - which *is* part of core anaconda - really does need it. We could, as an ugly mitigation, forcibly drop it from the live images, and just accept that the feature will break if anyone tries to use it on live images. But that doesn't feel great.

Could we separate the kdump bits of the package from the kexec bits, perhaps? Because I don't think, right now, your assumption that "by installing the kexec-tools a user signals that he/she/it wants to use kdump" is warranted.

Comment 11 Adam Williamson 2023-10-12 21:17:30 UTC

And of course, all of this ignores the problem of existing systems being upgraded. Because things have been this way for years, probably every existing system out there installed from a live image has kexec-tools installed. As things stand, they are *all* going to get the kernel arg when they upgrade to F39. That is clearly not what we intended to happen.

Comment 12 Philipp Rudo 2023-10-13 15:19:20 UTC

(In reply to Adam Williamson from comment #10)
> All install environment dependencies will be installed on any system
> deployed from a live image, because that's how live installs have to be.
> Live installs just dump the live environment image onto the target
> filesystem(s), essentially. There is no package management happening.
> 
> We have to include all install environment requirements in the live
> environment because the live environment *is* an install environment. So if
> kexec-tools has to be in the install environment - and the reason for that
> is that can configure *kexec* through the installer, not kdump - then that
> means we will include it in the live environment, and thus it will be on the
> installed system.

To be honest that sounds completely wrong to me. The Server Edition apparently is able to do package management during installation. Why can't the Desktop Edition do the same? And even if there is a good reason I'm missing, anaconda could still be removed at first boot. Having anaconda on disk after the installation finished is a bug and needs to be fixed!
 
> To me it's a problem that you say "kexec-tools was designed with the goal to
> provide a kdump setup that "just work out of the box"" - but the package is
> k*exec*-tools, not k*dump*-tools. anaconda doesn't want it for any reason to
> do with kdump. It wants it to back this code:
> 
> https://github.com/rhinstaller/anaconda/blob/master/pyanaconda/kexec.py
> 
> that's all about kexec. Nothing to do with kdump. It's a feature that lets
> you pass an option to have anaconda reboot after install using kexec, to
> bypass the regular system startup process:
> 
> https://github.com/rhinstaller/anaconda/commit/
> 1eeae684ea0ea29a2bf7c36ec0ebbda1f0adddb3
> 
> There *is* some kdump support written for anaconda, but that is shipped in a
> separate addon package - kdump-anaconda-addon - which we could potentially
> leave off live images if we didn't want to pull kexec-tools into them. But
> we can't really drop the anaconda-install-env-deps requirement for
> kexec-tools as things stand, because the kexec feature - which *is* part of
> core anaconda - really does need it. We could, as an ugly mitigation,
> forcibly drop it from the live images, and just accept that the feature will
> break if anyone tries to use it on live images. But that doesn't feel great.
> 
> Could we separate the kdump bits of the package from the kexec bits,
> perhaps? Because I don't think, right now, your assumption that "by
> installing the kexec-tools a user signals that he/she/it wants to use kdump"
> is warranted.

Of course we can split out the kdump bits. In fact we are working on it right now:

https://pagure.io/packaging-committee/issue/1303

I just don't believe that it will be ready in time for F39.

And please believe me that I don't like the assumption either. But that's how it has been done since the kexec-tools were included with FC5.

> And of course, all of this ignores the problem of existing systems being
> upgraded. Because things have been this way for years, probably every
> existing system out there installed from a live image has kexec-tools
> installed. As things stand, they are *all* going to get the kernel arg when
> they upgrade to F39. That is clearly not what we intended to happen.

Then we should work on fixing those systems and remove anaconda on them as well. Just because a bug has existed for a long time doesn't mean that it doesn't need to be fixed.

Please also keep in mind that there is still the problem with abrt-addon-vmcore. When we remove the crashkernel parameter it will break their requirement on the kdump.service. Personally I don't have a problem with breaking abrt-addon-vmcore, as for me having this dependency was bogus in the first place. But that is just me and needs to be agreed on by others as well, ideally including the abrt maintainers.

FYI, Coiby and I also discussed off-list what the best fix for this issue is. He suggested that, instead of changing the default for auto_reset_crashkernel, we only update/set the parameter if the kdump.service is enabled. That would allow the desired behavior to depend on the systemd presets. We are currently checking whether this has any unintended side effects.
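
The condition proposed here can be written down as a small predicate. This is a sketch of the idea under discussion, not the eventual kexec-tools change; `should_manage_crashkernel` is a hypothetical name, and the service state is passed in as a string so the logic can be tried without systemd.

```shell
# Decide whether the crashkernel= parameter should be set or updated.
# $1: value of auto_reset_crashkernel ("yes" or "no")
# $2: state of kdump.service as printed by `systemctl is-enabled kdump.service`
should_manage_crashkernel() {
    [ "$1" = "yes" ] && [ "$2" = "enabled" ]
}

# Typical use:
#   state=$(systemctl is-enabled kdump.service 2>/dev/null)
#   should_manage_crashkernel yes "$state" && echo "would update crashkernel="
```

Gating on the service state means the systemd preset decides the outcome: where kdump.service is not enabled, an `auto_reset_crashkernel yes` left over in `/etc/kdump.conf` has no effect.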

Comment 13 Adam Williamson 2023-10-13 15:52:21 UTC

"To be honest that sounds completely wrong to me. The Server Edition apparently is able to do package management during installation. Why can't the Desktop Edition do the same?"

Because Server is a traditional install image that deploys packages, and Workstation is a live image.

"And even if there is a good reason I'm missing, anaconda could still be removed at first boot. Having anaconda on disk after the installation finished is a bug and needs to be fixed!"

This turns out to be more difficult than you'd think. It was actually a proposed feature for F39, but had to be abandoned: https://fedoraproject.org/wiki/Changes/AutoFirstBootServices#Feedback . Partly this was due to elements of the feature that don't involve removing anaconda, but one of the blockers - how do you handle errors? - did affect that part of the feature.

Also, what we're talking about here is not really "removing anaconda", is it? Removing anaconda would not automatically result in the removal of kexec-tools. What you're proposing is, kinda, "removing the dependencies of anaconda-install-env-deps". But that is not straightforward. We can't just wholesale remove all of them after all live installs, because sometimes anaconda *would have installed packages from that set as part of the installation*. For instance, if you *do* use anaconda's kexec feature, it adds kexec-tools to the set of packages to be installed to the system (if it's doing package installs). So it's not correct to simply unconditionally wipe all anaconda-install-env-deps packages after install. We would instead have to make anaconda, on live installs, keep a record of "packages it would have installed", and only remove things that *aren't* in that set, post-install.
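
The bookkeeping described above amounts to a set difference: the install-environment dependencies minus the packages the installer would have installed anyway. A sketch, assuming both sets are available as text files with one package name per line (the file layout and the function name are made up for illustration; anaconda keeps no such record today):

```shell
# Print the packages listed in <deps-file> that are not listed in <keep-file>.
# Usage: removable_packages <deps-file> <keep-file>
removable_packages() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file.
    # grep exits 1 when nothing is left to remove; treat that as success.
    grep -vxFf "$2" "$1" || true
}
```

If the user did use the kexec feature, kexec-tools would appear in the keep file and therefore be left out of the removal list.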

"And please believe me that I don't like the assumption as well. But that's how it was done since the kexec-tools were included with FC5."

It simply is *not*, though. As I said, every live image install (including Workstation installs, which are by far the most common Fedora installs) since the dependency was added to anaconda in 2015 will have had kexec-tools in it, yet those installs did not get a crashkernel= parameter at install time or on system update and cannot be assumed to want one. The assumption as stated *is not true* and cannot be relied upon.

"FYI, Coiby and I also discuss off-list on what is the best fix this issue. He suggested instead of changing the default for auto_reset_crashkernel to only update/set the parameter if the kdump.service is enabled. That would allow to have the desired behavior depending on the systemd presets. We are currently checking whether this has some unintended side effect."

This seems like a promising approach, if we can be sure there hasn't been any case where something has auto-enabled the service for folks who didn't specifically want it in the past.

Comment 14 Philipp Rudo 2023-10-13 17:15:21 UTC

(In reply to Adam Williamson from comment #13)
> "To be honest that sounds completely wrong to me. The Server Edition
> apparently is able to do package management during installation. Why can't
> the Desktop Edition do the same?"
> 
> Because Server is a traditional install image that deploys packages, and
> Workstation is a live image.

Ok, then I obviously need to do my homework to better understand the differences. Anyway that's not the real problem we should discuss here.

> "And even if there is a good reason I'm missing, anaconda could still be
> removed at first boot. Having anaconda on disk after the installation
> finished is a bug and needs to be fixed!"
> 
> This turns out to be more difficult than you'd think. It was actually a
> proposed feature for F39, but had to be abandoned:
> https://fedoraproject.org/wiki/Changes/AutoFirstBootServices#Feedback .
> Partly this was due to elements of the feature that don't involve removing
> anaconda, but one of the blockers - how do you handle errors? - did affect
> that part of the feature.
> 
> Also, what we're talking about here is not really "removing anaconda", is
> it? Removing anaconda would not automatically result in the removal of
> kexec-tools. What you're proposing is, kinda, "removing the dependencies of
> anaconda-install-env-deps". But that is not straightforward. We can't just
> wholesale remove all of them after all live installs, because sometimes
> anaconda *would have installed packages from that set as part of the
> installation*. For instance, if you *do* use anaconda's kexec feature, it
> adds kexec-tools to the set of packages to be installed to the system (if
> it's doing package installs). So it's not correct to simply unconditionally
> wipe all anaconda-install-env-deps packages after install. We would instead
> have to make anaconda, on live installs, keep a record of "packages it would
> have installed", and only remove things that *aren't* in that set,
> post-install.

No, I really mean "removing anaconda". It's a separate issue from what was reported in this bug. But I firmly believe that it should be treated with the same priority.

In my opinion the fact that anaconda is installed and can be run to overwrite my hard drive, accidentally or on purpose, on the system I rely on for my daily work is a big problem.

For the dependencies I would prefer if they could be removed as well but that's more my preference to have a lean system.

> "And please believe me that I don't like the assumption as well. But that's
> how it was done since the kexec-tools were included with FC5."
> 
> It simply is *not*, though. As I said, every live image install (including
> Workstation installs, which are by far the most common Fedora installs)
> since the dependency was added to anaconda in 2015 will have had kexec-tools
> in it, yet those installs did not get a crashkernel= parameter at install
> time or on system update and cannot be assumed to want one. The assumption
> as stated *is not true* and cannot be relied upon.

That's one of the problems I have. Nobody told us that kexec is used to reboot the system. All use cases we knew of were kdump related. So there was no reason for us to question this assumption (which again exists since 2005).

One more comment. We don't support the kexec reboot case as there are simply too many broken (out-of-tree) drivers that cause problems for the kernel being booted. If a --kexec doesn't work as expected you are on your own.

> "FYI, Coiby and I also discuss off-list on what is the best fix this issue.
> He suggested instead of changing the default for auto_reset_crashkernel to
> only update/set the parameter if the kdump.service is enabled. That would
> allow to have the desired behavior depending on the systemd presets. We are
> currently checking whether this has some unintended side effect."
> 
> This seems like a promising approach, if we can be sure there hasn't been
> any case where something has auto-enabled the service for folks who didn't
> specifically want it in the past.

That's why I would prefer to change the default as that requires explicit user interaction to change. On the other hand that probably will cause problems for systems that are upgraded and still have the old /etc/kdump.conf...

Comment 15 Chris Murphy 2023-10-14 19:05:08 UTC

(In reply to Philipp Rudo from comment #12)
>Having anaconda on disk after the installation finished is a bug and needs to be fixed!

Summary of that is "several insurmountable technical issues" implementing it outside of Anaconda, and "difficult to implement" inside of Anaconda.

https://pagure.io/fedora-workstation/issue/242
https://bugzilla.redhat.com/show_bug.cgi?id=2218466

One possible workaround for this: https://pagure.io/fedora-btrfs/project/issue/62

>On the other hand that probably will cause problems for systems that are upgrade and still have the old /etc/kdump.conf...

I think systems being upgraded to Fedora 39 having this kernel parameter added upon the next kernel being updated+installed is a problem. It sounds like the kernel parameter does actually pilfer memory, even if kdump.service is not enabled. Is that correct? If so, given the lack of a release criterion that clearly makes this a blocker, we need to consider the alternative: ask FESCo to make it a blocker. We just can't have hapless users run into this just because they do a system upgrade. As bad as it is, it'd be better to just wholesale step on everyone's kdump.conf with the new (safer) default.

But anyway, I'm hopeful kdump.service enabled/disabled can be used as a condition to check for whether the kernel parameter should be inserted.

Comment 16 Adam Williamson 2023-10-16 18:53:32 UTC

Proposed fix: https://lists.fedoraproject.org/archives/list/kexec@lists.fedoraproject.org/thread/MRSNRKY4JQJQ6ZD3ANQY6D5FLJC5FNUH/

Comment 17 Coiby 2023-10-17 07:16:35 UTC

Hi Adam,

I've released kexec-tools-2.0.27-3.fc40 to fix this issue. I guess I'll need to make a release for F39 as well, right?

Comment 18 Michael Catanzaro 2023-10-17 13:41:58 UTC

Yes, we need an F39 update in bodhi.

Comment 19 Adam Williamson 2023-10-17 15:49:31 UTC

Right, do an F39 build, create an update, and mark the update as fixing this bug. Thanks!

Comment 20 Adam Williamson 2023-10-17 23:42:39 UTC

Per https://pagure.io/fesco/issue/3083 , FESCo has designated this bug as an F39 Final blocker, marking as such.

No change for you, Coiby - still the same plan (do an F39 update, mark it as fixing this bug).

Comment 21 Coiby 2023-10-18 02:40:57 UTC

(In reply to Adam Williamson from comment #20)
> Per https://pagure.io/fesco/issue/3083 , FESCo has designated this bug as an
> F39 Final blocker, marking as such.
> 
> No change for you, Coiby - still the same plan (do an F39 update, mark it as
> fixing this bug).

Thanks for the info! I've created an update https://bodhi.fedoraproject.org/updates/FEDORA-2023-93fab3be92

Comment 22 Philipp Rudo 2023-10-18 14:55:22 UTC

(In reply to Chris Murphy from comment #15)
> (In reply to Philipp Rudo from comment #12)
> >Having anaconda on disk after the installation finished is a bug and needs to be fixed!
> 
> Summary of that is "several insurmountable technical issues" implementing it
> outside of Anaconda, and "difficult to implement" inside of Anaconda.
> 
> https://pagure.io/fedora-workstation/issue/242
> https://bugzilla.redhat.com/show_bug.cgi?id=2218466
> 
> One possible work around for this:
> https://pagure.io/fedora-btrfs/project/issue/62

Ok, I see...

> >On the other hand that probably will cause problems for systems that are upgrade and still have the old /etc/kdump.conf...
> 
> I think systems being upgraded to Fedora 39 having this kernel parameter
> added upon the next kernel being updated+installed is a problem. It sounds
> like the kernel parameter does actually pilfer memory, even if kdump.service
> is not enabled. Is that correct? If so, given lack of a release criterion
> that clearly makes this a block, we need to consider the alternative: ask
> FESCo to make it a blocker. We just can't have hapless users run into this
> just because they do a system upgrade. As bad as it is, it'd be better to
> just wholesale step on everyone's kdump.conf with the new (safer) default.

Yes, the crashkernel parameter always reserves the memory, regardless of whether kdump.service is running. Think of it this way: the crashkernel parameter and the memory it reserves are a prerequisite for kdump.service, but the kernel has no information about whether user space will use that memory later on.
The problem with the crashkernel memory is that it is the memory the kdump kernel _runs_ in. That means it has to be contiguous in physical memory and cannot be used for "normal" memory management. So it needs to be reserved early during boot and cannot be allocated when a kdump kernel is loaded.
 
> But anyway, I'm hopeful kdump.service enabled/disabled can be used as a
> condition to check for whether the kernel parameter should be inserted.

As said, the parameter is a prerequisite for the service to do its job. So I believe it's the best knob we have.
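
For a rough sense of how much memory a range-style value such as the one in the original report reserves, here is a minimal sketch in POSIX sh. The helper names `to_bytes` and `crashkernel_for_ram` are made up for illustration and are not part of kexec-tools. It assumes the usual `start-[end]:size` range syntax with an inclusive start and exclusive end, and it ignores any `@offset` suffix. The kernel compares against the RAM it actually detects, which can be slightly below the nominal size.

```sh
# Convert a size such as 192M, 4G or 512K to bytes (suffix K, M or G).
to_bytes() {
    num=${1%[KMGkmg]}
    case $1 in
        *[Kk]) echo $((num * 1024)) ;;
        *[Mm]) echo $((num * 1024 * 1024)) ;;
        *[Gg]) echo $((num * 1024 * 1024 * 1024)) ;;
        *)     echo "$num" ;;
    esac
}

# Print the reservation that a range-style crashkernel= value selects for
# a machine with the given amount of RAM, or 0 if no range matches.
# usage: crashkernel_for_ram <crashkernel-value> <ram-size>
crashkernel_for_ram() {
    ram=$(to_bytes "$2")
    # Split the value on commas into one "range:size" entry per argument.
    old_ifs=$IFS
    IFS=,
    set -- $1
    IFS=$old_ifs
    for entry in "$@"; do
        range=${entry%%:*}
        size=${entry#*:}
        start=$(to_bytes "${range%%-*}")
        end=${range#*-}          # empty for an open-ended range like 64G-
        if [ "$ram" -ge "$start" ] &&
           { [ -z "$end" ] || [ "$ram" -lt "$(to_bytes "$end")" ]; }; then
            echo "$size"
            return 0
        fi
    done
    echo 0
}
```

For example, `crashkernel_for_ram '1G-4G:192M,4G-64G:256M,64G-:512M' 8G` selects the `4G-64G` range, so under these assumptions an 8G desktop gives up 256M.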

Comment 23 Fedora Update System 2023-10-18 16:41:21 UTC

FEDORA-2023-93fab3be92 has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2023-93fab3be92

Comment 24 Adam Williamson 2023-10-18 16:43:11 UTC

Thanks, I've marked the update as fixing this bug.

Comment 25 Fedora Update System 2023-10-19 02:16:39 UTC

FEDORA-2023-93fab3be92 has been pushed to the Fedora 39 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-93fab3be92`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-93fab3be92

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 26 ltao 2023-10-19 02:32:09 UTC

(In reply to Philipp Rudo from comment #14)
> (In reply to Adam Williamson from comment #13)
> > "To be honest that sounds completely wrong to me. The Server Edition
> > apparently is able to do package management during installation. Why can't
> > the Desktop Edition do the same?"
> > 
> > Because Server is a traditional install image that deploys packages, and
> > Workstation is a live image.
> 
> Ok, then I obviously need to do my homework to better understand the
> differences. Anyway that's not the real problem we should discuss here.
> 
> > "And even if there is a good reason I'm missing, anaconda could still be
> > removed at first boot. Having anaconda on disk after the installation
> > finished is a bug and needs to be fixed!"
> > 
> > This turns out to be more difficult than you'd think. It was actually a
> > proposed feature for F39, but had to be abandoned:
> > https://fedoraproject.org/wiki/Changes/AutoFirstBootServices#Feedback .
> > Partly this was due to elements of the feature that don't involve removing
> > anaconda, but one of the blockers - how do you handle errors? - did affect
> > that part of the feature.
> > 
> > Also, what we're talking about here is not really "removing anaconda", is
> > it? Removing anaconda would not automatically result in the removal of
> > kexec-tools. What you're proposing is, kinda, "removing the dependencies of
> > anaconda-install-env-deps". But that is not straightforward. We can't just
> > wholesale remove all of them after all live installs, because sometimes
> > anaconda *would have installed packages from that set as part of the
> > installation*. For instance, if you *do* use anaconda's kexec feature, it
> > adds kexec-tools to the set of packages to be installed to the system (if
> > it's doing package installs). So it's not correct to simply unconditionally
> > wipe all anaconda-install-env-deps packages after install. We would instead
> > have to make anaconda, on live installs, keep a record of "packages it would
> > have installed", and only remove things that *aren't* in that set,
> > post-install.
> 
> No, I really mean "removing anaconda". It's a separate issue than what was
> reported in this Bug. But I firmly believe that it should be treated with
> the same priority.
> 
> In my opinion the fact that anaconda is installed and can be run to
> overwrite my harddrive, accidentally or on purpose, on the system I rely on
> for my daily work is a big problem. 
> 
> For the dependencies I would prefer if they could be removed as well but
> that's more my preference to have a lean system.
> 
> > "And please believe me that I don't like the assumption as well. But that's
> > how it was done since the kexec-tools were included with FC5."
> > 
> > It simply is *not*, though. As I said, every live image install (including
> > Workstation installs, which are by far the most common Fedora installs)
> > since the dependency was added to anaconda in 2015 will have had kexec-tools
> > in it, yet those installs did not get a crashkernel= parameter at install
> > time or on system update and cannot be assumed to want one. The assumption
> > as stated *is not true* and cannot be relied upon.
> 
> That's one of the problems I have. Nobody told us that kexec is used to
> reboot the system. All use cases we knew of were kdump related. So there was
> no reason for us to question this assumption (which again exists since 2005).
> 
> One more comment. We don't support the kexec reboot case as there are simply
> too many broken (out-of-tree) drivers that cause problems for the kernel
> being booted. If a --kexec doesn't work as expected you are on your own.

Yeah, I agree. Or it should be documented in anaconda that using a kexec reboot instead of a normal machine reboot is likely to leave drivers broken, so some hardware may not work properly. We have encountered several cases where graphics cards, network cards, etc. did not work well after a kexec reboot into a new kernel. E.g.: https://lore.kernel.org/kexec/CAO7dBbV=D3N31L-VPS=2Vtreqc-4drKYHT1xWrKphT3J_G5Ndw@mail.gmail.com/. A simple explanation is that the old kernel should shut down the drivers so that the new kernel can reinitialize them and bring them up. However, things are not always that easy. Many out-of-tree and in-tree drivers don't implement a .shutdown function, or the hardware is very difficult to reinitialize without a power-off...

kexec reboot is widely used in kdump, since there we only want a vmcore to be generated and don't really care whether video works normally during that period. However, it would be unacceptable for desktop users if their display froze after an anaconda kexec reboot.


> 
> > "FYI, Coiby and I also discuss off-list on what is the best fix this issue.
> > He suggested instead of changing the default for auto_reset_crashkernel to
> > only update/set the parameter if the kdump.service is enabled. That would
> > allow to have the desired behavior depending on the systemd presets. We are
> > currently checking whether this has some unintended side effect."
> > 
> > This seems like a promising approach, if we can be sure there hasn't been
> > any case where something has auto-enabled the service for folks who didn't
> > specifically want it in the past.
> 
> That's why I would prefer to change the default as that requires explicit
> user interaction to change. On the other hand that probably will cause
> problems for systems that are upgrade and still have the old
> /etc/kdump.conf...

Comment 27 Lukas Ruzicka 2023-10-19 08:45:01 UTC

The new update works for me as expected.

Comment 28 Fedora Update System 2023-10-22 08:24:35 UTC

FEDORA-2023-93fab3be92 has been pushed to the Fedora 39 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 29 Villy Kruse 2023-10-23 09:26:20 UTC

(In reply to Philipp Rudo from comment #9)
> Just for completeness this is what I get when I try to remove kexec-tools
> from a freshly installed F38 workstation (identical for F39). 
> Having all those packages with a (indirect) dependency on kexec-tools is
> completely broken IMHO.
> 

Try it with "--noautoremove".  Most of the packages were required by kexec-tools,
and thus would have become leaf packages which were not marked as user-installed.

Comment 30 Kamil Páral 2023-10-24 12:44:06 UTC

I wanted to document this problem (it was proposed as a Common Issue), but I found out that, on my machine, crashkernel= is no longer present in kernel-6.5.8-300.fc39 BLS config. It is present in previous 6.5.7-300.fc39. I didn't do anything to "fix" this situation, so the proposed update must have fixed it automatically.

Is it accurate to say that this is automatically resolved for people once they have kexec-tools-2.0.27-2.fc39 and install a new kernel?

Comment 31 Chris Murphy 2023-10-24 14:10:25 UTC

>Is it accurate to say that this is automatically resolved for people once they have kexec-tools-2.0.27-2.fc39 and install a new kernel?

I think it's correct. Once `/etc/kdump.conf` contains `auto_reset_crashkernel no` then the crashkernel parameter isn't *subsequently* added during new kernel installation. But previously installed kernels will still have the crashkernel parameter.
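
To see which previously installed kernels still carry the parameter, something along these lines could be used. This is only a sketch: the function name `entries_with_crashkernel` is made up, and it assumes the BLS snippets live in `/boot/loader/entries` and list the kernel arguments on an `options` line.

```sh
# List BLS entry files whose "options" line contains a crashkernel= argument.
# usage: entries_with_crashkernel [entries-dir]
entries_with_crashkernel() {
    dir=${1:-/boot/loader/entries}   # assumed default location of BLS snippets
    for entry in "$dir"/*.conf; do
        [ -f "$entry" ] || continue  # skip the unexpanded glob if nothing matches
        if grep -Eq '^options[[:space:]](.*[[:space:]])?crashkernel=' "$entry"; then
            echo "$entry"
        fi
    done
}
```

Running `entries_with_crashkernel` as root prints one path per affected entry. An empty result would mean no installed kernel still reserves crashkernel memory.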

Comment 32 Adam Williamson 2023-10-24 15:51:05 UTC

But the update didn't change anything about /etc/kdump.conf . It just changes the conditions under which kdumpctl's reset_crashkernel modes actually *do* anything: they now also check whether kdump.service is enabled, in addition to checking the value in that file.

The result seems a bit unexpected to me, but...the logic about how cmdline gets set is twisty. I kinda thought it was still more or less just copied from the current kernel to the newly-installed one, so I was expecting the arg to stick around if it was already there. But I can believe it's more complicated than that.
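
The combined condition described above can be sketched as follows. This is not the actual kdumpctl code: the function name `should_reset_crashkernel` is hypothetical, and treating an unset `auto_reset_crashkernel` as `yes` is an assumption about the default.

```sh
# Succeed only if the config allows resetting crashkernel AND the service
# is enabled.
# usage: should_reset_crashkernel <kdump.conf path> <unit-file state>
should_reset_crashkernel() {
    conf=$1
    state=$2
    # The last uncommented "auto_reset_crashkernel <value>" line wins.
    value=$(awk '$1 == "auto_reset_crashkernel" { v = $2 } END { print v }' \
        "$conf" 2>/dev/null)
    [ -n "$value" ] || value=yes     # assumed default when the option is unset
    [ "$value" = yes ] && [ "$state" = enabled ]
}
```

On a real system it could be called as `should_reset_crashkernel /etc/kdump.conf "$(systemctl is-enabled kdump.service 2>/dev/null)"`, so that a disabled kdump.service blocks the reset even when the config file still says `yes`.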

Comment 33 Martin Pitt 2023-10-25 04:56:12 UTC

Before this change, kdumping after a crash just worked, as the kernel command line contained "crashkernel=auto". With this update, the current F39 cloud images no longer have any crashkernel option:

    BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.5.6-300.fc39.x86_64 root=UUID=bcb812e5-520b-4e3b-894b-07979c42e113 ro rootflags=subvol=root no_timer_check net.ifnames=0 console=tty1 console=ttyS0,115200n8

and consequently you can try to enable kdump.service all you want, or even reboot, but it will still refuse to start "because of an unmet condition check (ConditionKernelCommandLine=crashkernel)".

I found that

    systemctl enable kdump; kdumpctl reset-crashkernel --reboot

works, but that took me a while to figure out. Is that the official recipe now? That's at least much better than fiddling around with grubby.
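
A quick way to check up front whether that unit condition would be met is to look for the word in the running kernel's command line. A minimal sketch, with a made-up function name, assuming the arguments are separated by spaces and that the condition matches either a bare `crashkernel` word or the left-hand side of `crashkernel=...`:

```sh
# Succeed if the given kernel command line contains a crashkernel argument.
# usage: has_crashkernel "<kernel command line>"
has_crashkernel() {
    # Pad with spaces so that only whole words match (not e.g. nocrashkernel).
    case " $1 " in
        *" crashkernel "*|*" crashkernel="*) return 0 ;;
    esac
    return 1
}
```

For example, `has_crashkernel "$(cat /proc/cmdline)" || echo "no crashkernel reservation"` would have flagged the cloud image command line quoted above before trying to start the service.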

Comment 34 Adam Williamson 2023-10-25 05:57:13 UTC

Pretty much, yes. We do not want kdump after a crash to "just work" by default because this involves consuming quite a lot of memory. At least, doing so should go through the Change process for FESCo to evaluate the tradeoffs, not just suddenly appear as it did in this case.

Comment 35 Kamil Páral 2023-10-25 08:28:47 UTC

(In reply to Adam Williamson from comment #32)
> The result seems a bit unexpected to me, but...the logic about how cmdline
> gets set is twisty. I kinda thought it was still more or less just copied
> from the current kernel to the newly-installed one, so I was expecting the
> arg to stick around if it was already there. But I can believe it's more
> complicated than that.

I also verified in a VM that crashkernel= is really lost with a new kernel update. So I don't think we need to document this as a Common Bug, removing the flags. I also verified that new system installations don't have crashkernel= set.

Comment 36 Philipp Rudo 2023-10-25 13:55:37 UTC

(In reply to Kamil Páral from comment #30)
> I wanted to document this problem (it was proposed as a Common Issue), but I
> found out that, on my machine, crashkernel= is no longer present in
> kernel-6.5.8-300.fc39 BLS config. It is present in previous 6.5.7-300.fc39.
> I didn't do anything to "fix" this situation, so the proposed update must
> have fixed it automatically.
> 
> Is it accurate to say that this is automatically resolved for people once
> they have kexec-tools-2.0.27-2.fc39 and install a new kernel?

Actually this is expected. In 8175924 ("kdumpctl: Stop updating grub config in reset_crashkernel") we stopped setting the crashkernel parameter on the grub default command line. We did that because there are now different kernel variants on aarch64 which can be installed in parallel but need to have a different crashkernel value set. That's why we have to set it for each kernel individually. Which also means that it won't be set if the preconditions aren't met.

I'm not entirely sure how the grub default command line translates into the BLS entry. But 90-loaderentry.install, which creates the entry, looks in several other locations first before taking the command line from /proc/cmdline. On my system it's from /etc/kernel/cmdline.

Comment 37 Villy Kruse 2023-10-25 15:40:11 UTC

(In reply to Philipp Rudo from comment #36)

> I'm not entirely sure how the grub default command line translates into the
> BLS entry. But 90-loaderentry.install that creates the entry looks on
> multiple other locations first before taking the command line from
> /proc/cmdline. On my system it's from /etc/kernel/cmdline.

90-loaderentry.install is for sd-boot entries and 20-grub.install is for grub
entries.  But both take the kernel command-line options from /etc/kernel/cmdline.
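
The fallback described in the two comments above can be sketched roughly like this. It is not the code of either install plugin: the function name `pick_cmdline_source` is made up, and the intermediate `/usr/lib/kernel/cmdline` location and the exact search order are assumptions.

```sh
# Print the file a new boot entry's command line would be taken from.
# usage: pick_cmdline_source [root]
pick_cmdline_source() {
    root=${1:-}
    # Prefer an administrator or vendor supplied file if one exists ...
    for candidate in "$root/etc/kernel/cmdline" "$root/usr/lib/kernel/cmdline"; do
        if [ -f "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    # ... and only then fall back to the running kernel's command line.
    echo "$root/proc/cmdline"
}
```

Under that assumption, a crashkernel= argument written persistently to /etc/kernel/cmdline would be picked up by every later kernel install regardless of how the running kernel was booted.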
