kdump-utils-1.0.44-2.fc41.x86_64

Configuring kdump to dump over NFS fails when the kernel boots into the kdump initrd with these messages:

[    2.188806] Run /init as init process
/usr/bin/sh: error while loading shared libraries: libtinfo.so.6: cannot open shared object file: No such file or directory
[    2.195521] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00

The config file contains this:

auto_reset_crashkernel yes
core_collector makedumpfile -l --message-level 7 -d 31
nfs 10.111.113.2:/srv/kdump

Dumping to local storage works fine; I have no idea what is different about NFS. This happens in the Cockpit integration tests, so I can easily run any kind of experiment and dig out more details, but I don't know where to start.

Reproducible: Always
Here is another way that it fails:

[    2.134937] Run /init as init process
/init: line 17: modprobe: command not found
/init: line 18: modprobe: command not found
/init: line 19: modprobe: command not found
mount: /squash/root: failed to setup loop device for /squash-root.img.
mount: /newroot: unknown filesystem type 'overlay'.
       dmesg(1) may have more information after failed mount system call.
mount: /newroot/squash: mount point does not exist.
       dmesg(1) may have more information after failed mount system call.
/init: line 34: exec: switch_root: not found
[    2.191049] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00

This looks very much like an incomplete initrd, and our test does indeed crash the kernel right after building that initrd. However, whenever I have inspected that initrd manually, it looked complete. I even put a couple of syncs in there before crashing the kernel, but that didn't help either. I'll try more things.
Aha:

[    0.343042] Initramfs unpacking failed: write error

(This is from a different run; disregard the time stamps.)
Increasing the memory of the virtual machine from 1 GiB to 3 GiB didn't help.
Ok, I think I figured it out, from the comments in https://github.com/cockpit-project/bots/pull/6706 -- thanks a lot for those!

I put "grubby --update-kernel=ALL --args='crashkernel=256M'" into the test run (after "kdumpctl reset-crashkernel" and before the next reboot). This helps, and the initramfs is unpacked correctly.

So here is my vote for bumping the crashkernel values. If you ask me, just make it 512 MiB everywhere. It is really not clear at all when you run into this limit or what to do about it.

Unfortunately, the kdump initramfs then runs into a variant of bug 2306035 and waits for some disk that never shows up.
> If you ask me, just make it 512 MiB everywhere.

But that would be prohibitively expensive for embedded or cloud machines which only have 1 GiB (or even less) of RAM. It could perhaps be made *conditionally* larger when kdump over NFS is enabled -- plus a "just don't do kdump over NFS then" for small devices.

The other option would be to put the NFS-enabled initrd on a diet again, to get it back to the size it had in F39/F40. That's harder to do, of course, but it would have the benefit of continuing to work on such devices.
OTOH, if you meant "make it a *constant* everywhere", and not specifically "512 MiB", then -- as far as I now understand how kdump works -- that makes sense: this value only depends on how much RAM booting the kernel and unpacking the initrd needs, but *not* on how much RAM the machine actually has, right?

So what is the reason for these crashkernel= "zones" that provide increasingly more RAM for kdump depending on the host RAM? Thanks!
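For reference, these "zones" are expressed with the range form of the crashkernel= parameter from the kernel's admin guide (crashkernel=range1:size1[,range2:size2,...]): the reservation is chosen by which range the machine's total RAM falls into. Here is a rough sketch of that selection logic in Python; the range spec string below is made up for illustration and is not Fedora's actual default, and the exact boundary semantics (inclusive vs. exclusive ends) are my assumption:

```python
def parse_size(s):
    """Parse a kernel-style size string like '192M' or '4G' into bytes."""
    units = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}
    if s and s[-1] in units:
        return int(s[:-1]) * units[s[-1]]
    return int(s)

def pick_crashkernel(spec, ram_bytes):
    """Pick the reservation for a crashkernel=range:size[,...] spec.

    An entry like '4G-64G:256M' applies when 4G <= RAM < 64G (assumed);
    an open end ('64G-:512M') means '64G and up'.
    """
    for entry in spec.split(","):
        rng, size = entry.split(":")
        lo, _, hi = rng.partition("-")
        lo_b = parse_size(lo) if lo else 0
        hi_b = parse_size(hi) if hi else float("inf")
        if lo_b <= ram_bytes < hi_b:
            return parse_size(size)
    return 0  # no range matched: nothing is reserved

# Illustrative spec only -- not the real distro default:
spec = "1G-4G:192M,4G-64G:256M,64G-:512M"
print(pick_crashkernel(spec, 2 * (1 << 30)) // (1 << 20))  # 2 GiB machine -> 192
```

Note that a machine below the lowest range (here, under 1 GiB) gets no reservation at all, which matches the "small devices just don't do kdump" idea above.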
I went into this way too naive, not knowing what "crashkernel" is about, and "man kdumpctl" didn't make much sense to me. Now that I know what crashkernel is, I think we need to teach Cockpit about it and take care of it in the UI. Ideally, the user wouldn't be aware of crashkernel. Cockpit would take care that the kernel is booted with an appropriate value. For example, after changing the config and rebuilding the initrd, Cockpit would also run "kdumpctl estimate" to get a suitable crashkernel value, change the kernel command lines if necessary and tell people to reboot "to ensure that kdump works properly".
I agree that Cockpit can help and should connect the dots. But "how big should it be" is domain-specific knowledge which *has* to come from kdump-utils. This demonstrably evolves, and there is no single number or OS-independent algorithm that Cockpit could use -- nor should there be: all of this should work for admins on the CLI as well.

So, reading "kdumpctl estimate" and plugging the result into crashkernel= seems fine -- but why doesn't "kdumpctl reset-crashkernel" do exactly that? This still feels like a bug in kdumpctl to me.
> So, reading "kdumpctl estimate" and plugging the result into crashkernel= seems fine -- but why doesn't "kdumpctl reset-crashkernel" do exactly that? This still feels like a bug in kdumpctl to me.

Yeah, there should be a "kdumpctl rebuild --set-crashkernel" or similar.
Here is the output of "kdumpctl estimate" for the config that triggered this bug report:

# kdumpctl estimate
Reserved crashkernel: 192M
Recommended crashkernel: 175M

Kernel image size: 61M
Kernel modules size: 8M
Initramfs size: 42M
Runtime reservation: 64M
Large modules:
    nfsv4: 1302528
    kvm: 1449984

So the estimate is too low. The man page warns against this, but it means we either fix kdumpctl to produce reliable numbers, or we can't use it in Cockpit. Still, I think this is a missing feature in the sense that Cockpit lets you edit the kdump config, but the very important crashkernel setup needs to be done outside of Cockpit.

I am not sure how to proceed here... I think for now we'll just stop using "kdumpctl reset-crashkernel" in our tests and hardcode crashkernel=256M. Martin said we have done this before; I'll dig out the history.
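For what it's worth, the glue Cockpit would need is small. Here is a sketch that pulls the recommended value out of `kdumpctl estimate` output -- assuming the textual format shown above, which the man page does not promise to keep stable:

```python
import re

def recommended_crashkernel(estimate_output):
    """Extract the 'Recommended crashkernel' value (e.g. '175M') from
    `kdumpctl estimate` output. Returns None if the line is not found.
    This scrapes human-readable text and is NOT a stable interface."""
    m = re.search(r"^Recommended crashkernel:\s*(\S+)", estimate_output, re.M)
    return m.group(1) if m else None

sample = """\
Reserved crashkernel: 192M
Recommended crashkernel: 175M

Kernel image size: 61M
"""
print(recommended_crashkernel(sample))  # -> 175M
```

Cockpit could then feed that value to something like `grubby --update-kernel=ALL --args="crashkernel=175M"` -- but as this comment shows, the estimate itself can be too low, so none of this helps until the numbers are reliable.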
@Marius: The last bit of hardcoding disappeared in https://github.com/cockpit-project/cockpit/commit/b74151bc076a0b865d , as a reaction to changes in Fedora 39.

Hardcoding the crashkernel= value would fix our tests, but that's IMHO beside the point: it just papers over the bug. Cockpit and our tests should use what admins are supposed to use, and the documentation [1] says the primary tool is reset-crashkernel (of course with an appropriate amount of caveats).

[1] https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/9/html/managing_monitoring_and_updating_the_kernel/configuring-kdump-on-the-command-line_managing-monitoring-and-updating-the-kernel#configuring-kdump-memory-usage-on-rhel-9_configuring-kdump-on-the-command-line
Thanks for your attention to this issue.

The crashkernel value is based on physical memory for historical reasons: in previous implementations, the memory occupied by makedumpfile would change with the size of physical memory. That problem has been solved in the current version. However, the new problem is that peripheral devices, drivers, firmware, etc. take more and more memory in the second kernel.

`kdumpctl estimate` estimates the memory required by the second kernel based on the size of the kernel and initrd, but the results obtained so far are not accurate enough. We are looking for a more accurate and friendly way to get a reasonable crashkernel value. Actual environments are diverse, so we cannot set a universal value; it must be calculated based on the actual runtime.

In fact, it is common to find that kdump does not work when a panic occurs, so we recommend that customers test kdump before relying on it, to confirm that the current configuration is effective. We are trying to simplify this work, for example by adding [kdumpctl test](https://github.com/rhkdump/kdump-utils/pull/20). But that is still a manual way to solve the problem, and we will continue to explore how to complete these tasks automatically during installation or first boot -- obviously, virtual machines created from the same base image are likely to have the same kdump configuration, but their runtimes are different.
(In reply to Marius Vollmer from comment #10)
> Here is the output of "kdumpctl estimate" for the config that triggered this
> bug report:
>
> # kdumpctl estimate
> Reserved crashkernel: 192M
> Recommended crashkernel: 175M
>
> Kernel image size: 61M
> Kernel modules size: 8M
> Initramfs size: 42M
> Runtime reservation: 64M
> Large modules:
>     nfsv4: 1302528
>     kvm: 1449984
>
> So the estimate is too low. [...]

It surprises me that the Recommended value is smaller than 192M. Can you attach the output of "bash -x kdumpctl estimate"? Thanks!
(In reply to Lichen Liu from comment #12)
> In fact, it is common to find that kdump does not work when a panic occurs,

That's not good. I would hope that "reliability" is one of the requirements for debugging tools. Kdump is surprisingly complex and fragile.

Anyway, can we try to figure out why the estimate is too low in this particular case? This is the memory for the second kernel:

Memory: 82036K/196340K available (20480K kernel code, 4316K rwdata, 15384K rodata, 4748K init, 5220K bss, 110308K reserved, 0K cma-reserved)

The Internet tells me that the initramfs tmpfs is allowed to use 50% of the memory, so the limit might be 50% of the 82036K available, which is 41018K. The kdump initramfs image is 43M and thus larger than that. Is that already the explanation? We know that a 39M kdump initramfs image works.
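Spelling out that back-of-the-envelope arithmetic (treating the limit as exactly half of the "available" figure from dmesg is my assumption; the real tmpfs cap is based on total RAM, so this is only a rough check):

```python
available_kib = 82036     # from "Memory: 82036K/196340K available ..."
initramfs_mib = 43        # size of the kdump initramfs image

rootfs_limit_kib = available_kib // 2   # ~50% cap for the unpacked initramfs
initramfs_kib = initramfs_mib * 1024

print(rootfs_limit_kib)                  # 41018
print(initramfs_kib)                     # 44032
print(initramfs_kib > rootfs_limit_kib)  # True -> "Initramfs unpacking failed: write error"
```

If anything, this understates the problem: the 43M is the *compressed* image, and the unpacked contents are larger still.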
Created attachment 2044528 [details]
kdumpctl estimate trace
> Cockpit/our tests should uses what admins are supposed to use, and the documentation [1] says the primary thing is reset-crashkernel (of course with an appropriate amount of caveats)

Yeah, and it's good to test whether the defaults actually work. I just hope that the defaults are not permanently broken, so that we get to test the actual dumping process as well. (Which is currently also broken, see bug 2306035.)
Just for kicks, here is the memory of the second kernel on rhel-9-5, where the test passes just fine:

Memory: 92392K/197236K available (16384K kernel code, 5676K rwdata, 12892K rodata, 3972K init, 5688K bss, 104588K reserved, 0K cma-reserved)

So 50% of available memory is enough there for the 43M initramfs. The estimate is also low, however:

# kdumpctl estimate
Reserved crashkernel: 192M
Recommended crashkernel: 175M

Kernel image size: 54M
Kernel modules size: 13M
Initramfs size: 42M
Runtime reservation: 64M
Large modules:
    xfs: 2584576
    nfsv4: 1257472
    kvm: 1400832
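Comparing the two machines with the same 50%-of-available rule of thumb makes the difference visible (the machine labels are just mine; same caveats as before about the 50% figure being approximate):

```python
# Available memory reported by the second kernel on each image,
# checked against a 43M kdump initramfs and the ~50% rule of thumb.
initramfs_kib = 43 * 1024  # 44032K

for name, available_kib in [("fedora-41", 82036), ("rhel-9-5", 92392)]:
    limit_kib = available_kib // 2
    verdict = "ok" if initramfs_kib <= limit_kib else "fails"
    print(f"{name}: limit {limit_kib}K, initramfs {initramfs_kib}K -> {verdict}")
```

So rhel-9-5 clears the threshold by a few MiB while fedora-41 falls just under it, which would explain why the same initramfs size passes on one and panics on the other.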
This also affects RHEL 9.6: https://cockpit-logs.us-east-1.linodeobjects.com/pull-7035-03cd81e7-20241028-080556-rhel-9-6-expensive-cockpit-project-cockpit/log.html#5 Does it make sense to create a jira issue for that, or is this bugzilla enough on the Fedora side?
Hi Martin,

The kdump initramfs size will be reduced after this patch is merged: https://github.com/rhkdump/kdump-utils/commit/27df3385b059693e983190c8ad80bd1f7e12db6c. It depends on dracut --add-confdir; this feature is available in RHEL 9.6 and RHEL 10.0. For Fedora, I think it will become available after the dracut-105 rebase in the future.

Thanks,
Lichen
Thanks Lichen! That commit certainly helps to shrink the initramfs, but that's not really what this bug report is about. Rather, it is about `kdumpctl estimate` and the default reserved space not being computed properly.
Hi Martin,

Yes, you are right, and we plan to improve `kdumpctl estimate` in RHEL-10.1. Any suggestions and discussions are welcome. Thanks!