Bug 2099256 - kdump dump to nfs fails
Summary: kdump dump to nfs fails
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: dracut
Version: 36
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: ---
Assignee: Pavel Valena
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-06-20 12:08 UTC by Matej Marušák
Modified: 2022-07-01 01:07 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2022-07-01 01:07:33 UTC
Type: Bug
Embargoed:
ruyang: needinfo-
ltao: needinfo-


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 2100668 0 unspecified CLOSED "Apply all sysctl settings when NFS-related modules are loaded" causes error messages in early boot of images, prevents ... 2022-07-31 15:28:17 UTC
Red Hat Issue Tracker FC-485 0 None None None 2022-06-22 01:21:32 UTC

Description Matej Marušák 2022-06-20 12:08:42 UTC
Description of problem:

In Cockpit CI we noticed that the newest refresh of Fedora-36 image fails with nfs kdump.

Version-Release number of selected component (if applicable):
Likely this update: nfs-utils (1:2.6.1-2.rc5.fc36 -> 1:2.6.1-2.rc6.fc36)
For all updated packages see at the end of the report.

How reproducible:
1. Boot the fedora-36 image twice; I will refer to one instance as X1 and to the other as X2. X1 is the machine on which the kernel will crash, X2 is the NFS storage. X1 is on 10.111.113.1/24, X2 on 10.111.113.2/24.

2. on X2: `echo -ne "/srv/kdump 10.111.113.0/24(rw,no_root_squash)\n" > /etc/exports`
3. on X2: `mkdir -p /srv/kdump/var/crash; firewall-cmd --add-service nfs; systemctl restart nfs-server`

4. on X1: `systemctl disable kdump`
5. on X1: `grubby --args=crashkernel=256M --update-kernel=ALL`
6. <reboot>
7. on X1: `echo -ne "auto_reset_crashkernel yes\ncore_collector makedumpfile -l --message-level 7 -d 31\nnfs 10.111.113.2:/srv/kdump" > /etc/kdump.conf`
8. on X1: `systemctl enable --now kdump`
9. on X1: `echo 1 > /proc/sys/kernel/sysrq`
10. on X1: `echo c > /proc/sysrq-trigger`
11. <boot X1 again>
12. on X2: `file /srv/kdump/var/crash/10.111.113.1*/vmcore` should show some content

Actual results:
Nothing in `/srv/kdump/var/crash/`

Expected results:
Crash dump in `/srv/kdump/var/crash/`
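
The expected/actual check above can be scripted on X2. The following is only a minimal sketch: the helper names `list_vmcores` and `has_vmcore` are made up for illustration, and the only thing taken from this report is that the dump file is called `vmcore` and lands somewhere below `/srv/kdump/var/crash/`.

```shell
#!/bin/sh
# list_vmcores CRASH_DIR
# Print every regular file named "vmcore" found below CRASH_DIR.
# (Hypothetical helper, not part of kexec-tools.)
list_vmcores() {
    find "$1" -type f -name vmcore 2>/dev/null
}

# has_vmcore CRASH_DIR
# Succeed only if at least one vmcore file exists below CRASH_DIR.
has_vmcore() {
    [ -n "$(list_vmcores "$1")" ]
}
```

On X2 this could be used as `has_vmcore /srv/kdump/var/crash && echo "dump captured" || echo "no dump found"`. With the failure described here it would be expected to report that no dump was found.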


Additional info:
All changed packages:
  binutils (2.37-30.fc36 -> 2.37-31.fc36)
  binutils-gold (2.37-30.fc36 -> 2.37-31.fc36)
  cockpit (270-1.fc36 -> 271-1.fc36)
  cockpit-bridge (270-1.fc36 -> 271-1.fc36)
  cockpit-system (270-1.fc36 -> 271-1.fc36)
  cockpit-ws (270-1.fc36 -> 271-1.fc36)
  edk2-ovmf (20220526git16779ede2d36-1.fc36 -> 20220526git16779ede2d36-3.fc36)
  kernel-core (5.17.13-300.fc36 -> 5.17.14-300.fc36)
  libipa_hbac (2.7.1-1.fc36 -> 2.7.1-2.fc36)
  libnfsidmap (1:2.6.1-2.rc5.fc36 -> 1:2.6.1-2.rc6.fc36)
  libsss_autofs (2.7.1-1.fc36 -> 2.7.1-2.fc36)
  libsss_certmap (2.7.1-1.fc36 -> 2.7.1-2.fc36)
  libsss_idmap (2.7.1-1.fc36 -> 2.7.1-2.fc36)
  libsss_nss_idmap (2.7.1-1.fc36 -> 2.7.1-2.fc36)
  libsss_sudo (2.7.1-1.fc36 -> 2.7.1-2.fc36)
  nfs-utils (1:2.6.1-2.rc5.fc36 -> 1:2.6.1-2.rc6.fc36)
  ntfs-3g (2:2021.8.22-5.fc36 -> 2:2022.5.17-1.fc36)
  ntfs-3g-libs (2:2021.8.22-5.fc36 -> 2:2022.5.17-1.fc36)
  ntfs-3g-system-compression (1.0-8.fc36 -> 1.0-9.fc36)
  ntfsprogs (2:2021.8.22-5.fc36 -> 2:2022.5.17-1.fc36)
  python-srpm-macros (3.10-17.fc36 -> 3.10-18.fc36)
  python-unversioned-command (3.10.4-1.fc36 -> 3.10.5-2.fc36)
  python3 (3.10.4-1.fc36 -> 3.10.5-2.fc36)
  python3-libipa_hbac (2.7.1-1.fc36 -> 2.7.1-2.fc36)
  python3-libs (3.10.4-1.fc36 -> 3.10.5-2.fc36)
  python3-sss (2.7.1-1.fc36 -> 2.7.1-2.fc36)
  python3-sss-murmur (2.7.1-1.fc36 -> 2.7.1-2.fc36)
  python3-sssdconfig (2.7.1-1.fc36 -> 2.7.1-2.fc36)
  qemu-block-curl (2:6.2.0-10.fc36 -> 2:6.2.0-12.fc36)
  qemu-char-spice (2:6.2.0-10.fc36 -> 2:6.2.0-12.fc36)
  qemu-common (2:6.2.0-10.fc36 -> 2:6.2.0-12.fc36)
  qemu-device-usb-host (2:6.2.0-10.fc36 -> 2:6.2.0-12.fc36)
  qemu-device-usb-redirect (2:6.2.0-10.fc36 -> 2:6.2.0-12.fc36)
  qemu-guest-agent (2:6.2.0-10.fc36 -> 2:6.2.0-12.fc36)
  qemu-img (2:6.2.0-10.fc36 -> 2:6.2.0-12.fc36)
  qemu-kvm-core (2:6.2.0-10.fc36 -> 2:6.2.0-12.fc36)
  qemu-system-x86-core (2:6.2.0-10.fc36 -> 2:6.2.0-12.fc36)
  qemu-ui-opengl (2:6.2.0-10.fc36 -> 2:6.2.0-12.fc36)
  qemu-ui-spice-core (2:6.2.0-10.fc36 -> 2:6.2.0-12.fc36)
  sssd (2.7.1-1.fc36 -> 2.7.1-2.fc36)
  sssd-ad (2.7.1-1.fc36 -> 2.7.1-2.fc36)
  sssd-client (2.7.1-1.fc36 -> 2.7.1-2.fc36)
  sssd-common (2.7.1-1.fc36 -> 2.7.1-2.fc36)
  sssd-common-pac (2.7.1-1.fc36 -> 2.7.1-2.fc36)
  sssd-dbus (2.7.1-1.fc36 -> 2.7.1-2.fc36)
  sssd-ipa (2.7.1-1.fc36 -> 2.7.1-2.fc36)
  sssd-kcm (2.7.1-1.fc36 -> 2.7.1-2.fc36)
  sssd-krb5 (2.7.1-1.fc36 -> 2.7.1-2.fc36)
  sssd-krb5-common (2.7.1-1.fc36 -> 2.7.1-2.fc36)
  sssd-ldap (2.7.1-1.fc36 -> 2.7.1-2.fc36)
  sssd-nfs-idmap (2.7.1-1.fc36 -> 2.7.1-2.fc36)
  sssd-proxy (2.7.1-1.fc36 -> 2.7.1-2.fc36)
  sssd-tools (2.7.1-1.fc36 -> 2.7.1-2.fc36)
  xen-libs (4.16.1-1.fc36 -> 4.16.1-2.fc36)
  xen-licenses (4.16.1-1.fc36 -> 4.16.1-2.fc36)

Comment 1 Matej Marušák 2022-06-20 13:07:30 UTC
Forgot to mention, while booting X1 we see:

[    2.203647] systemd[1]: Reached target remote-fs-pre.target - Preparation for Remote File Systems.
[    2.204734] systemd[1]: Mounting kdumproot.mount - /kdumproot...
[    2.205713] systemd[1]: dracut-pre-mount.service - dracut pre-mount hook was skipped because all trigger condition checks failed.
[    2.207193] systemd[1]: Reached target initrd-root-fs.target - Initrd Root File System.
[  OK  ] Reached target initrd-root…get - Initrd Root File System.
[    2.210848] systemd[1]: Starting initrd-parse-etc.service - Reload Configuration from the Real Root...
         Starting initrd-parse-etc.…onfiguration from the Real Root...
[    2.217995] systemd[1]: Reloading.
[    2.323573] FS-Cache: Loaded
[    2.281620] mount[417]: mount.nfs: No such device
[    2.288337] systemd[1]: /usr/lib/systemd/system/kdump-capture.service:23: Standard output type syslog is obsolete, automatically updating to journal. Please update your unit file, and consider removing the setting altogether.
[    2.290485] systemd[1]: /usr/lib/systemd/system/kdump-capture.service:24: Standard output type syslog+console is obsolete, automatically updating to journal+console. Please update your unit file, and consider removing the setting altogether.
[    2.330516] systemd[1]: kdumproot.mount: Mount process exited, code=exited, status=32/n/a
[    2.331437] systemd[1]: kdumproot.mount: Failed with result 'exit-code'.
[    2.334214] systemd[1]: Failed to mount kdumproot.mount - /kdumproot.
[FAILED] Failed to mount kdumproot.mount - /kdumproot.
See 'systemctl status kdumproot.mount' for details.

Comment 2 Dave Young 2022-06-22 01:13:40 UTC
Hi Matej, the comment #1 looks like the kdump kernel bootup log.  Could you attach the whole kernel log?  It could be some networking issue, either in the scripts or some device driver issue, anyway kernel log will be helpful.

Moved to kexec-tools component, we can move back to nfs if it is a nfs bug later.

Comment 3 ltao 2022-06-22 09:04:17 UTC
nfs-utils added a new file "/usr/lib/modprobe.d/50-nfs.conf" in the rc6 patch [1], which contains lines such as:

install sunrpc /sbin/modprobe --ignore-install sunrpc $CMDLINE_OPTS && /sbin/sysctl -q --pattern sunrpc --system

However, /sbin/sysctl does not exist in the kdump initramfs image, so the install command fails during dracut-pre-udev:

[    8.834385] dracut-pre-udev[366]: sh: line 1: /sbin/sysctl: No such file or directory
[    8.844594] dracut-pre-udev[365]: modprobe: ERROR: libkmod/libkmod-module.c:990 command_do() Error running install command '/sbin/modprobe --ignore-install sunrpc  && /sbin/sysctl -q --pattern sunrpc --system' for module sunrpc: retcode 127
[    8.867523] dracut-pre-udev[365]: modprobe: ERROR: could not insert 'sunrpc': Invalid argument
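
The "retcode 127" in the log is the conventional shell exit status for a command that cannot be found. As a hedged illustration of why the whole install rule fails even when its first half succeeds, here is a minimal sketch that only mimics the shape of the rule: `true` stands in for the modprobe half, and `/nonexistent/sbin/sysctl` is a deliberately missing placeholder path, not a path from this report.

```shell
#!/bin/sh
# run_install_rule SYSCTL_PATH
# Mimics the shape of the install rule: "<load module> && <run sysctl>".
# "true" stands in for the modprobe half; SYSCTL_PATH is whatever binary the
# second half would call. Both are placeholders for illustration only.
run_install_rule() {
    sh -c 'true && "$0" -q --pattern sunrpc --system' "$1" 2>/dev/null
}

status=0
run_install_rule /nonexistent/sbin/sysctl || status=$?
echo "exit status with missing binary: $status"
```

Because the two halves are joined with `&&`, the status of the second half becomes the status of the whole command, so the sketch reports 127 here, the same value modprobe logged above.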

Thus the nfs modules are not loaded before the kdump-capture service starts. Here is the output of lsmod before kdump runs; we can see that no nfs modules are present:

[   27.506599] kdump.sh[599]: Module                  Size  Used by
[   27.514522] kdump.sh[599]: lockd                 122880  0
[   27.521524] kdump.sh[599]: grace                  16384  1 lockd
[   27.529523] kdump.sh[599]: fscache               372736  0
[   27.536530] kdump.sh[599]: netfs                  57344  1 fscache
[   27.544533] kdump.sh[599]: crct10dif_pclmul       16384  1
[   27.551530] kdump.sh[599]: crc32_pclmul           16384  0
[   27.558531] kdump.sh[599]: crc32c_intel           24576  0
[   27.565536] kdump.sh[599]: ghash_clmulni_intel    16384  0
[   27.572534] kdump.sh[599]: ice                   851968  0
[   27.579550] kdump.sh[599]: tg3                   192512  0
[   27.586546] kdump.sh[599]: mgag200                40960  0
[   27.593544] kdump.sh[599]: sunrpc                651264  2 lockd
[   27.601550] kdump.sh[599]: ipmi_devintf           20480  0
[   27.608545] kdump.sh[599]: ipmi_msghandler       122880  1 ipmi_devintf
[   27.616546] kdump.sh[599]: overlay               151552  1
[   27.623529] kdump.sh[599]: squashfs               69632  1
[   27.630515] kdump.sh[599]: loop                   32768  2

As a result, when mount.nfs runs without the nfs modules loaded, it reports the error: mount.nfs: No such device
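
One way to confirm this state from a shell is to look at the kernel's list of registered filesystem types, since "No such device" from mount generally means the requested type is not registered. The following is a minimal sketch; `fs_registered` is a hypothetical helper name, and it assumes the usual `/proc/filesystems` layout in which the filesystem name is the last field on each line.

```shell
#!/bin/sh
# fs_registered FSTYPE [FILE]
# Succeed if FSTYPE appears as the last field of some line in FILE
# (default: /proc/filesystems). Exact match only, so "nfs" does not
# match "nfs4". Hypothetical helper for illustration.
fs_registered() {
    awk -v fs="$1" '$NF == fs { found = 1 } END { exit !found }' \
        "${2:-/proc/filesystems}"
}
```

For example, `fs_registered nfs || echo "nfs is not registered"` would be expected to print the message in the failing kdump environment described above.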

A quick fix is to append the following line to kdump.conf, after which kdump works fine:

extra_bins /sbin/sysctl
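
A minimal sketch of applying this workaround idempotently follows. `ensure_line` is a hypothetical helper, not part of kexec-tools. It also guards against a kdump.conf with no trailing newline, which is what the `echo -ne` command in the reproduction steps produces.

```shell
#!/bin/sh
# ensure_line FILE LINE
# Append LINE to FILE unless an identical line is already present.
# Hypothetical helper for illustration.
ensure_line() {
    file=$1
    line=$2
    if grep -qxF -- "$line" "$file" 2>/dev/null; then
        return 0
    fi
    # If the file is non-empty and does not end in a newline, add one first
    # so the new line is not glued onto the last existing line.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
}
```

On X1 this could be called as `ensure_line /etc/kdump.conf 'extra_bins /sbin/sysctl'`, followed by `systemctl restart kdump`. That the restart rebuilds the kdump initramfs with the new setting is an assumption here, not something stated in this report.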

I will work out a better approach for the formal fix.

[1]: https://src.fedoraproject.org/rpms/nfs-utils/c/d6281e4f6ed7560f723a9fbba5ecae7f329078f9?branch=rawhide

Comment 4 Dave Young 2022-06-22 09:12:08 UTC
Hi Tao,

Thanks! good finding.   Sounds like sysctl is needed in dracut nfs module, so not only a kdump issue if people use nfs in initramfs they will have this bug, so probably the right component should be "dracut"?

Comment 5 Dave Young 2022-06-22 09:13:14 UTC
(In reply to Dave Young from comment #2)
> Hi Matej, the comment #1 looks like the kdump kernel bootup log.  Could you
> attach the whole kernel log?  It could be some networking issue, either in
> the scripts or some device driver issue, anyway kernel log will be helpful.
> 
> Moved to kexec-tools component, we can move back to nfs if it is a nfs bug
> later.

Matej, since Tao has identified the root cause, please ignore the above request.

Comment 6 ltao 2022-06-22 09:29:59 UTC
(In reply to Dave Young from comment #4)
> Hi Tao,
> 
> Thanks! good finding.   Sounds like sysctl is needed in dracut nfs module,
> so not only a kdump issue if people use nfs in initramfs they will have this
> bug, so probably the right component should be "dracut"?

Yes, I agree it's better to be fixed in dracut, should the bz be assigned to dracut team?

Thanks,
Tao Liu

Comment 7 ltao 2022-06-22 09:31:01 UTC
Sorry, un-needinfo from bhe

Comment 8 Dave Young 2022-06-23 01:35:39 UTC
(In reply to ltao from comment #6)
> (In reply to Dave Young from comment #4)
> > Hi Tao,
> > 
> > Thanks! good finding.   Sounds like sysctl is needed in dracut nfs module,
> > so not only a kdump issue if people use nfs in initramfs they will have this
> > bug, so probably the right component should be "dracut"?
> 
> Yes, I agree it's better to be fixed in dracut, should the bz be assigned to
> dracut team?
> 

Yes, just reassigned. thanks!

Comment 9 Pavel Valena 2022-06-28 16:32:11 UTC
Possibly related to: https://github.com/dracutdevs/dracut/issues/1857

Comment 10 Fedora Update System 2022-06-28 19:53:56 UTC
FEDORA-2022-38325154c4 has been submitted as an update to Fedora 36. https://bodhi.fedoraproject.org/updates/FEDORA-2022-38325154c4

Comment 12 Fedora Update System 2022-06-29 01:33:36 UTC
FEDORA-2022-38325154c4 has been pushed to the Fedora 36 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2022-38325154c4`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-38325154c4

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 13 Fedora Update System 2022-07-01 01:07:33 UTC
FEDORA-2022-38325154c4 has been pushed to the Fedora 36 stable repository.
If the problem still persists, please make a note of it in this bug report.

