The 2.8.1 rc1 → rc2 update in https://bodhi.fedoraproject.org/updates/FEDORA-2024-e27e34f8dc introduced a regression: (re)starting NFS crashes rpc.statd.

Reproducible: Always

Steps to Reproduce:
mkdir /home/foo /home/bar /mnt/test
printf '/home/foo 127.0.0.0/24(rw)\n/home/bar 127.0.0.0/24(rw)\n' > /etc/exports
systemctl restart nfs-server

Actual Results:
Process 1272 (rpc.statd) of user 0 dumped core.

Module libpcre2-8.so.0 from rpm pcre2-10.44-1.fc40.x86_64
Module libz.so.1 from rpm zlib-ng-2.1.7-2.fc40.x86_64
Module libselinux.so.1 from rpm libselinux-3.7-5.fc40.x86_64
Module libcrypto.so.3 from rpm openssl-3.2.2-3.fc40.x86_64
Module libkeyutils.so.1 from rpm keyutils-1.6.3-3.fc40.x86_64
Module libkrb5support.so.0 from rpm krb5-1.21.3-2.fc40.x86_64
Module libcom_err.so.2 from rpm e2fsprogs-1.47.0-5.fc40.x86_64
Module libk5crypto.so.3 from rpm krb5-1.21.3-2.fc40.x86_64
Module libkrb5.so.3 from rpm krb5-1.21.3-2.fc40.x86_64
Module libgssapi_krb5.so.2 from rpm krb5-1.21.3-2.fc40.x86_64
Module libtirpc.so.3 from rpm libtirpc-1.3.6-0.fc40.x86_64
Module libcap.so.2 from rpm libcap-2.69-8.fc40.x86_64
Module rpc.statd from rpm nfs-utils-2.8.1-1.rc2.fc40.x86_64
Stack trace of thread 1272:
#0  0x00007f0a3f23d664 __pthread_kill_implementation (libc.so.6 + 0x99664)
#1  0x00007f0a3f1e4c4e raise (libc.so.6 + 0x40c4e)
#2  0x00007f0a3f1cc902 abort (libc.so.6 + 0x28902)
#3  0x00007f0a3f1cd767 __libc_message_impl.cold (libc.so.6 + 0x29767)
#4  0x00007f0a3f2cb969 __fortify_fail (libc.so.6 + 0x127969)
#5  0x00007f0a3f2cb304 __chk_fail (libc.so.6 + 0x127304)
#6  0x00007f0a3f2ccaf5 __snprintf_chk (libc.so.6 + 0x128af5)
#7  0x0000564ff5a417b6 nsm_atomic_write (rpc.statd + 0x87b6)
#8  0x0000564ff5a3f230 main (rpc.statd + 0x6230)
#9  0x00007f0a3f1ce088 __libc_start_call_main (libc.so.6 + 0x2a088)
#10 0x00007f0a3f1ce14b __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2a14b)
#11 0x0000564ff5a3faf5 _start (rpc.statd + 0x6af5)
ELF object binary architecture: AMD x86-64

Failed to start rpc-statd.service - NFS status monitor for NFSv2/3 locking..

Expected Results:
Start works without a crash.

Found in https://github.com/cockpit-project/cockpit/issues/21312
This also affects nfs-utils-1:2.8.1-1.rc2.fc41.x86_64 in Fedora 41, but https://bodhi.fedoraproject.org/updates/FEDORA-2024-39db8155bf went into updates too fast.
This is not just a cosmetic issue (even though such a crash by itself is bad enough). It breaks libvirt storage: https://cockpit-logs.us-east-1.linodeobjects.com/pull-0-3b01b184-20241125-022308-fedora-40-updates-testing/log.html#26-2
Very interesting. I'm not seeing the NFS server crashes on F41, even after restart. But, maybe I have a different config.
I cannot reproduce this regression in rawhide, f41, or f40. Can you please explain the tests you are running?
cat /etc/exports
#/home *.home.dicksonnet.net(rw,s2sc)
#/home *.home.dicksonnet.net(rw)
/home *(rw,sec=sys:krb5:krb5i:krb5p)
/tmp *(rw,fsid=666,all_squash)
/home/foo 127.0.0.0/24(rw)
/home/bar 127.0.0.0/24(rw)

f40# systemctl restart nfs-server
f40# systemctl status nfs-server
* nfs-server.service - NFS server and services
     Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; preset: disabled)
    Drop-In: /usr/lib/systemd/system/service.d
             `-10-timeout-abort.conf
             /run/systemd/generator/nfs-server.service.d
             `-order-with-mounts.conf
     Active: active (exited) since Mon 2024-11-25 05:33:32 EST; 20s ago
       Docs: man:rpc.nfsd(8)
             man:exportfs(8)
    Process: 48961 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
    Process: 48962 ExecStart=/bin/sh -c /usr/sbin/nfsdctl autostart || /usr/sbin/rpc.nfsd (co>
    Process: 48983 ExecStart=/bin/sh -c if systemctl -q is-active gssproxy; then systemctl re>
   Main PID: 48983 (code=exited, status=0/SUCCESS)
        CPU: 20ms

Nov 25 05:33:32 f40.home.dicksonnet.net systemd[1]: Starting nfs-server.service - NFS server >
Nov 25 05:33:32 f40.home.dicksonnet.net systemd[1]: Finished nfs-server.service - NFS server >
f40#

What do I need to do to reproduce this problem?
Reproducer out of thin air from a standard cloud image:

curl -L -O https://download.fedoraproject.org/pub/fedora/linux/development/rawhide/Cloud/x86_64/images/Fedora-Cloud-Base-Generic-41-20241025.n.0.x86_64.qcow2
# nothing fancy, just admin:foobar and root:foobar
curl -L -O https://github.com/cockpit-project/bots/raw/main/machine/cloud-init.iso
qemu-system-x86_64 -cpu host -enable-kvm -nographic -m 2048 -drive file=Fedora-Cloud-Base-Generic-41-20241025.n.0.x86_64.qcow2,if=virtio -snapshot -cdrom cloud-init.iso -net nic,model=virtio -net user,hostfwd=tcp::2201-:22

Log into VT as root:foobar or admin:foobar, or for a more comfortable shell "ssh -p 2201 admin@localhost" and `sudo -i`.

dnf install -y nfs-utils
mkdir /home/foo /home/bar /mnt/test
printf '/home/foo 127.0.0.0/24(rw)\n/home/bar 127.0.0.0/24(rw)\n' > /etc/exports
systemctl restart nfs-server

Then "systemctl status rpc.statd" shows the failed service, and "journalctl -b" shows the backtrace. nfs-server.service is indeed ok, but that's just an empty meta-unit.
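Because nfs-server.service can report success while rpc.statd has crashed, it can help to look for the crash signature directly in the journal. The following is only a rough sketch: `has_statd_fortify_crash` is a hypothetical helper name, and the two strings it searches for are taken from the backtrace in the original report.

```shell
#!/bin/sh
# Hypothetical helper (not part of nfs-utils): reads journal text on stdin
# and succeeds only if both frames from the reported backtrace are present.
has_statd_fortify_crash() {
    log=$(cat)
    # -F treats the patterns as fixed strings, so "(" and "." are literal.
    printf '%s\n' "$log" | grep -qF 'nsm_atomic_write (rpc.statd' &&
        printf '%s\n' "$log" | grep -qF '__snprintf_chk'
}

# Usage on a machine with systemd (assumes the unit name from the report):
#   systemctl is-failed rpc-statd.service
#   journalctl -b --no-pager | has_statd_fortify_crash && echo "crash signature found"
```

The helper only inspects text, so it can also be pointed at a saved journal excerpt from a CI run.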
(In reply to Martin Pitt from comment #6)
> Reproducer out of thin air from a standard cloud image:
>
> curl -L -O
> https://download.fedoraproject.org/pub/fedora/linux/development/rawhide/
> Cloud/x86_64/images/Fedora-Cloud-Base-Generic-41-20241025.n.0.x86_64.qcow2
> # nothing fancy, just admin:foobar and root:foobar
> curl -L -O
> https://github.com/cockpit-project/bots/raw/main/machine/cloud-init.iso
> qemu-system-x86_64 -cpu host -enable-kvm -nographic -m 2048 -drive
> file=Fedora-Cloud-Base-Generic-41-20241025.n.0.x86_64.qcow2,if=virtio
> -snapshot -cdrom cloud-init.iso -net nic,model=virtio -net
> user,hostfwd=tcp::2201-:22

This qemu command hangs... Can you give me access to the cloud where you are seeing this problem? This is the only environment that is seeing it.

>
> Log into VT as root:foobar or admin:foobar, or for a more comfortable shell
> "ssh -p 2201 admin@localhost" and `sudo -i`.

This ssh also hangs....
This is what I'm seeing https://paste.centos.org/view/2cd7e24f
> This qemu command hangs...

It's doing PXE boot because the disk failed. Did the Fedora-Cloud-Base-Generic-41-20241025.n.0.x86_64.qcow2 download actually work, i.e. does the file have a reasonable size? Right now that file is gone; today's image is https://ftp-stud.hs-esslingen.de/pub/fedora/linux/development/rawhide/Cloud/x86_64/images/Fedora-Cloud-Base-Generic-Rawhide-20241126.n.0.x86_64.qcow2 . Just grab the current one from https://ftp-stud.hs-esslingen.de/pub/fedora/linux/development/rawhide/Cloud/x86_64/images/ . Or use whichever testing image you have in your CI?

> Can you give me access to the cloud you are seeing this problem with.

It fails in Testing Farm, on my laptop (where I ran the reproducer), and in cockpit's CI on PSI OpenStack. This *really* isn't hardware specific.

When you tried this in comment #5, can you (1) double-check that you have nfs-utils 2.8.1rc2 installed (*not* rc1), and (2) confirm that you checked the journal and "systemctl status rpc.statd"?
Sorry, I posted the geolocation redirection. Current image from https://download.fedoraproject.org/pub/fedora/linux/development/rawhide/Cloud/x86_64/images/
Or https://download.fedoraproject.org/pub/fedora/linux/development/41/Cloud/x86_64/images/ if you prefer to investigate on 41 instead of rawhide.
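For the version double-check requested earlier in the thread, a small sketch like the one below can classify the installed package. `classify_nfs_utils` is a hypothetical helper name; the version-release strings come from this report (2.8.1-1.rc2 crashes, 2.8.1-2.rc2 is the later fixed build), and anything else, including rc1, is reported as unknown rather than guessed.

```shell
#!/bin/sh
# Hypothetical helper: classify an nfs-utils package string such as the
# output of `rpm -q nfs-utils`. Both the plain form and the form with an
# explicit epoch ("nfs-utils-1:...") are accepted.
classify_nfs_utils() {
    case "$1" in
        nfs-utils-2.8.1-1.rc2.*|nfs-utils-1:2.8.1-1.rc2.*) echo affected ;;
        nfs-utils-2.8.1-2.rc2.*|nfs-utils-1:2.8.1-2.rc2.*) echo fixed ;;
        *) echo unknown ;;
    esac
}

# Usage on a real system:
#   classify_nfs_utils "$(rpm -q nfs-utils)"
```

A result of "unknown" only means the string did not match either build named in this bug; it says nothing about whether that build crashes.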
commit 8fcddae4437510137baf108f477d116ce345ce80 (HEAD -> master)
Author: Benjamin Coddington <bcodding>
Date:   Wed Nov 27 06:32:46 2024 -0500

    libnsm: fix the safer atomic filenames fix
A local build with the patch included fixed the problem here. An update would be nice :-)
commit ce17ca7f4093d1c760651a7fed92e3e741cb11aa (HEAD -> master)
Author: Benjamin Coddington <bcodding>
Date:   Wed Nov 27 07:01:06 2024 -0500

    libnsm(v2): fix the safer atomic filenames fix
f40-candidate build: https://koji.fedoraproject.org/koji/taskinfo?taskID=126395639
f41-candidate build: https://koji.fedoraproject.org/koji/taskinfo?taskID=126395734
FEDORA-2024-93dd1e473f (nfs-utils-2.8.1-2.rc2.fc40) has been submitted as an update to Fedora 40. https://bodhi.fedoraproject.org/updates/FEDORA-2024-93dd1e473f
FEDORA-2024-e47c860a1a (nfs-utils-2.8.1-2.rc2.fc41) has been submitted as an update to Fedora 41. https://bodhi.fedoraproject.org/updates/FEDORA-2024-e47c860a1a
FEDORA-2024-93dd1e473f has been pushed to the Fedora 40 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-93dd1e473f`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-93dd1e473f

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2024-e47c860a1a has been pushed to the Fedora 41 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-e47c860a1a`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-e47c860a1a

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2024-e47c860a1a (nfs-utils-2.8.1-2.rc2.fc41) has been pushed to the Fedora 41 stable repository. If problem still persists, please make note of it in this bug report.
FEDORA-2024-93dd1e473f (nfs-utils-2.8.1-2.rc2.fc40) has been pushed to the Fedora 40 stable repository. If problem still persists, please make note of it in this bug report.
Steve, can you please upload this to rawhide as well? Thanks!
(In reply to Martin Pitt from comment #22)
> Steve, can you please upload this to rawhide as well? Thanks!

It is; see nfs-utils-2.8.1
(In reply to Steve Dickson from comment #23)
> (In reply to Martin Pitt from comment #22)
> > Steve, can you please upload this to rawhide as well? Thanks!
>
> it is see nfs-utils-2.8.1

Actually, it is nfs-utils-2.8.2