Bug 2328627 - 2.8.1-1.rc2 regression: rpc.statd crashes with SIGABRT in nsm_atomic_write()
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: nfs-utils
Version: 40
Hardware: Unspecified
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Steve Dickson
QA Contact: Fedora Extras Quality Assurance
URL: https://cockpit-logs.us-east-1.linode...
Whiteboard: CockpitTest
Depends On:
Blocks:
Reported: 2024-11-25 08:08 UTC by Martin Pitt
Modified: 2024-12-24 13:50 UTC (History)

Fixed In Version: nfs-utils-2.8.1-2.rc2.fc41 nfs-utils-2.8.1-2.rc2.fc40
Clone Of:
Environment:
Last Closed: 2024-12-03 02:51:53 UTC
Type: ---
Embargoed:



Description Martin Pitt 2024-11-25 08:08:46 UTC
The 2.8.1 rc1 → rc2 update in https://bodhi.fedoraproject.org/updates/FEDORA-2024-e27e34f8dc introduced a regression: (re)starting NFS crashes rpc.statd.



Reproducible: Always

Steps to Reproduce:
mkdir /home/foo /home/bar /mnt/test
printf '/home/foo 127.0.0.0/24(rw)\n/home/bar 127.0.0.0/24(rw)\n' > /etc/exports
systemctl restart nfs-server
Actual Results:  
Process 1272 (rpc.statd) of user 0 dumped core.

Module libpcre2-8.so.0 from rpm pcre2-10.44-1.fc40.x86_64
Module libz.so.1 from rpm zlib-ng-2.1.7-2.fc40.x86_64
Module libselinux.so.1 from rpm libselinux-3.7-5.fc40.x86_64
Module libcrypto.so.3 from rpm openssl-3.2.2-3.fc40.x86_64
Module libkeyutils.so.1 from rpm keyutils-1.6.3-3.fc40.x86_64
Module libkrb5support.so.0 from rpm krb5-1.21.3-2.fc40.x86_64
Module libcom_err.so.2 from rpm e2fsprogs-1.47.0-5.fc40.x86_64
Module libk5crypto.so.3 from rpm krb5-1.21.3-2.fc40.x86_64
Module libkrb5.so.3 from rpm krb5-1.21.3-2.fc40.x86_64
Module libgssapi_krb5.so.2 from rpm krb5-1.21.3-2.fc40.x86_64
Module libtirpc.so.3 from rpm libtirpc-1.3.6-0.fc40.x86_64
Module libcap.so.2 from rpm libcap-2.69-8.fc40.x86_64
Module rpc.statd from rpm nfs-utils-2.8.1-1.rc2.fc40.x86_64
Stack trace of thread 1272:
#0  0x00007f0a3f23d664 __pthread_kill_implementation (libc.so.6 + 0x99664)
#1  0x00007f0a3f1e4c4e raise (libc.so.6 + 0x40c4e)
#2  0x00007f0a3f1cc902 abort (libc.so.6 + 0x28902)
#3  0x00007f0a3f1cd767 __libc_message_impl.cold (libc.so.6 + 0x29767)
#4  0x00007f0a3f2cb969 __fortify_fail (libc.so.6 + 0x127969)
#5  0x00007f0a3f2cb304 __chk_fail (libc.so.6 + 0x127304)
#6  0x00007f0a3f2ccaf5 __snprintf_chk (libc.so.6 + 0x128af5)
#7  0x0000564ff5a417b6 nsm_atomic_write (rpc.statd + 0x87b6)
#8  0x0000564ff5a3f230 main (rpc.statd + 0x6230)
#9  0x00007f0a3f1ce088 __libc_start_call_main (libc.so.6 + 0x2a088)
#10 0x00007f0a3f1ce14b __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2a14b)
#11 0x0000564ff5a3faf5 _start (rpc.statd + 0x6af5)
ELF object binary architecture: AMD x86-64

Failed to start rpc-statd.service - NFS status monitor for NFSv2/3 locking..

Expected Results:  
Start works without a crash.

Found in https://github.com/cockpit-project/cockpit/issues/21312
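
Reading the trace above: frames #0 through #6 are all inside libc.so.6 and form glibc's fortify failure path (__snprintf_chk calls __chk_fail, which aborts), so the first frame outside libc names the caller whose snprintf() length check tripped. A minimal shell sketch for pulling that frame out of a pasted trace follows. The function name first_non_libc_frame is made up for illustration, and the sketch assumes the systemd-coredump line layout shown above.

```shell
# Hypothetical helper: print "function module" for the first stack frame
# that is not inside libc.so.6.
# Input on stdin: systemd-coredump style frames, for example
#   #7  0x0000564ff5a417b6 nsm_atomic_write (rpc.statd + 0x87b6)
first_non_libc_frame() {
    awk '
        # Only look at frame lines ("#N ..."), skip frames that live in libc.
        /^[[:space:]]*#[0-9]+[[:space:]]/ && $0 !~ /\(libc\.so\.6 \+/ {
            mod = $4
            sub(/^\(/, "", mod)   # strip the "(" in front of the module name
            print $3, mod
            exit
        }
    '
}
```

Fed the trace from this report, it prints "nsm_atomic_write rpc.statd".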

Comment 1 Martin Pitt 2024-11-25 08:18:20 UTC
This also affects nfs-utils-1:2.8.1-1.rc2.fc41.x86_64 in Fedora 41, but https://bodhi.fedoraproject.org/updates/FEDORA-2024-39db8155bf went into updates too fast.

Comment 2 Martin Pitt 2024-11-25 08:24:07 UTC
This is not just a cosmetic issue (even though such a crash by itself is bad enough). It breaks libvirt storage: https://cockpit-logs.us-east-1.linodeobjects.com/pull-0-3b01b184-20241125-022308-fedora-40-updates-testing/log.html#26-2

Comment 3 Bojan Smojver 2024-11-25 08:38:06 UTC
Very interesting. I'm not seeing the NFS server crashes on F41, even after restart. But, maybe I have a different config.

Comment 4 Steve Dickson 2024-11-25 10:30:05 UTC
I cannot reproduce this regression in rawhide, f41, or f40. Can you please explain the tests you are running?

Comment 5 Steve Dickson 2024-11-25 10:35:37 UTC
cat /etc/exports
#/home *.home.dicksonnet.net(rw,s2sc)
#/home *.home.dicksonnet.net(rw)
/home *(rw,sec=sys:krb5:krb5i:krb5p)
/tmp *(rw,fsid=666,all_squash)
/home/foo 127.0.0.0/24(rw)
/home/bar 127.0.0.0/24(rw)

f40# systemctl restart nfs-server
f40# systemctl status nfs-server
* nfs-server.service - NFS server and services
     Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; preset: disabled)
    Drop-In: /usr/lib/systemd/system/service.d
             `-10-timeout-abort.conf
             /run/systemd/generator/nfs-server.service.d
             `-order-with-mounts.conf
     Active: active (exited) since Mon 2024-11-25 05:33:32 EST; 20s ago
       Docs: man:rpc.nfsd(8)
             man:exportfs(8)
    Process: 48961 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
    Process: 48962 ExecStart=/bin/sh -c /usr/sbin/nfsdctl autostart || /usr/sbin/rpc.nfsd (co>
    Process: 48983 ExecStart=/bin/sh -c if systemctl -q is-active gssproxy; then systemctl re>
   Main PID: 48983 (code=exited, status=0/SUCCESS)
        CPU: 20ms

Nov 25 05:33:32 f40.home.dicksonnet.net systemd[1]: Starting nfs-server.service - NFS server >
Nov 25 05:33:32 f40.home.dicksonnet.net systemd[1]: Finished nfs-server.service - NFS server >
f40# 


What do I need to do to reproduce this problem?

Comment 6 Martin Pitt 2024-11-25 11:14:10 UTC
Reproducer out of thin air from a standard cloud image:

curl -L -O https://download.fedoraproject.org/pub/fedora/linux/development/rawhide/Cloud/x86_64/images/Fedora-Cloud-Base-Generic-41-20241025.n.0.x86_64.qcow2
# nothing fancy, just admin:foobar and root:foobar
curl -L -O https://github.com/cockpit-project/bots/raw/main/machine/cloud-init.iso
qemu-system-x86_64 -cpu host -enable-kvm -nographic -m 2048 -drive file=Fedora-Cloud-Base-Generic-41-20241025.n.0.x86_64.qcow2,if=virtio -snapshot -cdrom cloud-init.iso -net nic,model=virtio -net user,hostfwd=tcp::2201-:22

Log into VT as root:foobar or admin:foobar, or for a more comfortable shell "ssh -p 2201 admin@localhost" and `sudo -i`.

dnf install -y nfs-utils

mkdir /home/foo /home/bar /mnt/test
printf '/home/foo 127.0.0.0/24(rw)\n/home/bar 127.0.0.0/24(rw)\n' > /etc/exports
systemctl restart nfs-server

Then "systemctl status rpc-statd" shows the failed service, and "journalctl -b" shows the backtrace. nfs-server.service is indeed OK, but that's just an empty meta-unit.
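
Since nfs-server.service stays green, the check that matters is the state of the statd unit (named rpc-statd.service in the "Failed to start" message in the description). A rough sketch of that check, assuming the usual `systemctl is-active` state words; the helper name statd_verdict is hypothetical:

```shell
# Hypothetical helper: map the output of `systemctl is-active UNIT`
# to a verdict about whether the crash was reproduced.
statd_verdict() {
    case "$1" in
        failed) echo "reproduced: rpc-statd failed to start" ;;
        active) echo "not reproduced: rpc-statd is running" ;;
        *)      echo "inconclusive: rpc-statd state is '$1'" ;;
    esac
}

# Usage on the test machine (left as comments so the sketch has no side effects):
#   statd_verdict "$(systemctl is-active rpc-statd.service)"
#   journalctl -b -u rpc-statd.service
#   coredumpctl info rpc.statd
```

Checking nfs-server.service alone, as in comment 5, cannot show the crash either way.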

Comment 7 Steve Dickson 2024-11-26 20:55:48 UTC
(In reply to Martin Pitt from comment #6)
> Reproducer out of thin air from a standard cloud image:
> 
> curl -L -O
> https://download.fedoraproject.org/pub/fedora/linux/development/rawhide/
> Cloud/x86_64/images/Fedora-Cloud-Base-Generic-41-20241025.n.0.x86_64.qcow2
> # nothing fancy, just admin:foobar and root:foobar
> curl -L -O
> https://github.com/cockpit-project/bots/raw/main/machine/cloud-init.iso
> qemu-system-x86_64 -cpu host -enable-kvm -nographic -m 2048 -drive
> file=Fedora-Cloud-Base-Generic-41-20241025.n.0.x86_64.qcow2,if=virtio
> -snapshot -cdrom cloud-init.iso -net nic,model=virtio -net
> user,hostfwd=tcp::2201-:22
This qemu command hangs... 

Can you give me access to the cloud where you are seeing this problem? This is the only environment that shows it.

> 
> Log into VT as root:foobar or admin:foobar, or for a more comfortable shell
> "ssh -p 2201 admin@localhost" and `sudo -i`.
This ssh also hangs....

Comment 8 Steve Dickson 2024-11-26 21:03:27 UTC
This is what I'm seeing 

https://paste.centos.org/view/2cd7e24f

Comment 9 Martin Pitt 2024-11-26 21:13:11 UTC
> This qemu command hangs... 

It's doing PXE boot because the disk failed. Did the Fedora-Cloud-Base-Generic-41-20241025.n.0.x86_64.qcow2 download actually work, i.e. does the file have a reasonable size? Because right now it's gone, today's image is https://ftp-stud.hs-esslingen.de/pub/fedora/linux/development/rawhide/Cloud/x86_64/images/Fedora-Cloud-Base-Generic-Rawhide-20241126.n.0.x86_64.qcow2 . Just grab the current one from https://ftp-stud.hs-esslingen.de/pub/fedora/linux/development/rawhide/Cloud/x86_64/images/ . Or use whichever testing image you have in your CI?
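
A quick way to answer the "reasonable size" question before booting is sketched below. It assumes only that a qcow2 file starts with the magic bytes "QFI" followed by 0xfb; the helper name looks_like_qcow2 and the 100 MiB default threshold are arbitrary choices for illustration, not anything Fedora documents.

```shell
# Hypothetical helper: succeed if FILE looks like a complete qcow2 image.
#   $1  path to the downloaded image
#   $2  minimum plausible size in bytes (assumed default: 100 MiB)
looks_like_qcow2() {
    [ -f "$1" ] || return 1
    # qcow2 images begin with "QFI" + 0xfb; an HTML error page will not.
    [ "$(head -c 3 "$1")" = "QFI" ] || return 1
    # Reject downloads that are far too small to be a real image.
    [ "$(wc -c < "$1")" -ge "${2:-104857600}" ]
}

# Usage (left as a comment so the sketch has no side effects):
#   looks_like_qcow2 Fedora-Cloud-Base-Generic-41-20241025.n.0.x86_64.qcow2 \
#       || echo "download is broken, fetch the image again"
```

If this fails, qemu has no bootable disk and falls through to PXE, which looks like a hang.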

> Can you give me access to the cloud you are seeing this problem with.

It fails in Testing Farm, my laptop (where I ran the reproducer), and cockpit's CI on PSI OpenStack. This *really* isn't hardware specific. When you tried this in comment #5, can you (1) double-check that you had nfs-utils 2.8.1rc2 installed (*not* rc1), and (2) confirm that you checked the journal and "systemctl status rpc-statd"?
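
For the version check, something along these lines would do. It is only a sketch: the helper name nfs_utils_build_status is made up, the "affected" pattern comes from the package named in the backtrace, and the "fixed" pattern comes from this bug's "Fixed In Version" field. Any other build is reported as unknown rather than guessed at.

```shell
# Hypothetical helper: classify an nfs-utils package string as printed by
# `rpm -q nfs-utils` (or with the "1:" epoch prefix seen in comment 1).
nfs_utils_build_status() {
    case "$1" in
        nfs-utils-2.8.1-1.rc2.*|nfs-utils-1:2.8.1-1.rc2.*)
            echo "affected" ;;   # the build that crashes in this report
        nfs-utils-2.8.1-2.rc2.*|nfs-utils-1:2.8.1-2.rc2.*)
            echo "fixed" ;;      # per "Fixed In Version"
        *)
            echo "unknown" ;;    # rc1, older, or newer builds: not classified here
    esac
}

# Usage (left as a comment so the sketch has no side effects):
#   nfs_utils_build_status "$(rpm -q nfs-utils)"
```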

Comment 10 Martin Pitt 2024-11-26 21:13:58 UTC
Sorry, I posted the geolocation redirection. Current image from https://download.fedoraproject.org/pub/fedora/linux/development/rawhide/Cloud/x86_64/images/

Comment 11 Martin Pitt 2024-11-27 09:08:11 UTC
Or https://download.fedoraproject.org/pub/fedora/linux/development/41/Cloud/x86_64/images/ if you prefer to investigate on 41 instead of rawhide.

Comment 12 Steve Dickson 2024-11-27 11:40:46 UTC
commit 8fcddae4437510137baf108f477d116ce345ce80 (HEAD -> master)
Author: Benjamin Coddington <bcodding>
Date:   Wed Nov 27 06:32:46 2024 -0500

    libnsm: fix the safer atomic filenames fix

Comment 13 Terje Rosten 2024-11-28 20:29:58 UTC
A local build with the patch above included fixed the problem here. An update would be nice :-)

Comment 14 Steve Dickson 2024-11-30 13:11:10 UTC
commit ce17ca7f4093d1c760651a7fed92e3e741cb11aa (HEAD -> master)
Author: Benjamin Coddington <bcodding>
Date:   Wed Nov 27 07:01:06 2024 -0500

    libnsm(v2): fix the safer atomic filenames fix

Comment 16 Fedora Update System 2024-11-30 13:42:52 UTC
FEDORA-2024-93dd1e473f (nfs-utils-2.8.1-2.rc2.fc40) has been submitted as an update to Fedora 40.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-93dd1e473f

Comment 17 Fedora Update System 2024-11-30 13:42:53 UTC
FEDORA-2024-e47c860a1a (nfs-utils-2.8.1-2.rc2.fc41) has been submitted as an update to Fedora 41.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-e47c860a1a

Comment 18 Fedora Update System 2024-12-01 04:28:22 UTC
FEDORA-2024-93dd1e473f has been pushed to the Fedora 40 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-93dd1e473f`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-93dd1e473f

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 19 Fedora Update System 2024-12-02 04:21:34 UTC
FEDORA-2024-e47c860a1a has been pushed to the Fedora 41 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-e47c860a1a`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-e47c860a1a

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 20 Fedora Update System 2024-12-03 02:51:53 UTC
FEDORA-2024-e47c860a1a (nfs-utils-2.8.1-2.rc2.fc41) has been pushed to the Fedora 41 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 21 Fedora Update System 2024-12-03 02:54:49 UTC
FEDORA-2024-93dd1e473f (nfs-utils-2.8.1-2.rc2.fc40) has been pushed to the Fedora 40 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 22 Martin Pitt 2024-12-03 06:54:45 UTC
Steve, can you please upload this to rawhide as well? Thanks!

Comment 23 Steve Dickson 2024-12-24 13:48:54 UTC
(In reply to Martin Pitt from comment #22)
> Steve, can you please upload this to rawhide as well? Thanks!

It is; see nfs-utils-2.8.1.

Comment 24 Steve Dickson 2024-12-24 13:50:22 UTC
(In reply to Steve Dickson from comment #23)
> (In reply to Martin Pitt from comment #22)
> > Steve, can you please upload this to rawhide as well? Thanks!
> 
> it is see nfs-utils-2.8.1

Actually, it is nfs-utils-2.8.2.

