Bug 1790972
Summary: | Not creating report for rpc.statd startup segfault | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Brian J. Murrell <brian> |
Component: | systemd | Assignee: | systemd-maint |
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | | |
Version: | rawhide | CC: | abrt-devel-list, bfields, jakub, jmilan, lnykryn, mhabrnal, michal.toman, mmarusak, msekleta, s, steved, systemd-maint, zbyszek |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | systemd-246~rc1-1.fc33 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2020-07-09 08:50:08 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Brian J. Murrell
2020-01-14 15:53:59 UTC
Anything in the journal?…

Well, of course there is a ton of (unrelated) stuff in the journal. Maybe let's start with this:

# journalctl -u rpc-statd.service
-- Reboot --
Jan 14 09:27:36 pc.example.com systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
Jan 14 09:27:36 pc.example.com rpc.statd[91470]: Version 2.4.2 starting
Jan 14 09:27:36 pc.example.com rpc.statd[91470]: Flags: TI-RPC
Jan 14 09:27:36 pc.example.com systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Jan 15 20:48:24 pc.example.com systemd[1]: rpc-statd.service: Main process exited, code=killed, status=11/SEGV
Jan 15 20:48:24 pc.example.com systemd[1]: rpc-statd.service: Failed with result 'signal'.
Jan 15 20:51:13 pc.example.com systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
Jan 15 20:51:14 pc.example.com rpc.statd[819459]: Version 2.4.2 starting
Jan 15 20:51:14 pc.example.com rpc.statd[819459]: Flags: TI-RPC
Jan 15 20:51:14 pc.example.com systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Jan 15 20:51:14 pc.example.com systemd[1]: rpc-statd.service: Main process exited, code=killed, status=11/SEGV
Jan 15 20:51:14 pc.example.com systemd[1]: rpc-statd.service: Failed with result 'signal'.
Jan 15 20:51:32 pc.example.com systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
Jan 15 20:51:32 pc.example.com rpc.statd[819482]: Version 2.4.2 starting
Jan 15 20:51:32 pc.example.com rpc.statd[819482]: Flags: TI-RPC
Jan 15 20:51:32 pc.example.com systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Jan 15 20:51:32 pc.example.com systemd[1]: rpc-statd.service: Main process exited, code=killed, status=11/SEGV
Jan 15 20:51:32 pc.example.com systemd[1]: rpc-statd.service: Failed with result 'signal'.
Jan 15 20:52:25 pc.example.com systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
Jan 15 20:52:25 pc.example.com rpc.statd[819612]: Version 2.4.2 starting
Jan 15 20:52:25 pc.example.com rpc.statd[819612]: Flags: TI-RPC
Jan 15 20:52:25 pc.example.com systemd[1]: Started NFS status monitor for NFSv2/3 locking..

Along with:

Dec 25 21:21:20 pc.example.com kernel: rpc.statd[8567]: segfault at 10 ip 000055f7a7c6c9cd sp 00007ffdd058e7a0 error 6 in rpc.statd[55f7a7c6a000+f000]
Jan 05 06:29:43 pc.example.com kernel: rpc.statd[831720]: segfault at 10 ip 00005641b335e9cd sp 00007ffcd097b910 error 6 in rpc.statd (deleted)[5641b335c000+f000]
Jan 06 21:03:53 pc.example.com kernel: rpc.statd[3732529]: segfault at 10 ip 0000559ecfc049dd sp 00007ffee8ab1790 error 6 in rpc.statd[559ecfc02000+f000]
Jan 06 21:04:06 pc.example.com kernel: rpc.statd[3732569]: segfault at 10 ip 0000563e28a8b9dd sp 00007ffce7907630 error 6 in rpc.statd[563e28a89000+f000]
Jan 06 22:21:22 pc.example.com kernel: rpc.statd[3737454]: segfault at 10 ip 000056490e96a9dd sp 00007fff381ded40 error 6 in rpc.statd[56490e968000+f000]
Jan 06 22:21:49 pc.example.com kernel: rpc.statd[3737469]: segfault at 10 ip 000055594397f9dd sp 00007ffdeecdbfc0 error 6 in rpc.statd[55594397d000+f000]
Jan 07 08:45:15 pc.example.com kernel: rpc.statd[3925915]: segfault at 10 ip 000055e0893609cd sp 00007ffcf44e1020 error 6 in rpc.statd[55e08935e000+f000]
Jan 08 06:52:30 pc.example.com kernel: rpc.statd[4184575]: segfault at 10 ip 0000557ff0f6e9cd sp 00007ffcea326050 error 6 in rpc.statd (deleted)[557ff0f6c000+f000]
Jan 08 11:16:23 pc.example.com kernel: rpc.statd[153364]: segfault at 10 ip 00005651281e49dd sp 00007ffef586fe00 error 6 in rpc.statd[5651281e2000+f000]
Jan 15 20:48:24 pc.example.com kernel: rpc.statd[91470]: segfault at 10 ip 000055f3122d59dd sp 00007ffe0835b1f0 error 6 in rpc.statd[55f3122d3000+f000]
Jan 15 20:51:14 pc.example.com kernel: rpc.statd[819459]: segfault at 10 ip 00005581e76a19dd sp 00007ffe70168d30 error 6 in rpc.statd[5581e769f000+f000]
Jan 15 20:51:32 pc.example.com kernel: rpc.statd[819482]: segfault at 10 ip 000056337c5cd9dd sp 00007ffd98b439a0 error 6 in rpc.statd[56337c5cb000+f000]

Happy to provide any more info as needed. Something that I considered... is abrt even running when rpc-statd.service is started?

I’m not looking for logs from rpc.statd, but rather something related to ABRT or just a dump of it all for my perusal. Does systemd-coredump even catch the segfault? If not, then there’s really nothing for us to do.

I'm not going to dump my whole journal. That is way too big with everything that goes in there now, including the massive amounts of crap gnome-shell puts in it.
If there is a specific command or commands you'd like me to run to produce what you want out of the journal, I'm happy to do that.
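The kernel `segfault at` lines quoted above carry more than a timestamp. The sketch below is a minimal parser (the regex, function names and output format are our own, not part of any tool) that pulls out the faulting address, the instruction pointer's offset from the start of the mapping the kernel printed, and the meaning of the x86 page-fault error bits. Decoded, every entry in this report is a user-mode write to a non-present page at address 0x10. That is a common signature of writing through a NULL struct pointer at field offset 16, but this is an inference, not something the report establishes. The printed mapping need not start at file offset 0, so treat the offset as a key for comparing crashes rather than an address to feed straight to addr2line.

```python
import re

# Matches kernel lines such as
# "rpc.statd[819459]: segfault at 10 ip 00005581e76a19dd sp ... error 6 in rpc.statd[5581e769f000+f000]"
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<obj>.+?)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_x86_pf_error(code: int) -> str:
    """Decode the low three bits of an x86 page-fault error code."""
    mode = "user" if code & 0x4 else "kernel"    # bit 2: user/supervisor
    access = "write" if code & 0x2 else "read"   # bit 1: write/read
    cause = "protection violation" if code & 0x1 else "page not present"  # bit 0
    return f"{mode}-mode {access}, {cause}"


def parse_segfault(line: str) -> dict:
    """Extract the interesting fields from one kernel segfault line."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line: " + line)
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        # Offset of the faulting instruction from the start of the printed mapping.
        "offset": ip - base,
        "object": m["obj"],  # includes " (deleted)" if the binary was replaced
        "error": decode_x86_pf_error(int(m["err"])),
    }


if __name__ == "__main__":
    line = ("Jan 15 20:51:14 pc.example.com kernel: rpc.statd[819459]: segfault at 10 "
            "ip 00005581e76a19dd sp 00007ffe70168d30 error 6 in rpc.statd[5581e769f000+f000]")
    info = parse_segfault(line)
    print(hex(info["fault_addr"]), hex(info["offset"]), info["error"])
    # prints: 0x10 0x29dd user-mode write, page not present
```

Run over all twelve kernel lines above, the offset comes out as either 0x29cd or 0x29dd, which hints that the same spot was hit in more than one build of rpc.statd; that is a guess, not something verified here.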
> Does systemd-coredump even catch the segfault? If not, then there’s really nothing for us to do.
Does abrt use systemd-coredump? I thought abrt "overrode" that and replaced systemd-coredump with its own catcher/handler.
But yes, I see:
# cat /proc/sys/kernel/core_pattern
|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
So yes, good question. Does systemd-coredump even catch it? Perhaps not. Are you suggesting that this is a systemd issue then, not an abrt issue?
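A leading `|` in core_pattern tells the kernel to pipe the core to a helper program (here systemd-coredump) and to expand the %-specifiers into that helper's arguments. The helper only runs once the kernel has already decided to produce a core, so a missing systemd-coredump entry usually means the kernel never invoked it at all. The sketch below decodes such a pattern; the specifier meanings are taken from the core(5) man page, while the function and dictionary names are our own.

```python
from typing import Dict, List, Tuple

# Specifier meanings as documented in core(5); only a subset is listed.
SPECIFIERS: Dict[str, str] = {
    "%P": "PID in the initial PID namespace",
    "%p": "PID as seen in the process's own PID namespace",
    "%u": "real UID of the dumped process",
    "%g": "real GID of the dumped process",
    "%s": "number of the signal that caused the dump",
    "%t": "time of dump, seconds since the Epoch",
    "%c": "core file size soft resource limit (RLIMIT_CORE)",
    "%h": "hostname",
    "%e": "executable filename",
    "%d": "dump mode (the value prctl(PR_GET_DUMPABLE) would return)",
}


def describe_core_pattern(pattern: str) -> dict:
    """Classify a core_pattern value as a file template or a pipe to a helper."""
    pattern = pattern.strip()
    if not pattern.startswith("|"):
        return {"mode": "file", "template": pattern}
    # For pipes, the kernel splits the rest on white space; argv[0] is the helper.
    argv = pattern[1:].split()
    args: List[Tuple[str, str]] = [
        (arg, SPECIFIERS.get(arg, "other/literal")) for arg in argv[1:]
    ]
    return {"mode": "pipe", "helper": argv[0], "args": args}


if __name__ == "__main__":
    info = describe_core_pattern("|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h")
    print("helper:", info["helper"])
    for spec, meaning in info["args"]:
        print(f"  {spec}: {meaning}")
```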
(In reply to Brian J. Murrell from comment #4)
> I'm not going to dump my whole journal. That is way too big with everything
> that goes in there now, including the massive amounts of crap gnome-shell
> puts in it.

Perhaps I wasn’t clear. I meant “around the time of the crash, so that I could decide what is relevant or not”.

> If there is a specific command or commands you'd like me to run to produce
> what you want out of the journal, I'm happy to do that.
>
> > Does systemd-coredump even catch the segfault? If not, then there’s really nothing for us to do.
>
> Does abrt use systemd-coredump? I thought abrt "overrode" that and
> replaced systemd-coredump with its own catcher/handler.

Yes, for a rather long time now.

> But yes, I see:
>
> # cat /proc/sys/kernel/core_pattern
> |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
>
> So yes, good question. Does systemd-coredump even catch it? Perhaps not.
> Are you suggesting that this is a systemd issue then, not an abrt issue?

Again, I am not clairvoyant. Responding with a question does not help me help you. Is there anything in the journal coming from systemd-coredump? Do you see anything in coredumpctl?

ABRT does not magically procure dumps on its own, but relies on systemd-coredump. If systemd does not put anything in the journal about the core dump, ABRT will not do anything.

Jan 15 20:51:13 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:13 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:13 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:13 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:13 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:13 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:13 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:13 pc.example.com systemd[1]: Starting Preprocess NFS configuration convertion...
Jan 15 20:51:13 pc.example.com systemd[1]: nfs-convert.service: Succeeded.
Jan 15 20:51:13 pc.example.com systemd[1]: Started Preprocess NFS configuration convertion.
Jan 15 20:51:13 pc.example.com audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=nfs-convert comm="systemd">
Jan 15 20:51:13 pc.example.com audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=nfs-convert comm="systemd" >
Jan 15 20:51:13 pc.example.com systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
Jan 15 20:51:13 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:13 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:14 pc.example.com rpc.statd[819459]: Version 2.4.2 starting
Jan 15 20:51:14 pc.example.com rpc.statd[819459]: Flags: TI-RPC
Jan 15 20:51:14 pc.example.com systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Jan 15 20:51:14 pc.example.com audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=rpc-statd comm="systemd" e>
Jan 15 20:51:14 pc.example.com audit[819459]: ANOM_ABEND auid=4294967295 uid=29 gid=29 ses=4294967295 subj=system_u:system_r:rpcd_t:s0 pid=819459 comm="rpc.statd" exe=>
Jan 15 20:51:14 pc.example.com kernel: rpc.statd[819459]: segfault at 10 ip 00005581e76a19dd sp 00007ffe70168d30 error 6 in rpc.statd[5581e769f000+f000]
Jan 15 20:51:14 pc.example.com kernel: Code: 99 e6 ff ff 48 8d 3d 47 ca 00 00 31 c0 e8 8b 9f 00 00 e9 87 fe ff ff 48 8d 35 0f c8 00 00 bf 01 00 00 00 31 c0 e8 a3 9c 00>
Jan 15 20:51:14 pc.example.com systemd[1]: rpc-statd.service: Main process exited, code=killed, status=11/SEGV
Jan 15 20:51:14 pc.example.com systemd[1]: rpc-statd.service: Failed with result 'signal'.
Jan 15 20:51:14 pc.example.com audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=rpc-statd comm="systemd" ex>
Jan 15 20:51:17 pc.example.com goa-daemon[12771]: secret_password_lookup_sync() returned NULL
Jan 15 20:51:18 pc.example.com kernel: nsm_monitor: 29 callbacks suppressed
Jan 15 20:51:18 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:18 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:18 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:18 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:18 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:18 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:18 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:18 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:18 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:18 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:22 pc.example.com goa-daemon[12771]: secret_password_lookup_sync() returned NULL
Jan 15 20:51:23 pc.example.com kernel: nsm_monitor: 39 callbacks suppressed
Jan 15 20:51:23 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:23 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:23 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:23 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:23 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:23 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:23 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:23 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:23 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:27 pc.example.com goa-daemon[12771]: secret_password_lookup_sync() returned NULL
Jan 15 20:51:28 pc.example.com kernel: nsm_monitor: 39 callbacks suppressed
Jan 15 20:51:28 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:28 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:28 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:28 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:28 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:28 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:28 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:28 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:28 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:28 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:32 pc.example.com systemd[1]: Starting Preprocess NFS configuration convertion...
Jan 15 20:51:32 pc.example.com systemd[1]: nfs-convert.service: Succeeded.
Jan 15 20:51:32 pc.example.com systemd[1]: Started Preprocess NFS configuration convertion.
Jan 15 20:51:32 pc.example.com audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=nfs-convert comm="systemd">
Jan 15 20:51:32 pc.example.com audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=nfs-convert comm="systemd" >
Jan 15 20:51:32 pc.example.com systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
Jan 15 20:51:32 pc.example.com rpc.statd[819482]: Version 2.4.2 starting
Jan 15 20:51:32 pc.example.com rpc.statd[819482]: Flags: TI-RPC
Jan 15 20:51:32 pc.example.com goa-daemon[12771]: secret_password_lookup_sync() returned NULL
Jan 15 20:51:32 pc.example.com audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=rpc-statd comm="systemd" e>
Jan 15 20:51:32 pc.example.com systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Jan 15 20:51:32 pc.example.com audit[819482]: ANOM_ABEND auid=4294967295 uid=29 gid=29 ses=4294967295 subj=system_u:system_r:rpcd_t:s0 pid=819482 comm="rpc.statd" exe=>
Jan 15 20:51:32 pc.example.com kernel: rpc.statd[819482]: segfault at 10 ip 000056337c5cd9dd sp 00007ffd98b439a0 error 6 in rpc.statd[56337c5cb000+f000]
Jan 15 20:51:32 pc.example.com kernel: Code: 99 e6 ff ff 48 8d 3d 47 ca 00 00 31 c0 e8 8b 9f 00 00 e9 87 fe ff ff 48 8d 35 0f c8 00 00 bf 01 00 00 00 31 c0 e8 a3 9c 00>
Jan 15 20:51:32 pc.example.com audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=rpc-statd comm="systemd" ex>
Jan 15 20:51:32 pc.example.com systemd[1]: rpc-statd.service: Main process exited, code=killed, status=11/SEGV
Jan 15 20:51:32 pc.example.com systemd[1]: rpc-statd.service: Failed with result 'signal'.
Jan 15 20:51:33 pc.example.com kernel: nsm_monitor: 36 callbacks suppressed
Jan 15 20:51:33 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:33 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:34 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:34 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:34 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:34 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:34 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:34 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:34 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:34 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:37 pc.example.com goa-daemon[12771]: secret_password_lookup_sync() returned NULL
Jan 15 20:51:38 pc.example.com kernel: nsm_monitor: 39 callbacks suppressed
Jan 15 20:51:38 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:38 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:39 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:39 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:39 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:39 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:39 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:39 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:39 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:39 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:42 pc.example.com goa-daemon[12771]: secret_password_lookup_sync() returned NULL
Jan 15 20:51:43 pc.example.com kernel: nsm_monitor: 39 callbacks suppressed
Jan 15 20:51:43 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:43 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:44 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:44 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:44 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:44 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:44 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:44 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:44 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:44 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:47 pc.example.com goa-daemon[12771]: secret_password_lookup_sync() returned NULL
Jan 15 20:51:48 pc.example.com kernel: nsm_monitor: 39 callbacks suppressed
Jan 15 20:51:48 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:48 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:49 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:49 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:49 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:49 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:49 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:49 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:49 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:49 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:52 pc.example.com goa-daemon[12771]: secret_password_lookup_sync() returned NULL
Jan 15 20:51:53 pc.example.com kernel: nsm_monitor: 40 callbacks suppressed
Jan 15 20:51:53 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:54 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:54 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:54 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:54 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:54 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:54 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:54 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:54 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:54 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:57 pc.example.com goa-daemon[12771]: secret_password_lookup_sync() returned NULL
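Most of this excerpt is lockd and goa-daemon noise. A small filter keeps the lines that matter, and a key=value parser pulls fields out of the audit ANOM_ABEND record; the noise substrings and helper names are our own choices, tuned to this particular excerpt. The ANOM_ABEND record shows uid=29 gid=29 for the crashing rpc.statd, so it was no longer running as root when it crashed. On Fedora, uid 29 is normally the rpcuser account; that mapping is an assumption, not something the log states.

```python
import re
from typing import Dict, Iterable, List

# Substrings of lines that are noise for this investigation (our own choice).
NOISE = (
    "lockd: cannot monitor",
    "callbacks suppressed",
    "secret_password_lookup_sync",
)

# key=value or key="quoted value", as used in audit records.
_KV = re.compile(r'(\w+)=("[^"]*"|\S+)')


def interesting(lines: Iterable[str]) -> List[str]:
    """Drop blank lines and lines containing any NOISE substring."""
    return [line for line in lines
            if line.strip() and not any(n in line for n in NOISE)]


def audit_fields(line: str) -> Dict[str, str]:
    """Parse the key=value fields that follow 'audit[PID]: TYPE'."""
    _, _, body = line.partition("]: ")
    return {key: value.strip('"') for key, value in _KV.findall(body)}


if __name__ == "__main__":
    abend = ('Jan 15 20:51:14 pc.example.com audit[819459]: ANOM_ABEND auid=4294967295 '
             'uid=29 gid=29 ses=4294967295 subj=system_u:system_r:rpcd_t:s0 '
             'pid=819459 comm="rpc.statd" exe=>')
    fields = audit_fields(abend)
    print(fields["comm"], "uid", fields["uid"], "gid", fields["gid"])
    # prints: rpc.statd uid 29 gid 29
```

The journal truncated these audit lines (note the trailing `exe=>`), so only the fields before the cut are recoverable.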
> ABRT does not magically procure dumps on its own, but relies on systemd-coredump. If systemd does not put anything in the journal about the core dump, ABRT will not do anything.
Fair enough. I will move this issue to the systemd component then, yes?
(In reply to Brian J. Murrell from comment #6)
> Fair enough. I will move this issue to the systemd component then, yes?

Not yet. What does `ulimit -c` tell you?

# ulimit -c
unlimited

But that's local to my shell/session, isn't it? Does that really tell me the ulimit for the session that starts rpc-statd.service?

In any case, coredumpctl is listing other things that have segfaulted:

# coredumpctl list
TIME                         PID     UID  GID  SIG COREFILE EXE
Wed 2019-12-18 06:20:45 EST  4000718 1001 1001 11  missing  /usr/bin/pidgin
Wed 2019-12-18 06:20:47 EST  5268    1001 1001 11  missing  /usr/bin/gnome-shell
Mon 2019-12-23 12:09:07 EST  1783071 1001 1001 11  missing  /usr/bin/scanimage
Tue 2020-01-07 08:45:52 EST  3926164 0    0    11  none     /usr/bin/strace
Tue 2020-01-07 08:47:27 EST  3927072 0    0    11  none     /usr/bin/strace
Fri 2020-01-10 14:41:31 EST  980368  0    0    6   missing  /usr/sbin/mount.exfat-fuse
Tue 2020-01-14 06:57:03 EST  2471    1001 1001 11  present  /usr/bin/gnome-shell
Tue 2020-01-14 06:59:02 EST  2965    1001 1001 6   present  /usr/libexec/tracker-miner-fs
Tue 2020-01-14 06:59:03 EST  857576  1001 1001 11  present  /usr/bin/pidgin
Wed 2020-01-15 11:35:05 EST  22350   1001 1001 6   present  /usr/bin/pidgin

so it's generally functional, I would say. Wouldn't you agree?

(In reply to Brian J. Murrell from comment #8)
> # ulimit -c
> unlimited
>
> But that's local to my shell/session isn't it? Does that really tell me the
> ulimit for the session that starts the rpc-statd.service?

No, I suppose not. I tried glancing at the code, but nothing popped up. We can try reassigning to systemd, but I get the feeling that it’s only marginally related.

Moved to systemd, as the problem here seems to be that systemd is not catching the segfault from rpc.statd and producing a coredump for abrt to report.
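As the exchange above concludes, `ulimit -c` only reports the calling shell's limit. The limit that actually applies to a running process can be read from /proc/<pid>/limits, and `systemctl show -p LimitCORE rpc-statd.service` shows what the unit itself configures. A minimal sketch of the /proc route follows; the function names are our own.

```python
import os
import re
from typing import Tuple

# Row format in /proc/<pid>/limits (see proc(5)):
# "Max core file size        unlimited            unlimited            bytes"
_CORE_ROW = re.compile(r"^Max core file size\s+(\S+)\s+(\S+)", re.MULTILINE)


def core_limit_from_text(limits_text: str) -> Tuple[str, str]:
    """Return the (soft, hard) core-size limit exactly as the kernel prints it."""
    m = _CORE_ROW.search(limits_text)
    if m is None:
        raise ValueError("no 'Max core file size' row found")
    return m.group(1), m.group(2)


def core_limit(pid: int) -> Tuple[str, str]:
    """Read the core-size limit that applies to a running process."""
    with open(f"/proc/{pid}/limits") as f:
        return core_limit_from_text(f.read())


if __name__ == "__main__":
    # Checks this interpreter; pass rpc.statd's PID instead to check the daemon.
    print("this process:", core_limit(os.getpid()))
```

A soft limit of "0" would disable core dumps for that process; in this report Zbigniew later found the limits set high, so the cause had to lie elsewhere.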
systemd is catching other segfaults:

$ coredumpctl list
TIME                         PID     UID  GID  SIG COREFILE EXE
Wed 2019-12-18 06:20:45 EST  4000718 1001 1001 11  missing  /usr/bin/pidgin
Wed 2019-12-18 06:20:47 EST  5268    1001 1001 11  missing  /usr/bin/gnome-shell
Mon 2019-12-23 12:09:07 EST  1783071 1001 1001 11  missing  /usr/bin/scanimage
Tue 2020-01-07 08:45:52 EST  3926164 0    0    11  none     /usr/bin/strace
Tue 2020-01-07 08:47:27 EST  3927072 0    0    11  none     /usr/bin/strace
Fri 2020-01-10 14:41:31 EST  980368  0    0    6   missing  /usr/sbin/mount.exfat-fuse
Tue 2020-01-14 06:57:03 EST  2471    1001 1001 11  missing  /usr/bin/gnome-shell
Tue 2020-01-14 06:59:02 EST  2965    1001 1001 6   missing  /usr/libexec/tracker-miner-fs
Tue 2020-01-14 06:59:03 EST  857576  1001 1001 11  missing  /usr/bin/pidgin
Wed 2020-01-15 11:35:05 EST  22350   1001 1001 6   present  /usr/bin/pidgin
Thu 2020-01-16 10:24:52 EST  621543  1001 1001 6   present  /usr/bin/pidgin

so it is generally working. It just doesn't seem to catch rpc.statd core dumps.

$ rpm -qf $(which coredumpctl)
systemd-243.5-1.fc31.x86_64

Any thoughts here from the systemd maintainers?

I'm pretty sure this is something specific that rpc.statd does to prevent coredumps from happening. The coredumping facilities in systemd are really, really simple. There is some special-casing for PID 1 and for systemd-journald, but otherwise it just saves the coredump if possible and if not disabled through the per-process configuration. In the previous cases where the coredump didn't happen, it was always either because the process installed resource limits to disable coredumps, or because it had a custom signal handler that intercepted SEGV (which doesn't seem to apply in this case, since the kernel reports the crash). I'll reassign this back to nfs-utils.

@nfs-utils: Looks like systemd is pointing the finger back at you. What say you?

I disabled systemd-coredump (by setting /proc/sys/kernel/core_pattern to "/tmp/core"), and did 'kill -SEGV ...' on a few different processes and on rpc.statd.
They all dump core, except for rpc.statd. So this is indeed not related to systemd. But I don't understand what is causing this difference: ulimit shows that everything is set high and the signal is not blocked, but for some reason the kernel does not initiate a coredump.

Excellent debugging effort @Zbigniew! But where do we take this from here? Maybe an strace of rpc.statd to see if it's doing anything unusual on the way to the SEGV?

Attaching gdb to rpc.statd and calling prctl(PR_GET_DUMPABLE) (PR_GET_DUMPABLE is 3):

(gdb) p (int)prctl(3)
0

shows that the "dumpable" flag is unset. I guess it's probably related to the transition from root to rpcuser done internally, but I'm not sure about the details.

Let's bite the bullet: https://github.com/systemd/systemd/pull/15327.

It might not help in cases when the crash happens before sysctl has had a chance to run. But rpc.statd is started later, so it should be covered.

Very nice!

(In reply to Zbigniew Jędrzejewski-Szmek from comment #17)
> Let's bite the bullet: https://github.com/systemd/systemd/pull/15327.
>
> It might not help in cases when the crash happens before sysctl has had a
> chance to run. But rpc.statd is started later, so it should be covered.

Sorry for coming to this a bit late... From the above comment, it appears not dropping a core is not an nfs-utils problem?

Yes, it's not nfs-utils specific. The change is pretty big, so it'll need to be pushed to rawhide first, with a new release of systemd.

Built in rawhide.
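The gdb check above can be reproduced without gdb. According to prctl(2), a process's dumpable attribute is reset to the value of /proc/sys/fs/suid_dumpable (default 0) whenever its effective UID or GID changes, which fits the suspected root-to-rpcuser transition. The sketch calls prctl through ctypes, then forks a child that drops from root to UID 65534 (conventionally "nobody"; an arbitrary choice) and reports the child's flag. It needs Linux and root, and returns None otherwise. This report does not say which setting PR 15327 changes; the remark about sysctl having to run first suggests a sysctl, and fs.suid_dumpable is the knob proc(5) documents for this case, but treat that link as an assumption.

```python
import ctypes
import os
from typing import Optional

PR_GET_DUMPABLE = 3  # value from <linux/prctl.h>

# dlopen(NULL): symbols of the running interpreter, which include libc's prctl().
_libc = ctypes.CDLL(None, use_errno=True)


def get_dumpable() -> int:
    """Return the calling process's dumpable attribute (0, 1 or 2)."""
    value = _libc.prctl(PR_GET_DUMPABLE, 0, 0, 0, 0)
    if value < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return value


def suid_dumpable() -> int:
    """Current value of the fs.suid_dumpable sysctl."""
    with open("/proc/sys/fs/suid_dumpable") as f:
        return int(f.read().strip())


def dumpable_after_dropping_root(uid: int = 65534) -> Optional[int]:
    """Fork a child that switches from root to `uid` and report its dumpable flag.

    Returns None if not running as root or if the UID change fails."""
    if os.geteuid() != 0:
        return None
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: change real, effective and saved UID, then report
        os.close(read_fd)
        try:
            os.setresuid(uid, uid, uid)
            os.write(write_fd, str(get_dumpable()).encode())
        except OSError:
            pass  # leave the pipe empty to signal failure
        finally:
            os._exit(0)
    os.close(write_fd)
    data = os.read(read_fd, 16)  # returns b"" at EOF if the child wrote nothing
    os.close(read_fd)
    os.waitpid(pid, 0)
    return int(data) if data else None


if __name__ == "__main__":
    print("dumpable now:", get_dumpable())
    print("fs.suid_dumpable:", suid_dumpable())
    print("dumpable after setresuid:", dumpable_after_dropping_root())
```

Run as root with the default fs.suid_dumpable of 0, the last line would be expected to print 0, matching what gdb showed for rpc.statd.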