Bug 1790972 - Not creating report for rpc.statd startup segfault
Summary: Not creating report for rpc.statd startup segfault
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: rawhide
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-01-14 15:53 UTC by Brian J. Murrell
Modified: 2020-07-09 08:50 UTC (History)
CC List: 13 users

Fixed In Version: systemd-246~rc1-1.fc33
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-09 08:50:08 UTC
Type: Bug
Embargoed:


Description Brian J. Murrell 2020-01-14 15:53:59 UTC
Description of problem:
When my F31 system starts, rpc.statd segfaults per bug 1763384.

But there is no abrt report filed for that, so resolving bug 1763384 is going to be difficult.  Since the segfault only occurs on startup and is not reproducible once the system is up, it is even more difficult to fix without an abrt report.

Version-Release number of selected component (if applicable):
abrt-2.13.0-1.fc31.x86_64

How reproducible:
100%

Steps to Reproduce:
1. You need to be reproducing bug 1763384 first
2. Reboot

Actual results:
rpc.statd segfaults but no abrt report is made.

Expected results:
abrt report should be created.

Additional info:

Comment 1 ekulik 2020-01-15 07:01:21 UTC
Anything in the journal?…

Comment 2 Brian J. Murrell 2020-01-16 12:11:52 UTC
Well, of course there is a ton of (unrelated) stuff in the journal.  Maybe let's start with this:

# journalctl -u rpc-statd.service
-- Reboot --
Jan 14 09:27:36 pc.example.com systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
Jan 14 09:27:36 pc.example.com rpc.statd[91470]: Version 2.4.2 starting
Jan 14 09:27:36 pc.example.com rpc.statd[91470]: Flags: TI-RPC
Jan 14 09:27:36 pc.example.com systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Jan 15 20:48:24 pc.example.com systemd[1]: rpc-statd.service: Main process exited, code=killed, status=11/SEGV
Jan 15 20:48:24 pc.example.com systemd[1]: rpc-statd.service: Failed with result 'signal'.
Jan 15 20:51:13 pc.example.com systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
Jan 15 20:51:14 pc.example.com rpc.statd[819459]: Version 2.4.2 starting
Jan 15 20:51:14 pc.example.com rpc.statd[819459]: Flags: TI-RPC
Jan 15 20:51:14 pc.example.com systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Jan 15 20:51:14 pc.example.com systemd[1]: rpc-statd.service: Main process exited, code=killed, status=11/SEGV
Jan 15 20:51:14 pc.example.com systemd[1]: rpc-statd.service: Failed with result 'signal'.
Jan 15 20:51:32 pc.example.com systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
Jan 15 20:51:32 pc.example.com rpc.statd[819482]: Version 2.4.2 starting
Jan 15 20:51:32 pc.example.com rpc.statd[819482]: Flags: TI-RPC
Jan 15 20:51:32 pc.example.com systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Jan 15 20:51:32 pc.example.com systemd[1]: rpc-statd.service: Main process exited, code=killed, status=11/SEGV
Jan 15 20:51:32 pc.example.com systemd[1]: rpc-statd.service: Failed with result 'signal'.
Jan 15 20:52:25 pc.example.com systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
Jan 15 20:52:25 pc.example.com rpc.statd[819612]: Version 2.4.2 starting
Jan 15 20:52:25 pc.example.com rpc.statd[819612]: Flags: TI-RPC
Jan 15 20:52:25 pc.example.com systemd[1]: Started NFS status monitor for NFSv2/3 locking..

Along with:

Dec 25 21:21:20 pc.example.com kernel: rpc.statd[8567]: segfault at 10 ip 000055f7a7c6c9cd sp 00007ffdd058e7a0 error 6 in rpc.statd[55f7a7c6a000+f000]
Jan 05 06:29:43 pc.example.com kernel: rpc.statd[831720]: segfault at 10 ip 00005641b335e9cd sp 00007ffcd097b910 error 6 in rpc.statd (deleted)[5641b335c000+f000]
Jan 06 21:03:53 pc.example.com kernel: rpc.statd[3732529]: segfault at 10 ip 0000559ecfc049dd sp 00007ffee8ab1790 error 6 in rpc.statd[559ecfc02000+f000]
Jan 06 21:04:06 pc.example.com kernel: rpc.statd[3732569]: segfault at 10 ip 0000563e28a8b9dd sp 00007ffce7907630 error 6 in rpc.statd[563e28a89000+f000]
Jan 06 22:21:22 pc.example.com kernel: rpc.statd[3737454]: segfault at 10 ip 000056490e96a9dd sp 00007fff381ded40 error 6 in rpc.statd[56490e968000+f000]
Jan 06 22:21:49 pc.example.com kernel: rpc.statd[3737469]: segfault at 10 ip 000055594397f9dd sp 00007ffdeecdbfc0 error 6 in rpc.statd[55594397d000+f000]
Jan 07 08:45:15 pc.example.com kernel: rpc.statd[3925915]: segfault at 10 ip 000055e0893609cd sp 00007ffcf44e1020 error 6 in rpc.statd[55e08935e000+f000]
Jan 08 06:52:30 pc.example.com kernel: rpc.statd[4184575]: segfault at 10 ip 0000557ff0f6e9cd sp 00007ffcea326050 error 6 in rpc.statd (deleted)[557ff0f6c000+f000]
Jan 08 11:16:23 pc.example.com kernel: rpc.statd[153364]: segfault at 10 ip 00005651281e49dd sp 00007ffef586fe00 error 6 in rpc.statd[5651281e2000+f000]
Jan 15 20:48:24 pc.example.com kernel: rpc.statd[91470]: segfault at 10 ip 000055f3122d59dd sp 00007ffe0835b1f0 error 6 in rpc.statd[55f3122d3000+f000]
Jan 15 20:51:14 pc.example.com kernel: rpc.statd[819459]: segfault at 10 ip 00005581e76a19dd sp 00007ffe70168d30 error 6 in rpc.statd[5581e769f000+f000]
Jan 15 20:51:32 pc.example.com kernel: rpc.statd[819482]: segfault at 10 ip 000056337c5cd9dd sp 00007ffd98b439a0 error 6 in rpc.statd[56337c5cb000+f000]


Happy to provide any more info as needed.

Something that I considered... is abrt even running when rpc-statd.service is started?
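
The kernel "segfault at" lines quoted above can be decoded mechanically. The following is a minimal sketch, not part of the original report; the function name `decode_segfault` is hypothetical. It assumes the x86 page-fault error-code layout (bit 0: page present, bit 1: write access, bit 2: user mode) and that the bracketed `base+size` is the mapping the faulting instruction lives in.

```python
import re

# Matches kernel lines such as:
#   rpc.statd[819459]: segfault at 10 ip 00005581e76a19dd sp 00007ffe70168d30
#   error 6 in rpc.statd[5581e769f000+f000]
SEGFAULT_RE = re.compile(
    r"\S+\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>.+?)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Decode one kernel segfault log line into a dict (hypothetical helper)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    err = int(m["err"], 16)  # the kernel prints the error code in hex
    return {
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        # Offset of the faulting instruction inside the mapped object;
        # unlike the raw ip, this is stable across ASLR-randomized runs.
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }
```

Applied to the lines above, every crash decodes to offset 0x29cd or 0x29dd, with "error 6" meaning a user-mode write to an unmapped page at address 0x10. That pattern would be consistent with a write through a NULL pointer at the same instruction each time, though only a backtrace could confirm it.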

Comment 3 ekulik 2020-01-16 12:22:19 UTC
I’m not looking for logs from rpc.statd, but rather something related to ABRT or just a dump of it all for my perusal.
Does systemd-coredump even catch the segfault? If not, then there’s really nothing for us to do.

Comment 4 Brian J. Murrell 2020-01-16 12:39:22 UTC
I'm not going to dump my whole journal.  That is way too big with everything that goes in there now, including the massive amounts of crap gnome-shell puts in it.

If there is a specific command or commands you would like me to run to produce what you want out of the journal, I'm happy to do that.

> Does systemd-coredump even catch the segfault? If not, then there’s really nothing for us to do.

Does abrt use systemd-coredump?  I thought abrt "overrode" that and replaced systemd-coredump with its own catcher/handler.

But yes, I see:

# cat /proc/sys/kernel/core_pattern
|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h

So yes, good question.  Does systemd-coredump even catch it?  Perhaps not.  Are you suggesting that this is a systemd issue then, not an abrt issue?
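
As a minimal sketch (not from the original thread), the `core_pattern` value shown above can be classified programmatically: a leading `|` means the kernel pipes the core to a user-space helper, otherwise the value is a file-name template. The function names here are hypothetical.

```python
def parse_core_pattern(pattern):
    """Classify a kernel.core_pattern value (hypothetical helper)."""
    pattern = pattern.strip()
    if pattern.startswith("|"):
        parts = pattern[1:].split()
        if not parts:
            raise ValueError("pipe pattern without a handler")
        # parts[0] is the helper program; the rest are %-specifiers the
        # kernel expands, e.g. %P (PID), %u (UID), %g (GID), %s (signal).
        return {"mode": "pipe", "handler": parts[0], "args": parts[1:]}
    return {"mode": "file", "template": pattern}


def read_core_pattern(path="/proc/sys/kernel/core_pattern"):
    """Read and classify the live setting (Linux only)."""
    with open(path) as f:
        return parse_core_pattern(f.read())
```

For the value quoted in this comment, the result has mode "pipe" and handler `/usr/lib/systemd/systemd-coredump`, which is why a missing entry in `coredumpctl` implies ABRT has nothing to pick up.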

Comment 5 ekulik 2020-01-16 12:58:47 UTC
(In reply to Brian J. Murrell from comment #4)
> I'm not going to dump my whole journal.  That is way too big with everything
> that goes in there now, including the massive amounts of crap gnome-shell
> puts in it.

Perhaps I wasn’t clear. I meant “around the time of crash, so that I could decide what is relevant or not”.

> If there is a specific command or commands you would like me to run to produce
> what you want out of the journal, I'm happy to do that.
> 
> > Does systemd-coredump even catch the segfault? If not, then there’s really nothing for us to do.
> 
> Does abrt use systemd-coredump?  I thought abrt "overrode" that and
> replaced systemd-coredump with its own catcher/handler.

Yes, for a rather long time now.

> But yes, I see:
> 
> # cat /proc/sys/kernel/core_pattern
> |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
> 
> So yes, good question.  Does systemd-coredump even catch it?  Perhaps not. 
> Are you suggesting that this is a systemd issue then, not an abrt issue?

Again, I am not clairvoyant. Responding with a question does not help me help you.
Is there anything in the journal coming from systemd-coredump? Do you see anything in coredumpctl?

ABRT does not magically procure dumps on its own, but relies on systemd-coredump. If systemd does not put anything in the journal about the core dump, ABRT will not do anything.

Comment 6 Brian J. Murrell 2020-01-16 13:07:03 UTC
Jan 15 20:51:13 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:13 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:13 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:13 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:13 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:13 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:13 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:13 pc.example.com systemd[1]: Starting Preprocess NFS configuration convertion...
Jan 15 20:51:13 pc.example.com systemd[1]: nfs-convert.service: Succeeded.
Jan 15 20:51:13 pc.example.com systemd[1]: Started Preprocess NFS configuration convertion.
Jan 15 20:51:13 pc.example.com audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=nfs-convert comm="systemd">
Jan 15 20:51:13 pc.example.com audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=nfs-convert comm="systemd" >
Jan 15 20:51:13 pc.example.com systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
Jan 15 20:51:13 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:13 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:14 pc.example.com rpc.statd[819459]: Version 2.4.2 starting
Jan 15 20:51:14 pc.example.com rpc.statd[819459]: Flags: TI-RPC
Jan 15 20:51:14 pc.example.com systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Jan 15 20:51:14 pc.example.com audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=rpc-statd comm="systemd" e>
Jan 15 20:51:14 pc.example.com audit[819459]: ANOM_ABEND auid=4294967295 uid=29 gid=29 ses=4294967295 subj=system_u:system_r:rpcd_t:s0 pid=819459 comm="rpc.statd" exe=>
Jan 15 20:51:14 pc.example.com kernel: rpc.statd[819459]: segfault at 10 ip 00005581e76a19dd sp 00007ffe70168d30 error 6 in rpc.statd[5581e769f000+f000]
Jan 15 20:51:14 pc.example.com kernel: Code: 99 e6 ff ff 48 8d 3d 47 ca 00 00 31 c0 e8 8b 9f 00 00 e9 87 fe ff ff 48 8d 35 0f c8 00 00 bf 01 00 00 00 31 c0 e8 a3 9c 00>
Jan 15 20:51:14 pc.example.com systemd[1]: rpc-statd.service: Main process exited, code=killed, status=11/SEGV
Jan 15 20:51:14 pc.example.com systemd[1]: rpc-statd.service: Failed with result 'signal'.
Jan 15 20:51:14 pc.example.com audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=rpc-statd comm="systemd" ex>
Jan 15 20:51:17 pc.example.com goa-daemon[12771]: secret_password_lookup_sync() returned NULL
Jan 15 20:51:18 pc.example.com kernel: nsm_monitor: 29 callbacks suppressed
Jan 15 20:51:18 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:18 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:18 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:18 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:18 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:18 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:18 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:18 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:18 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:18 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:22 pc.example.com goa-daemon[12771]: secret_password_lookup_sync() returned NULL
Jan 15 20:51:23 pc.example.com kernel: nsm_monitor: 39 callbacks suppressed
Jan 15 20:51:23 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:23 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:23 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:23 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:23 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:23 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:23 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:23 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:23 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:27 pc.example.com goa-daemon[12771]: secret_password_lookup_sync() returned NULL
Jan 15 20:51:28 pc.example.com kernel: nsm_monitor: 39 callbacks suppressed
Jan 15 20:51:28 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:28 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:28 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:28 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:28 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:28 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:28 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:28 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:28 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:28 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:32 pc.example.com systemd[1]: Starting Preprocess NFS configuration convertion...
Jan 15 20:51:32 pc.example.com systemd[1]: nfs-convert.service: Succeeded.
Jan 15 20:51:32 pc.example.com systemd[1]: Started Preprocess NFS configuration convertion.
Jan 15 20:51:32 pc.example.com audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=nfs-convert comm="systemd">
Jan 15 20:51:32 pc.example.com audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=nfs-convert comm="systemd" >
Jan 15 20:51:32 pc.example.com systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
Jan 15 20:51:32 pc.example.com rpc.statd[819482]: Version 2.4.2 starting
Jan 15 20:51:32 pc.example.com rpc.statd[819482]: Flags: TI-RPC
Jan 15 20:51:32 pc.example.com goa-daemon[12771]: secret_password_lookup_sync() returned NULL
Jan 15 20:51:32 pc.example.com audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=rpc-statd comm="systemd" e>
Jan 15 20:51:32 pc.example.com systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Jan 15 20:51:32 pc.example.com audit[819482]: ANOM_ABEND auid=4294967295 uid=29 gid=29 ses=4294967295 subj=system_u:system_r:rpcd_t:s0 pid=819482 comm="rpc.statd" exe=>
Jan 15 20:51:32 pc.example.com kernel: rpc.statd[819482]: segfault at 10 ip 000056337c5cd9dd sp 00007ffd98b439a0 error 6 in rpc.statd[56337c5cb000+f000]
Jan 15 20:51:32 pc.example.com kernel: Code: 99 e6 ff ff 48 8d 3d 47 ca 00 00 31 c0 e8 8b 9f 00 00 e9 87 fe ff ff 48 8d 35 0f c8 00 00 bf 01 00 00 00 31 c0 e8 a3 9c 00>
Jan 15 20:51:32 pc.example.com audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=rpc-statd comm="systemd" ex>
Jan 15 20:51:32 pc.example.com systemd[1]: rpc-statd.service: Main process exited, code=killed, status=11/SEGV
Jan 15 20:51:32 pc.example.com systemd[1]: rpc-statd.service: Failed with result 'signal'.
Jan 15 20:51:33 pc.example.com kernel: nsm_monitor: 36 callbacks suppressed
Jan 15 20:51:33 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:33 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:34 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:34 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:34 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:34 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:34 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:34 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:34 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:34 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:37 pc.example.com goa-daemon[12771]: secret_password_lookup_sync() returned NULL
Jan 15 20:51:38 pc.example.com kernel: nsm_monitor: 39 callbacks suppressed
Jan 15 20:51:38 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:38 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:39 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:39 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:39 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:39 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:39 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:39 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:39 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:39 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:42 pc.example.com goa-daemon[12771]: secret_password_lookup_sync() returned NULL
Jan 15 20:51:43 pc.example.com kernel: nsm_monitor: 39 callbacks suppressed
Jan 15 20:51:43 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:43 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:44 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:44 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:44 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:44 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:44 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:44 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:44 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:44 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:47 pc.example.com goa-daemon[12771]: secret_password_lookup_sync() returned NULL
Jan 15 20:51:48 pc.example.com kernel: nsm_monitor: 39 callbacks suppressed
Jan 15 20:51:48 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:48 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:49 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:49 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:49 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:49 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:49 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:49 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:49 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:49 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:52 pc.example.com goa-daemon[12771]: secret_password_lookup_sync() returned NULL
Jan 15 20:51:53 pc.example.com kernel: nsm_monitor: 40 callbacks suppressed
Jan 15 20:51:53 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:54 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:54 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:54 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:54 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:54 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:54 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:54 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:54 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:54 pc.example.com kernel: lockd: cannot monitor pc.example.com
Jan 15 20:51:57 pc.example.com goa-daemon[12771]: secret_password_lookup_sync() returned NULL

> ABRT does not magically procure dumps on its own, but relies on systemd-coredump. If systemd does not put anything in the journal about the core dump, ABRT will not do anything.

Fair enough.  I will move this issue to the systemd component then, yes?

Comment 7 ekulik 2020-01-16 13:13:16 UTC
(In reply to Brian J. Murrell from comment #6)
> Fair enough.  I will move this issue to the systemd component then, yes?

Not yet.

What does `ulimit -c` tell you?

Comment 8 Brian J. Murrell 2020-01-16 13:23:03 UTC
# ulimit -c
unlimited

But that's local to my shell/session isn't it?  Does that really tell me the ulimit for the session that starts the rpc-statd.service?

In any case, coredumpctl is listing other things that have segfaulted:

# coredumpctl list
TIME                            PID   UID   GID SIG COREFILE  EXE
Wed 2019-12-18 06:20:45 EST  4000718  1001  1001  11 missing   /usr/bin/pidgin
Wed 2019-12-18 06:20:47 EST    5268  1001  1001  11 missing   /usr/bin/gnome-shell
Mon 2019-12-23 12:09:07 EST  1783071  1001  1001  11 missing   /usr/bin/scanimage
Tue 2020-01-07 08:45:52 EST  3926164     0     0  11 none      /usr/bin/strace
Tue 2020-01-07 08:47:27 EST  3927072     0     0  11 none      /usr/bin/strace
Fri 2020-01-10 14:41:31 EST  980368     0     0   6 missing   /usr/sbin/mount.exfat-fuse
Tue 2020-01-14 06:57:03 EST    2471  1001  1001  11 present   /usr/bin/gnome-shell
Tue 2020-01-14 06:59:02 EST    2965  1001  1001   6 present   /usr/libexec/tracker-miner-fs
Tue 2020-01-14 06:59:03 EST  857576  1001  1001  11 present   /usr/bin/pidgin
Wed 2020-01-15 11:35:05 EST   22350  1001  1001   6 present   /usr/bin/pidgin

so it's generally functional I would say.  Wouldn't you agree?
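
On the question of whose limit `ulimit -c` reports: it shows the limit of the invoking shell only. The limits of an already running process are exposed in `/proc/<pid>/limits`. A minimal sketch for reading the core-size row follows; the function names are hypothetical, and the PID would have to come from elsewhere (for example from the journal, or from `systemctl show -p MainPID rpc-statd.service` while the service is still alive).

```python
def core_limit_from_limits(text):
    """Return (soft, hard) for 'Max core file size' from /proc/<pid>/limits text."""
    label = "Max core file size"
    for line in text.splitlines():
        if line.startswith(label):
            # The row is: label, soft limit, hard limit, units.
            fields = line[len(label):].split()
            return fields[0], fields[1]
    raise ValueError("no 'Max core file size' row found")


def core_limit_of(pid):
    """Read the core-size limits of a live process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return core_limit_from_limits(f.read())
```

A soft limit of `0` for the service process would suppress core dumps even when the interactive shell reports `unlimited`.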

Comment 9 ekulik 2020-01-16 14:07:10 UTC
(In reply to Brian J. Murrell from comment #8)
> # ulimit -c
> unlimited
> 
> But that's local to my shell/session isn't it?  Does that really tell me the
> ulimit for the session that starts the rpc-statd.service?

No, I suppose not. I tried glancing at the code, but nothing popped up.

We can try reassigning to systemd, but I get the feeling that it’s only marginally related.

Comment 10 Brian J. Murrell 2020-01-17 12:34:41 UTC
Moved to systemd as the problem here seems to be that systemd is not catching the segfault from rpc.statd and producing a coredump for abrt to use to report.

systemd is catching other segfaults:

$ coredumpctl list
TIME                            PID   UID   GID SIG COREFILE  EXE
Wed 2019-12-18 06:20:45 EST  4000718  1001  1001  11 missing   /usr/bin/pidgin
Wed 2019-12-18 06:20:47 EST    5268  1001  1001  11 missing   /usr/bin/gnome-shell
Mon 2019-12-23 12:09:07 EST  1783071  1001  1001  11 missing   /usr/bin/scanimage
Tue 2020-01-07 08:45:52 EST  3926164     0     0  11 none      /usr/bin/strace
Tue 2020-01-07 08:47:27 EST  3927072     0     0  11 none      /usr/bin/strace
Fri 2020-01-10 14:41:31 EST  980368     0     0   6 missing   /usr/sbin/mount.exfat-fuse
Tue 2020-01-14 06:57:03 EST    2471  1001  1001  11 missing   /usr/bin/gnome-shell
Tue 2020-01-14 06:59:02 EST    2965  1001  1001   6 missing   /usr/libexec/tracker-miner-fs
Tue 2020-01-14 06:59:03 EST  857576  1001  1001  11 missing   /usr/bin/pidgin
Wed 2020-01-15 11:35:05 EST   22350  1001  1001   6 present   /usr/bin/pidgin
Thu 2020-01-16 10:24:52 EST  621543  1001  1001   6 present   /usr/bin/pidgin

so it is generally working.  It just doesn't seem to catch rpc.statd core dumps.

$ rpm -qf $(which coredumpctl)
systemd-243.5-1.fc31.x86_64

Any thoughts here from the systemd maintainers?

Comment 11 Zbigniew Jędrzejewski-Szmek 2020-02-04 20:09:44 UTC
I'm pretty sure this is something specific that rpc.statd does to prevent coredumps from happening.
The coredumping facilities in systemd are really really simple. There is some special-casing
for pid1 and for systemd-journald, but otherwise it just saves the coredump if possible
and not disabled through the per-process configuration. In the previous cases where the coredump
didn't happen, it was always either because the process installed resource limits to disable coredumps,
or had a custom signal handler that intercepted SEGV (which doesn't seem to apply in this case, since
the kernel reports the crash).

I'll reassign this back to nfs-utils.
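
The "custom signal handler" possibility mentioned above can be checked from outside the process: `/proc/<pid>/status` carries the `SigBlk`, `SigIgn` and `SigCgt` bitmasks, where bit n-1 stands for signal n. A minimal sketch with hypothetical helper names; the sample mask in the usage note below is illustrative and not taken from rpc.statd.

```python
import signal


def signal_masks(status_text):
    """Extract blocked/ignored/caught signal masks from /proc/<pid>/status text."""
    masks = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("SigBlk", "SigIgn", "SigCgt"):
            masks[key] = int(value.strip(), 16)
    return masks


def has_signal(mask, signum):
    """True if signal number `signum` is set in the hex mask."""
    return bool(mask & (1 << (signum - 1)))
```

For example, a `SigCgt` value of `0000000000000400` would mean the process has installed its own SIGSEGV handler (signal 11, bit 10); a zero bit there means the default action, termination with a core dump, applies.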

Comment 12 Brian J. Murrell 2020-04-03 03:10:08 UTC
@nfs-utils: Looks like systemd is pointing the finger back at you.

What say you?

Comment 13 Zbigniew Jędrzejewski-Szmek 2020-04-03 14:46:21 UTC
I disabled systemd-coredump (by setting /proc/sys/kernel/core_pattern to "/tmp/core"), and did 'kill -SEGV ...' on
a few different processes and rpc.statd. They all dump core, except for rpc.statd.
So this is indeed not related to systemd. But I don't understand what is causing this
difference. ulimit shows that everything is set high, the signal is not blocked, but for
some reason the kernel does not initiate a coredump.
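
The experiment described above can be approximated without touching a real service. This is a minimal sketch under stated assumptions, not the procedure actually used in this comment: it spawns a throwaway child, sends it SIGSEGV, and inspects the wait status, whose "core dumped" bit reports whether the kernel produced a dump. The function name is hypothetical.

```python
import os
import signal
import sys


def segv_child_and_report():
    """Kill a throwaway child with SIGSEGV and report how it died."""
    # The child is a fresh interpreter that only sleeps; it has the
    # default SIGSEGV disposition, like a process hit by 'kill -SEGV'.
    argv = [sys.executable, "-c", "import time; time.sleep(60)"]
    pid = os.posix_spawn(sys.executable, argv, os.environ)
    os.kill(pid, signal.SIGSEGV)
    _, status = os.waitpid(pid, 0)
    signaled = os.WIFSIGNALED(status)
    return {
        "signaled": signaled,
        "signal": os.WTERMSIG(status) if signaled else None,
        # True only if the kernel actually wrote or piped a core; this
        # depends on RLIMIT_CORE, core_pattern and the dumpable flag.
        "core_dumped": os.WCOREDUMP(status),
    }
```

Depending on `core_pattern`, running this may leave a core file behind or add an entry to `coredumpctl`. A result with `signaled` true but `core_dumped` false corresponds to the rpc.statd behaviour reported here.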

Comment 14 Brian J. Murrell 2020-04-03 14:55:41 UTC
Excellent debugging effort @Zbigniew!

But where do we take this from here?

Comment 15 J. Bruce Fields 2020-04-03 15:26:06 UTC
Maybe an strace of rpc.statd to see if it's doing anything unusual on the way to the SEGV?

Comment 16 Zbigniew Jędrzejewski-Szmek 2020-04-03 15:42:08 UTC
Attaching gdb to rpc.statd to run prctl(PR_GET_DUMPABLE) shows

(gdb) p (int)prctl(3)
0

that the "dumpable" flag is unset. I guess it's probably related to the transition
from root to rpcuser done internally, but I'm not sure about the details.
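
For reference, the gdb call above is `prctl(PR_GET_DUMPABLE)`, since `PR_GET_DUMPABLE` is 3 in `<linux/prctl.h>`. The same query can be made from inside a process. The sketch below is illustrative only: it inspects and toggles the flag of the calling Python process via ctypes, not of rpc.statd, and the helper names are hypothetical. According to prctl(2), the kernel resets this flag to the value of `fs.suid_dumpable` when a process changes its effective user or group ID, which would match the root-to-rpcuser guess.

```python
import ctypes
import os

PR_GET_DUMPABLE = 3  # values from <linux/prctl.h>
PR_SET_DUMPABLE = 4

_libc = ctypes.CDLL(None, use_errno=True)
_libc.prctl.restype = ctypes.c_int
_libc.prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4


def _prctl(option, arg2=0):
    """Thin wrapper around prctl(2) that raises OSError on failure."""
    res = _libc.prctl(option, arg2, 0, 0, 0)
    if res < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return res


def get_dumpable():
    """Return the calling process's dumpable flag (0, 1 or 2)."""
    return _prctl(PR_GET_DUMPABLE)


def set_dumpable(enabled):
    """Set the calling process's dumpable flag to 1 or 0."""
    _prctl(PR_SET_DUMPABLE, 1 if enabled else 0)
```

A process whose flag is 0 does not dump core on a fatal signal regardless of its resource limits, which is why `ulimit` looked fine in the earlier comments.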

Comment 17 Zbigniew Jędrzejewski-Szmek 2020-04-03 20:25:33 UTC
Let's bite the bullet: https://github.com/systemd/systemd/pull/15327.

It might not help in cases when the crash happens before sysctl has had a chance
to run. But rpc.statd is started later, so it should be covered.
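
The comment above only says the fix depends on a sysctl being applied. Assuming, and this is an assumption not confirmed in this thread, that the sysctl in question is `fs.suid_dumpable`, its three documented values can be interpreted as in the following minimal sketch. The names are hypothetical.

```python
# Meanings per the kernel's documentation of fs.suid_dumpable.
SUID_DUMPABLE_MODES = {
    0: "default: processes that changed credentials do not dump core",
    1: "debug: all processes dump core when possible",
    2: "suidsafe: such processes dump core, readable by root only",
}


def describe_suid_dumpable(text):
    """Map the content of /proc/sys/fs/suid_dumpable to (value, meaning)."""
    value = int(text.strip())
    if value not in SUID_DUMPABLE_MODES:
        raise ValueError(f"unexpected suid_dumpable value: {value}")
    return value, SUID_DUMPABLE_MODES[value]


def read_suid_dumpable(path="/proc/sys/fs/suid_dumpable"):
    """Read and interpret the live setting (Linux only)."""
    with open(path) as f:
        return describe_suid_dumpable(f.read())
```

With the default of 0, a daemon that drops from root to an unprivileged user loses its dumpable flag, so a crash leaves nothing for systemd-coredump or ABRT to collect.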

Comment 18 Brian J. Murrell 2020-04-03 20:33:13 UTC
Very nice!

Comment 19 Steve Dickson 2020-04-13 12:42:17 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #17)
> Let's bite the bullet: https://github.com/systemd/systemd/pull/15327.
> 
> It might not help in cases when the crash happens before sysctl has had a
> chance
> to run. But rpc.statd is started later, so it should be covered.

Sorry for coming to this a bit late... From the above comment it
appears that not dropping a core is not an nfs-utils problem?

Comment 20 Zbigniew Jędrzejewski-Szmek 2020-04-14 17:52:11 UTC
Yes, it's not nfs-utils specific.

Comment 21 Zbigniew Jędrzejewski-Szmek 2020-04-17 12:53:19 UTC
The change is pretty big, so it'll need to be pushed to rawhide first, with a new release of systemd.

Comment 22 Zbigniew Jędrzejewski-Szmek 2020-07-09 08:50:08 UTC
Built in rawhide.

