Bug 1341829 - Systemd-coredump doesn't save any core files
Status: MODIFIED
Product: Fedora
Classification: Fedora
Component: selinux-policy-targeted
Version: 24
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: Lukas Vrabec
QA Contact: Ben Levenson
Keywords: Triaged
Duplicates: 1365435
Depends On: 1365435
Blocks: 1309172 1405995
Reported: 2016-06-01 15:59 EDT by Göran Uddeborg
Modified: 2017-03-04 03:43 EST (History)

Type: Bug


Attachments
Audit log with dontaudit disabled (1.35 KB, text/x-vhdl)
2016-08-16 12:44 EDT, Göran Uddeborg
Audit log from reboot with dontaudit disabled (10.02 KB, application/x-xz)
2016-08-21 11:23 EDT, Göran Uddeborg
ausearch -m USER_AVC,AVC -ts recent (11.13 KB, text/plain)
2016-12-08 13:07 EST, Michael Catanzaro

Description Göran Uddeborg 2016-06-01 15:59:08 EDT
Description of problem:
No core files are saved by systemd-coredump for me any more.  When it runs, it generates an SELinux alert, which I assumed was the reason, see bug 1317927.  But as Christophe Fergeau recently found out, even with SELinux in permissive mode, it doesn't work.


Version-Release number of selected component (if applicable):
systemd-229-7.fc24.x86_64


How reproducible:
Every time


Steps to Reproduce:
1. sudo setenforce Permissive
2. sleep 30
3. # hit ^\ to generate a SIGQUIT
4. coredumpctl
5. coredumpctl 13842

Actual results:
The listing in 4 ends with

ons 2016-06-01 21:56:30 CEST  13842  1003  1003   3   /usr/bin/sleep

Number 5 prints
           PID: 13842 (sleep)
           UID: 1003 (göran)
           GID: 1003 (göran)
        Signal: 3 (QUIT)
     Timestamp: ons 2016-06-01 21:56:30 CEST (1min 29s ago)
  Command Line: sleep 30
    Executable: /usr/bin/sleep
 Control Group: /user.slice/user-1003.slice/session-1806.scope
          Unit: session-1806.scope
         Slice: user-1003.slice
       Session: 1806
     Owner UID: 1003 (göran)
       Boot ID: 6a333211d9d34218a6b92e299146ee7a
    Machine ID: 606ba17eef1ffa5a76fdb50047756efd
      Hostname: mimmi
       Message: Process 13842 (sleep) of user 1003 dumped core.

Cannot retrieve coredump from journal nor disk.
Failed to retrieve core: No such file or directory



Expected results:
Step 5 should start up gdb with a core file attached.
Comment 1 Michael Catanzaro 2016-06-04 09:44:41 EDT
You have to set ulimit now (and disable SELinux, bug #1317927) for systemd-coredump to work in F24. This really sucks; neither was required in F23.

I was hoping that having coredumpctl enabled by default could be an F25 feature thanks to recent integration work by the ABRT team, but it looks like that requires either setting ulimit systemwide (probably preferable) or reverting the change to respect ulimit.
Comment 2 Jakub Filak 2016-06-23 03:38:15 EDT
(In reply to Michael Catanzaro from comment #1)
> requires either setting ulimit systemwide (probably preferable)

Starting with systemd-229, 'ulimit -c' (RLIMIT_CORE) is "unlimited" for all processes by default [1][2][3] (bug #1309172).


1: https://github.com/systemd/systemd/blob/master/src/core/main.c#L1500
2: https://github.com/systemd/systemd/blob/master/NEWS (CHANGES WITH 229)
3: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/HQ4JFTYLPT5GRW6AD4M2MWGMRAPE7ITN/
Comment 3 Christophe Fergeau 2016-06-23 05:04:41 EDT
Hmm, this seems not to be the case on my system, not sure why
/proc/1/limits shows
Max core file size        unlimited            unlimited            bytes 

but then most other processes have it set to 0

This is a system I've been upgrading from f21 or so, so maybe this is related to a change which was not properly done on upgrade? (don't think I have a clean f24 install around)
Comment 4 Christophe Fergeau 2016-06-23 05:20:20 EDT
(In reply to Jakub Filak from comment #2)
> (In reply to Michael Catanzaro from comment #1)
> > requires either setting ulimit systemwide (probably preferable)
> 
> Starting with systemd-229, 'ulimit -c' (RLIMIT_CORE) is "unlimited" for all
> processes by default [1][2][3] (bug #1309172).
> 

On a fresh f24 install:

core file size          (blocks, -c) 0
Comment 5 Zbigniew Jędrzejewski-Szmek 2016-08-07 18:25:52 EDT
Indeed, it seems that systemd is screwing up somehow.

On a fresh installation of F24, "Max core file size" is unlimited for PID 1, but for various daemons it's set to 0.

This doesn't match what systemd thinks should be set for the service:
$ systemctl show -p LimitCORE avahi-daemon
LimitCORE=18446744073709551615

Locally I'm running F24 with systemd from git, and the limits are infinity as expected. But I don't see any relevant changes in code between v229 and my version.

I'm stumped as to what the cause of this discrepancy is, so I've reported the bug upstream.
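
A rough sketch of how the configured and the effective limit could be compared for one unit. The helper name normalise_limit and the default unit are only illustrative; the script assumes nothing beyond "systemctl show -p" and the /proc/<pid>/limits layout quoted in this report.

```shell
#!/bin/sh
# Normalise a LimitCORE value as printed by "systemctl show" so that it can
# be compared with /proc/<pid>/limits. 18446744073709551615 is how
# RLIM_INFINITY is printed (see the avahi-daemon output above).
# normalise_limit is a hypothetical helper name.
normalise_limit() {
    case "$1" in
        18446744073709551615|infinity) echo unlimited ;;
        *) echo "$1" ;;
    esac
}

unit=${1:-avahi-daemon}
if command -v systemctl >/dev/null 2>&1; then
    # What systemd thinks the limit should be.
    configured=$(systemctl show -p LimitCORE "$unit" | cut -d= -f2)
    # The main process of the unit, if it is running.
    pid=$(systemctl show -p MainPID "$unit" | cut -d= -f2)
    if [ -n "$pid" ] && [ "$pid" != "0" ] && [ -r "/proc/$pid/limits" ]; then
        # Field 5 is the soft limit: "Max core file size <soft> <hard> <units>"
        actual=$(awk '/^Max core file size/ { print $5 }' "/proc/$pid/limits")
        echo "$unit: configured=$(normalise_limit "$configured") actual=$actual"
    fi
fi
```

If the two values differ, the discrepancy described above is present for that unit.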
Comment 6 Zbigniew Jędrzejewski-Szmek 2016-08-07 19:56:14 EDT
Oh, it seems to be selinux related... With "permissive", most daemons and my user session have no limit, while with "enforcing", most daemons and the user session all have 0.

But /var/log/audit/audit.log does not seem to contain any useful data. Maybe some dontaudit rule? I'll reassign this to selinux.

--

short reproducer:
$ grep core /proc/$(pidof systemd-journald)/limits

This returns "Max core file size        unlimited            unlimited            bytes" under F24/targeted/permissive, and "Max core file size 0 0 bytes" with F24/targeted/enforcing.
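
The same check can be extended to every running process. This is only a sketch: core_soft_limit is a made-up helper name, and the script assumes the /proc/<pid>/limits layout shown above.

```shell
#!/bin/sh
# Print the soft "Max core file size" value from a limits file.
# core_soft_limit is a hypothetical helper name, not part of any tool.
core_soft_limit() {
    # Columns: "Max core file size  <soft>  <hard>  <units>"
    awk '/^Max core file size/ { print $5 }' "$1"
}

# List PID and command name of every process whose soft core limit is 0.
for dir in /proc/[0-9]*; do
    [ -r "$dir/limits" ] || continue
    # A process may exit between the glob and the read; skip it in that case.
    soft=$(core_soft_limit "$dir/limits" 2>/dev/null) || continue
    if [ "$soft" = "0" ]; then
        printf '%s\t%s\n' "${dir#/proc/}" "$(cat "$dir/comm" 2>/dev/null)"
    fi
done
```

Run as root to see all processes. An empty listing would mean no process has core dumps disabled through its soft limit.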
Comment 7 Michael Catanzaro 2016-08-07 21:16:42 EDT
Hi Zbigniew, is this a different bug from bug #1317927?
Comment 8 Zbigniew Jędrzejewski-Szmek 2016-08-07 21:40:21 EDT
#1317927 is long and messy, but it seems that it's a separate issue: ProtectSystem=full and denying mounton which is used to implement it (https://bugzilla.redhat.com/show_bug.cgi?id=1317927#c18).

The effect of both is very similar (no core file), but it seems that there are two underlying causes.
Comment 9 Lennart Poettering 2016-08-14 06:38:36 EDT
Hmm, iirc selinux policy can actually affect rlimit setting, and I think the policy prohibits this for PID 1 atm.

Note that PID 1 in systemd will bump RLIMIT_CORE to infinity early on, but it ignores failures on this. All services started during runtime simply inherit this then. If the bumping fails nothing will be inherited.

It would hence be good to know if selinux permissive vs. enforcing has an effect on RLIMIT_CORE for PID 1 itself. This is how it should look:

$ grep core /proc/1/limits 
Max core file size        unlimited            unlimited            bytes     

This is on a permissive system. Question is, does it look like that on enforcing too? If not, then I figure all that's missing is an selinux policy change to permit PID 1 to bump RLIMIT_CORE for itself.

(And of course, it might make sense to change systemd to log at debug level if bumping fails, instead of being entirely quiet about it)
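
The inheritance described above can be illustrated with a plain shell, as a minimal sketch. It shows only ordinary fork/exec inheritance of the soft limit, with no SELinux domain transition involved, and says nothing about what systemd itself does internally.

```shell
#!/bin/sh
# Run in a command substitution (a subshell) so the caller's own limit is
# left untouched.
child_limit=$(
    ulimit -S -c 0          # lower the soft limit in the parent (always permitted)
    sh -c 'ulimit -S -c'    # the child reports the value it inherited
)
echo "child sees core limit: $child_limit"
```

The child reports 0 because it simply inherits whatever its parent had at the time it was started, which is why the value held by PID 1 matters for every service.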
Comment 10 Michael Catanzaro 2016-08-14 17:50:03 EDT
(In reply to Lennart Poettering from comment #9)
> This is on a permissive system. Question is, does it look like that on
> enforcing too?

Unfortunately that is how it looks on my enforcing system.
Comment 11 Göran Uddeborg 2016-08-16 05:46:51 EDT
Indeed.  After a fresh reboot with selinux-policy-targeted-3.13.1-191.10.fc24 (the version supposed to fix bug #1317927?) I get the output below.  Retrying the test case, I still get the crash listed with "coredumpctl list", but "coredumpctl gdb <pid>" fails the same way as in comment 0.  (I guess everyone realized I meant "coredumpctl gdb <pid>" in step 5.)

mimmi$ sudo grep core /proc/1/limits
Max core file size        unlimited            unlimited            bytes     
mimmi$ sudo grep core /proc/`pidof systemd-journald`/limits
Max core file size        0                    unlimited            bytes
Comment 12 Lukas Vrabec 2016-08-16 11:25:28 EDT
Same issue here. 

Göran,
Could you set selinux to permissive and also run:
# semodule -DB

And then reproduce the issue?
Do you see any AVCs then?

Thank you.
Comment 13 Göran Uddeborg 2016-08-16 12:44 EDT
Created attachment 1191322 [details]
Audit log with dontaudit disabled

No problem, I attach the audit log during the experiment.  There are three AVC:s.  Two of them I recognize as ones I usually see when I turn off dontaudit.  I'm less sure about the socket read/write attempt.  Could that be a clue?
Comment 14 Göran Uddeborg 2016-08-16 17:03:41 EDT
Sorry, my bad!  I was redoing my initial experiment with dontaudits disabled.  But that of course got hit by my login session having core size limit set to 0.  I guess you meant I should reboot the machine with dontaudits disabled, and hand you THAT list.  I'll get back to that, but I'll have to find a "service window" when I don't disturb.
Comment 15 Göran Uddeborg 2016-08-21 11:23 EDT
Created attachment 1192584 [details]
Audit log from reboot with dontaudit disabled

Ok, so here is a new try.  I attach the audit log file, starting at the boot with dontaudit rules disabled.
Comment 16 Göran Uddeborg 2016-08-21 11:29:11 EDT
Looking a bit, I started to think about all those "rlimitinh" denials.  That is something which seems to be dontaudited everywhere.  Wouldn't that have exactly this effect?  According to http://seedit.sourceforge.net/doc/access_vectors/ "If this is denied, signal state is cleared".  I'm not sure exactly what "cleared" means in this case.  But it sounds suspicious to me.
Comment 17 Göran Uddeborg 2016-08-22 04:20:53 EDT
I make many mistakes in this report. :-(  The quote should be "If this is denied, rlimit is cleared".
Comment 19 Lukas Vrabec 2016-09-15 11:31:44 EDT
Hi Göran, 

Could you test it with local policy? 

$ cat local.cil 
(allow init_t systemd_coredump_t(process (noatsecure rlimitinh)))

# semodule -i local.cil

And reproduce your issue.


Thanks.
Comment 20 Göran Uddeborg 2016-09-19 10:32:17 EDT
"cil", that was something new to me!

Anyway, I installed the module and rebooted.  I couldn't see any change.

I also wonder how it COULD have helped.  If I understand the module correctly, it will only affect the systemd-coredump process itself.  Maybe this report has become a bit confused.  So maybe it's appropriate to clarify MY understanding of the situation.

The problem is that by default, when a process gets a signal that normally would generate a core, no core is generated and collected by systemd-coredump.

The reason seems to be that the ulimit for core files is set to 0, again by default.

If I explicitly change the ulimit to unlimited in a shell, and retry the experiment, I DO get a core file saved.

According to comment 2, the intention is for the core ulimit to be unlimited.  The fact it isn't seems to be the problem.

My guess in comment 16 and comment 17 was that this could be because SELinux mostly denies the rlimitinh access.  These denials don't show up in the log since they are dontaudit:ed.  But they still foil systemd's attempt to allow core dumps in general.
Comment 21 Michael Catanzaro 2016-10-13 12:48:04 EDT
Since I consider coredumpctl to be a priority feature for Fedora Workstation, I am planning to propose disabling SELinux by default in Workstation until this can be fixed.
Comment 22 Zbigniew Jędrzejewski-Szmek 2016-10-18 13:27:27 EDT
Tested with a fresh copy of Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso:
- booting in the default configuration:
  ulimit -c is 0 in liveuser's gnome-terminal
  ulimit -c is 0 in xterm started from alt-f2
- booting with enforcing=0 on the kernel command line:
  ulimit -c is unlimited in gnome-terminal
  ulimit -c is unlimited in xterm started from alt-f2

If I raise ulimit I get a successful core dump:
[liveuser@localhost ~]$ ulimit -c unlimited
[liveuser@localhost ~]$ ulimit -c
unlimited
[liveuser@localhost ~]$ sudo systemctl stop abrt*
[liveuser@localhost ~]$ bash -c 'kill -SEGV $$'
Segmentation fault (core dumped)
[liveuser@localhost ~]$ coredumpctl 
TIME                            PID   UID   GID SIG PRESENT EXE
Tue 2016-10-18 13:26:17 EDT    2549  1000  1000  11 * /usr/bin/bash
[liveuser@localhost ~]$ coredumpctl gdb
(works)

So the issue seems to boil down to selinux rules.
Comment 23 Jakub Filak 2016-10-19 03:50:16 EDT
(In reply to Zbigniew Jędrzejewski-Szmek from comment #22)
> Tested with a fresh copy of Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso:
> - booting in the default configuration:
>   ulimit -c is 0 in liveuser's gnome-terminal
>   ulimit -c is 0 in xterm started from alt-f2
> - booting with enforcing=0 on the kernel command line:
>   ulimit -c is unlimited in gnome-terminal
>   ulimit -c is unlimited in xterm started from alt-f2

That means that SELinux is preventing systemd from updating its RLIMIT_CORE:
https://github.com/systemd/systemd/blob/master/src/core/main.c#L1516
  setrlimit(RLIMIT_CORE, &RLIMIT_MAKE_CONST(RLIM_INFINITY))

The issue was introduced in Fedora 24, because when systemd-229 was released and the default RLIMIT_CORE was changed to unlimited, ABRT started leaving core files all around the file system (the ABRT maintainers didn't know about this major change). That proves that SELinux wasn't preventing systemd from updating RLIMIT_CORE at that time.
Comment 24 Göran Uddeborg 2016-10-19 17:18:57 EDT
(In reply to Jakub Filak from comment #23)
> That means that SELinux is preventing systemd from updating its RLIMIT_CORE
You don't think, as I suspected in comment #16, that it allows SETTING of the limit, but prevents children from INHERITING the new value?
Comment 25 Zbigniew Jędrzejewski-Szmek 2016-10-19 21:25:02 EDT
That seems to be happening. PID 1 has core=unlimited rlimit, but various child processes have core=0. The default for PID 1 seems to be core=0, that's what I see if I boot with init=/bin/bash, and strace reveals no setrlimit calls from bash. So it seems systemd successfully sets rlimit core=unlimited for itself, but this is not inherited as expected.
Comment 26 Paul W. Frields 2016-11-07 10:58:49 EST
(In reply to Michael Catanzaro from comment #21)
> Since I consider coredumpctl to be a priority feature for Fedora
> Workstation, I am planning to propose disabling SELinux by default in
> Workstation until this can be fixed.

Just noting I recall that SELinux enablement is a Fedora shipping default the Council previously stated was not variable per addition.  Which means we need to figure out how to resolve this issue between SELinux, systemd, and any other required developer teams.
Comment 27 Paul W. Frields 2016-11-07 10:59:15 EST
Sorry, *edition.
Comment 28 Jan Kurik 2016-11-30 11:13:05 EST
At the evaluation meeting for Prioritized bugs we agreed not to approve this bug for the "Prioritized bugs list".
It seems likely that this will end up fixed as a dependency of other changes in Fedora Workstation, so we don't think we need to call it out as requiring special attention.
Comment 29 Lukas Vrabec 2016-12-06 07:09:58 EST
Hi, 
Could somebody test it with the latest selinux-policy rpm package? 
http://koji.fedoraproject.org/koji/buildinfo?buildID=822892

I added some changes there.
Comment 30 Michael Catanzaro 2016-12-08 09:21:33 EST
It's still broken:

Dec 08 08:19:14 victory-road systemd[1]: Created slice system-systemd\x2dcoredump.slice.
Dec 08 08:19:14 victory-road systemd[1]: Started Process Core Dump (PID 5730/UID 0).
Dec 08 08:19:14 victory-road audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@0-5730-0 comm="systemd" exe="/usr/lib/systemd/systemd" 
Dec 08 08:19:14 victory-road systemd-coredump[5737]: Core Dumping has been disabled for process 2403 (epiphany).
Dec 08 08:19:14 victory-road systemd-coredump[5737]: Process 2403 (epiphany) of user 1000 dumped core.

I guess ulimit is still 0.
Comment 31 Lukas Vrabec 2016-12-08 09:55:38 EST
Do you see any AVC denials? 
Please run:
# semodule -DB
# reproduce the scenario
# ausearch -m USER_AVC,AVC -ts recent 

Thanks.
Comment 32 Michael Catanzaro 2016-12-08 13:06:43 EST
Yeah I see a bunch of denials for systemd, then (ironically?) a couple for setroubleshootd itself, then these two:

time->Thu Dec  8 11:59:51 2016
type=AVC msg=audit(1481219991.533:361): avc:  denied  { rlimitinh } for  pid=14725 comm="systemd-coredum" scontext=system_u:system_r:init_t:s0 tcontext=system_u:system_r:systemd_coredump_t:s0 tclass=process permissive=0
----
time->Thu Dec  8 11:59:51 2016
type=AVC msg=audit(1481219991.533:362): avc:  denied  { noatsecure } for  pid=14725 comm="systemd-coredum" scontext=system_u:system_r:init_t:s0 tcontext=system_u:system_r:systemd_coredump_t:s0 tclass=process permissive=0

This is with selinux-policy-3.13.1-225.1.fc25 from updates-testing.
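
For longer logs, denials like the two above can be condensed to one line each. This is a sketch only: summarise_avc is a hypothetical helper, and it assumes the AVC record layout shown in this comment.

```shell
#!/bin/sh
# Summarise AVC denials read from stdin as "permission source -> target".
# summarise_avc is a hypothetical helper name; it only understands records
# shaped like the AVC lines quoted above and ignores everything else.
summarise_avc() {
    sed -n 's/.*denied  *{ \([^}]*\) }.*scontext=[^:]*:[^:]*:\([^:]*\):.*tcontext=[^:]*:[^:]*:\([^:]*\):.*/\1 \2 -> \3/p'
}
```

Possible usage, with the ausearch call from comment 31:

  ausearch -m USER_AVC,AVC -ts recent | summarise_avc | sort | uniq -c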
Comment 33 Michael Catanzaro 2016-12-08 13:07 EST
Created attachment 1229583 [details]
ausearch -m USER_AVC,AVC -ts recent
Comment 34 Paul W. Frields 2017-01-16 09:22:18 EST
We were discussing this in the Workstation working group meeting -- the bug is still present.  Lukas, any progress here?
Comment 35 Lukas Vrabec 2017-01-17 09:40:20 EST
Paul, 

Working on a fix right now. I'll provide more info ASAP.
Comment 36 Lukas Vrabec 2017-01-17 10:26:52 EST
Okay, 

I have a fix for this issue.

Quick workaround: 
1. # cat domain.cil                                                 
(allow init_t domain (process (rlimitinh)))

2. semodule -i domain.cil


Testing: 

# getenforce
Enforcing

# grep core /proc/`pidof systemd-journald`/limits
Max core file size        unlimited            unlimited            bytes

# sleep 30

# hit ^\ to generate a SIGQUIT

#  coredumpctl
Tue 2017-01-17 16:25:38 CET    1207     0     0   3 present  /usr/bin/sleep

Build will be available ASAP.
Comment 37 Michael Catanzaro 2017-02-16 10:34:16 EST
*** Bug 1365435 has been marked as a duplicate of this bug. ***
Comment 38 Michael Catanzaro 2017-02-22 14:56:08 EST
Hi Lukas, will an update be available for this soon?
Comment 39 Michael Catanzaro 2017-02-28 09:09:29 EST
(In reply to Michael Catanzaro from comment #38)
> Hi Lukas, will an update be available for this soon?

Hi Lukas, the change deadline for this is March 3. It's been a month and a half since you identified a fix for this issue; can you please release an update?
Comment 40 Michael Catanzaro 2017-02-28 09:10:54 EST
(In reply to Michael Catanzaro from comment #39)
> Hi Lukas, the change deadline for this is March 3. It's been a month and a
> half since you identified a fix for this issue; can you please release an
> update?

Er, actually the deadline is today. March 3 is the date of the FESCo review meeting.
Comment 41 Lukas Vrabec 2017-03-03 16:52:11 EST
Hi, 

# sesearch -A -s init_t -t domain -c process | grep rlimi
   allow init_t domain : process { sigchld sigkill sigstop signull signal getpgid getattr setrlimit rlimitinh } ;

# rpm -q selinux-policy 
selinux-policy-3.13.1-241.fc26.noarch

This issue is already fixed in F26.
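
To check another installation for the same rule, the sesearch output above could be tested mechanically. A minimal sketch; has_rlimitinh is a made-up helper name and it assumes the rule format printed above.

```shell
#!/bin/sh
# Succeed if any allow rule on stdin lists rlimitinh among its permissions.
# has_rlimitinh is a hypothetical helper name. The spaces around the word
# keep it from matching other permissions such as setrlimit.
has_rlimitinh() {
    grep -q '{[^}]* rlimitinh [^}]*}'
}
```

Possible usage:

  sesearch -A -s init_t -t domain -c process | has_rlimitinh && echo "rule present"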
Comment 42 Michael Catanzaro 2017-03-03 17:11:37 EST
So if this is already fixed, surely the bug should be closed...?
Comment 43 Paul W. Frields 2017-03-03 17:41:16 EST
Here's how bug workflow happens: https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_status

MODIFIED means there's a fix available that the developer believes can be tested.  If, in the Fedora case, the reporter or QA tests and verifies the fix, they can report that back (VERIFIED) and then the bug can be closed by the assignee.
Comment 44 Michael Catanzaro 2017-03-03 18:20:51 EST
If there's an F25 update available, I'm happy to test that. But I trust that it works, given that I've verified that running Lukas's semodule command in comment #36 fixed the issue for me locally.
Comment 45 Göran Uddeborg 2017-03-04 03:43:15 EST
@Michael, as a reporter I'm also happy to test an F25 update.  Testing the F26 update will have to wait a little, though.

When it comes to trust, I'm more of the kind "I believe it when I see it". :-)
