Bug 1389159

Summary: RHCS 2 daemons can not dump core because PR_SET_DUMPABLE is set to 0 after setuid call
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: RADOS
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Status: CLOSED ERRATA
Keywords: Regression
Target Milestone: rc
Target Release: 2.2
Reporter: Brad Hubbard <bhubbard>
Assignee: Brad Hubbard <bhubbard>
QA Contact: Vidushi Mishra <vimishra>
CC: bhubbard, ceph-eng-bugs, dzafman, hnallurv, kchai, kdreyer, vumrao
Fixed In Version: RHEL: ceph-10.2.5-7.el7cp Ubuntu: ceph_10.2.5-3redhat1xenial
Type: Bug
Last Closed: 2017-03-14 15:46:06 UTC

Description Brad Hubbard 2016-10-27 03:34:30 UTC
Description of problem:
When a ceph-* daemon drops privileges via setuid, core dumps are no longer
generated because its DUMPABLE flag is cleared.

Version-Release number of selected component (if applicable):
ceph-10.2.2-38.el7cp.x86_64

How reproducible:
100%

Steps to Reproduce:

The cleared DUMPABLE flag also causes EPERM errors on anything that calls
PTRACE_ATTACH, as the following transcript shows.

Set DefaultLimitCORE=infinity in /etc/systemd/system.conf.
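For reference, this is the standard systemd setting rather than anything
Ceph-specific: it means adding the line below under the [Manager] section of
/etc/systemd/system.conf, re-executing systemd so the manager re-reads its
configuration, and restarting the Ceph services so that newly started
processes inherit the limit (the restart target may differ per installation):

[Manager]
DefaultLimitCORE=infinity

# systemctl daemon-reexec
# systemctl restart ceph.target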

$ sudo su - ceph

$ strace -p 49596
strace: attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted

$ gstack 49596
$

$ gcore 49596
ptrace: Operation not permitted.
You can't do that without a process to debug.
The program is not being run.
gcore: failed to create core.49596

$ sudo systemd-coredumpctl
No coredumps found.
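As an aside (this check is based on documented kernel behaviour and was not
part of the original report), a quick way to confirm that the daemon has lost
its dumpable flag is to look at the ownership of its /proc entry; once the
flag is cleared, the files under /proc/<pid> are owned by root rather than the
ceph user, and the same flag is what makes the PTRACE_ATTACH calls above fail:

$ ls -ld /proc/49596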


After installing an RPM that includes Patrick's patch:

$ strace -p 48740
Process 48740 attached
futex(0x7f040334d9d0, FUTEX_WAIT, 48767, NULL^CProcess 48740 detached
 <detached ...>

$ gstack 48740|tail -5
#0  0x00007f04158edef7 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f04179a16e0 in Thread::join(void**) ()
#2  0x00007f0417a7dd62 in DispatchQueue::wait() ()
#3  0x00007f0417995f2b in SimpleMessenger::wait() ()
#4  0x00007f04172b3d66 in main ()

$ gcore 48740
...
Saved corefile core.48740

$ sudo kill -SIGSEGV 48740
$ sudo systemd-coredumpctl list
TIME                            PID   UID   GID SIG PRESENT EXE
Wed 2016-10-26 23:00:54 EDT   48740   167   167  11 * /usr/bin/ceph-osd

Comment 1 Brad Hubbard 2016-10-27 03:38:33 UTC
(In reply to Brad Hubbard from comment #0)
> 
> $ sudo systemd-coredumpctl
> No coredumps found.

Should read...

$ sudo kill -SIGSEGV 49596

$ sudo systemd-coredumpctl
No coredumps found.

Comment 5 Brad Hubbard 2016-12-30 23:34:59 UTC
https://github.com/ceph/ceph/pull/11736
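For context, the underlying behaviour is that dropping privileges with
setuid()/setgid() clears the process's dumpable attribute, and the fix is to
turn it back on explicitly with prctl(PR_SET_DUMPABLE). The following is a
minimal illustrative sketch of that approach, not the literal code in the
pull request; the uid/gid values and error handling are placeholders.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/types.h>

/* Drop privileges to the given uid/gid, then restore the dumpable flag so
 * that core dumps and PTRACE_ATTACH keep working for the unprivileged user. */
static void drop_privs(uid_t uid, gid_t gid)
{
    if (setgid(gid) != 0 || setuid(uid) != 0) {
        perror("setgid/setuid");
        exit(1);
    }
    /* setuid()/setgid() reset the dumpable attribute to 0 for a formerly
     * privileged process; re-enable it so the kernel will write cores. */
    if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) != 0)
        perror("prctl(PR_SET_DUMPABLE)");
}

int main(void)
{
    drop_privs(167, 167);   /* 167:167 is the ceph user/group on RHEL builds */
    pause();                /* keep the process around to trace/dump */
    return 0;
}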

Comment 6 Brad Hubbard 2016-12-31 23:17:48 UTC
Harald Klein came up with the following workaround.

"The following steps should enable core dump functionality if you did not adjust any settings in that regard already:

1) backup /lib/systemd/system/ceph-osd@.service
2) edit /lib/systemd/system/ceph-osd@.service and add the following in the [Service] section:

LimitCORE=infinity

3) adjust sysctl:

# sysctl -w fs.suid_dumpable=2
# sysctl -w kernel.core_uses_pid=1
# sysctl -w kernel.core_pattern=/tmp/core-%e-sig%s-user%u-group%g-pid%p-time%t

4) do a systemctl daemon-reload
5) verify that max core file size is unlimited, e.g. for osd id 1 in my test env:

# ps auxw | grep ceph-osd
ceph      2420  0.9  1.6 1234124 407088 ?      Ssl  07:37   0:03 /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph
# cat /proc/2420/limits | grep core
Max core file size        unlimited            unlimited            bytes

6) the current settings should lead to a coredump being written as in the following example when the osd process segfaults:

# ls -ltr /tmp/core*
-rw-------. 1 root ceph 1048887296 Dec 30 07:48 /tmp/core-ceph-osd-sig11-user167-group167-pid2714-time1483102135"
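
Note that the sysctl settings in the workaround above are not persistent
across reboots, and edits to the packaged unit file under /lib/systemd can be
lost on package updates. If the workaround needs to survive both, the usual
approach on a RHEL 7 host is a systemd drop-in override plus a file under
/etc/sysctl.d, along the lines of the following (the file names here are
arbitrary examples, and the ceph-osd@ instances must be restarted for the new
core limit to take effect):

# cat /etc/systemd/system/ceph-osd@.service.d/coredump.conf
[Service]
LimitCORE=infinity

# cat /etc/sysctl.d/90-ceph-coredump.conf
fs.suid_dumpable = 2
kernel.core_uses_pid = 1
kernel.core_pattern = /tmp/core-%e-sig%s-user%u-group%g-pid%p-time%t

# systemctl daemon-reload
# sysctl --system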

Comment 16 errata-xmlrpc 2017-03-14 15:46:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0514.html