Description of problem:

When ceph-* drops privileges via setuid, core dumps are no longer generated because the process's DUMPABLE flag is cleared. This can also cause EPERM errors on anything that calls PTRACE_ATTACH.

Version-Release number of selected component (if applicable):

ceph-10.2.2-38.el7cp.x86_64

How reproducible:

100%

Steps to Reproduce:

Set DefaultLimitCORE=infinity in /etc/systemd/system.conf

$ sudo su - ceph
$ strace -p 49596
strace: attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
$ gstack 49596
$
$ gcore 49596
ptrace: Operation not permitted.
You can't do that without a process to debug.
The program is not being run.
gcore: failed to create core.49596
$ sudo systemd-coredumpctl
No coredumps found.

After installing an rpm including Patrick's patch:

$ strace -p 48740
Process 48740 attached
futex(0x7f040334d9d0, FUTEX_WAIT, 48767, NULL^CProcess 48740 detached
 <detached ...>
$ gstack 48740|tail -5
#0  0x00007f04158edef7 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f04179a16e0 in Thread::join(void**) ()
#2  0x00007f0417a7dd62 in DispatchQueue::wait() ()
#3  0x00007f0417995f2b in SimpleMessenger::wait() ()
#4  0x00007f04172b3d66 in main ()
$ gcore 48740
...
Saved corefile core.48740
$ sudo kill -SIGSEGV 48740
$ sudo systemd-coredumpctl list
TIME                            PID   UID   GID SIG PRESENT EXE
Wed 2016-10-26 23:00:54 EDT   48740   167   167  11 *       /usr/bin/ceph-osd
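One way to check whether a running daemon has lost its dumpable attribute, without attaching to it, is to compare the owner of a file under /proc/<pid> with the process's effective UID. Per proc(5), files inside /proc/<pid> are owned by root when the dumpable attribute is anything other than 1. The sketch below is only a heuristic built on that assumption. The function names are hypothetical, and it cannot say anything about a process whose effective UID is already 0.

```shell
#!/bin/sh
# Heuristic sketch: infer the "dumpable" attribute from /proc ownership.
# classify_dumpable and check_pid_dumpable are hypothetical helper names,
# not part of Ceph, procps, or systemd.

# Pure comparison; prints "dumpable", "not-dumpable", or "unknown".
#   $1 = numeric owner of /proc/<pid>/status
#   $2 = effective UID of the process
classify_dumpable() {
    owner_uid=$1
    euid=$2
    if [ "$euid" -eq 0 ]; then
        # Files are root-owned either way, so ownership tells us nothing.
        echo unknown
    elif [ "$owner_uid" -eq "$euid" ]; then
        echo dumpable
    else
        echo not-dumpable
    fi
}

# Gather the two values for a live process and classify them.
check_pid_dumpable() {
    pid=$1
    # Owner of a regular file inside /proc/<pid> (GNU stat syntax).
    owner_uid=$(stat -c %u "/proc/$pid/status") || return 1
    # The Uid: line lists real, effective, saved, and filesystem UIDs.
    euid=$(awk '/^Uid:/ {print $3}' "/proc/$pid/status") || return 1
    classify_dumpable "$owner_uid" "$euid"
}
```

For example, `check_pid_dumpable 49596` run against the OSD from the steps above would be expected to print `not-dumpable` on an affected build, given that the daemon runs as UID 167.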
(In reply to Brad Hubbard from comment #0)
>
> $ sudo systemd-coredumpctl
> No coredumps found.

Should read...

$ sudo kill -SIGSEGV 49596
$ sudo systemd-coredumpctl
No coredumps found.
https://github.com/ceph/ceph/pull/11736
Harald Klein came up with the following workaround.

"The following steps should enable core dump functionality if you did not adjust any settings in that regard already:

1) backup /lib/systemd/system/ceph-osd@.service

2) edit /lib/systemd/system/ceph-osd@.service and add the following in the [Service] section:

LimitCORE=infinity

3) adjust sysctl:

# sysctl -w fs.suid_dumpable=2
# sysctl -w kernel.core_uses_pid=1
# sysctl -w kernel.core_pattern=/tmp/core-%e-sig%s-user%u-group%g-pid%p-time%t

4) do a systemctl daemon-reload

5) verify that max core file size is unlimited, e.g. for osd id 1 in my test env:

# ps auxw | grep ceph-osd
ceph 2420 0.9 1.6 1234124 407088 ? Ssl 07:37 0:03 /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph
# cat /proc/2420/limits | grep core
Max core file size        unlimited            unlimited            bytes

6) the current settings should lead to a coredump being written as in the following example when the osd process segfaults:

# ls -ltr /tmp/core*
-rw-------. 1 root ceph 1048887296 Dec 30 07:48 /tmp/core-ceph-osd-sig11-user167-group167-pid2714-time1483102135"
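The verification in step 5 can be scripted. The following is a minimal sketch, assuming the settings from the workaround above; the helper names core_soft_limit and verify_core_settings are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Sketch of the checks from step 5 of the workaround.
# core_soft_limit and verify_core_settings are hypothetical helper names.

# Print the soft "Max core file size" limit from a limits file.
# In /proc/<pid>/limits the fields are: Max core file size <soft> <hard> <units>
core_soft_limit() {
    awk '/^Max core file size/ {print $5}' "$1"
}

# Check a running daemon against the settings from steps 2 and 3.
# Returns non-zero if any check fails.
verify_core_settings() {
    pid=$1
    status=0
    if [ "$(core_soft_limit "/proc/$pid/limits")" != "unlimited" ]; then
        echo "pid $pid: core file size limit is not unlimited" >&2
        status=1
    fi
    if [ "$(cat /proc/sys/fs/suid_dumpable)" != "2" ]; then
        echo "fs.suid_dumpable is not 2" >&2
        status=1
    fi
    # Shown for information only; the pattern is site-specific.
    echo "kernel.core_pattern = $(cat /proc/sys/kernel/core_pattern)"
    return $status
}
```

With the OSD from the example, `verify_core_settings 2420` would report any setting that does not match the workaround. If the unit file was only just edited, the daemon needs a restart to pick up the new limit.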
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2017-0514.html