Bug 1423417

Summary:	RGW daemons cannot dump core because PR_SET_DUMPABLE is set to 0 after setuid call
Product:	[Red Hat Storage] Red Hat Ceph Storage	Reporter:	Vidushi Mishra <vimishra>
Component:	RGW	Assignee:	Brad Hubbard <bhubbard>
Status:	CLOSED ERRATA	QA Contact:	Vidushi Mishra <vimishra>
Severity:	high	Docs Contact:
Priority:	high
Version:	2.2	CC:	bhubbard, cbodley, ceph-eng-bugs, hnallurv, kbader, mbenjamin, owasserm, sweil, tserlin, vakulkar, vumrao
Target Milestone:	rc
Target Release:	2.2
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	RHEL: ceph-10.2.5-32.el7cp Ubuntu: ceph_10.2.5-24redhat1	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2017-03-14 15:49:39 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Vidushi Mishra 2017-02-17 09:40:13 UTC

Description of problem:

When the daemon runs as the ceph user, no core dumps are generated for radosgw. This is most likely because its DUMPABLE flag is cleared when it switches to the ceph user. The steps below show the symptoms.


Version-Release number of selected component (if applicable):
ceph version 10.2.5-26.el7cp (99b2480b95cab20252d1e06445208d5787da4699)

How reproducible:
100%

Steps to Reproduce:

A cleared DUMPABLE flag also causes EPERM errors for anything that calls PTRACE_ATTACH, such as strace, gstack and gcore.

Set DefaultLimitCORE=infinity in /etc/systemd/system.conf, then switch to the ceph user and try to attach to the running radosgw process:

$ sudo su - ceph
$

$ ps aux | grep radosgw
ceph       46948  0.1  0.1 7579320 43300 ?       Ssl  Feb15   3:37 /usr/bin/radosgw -f --cluster ceph1 --name client.rgw.host --setuser ceph --setgroup ceph
ceph      462885  0.0  0.0 112652   960 pts/0    S+   09:38   0:00 grep --color=auto radosgw
$ 

$ strace -p 46948
strace: attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
$ 

$ gstack 46948
$

$ gcore 46948
ptrace: Operation not permitted.
You can't do that without a process to debug.
The program is not being run.
gcore: failed to create core.46948
$ 

Expected results:
A core dump should be generated, and strace and gcore should be able to attach to the process.
  

Additional info:
As root, core dumps are generated and the tools work as expected.

Comment 3 Brad Hubbard 2017-02-17 23:25:11 UTC

I am currently working on this. The workaround is to set the sysctl fs.suid_dumpable to 1 or 2, as documented in https://bugzilla.redhat.com/show_bug.cgi?id=1389159#c6. Matt and I have discussed a possible patch, which I am testing. I will then check whether master is still affected and post upstream accordingly.
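The workaround works because, when a process changes its effective uid or gid, the kernel resets its dumpable flag to the current value of fs.suid_dumpable. A value of 1 or 2 therefore leaves the daemon able to dump core after setuid. With 2, the dump is owned by root.

To check which mode a host is in before and after applying the workaround, something like the following could be used. This is a minimal sketch. suid_dumpable_mode and read_suid_dumpable are hypothetical helpers, and the mode names follow the kernel's sysctl documentation for fs.suid_dumpable.

```c
#include <stdio.h>

/* Map an fs.suid_dumpable value to the mode name used in the kernel's
 * sysctl documentation (hypothetical helper). */
const char *suid_dumpable_mode(int value)
{
    switch (value) {
    case 0:
        return "default";  /* processes that changed credentials are not dumped */
    case 1:
        return "debug";    /* all processes dump core when possible */
    case 2:
        return "suidsafe"; /* dumps owned by root; needs a pipe or absolute core_pattern */
    default:
        return "unknown";
    }
}

/* Read the current fs.suid_dumpable value from procfs (hypothetical helper).
 * Returns -1 if the file cannot be opened or parsed. */
int read_suid_dumpable(void)
{
    int value = -1;
    FILE *f = fopen("/proc/sys/fs/suid_dumpable", "r");

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

For example, `int v = read_suid_dumpable(); printf("fs.suid_dumpable = %d (%s)\n", v, suid_dumpable_mode(v));` reports the current setting.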

Comment 4 Brad Hubbard 2017-02-20 09:23:20 UTC

At this stage I have convinced myself that there is a problem here. In essence, after running the following commands it should be possible to send the rgw daemon SIGSEGV and see a core listed by "sudo coredumpctl list".

sudo systemctl stop ceph-radosgw.target
sudo sysctl -w kernel.core_pattern='|/usr/lib/systemd/systemd-coredump %p %u %g s %t %e'
sudo sed -i -e 's/\[Service\]/\[Service\]\nLimitCORE=infinity/' /lib/systemd/system/ceph-radosgw\@.service
sudo systemctl daemon-reload
sudo systemctl start ceph-radosgw.target
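The commands above change two settings: the RLIMIT_CORE of the service (via LimitCORE) and kernel.core_pattern. A minimal sketch for confirming both is below. The helper names are hypothetical. core_limit_unlimited only inspects the calling process, so to check the running daemon, either call it from inside the daemon or read the "Max core file size" line of /proc/<pid>/limits.

```c
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/* Returns 1 if the calling process's soft RLIMIT_CORE is unlimited,
 * 0 if it is finite, or -1 if getrlimit() fails. */
int core_limit_unlimited(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    return rl.rlim_cur == RLIM_INFINITY ? 1 : 0;
}

/* A core_pattern beginning with '|' pipes the core to a helper program,
 * such as systemd-coredump. */
int core_pattern_is_pipe(const char *pattern)
{
    return pattern != NULL && pattern[0] == '|';
}

/* Copy the first line of kernel.core_pattern into buf, without the
 * trailing newline. Returns 0 on success, -1 on failure. */
int read_core_pattern(char *buf, size_t len)
{
    FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```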

This does *not* generate a core with 10.2.5-26.el7cp (or earlier). However, with the following patch applied to 10.2.5-26.el7cp, the daemon can dump core (thanks to Matt for the advice on where to place the call).
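The patch itself is not reproduced here. The following is only a minimal sketch of the ordering it depends on, not the actual Ceph code. drop_privileges_keep_dumpable is a hypothetical helper. The key point is that PR_SET_DUMPABLE is applied only after the last credential change.

```c
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: drop to the given uid/gid, then re-enable core dumps.
 * setgid() comes first because, once the uid is no longer 0, the process
 * may no longer be allowed to change its gid. Supplementary groups
 * (setgroups/initgroups) are omitted for brevity.
 * Returns the resulting dumpable flag (1 on success) or -1 on error. */
int drop_privileges_keep_dumpable(uid_t uid, gid_t gid)
{
    if (setgid(gid) != 0) {
        perror("setgid");
        return -1;
    }
    if (setuid(uid) != 0) {
        perror("setuid");
        return -1;
    }
    /* If the calls above changed the effective uid or gid, the kernel has
     * reset the dumpable flag to fs.suid_dumpable (0 by default), so it is
     * restored only now, after the last credential change. */
    if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) != 0) {
        perror("prctl");
        return -1;
    }
    return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}
```

In a multithreaded process, glibc's setuid() applies the change to every thread before it returns. This is presumably why the rgw traces in this bug show many sys_setuid entries. A prctl call made after setuid() returns therefore comes after the last of them.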

http://pkgs.devel.redhat.com/cgit/rpms/ceph/commit/?h=private-bhubbard-wip-rgw-set-dumpable-flag-after-setuid-1&id=aeef05a4dd013aab8f3c52950229ffd90db0f8b9

The next step is to work out how this affects upstream.

Comment 5 Brad Hubbard 2017-02-22 20:49:26 UTC

(In reply to Brad Hubbard from comment #4)                                  
> 
> sudo sysctl -w kernel.core_pattern='|/usr/lib/systemd/systemd-coredump %p %u
> %g s %t %e'

It should be:

$ sudo sysctl -w kernel.core_pattern='|/usr/lib/systemd/systemd-coredump %p %u %g %s %t %e'

Comment 6 Brad Hubbard 2017-02-23 00:42:41 UTC

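The problem can be demonstrated with a SystemTap script. With the patch for bz1389159, we see the following output for rgw when starting the daemon (option=0x4 is PR_SET_DUMPABLE, and uid=0xa7 is 167, the ceph user). Note that prctl is called once, before all of the setuid calls.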

# stap -e 'probe syscall.prctl{if($option == 0x4) printf("%d - %s: %s\n", gettimeofday_ns(), probefunc(), $$parms)} probe syscall.setuid{printf("%d - %s: %s\n", gettimeofday_ns(), probefunc(), $$parms)} probe begin{print("Ready\n")}'
Ready
1487806337276844415 - SyS_prctl: option=0x4 arg2=0x1 arg3=0x7fbc237e0700 arg4=0x0 arg5=0x0
1487806337425109295 - sys_setuid: uid=0xa7
1487806337425127520 - sys_setuid: uid=0xa7
1487806337425135480 - sys_setuid: uid=0xa7
1487806337425143105 - sys_setuid: uid=0xa7
1487806337425149809 - sys_setuid: uid=0xa7
...
1487806337425421682 - sys_setuid: uid=0xa7
1487806337425430265 - sys_setuid: uid=0xa7
1487806337425437577 - sys_setuid: uid=0xa7
1487806337425444765 - sys_setuid: uid=0xa7
1487806337425451672 - sys_setuid: uid=0xa7
1487806337425458336 - sys_setuid: uid=0xa7
1487806337425464797 - sys_setuid: uid=0xa7
1487806337425471866 - sys_setuid: uid=0xa7
1487806337425478957 - sys_setuid: uid=0xa7
1487806337425485887 - sys_setuid: uid=0xa7
1487806337425492401 - sys_setuid: uid=0xa7
1487806337425499140 - sys_setuid: uid=0xa7
1487806337425505562 - sys_setuid: uid=0xa7
1487806337425512795 - sys_setuid: uid=0xa7
1487806337425520707 - sys_setuid: uid=0xa7
1487806337425533842 - sys_setuid: uid=0xa7

For the OSD, by contrast, we see this:

# stap -e 'probe syscall.prctl{if($option == 0x4) printf("%d - %s: %s\n", gettimeofday_ns(), probefunc(), $$parms)} probe syscall.setuid{printf("%d - %s: %s\n", gettimeofday_ns(), probefunc(), $$parms)} probe begin{print("Ready\n")}'
Ready
1487806453801840303 - sys_setuid: uid=0xa7
1487806453801853293 - sys_setuid: uid=0xa7
1487806453801862885 - SyS_prctl: option=0x4 arg2=0x1 arg3=0x7fe04598a010 arg4=0x0 arg5=0x0

The call to prctl must come after the last call to setuid, because a change of effective uid resets the dumpable flag to the value of fs.suid_dumpable. With the new patch we see the following:

# stap -e 'probe syscall.prctl{if($option == 0x4) printf("%d - %s: %s\n", gettimeofday_ns(), probefunc(), $$parms)} probe syscall.setuid{printf("%d - %s: %s\n", gettimeofday_ns(), probefunc(), $$parms)} probe begin{print("Ready\n")}'
Ready
1487810070544291689 - SyS_prctl: option=0x4 arg2=0x1 arg3=0x7fb669a3c700 arg4=0x0 arg5=0x0
1487810070700573549 - sys_setuid: uid=0xa7
1487810070700590993 - sys_setuid: uid=0xa7
1487810070700600077 - sys_setuid: uid=0xa7
1487810070700609206 - sys_setuid: uid=0xa7
1487810070700617631 - sys_setuid: uid=0xa7
...
1487810070701084570 - sys_setuid: uid=0xa7
1487810070701092730 - sys_setuid: uid=0xa7
1487810070701100523 - sys_setuid: uid=0xa7
1487810070701108297 - sys_setuid: uid=0xa7
1487810070701118095 - sys_setuid: uid=0xa7
1487810070701135526 - sys_setuid: uid=0xa7
1487810070704708059 - SyS_prctl: option=0x4 arg2=0x1 arg3=0x1 arg4=0x7ffe1a76ebb8 arg5=0x0

That should resolve the issue, and the OSD results remain unchanged.
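SystemTap is not always available. A lighter check of whether a running daemon ended up dumpable relies on the usual Linux procfs behaviour, which is assumed here. If a process's dumpable flag is not 1, its per-process files such as /proc/<pid>/status are reported as owned by root. Otherwise they are reported as owned by its effective uid. proc_status_owner and self_looks_dumpable are hypothetical helpers.

For a radosgw running as ceph, an owner of 167 would suggest the fix is in place, and 0 would suggest the flag was cleared. The check cannot distinguish anything for daemons that run as root.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the owner uid reported for /proc/<pid>/status, or (uid_t)-1 on error.
 * Assumes the usual procfs rule: root if the process is not dumpable,
 * otherwise the process's effective uid. */
uid_t proc_status_owner(pid_t pid)
{
    char path[64];
    struct stat st;

    snprintf(path, sizeof path, "/proc/%ld/status", (long)pid);
    if (stat(path, &st) != 0)
        return (uid_t)-1;
    return st.st_uid;
}

/* Returns 1 if the calling process's own /proc files are owned by its
 * effective uid (always the case when running as root), else 0. */
int self_looks_dumpable(void)
{
    return proc_status_owner(getpid()) == geteuid();
}
```

For example, `proc_status_owner(46948)` would return the owner uid for the radosgw process from the original report.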

Comment 12 errata-xmlrpc 2017-03-14 15:49:39 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0514.html